Warning, /system/kio-fuse/DESIGN.md is written in an unsupported language. File is not indexed.

0001 # Design of KIO FUSE
0002 
0003 This file explains the internal implementation only, please read the README
0004 first to learn about the external interface.
0005 
0006 ## Goals
0007 
0008 KIO FUSE's design is based on these requirements:
0009 
0010 * Maximum compatibility with applications (as far as KIO allows).
0011 * Work with most protocols: The minimum set of operations a worker must support
0012   are `KIO::stat` and `KIO::get`.
0013 * Good usability: this means mostly acceptable speed (using caching whereever
0014   possible), but also having a simple API to the outside world.
0015 * Security: the password of mounted URLs is not obviously visible.
0016 * Keep It Simple Stupid (KISS).
0017 
0018 ## Use of the `libfuse` low-level API
0019 
0020 Compared to the "old" version of KIO FUSE in SVN, this implementation has a
0021 major difference in the implementation: instead of the high-level `libfuse` API,
0022 which translates the inode numbers passed from the kernel to paths for calling
0023 the operations, the lowlevel `libfuse` API is used.
0024 
0025 While it may look like translating paths to URLs is much easier than keeping
0026 track of inode numbers, the high-level API has actually completely different
0027 behaviour in many other ways, which actually makes it much more complex to use.
0028 The most important difference is that the lowlevel API can be used
0029 asynchronously, which makes it possible to process multiple requests in one
0030 thread. This matches the Qt (and thus KIO) event processing model perfectly.
0031 This means that multithreading is not required (KIO works on the main thread
0032 anyway), resulting in no need for locking and passing queues around.
0033 
0034 Additonally, a design flaw in `libfuse` means that it's impossible to keep track
0035 of the lookup count correctly/race-free when using multiple threads:
0036 [Move Lookup Count Management into LLFUSE?](https://github.com/python-llfuse/python-llfuse/blob/master/developer-notes/lookup_counts.rst)
0037 
0038 The additional `lookup` lowlevel operation makes it possible to avoid calling
0039 `readdir` on all path elements when opening a file for the first time.
0040 Example: `smb://host/dir/subdir/file` is mounted as `smb/host/dir/subdir/file`.
0041 When opening that for the first time with the high-level API, it would result in
0042 these calls: `opendir(/) -> readdir(/) -> closedir(/) -> opendir(smb) -> ...`
0043 This is because `libfuse` has to build an internal tree model for mapping inode
0044 numbers to path elements.
0045 With the lowlevel API, lookup is enough: `lookup(/, smb) -> lookup(smb, host)...`
0046 This can be implemented using `KIO::stat` instead of a full `KIO::listDir`, both on
0047 the initial access and when the node already exists locally, to recheck whether
0048 the local representation still matches.
0049 With the high-level API, lookup on existing nodes is not passed to the FS
0050 implementation if the node is part of the internal tree already. This makes
0051 it harder (if not infeasible) to react to changes on the remote side, e.g.
0052 deletes or renames.
0053 
0054 Not using inode numbers in the high-level API means that implementing unlink
0055 properly (i.e. already opened file handles are still valid) is not possible,
0056 so instead of calling unlink directly, `libfuse` renames deleted files as
0057 `.fuse_hiddenXXX` and deletes them when their lookup count drops to zero.
0058 By using the low-level API, implementing deletion is up to the filesystem.
0059 
0060 ## The VFS node tree
0061 
0062 Downside of the lowlevel API is that the inode number → path mapping has to be
0063 implemented by the filesystem. For implementing local caching of nodes having
0064 a tree structure is necessary anyway though, so this does not actually make it
0065 more complex.
0066 
0067 The tree is implemented as a `std::unordered_map` of `fuse_ino_t` to `KIOFuseNode`.
0068 Each node knows about its parent and children inode numbers. The two root nodes
0069 have an invalid inode number (0) set as parent.
0070 For details on the class hierarchy and their members, read
0071 [kiofusenode.h](./kiofusenode.h).
0072 
0073 For carrying out special operations depending on the node type, RTTI is used,
0074 by either querying the typeid or doing a `dynamic_cast`.
0075 
0076 During runtime, the tree can look like this:
0077 
0078 ```
0079 "" (ino: 1)
0080 KIOFuseDirNode
0081     |
0082     |        "smb"
0083     |------> KIOFuseDirNode
0084     |           \
0085     |            \        "user@fileserver01"       "a file"
0086     |             ------> KIOFuseRemoteDirNode -----> KIOFuseRemoteFileNode
0087     |                    "user:pass@fileserver01"
0088     |                         \
0089     |                          \        "directory"
0090     |                           ------> KIOFuseRemoteDirNode
0091     |                                       \
0092     |                                        \        "another file"
0093     |                                         ------> KIOFuseRemoteFileNode
0094     |
0095     |       "sftp"
0096     ------> KIOFuseDirNode
0097             \
0098             \        "user@someserver"         "a file"
0099             ------> KIOFuseRemoteDirNode -----> KIOFuseRemoteFileNode
0100                     "user:pass@someserver"
0101 
0102 "" (ino: 2)           "deleted file"
0103 KIOFuseDirNode ----> KIOFuseRemoteFileNode
0104 ```
0105 
0106 The root node with inode number 1 represents the root of the VFS.
0107 Only files below are visible in the VFS hierarchy.
0108 
0109 Note that `RemoteFileNode` is an abstract class. At runtime, it will actually be
0110 instantiated as one of its subclasses (`KIOFuseRemoteCacheBasedNode` or
0111 `KIOFuseRemoteFileJobBasedNode`). The type of class instantiated will depend on
0112 the URL of the file. Please see the [File IO](#file-io) section to learn more.
0113 
0114 Remote nodes (`KIOFuseRemote*Node`) are derived from `KIOFuseRemoteFileInfo` in
0115 addition to the base node type, which contains some members specific for remote
0116 access.
0117 
0118 `m_overrideUrl` is used to implement URL mountpoints and redirections. To get the
0119 remote URL of a node, the tree is traversed upwards until an override is found
0120 and the path is appended.
0121 
0122 `m_lastStatRefresh` stores the time when `m_stat` was updated last. It's updated
0123 on node construction and by `updateNodeFromUDSEntry` and queried by the
0124 `awaitAttrRefreshed` method.
0125 
0126 ## Mounting a URL
0127 
0128 Goal of mounting a URL is to make the target file/directory reachable over the
0129 FUSE filesystem. The first step is to verify that the target actually exists,
0130 if that is not the case an error is returned without doing any changes.
0131 
0132 If the target is reachable, the next step is to find which part of the URL
0133 (from left to right) is the first accessible one. This is needed as ioworkers
0134 like `tar` do not support listing `tar:///` and instead need some part of the path
0135 to return results. This minimum URL is the "origin" and a `KIOFuseRemoteDirNode`
0136 with the origin as `m_overrideUrl` is created, with the parents as plain
0137 `KIOFuseDirNode`s.
0138 
0139 The local path to this origin node with the path from the origin to the target
0140 node is returned. Note that at this point this node doesn't actually exist
0141 locally though, only everything up until (including) the origin. To reach the
0142 rest, the kernel does lookup operations which trigger `KIO::stat` and node
0143 creation for each path component.
0144 
0145 Initially, mounting was implemented in a different way, to only require a
0146 single `KIO::stat` call for determining accessibility and the target node's
0147 attributes. All path components except the final one were assumed to be
0148 traversable directories, but this assumption doesn't hold for symlinks. By
0149 letting the kernel deal with path traversal, symlinks returned by lookup are
0150 handled correctly.
0151 
0152 ## Unlinking a node
0153 
0154 The root node with inode number 2 is used as a parent for deleted, but still
0155 opened (non-zero lookup count) nodes. This is used for proper unlinking.
0156 When the loopup count of a node below the "deleted root" drops to zero, the
0157 node is deleted, i.e. the inode number can be reused and memory is freed.
0158 When unlinking a node which already has a lookup count of zero, it is directly
0159 deleted.
0160 
0161 ## General anatomy of a write operation
0162 
0163 All write operations are implemented by verifying the parameters locally (if
0164 possible at all) and then starting the operation to KIO. Once the operation
0165 completes, either an error is sent or the change is done in the local VFS tree
0166 and the result is sent.
0167 
0168 ```cpp
0169 void KIOFuseVFS::operation(fuse_req_t req, fuse_ino_t inode, ...)
0170 {
0171     KIOFuseVFS *that = reinterpret_cast<KIOFuseVFS*>(fuse_req_userdata(req));
0172     auto node = that->nodeForIno(parent);
0173     if(!node) complain and return;
0174 
0175     auto job = KIO::operation(that->remoteUrl(node), ...);
0176     connect(job, &finished, [=] {
0177         if(job->error())
0178             fuse_reply_err(req, kioErrorToFuseError(job->error()));
0179         else
0180         {
0181             that->doOperation(node);
0182             fuse_reply_result(req, ...);
0183         }
0184     });
0185 }
0186 ```
0187 
0188 ## Permissions
0189 
0190 While the `st_uid`/`st_gid`/`st_mode` fields of nodes are used from KIO if possible,
0191 access is not checked by `kio-fuse` at all. Instead, KIO returns errors if an
0192 operation fails because of missing permissions and those are simply forwarded.
0193 
0194 ## Node attributes
0195 
0196 For every node in the VFS, the full `struct stat` is already available when
0197 inserting it into the tree. This happens when mounting a URL (uses `KIO::stat`)
0198 and when requesting the children of a URL (using `KIO::listDir` with details).
0199 The same is true of the symlink's target path.
0200 
0201 As a result, `getattr` and `readlink` are non-blocking if the node's attributes
0202 have not timed out.
0203 
0204 `setattr` instead does block, it only returns if all of the requested operations
0205 (e.g. `SET_ATTR_MTIME`, `SET_ATTR_MODE`, `SET_ATTR_UID`) completed.
0206 
0207 ## Directory operations
0208 
0209 To support the optimization possibility the lookup operation in the low-level
0210 API offers, children of `KIOFuseRemoteDirNode` are loaded lazily. This means
0211 that the full list of children is only requested (using `KIO::listDir`) if
0212 required, so if lookup on the directory fails or if readdir is executed.
0213 
0214 A node's children are considered valid for 30 seconds. The last time a node
0215 was dir listed via `KIO::listDir` is stored in `m_lastChildrenRefreshed`.
0216 Each readdir request checks if they have timed out via the
0217 `haveChildrenTimedOut()` method and updates the children (and consequently, their
0218 attributes) as appropriate. This is implemented in `awaitChildrenComplete`.
0219 
0220 ## Node Expiration
0221 
0222 Each remote node has a timeout on its attributes and its children, which is
0223 currently set to 30 seconds.
0224 
0225 When a node's attributes are requested, the `awaitAttrRefreshed` method checks
0226 whether the attributes expired and if so, calls `mountUrl` to refresh it via
0227 `updateNodeFromUDSEntry`. If the result of `KIO::stat` indicates that the node does
0228 not exist on the remote side anymore it is (recursively) marked as deleted.
0229 Otherwise, a new node based on the fresh attributes is created and if the type
0230 matches, used to update the existing node. If the type does not match, the old
0231 node is marked as deleted and the new node inserted into the tree.
0232 
0233 For directories, `awaitChildrenComplete` calls `KIO::listDir` for refreshing the
0234 list of children, either removing vanished nodes, creating new nodes or
0235 updating existing ones using the same method as outlined above.
0236 
0237 ## File IO
0238 
0239 File IO is done in either of two ways, depending on what the protocol supports.
0240 If the protocol supports `KIO::open` (and all of its operations, which at the time
0241 of writing is `read`/`write`/`seek`/`truncate`) then IO will be based upon KIO's `FileJob`.
0242 KIO's `FileJob` interface allows random-access IO, and hence all `read`, `write` and
0243 `truncate` requests are simply forwarded to the corresponding `FileJob` functions.
0244 
0245 Whilst improving performance for larger files compared to the cache-based IO
0246 described below, the performance of individual `read`/`write`/`truncate` requests
0247 if significantly reduced.
0248 
0249 The protocols that currently support `KIO::open` are `file`/`sftp`/`smb`.
0250 
0251 Otherwise, file IO is implemented completely on top of a file based cache.
0252 On the first read or write access to a non truncated file, the whole file is
0253 downloaded into a new temporary file and all readers are notified on cache
0254 completeness changes (see `awaitBytesAvailable`).
0255 
0256 Therefore the read and write ops itself are trivial, they just forward the
0257 IO operation to the temporary file once enough data is available.
0258 
0259 On each write to a file, the file is marked as dirty and added to the set
0260 of dirty nodes. On various occasions, `awaitNodeFlushed` is called which removes
0261 the node from the dirty set and starts a `KIO::put` for the file. On success,
0262 it is checked whether a write occured during flushing and if so, another
0263 flush is started. This is repeated until the node was still marked as clean
0264 on finish.
0265 
0266 When there a no open file descriptors to a node anymore, the cache is flushed
0267 if necessary and then dropped.
0268 
0269 ## Hardlinks
0270 
0271 Hardlinks are not supported well in the current design of KIO so they were
0272 simply not considered during KIO FUSE development either.
0273 
0274 While inode and device numbers can be returned are part of UDSEntries returned
0275 from workers, neither `stat.st_link` nor `::link` are accessible.