Warning, /system/kio-fuse/DESIGN.md is written in an unsupported language. File is not indexed.
0001 # Design of KIO FUSE 0002 0003 This file explains the internal implementation only, please read the README 0004 first to learn about the external interface. 0005 0006 ## Goals 0007 0008 KIO FUSE's design is based on these requirements: 0009 0010 * Maximum compatibility with applications (as far as KIO allows). 0011 * Work with most protocols: The minimum set of operations a worker must support 0012 are `KIO::stat` and `KIO::get`. 0013 * Good usability: this means mostly acceptable speed (using caching whereever 0014 possible), but also having a simple API to the outside world. 0015 * Security: the password of mounted URLs is not obviously visible. 0016 * Keep It Simple Stupid (KISS). 0017 0018 ## Use of the `libfuse` low-level API 0019 0020 Compared to the "old" version of KIO FUSE in SVN, this implementation has a 0021 major difference in the implementation: instead of the high-level `libfuse` API, 0022 which translates the inode numbers passed from the kernel to paths for calling 0023 the operations, the lowlevel `libfuse` API is used. 0024 0025 While it may look like translating paths to URLs is much easier than keeping 0026 track of inode numbers, the high-level API has actually completely different 0027 behaviour in many other ways, which actually makes it much more complex to use. 0028 The most important difference is that the lowlevel API can be used 0029 asynchronously, which makes it possible to process multiple requests in one 0030 thread. This matches the Qt (and thus KIO) event processing model perfectly. 0031 This means that multithreading is not required (KIO works on the main thread 0032 anyway), resulting in no need for locking and passing queues around. 0033 0034 Additonally, a design flaw in `libfuse` means that it's impossible to keep track 0035 of the lookup count correctly/race-free when using multiple threads: 0036 [Move Lookup Count Management into LLFUSE?](https://github.com/python-llfuse/python-llfuse/blob/master/developer-notes/lookup_counts.rst) 0037 0038 The additional `lookup` lowlevel operation makes it possible to avoid calling 0039 `readdir` on all path elements when opening a file for the first time. 0040 Example: `smb://host/dir/subdir/file` is mounted as `smb/host/dir/subdir/file`. 0041 When opening that for the first time with the high-level API, it would result in 0042 these calls: `opendir(/) -> readdir(/) -> closedir(/) -> opendir(smb) -> ...` 0043 This is because `libfuse` has to build an internal tree model for mapping inode 0044 numbers to path elements. 0045 With the lowlevel API, lookup is enough: `lookup(/, smb) -> lookup(smb, host)...` 0046 This can be implemented using `KIO::stat` instead of a full `KIO::listDir`, both on 0047 the initial access and when the node already exists locally, to recheck whether 0048 the local representation still matches. 0049 With the high-level API, lookup on existing nodes is not passed to the FS 0050 implementation if the node is part of the internal tree already. This makes 0051 it harder (if not infeasible) to react to changes on the remote side, e.g. 0052 deletes or renames. 0053 0054 Not using inode numbers in the high-level API means that implementing unlink 0055 properly (i.e. already opened file handles are still valid) is not possible, 0056 so instead of calling unlink directly, `libfuse` renames deleted files as 0057 `.fuse_hiddenXXX` and deletes them when their lookup count drops to zero. 0058 By using the low-level API, implementing deletion is up to the filesystem. 0059 0060 ## The VFS node tree 0061 0062 Downside of the lowlevel API is that the inode number → path mapping has to be 0063 implemented by the filesystem. For implementing local caching of nodes having 0064 a tree structure is necessary anyway though, so this does not actually make it 0065 more complex. 0066 0067 The tree is implemented as a `std::unordered_map` of `fuse_ino_t` to `KIOFuseNode`. 0068 Each node knows about its parent and children inode numbers. The two root nodes 0069 have an invalid inode number (0) set as parent. 0070 For details on the class hierarchy and their members, read 0071 [kiofusenode.h](./kiofusenode.h). 0072 0073 For carrying out special operations depending on the node type, RTTI is used, 0074 by either querying the typeid or doing a `dynamic_cast`. 0075 0076 During runtime, the tree can look like this: 0077 0078 ``` 0079 "" (ino: 1) 0080 KIOFuseDirNode 0081 | 0082 | "smb" 0083 |------> KIOFuseDirNode 0084 | \ 0085 | \ "user@fileserver01" "a file" 0086 | ------> KIOFuseRemoteDirNode -----> KIOFuseRemoteFileNode 0087 | "user:pass@fileserver01" 0088 | \ 0089 | \ "directory" 0090 | ------> KIOFuseRemoteDirNode 0091 | \ 0092 | \ "another file" 0093 | ------> KIOFuseRemoteFileNode 0094 | 0095 | "sftp" 0096 ------> KIOFuseDirNode 0097 \ 0098 \ "user@someserver" "a file" 0099 ------> KIOFuseRemoteDirNode -----> KIOFuseRemoteFileNode 0100 "user:pass@someserver" 0101 0102 "" (ino: 2) "deleted file" 0103 KIOFuseDirNode ----> KIOFuseRemoteFileNode 0104 ``` 0105 0106 The root node with inode number 1 represents the root of the VFS. 0107 Only files below are visible in the VFS hierarchy. 0108 0109 Note that `RemoteFileNode` is an abstract class. At runtime, it will actually be 0110 instantiated as one of its subclasses (`KIOFuseRemoteCacheBasedNode` or 0111 `KIOFuseRemoteFileJobBasedNode`). The type of class instantiated will depend on 0112 the URL of the file. Please see the [File IO](#file-io) section to learn more. 0113 0114 Remote nodes (`KIOFuseRemote*Node`) are derived from `KIOFuseRemoteFileInfo` in 0115 addition to the base node type, which contains some members specific for remote 0116 access. 0117 0118 `m_overrideUrl` is used to implement URL mountpoints and redirections. To get the 0119 remote URL of a node, the tree is traversed upwards until an override is found 0120 and the path is appended. 0121 0122 `m_lastStatRefresh` stores the time when `m_stat` was updated last. It's updated 0123 on node construction and by `updateNodeFromUDSEntry` and queried by the 0124 `awaitAttrRefreshed` method. 0125 0126 ## Mounting a URL 0127 0128 Goal of mounting a URL is to make the target file/directory reachable over the 0129 FUSE filesystem. The first step is to verify that the target actually exists, 0130 if that is not the case an error is returned without doing any changes. 0131 0132 If the target is reachable, the next step is to find which part of the URL 0133 (from left to right) is the first accessible one. This is needed as ioworkers 0134 like `tar` do not support listing `tar:///` and instead need some part of the path 0135 to return results. This minimum URL is the "origin" and a `KIOFuseRemoteDirNode` 0136 with the origin as `m_overrideUrl` is created, with the parents as plain 0137 `KIOFuseDirNode`s. 0138 0139 The local path to this origin node with the path from the origin to the target 0140 node is returned. Note that at this point this node doesn't actually exist 0141 locally though, only everything up until (including) the origin. To reach the 0142 rest, the kernel does lookup operations which trigger `KIO::stat` and node 0143 creation for each path component. 0144 0145 Initially, mounting was implemented in a different way, to only require a 0146 single `KIO::stat` call for determining accessibility and the target node's 0147 attributes. All path components except the final one were assumed to be 0148 traversable directories, but this assumption doesn't hold for symlinks. By 0149 letting the kernel deal with path traversal, symlinks returned by lookup are 0150 handled correctly. 0151 0152 ## Unlinking a node 0153 0154 The root node with inode number 2 is used as a parent for deleted, but still 0155 opened (non-zero lookup count) nodes. This is used for proper unlinking. 0156 When the loopup count of a node below the "deleted root" drops to zero, the 0157 node is deleted, i.e. the inode number can be reused and memory is freed. 0158 When unlinking a node which already has a lookup count of zero, it is directly 0159 deleted. 0160 0161 ## General anatomy of a write operation 0162 0163 All write operations are implemented by verifying the parameters locally (if 0164 possible at all) and then starting the operation to KIO. Once the operation 0165 completes, either an error is sent or the change is done in the local VFS tree 0166 and the result is sent. 0167 0168 ```cpp 0169 void KIOFuseVFS::operation(fuse_req_t req, fuse_ino_t inode, ...) 0170 { 0171 KIOFuseVFS *that = reinterpret_cast<KIOFuseVFS*>(fuse_req_userdata(req)); 0172 auto node = that->nodeForIno(parent); 0173 if(!node) complain and return; 0174 0175 auto job = KIO::operation(that->remoteUrl(node), ...); 0176 connect(job, &finished, [=] { 0177 if(job->error()) 0178 fuse_reply_err(req, kioErrorToFuseError(job->error())); 0179 else 0180 { 0181 that->doOperation(node); 0182 fuse_reply_result(req, ...); 0183 } 0184 }); 0185 } 0186 ``` 0187 0188 ## Permissions 0189 0190 While the `st_uid`/`st_gid`/`st_mode` fields of nodes are used from KIO if possible, 0191 access is not checked by `kio-fuse` at all. Instead, KIO returns errors if an 0192 operation fails because of missing permissions and those are simply forwarded. 0193 0194 ## Node attributes 0195 0196 For every node in the VFS, the full `struct stat` is already available when 0197 inserting it into the tree. This happens when mounting a URL (uses `KIO::stat`) 0198 and when requesting the children of a URL (using `KIO::listDir` with details). 0199 The same is true of the symlink's target path. 0200 0201 As a result, `getattr` and `readlink` are non-blocking if the node's attributes 0202 have not timed out. 0203 0204 `setattr` instead does block, it only returns if all of the requested operations 0205 (e.g. `SET_ATTR_MTIME`, `SET_ATTR_MODE`, `SET_ATTR_UID`) completed. 0206 0207 ## Directory operations 0208 0209 To support the optimization possibility the lookup operation in the low-level 0210 API offers, children of `KIOFuseRemoteDirNode` are loaded lazily. This means 0211 that the full list of children is only requested (using `KIO::listDir`) if 0212 required, so if lookup on the directory fails or if readdir is executed. 0213 0214 A node's children are considered valid for 30 seconds. The last time a node 0215 was dir listed via `KIO::listDir` is stored in `m_lastChildrenRefreshed`. 0216 Each readdir request checks if they have timed out via the 0217 `haveChildrenTimedOut()` method and updates the children (and consequently, their 0218 attributes) as appropriate. This is implemented in `awaitChildrenComplete`. 0219 0220 ## Node Expiration 0221 0222 Each remote node has a timeout on its attributes and its children, which is 0223 currently set to 30 seconds. 0224 0225 When a node's attributes are requested, the `awaitAttrRefreshed` method checks 0226 whether the attributes expired and if so, calls `mountUrl` to refresh it via 0227 `updateNodeFromUDSEntry`. If the result of `KIO::stat` indicates that the node does 0228 not exist on the remote side anymore it is (recursively) marked as deleted. 0229 Otherwise, a new node based on the fresh attributes is created and if the type 0230 matches, used to update the existing node. If the type does not match, the old 0231 node is marked as deleted and the new node inserted into the tree. 0232 0233 For directories, `awaitChildrenComplete` calls `KIO::listDir` for refreshing the 0234 list of children, either removing vanished nodes, creating new nodes or 0235 updating existing ones using the same method as outlined above. 0236 0237 ## File IO 0238 0239 File IO is done in either of two ways, depending on what the protocol supports. 0240 If the protocol supports `KIO::open` (and all of its operations, which at the time 0241 of writing is `read`/`write`/`seek`/`truncate`) then IO will be based upon KIO's `FileJob`. 0242 KIO's `FileJob` interface allows random-access IO, and hence all `read`, `write` and 0243 `truncate` requests are simply forwarded to the corresponding `FileJob` functions. 0244 0245 Whilst improving performance for larger files compared to the cache-based IO 0246 described below, the performance of individual `read`/`write`/`truncate` requests 0247 if significantly reduced. 0248 0249 The protocols that currently support `KIO::open` are `file`/`sftp`/`smb`. 0250 0251 Otherwise, file IO is implemented completely on top of a file based cache. 0252 On the first read or write access to a non truncated file, the whole file is 0253 downloaded into a new temporary file and all readers are notified on cache 0254 completeness changes (see `awaitBytesAvailable`). 0255 0256 Therefore the read and write ops itself are trivial, they just forward the 0257 IO operation to the temporary file once enough data is available. 0258 0259 On each write to a file, the file is marked as dirty and added to the set 0260 of dirty nodes. On various occasions, `awaitNodeFlushed` is called which removes 0261 the node from the dirty set and starts a `KIO::put` for the file. On success, 0262 it is checked whether a write occured during flushing and if so, another 0263 flush is started. This is repeated until the node was still marked as clean 0264 on finish. 0265 0266 When there a no open file descriptors to a node anymore, the cache is flushed 0267 if necessary and then dropped. 0268 0269 ## Hardlinks 0270 0271 Hardlinks are not supported well in the current design of KIO so they were 0272 simply not considered during KIO FUSE development either. 0273 0274 While inode and device numbers can be returned are part of UDSEntries returned 0275 from workers, neither `stat.st_link` nor `::link` are accessible.