| 1 | +# The `SftpFileSystem` |
| 2 | + |
| 3 | +Class `SftpFileSystem` is an implementation of `java.nio.file.FileSystem` and |
| 4 | +lets client code treat an SFTP server like any other file system. It is a |
| 5 | +*remote* file system, though, and that has some effects that clients have |
| 6 | +to be aware of. Because operations are *remote* operations involving network |
| 7 | +requests and answers, the performance characteristics are different from those of |
| 8 | +most other file systems. |
| 9 | + |
| 10 | +# Creating an `SftpFileSystem` |
| 11 | + |
| 12 | +An `SftpFileSystem` needs an SSH session to be able to talk to the SFTP server. |
| 13 | + |
| 14 | +There are two ways to create an `SftpFileSystem`: |
| 15 | + |
| 16 | +1. If you already have an SSH `ClientSession`, you can create the file system |
| 17 | + off that session using `SftpClientFactory.instance().createSftpFileSystem()`. |
| 18 | + The file system remains valid until it is closed, or until the session is |
| 19 | + closed. When the file system is closed, the session will *not* be closed. |
| 20 | + |
| 21 | +2. You can create an `SftpFileSystem` with an `sftp://` URI using the standard |
| 22 | + Java factory `java.nio.file.FileSystems.newFileSystem()`. This will automatically |
| 23 | + create an `SshClient` with default settings, and the file system will open |
| 24 | + an SSH session itself. This session has heartbeats enabled to keep it open |
| 25 | + for as long as the file system is open. The file system remains valid until |
| 26 | + closed, at which point it will close the session it had created. |
| 27 | + |
| 28 | +In either case, the file system will be closed if the session closes. |
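A minimal sketch of the second option, using only JDK classes. The helper names `sftpUri` and `open` are hypothetical, and the `sftp://user:password@host:port/` URI layout is an assumption about what the provider accepts. `open` only succeeds if the SFTP file system provider is on the classpath; without it, `FileSystems.newFileSystem()` throws `ProviderNotFoundException`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.util.Collections;

final class SftpUriExample {

    /** Builds sftp://user:password@host:port/ (hypothetical helper, not a library API). */
    static URI sftpUri(String user, String password, String host, int port) {
        try {
            // The multi-argument constructor quotes characters that are illegal in a component.
            return new URI("sftp", user + ":" + password, host, port, "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build SFTP URI", e);
        }
    }

    /** Opens the file system through the standard Java factory; the caller must close it. */
    static FileSystem open(URI uri) throws IOException {
        // Assumption: an empty environment map means "use default settings".
        return FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
    }
}
```

Opening the returned file system in a try-with-resources statement fits the behavior described above: closing a file system created this way also closes the session it opened.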
| 29 | + |
| 30 | +# SSH Resource Management |
| 31 | + |
| 32 | +Most operations on an `SftpFileSystem`, or on streams returned, produce SFTP |
| 33 | +requests over the network, and wait for a reply to be received. This works |
| 34 | +internally by using `SftpClient` to talk to the SFTP server. An `SftpClient` |
| 35 | +is the client-side implementation of the SFTP protocol over an SSH channel |
| 36 | +(a `ChannelSession`). |
| 37 | + |
| 38 | +The SSH channel and the `SftpClient` are tightly coupled: the channel is |
| 39 | +opened when the `SftpClient` is initialized, the channel is closed when |
| 40 | +the `SftpClient` is closed, and when the channel closes, so does the `SftpClient`. |
| 41 | + |
| 42 | +For `SftpFileSystem` it would be rather inefficient to use a new `SftpClient` |
| 43 | +for each new operation. That would create a new channel every time, and tear |
| 44 | +it down after the operation. But channels have a setup cost, and the SFTP |
| 45 | +protocol also has to be initialized, both of which involve exchanging messages |
| 46 | +over the network. This is not efficient if one wants to perform multiple |
| 47 | +operations, such as transferring multiple files with `java.nio.file.Files.copy()`. |
| 48 | + |
| 49 | +## The Channel Pool |
| 50 | + |
| 51 | +The `SftpFileSystem` thus employs a *pool* of `SftpClient`s. This pool is |
| 52 | +initially empty. The first operation will create an `SftpClient`, initialize |
| 53 | +it, and then perform its operation. But then it will add the still open |
| 54 | +`SftpClient` to the pool for use by subsequent operations instead of closing |
| 55 | +it. The next operation can then simply grab this already initialized `SftpClient` |
| 56 | +with its open channel and perform its operation. |
| 57 | + |
| 58 | +The pool is limited by a maximum size of `SftpModuleProperties.POOL_SIZE` (by |
| 59 | +default 8). The pool can grow to this size if there are that many threads that |
| 60 | +perform operations on the `SftpFileSystem` concurrently. |
| 61 | + |
| 62 | +`SftpClient`s in the pool need to be closed at some point. Consider an application |
| 63 | +that has a burst of file transfers and uses 8 threads to perform them. Afterwards, |
| 64 | +the pool will contain 8 `SftpClient`s: that's 8 open SSH channels, each with an SFTP |
| 65 | +subsystem at the server's end. If the application then does very little, like |
| 66 | +transferring a few files sequentially over the next few hours, until the next burst |
| 67 | +(which may never come), then we don't want to keep all 8 channels open and consuming |
| 68 | +resources not only in the client but also on the server side. |
| 69 | + |
| 70 | +(This assumes that the whole SSH session remains open for that long, which can be |
| 71 | +accomplished by using heartbeats on the session.) |
| 72 | + |
| 73 | +The `SftpFileSystem` handles this by expiring inactive clients from the pool. If a |
| 74 | +client has been in the pool for `SftpModuleProperties.POOL_LIFE_TIME` (default is 10 |
| 75 | +seconds), it is removed from the pool and closed. (If it was in the pool for that |
| 76 | +time, this means it was idle for that time: no operation was performed on it.) If |
| 77 | +no operation on the `SftpFileSystem` occurs at all for this time, it's possible that |
| 78 | +the pool is emptied, and the next operation has to create and initialize a new |
| 79 | +`SftpClient` and channel. |
| 80 | + |
| 81 | +If an application doesn't want this, it can define `SftpModuleProperties.POOL_CORE_SIZE`, |
| 82 | +which must be smaller than `POOL_SIZE`. By default, it is zero. If greater than zero, |
| 83 | +that many `SftpClient`s are kept in the pool (and that many channels are kept open) |
| 84 | +even if they are idle. |
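The interplay of maximum size, life time, and core size can be illustrated with a small model. This is a simplified sketch and not the library's implementation: the class `IdlePoolModel` and all its members are hypothetical, clients are represented by plain integer ids, and time is passed in explicitly so the behavior is deterministic. It also only limits the number of idle clients, which may differ from how the real pool counts clients that are in use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of an idle-expiring pool; NOT the library's implementation. */
final class IdlePoolModel {
    private final int maxSize;          // plays the role of POOL_SIZE
    private final int coreSize;         // plays the role of POOL_CORE_SIZE
    private final long lifeTimeMillis;  // plays the role of POOL_LIFE_TIME
    // Each entry is {clientId, timeReturnedToPool}; the oldest entry is first.
    private final Deque<long[]> idle = new ArrayDeque<>();
    private int created;

    IdlePoolModel(int maxSize, int coreSize, long lifeTimeMillis) {
        this.maxSize = maxSize;
        this.coreSize = coreSize;
        this.lifeTimeMillis = lifeTimeMillis;
    }

    /** Takes the most recently returned idle client, or creates a new one; returns its id. */
    int acquire(long now) {
        expire(now);
        long[] entry = idle.pollLast();
        return entry != null ? (int) entry[0] : ++created;
    }

    /** Returns a client to the pool; false means the pool was full and the caller closes it. */
    boolean release(int clientId, long now) {
        expire(now);
        if (idle.size() >= maxSize) {
            return false;
        }
        idle.addLast(new long[] {clientId, now});
        return true;
    }

    /** Drops clients that were idle for at least the life time, but keeps coreSize of them. */
    void expire(long now) {
        while (idle.size() > coreSize && now - idle.peekFirst()[1] >= lifeTimeMillis) {
            idle.pollFirst();
        }
    }

    int idleCount() {
        return idle.size();
    }

    int createdCount() {
        return created;
    }
}
```

With a core size of zero, the model empties completely once every client has been idle for the life time, so the next `acquire` has to create a new client. With a core size of one, the last idle client survives no matter how long it stays unused.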
| 85 | + |
| 86 | +Note that the SFTP server may also decide to close channels whenever |
| 87 | +it wants. This will close the channel and the `SftpClient` on the client side. If it |
| 88 | +happens while a client-side operation is ongoing, the operation will fail with an |
| 89 | +exception; if it happens on an idle `SftpClient` in the pool, the `SftpClient` is |
| 90 | +simply removed from the pool. |
| 91 | + |
| 92 | +If the whole SSH session is closed, the `SftpFileSystem` is closed. When an |
| 93 | +`SftpFileSystem` is closed, all `SftpClient`s in the pool are closed, and no new |
| 94 | +clients will be added to the pool. |
| 95 | + |
| 96 | +## Choosing the Pool Size |
| 97 | + |
| 98 | +If there are more than `POOL_SIZE` threads using the same `SftpFileSystem`, it is |
| 99 | +possible that all `POOL_SIZE` clients are already in use when a thread tries to |
| 100 | +do a file system operation. In this case, a new `SftpClient` is created, which |
| 101 | +will be closed after the operation. This is a sign of the `POOL_SIZE` being too |
| 102 | +small for the application, or that the application is badly designed. Using too |
| 103 | +many threads for remote file operations is not a good idea in SFTP: all traffic |
| 104 | +in the end goes over a single network connection anyway. Using a limited number |
| 105 | +of threads may bring some speedup compared to strictly sequential operations |
| 106 | +because the handling of the data received is offloaded to these threads, while |
| 107 | +the next message can already be sent or received. But copying 1000 files using |
| 108 | +1000 threads and SSH channels is nonsense; it's far better to handle that many |
| 109 | +files in batches with a smaller number of threads (maybe 8). |
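The batching advice can be sketched with a fixed-size executor. This is an illustration under stated assumptions, not a recommended implementation: the class `BoundedCopy` and its methods are hypothetical, and `demo` copies between temporary local directories so that it runs without a server. With an `SftpFileSystem`, the target directory would be a `Path` obtained from that file system instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedCopy {

    /** Copies all sources into targetDir with at most `threads` concurrent copies. */
    static int copyAll(List<Path> sources, Path targetDir, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> results = new ArrayList<>();
            for (Path source : sources) {
                results.add(executor.submit(() -> {
                    // Resolve by name string so source and target may belong to different file systems.
                    Path target = targetDir.resolve(source.getFileName().toString());
                    return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }));
            }
            int copied = 0;
            for (Future<Path> result : results) {
                try {
                    result.get(); // waits for this copy and surfaces its failure, if any
                    copied++;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Copy failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while copying", e);
                }
            }
            return copied;
        } finally {
            executor.shutdown();
        }
    }

    /** Usage example on temporary local directories; returns the number of files copied. */
    static int demo() {
        try {
            Path sourceDir = Files.createTempDirectory("copy-src");
            Path targetDir = Files.createTempDirectory("copy-dst");
            List<Path> sources = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                byte[] content = ("content " + i).getBytes(StandardCharsets.UTF_8);
                sources.add(Files.write(sourceDir.resolve("file" + i + ".txt"), content));
            }
            return copyAll(sources, targetDir, 8); // 8 threads, matching the default pool size
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The executor queues the remaining files while its worker threads are busy, so the number of concurrent operations on the file system never exceeds the thread count, however many files are submitted.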
| 110 | + |
| 111 | +In any case, `SftpModuleProperties.POOL_SIZE` should be large enough to accommodate |
| 112 | +the number of threads the client application is going to use for operations on |
| 113 | +the `SftpFileSystem`. If there are more threads, performance may degrade. |
| 114 | + |
| 115 | +Versions of Apache MINA sshd <= 2.10.0 tried to mitigate this performance drop |
| 116 | +for such "extra" threads by keeping the `SftpClient` in a `ThreadLocal`, so that |
| 117 | +such "extra" threads could re-use the `SftpClient`. This mechanism has been *removed* |
| 118 | +because it sometimes caused memory leaks. The mechanism was also flawed because |
| 119 | +there were use cases where it just could not work correctly. |
| 120 | + |
| 121 | +Design your application such that it uses a small maximum number of threads that |
| 122 | +perform operations on an `SftpFileSystem` instance. Set `SftpModuleProperties.POOL_SIZE` |
| 123 | +such that it is >= the maximum number of threads that operate concurrently on the |
| 124 | +file system. The default pool size is 8. |