network_storage_accessibility_and_peformance

Differences

This shows you the differences between two versions of the page.

network_storage_accessibility_and_peformance [2014/11/14 11:37]
admin created
network_storage_accessibility_and_peformance [2014/11/14 15:21]
admin [File Level Sharing]
Line 19:
 ===== File Level Sharing =====
  
-This is the first type of sharing storage over the network. It basically means that client computers see the network attached storage as a tree of directories and files. Then they can issue commands to the server to read or write those files.
+This is the first type of sharing storage over the network. It basically means that the server exposes the entire shared tree of directories and files, and client computers see the network attached storage as that tree. They can then issue commands to the server to read or write those files.
  
 The key feature of this type of sharing is convenience. The protocols are very well supported by all major operating systems and you can mount the entire remote storage tree on your local computer and access it as a local resource. Another important feature of this type of sharing is that many clients can read and write remote files at the same time. The server takes care of keeping the data consistent at all times. However, this comes at a cost: the mechanism implies locking, which the server does transparently but which decreases performance, most notably in our scenario of accessing a very large number of relatively small files.
Line 42:
 
 ==== SFTP (RSync, SCP, FUSE) ====
 +
 +All of these rely on ssh, which (as the name suggests) was meant to provide secure shell access. It actually provides much more than that: encrypted file transfers, forwarding of network traffic and so on. SFTP is the part that provides file transfers with FTP-like access. SCP is a very simple way to copy files. FUSE is a piece of software for mounting filesystems in userspace: by writing a FUSE plugin you can make any filesystem available without modifying the kernel (or writing a driver). It has an SFTP plugin (SSHFS) that can be used to mount remote SFTP filesystems just as you would with SMB. Last but not least, rsync is a very powerful program that can transfer files over ssh. The power comes from its flexibility and features. It is a much more powerful solution than scp or sftp.
 +
 +Provided you have a powerful CPU, transfer over ssh is the fastest solution available. That holds not only for large file transfers but especially for small files. When using sftp with on the fly compression you can even achieve transfer rates higher than the actual maximum network speed (for highly compressible data). On the fly compression is a standard feature of ssh which is absent by default in all the other solutions. If you also take into account that all transfers are encrypted by default, this solution becomes a clear winner on the security side as well. None of the other solutions provide encryption by default and some lack much more than this on the security side.
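
The "faster than the network" effect is plain arithmetic: if the bytes on the wire are compressed, the apparent rate is the link rate multiplied by the compression ratio. Here is a rough sketch of that estimate. It assumes zlib as a stand-in for ssh's compression, ignores CPU cost entirely, and the 100 MB/s link speed and sample payloads are made up for illustration.

```python
import os
import zlib


def effective_rate(link_mb_per_s: float, payload: bytes, level: int = 6) -> float:
    """Estimate the apparent transfer rate of `payload` when the bytes sent
    over the link are zlib-compressed first (CPU cost is ignored)."""
    compressed = zlib.compress(payload, level)
    ratio = len(payload) / len(compressed)
    return link_mb_per_s * ratio


# Hypothetical payloads: very repetitive text versus random bytes.
text_like = b"GET /index.html HTTP/1.1\r\n" * 4000
random_like = os.urandom(100_000)

print("repetitive data:", effective_rate(100.0, text_like), "MB/s apparent")
print("random data:    ", effective_rate(100.0, random_like), "MB/s apparent")
```

Repetitive data comes out well above the link speed, while random (or already compressed) data gains nothing, which is why the benefit only shows up for highly compressible data.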
 +
 +On our particular set of tests, transferring /usr data with rsync was 10 times faster than with SMB or NFS. With SSHFS (FUSE with the SSH plugin) we achieved only about 5 times the performance of SMB or NFS. We didn't use compression in our tests. If you transfer even smaller files, the advantage over SMB or NFS is expected to grow further. The tremendous performance is achieved because of the protocol design, but there's no point in going into details about this.
 +
 +There are some caveats though. The first: if you read carefully you saw something about CPU power. All transfers over ssh are encrypted - this was the whole purpose of ssh in the first place. Encryption requires a lot of CPU power. And it gets even better: there are 2 CPUs involved in each transfer - the server's and the client's. If either one does not have enough power to sustain the transfer speed, it becomes the bottleneck and limits the entire transfer to what it can handle. Some newer CPUs help here because they provide AES hardware acceleration (AES-NI instructions), AES being the most popular block cipher. When you use such CPUs on both server and client, performance is expected to increase tremendously when using AES encryption. If at least one of the CPUs does not support this instruction set, or you want to use a different encryption algorithm, your CPU might easily become the bottleneck.
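
To make the bottleneck argument concrete, here is a small sketch. It assumes an x86 Linux machine, where /proc/cpuinfo lists an `aes` entry on its `flags` line when AES-NI is present. The function names and the throughput numbers are ours, purely for illustration.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def transfer_rate(network: float, client_cipher: float, server_cipher: float) -> float:
    """The slowest of the three stages (all in the same unit) sets the pace."""
    return min(network, client_cipher, server_cipher)


# Made-up figures in MB/s: a fast link, a fast server, a weak client CPU.
print(transfer_rate(network=110.0, client_cipher=45.0, server_cipher=400.0))

# Shortened sample lines in the style of /proc/cpuinfo.
sample_with_aes = "processor\t: 0\nflags\t\t: fpu sse2 aes avx\n"
sample_without_aes = "processor\t: 0\nflags\t\t: fpu sse2\n"
print(has_aes_flag(sample_with_aes), has_aes_flag(sample_without_aes))
```

On a real machine you would pass in the contents of /proc/cpuinfo, on both the client and the server, since one side without the flag is enough to hold the transfer back.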
 +
 +One popular way to ease the pain when you don't have AES-NI on both sides is to use the Arcfour cipher, aka RC4 (arcfour, arcfour128 or arcfour256 in the ssh Ciphers option - the performance difference between them is marginal). Benchmarks generally point to it as the fastest ssh cipher without hardware acceleration. We used it ourselves in our tests (arcfour128). The downside is that arcfour is currently considered insecure. This doesn't mean that any kid can instantly see your traffic, but don't count on it costing the NSA much time to decrypt your transferred files.
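
For reference, this is roughly how a cipher choice is passed down to rsync. The sketch only builds the command line and does not run it. It relies on rsync's `-a` and `-e` options and ssh's `-c` and `-C` options. The host name and paths are hypothetical, and the cipher is a parameter because newer OpenSSH builds may not offer arcfour at all (`ssh -Q cipher` lists what a given build supports).

```python
import shlex
from typing import List, Optional


def rsync_over_ssh(src: str, dest: str, cipher: Optional[str] = None,
                   compress: bool = False) -> List[str]:
    """Build an rsync argv that tunnels through ssh with an optional cipher."""
    ssh_cmd = ["ssh"]
    if cipher:
        ssh_cmd += ["-c", cipher]   # ssh: select the cipher for this session
    if compress:
        ssh_cmd.append("-C")        # ssh: enable on the fly compression
    # rsync -e takes the remote shell command as a single string.
    return ["rsync", "-a", "-e", shlex.join(ssh_cmd), src, dest]


# Hypothetical host and paths, for illustration only.
cmd = rsync_over_ssh("/usr/", "backup@nas.example:/srv/usr/", cipher="arcfour128")
print(shlex.join(cmd))
```

To actually run it you would hand `cmd` to `subprocess.run(cmd, check=True)`.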
 +
 +When you don't necessarily want the protection offered by ssh encryption (maybe you're on the local network in your bunker) you might want to drop it altogether and keep only the tremendous speed. Unfortunately some people didn't consider this to be a very good idea, so there's currently no way in official ssh to drop encryption entirely. There are however some third party patches for this which you can use at your own risk.
 +
 +When we get to the accessibility aspect there's only one way to compare the SSH family to SMB or NFS: SSHFS over FUSE. This is the only way to mount your remote tree in your local tree over ssh and enjoy the performance (of powerful CPUs). All the other ways (rsync, scp, sftp) are Linux commands which provide an entirely different experience. They're great for transferring a bunch of files fast but not for everyday browsing. There are also file manager plugins and even standalone products that you could use to browse and transfer files in a quick and convenient way. But the key difference with these is that the remote files and folders are perceived locally as remote resources. This means that if, for instance, you click on a film in these file managers the entire film will first be copied locally (don't hold your breath - it will take quite some time) and only then will it start playing. With all the other systems (SMB, NFS, SSHFS etc.), when you click on that film it starts playing immediately and data is transferred in the background as needed while you watch the film.
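
The difference is easier to see from the application's side. On a mounted tree a player just opens a path and reads the byte ranges it needs, so only those ranges have to cross the network. A minimal sketch follows. The helper name is ours, and a local temporary file stands in for a file on a mounted share.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` without touching the rest."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Demo: a temporary local file plays the role of a large remote film.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
    demo_path = tmp.name

chunk = read_range(demo_path, 256, 4)
os.remove(demo_path)
print(chunk)
```

With a file manager plugin there is no local path to open, which is why the whole file has to be copied before anything can read it.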
 +
 +On top of this, FUSE is not available on all operating systems (although there are several attempts to implement something similar). When it comes to file manager plugins or standalone programs you will probably find several of these for your operating system of choice, but you won't get the consistent user experience you get from something that mounts as a native resource.
 +
 +SSHFS does a good job, but using it on a daily basis seems cumbersome to us. First of all, its permission handling looks more like SMB (but with some notable differences). You are logged in on the remote machine and perform all actions in the name of that user. Furthermore, this means that it is difficult to share the remote tree with other local users: they each have to mount the remote tree in their own user space using their own credentials. When you see how it handles connection problems you soon start to look for alternative solutions. NFS is no stranger in this department either, but the chances of some eventual recovery are somewhat bigger if it's your lucky day.
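
As an illustration of the per-user mounting, here is a sketch that builds one sshfs command per local user. It only constructs the command lines. The user names, host and paths are hypothetical. The `reconnect` option of sshfs and ssh's `ServerAliveInterval` are commonly suggested for flaky connections, but we did not test how much they help.

```python
from typing import List


def sshfs_mount(user: str, host: str, remote_path: str, mount_point: str,
                reconnect: bool = True, keepalive_s: int = 15) -> List[str]:
    """Build an sshfs argv: sshfs user@host:dir mountpoint [-o option ...]."""
    cmd = ["sshfs", f"{user}@{host}:{remote_path}", mount_point]
    if reconnect:
        cmd += ["-o", "reconnect"]
    if keepalive_s:
        cmd += ["-o", f"ServerAliveInterval={keepalive_s}"]
    return cmd


# Each local user mounts the same remote tree under their own home directory,
# logging in with their own remote account (all names are made up).
for name in ["alice", "bob"]:
    print(" ".join(sshfs_mount(name, "nas.example", "/srv/share", f"/home/{name}/share")))
```

Unmounting on Linux is done per user as well, typically with `fusermount -u` on the mount point.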
  
  
network_storage_accessibility_and_peformance.txt · Last modified: 2014/11/17 11:04 by admin