

A Word on Network Storage Accessibility and Performance

Introduction

You've got yourself some storage attached to your gigabit network - now what? How do you make use of it? How do you access it?

There are several ways in which you can access that storage. Or should I say there are several protocols and software packages which enable you to access that storage. Each has advantages and disadvantages in terms of performance and convenience. This is what this article is about.

If you transfer one small file over the network, the key factor in choosing one of these technologies is convenience. If that file is very large (a film, a CD image or a DVD image, say), you'll be performing sequential access. Most of these technologies offer very good performance for sequential access (with some exceptions - see below), provided the hardware (network card, CPU and HDD) can sustain the load. In this scenario the deciding factor is, again, convenience.

The most interesting part, though, is when you have to transfer lots of files (possibly small ones). You may want to copy all your documents somewhere else, or move your photo or mp3 collection. Or, if you're a brave one, you might want to compile the kernel tree residing on your network-attached storage. There are several real-world scenarios that involve accessing a large number of files as fast as possible. This is the most interesting scenario because here there are big performance differences between the technologies.

We all love benchmarks. Those little numbers that tell you that what you bought is 3% faster than the competition. You may never notice the difference, but the important part is that you got the best. In our case the performance and convenience differences are so big that you don't need lab benchmarks to see them. This is why there won't be many numbers in this article.

However, we did perform many tests. The tests involved copying a large tree of relatively small real files to and from our tumaBox, over and over again. We used our /usr and /usr/lib partitions - that's right: real files from the real world. Of course we took the necessary precautions, such as clearing caches, to avoid fake performance improvements. The reasons we don't put the actual numbers in the article are: 1. we don't think it matters whether you get a 50 or 52.5 percent difference (or even 20 or 30 percent) - what matters is that there is a huge performance difference; 2. even though we took precautions to get real results, we didn't perform a scientific benchmark by the book. So feel free to throw your stones and discard everything we say here.
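For the curious, the methodology can be sketched roughly like this (all paths here are made up for illustration; the real tests used /usr and /usr/lib copied to and from the tumaBox): build a tree of many small files, clear the caches, and time the copy.

```shell
#!/bin/sh
# Sketch of the test methodology described above (paths are assumptions):
# build a tree of many small files, then time copying it. For a real
# benchmark, point DST at your NAS mount and, as root, drop the page
# cache first:  sync && echo 3 > /proc/sys/vm/drop_caches
SRC=/tmp/srctree
DST=/tmp/dsttree   # assumption: replace with your mount, e.g. /mnt/tumabox
rm -rf "$SRC" "$DST"
mkdir -p "$SRC"
i=0
while [ $i -lt 500 ]; do            # 500 small files stand in for /usr/lib
  echo "payload $i" > "$SRC/file$i"
  i=$((i+1))
done
start=$(date +%s)
cp -a "$SRC" "$DST"                 # the operation being timed
end=$(date +%s)
echo "copied 500 files in $((end - start)) seconds"
```

Without the cache-dropping step, a second run would read everything from RAM and report an unrealistically fast time - that is the "fake performance improvement" mentioned above.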

File Level Sharing

This is the first way of sharing storage over the network. It basically means that client computers see the network-attached storage as a tree of directories and files, and can then issue commands to the server to read or write those files.

The key feature of this type of sharing is convenience. The protocols are very well supported by all major operating systems, and you can mount the entire remote storage tree on your local computer and access it as a local resource. Another important feature of this type of sharing is that many clients can read and write remote files at the same time; the server takes care of keeping the data consistent. However, this comes at a cost, because consistency implies locking - something the server does transparently, but which decreases performance most notably in our scenario: accessing a very large number of relatively small files.
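Most of that locking happens transparently on the server, but the same idea can be sketched client-side with flock(1) (the lock and data paths below are made-up examples): each writer takes an advisory lock before touching the shared file, so concurrent writes cannot interleave.

```shell
# Sketch of advisory locking (paths are assumptions, not from the article).
# flock(1) holds a lock on the lock file for the duration of the command,
# so these two appends cannot interleave even if run from two clients.
rm -f /tmp/shared.txt
flock /tmp/shared.lock -c 'echo "client A" >> /tmp/shared.txt'
flock /tmp/shared.lock -c 'echo "client B" >> /tmp/shared.txt'
cat /tmp/shared.txt
```

Acquiring and releasing such a lock for every one of thousands of small files is exactly where the per-file overhead comes from.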

SMB / CIFS

This is one of the most used protocols. Chances are you have already used it without even knowing it. It's the protocol used by default by all Windows variants (although in different flavours), but it's also available on any other major operating system. On Windows it's very easy to use: if you access something through your “Network Neighborhood”, you're already using it. On Windows you can also mount it as a drive (D:, X: or whatever you like). You can also mount and access it in a variety of ways on all other major operating systems. This is pretty much as easy as it gets.
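On Linux, mounting such a share might look like the following sketch (the server name "tumabox", the share name "data" and the user "alice" are made-up for illustration):

```shell
# One-off mount from the command line (run as root; names are assumptions):
#   mount -t cifs //tumabox/data /mnt/data -o username=alice

# Or the equivalent /etc/fstab entry, to mount the share at boot:
#   //tumabox/data  /mnt/data  cifs  username=alice,password=secret  0  0
```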

The performance for sequential transfer (large files) is quite good - you can easily saturate your gigabit network card, or even your old HDD. The downside is that accessing lots of small files is slow compared with the alternatives. In this scenario you might have difficulty saturating even a 100 Mbps Ethernet card - and not because of the HDD. The performance problems come from the inherent network latency, combined with the protocol's ping-pong over the network and the locks that need to be acquired and released.

The main competitor for SMB/CIFS is NFS (see below) - the competitor from the Unix world. There's a continuous battle over which has better performance. In our experience it depends on the actual conditions. Over the years we got mixed results with different hardware, protocol variants and configuration options. But in our current tests SMB performed much worse on writes (think 30-40%) and somewhat worse on reads (think around 10%) than NFS.

You may say that this is a huge performance difference, but it doesn't matter much to us. We use computers and our tumaBox heavily, and on a daily basis we don't see this difference having an impact on our productivity. Throw a different bunch of files at them and the results may be very different. We don't usually copy /usr for our work.

What is more important, from our point of view, is the difference in how you access the files. Yes, you can mount both SMB and NFS in your local tree, but with SMB you mount the remote tree as some user (by supplying a username and password), and then all operations you do on the remote files appear as performed by that user. You can see this as a security feature - or it may get you into a permission nightmare.
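As a sketch of that single-user mapping (server, share and numeric ids below are made-up names, not from the article): cifs mount options can force every remote file to appear locally owned by one fixed user, no matter who created it on the server.

```shell
# Everything under /mnt/data will appear locally owned by uid/gid 1000,
# and every remote operation runs as "alice" on the server:
#   mount -t cifs //tumabox/data /mnt/data \
#     -o username=alice,uid=1000,gid=1000,file_mode=0644,dir_mode=0755
```

Convenient for a single-user laptop; the "permission nightmare" starts when several local users share such a mount.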

NFS

This is the alternative to SMB from the Unix world. With NFS you mount the remote tree in the local tree, and the local user permissions are enforced on the remote tree as well. From the local user's perspective it's really like adding another directory with some extra storage. It's a very convenient way of using remote storage on Unixes, Linuxes and all their siblings, though one may argue that this has some security implications. Knowing these implications, we can take the necessary precautions to make NFS a safe thing to use.
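A minimal NFS mount might look like this sketch (the server name "tumabox" and export path are assumptions for illustration):

```shell
# One-off mount from the command line (as root):
#   mount -t nfs tumabox:/export/data /mnt/data

# /etc/fstab equivalent - note there is no username/password: local
# uids and gids apply directly to the remote files, which is exactly
# the convenience (and the security caveat) described above:
#   tumabox:/export/data  /mnt/data  nfs  defaults  0  0
```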

On the performance level NFS is very similar to SMB. Again, single large-file transfers can saturate the network or the HDD. Again, transferring lots of small files performs poorly. In our tests it did perform better than SMB, especially on writes, but again this may depend on the particular setup and scenario. The reason we don't care much about the performance difference between SMB and NFS is that they both perform so much worse on small files than the alternatives. So if you have to transfer lots of small files, stay away from both.

SFTP (RSync, SCP, FUSE)

network_storage_accessibility_and_peformance.1415965024.txt.gz · Last modified: 2014/11/14 11:37 by admin