Building your own Network Attached Storage server
In an earlier post, I discussed what was necessary to cable 1U servers up with power to run SATA drives. Next, you'll need some drives and a controller if you have not already purchased them.
The SATA controller you select is one of the most important pieces of the NAS box. I recommend getting a high-quality card, even if you are only doing RAID-1 (mirroring). There are basically three types of controllers: standard, fakeraid, and hardware RAID. A standard controller is just a host bus adapter, so any advanced features such as striping and mirroring must be performed by your OS. Fakeraid cards include a RAID BIOS, but all of the processing is left to a software driver running on the host. Hardware RAID is a complete subsystem with its own dedicated CPU.
Standard controllers and fakeraid cards are generally cheap and may even be integrated on your motherboard. However, they are rarely a good choice for servers. Aside from making the host do a lot more work, they can also cause a lot of trouble down the road. For instance, if a hard disk fails under software RAID, the system may not be able to reboot without manual intervention. Fakeraid may present the disks to the machine as one logical disk to avoid this, but these cards are the most evil of the bunch: they tend to use closed drivers and proprietary on-disk formats, so avoid their RAID features like the plague. The cards themselves are fine; just use them as standard controllers and rely on your operating system's software RAID. That leaves us with hardware RAID controllers.
Hardware RAID controllers are the only logical choice for a server. They offload all RAID processing to a subsystem on the card, with an integrated CPU to perform the parity calculations for RAID-5 and usually a large amount of RAM to act as cache. That RAM can even serve as a write cache, but only if there is a battery ON THE CARD; otherwise a power failure or crash can lose whatever writes were still sitting in the cache, potentially causing much greater data loss. Hardware RAID controllers present their RAID volumes to the computer and operating system as logical disks, so disk faults are transparent to the host. They also tend to allow hot addition of storage. Finally, since the card handles all of the disk I/O, the bandwidth required on the host bus is greatly reduced. When writing to a hardware RAID-1 volume, the data only needs to cross the PCI bus once, whereas software RAID must send a separate copy to each disk. This is a very important consideration for older machines with limited bus bandwidth and for large arrays.
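To give a feel for the parity calculation a RAID-5 card offloads, here is a minimal Python sketch of XOR parity over a single stripe. It's a simplification under stated assumptions: real controllers rotate parity across the disks and work on much larger chunks, and the function names here are hypothetical, not any controller's API.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a stripe must be the same length")
    # zip(*blocks) walks the blocks column by column, yielding one int per block
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of every data block."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data block by XORing parity with the survivors."""
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    stripe = [b"ABCD", b"EFGH", b"IJKL"]  # one block per data disk
    parity = compute_parity(stripe)
    # Pretend the disk holding b"EFGH" died and rebuild it.
    print(rebuild_missing([stripe[0], stripe[2]], parity))
```

Because XOR is its own inverse, any single missing block can be recomputed from the rest, which is exactly why RAID-5 survives one drive failure. Doing this for every write is the work a dedicated card takes off the host CPU.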
All things considered, I went with a 3Ware 9500S. Cost: $100 on eBay.
3Ware has good drivers in the Linux kernel and has been manufacturing SATA RAID controllers for some time. Other decent manufacturers are Adaptec and LSI Logic. Be sure to check for OS compatibility before purchasing a card.
Just because you are paying SATA prices doesn't mean you should disregard the quality of the drives you buy. Although most standard consumer drives will work fine, there is a much more attractive option. Seagate and others offer what they call "nearline" drives, which are basically consumer drives with RAID-friendly firmware and a continuous duty cycle. Best of all, they usually carry only a $10 or so premium over their consumer counterparts. When selecting a manufacturer, look at the warranty and the technology integrated into the drives. Seagate and Hitachi are both good choices. Check reviews too, especially on StorageReview.
When most people buy a hard disk, they choose simply on capacity. For a server (and even a desktop!), you should also consider the spindle speed and the size of the drive's cache. 7,200RPM is enough for most servers, but 10,000RPM drives deliver far greater performance under concurrent and random access, because the platter brings the requested sector under the head sooner. Of course, they usually come at a heavy price premium and with less capacity.
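A quick way to see what spindle speed buys you is average rotational latency: on average the drive waits half a revolution for the sector to come around. This small Python sketch only models rotation; it ignores seek time and caching, which also matter for random access.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency (half a revolution) in milliseconds."""
    seconds_per_rev = 60.0 / rpm
    return seconds_per_rev / 2 * 1000.0


if __name__ == "__main__":
    for rpm in (7200, 10000):
        print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

That works out to roughly 4.2 ms per random access at 7,200RPM versus 3.0 ms at 10,000RPM, before seek time is added. The gap compounds when many clients are issuing small random reads at once.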
Here, I went with a pair of nearline Seagate Barracuda ES drives. The 320GB model was ample for my needs, but these go all the way up to 750GB. They use perpendicular magnetic recording, which increases density and speed, and feature a large 16MB cache. My initial testing shows nearly 80MB/s throughput! That level of speed has traditionally only been available on expensive SCSI disks. Cost: $100/drive.
Equally important to a NAS server is ample network bandwidth. How much you need depends largely on the scale of the services you run, but a decent gigabit NIC should get you near disk performance. I picked up an Intel Pro/1000 MT Server Adapter, which features various offloading schemes to free the host's CPUs from networking tasks. If your needs are greater, consider aggregating several gigabit ports together. Cost: $20.
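To check the "near disk performance" claim, here is a back-of-the-envelope Python sketch comparing raw gigabit line rate with the roughly 80MB/s I measured from the drives. It uses decimal megabytes and ignores Ethernet/TCP/file-protocol overhead, so real throughput will be somewhat lower than these figures.

```python
DISK_MB_S = 80.0  # sequential throughput from my initial drive testing


def link_capacity_mb_s(gbit_per_port: float, ports: int = 1) -> float:
    """Raw line rate in MB/s (decimal units), before any protocol overhead."""
    return gbit_per_port * 1000.0 / 8.0 * ports


if __name__ == "__main__":
    for ports in (1, 2):
        link = link_capacity_mb_s(1.0, ports)
        print(f"{ports} x GbE: {link:.0f} MB/s raw, {link / DISK_MB_S:.2f}x the disk")
```

A single gigabit port tops out at 125MB/s raw, only modestly above one drive's sequential speed, which is why it lands "near" disk performance. Keep in mind that most link aggregation modes keep each connection on a single port, so bonding tends to help many concurrent clients more than one big transfer.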
The total cost for storage hardware comes to about $320.
Cabling the drives up and configuring the array is a pretty straightforward task. If you chose software RAID, there are plenty of guides on the internet to help you set up md and device-mapper. Try http://linas.org/linux/raid.html.
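If you do go with md software RAID, a failed member shows up in /proc/mdstat as an underscore in the status map ([U_] instead of [UU]). As a hedged sketch, assuming the classic mdstat layout shown in the example, a few lines of Python can flag degraded arrays. The `degraded_arrays` helper is hypothetical; `mdadm --detail` remains the authoritative tool.

```python
import re

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Status line, e.g. "312568576 blocks [2/2] [UU]" ([total/active] [member map])
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of md arrays that report fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            total, active, member_map = int(status.group(1)), int(status.group(2)), status.group(3)
            if active < total or "_" in member_map:
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      312568576 blocks [2/2] [UU]

md1 : active raid1 sdc1[0]
      312568576 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
    # On a live system: degraded_arrays(open("/proc/mdstat").read())
```

Running something like this from cron is one way to notice a dead mirror before the second disk goes, which matters all the more given the reboot problems software RAID can have after a failure.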
When it comes to RAID levels, the choice depends on the number of drives you have and the level of protection you need. RAID-1 is a great choice for most applications: you get true redundancy and better read performance, at the cost of half your raw capacity. RAID-5 is also common; it requires at least three drives, gives up one drive's worth of capacity to parity, and survives the failure of any single drive. With the size of modern hard disks, RAID-5 is less attractive than it once was, since rebuilding a large drive takes a long time and leaves the array unprotected until it finishes. It also suffers a heavy write penalty, because each small write requires reading the old data and parity before writing the new data and parity. Try http://www.acnc.com/04_01_00.html for the lowdown on various RAID levels.
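To put numbers on the RAID-1 versus RAID-5 trade-off, here is a small Python sketch that reports usable capacity, guaranteed fault tolerance, and disk I/Os per small write. The I/O counts are an assumption about the simple case: a write smaller than a stripe with no cache hit, so RAID-5 needs a read-modify-write. A controller with battery-backed write cache can often hide some of that penalty.

```python
from typing import NamedTuple


class RaidSummary(NamedTuple):
    usable_gb: float
    failures_tolerated: int
    ios_per_small_write: int


def raid_summary(level: str, drives: int, drive_gb: float) -> RaidSummary:
    """Capacity and redundancy for RAID-1 or RAID-5 built from identical drives."""
    if level == "raid1":
        if drives < 2:
            raise ValueError("RAID-1 needs at least two drives")
        # Every drive holds a full copy: one drive's worth of space,
        # and each write must land on every drive.
        return RaidSummary(drive_gb, drives - 1, drives)
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID-5 needs at least three drives")
        # One drive's worth of space holds parity. A small write costs
        # read old data + read old parity + write new data + write new parity.
        return RaidSummary((drives - 1) * drive_gb, 1, 4)
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    print(raid_summary("raid1", 2, 320))  # my pair of 320GB drives
    print(raid_summary("raid5", 3, 320))
```

With two 320GB drives, RAID-1 gives 320GB usable; adding a third drive for RAID-5 yields 640GB but still tolerates only one failure and doubles the disk operations behind each small write.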
My next post will cover file system selection.