May 28, 2015

Today’s post shall be about storage. My new storage array, actually. I wanted to make this post episodic, with multiple small posts that make sort of a build log, but since I’m so damn lazy, I never did that. So by now, I have quite some material piled up, which you’re all getting in one shot here. This is still not finished however, so don’t expect any benchmarks or even disks – yet! Some parts will be published in the near future, in the episodic manner I had actually intended to go for. So…

I’ve been into parity RAID (redundant array of independent/inexpensive disks) since the days of PATA/IDE with the Promise Supertrak SX6000, which I got in the beginning of 2003. At first with six 120GB Western Digital disks in RAID-5 (~558GiB of usable capacity), then upgraded to six 300GB Maxtor MaxLine II disks (~1.4TiB, the first to break the TiB barrier for me). It was very stable, but so horribly slow and fragmented at the end, that playback of larger video files – think HDTV, Blu-Rays were hitting the market around that time – became impossible, and the space was once again filled up at the end of 2005 anyway.

2006, that was when I got the controller I’m still using today, the 3ware 9650SE-8LPML. Typically, I’d say that each upgrade has to give me double capacity at the very least. Below that I wouldn’t even bother with replacing either disks or a whole subsystem, given the significant costs. The gain has to be large enough to make it worthwhile.
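The usable capacities quoted throughout this post all follow from the same arithmetic: subtract the parity disks, then convert the vendor's decimal bytes to binary TiB. Here is a minimal sketch of that, assuming 1 TB = 10¹² bytes and ignoring controller metadata and file system overhead. The function name `usable_tib` is made up for this example, and the last entry is only the planned 12 × 6TB layout, not a finished array:

```python
# Sketch: usable capacity of a parity RAID array in TiB.
# Assumes decimal vendor units and ignores metadata/file system overhead.

GB = 10**9
TB = 10**12


def usable_tib(disks: int, disk_bytes: int, parity_disks: int) -> float:
    """Usable capacity in TiB: RAID-5 gives up 1 disk to parity, RAID-6 gives up 2."""
    data_disks = disks - parity_disks
    return data_disks * disk_bytes / 2**40


arrays = [
    ("6 x 120GB, RAID-5", 6, 120 * GB, 1),
    ("6 x 300GB, RAID-5", 6, 300 * GB, 1),
    ("8 x 1TB, RAID-6", 8, 1 * TB, 2),
    ("8 x 2TB, RAID-6", 8, 2 * TB, 2),
    ("12 x 6TB, RAID-6 (planned)", 12, 6 * TB, 2),
]

for name, disks, size, parity in arrays:
    print(f"{name}: {usable_tib(disks, size, parity):.2f} TiB")
```

The planned 12 × 6TB RAID-6 layout works out to roughly 54.57TiB, which is where the figure in the title comes from.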

The 3ware had its disks upgraded once too, going from a RAID-6 array consisting of 8×1TB Hitachi Deskstars (~5.45TiB usable) to 8×2TB Hitachi Ultrastars (~10.91TiB usable), which is where I’m sitting at right now. All of this – my whole workstation – is installed in an ancient EYE-2020 server tower from the 90s, which so far has housed everything starting from my old Pentium II 300MHz with a Voodoo² SLI setup all the way up to my current Core i7 980X hexcore with an nVidia SLI subsystem. Talk about some long-lasting hardware right there. So here’s what the “Helios” RAID-6 array and that ugly piece of steel look like today. Please forgive me for not providing any pictures of the actual RAID controller or its battery backup unit; I don’t have any nice photos of them, so I have to point you to a web search for the 3ware 9650SE-8LPML:

As you can see, that makes 16 × 40mm fans. It’s not like server-class super noisy, but it for sure ain’t silent either. It’s quite amazing that the Y.S. Tech fans in there have survived running 24/7 from 2003 to 2015, that’s a whopping 12 years! They are noisier now, and every few weeks one of the bearings would go to saw-blade mode for a brief moment, but what can you expect. None have died so far, so that’s a win in my book for any consumer hardware (which the HDCS was).

Thing is, I have two of those 3ware RAID controllers now, but each one has issues. One wouldn’t properly synchronize on the PCIe bus, negotiating only a single PCIe lane, and that thing is PCIe v1.1 even, which means a 250MB/s limit in that crippled mode. The second one syncs properly, but has a more pressing issue: whenever there are sharp environmental temperature changes (opening the window for 5 minutes when it’s cool outside is enough), the controller randomly starts dropping drives from the array. It took me a LONG time to figure that out, as you can probably imagine. Must be some bad soldering spots on the board or something, but I couldn’t really identify any.
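To put a number on that single-lane limit: PCIe 1.x signals at 2.5GT/s per lane and uses 8b/10b line encoding, so only 8 of every 10 bits on the wire are payload. A small sketch of the resulting per-direction ceiling follows. The function name is hypothetical, the lane counts are just illustrative values, and real-world throughput is lower still because of packet and protocol overhead that this ignores:

```python
# Sketch: theoretical per-direction bandwidth of a PCIe 1.x link.
# Ignores packet (TLP) and flow-control overhead, so real numbers are lower.

TRANSFERS_PER_S = 2_500_000_000  # PCIe 1.x: 2.5 GT/s per lane
PAYLOAD_BITS = 8                 # 8b/10b encoding: 8 payload bits ...
LINE_BITS = 10                   # ... per 10 bits on the wire


def pcie1_bytes_per_s(lanes: int) -> int:
    """Raw payload bytes per second, per direction, for a PCIe 1.x link."""
    payload_bits_per_s = lanes * TRANSFERS_PER_S * PAYLOAD_BITS // LINE_BITS
    return payload_bits_per_s // 8


for lanes in (1, 4, 8):
    rate = pcie1_bytes_per_s(lanes)
    print(f"x{lanes}: {rate / 1e6:.0f} MB/s ({rate / 2**20:.1f} MiB/s)")
```

A single lane comes out at 250MB/s, or a bit over 238MiB/s, which an array of eight disks can saturate with sequential reads pretty easily.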

Plus, capacity is running out again. Now, the latest 3ware firmware would enable me to upgrade this to at least 8 × 6TB, but with 4K video coming up and with my desire to build something very long-lasting, I decided to retire “Helios”. Ah, yes. The name…

Consider me as being childish here, but naming is something very important for me, when it comes to machines and disks or arrays. ;) I had decided to name each array once per controller. For disk upgrades, it simply gets a new number. So there was the IDE one, “Polaris”. Then “Polaris 2”, then “Helios” and “Helios 2”.

The next one shall be called “Taranis”, named after an iconic vessel a player could fly in the game [EVE Online], and its own namesake, an ancient Celtic [god of thunder].

Supposedly, a famous Taranis pilot once said this:

“The taranis is a ship for angry men or people who prefer to deal in absolutes. None of that cissy boy, ‘we danced around a bit, shot some ammo then ran away LOL’, or, ‘I couldn’t break his tank so I left’, crap. It goes like this:

You fly Taranis. A fight starts. Someone dies.”

I flew on the wing of a Taranis pilot for only one single time. A lot of people died that night, including our entire wing! ;)

In any case, I wanted to 1up this a bit. From certain enterprise storage solutions I of course knew the concept of hot-swapping and more importantly error reporting LEDs on the front of a storage enclosure. Since that’s extremely useful, I wanted both for my new array in a DIY way. I also wanted to get rid of the Antec HDCS, which had served me for 12 years now, and ultimately also semi-retire my case, after understanding that it was just too cramped for this. A case that had served me for 17 years, 24/7.

Holy shit. That’s a long time!

So I had to come up with a good solution. The first part was: I needed hot-swap bays that could do error reporting in a way supported by at least some RAID controllers. I found only ONE aftermarket bay that would fully satisfy my requirements. The controller could come later, I would just pick it from a pool of controllers supporting the error LEDs of the cages.

It was the Chieftec SST-2131SAS ([link 1], [link 2]), the oldest of Chieftec’s SAS/SATA bays. It had to be the old one, because the newer TLB and CBP series no longer have any hard disk error reporting capability built in for whatever reason. On top of that, the older SST series shows much less plastic and just steel and what I think is magnesium alloy, which feels awesome:

So there is no fancy digital I²C bus for error reporting on the bays, just some plain LED connectors that do require the whole system to have a common electrical ground to work for closing the circuit, as we only got cathode pins. I got myself four such bays, which makes for a total of 12 possible drives. As you may already be guessing, I’m going for more than just twice the capacity on this one.

For a fast, well-maintainable controller, I went for the Areca [ARC-1883ix-12], which was released just at the end of 2014. It supports both I²C as well as the old “just an error LED” solution my bays have, pretty nice!

Areca (and I can confirm this first-hand) is very well known for their excellent support, which means a lot of points have to go to them for that. Sure, the Taiwanese Areca guys don’t speak perfect English, but given their technical competence, I can easily overlook that. And then they support a ton of operating systems, including XP x64, even after its [supposed] demise (the system shall run with a mirror of my current XP x64 setup at first, and either some Linux or FreeBSD UNIX later). This thing comes with a dual-core ROC (RAID-on-Chip) running at 1.2GHz, +20% when compared to its predecessor. Plus, you get 2GiB of cache, which is Reg. ECC DDR-III/1866. Let’s just show you a few pictures before going into detail:

So there are several things to notice here:

  1. It’s got an always-full-power fan and a big cooler, so it’s not going to run cool. Like, ever.
  2. It requires PCIe power! Why? Because all non-PEG devices sucking more than 35W have to, by PCIe specification. This one eats up to 37.2W (PEG meaning the “PCI Express Graphics” device class, graphics cards get 75W from the slot itself).
  3. It has Ethernet. Why? Because you need no management software. The management software runs completely *ON* the card itself!

The really interesting part of course is the Ethernet plug. In essence, the card runs a complete embedded operating system, including a web server to enable the administrator to manage it in an out-of-band way.

That means that a.) it can be managed on all operating systems even without a driver and b.) it can even be managed, when the host operating system has crashed fatally, or when the machine sits in the system BIOS or in DOS. Awesome!

Ok, but then, there is heat. The system mockup build I’m going to show you farther below was still built with the “let’s plug it in the top PCIe x4 slot” idea in mind. That would include my EVGA GeForce GTX580 3GB Classified Ultra SLI system still being there, meaning that the controller would have to sit right above an extremely hot GPU.

By now, I’ve abandoned this idea for a thermally more viable solution, replacing the SLI with a GeForce GTX Titan Black I got for an acceptable price. In the former setup, the controller’s many thermal probes reported temperatures reaching 90°C during testing, and that’s without the GPUs even doing much, so yeah.

But before we get to the mockup system build, there is one more thing, and that’s the write cache backup for the RAID controller for cases of power failures. Typically, Lithium-Ion batteries are used for that, but I’m already a bit fed up with my 3ware batteries having gone belly-up every 2 years. So I wanted to ditch that. There are such battery backup units (“BBUs”) for the Areca, but it may also be combined with a so-called flash backup module (“FBM”). Typically, a BBU would keep the DRAM and its write cache alive on the controller during power outages for like maybe 24-48 hours, waiting for the main AC power to return. Then, the controller would flush the cached data to the disks to retain a consistent state.

An FBM does it differently: It uses capacitors instead, plus a small on-board SSD. It would keep the memory alive for just seconds, just enough to copy the data off the DRAM and onto its local SSD. Then it would power off entirely. The data gets fetched back after any arbitrary amount of downtime upon power-up of the system, and flushed out to the RAID disks. The hope here is that the supercapacitors being used by such modules can survive for much longer than the Li-ion batteries.

There is one additional issue though: Capacity (both in terms of electrical capacitance and SSD capacity) is limited by price and physical dimensions. So the FBM can only cover 2GiB of cache, but not the larger sizes of 4GiB or 8GiB.

That’s where Areca support came into play, readily helping you with any pre-purchase question. I talked to a guy there, and described my workload profile to him, which boils down to highly sequential I/O with relatively few parallel streams (~40% read + ~60% write), and very little random R/W. He told me that based on that use case, more cache doesn’t make sense, as that’d be useful only for highly random I/O profiles with a very high workload and high parallelism. Think busy web servers or mail servers. But for me, 4GiB or the maximum of 8GiB of cache wouldn’t do more than what the stock 2GiB does.

As such, I forgot about the cache upgrade idea and went with the flash backup module instead of a conventional BBU. That FBM is called the ARC-1883-CAP:

So, let’s put all we have for now together, and look at some build pictures:

Let me tell you one thing: yes, the Lian Li PC-A79B is nice, because it’s so manageable. The floors in the HDD cages can even be removed, so that any HDD bay can fit, with no metal noses in the way in the wrong places. It’s deep, long and generally reasonably spacious.

But – there is always a but – when you’re coming from an ancient steel monster like I did, the aluminium just feels like thin paper or maybe tin foil. The EYE-2020 could carry the weight of a whole man standing on top of it. But with an aluminium tower you’ll have to be careful not to bend anything when just pulling out the mainboard tray. The HDD cage feels as if you could very easily rip it out entirely with just one hand.

Aluminium is really soft and weak for a case material, so that’s a big minus. But I can have a ton of drives, a much better cooling concept and a much, much, MUCH cleaner setup, hiding a lot of cables from the viewer and leaving room for air to move around. Because that part was already quite terrible in my old EYE.

Please note that the above pictures do not show the actual system as it’s supposed to look in the end though. The RAID controller already moved one slot downwards, away from the 4 PCIe lanes coming from the ICH10R (“southbridge”), which in turn is connected to the IOH (“northbridge”) only via a 2GiB/s DMI v1 bus. So it went down one slot, onto the PCIe/PEG x16 slot which is connected to the X58 chipset’s IOH directly. This should take care of any potential bandwidth problems, given that the ICH10R also has to route all my USB 2.0 ports, the LAN ports, all Intel SATA ports including my system SSD and the BD drives, one Marvell eSATA controller and one Marvell SAS controller to the IOH and with it ultimately to the CPU & RAM, all via a bus that might’ve gotten a bit overcrowded when using a lot of those subsystems at once.

Also, this tiny Intel cooler isn’t gonna stay there, it just came “for free” with the second ASUS P6T Deluxe I bought, together with a Core i7 930. Well, as a matter of fact, that board… umm… let’s just say it had a little accident and had to be replaced *again*, but that’s a story for the next episode. ;) A Noctua NH-D15 monster and the free S1366 mounting kit that Noctua sends you if you need one, plus a proper power supply all have already arrived, so there might be a new post soon enough, with even more Noctuafication also being on the way! Well, as soon as I get out of my chair to actually get something done at least. ;)

And for those asking the obvious question “what drives are you gonna buy for this?”, the answer to that (or at least the current plan) is either the 6TB Seagate Enterprise Capacity 3.5 in its 4Kn version, the [ST6000NM0014], or the 6TB Hitachi Ultrastar 7K6000, also in its 4Kn version, that’d be the [HUS726060AL4210]. Given that I want drives with a read error rate of <1 error in 10¹⁵ bits read instead of <1 in 10¹⁴, like it is for consumer drives, those would be my primary drives of choice. Seagate’s cheap [SMR] (shingled magnetic recording) disks are completely unacceptable for me anyway, and from what I’ve heard so far, I can’t really trust Hitachi’s helium technology with being reliable either, so it all boils down to 6TB enterprise class drives with conventional air filling for now. That’s if there aren’t any dramatic changes in the next few months of course.
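To give a feeling for why that one order of magnitude in the read error rate matters at this array size, here is a rough back-of-the-envelope sketch. It is a deliberately simplistic model and not how any controller actually behaves: it treats the spec sheet figure as an exact average rate (it is really an upper bound), assumes errors are independent, and the workload of reading 11 surviving 6TB disks end to end during a rebuild is a made-up example. The function name is hypothetical as well:

```python
import math

# Sketch: chance of hitting at least one unrecoverable read error (URE)
# while reading a given amount of data, using a simple Poisson model.
# Pessimistic by construction: the spec value is "less than 1 in N bits".

TB = 10**12


def p_at_least_one_ure(bytes_read: float, bits_per_error: float) -> float:
    """Probability of one or more UREs, given an average of 1 error per `bits_per_error` bits."""
    expected_errors = bytes_read * 8 / bits_per_error
    return 1 - math.exp(-expected_errors)


# Hypothetical rebuild: 11 surviving 6TB disks are read completely.
bytes_read = 11 * 6 * TB

for exponent in (14, 15):
    p = p_at_least_one_ure(bytes_read, 10**exponent)
    print(f"1 error in 10^{exponent} bits: {p:.1%} chance of at least one URE")
```

Under these assumptions the consumer-class rate makes a read error during a full rebuild close to certain, while the enterprise-class rate brings it down to well under a coin flip. With RAID-6, a single read error during a one-disk rebuild can still be corrected from the second parity, so this says something about margin, not about guaranteed data loss.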

Those disks are all non-encrypting drives by the way, as encryption will likely be handled by the Areca controller’s own AES256 ASIC and/or Truecrypt or Veracrypt.

Ah, I almost forgot, I’m not even done here yet. As I may get a low-air-pressure system in the end, with less air intake than exhaust, potentially sucking dust in everywhere, I’m going to filter or block dust wherever I possibly can. And the one big minus for the Chieftec bays is that they have no dust filters. And the machine sits in an environment with quite a lot of dust, so every hole has to be filtered or blocked, especially those that air gets sucked through directly, like the HDD bays.

For that I got myself some large 1×1 meter stainless steel filter roll off eBay. This filter has a tiny 0.2mm mesh aperture size and 0.12mm wire diameter, so it’s very, very fine. I think it was originally meant to filter water rather than air, but that doesn’t mean it can’t do the job. With that, I could get those bays properly modified. I don’t want them to become dust containers eventually after all.

See here:


Steel filter with 0.2mm mesh aperture, coins for size comparison (10 Austrian shillings and 1 Euro).
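How much air such a fine mesh still lets through can be estimated from the two numbers above. For a plain square weave, the open area is the square of aperture divided by pitch, where the pitch is aperture plus wire diameter. A small sketch, assuming the mesh really is a plain square weave (the function names are made up for this example):

```python
# Sketch: open area and mesh count of a plain square-weave wire mesh.
# Assumes equal aperture and wire diameter in both directions.


def open_area_fraction(aperture_mm: float, wire_mm: float) -> float:
    """Fraction of the mesh surface that is open, for a square weave."""
    pitch = aperture_mm + wire_mm
    return (aperture_mm / pitch) ** 2


def mesh_per_inch(aperture_mm: float, wire_mm: float) -> float:
    """Number of openings per inch (25.4 mm) along one direction."""
    return 25.4 / (aperture_mm + wire_mm)


aperture, wire = 0.2, 0.12
print(f"open area: {open_area_fraction(aperture, wire):.1%}")
print(f"mesh count: {mesh_per_inch(aperture, wire):.1f} per inch")
```

So only about 39% of the filtered surface is actually open, which is worth keeping in mind for the fans that have to pull air through it.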

I went for steel to have something easy enough to work with, yet still stable. Now, it took me an entire week to get this done properly, and that’s because it’s some really nasty work. First, let’s look at one of the trays that need filtering, so you can see why it’s troublesome:

So as you can see, I had to cut out many tiny pieces, that would then be glued into the tray front from the inside, for function as well as neat looks. This took more than ten man-hours for all 4 bays (12 trays), believe it or not. This is what it looks like:

Now that still leaves the other hexagonal holes in the bay frame, that air may get sucked through and into the bays inside. Naturally, we’ll have to handle them as well:

And here is our final product, I gotta say, it looks reaaal nice! And all you’d have to do every now and then is to go over the front with your vacuum cleaner, and you’re done:


A completed SST-2131, fully filtered by pure steel.

So yeah, that’s it for now, more to follow, including the new power supply, more dust filtering and blocking measures, all bays installed in the tower and so on and so forth…

Edit: [Part 2 is now ready]!

“Building the 54.5TiB “Taranis” RAID-6 array and the hardware around it, part 1” by The GAT is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

  2 Responses to “Building the 54.5TiB “Taranis” RAID-6 array and the hardware around it, part 1”

  1. Where on ebay did you get that stainless stuff? Looks great, and less hassle in the long run – I was going to use a pantyhose around my server.

    • Hey man,

      Hah yeah, the trick with the nylons is the oldest in the book, but they’re damaged too easily. I got the stuff from Germany actually, but shipping might be a bit expensive to other countries. It was ok to Austria with 10€. Just search for “Edelstahl Drahtgewebe 0,2mm” and you may find it. Like this one here for instance: [Link].

      You’d really need to check out the other offerings though, they’re more expensive, but I don’t know about the shipping costs to your country, so yeah. Oh, and prepare a sturdy pair of scissors, or alternatively one that you don’t need for anything else anymore! I nearly completely ruined a pair of good scissors cutting the stuff. It’s steel after all…

      In the USA, I even found laboratory filters made of steel with a mesh aperture as fine as 16.5µm, but those are pretty expensive, about 5-10 times as much as the one you can find at the link I provided, so that’s probably not worth it for just filtering dust out of the air. 0.2mm is pretty damn fine already. :)
