Dec 01 2015

While there has been quite some trouble with the build of my new storage array, as you could see in the last [part 3½], everything seems to have been resolved now. As far as my tests have shown, the instability issues with my drives were indeed caused by older Y-cables used to supply all eight 4P Molex plugs of my Chieftec 2131SAS drive bays. These were necessary because all plugs on the Corsair AX1200i power supply had already been used up, partly by the 8 SATA power plugs of the old RAID-6 array.

To fix it, I ripped out half of the Y-cables, specifically those connected to the bays that showed trouble, and hooked the affected bays up to a dedicated ATX power supply. However, the no-name 400W PSU used for this wasn’t stable with zero load on the ATX cable, so simply shorting the green and grey wires on the ATX plug didn’t work. This happens with a lot of ATX PSUs, so I hooked another ASUS P6T Deluxe mainboard up to it, which stabilized all voltage rails.

After that came a full encryption of the (aligned) GPT partition created on the device, three days of rsync, and then a full diff taking a bit more than two days. And yep: Everything worked just as planned, and all 10.5TiB of my data was synced over to the new array correctly and without any inconsistencies. After that, I ripped out the old array and did the cabling properly, and, well, still no problems at all!
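For anyone who wants to repeat that verification step, here is a minimal Python sketch of a content-level comparison, roughly what a recursive diff does. It is an illustration, not the exact procedure used above; the function name `compare_trees` is made up, and it only checks one direction (files present in the source but missing or different on the target), which is the case that matters after an rsync.

```python
import filecmp
import os
import sys
from typing import List


def compare_trees(src: str, dst: str) -> List[str]:
    """Compare every file under src byte by byte with its counterpart under dst.

    Returns a list of problems ("missing: ..." or "differs: ..."), empty if the
    copy is complete. Extra files that exist only in dst are not reported.
    """
    filecmp.clear_cache()  # never reuse comparison results from an earlier run
    problems: List[str] = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        for name in sorted(files):
            rel = os.path.normpath(os.path.join(rel_root, name))
            a, b = os.path.join(src, rel), os.path.join(dst, rel)
            if not os.path.isfile(b):
                problems.append(f"missing: {rel}")
            # shallow=False forces a real content comparison instead of
            # trusting matching size/mtime signatures.
            elif not filecmp.cmp(a, b, shallow=False):
                problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    issues = compare_trees(sys.argv[1], sys.argv[2])
    print("\n".join(issues) if issues else "no differences found")
```

Run it with the old and new mount points as its two arguments. Like a full diff, it reads every byte on both sides, so on ~10TiB it will take a similarly long time.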

Taranis RAID-6 freshly filled with data from the old 10.9TiB Helios array

With everything copied over, that little blue triangle still has a long way to go before it eats up Taranis!
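The capacity numbers follow from simple arithmetic: RAID-6 spends two drives’ worth of space on parity, and drive vendors count in decimal terabytes while the OS reports binary tebibytes. A small Python sketch (the helper name `raid6_usable_tib` is hypothetical, and file system and encryption overhead are ignored):

```python
TB = 10**12   # drive vendors count in decimal terabytes
TiB = 2**40   # the OS reports binary tebibytes


def raid6_usable_tib(drives: int, drive_tb: float) -> float:
    """Usable RAID-6 capacity in TiB: two drives' worth of space holds P+Q parity."""
    if drives < 4:
        raise ValueError("RAID-6 needs at least 4 drives")
    return (drives - 2) * drive_tb * TB / TiB


helios = raid6_usable_tib(8, 2)    # old array: 8 x 2TB
taranis = raid6_usable_tib(12, 6)  # new array: 12 x 6TB
print(f"Helios:  {helios:.2f} TiB")   # Helios:  10.91 TiB
print(f"Taranis: {taranis:.2f} TiB")  # Taranis: 54.57 TiB
print(f"10.5TiB of data fills {10.5 / taranis:.0%} of Taranis")  # 19%
```

So the copied data takes up less than a fifth of the new array, which matches the small slice in the usage chart.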

I do have to apologize for not giving you pictures of the 12 drives, though. While completing everything, I was just in too much of a rush, so there was no ripping out of disks for photos. :( Besides some additional benchmarks, I can give you a few night shots of the machine, though. These were taken with my old 3ware 9650SE-8LPML card and all of its drives already removed. Everything has been cleaned one last time, the flash backup module has been reconnected to the Areca ARC-1883ix-12, the controller’s management interface has been hooked up to my LAN and made accessible via an SSH tunnel, and all status/error LED headers have been hooked up in the correct order.

For the first of these images, the error LEDs have been lit manually via Areca’s “identify enclosure” function, applied to the whole SAS expander chip on the card:

The drive bays’ power LEDs are truly insanely bright. The two red error LEDs that each bay has, one for fan failure and one for overheating, are off here. What you can see are the 12 drive bays’ activity and status LEDs as well as the machine’s power LED. The red system SSD LED and the three BD-RW drive LEDs are off. It’s still a nice Christmas tree. ;)

The two side intakes, Noctua 120mm fans in this case, are filtered by Silverstone ultra-fine dust filters and let some green light through. This wasn’t planned; it’s caused by the green LEDs of the GeForce GTX Titan Black inside. It’s quite dim, though. The fans are lifesavers, by the way, as they keep the Areca RAID controller’s dual-core 1.2GHz PowerPC 476 processor at temperatures ≤70°C instead of something close to 90°C. The SAS expander chip sits at around 60°C with the board temperature at 38°C, and the flash backup module is at ~40°C. All of this at an ambient testing temperature of 28°C after 4 hours of runtime. So that part’s perfectly fine.

The only problem is the drives, which can still reach temperatures as high as 49-53°C. While the drives’ trip temperature is 85°C, anything approaching 60°C should already be quite unhealthy. We’ll see how that goes, but hopefully it’ll be fine for them. My old 2TB A7K2000 Ultrastars ran for what is probably a full accumulated year at ~45°C without issues. Hm…
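For automated monitoring, the current and trip temperatures can be pulled out of smartmontools’ report. The sketch below assumes the line format smartctl prints for SAS/SCSI drives (`Current Drive Temperature:` and `Drive Trip Temperature:`); the 50°C/60°C thresholds are an assumption loosely based on the discussion above, not vendor limits, and how the report is obtained from the controller is left out.

```python
import re
from typing import Optional, Tuple

# Assumed thresholds (not vendor limits): ~45°C is comfortable,
# anything approaching 60°C is treated as unhealthy.
WARN_C = 50
CRIT_C = 60

# Line formats smartctl uses for SAS/SCSI drives.
_CURRENT = re.compile(r"^Current Drive Temperature:\s+(\d+)\s*C", re.MULTILINE)
_TRIP = re.compile(r"^Drive Trip Temperature:\s+(\d+)\s*C", re.MULTILINE)


def parse_sas_temps(report: str) -> Tuple[int, Optional[int]]:
    """Return (current, trip) temperatures in °C from a smartctl report."""
    current = _CURRENT.search(report)
    if current is None:
        raise ValueError("no 'Current Drive Temperature' line found")
    trip = _TRIP.search(report)
    return int(current.group(1)), (int(trip.group(1)) if trip else None)


def classify(temp_c: int) -> str:
    """Map a drive temperature to a coarse health label."""
    if temp_c >= CRIT_C:
        return "CRITICAL"
    if temp_c >= WARN_C:
        return "WARNING"
    return "OK"


# Hypothetical excerpt in smartctl's SAS output format (values are illustrative).
sample = """\
Current Drive Temperature:     51 C
Drive Trip Temperature:        85 C
"""
temp, trip = parse_sas_temps(sample)
print(temp, trip, classify(temp))  # 51 85 WARNING
```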

In any case, some more benchmarks:

The Taranis RAID-6 running ATTO disk benchmark v2.47, 12 × Ultrastar 7K6000 SAS @ ARC-1883ix-12 in RAID-6, results in kiB/s

In contrast to some really nice theoretical results, practical tests with [dd] and [mkvextract+mkvmerge] show that the transfer rate on the final, encrypted and formatted volume sits somewhere between 500 and 1000MiB/s for very large sequential transfers with large block sizes, which is what I’m interested in. The performance loss seems significant, especially considering the proper partition-to-stripe-width alignment and the multi-threaded, AES-NI-accelerated encryption, but it’s still nothing to be ashamed of at all. In the end, this is several times faster than the old array, which delivered roughly 200-250MiB/s, or rather less towards the end, when severe fragmentation had begun to hurt the file system significantly.
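Stripe-width alignment boils down to making the partition start on a multiple of the full stripe, i.e. the number of data drives times the per-drive stripe block size. A minimal Python sketch, assuming a 64kiB stripe block size (the value used in the preliminary Cheetah tests; the final setting isn’t stated here), 512-byte logical sectors and the common 2048-sector (1MiB) default partition start; the helper names are hypothetical:

```python
import math

SECTOR = 512  # assumed logical sector size of the RAID volume


def full_stripe_bytes(drives: int, block_kib: int, parity_drives: int = 2) -> int:
    """Full stripe width: data drives x per-drive stripe block size (RAID-6: 2 parity)."""
    return (drives - parity_drives) * block_kib * 1024


def aligned_start_sector(min_start: int, stripe_bytes: int, sector: int = SECTOR) -> int:
    """Smallest sector >= min_start whose byte offset is a multiple of the full stripe."""
    if stripe_bytes % sector:
        raise ValueError("stripe width is not a whole number of sectors")
    stripe_sectors = stripe_bytes // sector
    return -(-min_start // stripe_sectors) * stripe_sectors  # integer ceiling


stripe = full_stripe_bytes(12, 64)                     # 10 data drives x 64kiB
print(stripe // 1024, "kiB full stripe")               # 640 kiB full stripe
print(aligned_start_sector(2048, stripe))              # 2560 (1MiB start is not aligned)
print(math.lcm(1 << 20, stripe) // (1 << 20), "MiB")   # 5 MiB: aligned to both
```

Under these assumptions the usual 1MiB start is not stripe-aligned: the next aligned start is sector 2560 (1.25MiB), and 5MiB is the smallest offset that is a multiple of both 1MiB and the 640kiB full stripe.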

Ah yes, one more thing that might be interesting: Power consumption of the final system! To measure this, I’m gonna rely on the built-in monitoring and management system of my Corsair AX1200i power supply again. But first, a list of the devices hooked up to the PSU:

  • ASUS P6T Deluxe mainboard, X58 Tylersburg chipset
  • 3 × 8 = 24GB DDR-III/1066 CL8 SDRAM (currently for testing, would otherwise be 48GB)
  • Intel Xeon X5690 3.46GHz hexcore processor, not overclocked, idle during testing
  • nVidia GeForce GTX Titan Black, power target at 106%, not overclocked, idle during testing
  • Areca ARC-1883ix-12 controller + ARC-1883-CAP flash backup module
  • Auzentech X-Fi Prelude 7.1
  • 1 × Intel 320 SSD 600GB, idle during testing
  • 3 × LG HL-DT-ST BH16NS40 BD-RW drives, idle during testing
  • 1 × Teac FD-CR8 combo drive (card reader + FDD), idle during testing
  • 12 × Hitachi Global Storage Ultrastar 7K6000 6TB SAS/12Gbps, sequential transfer during testing
  • 4 × Chieftec 2131SAS HDD bays
  • 2 × Noctua NF-A15 140mm fans
  • 2 × Noctua NF-A14 PWM 140mm fans
  • 3 × Noctua NF-F12 PWM 120mm fans
  • 4 × Noctua NF-A8 FLX 80mm fans (in the drive bays)
  • 1 × Noctua NF-A4x10 40mm fan
  • 1 × unspecified 140mm PWM fan in the power supply
Full system load with the new Taranis RAID-6 array

So we’re still under the 300W mark, which I had originally expected the system to exceed, since the old system was in the same ballpark when it comes to power consumption. But the old system also had an overclocked i7 980X instead of this seriously cool-running Xeon (it has a low VID and runs cooler than the 980X even at stock settings).

Now all that’s missing is the adaptation of my old scripts that periodically check the RAID controller and drive status. For this, I was originally using 3ware’s tw_cli tool and smartmontools. I’ll continue to use smartmontools of course, as they’ve been adapted to make use of Areca’s API as well and can thus fetch S.M.A.R.T. data from all individual drives in the array. The tw_cli part will have to be replaced with Areca’s own command line tool, though, including a lot of post-processing with Perl to publish the results as a nice HTML page again. When it’s done, the stats will be reachable [here].
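The HTML part of such a status script is the easy bit. Here is a rough Python equivalent of that publishing step (Python instead of the Perl mentioned above); the `DriveStatus` fields are hypothetical, and parsing the controller CLI’s output into them is deliberately left out, since that format isn’t documented here:

```python
import html
from dataclasses import dataclass
from typing import List


@dataclass
class DriveStatus:
    """One row of the status page (hypothetical fields)."""
    slot: int
    model: str
    temp_c: int
    state: str  # whatever state string the controller's CLI reports


def render_status_page(array_name: str, drives: List[DriveStatus]) -> str:
    """Render a minimal, self-contained HTML page with one table row per drive."""
    title = html.escape(array_name)
    rows = ["<tr><th>Slot</th><th>Model</th><th>Temperature</th><th>State</th></tr>"]
    for d in drives:
        # Escape every string that comes from the controller before embedding it.
        rows.append(
            f"<tr><td>{d.slot}</td><td>{html.escape(d.model)}</td>"
            f"<td>{d.temp_c} &deg;C</td><td>{html.escape(d.state)}</td></tr>"
        )
    table = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title} status</title></head>\n'
        f"<body><h1>{title}</h1>\n<table>\n{table}\n</table></body></html>\n"
    )


page = render_status_page("Taranis", [DriveStatus(1, "Ultrastar 7K6000", 49, "Normal")])
print(page)
```

Keeping parsing and rendering separate means only the parsing half has to change when the controller tool changes, as it did here when switching from tw_cli to Areca’s CLI.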

Depending on how badly my laziness and my severe anime addiction bog me down, this may take a few days. Or weeks. :roll:

Edit: Ah, actually, I was motivated enough to do it. It cost me several hours and quite some pain due to the weirdness of Microsoft Batch, but it’s done: the RAID-6 web status reporting script is back online! More (including the source code) in [part 4½]!

Nov 20 2015

Yeah, after [part 3] it should be “part 4”, the final stage. However, while I’d love to present my final ~55TiB RAID-6 to you, I cannot do so yet, because there were, and probably still are, some severe issues with the setup, which I will talk about below. First, the choice of drives: My level of trust in Seagate is rather low because of the failure rates reported by Backblaze, my own experiences at work and the experiences of some other administrators I know, so their line of enterprise disks was out of the game. Another option would’ve been Hitachi’s helium-filled Ultrastar He8, but since the He6 was reportedly rather disastrous, I don’t really want to trust those drives either.

This helium technology is just so new and daring that I don’t want to trust it as the very foundation of a RAID array that’s supposed to last for many, many years just yet.

Ultimately, I decided to get myself 12 insanely expensive Hitachi Ultrastar 7K6000 disks, “The last in Air” as they call ’em themselves. That’s a classic 5-platter, 10-head, air-filled enterprise disk with a rotational speed of 7200rpm and 6TB of capacity. I got the SAS/12Gbps version, which also boasts 128MiB of cache. Mechanically, that’s all the same old tech that I’ve already been using with my 8 × 1TB Deskstars and now 8 × 2TB Ultrastars, so it’s something I can trust. However, as I said, there were (and maybe still are) some very serious issues. Maybe you remember this image:

"Helios" RAID-6 array emergency migration

Old array to the left…

So my old RAID-6 based on a 3ware 9650SE-8LPML with 8 × 2TB Ultrastars is sitting on the table, while the new one has been plugged into the Chieftec 2131SAS bays and hooked up to the Areca ARC-1883ix-12. Both RAID systems are thus connected to the same host machine at the same time, making it a total of 20 drives. This is supposed to make data migration using rsync very convenient and easy.

The problem is that I didn’t have enough power connectors for this (12 × SATA for the old array, the ODDs and the SSD, plus 8 × 4P Molex for the SAS bays), so I settled for Y-adapters to hook up the new array. That’s when the trouble started. At first I thought the passive SAS bays were to blame. But as I continued my tests, drives would behave slightly differently as I exchanged and rotated the Y-cables. What I observed was some weird “jitter”, where the drive heads were audibly moving around when they shouldn’t have been, and sometimes drives would stall for a moment as well.

Ultimately, the array ran into a massive failure during init at about 60%, and 4-5 drives successively failed, collecting tons of recoverable read AND write errors in their S.M.A.R.T. logs. Bleh… At least no unrecoverable ones, but still…

At this point I ripped out half of the Y-cables and hooked two of the four bays up to a dedicated power supply (only two, because of a lack of plugs). This seems to have greatly changed the behavior of the whole setup, stabilizing it significantly. Of course it’s too early to say anything for sure, because I’m only roughly 25% of the way through the second initialization. But if I’m right, then a few 1€ parts have successfully wrecked a ~8000€ RAID array. Now that’s something, eh?

In any case, before getting my Ultrastars, I also tried the system with some Seagate Cheetah 15k.6 and 15k.7 drives I managed to borrow at work, 300GB 15000rpm SAS pieces, just for some benchmarks. Since those showed even more severe problems than the Hitachis (probably because they’re more power-hungry?), I went down to 11, then 8 drives. Some of the benchmarks will also show sudden stalls. Yeah. That’s the power issue.

Well, it can still serve as a quick glance at the performance levels one can expect from the Areca ARC-1883ix-12, even in such a state. Let me just say: It is a nice feeling to see a RAID array based on mechanical drives push 1000-1200MiB/s over the bus on average, reading at 64kiB-1MiB block sizes. At least that part is undeniably awesome! Here are a few screenshots for you. The RAID stripe block size is always 64kiB, read block sizes are 4kiB, 64kiB and 1MiB, and write block sizes are 64kiB, 512kiB and 1MiB. For the RAID-6 setup there are also benchmarks during initialization and in 2-disk degraded mode. The software is just a cheap HD Tune 2.55 + HD Tune Pro 5.00 for now.

Ah yes, you might be wondering why the CPU usage is so high. Well, these were just quick preliminary tests anyway, so some video transcoders were running in the background at the same time, that’s why. Here we go:

RAID-0, 8 × 15000rpm Cheetahs, reads:

RAID-0, 8 × 15000rpm Cheetahs, writes:

RAID-6, 11 × 15000rpm Cheetahs, reads in normal state:

RAID-6, 11 × 15000rpm Cheetahs, reads during initialization:

RAID-6, 11 × 15000rpm Cheetahs, reads in 2-disk degraded mode:

The performance degradation caused by the initialization process is roughly in line with what’s configured on the controller itself, which gives the background process a low 20% priority. The degradation in 2-disk degraded mode is what’s really interesting, though. Here we can see that the 1.2GHz dual-core PowerPC RAID engine is seriously powerful: With two members missing, their data has to be reconstructed on the fly from the remaining data and both parity blocks, and the array still delivers 64kiB transfer rates in excess of 800MiB/s! That’s insane! I was hoping for normal transfer rates of over 600MiB/s, but this really makes my mouth water!
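What the controller has to do in that mode can be illustrated with the textbook RAID-6 scheme, as used by Linux md: P is the XOR of all data blocks, and Q is a Reed-Solomon syndrome computed in GF(2^8). Losing two data blocks leaves two equations with two unknowns, which can be solved byte by byte. This Python sketch illustrates that math; it is not Areca’s firmware, whose exact parity layout isn’t known here, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# GF(2^8) log/antilog tables for the polynomial x^8+x^4+x^3+x^2+1 (0x11D)
# with generator g = 2, the field commonly used for RAID-6 Q parity.
EXP = [0] * 512
LOG = [0] * 256
_val = 1
for _i in range(255):
    EXP[_i] = _val
    LOG[_val] = _i
    _val <<= 1
    if _val & 0x100:
        _val ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def compute_pq(data: Sequence[bytes]) -> Tuple[bytes, bytes]:
    """P = XOR of all data blocks, Q = sum of g^i * D_i over GF(2^8)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = EXP[i]  # g^i, valid for up to 255 data drives
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data: Sequence[Optional[bytes]], p: bytes, q: bytes) -> List[Optional[bytes]]:
    """Rebuild exactly two missing data blocks (marked None) from P and Q."""
    missing = [i for i, block in enumerate(data) if block is None]
    if len(missing) != 2:
        raise ValueError("this sketch handles exactly two missing data blocks")
    x, y = missing
    # Strip the surviving blocks out of P and Q, leaving
    #   pxy = Dx ^ Dy   and   qxy = g^x*Dx ^ g^y*Dy
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if block is None:
            continue
        for k, byte in enumerate(block):
            pxy[k] ^= byte
            qxy[k] ^= gf_mul(EXP[i], byte)
    # Solving both equations: Dx = (qxy ^ g^y*pxy) / (g^x ^ g^y), Dy = pxy ^ Dx
    gx, gy = EXP[x], EXP[y]
    denom = gx ^ gy  # non-zero because g^x != g^y for x != y
    dx = bytes(gf_div(qxy[k] ^ gf_mul(gy, pxy[k]), denom) for k in range(len(p)))
    dy = bytes(pxy[k] ^ dx[k] for k in range(len(p)))
    restored = list(data)
    restored[x], restored[y] = dx, dy
    return restored
```

Computing P and Q for ten data blocks, blanking out any two of them and calling `recover_two` returns the original list. A controller has to do this for every affected stripe while serving reads, which is what makes the degraded-mode numbers above remarkable.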

Of course, all of this is still preliminary: my array still doesn’t work, these aren’t the final drives running through the tests, and the controller isn’t fully configured yet either. Let’s just hope that I can get a grip on the situation soon… because all these problems are seriously pissing me off already, as you may understand, given the price of the hardware and the pressing issue that I’m running out of space on my old array.

Well, let’s hope a real “part 4” can follow soon!

Edit: And finally, [here it is]!