Jan 15 2015

While I’ve been planning to build myself a new RAID-6 array for some time (more space, more speed), I got interested in the latest and greatest of hard drive innovations: the 4Kn Advanced Format. You may know classic hard drives with 512 byte sectors and the regular Advanced Format, also known as 512e, which uses 4kiB physical sectors but emulates 512 byte sectors for compatibility reasons. Interestingly, [Microsoft themselves state] that “real” 4Kn hard drives, which expose their sector size to the operating system with no glue layers in between, are only supported in Windows 8 and above. So even Windows 7 has no official support.

On top of that, Intel [has stated] that their SATA controller drivers do not support 4Kn, so hooking such a drive up to your Intel chipset’s I/O controller hub (ICH) or platform controller hub (PCH) will not work. Quote:

“Intel® Rapid Storage Technology (Intel® RST) version 9.6 and newer supports 4k sector disks if the device supports 512 byte emulation (512e). Intel® RST does not support 4k native sector size devices.”

For clarity: to make 4Kn work in a clean fashion, it must be supported on three levels, from lowest to highest:

  1. The firmware: For mainboards, this means your system BIOS/UEFI. For dedicated storage controllers, the controller BIOS itself.
  2. The kernel driver of the storage controller, so that’s your SATA AHCI/RAID drivers or SAS drivers.
  3. Any applications above that which perform raw disk access, whether in kernel or user space: file system drivers, disk cloning software, low-level benchmarks, etc.

Granted, 4Kn drives are extremely new and still very rare. There are basically only the 6TB Seagate enterprise drives ([see here]) and then some Toshiba drives, also enterprise class. But, to protect my future investment in that RAID-6, I got myself a [Toshiba MG04ACA300A] 3TB drive, the only barely affordable 4Kn disk available right now besides the super expensive 6TB Seagates. That way I can check for 4Kn compatibility relatively cheaply (click to enlarge images):

If you look closely, you can spot the nice 4Kn logo right there. In case you ask yourselves “why 4Kn?”: mostly cost and efficiency. 4kiB sectors are 8 times as large as classic 512 byte ones. Thus, for the same data payload you need only an eighth of the sector gaps, synchronization markers and address markers. Also, a stronger checksum can be used for data integrity. See this picture from [Wikipedia]:


Sector size comparison (Image is © Dougolsen under the CC-BY 3.0 unported license)

Now this efficiency is already there with 512e drives. 512e Advanced Format was supposedly invented because more than half the programs working with raw disks out there can’t handle variable sector sizes and are hardcoded for 512n. That also includes system firmwares, so your mainboard’s BIOS/UEFI. To solve those issues, the manufacturers used 4kiB sectors, then let a fast ARM processor translate them into 512 byte sectors on the fly to give legacy software something it could understand.

4Kn on the other hand is the purist, “right” approach. No more emulation, no more cheating. No more 1GHz ARM dual core processor in your hard drive just to be able to serve data fast enough.

Now we already know that Intel controllers won’t work. For fun, I hooked the drive up to my ASUS P6T Deluxe’s secondary SATA controller anyway, a Marvell 88SE6120, and gave the controller the latest possible driver, a quite hard-to-get version. You can download that [here] for x86 and x64. To forestall the result: it doesn’t work. At all. This is what the system’s log has to say about it (click to enlarge):

So that’s a complete failure right there. Even after the “plugged out” message, the timeouts continued roughly every 30 seconds, accompanied by the whole operating system freezing for 1-2 seconds every time. I cannot speak for other controllers like the Marvell 9128 or Silicon Image chips, but I do get the feeling that none of them will be able to handle 4Kn.

Luckily, I already have the controller for my future RAID-6 right here, an Areca ARC-1883ix-12, the latest and greatest tech with PCIe 3.0 x8 and SAS/12Gbit ports with SATA/6Gbit encapsulation. Its firmware and driver support 4Kn fully, as you can see in Areca’s [specifications]. The controller features an out-of-band management system via its own ethernet port and integrated web server for browser-based administration, even when no OS is booted at all. All that needs to be installed on the OS side is a very tiny driver (click to enlarge):

Plus, Areca gives us one small driver for many Windows operating systems. Only the Windows XP 32-Bit NT 5.1 kernel gets a SCSI Miniport driver exclusively, while all newer systems (XP x64, Windows Vista, 7, 8) get a more efficient StorPort driver. So: plugged the controller in, installed the driver, hooked up the disk, and it seems we’re good to go:

The 4Kn drive is being recognized (click to enlarge)

Now, any legacy master boot record (MBR) partition table has a 32-bit address field. That means it can address 2^32 elements. With each element being 512 bytes large, you reach 2TiB, and that’s where the 2TiB limit comes from. With 4Kn however, the smallest addressable atom is now eight times as large: 4096 bytes! So we should be able to reach 16TiB due to the larger sector size. Supposedly, some USB hard drive manufacturers have used this trick (by emulating 4Kn) to make their larger drives work easily on Windows XP. When trying to partition the Toshiba drive however, I hit a wall, as it seems Windows disk management is about as stupid as the FAT32 formatter on Windows 98 was:
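
That arithmetic is easy to double-check with a few lines of bash (nothing drive-specific assumed, just the 32-bit LBA field times the sector size):

```shell
# The largest capacity an MBR partition table can address is the 32-bit
# LBA range (2^32 sectors) multiplied by the sector size.
mbr_limit_tib() {
  local sector_size=$1
  echo $(( 2**32 * sector_size / 2**40 ))  # result in TiB
}

echo "512-byte sectors:  $(mbr_limit_tib 512) TiB"   # the classic 2 TiB wall
echo "4096-byte sectors: $(mbr_limit_tib 4096) TiB"  # 16 TiB with 4Kn
```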

MBR initialization failed (click to enlarge)

That gets me thinking. On XP x64, I can still just switch from MBR to the GPT partitioning scheme to be able to partition huge block devices. But what about Windows XP 32-bit? I don’t know how the USB drive manufacturers do it, so I can only presume they ship the drives pre-partitioned if it’s one of those that don’t come with a special mapping tool for XP. In my case, I just switch to GPT and carry on (click to enlarge):

Now I guess I am the first person in the world to be able to look at this, and potentially the last too:

fsutil.exe showing a native SATA 4Kn drive on XP x64, encapsulated in SAS. Windows 7 would show the physical and logical sector size separately due to its official 512e support. Windows XP always reports the logical sector size (click to enlarge)

So far so good. The very first and simplest test? Just copy a file onto the newly formatted file system. I picked the 4k (no pun intended) version of the movie “Big Buck Bunny”:

Copying a first file onto the 4Kn disk’s NTFS file system

Hidden files and folders are shown here, but Windows doesn’t seem to want to create a System Volume Information\ folder, for whatever reason. Other than that it’s very fast and seems to work just nicely. Since the speed is affected by the RAID controller’s write-back cache, I thought I’d try HD Tune 2.55 for a quick sequential benchmark. Or in other words: “let’s hit our second legacy software wall” (click to enlarge):

Yeah, so… HD Tune never detects anything above 2TiB, but this? At first glance, 375GB might sound quite strange for a 3TB drive. But consider this: 375 × 8 = 3000. What happened here is that HD Tune got the correct sector count of the drive, but misinterpreted each sector’s size as 512 bytes. Thus, it reports the device’s size as eight times too small. Reportedly, this is also exactly how Intel’s RST drivers fail when trying to address a 4Kn drive. HD Tune 2.55 is thus clearly hardcoded for 512n, and there is no way to make it work. Let’s try the paid version of the tool, which is usually quite ahead of its free, legacy counterpart (click to enlarge):
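
The mixup is easy to reproduce with plain shell arithmetic, using the drive’s sector count of 732566646 (that number is what the Linux kernel reports for this drive, see the dmesg output further below):

```shell
# HD Tune reads the correct number of sectors but assumes each one is
# 512 bytes instead of 4096, shrinking the reported size by a factor of 8.
sectors=732566646   # sector count of the Toshiba MG04ACA300A
echo "$(( sectors * 512  / 1000**3 )) GB"   # 375 GB: HD Tune's bogus size
echo "$(( sectors * 4096 / 1000**3 )) GB"   # 3000 GB: the real capacity
```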

Indeed, HD Tune Pro 5.00 works just as it should when accessing the raw drive. Users who don’t want to pay are left dead in the water here. Next, I tried HDTach, also an older tool. HDTach however reads from a formatted file system instead of from a raw block device. The file system abstracts the device to a higher level, so HDTach doesn’t know and doesn’t need to know anything about sectors. As a result, it also just works:

HD Tach benchmarking NTFS on a 4Kn drive (click to enlarge)

Next, let’s try an ancient benchmark that again accesses drives on the sector level: the ATTO disk benchmark. It is here we learn that 4Kn, or variable sector sizes in general, aren’t space magic. This tool was written well before the times of 512e or 4Kn, and look at that (click to enlarge):

Now what does that tell us? It tells us that hardware developers feared the chaotic ecosystem of tools and software that access disks at low level. Some might be cleanly programmed, while most may not. That doesn’t just include operating systems’ built-in toolsets, but also 3rd party software, independent of the operating system itself. Maybe it also affects disk cloning software like Acronis’? Volume shadow copies? Bitlocker? Who knows. Thing is, to be sure, you need to test that stuff. And I presume that, to go as far as hard drive manufacturers did with 512e, they likely found one abhorrent hell of crappy software during their tests. Nothing else would justify ARM processors at high clock rates on hard drives just to translate sector sizes, plus all the massive work that went into defining the 512e Advanced Format standard before 4Kn Advanced Format.

Windows 8 might now fully support 4Kn, but that doesn’t say anything about the 3rd party software you’re going to run on that OS. So we still live in a Windows world where a lot of fail is waiting for us. Naturally, Linux and certain UNIX systems adapted much earlier, or never even made the mistake of hardcoding sector sizes into their kernels and tools in the first place.

But now to the final piece of my preliminary tests: Truecrypt, a disk encryption software I still trust despite the project having been shut down. Still being audited without any terrible security holes discovered so far, it’s my main choice for cross-platform disk encryption, working cleanly on at least Windows, MacOS X and Linux.

Now, 4Kn is disabled for MacOS X in Truecrypt’s source code, but seemingly this [can be fixed]. I also discovered that TC will refuse to use anything other than 512n on Linux if Linux kernel crypto is unavailable or disabled by the user in TC; see this part of Truecrypt’s CoreUnix.cpp:

#if defined (TC_LINUX)
  if (volume->GetSectorSize() != TC_SECTOR_SIZE_LEGACY)
  {
    if (options.Protection == VolumeProtection::HiddenVolumeReadOnly)
      throw UnsupportedSectorSizeHiddenVolumeProtection();

    if (options.NoKernelCrypto)
      throw UnsupportedSectorSizeNoKernelCrypto();
  }
#endif

Given that TC_SECTOR_SIZE_LEGACY equals 512, it becomes clear that hidden volumes are unavailable altogether with 4Kn on Linux, and encryption is unavailable entirely if kernel crypto isn’t there. So I checked out the Windows-specific parts of the code, but couldn’t find anything suspicious in the source for data volume encryption. It seems 4Kn is not allowed for bootable system volumes (lots of “512”s there), but for data volumes TC appears fully capable of working with variable sector sizes.

Now this code has probably never been run before on an actual SATA 4Kn drive, so let’s just give it a shot (click to enlarge):

Amazingly, Truecrypt, another piece of software written, and even abandoned by its original developers, before the advent of 4Kn, works just fine. This time, Windows does create the System Volume Information\ folder on the file system within the Truecrypt container, and fsutil.exe once again reports a sector size of 4096 bytes. This shows clearly that TC understands 4Kn and passes the sector size on to any layers above itself in the kernel I/O stack flawlessly (the layer beneath it should be either the NT I/O scheduler or maybe the storage controller driver directly, and the layer above it the NTFS file system driver, if my assumptions are correct).

Two final tests for data integrity’s sake:

Both a binary diff and SHA512 checksums prove that the data copied from a 512n medium to the 4Kn one is still intact

So, my final conclusion? Anything that needs to work with a raw block device on a sector-by-sector level needs to be checked out before investing serious money in such hard drives and storage arrays. It might be cleanly programmed, with some foresight. It also might not.

Anything that sits above the file system layer though (anything that reads and writes folders and files instead of raw sectors) will always work nicely, as such software does not need to know anything about sectors.

Given the possibly enormous amount of software with hardcoded routines for 512 byte sectors, my assumption would be that the migration to 4Kn will be quite a sluggish one. We can see that the enterprise sector is adapting first, clearly because Linux and UNIX systems adapt much faster. The consumer market however might not see 4Kn drives anytime soon, given 512 byte sectors have been around for about 60 years (!) now.

Update 2015-01-16 (Linux): I just couldn’t let it go, so I took the Toshiba 4Kn drive to work with me and hot-plugged it into an Intel ICH10R. So that’s the same chipset as the one I ran the Windows tests on, an Intel X58. The only difference is that now we’re on CentOS 6.6 Linux running the 2.6.32-504.1.3.el6.x86_64 kernel. This is what dmesg had to say about my hotplugging:

ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ATA-8: TOSHIBA MG04ACA300A, FP2A, max UDMA/100
ata3.00: 732566646 sectors, multi 2: LBA48 NCQ (depth 31/32), AA
ata3.00: configured for UDMA/100
ata3: EH complete
scsi 2:0:0:0: Direct-Access     ATA      TOSHIBA MG04ACA3 FP2A PQ: 0 ANSI: 5
sd 2:0:0:0: Attached scsi generic sg7 type 0
sd 2:0:0:0: [sdf] 732566646 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 2:0:0:0: [sdf] Write Protect is off
sd 2:0:0:0: [sdf] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdf] 732566646 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 2:0:0:0: [sdf] 732566646 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 2:0:0:0: [sdf] Attached SCSI disk

Looking good so far. Also, the Linux kernel typically cares rather little about the system’s BIOS, bypassing whatever crap it’s trying to tell the kernel. Which is usually a good thing. Let’s verify with fdisk:

Note: sector size is 4096 (not 512)
WARNING: The size of this disk is 3.0 TB (3000592982016 bytes).
DOS partition table format can not be used on drives for volumes
larger than (17592186040320 bytes) for 4096-byte sectors. Use parted(1) and GUID 
partition table format (GPT).

Now that’s more like it! fdisk is warning me that it will be limited to addressing 16TiB on this disk. A regular 512n or 512e drive would be limited to 2TiB, as we know. Awesome. So, I created a classic MBR-style partition on it, formatted it using the EXT4 file system, and mounted it. And what we get is this:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdf1             2.7T   73M  2.6T   1% /mnt/sdf1

And Intel is telling us that they can’t manage to give us any Windows drivers that can do 4Kn? Marvell doesn’t even comment on their inabilities? Well, suck this: Linux’ free driver for an Intel ICH10R south bridge (or any other that has a driver in the Linux kernel, for that matter) seems to have no issues with that whatsoever. I bet it’s the same with BSD. Just weak, Intel. And Marvell. And all you guys who had so much time to prepare and yet did nothing!

Update 2015-01-20 (Windows XP 32-Bit): So what about regular 32-Bit Windows XP? There are stories going around that some USB drives with 3-4TB capacity use 4Kn emulation (or real 4Kn, bypassing the 512e layer by telling the drive firmware to do so?), specifically to enable XP compatibility without having to resort to special mapping tools.

Today I had the time to install XP SP3 on a spare AMD machine (FX9590, 990FX), which is pretty fast thanks to a small, unused testing SSD I still had lying around. Before that, I wiped all GPT partition tables from the 4Kn drive, both the one at the start as well as the backup copy at the end of the drive, using dd. Again, the Areca ARC-1883ix-12 was used for this test, now with its SCSI Miniport driver, since XP 32-Bit does not support StorPort.
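
The wiping went roughly like the following sketch shows. Note that /dev/sdX and the 34-block span are my own assumptions for illustration (GPT keeps a protective MBR plus the primary header and partition entry array at the start of the disk, and a backup copy at the very end); the sketch only prints the dd commands instead of executing them, so it’s safe to run as-is:

```shell
# Sketch only: print the dd invocations that would zero the primary GPT
# structures (first 34 logical blocks) and the backup GPT (last 34 blocks)
# of a 4Kn drive. Echoing instead of executing keeps this safe.
gpt_wipe_cmds() {
  local dev=$1 bytes=$2
  local lb=4096                  # logical block size of a 4Kn drive
  local total=$(( bytes / lb ))  # drive size in logical blocks
  echo "dd if=/dev/zero of=$dev bs=$lb count=34"
  echo "dd if=/dev/zero of=$dev bs=$lb count=34 seek=$(( total - 34 ))"
}

# The Toshiba MG04ACA300A reports 3000592982016 bytes (732566646 blocks):
gpt_wipe_cmds /dev/sdX 3000592982016
```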

Please note that this is a German installation of Windows XP SP3. I hope the screenshots are still understandable enough for English speakers.

Recognition and MBR initialization seems to work just fine this time, unlike on XP x64:

The 4Kn Toshiba as detected by Windows XP Pro 32-Bit SP3, again on an Areca ARC-1883ix-12 (click to enlarge)

Let’s try to partition it:

Partitioning the drive once more, MBR style

Sure looks good! And then, we get this:

A Master Boot Record, Windows XP and 4Kn: It does work after all (click to enlarge)

So why does XP x64 not allow initialization and partitioning of a 4Kn drive using MBR? Maybe because it’s got GPT for that? In any case, it’s usable on both systems, the older NT 5.1 (XP 32-Bit) as well as the newer NT 5.2 (XP x64, Server 2003). Again, fsutil.exe confirms proper recognition of our 4Kn drive:

fsutil.exe reporting a 4kiB sector size, just like on XP x64

So all you need – just like on XP x64 – is a proper controller with proper firmware and drivers!

There is one hard limit here though that XP 32-Bit users absolutely need to keep in mind: huge RAID volumes using LUN carving/splitting and software JBOD/disk spanning using Microsoft’s Dynamic Volumes are no longer possible when using 4Kn drives. Previously, you could tell certain RAID controllers to serve huge arrays to the OS in 2TiB LUN slices (e.g. best practice for 3ware controllers on XP 32-Bit). Then, in Windows, you’d just make those slices Dynamic Volumes and span a single NTFS file system over all of them, thus pseudo-breaking the 2TiB barrier.

This can no longer be done, as Dynamic Volumes seemingly do not work with 4Kn drives on Microsoft operating systems before Windows 8, or at least not on XP 32-Bit. The option for converting the volume from MBR to GPT is simply greyed out in Windows disk management.

That means that the absolute maximum volume size using 4Kn disks on 32-Bit Windows XP is 16TiB! On XP x64 – thanks to GPT – it’s just a few blocks short of 256TiB, a limit imposed on us by the NTFS file system’s 32-bit address field and 64kiB clusters, as 2^32 × 64kiB = 256TiB.
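
That calculation can be double-checked with plain shell arithmetic:

```shell
# NTFS volume limit: a 32-bit cluster address field times the maximum
# cluster size of 64 kiB (65536 bytes), expressed in TiB.
echo "$(( 2**32 * 64 * 1024 / 2**40 )) TiB"   # 256 TiB
```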

And that concludes my tests, unless I have time and an actual machine to try FreeBSD or OpenBSD UNIX. Or maybe Windows 7. The likelihood for that is not too high at the moment though.

Sep 23 2014

At work I usually have to burn a ton of heavily modified Knoppix CDs for our lectures every year or so. The Knoppix distribution itself is built by me and a colleague to get a highly secure, read-only, server-controlled environment for exams and lectures. Now, usually I’m burning on both a Windows box with Ahead [Nero], and on Linux with the KDE tool [K3B] (despite being a Gnome 2 user), both GUI tools. My Windows box had 2 burners, my Linux box one. To speed things up and increase disc quality at the same time, the idea was to plug more burners into the machines and burn each individual disc slower, but parallelized.

I was shocked to learn that K3B can actually not burn to multiple burners at once! I thought I was just being blind, stumbling through the GUI like an idiot, but it’s actually really not there. Nero, on the other hand, has managed to do this for what I believe is the better part of a decade already!

True disc burning stations are just too expensive, like 500€ for the smaller ones instead of the 80-120€ I had to spend on a bunch of drives, so what now? Was I building this for nothing?

Poor man’s disc station. Also a shitty photograph, my apologies for that, but I had no real camera available at work.

Well, where there is a shell, there’s a way, right? Being the lazy ass that I am, I was always reluctant to actually use the backend tools of K3B on the command line myself. CD/DVD burning was something I had just always done in a GUI. But now was the time to script that stuff myself, and for simplicity’s sake I just used bash. In addition to the shell, the following core tools were used:

  • cut
  • grep
  • mount
  • sudo (For a dismount operation, might require editing /etc/sudoers)

Also, the following additional tools were used (most Linux distributions should have them, conservative RedHat derivatives like CentOS can get the stuff from [EPEL]):

  • [eject] (eject and retract drive trays)
  • [sdparm] (read SATA device information)
  • sha512sum (produce and compare high-quality checksums)
  • wodim (burn optical discs)

I know there are already scripts for this purpose, but I just wanted to do this myself. Might not be perfect, or even good, but here we go. The work(-in-progress) is divided into three scripts. The first one is just a helper script that generates a set of checksum files from a master source (image file or disc) that you want to burn to multiple discs later on; I call it create-checksumfiles.sh. We need one file per burner device node, because sha512sum needs that to verify freshly burned discs later, which is why this exists:

#!/bin/bash

wrongpath=1 # Path for the source/master image is set to invalid in the
            # beginning.

# Getting path to the master CD or image file from the user. This will be
# used to generate the checksum for later use by multiburn.sh
until [ $wrongpath -eq 0 ]
do
  echo -e "Please enter the file name of the master image or device"
  echo -e "(if it's a physical disc) to create our checksum. Please"
  echo -e 'provide a full path always!'
  echo -e "e.g.: /home/myuser/isos/master.iso"
  echo -e "or"
  echo -e "/dev/sr0\n"
  read -p "> " -e master

  if [ -b $master -o -f $master ] && [ -n "$master" ]; then
    wrongpath=0 # If device or file exists, all ok: Break this loop.
  else
    echo -e "\nI can find neither a file nor a device called $master.\n"
  fi
done

echo -e "\nComputing SHA512 checksum (may take a few minutes)...\n"

checksum=`sha512sum $master | cut -d' ' -f1` # Computing checksum.

# Getting device node name prefix of the users' CD/DVD burners from the
# user.
echo -e "Now please enter the device node prefix of your disc burners."
echo -e "e.g.: \"/dev/sr\" if you have burners called /dev/sr1, /dev/sr2,"
echo -e "etc."
read -p "> " -e devnode

# Getting number of burners in the system from the user.
echo -e "\nNow enter the total number of attached physical CD/DVD burners."
read -p "> " -e burners

((burners--)) # Decrementing by 1. E.g. 5 burners means 0..4, not 1..5!

echo -e "\nDone, creating the following files with the following contents"
echo -e "for later use by the multiburner for disc verification:"

# Creating the per-burner checksum files for later use by multiburn.sh.
for ((i=0;i<=$burners;i++))
do
  echo -e " * sum$i.txt: $checksum $devnode$i"
  echo "$checksum $devnode$i" > sum$i.txt
done

echo -e ""
As you can see, it gets its information from the user interactively on the shell. It asks the user where the master medium to checksum is to be found, what the user’s burner / optical drive devices are called, and how many of them there are in the system. When done, it’ll generate a checksum file for each burner device, called e.g. sum0.txt, sum1.txt, … sum<n>.txt.
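
As an aside, those sum<n>.txt files feed sha512sum’s check mode later on, which expects lines of the form “<hash>  <file>”. A self-contained sketch of that round trip, with a temporary file standing in for a real burner device (all file names here are made up for the demo):

```shell
# Build a checksum file the same way the helper script does, then verify
# it with sha512sum's check mode. A temp file stands in for /dev/sr0.
tmp=$(mktemp)
echo "test payload" > "$tmp"
checksum=$(sha512sum "$tmp" | cut -d' ' -f1)
echo "$checksum  $tmp" > sum0.txt   # "<hash>  <path>" is what -c expects
sha512sum -c sum0.txt               # prints "<path>: OK" when data matches
rm -f "$tmp" sum0.txt
```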

Now, to burn and verify media in a parallel fashion, I’m using an old concept I have used before. There are two more scripts: one is the controller/launcher, which will spawn an arbitrary number of instances of the second script, which I call a worker. First the controller script, here called multiburn.sh:

#!/bin/bash

if [ $# -eq 0 ]; then
  echo -e "\nPlease specify the number of rounds you want to use for burning."
  echo -e "Each round produces a set of CDs determined by the number of"
  echo -e "burners specified in $0."
  echo -e "\ne.g.: ./multiburn.sh 3\n"
  exit
fi

#@========================@
#| User-configurable part:|
#@========================@

# Path that the image resides in.
prefix="/home/knoppix/"

# Image to burn to discs.
image="knoppix-2014-09.iso"

# Number of rounds are specified via command line parameter.
copies=$1

# Number of available /dev/sr* devices to be used, starting
# with and including /dev/sr0 always.
burners=3

# Device node name used on your Linux system, like "/dev/sr" for burners
# called /dev/sr0, /dev/sr1, etc.
devnode="/dev/sr"

# Number of blocks per complete disc. You NEED to specify this properly!
# Failing to do so will break the script. You can read the block count
# from a burnt master disc by running e.g.
# "sdparm --command=capacity /dev/sr*" on it.
blocks=340000

# Burning speed in factors. For CDs, 1x = 150KiB/s, 48x = 7.2MiB/s, etc.
speed=32

#@===========================@
#|NON user-configurable part:|
#@===========================@

# Checking whether all required tools are present first:
# Checking for eject:
if [ ! `which eject 2>/dev/null` ]; then
  echo -e "\e[0;33meject not found. $0 cannot operate without eject, you'll need to install"
  echo -e "the tool before $0 can work. Terminating...\e[0m"
  exit
fi
# Checking for sdparm:
if [ ! `which sdparm 2>/dev/null` ]; then
  echo -e "\e[0;33msdparm not found. $0 cannot operate without sdparm, you'll need to install"
  echo -e "the tool before $0 can work. Terminating...\e[0m"
  exit
fi
# Checking for sha512sum:
if [ ! `which sha512sum 2>/dev/null` ]; then
  echo -e "\e[0;33msha512sum not found. $0 cannot operate without sha512sum, you'll need to install"
  echo -e "the tool before $0 can work. Terminating...\e[0m"
  exit
fi
# Checking for sudo:
if [ ! `which sudo 2>/dev/null` ]; then
  echo -e "\e[0;33msudo not found. $0 cannot operate without sudo, you'll need to install"
  echo -e "the tool before $0 can work. Terminating...\e[0m"
  exit
fi
# Checking for wodim:
if [ ! `which wodim 2>/dev/null` ]; then
  echo -e "\e[0;33mwodim not found. $0 cannot operate without wodim, you'll need to install"
  echo -e "the tool before $0 can work. Terminating...\e[0m\n"
  exit
fi

((burners--)) # Reducing number of burners by one as we also have a burner "0".

# Initial burner ejection:
echo -e "\nEjecting trays of all burners...\n"
for ((g=0;g<=$burners;g++))
do
  eject $devnode$g &
done
wait

# Ask user for confirmation to start the burning session.
echo -e "Burner trays ejected, please insert the discs and"
echo -e "press any key to start.\n"
read -n1 -s # Wait for key press.

# Retract trays on first round. Waiting for disc will be done in
# the worker script afterwards.
for ((l=0;l<=$burners;l++))
do
  eject -t $devnode$l &
done

for ((i=1;i<=$copies;i++)) # Iterating through burning rounds.
do
  for ((h=0;h<=$burners;h++)) # Iterating through all burners per round.
  do
    echo -e "Burning to $devnode$h, round $i."
    # Burn image to burners in the background:
    ./burn-and-check-worker.sh $h $prefix$image $blocks $i $speed $devnode &
  done
  wait # Wait for background processes to terminate.
  ((j=$i+1));
  if [ $j -le $copies ]; then
    # Ask user for confirmation to start next round:
    echo -e "\nRemove discs and place new discs in the drives, then"
    echo -e "press a key for the next round #$j."
    read -n1 -s # Wait for key press.
    for ((k=0;k<=$burners;k++))
    do
      eject -t $devnode$k &
    done
    wait
  else
    # Ask user for confirmation to terminate script after last round.
    echo -e "\n$i rounds done, remove discs and press a key for termination."
    echo -e "Trays will close automatically."
    read -n1 -s # Wait for key press.
    for ((k=0;k<=$burners;k++))
    do
      eject -t $devnode$k & # Pull remaining empty trays back in.
    done
    wait
  fi
done

This one takes one parameter on the command line, which defines the number of “rounds”. Since I have to burn a lot of identical discs, this makes my life easier. If you have 5 burners and ask the script to go for 5 rounds, you get 5 × 5 = 25 discs, if all goes well. It also needs to know the size of the medium in blocks for a later phase. For now you have to specify that within the script. The documentation inside shows you how to get that number, basically by checking a physical master disc with sdparm --command=capacity.

Other things you need to specify are the path to the image, the image file’s name, the device node name prefix, and the burning speed in factor notation. Also, of course, the number of physical burners available in the system. When run, it’ll eject all trays, prompt the user to put in discs, and launch the burning & checksumming workers in parallel.

The controller script will wait for all background workers within a round to terminate, and only then prompt the user to remove and replace all discs with new blank media. If this is the last round already, it’ll prompt the user to remove the last media set and will then retract all trays by itself at the press of any key. All tray ejection and retraction is done automatically: with all drive trays still empty and closed, you launch the script, it ejects all trays for you, and retracts them after a keypress that signals the script that all trays have been loaded, and so on.

Let’s take a look at the worker script, which actually does the burning & verifying. I call this one burn-and-check-worker.sh:

#!/bin/bash

burner=$1   # Burner number for this process.
image=$2    # Image file to burn.
blocks=$3   # Image size in blocks.
round=$4    # Current round (purely to show the info to the user).
speed=$5    # Burning speed.
devnode=$6  # Device node prefix (devnode+burner = burner device).

bwait=0     # Timeout variable for "blank media ready?" waiting loop.
mwait=0     # Timeout variable for automount waiting loop.
swait=0     # Timeout variable for "disc ready?" waiting loop.
m=0         # Boolean indicating automount failure.

echo -e "Now burning $image to $devnode$burner, round $round."

# The following code will check whether the drive has a blank medium
# loaded ready for writing. Otherwise, the burning might be started too
# early when using drives with slow disc access.
until [ "`sdparm --command=capacity $devnode$burner | grep blocks:\ 1`" ]
do
  ((bwait++))
  if [ $bwait -gt 30 ]; then # Abort if blank disc cannot be detected for 30 seconds.
    echo -e "\n\e[0;31mFAILURE, blank media did not become ready. Ejecting and aborting this thread..."
    echo -e "(Was trying to burn to $devnode$burner in round $round,"
    echo -e "failed to detect any blank medium in the drive.)\e[0m"
    eject $devnode$burner
    exit
  fi
  sleep 1 # Sleep 1 second before next check.
done

wodim -dao speed=$speed dev=$devnode$burner $image # Burning image.

# Notify user if burning failed.
if [[ $? != 0 ]]; then
  echo -e "\n\e[0;31mFAILURE while burning $image to $devnode$burner, burning process ran into trouble."
  echo -e "Ejecting and aborting this thread.\e[0m\n"
  eject $devnode$burner
  exit
fi

# The following code will eject and reload the disc to clear the device
# status and then wait for the drive to become ready and its disc to
# become readable (checking the disc's block count as output by sdparm).
eject $devnode$burner && eject -t $devnode$burner
until [ "`sdparm --command=capacity $devnode$burner | grep $blocks`" = "blocks: $blocks" ]
do
  ((swait++))
  if [ $swait -gt 30 ]; then # Abort if disc cannot be redetected for 30 seconds.
    echo -e "\n\e[0;31mFAILURE, device failed to become ready. Aborting this thread..."
    echo -e "(Was trying to access $devnode$burner in round $round,"
    echo -e "failed to re-read medium for 30 seconds after retraction.)\e[0m\n"
    exit
  fi
  sleep 1 # Sleep 1 second before next check to avoid unnecessary load.
done

# The next part is only necessary if your system auto-mounts optical media.
# This is usually the case, but if your system doesn't do this, you need to
# comment the next block out. This will otherwise wait for the disc to
# become mounted. We need to dismount afterwards for proper checksumming.
until [ -n "`mount | grep $devnode$burner`" ]
do
  ((mwait++))
  if [ $mwait -gt 30 ]; then # Warn user that disc was not automounted.
    echo -e "\n\e[0;33mWARNING, disc did not automount as expected."
    echo -e "Attempting to carry on..."
    echo -e "(Was waiting for disc on $devnode$burner to automount in"
    echo -e "round $round for 30 seconds.)\e[0m\n"
    m=1
    break
  fi
  sleep 1 # Sleep 1 second before next check to avoid unnecessary load.
done
if [ ! $m = 1 ]; then # Only need to dismount if disc was automounted.
  sleep 1 # Give the mounter a bit of time to lose the "busy" state.
  sudo umount $devnode$burner # Dismount burner as root/superuser.
fi

# On to the checksumming.
echo -e "Now comparing checksums for $devnode$burner, round $round."
sha512sum -c sum$burner.txt # Comparing checksums.
if [[ $? != 0 ]]; then # If checksumming produced errors, notify user.
  echo -e "\n\e[0;31mFAILURE while burning $image to $devnode$burner, checksum mismatch.\e[0m\n"
fi

eject $devnode$burner # Ejecting disc after completion.

As you can probably see, this is not very polished. The scripts aren’t using configuration files yet (that would be a nice-to-have), and it’s still a bit rough when it comes to actual usability and smoothness. It does work quite well however, with the device/disc readiness checking and the anti-automount workaround having been the major challenges (now I know why K3b ejects the disc before starting its checksumming: it’s simply impossible to read from the disc right after burning finishes).

When run, it looks like this (user names have been removed and paths altered for the screenshot):


“multiburn.sh” at work. I was lucky enough to hit a bad disc, so we can see the checksumming at work here. The disc actually became unreadable near its end. Verification really is important for reliable disc deployment.

When using a poor man’s disc burning station like this, I would actually recommend putting stickers on the trays like I did. That way, you’ll immediately know which disc to throw into the garbage bin.

This could still use a lot of polishing, and it’s quite sad that the “big” GUI tools can’t do parallel burning, but I think I can make do now. Oh, and I also tried GNOME’s “Brasero” burning tool, but that one is far too minimalistic and also cannot burn to multiple devices at the same time. There may be other GUI fatsos that can do it, but I didn’t want to try to get any of those installed on my older CentOS 6 Linux, so I just did it the UNIX way, even if not very elegantly. ;)

Maybe this can help someone out there. There are probably better scripts than mine to get this done, but still; otherwise, it’s just documentation for myself again. :)

Edit: Updated the scripts to implement a proper blank media detection to avoid burning starting prematurely in rare cases. In addition to that, I added some code to detect burning errors (where the burning process itself would fail) and notify the user about it. Also applied some cosmetic changes.

Edit 2: Added tool detection to multiburn.sh, and removed redundant color codes in warning & error messages in burn-and-check-worker.sh.

[1] Article logo based on the works of Lorian and Marcin Sochacki, “DVD.png” licensed under the CC BY-SA 3.0.

Apr 02 2014

ADATA logoIn the past, ADATA has been known for its budget series of solid state drives, but never really for any killer products. That place has recently been taken by the likes of Intel, Samsung and Crucial, amongst a few others. Now it seems that ADATA has seen enough mediocrity and is reaching for the top of the line. Based on a Marvell 88SS9189 – just like the Crucial M550 1TB drive – the new ADATA Premier Pro SP920 boasts the same 1TB capacity, some 20nm Micron NAND, a SATA/6Gbps interface and a somewhat richer kit than Crucial’s, for roughly the same price.

ADATA Premier Pro SP920 1TB announcement


While the disk is available in smaller capacities of 128GB, 256GB and 512GB too, only the larger 512GB and 1TB models make use of the full potential of NAND flash parallelism, reaching 4k random read IOPS close to 100k and write IOPS close to 90k, with read and write transfer rates beyond the 500MB/s wall. The good thing about this drive – just as with the Crucial M550 – is that we finally get some large SSDs for a relatively affordable price. But that alone wouldn’t really interest me that much, now would it?

The thing that really piqued my interest was that ADATA decided to develop their own SSD toolbox, which comes in the form of a tiny, single “SSDTool.exe” file. The rather slim tool covers the most important features, the ATA TRIM command above all. So ADATA is now joining the ranks of manufacturers backporting TRIM to Windows XP and Windows XP Pro x64 Edition as the fourth member, after Intel, Samsung and Corsair. I had to give that a try immediately of course, again on XP x64, see here:

Please note that since I did not really have any supported ADATA drive available, some functionality of the ADATA SSD toolbox naturally wasn’t available to me. But as we can see, all the important stuff is there, just like on other well-developed SSD toolboxes such as the one made by Intel. There are the OS tweaks, TRIM of course, host writes information, firmware update functionality, S.M.A.R.T. and secure erase. And it works just fine on XP / XP x64 as far as I can see.

If you want to check out which ADATA SSDs are currently supported by SSDTool.exe, please check out the [compatibility list] on ADATAs download page (just scroll down a bit)! Also, if you want to learn more about the functionality before installing, a [manual] is available.

Please note that Windows XP and Windows XP Pro x64 Edition are, however, not officially supported, so this is subject to change at any time. This is also the reason why I decided to offer a download of the current version 1.2 of the toolbox right here:

  • [ADATA SSD Toolbox version 1.2] (use the checksums below to compare with the version provided by ADATA)
    • md5 checksum: 942b8920a1d3e97a4c33d817220eb1ff
    • SHA1 checksum: a996ccc8edae8916f7f7f2cf372d8527bd912015
    • SHA512 checksum: 0fe1e18c184c19dab83060351238043f18a47e570d8cab3139566490fe3c03f66a38736527143a62167163f736d79eb14000b9ecc00a9482e0b4ed7dc6122bb9
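Comparing the hashes is quick on any Linux/UNIX shell; a small sketch, assuming the download was saved as SSDTool.exe (the file name is an assumption) and using the published MD5 from above:

```shell
# Compute the downloaded file's MD5 and compare it with the published one.
expected_md5=942b8920a1d3e97a4c33d817220eb1ff
actual_md5=$(md5sum SSDTool.exe | awk '{ print $1 }')
if [ "$actual_md5" = "$expected_md5" ]; then
  echo "MD5 OK"
else
  echo "MD5 mismatch!"
fi
```

The same pattern works with sha1sum and sha512sum for the other two checksums.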

So there you go! While this may not stay compatible with XP forever as future SSDs are released, it should work just fine with the current ones, like the massive 1TB Premier Pro SP920! So besides Intel, Samsung and Corsair, this opens up a fourth option for steadfast XP users who wish to use a large and fast SSD!

Update, 2014-04-07: Just to make sure my assumptions were correct and to probe ADATA support, I actually sent them a request a few days ago, asking whether their SSD toolbox runs on Windows XP unintentionally, or whether support is official. I also asked whether the thing really works with an ADATA SSD, since I didn’t have one around to test. Now guess what…

ADATA support actually informed me that they would need to ask their tech guys to test it, as they were not aware of this. So somebody at ADATA fired up a Windows XP machine with an ADATA SSD, installed their SSD toolbox and tested its functionality. Just now, one day before the end of official Microsoft support for Windows XP, ADATA let me know that their tests were successful and that they will update the manual on their website accordingly. Holy shit! Now that’s a level of support I’d call awesome! Who else goes that extra mile these days? Makes the Premier Pro SP920 all the more attractive, at least for me. :)

Jan 20 2014

ZFS logoI have been using PC-BSD UNIX for testing purposes since, I think, October last year. Since I only created a 30GB virtual disk image for that system in VMware Player, I started running out of space recently. As PC-BSD uses ZFS as its root file system – originally developed by Sun – and everybody seems to consider it the be-all and end-all of file systems, I thought: let’s try and grow my file system online!

First of all, ZFS has storage pools, so-called zpools. A zpool can have entire disks or partitions (called “slices” on BSD, as partitions are something slightly different there) as members. On top of that, it can build RAID arrays, or add a fast SSD as a caching or log device, etc.

When you’re done with your zpool, you put the actual ZFS file systems on top, as far as I understand it (I’m still very new to ZFS). There you get extremely nice features like block-level data deduplication or transparent compression like on NTFS.

So, how to grow this thing? BSD doesn’t use conventional disk structures like other operating systems. It can use a “slice” (meaning a regular MBR/GPT partition), but inside it creates its own “BSD disklabel”, which is like another partition table, and then creates its BSD-style partitions using that disklabel. Other operating systems typically cannot handle BSD disk structures, but they can handle slices. Since I didn’t dare to resize the existing slice with the BSD disklabel and BSD partition structures inside, I went ahead and just added another slice that I would then plug into my root storage pool called “tank”. I did add that slice with a bootable gparted ISO, but I believe you can reach the same goal by running this as root for the disk /dev/da0, filling up the remaining disk space completely with the new slice:

gpart add -t freebsd da0

Ok, this is the current status before modifying anything; as you can see, I have only 3.2GB of space left on my ZFS:

[thrawn@unixbox ~]$ zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank                      24.7G   3.2G    31K  legacy
tank/ROOT                 11.5G   3.2G    31K  legacy
tank/ROOT/default         11.5G   3.2G  11.5G  /mnt
tank/tmp                  99.5K   3.2G  99.5K  /tmp
tank/usr                  13.0G   3.2G    31K  /mnt/usr
tank/usr/home             4.26G   3.2G    32K  /usr/home
tank/usr/home/thrawn      4.26G   3.2G  4.26G  /usr/home/thrawn
tank/usr/jails              31K   3.2G    31K  /usr/jails
tank/usr/obj                31K   3.2G    31K  /usr/obj
tank/usr/pbi              8.03G   3.2G  8.03G  /usr/pbi
tank/usr/ports             778M   3.2G   536M  /usr/ports
tank/usr/ports/distfiles   241M   3.2G   241M  /usr/ports/distfiles
tank/usr/src                31K   3.2G    31K  /usr/src
tank/var                  6.82M   3.2G    31K  /mnt/var
tank/var/audit              33K   3.2G    33K  /var/audit
tank/var/log               442K   3.2G   442K  /var/log
tank/var/tmp              6.33M   3.2G  6.33M  /var/tmp

Now, to make sure our zpool plus its ZFS file systems will grow automatically when adding new devices, run the following to activate automatic expansion for our zpool:

[thrawn@unixbox] ~# zpool set autoexpand=on tank

Time to add stuff to the zpool now. Growing ZFS itself shouldn’t be necessary, as it scales with the zpool, at least that’s what I think it does. What I tried then was to add the new slice to the pool “tank”, but that failed initially. See what the zpool command had to say about my attempt:

[thrawn@unixbox] ~# zpool add tank /dev/da0s2
cannot add to 'tank': root pool can not have multiple vdevs or separate logs

Ok so… what the hell? I googled around a bit and found out that this seems to be a leftover limitation from SunOS/Solaris, where ZFS originally comes from. It means we cannot add data, log or cache devices to this pool, because it’s the one we’re booting from. There is a dirty little trick that will make zpool believe it’s just a data pool though! Just clear the bootfs property, retry, and then set bootfs again afterwards, as we wouldn’t have a boot volume otherwise. Like this:

[thrawn@unixbox] ~# zpool set bootfs= tank
[thrawn@unixbox] ~# zpool add tank /dev/da0s2
[thrawn@unixbox] ~# zpool set bootfs=tank tank

The first line clears the property by setting it to an empty string for the zpool “tank”, then we add the new slice to it, and finally we make “tank” our bootfs again. That’s a bit scary, so you should probably make sure you have a full backup system image before trying this. For me it worked, and immediately afterwards I got my +30GB from the added slice:

[thrawn@unixbox ~]$ zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank                      24.7G  33.2G    31K  legacy
tank/ROOT                 11.5G  33.2G    31K  legacy
tank/ROOT/default         11.5G  33.2G  11.5G  /mnt
tank/tmp                  99.5K  33.2G  99.5K  /tmp
tank/usr                  13.0G  33.2G    31K  /mnt/usr
tank/usr/home             4.26G  33.2G    32K  /usr/home
tank/usr/home/thrawn      4.26G  33.2G  4.26G  /usr/home/thrawn
tank/usr/jails              31K  33.2G    31K  /usr/jails
tank/usr/obj                31K  33.2G    31K  /usr/obj
tank/usr/pbi              8.03G  33.2G  8.03G  /usr/pbi
tank/usr/ports             778M  33.2G   536M  /usr/ports
tank/usr/ports/distfiles   241M  33.2G   241M  /usr/ports/distfiles
tank/usr/src                31K  33.2G    31K  /usr/src
tank/var                  6.82M  33.2G    31K  /mnt/var
tank/var/audit              33K  33.2G    33K  /var/audit
tank/var/log               440K  33.2G   440K  /var/log
tank/var/tmp              6.33M  33.2G  6.33M  /var/tmp

I took a deep breath, rebooted, and it worked just fine. And it works instantly and online! There is no lengthy resizing process involved for ZFS; it seems to just grow on demand, writing data to the new underlying devices as needed. It IS pretty awesome actually.

But I still have a lot to learn here, as all these tools like zpool and zfs are pretty weird, and I’m not used to BSD disk structures or the ZFS structures. For somebody like me, who has not really handled anything but MBR/GPT-style primary partitions before (minus some Windows dynamic disks), it’s actually quite confusing. But I’ll get used to it, I guess.

With all those other super powerful features that ZFS has, I think this was just a glimpse of what can be achieved!

Oh, and before I forget to mention it, some initial help for my hopeless self was provided by [kmoore134] on the [PC-BSD forums].

Oct 10 2013

SSD alignment logoRecently, a friend from a forum asked about what he would need to be careful about when installing Windows 98 on a flash medium, like an SSD or, in his case, an IDE CompactFlash card. While this is an extreme case, people tend to ask similar questions for Windows XP and XP x64 or other older systems. So I thought I’d show you, using this Win98 + CF card case as an example. Most of it applies similarly to more “modern” systems like Windows 2000, XP, Linux 2.4, etc. anyway.

For this, we are going to use the bootable CD version of [gparted]. In case you do not know gparted yet, you can think of it as an open-source PartitionMagic that you can boot from CD or USB. If you know other graphical partition managers, you will most likely feel right at home with gparted. Since this article is partially meant as a guide for my friend, parts of the screenshots will show German text, but that shouldn’t be too much of a problem. First, let’s boot this thing up from CD (or a USB key if you have no optical drive). See the following series of screenshots, it’s pretty self-explanatory:

Now that we’re up and running, you will be presented with a list of available drives at “GParted / Devices”. In our case we don’t need to pick anything there, as we only have one device, a 16GB CompactFlash card, as you will see below. Since this is a factory-new device, we shall assume that it is unformatted and unpartitioned. If it IS pre-formatted, you would need to check the given alignment first, see below to understand what I mean. But you can also just delete whatever partitions are on there if you don’t need any data from the device, and proceed with this guide afterwards.

What we will do is create a properly aligned partition, so we won’t cause unnecessary wear on our NAND flash and won’t sacrifice performance. This is critical especially for slower flash media: if you’re going to install an operating system onto one, you’d really feel the pain of random I/O on a misaligned partition (seek performance drops, far too large reads/writes would occur, etc.). Just follow these steps to partition the device and pre-format it with FAT32. If you’re on Windows 2000 or newer, you may want to choose NTFS instead; for Linux, choose whatever bootable file system your target distribution understands, like maybe JFS. If your flash medium supports TRIM (newer SATA SSDs do), choose EXT4, XFS or BTRFS instead; for BSD UNIX pick UFS or ZFS if you are ok with those:

This is it: our partition starts at sector 2048 (1MiB, or 1MB as I tend to say), which will work for all known flash media. If you expect your medium to have larger blocks, you could set the start to sector 20480 instead, which would mean 10MB. I don’t think there are media out there with blocks THAT large, but by doing this you could be 100% certain that it’ll be aligned. Time to shut gparted down:
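The underlying arithmetic is simple enough to check in a shell: a partition is aligned if its start sector times the sector size is an exact multiple of the flash block size. A small sketch with the numbers from this example (the 1MiB block size is an assumption on the medium):

```shell
# Start sector 2048 with 512-byte sectors gives a 1 MiB offset:
start=2048; sector_bytes=512; block_bytes=1048576   # assuming 1 MiB flash blocks
offset=$(( start * sector_bytes ))
echo "$offset"                                      # 1048576 bytes = 1 MiB
if [ $(( offset % block_bytes )) -eq 0 ]; then
  echo "aligned"
else
  echo "misaligned"
fi
```

The classic DOS-era start sector 63 fails this check, which is exactly the misalignment problem described above.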

After that, you may still re-format the partition from DOS or legacy Windows, but make sure not to re-partition the disk from within a legacy operating system: no fdisk and no disk management partitioning stuff in Win2000/XP!

On a side note, FAT32 file systems as shown here can’t be [TRIM]’ed on Windows, so once the medium accumulates enough write cycles, writes will permanently slow down, with garbage collection maybe helping a little. You could hook that FAT32-formatted disk up to a modern Linux and TRIM it there though, if it’s a new enough SATA drive. In my friend’s case that’s not an issue anyway, as he is using IDE/PATA flash that doesn’t support the TRIM command to begin with. But if you do have a modern enough SATA system and a TRIM-capable SSD, you might want to go with NTFS for Windows, EXT4/BTRFS/XFS for Linux, or UFS/ZFS for BSD UNIX if you can, as those file systems are supported by current TRIM implementations; or FAT32 for data exchange between operating systems. Keep in mind though: to TRIM FAT32 SATA SSDs, Linux is required at the time of this writing.

And no, no IDE/PATA media can say “TRIM” as far as I know.

Also: you can re-align existing partitions with gparted in a similar fashion as shown above by just editing them. This may be useful if you messed up with DOS or WinXP and have already installed your system on a misaligned partition. Gparted will then have to move the entire file system on that partition though, and that may take hours; e.g. for my Intel 320 600GB SSD (about 400GB full) it took almost 3 hours. To see which file systems gparted supports for re-alignment, [look here] (it’s mostly the “shrink” capability that is required)!

Jul 24 2013

Western Digital logoMost people who had to tackle the problem I was confronted with probably know everything I’m going to say anyway. But still: recently a hard drive in our NAS box at work failed, and the only locally available replacement was a Western Digital GreenPower 2TB, exact model WDC WD20EARX-00PASB0. And all of these drives, including the more “professional” series like the RE4-GP or the WD Blacks, have a really serious problem in their firmware.

What the drive tries to do is park its read/write heads very aggressively. In the case of my drive, it attempted to do that every 8 seconds; what it managed in the end was to park the heads every 27 seconds. That meant that over a runtime of 327 hours, the drive had accumulated more than 43,000 load/unload cycles. That’s in slightly less than 2 weeks. At >200,000 it gets really unhealthy, as you’re then marching towards mechanical failure. See here:

root@TS-XLC62:~# echo && smartctl -d marvell -a /dev/sda | grep -i -e power_on -e load_cycle

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       327
193 Load_Cycle_Count        0x0032   186   186   000    Old_age   Always       -       43450

Now WD support told me that there is no software or firmware update for my product and that everything had been manufactured to the highest quality standards, blah blah blah. That turns out not to be true. For the WD RE4-GP they offer a tool called [wdidle3] that can change the so-called “Idle3” timer in the firmware of ALL modern Western Digital drives, not just the RE4-GP. You can set it to a maximum of 300 seconds (5 minutes), or disable it entirely, after which the heads will only be parked on power cycling or when requested by the operating system. The way it should be.

You can do that by creating a [boot disk] or [bootable USB pendrive] with DOS on it; put wdidle3.exe on that drive, boot from it, and you can do stuff like:

  • wdidle3.exe /R (reports the current Idle3 setting, typically 8.000, meaning eight seconds)
  • wdidle3.exe /S300 (sets the Idle3 timer to its maximum of 300 seconds; this still causes ~288 load/unload cycles per day in 24/7 operation)
  • wdidle3.exe /D (sets the timer to 3720 seconds or 62 minutes, which seems to be interpreted as “disabled”; no more parking)
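The numbers above are easy to verify with a bit of shell arithmetic (the 62-day figure is my own derivation from the same inputs):

```shell
# One head park every 27 seconds over 327 hours of runtime:
echo $(( 327 * 3600 / 27 ))       # 43600, matching the ~43,450 cycles SMART reported
# The 300-second maximum Idle3 timer in 24/7 operation:
echo $(( 86400 / 300 ))           # 288 load/unload cycles per day
# Days until the critical ~200,000 cycle mark at one park every 27 seconds:
echo $(( 200000 * 27 / 86400 ))   # 62 days
```

So even at the maximum timer setting the drive still racks up cycles; only disabling the timer stops the bleeding.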

Obviously, the “disabled” option (/D) is the one to go for. In that mode, the heads can still be parked by the OS itself, but otherwise they’ll just hold still like on any other hard drive. After the next power cycle, the drive should behave normally.

I find it quite sad and frankly pathetic that WD is trying to sell us a “Green IT” hard drive that actually wastes more resources than it saves. Even in normal desktop usage, these drives often die prematurely because of this artificially designed parking crap-feature! This wastes metals/rare earths, plastics and the energy required to build replacement drives, and far more so than the little bit of energy you’re saving by having your heads parked. One might argue that the feature is designed so that drives typically fail right after the end of the warranty period for the largest target audience, the common desktop PC user.

For anything not “RAID Edition”, WD doesn’t even give you the solution, even though it works: regular WD GP disks can be configured with wdidle3 just fine. They all seem to use the same firmware anyway. And yet WD says “there is no tool for your drive and nothing is wrong here”. They also did not comment on my SMART statistics and the simple math behind my lifetime predictions.

A hard drive that just works for 10 years out of the box would save a lot more resources and energy, simply by NOT dying or artificially killing itself. Now how about that for Green IT?

I hate it when companies are lying to people like that and then even trying to deny them knowledge about the clearly existing possibilities to fix the problem. You could have made it right, WD. But you didn’t. FU!

Jul 18 2013

Buffalo logoSince a colleague of mine [rooted] our Buffalo Terastation III NAS (TS-XLC62) at work a while back, we have changed the remote shell from Telnet to SSH and done a few other nice hacks. But there is one problem: the TS-XLC62 does not monitor the hard drives’ health via SMART, even though parts of the required smartmontools are installed on the tiny embedded Linux system. They’re just sitting there unused, just like the sshd before.

Today I’m going to show you how to make this stuff work and how to enable SMART email notifications on this system, which has no standard Linux mail command, but a Buffalo-specific tmail command instead. We will enable the background smartd service and configure it properly for this specific Terastation model. All of the steps shown here are done on a rooted TS-XLC62, so make sure you’re always root:

Buffalo Terastation IIIThe smartmontools on the box are actually almost complete. Only the drive database and the init scripts are missing, and for some reason, running update-smart-drivedb on the system fails. So we need to get the database from another Linux/UNIX or even Windows machine running smartmontools. On Linux, you can usually find the file at /usr/local/share/smartmontools/drivedb.h. Copy it onto the Terastation using scp from another *nix box: scp /usr/local/share/smartmontools/drivedb.h root@<terastation-host-or-ip>:/usr/local/share/smartmontools/. You can use [FileZilla] or [PuTTY] to copy the file over from a Windows machine instead.

Note that this only makes sense if you have smartmontools 5.40 or newer (smartctl -V tells you the version). Older releases cannot have their drive database updated separately, but they will most likely still work fine.

Now, log in to your Terastation using Telnet or SSH, and test whether it’s working by running a quick info check on one of the hard drives. We need to specify the controller type as marvell, as the SATA controller of the Marvell Feroceon MV78XX0 SoC in the box cannot be addressed by regular ATA/SCSI commands. Run:

smartctl -d marvell -i /dev/sda

In my case I get this; I have already replaced the first failing Seagate hard drive with an even crappier WD one (yeah, yeah, I know, but it was the only one available), and it’s also not yet known to the smartmontools database:

smartctl version 5.37 [arm-none-linux-gnueabi] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WCAZAL555899
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jul 18 09:54:53 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Now that that’s done, we should make sure that smartmontools’ daemon, smartd, runs in the background doing regular checks on the drives. But since we want to configure email notifications, we first need to make sure that smartd can send emails at all. The Terastation has no mail command however, only Buffalo’s tmail command, which is not a valid drop-in replacement for mail as its syntax is different.

So we need to write some glue code that will later be invoked by smartd. I call it mailto.sh and place it in /usr/local/sbin/. It’s based on [this article], which gave me a non-working solution on my Terastation for several reasons, but that was easily fixed up. It shall look somewhat like this (you’ll need to fill in several of the variables with your own data of course). Oh, and as always, don’t forget to chmod 550 the script when it’s done:

#!/bin/bash
##############################################################
# Written as glue code, so that smartmontools/smartd can use #
# Buffalos own "tmail", as we don't have "mail" installed    #
# on the Terastation.                                        #
##############################################################

# User-specific declarations (the values below are placeholders,
# fill them in with your own data):

TMP_FILE=/tmp/Smartctl.error.txt
SMTP="smtp.example.com"    # SMTP server (placeholder, use your own).
SMTP_PORT=25
FROM="nas@example.com"     # Sender address (placeholder, use your own).
SUBJECT="SMART Error"
FROM_NAME="Terastation"    # Sender display name (placeholder, use your own).
ENCODING="UTF-8"
BYTE=8

# Code:

# Write email metadata to the temp file (smartd gives us this):
echo To:  $SMARTD_ADDRESS > $TMP_FILE
echo Subject:  "$SMARTD_SUBJECT" >> $TMP_FILE
echo >> $TMP_FILE
echo >> $TMP_FILE

# Save the email message (STDIN) to the temp file:
cat >> $TMP_FILE

# Append the output of smartctl -a to the message:
smartctl -a -d $SMARTD_DEVICETYPE $SMARTD_DEVICE >> $TMP_FILE

# Now email the message to the user using Buffalos mailer:
tmail -s $SMTP -t $SMARTD_ADDRESS -f $FROM -sub $SUBJECT \
-h $FROM_NAME -c $ENCODING -b $BYTE -s_port $SMTP_PORT < $TMP_FILE

# Delete the temporary file:
rm -f $TMP_FILE

So this is our mailer script, wrapping the invocation coming from smartd around Buffalo’s own tmail. Now how do we make smartd call this? Let’s edit /usr/local/etc/smartd.conf to make it happen; fill in your email address where it says <EMAIL>, just like you changed the variables in mailto.sh before:

# Monitor all four harddrives in the Buffalo Terastation with self-tests running
# on Sunday 01:00AM for disk 1, 02:00AM for disk 2, 03:00AM for disk 3 and 04:00AM
# for disk 4:

/dev/sda -d marvell -a -s L/../../7/01 -m <EMAIL> -M exec /usr/local/sbin/mailto.sh
/dev/sdb -d marvell -a -s L/../../7/02 -m <EMAIL> -M exec /usr/local/sbin/mailto.sh
/dev/sdc -d marvell -a -s L/../../7/03 -m <EMAIL> -M exec /usr/local/sbin/mailto.sh
/dev/sdd -d marvell -a -s L/../../7/04 -m <EMAIL> -M exec /usr/local/sbin/mailto.sh

Now if you want to test the functionality of the mailer beforehand, you can use this instead:

/dev/sda -d marvell -a -s L/../../7/01 -m <EMAIL> -M exec /usr/local/sbin/mailto.sh -M test

To test it, just run smartd -d on the shell. This will give you debugging output, including some warnings caused by a bit of unexpected output that tmail passes back to smartd. This is non-critical though; it should look similar to this:

smartd version 5.37 [arm-none-linux-gnueabi]
Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Opened configuration file /usr/local/etc/smartd.conf
Configuration file /usr/local/etc/smartd.conf parsed.
Device: /dev/sda, opened
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Monitoring 1 ATA and 0 SCSI devices
Executing test of /usr/local/sbin/mailto.sh to <EMAIL> ...
Test of /usr/local/sbin/mailto.sh to <EMAIL> produced unexpected 
output (50 bytes) to STDOUT/STDERR: 
smtp_port 25
Get smtp portnum 25
pop3_port (null)

Test of /usr/local/sbin/mailto.sh to <EMAIL>: successful

Now you can kill smartd from a secondary shell by running the following command. We will be re-using this in an init script later too, as the Terastation's init functions leave quite a lot to be desired, so I'll go into the details a bit:

kill `ps | grep smartd | grep -v grep | cut -f1 -d"r"`

This command gets the process ID of smartd and feeds it to the kill command. The delimiter "r" is used for the cut command because whitespace won't work reliably: when the PID is short enough, ps pads it with a leading space, so cutting on whitespace would return an empty first field. Cutting at the first "r" works instead, because the first "r" after the PID is the first letter of the user running smartd, which has to be root.

To understand this better, just run ps | grep smartd | grep -v grep while smartd is running. If the PID is 5-digit, the leading character will be a digit of the PID, but if it is 4-digit, the leading character is a whitespace instead, which would make cut -f1 -d" " report an empty string in our case, hence cut -f1 -d"r"… Very dirty, I know… Don't care though. ;) You may remove the -M test directive from /usr/local/etc/smartd.conf now, if you've played around with that, so the smart spam will stop. :roll:
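You can reproduce the whitespace problem without smartd at all; here's a quick sketch with a made-up ps-style line (the PID and column layout are hypothetical):

```shell
# A 4-digit PID gets padded with a leading space by ps:
line=" 9999 root     /usr/local/sbin/smartd"

# Splitting on spaces yields an empty first field:
echo "$line" | cut -f1 -d" "     # prints an empty line

# Cutting at the first "r" (from "root") grabs the PID, padding and all;
# kill doesn't mind the surrounding whitespace:
echo "$line" | cut -f1 -d"r"     # prints " 9999 "
```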

Finally, to make our monitoring run as a smooth auto-starting daemon in the background, we will need to write ourselves that init script. The default smartmontools one won't work out of the box, as a few functions like killproc or daemon are missing on the Terastation's embedded Linux. Yeah, I was too lazy to port them over. So a few adaptations will make it happen in a simplified fashion. See this reduced and adapted init script called smartd sitting in /etc/init.d/:

#! /bin/sh
SMARTD_BIN=/usr/local/sbin/smartd

RETVAL=0
prog=smartd
pidfile=/var/lock/subsys/smartd
config=/usr/local/etc/smartd.conf

start()
{
        [ $UID -eq 0 ] || exit 4
        [ -x $SMARTD_BIN ] || exit 5
        [ -f $config ] || exit 6
        echo -n $"Starting $prog: "
        $SMARTD_BIN $smartd_opts
        RETVAL=$?
        echo
        [ $RETVAL = 0 ] && touch $pidfile
        return $RETVAL
}

stop()
{
        [ $UID -eq 0 ] || exit 4
        echo -n $"Shutting down $prog: "
        kill `ps | grep smartd | grep -v grep | cut -f1 -d"r"`
        RETVAL=$?
        echo
        rm -f $pidfile
        return $RETVAL
}

case "$1" in
        start)
                start
                ;;
        stop)
                stop
                ;;
        *)
                echo $"Usage: $0 {start|stop}"
                RETVAL=2
                [ "$1" = 'usage' ] && RETVAL=0
esac

exit $RETVAL

So yeah, instead of killproc we're making do with kill, and most of the service functions have been removed, limiting the script to start and stop. Plus, it will not check for multiple start invocations in this version, so it's possible to start multiple smartd daemons, and stop will only work for one running process at a time, so you'll need to pay attention. Could be fixed easily, but I think it's good enough that way. To make smartd start on boot, link it properly; I guess S90 should be fine:

ln -s /etc/init.d/smartd /etc/rc.d/sysinit.d/S90smartd

Also, you can now start and stop smartd from the shell more conveniently, without having to run smartd in the foreground and kill it from a secondary shell (CTRL+C won't kill it). You can just do these two things instead, like on any other SysVinit system, only with the limitations described above:

root@TS-XLC62:~# /etc/init.d/smartd stop
Shutting down smartd: Terminated
root@TS-XLC62:~# /etc/init.d/smartd start
Starting smartd: 

Better, eh? Now, welcome your SMART monitoring-enabled Buffalo Terastation, with email notifications sent for any upcoming hard drive problems detected, courtesy of smartmontools! :cool:

Edit: And here is a slightly more sophisticated init script that will detect whether smartd is already running on start, so that multiple starts can no longer happen. It will also detect whether smartd has been killed outside the scope of the init script (like when it crashed or something) by looking at the PID file:

#! /bin/sh
SMARTD_BIN=/usr/local/sbin/smartd
RETVAL=0
prog=smartd
pidfile=/var/lock/subsys/smartd
config=/usr/local/etc/smartd.conf

start()
{
  [ $UID -eq 0 ] || exit 4
  [ -x $SMARTD_BIN ] || exit 5
  [ -f $config ] || exit 6
  if [ -f $pidfile ]; then
    echo "PID file $pidfile found! Will not start,"
    echo "smartd probably already running!"
    PID=`ps | grep smartd | grep -v grep | grep -v "smartd start" | cut -f1 -d"r"`
    if [ ${#PID} -gt 0 ]; then
      echo "Trying to determine smartd PID: $PID"
    elif [ ${#PID} -eq 0 ]; then
      echo "No running smartd process found. You may want to"
      echo "delete $pidfile and then try again."
    fi
    exit 6
  elif [ ! -f $pidfile ]; then
    echo -n $"Starting $prog: "
    $SMARTD_BIN $smartd_opts
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch $pidfile
    return $RETVAL
  fi
}

stop()
{
  [ $UID -eq 0 ] || exit 4
  PID=`ps | grep smartd | grep -v grep | grep -v "smartd stop" | cut -f1 -d"r"`
  if [ ${#PID} -eq 0 ]; then
    echo "Error: No running smartd process detected!"
    echo "Cleaning up..."
    echo -n "Removing $pidfile if there is one... "
    rm -f $pidfile
    echo "Done."
    exit 6
  elif [ ${#PID} -gt 0 ]; then
    echo -n $"Shutting down $prog: "
    kill `ps | grep smartd | grep -v grep | grep -v "smartd stop" | cut -f1 -d"r"`
    RETVAL=$?
    echo
    rm -f $pidfile
    return $RETVAL
  fi
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  status)
    ps | grep smartd | grep -v grep | grep -v status
    RETVAL=$?
    ;;
  *)
    echo $"Usage: $0 {start|stop|restart|status}"
    RETVAL=2
    [ "$1" = 'usage' ] && RETVAL=0
esac

exit $RETVAL

May 04 2013

Corsair, originally known only for memory modules, has since expanded into many different markets, like power supplies, enclosures, watercooling solutions, input devices and, last but not least, solid state drives. Corsair's offerings of the latter kind are typically based on a variety of SSD controller chips, from LAMD (like the LM87800) and Sandforce (SF-2100, SF-2200) to the now widespread and well-received Marvell controllers (88SS9174).

Now, there are currently two competitors in the market which offer users a software tool to manage their solid state drives: one is Intel with its SSD toolbox, the other is Samsung with its SSD magician. Both fully support Windows XP and Windows XP x64 Edition, so you can get TRIM support on those older operating systems that cannot issue the ATA TRIM command on their own, like say Windows 7 can. The result is that there will be very little write performance degradation; none at all, as long as you don't fill the SSD up completely.

So, TRIM always makes sure that there is some space on the SSD marked as free, so write commands can be issued directly instead of expensive read-modify-writes. And now Corsair enters the fray, providing its own SSD toolbox that also supports Windows XP operating systems and features several configuration options. See the following screenshots:

So, as you can see, Corsair's SSD toolbox supports not only TRIM, but also a reconfiguration of the drives' overprovisioning proportions. That means you can take away some usable space and add it to the spare area that is used to remap damaged regions of the usable area. Also, you can shrink the spare area, increasing the amount of usable space. Neat.

Unfortunately, I do not currently own a Corsair SSD, so I can't really test the functionality, as the toolbox doesn't work with my Intel SSD of course. But since the Corsair SSD toolbox is not "always on" like Samsung's SSD magician, I think its TRIM scheduling option will most likely work via the internal Windows task scheduler, like the Intel solution does. Preferable if you ask me, because that way the toolbox doesn't always need to run in the background.

So now it’s three companies officially supporting TRIM on Windows XP: Intel, Samsung and Corsair!

Mar 19 2013

My [server's] web, SQL and mail directories are all sitting on a RAID-5 array hosted by an ancient IBM ServeRAID II controller that delivers 10MB/s at most. Using a very old 33MHz PowerPC 403GCX as its XOR processor and only 4MB of cache, what could one expect? Also, most stuff is cached in RAM anyway, so it doesn't hurt much.

Still, there is a guy currently selling much more powerful IBM ServeRAID 4H controllers on eBay, fully equipped with 128MB cache and a battery backup unit. Who knows if the batteries still work, but at 10€ it's just dirt cheap, so I got one. The auction is hosted in Germany, see [this link]! That card is 64-bit 66MHz PCI and doesn't quite fit into my older server (from a style perspective); also, it features U160 SCSI, whereas my backplanes are only UW SCSI. So 160MB/s versus 40MB/s. But still, if I can max out the bus system of those backplanes, it's going to be quite a boost anyway. And that should very well be possible given the far more powerful 266MHz PowerPC 750 XOR processor on this controller.

Also, the ServeRAID 4H features a pretty majestic appearance, full length and all; I just need to add the bracket at the end of the card so it will sit properly in my server:

Now there is just one thing that remains unclear, and that's whether my server will even be able to run the card. First I'll test it in another machine anyway, to check for basic 32-bit 33MHz PCI support. Unfortunately I can only test with PCI 2.1, whereas my server uses the ancient 2.0. But we'll see. It's going to be quite some time before I try it in my server anyway.