Nov 21 2016
 

Believe it or not, the server hosting the very web site you’re reading right now has all of its data stored on an ancient IBM ServeRAID II array made in the year 1995. That makes the SCSI RAID-5 controller 21 years old, and the 9.1GB SCA drives attached to it via hot-plug bays are from 1999, so 17 years old. Recently, I found out that IBM’s latest SCSI ServeRAID manager from 2011 still supports that ancient controller as well as the almost equally ancient Windows 2000 Server I’m running on the machine. In the hope of better management functionality, I chose to give the new software a try. So in addition to my antiquated NT4 ServeRAID manager v2.23.3, I’d also run v9.30.21 side-by-side! This is also in preparation for a potential upgrade to a much newer ServeRAID-4H and larger SCSI drives.

Just so you know how the old v2.23.3 looks, here it is:

IBM ServeRAID Manager v2.23.3

It really looks like 1996-1997 software, doesn’t it? It can do the most important tasks, but there are two major drawbacks:

  1. It can’t notify me of any problems via eMail
  2. It’s purely standalone software, meaning no server/client architecture => I have to log in via KVM-over-IP or SSH+VNC to manage it

So my hope was that the new software would have a server part and a detachable client component as well as the ability to send eMails whenever shit happens. However, when first launching the new ServeRAID manager, I was greeted with this:

ServeRAID Manager v9.30.21 GUI failure

Now this doesn’t look right…

Note that this was my attempt to run the software on Windows XP x64. On Windows 2000, it looked a bit better, but still somewhat messed up. Certain GUI elements would pop up upon mouseover, but overall, the program just wasn’t usable. After finding out that this is Java software being executed by a bundled and ancient version of Sun Java (v1.4.2_12), I just tried to run the RaidMan.jar file with my platform Java. On XP x64 that’s the latest and greatest Java 1.8u112 (even though the installer says it needs a newer operating system, this seems to work just fine), and on Windows 2000 it’s the latest supported on that OS: Java 1.6u31. To make RaidMan.jar run on a different JRE on Windows, you can just alter the shortcut the installer creates for you:

Changing the JRE that ServeRAID Manager should be executed by

Here it’s run by the javaw.exe command that an old JDK 1.7.0 installer created in %WINDIR%\system32\. It was only later that I changed it to 1.8u112. After changing the JRE to a more modern one, everything magically works:

ServeRAID Manager v9.30.21, logged in

ServeRAID Manager v9.30.21, remotely logged in to my server

And this is already me having launched the Manager component on a different machine on my LAN, connecting to the ServeRAID agent service running on my server. So that part works. Since this software also runs on Linux and FreeBSD UNIX, I can set up a proper SSH tunnel script to access it remotely and securely from the outside world as well. Yay! Clicking on the controller gave me this:

ServeRAID Manager v9.30.21 array overview

Array overview

Ok, this reminds me of Adaptec’s/ICP’s StorMan, and since there is some Adaptec license included on the IBM Application CD that this version came from, it might very well be practically the same software. It does show warnings on all drives, while the array and volume are “ok”. The warnings are pretty negligible though, as you can already see above. Let’s have a more detailed look:

ServeRAID Manager v9.30.21 disk warranty warnings

So I have possible non-warranted drives? No shit, Sherlock! Most of them are older than the majority of today’s Internet users… I still don’t get how 12 of these drives are still running, seriously…

So that’s not really an issue. But what about eMail notifications? Well, take a look:

ServeRAID Manager v9.30.21 notification options

It’s there!

Yes! It can notify to the desktop, to the system log and to various email recipients. Also, you can choose who gets which mails by selecting different log levels for different recipients. The only downside is that the ServeRAID manager doesn’t allow for SSL/TLS connections to mail servers and it can’t even provide any login data. As such, you need your own eMail server on your local network that allows for unauthenticated and unencrypted SMTP access from the IP of your ServeRAID machine. In my case, no problem, so I can now get eMail notifications to my home and work addresses, as well as an SMS by using my 3G provider’s eMail-2-SMS gateway!
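Before pointing the ServeRAID manager at a mail server, it can help to check that the relay really accepts plain, unauthenticated SMTP from the machine in question. Here is a minimal sketch in Python that sends a test mail under the same constraints (no TLS, no login). The host name `mail.example.lan`, port 25 and the addresses are placeholders I made up, not values from my setup, and this is not how the ServeRAID software itself does it.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical values: replace with your own local relay and addresses.
RELAY_HOST = "mail.example.lan"
RELAY_PORT = 25


def build_test_message(sender, recipient):
    """Build a small plain-text test mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "ServeRAID relay test"
    msg.set_content("If you can read this, plain SMTP relaying works.")
    return msg


def send_plain(msg, host=RELAY_HOST, port=RELAY_PORT, timeout=10):
    """Send msg without STARTTLS and without authentication.

    Deliberately no smtp.starttls() and no smtp.login() here, to mimic
    a client that can do neither.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

Calling `send_plain(build_test_message("raid@example.lan", "admin@example.lan"))` from the ServeRAID machine should deliver the mail if the relay is set up as described. If the server insists on authentication or encryption, `smtplib` raises an exception instead, which tells you the notifications would not get through either.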

On top of that, you can of course check out disk and controller status as well:

ServeRAID Manager v9.30.21 disk status

Disk status – not much to see here at all (on any of the tabs), probably because the old ServeRAID II can’t do S.M.A.R.T. Maybe it’s good that it can’t, I don’t really want to see 17-year-old hard drives’ S.M.A.R.T. logs anyway. ;)

 

ServeRAID Manager v9.30.21 controller status

Status of my ServeRAID II controller, no battery backup unit attached for the 4MB EDO-DRAM write cache and no temperature sensors present, so not much to see here either.

Now there is only one problem with this and that is that the new ServeRAID agent service consumes quite a lot of CPU power in the background, showing as 100% peaks on a single CPU core every few seconds. This is clearly visible in my web-based monitoring setup:

ServeRAID Manager v9.30.21 agent CPU load

The background service is a bit too CPU hungry for my taste (Pentium Pro™ 200MHz). The part left of the “hole” is before installation, the part right of it after installation.

And in case you’re wondering what that hole is right between about 20:30 and 22:00, that’s the ServeRAID Manager’s SNMP components, which killed my Microsoft SNMP services upon installation. My network and CPU monitoring solution is based on SNMP though, so that was not good. Luckily, just restarting the SNMP services fixed it. However, as you can see, one of the slow 200MHz cores is now under much higher load. I don’t like that because I’m short on CPU power all the time anyway, but I’ll leave it alone for now. Let’s see how it goes.

ServeRAID Manager v9.30.21 splash screen

“Fast configuration”, but a pretty slow background service… :roll:

Now all I need to get is a large pack of large SCA SCSI drives, since I still have that much faster [ServeRAID 4H] with 128MB SDRAM cache and BBU lying around for 3 years anyway! Ah, and as always, the motivation to actually upgrade the server. ;)

Edit: It turns out I found the main culprit for the high CPU load. It seems to be IBM’s [SNMP sub-agent component] after all, the one that also caused my SNMP service to shut down upon installation. Uninstalling the ServeRAID Manager v9.30.21 and reinstalling it with the SNMP component deselected resulted in a different load profile. See the following graph: the vertical red line separates the state before (with SNMP sub-agent) from the state after (without SNMP sub-agent). Take a look at the magenta line depicting the CPU core that the RAID service was bound to:

ServeRAID Manager v9.30.21 with reduced CPU load

Disabling the ServeRAID manager’s SNMP sub-agent lowers the CPU load significantly!

Thanks fly out to [these guys at Ars Technica] for giving me the right idea!

Nov 08 2016
 

…and it wasn’t even my fault! Can you believe it?! Probably not if you know me, but it’s true nonetheless… Almost 4 days of downtime, and we’ve been back up for just about 2½ hours or so. Given that I already had to do maintenance on the server once this year (replacing a bad hard drive and doing a thorough cleaning as well as dust filter installation), this has crushed the yearly 99%+ availability that I was so proud of. So for the first time since 2006, XIN.at failed to satisfy my personal requirement in that regard. Including the maintenance done on the server and several regular ISP maintenances on the G.SHDSL line, the full downtime should now amount to roughly 90 hours in 2016. If we assume a sum of 8760 hours per year, I’m now down to an availability of ~98.97%.

That value might get a bit worse though if my ISP decides to do another few rounds of maintenance on the DSLAMs in the automatic exchange hub.
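For the curious, the availability figure is simple arithmetic. The following small Python sketch reproduces it from the numbers given above (90 hours of downtime, 8760 hours per year) and also shows how many hours of downtime a 99% target would have allowed. The function names are just illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years as in the text


def availability_percent(downtime_hours, total_hours=HOURS_PER_YEAR):
    """Availability in percent for a given amount of downtime."""
    return (1 - downtime_hours / total_hours) * 100


def downtime_budget_hours(target_percent, total_hours=HOURS_PER_YEAR):
    """Maximum downtime in hours that still meets the target availability."""
    return total_hours * (1 - target_percent / 100)


print(f"{availability_percent(90):.2f}%")     # 98.97%
print(f"{downtime_budget_hours(99):.1f} h")   # 87.6 h
```

So a 99% target leaves a budget of about 87.6 hours per year, and the roughly 90 hours accumulated in 2016 overshoot it by just a bit more than two hours.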

So, how did this happen?

It all began when my RAID-6 started acting up, the one in my workstation though, not in the server. Ok, I know, that’s entirely unrelated, but still. It died no pretty death right there last Friday. And once again (this happened before!) it was not the disks that were to blame, nor the controller, nor the FBM, not even the hotplug bay that I suspected because all disk failures were happening in the same bay. It was the power cable extensions. Again. Even though they’re brand new! I mean, what the hell. At least I know now that an Areca controller can force RAID-6 arrays to come back to life even if already completely failed with 3+ disks down. Nice one, Areca, I’ll have a cold one in your honor!

And when that RAID was back up, I wanted to pull up my rolling shutters a bit, just because. Which is when the belt ripped in half and the shutters went crashing down, damning me to darkness. Ok, after that I had a beer and just went to bed. Not my day. Next day I did some makeshift repairs on the shutters so they would at least be rolled all the way up and stay there. Having 0% daylight at 09:00am is pretty depressing after all. Ok, after that was done (it was Saturday now), I sat back down in my chair and thought: “Ok, let’s just read my emails…”.

And then my G.SHDSL extender burned up, sending me, my email client, my server and the rest of my digital existence offline…

And that’s when I just knew I had to get up, drive to the supermarket and get a TON of beer!

Seriously… There is bad luck and then there is…

Bad luck never comes alone!

When it rains, it pours, they say

So, the thing just went dark from one moment to the next! No fan, no LEDs, no nothing. At first I thought it might be its external power supply, some standard 12V DC unit. But I measured the voltage and it was perfectly fine. So the extender itself was obviously dead. Never seen such a thing happen with Paradyne/Zhone hardware, but what can you do. So here’s the new one (or maybe it’s refurbished, you never know with this stuff):

Paradyne/Zhone SNE2040G G.SHDSL network extender

Now all that’s left is to send the defective unit back and that’s that. I hope I won’t see anything like that happen again… :( At least I got them on the phone on Saturday (business level support), but I only have the small service level agreement with my current contract, so I couldn’t get a technician on weekends. And I wasn’t available “on-site” (at home) on Monday, so the replacement unit had to be shipped via parcel service.

Oh, and neither the 3G fallback solution nor the large SLA (full 24/7 on-site support) will ever be agreed upon for XIN.at – too expensive at ~40€ a month. :( There is only so much money I can pour into a free server after all.

At least everything is back up now, so cheers! Prost!

Jul 02 2015
 

It seems that two months after the last maintenance in May, another one needs to be done on 2015-07-08 around 00:00 AM – 06:00 AM UTC+1. Again this means that all XIN.at services may see some downtime during this period, this weblog included. So no eMail server, no web server, no anything. These maintenances tend to put me offline for short periods of time only, but who knows what UPC’s gonna do. Just so you know.