Nov 08 2016

G.SHDSL extender failure… and it wasn’t even my fault! Can you believe it?! Probably not if you know me, but it’s true nonetheless… After almost 4 days of downtime, we’ve been back up for about 2½ hours now. Given that I already had to do maintenance on the server once this year (replacing a bad hard drive, doing a thorough cleaning and installing dust filters), this has crushed the yearly 99%+ availability that I was so proud of. So for the first time since 2006, I have failed to satisfy my personal requirement in that regard. Including the maintenance done on the server and several regular ISP maintenance windows on the G.SHDSL line, the total downtime should now amount to roughly 90 hours in 2016. If we assume a total of 8760 hours per year, I’m now down to an availability of ~98.97%.
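For anyone who wants to check the math, here is a minimal Python sketch of that calculation. The function name `availability_percent` is made up for illustration, and the 8760-hour year is the same simplification used above (2016 is a leap year with 8784 hours, which would nudge the result up very slightly).

```python
# Illustrative helper, not part of any real monitoring setup.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, the non-leap-year assumption from the post


def availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Return the availability over a period as a percentage."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Roughly 90 hours of downtime in 2016, as estimated in the post.
print(f"{availability_percent(90):.2f}%")  # prints "98.97%"
```

Passing `period_hours=366 * 24` instead would give the leap-year variant of the same figure.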

That value might get a bit worse though if my ISP decides to do another few rounds of maintenance on the DSLAMs in the automatic exchange hub.

So, how did this happen?

It all began when my RAID-6 started acting up, the one in my workstation though, not in the server. Ok, I know, that’s entirely unrelated, but still. It died no pretty death right there last Friday. And once again (this happened before!) the disks were not to blame, and neither was the controller, the FBM, or even the hotplug bay that I suspected because all disk failures were happening in the same bay. It was the power cable extensions. Again. Even though they’re brand new! I mean, what the hell. At least I now know that an Areca controller can force RAID-6 arrays to come back to life even if already completely failed with 3+ disks down. Nice one, Areca, I’ll have a cold one in your honor!

And when that RAID was back up, I wanted to pull up my rolling shutters a bit, just because. Which is when the belt ripped in half and the shutters went crashing down, damning me to darkness. Ok, after that I had a beer and just went to bed. Not my day. Next day I did some makeshift repairs on the shutters so they would at least be rolled all the way up and stay there. Having 0% daylight at 09:00am is pretty depressing after all. Ok, after that was done (it was Saturday now), I sat back down in my chair and thought: “Ok, let’s just read my emails…”.

And then my G.SHDSL extender burned up, sending me, my email client, my server and the rest of my digital existence offline…

And that’s when I just knew I had to get up, drive to the supermarket and get a TON of beer!

Seriously… There is bad luck and then there is…

Bad luck never comes alone!

When it rains, it pours, they say

So, the thing just went dark from one moment to the next! No fan, no LEDs, no nothing. At first I thought it might be its external power supply, some standard 12V DC unit. But I measured the voltage and it was perfectly fine. So the extender itself was obviously dead. Never seen such a thing happen with Paradyne/Zhone hardware, but what can you do. So here’s the new one (or maybe it’s refurbished, you never know with this stuff):

Paradyne/Zhone SNE2040G G.SHDSL network extender

Now all that’s left is to send the defective unit back and that’s that. I hope I won’t see anything like that happen again… :( At least I got them on the phone on Saturday (business level support), but I only have the small service level agreement with my current contract, so I couldn’t get a technician on weekends. And I wasn’t available “on-site” (at home) on Monday, so the replacement unit had to be shipped via parcel service.

Oh, and I will never sign up for either the 3G fallback solution or the large SLA (full 24/7 on-site support) – too expensive at ~40€ a month. :( There is only so much money I can pour into a free server after all.

At least everything is back up now, so cheers! Prost!

Jun 09 2016

And no, it’s not because of that [failing drive] in the server, but because another Wednesday’s coming up, and it’s maintenance time again, says my internet service provider. I don’t know what they’re doing really, but it seems they have to fix something like every other month now. Well, whatever. The time frame is the same as last time, but instead of several short offline periods, we should be expecting a single longer one this time. So:

Wed, 2016-06-15, 01:00 am – 06:00 am UTC+1: A downtime of ~2h duration is to be expected within this time window!

When I’m ready to replace the breaking hard drive, I’ll let you know in advance as well. Maybe. Depends on my mood. :roll:

Apr 29 2016

And here we go again… Wednesdays. Well, on Wednesday, 2016-05-04 my internet service provider will have to do some undefined maintenance again, and that’s supposed to be happening between 01:00 and 06:00 am. According to my provider UPC, there will be several short periods of unavailability of service, each about 15 minutes long. Of course they’ll do everything as quickly and properly as possible so as not to interrupt their customers’ services too much – or so they say. So, once again:

Wed, 2016-05-04, 01:00 am – 06:00 am UTC+1 – Expect several short downtimes in that time frame!

Oh, and in case anyone noticed (nah, I know you didn’t…), sorry for that unannounced downtime on Thu, 2016-04-21. Should’ve been roughly from 03:00 pm – 07:00 pm UTC+1. This was done for two reasons:

  1. The WinSSL layer had broken down permanently (LsaSrv) due to a bug. When that happens, only a reboot can fix it. There is only one service relying on WinSSL instead of OpenSSL or GnuTLS, but it’s an important one that I do not wish to operate without a cryptography option, as it’s endangering the privacy of my users.
  2. The OS hard drive had shown signs of sectors becoming defective. This called for an analysis to see what areas of the file system had taken damage. Luckily, it only got some $I30 index files (deleted files still being referenced) and one unimportant IPCache.dat, which is just a local DNS cache file of a single service. Also, if the state turned out to be recoverable, a full backup was in order.

So, the file system errors have been repaired, and defective blocks have been marked, restoring proper, consistent operation. Since the drive can no longer be considered reliable enough, I decided to take the extra time to create a fresh full backup / system image of the operating system’s and services’ current state, so when “C:” does die, I can restore without too much hassle. The last full backup was too old anyway, with a lot of important stuff already missing by now.

Got any flawlessly working U160/U320 68-pin SCSI drives >=36GB that you could part with? ;) If so, post in the comments, let me know how much you want for it. ;) And maybe don’t try to post in the time noted above. :)

Oct 01 2015

And here we have another one: Next Wednesday on the 7th of October, starting from midnight UTC+1 and lasting until 06:00am UTC+1, there will be network outages that may make all my services – this weblog included – unavailable. I’m guessing that the line will be down only for minutes, at most half an hour like last time, but you never know with my internet service provider. Could very well be down for hours. So if you can’t reach any XIN services on 2015-10-07, 00:00am – 06:00am UTC+1, that’s why. As usual, the company did not give any details as to the nature of the maintenance work being done on that day. So there you have it.

And in case you’re wondering about the lack of posts in recent weeks*: The Anime madness is still ongoing, and I’m still not tired of it. Heh, maybe I’ll just post some Anime-related stuff next? It’s not like my RAID-6 storage project is really coming along due to lack of hard drives anyway. Maybe during Christmas (It’s not like that wasn’t planned for last year’s Christmas, uhmm…).

*I’m still pretending that somebody actually reads this!

Feb 06 2015

Network (logo)[1]

Everybody hates servers going offline. Especially email servers. Or web servers. Or MY SERVER! Now I prepared for a lot of things with my home server, I prepared for power failures, storage failures, operating system kernel crashes, everything. I thought I could recover from almost any possible breakdown even remotely, all but one: My four bonded G.SHDSL lines all failing at once. Which is what just happened. After lots of calls and even a replacement Paradyne/Zhone SNE2040G-S network extender having been brought to me within the time allowed by my SLA, all four lines still remained dark.

Now, today the telecommunication company which is responsible for the national network fixed the issue in the local automatic exchange. I tried to find out what had happened exactly, but ran into walls there. My Internet provider UPC got no feedback from the telecommunication company A1 either, or at least nothing besides “it’s been fixed at the digital exchange”. Plus, I am not exactly an A1 customer, so they won’t answer me directly. The stack is: UPC (internet provider) <=> Kapsch (field technicians handling UPC branded Internet access hardware, via outsourcing by UPC) <=> A1 (field technicians regarding the whole telecommunications infrastructure), while UPC may also communicate with A1 directly to handle outages. Communication seems to be kept to a minimum though. :(

Bad thing is, for a “business class” line, an outage of almost two days or 47 hours is a bit extreme. In such a case, more efficient communication could easily fix it faster. But it is what it is, I guess. And now I have to send one of the two Paradyne/Zhone G.SHDSL extenders back to UPC, this little bugger here:

The actively cooled Zhone SNE2040G-S G.SHDSL extender

There is actually a HSDPA (3G) fallback option, which works by implementing an OSI layer 2 coupling between the G.SHDSL line and the 3G access, keeping all IP addresses and domains the same and the services reachable during times of complete DSL failure. But I won’t order that upgrade, because it’s a steep 39€ before tax per month, or 46.80€ after tax. That’s just too expensive on top of what that connection’s already draining from my wallet.

All in all, this greatly endangers my usual, self-imposed yearly service availability of >=99%. 47 hours is a lot after all. So to maintain 99%, the server cannot go offline for more than 3 days, 15 hours and 36 minutes per regular year, and now I already have 1 day and 23 hours on the clock, and it’s just the beginning of the year! Let’s hope it runs more smoothly for the rest of 2015.
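The downtime budget above is easy to reproduce. Here is a small Python sketch, assuming a regular 365-day year like in the text; the function name `downtime_budget` is just something I made up for illustration.

```python
from datetime import timedelta


def downtime_budget(target_percent, period=timedelta(days=365)):
    """Return the maximum downtime allowed for a given availability target."""
    return period * (100 - target_percent) / 100


budget = downtime_budget(99)        # the self-imposed >=99% target
used = timedelta(hours=47)          # the outage described in this post
remaining = budget - used

print(budget)     # 3 days, 15:36:00
print(used)       # 1 day, 23:00:00
print(remaining)  # 1 day, 16:36:00
```

So after this one outage, a bit under 41 hours of allowed downtime would be left for the rest of the year.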

[1] Logo image is © Kyle Wickert, Do You Really Understand The Applications Flowing Through Your Network?

May 15 2012

Recently a guy from my internet service provider UPC wrote me an eMail, introducing himself as my new personal contact person there. Personal support is one of the few good things you actually get when using a business-level internet line. Well, since my contract has already run out and I am no longer legally bound to stay in it, it’s about time to re-evaluate the situation. So I decided to write the guy a nice eMail, basically asking for a free upgrade to double the bandwidth. ;) While that might sound a bit too cocky, this has already been done twice, so I really hope it can be done again.

It’s about time anyway. The current 4/4 Mbit are somewhat ok, but since most consumer lines are so much faster I’m starting to feel the pain again. Problem is, if you’re in a more rural area (where rural means “not within a state capital”), upstream bandwidth is extremely expensive, especially when you’re limited to ADSL/SHDSL and have no other options like fibre or cable available.

Well, I should receive more information about my options this morning, probably including an offer for 8/8 Mbit. We’ll see if that works out or not. The unfortunate thing is, since 3.5G and LTE are pushing conventional land lines off the map in “rural areas” and most companies are moving their hosting services to housing centers in bigger cities, it’s beginning to look more and more grim for symmetric high-bandwidth low-latency land lines outside of the big cities. Well, we’ll see…

Apr 05 2012

And here we go again. Today, the internet was down. Well, at least if you happened to be in a certain state of Austria and happened to be hooked up to a certain Telekom backbone line. Which my server just so happened to be, together with a few thousand other households and businesses. The downtime started roughly at 01:00pm and lasted for about an hour, or maybe one and a half. I was a bit on edge because of this, since it’s hard to tell whether it’s just the WAN access being down, or my actual server going up in flames. The problem is that I can’t just ping my SDSL/SHDSL extender to see which one is the case, because the IP of that little metal box is being mapped through in a DMZ style for all protocols, all the way from god knows where, maybe even the VIX backbone, but at least from the state backbone.

So, even if the entire town is offline, I can still ping that IP no matter if the DSLAM is a mile away, or actually 50 miles away. Weird stuff. But that’s the way it is. So while I was still wondering whether I should mount my metal horse and ride home at full speed to rescue my server – possibly in vain – all those little lights started coming back online again; IRC bots, mail services, web, blah. What a relief, the IBM monstrosity at home was still alive, as usual. Mostly. I should think of some preferably free way to verify the health of that machine, without having to rely on its actual WAN link… Mmmh… maybe another day.

Feb 15 2012

My Internet provider UPC just informed me that there will be some undefined maintenance work going on on the 29th of February, that’s exactly two weeks from now. According to the announcement, the downtime might range from midnight to 06:00 AM. That means that all services including this website may be down between 00:00 AM – 06:00 AM on 2012-02-29. I hope we can expect everything to just work as expected after the maintenance is done. But you never know with Internet providers, not even when running a business-class line, so better be prepared for anything.

Services that will be down include, but are not limited to the web server, mail server, FTP server, IRC server and other minor services like time serving etc. Since the work is being done at night, chances are that only very few people will even notice this. Let’s hope that it goes that way!