Nov 08 2016

A G.SHDSL extender failure… and it wasn’t even my fault! Can you believe it?! Probably not if you know me, but it’s true nonetheless… After almost 4 days of downtime, we’ve been back up for about 2½ hours now. Given that I already had to do maintenance on the server once this year (replacing a bad hard drive, doing a thorough cleaning and installing dust filters), this has crushed the 99%+ yearly availability I was so proud of. So for the first time since 2006, XIN.at has failed to meet my personal requirement in that regard. Including the server maintenance and several regular ISP maintenance windows on the G.SHDSL line, total downtime should now amount to roughly 90 hours in 2016. Assuming 8760 hours per year, I’m now down to an availability of ~98.97%.
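The availability figure above is simply the share of the year that wasn’t downtime. As a rough sketch (the function name `availability_percent` is made up for illustration, and the 8760-hour default is the regular-year assumption from the text), the calculation looks like this:

```python
# Hypothetical helper: yearly availability from accumulated downtime.
# Assumes a regular (non-leap) year of 8760 hours unless told otherwise.

def availability_percent(downtime_hours: float, hours_per_year: float = 8760) -> float:
    """Return the share of the year the server was reachable, in percent."""
    if not 0 <= downtime_hours <= hours_per_year:
        raise ValueError("downtime must lie between 0 and the length of the year")
    return 100.0 * (1.0 - downtime_hours / hours_per_year)


if __name__ == "__main__":
    # ~90 hours of downtime so far in 2016
    print(f"{availability_percent(90):.2f} %")        # 98.97 %
    # 2016 is a leap year, so 8784 hours would be slightly more accurate
    print(f"{availability_percent(90, 8784):.2f} %")  # 98.98 %
```

Since 2016 actually is a leap year, plugging in 8784 hours nudges the result up to about 98.98%, which doesn’t change the conclusion much.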

That value might get a bit worse though if my ISP decides to do another few rounds of maintenance on the DSLAMs in the automatic exchange hub.

So, how did this happen?

It all began when my RAID-6 started acting up; the one in my workstation though, not the one in the server. Ok, I know, that’s entirely unrelated, but still. It died a pretty ugly death right there last Friday. And once again (this happened before!) it wasn’t the disks to blame, nor the controller, nor the FBM, not even the hotplug bay that I suspected because all disk failures were happening in the same bay. It was the power cable extensions. Again. Even though they’re brand new! I mean, what the hell. At least I now know that an Areca controller can force RAID-6 arrays back to life even after they’ve completely failed with 3+ disks down. Nice one, Areca, I’ll have a cold one in your honor!

And when that RAID was back up, I wanted to pull up my rolling shutters a bit, just because. Which is when the belt ripped in half and the shutters went crashing down, damning me to darkness. Ok, after that I had a beer and just went to bed. Not my day. Next day I did some makeshift repairs on the shutters so they would at least be rolled all the way up and stay there. Having 0% daylight at 09:00am is pretty depressing after all. Ok, after that was done (it was Saturday now), I sat back down in my chair and thought: “Ok, let’s just read my emails…”.

And then my G.SHDSL extender burned up, sending me, my email client, my server and the rest of my digital existence offline…

And that’s when I just knew I had to get up, drive to the supermarket and get a TON of beer!

Seriously… There is bad luck and then there is…

Bad luck never comes alone!

When it rains, it pours, they say

So, the thing just went dark from one moment to the next! No fan, no LEDs, no nothing. At first I thought it might be its external power supply, some standard 12V DC unit. But I measured the voltage and it was perfectly fine. So the extender itself was obviously dead. Never seen such a thing happen with Paradyne/Zhone hardware, but what can you do. So here’s the new one (or maybe it’s refurbished, you never know with this stuff):

Paradyne/Zhone SNE2040G G.SHDSL network extender

Now all that’s left is to send the defective unit back and that’s that. I hope I won’t see anything like that happen again… :( At least I got them on the phone on Saturday (business level support), but I only have the small service level agreement with my current contract, so I couldn’t get a technician on weekends. And I wasn’t available “on-site” (at home) on Monday, so the replacement unit had to be shipped via parcel service.

Oh, and neither the 3G fallback solution nor the large SLA (full 24/7 on-site support) will ever be signed up for XIN.at – too expensive at ~40€ a month. :( There’s only so much money I can pour into a free server, after all.

At least everything is back up now, so cheers! Prost!

Jun 09 2016

XIN.at is going offline for a while again. And no, it’s not because of that failing drive in the server, but because another Wednesday’s coming up, and it’s maintenance time again, says my internet service provider. I don’t really know what they’re doing, but it seems they have to fix something about every other month now. Well, whatever. The time frame is the same as last time, but instead of several short offline periods, we should expect a single longer one this time. So:

Wed, 2016-06-15, 01:00 am – 06:00 am UTC+1: A downtime of ~2h duration is to be expected within this time window!

When I’m ready to replace the breaking hard drive, I’ll let you know in advance as well. Maybe. Depends on my mood. :roll:

Nov 20 2015

Lately, there sure are a lot of maintenance windows scheduled for the DSL infrastructure in my town. So here’s another one: next Wednesday (it seems it’s always Wednesdays), 2015-11-25, my server and all of its services, including this weblog, may go offline between 00:00 and 06:00 UTC+1 for an unspecified length of time. As usual, my ISP says “we’re going to work as fast as possible”, but you never know, right? So there you have it. Last time it lasted about an hour if I remember correctly, and the time before that about as long. So I guess this downtime will be in the same ballpark. Let’s keep our fingers crossed and hope they don’t hook me up to the wrong DSLAM again…

Jul 02 2015

It seems that two months after the last maintenance in May, another one needs to be done on 2015-07-08, roughly between 00:00 AM and 06:00 AM UTC+1. Again, this means that all XIN.at services, this weblog included, may see some downtime during this period. So no email server, no web server, no anything. Typically these maintenance windows only put me offline for short periods of time, but who knows what UPC’s gonna do. Just so you know.

Feb 06 2015

Network logo[1]

Everybody hates servers going offline. Especially email servers. Or web servers. Or MY SERVER! I prepared for a lot of things with my home server: power failures, storage failures, operating system kernel crashes, everything. I thought I could recover from almost any possible breakdown, even remotely, all but one: my four bonded G.SHDSL lines all failing at once. Which is exactly what just happened. After lots of calls, and even after a replacement Paradyne/Zhone SNE2040G-S network extender had been brought to me within the time allowed by my SLA, all four lines still remained dark.

Today, the telecommunication company responsible for the national network fixed the issue in the local automatic exchange. I tried to find out what exactly had happened, but ran into walls there. My Internet provider UPC got no feedback from the telecommunication company A1 either, or at least nothing besides “it’s been fixed at the digital exchange”. Plus, since I’m not actually an A1 customer, they won’t answer me directly. The stack is: UPC (internet provider) <=> Kapsch (field technicians handling UPC branded Internet access hardware, outsourced by UPC) <=> A1 (field technicians for the whole telecommunications infrastructure), while UPC may also communicate with A1 directly to handle outages. Communication seems to be kept to a minimum though. :(

The bad thing is, for a “business class” line, an outage of almost two days (47 hours) is a bit extreme. More efficient communication could easily have gotten it fixed faster. But it is what it is, I guess. And now I have to send one of the two Paradyne/Zhone G.SHDSL extenders back to UPC, this little bugger here:

The actively cooled Zhone SNE2040G-S G.SHDSL extender

There is actually an HSDPA (3G) fallback option, which works by implementing an OSI layer 2 coupling between the G.SHDSL line and the 3G access, keeping all IP addresses and domains the same and the services reachable during times of complete DSL failure. But I won’t order that upgrade, because it’s a steep 39€ before tax per month, or 46.80€ after tax. That’s just too expensive on top of what that connection’s already draining from my wallet.

All in all, this greatly endangers my usual, self-imposed yearly service availability of >=99%. 47 hours is a lot after all. So to maintain 99%, the server cannot go offline for more than 3 days, 15 hours and 36 minutes per regular year, and now I already have 1 day and 23 hours on the clock, and it’s just the beginning of the year! Let’s hope it runs more smoothly for the rest of 2015.
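The 3 days, 15 hours and 36 minutes above follow from 1% of 8760 hours being 87.6 hours. Here’s a small sketch that does the same math in whole minutes to sidestep floating-point rounding; the helper names are hypothetical, and the target is assumed to be a whole percentage:

```python
# Hypothetical helpers: yearly downtime budget for an availability target,
# computed in whole minutes so the arithmetic stays exact.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(target_percent: int, hours_per_year: int = 8760) -> int:
    """Minutes per year a server may be offline and still meet target_percent."""
    total_minutes = hours_per_year * 60
    return total_minutes * (100 - target_percent) // 100


def as_days_hours_minutes(minutes: int) -> tuple[int, int, int]:
    """Split a number of minutes into (days, hours, minutes)."""
    days, rest = divmod(minutes, MINUTES_PER_DAY)
    hours, mins = divmod(rest, 60)
    return days, hours, mins


if __name__ == "__main__":
    budget = downtime_budget_minutes(99)         # 5256 minutes
    print(as_days_hours_minutes(budget))         # (3, 15, 36)
    used = 47 * 60                               # the 47-hour outage
    print(as_days_hours_minutes(budget - used))  # (1, 16, 36)
```

So after those 47 hours, about 1 day, 16 hours and 36 minutes of budget would be left for the rest of 2015.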

[1] Logo image © Kyle Wickert, from “Do You Really Understand The Applications Flowing Through Your Network?”

Feb 15 2012

My Internet provider UPC just informed me that there will be some unspecified maintenance work on the 29th of February, exactly two weeks from now. According to the announcement, the downtime window runs from midnight to 06:00 AM. This means that all XIN.at services, including this website, will be down between 00:00 AM and 06:00 AM on 2012-02-29. I hope everything will just work as expected once the maintenance is done. But you never know with Internet providers, not even on a business-class line, so better be prepared for anything.

Services that will be down include, but are not limited to, the XIN.at web server, mail server, FTP server, IRC server and other minor services like time serving. Since the work is being done at night, chances are that very few people will even notice. Let’s hope it goes that way!