Nov 242016
 

Broken Windows logo[1] I know what I should do if a system service on Microsoft Windows starts crashing of course; Fixing it is the way to go! But sometimes you simply can’t, because the component causing a certain instability can’t be swapped out or updated. Now Windows services do have a mechanism for monitoring and restarting a service upon failure, but it seems that only works if the system gets an actual error code back from the service upon termination. But it doesn’t seem to work (at least for me) if the service just dies abnormally. Windows recognizes the service has stopped somehow of course, but the restart procedure just doesn’t kick in.

So I thought I’d do it myself, programmatically. And it’s actually pretty easy. I solved this with VBScript, Windows Batch and Mark Russinovichs’ pslist plus grep. So the prerequisites are:

  • Microsoft Windows (well, huh..)
  • MS Windows Script(ing) Host / VBScript, Windows should come with this preinstalled since Windows 2000.
  • [pslist]
  • [grep][src] (grep is optional, I used GNU grep 2.5.4 in this case, licensed under the [GPLv3+])

Make sure the pstools and grep are within your %PATH%, so Windows can find those .exe files. If you don’t want to use grep, you can also use Microsofts’ own find command, if your version of Windows has it.

I divided this into two small scripts. Since the main part is Batch, it might be problematic if you run it at very short intervals, checking for the services’ status, because you get a command window popping up on the desktop. Since most users wouldn’t want that, another script acts as a launcher, hiding the cmd.exe window so it’s run fully in the background without disturbing any potential users or administrators. The launcher looks like this, in my case it’s meant to watch over an Apache web server:

  1. Set WshShell = CreateObject("WScript.Shell")
  2. WshShell.Run chr(34) & "C:\Server\Scripts\monitor-httpd.bat" & Chr(34), 0
  3. Set WshShell = Nothing

And that script C:\Server\Scripts\monitor-httpd.bat we’re launching looks like this:

  1. @ECHO OFF
  2. FOR /F "tokens=* delims= usebackq" %%I IN (`pslist ^| grep httpd`) DO SET HTTPDSTATUS=%%I
  3. IF NOT DEFINED HTTPDSTATUS (net start "Apache2.2") ELSE (SET HTTPDSTATUS=)

A version relying on Microsoft find instead of GNU grep could look like this:

  1. @ECHO OFF
  2. FOR /F "tokens=* delims= usebackq" %%I IN (`pslist ^| find /I "httpd"`) DO SET HTTPDSTATUS=%%I
  3. IF NOT DEFINED HTTPDSTATUS (net start "Apache2.2") ELSE (SET HTTPDSTATUS=)

To get a services’ exact name, just launch services.msc from Start \ Run or run the command net start on a cmd terminal.

As you can see, this greps “httpd” from the process list and pushes its output into %%I and finally into %HTTPDSTATUS%. We have to use a FOR /F for that, as Windows has no way of pushing command outputs from subshells into shell variables like UNIX has (like e.g. var=`command` or var=$(command)). Then we check for the status of that variable. If it’s not defined, then the process http.exe was nowhere to be found! In that case we restart the associated system service (needs proper permissions!). If the variable is defined, we do nothing but unsetting it, since we can assume the service is operating normally. Or at the very least it’s running. ;)

You can automate that by using the Windows task scheduler:

Scheduling an Apache web server "watchdog"

Scheduling an Apache web server “watchdog” (German Windows)

Create a Schedule to your liking and you’re done! If you can afford the affected service to be down for 5 minutes and no longer, just run it every 4 minutes or so.

The solution shown above can easily be adapted to monitor and restart any Windows service you have, as long as the service isn’t fundamentally broken so that it wouldn’t even start up anymore. Also, you can do a lot more, like sending notification eMails with a command line mailer like [blat] when crashes do occur. Of course, this is only useful for services that crash rarely. If it dies every few minutes, you should reaaally fix it instead of just pushing the restart button all the time… ;)

And that’s that!

[1] © Mar.0007. Original Version for desktopwallpapers4.me.

Nov 222016
 

FreeBSD IBM ServeRAID Manager logoAnd yet another FreeBSD-related post: After [updating] the IBM ServeRAID manager on my old Windows 2000 server I wanted to run the management software on any possible client. Given it’s Java stuff, that shouldn’t be too hard, right? Turned out not to be too easy either. Just copying the .jar file over to Linux and UNIX and running it like $ java -jar RaidMan.jar wouldn’t do the trick. Got nothing but some exception I didn’t understand. I wanted to have it work on XP x64 (easy, just use the installer) and Linux (also easy) as well as FreeBSD. But there is no version for FreeBSD?!

The ServeRAID v9.30.21 manager only supports the following operating systems:

  • SCO OpenServer 5 & 6
  • SCO Unixware 7.1.3 & 7.1.4
  • Oracle Solaris 10
  • Novell NetWare 6.5
  • Linux (only certain older distributions)
  • Windows (2000 or newer)

I started by installing the Linux version on my CentOS 6.8 machine. It does come with some platform-specific libraries as well, but those are for running the actual RAID controller management agent for interfacing with the driver on the machine running the ServeRAID controller. But I only needed the user space client program, which is 100% Java stuff. All I needed was the proper invocation to run it! By studying IBMs RaidMan.sh, I came up with a very simple way of launching the manager on FreeBSD by using this script I called serveraid.sh (Java is required naturally):

  1. #!/bin/sh
  2.  
  3. # ServeRAID Manager launcher script for FreeBSD UNIX
  4. # written by GAT. http://www.xin.at/archives/3967
  5. # Requirements: An X11 environment and java/openjdk8-jre
  6.  
  7. curDir="$(pwd)"
  8. baseDir="$(dirname $0)/"
  9.  
  10. mkdir ~/.serveraid 2>/dev/null
  11. cd ~/.serveraid/
  12.  
  13. java -Xms64m -Xmx128m -cp "$baseDir"RaidMan.jar com.ibm.sysmgt.raidmgr.mgtGUI.Launch \
  14. -jar "$baseDir"RaidMan.jar $* < /dev/null >> RaidMan_StartUp.log 2>&1
  15.  
  16. mv ~/RaidAgnt.pps ~/RaidGUI.pps ~/.serveraid/
  17. cd "$curDir"

Now with that you probably still can’t run everything locally (=in a FreeBSD machine with ServeRAID SCSI controller) because of the Linux libraries. I haven’t tried running those components on linuxulator, nor do I care for that. But what I can do is to launch the ServeRAID manager and connect to a remote agent running on Linux or Windows or whatever is supported.

Now since this server/client stuff probably isn’t secure at all (no SSL/TLS I think), I’m running this through an SSH tunnel. However, the Manager refuses to connect to a local port because “localhost” and “127.0.0.1” make it think you want to connect to an actual local RAID controller. It would refuse to add such a host, because an undeleteable “local machine” is always already set up to begin with, and that one won’t work with an SSH tunnel as it’s probably not running over TCP/IP. This can be circumvented easily though!

Open /etc/hosts as root and enter an additional fantasy host name for 127.0.0.1. I did it like that with “xin”:

::1			localhost localhost.my.domain xin
127.0.0.1		localhost localhost.my.domain xin

Now I had a new host “xin” that the ServeRAID manager wouldn’t complain about. Now set up the SSH tunnel to the target machine, I put that part into a script /usr/local/sbin/serveraidtunnel.sh. Here’s an example, 34571 is the ServeRAID agents’ default TCP listen port, 10.20.15.1 shall be the LAN IP of our remote machine hosting the ServeRAID array:

#!/bin/bash
ssh -fN -p22 -L34571:10.20.15.1:34571 mysshuser@www.myserver.com

You’d also need to replace “mysshuser” with your user name on the remote machine, and “www.myserver.com” with the Internet host name of the server via which you can access the ServeRAID machine. Might be the same machine or a port forward to some box within the remote LAN.

Now you can open the ServeRAID manager and connect to the made-up host “xin” (or whichever name you chose), piping traffic to and from the ServeRAID manager through a strongly encrypted SSH tunnel:

IBM ServeRAID Manager on FreeBSD

It even detects the local systems’ operating system “FreeBSD” correctly!

And:

IBM ServeRAID Manager on FreeBSD

Accessing a remote Windows 2000 server with a ServeRAID II controller through an SSH tunnel, coming from FreeBSD 11.0 UNIX

IBM should’ve just given people the RaidMan.jar file with a few launcher scripts to be able to run it on any operating system with a Java runtime environment, whether Windows, or some obscure UNIX flavor or something else entirely, just for the client side. Well, as it stands, it ain’t as straight-forward as it may be on Linux or Windows, but this FreeBSD solution should work similarly on other systems as well, like e.g. Apple MacOS X or HP-UX and others. I tested this with the Sun JRE 1.6.0_32, Oracle JRE 1.8.0_112 and OpenJDK 1.8.0_102 for now, and even though it was originally built for Java 1.4.2, it still works just fine.

Actually, it works even better than with the original JRE bundled with RaidMan.jar, at least on MS Windows (no more GUI glitches).

And for the easy way, here’s the [package]! Unpack it wherever you like, maybe in /usr/local/. On FreeBSD, you need [archivers/p7zip] to unpack it and a preferably modern Java version, like [java/openjdk8-jre], as well as X11 to run the GUI. For easy binary installation: # pkg install p7zip openjdk8-jre. To run the manager, you don’t need any root privileges, you can execute it as a normal user, maybe like this:

$ /usr/local/RaidMan/serveraid.sh

Please note that my script will create your ServeRAID configuration in ~/.serveraid/, so if you want to run it as a different user or on a different machine later on, you should recursively copy that directory to the new user/machine. That’ll retain the local client configuration.

That should do it! :)

Nov 212016
 

IBM ServeRAID Manager logoBelieve it or not, the server hosting the very web site you’re reading right now has all of its data stored on an ancient IBM ServeRAID II array made in the year 1995. That makes the SCSI RAID-5 controller 21 years old, and the 9.1GB SCA drives attached to it via hot-plug bays are from 1999, so 17 years old. Recently, I found out that IBMs’ latest SCSI ServeRAID manager from 2011 still supports that ancient controller as well as the almost equally ancient Windows 2000 Server I’m running on the machine. In hope for better management functionality, I chose to give the new software a try. So additionally to my antiquated NT4 ServeRAID manager v2.23.3 I’d also run v9.30.21 side-by-side! This is also in preparation for a potential upgrade to a much newer ServeRAID-4H and larger SCSI drives.

Just so you know how the old v2.23.3 looks, here it is:

IBM ServeRAID Manager v2.23.3

IBM ServeRAID Manager v2.23.3

It really looks like 1996-1997 software? It can do the most important tasks, but there are two major drawbacks:

  1. It can’t notify me of any problems via eMail
  2. It’s a purely standalone software, meaning no server/client architecture => I have to log in via KVM-over-IP or SSH+VNC to manage it

So my hope was that the new software would have a server part and a detachable client component as well as the ability to send eMails whenever shit happens. However, when first launching the new ServeRAID manager, I was greeted with this:

ServeRAID Manager v9.30.21 GUI failure

Now this doesn’t look right… (click to enlarge)

Note that this was my attempt to run the software on Windows XP x64. On Windows 2000, it looked a bit better, but still somewhat messed up. Certain GUI elements would pop up upon mouseover, but overall, the program just wasn’t usable. After finding out that this is Java software being executed by a bundled and ancient version of Sun Java (v1.4.2_12), i just tried to run the RaidMan.jar file with my platform Java. On XP x64 that’s the latest and greatest Java 1.8u112 (even though the installer says it needs a newer operating system this seems to work just fine) and on Windows 2000 it’s the latest supported on that OS: Java 1.6u31. To make RaidMan.jar run on a different JRE on Windows, you can just alter the shortcut the installer creates for you:

Changing the JRE that ServeRAID Manager should be executed by

Changing the JRE that ServeRAID Manager should be executed by

Here it’s run by the javaw.exe command that an old JDK 1.7.0 installer created in %WINDIR%\system32\. It was only later that I changed it to 1.8u112. After changing the JRE to a more modern one, everything magically works:

ServeRAID Manager v9.30.21, logged in

ServeRAID Manager v9.30.21, remotely logged in to my server (click to enlarge)

And this is already me having launched the Manager component on a different machine on my LAN, connecting to the ServeRAID agent service running on my server. So that part works. Since this software also runs on Linux and FreeBSD UNIX, I can set up a proper SSH tunnel script to access it remotely and securely from the outside world as well. Yay! Clicking on the controller gave me this:

ServeRAID Manager v9.30.21 array overview

Array overview (click to enlarge)

Ok, this reminds me of Adaptecs’/ICPs’ StorMan, and since there is some Adaptec license included on the IBM Application CD that this version came from, it might very well be practically the same software. It does show warnings on all drives, while the array and volume are “ok”. The warnings are pretty negligible though, as you can already see above, let’s have a more detailed look:

ServeRAID Manager v9.30.21 disk warranty warnings

So I have possible non-warranted drives? No shit, sherlock! Most of them are older than the majority of todays’ Internet users… I still don’t get how 12 of these drives are still running, seriously… (click to enlarge)

So that’s not really an issue. But what about eMail notifications? Well, take a look:

ServeRAID Manager v9.30.21 notification options

It’s there! (click to enlarge)

Yes! It can notify to the desktop, to the system log and to various email recipients. Also, you can choose who gets which mails by selecting different log levels for different recipients. The only downside is, that the ServeRAID manager doesn’t allow for SSL/TLS connections to mail servers and it can’t even provide any login data. As such, you need your own eMail server on your local network, that allows for unauthenticated and unencrypted SMTP access from the IP of your ServeRAID machine. In my case, no problem, so I can now get eMail notifications to my home and work addresses, as well as an SMS by using my 3G providers’ eMail-2-SMS gateway!

On top of that, you can of course check out disk and controller status as well:

ServeRAID Manager v9.30.21 disk status

Disk status – not much to see here at all (on none of the tabs), probably because the old ServeRAID II can’t do S.M.A.R.T. Maybe good that it can’t, I don’t really want to see 17 year old hard drives’ S.M.A.R.T. logs anyway. ;)

 

ServeRAID Manager v9.30.21 controller status

Status of my ServeRAID II controller, no battery backup unit attached for the 4MB EDO-DRAM write cache and no temperature sensors present, so not much to see here either.

Now there is only one problem with this and that is that the new ServeRAID agent service consumes quite a lot of CPU power in the background, showing as 100% peaks on a single CPU core every few seconds. This is clearly visible in my web-based monitoring setup:

ServeRAID Manager v9.30.21 agent CPU load

The background service is a bit too CPU hungry for my taste (Pentium Pro™ 200MHz). The part left of the “hole” is before installation, the part right of it after installation.

And in case you’re wondering what that hole is right between about 20:30 and 22:00, that’s the ServeRAID Managers’ SNMP components which killed my Microsoft SNMP services upon installation. My network and CPU monitoring solution is based on SNMP though, so that was not good. Luckily, just restarting the SNMP services fixed it. However, as you can see, one of the slow 200MHz cores is now under much higher load. I don’t like that because I’m short on CPU power all the time anyway, but I’ll leave it alone for now, let’s see how it goes.

ServeRAID Manager v9.30.21 splash screen

“Fast configuration”, but a pretty slow background service… :roll:

Now all I need to get is a large pack of large SCA SCSI drives, since I still have that much faster [ServeRAID 4H] with 128MB SDRAM cache and BBU lying around for 3 years anyway! Ah, and as always, the motivation to actually upgrade the server. ;)

Edit: It turns out I found the main culprit for the high CPU load. It seems to be IBMs’ [SNMP sub-agent component] after all, the one that also caused my SNMP service to shut down upon installation. Uninstalling the ServeRAID Manager v9.30.21 and reinstalling it with the SNMP component deselected resulted in a different load profile. See the following graph, the vertical red line separates the state before (with SNMP sub-agent) from the state after (without SNMP sub-agent). Take a look at the magenta line depicting the CPU core that the RAID service was bound to:

ServeRAID Manager v9.30.21 with reduced CPU load

Disabling the ServeRAID managers’ SNMP sub-agent lowers the CPU load significantly!

Thanks fly out to [these guys at Ars Technica] for giving me the right idea!

Sep 072016
 

TeamViewer on Linux logoI’m not exactly a big fan of TeamViewer, since you’ll never know what’s going to happen with that traffic of yours, so I prefer VNC over SSH instead. A few weeks ago I got TeamViewer access to a remote workstation machine for the purpose of processing A/V files however. Basically, it was about video and audio transcoding on said machine.

Since the stream meta data (like the language of an audio stream) wasn’t always there, I wanted to check it by playing back the files remotely in foobar2000 or MPC-HC. TeamViewer does offer a feature to relay the audio from a remote machine to your local box, as long as the remote server has some kind of soundcard / sound chip installed. I was using TeamViewer 11 – the newest version at the time of writing – to connect from CentOS 6.8 Linux to a Windows 7 Professional machine. Playing back audio yielded nothing but silence though.

Now, TeamViewer is actually not native Linux software. Both its Linux and MacOS X versions come with a bundled Wine 1.6 distribution preconfigured to run the 32-bit TeamViewer Windows binary. It was thus logical to assume that the configuration of TeamViewers’ built-in Wine was broken. This may happen in cases where you upgrade TeamViewer from previous releases (which is what I had done, 7 -> 8 -> 9 -> 11).

There are a multitude of proposed solutions to fix this, and since none of them worked for me as-is, I’d like to add my own to the mix. The first useful hint came from [here]. You absolutely need a working system-wide Wine setup for this. I already had one that I needed for work anyway, namely Wine 1.8.6 from the [EPEL] repository, configured using [winetricks]. We’re going to take some files from that installation and essentially replace TeamViewers’ own Wine with the one distributed by EPEL.

So I had TeamViewer 11 installed in /opt/teamviewer/ and some important configuration files for it in ~/.local/share/teamviewer11/ and ~/.config/teamviewer/. First, we backup the wine files of TeamViewer and replace them with the platform ones (the paths may vary depending on your Linux distribution, but the file names should not):

# mv /opt/teamviewer/tv_bin/wine/bin/wine /opt/teamviewer/tv_bin/wine/bin/wine.BACKUP
# mv /opt/teamviewer/tv_bin/wine/bin/wineserver /opt/teamviewer/tv_bin/wine/bin/wineserver.BACKUP
# mv /opt/teamviewer/tv_bin/wine/bin/wine-preloader /opt/teamviewer/tv_bin/wine/bin/wine-preloader.BACKUP
# mv /opt/teamviewer/tv_bin/wine/lib/libwine.so.1.0 /opt/teamviewer/tv_bin/wine/lib/libwine.so.1.0.BACKUP
# mv /opt/teamviewer/tv_bin/wine/lib/wine/ /opt/teamviewer/tv_bin/wine/lib/wine.BACKUP/
# cp /usr/bin/wine /usr/bin/wineserver /usr/bin/wine-preloader /opt/teamviewer/tv_bin/wine/bin/
# cp /usr/lib/libwine.so.1.0 /opt/teamviewer/tv_bin/wine/lib/
# cp -r /usr/lib/wine/ /opt/teamviewer/tv_bin/wine/lib/

This will replace all the binaries and libraries, in my case shoving Wine 1.8.6 underneath TeamViewer. This isn’t all that’s needed however. We’ll also need the system registry hive of your working Wine installation (with sound). That should be stored in ~/.wine/system.reg! Let’s replace TeamViewers’ own hive with this one:

$ mv ~/.local/share/teamviewer11/system.reg ~/.local/share/teamviewer11/system.reg.BACKUP
$ cp ~/.wine/system.reg ~/.local/share/teamviewer11/

Ok, and the final part is adding the proper Linux audio backend to this Wines’ configuration. That part is stored in ~/.wine/user.reg. Replacing the whole file didn’t work for me though, as TeamViewer would crash upon launch, probably missing some keys from its own user.reg. So, let’s just edit its file instead, open ~/.local/share/teamviewer11/system.reg with your favorite text editor and add the following line in a proper location (it’s sorted alphabetically):

[Software\\Wine\\Drivers\\winepulse.drv] 1473239241

The corresponding file should be found within TeamViewers’ replaced Wine distribution now by the way, in my case it’s /opt/teamviewer/tv_bin/wine/lib/wine/fakedlls/winepulse.drv.

Now, run the TeamViewer profile updater (Some people say it’s required to make this work, it wasn’t for me, but it didn’t hurt either): $ /opt/teamviewer/tv_bin/TeamViewer --update-profile and then its’ Wine configuration: $ /opt/teamviewer/tv_bin/TeamViewer --winecfg. After that, you should be greeted with this:

TeamViewer 11 running its own winecfg

TeamViewer 11 running its own copy of winecfg.

Before the modifications, the configuration window would show “None” as the driver, without any way to change it. So no audio, whereas we have Pulseaudio now. Press “Test Sound” if you want to check whether it truly works. I haven’t tested the ALSA backend by the way. In my case, as soon as the registry was fixed, Wine just autoselected Pulseaudio, which is fine for me.

Now launch TeamViewer and check out the audio options in this submenu:

TeamViewer preferences

The TeamViewer 11 preferences can be found here.

It should look like this:

TeamViewers audio options

Make sure “Play computer sounds and music” is checked! (click to enlarge)

Now, after having connected and logged in, you may also wish to verify the conference audio settings in TeamViewers’ top menu:

TeamViewer 11 conference audio settings

TeamViewer 11 conference audio settings, make sure “Computer sound” is checked!

When you play a sound file on the remote computer, you should hear it on your local one as well. With that, I can finally test the audio files I’m supposed to use on that remote machine for their actual language (which is a rather important detail) where meta data isn’t available.

This seems to be a problem of TeamViewers installation / update procedure which hasn’t been addressed for several major released now. I presume just removing all traces of TeamViewer and installing it from scratch might also do the trick, but I didn’t try it for myself.

Ah, and one more thing: If you can’t launch TeamViewer on CentOS 6.x because you’re getting the following error…

teamviewerd error

TeamViewer Daemon not running…

…forget about the solutions on the web on top of what this message is telling you. TeamViewer 11 uses a systemd-style script for launching its daemon on Linux now, and that won’t do on SysV init systems. Just become root and launch the crap manually: # /opt/teamviewer/tv_bin/teamviewerd &, then press <CTRL>+<d> and it works!

Let’s hope that daemon isn’t doing anything evil while running as root. :roll:

Aug 282016
 

KERNEL_DATA_INPAGE_ERROR logoHere is how a responsible system administrator should handle downtimes and replacements of faulty hardware: Give advance notice to all users and make sure to give everybody enough time to prepare for services going offline, if possible. Specify a precise time window which is as convenient as possible for most users. Also, explain the exact technical reasons in words as simple as possible.

How I handled the replacement of XINs’ system hard disk? See that nice blue logo on the top left side? KERNEL_DATA_INPAGE_ERROR, bugcheck code 0x0000007a. And [it isn’t the first of its kind either], last one was a KERNEL_STACK_INPAGE_ERROR, clearly disk related given that the disk had logged controller errors as well as unrecoverable dead sectors. And NO, that one wasn’t the first one too. :roll: So yeah, I rebooted the [monster], and decided that it’s too much of a pain in the ass to fix it and hoped (=told myself while in denial) that it would just live on happily ever after! Clearly in ignorance of the obvious problem, just so I could walk over to my workstation and continue to watch some Anime and have a few cold ones in peace…

So, my apologies for being lazy in a slightly dangerous way this time. Well, it’s not like there aren’t any system backups or anything, but still. In the end, it caused an unannounced and unplanned downtime 3½ hours long. This still shouldn’t hurt XINs’ >=99% yearly availability, but it clearly wasn’t the right way to deal with it either…

Well, it’s fixed now, because this time I got a bit nervous and pissed off as well. Thanks to [Umlüx], the XIN server is now running a factory-new HP/Compaq 15000rpm 68p LVD/SE SCSI drive, essentially a Seagate Cheetah 15k.3. As I am writing this the drive has only 2.9h of power on time accumulated. Pretty nice to find such pristine hardware!

Thanks do however also fly out to [Grindhavoc]German flag and [lommodore]German flag from [Voodooalert]German flag, who also kindly provided a few drives, of which some were quite usable. They’re in store now, for when the current HP drive starts behaving badly.

Now, let’s hope it was just the disk and no Controller / cabling problem on top of that, but it looks like this should be it for now. One less thing to worry about as well. ;)

Jun 022016
 

KERNEL_STACK_INPAGE_ERROR logoRecently, this server (just to remind you: an ancient quad Pentium Pro machine with SCSI storage and FPM DRAM) experienced a 1½ hour downtime due to a KERNEL_STACK_INPAGE_ERROR bluescreen, stop code 0x00000077. Yeah yeah, I’m dreaming about running OpenBSD on XIN.at, but it’s still the same old Windows server system. Bites, but hard to give up and/or migrate certain pieces of software. In any case, what exactly does that mean? In essence, it means that the operating systems’ paged pool memory got corrupted. So, heh?

More clearly, either a DRAM error or a disk error, as not the entire paged pool needs to actually be paged to disk. The paged pool is swappable to disk, but not necessarily swapped to disk. So we need to dig a bit deeper. Since this server has 2-error correction and 3-error reporting capability for its memory due to IBM combining the parity FPM-DRAM with additional ECC chips, we can look for ECC/parity error reports in the servers’ system log. Also, disk errors should be pretty apparent in the log. And look what we’ve got here (The actual error messages are German even though the log is being displayed on a remote, English system – well, the server itself is running a German OS):

Actually, when grouping the error log by disk events, I get this:

54 disk errors in total - 8 of which were dead sectors

54 disk errors in total – 8 of which were medium errors – dead sectors

8 unrecoverable dead sectors and 46 controller errors, starting from march 2015 and nothing before that date. Now the actual meaning of a “controller error” isn’t quite clear. In case of SCSI hardware like here, it could be many things. Starting from firmware issues over cabling problems all the way to wrong SCSI bus terminations. Judging from the sporadic nature and the limited time window of the error I guess it’s really failing electronics in the drive however. The problems started roughly 10 years after that drive was manufactured, and it’s an 68-pin 10.000rpm Seagate Cheetah drive with 36GB capacity by the way.

So yeah, march 2015. Now you’re gonna say “you fuck, you saw it coming a long time ago!!”, and yeah, what can I say, it’s true. I did. But you know, looking away while whistling some happy tune is just too damn easy sometimes. :roll:

So, what happened exactly? There are no system memory errors at all, and the last error that has been reported before the BSOD was a disk event id 11, controller error. Whether there was another URE (unrecoverable read error / dead sector) as well, I can’t say. But this happened exactly before the machine went down, so I guess it’s pretty clear: The NT kernel tried to read swapped kernel paged pool memory back from disk, and when the disk error corrupted that critical read operation (whether controller error or URE), the kernel space memory got corrupted in the process, in which case any kernel has to halt the operating system as safe operation can no longer be guaranteed.

So, in the next few weeks, I will have to shut the machine down again to replace the drive and restore from a system image to a known good disk. In the meantime I’ll get some properly tested drives and I’m also gonna test the few drives I have in stock myself to find a proper replacement in due time.

Thank god I have that remote KVM and power cycle capabilities, so that even a non-ACPI compliant machine like the XIN.at server can recover from severe errors like this one, no matter where in the world I am. :) Good thing I spent some cash on an expensive UPS unit with management capabilities and that KVM box…

Jan 152016
 

qWebIRC logoWhen I had set XINs web chat up back in 2014, I thought I’d found the holy grail of free IRC web frontends, but that wasn’t quite the case. While it worked, it wasn’t overly stable, and its GUI was a pretty crappy high-load HTML5/JavaScript part that didn’t work in a lot of browsers. It was based on the “kind of pre-alpha” [webchat2], a project which was dropped somewhere in the middle of the development process.

The biggest issue however was, that when a user was idle for like 5-10 minutes, webchat2 would drop his IRC connection in the backend without telling the user. So while the user kept thinking “oh, nobody is saying anything”, people might have continued to talk without him seeing it. The error became apparent only if the affected user started to write something again, which is when the “connection lost”-or-something message appeared.

Webchat, joined a channel

webchat2 – It looks nice, but it doesn’t really work that well.

It seems that software was bad at maintaining persistent connections for extended periods of time.

Back then I had tried several other alternatives, but most are based on [node.js], which my ancient Windows 2000 server (yeah yeah, I know) cannot run. I did stumble over the Python-based [qWebIRC] back then, but for some reason I had probably failed to install it properly. That piece was developed by the [QuakeNet] guys, who’re running it on their own site as well.

Yesterday I decided to give it another shot, and well…

qWebIRC login

The minimalistic qWebIRC login screen. “LunaticNet” isn’t really an IRC network though, it’s just the XIN.at IRC server by itself…

I wanted it perfect as well, so I aimed at fulfilling all the dependencies, which are:

  • Some IRC server (Duh! I won’t cover that part in detail here, but I’m running UnrealIRCd).
  • Python 2.5.x, 2.6.x or 2.7.x (obviously, and keep in mind that it won’t work with any Python 3.x).
  • zope.interface (a contract-based programming interface required by Twisted).
  • Twisted (for event-driven networking, something IRC needs to push stuff  happening on the IRC server to the web frontend).
  • pyWin32 (to enable Python to interface with the Win32 APIs).
  • simplejson (optional; preferably a version including its C extensions, provides a performance boost).
  • pyOpenSSL (optional; required if you wish to connect to IRC+SSL servers and/or to host the web chat via HTTPS instead of HTTP).
  • Java (optional; used for JavaScript minify during compile time. Makes the JS much smaller to save bandwidth).
  • Mercurial (optional; fast versioning system, provides a qWebIRC performance boost for some reason I don’t quite get yet).
  • instsrv & srvany (optional; Used to create a Windows system service for qWebIRC).

Now that’s quite something, and given that I’m doing this on Windows 2000, there have to be compromises. While the latest Python 2.7.11 can work on Win2k, the installer will fail. 2.7.3 is the last which works “cleanly”. You can still install 2.7.11 on a modern Windows box and then just copy it over, but then you won’t have it registered in the OS. In any case, I decided to go with the much older Python 2.5.4, also because some of the modules listed above including machine code were nowhere to be found for Python 2.7.x in a pre-compiled state.

So, some software is brand-new (from 2016 even), and other parts not so much. I tried to use the newest possible software without having to compile any machine code myself (like the C extensions of simplejson), because that would’ve been a lot of work.

I packaged everything I picked for this into one archive for you to use, here it is:

What you get are the following versions:

  • qWebIRC #516de557ddc7
  • Python v2.5.4
  • zope.interface v3.8.0
  • Twisted v12.1.0
  • pyWin32 v220
  • simplejson v2.1.1 with C extensions
  • pyOpenSSL v0.13.12 built by egenix
  • Sun Java Runtime Environment v1.6u31
  • Mercurial v3.4.2

And that’s what it looks like when it’s up and running:

qWebIRC chat

What qWebIRC looks like for a user logged into the XIN.at IRC server.

Now how do you install this? Simply follow these step-by-step instructions:

  1. Install Python 2.5.4. Make sure python.exe is in your systems search path. If it isn’t, add it.
  2. Copy the zope\ folder from the zope.interface 3.8.0 to the Lib\ subdirectory of your Python 2.5 installation, so that it looks like: C:\Program Files\Python25\Lib\zope\. Make sure the user who will run qWebIRC has sufficient permissions on the folder.
  3. Install Twisted 12.1.0.
  4. Install pyWin32 220
  5. Install simplejson 2.1.1
  6. Install egenix’ pyOpenSSL 0.13.12.
  7. Install Java 1.6u31. Make sure to disable auto-updates in the system control panel and disable the browser plugins for security reasons. Java is only needed for JavaScript code compression when compiling qWebIRC and for nothing else!
  8. Install Mercurial 3.4.2.
  9. Copy qWebIRC to a target directory, copy config.py.example to config.py and configure qWebIRC to your liking by editing config.py.
  10. When done, open a cmd.exe shell, cd to your qWebIRC installation directory and run python .\compile.py (This will take a few seconds). To test it, run python .\run.py, which will launch qWebIRC on the default port 9090. You can terminate it cleanly by pressing CTRL+C twice in a row.
  11. Optional, if you want qWebIRC as a system service: Copy instsrv.exe and srvany.exe to %WINDIR%\system32\. Then run instsrv qWebIRC %WINDIR%\system32\srvany.exe. Actual service configuration is discussed below.
  12. Optional, if you want SSL, create a certificate and a private key in PEM format using OpenSSL. If you don’t know how to do that, get OpenSSL [from here] and [read this] for a quick and simple solution. Create a subfolder SSL\ in your qWebIRC installation directory and put the certificate and key files in there. When ran as a background service, the passphrase has to be removed from the key! Make sure to keep your key file safe from theft!

After that, you’ll have compiled Python byte code and compressed JavaScript code for the static part of the web frontend. If you chose to create the service stub as well, you’ll need to configure the service first, otherwise it won’t really do anything. Find the service in your registry by running regedit. It should be in HKLM\SYSTEM\CurrentControlSet\Services\, called qWebIRC.

Here:

qWebIRC service

A qWebIRC service, configured to run the XIN.at chat with SSL on port 8080.

My Windows 2000 Server is German, but I guess it’s still understandable. The values are all REG_SZ / strings. Set the following three:

  1. AppDirectory (the working directory, should be the installation dir of qWebIRC).
  2. Application (the application to be launched by the service, so python.exe).
  3. AppParameters (the parameters to be passed to Python for launching qWebIRCs’ run.py. Here, I’m specifying a port to run on, as well as SSL certificate and key files to load, so qWebIRC can automatically switch to HTTPS).

Now, go to your system control panel, create a simple, restricted user to run qWebIRC as (if you don’t have a suitable one already) and make sure that user has permissions to read & execute the qWebIRC and Python 2.5 installations. For the qWebIRC\ directory the user also needs write access. Then, go to the Administrative Tools in the system control panel and configure the service qWebIRC to run as that restricted user.

Start the service and you should be done.

Of course, you can always just run a shell and launch it interactively from the command prompt as well, which is very useful for debugging by the way.

If you click on the web chat on the top right on this page, you can try it out for yourself! :) It may not look as fancy as webchat2, but it works a lot faster and is far more stable!

Ah, you’d have to accept the self-signed certificate of course, your web browser will likely warn you about it.

And that’s that. Now visitors not only have easy access to my IRC chat server, but also one that works properly and doesn’t consume a ton of resources. ;)

Jul 022015
 

NetworkIt seems that two months after the last maintenance in May another one needs to be done on 2015-07-08 around 00:00 AM – 06:00 AM UTC+1. Again this means that all XIN.at services may see some downtime during this period, this weblog included. So no eMail server, no web server, no anything. By trend these maintenances tend to put me offline for short periods of time only, but who knows what UPCs gonna do. Just so you know.

Feb 062015
 

Network[1] Everybody hates servers going offline. Especially email servers. Or web servers. Or MY SERVER! Now I prepared for a lot of things with my home server, I prepared for power failures, storage failures, operating system kernel crashes, everything. I thought I can recover from almost any possible breakdown even remotely, all but one: My four bonded G.SHDSL lines all failing at once. Which is what just happened. After lots of calls and even a replacement Paradyne/Zhone SNE2040G-S network extender having been brought to me within the time allowed by my SLA, all four lines still remained dark.

Now, today the telecommunication company which is responsible for the national network fixed the issue in the local automatic exchange. I tried to find out what had happened exactly, but ran into walls there. My Internet provider UPC got no information feedback from the telecommunication company A1 either, or at least nothing besides “it’s been fixed at the digital exchange”. Plus, as I am not an A1 customer exactly, so they won’t answer me directly. The stack is: UPC (internet provider) <=> Kapsch (field technicians handling UPC branded Internet access hardware, via outsourcing by UPC) <=> A1 (field technicians regarding the whole telecommunications infrastructure), while UPC may also communicate with A1 directly to handle outages. Communication seems to be kept to a minimum though. :(

Bad thing is, for a “business class” line, an outage of almost two days or 47 hours is a bit extreme. In such a case, more efficient communication could easily fix it faster. But it is what it is, I guess. And now I have to send one of the two Paradyne/Zhone G.SHDSL extenders back to UPC, this little bugger here:

The actively cooled Zhone 2040 G.SHDSL extender

The actively cooled Zhone SNE2040G-S G.SHDSL extender (click to enlarge)

There is actually a HSDPA (3G) fallback option, which works by implementing an OSI layer 2 coupling between the G.SHDSL line and the 3G access, keeping all IP addresses and domains the same and the services reachable during times of complete DSL failure. But I won’t order that upgrade, because it’s a steep 39€ before tax per month, or 46.80€ after tax. That’s just too expensive on top of what that connection’s already draining from my wallet.

All in all, this greatly endangers my usual, self-imposed yearly service availability of >=99%. 47 hours is a lot after all. So to maintain 99%, the server cannot go offline for more than 3 days, 15 hours and 36 minutes per regular year, and now I already have 1 day and 23 hours on the clock, and it’s just the beginning of the year! Let’s hope it runs more smoothly for the rest of 2015.

[1] Logo image is © Kyle Wickert, Do You Really Understand The Applications Flowing Through Your Network?

Oct 222014
 

Webchat logoXIN.at has been running an IRC chat server for some time now, but the problem always lies with people needing some client software to use it, like X-Chat or Nettalk or whatever.

People usually just don’t want to install yet another chat client software, no matter how old and well-established IRC itself may be. Alternatively, they can use some other untrusted web interface to connect to either the plain text [irc://www.xin.at:6666] or the encrypted [irc+ssl://www.xin.at:6697] server via a browser, but this isn’t optimal either. Since JavaScript cannot open TCP sockets on its own, and hence cannot connect to an IRC server directly, there are only two kinds of solutions:

  • Purely client-based as a Java Applet or Adobe Flash Applet, neither of wich are very good options.
  • JavaScript client + server backend for handling the actual communication with the IRC server.
    • Server backends exist in JavaScript/Node.js, Perl, Python, PHP etc.

Since I cannot run [Node.js] and [cgi:irc] is unportable due to its reliance on UNIX sockets, only Python and PHP remained. Since PHP was easier for me, I tried the old [WebChat2] software developed by Chris Chabot for this. To achieve connection-oriented encryption security, I wrapped SSL/TLS around the otherwise unencrypting PHP socket server of WebChat2. You can achieve this with cross-platform software like [stunnel], which can essentially wrap SSL around almost every servers connection (minus the complex FTP protocol maybe). While WebChat2’s back end is based on PHP, the front end uses JavaScript/Comet. This is what it looks like:

So that should do away with the “I don’t wanna install some chat client software” problem, especially when considering that most people these days don’t even know what Internet Relay Chat is anymore. ;) It also allows anonymous visitors on this web log to contact me directly, while allowing for a more tap-proof conversation when compared with what typical commercial solutions would give you (think WhatsApp, Skype and the likes). Well, it’s actually not more tap-proof considering the server operator can still read all communication at will, but I would like to believe that I am a more trustworthy server operator than certain big corporations. ;)

Oh, and if you finally do find it in yourself to use some good client software, check out [XChat] on Linux/UNIX and its fork [HexChat] on Windows, or [LimeChat] on MacOS X. There are mobile clients too, like for Android ([AndroIRC], [AndChat]), iOS ([SIRCL], [TurboIRC]), Windows Phone 8 ([IRC Free], [IRC Chat]), Symbian 9.x S60 ([mIRGGI]) and others.

So, all made easy now, whether client software or just web browser! Ah and before I forget it, here’s the link of course:

Edit: Currently, only the following browsers are known to work with the chat (older version may sometimes work, but are untested):

  • Mozilla FireFox 31+
  • Chromium (incl. Chrome/SRWare Iron) 30+
  • Opera 25+
  • Apple Safari 5.1.7+
  • KDE Konqueror 4.3.4+

The following browsers are known to either completely break or to make the interface practically unusable:

  • Internet Explorer <=11
  • Opera <=12.17