Nov 19 2016
 

Recently, after finding out that the old Intel GMA950 profits greatly from added memory bandwidth (see [here]), I wondered whether the overclocking mechanism applied by the Windows tool [here] had leaked into the public after all this time. The developer of said tool refused to open source the software even after it turned into abandonware – announced support for GMA X3100 and X4500 as well as MacOS X and Linux never came to be. Also, he never said how he managed to overclock the GMA950 in the first place.

Some hackers disassembled the code of the GMABooster however, and found out that all that’s needed is a simple PCI register modification that you could probably apply yourself on Microsoft Windows by using H.Oda!’s [WPCREdit].

Tools for PCI register modification do exist on Linux and UNIX as well of course, so I wondered whether I could apply this knowledge on FreeBSD UNIX too. Of course, I’m a few years late to the party, because people already solved this back in 2011! But just in case the scripts and commands disappear from the web, I wanted this documented here as well. First, let’s see whether we even have a GMA950 (of course I do, but still). It should be PCI device 0:0:2:0; you can use FreeBSD’s own pciconf utility or the lspci command from Linux:

# lspci | grep "00:02.0"
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
 
# pciconf -lv pci0:0:2:0
vgapci0@pci0:0:2:0:    class=0x030000 card=0x30aa103c chip=0x27a28086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA

Ok, to alter the GMA950’s render clock speed (we are not going to touch its 2D “desktop” speed), we have to write certain values into some PCI registers of that chip at 0xF0 and 0xF1. There are three different values regulating clock speed. Since we’re going to use setpci, you’ll need to install the sysutils/pciutils package on your machine via # pkg install pciutils. I tried to do it with FreeBSD’s native pciconf tool, but all I managed was to crash the machine a lot! Couldn’t get it solved that way (just me being too stupid I guess), so we’ll rely on a Linux tool for this. Here is my version of the script, which I call gmaboost.sh. I placed it in /usr/local/sbin/ for global execution:

#!/bin/sh

case "$1" in
  200) clockStep=34 ;;
  250) clockStep=31 ;;
  400) clockStep=33 ;;
  *)
    echo "Wrong or no argument specified! You need to specify a GMA clock speed!" >&2
    echo "Usage: $0 [200|250|400]" >&2
    exit 1
  ;;
esac

# Each setpci call writes two bytes to consecutive registers of PCI device
# 00:02.0, starting at 0xF0 - i.e. registers 0xF0 and 0xF1:
setpci -s 02.0 F0.B=00,60
setpci -s 02.0 F0.B=$clockStep,05

echo "Clockspeed set to "$1"MHz"

Now you can do something like this: # gmaboost.sh 200 or # gmaboost.sh 400, etc. Interestingly, FreeBSD’s i915_kms graphics driver seems to have set the 3D render clock speed of my GMA950 to 400MHz already, so there was nothing to be gained for me in terms of performance. I can still clock it down to conserve energy though. A quick performance comparison using a crappy custom-recorded ioquake3 demo shows the following results:

  • 200MHz: 30.6fps
  • 250MHz: 35.8fps
  • 400MHz: 42.6fps

Hardware was a Core 2 Duo T7600 and the GPU was making use of two DDR-II/667 4-4-4 memory modules in dual channel configuration. Resolution was 1400×1050 with quite a few changes in the Quake III configuration to achieve more performance, so your results won’t be comparable, even when running ioquake3 on identical hardware. I’d post my ~/.ioquake3/baseq3/q3config.cfg here, but in my stupidity I just managed to freaking wipe the file out. Now I have to redo all the tuning, pfh.

But in any case, this really works!

Unfortunately, it only applies to the GMA950. And I still wonder what it was that went so wrong with # pciconf -w -h pci0:0:2:0 0xF0 0060 && pciconf -w -h pci0:0:2:0 0xF0 3405 and the like. I tried a few combinations just in case my byte order was messed up or in case I really had to write single bytes instead of half-words, but either the change wouldn’t apply at all, or the machine would just lock up. It would be nice to do this with only BSD tools on actual FreeBSD UNIX, but I guess I’m just too stupid for pciconf.
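
Just for documentation purposes, this is roughly what the byte-wise pciconf variant would look like – a sketch only, simply mirroring the register/value pairs from the setpci lines above. On my hardware writes like these either wouldn’t stick or locked the machine up, so don’t expect it to just work:

# Byte-wise writes mirroring "setpci -s 02.0 F0.B=00,60":
pciconf -w -b pci0:0:2:0 0xF0 0x00
pciconf -w -b pci0:0:2:0 0xF1 0x60
# Byte-wise writes mirroring "setpci -s 02.0 F0.B=$clockStep,05" (0x34 = 200MHz step):
pciconf -w -b pci0:0:2:0 0xF0 0x34
pciconf -w -b pci0:0:2:0 0xF1 0x05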

Jan 23 2014
 

Over the past few years, my [x264 benchmark] has been honored to accept results from many an exotic system. Amongst these are some of the weirder x86 CPUs like a Transmeta Efficēon, a cacheless Intel Celeron that only exists in Asia, and even my good old 486 DX4-S/100, which needed almost nine months to complete what modern boxes do in 1-2 hours. Plus the more exotic ones like the VLIW architecture Intel Itanium² or some ARM RISC chips, one of them sitting on a Raspberry Pi. Also PowerPC, a MIPS-style Chinese 龙芯, or Loongson-2f as we call it, and so on and so forth.

There is however one chip that we’ve been hunting for years now and never got a hold of: the Intel TULSA. A behemoth, just like the [golden driller] standing in the city that gave the chip its name. Sure, the Pentium 4 / Netburst era wasn’t the best for Intel, and the architecture was the laughingstock of all AMD users of that time. Some of the cores weren’t actually that bad though, and Tulsa is an especially mad piece of technology.

Tulisa Contostavlos

Tulisa? That you?

Ehm… I said Tulsa, not Tulisa, come on guys, stay focused here! A processor, silicon and stuff (not silicone, fellas).

Xeon 7140M "Tulsa"

An Intel Xeon 7140M “Tulsa” (photograph kindly provided by Thomsen-XE)

Now that’s more like it right there! People seem to agree that the first native x86 dual core was built by Intel and that it was the Core 2. Which is wrong. It wasn’t. It was a hilarious 150W TDP Netburst Monster weighing almost 1.33 billion transistors with up to 16MB of Level 3 cache, Hyperthreading and an unusually high clock speed for a top-end server processor. The FSB800 16MB L3 Xeon MP 7140M part we’re seeing here clocks at 3.4GHz, which is pretty high even for a single core desktop Pentium 4. There also was an FSB667 part called Xeon MP 7150N clocking at 3.5GHz. Only that here we have 2 cores with HT and a metric ton of cache!

These things can run on quad socket boards, meaning a total of 8 cores and 16 threads, as seen on some models of the HP ProLiant DL580 G4. Plus, they’re x86_64 chips too, so they can run 64-bit operating systems.

Tulsa die shot

Best Tulsa die shot I could find. To the right you can see the massive 16MB L3 cache. There is also 2 x 1MB L2.

And the key point: they’re rare. Extremely rare, especially in the maxed-out configuration of four processors. And I want them tested, as real results are scarce and almost nowhere to be found. Also, Thomsen-XE (who took that photograph of a 7140M up there) wants to see them show off! We have been searching for so long, and have already missed two guys with corresponding machines by such a narrow margin!

We want the mightiest of all Netbursts and Intel’s first native dual core processor to finally show its teeth and prove that with enough brute force, it can even kill the Core 2 micro-architecture (as long as you have your own power plant, that is)!

So now, I’m asking you to please tell us in the comments whether you have or have access to such a machine and if you would agree to run the completely free x264 benchmark on that system. Windows would be nice for a reference x264 result, but don’t mind the operating system too much. Linux and most flavors of UNIX will do the job too! Guides for multiple operating systems are readily available at the bottom of the results list in [English] as well as [German].

If anyone can help us out, that’d be awesome! Your result will of course be published under your name, and there will be a big thank you here for you!

And don’t forget to say bye bye to Tulisa:

Tulisa Contostavlos #1

Well, thanks for your visit, Miss Contostavlos, but TULSA is the #1 we seek today!

Update: According to a [comment] by Sjaak Trekhaak, my statement that Tulsa was Intel’s first native dual core was false. There were others with release dates before Tulsa, like the first Core Duo or the smaller Netburst-based Xeons with the Paxville DP core, as you can also see in my reply to Sjaak’s comment. Thus the strike-through parts in the above text.

Jun 13 2013
 

Sure there are ways to compile the components of my x264 benchmark on [almost any platform]. But you never get the “reference” version of it. The one originally published for Microsoft Windows and the one really usable for direct comparisons. A while back I tried to run that Windows version on Linux using [Wine], but it wouldn’t work because it needs a shell. It never occurred to me that I could maybe just copy over a real cmd.exe from an actual Windows. A colleague looked it up in the Wine AppDB, and it seems the cmd.exe only has [bronze support status] as of Wine version 1.3.35, suggesting some major problems with the shell.

Nevertheless, I just tried using my Wine 1.4.1 on CentOS 6.3 Linux, and it seems support has improved drastically. All cmd.exe shell builtins seem to work nicely. It was just a few tools that didn’t like Wine’s userspace Windows API, especially timethis.exe, which also had problems talking to ReactOS. I guess it wants something from the Windows NT kernel API that Wine cannot provide in its userspace reimplementation.

But: You can make cmd.exe just run one subcommand and then terminate using the following syntax:

cmd.exe /c <command to run including switches>

Just prepend the Unix time command plus the wine invocation and you’ll get a single Windows command (or batch script) run within cmd.exe on Wine, and get the runtime out of it at the end. Somewhat like this:

time wine cmd.exe /c <command to run including switches>
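
So for the benchmark itself, an invocation could look somewhat like this – note that the file names here are just placeholders, not the exact ones shipped with the benchmark package:

# quick sanity check that the Win32 binary runs at all under cmd.exe on Wine:
time wine cmd.exe /c x264.exe --version
# or run the whole thing via a wrapping batch script (placeholder name):
time wine cmd.exe /c run-benchmark.bat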

Easy enough, right? So does this work with the Win32 version of x264? Look for yourself:

So as you can see, it does work. It runs, it detects all instruction set extensions (SSE…) just as if it were 100% native, and the htop and Linux system monitor screens show it utilizing all four CPU cores, or all eight threads / logical CPUs to be more precise. By now this runs at around 3fps+ on a Core i7 950, so I assume it’s slower than on native Windows.

Actually, the benchmark publication itself currently knows several flags for marking results “not reference / not comparable”. One is the flag for custom x264 versions / compilations, one is for virtualized systems and one for systems below minimum specifications. The Wine on Linux setup wouldn’t fit into any of those: it’s definitely not a custom version, it’s running on a machine that satisfies my minimum system specs, which leaves the VM question up for debate. Wine is by definition a runtime environment, not an emulator, not a VM hypervisor or paravirtualizer. It just reimplements the Win32/64 API, mapping certain function calls to real Linux libraries or (where the user configures it as such) to real Microsoft or 3rd party DLLs copied over. That’s not emulation. But it’s not quite the same as running on native Windows either.

I haven’t fully decided yet, but I think I will mark those results as “green” in the [results list], extending the meaning of that flag from virtual machines to virtual machines AND Wine, otherwise it doesn’t quite seem right.


May 29 2013
 

So, there is this mainboard, an Intel D850EMV2, or rather D850EMVR, which is a sub-version of the former, i850E Rambus chipset. What’s special about that old Pentium 4 board? Well, I won it once in a giveaway at one of the largest German hardware websites, [Computerbase]. And after that, Jan-Frederik Timm, founder and boss of the place, contacted me on ICQ, telling me about it. First time I had ever won anything! He asked me to put it to good use, because he was kind of fed up with people just reselling their won stuff. So I promised him that I would.

And boy, did I keep that promise! At first I used a shabby 1.6GHz Northwood processor with just 128MB Rambus RDRAM. Can’t remember the rest of the machine, but over time I upgraded it a bit (when it was already considered old, with Core 2 Duos on the market), a bit more RAM etc. for small LAN party sessions. At some point, I sold it to my cousin, and to make it more powerful and capable for her, I gave the machine the fastest and only hyper-threaded processor available for that platform, the Pentium 4 HT 3.06GHz, plus 1.5GB PC800-45 RDRAM and my old GeForce 6800 Ultra AGP.

She used it for the Internet and gaming etc. for some time until the GeForce died and she considered the machine barely powerful enough for her top-tier games anyway, so she bought a new one, which I built for her. The Intel board and its components then got my older GeForce FX5950 Ultra and were used by my uncle for the Internet on Debian Linux and low-end LAN games on Windows XP.

A long time after I first got it, I contacted Jan from Computerbase again, to tell him that I had kept my promise and ensured the board had been used properly for 8 years now. Needless to say he was delighted and very happy that it wasn’t just sold off for quick cash.

Soon after, my cousin got another even more powerful machine, as her Core 2 Duo mainboard died off. Now it was S1156, GTX480 etc. So my uncle bought a new mainboard and I rebuilt the C2D for him with my cousin’s old GTX275. I asked him if he would part with the D850EMVR and he agreed to give it back to me, after which it collected dust for a year or so.

Now, we need another machine for our small LAN parties, as our Notebooks can’t drive the likes of Torchlight II or Alien Swarm. It was clear, what I had to do: Keep the damn Intel board running until it fucking dies!

This time I chose to make it as powerful as it could remotely become, with a Gainward Bliss GeForce 7800GS+ AGP. The most powerful nVidia-based AGP card ever built, equipped with a very overclockable 7900GT GPU with the full 24 pixel pipelines and 8 vertex shaders, as well as 512MB of Samsung RAM. Only Gainward built it that way (a small 7900 GTX, you could say), as nVidia did not officially allow such powerful AGP cards. So this was a limited edition too. I always wanted to have one of those, but could never afford them. Now was the time:

As expected (there were later, more powerful AGP8x systems compared to this AGP4x system, with faster Pentium 4s and Athlon 64s), the CPU is limiting the card. But at least I can add some FSAA or even HDRR at little cost in some games, and damn, that card overclocks better than shown in some of the original reviews! The core went from 450MHz to 600MHz so far, dangerously close to the top-end 7900 GTX PCIe of the time with its 650MHz. Also, the memory accepted some pushing from 1.25GHz to 1.4GHz DDR3 data rate. Nice one!

This was Furmark stable, and the card stays very quiet and rather cool even under such extreme loads. Maybe it’ll accept even more speed, and all that at a low 1.2V GPU voltage. Cool stuff. Here, a little AquaMark 3 for you:

7800gs+ in AquaMark 3

So, this is at 600MHz core and 1400MHz DDR memory. For comparison I got a result slightly above 53k at just 300MHz core. So as you can see, at least the K.R.A.S.S. engine in AquaMark 3 is heavily CPU bound on this system. So yeah, for my native resolution of 1280×1024 on that box, the card is too powerful for the CPU in most cases. The tide can turn though (in Alien Swarm for instance) when turning on some compute-heavy 128-bit floating point rendering with HDR or very complex shaders, or FSAA etc., so the extra power is going to be used. ;) And soon, 2GB PC1066-32p RDRAM will arrive to replace the 1GB PC800-45 Rambus I have currently, to completely max it out!

So I am keeping my promise. Still. After about 10 years now. Soon there will be another small LAN party, and I’m going to use it there. And I will continue to do so until it goes up in flames! :)

Update: The user [Tweakstone] has mentioned on [Voodooalert] that XFX once built a GeForce 7950GT for AGP, which was more powerful than the Gainward. So I checked it out, and he seems to be right! The XFX 7950GT was missing the big silent cooler, but provided an architecturally similar G71 GPU at higher clock rates: while the Gainward 7800GS+ offered 450MHz on the core and a 1250MHz DDR data rate on the memory, the XFX would give you 550MHz core and a 1300MHz DDR data rate with the same 512MB of DDR3 memory. That’s a surprise to me, I wasn’t aware of the XFX. But since my Gainward overclocks so well (it’s the same actual chip after all) and runs far quieter and cooler, I guess my choice wasn’t wrong after all. ;)

Update 2: Since there was a slight glitch in the geometry setup unit of my card, I have now replaced it with a Sapphire Radeon HD3850 AGP, which gives more performance, slightly better FSAA and as the icing on the cake proper DXVA1 video acceleration. Even plays BluRays in MPC-HC now. ;) Also, I retested AquaMark 3, which seems to require the deletion of the file direcpll.dll from the AquaMark 3 installation directory to not run into an access violation exception at the end of the benchmark on certain ATi or AMD graphics hardware. I guess the drivers are the problem here. But with that troublesome file gone, here’s a new result:

AquaMark 3 on an ATi Radeon HD3850 AGP

Yeah, it’s a bit faster now, but not much. As we can see, the processor is clearly the limiting factor here. But at least I now have relatively problem-free 3D rendering and DXVA on top of it!

Oct 25 2012
 

Ok ok, I guess whoever is reading this (probably nobody anyway) will most likely already be tired of all this x264 stuff. But this one I need as documentation for myself anyway, because the experiment might be repeated later. So, the [chair for simulation and modelling of metallurgic processes] here at my university has allowed me to try and play with a distributed grid-style Linux cluster built by Supermicro. It’s basically one full rack cabinet with one Pentium 4 3.2GHz processor and 1GB RAM per node, with Hyper-Threading being disabled because it slowed down the simulation jobs that were originally being run on the cluster. Operating system for the nodes was OpenSuSE 10.3. Also, the head node was very similar to the compute nodes, which made it easy to compile libav and x264 on the head node and let the compute nodes just use those binaries.

The software installed for using it is the so-called Sun Grid Engine. I had already set up my own distributed cluster once, based on an OpenPBS-style system called Torque together with the Maui scheduler. When I was introduced to this Sun Grid Engine, most of the stuff seemed awfully familiar; even the job submission tools and scripting system were pretty much the same. So this system uses tools like qsub, qdel and qstat, plus some additional ones not found in the open source Torque system, like sns.

Now since x264 is not cluster-aware and not MPI capable, how DO we actually distribute the work across several physical machines? Lacking any more advanced approaches, I chose a very crude way to do it. Basically, I just cut the input video into n slices, where n is the number of cluster nodes. Since the cluster nodes all have access to the same storage backend via NFS, there was no need to send the files to the nodes, as access to the users home directory was a given.
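
A rough sketch of how such a dumb byte-level split can be scripted follows – I used dd for the actual cutting, as explained further below; the input file name and the byte math here are just illustrative:

#!/bin/sh
# Crude split of the raw .264 elementary stream into 19 equally sized chunks.
# Chunk borders fall mid-frame, so frames at the joints get damaged and dropped.
INPUT=elephantsdream.264        # placeholder name for the source stream
NODES=19
SIZE=$(stat -c %s "$INPUT")     # total size in bytes (GNU stat on Linux)
CHUNK=$(( (SIZE + NODES - 1) / NODES ))
i=1
while [ $i -le $NODES ]; do
  dd if="$INPUT" of="slice$i.264" bs=$CHUNK skip=$((i - 1)) count=1
  i=$((i + 1))
done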

Now, to make the job easier, all slices were numbered serially, and I wrote a qsub job array script where the array id is used to pick the input file. So node[2] would get file[2], node[15] would get file[15] to encode, etc. The job array script then invokes the actual worker script. This is what the sliced input files look like before starting the computation:

Sliced input file

And here is the qsub job array script that I sent to the cluster, called benchmark-qsub.sh; the directory /SAS/home/autumnf is the user’s home directory:

#!/bin/bash
# Job name:
#$ -N x264benchmark
# Job array with 19 tasks; the task number ends up in $SGE_TASK_ID:
#$ -t 1-19

# Make the self-compiled binaries in the home directory visible:
export PATH=$PATH:/SAS/home/autumnf/usr/bin
echo $PATH

cd /SAS/home/autumnf/x264benchmark

time transcode.sh

And the actual worker script, transcode.sh:

#!/bin/bash
# Pass 1:
x264 --preset veryslow --tune film --b-adapt 2 --b-pyramid normal -r 3 -f -2:0 --bitrate 10000 --aq-mode 1 -p 1 --slow-firstpass --stats benchmark_slice$SGE_TASK_ID.stats -t 2 --no-fast-pskip --cqm flat slice$SGE_TASK_ID.264 -o benchmark_1stpass_slice$SGE_TASK_ID.264
 
# Pass 2:
x264 --preset veryslow --tune film --b-adapt 2 --b-pyramid normal -r 3 -f -2:0 --bitrate 10000 --aq-mode 1 -p 2 --stats benchmark_slice$SGE_TASK_ID.stats -t 2 --no-fast-pskip --cqm flat slice$SGE_TASK_ID.264 -o benchmark_2ndpass_slice$SGE_TASK_ID.264

As you can see, the worker script uses the environment variable $SGE_TASK_ID as part of the input and output file names. This variable contains the job array id passed down from the job submission system of the Sun Grid Engine. The job submission script contains the line #$ -t 1-19, which tells the system that the job array consists of 19 tasks, since the cluster had 19 working nodes left; the rest were already dead, as the cluster was pretty much out of service and hence unmaintained. Let’s see how the Sun tool sns reports the current status of the grid cluster:

Empty Grid Cluster

So, some nodes are in “au” or “E” status. While I do not know the exact meaning of the status abbreviations, it basically means that those nodes are non-functional. Taking the broken nodes into account, we have 19 working ones left. Now every node invokes its own x264 job and feeds it the proper input file from slice1.264 to slice19.264, writing correspondingly named outputs for both passes. Now let’s send the script to the Sun Grid Engine using qsub ./benchmark-qsub.sh and check what sns has to say about it afterwards:

Sns reporting a grid cluster under load

Hurray! Now if you’re more used to OpenPBS style tools, we can also use qstat to report the current job status on the cluster:

Qstat showing a cluster under load

As you can see, qstat also reports a “ja-task-ID”, which is essentially our job array id, or in other words $SGE_TASK_ID. So that’s basically one job with one job id, but 19 “daughter” processes, each having its own array id. Using tools like qdel or qalter you can either modify the entire job, or only subprocesses on specific nodes. Pretty handy. Now the Pentium 4 processor might suck ass, but 19 of them are still pretty damn powerful when combined; at the moment of writing you can find the cluster at [place #4 on the results list]! Here’s the Voodooalert-style result, just under one hour:

0:58:01.600 | SMMP | 19/1/1 | Intel Pentium 4 (no HT) 3.20GHz | 1GB DDR-I/266 (per node) | SuperMicro/SGE GRID Cluster | OpenSuSE 10.3 Linux (Custom GCC Build)

To ensure that this is actually working, I have recombined the output slices of pass 2, and tried to play that file. To my surprise it worked and would also allow seeking. Pretty nice considering that there is quite some bogus data in the file, like multiple H.264/AVC headers or cut up frames. I originally tried to split the input file into slices at keyframes in a clean fashion using ffmpeg, but that just wouldn’t work for that type of input, so I had to use dd, resulting in some frames being cut up (and hence dropped), and slices 2-19 having no headers. That required very specific versions of libav and x264, as not all versions can accept garbled files like this.
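
Recombination is equally dumb; here is a sketch of it (I actually did it with dd, as mentioned below, but for pure byte-wise concatenation plain cat gives the identical result; file names match the worker script above):

# Concatenate the pass-2 output slices in ascending order:
rm -f recombined.264
for i in $(seq 1 19); do
  cat "benchmark_2ndpass_slice$i.264" >> recombined.264
done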

Also, the output files have been recombined using dd. Luckily, mplayer using libav/ffmpeg would play that stuff nicely, but there’s simply no guarantee that every player and/or decoder would. So that’s why it cannot be considered a clean solution. Also, since motion estimation is less efficient for this setup at the cutting points, it’s not directly comparable to a non-clustered run. So there are some drawbacks. But if you would cluster x264 for productive work, you’d still do it kind of like that. Here’s the final output directory, already containing the concatenated file – quite a mess of files right there:

Clusterrun done

So this is it: the clustered x264. I hope to be able to test this approach on another cluster at the Metallurgy chair in the coming months, a Nehalem-based machine with far more cores, so that’d be really massive. Also, access to a Sandy Bridge-E cluster is possible, although not really probable. But we’ll see. If you’re interested in using x264 in a similar approach, you might want to check out the software versions that I used; these should be able to cope with rudely cut up slices quite well:

Also, if you require some guidance building that source code on Linux, please check out my guide:

If anybody knows a better way to slice up H.264/AVC elementary video streams, by all means, let me know! I would love to be able to have slices cut at proper keyframe positions including their own headers, and I would also like to be able to reconcatenate the slices into one clean file, having only one header at the beginning of the file and no damaged / to-be-dropped frames at the joints. So if you know how to do that – preferably using Linux command line tools – just tell me, I’d be happy to learn it!

Edit: Thanks to [LoRd_MuldeR] from the [Doom9 Forums] I now have a way of splitting the input stream cleanly at GOP (group of pictures) boundaries, as the Elephants Dream movie luckily uses closed GOPs. Basically, it involves the widely-used [MKVtoolnix]. With that tool, you can just take the stream, split it into n MKV slices, and then either use those directly or extract the H.264/AVC streams from those slices, maybe using [tsMuxer]. Just make as many as your cluster has compute nodes, and you’re done!
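
A quick sketch of how that looks on the command line – split size and file names are made up here, and I’m showing mkvextract for the extraction step, tsMuxer works just as well:

# mkvmerge can only split before key frames, so every slice starts cleanly
# at a GOP boundary; pick the split size so you end up with one slice per node:
mkvmerge -o slices.mkv --split size:100M elephantsdream.264
# then pull the raw H.264/AVC stream back out of each resulting slice:
mkvextract tracks slices-001.mkv 0:slice1.264
mkvextract tracks slices-002.mkv 0:slice2.264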

By the way, both MKVtoolnix and tsMuxer are available for Windows and Linux, also MacOS X.

This is clean, safe and proper, other than my dirty previous approach!

Oct 23 2012
 

At first I thought there were only three big BSD distributions, namely FreeBSD, OpenBSD and NetBSD. But there is a fourth one, which might very well become the last one for me to port the x264 benchmark to: DragonFly BSD. Actually a pretty cool system with its own, quite modern file system called “HAMMER”. Surprisingly, like NetBSD it made my life [quite easy], so now 50% of the major BSD systems are covered successfully. OpenBSD might eventually join the ranks as soon as version 5.2 is released, supposedly including a more modern compiler and hopefully assembler. But so far, it’s NetBSD and DragonFly BSD.

Interestingly, DragonFly BSD not only features its own file system, but also its own light-weight threading implementation. Of course I cannot use that, as I have to stick to a threading model supported by libav/x264 (either BeOS threads, which are broken anyway, win32 or posix – so posix it is), but it really looks like a lot of work went into this project. On the down side, it only supports x86_32 and x86_64, whereas the other BSDs support a sometimes extremely wide range of microarchitectures, with NetBSD clearly [in the lead] even over any Linux distribution.

Now it seems almost everything that can be conquered, has been conquered. From oddballs like the Win32 ReactOS to strange alien systems like Haiku and Unices like Solaris and different BSDs. And Linux of course, on quite some different architectures (MIPS, PPC, ARM, IA-64, …). While OpenBSD might still follow, and someday maybe even FreeBSD, I think the exploration time is pretty much over.

I did try to get onto [Fafner], a VAX running in the cellar of a crazy university professor, but it seems the OpenVMS operating system with its compilers and build toolchains is far, far beyond my porting skills. Same goes for weirdos like Tanenbaum’s Minix. So for now, this is it:

DragonFly BSD running x264

Oct 06 2012
 

It seems that some miracle has happened. I have now tried quite a few builds of ReactOS, and I stumbled over a certain [v0.4-SVN r57481] just two days ago. Previously, the OS would just bluescreen on larger data transfers: older builds would give an NTOSKRNL.EXE bluescreen, newer ones would show the tcpip.sys kernel driver faulting. Also, under x264 CPU/RAM load the kernel would just die after a few minutes, sometimes seconds. So my little x264 benchmark would reach something like 300-400 (of 15691) frames, and then everything would go right down to hell.

As you can imagine, I was quite pleasantly surprised when all of a sudden I could easily download an almost 700MB large file from the Internet with v0.4-SVN r57481 without the transfer stalling (which also happened sometimes previously) or the machine bluescreening. I was even more surprised when I ran x264 and found the box gnawing its way through 8 hours of high load without a single glitch! Now dear developers, that’s quite some progress right there in kernel space, I am impressed!

As for the x264 benchmark itself, there are a few minor modifications necessary for it to do its time measurement correctly, but that’s no big issue. Links to corresponding guides have already been [added]. Here you can see the thing running for quite some time already, with no crash. I still can’t believe it, so many problems gone all of a sudden:

ReactOS stabilized

Maybe ReactOS can become a real open source replacement for Windows one day? The sudden leap in stability is kind of reassuring. If only they could get more developers working on this stuff.

Sep 25 2012
 

And so the quest continues. Today, a new operating system type: BeOS, in its modern open source form, Haiku. BeOS was originally developed in the 90s as a multimedia operating system to compete with Microsoft Windows and Apple MacOS. However, the operating system never quite took off. In more recent times, the entire OS has been resurrected under a new, Japanese name: Haiku.

Still featuring a BeOS-style kernel, the OS has been equipped with a lot of Posix & GNU tools to ease the pain of porting and developing software for the OS. It actually looks and feels very tidy, fast and cool.

In recent development builds there are even versions with GCC 4.6 shipped, and on top of that, what you get is also a complete modern GNU build toolchain, including but not limited to yasm, autoconf, make, imake etc. So obviously I tried to build and link libav and x264, but failed. One problem was that some OS functions (like the Posix thread implementation) live in different libraries (libroot instead of libm or libpthread), so modifications of the linker flags are necessary, e.g. -lroot instead of -lm.
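
For reference, this is roughly what I mean by swapping the linker flags – an untested sketch only, assuming both configure scripts accept --extra-ldflags and that libroot indeed carries the libm/pthread symbols on Haiku:

# libav:
./configure --extra-ldflags="-lroot"
make
# x264, linked against the libav built above:
./configure --extra-ldflags="-lroot"
make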

But there were more severe problems keeping me from linking x264 against libav or ffmpeg itself. I have not yet been able to fully figure out why, but it’s most definitely a linker problem with failing library/header detections and even missing references at link time. Maybe some of the libs are actually missing, but I am not sure why they wouldn’t have been built when compiling libav and/or ffmpeg. Well, maybe I’ll manage in the future; meanwhile, check out this Haiku screenshot, it does look rather cool:

Haiku Screenshot

Sep 20 2012
 

I have played around with PHP a little again, and actually managed to generate PNG images with some elements rendered to them using a few basic GD functions of the scripting language. This is all still very new to me, so don’t be harsh! ;)

I thought I might use this to create some dynamic statistics for the [x264 benchmark] that are a bit fancier than plain text. I decided to do some simple stats about operating systems and CPUs first, didn’t want to overdo it.

So I went for basic OS families and a more broken-down visualization of all Windows and all UNIX derivatives. For microprocessors I went for basic architecture families (x86, RISC, VLIW) and a manufacturer breakdown. I know “x86” should probably have been “CISC” instead, but since x86 in itself is so widespread, I thought I should just make it its own family. See the following links:

Just so you can see what the generated images look like, I’ll link them in here. As you can see I decided to keep it very plain and simple, no fancy graphics. Operating systems first:

Operating systems

Windows operating systems

UNIX operating systems

And the microprocessors:

Microprocessor architectures

Microprocessor manufacturers

Not too bad for my first PHP-generated dynamic images? I would sure like to think so. ;)

Sep 17 2012
 

In my ever-ongoing quest to port my x264 benchmark to every thinkable platform, I gave ReactOS a shot, in both VirtualBox and VMware Player. The idea of this operating system is actually pretty nice and reminds me of what Linux did. Essentially, Linux was aiming at building a UNIX system, totally free and open, but using the same standards and APIs. Now ReactOS is striving to be the same for Windows, mostly 2000/XP/2003. So you should be able to run this open source operating system and run regular Win32 binary code on it. Which… kind of works. To some extent.

The main problem though is stability. I got system freezes and bluescreens just copying data over the network using the FTP protocol. Needless to say, the same thing happened when copying data from a CD ISO image to C:\. Some memory allocation BSODs, according to the backtraces I tried to read. In the end I copied my video data offline using a Knoppix CD, and then tried to run what I intended to run in the first place, although – you may have guessed it – a few minutes into my benchmark it was BSOD time once again:

x264 benchmark running on ReactOS

Really a shame. ReactOS – now at version 0.3.14 – has been in active development for 14 (in words: FOURTEEN) years, and this is what they came up with? Not only is it horribly unstable, it’s also horribly incompatible with a wide range of Win32 applications that work perfectly fine on Windows XP and Windows 2000, heck, even on Wine! Now, Wine is another component of ReactOS, used for handling userspace applications, but in this specific form it does not work very well – almost worse than what the kernel does. It seems that ReactOS development just can’t quite take off like Linux did.

This is actually the real shame here, because a free, open source replacement for Windows – even if it’s just NT5.x – would be amazingly great, given that even parts of DirectX have already been ported to “ReactX”. Despite its age however, it’s just far too deep in alpha stage to be of any use it seems. :(