Jan 11 2018
AMI logo

1.) Introduction

This is something I’ve been wanting to do for years, but the recent outcry regarding the Meltdown and Spectre hardware security holes has reminded me of it. Since fixing at least parts of the exploits also requires CPU microcode support (from here on: “µcode”), and not all operating systems and mainboards/machines might get the updates, this might be useful for the current situation as well. It can also be helpful if you wish to patch Intel Xeon support into a slightly older platform, where the manufacturer hasn’t done so already. This article does however not apply to modern UEFI-based systems.

So, instead of getting the required CPU µcodes from a donor BIOS, why not obtain the latest version from Intel’s [µcode package for Linux]? Oh, and it’s not really Linux-specific anyway.

I will show you how to extract and prepare the Intel µcodes and how to patch them so you can embed them into pretty much any somewhat modern AMI BIOS. AMI BIOSes are pretty widespread, but tools similar to the AMI-specific ones shown here might of course also exist for other BIOS brands.

The steps of this guide will be shown for both Linux/UNIX as well as Microsoft Windows.

All the tools used in this post can be found at the end of the article. You can also get all the currently released µcodes in one package there, so you don’t need to download all the archives from Intel (not all archives contain all µcodes…).

2.) Fetch and extract/convert Intel’s µcode package

You can get the µcodes directly from Intel; again, here’s the [link]. To unpack the .tgz archive on MS Windows, you may want to use [7-zip] (it’s a 2-step process with 7-zip, from .tgz to .tar and from .tar to the final files). For older releases, all you’ll get by doing this is a microcode.dat file; sometimes the file name will also contain a date. That file holds all the µcodes assembled in a text format. This is useless for patching the data into BIOS images, so we’ll need to extract the individual CPU µcodes and convert them into the proper binary format.

Newer releases might contain pre-built binary versions as well, but we’ll just continue to work with microcode.dat.

2a.) On Linux or UNIX

We’ll use iucode_tool for that, as at least most Linux systems should have a package ready for that program. If you can by no means get or compile iucode_tool, you might compile microdecode from source on Linux or UNIX instead, see 2b.) and 8.)! I have tested this on CentOS 6.9 Linux, and microdecode compiles just fine. But let’s continue with iucode_tool:

To extract the µcodes, switch to the directory containing microcode.dat (after unpacking the .tgz file with tar -xzf) and run the following commands:

$ mkdir ./µcodes/
$ iucode_tool -t d -L --write-named-to="./µcodes/" "microcode.dat"

With that, you’ll get a lot of *.fw binary files in ./µcodes/. Those are in the regular Intel µcode format and come in different sizes depending on the CPU. The file names look similar to e.g. this: s000206C2_m00000003_r00000013.fw. The first “s” part contains the CPUID, the second “m” part shows the platform ID, and the “r” part denotes the µcode revision, all of it in hexadecimal notation.
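If you want to handle those file names in a script, the three fields can be split off with plain shell parameter expansion. This is only a minimal sketch: the function name parse_fw_name is made up for illustration, and it assumes the s/m/r naming scheme shown above.

```shell
# Hypothetical helper: split an iucode_tool-style file name into its fields.
# Assumes the pattern s<CPUID>_m<platform>_r<revision>.fw
parse_fw_name() {
    name=$(basename "$1" .fw)
    cpuid=${name#s}          # drop the leading "s"
    cpuid=${cpuid%%_*}       # keep everything up to the first "_"
    rest=${name#*_m}         # drop everything up to and including "_m"
    platform=${rest%%_*}
    revision=${name##*_r}    # keep everything after the last "_r"
    printf 'cpuid=%s platform=%s revision=%s\n' "$cpuid" "$platform" "$revision"
}

parse_fw_name "s000206C2_m00000003_r00000013.fw"
# prints: cpuid=000206C2 platform=00000003 revision=00000013
```

All three values stay hexadecimal strings here, exactly as they appear in the file name.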

2b.) On Windows

On Windows, we’ll use the microdecode command line tool. It’s a bit easier to use than iucode_tool, as its only purpose is really just µcode extraction and nothing else. Switch to the directory containing the unpacked microcode.dat, make sure microdecode.exe is in your search path (or in the same directory), and run the following command:

microdecode.exe microcode.dat

This will result in a lot of *.bin files that are identical to the .fw files the *nix tool extracts. The file names will be similar to this: cpu000206c2_plat00000003_ver00000013_date20100907.bin. The first “cpu” part contains the CPUID like above, the second “plat” part contains the platform ID, and the “ver” part shows the µcode revision. Additionally, we also get the release date in the final “date” part.

3.) Identifying your CPU

You’ll likely know your CPU model if you’re reading this. But what you’ll need to know beyond that is your exact CPUID, stepping and the revision of CPU µcode you’re currently running. You can look up some of the information in CPU-World’s [CPUID database], but ultimately, you’ll need to fetch it from your local machine anyway.

Important background information: The CPUID is a 32-bit value structured as follows:

CPUID structure

The binary structure of Intel’s 32-bit CPUID, 4 bits equal one hexadecimal character

The binary matrix is usually expressed as hexadecimal characters of 4 bits each, e.g. the binary CPUID 0000:0000:0000:0010:0000:0110:1100:0010bin would result in 000206C2hex.
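To make that layout a bit more tangible, here is a small sketch that takes an 8-character CPUID string apart again using shifts and masks. The function name decode_cpuid is purely illustrative, and the bit positions are the ones from the structure described above (the processor type and reserved bits are simply ignored).

```shell
# Hypothetical helper: split a hexadecimal CPUID string into its fields.
decode_cpuid() {
    v=$((0x$1))   # interpret the argument as a hexadecimal number
    printf 'ext_family=%X ext_model=%X family=%X model=%X stepping=%X\n' \
        $(( (v >> 20) & 0xFF )) \
        $(( (v >> 16) & 0xF )) \
        $(( (v >> 8) & 0xF )) \
        $(( (v >> 4) & 0xF )) \
        $(( v & 0xF ))
}

decode_cpuid 000206C2
# prints: ext_family=0 ext_model=2 family=6 model=C stepping=2
```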

3a.) On Linux/UNIX

On Linux, the required information can be grabbed from procfs. Like this, for example:

$ cat /proc/cpuinfo | grep -e "model" -e "stepping" -e "microcode" -e "cpu family" | sort | uniq

cpu family	: 6
microcode	: 15
model		: 44
model name	: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz
stepping	: 2

You have to be careful interpreting this information however, as it’s mostly decoded in decimal form. Modern kernels might show some of the data in hexadecimal, so be careful here!

But first, convert the decimals from /proc/cpuinfo to hexadecimal, where necessary, in this case we’ll do it for the µcode revision and the model number:

$ printf '%X\n' 15
$ printf '%X\n' 44

So, the model number is 2Chex. Here, the right part or Chex is the actual model number (CPUID bits 4..7) and 2hex is the extended model number (CPUID bits 16..19).

The µcode version here would be Fhex. Note that µcode versions can be wider than just 1 hexadecimal character as well. The stepping 2 is a number lower than 15, so we don’t need to convert it to hexadecimal.

To assemble our final string, which is 8 hexadecimal characters wide, we walk through the bit mask on the image shown above step by step! All “reserved” parts are just filled with zeroes:

  1. 0 (Bits 31..28, reserved)
  2. 0 (Bits 27..24, extended CPU family, zero for family 6)
  3. 0 (Bits 23..20, extended CPU family, zero for family 6)
  4. 2 (Bits 19..16, extended CPU model number)
  5. 0 (Bits 15..12, two reserved and two platform type bits, typically zero)
  6. 6 (Bits 11..8, CPU family code)
  7. C (Bits 7..4, CPU model number)
  8. 2 (Bits 3..0, CPU stepping ID)

As a result of this, our CPUID string looks as follows:

  • 000206C2hex, currently running µcode revision Fhex.

Please note that information down, it’s needed in the next step.
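The manual walk through the bit mask can also be sketched as a few lines of shell arithmetic. Take this as an illustration under assumptions rather than a finished tool: make_cpuid is a made-up name, the inputs are the decimal "cpu family", "model" and "stepping" values from /proc/cpuinfo, and the processor type bits are assumed to be zero. The branch for families of 15 and above follows Intel’s convention that the displayed family is the base family plus the extended family.

```shell
# Hypothetical helper: build the 8-character CPUID string from decimal
# family, model and stepping values (processor type bits assumed zero).
make_cpuid() {
    family=$1 model=$2 stepping=$3
    if [ "$family" -ge 15 ]; then
        base_family=15
        ext_family=$((family - 15))
    else
        base_family=$family
        ext_family=0
    fi
    ext_model=$((model >> 4))    # upper 4 bits of the combined model
    base_model=$((model & 15))   # lower 4 bits of the combined model
    printf '%08X\n' $(( (ext_family << 20) | (ext_model << 16) | (base_family << 8) | (base_model << 4) | stepping ))
}

make_cpuid 6 44 2
# prints: 000206C2
```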

Of course, if you’re running some other UNIX, you’d be needing different tools. On FreeBSD you may be able to work with the “cpuflags” tool, like this:

$ cpuflags x 2>&1 | grep "Origin"

For other UNICES, you’re on your own though.

3b.) On Windows

On Windows I’d recommend Ray Hinchliffe’s [System Information Viewer], or “SIV” for short. It’s a pretty powerful freeware system information tool that will read out the stuff we need for the next steps (you can also use something like CPU-Z of course). See the following screenshot to learn what you need to look for:

System Information Viewer (SIV)


From left to right, we need the following data as pointed out with those arrows: CPU family 6hex, CPU model 2Chex (you can ignore the decimal value “44”), stepping 2hex and µcode revision 13hex. Again, 2Chex is a combination of model number Chex and extended model number 2hex!

Just like on Linux, the final identification string needs to be assembled first. Luckily, SIV already does the decimal -> hexadecimal conversion for us. To assemble the entire CPUID, put everything together like this: 

  1. 0 (Bits 31..28, reserved)
  2. 0 (Bits 27..24, extended CPU family, zero for family 6)
  3. 0 (Bits 23..20, extended CPU family, zero for family 6)
  4. 2 (Bits 19..16, extended CPU model number)
  5. 0 (Bits 15..12, two reserved and two platform type bits, usually zero)
  6. 6 (Bits 11..8, CPU family code)
  7. C (Bits 7..4, CPU model number)
  8. 2 (Bits 3..0, CPU stepping ID)

As a result of this, our CPUID string looks as follows:

  • 000206C2hex, currently running µcode revision 13hex.

The revision is the µcode version you’re currently running. Newer versions will just have higher numbers, so that’s how you can compare them with what you extracted from Intel’s package.

Note that information down, as you’ll need it in the next step.

4.) Locating the correct µcodes

4a.) On Linux/Unix

Let’s continue to assume your CPUID string is 000206C2hex. Switch to the directory where you extracted the Intel µcode package for Linux and look for the correct µcode binaries (in my case I have them grouped in folders by date as well):

$ find . -iname "*000206C2*"

Two versions are available: The newer 13hex and the older Fhex (in decimal that would amount to 19 vs. 15, higher is always newer).
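Since the revision in the file name is hexadecimal, picking the newest file means comparing numbers, not strings. A rough sketch of how that could be scripted follows; the function name newest_ucode is an assumption, and it only understands the iucode_tool naming scheme with upper-case hex digits.

```shell
# Hypothetical helper: from a list of paths, print the one with the highest
# revision for a given CPUID. File names must look like s<CPUID>_m..._r<rev>.fw
newest_ucode() {
    cpuid=$1
    shift
    best=
    best_rev=-1
    for f in "$@"; do
        case $(basename "$f") in
            s"$cpuid"_*_r*.fw) ;;   # matches our CPUID, keep going
            *) continue ;;          # some other CPU, skip it
        esac
        rev=${f##*_r}
        rev=${rev%.fw}
        if [ $((0x$rev)) -gt "$best_rev" ]; then
            best_rev=$((0x$rev))
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

newest_ucode 000206C2 \
    "20100209/µcodes/s000206C2_m00000003_r0000000F.fw" \
    "20100914/µcodes/s000206C2_m00000003_r00000013.fw"
# prints: 20100914/µcodes/s000206C2_m00000003_r00000013.fw
```

The directory names in the usage example are placeholders; in practice you could feed it the output of the find command above.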

4b.) On Windows

Like on Linux, let’s assume your CPUID string is 000206C2hex. Switch to the directory where your extracted Intel µcodes are, and look for the corresponding files:

DIR /B /S "*000206C2*"

There you go, the newer 13hex and the older Fhex µcodes.

5.) What’s wrong with AMI MMTool: A detailed explanation

To embed the µcodes into an actual AMI BIOS image, AMI’s MMTool “CPU Patch” function is required. It’s a Windows program, but given its simple nature, it will run fine with Wine on Linux and FreeBSD UNIX as well.

But if you attempt to embed Intel’s original binaries, it’ll fail right away:

MMTool failing to import Intel µcodes directly


If you extract existing µcodes from a given BIOS image and compare them with the same µcode revision in Intel’s format, you’ll notice a size difference:

$ ls -al ./06C2-Rev13.bin 20100914/µcodes/s000206C2_m00000003_r00000013.fw
-rw-rw-r-- 1 thrawn users 8192 Jan 10 14:04 ./06C2-Rev13.bin
-rw-r--r-- 1 thrawn users 7168 Jan 10 14:26 20100914/µcodes/s000206C2_m00000003_r00000013.fw

Intel’s µcode file in this case is 7168 bytes or 7kiB in size, but the stuff we extracted from that AMI BIOS is 8192 bytes or 8kiB in size. So what’s the deal here? At first I had no idea at all and thought that they’d just be encoded differently in the AMI BIOS or something. But it turns out that that is not the case. If you open both files in a hex editor of your choice, you’ll notice something at the end of the file that came from the AMI BIOS:


The final 1kiB is just binary zeroes! So how do we fix that? In essence, the entire procedure is like this:

  1. Extract µcodes from the target AMI BIOS image using MMTool.
  2. Look at how large that file is (in bytes).
  3. Look at how large the desired Intel µcode file “for Linux” is for your exact CPU.
  4. Calculate the difference in bytes.
  5. Fill the end of the Intel file with as many binary zeroes as there are missing bytes.
  6. Save that file and try to load it into your BIOS image using MMTool’s “CPU Patch” function.

Alright, let’s do it!

6.) Patching those µcodes for AMI MMTool and patching the BIOS itself

First, extract some random µcode from your BIOS using AMI MMTool’s “CPU Patch” function. Look at the file and note down its size in bytes. Also, you may want to look at the extracted µcode to see whether it has a lot of zeroes at the end. In my case, as said, the µcodes in that AMI BIOS are 8192 bytes, or 8kiB large, while Intel’s µcode is 7kiB in size. 8192 – 7168 = 1024, so again, the difference is 1024 bytes or 1kiB.

Please triple-check this stuff, as a wrong padding will break the µcode! Of course MMTool should warn you if something’s wrong, but it doesn’t hurt to make sure it’s safe from the start.
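To avoid doing that subtraction by hand, the two file sizes can be compared in the shell. This is a minimal sketch; padding_needed is a made-up name, the first argument is assumed to be a µcode extracted from the BIOS with MMTool and the second the matching Intel file.

```shell
# Hypothetical helper: print how many zero bytes are missing at the end of
# the Intel file compared to the µcode extracted from the BIOS.
padding_needed() {
    bios_size=$(wc -c < "$1")
    intel_size=$(wc -c < "$2")
    diff=$((bios_size - intel_size))
    if [ "$diff" -lt 0 ]; then
        echo "Intel file is larger than the BIOS µcode, padding won't help" >&2
        return 1
    fi
    echo "$diff"
}

# Usage, with the file names from this article:
#   padding_needed ./06C2-Rev13.bin ./20100914/µcodes/s000206C2_m00000003_r00000013.fw
```

For the 8192 and 7168 byte files discussed here, this would report 1024.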

6a.) On Linux/UNIX

Let’s make this easy, generate a 1kiB zero padding file for future use on similar µcode files:

$ dd bs=1 count=1024 if=/dev/zero of=./1kiB-zeropadding.bin
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB) copied, 0.00383825 s, 267 kB/s

And now, append the zeroes to the end of the file, generating a new file in the process, so we don’t touch the originals:

$ cat ./20100914/µcodes/s000206C2_m00000003_r00000013.fw ./1kiB-zeropadding.bin > ./0206C2-rev13.bin

That file 0206C2-rev13.bin is now the final µcode file for patching into the target BIOS. As an example, I’ll patch those µcodes for Xeon X5600 processors into an ancient BIOS version 0402 of an ASUS P6T Deluxe mainboard:


Save that file, and you’re done modifying your BIOS with µcodes from Intel’s Linux package!
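If you have to pad more than one file, or files with a different size difference, the dd and cat steps can be wrapped into one function that works out the padding by itself and checks the result. Consider it a sketch under assumptions: pad_ucode is a hypothetical name, the target size is whatever you measured on a µcode extracted from your own BIOS, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
# Hypothetical helper: append zero bytes to an Intel µcode file until it
# reaches the given target size, writing the result to a new file.
pad_ucode() {
    src=$1 target_size=$2 out=$3
    src_size=$(($(wc -c < "$src")))
    pad=$((target_size - src_size))
    if [ "$pad" -lt 0 ]; then
        echo "source is larger than the target size" >&2
        return 1
    fi
    # Original data first, then the zero padding; the source stays untouched.
    { cat "$src"; head -c "$pad" /dev/zero; } > "$out"
    # Sanity checks: correct total size, and the original data is unchanged.
    out_size=$(($(wc -c < "$out")))
    [ "$out_size" -eq "$target_size" ] || return 1
    head -c "$src_size" "$out" | cmp -s - "$src"
}

# Usage, with the file names from this article:
#   pad_ucode ./20100914/µcodes/s000206C2_m00000003_r00000013.fw 8192 ./0206C2-rev13.bin
```

The function returns a non-zero status if the sizes don’t add up, so it can be chained with && before you go anywhere near MMTool.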

6b.) On Windows

On MS Windows there is no dd command by default, but we have another preinstalled utility for creating our 1kiB zero padding file: fsutil! Like this:

fsutil File CreateNew .\1kiB-zeropadding.bin 1024

We don’t have a cat command on Windows either, but we can use a binary copy for file concatenation, no need to install any additional tools. We’ll attach the zero padding to the end of the Intel µcode and write the result to a new file, so we’re not altering the originals:

COPY /B .\20100914\cpu000206c2_plat00000003_ver00000013_date20100907.bin + .\1kiB-zeropadding.bin .\0206C2-rev13.bin
        1 file(s) copied.

That file 0206C2-rev13.bin is now the final µcode file for patching into the target BIOS. As an example, I’ll patch those µcodes for Xeon X5600 processors into an ancient BIOS version 0402 of an ASUS P6T Deluxe mainboard:


Save that file, and you’re done modifying your BIOS with µcodes from Intel’s Linux package!

7.) Flashing the modified BIOS

This step depends on your system. You may be able to flash from within the BIOS itself (e.g. from floppy disk or USB pen drive), or from within MS Windows or by using some bootable medium.

In any case, your BIOS image is ready now, and can be flashed at your convenience. After the update, re-check your µcode revision as shown in 3.). Your CPU should now be running the updated µcode!
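On Linux, that re-check can be scripted as well. The following is a sketch, not a definitive test: check_ucode_rev is a made-up name, it only looks at the first "microcode" line in /proc/cpuinfo, and it relies on shell arithmetic to accept both the decimal notation of older kernels and the 0x-prefixed hexadecimal notation of newer ones.

```shell
# Hypothetical helper: compare the running µcode revision with the expected
# one. The optional second argument is an alternative cpuinfo file.
check_ucode_rev() {
    expected=$(($1))
    cpuinfo=${2:-/proc/cpuinfo}
    raw=$(awk -F': *' '/^microcode/ { print $2; exit }' "$cpuinfo")
    if [ -z "$raw" ]; then
        echo "no microcode line found in $cpuinfo" >&2
        return 2
    fi
    current=$(($raw))   # accepts "15" as well as "0xf"
    printf 'running=0x%X expected=0x%X\n' "$current" "$expected"
    [ "$current" -ge "$expected" ]
}

# Usage: check_ucode_rev 0x13
```

The exit status is zero only if the running revision is at least the expected one.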

8.) Downloads

Some of the programs used in this article can be downloaded here:

  • [AMI MMTool v3.22]
  • [microdecode], compiled by myself, contains 32-bit and 64-bit versions of the tool as well as the source code and license file. Compiles on Linux/UNIX as well.
  • [Intel µcodes], everything from 2009-03-30 up to 2018-01-08 as available on the Intel web site at the moment.

9.) Hopefully you haven’t bricked your mainboard

…all of this without any kind of guarantees of course! :roll:

Jan 23 2014

Tulsa logo

Over the past few years, my [x264 benchmark] has been honored to accept results from many an exotic system. Amongst these are some of the weirder x86 CPUs like a Transmeta Efficēon, a cacheless Intel Celeron that only exists in Asia, and even my good old 486 DX4-S/100 which needed almost nine months to complete what modern boxes do in 1-2 hours. Plus the more exotic ones like the VLIW architecture Intel Itanium² or some ARM RISC chips, one of them sitting on a Raspberry Pi. Also, PowerPC, a MIPS-style Chinese 龙芯, or Loongson-2f as we call it, and so on and so forth.

There is however one chip that we’ve been hunting for years now, and never got a hold of. The Intel TULSA. A behemoth, just like the [golden driller] standing in the city that gave the chip its name. Sure, the Pentium 4 / Netburst era wasn’t the best for Intel, and the architecture was the laughingstock of all AMD users of that time. Some of the cores weren’t actually that bad though, and Tulsa is a specifically mad piece of technology.

Tulisa Contostavlos

Tulisa? That you?

Ehm… I said Tulsa, not Tulisa, come on guys, stay focused here! A processor, silicon and stuff (not silicone, fellas).

Xeon 7140M "Tulsa"

An Intel Xeon 7140M “Tulsa” (photograph kindly provided by Thomsen-XE)

Now that’s more like it right there! People seem to agree that the first native x86 dual core was built by Intel and that it was the Core 2. Which is wrong. It wasn’t. It was a hilarious 150W TDP Netburst Monster weighing almost 1.33 billion transistors with up to 16MB of Level 3 cache, Hyperthreading and an unusually high clock speed for a top-end server processor. The FSB800 16MB L3 Xeon MP 7140M part we’re seeing here clocks at 3.4GHz, which is pretty high even for a single core desktop Pentium 4. There also was an FSB667 part called Xeon MP 7150N clocking at 3.5GHz. Only that here we have 2 cores with HT and a metric ton of cache!

These things can run on quad sockets. Meaning a total of 8 cores and 16 threads, like seen on some models of the HP ProLiant DL580 G4. Plus, they’re x86_64 chips too, so they can run 64-Bit operating systems.

Tulsa die shot

Best Tulsa die shot I could find. To the right you can see the massive 16MB L3 cache. There is also 2 x 1MB L2.

And the core point: They’re rare. Extremely rare, especially in the maxed-out configuration of four processors. And I want them tested, as real results are scarce and almost nowhere to be found. Also, Thomsen-XE (who took that photograph of a 7140M up there) wants to see them show off! We have been searching for so long, and missed two guys with corresponding machines by such a narrow margin already!

We want the mightiest of all Netbursts and Intel’s first native dual core processor to finally show its teeth and prove that with enough brute force, it can even kill the Core 2 micro-architecture (as long as you have your own power plant, that is)!

So now, I’m asking you to please tell us in the comments whether you have or have access to such a machine and if you would agree to run the completely free x264 benchmark on that system. Windows would be nice for a reference x264 result, but don’t mind the operating system too much. Linux and most flavors of UNIX will do the job too! Guides for multiple operating systems are readily available at the bottom of the results list in [English] as well as [German].

If anyone can help us out, that’d be awesome! Your result will of course be published under your name, and there will be a big thank you here for you!

And don’t forget to say bye bye to Tulisa:

Tulisa Contostavlos #1

Well, thanks for your visit, Miss Contostavlos, but TULSA is the #1 we seek today!

Update: According to a [comment] by Sjaak Trekhaak, my statements that Tulsa was Intel’s first native dual core were false. There were others with release dates before Tulsa, like the first Core Duo or the smaller Netburst-based Xeons with Paxville DP core, as you can also see in my reply to Sjaak’s comment. Thus, the strike-through parts in the above text.

May 29 2013

Gainward logo

So, there is this mainboard, an Intel D850EMV2, or rather D850EMVR, which is a sub-version of the former, i850E Rambus chipset. What’s special about that old Pentium 4 board? Well, I won it once in a giveaway at one of the largest German hardware websites, [Computerbase] (German). And after that, Jan-Frederik Timm, founder and boss of the place contacted me on ICQ, telling me about it. First time I had ever won anything! He asked me to put it to good use, because he was kind of fed up with people just reselling their won stuff. So I promised him that I would.

And boy, did I keep that promise! At first I used a shabby 1.6GHz Northwood processor with just 128MB Rambus RDRAM. Can’t remember the rest of the machine, but over time I upgraded it a bit (when it was already considered old, with Core 2 Duos on the market), a bit more RAM etc. for small LAN party sessions. At some time, I sold it to my cousin, and to make it more powerful and capable for her, I gave the machine the only one and fastest hyper-threaded processor available on that platform, the Pentium 4 HT 3.06GHz, plus 1.5GB PC800-45 RDRAM and my old GeForce 6800 Ultra AGP.

She used that for the internet and gaming etc. for some time until the GeForce died and she thought the machine barely powerful enough for her top-tier games anyway, so she bought a new one, which I built for her. The Intel board and its stuff got my older GeForce FX5950 Ultra then and was used by my uncle for the Internet on Debian Linux and low-end LAN games on Windows XP.

A long time after I first got it, I contacted Jan from Computerbase again, to tell him that I had kept my promise and ensured the board had been used properly for 8 years now. Needless to say he was delighted and very happy that it wasn’t just sold off for quick cash.

Soon after, my cousin got another even more powerful machine, as her Core 2 Duo mainboard died off. Now it was S1156, GTX480 etc. So my uncle bought a new mainboard and I rebuilt the C2D for him with my cousin’s old GTX275. I asked him if he would part with the D850EMVR and he agreed to give it back to me, after which it collected dust for a year or so.

Now, we need another machine for our small LAN parties, as our Notebooks can’t drive the likes of Torchlight II or Alien Swarm. It was clear, what I had to do: Keep the damn Intel board running until it fucking dies!

This time I chose to make it as powerful as it could remotely become. With a Gainward Bliss GeForce 7800GS+ AGP. The most powerful nVidia based AGP card ever built, equipped with a very overclockable 7900GT GPU with a full 24 pixel pipelines and 8 vertex shaders as well as 512MB Samsung RAM. Only Gainward built it that way (a small 7900 GTX you could say), as nVidia did not officially allow such powerful AGP cards. So this was a limited edition too. I always wanted to have one of those, but could never afford them. Now was the time:

As expected (there were later, more powerful AGP8x systems in comparison to this AGP4x system, with faster Pentium4s and Athlon64s), the CPU is limiting the card. But at least I can add some FSAA or even HDRR at little cost in some games, and damn, that card overclocks better than shown on some of the original reviews! The core got from 450MHz to 600MHz so far, dangerously close to the top-end 7900 GTX PCIe of the time with its 650MHz. Also, the memory accepted some pushing from 1.25GHz DDR3 to 1.4GHz DDR3 data rate. Nice one!

This was Furmark stable, and the card is very silent and rather cool even under such extreme loads. Maybe it’ll accept even more speed, and all that at a low 1.2V GPU voltage. Cool stuff. Here, a little AquaMark 3 for you:

7800gs+ in AquaMark 3

So, this is at 600MHz core and 1400MHz DDR memory. For comparison I got a result slightly above 53k at just 300MHz core. So as you can see, at least the K.R.A.S.S. engine in AquaMark 3 is heavily CPU bound on this system. So yeah, for my native resolution of 1280×1024 on that box, the card is too powerful for the CPU in most cases. The tide can turn though (in Alien Swarm for instance) when turning on some compute-heavy 128-bit floating point rendering with HDR or very complex shaders, or FSAA etc., so the extra power is going to be used. ;) And soon, 2GB PC1066-32p RDRAM will arrive to replace the 1GB PC800-45 Rambus I have currently, to completely max it out!

So I am keeping my promise. Still. After about 10 years now. Soon there will be another small LAN party, and I’m going to use it there. And I will continue to do so until it goes up in flames! :)

Update: The user [Tweakstone] has mentioned on [Voodooalert] (German) that XFX once built a GeForce 7950GT for AGP, which was more powerful than the Gainward. So I checked it out, and he seems to be right! The XFX 7950GT was missing the big silent cooler, but provided an architecturally similar G71 GPU at higher clock rates! While the Gainward 7800GS+ offered 450MHz on the core and 1250MHz DDR data rate on the memory, the XFX would give you 550MHz core and 1300MHz DDR data rate at a similar amount of 512MB DDR3 memory. That’s a surprise to me, I wasn’t aware of the XFX. But since my Gainward overclocks so well (it’s the same actual chip after all) and is far more silent and cool, I guess my choice wasn’t wrong after all. ;)

Update 2: Since there was a slight glitch in the geometry setup unit of my card, I have now replaced it with a Sapphire Radeon HD3850 AGP, which gives more performance, slightly better FSAA and as the icing on the cake proper DXVA1 video acceleration. Even plays BluRays in MPC-HC now. ;) Also, I retested AquaMark 3, which seems to require the deletion of the file direcpll.dll from the AquaMark 3 installation directory to not run into an access violation exception at the end of the benchmark on certain ATi or AMD graphics hardware. I guess the drivers are the problem here. But with that troublesome file gone, here’s a new result:

AquaMark 3 on an ATi Radeon HD3850 AGP

Yeah, it’s a bit faster now, but not much. As we can see, the processor is clearly the limiting factor here. But at least I now have relatively problem-free 3D rendering and DXVA on top of it!

Sep 20 2012

x264 Logo

I have played around with PHP a little again, and actually managed to generate PNG images with some elements rendered to them using a few basic GD functions of the scripting language. This is all still very new to me, so don’t be harsh! ;)

I thought I might use this to create some dynamic and more fancy than plain text statistics about the [x264 benchmark]. I decided to do some simple stats about operating systems and CPUs first, didn’t want to overdo it.

So I went for basic OS families and a more broken down visualization of all Windows and all UNIX derivatives. For microprocessors I went for basic architecture families (x86, RISC, VLIW) and a manufacturer breakdown. I know “x86” should probably have been “CISC” instead, but since x86 in itself is so wide-spread, I thought I should just make it its own family. See the following links:

Just so you can see what the generated images look like, I’ll link them in here. As you can see I decided to keep it very plain and simple, no fancy graphics, operating systems first:

Operating systems

Windows operating systems

Windows operating systems

And the microprocessors:

Microprocessor architectures

Microprocessor manufacturers

Not too bad for my first PHP-generated dynamic images? I would sure like to think so. ;)

May 10 2012

I have tested an sgi Altix 350 shared memory cluster with 10 Intel Itanium² CPUs once, running the [x264 benchmark] on it, and here’s another. Prof. Ludwig and Mr. Otto from the [Chair of Simulation and Modelling of Metallurgic Processes] (SMMP) at the University of Leoben have agreed to let me benchmark their even larger Altix 350 with 16 processors. Now while I have already done that successfully using the GNU C compiler (GCC), performance was a little bit sub-par and rather unsatisfactory, only slightly faster than its smaller counterpart, here [the filtered result]. Now on the previous sgi Altix machine I had access to the rather hairy ICC 10.1, the Intel C/C++ compiler. It did give around 10% performance boost back then after I finally managed to build x264 with it, so I wanted to try that once again.

Unfortunately, it’s not so easy to get access to ICC. The newest version 12.0 is not available for the IA64 (Itanium) architecture anymore, and you can’t even get to a proper trial license generator for Linux on Itanium. The latest version for Itanium is 11.1.080. Even the download is almost impossible to locate by regular means on Intel’s website, you can get it after logging in to [registrationcenter.intel.com] if you know what you’re looking for and have an ICC 12.0 trial license tied to that account already. So after some support emailing I registered at [premier.intel.com] (You get a link for that when downloading the regular ICC 12.0 trial for x86). There I opened a support ticket to get a proper trial license.

The support person did his best to generate licenses for me, but the compiler installer just wouldn’t accept them. In the end he found that he does not have the proper tools anymore to generate valid IA64 licenses, so he forwarded the issue to internal registration and license management within Intel. He said they have to have the proper setup to still generate a valid trial serial number / license for Linux on Itanium.

There you see, the Itanium ship seems to really be sinking if you can’t even get any Intel software trials for it anymore. I still hope I can get the 11.1 compiler working as that would be probably the best Itanium² result we’re ever going to see from that kind of IA64 shared memory cluster platform.

Feb 20 2012

Power Mac G5 Quad CPU Module

Ok, this might be slightly overboard for x264 benchmarking, but I just bought myself the most powerful RISC-based workstation ever built, an Apple Power Mac G5 Quad. Well, yeah, it’s Apple, but as my pal Cosmonate would say, it’s a machine from the time when Apple was not yet cool. So there you have it. ;) To the left you can see the massive CPU module of the quad machine. It actually consists of two physical CPUs with two cores and 2MB L2 cache each. The CPUs are IBM PowerPC 970MPs overclocked by Apple from 2.3GHz to 2.5GHz. Even though they’re built by IBM and not Motorola/Freescale, the cores supposedly feature the Altivec SIMD extension and not IBM’s VMX. We’ll see.

Because of the overclock, Apple decided to ship the CPU module with a closed watercooling system including cooler blocks, pump and radiator, all in one module that you can see to the top left. The seller has stated, that the fans are somewhat loud in the machine, which may hint at the watercooling system being broken. I might need to refill the circuit, which should be possible without too much trouble.

And after installing Debian 6.0.4 Linux and running the x264 benchmark, I might turn this into a real workstation. Would be nice to see if those 2005 RISC powerhouses can actually run Gnash and H.264/AVC video at 10Mbps and 1080p. :)

Feb 09 2012

Silicon Graphics Altix 350

After doing some testing on the Intel C compiler (ICC) and also compilation of libav/x264 without root privileges, I finally got access to the real deal! For that, my thanks go to Prof. Supancic and Mr. Flicker from the [Institute for Structural and Functional Ceramics] at the University of Leoben.

So, the machine: A Silicon Graphics Altix 350, equipped with 5 modules with 2 processors each. That makes for a total of 10 Intel “Madison” Itanium² processors clocked at 1.5GHz and packed with 4MB cache each. The memory subsystem consists of 56GB Reg. ECC DDR-I/333 memory and the storage backend is SCSI. The entire machine consumes roughly 5000W of power and runs on SuSE Linux Enterprise Server or SLES, version 10.

So much for the specification mumbo jumbo, for an idea what a fully packed Altix 350 half-height rack would look like, see the thumbnail picture. Yes, it’s big. So, was it easy to get x264 to work? Well, nah-ah, it wasn’t.

Feb 02 2012

Intel Itanium 2 Logo

Now [look at that]! Just a few months ago, you would have easily paid 2000-3000€ for such a box on Ebay, and now it’s just 149€ with ridiculously cheap shipping to Austria. I mean, 4.90€ for a mass of 27kg? Insanely cheap. So yes, I can totally feel the itching in my fingers. The Itanium² would be a perfect new toy for the [x264 benchmark]! So far, no one has tested such a chip, and since it’s a VLIW (Very Long Instruction Word) architecture, not a classic RISC or CISC design, this would be a total novum. Well, there has been one VLIW so far, my own Transmeta Efficēon TM8600, but that one kinda doesn’t count, since it’s emulating x86_32 via its firmware-embedded codemorphing engine.

But this is the real thing. There is only one thing keeping me back: An old friend of mine, who goes by the nick of DarkHunter, has given me contact information for a guy working at another institute at the University I’m also working for. And they even have dual-core Itaniums there, so if they let me play around a bit with that monster, that’d be perfect! It’s even running SuSE Enterprise Linux already, so that should be a walk in the park for code compilation etc. Well, whatever happens, it seems there is going to be an Intel Itanium² with VLIW architecture in the x264 results list at some point in the near future. Exotic and exciting!