Jan 23, 2014
 

Over the past few years, my [x264 benchmark] has been honored to accept results from many an exotic system. Amongst these are some of the weirder x86 CPUs like a Transmeta Efficēon, a cacheless Intel Celeron that only exists in Asia, and even my good old 486 DX4-S/100, which needed almost nine months to complete what modern boxes do in 1-2 hours. Then there are the more exotic ones, like the VLIW-architecture Intel Itanium² or some ARM RISC chips, one of them sitting on a Raspberry Pi. Also PowerPC, a MIPS-style Chinese 龙芯 (or Loongson-2f, as we call it), and so on and so forth.

There is however one chip that we’ve been hunting for years now and never got hold of: the Intel TULSA. A behemoth, just like the [golden driller] standing in the city that gave the chip its name. Sure, the Pentium 4 / Netburst era wasn’t the best for Intel, and the architecture was the laughingstock of all AMD users of that time. Some of the cores weren’t actually that bad though, and Tulsa is a particularly mad piece of technology.

Tulisa Contostavlos

Tulisa? That you?

Ehm… I said Tulsa, not Tulisa, come on guys, stay focused here! A processor, silicon and stuff (not silicone, fellas).

Xeon 7140M "Tulsa"

An Intel Xeon 7140M “Tulsa” (photograph kindly provided by Thomsen-XE)

Now that’s more like it right there! People seem to agree that the first native x86 dual core was built by Intel and that it was the Core 2. Which is wrong. It wasn’t. It was a hilarious 150W TDP Netburst monster weighing in at almost 1.33 billion transistors, with up to 16MB of Level 3 cache, Hyper-Threading and an unusually high clock speed for a top-end server processor. The FSB800 16MB L3 Xeon MP 7140M part we’re seeing here clocks at 3.4GHz, which is pretty high even for a single-core desktop Pentium 4. There was also an FSB667 part called Xeon MP 7150N clocking at 3.5GHz. Only here we have 2 cores with HT and a metric ton of cache!

These things can run in quad-socket systems. That means a total of 8 cores and 16 threads, as seen on some models of the HP ProLiant DL580 G4. Plus, they’re x86_64 chips too, so they can run 64-bit operating systems.

Tulsa die shot

Best Tulsa die shot I could find. To the right you can see the massive 16MB L3 cache. There are also 2 × 1MB of L2.

And the core point: They’re rare. Extremely rare, especially in the maxed-out configuration of four processors. And I want them tested, as real results are scarce and almost nowhere to be found. Also, Thomsen-XE (who took that photograph of a 7140M up there) wants to see them show off! We have been searching for so long, and missed two guys with corresponding machines by such a narrow margin already!

We want the mightiest of all Netbursts and Intel’s first native dual-core processor to finally show its teeth and prove that with enough brute force, it can even kill the Core 2 micro-architecture (as long as you have your own power plant, that is)!

So now, I’m asking you to please tell us in the comments whether you have or have access to such a machine and if you would agree to run the completely free x264 benchmark on that system. Windows would be nice for a reference x264 result, but don’t mind the operating system too much. Linux and most flavors of UNIX will do the job too! Guides for multiple operating systems are readily available at the bottom of the results list in [English] as well as [German].

If anyone can help us out, that’d be awesome! Your result will of course be published under your name, and there will be a big thank you here for you!

And don’t forget to say bye bye to Tulisa:

Tulisa Contostavlos #1

Well, thanks for your visit, Miss Contostavlos, but TULSA is the #1 we seek today!

Update: According to a [comment] by Sjaak Trekhaak, my statements that Tulsa was Intel’s first native dual core were false. There were others with release dates before Tulsa, like the first Core Duo or the smaller Netburst-based Xeons with the Paxville DP core, as you can also see in my reply to Sjaak’s comment. Thus, the strike-through parts in the above text.

May 29, 2013
 

So, there is this mainboard, an Intel D850EMV2, or rather D850EMVR, which is a sub-version of the former, based on the i850E Rambus chipset. What’s special about that old Pentium 4 board? Well, I won it once in a giveaway at one of the largest German hardware websites, [Computerbase]. And after that, Jan-Frederik Timm, founder and boss of the place, contacted me on ICQ, telling me about it. First time I had ever won anything! He asked me to put it to good use, because he was kind of fed up with people just reselling their won stuff. So I promised him that I would.

And boy, did I keep that promise! At first I used a shabby 1.6GHz Northwood processor with just 128MB Rambus RDRAM. Can’t remember the rest of the machine, but over time I upgraded it a bit (when it was already considered old, with Core 2 Duos on the market), a bit more RAM etc. for small LAN party sessions. At some point, I sold it to my cousin, and to make it more powerful and capable for her, I gave the machine the only (and thus fastest) hyper-threaded processor available on that platform, the Pentium 4 HT 3.06GHz, plus 1.5GB PC800-45 RDRAM and my old GeForce 6800 Ultra AGP.

She used that for the internet and gaming etc. for some time until the GeForce died and she thought the machine barely powerful enough for her top-tier games anyway, so she bought a new one, which I built for her. The Intel board and its stuff got my older GeForce FX5950 Ultra then and was used by my uncle for the Internet on Debian Linux and low-end LAN games on Windows XP.

A long time after I first got it, I contacted Jan from Computerbase again, to tell him that I had kept my promise and ensured the board had been used properly for 8 years now. Needless to say he was delighted and very happy that it wasn’t just sold off for quick cash.

Soon after, my cousin got another even more powerful machine, as her Core 2 Duo mainboard died off. Now it was S1156, GTX480 etc. So my uncle bought a new mainboard and I rebuilt the C2D for him with my cousin’s old GTX275. I asked him if he would part with the D850EMVR and he agreed to give it back to me, after which it collected dust for a year or so.

Now, we need another machine for our small LAN parties, as our notebooks can’t drive the likes of Torchlight II or Alien Swarm. It was clear what I had to do: Keep the damn Intel board running until it fucking dies!

This time I chose to make it as powerful as it could remotely become. With a Gainward Bliss GeForce 7800GS+ AGP. The most powerful nVidia based AGP card ever built, equipped with a very overclockable 7900GT GPU with a full 24 pixel pipelines and 8 vertex shaders as well as 512MB Samsung RAM. Only Gainward built it that way (a small 7900 GTX you could say), as nVidia did not officially allow such powerful AGP cards. So this was a limited edition too. I always wanted to have one of those, but could never afford them. Now was the time:

As expected (there were later, more powerful AGP8x systems in comparison to this AGP4x system, with faster Pentium 4s and Athlon 64s), the CPU is limiting the card. But at least I can add some FSAA or even HDRR at little cost in some games, and damn, that card overclocks better than shown in some of the original reviews! The core got from 450MHz to 600MHz so far, dangerously close to the top-end 7900 GTX PCIe of the time with its 650MHz. Also, the memory accepted some pushing from 1.25GHz to 1.4GHz GDDR3 data rate. Nice one!
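To put those clocks into perspective, here is a quick back-of-the-envelope calculation. It is just a sketch using the numbers quoted above; the helper name `gain_percent` is made up for this example.

```python
def gain_percent(stock_mhz: float, overclocked_mhz: float) -> float:
    """Relative clock increase over the stock value, in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100.0


# Clocks as quoted in the post (memory given as effective data rate in MHz).
core_gain = gain_percent(450, 600)      # 7800GS+ core: stock -> overclocked
memory_gain = gain_percent(1250, 1400)  # memory data rate: stock -> overclocked
gtx_gap = gain_percent(600, 650)        # what the 7900 GTX core still has on top

print(f"core overclock:   +{core_gain:.1f}%")
print(f"memory overclock: +{memory_gain:.1f}%")
print(f"7900 GTX core is still {gtx_gap:.1f}% faster")
```

So the core gains about a third over stock, the memory about an eighth, and the 7900 GTX core clock is less than a tenth away.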

This was Furmark stable, and the card is very silent and rather cool even under such extreme loads. Maybe it’ll accept even more speed, and all that at a low 1.2V GPU voltage. Cool stuff. Here, a little AquaMark 3 for you:

7800gs+ in AquaMark 3

So, this is at 600MHz core and 1400MHz DDR memory. For comparison I got a result slightly above 53k at just 300MHz core. So as you can see, at least the K.R.A.S.S. engine in AquaMark 3 is heavily CPU bound on this system. So yeah, for my native resolution of 1280×1024 on that box, the card is too powerful for the CPU in most cases. The tide can turn though (in Alien Swarm for instance) when turning on some compute-heavy 128-bit floating point rendering with HDR or very complex shaders, or FSAA etc., so the extra power is going to be used. ;) And soon, 2GB PC1066-32p RDRAM will arrive to replace the 1GB PC800-45 Rambus I have currently, to completely max it out!

So I am keeping my promise. Still. After about 10 years now. Soon there will be another small LAN party, and I’m going to use it there. And I will continue to do so until it goes up in flames! :)

Update: The user [Tweakstone] has mentioned on [Voodooalert] that XFX once built a GeForce 7950GT for AGP, which was more powerful than the Gainward. So I checked it out, and he seems to be right! The XFX 7950GT was missing the big silent cooler, but provided an architecturally similar G71 GPU at higher clock rates! While the Gainward 7800GS+ offered 450MHz on the core and 1250MHz data rate on the memory, the XFX would give you 550MHz core and 1300MHz data rate with the same amount of 512MB GDDR3 memory. That’s a surprise to me, I wasn’t aware of the XFX. But since my Gainward overclocks so well (it’s the same actual chip after all) and is far more silent and cool, I guess my choice wasn’t wrong after all. ;)

Update 2: Since there was a slight glitch in the geometry setup unit of my card, I have now replaced it with a Sapphire Radeon HD3850 AGP, which gives more performance, slightly better FSAA and as the icing on the cake proper DXVA1 video acceleration. Even plays BluRays in MPC-HC now. ;) Also, I retested AquaMark 3, which seems to require the deletion of the file direcpll.dll from the AquaMark 3 installation directory to not run into an access violation exception at the end of the benchmark on certain ATi or AMD graphics hardware. I guess the drivers are the problem here. But with that troublesome file gone, here’s a new result:

AquaMark 3 on an ATi Radeon HD3850 AGP

Yeah, it’s a bit faster now, but not much. As we can see, the processor is clearly the limiting factor here. But at least I now have relatively problem-free 3D rendering and DXVA on top of it!
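For anyone who wants to try the direcpll.dll workaround from Update 2 without throwing the file away for good, here is a minimal Python sketch. Note that it renames the file to `direcpll.dll.bak` instead of deleting it, which is my own cautious variation and not what I actually did on the box. The function name and the example path in the comment are placeholders; point it at wherever AquaMark 3 is installed on your system.

```python
from pathlib import Path


def disable_direcpll(install_dir: str) -> bool:
    """Rename direcpll.dll to direcpll.dll.bak inside install_dir.

    Returns True if the file was found and renamed, False if it was
    not there (for example because it was already renamed or deleted).
    """
    dll = Path(install_dir) / "direcpll.dll"
    if not dll.is_file():
        return False
    # replace() overwrites an older backup if one exists.
    dll.replace(dll.with_name(dll.name + ".bak"))
    return True


# Hypothetical usage; the directory below is only an example path:
# disable_direcpll(r"C:\Program Files\AquaMark3")
```

Renaming keeps the original around, so the change can be undone by simply removing the `.bak` suffix again.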

Sep 20, 2012
 

I have played around with PHP a little again, and actually managed to generate PNG images with some elements rendered to them using a few basic GD functions of the scripting language. This is all still very new to me, so don’t be harsh! ;)

I thought I might use this to create some dynamic and more fancy than plain text statistics about the [x264 benchmark]. I decided to do some simple stats about operating systems and CPUs first, didn’t want to overdo it.

So I went for basic OS families and a more broken down visualization of all Windows and all UNIX derivatives. For microprocessors I went for basic architecture families (x86, RISC, VLIW) and a manufacturer breakdown. I know “x86” should probably have been “CISC” instead, but since x86 in itself is so widespread, I thought I should just make it its own family. See the following links:

Just so you can see what the generated images look like, I’ll link them in here. As you can see I decided to keep it very plain and simple, no fancy graphics, operating systems first:

Operating systems

Windows operating systems

Windows operating systems

And the microprocessors:

Microprocessor architectures

Microprocessor manufacturers

Not too bad for my first PHP-generated dynamic images? I would sure like to think so. ;)
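The actual images are drawn in PHP with GD, but the counting behind such a chart is simple enough to sketch. The following is a Python illustration only, not the script running on the site: the function names are made up, and the sample list is placeholder data, not real results from the benchmark list. It tallies architecture families and prints a plain text bar per family.

```python
from collections import Counter


def family_shares(families):
    """Map each family name to (count, percent share), largest first."""
    counts = Counter(families)
    total = sum(counts.values())
    return {name: (n, n * 100.0 / total) for name, n in counts.most_common()}


def text_bars(shares, width=40):
    """Render one text bar per family, scaled so 100% equals `width` characters."""
    lines = []
    for name, (n, pct) in shares.items():
        bar = "#" * round(pct / 100.0 * width)
        lines.append(f"{name:<6} {bar} {n} ({pct:.1f}%)")
    return lines


# Placeholder input, NOT real benchmark data: one family label per result.
sample = ["x86"] * 6 + ["RISC"] * 3 + ["VLIW"] * 1

shares = family_shares(sample)
bars = text_bars(shares)
for line in bars:
    print(line)
```

A GD version would do the same tally and then draw filled rectangles instead of `#` characters, with the bar length computed the same way.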

May 10, 2012
 

I have tested an sgi Altix 350 shared memory cluster with 10 Intel Itanium² CPUs once, running the [x264 benchmark] on it, and here’s another. Prof. Ludwig and Mr. Otto from the [Chair of Simulation and Modelling of Metallurgic Processes] (SMMP) at the University of Leoben have agreed to let me benchmark their even larger Altix 350 with 16 processors. Now while I have already done that successfully using the GNU C compiler (GCC), performance was a little bit sub-par and rather unsatisfactory, only slightly faster than its smaller counterpart, here [the filtered result]. On the previous sgi Altix machine I had access to the rather hairy ICC 10.1, the Intel C/C++ compiler. It did give a performance boost of around 10% back then after I finally managed to build x264 with it, so I wanted to try that once again.

Unfortunately, it’s not so easy to get access to ICC. The newest version 12.0 is not available for the IA64 (Itanium) architecture anymore, and you can’t even get to a proper trial license generator for Linux on Itanium. The latest version for Itanium is 11.1.080. Even the download is almost impossible to locate by regular means on Intel’s website. You can get it after logging in to [registrationcenter.intel.com] if you know what you’re looking for and have an ICC 12.0 trial license tied to that account already. So after some support emailing I registered at [premier.intel.com] (you get a link for that when downloading the regular ICC 12.0 trial for x86). There I opened a support ticket to get a proper trial license.

The support person did his best to generate licenses for me, but the compiler installer just wouldn’t accept them. In the end he found that he does not have the proper tools anymore to generate valid IA64 licenses, so he forwarded the issue to internal registration and license management within Intel. He said they should still have the proper setup to generate a valid trial serial number / license for Linux on Itanium.

There you see, the Itanium ship seems to really be sinking if you can’t even get any Intel software trials for it anymore. I still hope I can get the 11.1 compiler working as that would be probably the best Itanium² result we’re ever going to see from that kind of IA64 shared memory cluster platform.

Feb 20, 2012
 

Power Mac G5 Quad CPU Module

Ok, this might be slightly overboard for x264 benchmarking, but I just bought myself the most powerful RISC-based workstation ever built, an Apple Power Mac G5 Quad. Well, yeah, it’s Apple, but as my pal Cosmonate would say, it’s a machine from the time when Apple was not yet cool. So there you have it. ;) To the left you can see the massive CPU module of the quad machine. It actually consists of two physical CPUs with two cores and 2MB L2 cache each. The CPUs are IBM PowerPC 970MPs overclocked by Apple from 2.3GHz to 2.5GHz. Even though they’re built by IBM and not Motorola/Freescale, the cores supposedly feature the Altivec SIMD extension and not IBM’s VMX. We’ll see.

Because of the overclock, Apple decided to ship the CPU module with a closed watercooling system including cooler blocks, pump and radiator, all in one module that you can see to the top left. The seller has stated that the fans are somewhat loud in the machine, which may hint at the watercooling system being broken. I might need to refill the circuit, which should be possible without too much trouble.

And after installing Debian 6.0.4 Linux and running the x264 benchmark, I might turn this into a real workstation. Would be nice to see if those 2005 RISC powerhouses can actually run Gnash and H.264/AVC video at 10Mbps and 1080p. :)

Feb 09, 2012
 

Silicon Graphics Altix 350

After doing some testing on the Intel C compiler (ICC) and also compilation of libav/x264 without root privileges, I finally got access to the real deal! For that, my thanks go to Prof. Supancic and Mr. Flicker from the [Institute for Structural and Functional Ceramics] at the University of Leoben.

So, the machine: A Silicon Graphics Altix 350, equipped with 5 modules with 2 processors each. That makes for a total of 10 Intel “Madison” Itanium² processors clocked at 1.5GHz and packed with 4MB cache each. The memory subsystem consists of 56GB Reg. ECC DDR-I/333 memory and the storage backend is SCSI. The entire machine consumes roughly 5000W of power and runs on SuSE Linux Enterprise Server or SLES, version 10.

So much for the specification mumbo jumbo. For an idea of what a fully packed Altix 350 half-height rack looks like, see the thumbnail picture. Yes, it’s big. So, was it easy to get x264 to work? Well, nah-ah, it wasn’t.

Feb 02, 2012
 

Now [look at that]! Just a few months ago, you would have easily paid 2000-3000€ for such a box on Ebay, and now it’s just 149€ with ridiculously cheap shipping to Austria. I mean, 4.90€ for a mass of 27kg? Insanely cheap. So yes, I can totally feel the itching in my fingers. The Itanium² would be a perfect new toy for the [x264 benchmark]! So far, no one has tested such a chip, and since it’s a VLIW (Very Long Instruction Word) architecture, not a classic RISC or CISC design, this would be a total novelty. Well, there has been one VLIW so far, my own Transmeta Efficēon TM8600, but that one kinda doesn’t count, since it’s emulating x86_32 via its firmware-embedded code morphing engine.

But this is the real thing. There is only one thing holding me back: An old friend of mine, who goes by the nick of DarkHunter, has given me contact information for a guy working at another institute at the university I’m also working for. And they even have dual-core Itaniums there, so if they let me play around a bit with that monster, that’d be perfect! It’s even running SuSE Enterprise Linux already, so that should be a walk in the park for code compilation etc. Well, whatever happens, it seems there is going to be an Intel Itanium² with VLIW architecture in the x264 results list at some point in the near future. Exotic and exciting!