Nov 19 2016

Recently, after finding out that the old Intel GMA950 profits greatly from added memory bandwidth (see [here]), I wondered if the overclocking mechanism applied by the Windows tool [here] had leaked into the public after all this time. The developer of said tool refused to open source the software even after it turned into abandonware – announced support for GMA X3100 and X4500 as well as MacOS X and Linux never came to be. Also, he did not say how he managed to overclock the GMA950 in the first place.

Some hackers disassembled the code of the GMABooster however, and found out that all that’s needed is a simple PCI register modification that you could probably apply by yourself on Microsoft Windows by using H.Oda!’s [WPCREdit].

Tools for PCI register modification do exist on Linux and UNIX as well of course, so I wondered whether I could apply this knowledge on FreeBSD UNIX too. Of course, I’m a few years late to the party, because people had already solved this back in 2011! But just in case the scripts and commands disappear from the web, I wanted this to be documented here as well. First, let’s see whether we even have a GMA950 (of course I do, but still). It should be PCI device 0:0:2:0; you can use FreeBSD’s own pciconf utility or the lspci command from Linux:

# lspci | grep "00:02.0"
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
# pciconf -lv pci0:0:2:0
vgapci0@pci0:0:2:0:    class=0x030000 card=0x30aa103c chip=0x27a28086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA

Ok, to alter the GMA950’s render clock speed (we are not going to touch its 2D “desktop” speed), we have to write certain values into some PCI registers of that chip at 0xF0 and 0xF1. There are three different values regulating clock speed. Since we’re going to use setpci, you’ll need to install the sysutils/pciutils package on your machine via # pkg install pciutils. I tried to do it with FreeBSD’s native pciconf tool, but all I managed was to crash the machine a lot! Couldn’t get it solved that way (just me being too stupid I guess), so we’ll rely on a Linux tool for this. Here is my version of the script, which I placed in /usr/local/sbin/ for global execution:

  #!/bin/sh

  case "$1" in
    200) clockStep=34 ;;
    250) clockStep=31 ;;
    400) clockStep=33 ;;
    *)
      echo "Wrong or no argument specified! You need to specify a GMA clock speed!" >&2
      echo "Usage: $0 [200|250|400]" >&2
      exit 1
    ;;
  esac

  setpci -s 02.0 F0.B=00,60
  setpci -s 02.0 F0.B=$clockStep,05

  echo "Clockspeed set to ${1}MHz"

Now you can invoke the script with the desired clock speed, like 200 or 400. Interestingly, FreeBSD’s i915_kms graphics driver seems to have set the 3D render clock speed of my GMA950 to 400MHz already, so there was nothing to be gained for me in terms of performance. I can still clock it down to conserve energy though. A quick performance comparison using a crappy custom-recorded ioquake3 demo shows the following results:

  • 200MHz: 30.6fps
  • 250MHz: 35.8fps
  • 400MHz: 42.6fps

Hardware was a Core 2 Duo T7600 and the GPU was making use of two DDR-II/667 4-4-4 memory modules in dual channel configuration. Resolution was 1400×1050 with quite a few changes in the Quake III configuration to achieve more performance, so your results won’t be comparable, even when running ioquake3 on identical hardware. I’d post my ~/.ioquake3/baseq3/q3config.cfg here, but in my stupidity I just managed to freaking wipe the file out. Now I have to redo all the tuning, pfh.

But in any case, this really works!

Unfortunately, it only applies to the GMA950. And I still wonder what it was that was so wrong with # pciconf -w -h pci0:0:2:0 0xF0 0060 && pciconf -w -h pci0:0:2:0 0xF0 3405 and the like. I tried a few combinations just in case my byte order was messed up or in case I really had to write single bytes instead of half-words, but either the change wouldn’t apply at all, or the machine would just lock up. It would be nice to do this with only BSD tools on actual FreeBSD UNIX, but I guess I’m just too stupid for pciconf.
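In hindsight, the failed pciconf attempts may simply have been a byte-order mix-up. Here’s a sketch of my suspicion – pure arithmetic, nothing in it touches the hardware:

```shell
# My guess at the byte-order trap: setpci's "F0.B=34,05" writes byte
# 0x34 to 0xF0 and 0x05 to 0xF1. A little-endian half-word write of
# 0x3405 does the opposite, so the matching half-word would be 0x0534.
halfword=0x3405
printf 'byte at 0xF0: 0x%02X, byte at 0xF1: 0x%02X\n' \
  $(( halfword & 0xFF )) $(( halfword >> 8 ))
```

If that reading is right, writing the half-word 0x0534 instead of 0x3405 might have been the combination to try – but I can’t confirm it without risking another lockup.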

May 13 2016

I had actually meant to post this upon release, but forgot. I’ve now been reminded on the forums that I should maybe do this, even if it isn’t really a match for this weblog, considering the target audience, language-wise. Well, whatever. Somewhere in 2014, a user named [OutOfRange] from the German 3dfx forum [VoodooAlert] had begun to write a German transcript for an interview that the [Computer History Museum] had conducted with the founders of the now-long-gone PC 3D graphics pioneer 3dfx Interactive. This is especially interesting stuff for PC gamers and hardware enthusiasts of the mid-90s to early 2000s. It allows quite an insight into both the technological and business sides of a now-legendary 90s chip startup company.

Also, those guys are really nice people as well it seems. :)

Some guys who can’t speak English had requested a text file-based translation to read while watching the video, and OutOfRange had begun translating the video and writing just that, with some minor passages done by another user called [Kaitou]. Ultimately, when I picked up the project, I (and others) had the idea of making a German fansub out of it. So together with OutOfRange, who supplied me with the latest versions of his transcript on-the-go, I started to learn how to do (very) simple ASS/SSA soft subtitling using [Subtitle Workshop] and just got to it. Early on, there was little or no color coding and very little formatting, so working on it looked like this:

Early on in subtitling the 3Dfx Oral History Panel; I translated as I got fed new bits and pieces by OutOfRange…

But it got better and better, colors got assigned to all the guys to differentiate them from each other better when several people were talking at once (you get used to the colors fast!), explanatory information was put to the top of the frame and slowly but surely, the German fansub of that 2½ hour interview started to take shape:

3dfx Oral History Panel Screenshot

The 3dfx founders talking about how they would never do CAD on their first 3D chip for PC gamers, known as the “3Dfx Voodoo Graphics”. From left to right: Gary Tarolli, Scott Sellers, Ross Smith and Gordon Campbell.

So, where are the files? Here they are (thanks fly out once more to OutOfRange, [CryptonNite] and [Hutzeputz] for helping out with download mirrors!):

The 3Dfx Oral History Panel in its German fansub:


Audio & video are © 2013 Computer History Museum
[original English transcript] (this fansub is not based on it!), [original web stream], [Terms of use]. Used with express permission!

The subtitles are licensed under the [CC BY-NC-SA 4.0].

  • Initiator and main translator: OutOfRange
  • Co-translator: Kaitou
  • Editor & fansubber: GrandAdmiralThrawn (that would be myself)

This release was made possible also because of the friendly help of a legal representative of the Computer History Museum, who helped clear up how we can dual-license and re-release this video by ourselves. Note that if you wish to redistribute a modified version (!) of this, you’d need to get express permission from the Computer History Museum yourself. If you wish to modify the fansub, you may redistribute the .ass files freely under the CC BY-NC-SA 4.0, but not the whole, remuxed video unless you have permission from the Computer History Museum!

In essence: You may not redistribute modified versions of audio & video without express permission from the creator, as their license is in essence something like the [CC BY-NC-ND].

As a final note, I’m not really a fansubber, so don’t expect any kind of professional quality here. Or something that you’d get from an actual fansubbing group. But for an amateurish sub, it should be quite decent. ;)

Have fun! :)

PS.: If you’re ever anywhere near Silicon Valley, California, and even if you’re just a tiny bit interested in computer technology and history, pay that museum a visit! It’s seriously awesome; they even have mainframes from the 60s, magnetic core memory panels from the 50s and more modern stuff as well. Ah, I remember stuff like that CRAY machine or the “Weapons Director” box, really nice exhibits there!

Mar 15 2016

Just recently, I’ve tested the computational cost of decoding 10-bit H.265/HEVC on older PCs as well as Android devices – with some external help. See [here]. The result was that a reasonable Core 2 Quad can do 1080p @ 23.976fps @ 3MBit/s in software without issues, while a Core 2 Duo at 1.6GHz will fail. Also, it has been shown that Android devices – even those with seriously fast quad- and octa-core CPUs – can’t do it fluently without a hardware decoder capable of accelerating 10-bit H.265. To my knowledge there is a hack for Tegra K1- and X1-based devices used by MX Player, utilizing the CUDA cores to do the decoding, but all others are being left behind for at least a few more months until Snapdragon 820 comes out.

Today, I’m going to show the results of my tests on Intel Skylake hardware to see whether Intel’s claims are true, for Intel has said that some of their most modern integrated GPUs can indeed accelerate 10-bit video, at least when it comes to the expensive H.265/HEVC. They didn’t claim this for all of their hardware however, so I’d like to look at some lower-end integrated GPUs today, the Intel HD Graphics 520 and the Intel HD Graphics 515. Here are the test systems, both running the latest Windows 10 Pro x64:

  • HP Elitebook 820 G3
      • CPU: Intel [Core i5-6200U]
      • GPU: Intel HD Graphics 520
      • RAM: 8GB DDR4/2133 9-9-9-28-1T
      • Cooling: Active
  • HP Elite X2 1012 G1 Convertible
      • CPU: Intel [Core m5-6Y54]
      • GPU: Intel HD Graphics 515
      • RAM: 8GB LPDDR3/1866 14-17-17-40-1T
      • Cooling: Passive

Let’s look at the more powerful machine first, which would clearly be the actively cooled Elitebook 820 G3. First, let’s inspect the basic H.265/HEVC capabilities of the GPU with [DXVAChecker]:

DXVAChecker on an Intel HD Graphics 520

DXVAChecker looks good with the latest Intel drivers provided by HP (version 4331): 10-bit H.265/HEVC is supported all the way up to 8K!

And this is the ultra-low-voltage CPU housing the graphics core:

Intel Core i5-6200U

So let’s launch the Windows media player of my choice, [MPC-HC], and look at the video decoder options we have:

In any case, both HEVC and UHD decoding have to be enabled manually. On top of that, it seems that either Intel’s proprietary QuickSync can’t handle H.265/HEVC yet, or MPC-HC simply can’t make use of it. The standard Microsoft DXVA2 API however supports it just fine.

Once again, I’m testing with the Anime “Garden of Words” in 1920×1080 at ~23.976fps, but this time with a smaller slice at a higher bitrate of 5MBit/s. The encoding options were as follows for pass 1 and pass 2:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

Let’s look at the performance during some intense scenes with lots of rain at the beginning and some less taxing indoor scenes later:

There is clearly some difference, but it doesn’t appear to be overly dramatic. Let’s do a combined graph, putting the CPU loads for GPU-assisted decoding over the regular one as an overlay:

CPU load with software decoding in blue and DXVA2 GPU-accelerated hardware decoding in red

Blue = software decoding, magenta (because I messed up the red color) = GPU-assisted hardware decoding

Well, using DXVA2 does improve the situation here, even if it’s not by too much. It’s just that I would’ve expected a bit more here, but I guess that we’d still need to rely on proprietary APIs like nVidia CUVID or Intel QuickSync to get some really drastic results.

Let’s take a look at the Elite X2 1012 G1 convertible/tablet with its slightly lower CPU and GPU clock rates next:

Its processor:

Core m5-6Y54

And this is, what DXVAChecker has to say about its integrated GPU:

DXVAChecker on an Intel HD Graphics 515

Whoops… Something important seems to be missing here…

Now what do we have here?! Both HD Graphics 520 and 515 should be [architecturally identical]. Both are GT2 cores with 192 shader cores distributed over 24 clusters, 24 texture mapping units as well as 3 rasterizers. Both support the same QuickSync generation. The only marginal difference seems to be the maximum boost clock of 1.05GHz vs. 1GHz, and yet HD Graphics 515 shows no sign of supporting the Main10 profile for H.265/HEVC (“HEVC_VLD_Main10”), so no GPU-assisted 10-bit decoding! Why? Who knows. At the very least they could have scratched 8K support and implemented it for SD, HD, FHD and UHD 4K resolutions. But nope… only 8-bit is supported here.

I even tried the latest beta driver, version 4380, to see whether anything had changed in the meantime, but no; it behaves in the same way.

Let’s look at what that means for CPU load on the slower platform:

CPU load with software decoding

The small Core m5-6Y54 has to do all the work!

We can see that we get close to hitting the ceiling, with the CPU’s boost clock going up all the way. This is problematic for thermally constrained systems like this one. During a >4 hour [x264 benchmark run], the Elite X2 1012 G1 has shown that its 4.5W CPU can’t hold boost clocks this high for long, given the passive cooling solution. Instead, it sat somewhere between 1.7 and 2.0GHz, mostly in the 1.8-1.9GHz area. This might still be enough with bigger decoding buffers, but DXVA2 would help a bit here in making this slightly less taxing on the CPU, especially considering higher bitrates or even 4K content. Also, when upping the ambient temperature, the benchmark runtime was pushed back by almost an hour, pushing the CPU clock rate down by a further 100-200MHz. So it might just not play that movie on the beach in summer at 36°C. ;)

So, what can we learn from that? If you’re going for an Intel/PC-based tablet, convertible or Ultrabook, you need to pick your Intel CPU+graphics solution wisely, and optimally not without testing it for yourself first! Who knows what other GPUs might be missing certain GPU video decoding features like HD Graphics 515 does. Given that there is no actual compatibility matrix for this as of yet (I have asked Intel to publish one, but they said they can’t promise anything), you need to be extra careful!

For stuff like my 10-bit H.265/HEVC videos at reasonably “low” bitrates, it’s likely ok even with the smallest Core m3-6Y30 + HD Graphics 515 that you can find in devices like Microsoft’s own Surface Pro 4. But considering modern tablets’ WiDi (Wireless Display) tech with UHD/4K resolutions, you might want to be careful when choosing that Windows (or Linux) playback device for your big screens!

Dec 03 2013

Actually, I never quite warmed up to Wild West themed games, shooters or not. I don’t know why; I did like related movies, especially the Bud Spencer & Terence Hill comedies, but also the more serious ones. Never liked any of the games though. But what I heard about the latest Call of Juarez game – a budget title for the first time – sounded so fresh that I gave it a shot. And damn, I was blown away. Sure it’s a linear shooter, yeah, ok, it has checkpoints instead of save games, and even quick-time events. It has all the bad stuff, and still it’s awesome, now how can that be? Simple. It’s all implemented in the right fashion and the game always delivers fun in great ways.

An old man going by the name of Silas Greaves walks into a bar one day in the early 20th century. Clearly, he’d seen better times, and he’s kind of out-of-place in this modern world with cars driving around, him still riding a horse, wearing six-shooters on his belt. In that bar, an old bartender and a few guys, including an enthusiastic young dime novel reading boy, await him. Recognizing his name as that of a relatively famous bounty hunter, they start buying him drinks to hear him tell his stories.

This unfolds in Mr. Greaves telling them the story of his life. And as he tells it, his words materialize within the game you’re playing, on the fly! It soon turns out that he is telling his story a bit over-the-top and not quite historically correct either, since he seems to be meeting all the big names of the West like Billy the Kid, Jesse James, Butch Cassidy and so on, duelling ’em all in the process of course, as he’s gunning his way through it all. The fun thing is that the 3D engine itself and the gameplay adjust dynamically as Silas Greaves is telling his story, and sometimes he has to adjust a bit of his story as well here and there, especially as his listeners are left in disbelief more often than not.

Let me give you an idea: Silas is walking around on a wooden structure somewhere outside of a gold mine, and he’s stuck. Maybe because the story the old man’s telling is stuck as well, so he has to come up with something new, and as he remembers (or sometimes seemingly just makes up) stuff, a ladder suddenly appears, the one he thinks he climbed down to continue onwards. This manifests in the game as a ladder “growing” up from the floor below, or just flying down from the sky and into the right place for Silas to step on it, all while you briefly hear him tell that piece of the story in the background. Sometimes he even has to tell an entire part of the story anew, which you will see as the whole game “rewinding” itself to let you play it again, only very differently this time around.

So, it’s well-told, well-executed in the form of a fast linear shooting gallery that refuses to be ashamed of itself, the Chrome Engine 5 does a nice job of showing all that to us, the artwork is great, music plus awesome gun sounds plus awesome voice-over do a great job of pulling us in, and it costs next to nothing! Well, not all is perfect (for me) though, so let’s get technical, shall we?

The Chrome Engine 5 happens to be using a very fast Direct3D 9 renderer, leaving the game very compatible with Windows from XP/XP x64 all the way up to Windows 8.1, including SLI/CF support. There is only one thing missing. The one tiny little thing that I refuse to game without, and that is anti-aliasing. Missing FSAA leaves the Cel-shaded engine with very jagged edges and foliage. Unacceptable. So I tried to play around with nVidia Inspector again. See this first (click to zoom):

The Cel shader produces well-defined edges everywhere, which results in them being intensely jagged. Foliage is also very jagged, which you might not see very well in the screenshot, but which becomes very visible when moving around. Now, Call of Juarez: Gunslinger does have a driver profile in nVidia’s GeForce driver, but it does not feature any anti-aliasing compatibility bits whatsoever. After researching the web about this, I have assembled several forum and blog postings into this:

Call of Juarez: Gunslinger and nVidia Inspector

nVidia Inspector (click to zoom)

Now, this does feature some non-AA related stuff like SSAO bits, but that won’t hurt. The most important parts here are the compatibility flags 0x000010C0 and the behavior flags 0x00000000 / “None” as well as the Anti Aliasing Transparency Super-Sampling bits. Usually, Transparency Super-Sampling is for Alpha-Test textures only, to anti-alias foliage for instance. However, in Call of Juarez: Gunslinger, AA will not work without it at all. More specifically it will not work without this feature called “TAA” switched to expensive “SGSS TAA” or sparse-grid super-sampling, as ordered grids seem to be doing nothing at all. Now this does work, but there is a caveat too, as you will soon see:

“Wow” the pure FSAA aficionado might say, while the pure anisotropio (cough) might puke. Cel shaders and most other post-processing shaders have a significant problem with sparse grid AA: a distinct blur sets in on the entire 3D frame as soon as the two are combined. As with shader-based AA like CSAA or, even more so, FXAA, there will be blurring and it will be “ew”. So basically you’ll have to choose whether you want smooth edges+foliage or sharp textures. Ew. For certain games there is a way around this, which means either you go for regular super-sampling TAA without the sparse grid (doesn’t work on Gunslinger) or full-scene super-sampling anti-aliasing like 2×2 SSAA (works on Gunslinger, sort of). The latter looks like this:

“Beautiful, yaaay” all might scream in unison, but not entirely so, I unfortunately have to interpose. This will most likely cause artifacts on surfaces where pixel shaders render on alpha-test textures, like for instance heat haze or water, making those textures either disappear partially or render in a broken fashion. Great if it doesn’t happen for you, but it did happen for me and a friend of mine, who has a very different configuration with Kepler instead of Fermi GPUs and Windows 7 instead of my XP x64, indicating a rather fundamental problem instead of a specific one. Meh.

Me, I chose the mode that you can see in that nVidia Inspector screenshot. Now I don’t like an even slightly blurry image, but I like jaggies less, so yeah. Hard choice there, but that’s all I can do right now. In case you’re interested what kind of a hardware configuration this profile was made for, here are the relevant specifications:

  • 2 x nVidia GeForce GTX 580 3GB in SLI
  • GeForce driver 331.82 non-WHQL
  • 2560x1600x32 (~4.1MPixel)

As I said, SGSS TAA is expensive, which is why I had to limit myself to 4 subsamples (don’t count the CV samples). 8 samples would make the game not so enjoyable anymore. So to use this, you’ll need some serious GPU horsepower.
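To put “expensive” into numbers, here’s a quick back-of-envelope for my resolution, counting shading cost only (memory bandwidth makes it worse in practice):

```shell
# 4x sparse-grid super-sampling shades every pixel of the frame four
# times, so at 2560x1600 that's roughly 16.4 million shaded samples per frame.
width=2560; height=1600; samples=4
echo "$(( width * height * samples )) shaded samples/frame"
```

At 8 subsamples that doubles again, which is where even two GTX 580s run out of steam.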

Also, there was another texture rendering problem with FSAA in Gunslinger, one that has been fixed by that “Antialiasing Fix” flag. The bug results in weird lines being rendered on textures, something also known to happen with the game “Dead Island” when attempting similar profile modifications.

People are saying that forcing AA works out of the box in both games on AMD graphics cards, with the single exception of said bug appearing. For nVidia you can fix that, but not so for AMD. It might be that newer drivers have taken care of this, though; I cannot comment on that, as I do not own such a system.

But hey, this is better than nothing at least!

Aug 20 2013

To complete my EVGA GeForce GTX 580 Classified Ultra 3GB SLI setup, I have recently ordered the corresponding [backplates at EVGA], as they are only 6.90€ per piece, which I believe is quite acceptable. The proposed function of the backplates is to protect the rear of the card against accidental damage and to slightly improve cooling performance (which I was unable to verify, but well).

While adding the backplates to the cards was the primary objective, I also took advantage of the maintenance downtime to clean the insides of the cards, as I expected quite a lot of dust in there. Actually, one of the cards was rather clean and the other one very messy inside, so I’ll show you the uglier one:

Unfortunately I forgot to take photos of the cleaned heat sink, but rest assured, the awful contamination is gone! ;) While I was at it, I also renewed the thermal grease to make sure everything works nicely when reassembling the card. There was only one dreadful moment when my monitor suddenly went dark 5 minutes after booting back into my operating system, and the LEDs on the primary card were just flickering threateningly. But luckily, that was just a loose 4-pin Molex plug. All good now. Just for comparison, an old picture of the cards before cleaning and mounting any backplates:

The large EVGA PCB mostly holds voltage regulation circuitry

Only thing is that I’ll probably have to disassemble the second card once more. The primary one behaves very well now, but the secondary isn’t running any cooler. I think there was not enough thermal grease on that one. But other than that, it looks pretty nice now, and the mounted backplates should also increase an EVGA card’s resale value. Not that that’s very important to me, as I’m probably going to keep them for a long time to come. Just saying.

As a friend of mine stated, they are now “dressed to impress”. ;)

May 29 2013

So, there is this mainboard, an Intel D850EMV2, or rather D850EMVR, which is a sub-version of the former, with the i850E Rambus chipset. What’s special about that old Pentium 4 board? Well, I once won it in a giveaway at [Computerbase], one of the largest German hardware websites. And after that, Jan-Frederik Timm, founder and boss of the place, contacted me on ICQ to tell me about it. First time I had ever won anything! He asked me to put it to good use, because he was kind of fed up with people just reselling their won stuff. So I promised him that I would.

And boy, did I keep that promise! At first I used a shabby 1.6GHz Northwood processor with just 128MB of Rambus RDRAM. I can’t remember the rest of the machine, but over time I upgraded it a bit (when it was already considered old, with Core 2 Duos on the market), a bit more RAM etc. for small LAN party sessions. At some point, I sold it to my cousin, and to make it more powerful and capable for her, I gave the machine the fastest (and only) hyper-threaded processor available on that platform, the Pentium 4 HT 3.06GHz, plus 1.5GB of PC800-45 RDRAM and my old GeForce 6800 Ultra AGP.

She used that for the internet and gaming etc. for some time, until the GeForce died and she thought the machine barely powerful enough for her top-tier games anyway, so she bought a new one, which I built for her. The Intel board then got my older GeForce FX5950 Ultra and was used by my uncle for the Internet on Debian Linux and low-end LAN games on Windows XP.

A long time after I first got it, I contacted Jan from Computerbase again, to tell him that I had kept my promise and ensured the board had been used properly for 8 years now. Needless to say he was delighted and very happy that it wasn’t just sold off for quick cash.

Soon after, my cousin got another, even more powerful machine, as her Core 2 Duo mainboard had died. Now it was S1156, GTX480 etc. So my uncle bought a new mainboard and I rebuilt the C2D for him with my cousin’s old GTX275. I asked him if he would part with the D850EMVR and he agreed to give it back to me, after which it collected dust for a year or so.

Now we need another machine for our small LAN parties, as our notebooks can’t drive the likes of Torchlight II or Alien Swarm. It was clear what I had to do: Keep the damn Intel board running until it fucking dies!

This time I chose to make it as powerful as it could remotely become, with a Gainward Bliss GeForce 7800GS+ AGP: the most powerful nVidia-based AGP card ever built, equipped with a very overclockable 7900GT GPU with the full 24 pixel pipelines and 8 vertex shaders, as well as 512MB of Samsung RAM. Only Gainward built it that way (a small 7900 GTX, you could say), as nVidia did not officially allow such powerful AGP cards. So this was a limited edition too. I always wanted to have one of those, but could never afford them. Now was the time:

As expected (there were later, more powerful AGP 8x systems compared to this AGP 4x system, with faster Pentium 4s and Athlon 64s), the CPU is limiting the card. But at least I can add some FSAA or even HDRR at little cost in some games, and damn, that card overclocks better than shown in some of the original reviews! The core went from 450MHz to 600MHz so far, dangerously close to the top-end 7900 GTX PCIe of the time with its 650MHz. Also, the memory accepted some pushing, from a 1.25GHz to a 1.4GHz DDR3 data rate. Nice one!

This was Furmark stable, and the card is very silent and rather cool even under such extreme loads. Maybe it’ll accept even more speed, and all that at a low 1.2V GPU voltage. Cool stuff. Here, a little AquaMark 3 for you:

7800gs+ in AquaMark 3

So, this is at 600MHz core and 1400MHz DDR memory. For comparison, I got a result slightly above 53k at just 300MHz core, so as you can see, at least the K.R.A.S.S. engine in AquaMark 3 is heavily CPU-bound on this system. So yeah, for my native resolution of 1280×1024 on that box, the card is too powerful for the CPU in most cases. The tide can turn though (in Alien Swarm for instance) when turning on compute-heavy 128-bit floating point rendering with HDR, very complex shaders, or FSAA etc., so the extra power is going to be used. ;) And soon, 2GB of PC1066-32p RDRAM will arrive to replace the 1GB of PC800-45 Rambus I currently have, to completely max it out!

So I am keeping my promise. Still. After about 10 years now. Soon there will be another small LAN party, and I’m going to use it there. And I will continue to do so until it goes up in flames! :)

Update: The user [Tweakstone] has mentioned on [Voodooalert] that XFX once built a GeForce 7950GT for AGP, which was more powerful than the Gainward. So I checked it out, and he seems to be right! The XFX 7950GT lacks the big silent cooler, but provides an architecturally similar G71 GPU at higher clock rates! While the Gainward 7800GS+ offered 450MHz on the core and a 1250MHz DDR data rate on the memory, the XFX would give you 550MHz core and a 1300MHz DDR data rate with a similar 512MB of DDR3 memory. That’s a surprise to me, I wasn’t aware of the XFX. But since my Gainward overclocks so well (it’s the same actual chip after all) and is far more silent and cool, I guess my choice wasn’t wrong after all. ;)

Update 2: Since there was a slight glitch in the geometry setup unit of my card, I have now replaced it with a Sapphire Radeon HD3850 AGP, which gives more performance, slightly better FSAA and, as the icing on the cake, proper DXVA1 video acceleration. It even plays BluRays in MPC-HC now. ;) Also, I retested AquaMark 3, which seems to require the deletion of the file direcpll.dll from the AquaMark 3 installation directory to avoid an access violation exception at the end of the benchmark on certain ATi or AMD graphics hardware. I guess the drivers are the problem here. But with that troublesome file gone, here’s a new result:

AquaMark 3 on an ATi Radeon HD3850 AGP

Yeah, it’s a bit faster now, but not much. As we can see, the processor is clearly the limiting factor here. But at least I now have relatively problem-free 3D rendering and DXVA on top of it!

Jul 162012

I have now been waiting for nVidia long enough. Support still claims that SLI support for the GTX 680 is coming to XP / XP x64 in the future, but nobody knows when. Additionally, one support person said on the quiet that he doesn’t believe this feature will ever see the light of day anymore. One WHQL driver and several betas, and nothing. Only that weird 302.59 driver that had a broken SLI implementation (it worked only in D3D windowed mode and OpenGL; otherwise the dreaded, well-known redscreen appeared).

So, what to do? I sold my two GTX 680 cards to an old friend of mine, who used them to replace his noisy GTX 480 SLI. With the money from the sale, I bought the biggest, baddest GTX 580 cards ever built, the EVGA GeForce GTX 580 Classified Ultra. Actually, only one of them is an Ultra, the other is a regular Classified, but hey, both run at Ultra clock rates, so that’s fine. So here is Fermi again, in its maximum configuration with 512 shader cores.

Usually, a GTX 580 features a 772MHz rasterizer and 1544MHz shader clock with 1.5GB of VRAM running at a 4GHz GDDR5 data rate. The Ultra with its heavily modified custom PCB design however runs the rasterizer at 900MHz and the shaders at 1800MHz, and has 3GB of VRAM running at a 4.2GHz data rate. So it can almost reach GTX 680 speed, at the cost of very high energy consumption under load thanks to its absolutely monstrous 14+3 phase voltage regulation module. While air-cooled here, this card was definitely made for running under LN2 pots to reach world records. Whatever, SLI is once again working fine for me now:

Energy consumption is an issue here though, as mentioned. The cards are even hungrier than a pair of dreaded GTX 480 Fermis. Where the GTX 480 SLI would hit the ceiling at 900W in Furmark, the two extreme 580s here can eat up to 1140W under the same load. That’s intimidating! Regular load while playing The Witcher 2 was around 800-880W, which is still a lot. Especially considering the necessary power adapters. Since each card has 2 x 8-pin and 1 x 6-pin PCI Express power plugs, you need a total of 4 x 8P and 2 x 6P, which even my Tagan Piperock 1.3kW couldn’t provide. So I had to adapt my two 6P plugs to 8P and four Molex 4P HDD plugs to 2 x 6P. It seems to work fine though, even with the 6P cables being overloaded by almost 100%. Obviously Tagan’s current limiters aren’t kicking in here, which is good, as those limiters (usually sold to the end user as “rails”) are mostly pointless anyway. At least idle power consumption was only marginally higher when compared to the slim GTX 680 setup. Here I can see 280-300W instead of 250W, which is still somewhat acceptable.
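To put the overload into numbers, here’s a quick back-of-the-envelope sketch using the PCI Express spec limits (75W per 6-pin plug, 150W per 8-pin plug, 75W from the slot). Keep in mind the wall-power figures above include the rest of the system plus PSU losses:

```shell
# PCIe spec power limits: slot 75W, 6-pin plug 75W, 8-pin plug 150W
slot=75; six_pin=75; eight_pin=150

# Each Classified (Ultra) draws from 2 x 8-pin + 1 x 6-pin + the slot
per_card=$((2 * eight_pin + six_pin + slot))
echo "In-spec budget per card: ${per_card}W"      # 450W
echo "In-spec budget for SLI: $((2 * per_card))W" # 900W
# With up to 1140W measured at the wall, the Molex-fed 6-pin adapters
# end up carrying roughly double their 75W rating.
```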

This might very well be the most powerful GPU solution possible under XP x64, unless GK110 (the full-size Kepler, which will most likely be sold as a GTX 780) turns out faster than both Ultras here. Of course there is still a slim chance that the 680 (GK104) and maybe even GK110 will get XP / XP x64 SLI, but I don’t count on that anymore. For now, the Ultra SLI is the best I can get. We’ll see whether that’s going to change in the future or not.

Apr 112012

SLI

Oh, and in other news: I started another live chat with nVidia yesterday and talked to yet another rather well-informed support person named Karthik, who answered my primary question. Of course I asked whether there would actually be any SLI support for the GeForce GTX 680 on Windows XP and Windows XP x64, as the recent beta driver lacking it shocked me quite a bit. Mr. Karthik however insisted that the 2-way SLI feature is still fully supported for NT 5.x and that even the current beta driver is actually not feature-complete yet.

That would mean that I really only have to wait for the WHQL driver, and that it won’t be necessary to give up my cards for GTX 580s. There is still a little bit of doubt here, so I’m keeping my options (and bartering threads) open for now.

Apr 102012

SLI

So the never-ending story continues… and as it seems, it is spiralling towards a really bad ending. Yesterday, nVidia released a new beta driver for the GeForce GTX 680, version 301.24. As this is the first unified driver, it also supports older cards. However, the release notes list “GeForce GTX 680 SLI” as a new missing feature for both XP and XP x64. Just the one feature that I need, god dammit. Since beta drivers are usually feature-complete, this means that SLI probably won’t be making it into XP (x64) for the GTX 680. I’ll wait for the final WHQL driver, but it’s looking really grim. I have already prepared and started forum threads to set an exchange for two GTX 580s into motion.

This is really, really bad. It’s crap to see the card still supported, just with the most important part missing. If this isn’t in the WHQL driver, I’ll start one last support chat with nVidia for final clarification. I have a feeling I’m not gonna like what they’ll have to say. We’ll see soon enough, I guess.

Mar 302012

Zotac GeForce GTX 680 boxes

To the left you can see two Zotac graphics card boxes. That’s two GeForce GTX 680 cards, so the latest and greatest: nVidia’s just-released Kepler architecture in the form of GK104. While GK104 is not the final high-end part (GK110 will be just that around late summer or autumn), it’s being sold as such at the moment, since it’s simply the most powerful single GPU to date. And a power-efficient one at that. I guess GK110 will be a monster if nVidia decides to go all-out with the design. While GK104 boasts a 6GHz QDR 256-bit memory interface with 2GB, GK110 will supposedly give us a 6GHz 512-bit interface with 4GB. And on top of that, 2306 shader cores instead of GK104’s 1536. Well, I don’t believe the memory interface stories yet though. Sounds a bit too good to be true, as that’d be almost 400GB/s… Well, whatever. Let’s get to the point here, now shall we?
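For reference, the bandwidth math behind that “almost 400GB/s” is simply the effective data rate times the bus width, divided by 8 bits per byte:

```shell
# Memory bandwidth [GB/s] = effective data rate [GT/s] * bus width [bits] / 8
echo "GK104: $((6 * 256 / 8)) GB/s"            # 192 GB/s (shipping GTX 680)
echo "GK110 (rumored): $((6 * 512 / 8)) GB/s"  # 384 GB/s
```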

Two GeForce GTX 680 and two GeForce GTX 480

To the right you can see the two new GeForce GTX 680 cards on top, and the replaced GeForce GTX 480 cards on the bottom. One SLI setup supposed to replace another. If only it were that easy. At this moment, nVidia is not offering any driver for either Windows XP or the Windows XP x64 Edition that I am using. No WHQL and no beta. The only drivers available are leaked alpha versions 300.65 (which detects “NVIDIA GK104” cards instead of “GeForce GTX 680”, indicating a development version) and 300.83. These have been leaked onto the web by nVidia manufacturing partners like MSI and Gigabyte, and shipped by others like Zotac on their installation media. Now while they work in principle, it seems that one very important feature is missing: SLI.

Since there is no SLI, and currently there seems to be only a single-GPU mode, performance at the moment is slightly lower than before. Dammit. A few days ago I chatted with nVidia support, and the support guys confirmed that Windows XP OSes are still officially supported, but the driver just ain’t ready yet. So it’s the waiting game again. I have still sent Zotac an e-mail requesting another driver with SLI support, but I don’t think there is going to be one before the official WHQL driver launch for XP and XP x64. A damn shame. Waiting game all over again indeed…