Aug 18 2017
 

1. Introduction

While I’ve been a Corel Photopaint user for about 20 years now (!), I’ve recently bought myself a license for Adobe Photoshop CS6 Extended, simply because I’ve hit some memory and stability limitations with Corel’s solution when working with very large files (A0 @ 300dpi) and lots of objects (=”layers”), sometimes hundreds of them. Generally, Photopaint tends to become unstable as you keep working with very large files, and things like masks or object copies start failing silently even well before its 32-bit address space is exhausted. Adobe Photoshop CS6 was the last standalone version that didn’t rely on the “Creative Cloud”, and the last that would work on Windows XP / XP x64, even in its 64-bit version on the latter. So there you have it. I was thinking of migrating.

2. Licensing woes

The first thing that really got on my nerves was Adobe’s licensing system. In my version of CS6, Adobe’s application manager just wouldn’t talk to the activation servers anymore, and offline activation is broken as well, as the tool won’t generate any request codes for you (the option just isn’t there anymore). Why is this? During testing I found that Microsoft’s ancient Internet Explorer 8 – the latest on XP – can no longer talk to certain activation servers via encrypted connections. I guess Adobe upgraded SSL on some of their servers.

My assumption is that the Adobe application manager uses IE’s browsing engine to connect to those servers via HTTPS, which fails on anything that isn’t Vista or newer with a more modern version of Internet Explorer.

So? I was sitting there with a valid license, but unable to activate the software with it. It’s sad, but in the end, I had to crack Photoshop, which is the one thing I didn’t want to do. Even though Photoshop CS6 officially supported XP, it can no longer be activated on such old operating systems, and nobody cares as it’s all deprecated anyway. It’s annoying, but there was no other choice but to remove the copy protection. But hey, my conscience is clean, I did buy it after all.

3. Language woes

Well, I bought a German version of Photoshop. Which I thought was ok, as Adobe would never do something as stupid as to region-lock their licenses, right? Riiight? [Wrong]. You have to download the [correct version] for your region, or the license won’t work. I mean, German is my native language, but I’m using operating systems and software almost exclusively in English, so that was unacceptable. Thankfully, there is an easy fix for that.

First, close Photoshop. Then, assuming you’ve installed it in the default location, enter the folder %PROGRAMFILES%\Adobe\Adobe Photoshop CS6 (64 Bit)\Locales\de_DE\Support Files\ (swap the de_DE\ part for your locale; you’ll find it in that Locales\ folder). There you’ll find a .dat file, in my case it was called tw10428.dat. Rename it to something like tw10428.dat-backup. Relaunch Photoshop. It’ll be English from now on[1]!
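
If you prefer doing this on the command line, here’s a minimal sketch of the same rename (the default 64-bit install path and the de_DE locale are assumptions, adjust both to your setup):

cd /d "%PROGRAMFILES%\Adobe\Adobe Photoshop CS6 (64 Bit)\Locales\de_DE\Support Files"
rem back up the language resource file so Photoshop falls back to English
ren tw10428.dat tw10428.dat-backup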

4. The hardware I used for testing

This is important, as some features do depend on certain hardware specifications, like 512MiB of VRAM on the graphics card or certain levels of OpenGL and OpenCL support, those kinds of things. So here’s the hardware, which should be modern enough for CS6:

  • Intel Xeon X5690 3.46GHz hexcore with HT (Westmere architecture, think “Core i7 990X”)
  • 48GiB of RAM
  • nVidia GeForce GTX Titan Black 6GiB (Kepler architecture, somewhat similar to a GeForce GTX 780Ti)

Also, the driver used for the Titan Black is the latest and last one available for XP, 368.81. It even gives you modern things like OpenGL 4.5, OpenCL 1.2 and CUDA 8.0:

nVidia control panel for driver 368.81 on XP x64

5. What it looks like without any compatibility hacks

However, even though the 64-bit version of Photoshop CS6 runs on Windows XP x64 (when cracked :( ), a few things are missing regarding GPU support and 2D/3D acceleration. Let’s launch the application:

Photoshop CS6 Extended on Windows XP x64 without any modifications

What we can see immediately is that the 3D menu is missing from the menu bar. This is because Adobe generally forbids the use of the extended 3D features (import and rendering of 3D models) on all XP systems. There was an explanation about that on the Adobe Labs pages, but that link is dead by now, so I can’t be sure why it was originally disabled. I’ll talk about that stuff later. First, we’ll look at the hardware we do get to use for now; this can be found under Edit \ Preferences \ Performance...:

 

Well, we do get to use a ton of memory, but as you can see, my relatively powerful GTX Titan Black is not being used. Strangely, it says it’s because of me not running the Extended version of Photoshop CS6, but that’s not quite correct:

Photoshop CS6 Extended splash screen

I’m not actually sure whether it always said that, as I’ve been messing with Photoshop for a while now, so maybe there was a different message there at first. But it doesn’t matter. The bottom line is, we’re not getting anything out of that GPU!

And we’re gonna fix that.

6. Getting 2D and OpenCL GPU acceleration to work

This is actually easy. We’ll fool Photoshop CS6 into thinking that my GTX Titan Black (which is newer than the software) is an “old GPU”. You can tell Photoshop to still do its stuff on older, slower GPUs despite them not being officially supported. And when that switch is flipped, it seems to apply to all unknown GPUs and graphics drivers, no matter what. Check out the following registry hack:

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Adobe\Photoshop\60.0]
"AllowOldGPUS"=dword:00000001

Save that text into a .reg file, for instance AllowOldGPUS-in-Photoshop-CS6.reg. Close Photoshop, then double-click on that file, and it’ll create that dword value AllowOldGPUS for you and set it to 1[2].
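
Alternatively, a single reg add command from a command prompt should have the same effect (just a sketch, using the CS6 key discussed below):

reg add "HKCU\Software\Adobe\Photoshop\60.0" /v AllowOldGPUS /t REG_DWORD /d 1 /f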

Note that the registry path will be different for older versions of Photoshop, so the last part might not be 60.0 but something else, like 11.0. If in doubt, check first by launching Windows’ Regedit tool and navigating to that part of your user registry.
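
To see which key your installation actually uses, just list the subkeys from a command prompt (reg query ships with Windows):

reg query "HKCU\Software\Adobe\Photoshop"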

After that, launch Photoshop, and again look at Edit \ Preferences \ Performance...:

Graphics settings when “AllowOldGPUS” is enabled

Aha! Looking better. But if you click on that “Advanced Settings…” button, you get this:

Advanced graphics settings with “AllowOldGPUS” enabled

Almost everything checks out – even OpenCL – but we still don’t get to use the advanced OpenGL rendering mode that is supposed to make things even faster. But most of the GPU accelerated 2D functionality is there now, it’s just not working at maximum performance, at least according to Adobe. You’ll get to use stuff like Oil Paint and Scrubby Zoom etc. now though.

But, the 3D stuff that’s been disabled on XP is still missing entirely. And we’ll get to that final part now!

7. Getting 3D and the advanced level of OpenGL 2D acceleration to work

This is a bit trickier. This stuff is blocked by Photoshop’s operating system detection routines. You can check their results by clicking on Help \ System Info... and looking at the top part of that report:

Adobe Photoshop Version: 13.0.1 (13.0.1.3 20131024.r.34 2013/10/24:21:00:00) x64
Operating System: Windows XP Professional 64-bit
Version: 5.2 Service Pack 2

So we have to fool it into thinking that it’s sitting on a more modern operating system. The tool for the job is Microsoft’s own [Application Verifier]. It’s a software testing tool meant to be used during the testing phase of a software development process, but it’ll do just what we need. Download the 64-bit edition (the 32-bit one is included in it as well, you can ignore that for our purpose), and run it.

Microsoft’s Application Verifier

Right-click into its Applications pane, and add two applications that you can find in your Photoshop installation directory: Photoshop.exe and sniffer_gpu.exe. The latter is a command line tool that detects GPU VRAM, driver version, OpenGL / OpenCL versions etc. You can also launch it in a terminal window; it prints everything to stdout, so you can see what it’s reporting right there. Photoshop runs this program at every startup to determine GPU and graphics driver/API capabilities.
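
If you’re curious what Photoshop gets to see, you can run the sniffer yourself from a command prompt (default install path assumed; the output format is whatever your version prints):

cd /d "%PROGRAMFILES%\Adobe\Adobe Photoshop CS6 (64 Bit)"
sniffer_gpu.exe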

Well, uncheck everything in the Tests pane, and check just Compatibility \ HighVersionLie. Right-click it, pick Properties and enter the following[3]:

Faking Windows 7 SP2

So we’re reporting an NT 6.1 kernel (Windows 7) with build number 7601 and service pack 2, which would be current as of today. Keep it like that, and launch Photoshop. You’ll see this:

Photoshop CS6’s full functionality, ready to be used?

And there we go, our 3D menu is active on top, right between the “Filter” and “View” menus. But that doesn’t mean that its tools are really working… Anyway, the System Info will now report a different operating system:

Adobe Photoshop Version: 13.0.1 (13.0.1.3 20131024.r.34 2013/10/24:21:00:00) x64
Operating System: Windows 7 64-bit
Version: 6.1

Oddly enough, the service pack 2 part is missing, but who cares. Let’s take a look at the graphics settings!

Advanced graphics settings, revisited

Alright, we can use advanced OpenGL effects after applying our Windows 7 fake! Very good. That means all 2D OpenGL / OpenCL accelerated parts are working at maximum performance now! And as a first indicator of 3D also working, we can now access the 3D settings as well, look at this:

3D settings; even the full 6GiB of VRAM minus some 32MiB of reserved memory has been detected

For the 3D test, I decided to download a 3DStudioMax sample model from GrabCAD, [here] (that website requires registration). A complete render of that model can also be seen there; it looks like this:

It’s called “Chainmail maelstrom unit”

So, let’s create a new document, import that model via 3D \ New 3D Layer from File... and switch to Photoshop’s 3D view! And…

OpenGL accelerated 3D preview

It’s totally working! Strangely you don’t get anti-aliasing even though the card can do it, but at least it’s something. I almost feel like I’m sitting in front of some kind of CAD software now, with all that realtime 3D rendering. Photoshop does interpret the material properties a bit weirdly however, so it’s not glossy enough and the color is wrong for something that’s supposed to be “steel”. I didn’t fix that up though, just left it as-is and clicked “render”! Then, after lots of waiting, you get some nicer output:

Final render

Please note that the above screenshots are 8-bit PNG, so they’re not true representations of the source at all. Still, should be good enough.

One disappointing part is that the final render – or rather the raytracing output – isn’t being crunched on the GPU. Instead, it’s using a lot of CPU cores. In this case, my hexcore Xeon with 12 threads was loaded up to 70-90%. It would still be a lot faster if done by an OpenCL or CUDA raytracer on the GPU. I will need to check whether there are any plugins for that.

Anyway, this means that the full potential of Adobe Photoshop CS6 Extended can be unlocked on Windows XP Professional x64 Edition SP2! Cheers!

8. What about Windows XP 32-bit?

No idea. I haven’t tested it. According to users on the web, the OpenCL part wouldn’t work on 32-bit at all, no matter which operating system. It’s supposed to be “by design” for whatever reason. I’m not sure if it’s just disabled like with 3D, or whether the libraries using it are really not there in the 32-bit version of Photoshop CS6. You would need to test that yourself, but I wouldn’t get my hopes up. At least the other parts should be doable, including 3D.

Now, all that remains is to learn how to actually use Photoshop CS6. :roll: It’s probably going to be a bit painful coming from the Corel side of things…

 

[1] Adobe Photoshop CS6 – Change language to English, Tim Barrett, 2014-08-17 on YouTube

[2] Enable GPU Acceleration in x64 XP, op2rules, 2009-12-11 on YouTube

[3] A comment, Schlaubstar, 2012 on YouTube

Nov 19 2016
 

Recently, after finding out that the old Intel GMA950 profits greatly from added memory bandwidth (see [here]), I wondered if the overclocking mechanism applied by the Windows tool [here] had leaked into the public after all this time. The developer of said tool refused to open-source the software even after it turned into abandonware – announced support for GMA X3100 and X4500 as well as MacOS X and Linux never came to be. Also, he did not say how he managed to overclock the GMA950 in the first place.

Some hackers disassembled the code of the GMABooster however, and found out that all that’s needed is a simple PCI register modification, one that you could probably apply by yourself on Microsoft Windows using H.Oda!’s [WPCREdit].

Tools for PCI register modification do exist on Linux and UNIX as well of course, so I wondered whether I could apply this knowledge on FreeBSD UNIX too. Of course, I’m a few years late to the party, because people already solved this back in 2011! But just in case the scripts and commands disappear from the web, I wanted this to be documented here as well. First, let’s see whether we even have a GMA950 (of course I do, but still). It should be PCI device 0:0:2:0; you can use FreeBSD’s own pciconf utility or the lspci command from Linux:

# lspci | grep "00:02.0"
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
 
# pciconf -lv pci0:0:2:0
vgapci0@pci0:0:2:0:    class=0x030000 card=0x30aa103c chip=0x27a28086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA

Ok, to alter the GMA950’s render clock speed (we are not going to touch its 2D “desktop” speed), we have to write certain values into some PCI registers of that chip at 0xF0 and 0xF1. There are three different values regulating clock speed. Since we’re going to use setpci, you’ll need to install the sysutils/pciutils package on your machine via # pkg install pciutils. I tried to do it with FreeBSD’s native pciconf tool, but all I managed was to crash the machine a lot! Couldn’t get it solved that way (just me being too stupid I guess), so we’ll rely on a Linux tool for this. Here is my version of the script, which I call gmaboost.sh. I placed it in /usr/local/sbin/ for global execution:

#!/bin/sh

# Map the requested speed (in MHz) to the register value used below
case "$1" in
  200) clockStep=34 ;;
  250) clockStep=31 ;;
  400) clockStep=33 ;;
  *)
    echo "Wrong or no argument specified! You need to specify a GMA clock speed!" >&2
    echo "Usage: $0 [200|250|400]" >&2
    exit 1
  ;;
esac

# Each setpci call writes two comma-separated byte values to the
# consecutive registers 0xF0 and 0xF1 (see above)
setpci -s 02.0 F0.B=00,60
setpci -s 02.0 F0.B=$clockStep,05

echo "Clockspeed set to "$1"MHz"
Now you can do something like this: # gmaboost.sh 200 or # gmaboost.sh 400, etc. Interestingly, FreeBSD’s i915_kms graphics driver seems to have set the 3D render clock speed of my GMA950 to 400MHz already, so there was nothing to be gained for me in terms of performance. I can still clock it down to conserve energy though. A quick performance comparison using a crappy custom-recorded ioquake3 demo shows the following results:

  • 200MHz: 30.6fps
  • 250MHz: 35.8fps
  • 400MHz: 42.6fps

Hardware was a Core 2 Duo T7600 and the GPU was making use of two DDR-II/667 4-4-4 memory modules in dual channel configuration. Resolution was 1400×1050 with quite a few changes in the Quake III configuration to achieve more performance, so your results won’t be comparable, even when running ioquake3 on identical hardware. I’d post my ~/.ioquake3/baseq3/q3config.cfg here, but in my stupidity I just managed to freaking wipe the file out. Now I have to redo all the tuning, pfh.

But in any case, this really works!

Unfortunately, it only applies to the GMA950. And I still wonder what it was that was so wrong with # pciconf -w -h pci0:0:2:0 0xF0 0060 && pciconf -w -h pci0:0:2:0 0xF0 3405 and the like. I tried a few combinations just in case my byte order was messed up or in case I really had to write single bytes instead of half-words, but either the change wouldn’t apply at all, or the machine would just lock up. It would be nice to do this with only BSD tools on actual FreeBSD UNIX, but I guess I’m just too stupid for pciconf…
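
At least reading the register back works with either tool, so you can sanity-check what the script actually wrote. A quick sketch, using the same device selectors as above (I’m only showing the read commands, not their output):

# setpci -s 02.0 F0.W
# pciconf -r pci0:0:2:0 0xF0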

May 13 2016
 

I had actually meant to post this upon release, but forgot. I’ve now been reminded on the forums that I should maybe do this, even if it isn’t really a match for this weblog, considering the target audience, language-wise. Well, whatever. Somewhere in 2014, a user named [OutOfRange] from the German 3dfx forum [VoodooAlert] had begun to write a German transcript for an interview that the [Computer History Museum] had conducted with the founders of the now-long-gone PC 3D graphics pioneer 3dfx Interactive. This is especially interesting stuff for PC gamers and hardware enthusiasts of the mid-90s to the early 2000s. It allows quite an insight into both the technological and business sides of a now-legendary 90s chip startup company.

Also, those guys are really nice people as well it seems. :)

Some guys who can’t speak English had requested some kind of text file-based translation to read while watching the video, and OutOfRange had begun to translate the video and write one, with some minor passages done by another user called [Kaitou]. Ultimately, when I picked up the project, I (and others) had the idea of making a German fansub out of that. So together with OutOfRange, who supplied me with the latest versions of his transcript on the go, I started to learn how to do (very) simple ASS/SSA soft subtitling using [Subtitle Workshop] and just got to it. Early on, there was little or no color coding and very little formatting, so working on it looked like this:

The early days of subtitling the 3Dfx Oral History Panel; I translated as I got fed new bits and pieces by OutOfRange…

But it got better and better: colors got assigned to all the guys to differentiate them from each other when several people were talking at once (you get used to the colors fast!), explanatory information was put at the top of the frame, and slowly but surely, the German fansub of that 2½ hour interview started to take shape:

The 3dfx founders talking about how they would never do CAD on their first 3D chip for PC gamers, known as the “3Dfx Voodoo Graphics”. From left to right: Gary Tarolli, Scott Sellers, Ross Smith and Gordon Campbell.

So, where are the files? Here they are (thanks fly out once more to OutOfRange, [CryptonNite] and [Hutzeputz] for helping out with download mirrors!):

The 3Dfx Oral History Panel in its German fansub:

Downloads:

Audio & video are © 2013 Computer History Museum
[original English transcript] (This fansub is not based on it!), [original web stream] [http://www.computerhistory.org] [Terms of use] Used with express permission!

The subtitles are licensed under the [CC BY-NC-SA 4.0].

  • Initiator and main translator: OutOfRange
  • Co-translator: Kaitou
  • Editor & fansubber: GrandAdmiralThrawn (that would be myself)
  • [http://www.voodooalert.de]

This release was made possible also because of the friendly help of a legal representative of the Computer History Museum, who helped clear up how we could dual-license and re-release this video ourselves. Note that if you wish to redistribute a modified version (!) of this, you’d need to get express permission from the Computer History Museum yourself. If you wish to modify the fansub, you may redistribute the .ass files freely under the CC BY-NC-SA 4.0, but not the whole, remuxed video, unless you have permission from the Computer History Museum!

In essence: You may not redistribute modified versions of audio & video without express permission from the creator, as their license is in essence something like the [CC BY-NC-ND].

As a final note, I’m not really a fansubber, so don’t expect any kind of professional quality here. Or something that you’d get from an actual fansubbing group. But for an amateurish sub, it should be quite decent. ;)

Have fun! :)

PS.: If you’re ever anywhere near Silicon Valley, California, and even if you’re just a tiny bit interested in computer technology and history, pay that museum a visit! It’s seriously awesome; they even have mainframes from the 60s, magnetic core memory panels from the 50s and more modern stuff as well. Ah, I remember things like that CRAY machine or the “Weapons Director” box, really nice exhibits there!

Mar 15 2016
 

Just recently, I’ve tested the computational cost of decoding 10-bit H.265/HEVC on older PCs as well as Android devices – with some external help. See [here]. The result was that a reasonable Core 2 Quad can do 1080p @ 23.976fps @ 3MBit/s in software without issues, while a Core 2 Duo at 1.6GHz will fail. Also, it has been shown that Android devices – even those using seriously fast quad- and octa-core CPUs – can’t do it fluently without a hardware decoder capable of accelerating 10-bit H.265. To my knowledge there is a hack for Tegra K1- and X1-based devices used by MX Player, utilizing the CUDA cores to do the decoding, but all others are being left behind for at least a few more months until the Snapdragon 820 comes out.

Today, I’m going to show the results of my tests on Intel Skylake hardware to see whether Intel’s claims are true, for Intel has said that some of their most modern integrated GPUs can indeed accelerate 10-bit video, at least when it comes to the expensive H.265/HEVC. They didn’t claim this for all of their hardware however, so I’d like to look at some lower-end integrated GPUs today: the Intel HD Graphics 520 and the Intel HD Graphics 515. Here are the test systems, both running the latest Windows 10 Pro x64:

  • HP Elitebook 820 G3
  • CPU: Intel [Core i5-6200U]
  • GPU: Intel HD Graphics 520
  • RAM: 8GB DDR4/2133 9-9-9-28-1T
  • Cooling: Active
  • HP Elite X2 1012 G1 Convertible
  • CPU: Intel [Core m5-6Y54]
  • GPU: Intel HD Graphics 515
  • RAM: 8GB LPDDR3/1866 14-17-17-40-1T
  • Cooling: Passive

Let’s look at the more powerful machine first, which would clearly be the actively cooled Elitebook 820 G3. First, let’s inspect the basic H.265/HEVC capabilities of the GPU with [DXVAChecker]:

DXVAChecker looks good with the latest Intel drivers provided by HP (version 4331): 10-bit H.265/HEVC is supported all the way up to 8K!

And this is the ultra-low-voltage CPU housing the graphics core:

Intel Core i5-6200U

So let’s launch the Windows media player of my choice, [MPC-HC], and look at the video decoder options we have:

In any case, both HEVC and UHD decoding have to be enabled manually. On top of that, it seems that either Intel’s proprietary QuickSync can’t handle H.265/HEVC yet, or MPC-HC simply can’t make use of it. The standard Microsoft DXVA2 API however supports it just fine.

Once again, I’m testing with the anime “Garden of Words” in 1920×1080 at ~23.976fps, but this time with a smaller slice at a higher bitrate of 5MBit/s. The encoding options were as follows for pass 1 and pass 2:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full
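
For reference, this is roughly how such a two-pass encode can be driven end to end on a Unix-ish shell. Only the x265 switches above are from my actual run; the ffmpeg feeder, the file names and the abbreviated option list are assumptions for illustration:

# pass 1: statistics only, video output discarded
ffmpeg -i source.mkv -f yuv4mpegpipe - | x265 --y4m --input - -D 10 --fps 24000/1001 -p veryslow --bitrate 5000 --pass 1 --slow-firstpass --stats v.stats -o /dev/null
# pass 2: same switches, now writing the final bitstream (add the remaining options shown above to both passes)
ffmpeg -i source.mkv -f yuv4mpegpipe - | x265 --y4m --input - -D 10 --fps 24000/1001 -p veryslow --bitrate 5000 --pass 2 --stats v.stats -o video.hevc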

Let’s look at the performance during some intense scenes with lots of rain at the beginning and some less taxing indoor scenes later:

There is clearly some difference, but it doesn’t appear to be overly dramatic. Let’s do a combined graph, putting the CPU loads for GPU-assisted decoding over the regular one as an overlay:

Blue = software decoding, magenta (cause I messed up with the red color) = GPU-assisted hardware decoding

Well, using DXVA2 does improve the situation here, even if it’s not by too much. It’s just that I would’ve expected a bit more here, but I guess that we’d still need to rely on proprietary APIs like nVidia CUVID or Intel QuickSync to get some really drastic results.

Let’s take a look at the Elite X2 1012 G1 convertible/tablet with its slightly lower CPU and GPU clock rates next:

Its processor:

Core m5-6Y54

And this is what DXVAChecker has to say about its integrated GPU:

DXVAChecker on an Intel HD Graphics 515: whoops… something important seems to be missing here…

Now what do we have here?! Both HD Graphics 520 and 515 should be [architecturally identical]. Both are GT2 cores with 192 shader cores distributed over 24 clusters, 24 texture mapping units as well as 3 rasterizers. Both support the same QuickSync generation. The only marginal difference seems to be the maximum boost clock of 1.05GHz vs. 1GHz, and yet HD Graphics 515 shows no sign of supporting the Main10 profile for H.265/HEVC (“HEVC_VLD_Main10”), so no GPU-assisted 10-bit decoding! Why? Who knows. At the very least they could have scratched 8K support and implemented it for SD, HD, FHD and UHD 4K resolutions. But nope… Only 8-bit is supported here.

I even tried the latest beta driver, version 4380, to see whether anything had changed in the meantime, but no; it behaves in the same way.

Let’s look at what that means for CPU load on the slower platform:

CPU load with software decoding: the small Core m5-6Y54 has to do all the work!

We can see that we get close to hitting the ceiling, with the CPU’s boost clock going up all the way. This is problematic for thermally constrained systems like this one. During a >4 hour [x264 benchmark run], the Elite X2 1012 G1 has shown that its 4.5W CPU can’t hold boost clocks this high for a long time, given the passive cooling solution. Instead, it sat somewhere in between 1.7-2.0GHz, mostly in the 1.8-1.9GHz area. This might still be enough with bigger decoding buffers, but DXVA2 would help a bit here in making this slightly less taxing on the CPU, especially considering higher bitrates or even 4K content. Also, when upping the ambient temperature, the benchmark runtime grew by almost an hour, pushing the CPU clock rate further down by 100-200MHz. So it might just not play that movie on the beach in summer at 36°C. ;)

So, what can we learn from that? If you’re going for an Intel/PC-based tablet, convertible or Ultrabook, you need to pick your Intel CPU+graphics solution wisely, and ideally not without testing it for yourself first! Who knows what other GPUs might be missing certain video decoding features like HD Graphics 515 does. Given that there is no actual compatibility matrix for this as of yet (I have asked Intel to publish one, but they said they can’t promise anything), you need to be extra careful!

For stuff like my 10-bit H.265/HEVC videos at reasonably “low” bitrates, it’s likely ok even with the smallest Core m3-6Y30 + HD Graphics 515 combination that you can find in devices like Microsoft’s own Surface Pro 4. But considering modern tablets’ WiDi (Wireless Display) tech with UHD/4K resolutions, you might want to be careful when choosing that Windows (or Linux) playback device for your big screens!

Dec 03 2013
 

Actually, I never quite warmed up to Wild West themed games, shooters or not. I don’t know why; I did like related movies, especially the Bud Spencer & Terence Hill comedies, but also the more serious ones. Never liked any of the games though. But what I heard about the latest Call of Juarez game – a budget title for the first time – sounded so fresh that I gave it a shot. And damn, I was blown away. Sure it’s a linear shooter, yeah, ok, it has checkpoints instead of save games, and even quick-time events. It has all the bad stuff, and still it’s awesome – now how can that be? Simple. It’s all implemented in the right fashion, and the game always delivers fun in great ways.

An old man going by the name of Silas Greaves walks into a bar one day in the early 20th century. Clearly, he’d seen better times, and he’s kind of out of place in this modern world with cars driving around, him still riding a horse and wearing six-shooters on his belt. In that bar, an old bartender and a few guys, including an enthusiastic young dime-novel-reading boy, await him. Recognizing his name as that of a relatively famous bounty hunter, they start buying him drinks to hear him tell his stories.

This unfolds in Mr. Greaves telling them the story of his life. And as he tells them, his words materialize within the game you’re playing, on the fly! It soon turns out that he is telling his story a bit over-the-top and not quite so historically correct either, since he seems to be meeting all the big names of the West like Billy the Kid, Jesse James, Butch Cassidy and so on, duelling ’em all in the process of course, as he’s gunning his way through it all. The fun thing is that the 3D engine itself and the gameplay adjust dynamically as Silas Greaves is telling his story, and sometimes he has to adjust a bit of his story as well here and there, especially as his listeners are left in disbelief more often than not.

Let me give you an idea: Silas is walking around on a wooden structure somewhere outside of a gold mine, and he’s stuck. Maybe because the story the old man’s telling is stuck too, so he has to come up with something new, and as he is remembering (or sometimes seemingly just making up) stuff, a ladder suddenly appears, the one that he thinks he climbed down to continue onwards. This would manifest in the game as a ladder “growing” up from the floor below, or just flying down from the sky and into the right place for Silas to step on it, all while you briefly hear him tell that piece of the story in the background. Sometimes he even has to tell an entire part of the story anew, which you will see as the whole game “rewinding” itself to let you play it again, only very differently this time around.

So, it’s well-told, well-executed in the form of a fast linear shooting gallery that refuses to be ashamed of itself, the Chrome Engine 5 does a nice job of showing all that to us, the artwork is great, music plus awesome gun sounds plus awesome voice-over do a great job of pulling us in, and it costs next to nothing! Well, not all is perfect (for me) though, so let’s get technical, shall we?

The Chrome Engine 5 happens to be using a very fast Direct3D 9 renderer, leaving the game very compatible with Windows from XP/XP x64 all the way up to Windows 8.1, including SLI/CF support. There is only one thing missing. The one tiny little thing that I refuse to game without, and that is anti-aliasing. Missing FSAA leaves the cel-shaded engine with very jagged edges and foliage. Unacceptable. So I tried to play around with nVidia Inspector again. See this first:

The cel shader produces well-defined edges everywhere, which results in them being intensely jagged. Also, foliage is very jagged, which you might not see very well on the screenshot, but which will become very visible when moving around. Now, Call of Juarez: Gunslinger does have a driver profile in nVidia’s GeForce driver, but it does not feature any anti-aliasing compatibility bits whatsoever. After researching the web about this, I have assembled several forum and blog postings into this:

Call of Juarez: Gunslinger and nVidia Inspector

Now, this does feature some non-AA related stuff like SSAO bits, but that won’t hurt. The most important parts here are the compatibility flags 0x000010C0 and the behavior flags 0x00000000 / “None”, as well as the Anti Aliasing Transparency Super-Sampling bits. Usually, Transparency Super-Sampling is for alpha-test textures only, to anti-alias foliage for instance. However, in Call of Juarez: Gunslinger, AA will not work without it at all. More specifically, it will not work unless the feature called “TAA” is switched to the expensive “SGSS TAA”, or sparse-grid super-sampling, as ordered grids seem to do nothing at all. Now this does work, but there is a caveat too, as you will soon see:

“Wow” the pure FSAA aficionado might say, while the pure anisotropio (cough) might puke. Cel shaders and most other post-processing shaders have a significant problem with sparse grid AA, and that is a distinct blur setting in on the entire 3D frame as soon as both are combined. Like with shader-based AA such as CSAA or even more so FXAA, there will be blurring and it will be “ew”. So basically you’ll have to choose whether you want smooth edges+foliage or sharp textures. Ew. For certain games there is a way around this, which means either you go for regular super-sampled TAA without the sparse grid (doesn’t work on Gunslinger) or full-scene super-sampling anti-aliasing like 2×2 SSAA (works on Gunslinger, sort of). The latter looks like this:

“Beautiful, yaaay” all might scream in unison, but not entirely so, I unfortunately have to interpose. This will most likely cause artifacts on textures that have pixel shaders rendering on alpha-test textures, like for instance heat haze or water, having those textures either disappear partially or render in a broken fashion. Extremely good if it doesn’t happen for you, but it did happen for me and a friend of mine, who has a very different configuration with Keplers instead of Fermi GPUs and Windows 7 instead of my XP x64, indicating a fundamental rather than a specific problem. Meh.

Me, I chose the mode that you can see in that nVidia Inspector screenshot. Now I don’t like an even slightly blurry image, but I like jaggies less, so yeah. Hard choice there, but that’s all I can do right now. In case you’re interested what kind of a hardware configuration this profile was made for, here are the relevant specifications:

  • 2 x nVidia GeForce GTX 580 3GB in SLI
  • GeForce driver 331.82 non-WHQL
  • 2560x1600x32 (~4.1MPixel)

As I said, SGSS TAA is expensive, which is why I had to limit myself to 4 subsamples (Don’t count the CV samples). 8 samples would make the game not so enjoyable anymore. So to use this, you’ll need some serious GPU horsepower.

Also, there was another texture rendering problem with FSAA in Gunslinger, one that has been fixed by that “Antialiasing Fix” flag. The bug results in weird lines being rendered on textures, something also known to happen with the game “Dead Island” when attempting similar profile modifications.

People are saying that forcing AA works out of the box in both games for AMD graphics cards, with the single exception of said bug appearing. For nVidia you can fix that, but not so for AMD. It might be that newer drivers have taken care of this, but I cannot comment on that, as I do not own such a system.

But hey, this is better than nothing at least!

Aug 20 2013
 

To complete my EVGA GeForce GTX 580 Classified Ultra 3GB SLI setup, I have recently ordered the corresponding [backplates at EVGA], as they are only 6.90€ per piece, which I believe is quite acceptable. The proposed function of the backplates is to protect the rear of the card against accidental damage and to slightly improve cooling performance (which I was unable to verify, but well).

While adding the backplates to the cards was the primary objective, I also took advantage of the maintenance downtime to clean the insides of the cards, as I expected quite a lot of dust in there. Actually, one of the cards was rather clean and the other one very messy inside, so I’ll show you the uglier one:

Unfortunately I forgot to take photos of the cleaned heat sink, but rest assured, the awful contamination is gone! ;) While I was at it, I also renewed the thermal grease to make sure everything works nicely when reassembling the card. There was only one dreadful moment when my monitor suddenly just went dark 5 minutes after booting back into my operating system, and the LEDs on the primary card were just flickering threateningly. But luckily, that was just a loose 4-pin Molex plug. All good now. Just for comparison, an old picture of the cards before cleaning and mounting any backplates:

The large EVGA PCB mostly holds voltage regulation circuitry

The only thing is that I probably have to disassemble the second card once more. The primary one behaves very well now, but the secondary isn’t running any cooler. I think there was not enough thermal grease on that one. But other than that, it looks pretty nice now, and the mounted backplates should also increase an EVGA card’s resale value – not that that’s very important to me, as I’m probably going to keep them for a long time to come. Just saying.

As a friend of mine stated, they are now “dressed to impress”. ;)

May 29 2013
 

So, there is this mainboard, an Intel D850EMV2, or rather D850EMVR, which is a sub-version of the former, based on the i850E Rambus chipset. What’s special about that old Pentium 4 board? Well, I won it once in a giveaway at one of the largest German hardware websites, [Computerbase]. And after that, Jan-Frederik Timm, founder and boss of the place, contacted me on ICQ, telling me about it. First time I had ever won anything! He asked me to put it to good use, because he was kind of fed up with people just reselling their won stuff. So I promised him that I would.

And boy, did I keep that promise! At first I used a shabby 1.6GHz Northwood processor with just 128MB Rambus RDRAM. Can’t remember the rest of the machine, but over time I upgraded it a bit (when it was already considered old, with Core 2 Duos on the market), a bit more RAM etc. for small LAN party sessions. At some point, I sold it to my cousin, and to make it more powerful and capable for her, I gave the machine the only and fastest hyper-threaded processor available on that platform, the Pentium 4 HT 3.06GHz, plus 1.5GB PC800-45 RDRAM and my old GeForce 6800 Ultra AGP.

She used that for the internet and gaming etc. for some time until the GeForce died and she thought the machine barely powerful enough for her top-tier games anyway, so she bought a new one, which I built for her. The Intel board and its stuff got my older GeForce FX5950 Ultra then and was used by my uncle for the Internet on Debian Linux and low-end LAN games on Windows XP.

A long time after I first got it, I contacted Jan from Computerbase again, to tell him that I had kept my promise and ensured the board had been used properly for 8 years now. Needless to say he was delighted and very happy that it wasn’t just sold off for quick cash.

Soon after, my cousin got another, even more powerful machine, as her Core 2 Duo mainboard died off. Now it was S1156, GTX480 etc. So my uncle bought a new mainboard and I rebuilt the C2D for him with my cousin’s old GTX275. I asked him if he would part with the D850EMVR and he agreed to give it back to me, after which it collected dust for a year or so.

Now, we need another machine for our small LAN parties, as our notebooks can’t drive the likes of Torchlight II or Alien Swarm. It was clear what I had to do: keep the damn Intel board running until it fucking dies!

This time I chose to make it as powerful as it could remotely become. With a Gainward Bliss GeForce 7800GS+ AGP. The most powerful nVidia-based AGP card ever built, equipped with a very overclockable 7900GT GPU with its full 24 pixel pipelines and 8 vertex shaders, as well as 512MB of Samsung RAM. Only Gainward built it that way (a small 7900 GTX you could say), as nVidia did not officially allow such powerful AGP cards. So this was a limited edition too. I always wanted to have one of those, but could never afford them. Now was the time:

As expected (there were later, more powerful AGP8x systems in comparison to this AGP4x system, with faster Pentium 4s and Athlon 64s), the CPU is limiting the card. But at least I can add some FSAA or even HDRR at little cost in some games, and damn, that card overclocks better than shown in some of the original reviews! The core went from 450MHz to 600MHz so far, dangerously close to the top-end 7900 GTX PCIe of the time with its 650MHz. Also, the memory accepted some pushing from a 1.25GHz to a 1.4GHz DDR3 data rate. Nice one!

This was Furmark stable, and the card is very silent and rather cool even under such extreme loads. Maybe it’ll accept even more speed, and all that at a low 1.2V GPU voltage. Cool stuff. Here, a little AquaMark 3 for you:

7800GS+ in AquaMark 3

So, this is at 600MHz core and 1400MHz DDR memory. For comparison I got a result slightly above 53k at just 300MHz core. So as you can see, at least the K.R.A.S.S. engine in AquaMark 3 is heavily CPU bound on this system. So yeah, for my native resolution of 1280×1024 on that box, the card is too powerful for the CPU in most cases. The tide can turn though (in Alien Swarm for instance) when turning on some compute-heavy 128-bit floating point rendering with HDR or very complex shaders, or FSAA etc., so the extra power is going to be used. ;) And soon, 2GB PC1066-32p RDRAM will arrive to replace the 1GB PC800-45 Rambus I have currently, to completely max it out!

So I am keeping my promise. Still. After about 10 years now. Soon there will be another small LAN party, and I’m going to use it there. And I will continue to do so until it goes up in flames! :)

Update: The user [Tweakstone] has mentioned on [Voodooalert] that XFX once built a GeForce 7950GT for AGP, which was more powerful than the Gainward. So I checked it out, and he seems to be right! The XFX 7950GT was missing the big silent cooler, but provided an architecturally similar G71 GPU at higher clock rates! While the Gainward 7800GS+ offered 450MHz on the core and a 1250MHz DDR data rate on the memory, the XFX would give you 550MHz core and a 1300MHz DDR data rate with a similar amount of 512MB DDR3 memory. That’s a surprise to me; I wasn’t aware of the XFX. But since my Gainward overclocks so well (it’s the same actual chip after all) and is far quieter and cooler, I guess my choice wasn’t wrong after all. ;)

Update 2: Since there was a slight glitch in the geometry setup unit of my card, I have now replaced it with a Sapphire Radeon HD3850 AGP, which gives more performance, slightly better FSAA and, as the icing on the cake, proper DXVA1 video acceleration. It even plays Blu-rays in MPC-HC now. ;) Also, I retested AquaMark 3, which seems to require the deletion of the file direcpll.dll from the AquaMark 3 installation directory to not run into an access violation exception at the end of the benchmark on certain ATi or AMD graphics hardware. I guess the drivers are the problem here. But with that troublesome file gone, here’s a new result:

AquaMark 3 on an ATi Radeon HD3850 AGP

Yeah, it’s a bit faster now, but not much. As we can see, the processor is clearly the limiting factor here. But at least I now have relatively problem-free 3D rendering and DXVA on top of it!

Jul 16 2012
 

I have now been waiting for nVidia long enough. Support still claims that SLI support for the GTX 680 is coming to XP / XP x64 in the future, but nobody knows when. Additionally, one support person said on the quiet that he doesn’t believe this feature will ever see the light of day anymore. One WHQL driver and several betas, and nothing. Only that weird 302.59 driver that had a broken SLI implementation (it worked only in D3D windowed mode and OpenGL, otherwise the dreaded, well-known red screen appeared).

So, what to do? I sold my two GTX 680 cards to an old friend of mine; he used them to replace his noisy GTX 480 SLI. With the money that I got from the sale, I bought the biggest, baddest GTX 580 cards ever built, the EVGA GeForce GTX 580 Classified Ultra. Actually, only one of them is an Ultra, the other one a regular Classified, but hey, both run at Ultra clock rates, so that’s fine. So here is Fermi again, in its maximum configuration with 512 shader cores.

Usually, a GTX 580 features a 772MHz rasterizer and a 1544MHz shader clock, with 1.5GB of VRAM running at a 4GHz GDDR5 data rate. The Ultra with its heavily modified custom PCB design however runs the rasterizer at 900MHz, the shaders at 1800MHz and has 3GB of VRAM running at a 4.2GHz data rate. So it can almost reach GTX 680 speed, at the cost of very high energy consumption under load because of its absolutely monstrous 14+3 phase voltage regulation module. While air-cooled, this card was definitely made for running under LN2 pots to reach world records. Whatever, SLI is once again working fine for me now:

Energy consumption is an issue here though, as mentioned. The cards are even hungrier than a pair of dreaded GTX 480 Fermis. Where the GTX 480 SLI would hit the ceiling at 900W in Furmark, the two extreme 580s here can eat up to 1140W under the same load. That’s intimidating! Regular load while playing The Witcher 2 was around 800-880W, which is still a lot. Especially considering the necessary power adapters. Since each card has 2 x 8-pin and 1 x 6-pin PCI Express power plugs, you need a total of 4 x 8P and 2 x 6P, which even my Tagan Piperock 1.3kW couldn’t provide. So I had to adapt my two 6P to 8P and 4 Molex 4P HDD plugs to 2 x 6P. It seems to work fine though, even with the 6P cables being overloaded by almost 100%. Obviously Tagan’s current limiters aren’t kicking in here, which is good, as those limiters (usually sold to the end user as “rails”) are mostly pointless anyway. At least idle power consumption was only marginally higher when compared to the slim GTX 680. Here I can see 280-300W instead of 250W, which is still somewhat acceptable.

This might very well be the single most powerful GPU solution possible under XP x64, unless GK110 (the full-size Kepler that will most likely be sold as a GTX 780) turns out to be faster than both Ultras here. Of course there is still the slim chance that the 680 (GK104) and maybe even GK110 will get XP / XP x64 SLI, but I don’t count on that anymore. For now, the Ultra SLI is the best I can get. We’ll see if that’s going to change in the future or not.

Apr 11 2012
 

Oh, and in other news: I started another live chat with nVidia yesterday, and talked to yet another rather well-informed support person named Karthik, who answered my primary question. Of course I asked whether there actually would be any SLI support for the GeForce GTX 680 on Windows XP and Windows XP x64, as the recent beta driver not having it shocked me quite a bit. Mr. Karthik however insisted that the 2-way SLI feature is still fully supported for NT 5.x and that even the current beta driver is actually not feature-complete yet.

That would mean that I really only have to wait for the WHQL driver and that it won’t be necessary to give up my cards for GTX 580s. There is still a little bit of doubt here, so I’m keeping my options (and bartering threads) open for now.

Apr 10 2012
 

So the neverending story continues… and as it seems, it is spiralling towards a really bad ending. Yesterday, nVidia released a new beta driver for the GeForce GTX 680, version 301.24. As this is the first unified driver, it also supports older cards. However, it lists “GeForce GTX 680 SLI” as a missing feature for both XP and XP x64. Just that one feature that I need, god dammit. Since beta drivers are usually feature-complete, this means that SLI won’t be making it into XP (x64) for the GTX 680. I’ll wait for the final WHQL driver, but it’s looking really grim. I have already prepared and started forum threads to set an exchange for two GTX 580s into motion.

This is really, really bad. It’s crap to see that card still supported, just with the most important part missing. If this isn’t in the WHQL, I’ll start a last support chat with nVidia for final clarification. I have a feeling I’m not gonna like what they’ll have to say. We’ll see soon enough I guess.