Just recently, I’ve tested the computational cost of decoding 10-bit H.265/HEVC on older PCs as well as Android devices – with some external help. See [here]. The result was, that a reasonable Core 2 Quad can do 1080p @ 23.976fps @ 3MBit/s in software without issues, while a Core 2 Duo at 1.6GHz will fail. Also, it has been shown that Android devices – even when using seriously fast quad- and octa-core CPUs can’t do it fluently without a hardware decoder capable of accelerating 10-bit H.265. To my knowledge there is a hack for Tegra K1- and X1-based devices used by MX Player, utilizing the CUDA cores to do the decoding, but all others are being left behind for at least a few more months until Snapdragon 820 comes out.
Today, I’m going to show the results of my tests on Intel Skylake hardware to see whether Intels’ claims are true, for Intel has said that some of their most modern integrated GPUs can indeed accelerate 10-bit video, at least when it comes to the expensive H.265/HEVC. They didn’t claim this for all of their hardware however, so I’d like to look at some lower-end integrated GPUs today, the Intel HD Graphics 520 and the Intel HD Graphics 515. Here are the test systems, both running the latest Windows 10 Pro x64:
- HP Elitebook 820 G3
- CPU: Intel [Core i5-6200U]
- GPU: Intel HD Graphics 520
- RAM: 8GB DDR4/2133 9-9-9-28-1T
- Cooling: Active
- HP Elite X2 1012 G1 Convertible
- CPU: Intel [Core m5-6Y54]
- GPU: Intel HD Graphics 515
- RAM: 8GB LPDDR3/1866 14-17-17-40-1T
- Cooling: Passive
Let’s look at the more powerful machine first, which would clearly be the actively cooled Elitebook 820 G3. First, let’s inspect the basic H.265/HEVC capabilities of the GPU with [DXVAChecker]:
And this is the ultra-low-voltage CPU housing the graphics core:
So let’s launch the Windows media player of my choice, [MPC-HC], and look at the video decoder options we have:
In any case, both HEVC and UHD decoding have to be enabled manually. On top of that, it seems that either Intels’ proprietary QuickSync can’t handle H.265/HEVC yet, or MPC-HC simply can’t make use of it. The standard Microsoft DXVA2 API however supports it just fine.
Once again, I’m testing with the Anime “Garden of Words” in 1920×1080 at ~23.976fps, but this time with a smaller slice at a higher bitrate of 5Mbit. The encoding options were as follows for pass 1 and pass 2:
--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 --slow-firstpass --stats v.stats --sar 1 --range full --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats v.stats --sar 1 --range full
Let’s look at the performance during some intense scenes with lots of rain at the beginning and some less taxing indoor scenes later:
There is clearly some difference, but it doesn’t appear to be overly dramatic. Let’s do a combined graph, putting the CPU loads for GPU-assisted decoding over the regular one as an overlay:
Well, using DXVA2 does improve the situation here, even if it’s not by too much. It’s just that I would’ve expected a bit more here, but I guess that we’d still need to rely on proprietary APIs like nVidia CUVID or Intel QuickSync to get some really drastic results.
Let’s take a look at the Elite X2 1012 G1 convertible/tablet with its slightly lower CPU and GPU clock rates next:
And this is, what DXVAChecker has to say about its integrated GPU:
Now what do we have here?! Both HD Graphics 520 and 515 should be [architecturally identical]. Both are GT2 cores with 192 shader cores distributed over 24 clusters, 24 texture mapping units as well as 3 rasterizers. Both support the same QuickSync generation. The only marginal difference seems to be the maximum boost clock of 1.05GHz vs. 1GHz, and yet HD Graphics 515 shows no sign of supporting the Main10 profile for H.264/HEVC (“HEVC_VLD_Main10”), so no GPU-assisted 10-bit decoding! Why? Who knows. At the very least they could just scratch 8K support, and implement it for SD, HD, FHD and UHD 4K resolutions. But nope… Only 8-bit is supported here.
I even tried the latest beta driver version 4380 to see whether anything has changed in the meantime, but no; It behaves in the same way.
Let’s look at what that means for CPU load on the slower platform:
We can see that we get close to hitting the ceiling with the CPUs’ boost clock going up all the way. This is problematic for thermally constrained systems like this one. During a >4 hour [x264 benchmark run], the Elite X2 1012 G1 has shown that its 4.5W CPU can’t hold boost clocks this high for a long time, given the passive cooling solution. Instead, it sat somehwere in between 1.7-2.0GHz, mostly in the 1.8-1.9GHz area. This might still be enough with bigger decoding buffers, but DXVA2 would help a bit here in making this slightly less taxing on the CPU, especially considering higher bitrates or even 4K content. Also, when upping the ambient temperature, the runtime could be pushed back by almost an hour, pushing the CPU clock rate further down by 100-200MHz. So it might just not play that movie on the beach in summer at 36°C.
So, what can we learn from that? If you’re going for an Intel/PC-based tablet, convertible or Ultrabook, you need to pick your Intel CPU+graphics solution wisely, and optimally not without testing it for yourself first! Who knows what other GPUs might be missing certain GPU video decoding features like HD Graphics 515 does. Given that there is no actual compatibility matrix for this as of yet (I have asked Intel to publish one, but they said they can’t promise anything), you need to be extra careful!
For stuff like my 10-bit H.265/HEVC videos at reasonably “low” bitrates, it’s likely ok even with the smallest Core m3-6Y30 + HD Graphics 515 that you can find in devices like Microsofts’ own Surface Pro 4. But considering modern tablets’ WiDi (Wireless Display) tech with UHD/4K resolutions, you might want to be careful when choosing that Windows (or Linux) playback device for your big screens!