Mar 15, 2016
 

Just recently, I’ve tested the computational cost of decoding 10-bit H.265/HEVC on older PCs as well as Android devices – with some external help. See [here]. The result was that a reasonable Core 2 Quad can do 1080p @ 23.976fps @ 3Mbit/s in software without issues, while a Core 2 Duo at 1.6GHz will fail. Also, it has been shown that Android devices – even those with seriously fast quad- and octa-core CPUs – can’t do it fluently without a hardware decoder capable of accelerating 10-bit H.265. To my knowledge there is a hack for Tegra K1- and X1-based devices used by MX Player, utilizing the CUDA cores to do the decoding, but all others are being left behind for at least a few more months until Snapdragon 820 comes out.

Today, I’m going to show the results of my tests on Intel Skylake hardware to see whether Intel’s claims are true, for Intel has said that some of their most modern integrated GPUs can indeed accelerate 10-bit video, at least when it comes to the expensive H.265/HEVC. They didn’t claim this for all of their hardware, however, so I’d like to look at some lower-end integrated GPUs today: the Intel HD Graphics 520 and the Intel HD Graphics 515. Here are the test systems, both running the latest Windows 10 Pro x64:

  • HP Elitebook 820 G3
    • CPU: Intel [Core i5-6200U]
    • GPU: Intel HD Graphics 520
    • RAM: 8GB DDR4/2133 9-9-9-28-1T
    • Cooling: Active
  • HP Elite X2 1012 G1 Convertible
    • CPU: Intel [Core m5-6Y54]
    • GPU: Intel HD Graphics 515
    • RAM: 8GB LPDDR3/1866 14-17-17-40-1T
    • Cooling: Passive

Let’s look at the more powerful machine first, which would clearly be the actively cooled Elitebook 820 G3. First, let’s inspect the basic H.265/HEVC capabilities of the GPU with [DXVAChecker]:

DXVAChecker on an Intel HD Graphics 520

DXVAChecker looks good with the latest Intel drivers provided by HP (version 4331): 10-bit H.265/HEVC is supported all the way up to 8K!

And this is the ultra-low-voltage CPU housing the graphics core:

Intel Core i5-6200U

So let’s launch the Windows media player of my choice, [MPC-HC], and look at the video decoder options we have:

In any case, both HEVC and UHD decoding have to be enabled manually. On top of that, it seems that either Intel’s proprietary QuickSync can’t handle H.265/HEVC yet, or MPC-HC simply can’t make use of it. The standard Microsoft DXVA2 API, however, supports it just fine.
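By the way, if you’d like to replicate such a software-vs-hardware comparison quickly without clicking through player options, a recent ffmpeg build can decode to a null sink with and without DXVA2. This is just a sketch with a hypothetical file name, not the method behind the graphs below:

ffmpeg -benchmark -i movie.mkv -f null -
ffmpeg -benchmark -hwaccel dxva2 -i movie.mkv -f null -

The first command decodes purely in software, the second requests DXVA2 hardware acceleration; -benchmark makes ffmpeg print elapsed and CPU time at the end, so the two runs can be compared directly.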

Once again, I’m testing with the anime “Garden of Words” in 1920×1080 at ~23.976fps, but this time with a smaller slice at a higher bitrate of 5Mbit/s. The encoding options were as follows for pass 1 and pass 2:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full
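For reference, this is roughly how those option strings slot into a complete two-pass pipeline, with ffmpeg piping a 10-bit Y4M stream into the x265 CLI. Just a sketch with hypothetical file names (the -strict -1 switch is needed because ffmpeg considers >8-bit yuv4mpegpipe output non-standard):

ffmpeg -i source.mkv -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 - | x265 [pass 1 options from above] - -o /dev/null
ffmpeg -i source.mkv -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 - | x265 [pass 2 options from above] - -o movie.hevc

On Windows, NUL would take the place of /dev/null for the statistics-only first pass.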

Let’s look at the performance during some intense scenes with lots of rain at the beginning and some less taxing indoor scenes later:

There is clearly some difference, but it doesn’t appear to be overly dramatic. Let’s do a combined graph, putting the CPU loads for GPU-assisted decoding over the regular one as an overlay:

CPU load with software decoding in blue and DXVA2 GPU-accelerated hardware decoding in red

Blue = software decoding, magenta (because I messed up the red color) = GPU-assisted hardware decoding

Well, using DXVA2 does improve the situation here, even if not by much. I would’ve expected a bit more, but I guess we’d still need to rely on proprietary APIs like nVidia CUVID or Intel QuickSync to get some really drastic results.

Let’s take a look at the Elite X2 1012 G1 convertible/tablet with its slightly lower CPU and GPU clock rates next:

Its processor:

Core m5-6Y54

And this is what DXVAChecker has to say about its integrated GPU:

DXVAChecker on an Intel HD Graphics 515

Whoops… Something important seems to be missing here…

Now what do we have here?! Both HD Graphics 520 and 515 should be [architecturally identical]. Both are GT2 cores with 192 shader cores distributed over 24 execution units, 24 texture mapping units as well as 3 rasterizers. Both support the same QuickSync generation. The only marginal difference seems to be the maximum boost clock of 1.05GHz vs. 1GHz, and yet HD Graphics 515 shows no sign of supporting the Main10 profile for H.265/HEVC (“HEVC_VLD_Main10”), so no GPU-assisted 10-bit decoding! Why? Who knows. At the very least they could have scratched the 8K support and implemented it for SD, HD, FHD and UHD/4K resolutions. But nope… Only 8-bit is supported here.
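If you want to double-check a limitation like this without DXVAChecker, ffmpeg can serve as a quick probe as well. A sketch, assuming a hypothetical 10-bit test file:

ffmpeg -hwaccel dxva2 -i 10bit-sample.mkv -f null -

On a GPU/driver combination lacking the Main10 profile, ffmpeg should report a failed DXVA2 setup and fall back to software decoding, while the same command with an 8-bit file (or on HD Graphics 520) should keep the decoding in hardware.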

I even tried the latest beta driver, version 4380, to see whether anything has changed in the meantime, but no; it behaves in the same way.

Let’s look at what that means for CPU load on the slower platform:

CPU load with software decoding

The small Core m5-6Y54 has to do all the work!

We can see that we get close to hitting the ceiling, with the CPU’s boost clock going up all the way. This is problematic for thermally constrained systems like this one. During a >4 hour [x264 benchmark run], the Elite X2 1012 G1 has shown that its 4.5W CPU can’t hold boost clocks this high for long, given the passive cooling solution. Instead, it sat somewhere between 1.7-2.0GHz, mostly in the 1.8-1.9GHz area. This might still be enough with bigger decoding buffers, but DXVA2 would help here by making playback less taxing on the CPU, especially at higher bitrates or with 4K content. Also, raising the ambient temperature extended that benchmark’s runtime by almost an hour, pushing the CPU clock rate down by another 100-200MHz. So it might just not play that movie on the beach in summer at 36°C. ;)

So, what can we learn from that? If you’re going for an Intel/PC-based tablet, convertible or Ultrabook, you need to pick your Intel CPU+graphics solution wisely, and ideally test it yourself first! Who knows which other GPUs might be missing certain video decoding features like HD Graphics 515 does. Given that there is no actual compatibility matrix for this yet (I have asked Intel to publish one, but they said they can’t promise anything), you need to be extra careful!

For stuff like my 10-bit H.265/HEVC videos at reasonably “low” bitrates, it’s likely OK even with the smallest Core m3-6Y30 + HD Graphics 515 combination found in devices like Microsoft’s own Surface Pro 4. But considering modern tablets’ WiDi (Wireless Display) tech with UHD/4K resolutions, you might want to be careful when choosing that Windows (or Linux) playback device for your big screens!

CC BY-NC-SA 4.0 Testing 10-bit H.265/HEVC decoding with hardware acceleration on Intel Skylake graphics (HD Graphics 520, HD Graphics 515) by The GAT at XIN.at is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  19 Responses to “Testing 10-bit H.265/HEVC decoding with hardware acceleration on Intel Skylake graphics (HD Graphics 520, HD Graphics 515)”

  1. Maybe you could take the ROM file for the Intel 520 and graft it into the BIOS, replacing the Intel 515 ROM file? I have done that before on a couple of my PCs, though on a laptop or tablet it would be risky. Worth a try if you’re adventurous!

    • Huh?! You can do that?

      I wasn’t aware of the possibility of replacing the VGA BIOS on Intel iGPUs. Could you describe the process that I have to follow to get this done? It does sound quite interesting!

  2. I can’t find the video settings page you have depicted, where you can select hardware acceleration in the latest MPC-HC 1.7 version. How did you open that options page?

    Thanks

    • Hello David,

      Ok, here is how to find that page (This is on a German Windows XP using MPC-HC 1.7.10, but the process is identical for any Windows 7/8.x/10 etc.). First, click “View” \ “Options”:

      MPC HC View \ Options
      View \ Options

      Then, look for the “Internal Filters” and click on “Video Decoder”:

      MPC HC Internal Filters
      Internal Filters section

      There you are:

      MPC HC Video Decoder Properties

      Hope this helps!

  3. So just to confirm the 520 or 515 have no h.265 encoding capabilities?

    Can any laptop Skylake chips hardware encode h.265?

    Thanks,

    John

    • Hello John,

      Intel’s QuickSync can encode H.265 in hardware on the Skylake platform, albeit only at 8 bits per color channel. So no 10-bit until Kaby Lake arrives…

      If you’re looking for a simple GUI tool to do it on Microsoft Windows, StaxRip might be your best choice. It comes with QSVEncC and should be able to hardware-encode 8-bit H.265/HEVC for you! I haven’t tested it myself though, since I’m using software encoders exclusively, because they deliver the highest quality per bitrate so far (x264, x265).

  4. Would be interesting to see the CPU load for the same videos with 8-bit encoding.

    Skylake does not support 10-bit HEVC in hardware (that will be added in Kaby Lake) – only 8-bit. That would show how low the CPU utilization is when decoding is done fully in HW.

    • Hey Chris,

      Hm, why not? I can do that, I still have the raw data lying around, so I don’t even need to re-rip the BD. I’ll push it into an 8-bit x265 encode at the same quality settings. You’ll need to give me a bit of time for completing the encode and re-running the test however. Maybe a day or two or something like that. :)

      PS: And that’s why I regretted buying an expensive Skylake convertible. Dammit, just waiting for one more generation would’ve given me a true 10-bit H.265 decoding ASIC. :cry:

      Edit: Eeh, seems I have to diagnose a little problem on my main workstation (me no likey MACHINE_CHECK_EXCEPTION *** STOP: 0x0000009C) that will require a lot of diagnosis and likely reboots (whether voluntary ones or not…). I’m pushing the input data to my FreeBSD A/V cruncher over the Internet. So it’ll take a few extra days to complete the 8-bit re-encode. :(

    • Hello again Chris,

      Here is your result:

      HP Elite X2 1012 G1 decoding 5Mbit, 8-bit H.265/HEVC with 16 B-frames in hardware using DXVA2
      HP Elite X2 1012 G1 / Core m5-6Y54 decoding 5Mbit H.265/HEVC in hardware using DXVA2 in MPC-HC

      As you can see, it’s clearly a “true” ASIC kicking in with 8-bit H.265/HEVC. The CPU load is negligible, and there is no load variance with changing scene complexity / bitrate. What little there is, is pure kernel load, so I’m guessing that’s just the thread pushing data into DXVA2 for decoding. The other parts like SRT subtitle rendering and 5.1 AAC-LC decoding seem to cost next to nothing. DXVA2 was set to native of course, so no copy-back to system RAM.

      Intel’s QuickSync still doesn’t support H.265/HEVC even with the latest beta driver, so that’s why it had to be DXVA2.

      For completeness’ sake, I should re-run the test on the Elitebook 820 as well, but unfortunately I don’t have those machines available anymore. Still, this kind of performance on a lowly m5-6Y54 already speaks volumes…

  5. Hardware: Intel Z8300 Tablet with 2GB Ram (Teclast X80 Pro)
    Software: Windows 10 and MPC-HC (Ver 1.7.10)
    Video File Format: HVC1 1920×1080 23.976fps [V: hevc main 10, yuv420p10le, 1920×1080 [default]]

    => Playing smoothly on internal display (1920×1200) and on HDMI-connected Panasonic Plasma TV (1920×1080)

    • Hello highwind,

      Ah, an Atom. I see, 1.6GHz all-core boost clock. Well, I do have much faster cores here, but you’ve got double the cores on the other hand. ;) Good to see a quad-core Atom can still run 10-bit HEVC at that resolution. Do you get hybrid HW acceleration as well? And do you know the average bitrate of your video by any chance? ([MediaInfo] might help in determining that.) Also, how much load does it put on your cores? Just curious over here.

      Ah, by the way, I read a few documents recently, and it seems that Intel’s upcoming Kaby Lake CPUs/GPUs will come with a real H.265/HEVC decoder ASIC (probably an encoder as well), and it’s going to support 10-bit too. That means that next round we’ll get some “real” HW acceleration, not the half-assed version of (some) Skylake(s). :) Just in case anyone’s interested.

      • Hi thrawn,

        my videos are all between 2500 and 2800 kbps. I think it is hybrid acceleration, but I don’t know how to check with MPC-HC to be honest. CPU load is pretty high, 70% at the lowest with spikes going up to 100%, and all cores are stressed equally.
        Strange thing is: those videos only play lag-free with MPC-HC on Windows… VLC on Windows lags/stutters pretty badly, and MX Player or VLC (even the new 2.0) on Android lag/stutter as well.

        I read that about 10-bit on Kaby Lake as well, that’s why I started investigating this topic in the first place and got to your site in the process ;-)
        While complete 10-bit HW decoding would be very nice, I think Intel also said that they wanted to withdraw from the “mobile market” (which I guess the low-power Atoms belong to), so I wouldn’t be too optimistic about Intel tablet SoCs with such a feature… It would be a shame though, I really love this Windows/Android dual-boot stuff.

        • Hi highwind,

          Ah, I didn’t even know you could dual-boot Windows and Android on x86 tablets, heh. Well, I am a bit behind the times when it comes to mobile devices after all (and not just there). About MPC-HC being fast: an old friend of mine recommended the player to me back when I was still using WMP with the classic 6.4 skin. That guy was always testing tons of software against each other to determine which was the fastest and most lightweight for a given task. His result for video players on Windows? MPC-HC. That was long, looong ago, but it seems it’s still true today. ;)

          As for determining whether your decoding is pure HW via an ASIC or just hybrid: I don’t think you can do that with MPC-HC alone, but [DXVAChecker] might fit the bill. Look at this for comparison; it’s on Windows XP x64 with an older GeForce driver (no hybrid decoding possible on that platform with that GPU, so the supported features are reduced in the first place):

          DXVA Checker on a GeForce GTX Titan Black on Windows XP x64 SP2
          (Click to enlarge)

          If you compare that to the [screenshot above], you can see that for some codecs, it says something like “DXVA2/D3D11” or maybe “DXVA1/2/D3D11”, while the Titan Black only shows pure “DXVA1” or “DXVA1/2” or the likes.

          That’s the hint you would need, I presume. If there is a “D3D11” in there, it should mean that regular shader code (or DirectCompute code?) is helping out as well, making it hybrid decoding. At least that’s what I gathered from the web, Doom9 forums and all. Not 100% sure, but that’s my guess. Machine-translating the readme from Japanese to English or German didn’t tell me anything else either.

          Ah, a shame about the Atoms though, they did have a reason to exist after all. I heard that Intel would drop them entirely, concentrating on low-power Core processors instead. Quite a shame. Not that x86 is actually such a good microarchitecture, but after such a long evolution and with Intel’s prowess in manufacturing it’s a good option after all. Especially if you want to run Windows with common software on a low-power platform with some decent battery life.

    • I have an x5-Z8300 tablet too, but DXVAChecker also states it doesn’t support 10-bit HEVC HW decode. Also, none of my HEVC files play with DXVA via MPC-HC. So what’s correct? Does the x5-Z8300 support 10-bit HEVC? It doesn’t look like it does. Actually, none of my HEVC files play via HW decode: http://i.imgur.com/YQZ8kE5.png That’s not 10-bit, is it? Does anyone know what the problem is? I checked the options in the LAV filter of course: http://i.imgur.com/23QRaNd.png

      • Hello maffle,

        From your screenshots I cannot determine whether the video file is 10-bit or not. You should try and use [mediainfo] on it to make sure. MPC-HC has it built in too, accessible via File / Properties while playing, but you need to switch to the “MediaInfo” tab to get the goods. Look at “Bit depth” (8 bits, 10 bits or 12 bits) and/or “Format profile” (something like Main or Main 10) to make sure.
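        If you prefer the command line, the mediainfo CLI tool can also print just those two fields. A quick sketch with a hypothetical file name:

        mediainfo --Inform="Video;%BitDepth% / %Format_Profile%" movie.mkv

        For a 10-bit HEVC file, this should print something like “10 / Main 10@L4@Main”.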

  6. Thanks that’s helpful.
    Small question: how do you get to these LAV Video Decoder “Properties” pages from MPC-HC? I can’t figure it out!

    • Hello Joffrey,

      Ah… that one. It can be found in View \ Options \ Internal Filters \ Video Decoder, see the following screenshot:

      MPC-HC lavf properties page
      (Click to enlarge)

  7. Thanks for the test and review. It would also be very interesting to see how an i5-6200U is (not) able to play a 10-bit HEVC video.

    • Hello pezollner,

      Not sure if I understand you properly, but a Core i5-6200U is included in this test (first three task manager screenshots), including a test of the hybrid hardware assist by HD Graphics 520.

      Or did you mean something else?
