Feb 232016
 

H.265/HEVC logoAfter [compiling] and running the x265 HEVC encoder, and after [looking at its quality] for animated content, here’s another little piece of information about my experiments with H.265/HEVC. And this time it’s the decoding part. Playing H.265-encoded videos on the PC is relatively easy. On Windows I tend to use [MPC-HC] for this, and on Linux/UNIX you can use [mplayer] or [VLC]. The newest versions of those players are all linked against a modern libav or ffmpeg library collection, so they can decode anything any H.265 encoder can throw at them.

The questions are: At what price? And: What about mobile devices?

H.265/HEVC is costly in terms of computation, and not just in the encoding stage. Decoding this stuff is hard as well. So I looked at two older Core 2 processors to see how they fare when decoding regular 10-bit H.264/AVC and the same content encoded as 10-bit H.265/HEVC, both times at the same bitrate of 3Mbit/s ABR. Again, the marvelous Anime movie “The Garden of Words” was used for this. The video player of my choice for playback was [MPC-HC] v1.7.10, rendering to a VMR9 surface.

On top of that, I can also provide some insight on how relatively modern Android devices will handle this (devices partly without a H.265 hardware decoder chip however!), all thanks to [Umlüx], who’s been willing to install the necessary Apps and run some tests! On Android, [MX Player] was used.

For the record, the encoding settings were like this for x264 (pass 1 & pass 2), …

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat
--non-deterministic

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic

…and this for x265:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

PC first:

Contender #1 is my old Sony TT45X subnotebook, and this is the processor inside:

CPU-Z on a Core 2 Duo SU9600

A Core 2 Duo SU9600 Penryn, which is an ULV processor at 1.6GHz, shown under all-core load, where it doesn’t boost to 1.83GHz any longer (And yeah, this is still WinXP+POSReady2009).

Contender #2 is my secondary workstation at work, that I recently upgraded with an SSD and a better processor we had lying around. It’s using this chip now:

CPU-Z on a Core 2 Quad Q9505

A Core 2 Quad Q9505 Yorkfield, which has been overclocked to a rock solid 3.4GHz.

Let’s throw some video at them! First, we’ll try some good old H.264 on the slow-clocked Core 2 Duo mobile chip:

Core 2 Duo playing the beginning of "The Garden of Words" as 3Mbit H.264/AVC

The Core 2 Duo SU9600 playing the beginning of “The Garden of Words” as 3Mbit H.264/AVC (on a German version of Windows).

We can see some high load there. Mind you, since this is 10-bit H.264, there is no GPU acceleration whatsoever, minus maybe the bicubic scaler, which is implemented as Pixel Shader 2.0 code. Everything else has to be done by the CPU and its SSE extensions. Let’s take a look at the new H.265 then:

Core 2 Duo playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

Same thing as before, but with 3Mbit H.265/HEVC.

The beginning of the movie, which shows only some moving logo on mostly uniform background and then some static text does just fine. You can really see what’s happening on screen when looking at the usage curve above. As soon as the first serious video frame with lots of movement starts to fade in, the machine just falls on its face, as the bitrate is rising. Realtime playback can no longer be achieved, A/V synchronization is being lost and a ton of frames are being dropped, resulting in a horrible experience.

Thing is, even with all the frame drops, it’s still losing sync even and falling short of delivering those frames which are still being decoded with the proper timing. It’s just abysmal.

So let’s change our environment to a CPU with twice the cores and roughly twice the clock speed, while staying with the same architecture / instruction set, H.264 first:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.264/AVC

The quad-core Q9505 hardly has any trouble with H.264 at all. It’s completely smooth sailing.

And with the harder stuff:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

We can see that the load has roughly doubled with H.265.

Now, H.265 is clearly putting some load even on the quad-core. I’m assuming that with higher bitrates as we would use for 4K/UHD material, this processor might be in trouble. For 1080p and bitrates up to maybe 6Mbit/s it should still be ok however. Of course, with the most modern graphics cards you can give even a Core 2 a companion device which can do H.265 decoding in hardware, like an NVidia GPU that can provide you with [PureVideo] feature sets E or even F, or maybe an AMD Radeon with  [UVD] level 6. But even then, 10-bit content might not be accelerated, so you need to be careful when choosing your GPU. Intel has recently added 10-bit support to some of their onboard GPUs (HD Graphics 5500 & 6000, Iris Graphics 6100) with driver v15.36.14.4080, nVidia added it starting with the Maxwell generation and AMD Radeons still don’t have it as far as I know.

Now, what about mobile devices? A fast enough PC can do H.265/HEVC even without hardware assist, at the very least in 1080p, and likely 4K as well, when we look at modern Core i3/i5/i7 and somewhat comparable AMD Athlon FX chips. How about multicore ARMv7/v8 chips with NEON instruction set extensions?

Let’s look at two Android devices, a Sony Xperia Tablet Z2 (without a H.265 hardware decoder) and a Sony Xperia Z5 phone:

The Tablet features the following CPU:

The Xperia Tablet Z2 uses a Qualcomm Snapdragon 801

The Xperia Tablet Z2 uses a Qualcomm Snapdragon 801 quad-core.[1]

That’s not exactly a slow chip, but an out-of-order execution pipeline with NEON extensions at a decent clock rate. And the phone:

The Xperia Z5 has a Qualcomm Snapdragon 810

The Xperia Z5 has a Qualcomm Snapdragon 810 in big.LITTLE configuration.[1]

So the phone has a very modern 64-bit ARMv8 big.LITTLE CPU setup with four faster out-of-order cores, and four slower, energy-efficient in-order cores. Optimally, all of them should be used as much as possible when throwing a seriously demanding task at the device. Let’s look at how it goes, but first on the older tablet with H.264 for starters:

The Xperia Tablet Z2 playing H.264/AVC.

The Z2 manages to play the H.264/AVC version without any stuttering, but just barely. It has to boost its clock speed all the way to the top to manage.[1]

You’re probably thinking “There’s no way this can work with H.265”, right? Well, you’d be correct:

With H.265, the Xperia Tablet Z2 stumbles.

With 3-Mbit H.265, the Xperia Tablet Z2 stumbles. Full clock speed, all cores under load, no chance to play demanding scenes without massive problems.[1]

It just bombs. Frame drops and stuttering, no way you’d want to watch anything when it goes like that. Some of the calmer scenes (=lower bitrate) still work, but lots of rain all over the frame and the Z2 is done for. So let’s move to the more modern hybrid-core processor of the Sony Xperia Z5 smartphone:

The Z5 playing the 10-bit H.264 on CPU only

The Z5 playing the 10-bit H.264 on CPU only. The octa-core big.LITTLE processor seems to load its fat cores mostly, which makes sense with demanding workloads.[1]

The big Cortex-A57 cores seem to be doing most of the work here, clocking at their maximum speed. Given we’re over 50% load with the most difficult scenes however, some threads seem to be pushed over to the smaller Cortex-A53 cores as well. In any case, the end result is that H.264 works smoothly throughout the movie. But still, load is high, and our only headroom left are some free, slow in-order cores… So, H.265?

The Z5 fails with 10-bit H.264 too. Where is my hardware decoder?!

The Z5 fails as well. While it can cope a little better, scenes like the above are just too much. Where is my hardware decoder?![1]

Something strange is happening now: H.265/HEVC hardware decoders should support this 10-bit H.265 file just fine. But for some reason, MX Player still falls back to software decoding, despite the player being up-to-date – something even a CPU as powerful as a Snapdragon 810 cannot handle for all parts of “The Garden of Words” at the given settings. The really demanding scenes will still fail to play back decently. CPU clock is also lowered a bit now, probably because of power and/or heat management.

My assumption would be that MX Player simply doesn’t support the HW decoder in this phone yet, which is a shame if it’s true. Another reason might be that I used some parameters and/or features of H.265 in this encode, that are not implemented in this chip. Whichever the case may be, the Snapdragon 810 alone cannot handle it either!

Update: After further research it turns out that almost no hardware accelerator supports Hi10p or Main-10 for H.265. In other words: No 10-bit decoding for H.265! It’s possible on NVidias Tegra K1 & X1 due to some CUDA hacks in MX Player, but nowhere else it seems. The upcoming Snapdragon 820 should however support it, and devices based on it should become available around March 2016:

Snapdragon 820s' Advancements

This is what Snapdragon 820 (MSM8996) should give us over Snapdragon 810 (MSM8994), including 10-bit H.265/HEVC decoding in hardware ([source]Russian flag)!

And this concludes my little performance analysis, after which I can say that either you need a relatively ok PC to use H.265, or a hardware decoder chip that works with your files and your software, if you’re targeting other platforms like hardware players or smartphones and tablets!

PS.: Thanks fly out to Umlüx for doing the Android tests!

[1] Images are © 2016 Umlüx, used with express permission.

CC BY-NC-SA 4.0 A very quick and simple look at the performance cost of playing back H.265/HEVC on PC and Android by The GAT at XIN.at is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre lang="" line="" escaped="" cssfile="">

(required)

(required)