After [compiling] and running the x265 HEVC encoder, and after [looking at its quality] for animated content, here’s another little piece of information about my experiments with H.265/HEVC. And this time it’s the decoding part. Playing H.265-encoded videos on the PC is relatively easy. On Windows I tend to use [MPC-HC] for this, and on Linux/UNIX you can use [mplayer] or [VLC]. The newest versions of those players are all linked against a modern libav or ffmpeg library collection, so they can decode anything any H.265 encoder can throw at them.
The questions are: At what price? And: What about mobile devices?
H.265/HEVC is costly in terms of computation, and not just in the encoding stage. Decoding this stuff is hard as well. So I looked at two older Core 2 processors to see how they fare when decoding regular 10-bit H.264/AVC and the same content encoded as 10-bit H.265/HEVC, both times at the same bitrate of 3Mbit/s ABR. Again, the marvelous Anime movie “The Garden of Words” was used for this. The video player of my choice for playback was [MPC-HC] v1.7.10, rendering to a VMR9 surface.
On top of that, I can also provide some insight on how relatively modern Android devices will handle this (devices partly without a H.265 hardware decoder chip however!), all thanks to [Umlüx], who’s been willing to install the necessary Apps and run some tests! On Android, [MX Player] was used.
For the record, the encoding settings were like this for x264 (pass 1 & pass 2), …
--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 3000 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic --fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 3000 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic
…and this for x265:
--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 --slow-firstpass --stats v.stats --sar 1 --range full --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats v.stats --sar 1 --range full
Contender #1 is my old Sony TT45X subnotebook, and this is the processor inside:
Contender #2 is my secondary workstation at work, that I recently upgraded with an SSD and a better processor we had lying around. It’s using this chip now:
Let’s throw some video at them! First, we’ll try some good old H.264 on the slow-clocked Core 2 Duo mobile chip:
We can see some high load there. Mind you, since this is 10-bit H.264, there is no GPU acceleration whatsoever, minus maybe the bicubic scaler, which is implemented as Pixel Shader 2.0 code. Everything else has to be done by the CPU and its SSE extensions. Let’s take a look at the new H.265 then:
The beginning of the movie, which shows only some moving logo on mostly uniform background and then some static text does just fine. You can really see what’s happening on screen when looking at the usage curve above. As soon as the first serious video frame with lots of movement starts to fade in, the machine just falls on its face, as the bitrate is rising. Realtime playback can no longer be achieved, A/V synchronization is being lost and a ton of frames are being dropped, resulting in a horrible experience.
Thing is, even with all the frame drops, it’s still losing sync even and falling short of delivering those frames which are still being decoded with the proper timing. It’s just abysmal.
So let’s change our environment to a CPU with twice the cores and roughly twice the clock speed, while staying with the same architecture / instruction set, H.264 first:
And with the harder stuff:
Now, H.265 is clearly putting some load even on the quad-core. I’m assuming that with higher bitrates as we would use for 4K/UHD material, this processor might be in trouble. For 1080p and bitrates up to maybe 6Mbit/s it should still be ok however. Of course, with the most modern graphics cards you can give even a Core 2 a companion device which can do H.265 decoding in hardware, like an NVidia GPU that can provide you with [PureVideo] feature sets E or even F, or maybe an AMD Radeon with [UVD] level 6. But even then, 10-bit content might not be accelerated, so you need to be careful when choosing your GPU. Intel has recently added 10-bit support to some of their onboard GPUs (HD Graphics 5500 & 6000, Iris Graphics 6100) with driver v220.127.116.1180, nVidia added it starting with the Maxwell generation and AMD Radeons still don’t have it as far as I know.
Now, what about mobile devices? A fast enough PC can do H.265/HEVC even without hardware assist, at the very least in 1080p, and likely 4K as well, when we look at modern Core i3/i5/i7 and somewhat comparable AMD Athlon FX chips. How about multicore ARMv7/v8 chips with NEON instruction set extensions?
Let’s look at two Android devices, a Sony Xperia Tablet Z2 (without a H.265 hardware decoder) and a Sony Xperia Z5 phone:
The Tablet features the following CPU:
That’s not exactly a slow chip, but an out-of-order execution pipeline with NEON extensions at a decent clock rate. And the phone:
So the phone has a very modern 64-bit ARMv8 big.LITTLE CPU setup with four faster out-of-order cores, and four slower, energy-efficient in-order cores. Optimally, all of them should be used as much as possible when throwing a seriously demanding task at the device. Let’s look at how it goes, but first on the older tablet with H.264 for starters:
You’re probably thinking “There’s no way this can work with H.265”, right? Well, you’d be correct:
It just bombs. Frame drops and stuttering, no way you’d want to watch anything when it goes like that. Some of the calmer scenes (=lower bitrate) still work, but lots of rain all over the frame and the Z2 is done for. So let’s move to the more modern hybrid-core processor of the Sony Xperia Z5 smartphone:
The big Cortex-A57 cores seem to be doing most of the work here, clocking at their maximum speed. Given we’re over 50% load with the most difficult scenes however, some threads seem to be pushed over to the smaller Cortex-A53 cores as well. In any case, the end result is that H.264 works smoothly throughout the movie. But still, load is high, and our only headroom left are some free, slow in-order cores… So, H.265?
Something strange is happening now: H.265/HEVC hardware decoders should support this 10-bit H.265 file just fine. But for some reason, MX Player still falls back to software decoding, despite the player being up-to-date – something even a CPU as powerful as a Snapdragon 810 cannot handle for all parts of “The Garden of Words” at the given settings. The really demanding scenes will still fail to play back decently. CPU clock is also lowered a bit now, probably because of power and/or heat management.
My assumption would be that MX Player simply doesn’t support the HW decoder in this phone yet, which is a shame if it’s true. Another reason might be that I used some parameters and/or features of H.265 in this encode, that are not implemented in this chip. Whichever the case may be, the Snapdragon 810 alone cannot handle it either!
Update: After further research it turns out that almost no hardware accelerator supports Hi10p or Main-10 for H.265. In other words: No 10-bit decoding for H.265! It’s possible on NVidias Tegra K1 & X1 due to some CUDA hacks in MX Player, but nowhere else it seems. The upcoming Snapdragon 820 should however support it, and devices based on it should become available around March 2016:
And this concludes my little performance analysis, after which I can say that either you need a relatively ok PC to use H.265, or a hardware decoder chip that works with your files and your software, if you’re targeting other platforms like hardware players or smartphones and tablets!
PS.: Thanks fly out to Umlüx for doing the Android tests!
 Images are © 2016 Umlüx, used with express permission.