Mar 14 2018

1.) Introduction

This post is a followup to the original you can find [here]. My reason for writing a new one instead of just editing the existing one is that the new results have been measured with a slightly different parametrization and under slightly different circumstances, so direct comparability has been thrown out the window. Anyway, like in the former article, multiple x265 video encoder versions starting with 1.7+512 are going to be compared to each other to see how the performance of x265 has evolved over time.

Originally, I wanted to test this only on a machine very similar to my original testbed, maxing out at 6 cores and SSE4.2 instruction set extensions. Since it is said that x265 features significant optimizations for newer AVX and AVX2 instructions however, I decided to include a more modern machine as well, to determine how the scaling differs on both an older and a newer machine.

2.) Benchmarks

First of all, the video used for this is not very “high definition”. Well, maybe by definition as it is 720p, but still. This also puts limits on scalability across cores/threads, which shall provide some additional insights later. Let’s start.

2a.) The older machine on MS Windows

The older one goes first: Here we have an Intel Core i7 980X running CentOS 6.9 Linux. Since I already had all those [Win64 builds] lying around, I ran the test from within a VirtualBox running Windows XP Professional x64 Edition SP2. The specs:

  • CPU: Intel Core i7 980X 3.33GHz (6 cores, 12 threads), last instruction set extension: SSE4.2
  • OS: Windows XP Professional x64 Edition SP2 on VirtualBox 5.1.2 on CentOS Linux 6.9

Here we go:

x265 performance trend on the older machine

While the developments here aren’t exactly identical to what has been posted in the former article, the trend is clearly the same. The new part starts with version 2.3+2, after which we can see a long-needed performance improvement, maybe due to improvements to the assembly code. Also, from here on out we can observe a slight upwards trend for all color channel depths. The basic drop from 8-bit to 10-bit and then 12-bit color stays rather similar though, courtesy of the more expensive 16-bit arithmetic used for the higher color depths (The 8-bit version of x265 also uses 8-bit arithmetic where applicable, reducing effective precision).

There is nothing groundbreaking to be seen like with the introduction of --no-rskip somewhere in between 1.9+141 and 1.9+200, but whatever happened between 2.3+2 and 2.4+2 isn’t too shabby either.

Now let’s see how the modern box with AVX, AVX2, BMI2 and FMA3 instructions fares here!

2b.) The newer machine on CentOS Linux

Alright, specs first:

  • CPUs: 2 × Intel Xeon E5-2620 v4 (2 × 8 cores / 16 threads, for a 16C/32T total), newest instruction set extensions: AVX2, BMI2, FMA3
  • OS: CentOS 7.3.1611 with Linux kernel 3.10.0-514.16.1
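
On Linux, the instruction set extensions that matter for this comparison can be read from /proc/cpuinfo. Here is a minimal sketch, assuming an x86 machine; has_flags is a hypothetical helper name, and the flag names (sse4_2, avx, avx2, bmi2, fma) are the ones the Linux kernel reports:

```shell
# Print which of the extensions discussed in this post appear in a flag list.
# has_flags is a hypothetical helper, not part of x265 or ffmpeg.
has_flags() {
    # $1: space-separated flags, as found on a "flags" line of /proc/cpuinfo
    local flags=" $1 " f out=""
    for f in sse4_2 avx avx2 bmi2 fma; do
        case "$flags" in
            *" $f "*) out="$out $f" ;;   # surrounding spaces keep avx from matching avx2
        esac
    done
    printf '%s\n' "${out# }"
}

# Usage on an x86 Linux box: feed it the first "flags" line.
if [ -r /proc/cpuinfo ]; then
    has_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

A Westmere chip like the 980X should list only sse4_2 here, while the Broadwell-EP Xeons should list all five.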

This is a modern 2017 Broadwell-EP server featuring a much higher IPC when compared to the older Westmere/Gulftown architecture from 2010. My estimate would be around +40% of raw, additional instructions per clock here. On top of that, we’ll get a peek at its AVX/AVX2 etc. scaling:

x265 performance trend on the modern machine

Now, before I comment on the trend to be seen in that chart, let me say one thing: Surprisingly, unlike what x264 has shown over the years, x265 didn’t seem to want to scale more across cores/threads with progressing development. So the core count was not really having much influence at all; Pass 1 of the encode would use 5-6 cores, where pass 2 would eat up 6-7, all the way from the oldest to the newest versions.
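
One plausible contributor to the flat core usage is the small 720p source. According to the x265 documentation, wavefront parallel processing works on rows of coding tree units (CTUs), so the number of CTU rows bounds how many threads can work on a single frame. This is a rough sketch of that arithmetic, assuming a 64×64 CTU (the x265 default as far as I know, and not overridden in this benchmark); ctu_rows is a hypothetical helper:

```shell
# Number of CTU rows in a frame: ceil(height / ctu_size).
# ctu_rows is a hypothetical helper for illustration only.
ctu_rows() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

ctu_rows 720 64     # 720p source as used in this benchmark
ctu_rows 2160 64    # 4K/UHD for comparison
```

With only a dozen rows at 720p, and neighbouring rows depending on each other, there is not much room for a 32-thread box to stretch its legs. This is only a hypothesis and has not been measured here.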

What we do see here aside from the --no-rskip drop and the performance increase from 2.3+2 to 2.4+2 are three things: First, we can see a new upwards trend in performance starting with 2.5+9. The development has been more linear for the older CPU, which is why I’m assuming that we get to see some work invested in AVX(2) code here, probably also in BMI/FMA code.

The second thing is the more significant decline starting after 2.0+11. I have no explanation for this, as new features added around here shouldn’t have a more negative impact on newer CPUs than on older ones. If anything, it should be the other way around, so that part going on up to 2.1+60 is a bit confusing.

And the third thing that is very clear here: The 12-bit version is much closer to 10-bit performance. Only thing I can think of here is that the AVX/AVX2 code paths have generally always been faster on the 12-bit version when compared to the older ≤SSE4.2 assembly code.

Seeing this, I’d like to look at one more thing: The dropoff we can see from 8-bit -> 10-bit -> 12-bit compared between the two architectures. Let’s do that:

3.) 8-/10-/12-bit scaling across platforms

Let’s compare this side-by-side across all versions tested, in a very simple chart:

Well, the modern Xeons do lose less performance even from 8- to 10-bit, but as seen above, the drop to 12-bit is surprisingly insignificant on modern CPUs. This is a bit unfortunate actually, given that all the rage is about 10-bit with the Anime communities and also the official Blu-Ray UHD/4K standards, but oh well.
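
The dropoff between bit depths can be expressed as a percentage of the 8-bit encode time. The sketch below shows one way to compute it; slowdown_pct is a hypothetical helper, and the numbers in the usage line are placeholders, not measurements from this article:

```shell
# Percentage by which an encode is slower than a baseline encode.
# slowdown_pct is a hypothetical helper; both arguments are wall-clock seconds.
slowdown_pct() {
    awk -v base="$1" -v other="$2" \
        'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

# Placeholder numbers: an 8-bit run of 1000 s against a 10-bit run of 1250 s.
slowdown_pct 1000 1250    # prints 25.0
```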

4.) Summing it up

Anyway, this hasn’t been too interesting actually, aside from the 12-bit surprise that I guess nobody cares about anyway. Well, we’re seeing some more significant performance increases for recent versions of x265 on modern architectures, but those are only compensating for the loss of performance we’ve seen before, that didn’t occur on older machines.

Maybe the AVX+AVX2 code was good from the very start instead of spreading its wings only now? Anyway, we’re seeing some slow but steady increase in performance for now even on older machines. If there’s a reason to buy new processors for x265 it’s probably not the AVX/AVX2/FMA3/BMI2/AVX512/whatever, but rather the generally higher IPC of newer chips…

5.) How the benchmark was done

5a.) On Windows

Given a folder with x265 versions named x1.7+512.exe, x2.6+2.exe etc., as well as an ffmpeg.exe (like from [here]), timethis.exe and the subfolders log\, output\, stats\ and results\, you can adapt and use the following script after putting it right next to the x265 binaries. I call it test-performance-trends.bat:

@ECHO OFF

FOR %%I IN (1.7-8b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
 2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%I, 8 ^
 bits... & .\timethis.exe "echo x265 version %%I 8 bit results: & .\ffmpeg.exe ^
 -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
 -strict -1 -r 24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p ^
 veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats .\stats\%%I-8b.stats ^
 -o .\output\%%I-8b-p1.h265 2>NUL & .\ffmpeg.exe -r 24000/1001 -i ^
 .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 ^
 -r 24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow ^
 --bitrate 2000 --pass 2 --stats .\stats\%%I-8b.stats -o .\output\%%I-8b-p2.h265 ^
 2>NUL" 1> .\results\results-%%I-8b.txt 2>.\log\timethis-errorlog-%%I-8b.txt

FOR %%J IN (1.7-10b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
 2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%J, 10 ^
 bits... & .\timethis.exe "echo x265 version %%J 10 bit results: & .\ffmpeg.exe ^
 -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
 -strict -1 -r 24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p ^
 veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats .\stats\%%J-10b.stats ^
 -o .\output\%%J-10b-p1.h265 2>NUL & .\ffmpeg.exe -r 24000/1001 -i ^
 .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 ^
 -r 24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow ^
 --bitrate 2000 --pass 2 --stats .\stats\%%J-10b.stats -o ^
 .\output\%%J-10b-p2.h265 2>NUL" 1> .\results\results-%%J-10b.txt ^
 2>.\log\timethis-errorlog-%%J-10b.txt

FOR %%K IN (1.7-12b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
 2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%K, 12 ^
 bits... & .\timethis.exe "echo x265 version %%K 12 bit results: & .\ffmpeg.exe ^
 -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
 -strict -1 -r 24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 ^
 -p veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats ^
 .\stats\%%K-12b.stats -o .\output\%%K-12b-p1.h265 2>NUL & .\ffmpeg.exe -r ^
 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict ^
 -1 -r 24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 ^
 -p veryslow --bitrate 2000 --pass 2 --stats .\stats\%%K-12b.stats -o ^
 .\output\%%K-12b-p2.h265 2>NUL" 1> .\results\results-%%K-12b.txt ^
 2>.\log\timethis-errorlog-%%K-12b.txt

ECHO All done, results are to be found in the results\results-*.txt files!

5b.) On Linux

Given a system-wide installation of ffmpeg and a folder with statically linked x265 binaries named x1.7+512, x2.6+2 etc., as well as the subfolders stats/, output/ and results/, the following script should do the job after being adapted for your list of x265 binaries and put right next to them:

#!/usr/bin/env bash
# Needs bash: brace expansion and the `time` keyword are not POSIX sh features.
# The { time ...; } grouping sends the timing report itself into the results file.

for i in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$i, \
8 bits...\n" && { time ( printf "x265 version $i 8 bit results:\n" \
1>./results/results-$i-8b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
-f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
./x$i - --y4m -D 8 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
--slow-firstpass --stats ./stats/$i-8b.stats -o ./output/$i-8b-p1.h265 \
2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
-pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$i - --y4m \
-D 8 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
./stats/$i-8b.stats -o ./output/$i-8b-p2.h265 2>/dev/null ); } \
2>>./results/results-$i-8b.txt; done

for j in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$j, \
10 bits...\n\n" && { time ( printf "x265 version $j 10 bit results:\n" \
1>./results/results-$j-10b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
-f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
./x$j - --y4m -D 10 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
--slow-firstpass --stats ./stats/$j-10b.stats -o ./output/$j-10b-p1.h265 \
2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
-pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$j - --y4m \
-D 10 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
./stats/$j-10b.stats -o ./output/$j-10b-p2.h265 2>/dev/null ); } \
2>>./results/results-$j-10b.txt; done

for k in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$k, \
12 bits...\n\n" && { time ( printf "x265 version $k 12 bit results:\n" \
1>./results/results-$k-12b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
-f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
./x$k - --y4m -D 12 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
--slow-firstpass --stats ./stats/$k-12b.stats -o ./output/$k-12b-p1.h265 \
2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
-pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$k - --y4m \
-D 12 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
./stats/$k-12b.stats -o ./output/$k-12b-p2.h265 2>/dev/null ); } \
2>>./results/results-$k-12b.txt; done

printf "\nBenchmarks completed, results in `pwd`/results/results-*.txt!\n"
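
To turn the results files into numbers you can chart, the "real" line of the timing report needs to be converted into plain seconds. The following is a minimal sketch, assuming the files contain bash's default `time` report (lines such as "real 12m34.567s") and a locale with a dot as decimal separator; real_seconds is a hypothetical helper name:

```shell
# Convert the "real" line of bash's `time` report into plain seconds.
# real_seconds is a hypothetical helper, not part of the benchmark script.
real_seconds() {
    # $1: path to a results file containing a line like "real<TAB>12m34.567s"
    awk '$1 == "real" {
        split($2, t, "m")        # t[1] = minutes, t[2] = seconds plus trailing "s"
        sub(/s$/, "", t[2])
        printf "%.3f\n", t[1] * 60 + t[2]
    }' "$1"
}

# Usage: print one "file seconds" pair per results file, if any exist.
for f in ./results/results-*.txt; do
    [ -f "$f" ] || continue
    printf '%s %s\n' "$f" "$(real_seconds "$f")"
done
```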

That’s it!

Feb 23 2016

After [compiling] and running the x265 HEVC encoder, and after [looking at its quality] for animated content, here’s another little piece of information about my experiments with H.265/HEVC. And this time it’s the decoding part. Playing H.265-encoded videos on the PC is relatively easy. On Windows I tend to use [MPC-HC] for this, and on Linux/UNIX you can use [mplayer] or [VLC]. The newest versions of those players are all linked against a modern libav or ffmpeg library collection, so they can decode anything any H.265 encoder can throw at them.

The questions are: At what price? And: What about mobile devices?

H.265/HEVC is costly in terms of computation, and not just in the encoding stage. Decoding this stuff is hard as well. So I looked at two older Core 2 processors to see how they fare when decoding regular 10-bit H.264/AVC and the same content encoded as 10-bit H.265/HEVC, both times at the same bitrate of 3Mbit/s ABR. Again, the marvelous Anime movie “The Garden of Words” was used for this. The video player of my choice for playback was [MPC-HC] v1.7.10, rendering to a VMR9 surface.

On top of that, I can also provide some insight on how relatively modern Android devices will handle this (devices partly without an H.265 hardware decoder chip, however!), all thanks to [Umlüx], who’s been willing to install the necessary apps and run some tests! On Android, [MX Player] was used.

For the record, the encoding settings were like this for x264 (pass 1 & pass 2), …

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic

…and this for x265:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

PC first:

Contender #1 is my old Sony TT45X subnotebook, and this is the processor inside:

CPU-Z on a Core 2 Duo SU9600

A Core 2 Duo SU9600 Penryn, which is a ULV processor at 1.6GHz, shown under all-core load, where it doesn’t boost to 1.83GHz any longer (and yeah, this is still WinXP+POSReady2009).

Contender #2 is my secondary workstation at work, which I recently upgraded with an SSD and a better processor we had lying around. It’s using this chip now:

CPU-Z on a Core 2 Quad Q9505

A Core 2 Quad Q9505 Yorkfield, which has been overclocked to a rock solid 3.4GHz.

Let’s throw some video at them! First, we’ll try some good old H.264 on the slow-clocked Core 2 Duo mobile chip:

The Core 2 Duo SU9600 playing the beginning of “The Garden of Words” as 3Mbit H.264/AVC (on a German version of Windows).

We can see some high load there. Mind you, since this is 10-bit H.264, there is no GPU acceleration whatsoever, minus maybe the bicubic scaler, which is implemented as Pixel Shader 2.0 code. Everything else has to be done by the CPU and its SSE extensions. Let’s take a look at the new H.265 then:

Core 2 Duo playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

Same thing as before, but with 3Mbit H.265/HEVC.

The beginning of the movie, which shows only a moving logo on a mostly uniform background and then some static text, does just fine. You can really see what’s happening on screen when looking at the usage curve above. As soon as the first serious video frame with lots of movement starts to fade in, the machine just falls on its face, as the bitrate is rising. Realtime playback can no longer be achieved, A/V synchronization is being lost and a ton of frames are being dropped, resulting in a horrible experience.

Thing is, even with all the frame drops, it’s still losing sync, and it fails to deliver even those frames that are still being decoded with proper timing. It’s just abysmal.

So let’s change our environment to a CPU with twice the cores and roughly twice the clock speed, while staying with the same architecture / instruction set, H.264 first:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.264/AVC

The quad-core Q9505 hardly has any trouble with H.264 at all. It’s completely smooth sailing.

And with the harder stuff:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

We can see that the load has roughly doubled with H.265.

Now, H.265 is clearly putting some load even on the quad-core. I’m assuming that with higher bitrates, as we would use for 4K/UHD material, this processor might be in trouble. For 1080p and bitrates up to maybe 6Mbit/s it should still be ok however. Of course, with the most modern graphics cards you can give even a Core 2 a companion device which can do H.265 decoding in hardware, like an NVIDIA GPU that can provide you with [PureVideo] feature set E or even F, or maybe an AMD Radeon with [UVD] level 6. But even then, 10-bit content might not be accelerated, so you need to be careful when choosing your GPU. Intel has recently added 10-bit support to some of their onboard GPUs (HD Graphics 5500 & 6000, Iris Graphics 6100) with driver v15.36.14.4080, NVIDIA added it starting with the Maxwell generation, and AMD Radeons still don’t have it as far as I know.
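
Since hardware acceleration hinges on the bit depth, it helps to check what a given file actually contains before blaming the player. This is a minimal sketch, assuming ffprobe (shipped with ffmpeg) is on the PATH; probe_video and pix_fmt_depth are hypothetical helper names, and movie.mkv is a placeholder file name:

```shell
# Print codec, profile and pixel format of the first video stream.
# Assumes ffprobe from the ffmpeg package is installed.
probe_video() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=codec_name,profile,pix_fmt \
        -of default=noprint_wrappers=1 "$1"
}

# Map a pixel format name as reported by ffprobe to its bit depth.
# Only the common planar YUV formats are covered; anything else is "unknown".
pix_fmt_depth() {
    case "$1" in
        yuv*p|yuvj*p)   echo 8 ;;
        *p10le|*p10be)  echo 10 ;;
        *p12le|*p12be)  echo 12 ;;
        *)              echo unknown ;;
    esac
}

# Usage (placeholder file name):
#   probe_video movie.mkv
#   pix_fmt_depth yuv420p10le
```

If the reported pixel format is something like yuv420p10le, you are looking at 10-bit content, and a decoder that only handles 8-bit will leave all the work to the CPU.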

Now, what about mobile devices? A fast enough PC can do H.265/HEVC even without hardware assist, at the very least in 1080p, and likely 4K as well, when we look at modern Core i3/i5/i7 and somewhat comparable AMD Athlon FX chips. How about multicore ARMv7/v8 chips with NEON instruction set extensions?

Let’s look at two Android devices, a Sony Xperia Tablet Z2 (without an H.265 hardware decoder) and a Sony Xperia Z5 phone:

The Tablet features the following CPU:

The Xperia Tablet Z2 uses a Qualcomm Snapdragon 801 quad-core.[1]

That’s not exactly a slow chip, but an out-of-order execution pipeline with NEON extensions at a decent clock rate. And the phone:

The Xperia Z5 has a Qualcomm Snapdragon 810 in big.LITTLE configuration.[1]

So the phone has a very modern 64-bit ARMv8 big.LITTLE CPU setup with four faster out-of-order cores, and four slower, energy-efficient in-order cores. Optimally, all of them should be used as much as possible when throwing a seriously demanding task at the device. Let’s look at how it goes, but first on the older tablet with H.264 for starters:

The Xperia Tablet Z2 playing H.264/AVC.

The Z2 manages to play the H.264/AVC version without any stuttering, but just barely. It has to boost its clock speed all the way to the top to manage.[1]

You’re probably thinking “There’s no way this can work with H.265”, right? Well, you’d be correct:

With 3-Mbit H.265, the Xperia Tablet Z2 stumbles. Full clock speed, all cores under load, no chance to play demanding scenes without massive problems.[1]

It just bombs. Frame drops and stuttering, no way you’d want to watch anything when it goes like that. Some of the calmer scenes (=lower bitrate) still work, but lots of rain all over the frame and the Z2 is done for. So let’s move to the more modern hybrid-core processor of the Sony Xperia Z5 smartphone:

The Z5 playing the 10-bit H.264 on CPU only. The octa-core big.LITTLE processor seems to load its fat cores mostly, which makes sense with demanding workloads.[1]

The big Cortex-A57 cores seem to be doing most of the work here, clocking at their maximum speed. Given we’re over 50% load with the most difficult scenes however, some threads seem to be pushed over to the smaller Cortex-A53 cores as well. In any case, the end result is that H.264 works smoothly throughout the movie. But still, load is high, and the only headroom left is a few free, slow in-order cores… So, H.265?

The Z5 fails with 10-bit H.265 too. Where is my hardware decoder?!

The Z5 fails as well. While it can cope a little better, scenes like the above are just too much. Where is my hardware decoder?![1]

Something strange is happening now: H.265/HEVC hardware decoders should support this 10-bit H.265 file just fine. But for some reason, MX Player still falls back to software decoding, despite the player being up-to-date – something even a CPU as powerful as a Snapdragon 810 cannot handle for all parts of “The Garden of Words” at the given settings. The really demanding scenes will still fail to play back decently. CPU clock is also lowered a bit now, probably because of power and/or heat management.

My assumption would be that MX Player simply doesn’t support the HW decoder in this phone yet, which is a shame if it’s true. Another reason might be that I used some parameters and/or features of H.265 in this encode, that are not implemented in this chip. Whichever the case may be, the Snapdragon 810 alone cannot handle it either!

Update: After further research it turns out that almost no hardware accelerator supports Hi10p or Main-10 for H.265. In other words: No 10-bit decoding for H.265! It’s possible on NVIDIA’s Tegra K1 & X1 due to some CUDA hacks in MX Player, but nowhere else, it seems. The upcoming Snapdragon 820 should however support it, and devices based on it should become available around March 2016:

Snapdragon 820s' Advancements

This is what Snapdragon 820 (MSM8996) should give us over Snapdragon 810 (MSM8994), including 10-bit H.265/HEVC decoding in hardware ([source], in Russian)!

And this concludes my little performance analysis, after which I can say that either you need a relatively ok PC to use H.265, or a hardware decoder chip that works with your files and your software, if you’re targeting other platforms like hardware players or smartphones and tablets!

PS.: Thanks fly out to Umlüx for doing the Android tests!

[1] Images are © 2016 Umlüx, used with express permission.