Jul 152016
 

H.265/HEVC logoSince I’ve started to release x265 binaries for Windows XP / XP x64 (and above), I’ve started to take interest in how it’s performance has developed over time. When looking back on x264, we can see that the encoders’ performance has always gradually improved with more and faster assembly code, optimizations for new CPUs’ instruction set extensions (think AMD FMA4, SSE4a or Intel AVX/AVX2) and so on. Over many years x264 has become a ton faster, as you can also see when comparing an old x264 v0.107.1745 (white results) with newer ones (red results) for the same CPUs in the [x264 benchmark].

So, x265 is much newer, but still, it’s been around for a few years now – since 2013 to be more precise – so I’d like to take a look at its performance trends using the options I typically apply for transcoding animated content. You can find the encoding settings and other information about the test below.

Note that I don’t compile every revision of x265, so I can compare only a few, namely the ones I chose to [release] periodically, excluding the new 2.0+2, which I haven’t released yet (waiting a bit more for a release). Here we go:

x265 performance trend across versions (2017-01-26)

x265 performance trend across versions, state 2017-01-26 (click to enlarge)

previous charts
x265 performance trends across versions, 2016-07-15

x265 performance trends across versions, state 2016-07-15

Note that this runtime graph is inverted, in the sense that “higher = better”. Version 1.7 was the first which could actually encode using the options of my choice, older versions don’t understand all of them and are not comparable in my case because of that. We start off with pretty bad performance and then we get an ample boost with the earlier 1.9 versions.

From 1.9+15 to 1.9+141 we can see a gradual performance increase, as expected from continued development and optimization. Naturally, 8-bits per color channel is fastest, as it makes use of a lot of 8-bit arithmetic internally. For higher bit depths (10 and 12 bits per channel, or 30 and 36 bits per pixel respectively) the internal arithmetic is boosted to a 16 bit precision, resulting in better outputs for finer gradients. This costs performance of course.

As expected, 10 and 12 bit runtimes are relatively close in terms of speed with 8 bit being quite far ahead.

Now the most surprising thing is the nose dive we take from 1.9+141 to 1.9+200. I have no idea what happened here?! Did they fix a major bug? Or did some performance-critical options change in the --preset veryslow that I’m using? I have no idea. But to put is in numbers easier to understand: For 8 bit, my test encode went from a runtime of 30:39.312 (1.9+141) to 41:20.296 (1.9+200)! Ah, the time format being MM:SS.sss, M = minutes, S = seconds and s = milliseconds. That’s almost -35%! It’s less pronounced for the higher bit depths with -25% for 10 bits and -24% for 12 bits, but that’s still significant. Maybe I shouldn’t have deleted the output data, might have been useful to look at the H.265 stream headers.

Well, from there on out we continue to see a gradual performance increase again, in a steady and stable fashion. But what happened between 1.9+141 and 1.9+200? I don’t know. Something major must have changed, I just don’t know what it was exactly…

Maybe somebody can enlighten me a bit[1], so here are the options used, right from the benchmarking script:

expand/collapse source code
  1. @ECHO OFF
  2.  
  3. :: ###################################################################################
  4. :: #                                                                                 #
  5. :: #  Please note, that command lines passed to timethis.exe may not be longer than  #
  6. :: #  1024 characters including expanded variables, so we need to keep it short      #
  7. :: #  enough. Some of the options passed to x265 may already be superflous due to    #
  8. :: #  being included in the veryslow preset, but I did not really double-check that. #
  9. :: #  Instead, I kept it just short enough by using shorter file names instead.      #
  10. :: #                                                                                 #
  11. :: #  Where to get the files:                                                        #
  12. :: #    * x265.exe     : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  13. :: #    * avconv.exe   : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  14. :: #    * colorecho.exe: http://wp.xin.at/archives/3064                              #
  15. :: #    * timethis.exe : https://support.microsoft.com/en-us/kb/927229               #
  16. :: #                                                                                 #
  17. :: #  GPL v3 license for this scripts code, colorecho.exe, parts of avconv.exe and   #
  18. :: #  the CygWin companion libraries for avconv.exe:                                 #
  19. :: #    * https://www.gnu.org/licenses/gpl-3.0.en.html                               #
  20. :: #                                                                                 #
  21. :: #  GPL v2 license for x265.exe:                                                   #
  22. :: #    * https://www.gnu.org/licenses/old-licenses/gpl-2.0.html                     #
  23. :: #                                                                                 #
  24. :: #  LGPL v3 license for other parts of avconv.exe:                                 #
  25. :: #    * https://www.gnu.org/licenses/lgpl-3.0.html                                 #
  26. :: #                                                                                 #
  27. :: #  NT Resource Kit license for timethis.exe:                                      #
  28. :: #    * https://enterprise.dejacode.com/license_library/Demo/ms-nt-resource-kit    #
  29. :: #                                                                                 #
  30. :: #  The x265 encoder  : http://x265.org                                            #
  31. :: #  The libav         : https://libav.org                                          #
  32. :: #  The Cygwin project: https://www.cygwin.com                                     #
  33. :: #                                                                                 #
  34. :: ###################################################################################
  35.  
  36. :: First round, 8 bits per channel color depth (MAIN profile), 8 bit arithmetic:
  37. FOR %%I IN (1.7-8b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  38.  ""Testing v%%I, 8 bits..."" 10 & .\timethis.exe ""echo x265 v%%I-8b & .\avconv.exe -r ^
  39.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  40.  .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  41.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  42.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  43.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  44.  --slow-firstpass --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p1.h265 2>NUL & ^
  45.  .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  46.  24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow ^
  47.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  48.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  49.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^
  50.  --pass 2 --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p2.h265 2>NUL"" 1> ^
  51.  .\results-%%I-8b.txt 2>.\timethis-errorlog-%%I-8b.txt
  52.  
  53. :: Second round, 10 bits per channel color depth (MAIN10 profile), 16 bit arithmetic:
  54. FOR %%J IN (1.7-10b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  55.  ""Testing v%%J, 10 bits..."" 10 & .\timethis.exe ""echo x265 v%%J-10b & .\avconv.exe -r ^
  56.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  57.  .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  58.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  59.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  60.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  61.  --slow-firstpass --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p1.h265 2>NUL ^
  62.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  63.  24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow ^
  64.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  65.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  66.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  67.  2 --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p2.h265 2>NUL"" 1> ^
  68.  .\results-%%J-10b.txt 2>.\timethis-errorlog-%%J-10b.txt
  69.  
  70. :: Second round, 12 bits per channel color depth (MAIN12 profile), 16 bit arithmetic:
  71. FOR %%K IN (1.7-12b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  72.  ""Testing v%%K, 12 bits"" 10 & .\timethis.exe ""echo x265 v%%K-12b & .\avconv.exe -r ^
  73.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  74.  .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  75.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  76.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  77.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  78.  --slow-firstpass --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p1.h265 2>NUL ^
  79.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  80.  24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow ^
  81.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  82.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  83.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  84.  2 --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p2.h265 2>NUL"" 1> ^
  85.  .\results-%%K-12b.txt 2>.\timethis-errorlog-%%K-12b.txt
  86.  
  87. .\colorecho.exe "Benchmarks completed, cleaning up..." 10
  88.  
  89. :: Removing output and statistics files:
  90. del /Q *p*.h265 *stats*
  91.  
  92. .\colorecho.exe "All done, results are to be found in the results-*.txt files!" 10

So if I missed some critical changes that happened in between 1.9+141 and 1.9+200, please let me know! Oh, and here are the exact system specifications, in case it matters (it probably doesn’t):

  • Intel Xeon X5690 3.33GHz (SSE4.2 is max, no AVX), running at an all-core turbo of 3.6GHz
  • 24GB DDR-III/1066 10-10-10 2T
  • X58 chipset
  • Windows XP Professional x64 Edition SP2 with all Server 2003 R2 x64 updates

I guess I’ll keep doing the performance evaluations from here on out just to see how the encoder evolves over time, performance-wise… And maybe I’ll redo 1.9+141 and one of the newer versions and parse the stream headers to see if the effective encoding options differ anywhere after all. If yes, I’ll update this post!

Update 2017-01-27:

Three new x265 versions have been added to the performance trend analysis, 2.0+54, 2.1+60 and 2.2+22. Much to my dismay, there haven’t been any big developments when it comes to x265s’ performance on x86 machines over the last 6-8 months. Since 1.9+230 it’s pretty much linear, with only minimal variance.

So there have been changes and new options and all that, but it seems that the basic speed of the encoder in the --preset veryslow, implying --no-rskip has stayed the same. Maybe a Google Summer of Code performance challenge would do something for x265? As far as I can remember this worked miracles for x264 in the past as well, if my memory serves me right. Let’s see if we’ll get any significant improvements in the future…

[1] Thanks to “Particular” this [has been clarified] in the comments.