Aug 032016
 

xmms logoYeah, I’m still using the good old xmms v1 on Linux and UNIX. Now, the boxes I used to play music on were all 32-bit x86 so far. Just some old hardware. Recently I tried to play some of my [AAC-LC] files on my 64-bit Linux workstation, and to my surprise, it failed to play the file, giving me the error Pulse coding not allowed in short blocks when started from the shell (just so I can read the error output).

I thought this had to have something to do with my file and not the player, so I tried .aac files with ADTS headers instead of audio packed into an mp4/m4a container. The result was the same. Also, the encoding (SBR/VBR or even HE-AAC, which sucks for music anyway) didn’t make a difference. Then I found [this post] on the Gentoo forums, showing that the problem was really the architecture. 32-bit builds wouldn’t fail, but the source code of the [libfaad2] decoding library of xmms’ mp4 plugin wasn’t ready for 64-bit.

xmms’ xmms_mp4Plugin-0.4 comes with a pretty old libfaad2, 2.0 or something. I will show you how to upgrade that to version 2.7, which is fixed for x86_64 (Readily fixed source code is also provided at the bottom). We’ll also fix up other parts of that plugin so it can compile using more modern versions of GCC and clang, and my test platform for this would be a rather conservative CentOS 6.8 Linux. First, get the source code:

I’m assuming you already have xmms installed, otherwise obtain version 1.2.11 from [here]!

Step 1, libfaad2:

Unpack both archives from above, then enter the mp4 plugins’ source directory. You’ll find a libfaad2/ directory in there. Delete or move it and all of its contents. From the faad2 source tree, copy the directory libfaad/ from there to libfaad2/ in the plugins’ directory, replacing the deleted one. Now that’s the source, but we also need the updated headers so that the xmms plugin can link against the newer libfaad2. To do that, copy the two files include/faad.h and include/neaacdec.h from the faad2 2.7 source tree to the directory include/ in the plugin source tree. Overwrite faad.h if prompted.

Step 2, libmp4v2:

Also, the bundled libmp4v2 of the mp4 plugin is very old and broken on modern systems due to code issues. Let’s fix them, so back to the mp4 plugin source tree. We need to fix some invalid pure specifiers in the following files: libmp4v2/mp4property.h, libmp4v2/mp4property.cpp, libmp4v2/rtphint.h and libmp4v2/rtphint.cpp.

To do so, open them in a text editor and search and replace all occurences of = NULL with = 0. This will fix all assignments and comparisons that would otherwise break. When using vi or vim, you can do :%s/=\sNULL/= 0/g for this.

On top of that, we need to fix a invalid const char* to char* conversion in libmp4v2/rtphint.cpp as well. Open it, and hop to line number 325, you’ll see this:

  1. char* pSlash = strchr(pRtpMap, '/');

Replace it with this:

  1. const char* pSlash = strchr(pRtpMap, '/');

And we’re done!

Now, in the mp4 plugins’ source root directory, run $ ./bootstrap && ./configure. Hopefully, no errors will occur. If all is ok, run $ make. Given you have xmms 1.2.11 installed, it should compile and link fine now. Last step: # make install.

This has been tested with my current GCC 4.4.7 platform compiler as well as GCC 4.9.3 on CentOS 6.8 Linux. Also, it has been tested with clang 3.4.1 on FreeBSD 10.3 UNIX, also x86_64. Please note that FreeBSD 10.3 needs extensive modifications to its build tools as well, so I can’t provide fixed source for this. However, packages on FreeBSD have already been fixed in that regard, so you can just # pkg install xmms-faad2 and it’s done anyway. This is likely the case for several modern Linux distros as well.

Let’s play:

xmms playing AAC-LC Freebsd 10.3 UNIX on x86_64

Yeah, it works. In this case on FreeBSD.

And the shell would say:

2-MPEG-4 AAC Low Complexity profile
MP4 - 2 channels @ 44100 Hz

Perfect! And here is the [fixed source code] upgraded with libfaad2 2.7, so you don’t have to do anything by yourself other than $ ./bootstrap && ./configure && make and # make install! Ah, actually I’m not so sure whether xmms itself builds fine on CentOS 6.8 from source these days… Maybe I’ll check that out another day. ;)

Jul 152016
 

H.265/HEVC logoSince I’ve started to release x265 binaries for Windows XP / XP x64 (and above), I’ve started to take interest in how it’s performance has developed over time. When looking back on x264, we can see that the encoders’ performance has always gradually improved with more and faster assembly code, optimizations for new CPUs’ instruction set extensions (think AMD FMA4, SSE4a or Intel AVX/AVX2) and so on. Over many years x264 has become a ton faster, as you can also see when comparing an old x264 v0.107.1745 (white results) with newer ones (red results) for the same CPUs in the [x264 benchmark].

So, x265 is much newer, but still, it’s been around for a few years now – since 2013 to be more precise – so I’d like to take a look at its performance trends using the options I typically apply for transcoding animated content. You can find the encoding settings and other information about the test below.

Note that I don’t compile every revision of x265, so I can compare only a few, namely the ones I chose to [release] periodically, excluding the new 2.0+2, which I haven’t released yet (waiting a bit more for a release). Here we go:

x265 performance trend across versions (2017-01-26)

x265 performance trend across versions, state 2017-01-26 (click to enlarge)

previous charts
x265 performance trends across versions, 2016-07-15

x265 performance trends across versions, state 2016-07-15

Note that this runtime graph is inverted, in the sense that “higher = better”. Version 1.7 was the first which could actually encode using the options of my choice, older versions don’t understand all of them and are not comparable in my case because of that. We start off with pretty bad performance and then we get an ample boost with the earlier 1.9 versions.

From 1.9+15 to 1.9+141 we can see a gradual performance increase, as expected from continued development and optimization. Naturally, 8-bits per color channel is fastest, as it makes use of a lot of 8-bit arithmetic internally. For higher bit depths (10 and 12 bits per channel, or 30 and 36 bits per pixel respectively) the internal arithmetic is boosted to a 16 bit precision, resulting in better outputs for finer gradients. This costs performance of course.

As expected, 10 and 12 bit runtimes are relatively close in terms of speed with 8 bit being quite far ahead.

Now the most surprising thing is the nose dive we take from 1.9+141 to 1.9+200. I have no idea what happened here?! Did they fix a major bug? Or did some performance-critical options change in the --preset veryslow that I’m using? I have no idea. But to put is in numbers easier to understand: For 8 bit, my test encode went from a runtime of 30:39.312 (1.9+141) to 41:20.296 (1.9+200)! Ah, the time format being MM:SS.sss, M = minutes, S = seconds and s = milliseconds. That’s almost -35%! It’s less pronounced for the higher bit depths with -25% for 10 bits and -24% for 12 bits, but that’s still significant. Maybe I shouldn’t have deleted the output data, might have been useful to look at the H.265 stream headers.

Well, from there on out we continue to see a gradual performance increase again, in a steady and stable fashion. But what happened between 1.9+141 and 1.9+200? I don’t know. Something major must have changed, I just don’t know what it was exactly…

Maybe somebody can enlighten me a bit[1], so here are the options used, right from the benchmarking script:

expand/collapse source code
  1. @ECHO OFF
  2.  
  3. :: ###################################################################################
  4. :: #                                                                                 #
  5. :: #  Please note, that command lines passed to timethis.exe may not be longer than  #
  6. :: #  1024 characters including expanded variables, so we need to keep it short      #
  7. :: #  enough. Some of the options passed to x265 may already be superflous due to    #
  8. :: #  being included in the veryslow preset, but I did not really double-check that. #
  9. :: #  Instead, I kept it just short enough by using shorter file names instead.      #
  10. :: #                                                                                 #
  11. :: #  Where to get the files:                                                        #
  12. :: #    * x265.exe     : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  13. :: #    * avconv.exe   : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  14. :: #    * colorecho.exe: http://wp.xin.at/archives/3064                              #
  15. :: #    * timethis.exe : https://support.microsoft.com/en-us/kb/927229               #
  16. :: #                                                                                 #
  17. :: #  GPL v3 license for this scripts code, colorecho.exe, parts of avconv.exe and   #
  18. :: #  the CygWin companion libraries for avconv.exe:                                 #
  19. :: #    * https://www.gnu.org/licenses/gpl-3.0.en.html                               #
  20. :: #                                                                                 #
  21. :: #  GPL v2 license for x265.exe:                                                   #
  22. :: #    * https://www.gnu.org/licenses/old-licenses/gpl-2.0.html                     #
  23. :: #                                                                                 #
  24. :: #  LGPL v3 license for other parts of avconv.exe:                                 #
  25. :: #    * https://www.gnu.org/licenses/lgpl-3.0.html                                 #
  26. :: #                                                                                 #
  27. :: #  NT Resource Kit license for timethis.exe:                                      #
  28. :: #    * https://enterprise.dejacode.com/license_library/Demo/ms-nt-resource-kit    #
  29. :: #                                                                                 #
  30. :: #  The x265 encoder  : http://x265.org                                            #
  31. :: #  The libav         : https://libav.org                                          #
  32. :: #  The Cygwin project: https://www.cygwin.com                                     #
  33. :: #                                                                                 #
  34. :: ###################################################################################
  35.  
  36. :: First round, 8 bits per channel color depth (MAIN profile), 8 bit arithmetic:
  37. FOR %%I IN (1.7-8b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  38.  ""Testing v%%I, 8 bits..."" 10 & .\timethis.exe ""echo x265 v%%I-8b & .\avconv.exe -r ^
  39.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  40.  .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  41.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  42.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  43.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  44.  --slow-firstpass --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p1.h265 2>NUL & ^
  45.  .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  46.  24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow ^
  47.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  48.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  49.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^
  50.  --pass 2 --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p2.h265 2>NUL"" 1> ^
  51.  .\results-%%I-8b.txt 2>.\timethis-errorlog-%%I-8b.txt
  52.  
  53. :: Second round, 10 bits per channel color depth (MAIN10 profile), 16 bit arithmetic:
  54. FOR %%J IN (1.7-10b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  55.  ""Testing v%%J, 10 bits..."" 10 & .\timethis.exe ""echo x265 v%%J-10b & .\avconv.exe -r ^
  56.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  57.  .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  58.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  59.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  60.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  61.  --slow-firstpass --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p1.h265 2>NUL ^
  62.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  63.  24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow ^
  64.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  65.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  66.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  67.  2 --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p2.h265 2>NUL"" 1> ^
  68.  .\results-%%J-10b.txt 2>.\timethis-errorlog-%%J-10b.txt
  69.  
  70. :: Second round, 12 bits per channel color depth (MAIN12 profile), 16 bit arithmetic:
  71. FOR %%K IN (1.7-12b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  72.  ""Testing v%%K, 12 bits"" 10 & .\timethis.exe ""echo x265 v%%K-12b & .\avconv.exe -r ^
  73.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  74.  .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  75.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  76.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  77.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  78.  --slow-firstpass --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p1.h265 2>NUL ^
  79.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  80.  24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow ^
  81.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  82.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  83.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  84.  2 --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p2.h265 2>NUL"" 1> ^
  85.  .\results-%%K-12b.txt 2>.\timethis-errorlog-%%K-12b.txt
  86.  
  87. .\colorecho.exe "Benchmarks completed, cleaning up..." 10
  88.  
  89. :: Removing output and statistics files:
  90. del /Q *p*.h265 *stats*
  91.  
  92. .\colorecho.exe "All done, results are to be found in the results-*.txt files!" 10

So if I missed some critical changes that happened in between 1.9+141 and 1.9+200, please let me know! Oh, and here are the exact system specifications, in case it matters (it probably doesn’t):

  • Intel Xeon X5690 3.33GHz (SSE4.2 is max, no AVX), running at an all-core turbo of 3.6GHz
  • 24GB DDR-III/1066 10-10-10 2T
  • X58 chipset
  • Windows XP Professional x64 Edition SP2 with all Server 2003 R2 x64 updates

I guess I’ll keep doing the performance evaluations from here on out just to see how the encoder evolves over time, performance-wise… And maybe I’ll redo 1.9+141 and one of the newer versions and parse the stream headers to see if the effective encoding options differ anywhere after all. If yes, I’ll update this post!

Update 2017-01-27:

Three new x265 versions have been added to the performance trend analysis, 2.0+54, 2.1+60 and 2.2+22. Much to my dismay, there haven’t been any big developments when it comes to x265s’ performance on x86 machines over the last 6-8 months. Since 1.9+230 it’s pretty much linear, with only minimal variance.

So there have been changes and new options and all that, but it seems that the basic speed of the encoder in the --preset veryslow, implying --no-rskip has stayed the same. Maybe a Google Summer of Code performance challenge would do something for x265? As far as I can remember this worked miracles for x264 in the past as well, if my memory serves me right. Let’s see if we’ll get any significant improvements in the future…

[1] Thanks to “Particular” this [has been clarified] in the comments.