Mar 142018
 
H.265/HEVC logo

1.) Introduction

This post is a followup to the original you can find [here]. My reason for writing a new one instead of just editing the existing one is that the new results have been measured with a slightly different parametrization and under slightly different circumstances, so direct comparability has been thrown out the window. Anyway, like in the former article, multiple x265 video encoder versions starting with 1.7+512 are going to be compared to each other to see how the performance of x265 has evolved over time.

Originally, I wanted to test this only on a machine very similar to my original testbed, maxing out at 6 cores and SSE4.2 instruction set extensions. Since it is said that x265 features significant optimizations for newer AVX and AVX2 instructions however, I decided to include a more modern machine as well, to determine how the scaling differs on both an older and a newer machine.

2.) Benchmarks

First of all, the video used for this is not very “high definition”. Well, maybe by definition as it is 720p, but still. This also puts limits on scalability across cores/threads, which shall provide some additional insights later. Let’s start.

2a.) The older machine on MS Windows

The older one goes first: Here we have an Intel Core i7 980X running CentOS 6.9 Linux. Since I already had all those [Win64 builds] lying around, I ran the test from within a VirtualBox running Windows XP Professional x64 Edition SP2. The specs:

  • CPU: Intel Core i7 980X 3.33GHz (6 cores, 12 threads), last instruction set extension: SSE4.2
  • OS: Windows XP Professional x64 Edition SP2 on VirtualBox 5.1.2 on CentOS Linux 6.9

Here we go:

x265 performance trend on the older machine

x265 performance trend on the older machine (click to enlarge)

While the developments here aren’t exactly identical to what has been posted in the former article, the trend is clearly the same. The new part starts with version 2.3+2, after which we can see a long-needed performance improvement, maybe due to improvements to the assembly code. Also, from here on out we can observe a slight upwards trend for all color channel depths. The basic drop from 8-bit to 10-bit and then 12-bit color stays rather similar though, courtesy of the more expensive 16-bit arithmetic used for the higher color depths (The 8-bit version of x265 also uses 8-bit arithmetic where applicable, reducing effective precision).

There is nothing groundbreaking to be seen like with the introduction of --no-rskip somewhere in between 1.9+141 and 1.9+200, but whatever happened between 2.3+2 and 2.4+2 isn’t too shabby either.

Now let’s see how the modern box with AVX, AVX2, BMI2 and FMA3 instructions fares here!

2b.) The newer machine on CentOS Linux

Alright, specs first:

  • CPUs: 2 × Intel Xeon E5-2620 v4 (2 × 8 cores, 16 Threads for a 16C/32T total), newest instruction set extensions: AVX2, BMI2, FMA3
  • OS: CentOS 7.3.1611 with Linux kernel 3.10.0-514.16.1

This is a modern 2017 Broadwell-EP server featuring a much higher IPC when compared to the older Westmere/Gulftown architecture from 2010. My estimate would be around +40% of raw, additional instructions per clock here. On top of that, we’ll get a peek at its AVX/AVX2 etc. scaling:

x265 performance trend on the modern machine

x265 performance trend on the modern machine

Now, before I comment on the trend to be seen in that chart, let me say one thing: Surprisingly, unlike what x264 has shown over the years, x265 didn’t seem to want to scale more across cores/threads with progressing development. So the core count was not really having much influence at all; Pass 1 of the encode would use 5-6 cores, where pass 2 would eat up 6-7, all the way from the oldest to the newest versions.

What we do see here aside from the --no-rskip drop and the performance increase from 2.3+2 to 2.4+2 are three things: First, we can see a new upwards trend in performance starting with 2.5+9. The development has been more linear for the older CPU, which is why I’m assuming that we get to see some work invested in AVX(2) code here, probably also in BMI/FMA code.

The second thing is the more significant decline starting after 2.0+11. I have no explanation for this, as new features added around here shouldn’t have more negative impacts on newer CPUs as when to compared to older ones. If anything, it should be the other way around, so that part going on up to 2.1+60 is a bit confusing.

And the third thing that is very clear here: The 12-bit version is much closer to 10-bit performance. Only thing I can think of here is that the AVX/AVX2 code paths have generally always been faster on the 12-bit version when compared to the older ≤SSE4.2 assembly code.

Seeing this, I’d like to look at one more thing: The dropoff we can see from 8-bit -> 10-bit -> 12-bit compared between the two architectures. Let’s do that:

3.) 8-/10-/12-bit scaling across platforms

Let’s compare this side-by-side across all versions tested, in a very simple chart:

Well, the modern Xeons do lose less performance even from 8- to 10-bit, but as seen above, the drop to 12-bit is surprisingly insignificant on modern CPUs. This is a bit unfortunate actually, given that all the rage is about 10-bit with the Anime communities and also the official Blu-Ray UHD/4K standards, but oh well. :roll:

4.) Summing it up

Anyway, this hasn’t been too interesting actually, aside from the 12-bit surprise that I guess nobody cares about anyway. Well, we’re seeing some more significant performance increases for recent versions of x265 on modern architectures, but those are only compensating for the loss of performance we’ve seen before, that didn’t occur on older machines.

Maybe the AVX+AVX2 code was good from the very start instead of spreading its wings only now? Anyway, we’re seeing some slow but steady increase in performance for now even on older machines. If there’s a reason to buy new processors for x265 it’s probably not the AVX/AVX2/FMA3/BMI2/AVX512/whatever, but rather the generally higher IPC of newer chips…

5.) How the benchmark was done

5a.) On Windows

Given a folder with x265 versions named x1.7+512.exe, x1.6+2.exe etc., as well as an ffmpeg.exe (like from [here]), the subfolders log\, output\, stats\ and results\, you can adapt and use the following script after putting it right next to the x265 binaries, I call it test-performance-trends.bat:

expand/collapse source code
  1. @ECHO OFF
  2.  
  3. FOR %%I IN (1.7-8b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
  4.  2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%I, 8 ^
  5.  bits... & .\timethis.exe "echo x265 version %%I 8 bit results: & .\ffmpeg.exe ^
  6.  -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
  7.  -strict -1 -r 24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p ^
  8.  veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats .\stats\%%I-8b.stats ^
  9.  -o .\output\%%I-8b-p1.h265 2>NUL & .\ffmpeg.exe -r 24000/1001 -i ^
  10.  .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 ^
  11.  -r 24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow ^
  12.  --bitrate 2000 --pass 2 --stats .\stats\%%I-8b.stats -o .\output\%%I-8b-p2.h265 ^
  13.  2>NUL" 1> .\results\results-%%I-8b.txt 2>.\log\timethis-errorlog-%%I-8b.txt
  14.  
  15.  
  16. FOR %%J IN (1.7-10b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
  17.  2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%J, 10 ^
  18.  bits... & .\timethis.exe "echo x265 version %%J 10 bit results: & .\ffmpeg.exe ^
  19.  -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
  20.  -strict -1 -r 24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p ^
  21.  veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats .\stats\%%J-10b.stats ^
  22.  -o .\output\%%J-10b-p1.h265 2>NUL & .\ffmpeg.exe -r 24000/1001 -i ^
  23.  .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 ^
  24.  -r 24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow ^
  25.  --bitrate 2000 --pass 2 --stats .\stats\%%J-10b.stats -o ^
  26.  .\output\%%J-10b-p2.h265 2>NUL" 1> .\results\results-%%J-10b.txt ^
  27.  2>.\log\timethis-errorlog-%%J-10b.txt
  28.  
  29. FOR %%K IN (1.7-12b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230 2.0+11 2.0+54 ^
  30.  2.1+2 2.1+60 2.2+22 2.3+2 2.4+2 2.5+9 2.6+2 2.7) DO ECHO Testing x265-%%K, 12 ^
  31.  bits... & .\timethis.exe "echo x265 version %%K 12 bit results: & .\ffmpeg.exe ^
  32.  -r 24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le ^
  33.  -strict -1 -r 24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 ^
  34.  -p veryslow --bitrate 2000 --pass 1 --slow-firstpass --stats ^
  35.  .\stats\%%K-12b.stats -o .\output\%%K-12b-p1.h265 2>NUL & .\ffmpeg.exe -r ^
  36.  24000/1001 -i .\video-teekyuu.h264 -f yuv4mpegpipe -pix_fmt yuv420p10le -strict ^
  37.  -1 -r 24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 ^
  38.  -p veryslow --bitrate 2000 --pass 2 --stats .\stats\%%K-12b.stats -o ^
  39.  .\output\%%K-12b-p2.h265 2>NUL" 1> .\results\results-%%K-12b.txt ^
  40.  2>.\log\timethis-errorlog-%%K-12b.txt
  41.  
  42. ECHO All done, results are to be found in the results\results-*.txt files!

5b.) On Linux

Given a system-wide installation of ffmpeg and a folder with statically linked x265 binaries named x1.7+512, x1.6+2 etc., as well as the subfolder results/, the following script – test-performance-trends.sh – should do the job after being adapted for your list of x265 binaries and put right next to them:

expand/collapse source code
  1. #!/usr/bin/env sh
  2.  
  3. for i in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
  4. 2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$i, \
  5. 8 bits...\n" && time ( printf "x265 version $i 8 bit results:\n" \
  6. 1>./results/results-$i-8b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
  7. -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
  8. ./x$i - --y4m -D 8 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
  9. --slow-firstpass --stats ./stats/$i-8b.stats -o ./output/$i-8b-p1.h265 \
  10. 2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
  11. -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$i - --y4m \
  12. -D 8 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
  13. ./stats/$i-8b.stats -o ./output/$i-8b-p2.h265 2>/dev/null ) \
  14. 2>>./results/results-$i-8b.txt; done
  15.  
  16. for j in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
  17. 2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$j, \
  18. 10 bits...\n\n" && time ( printf "x265 version $j 10 bit results:\n" \
  19. 1>./results/results-$j-10b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
  20. -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
  21. ./x$j - --y4m -D 10 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
  22. --slow-firstpass --stats ./stats/$j-10b.stats -o ./output/$j-10b-p1.h265 \
  23. 2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
  24. -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$j - --y4m \
  25. -D 10 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
  26. ./stats/$j-10b.stats -o ./output/$j-10b-p2.h265 2>/dev/null ) \
  27. 2>>./results/results-$j-10b.txt; done
  28.  
  29. for k in {1.7+512,1.9+15,1.9+108,1.9+141,1.9+200,1.9+210,1.9+230,2.0+11,2.0+54,\
  30. 2.1+2,2.1+60,2.2+22,2.3+2,2.4+2,2.5+9,2.6+2,2.7}; do printf "\nTesting x265-$k, \
  31. 12 bits...\n\n" && time ( printf "x265 version $k 12 bit results:\n" \
  32. 1>./results/results-$k-12b.txt && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 \
  33. -f yuv4mpegpipe -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | \
  34. ./x$k - --y4m -D 12 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 1 \
  35. --slow-firstpass --stats ./stats/$k-12b.stats -o ./output/$k-12b-p1.h265 \
  36. 2>/dev/null && ffmpeg -r 24000/1001 -i ./video-teekyuu.h264 -f yuv4mpegpipe \
  37. -pix_fmt yuv420p10le -strict -1 -r 24000/1001 - 2>/dev/null | ./x$k - --y4m \
  38. -D 12 --fps 24000/1001 -p veryslow --bitrate 2000 --pass 2 --stats \
  39. ./stats/$k-12b.stats -o ./output/$k-12b-p2.h265 2>/dev/null ) \
  40. 2>>./results/results-$k-12b.txt; done
  41.  
  42. printf "\nBenchmarks completed, results in `pwd`/results/results-*.txt!\n"

That’s it!

Jan 252017
 
H.265/HEVC logo

1.) Introduction

After doing a [somewhat proper comparison between x264 and x265] a while back, I thought I’d do another one at extremely low bitrates. It reminded me of the time I’ve been using ISDN at 64kbit/s (my provider didn’t let me use CAPI channel aggregation for 128kbit/s), which was the first true flat rate in my country. ‘Cause I’ve been thinking this:

“Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?” and “What would H.264/AVC look like in that case?”

Let me say this first: It reaaally depends on how you define “useful”. :roll:

Pretty much nobody uses ISDN these days, and V.9x 56kbaud modems are dying out in the 1st world as well, so this article doesn’t make a lot of sense. To be fair, I didn’t even pick encoding settings fit for low-latency streaming either, nor are my settings fit for live encoding. So it’s just for the lulz, but still! I wanted to see whether it could be done at all, in theory.

To make it happen, I had to choose extremely low bitrates not just for video, but audio as well. There are even subtitles included in my example, which are present in Matroska-style zlib-compressed [.ass] format, so compressed text essentially.

For the audio part, I chose the Fraunhofer FhG-AAC encoder to encode to the lowest possible constant bitrate, which is 8kbit/s HEv2-AAC. That’s a VoIP-focused version of the codec targeted at preserving human speech as well as possible at conditions as bad as they get. And yes, it sounds terrible. But it still gets across just enough to be able to understand what people are saying and what type of sounds are occurring in a scene. Music and most environmental sounds are terrible in quality, but they are still discernible.

For video, I picked a 2-pass ABR mode with a 50kbit/s target bitrate, which is insanely low even for the Anime content I picked (my apologies, Mr. “[Anime is not what everyone watches]”, but yes, I picked Anime again). Note that 2D animated content is pretty easy on the encoders in this case, so the results would’ve likely been a lot worse with 1080p live action content. As for the encoder settings, you can find those [down below] and as for how I’m taking the screenshots, I’ll spare you those details, they’re pretty similar to the stuff shown in the link at the top.

Before we start with the actual quality comparison, I should mention that my test results actually overshot their target, so they’re really unsuitable for live streaming even in the ISDN case. I just didn’t care enough for trying to push the bitrate down any further. Regular streaming would still be possible with my result files, but not without prebuffering. See here:

$ ls -hs *.mkv
2.6M Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv
2.0M Piaceː Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv
 76M Piaceː Watashi no Italian - Episode 02.mkv
$ for i in {'Piaceː Watashi no Italian - Episode 02.mkv','Piaceː Watashi no Italian - \
Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː Watashi no Italian - \
Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo "$i" | grep -i "overall bit rate"; done
Overall bit rate                         : 2 640 kb/s
Overall bit rate                         : 88.6 kb/s
Overall bit rate                         : 69.2 kb/s

The first one is the source (note: From [Crunchyroll], legal to watch and record in my country at this time), the second my x264 and the third my x265 versions. Let’s show you the bitrate overshoot of just the video streams in my versions:

$ for i in {'Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː \
Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo \
--Output="Video;%BitRate/String%" "$i"; done
71.1 kb/s
51.6 kb/s

So as you can see, x264 messed up pretty big, overshooting by 21.1kbit/s (42.2%), whereas x265 almost landed on target, overshooting by a mere 1.6kbit/s (3.2%) overall. And still… well… Let’s give you an overview first (as usual, click to enlarge images):

2.) Quality comparisons

Note that the color shown in those thumbnails is not representative of the real images, this has been transformed to 256 color .png to make it easier to download (again, if your browser supports it, .webp will be loaded instead transparently). This is just to show you some basic differences between what x264 and x265 are able to preserve, and what they are not. Also, keep in mind, that “~50kbit/s nominal bitrate” means 71.1kbit/s for x264 and 51.6kbit/s for x265!

Overall, x264 fucks up big time. There are frames with partial macroblock drops and completely blank frames even! Also, a lot of frames lose their color either partially or completely as well, making them B/W. And that’s given that x264 even invested 42.2% more bitrate than what I aimed for!

x265 has no such severe issues, all frames are completely there in full color, and that at a bitrate reasonable close to the target. Let’s look at a few interesting cases side by side:

Scene 1 (left: x264, middle: x265, right: source file):

There are some indications of use of larger CTUs (coding tree unit, H.265s’ replacement for macroblocks) in x265s’ case, which is supposed to be one of its strong points, especially for very large resolution encoding (think: 4K/UHD, 8K). While larger blocks can mean loss of detail in that area, it’s ok for larger areas of uniform color, which this Anime has a ton of. H.264/AVC can’t do that so well, because the upper limit for a macroblocks’ size is rather low with 16×16 pixels. You can see the macroblock size pretty clearly in the blocky frame to the left. You need to look a bit more carefully in x265s’ case, but there are a few spots where I believe it can be seen as well. In my case the CTU size for x265 was 32×32px.

Hm, maybe --ctu 64 would’ve been better for this specific case, but whatever.

Lets look at two more mostly color-related comparisons:

Scenes 2 & 3 (left: x264, middle: x265, right: source file):

 

In the first case it seems as if x264 is trying to preserve shades of green more than anything, but in the second case, something terrible happens. There is a lot of red in the scene before this one, and there is quite some red on those can labels as well. It seems x264 doesn’t know where to put the color anymore, and the reds bleed almost all over the frame. And it stays like that for the entire scene as well, which means for several seconds. The greens and browns are lost. Block artifacts are excessive as well, but at least x264 managed to give us whole frames here, with some color even.

Well, the color kinda went everywhere, but uhm, yeah…

Two more:

Scenes 4 & 5 (left: x264, middle: x265, right: source file):

 

I really don’t know what’s with x264 and the reds. Shouldn’t green have priority? I mean, not just in the chroma subsampling, but in encoding as well? But red seems what x264 drops last, and it happens more than once. Given the detail and movements in that last part, even x265 fails though. Yes, it does preserve more color, but it doesn’t come remotely close to the source at this bitrate.

And that other frame with the cuteness overload? There are a lot like those, where x264 just kinda panics, drops everything it has and then frantically tries to (re?)construct the current frame, sometimes only partially until the next I-frame arrives or so.

So that’s it for my quick & dirty “ultra low bitrate” comparison between x264 and x265, at pretty taxing encoding settings once again.

3.) Additional information

x264 encoding settings:

$ mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \
"encoding settings"
Encoding settings                        : cabac=1 / ref=16 / deblock=1:-2:0 / analyse=0x3:0x133 / \
me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 \
/ 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=-2 / threads=18 / \
lookahead_threads=4 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / \
constrained_intra=0 / bframes=16 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / \
open_gop=1 / weightp=2 / keyint=250 / keyint_min=23 / scenecut=40 / intra_refresh=0 / \
rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=50 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=81 \
/ qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:0.60

x265 encoding settings (note: 10 bits per color channel were chosen, same as for x264):

$ mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \
"encoding settings"
Encoding settings                        : cpuid=1049087 / frame-threads=3 / wpp / pmode / pme / \
no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x1080 / interlace=0 / total-frames=0 \
/ level-idc=0 / high-tier=1 / uhd-bd=0 / ref=6 / no-allow-non-conformance / no-repeat-headers / \
annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / \
keyint=250 / bframes=16 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=40 / \
lookahead-slices=0 / scenecut=40 / no-intra-refresh / ctu=32 / min-cu-size=8 / rect / amp / \
max-tu-size=32 / tu-inter-depth=4 / tu-intra-depth=4 / limit-tu=0 / rdoq-level=1 / signhide / \
no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / no-strong-intra-smoothing / max-merge=5 \
/ limit-refs=1 / limit-modes / me=3 / subme=4 / merange=57 / temporal-mvp / weightp / weightb / \
no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=6 / no-early-skip / rskip / \
no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / rdpenalty=0 / psy-rd=1.60 / \
psy-rdoq=5.00 / no-rd-refine / analysis-mode=0 / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / \
bitrate=50 / qcomp=0.75 / qpstep=4 / stats-write=0 / stats-read=2 / stats-file=265/v.stats / \
cplxblur=20.0 / qblur=0.5 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / \
zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / sar=1 / overscan=0 / \
videoformat=5 / range=1 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 \
/ max-cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / \
slices=1 / opt-qp-pps / opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05

x264 version:

$ x264 --version
x264 0.148.x
(libswscale 3.0.0)
(libavformat 56.1.0)
built on Sep  6 2016, gcc: 6.2.0
x264 configuration: --bit-depth=10 --chroma-format=all
libx264 configuration: --bit-depth=10 --chroma-format=all
x264 license: GPL version 2 or later
libswscale/libavformat license: nonfree and unredistributable
WARNING: This binary is unredistributable!

x265 version:

$ x265 --version
x265 [info]: HEVC encoder version 2.2+23-58dddcf01b7d
x265 [info]: build info [Linux][GCC 6.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2

Encoding & testing platform:

$ uname -sr
Linux 2.6.32-573.8.1.el6.x86_64
$ cat /etc/redhat-release 
CentOS release 6.8 (Final)
$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz

4.) Answers

Q: “Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?

A: It can probably stream something that at least resembles the original source in a recognizable fashion, but… whether you can call that “useful” or not is another thing entirely…

Q: “What would H.264/AVC look like in that case?”

A: Like shit! :roll:

Feb 202016
 

H.265/HEVC logoRecently, after [successfully compiling] the next generation x265 H.265/HEVC video encoder on Windows, Linux and FreeBSD, I decided to ask for guidance when it comes to compressing Anime (live action will follow at a later time) in the Doom9 forums, [see here]. Thing is, I didn’t understand all of the knobs x265 has to offer, and some of the convenient presets of x264 didn’t exist here (like --tune film and --tune animation). So for a newbie it can be quite hard to make x265 perform well without sacrificing far too much CPU power, as x265 is significantly more taxing on the processor than x264.

Thanks to [Asmodian] and [MeteorRain]/[LittlePox] I got rid of x265s’ blurring issues, and I took their settings and turned them up to achieve more quality while staying within sane encoding times. My goal was to be able to encode 1080p ~24fps videos on an Intel Xeon X5690 hexcore @ 3.6GHz all-core boost clock at >=1fps for a target bitrate of 2.5Mbit.

In this post, I’d like to compare 7 scenes from the highly opulent Anime [The Garden of Words] (言の葉の庭) by [Makoto Shinkai] (新海 誠) at three different average bitrates, 1Mbit, 2.5Mbit (my current x264 default) and 5Mbit. The Blu-Ray source material is H.264/AVC at roughly 28Mbit on average. Also, both encoders are running in 10-bit color depth mode instead of the common 8-bit, meaning that the internal arithmetic precision is boosted from 8- to 16-bit integers as well. While somewhat “non-standard” for H.264/AVC, this is officially supported by H.265/HEVC for Blu-Ray 4K. The mode of operation is 2-pass to aim for comparable file sizes and bitrates. The encoding speed penalty for switching from x264 to x265 at the given settings is around a factor of 8. Somewhat.

The screenshots below are losslessly compressed 1920×1080 images. Since this is all about compression, I chose to serve the large versions of the images in WebP format to all browsers which support it (Opera 11+, Chromium-based Browsers like Chrome, Iron, Vivaldi, the Android Browser or Pale Moon as the only Gecko browser). This is done, because at maximum level, WebP does lossless compression much more efficiently, so the pictures are smaller. This helps, because my server only has 8Mbit/s upstream. If your browser doesn’t support WebP (like Firefox, IE, Edge, Safari), it’ll be fed lossless PNG instead. All of this happens automatically, you don’t need to do anything!

Now, let’s start with the specs.

Specifications:

Here are the source material encoding settings according to the video stream header:

cabac=1 / ref=4 / deblock=1:1:1 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 /
mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 /
fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=1 / sliced_threads=0 / slices=4 /
nr=0 / decimate=1 / interlaced=0 / bluray_compat=1 / constrained_intra=0 / bframes=3 / b_pyramid=1 /
b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=1 / weightp=1 / keyint=24 / keyint_min=1 /
scenecut=40 / intra_refresh=0 / rc_lookahead=24 / rc=2pass / mbtree=1 / bitrate=28229 /
ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / cplxblur=20.0 / qblur=0.5 /
vbv_maxrate=31600 / vbv_bufsize=30000 / nal_hrd=vbr / filler=0 / ip_ratio=1.40 / aq=1:0.60

x264 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example:

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat
--non-deterministic

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 2500 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic

x265 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

Since x265 can only read raw YUV and Y4M, the source video is being fed to it via [libavs’] avconv tool, piping it into x265. The avconv commandline for that looks as follows:

$ avconv -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null

If you want to do something similar, but you don’t like avconv, you can use [ffmpeg] as a replacement, the options are completely the same. Note that you should always specify the correct frame rates (-r) for input and output, or the bitrate setting of the encoder will be applied wrongly!

x264 on the other hand was linked against libav directly, using its decoding capabilities without any workarounds.

Let’s compare:

“The Garden of Words” has a lot of rain. This is a central story element of the 46 minute movie, and it’s hard on any encoder, because a lot of stuff is moving on screen all the time. Let’s take a look at such a scene for our first side-by-side comparison. Each comparison is done in two rows: H.264/AVC first (including the source material), and below that H.265/HEVC, also including the source.

Let’s go:

Scene 1, H.264/AVC encoded by x264 0.148.x:

Scene 1, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

It has been said that x265 performs specifically well at two things: Very high resolutions (which we don’t have here) and low bitrates. And yep, it shows. When comparing the 1Mbit shots, it becomes clear pretty quickly that x265 manages to preserve more detail for the parts with lots of motion. x264 on the other hand starts to wash out the scene pretty severely, smearing out some raindrops, spray water and parts of the foliage. Also, it’s pretty bad around the outlines as well, but that’s true for both encoders. You can spot that easily with all the aliasing artifacts on the raindrops.

Moving up a notch, it becomes very hard to distinguish between the two. When zooming in you can still spot a few minor differences (note the kids umbrella, even if it’s not marked), but it’s quite negligible. Here I’m already starting to think x265 might not be worth it in all cases. There are still differences between the two 2.5Mbit shots and the original however, see the red areas of the umbrella and the most low-contrast, dark parts of the foliage.

At 5Mbit, I really can’t see any difference anymore. Maybe the colors are a little off or something, but when seen in motion, distinguishing between the two and the original becomes virtually impossible. Given that we just threw a really difficult scene at x264 and x265, this should be a trend to continue throughout the whole test.

Now, even more rain:

Scene 2, H.264/AVC encoded by x264 0.148.x:

Scene 2, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Now this is extreme at 1Mbit! Looking at H.264, the spray water on top of the cable can’t even be told apart from the cloud in the background anymore. Detail loss all over the scene is catastrophic in comparison to the original. Tons of raindrops are simply gone entirely, and the texture details on the tower and the angled brick wall of the house to the left? Almost completely washed out and smeared.

Now, let’s look at H.265 @ 1Mbit. The spray water is also pretty bad, but it’s amazing how much more detail was preserved overall. Sure, there are still parts of the raindrops missing, but it’s much, much closer to the original. We can now see details on the walls as well, even the steep angle one on the left. The only serious issue is the red light bleeding at the tower. There is very little red there in the original, so I’m not sure what happened there. x264 does this as well, but x265 is a bit worse.

At the next level, the differences are less pronounced again, but there is still a significant enough improvement when going from x264 to x265 at 2.5Mbit: The spray water on the cable becomes more well-defined, and more rain is being preserved. Also, the textures on the walls are a tiny little bit more detailed and crisp. Once again though, x265 is bleeding too much red light at the tower.

Since it’s noticeably not fully on the level of the source still, let’s look at 5Mbit briefly. x265 is able to preserve a tiny little bit more rain detail here, coming extremely close to the original. In motion, you can’t really see the difference however.

Now, let’s get steamy:

Scene 3, H.264/AVC encoded by x264 0.148.x:

Scene 3, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

1Mbit first again: Let me just say: It’s ugly. x264 pretty much messes up the steam coming from the iron. We get lots of block artifacts now. Some of the low-contrast patterns on the ironing board are being smeared out at a pretty terrible level. Also, the bokeh background partly shows block artifacts and banding. x265 produces quite a lot of banding here itself, but no blocks. Also, outlines and sharp contrasts are more well-defined, and the low contrast part is done noticeably better.

At 2.5Mbit, the patterns repeat themselves now. The steam is only slightly better with x265, outlines are slightly more well-defined, and the low-contrast patterns are slightly more visible. For some of the blurred parts, x265 seems to be a tiny little bit to prone to banding though, in a very few spots, x264 might be just that 1% better. Overall however, x265 wins this, and even if it’s just for the better outlines.

At 5Mbit, you really need to zoom in and analyze very small details, e.g. around the outer border of the steam. Yes, x265 does better again. But you’d not really be able to notice this when watching.

How about we go cry a little bit:

Scene 4, H.264/AVC encoded by x264 0.148.x:

Scene 4, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Cutting onions would be a classic fun part in a slice-of-life anime. Here, it’s just kitchen work. And quite the bad looking one for H.264 at 1Mbit. The letters on the knife are partly lost completely, becoming unreadable. The onion parts that fly off are visibly worse than when encoded with x265 at the same bitrate. Also, x264 produced block artifacts in the blurred bokeh areas again, that simply aren’t there with x265.

On the next level, the two come much closer to each other. However, x265 simply does the outlines better. Less artifacts and sharper, just like with the writing on the knifes’ blade as well. The issues with the bokeh are nearly gone. What’s left is a negligible amount of blocking for x264 and banding for x265. Not really noticeable however.

Finally, at 5Mbit, x265 shows those ever so slightly more well-done outlines. But that’s about it, the rest looks nice for both, and pretty much identical to the source.

Now, please, dig in:

Scene 5, H.264/AVC encoded by x264 0.148.x:

Scene 5, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Let’s keep this short: x264 does blocking, and bad transitions/outlines. x265 does it better. Plain and simple.

At 2.5Mbit, x265 nearly reaches quality transparency when compared to the original, something x264 falls short of, just a bit. While x265 does the outlines and the steam part quite like in the original frame, x264 rips the outlines apart a bit too much, and slight block artifacts can again be seen for the steam part.

At 5Mbit, x264 still shows some blocking artifacts in a part that most lossy image/video compression algorithms traditionally suck at: The reds. While not true for all human beings, most eyes perceive much finer gradients for greens, then blues, and do worst with reds. Meaning, our eyes have an unequal sensitivity distribution when it comes to colors. So image and video codecs try to save bitrate in the reds first, because supposedly it’d be less noticeable. To me subjectively, x265 achieves transparency here, meaning it looks just like the original. x264 doesn’t manage entirely. Close, but not not entirely.

Next:

Scene 6, H.264/AVC encoded by x264 0.148.x:

Scene 6, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

This is a highly static scene, with only few moving parts, so there is some rain again, and some shadow cast by raindrops as well. Now, for the static parts, incremental B frames really work wonders here. Most detail is being preserved by both encoders. What’s supposed to be happening is happening: The encoders save bit rate where the human eye can’t easily tell: In the parts where stuff is moving around very quickly. That’s how we lose a lot of raindrop shadows and some drops as well. x264 seems to have trouble separating the scene into even smaller macro blocks though? Not sure if that’s the reason, but a lot of mesh detail for the basket on the balcony on the top right is lost – x265 does better there! This is maybe because x264 couldn’t distinct the moving drops from the static background so well anymore?

At 2.5Mbit, the scenes become almost indistinguishable. The more static content we have, the easier it gets of course, so the transparency threshold becomes lower. And if you ask me, both of them reach perfect quality at 5Mbit.

Let’s throw another hard one at them for the last round:

Scene 7, H.264/AVC encoded by x264 0.148.x:

Scene 7, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Enough rain already? Pffh! Here we have a lot of foliage and low contrast added to the mix. And it gets smeared a lot by x264, rain detail lost, fine details of the bushes turning into green mud, that’s how it goes. x265 also loses too much detail here (I mean, 1Mbit is really NOT much), but again, it fares quite a bit better.

At 2.5Mbit, both encoders do very well. Somehow, this scene doesn’t seem to penalize x264 that much at the medium level. You’d really need your magnifying glass to find the spots where x265 still does better, which surprises me a bit for this scene. And finally, at 5Mbit – if you ask me – visual transparency is reached for both x264 and x265.

Final thoughts:

Clearly it’s true what a lot of people have been saying. x265 rocks at low bitrates, if configured correctly. But that isn’t gonna give me perfect quality or anything. Yeah, it sucks less – much less – than x264 in that department, but at a higher 2.5Mbit, where both start looking quite decent, x265 having just a slight edge… it becomes hardly justifiable to use it, simply because it’s that much slower to run it at decent settings.

Also, you need to take device compatibility into account. Sure, a powerful PC can always play the stuff. No matter if it’s some UNIX, Linux, MacOS X or Windows. But how about video game consoles? Older TVs? That kind of thing. Most of those can only play H.264/AVC. Or course, if you’re only using your PC and you have a lot of time and electricity to burn, then hey – why not?

But I’ll have to think really long and really hard about whether I want to replace x264 with x265 at this given point in time. Overall, it might not be practical enough on my current hardware yet. Maybe I’d need an AVX/AVX2-capable processor, as x265 has tons of optimizations for those instruction set extensions. But I’m gonna stay on my Xeon X5690 for quite a while, so SSE 4.2 is the latest I have.

I’d say, if you can stand some quality degradation, then x265 might be the way to go, as it can give you much smaller file sizes at lower bitrates with slight degradation.

If you’re aiming for high bitrates and quality, it might not be worth it right now, at least for 1080p. It’s been said the tables are turning once more when going up to 4K and UHD, but I haven’t tested that yet, as all my content – both Anime and live action movies – are “low resolution” *cough* 1080p or 720p.

Addendum:

Screenshots were taken using the latest stable mplayer 1.3.0 on Linux. Thank god they’re bundling it with ffmpeg now, making things much easier. This choice was made because mplayer can easily grab screenshots from specific spots in a video in an automated fashion. I used framesteps for this, like this:

$ mplayer ./TEST-H.265/HEVC-1mbit.mkv -nosound -vf framestep=24 \
 -vo png:z=9:outdir=./screenshots/1mbit/H.265/HEVC/:prefix=H.265/HEVC-1mbit-

This will decode every 24th frame of the video file TEST-H.265/HEVC-1mbit.mkv, and grab it into a .png file with maximum lossless compression as supported by mplayer. The .png files will be prefixed with a user-defined string and numbered, like H.265/HEVC-1mbit-00000001.pngH.265/HEVC-1mbit-00000002.png and so on, until the end of file is reached.

To encode the full size screenshots to WebP, the most recent [libwebp-0.5.0], or rather one of its companion tools – cwebp – was used as follows:

$ cwebp -z 9 -lossless -q 100 -noalpha -mt ./input.png -o ./output.webp

Now… somebody wanna grant me remote access to some quad socket Haswell-EX Xeon box for free?

No?

Meh… :roll: