Jan 252017
 

H.265/HEVC logo1.) Introduction

After doing a [somewhat proper comparison between x264 and x265] a while back, I thought I’d do another one at extremely low bitrates. It reminded me of the time I’ve been using ISDN at 64kbit/s (my provider didn’t let me use CAPI channel aggregation for 128kbit/s), which was the first true flat rate in my country. ‘Cause I’ve been thinking this:

“Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?” and “What would H.264/AVC look like in that case?”

Let me say this first: It reaaally depends on how you define “useful”. :roll:

Pretty much nobody uses ISDN these days, and V.9x 56kbaud modems are dying out in the 1st world as well, so this article doesn’t make a lot of sense. To be fair, I didn’t even pick encoding settings fit for low-latency streaming either, nor are my settings fit for live encoding. So it’s just for the lulz, but still! I wanted to see whether it could be done at all, in theory.

To make it happen, I had to choose extremely low bitrates not just for video, but audio as well. There are even subtitles included in my example, which are present in Matroska-style zlib-compressed [.ass] format, so compressed text essentially.

For the audio part, I chose the Fraunhofer FhG-AAC encoder to encode to the lowest possible constant bitrate, which is 8kbit/s HEv2-AAC. That’s a VoIP-focused version of the codec targeted at preserving human speech as well as possible at conditions as bad as they get. And yes, it sounds terrible. But it still gets across just enough to be able to understand what people are saying and what type of sounds are occurring in a scene. Music and most environmental sounds are terrible in quality, but they are still discernible.

For video, I picked a 2-pass ABR mode with a 50kbit/s target bitrate, which is insanely low even for the Anime content I picked (my apologies, Mr. “[Anime is not what everyone watches]”, but yes, I picked Anime again). Note that 2D animated content is pretty easy on the encoders in this case, so the results would’ve likely been a lot worse with 1080p live action content. As for the encoder settings, you can find those [down below] and as for how I’m taking the screenshots, I’ll spare you those details, they’re pretty similar to the stuff shown in the link at the top.

Before we start with the actual quality comparison, I should mention that my test results actually overshot their target, so they’re really unsuitable for live streaming even in the ISDN case. I just didn’t care enough for trying to push the bitrate down any further. Regular streaming would still be possible with my result files, but not without prebuffering. See here:

$ ls -hs *.mkv
2.6M Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv
2.0M Piaceː Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv
 76M Piaceː Watashi no Italian - Episode 02.mkv
$ for i in {'Piaceː Watashi no Italian - Episode 02.mkv','Piaceː Watashi no Italian - \
Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː Watashi no Italian - \
Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo "$i" | grep -i "overall bit rate"; done
Overall bit rate                         : 2 640 kb/s
Overall bit rate                         : 88.6 kb/s
Overall bit rate                         : 69.2 kb/s

The first one is the source (note: From [Crunchyroll], legal to watch and record in my country at this time), the second my x264 and the third my x265 versions. Let’s show you the bitrate overshoot of just the video streams in my versions:

$ for i in {'Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː \
Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo \
--Output="Video;%BitRate/String%" "$i"; done
71.1 kb/s
51.6 kb/s

So as you can see, x264 messed up pretty big, overshooting by 21.1kbit/s (42.2%), whereas x265 almost landed on target, overshooting by a mere 1.6kbit/s (3.2%) overall. And still… well… Let’s give you an overview first (as usual, click to enlarge images):

2.) Quality comparisons

Note that the color shown in those thumbnails is not representative of the real images, this has been transformed to 256 color .png to make it easier to download (again, if your browser supports it, .webp will be loaded instead transparently). This is just to show you some basic differences between what x264 and x265 are able to preserve, and what they are not. Also, keep in mind, that “~50kbit/s nominal bitrate” means 71.1kbit/s for x264 and 51.6kbit/s for x265!

Overall, x264 fucks up big time. There are frames with partial macroblock drops and completely blank frames even! Also, a lot of frames lose their color either partially or completely as well, making them B/W. And that’s given that x264 even invested 42.2% more bitrate than what I aimed for!

x265 has no such severe issues, all frames are completely there in full color, and that at a bitrate reasonable close to the target. Let’s look at a few interesting cases side by side:

Scene 1 (left: x264, middle: x265, right: source file):

There are some indications of use of larger CTUs (coding tree unit, H.265s’ replacement for macroblocks) in x265s’ case, which is supposed to be one of its strong points, especially for very large resolution encoding (think: 4K/UHD, 8K). While larger blocks can mean loss of detail in that area, it’s ok for larger areas of uniform color, which this Anime has a ton of. H.264/AVC can’t do that so well, because the upper limit for a macroblocks’ size is rather low with 16×16 pixels. You can see the macroblock size pretty clearly in the blocky frame to the left. You need to look a bit more carefully in x265s’ case, but there are a few spots where I believe it can be seen as well. In my case the CTU size for x265 was 32×32px.

Hm, maybe --ctu 64 would’ve been better for this specific case, but whatever.

Lets look at two more mostly color-related comparisons:

Scenes 2 & 3 (left: x264, middle: x265, right: source file):

 

In the first case it seems as if x264 is trying to preserve shades of green more than anything, but in the second case, something terrible happens. There is a lot of red in the scene before this one, and there is quite some red on those can labels as well. It seems x264 doesn’t know where to put the color anymore, and the reds bleed almost all over the frame. And it stays like that for the entire scene as well, which means for several seconds. The greens and browns are lost. Block artifacts are excessive as well, but at least x264 managed to give us whole frames here, with some color even.

Well, the color kinda went everywhere, but uhm, yeah…

Two more:

Scenes 4 & 5 (left: x264, middle: x265, right: source file):

 

I really don’t know what’s with x264 and the reds. Shouldn’t green have priority? I mean, not just in the chroma subsampling, but in encoding as well? But red seems what x264 drops last, and it happens more than once. Given the detail and movements in that last part, even x265 fails though. Yes, it does preserve more color, but it doesn’t come remotely close to the source at this bitrate.

And that other frame with the cuteness overload? There are a lot like those, where x264 just kinda panics, drops everything it has and then frantically tries to (re?)construct the current frame, sometimes only partially until the next I-frame arrives or so.

So that’s it for my quick & dirty “ultra low bitrate” comparison between x264 and x265, at pretty taxing encoding settings once again.

3.) Additional information

x264 encoding settings:

$ mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \
"encoding settings"
Encoding settings                        : cabac=1 / ref=16 / deblock=1:-2:0 / analyse=0x3:0x133 / \
me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 \
/ 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=-2 / threads=18 / \
lookahead_threads=4 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / \
constrained_intra=0 / bframes=16 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / \
open_gop=1 / weightp=2 / keyint=250 / keyint_min=23 / scenecut=40 / intra_refresh=0 / \
rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=50 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=81 \
/ qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:0.60

x265 encoding settings (note: 10 bits per color channel were chosen, same as for x264):

$ mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \
"encoding settings"
Encoding settings                        : cpuid=1049087 / frame-threads=3 / wpp / pmode / pme / \
no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x1080 / interlace=0 / total-frames=0 \
/ level-idc=0 / high-tier=1 / uhd-bd=0 / ref=6 / no-allow-non-conformance / no-repeat-headers / \
annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / \
keyint=250 / bframes=16 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=40 / \
lookahead-slices=0 / scenecut=40 / no-intra-refresh / ctu=32 / min-cu-size=8 / rect / amp / \
max-tu-size=32 / tu-inter-depth=4 / tu-intra-depth=4 / limit-tu=0 / rdoq-level=1 / signhide / \
no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / no-strong-intra-smoothing / max-merge=5 \
/ limit-refs=1 / limit-modes / me=3 / subme=4 / merange=57 / temporal-mvp / weightp / weightb / \
no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=6 / no-early-skip / rskip / \
no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / rdpenalty=0 / psy-rd=1.60 / \
psy-rdoq=5.00 / no-rd-refine / analysis-mode=0 / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / \
bitrate=50 / qcomp=0.75 / qpstep=4 / stats-write=0 / stats-read=2 / stats-file=265/v.stats / \
cplxblur=20.0 / qblur=0.5 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / \
zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / sar=1 / overscan=0 / \
videoformat=5 / range=1 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 \
/ max-cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / \
slices=1 / opt-qp-pps / opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05

x264 version:

$ x264 --version
x264 0.148.x
(libswscale 3.0.0)
(libavformat 56.1.0)
built on Sep  6 2016, gcc: 6.2.0
x264 configuration: --bit-depth=10 --chroma-format=all
libx264 configuration: --bit-depth=10 --chroma-format=all
x264 license: GPL version 2 or later
libswscale/libavformat license: nonfree and unredistributable
WARNING: This binary is unredistributable!

x265 version:

$ x265 --version
x265 [info]: HEVC encoder version 2.2+23-58dddcf01b7d
x265 [info]: build info [Linux][GCC 6.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2

Encoding & testing platform:

$ uname -sr
Linux 2.6.32-573.8.1.el6.x86_64
$ cat /etc/redhat-release 
CentOS release 6.8 (Final)
$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz

4.) Answers

Q: “Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?

A: It can probably stream something that at least resembles the original source in a recognizable fashion, but… whether you can call that “useful” or not is another thing entirely…

Q: “What would H.264/AVC look like in that case?”

A: Like shit! :roll:

Jul 272016
 

x264 logoSince I’ve been doing a bit of Anime batch video transcoding with x264 and x265 in the last few months, I thought I’d document this for myself here. My goal was to loop over an arbitrary amount of episodes and just batch-transcode them all at once. And that on three different operating systems: Windows (XP x64), Linux (CentOS 6.8 x86_64) and FreeBSD 10.3 UNIX, x86_64. Since I’ve started to split the work across multiple machines, I lost track of what was where and which machine finished what, and when.

So I thought, why not let the loop send me a small notification email upon completion? And that’s what I did. On Linux and UNIX this relies on the bash shell and the mailx command. Please note that I’m talking about [Heirloom mailx], not some other mail program by the same name! I’m mentioning this, because there is a different default mailx on FreeBSD, that won’t work for this. That’s why I put alias mailx='/usr/local/bin/mailx' in my ~/.bash_profile on that OS after installing the right program to make it the default for my user.

On Windows, my loops depend on my own [colorecho] command (you can replace that with cmds’ ECHO if you want) as well as the command line mailer [blat]. Note that, if you need to use SSL/TLS encryption when mailing, blat can’t do that. A suitable replacement could be [mailsend]. Please note, that mailsend does not work on Windows XP however.

In the x265 case, avconv (from the [libav] package) is required on all platforms. You can get my build for Windows [here]. If you don’t like it, the wide-spread [ffmpeg] can be a suitable drop-in replacement.

Now, when setting up blat on Windows, make sure to run blat -help first, and learn the details about blat -install. You need to run that with certain parameters to set it up for your SMTP mail server. For whatever reason, blat reads some of that data from the registry (ew…), and blat -install will set that up for you.

Typically, when I transcode, I do so on the elementary streams rather than .mkv files directly. So I’d loop through some source files and extract the needed streams. Let’s say we have “A series – episode 01.mkv” and some more, all the way up to “A series – episode 13.mkv”, then, assuming track #0 is the video stream…

On Windows:

FOR %I IN (01,02,03,04,05,06,07,08,09,10,11,12,13) DO mkvextract tracks "A series - episode %I.mkv" ^
 0:%I\video.h264

On Linux/UNIX:

for i in {01..13}; do mkvextract tracks "A series - episode $i.mkv" 0:$i/video.h264; done

mkvextract will create the non-existing subfolder for us, and a x264 transcoding loop would then look like this on Windows:

expand/collapse source code
cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^
 (01,02,03,04,05,06,07,08,09,10,11,12,13) DO "c:\Program Files\VFX\x264cli\x264-10b.exe" --fps ^
 24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f ^
 -2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats %I\v.stats -t 2 --no-fast-pskip ^
 --cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o %I\pass1.264 & colorecho "Pass 1 ^
 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & ^
 "c:\Program Files\VFX\x264cli\x264-10b.exe" --fps 24000/1001 --preset veryslow --tune animation ^
 --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats ^
 %I\v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o ^
 %I\pass2.264 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10) & echo !SERIES! ^
 transcoding complete | blat - -t "myself@another.mailhost.com" -c "myself@mailhost.com" -s "x264 ^
 notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET SERIES="

Note that I always write all the iteration out in full here. That’s because cmd can’t do loops with leading zeroes in the iterator. The reason for this is that those source files usually have them in their lower episode numbers. If it wasn’t 01,02, … ,12,13, but 1,2, … ,12,13 instead, you could do FOR /L %I IN (1,1,13) DO. But this isn’t possible in my case. Even if elements need alphanumeric names like here,  FOR %I IN (01,02,03,special1,special2,ova1,ova2) DO, you still won’t need that syntax on Linux/UNIX because the bash can have iterator groups like for i in {{01..13},special1,special2,ova1,ova2}; do. Makes me despise the cmd once more. ;)

Edit:

Ah, according to [this], you can actually do something like cmd /V /C "FOR /L %I IN (1,1,13) DO (SET "fI=00%I" & echo "!fI!:~-2")", holy shit. It actually works and gives you leading zeroes. :~-2 for 2 digits, :~-3 for three. Expand fI for more in this example. I mean, what is this even? Some number formatting magic? I probably don’t even wanna know… Couldn’t find any way of having several groups for the iterator however. Meh. Still don’t like it.

So, well, it’s like this on Linux/UNIX:

expand/collapse source code
(export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do nice -n19 x264 --fps \
24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f \
-2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats $i/v.stats -t 2 --no-fast-pskip \
--cqm flat --non-deterministic --demuxer lavf $i/video.h264 -o $i/pass1.264 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 1 done for episode $i/$EPNUM of $SERIES\e[0m" && echo && nice -n19 \
x264 --fps 24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid \
normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats $i/v.stats -t 2 --no-fast-pskip --cqm flat \
--non-deterministic --demuxer lavf $i/video.h264 -o $i/pass2.264 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 2 done for episode $i/$EPNUM of $SERIES\e[0m" && echo; done && echo \
"$SERIES transcoding complete" | mailx -s "x264 notification from $MACHINE" -r \
"myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \
smtp="smtp.mailhost.com" -S smtp-auth-user="myuser" -S smtp-auth-password="mysecurepassword" \
myself@mailhost.com)

The variable $MACHINE or %MACHINE%/!MACHINE! specifies the machines’ host name. This will be noted in the email, so I know which machine just completed something. $EPNUM – or %EPNUM%/!EPNUM! on Windows – is used for periodic updates on the shell. The output would be like “Pass 1 done for Episode 07/13 of AnimeX” in green on Windows and bold red on Linux/UNIX (just change the color to your liking).

Finally, $SERIES aka %SERIES%/!SERIES! would be the series’ name. So say, the UNIX machine named “BEAST” above is done with this loop. The email would come with the subject line “x264 notification from BEAST” and would read “AnimeX transcoding complete” in plain text. That’s all.

Please note, that cmd batch on Windows is extremely creepy. Every whitespace (especially the leading ones when doing multi-line like this for display) needs to be exactly where it is. The same goes for double quotes where you might think they aren’t needed. They are! Also, this needs delayed variable expansion once again, which is why we see variables like !EPNUM! instead of %EPNUM% and why it’s called in a subshell by running cmd /V /C.

On Linux/UNIX we don’t need to rely on some specific API like cmds’ SetConsoleTextAttribute() to print colors, as most terminals understand ANSI color codes.

And this is what it looks like for x265:

Windows:

expand/collapse source code
cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^
 (01,02,03,04,05,06,07,08,09,10,11,12,13) DO avconv -r 24000/1001 -i %I\video.h264 -f yuv4mpegpipe ^
 -pix_fmt yuv420p -r 24000/1001 - 2>NUL | "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 ^
 --fps 24000/1001 -p veryslow --pmode --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate ^
 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 ^
 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^
 --pass 1 --slow-firstpass --stats %I\v.stats --sar 1 --range full -o %I\pass1.h265 & colorecho ^
 "Pass 1 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & avconv -r 24000/1001 -i ^
 %I\video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>;NUL | ^
 "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 --fps 24000/1001 -p veryslow --pmode ^
 --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 ^
 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats %I\v.stats --sar ^
 1 --range full -o %I\pass2.h265 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" ^
 10) & echo !SERIES! transcoding complete | blat - -t "myself@another.mailhost.com" -c ^
 "myself@mailhost.com" -s "x265 notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET ^
 SERIES="

Linux/UNIX:

expand/collapse source code
(export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do avconv -r 24000/1001 -i \
$i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice -19 x265 - --y4m \
-D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 \
--rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq \
5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 \
--slow-firstpass --stats $i/v.stats --sar 1 --range full -o $i/pass1.h265 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 1 done for episode $i/$EPNUM of $SERIES\e[0m" && echo && avconv -r \
24000/1001 -i $i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice \
-19 x265 - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid \
--bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd \
1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 \
--pass 2 --stats $i/v.stats --sar 1 --range full -o $i/pass2.h265 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 2 done for episode $i/$EPNUM of $SERIES\e[0m" && echo; done && echo \
"$SERIES transcoding complete" | mailx -s "x265 notification from $MACHINE" -r \
"myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \
smtp="smtp.mailhost.com" -S smtp-auth-user="myuser" -S smtp-auth-password="mysecurepassword" \
myself@mailhost.com)

And that’s it. The loops for audio transcoding are simpler, as that part is so fast, it doesn’t need email notifications. Runs for minutes rather than days. When all is done, I’d usually fire up the MKVToolnix GUI, and prepare a mux for the first episode. There is a nice “copy command line to clipboard” function there when you click on “Muxing” after everything is set up. With that I can build another loop that muxes everything to final .mkv files. On Windows that part is more complicated if you want Unicode support, so I needed to create input files by using a generator I wrote in Perl for that, but that’s for another day… :)

Oh, and if you wanna ssh into your Linux or UNIX boxes from afar to check on your transcoders, consider launching them on a GNU screen] session. It’s immensely useful! Too bad it won’t work on the Windows cmd. :(

Jul 152016
 

H.265/HEVC logoSince I’ve started to release x265 binaries for Windows XP / XP x64 (and above), I’ve started to take interest in how it’s performance has developed over time. When looking back on x264, we can see that the encoders’ performance has always gradually improved with more and faster assembly code, optimizations for new CPUs’ instruction set extensions (think AMD FMA4, SSE4a or Intel AVX/AVX2) and so on. Over many years x264 has become a ton faster, as you can also see when comparing an old x264 v0.107.1745 (white results) with newer ones (red results) for the same CPUs in the [x264 benchmark].

So, x265 is much newer, but still, it’s been around for a few years now – since 2013 to be more precise – so I’d like to take a look at its performance trends using the options I typically apply for transcoding animated content. You can find the encoding settings and other information about the test below.

Note that I don’t compile every revision of x265, so I can compare only a few, namely the ones I chose to [release] periodically, excluding the new 2.0+2, which I haven’t released yet (waiting a bit more for a release). Here we go:

x265 performance trend across versions (2017-01-26)

x265 performance trend across versions, state 2017-01-26 (click to enlarge)

previous charts
x265 performance trends across versions, 2016-07-15

x265 performance trends across versions, state 2016-07-15

Note that this runtime graph is inverted, in the sense that “higher = better”. Version 1.7 was the first which could actually encode using the options of my choice, older versions don’t understand all of them and are not comparable in my case because of that. We start off with pretty bad performance and then we get an ample boost with the earlier 1.9 versions.

From 1.9+15 to 1.9+141 we can see a gradual performance increase, as expected from continued development and optimization. Naturally, 8-bits per color channel is fastest, as it makes use of a lot of 8-bit arithmetic internally. For higher bit depths (10 and 12 bits per channel, or 30 and 36 bits per pixel respectively) the internal arithmetic is boosted to a 16 bit precision, resulting in better outputs for finer gradients. This costs performance of course.

As expected, 10 and 12 bit runtimes are relatively close in terms of speed with 8 bit being quite far ahead.

Now the most surprising thing is the nose dive we take from 1.9+141 to 1.9+200. I have no idea what happened here?! Did they fix a major bug? Or did some performance-critical options change in the --preset veryslow that I’m using? I have no idea. But to put is in numbers easier to understand: For 8 bit, my test encode went from a runtime of 30:39.312 (1.9+141) to 41:20.296 (1.9+200)! Ah, the time format being MM:SS.sss, M = minutes, S = seconds and s = milliseconds. That’s almost -35%! It’s less pronounced for the higher bit depths with -25% for 10 bits and -24% for 12 bits, but that’s still significant. Maybe I shouldn’t have deleted the output data, might have been useful to look at the H.265 stream headers.

Well, from there on out we continue to see a gradual performance increase again, in a steady and stable fashion. But what happened between 1.9+141 and 1.9+200? I don’t know. Something major must have changed, I just don’t know what it was exactly…

Maybe somebody can enlighten me a bit[1], so here are the options used, right from the benchmarking script:

expand/collapse source code
  1. @ECHO OFF
  2.  
  3. :: ###################################################################################
  4. :: #                                                                                 #
  5. :: #  Please note, that command lines passed to timethis.exe may not be longer than  #
  6. :: #  1024 characters including expanded variables, so we need to keep it short      #
  7. :: #  enough. Some of the options passed to x265 may already be superflous due to    #
  8. :: #  being included in the veryslow preset, but I did not really double-check that. #
  9. :: #  Instead, I kept it just short enough by using shorter file names instead.      #
  10. :: #                                                                                 #
  11. :: #  Where to get the files:                                                        #
  12. :: #    * x265.exe     : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  13. :: #    * avconv.exe   : http://wp.xin.at/x265-for-winxp-and-winxp-x64               #
  14. :: #    * colorecho.exe: http://wp.xin.at/archives/3064                              #
  15. :: #    * timethis.exe : https://support.microsoft.com/en-us/kb/927229               #
  16. :: #                                                                                 #
  17. :: #  GPL v3 license for this scripts code, colorecho.exe, parts of avconv.exe and   #
  18. :: #  the CygWin companion libraries for avconv.exe:                                 #
  19. :: #    * https://www.gnu.org/licenses/gpl-3.0.en.html                               #
  20. :: #                                                                                 #
  21. :: #  GPL v2 license for x265.exe:                                                   #
  22. :: #    * https://www.gnu.org/licenses/old-licenses/gpl-2.0.html                     #
  23. :: #                                                                                 #
  24. :: #  LGPL v3 license for other parts of avconv.exe:                                 #
  25. :: #    * https://www.gnu.org/licenses/lgpl-3.0.html                                 #
  26. :: #                                                                                 #
  27. :: #  NT Resource Kit license for timethis.exe:                                      #
  28. :: #    * https://enterprise.dejacode.com/license_library/Demo/ms-nt-resource-kit    #
  29. :: #                                                                                 #
  30. :: #  The x265 encoder  : http://x265.org                                            #
  31. :: #  The libav         : https://libav.org                                          #
  32. :: #  The Cygwin project: https://www.cygwin.com                                     #
  33. :: #                                                                                 #
  34. :: ###################################################################################
  35.  
  36. :: First round, 8 bits per channel color depth (MAIN profile), 8 bit arithmetic:
  37. FOR %%I IN (1.7-8b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  38.  ""Testing v%%I, 8 bits..."" 10 & .\timethis.exe ""echo x265 v%%I-8b & .\avconv.exe -r ^
  39.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  40.  .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  41.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  42.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  43.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  44.  --slow-firstpass --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p1.h265 2>NUL & ^
  45.  .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  46.  24000/1001 - 2>NUL | .\x%%I.exe - --y4m -D 8 --fps 24000/1001 -p veryslow ^
  47.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  48.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  49.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^
  50.  --pass 2 --stats %%I-8b.stats --sar 1 --range full -o %%I-8b-p2.h265 2>NUL"" 1> ^
  51.  .\results-%%I-8b.txt 2>.\timethis-errorlog-%%I-8b.txt
  52.  
  53. :: Second round, 10 bits per channel color depth (MAIN10 profile), 16 bit arithmetic:
  54. FOR %%J IN (1.7-10b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  55.  ""Testing v%%J, 10 bits..."" 10 & .\timethis.exe ""echo x265 v%%J-10b & .\avconv.exe -r ^
  56.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  57.  .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  58.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  59.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  60.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  61.  --slow-firstpass --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p1.h265 2>NUL ^
  62.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  63.  24000/1001 - 2>NUL | .\x%%J.exe - --y4m -D 10 --fps 24000/1001 -p veryslow ^
  64.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  65.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  66.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  67.  2 --stats %%J-10b.stats --sar 1 --range full -o %%J-10b-p2.h265 2>NUL"" 1> ^
  68.  .\results-%%J-10b.txt 2>.\timethis-errorlog-%%J-10b.txt
  69.  
  70. :: Second round, 12 bits per channel color depth (MAIN12 profile), 16 bit arithmetic:
  71. FOR %%K IN (1.7-12b 1.9+15 1.9+108 1.9+141 1.9+200 1.9+210 1.9+230) DO .\colorecho.exe ^
  72.  ""Testing v%%K, 12 bits"" 10 & .\timethis.exe ""echo x265 v%%K-12b & .\avconv.exe -r ^
  73.  24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL | ^
  74.  .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes ^
  75.  16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 ^
  76.  --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
  77.  --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 ^
  78.  --slow-firstpass --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p1.h265 2>NUL ^
  79.  & .\avconv.exe -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r ^
  80.  24000/1001 - 2>NUL | .\x%%K.exe - --y4m -D 12 --fps 24000/1001 -p veryslow ^
  81.  --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2000 --rect --amp --aq-mode 3 ^
  82.  --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 ^
  83.  --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass ^
  84.  2 --stats %%K-12b.stats --sar 1 --range full -o %%K-12b-p2.h265 2>NUL"" 1> ^
  85.  .\results-%%K-12b.txt 2>.\timethis-errorlog-%%K-12b.txt
  86.  
  87. .\colorecho.exe "Benchmarks completed, cleaning up..." 10
  88.  
  89. :: Removing output and statistics files:
  90. del /Q *p*.h265 *stats*
  91.  
  92. .\colorecho.exe "All done, results are to be found in the results-*.txt files!" 10

So if I missed some critical changes that happened in between 1.9+141 and 1.9+200, please let me know! Oh, and here are the exact system specifications, in case it matters (it probably doesn’t):

  • Intel Xeon X5690 3.33GHz (SSE4.2 is max, no AVX), running at an all-core turbo of 3.6GHz
  • 24GB DDR-III/1066 10-10-10 2T
  • X58 chipset
  • Windows XP Professional x64 Edition SP2 with all Server 2003 R2 x64 updates

I guess I’ll keep doing the performance evaluations from here on out just to see how the encoder evolves over time, performance-wise… And maybe I’ll redo 1.9+141 and one of the newer versions and parse the stream headers to see if the effective encoding options differ anywhere after all. If yes, I’ll update this post!

Update 2017-01-27:

Three new x265 versions have been added to the performance trend analysis, 2.0+54, 2.1+60 and 2.2+22. Much to my dismay, there haven’t been any big developments when it comes to x265s’ performance on x86 machines over the last 6-8 months. Since 1.9+230 it’s pretty much linear, with only minimal variance.

So there have been changes and new options and all that, but it seems that the basic speed of the encoder in the --preset veryslow, implying --no-rskip has stayed the same. Maybe a Google Summer of Code performance challenge would do something for x265? As far as I can remember this worked miracles for x264 in the past as well, if my memory serves me right. Let’s see if we’ll get any significant improvements in the future…

[1] Thanks to “Particular” this [has been clarified] in the comments.

Jun 032016
 

H.265/HEVC logoAnd here’s another x265 build for Windows XP and Windows XP x64, following [1.9+141]. As usual, these work on modern versions of Windows just as well. Again, built with Microsoft Visual Studio 2010 SP1 and tested for correct encodes for 8-bit, 10-bit and 12-bit color depth. The 8-bit test has been done using the x86_32 version, the 10- & 12-bit tests has been done with the x86_64 version. I’m not running complicated test suits on this, just a simple encode with manual output checking.

Here is the software for 32-bit and 64-bit systems:

As usual, the builds depend on the Microsoft Visual C++ 2010 runtime which you can download from Microsoft [for 32-bit systems] and [for 64-bit systems] if you don’t have it already.

This time around, it’s a pure binary release, giving you the x265.exe and libx265.dll. I think I’m gonna keep it that way. It’s meant for users, not developers anyway.

I’m thinking I might create a project page for this, so that all releases get consolidated on a single spot, that’d probably better than creating a new post for each and every build I’m pushing out. If I’m gonna do that, links to it will be added to each post regarding information about how to build x265 for WinXP+, and also to all binary release posts.

Such a page could also give you an avconv release on the spot, so you can work with all kinds of video input to your liking, given that x265 can only accept raw YUV video by itself. Just need to build a 32-bit version of libav as well then.

Oh well, have fun! :)

Update: All x265 releases have now been consolidated on [this page]! All future XP- and XP x64-compatible releases of x265 plus a relatively recent version of avconv to act as a decoder for feeding x265 with any kind of input video streams will be posted there as well.

Apr 192016
 

H.265/HEVC logoLike I said, I’ll keep doing these. Following version [1.9+108], here comes another build of the x265 encoder for Windows XP+ and Windows XP x64/Server 2003 x64+, this time it’s version 1.9+141. I’m not sure for how long the developers at Multicoreware are going to keep up support for NT 5.1/5.2 based operating systems, but for as long as they do, I’ll keep releasing builds for the old MS operating systems. Just keep in mind that I’m not running automated build & test systems, so I’m going to release selected binaries every 1-2 months or so. If you need a specific version, please just request it (or try and build it yourself, see previous posts).

Whenever Multicoreware does drop support I’ll still continue as long as it’s easily patchable. We can’t be sure of anything though, they’ve dropped deep color support (10-bit/12-bit per color channel) on 32-bit x86 platforms before, so…

Well, here is 1.9+141:

Once more, this has been built with Microsofts’ VisualStudio 2010 SP1 + yasm 1.3.0, and tested doing a 2-pass encode & quick output video verification for all color depths. Requires the MS VC++ 2010 runtime, you can get it here: [32-bit version], [64-bit version].

Apr 082016
 

H.265/HEVC logoPreviously, I have shown you [how to compile x265 on Windows] using Microsoft Visual Studio 2010 in a way that results in binaries compatible with Windows NT 5.1/5.2, or in other words: Windows XP, XP x64 and Windows Server 2003. And while that works for most purposes, today I’d like to show you how to build an actual multilib binary, that can handle all three color bit depths supported by x265, the standardized 8- and 10-bit (MAIN and MAIN10 profiles) as well as 12-bit (MAIN12 profile). With that, it’s all in one exe instead of three. As before though, multilib x265 is only supported on 64-Bit Windows. But first, once again…

1.) Giving you the binaries

There were a lot of improvements since the last version I published back in February of course, also performance-wise. So here’s the current version from Multicoreware for both 32-bit and 64-bit Windows, compiled with MSVC 2010 SP1 and yasm 1.3.0. This requires the Microsoft Visual C++ 2010 runtime to work, see previous article:

This time around, the binaries have been tested as well! On regular 32-bit Windows XP, only fundamental binary compatibility was tested. However, all versions, so the 32-Bit one and the 64-Bit multilib ones have been ran through a 2-pass ABR encoding test with output verification for 8-bit color depth (32- & 64-bit) as well as 10- and 12-bit color depths (64-bit only) on Windows XP Professional x64 Edition using the following command line (see previous post for details):

avconv -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL^ 
 | .\x265.exe - --y4m -D 10 --fps 24000/1001 -p veryslow^
 --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3^
 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0^
 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1^
 --slow-firstpass --stats v.stats --sar 1 --range full -o pass1.h265 & avconv^
 -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>NUL^
 | .\x265.exe - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6^
 --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75^
 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1^
 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2^
 --stats v.stats --sar 1 --range full -o pass2.h265

Needless to say, they should work fine on Windows Vista/7/8/8.1/10/Server 2008/Server 2012/HS 2007/HS 2011 as well.

From time to time, I’ll release new binaries, so you might wanna check back every few months or so, if you’re interested. You can also request a build in the comments if you’re growing impatient and need a specific version more quickly because of some bugfix / feature improvement in x265.

2.) Compiling an XP/2003-compatible x265 multilib binary yourself

First, please look at the previous article I linked to in the beginning, point 2. You need the software prerequisites listed in 2a and you might still wish to read through 2b to understand some of the stuff better. You don’t need to actually run any of the commands shown there though.

Now, the multilib build is done a bit differently from the rest, as everything is scripted, so this is 100% command line work, no graphical cmake, no running the full Visual Studio IDE. Usually, with all software in place, sitting in the root directory of the x265 source tree, all you need to do is to go to build\vc10-x86_64\ and run ./multilib.bat. This won’t give us an XP/2003-compatible binary however, and the reason lies within the build script multilib.bat, here is the stock version:

expand/collapse source code (multilib.bat)
  1. @echo off
  2. if "%VS100COMNTOOLS%" == "" (
  3.   msg "%username%" "Visual Studio 10 not detected"
  4.   exit 1
  5. )
  6.  
  7. call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
  8.  
  9. @mkdir 12bit
  10. @mkdir 10bit
  11. @mkdir 8bit
  12.  
  13. @cd 12bit
  14. cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DMAIN12=ON
  15. if exist x265.sln (
  16.   MSBuild /property:Configuration="Release" x265.sln
  17.   copy/y Release\x265-static.lib ..\8bit\x265-static-main12.lib
  18. )
  19.  
  20. @cd ..\10bit
  21. cmake -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
  22. if exist x265.sln (
  23.   MSBuild /property:Configuration="Release" x265.sln
  24.   copy/y Release\x265-static.lib ..\8bit\x265-static-main10.lib
  25. )
  26.  
  27. @cd ..\8bit
  28. if not exist x265-static-main10.lib (
  29.   msg "%username%" "10bit build failed"
  30.   exit 1
  31. )
  32. if not exist x265-static-main12.lib (
  33.   msg "%username%" "12bit build failed"
  34.   exit 1
  35. )
  36. cmake -G "Visual Studio 10 Win64" ../../../source -DEXTRA_LIB="x265-static-main10.lib;x265-static-main12.lib" -DLINKED_10BIT=ON -DLINKED_12BIT=ON
  37. if exist x265.sln (
  38.   MSBuild /property:Configuration="Release" x265.sln
  39.   :: combine static libraries (ignore warnings caused by winxp.cpp hacks)
  40.   move Release\x265-static.lib x265-static-main.lib
  41.   LIB.EXE /ignore:4006 /ignore:4221 /OUT:Release\x265-static.lib x265-static-main.lib x265-static-main10.lib x265-static-main12.lib
  42. )
  43.  
  44. pause

So I took all the options from the files generated by the original cmake when doing the normal build, and added them to the script to ensure our output binaries would be XP-compatible. This is the fixed build script:

expand/collapse source code (multilib.bat, patched for XP/2003)
  1. @echo off
  2. if "%VS100COMNTOOLS%" == "" (
  3.   msg "%username%" "Visual Studio 10 not detected"
  4.   exit 1
  5. )
  6.  
  7. call "%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
  8.  
  9. @mkdir 12bit
  10. @mkdir 10bit
  11. @mkdir 8bit
  12.  
  13. @cd 12bit
  14. cmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_CONFIGURATION_TYPES="Release" -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DWINXP_SUPPORT=ON -DENABLE_SHARED=OFF -DENABLE_CLI=OFF -DMAIN12=ON
  15. if exist x265.sln (
  16.   MSBuild /property:Configuration="Release" x265.sln
  17.   copy/y Release\x265-static.lib ..\8bit\x265-static-main12.lib
  18. )
  19.  
  20. @cd ..\10bit
  21. cmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_CONFIGURATION_TYPES="Release" -G "Visual Studio 10 Win64" ../../../source -DHIGH_BIT_DEPTH=ON -DEXPORT_C_API=OFF -DWINXP_SUPPORT=ON -DENABLE_SHARED=OFF -DENABLE_CLI=OFF
  22. if exist x265.sln (
  23.   MSBuild /property:Configuration="Release" x265.sln
  24.   copy/y Release\x265-static.lib ..\8bit\x265-static-main10.lib
  25. )
  26.  
  27. @cd ..\8bit
  28. if not exist x265-static-main10.lib (
  29.   msg "%username%" "10bit build failed"
  30.   exit 1
  31. )
  32. if not exist x265-static-main12.lib (
  33.   msg "%username%" "12bit build failed"
  34.   exit 1
  35. )
  36. cmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_CONFIGURATION_TYPES="Release" -G "Visual Studio 10 Win64" ../../../source -DWINXP_SUPPORT=ON -DEXTRA_LIB="x265-static-main10.lib;x265-static-main12.lib" -DLINKED_10BIT=ON -DLINKED_12BIT=ON
  37. if exist x265.sln (
  38.   MSBuild /property:Configuration="Release" x265.sln
  39.   :: combine static libraries (ignore warnings caused by winxp.cpp hacks)
  40.   move Release\x265-static.lib x265-static-main.lib
  41.   LIB.EXE /ignore:4006 /ignore:4221 /OUT:Release\x265-static.lib x265-static-main.lib x265-static-main10.lib x265-static-main12.lib
  42. )
  43.  
  44. pause

You can just rename your original script for backup and put the fixed code in its place, build\vc10-x86_64\multilib.bat, then run it on the command line. If all the required tools are present, it will compile a 12-bit library, then a 10-bit library (both static) and finally an 8-bit binary that will have the other two libraries statically linked in. The final x265.exe can then be found in build\vc10-x86_64\8bit\Release\. To check whether it’s the real thing, look for the bitness by running .\x265.exe --version while sitting in that folder on the command line. You should see something like this:

x265 multilib binary

A x265 multilib binary shows that it’s “8-bit+10-bit+12-bit”

Per-color-channel bitness can be defined with x265s’ command line option -D. So that’d be -D 8, -D 10 or -D 12. Note that only 8- and 10-bit are part of the official Blu-Ray UHD/4k specification however.

3.) A side note

In case you’re new to this, you might not get why “8-bit” and “10-bit” etc. Aren’t color spaces supposed to be 16-bit, 24-bit, 32-bit etc.? Well, it seems that in the world of video processing, people don’t refer to whole color space bitness, but rather individual color channel bitness. So with three channels (red, green, blue for instance), you’d have 8/10/12 bits per channel, so that’s 24-, 30- and 36-bit total, or 16.7 million, 1 billion and 64 billion colors.

The more important part – and the reason why nobody encodes to 12-bit – is the internal arithmetic precision of x265 though (same applies to x264). At 8-bit color depth, arithmetic precision is also at 8-bits. When you hop over to 10-bit, you can’t use 8-bit operations and data types any longer, so everything is done at 16-bit precision. This makes the code slower, but also more efficient in preserving color gradients. Since 10-bit H.265/HEVC is officially a part of Blu-Ray UHD/4k, this would be the sweet spot, unless you’re dealing with devices too slow to play it.

Going to 12-bit won’t boost the precision further, it just gives you more colors, that most of today’s displays won’t be able to show anyway. Not much benefit.

So that’s that.

Have fun! :)

Update: All x265 releases have now been consolidated on [this page]! All future XP- and XP x64-compatible releases of x265 plus a relatively recent version of avconv to act as a decoder for feeding x265 with any kind of input video streams will be posted there as well.

Mar 152016
 

H.265/HEVC logoJust recently, I’ve tested the computational cost of decoding 10-bit H.265/HEVC on older PCs as well as Android devices – with some external help. See [here]. The result was, that a reasonable Core 2 Quad can do 1080p @ 23.976fps @ 3MBit/s in software without issues, while a Core 2 Duo at 1.6GHz will fail. Also, it has been shown that Android devices – even when using seriously fast quad- and octa-core CPUs can’t do it fluently without a hardware decoder capable of accelerating 10-bit H.265. To my knowledge there is a hack for Tegra K1- and X1-based devices used by MX Player, utilizing the CUDA cores to do the decoding, but all others are being left behind for at least a few more months until Snapdragon 820 comes out.

Today, I’m going to show the results of my tests on Intel Skylake hardware to see whether Intels’ claims are true, for Intel has said that some of their most modern integrated GPUs can indeed accelerate 10-bit video, at least when it comes to the expensive H.265/HEVC. They didn’t claim this for all of their hardware however, so I’d like to look at some lower-end integrated GPUs today, the Intel HD Graphics 520 and the Intel HD Graphics 515. Here are the test systems, both running the latest Windows 10 Pro x64:

  • HP Elitebook 820 G3 (tiny)
  • HP Elitebook 820 G3
  • CPU: Intel [Core i5-6200U]
  • GPU: Intel HD Graphics 520
  • RAM: 8GB DDR4/2133 9-9-9-28-1T
  • Cooling: Active
  • HP Elite X2 1012 G1 (tiny)
  • HP Elite X2 1012 G1 Convertible
  • CPU: Intel [Core m5-6Y54]
  • GPU: Intel HD Graphics 515
  • RAM: 8GB LPDDR3/1866 14-17-17-40-1T
  • Cooling: Passive

Let’s look at the more powerful machine first, which would clearly be the actively cooled Elitebook 820 G3. First, let’s inspect the basic H.265/HEVC capabilities of the GPU with [DXVAChecker]:

DXVAChecker on an Intel HD Graphics 520

DXVAChecker looks good with the latest Intel drivers provided by HP (version 4331): 10-Bit H.264/HEVC is being supported all the way up to 8K!

And this is the ultra-low-voltage CPU housing the graphics core:

Intel Core i5-6200U

Intel Core i5-6200U

So let’s launch the Windows media player of my choice, [MPC-HC], and look at the video decoder options we have:

In any case, both HEVC and UHD decoding have to be enabled manually. On top of that, it seems that either Intels’ proprietary QuickSync can’t handle H.265/HEVC yet, or MPC-HC simply can’t make use of it. The standard Microsoft DXVA2 API however supports it just fine.

Once again, I’m testing with the Anime “Garden of Words” in 1920×1080 at ~23.976fps, but this time with a smaller slice at a higher bitrate of 5Mbit. The encoding options were as follows for pass 1 and pass 2:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 5000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

Let’s look at the performance during some intense scenes with lots of rain at the beginning and some less taxing indoor scenes later:

There is clearly some difference, but it doesn’t appear to be overly dramatic. Let’s do a combined graph, putting the CPU loads for GPU-assisted decoding over the regular one as an overlay:

CPU load with software decoding in blue and DXVA2 GPU-accelerated hardware decoding in red

Blue = software decoding, magenta (cause I messed up with the red color) = GPU-assisted hardware decoding

Well, using DXVA2 does improve the situation here, even if it’s not by too much. It’s just that I would’ve expected a bit more here, but I guess that we’d still need to rely on proprietary APIs like nVidia CUVID or Intel QuickSync to get some really drastic results.

Let’s take a look at the Elite X2 1012 G1 convertible/tablet with its slightly lower CPU and GPU clock rates next:

Its processor:

Core m5-6Y54

Core m5-6Y54

And this is, what DXVAChecker has to say about its integrated GPU:

DXVAChecker on an Intel HD Graphics 515

Whoops… Something important seems to be missing here…

Now what do we have here?! Both HD Graphics 520 and 515 should be [architecturally identical]. Both are GT2 cores with 192 shader cores distributed over 24 clusters, 24 texture mapping units as well as 3 rasterizers. Both support the same QuickSync generation. The only marginal difference seems to be the maximum boost clock of 1.05GHz vs. 1GHz, and yet HD Graphics 515 shows no sign of supporting the Main10 profile for H.264/HEVC (“HEVC_VLD_Main10”), so no GPU-assisted 10-bit decoding! Why? Who knows. At the very least they could just scratch 8K support, and implement it for SD, HD, FHD and UHD 4K resolutions. But nope… Only 8-bit is supported here.

I even tried the latest beta driver version 4380 to see whether anything has changed in the meantime, but no; It behaves in the same way.

Let’s look at what that means for CPU load on the slower platform:

CPU load with software decoding

The small Core m5-6Y54 has to do all the work!

We can see that we get close to hitting the ceiling with the CPUs’ boost clock going up all the way. This is problematic for thermally constrained systems like this one. During a >4 hour [x264 benchmark run], the Elite X2 1012 G1 has shown that its 4.5W CPU can’t hold boost clocks this high for a long time, given the passive cooling solution. Instead, it sat somehwere in between 1.7-2.0GHz, mostly in the 1.8-1.9GHz area. This might still be enough with bigger decoding buffers, but DXVA2 would help a bit here in making this slightly less taxing on the CPU, especially considering higher bitrates or even 4K content. Also, when upping the ambient temperature, the runtime could be pushed back by almost an hour, pushing the CPU clock rate further down by 100-200MHz. So it might just not play that movie on the beach in summer at 36°C. ;)

So, what can we learn from that? If you’re going for an Intel/PC-based tablet, convertible or Ultrabook, you need to pick your Intel CPU+graphics solution wisely, and optimally not without testing it for yourself first! Who knows what other GPUs might be missing certain GPU video decoding features like HD Graphics 515 does. Given that there is no actual compatibility matrix for this as of yet (I have asked Intel to publish one, but they said they can’t promise anything), you need to be extra careful!

For stuff like my 10-bit H.265/HEVC videos at reasonably “low” bitrates, it’s likely ok even with the smallest Core m3-6Y30 + HD Graphics 515 that you can find in devices like Microsofts’ own Surface Pro 4. But considering modern tablets’ WiDi (Wireless Display) tech with UHD/4K resolutions, you might want to be careful when choosing that Windows (or Linux) playback device for your big screens!

Feb 232016
 

H.265/HEVC logoAfter [compiling] and running the x265 HEVC encoder, and after [looking at its quality] for animated content, here’s another little piece of information about my experiments with H.265/HEVC. And this time it’s the decoding part. Playing H.265-encoded videos on the PC is relatively easy. On Windows I tend to use [MPC-HC] for this, and on Linux/UNIX you can use [mplayer] or [VLC]. The newest versions of those players are all linked against a modern libav or ffmpeg library collection, so they can decode anything any H.265 encoder can throw at them.

The questions are: At what price? And: What about mobile devices?

H.265/HEVC is costly in terms of computation, and not just in the encoding stage. Decoding this stuff is hard as well. So I looked at two older Core 2 processors to see how they fare when decoding regular 10-bit H.264/AVC and the same content encoded as 10-bit H.265/HEVC, both times at the same bitrate of 3Mbit/s ABR. Again, the marvelous Anime movie “The Garden of Words” was used for this. The video player of my choice for playback was [MPC-HC] v1.7.10, rendering to a VMR9 surface.

On top of that, I can also provide some insight on how relatively modern Android devices will handle this (devices partly without a H.265 hardware decoder chip however!), all thanks to [Umlüx], who’s been willing to install the necessary Apps and run some tests! On Android, [MX Player] was used.

For the record, the encoding settings were like this for x264 (pass 1 & pass 2), …

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat
--non-deterministic

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 3000 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic

…and this for x265:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 3000 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

PC first:

Contender #1 is my old Sony TT45X subnotebook, and this is the processor inside:

CPU-Z on a Core 2 Duo SU9600

A Core 2 Duo SU9600 Penryn, which is an ULV processor at 1.6GHz, shown under all-core load, where it doesn’t boost to 1.83GHz any longer (And yeah, this is still WinXP+POSReady2009).

Contender #2 is my secondary workstation at work, that I recently upgraded with an SSD and a better processor we had lying around. It’s using this chip now:

CPU-Z on a Core 2 Quad Q9505

A Core 2 Quad Q9505 Yorkfield, which has been overclocked to a rock solid 3.4GHz.

Let’s throw some video at them! First, we’ll try some good old H.264 on the slow-clocked Core 2 Duo mobile chip:

Core 2 Duo playing the beginning of "The Garden of Words" as 3Mbit H.264/AVC

The Core 2 Duo SU9600 playing the beginning of “The Garden of Words” as 3Mbit H.264/AVC (on a German version of Windows).

We can see some high load there. Mind you, since this is 10-bit H.264, there is no GPU acceleration whatsoever, minus maybe the bicubic scaler, which is implemented as Pixel Shader 2.0 code. Everything else has to be done by the CPU and its SSE extensions. Let’s take a look at the new H.265 then:

Core 2 Duo playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

Same thing as before, but with 3Mbit H.265/HEVC.

The beginning of the movie, which shows only some moving logo on mostly uniform background and then some static text does just fine. You can really see what’s happening on screen when looking at the usage curve above. As soon as the first serious video frame with lots of movement starts to fade in, the machine just falls on its face, as the bitrate is rising. Realtime playback can no longer be achieved, A/V synchronization is being lost and a ton of frames are being dropped, resulting in a horrible experience.

Thing is, even with all the frame drops, it’s still losing sync even and falling short of delivering those frames which are still being decoded with the proper timing. It’s just abysmal.

So let’s change our environment to a CPU with twice the cores and roughly twice the clock speed, while staying with the same architecture / instruction set, H.264 first:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.264/AVC

The quad-core Q9505 hardly has any trouble with H.264 at all. It’s completely smooth sailing.

And with the harder stuff:

Core 2 Quad playing the beginning of "The Garden of Words" as 3Mbit H.265/HEVC

We can see that the load has roughly doubled with H.265.

Now, H.265 is clearly putting some load even on the quad-core. I’m assuming that with higher bitrates as we would use for 4K/UHD material, this processor might be in trouble. For 1080p and bitrates up to maybe 6Mbit/s it should still be ok however. Of course, with the most modern graphics cards you can give even a Core 2 a companion device which can do H.265 decoding in hardware, like an NVidia GPU that can provide you with [PureVideo] feature sets E or even F, or maybe an AMD Radeon with  [UVD] level 6. But even then, 10-bit content might not be accelerated, so you need to be careful when choosing your GPU. Intel has recently added 10-bit support to some of their onboard GPUs (HD Graphics 5500 & 6000, Iris Graphics 6100) with driver v15.36.14.4080, nVidia added it starting with the Maxwell generation and AMD Radeons still don’t have it as far as I know.

Now, what about mobile devices? A fast enough PC can do H.265/HEVC even without hardware assist, at the very least in 1080p, and likely 4K as well, when we look at modern Core i3/i5/i7 and somewhat comparable AMD Athlon FX chips. How about multicore ARMv7/v8 chips with NEON instruction set extensions?

Let’s look at two Android devices, a Sony Xperia Tablet Z2 (without a H.265 hardware decoder) and a Sony Xperia Z5 phone:

The Tablet features the following CPU:

The Xperia Tablet Z2 uses a Qualcomm Snapdragon 801

The Xperia Tablet Z2 uses a Qualcomm Snapdragon 801 quad-core.[1]

That’s not exactly a slow chip, but an out-of-order execution pipeline with NEON extensions at a decent clock rate. And the phone:

The Xperia Z5 has a Qualcomm Snapdragon 810

The Xperia Z5 has a Qualcomm Snapdragon 810 in big.LITTLE configuration.[1]

So the phone has a very modern 64-bit ARMv8 big.LITTLE CPU setup with four faster out-of-order cores, and four slower, energy-efficient in-order cores. Optimally, all of them should be used as much as possible when throwing a seriously demanding task at the device. Let’s look at how it goes, but first on the older tablet with H.264 for starters:

The Xperia Tablet Z2 playing H.264/AVC.

The Z2 manages to play the H.264/AVC version without any stuttering, but just barely. It has to boost its clock speed all the way to the top to manage.[1]

You’re probably thinking “There’s no way this can work with H.265”, right? Well, you’d be correct:

With H.265, the Xperia Tablet Z2 stumbles.

With 3-Mbit H.265, the Xperia Tablet Z2 stumbles. Full clock speed, all cores under load, no chance to play demanding scenes without massive problems.[1]

It just bombs. Frame drops and stuttering, no way you’d want to watch anything when it goes like that. Some of the calmer scenes (=lower bitrate) still work, but lots of rain all over the frame and the Z2 is done for. So let’s move to the more modern hybrid-core processor of the Sony Xperia Z5 smartphone:

The Z5 playing the 10-bit H.264 on CPU only

The Z5 playing the 10-bit H.264 on CPU only. The octa-core big.LITTLE processor seems to load its fat cores mostly, which makes sense with demanding workloads.[1]

The big Cortex-A57 cores seem to be doing most of the work here, clocking at their maximum speed. Given we’re over 50% load with the most difficult scenes however, some threads seem to be pushed over to the smaller Cortex-A53 cores as well. In any case, the end result is that H.264 works smoothly throughout the movie. But still, load is high, and our only headroom left are some free, slow in-order cores… So, H.265?

The Z5 fails with 10-bit H.264 too. Where is my hardware decoder?!

The Z5 fails as well. While it can cope a little better, scenes like the above are just too much. Where is my hardware decoder?![1]

Something strange is happening now: H.265/HEVC hardware decoders should support this 10-bit H.265 file just fine. But for some reason, MX Player still falls back to software decoding, despite the player being up-to-date – something even a CPU as powerful as a Snapdragon 810 cannot handle for all parts of “The Garden of Words” at the given settings. The really demanding scenes will still fail to play back decently. CPU clock is also lowered a bit now, probably because of power and/or heat management.

My assumption would be that MX Player simply doesn’t support the HW decoder in this phone yet, which is a shame if it’s true. Another reason might be that I used some parameters and/or features of H.265 in this encode, that are not implemented in this chip. Whichever the case may be, the Snapdragon 810 alone cannot handle it either!

Update: After further research it turns out that almost no hardware accelerator supports Hi10p or Main-10 for H.265. In other words: No 10-bit decoding for H.265! It’s possible on NVidias Tegra K1 & X1 due to some CUDA hacks in MX Player, but nowhere else it seems. The upcoming Snapdragon 820 should however support it, and devices based on it should become available around March 2016:

Snapdragon 820s' Advancements

This is what Snapdragon 820 (MSM8996) should give us over Snapdragon 810 (MSM8994), including 10-bit H.265/HEVC decoding in hardware ([source]Russian flag)!

And this concludes my little performance analysis, after which I can say that either you need a relatively ok PC to use H.265, or a hardware decoder chip that works with your files and your software, if you’re targeting other platforms like hardware players or smartphones and tablets!

PS.: Thanks fly out to Umlüx for doing the Android tests!

[1] Images are © 2016 Umlüx, used with express permission.

Feb 202016
 

H.265/HEVC logoRecently, after [successfully compiling] the next generation x265 H.265/HEVC video encoder on Windows, Linux and FreeBSD, I decided to ask for guidance when it comes to compressing Anime (live action will follow at a later time) in the Doom9 forums, [see here]. Thing is, I didn’t understand all of the knobs x265 has to offer, and some of the convenient presets of x264 didn’t exist here (like --tune film and --tune animation). So for a newbie it can be quite hard to make x265 perform well without sacrificing far too much CPU power, as x265 is significantly more taxing on the processor than x264.

Thanks to [Asmodian] and [MeteorRain]/[LittlePox] I got rid of x265s’ blurring issues, and I took their settings and turned them up to achieve more quality while staying within sane encoding times. My goal was to be able to encode 1080p ~24fps videos on an Intel Xeon X5690 hexcore @ 3.6GHz all-core boost clock at >=1fps for a target bitrate of 2.5Mbit.

In this post, I’d like to compare 7 scenes from the highly opulent Anime [The Garden of Words] (言の葉の庭) by [Makoto Shinkai] (新海 誠) at three different average bitrates, 1Mbit, 2.5Mbit (my current x264 default) and 5Mbit. The Blu-Ray source material is H.264/AVC at roughly 28Mbit on average. Also, both encoders are running in 10-bit color depth mode instead of the common 8-bit, meaning that the internal arithmetic precision is boosted from 8- to 16-bit integers as well. While somewhat “non-standard” for H.264/AVC, this is officially supported by H.265/HEVC for Blu-Ray 4K. The mode of operation is 2-pass to aim for comparable file sizes and bitrates. The encoding speed penalty for switching from x264 to x265 at the given settings is around a factor of 8. Somewhat.

The screenshots below are losslessly compressed 1920×1080 images. Since this is all about compression, I chose to serve the large versions of the images in WebP format to all browsers which support it (Opera 11+, Chromium-based Browsers like Chrome, Iron, Vivaldi, the Android Browser or Pale Moon as the only Gecko browser). This is done, because at maximum level, WebP does lossless compression much more efficiently, so the pictures are smaller. This helps, because my server only has 8Mbit/s upstream. If your browser doesn’t support WebP (like Firefox, IE, Edge, Safari), it’ll be fed lossless PNG instead. All of this happens automatically, you don’t need to do anything!

Now, let’s start with the specs.

Specifications:

Here are the source material encoding settings according to the video stream header:

cabac=1 / ref=4 / deblock=1:1:1 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 /
mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 /
fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=1 / sliced_threads=0 / slices=4 /
nr=0 / decimate=1 / interlaced=0 / bluray_compat=1 / constrained_intra=0 / bframes=3 / b_pyramid=1 /
b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=1 / weightp=1 / keyint=24 / keyint_min=1 /
scenecut=40 / intra_refresh=0 / rc_lookahead=24 / rc=2pass / mbtree=1 / bitrate=28229 /
ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / cplxblur=20.0 / qblur=0.5 /
vbv_maxrate=31600 / vbv_bufsize=30000 / nal_hrd=vbr / filler=0 / ip_ratio=1.40 / aq=1:0.60

x264 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example:

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat
--non-deterministic

--fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0
--bitrate 2500 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic

x265 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example:

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1
--slow-firstpass --stats v.stats --sar 1 --range full

--y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect
--amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0
--rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2
--stats v.stats --sar 1 --range full

Since x265 can only read raw YUV and Y4M, the source video is being fed to it via [libavs’] avconv tool, piping it into x265. The avconv commandline for that looks as follows:

$ avconv -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null

If you want to do something similar, but you don’t like avconv, you can use [ffmpeg] as a replacement, the options are completely the same. Note that you should always specify the correct frame rates (-r) for input and output, or the bitrate setting of the encoder will be applied wrongly!

x264 on the other hand was linked against libav directly, using its decoding capabilities without any workarounds.

Let’s compare:

“The Garden of Words” has a lot of rain. This is a central story element of the 46 minute movie, and it’s hard on any encoder, because a lot of stuff is moving on screen all the time. Let’s take a look at such a scene for our first side-by-side comparison. Each comparison is done in two rows: H.264/AVC first (including the source material), and below that H.265/HEVC, also including the source.

Let’s go:

Scene 1, H.264/AVC encoded by x264 0.148.x:

Scene 1, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

It has been said that x265 performs specifically well at two things: Very high resolutions (which we don’t have here) and low bitrates. And yep, it shows. When comparing the 1Mbit shots, it becomes clear pretty quickly that x265 manages to preserve more detail for the parts with lots of motion. x264 on the other hand starts to wash out the scene pretty severely, smearing out some raindrops, spray water and parts of the foliage. Also, it’s pretty bad around the outlines as well, but that’s true for both encoders. You can spot that easily with all the aliasing artifacts on the raindrops.

Moving up a notch, it becomes very hard to distinguish between the two. When zooming in you can still spot a few minor differences (note the kids umbrella, even if it’s not marked), but it’s quite negligible. Here I’m already starting to think x265 might not be worth it in all cases. There are still differences between the two 2.5Mbit shots and the original however, see the red areas of the umbrella and the most low-contrast, dark parts of the foliage.

At 5Mbit, I really can’t see any difference anymore. Maybe the colors are a little off or something, but when seen in motion, distinguishing between the two and the original becomes virtually impossible. Given that we just threw a really difficult scene at x264 and x265, this should be a trend to continue throughout the whole test.

Now, even more rain:

Scene 2, H.264/AVC encoded by x264 0.148.x:

Scene 2, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Now this is extreme at 1Mbit! Looking at H.264, the spray water on top of the cable can’t even be told apart from the cloud in the background anymore. Detail loss all over the scene is catastrophic in comparison to the original. Tons of raindrops are simply gone entirely, and the texture details on the tower and the angled brick wall of the house to the left? Almost completely washed out and smeared.

Now, let’s look at H.265 @ 1Mbit. The spray water is also pretty bad, but it’s amazing how much more detail was preserved overall. Sure, there are still parts of the raindrops missing, but it’s much, much closer to the original. We can now see details on the walls as well, even the steep angle one on the left. The only serious issue is the red light bleeding at the tower. There is very little red there in the original, so I’m not sure what happened there. x264 does this as well, but x265 is a bit worse.

At the next level, the differences are less pronounced again, but there is still a significant enough improvement when going from x264 to x265 at 2.5Mbit: The spray water on the cable becomes more well-defined, and more rain is being preserved. Also, the textures on the walls are a tiny little bit more detailed and crisp. Once again though, x265 is bleeding too much red light at the tower.

Since it’s noticeably not fully on the level of the source still, let’s look at 5Mbit briefly. x265 is able to preserve a tiny little bit more rain detail here, coming extremely close to the original. In motion, you can’t really see the difference however.

Now, let’s get steamy:

Scene 3, H.264/AVC encoded by x264 0.148.x:

Scene 3, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

1Mbit first again: Let me just say: It’s ugly. x264 pretty much messes up the steam coming from the iron. We get lots of block artifacts now. Some of the low-contrast patterns on the ironing board are being smeared out at a pretty terrible level. Also, the bokeh background partly shows block artifacts and banding. x265 produces quite a lot of banding here itself, but no blocks. Also, outlines and sharp contrasts are more well-defined, and the low contrast part is done noticeably better.

At 2.5Mbit, the patterns repeat themselves now. The steam is only slightly better with x265, outlines are slightly more well-defined, and the low-contrast patterns are slightly more visible. For some of the blurred parts, x265 seems to be a tiny little bit to prone to banding though, in a very few spots, x264 might be just that 1% better. Overall however, x265 wins this, and even if it’s just for the better outlines.

At 5Mbit, you really need to zoom in and analyze very small details, e.g. around the outer border of the steam. Yes, x265 does better again. But you’d not really be able to notice this when watching.

How about we go cry a little bit:

Scene 4, H.264/AVC encoded by x264 0.148.x:

Scene 4, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Cutting onions would be a classic fun part in a slice-of-life anime. Here, it’s just kitchen work. And quite the bad looking one for H.264 at 1Mbit. The letters on the knife are partly lost completely, becoming unreadable. The onion parts that fly off are visibly worse than when encoded with x265 at the same bitrate. Also, x264 produced block artifacts in the blurred bokeh areas again, that simply aren’t there with x265.

On the next level, the two come much closer to each other. However, x265 simply does the outlines better. Less artifacts and sharper, just like with the writing on the knifes’ blade as well. The issues with the bokeh are nearly gone. What’s left is a negligible amount of blocking for x264 and banding for x265. Not really noticeable however.

Finally, at 5Mbit, x265 shows those ever so slightly more well-done outlines. But that’s about it, the rest looks nice for both, and pretty much identical to the source.

Now, please, dig in:

Scene 5, H.264/AVC encoded by x264 0.148.x:

Scene 5, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Let’s keep this short: x264 does blocking, and bad transitions/outlines. x265 does it better. Plain and simple.

At 2.5Mbit, x265 nearly reaches quality transparency when compared to the original, something x264 falls short of, just a bit. While x265 does the outlines and the steam part quite like in the original frame, x264 rips the outlines apart a bit too much, and slight block artifacts can again be seen for the steam part.

At 5Mbit, x264 still shows some blocking artifacts in a part that most lossy image/video compression algorithms traditionally suck at: The reds. While not true for all human beings, most eyes perceive much finer gradients for greens, then blues, and do worst with reds. Meaning, our eyes have an unequal sensitivity distribution when it comes to colors. So image and video codecs try to save bitrate in the reds first, because supposedly it’d be less noticeable. To me subjectively, x265 achieves transparency here, meaning it looks just like the original. x264 doesn’t manage entirely. Close, but not not entirely.

Next:

Scene 6, H.264/AVC encoded by x264 0.148.x:

Scene 6, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

This is a highly static scene, with only few moving parts, so there is some rain again, and some shadow cast by raindrops as well. Now, for the static parts, incremental B frames really work wonders here. Most detail is being preserved by both encoders. What’s supposed to be happening is happening: The encoders save bit rate where the human eye can’t easily tell: In the parts where stuff is moving around very quickly. That’s how we lose a lot of raindrop shadows and some drops as well. x264 seems to have trouble separating the scene into even smaller macro blocks though? Not sure if that’s the reason, but a lot of mesh detail for the basket on the balcony on the top right is lost – x265 does better there! This is maybe because x264 couldn’t distinct the moving drops from the static background so well anymore?

At 2.5Mbit, the scenes become almost indistinguishable. The more static content we have, the easier it gets of course, so the transparency threshold becomes lower. And if you ask me, both of them reach perfect quality at 5Mbit.

Let’s throw another hard one at them for the last round:

Scene 7, H.264/AVC encoded by x264 0.148.x:

Scene 7, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Enough rain already? Pffh! Here we have a lot of foliage and low contrast added to the mix. And it gets smeared a lot by x264, rain detail lost, fine details of the bushes turning into green mud, that’s how it goes. x265 also loses too much detail here (I mean, 1Mbit is really NOT much), but again, it fares quite a bit better.

At 2.5Mbit, both encoders do very well. Somehow, this scene doesn’t seem to penalize x264 that much at the medium level. You’d really need your magnifying glass to find the spots where x265 still does better, which surprises me a bit for this scene. And finally, at 5Mbit – if you ask me – visual transparency is reached for both x264 and x265.

Final thoughts:

Clearly it’s true what a lot of people have been saying. x265 rocks at low bitrates, if configured correctly. But that isn’t gonna give me perfect quality or anything. Yeah, it sucks less – much less – than x264 in that department, but at a higher 2.5Mbit, where both start looking quite decent, x265 having just a slight edge… it becomes hardly justifiable to use it, simply because it’s that much slower to run it at decent settings.

Also, you need to take device compatibility into account. Sure, a powerful PC can always play the stuff. No matter if it’s some UNIX, Linux, MacOS X or Windows. But how about video game consoles? Older TVs? That kind of thing. Most of those can only play H.264/AVC. Or course, if you’re only using your PC and you have a lot of time and electricity to burn, then hey – why not?

But I’ll have to think really long and really hard about whether I want to replace x264 with x265 at this given point in time. Overall, it might not be practical enough on my current hardware yet. Maybe I’d need an AVX/AVX2-capable processor, as x265 has tons of optimizations for those instruction set extensions. But I’m gonna stay on my Xeon X5690 for quite a while, so SSE 4.2 is the latest I have.

I’d say, if you can stand some quality degradation, then x265 might be the way to go, as it can give you much smaller file sizes at lower bitrates with slight degradation.

If you’re aiming for high bitrates and quality, it might not be worth it right now, at least for 1080p. It’s been said the tables are turning once more when going up to 4K and UHD, but I haven’t tested that yet, as all my content – both Anime and live action movies – are “low resolution” *cough* 1080p or 720p.

Addendum:

Screenshots were taken using the latest stable mplayer 1.3.0 on Linux. Thank god they’re bundling it with ffmpeg now, making things much easier. This choice was made because mplayer can easily grab screenshots from specific spots in a video in an automated fashion. I used framesteps for this, like this:

$ mplayer ./TEST-H.265/HEVC-1mbit.mkv -nosound -vf framestep=24 \
 -vo png:z=9:outdir=./screenshots/1mbit/H.265/HEVC/:prefix=H.265/HEVC-1mbit-

This will decode every 24th frame of the video file TEST-H.265/HEVC-1mbit.mkv, and grab it into a .png file with maximum lossless compression as supported by mplayer. The .png files will be prefixed with a user-defined string and numbered, like H.265/HEVC-1mbit-00000001.pngH.265/HEVC-1mbit-00000002.png and so on, until the end of file is reached.

To encode the full size screenshots to WebP, the most recent [libwebp-0.5.0], or rather one of its companion tools – cwebp – was used as follows:

$ cwebp -z 9 -lossless -q 100 -noalpha -mt ./input.png -o ./output.webp

Now… somebody wanna grant me remote access to some quad socket Haswell-EX Xeon box for free?

No?

Meh… :roll:

Feb 122016
 

H.265/HEVC logo1.) Giving you the binaries:

Just recently I tried to give the x265 H.265/HEVC video encoder another chance to prove itself, because so far I’ve been using x264, so H.264/AVC. x264 does a really good job, but given that the marketing guys are talking about colossal efficiency/quality gains with H.265, I thought I’d put that to the test once again. It was quite easy to compile the software on my CentOS 6.7 Linux, but my old XP x64 machine proved to be a bit tricky.

But, after a bit of trial and error and getting used to the build toolchain, I managed to compile a seemingly stable version from the latest snapshot:

x265 cli, showing the version info

x265 cli, showing its version info.

Update 2: And here comes my first attempt to build an x86_64 multilib binary, that can encode H.265 at 8-bit, 10-bit and 12-bit per pixel color depths. You may wish to use this if you need more flexibility (like using 12-bit for PC only and 8- or 10-bit for a broader array of target systems like TVs, cellphones etc.). It’s currently still being tested for a short encoding run. You can specify the desired color depth with the parameter -D, like -D 8, -D 10 or -D 12:

Update: And here are the newer 1.9 versions, built from source code directly from the [MulticoreWare] (=the developers) servers. I haven’t tested these yet, but given that I configured & compiled them in the same way as before, they “should work™”:

And the old 1.7 versions from the Videolan servers:

So this has been built with MSVC10 and yasm 1.3.0 on Windows XP Pro x64 SP2, meaning this needs the Microsoft Visual C++ 2010 runtime. You can get it from Microsoft if you don’t have it yet: [32-bit version], [64-bit version]. v1.7 tested for basic functionality on XP 32-bit and tested for encoding on XP x64. The 32-bit version only supports 8-bit per pixel, which is default for x264 as well. The 64-bit versions support either 8-, 10- or 12 bits per pixel. Typically, higher internal precision per pixel results in finer gradients and less banding. 8-/10-bit is default for H.265/HEVC. 12-bit will likely not be supported by any hardware players, just as it was with 10-bit H.264/AVC before.

You may or may not know it, but as of now, x265 cannot be linked against either ffmpeg or libav, so it can only read raw input. To “feed” it properly you need either a frame server like [AviSynth] in combination with the pipe tool [Avs4x265], or a decoder that can pipe raw YUV to x265. I went for the latter version, because I already have libav+fdkaac compiled for Windows to get the avconv.exe binary. It’s quite similar to ffmpeg.

This I can only provide as a 64-bit binary, as I’m not going to build it for 32-bit Windows anytime soon I guess, so here you go:

This was compiled with GCC 4.9.2 and yasm 1.3.0 on CygWin x64. To use the two together, add the locations of your EXE files (avconv.exe and x265.exe) to your search path. Then, you can feed arbitrary video (VC1, H.264/AVC, MPEG-2, whatever) to x265. An example for a raw H.264/AVC input stream using the 64-bit versions of the software:

avconv -r 24000/1001 -i video-input.h264 -f yuv4mpegpipe -pix_fmt yuv420p - 2>NUL |^
 x265 - --wpp --y4m -D 12 -p slower --bframes 16 --bitrate 2000 --crf 18 --sar 1^
 --range full -o video-output.h265

Or another, reading some video stream from an MKV container, disabling audio and subtitles:

avconv -r 24000/1001 -i video-input.mkv -f yuv4mpegpipe -pix_fmt yuv420p -an -sn^
 - 2>NUL | x265 - --wpp --y4m -D 12 -p slower --bframes 16 --bitrate 2000 --crf 18^
 --sar 1 --range full -o video-output.h265

Just remove the carets and line breaks to make single-line commands out of those if preferred. To understand all the options, make yourself some readmes like this: avconv -h full > avconv-readme.txt and x265 --log-level full --help > x265-readme.txt or read the documentation online.

2.) How to compile by yourself:

2a.) Prerequisites:

I won’t describe how to build libav here, but just the x265 part. First of all, you need some software to be able to do this, some of it free, other not so much (unless you can swap MSVC with MSYS, I didn’t try that):

  • [cmake] (I used version 2.8.12 because that’s roughly what I have on Linux.)
  • [Mercurial] (Needed to fetch the latest version from x265′ versioning system. I used the latest Inno Setup installer.)
  • [yasm] (Put yasm.exe in your search path and you’re fine. This is optional, but you really want this for speed reasons.)
  • [Microsoft Visual Studio] (Use 2010 if you’re on Windows XP. Supported versions: 2008/VC9, 2010/VC10, 2012/VC11 & 2013/VC12)
  • [x265 source code] (Enter a target download path and use Mercurials hg.exe like hg clone https://bitbucket.org/multicoreware/x265 to fetch it)

2b.) Preparation of the solution:

Usually, you would use cmake to have it compile your entire project, but in this case it’ll build Visual Studio project files and a solution file for us. To do this, enter the proper build path. In my case I’m using Visual Studio 2010, so VC10, and I’d like to build the 64-bit version, so with the unpacked x265 source, I’d enter its subdirectory build\vc10-x86_64\ and then run the generation script: .\make-solutions.bat:

make-solutions.bat preparing cmake for us

make-solutions.bat is preparing cmake for us.

There are several things you need to make sure here: First, if you’re on Windows XP or Windows Vista, you need to toggle the WINXP_SUPPORT flag. Also, if you’re compiling for a 64-bit target, you may wish to enable HIGH_BIT_DEPTH as well to get to either 10-bit or even 12-bit per pixel other than just 8. The 32-bit code doesn’t seem to support high bit dephts right now.

Then there is one more important thing; With CMAKE_CONFIGURATION_TYPES set to Debug;Release;MinSizeRel;RelWithDebInfo, my build was unstable, throwing errors during encoding, like x265 [error]: scaleChromaDist wrap detected dist: -2029735662 lambda: 256. Setting CMAKE_CONFIGURATION_TYPES to just Release solved that problem! So you may wish to do the same.

Make sure ENABLE_CLI and ENABLE_ASSEMBLY are checked as well, then click Configure. If you’re building with high bit depth support, you’ll be presented with another option, MAIN12. You should enable this to get Main12 profile support in case you’re planning to build a 12-bit encoder. If you don’t pick it, you’ll get a 10-bit version instead, staying within Blu-Ray 4K specifications. After that, click Configure again. Generally, you need to click Configure unless no more red stuff pops up, then click Generate.

2c.) Compiling and installing the solution:

Load the resulting solution file x265.sln into Microsoft Visual Studio, then right click ALL_BUILD and pick Build. This will compile x265. If you want to install it from the IDE as well, right click INSTALL and select Build. This will install x265 in %PROGRAMFILES%\x265\ with the binary sitting in %PROGRAMFILES%\x265\bin\:

Microsoft Visual Studio 2010, ready to compile the x265 solution generated by cmake

Microsoft Visual Studio 2010 with the x265 solution generated by cmake loaded, compiled and installed.

3.) Running it:

Now we can feed x265 some raw YUV files like this, after adding x265.exe to the search path:

x265 encoding a raw YUV file to H.265/HEVC

x265 encoding a raw YUV 4:2:0 file to H.265/HEVC (The options given to x265 may actually suck, I’m still in the learning process).

Or we can use a decoder to feed it arbitrary video formats, even from MKV containers, like shown in the beginning. ffmpeg or avconv can decode pretty much anything, and then pipe it into x265 as raw YUV 4:2:0:

x265 being fed a H.264/AVC bitstream by avconv

x265 being fed a H.264/AVC bitstream by avconv.

And that’s it! Now all I need is some 18-core beast processor to handle the extreme slowness of the encoder at such crazy settings. When going almost all-out, it’s easily 10 times as slow as x264 (at equally insane settings)! Or maybe I can get access to some rack server with tons of cores or something… :roll:

Update: All x265 releases have now been consolidated on [this page]! All future XP- and XP x64-compatible releases of x265 plus a relatively recent version of avconv to act as a decoder for feeding x265 with any kind of input video streams will be posted there as well.