1.) Introduction

After doing a [somewhat proper comparison between x264 and x265] a while back, I thought I’d do another one at extremely low bitrates. It reminded me of the time I’ve been using ISDN at 64kbit/s (my provider didn’t let me use CAPI channel aggregation for 128kbit/s), which was the first true flat rate in my country. ‘Cause I’ve been thinking this:

“Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?” and “What would H.264/AVC look like in that case?”

Let me say this first: It reaaally depends on how you define “useful”.

Pretty much nobody uses ISDN these days, and V.9x 56kbaud modems are dying out in the 1st world as well, so this article doesn’t make a lot of sense. To be fair, I didn’t even pick encoding settings fit for low-latency streaming either, nor are my settings fit for live encoding. So it’s just for the lulz, but still! I wanted to see whether it could be done at all, in theory.

To make it happen, I had to choose extremely low bitrates not just for video, but audio as well. There are even subtitles included in my example, which are present in Matroska-style zlib-compressed [.ass] format, so compressed text essentially.

For the audio part, I chose the Fraunhofer FhG-AAC encoder to encode to the lowest possible constant bitrate, which is 8kbit/s HEv2-AAC. That’s a VoIP-focused version of the codec targeted at preserving human speech as well as possible at conditions as bad as they get. And yes, it sounds terrible. But it still gets across just enough to be able to understand what people are saying and what type of sounds are occurring in a scene. Music and most environmental sounds are terrible in quality, but they are still discernible.

For video, I picked a 2-pass ABR mode with a 50kbit/s target bitrate, which is insanely low even for the Anime content I picked (my apologies, Mr. “[Anime is not what everyone watches]”, but yes, I picked Anime again). Note that 2D animated content is pretty easy on the encoders in this case, so the results would’ve likely been a lot worse with 1080p live action content. As for the encoder settings, you can find those [down below] and as for how I’m taking the screenshots, I’ll spare you those details, they’re pretty similar to the stuff shown in the link at the top.

Before we start with the actual quality comparison, I should mention that my test results actually overshot their target, so they’re really unsuitable for live streaming even in the ISDN case. I just didn’t care enough for trying to push the bitrate down any further. Regular streaming would still be possible with my result files, but not without prebuffering. See here:

$ls -hs *.mkv 2.6M Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv 2.0M Piaceː Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv 76M Piaceː Watashi no Italian - Episode 02.mkv$ for i in {'Piaceː Watashi no Italian - Episode 02.mkv','Piaceː Watashi no Italian - \
Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː Watashi no Italian - \
Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo "$i" | grep -i "overall bit rate"; done Overall bit rate : 2 640 kb/s Overall bit rate : 88.6 kb/s Overall bit rate : 69.2 kb/s The first one is the source (note: From [Crunchyroll], legal to watch and record in my country at this time), the second my x264 and the third my x265 versions. Let’s show you the bitrate overshoot of just the video streams in my versions: $ for i in {'Piaceː Watashi no Italian - Episode 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv','Piaceː \
Watashi no Italian - Episode 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv'}; do mediainfo \
--Output="Video;%BitRate/String%" "$i"; done 71.1 kb/s 51.6 kb/s So as you can see, x264 messed up pretty big, overshooting by 21.1kbit/s (42.2%), whereas x265 almost landed on target, overshooting by a mere 1.6kbit/s (3.2%) overall. And still… well… Let’s give you an overview first (as usual, click to enlarge images): 2.) Quality comparisons Note that the color shown in those thumbnails is not representative of the real images, this has been transformed to 256 color .png to make it easier to download (again, if your browser supports it, .webp will be loaded instead transparently). This is just to show you some basic differences between what x264 and x265 are able to preserve, and what they are not. Also, keep in mind, that “~50kbit/s nominal bitrate” means 71.1kbit/s for x264 and 51.6kbit/s for x265! Overall, x264 fucks up big time. There are frames with partial macroblock drops and completely blank frames even! Also, a lot of frames lose their color either partially or completely as well, making them B/W. And that’s given that x264 even invested 42.2% more bitrate than what I aimed for! x265 has no such severe issues, all frames are completely there in full color, and that at a bitrate reasonable close to the target. Let’s look at a few interesting cases side by side: Scene 1 (left: x264, middle: x265, right: source file): There are some indications of use of larger CTUs (coding tree unit, H.265s’ replacement for macroblocks) in x265s’ case, which is supposed to be one of its strong points, especially for very large resolution encoding (think: 4K/UHD, 8K). While larger blocks can mean loss of detail in that area, it’s ok for larger areas of uniform color, which this Anime has a ton of. H.264/AVC can’t do that so well, because the upper limit for a macroblocks’ size is rather low with 16×16 pixels. You can see the macroblock size pretty clearly in the blocky frame to the left. You need to look a bit more carefully in x265s’ case, but there are a few spots where I believe it can be seen as well. In my case the CTU size for x265 was 32×32px. Hm, maybe --ctu 64 would’ve been better for this specific case, but whatever. Lets look at two more mostly color-related comparisons: Scenes 2 & 3 (left: x264, middle: x265, right: source file): In the first case it seems as if x264 is trying to preserve shades of green more than anything, but in the second case, something terrible happens. There is a lot of red in the scene before this one, and there is quite some red on those can labels as well. It seems x264 doesn’t know where to put the color anymore, and the reds bleed almost all over the frame. And it stays like that for the entire scene as well, which means for several seconds. The greens and browns are lost. Block artifacts are excessive as well, but at least x264 managed to give us whole frames here, with some color even. Well, the color kinda went everywhere, but uhm, yeah… Two more: Scenes 4 & 5 (left: x264, middle: x265, right: source file): I really don’t know what’s with x264 and the reds. Shouldn’t green have priority? I mean, not just in the chroma subsampling, but in encoding as well? But red seems what x264 drops last, and it happens more than once. Given the detail and movements in that last part, even x265 fails though. Yes, it does preserve more color, but it doesn’t come remotely close to the source at this bitrate. And that other frame with the cuteness overload? There are a lot like those, where x264 just kinda panics, drops everything it has and then frantically tries to (re?)construct the current frame, sometimes only partially until the next I-frame arrives or so. So that’s it for my quick & dirty “ultra low bitrate” comparison between x264 and x265, at pretty taxing encoding settings once again. 3.) Additional information x264 encoding settings: $ mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.264+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \
"encoding settings"
Encoding settings                        : cabac=1 / ref=16 / deblock=1:-2:0 / analyse=0x3:0x133 / \
me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 \
/ 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=-2 / threads=18 / \
constrained_intra=0 / bframes=16 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / \
open_gop=1 / weightp=2 / keyint=250 / keyint_min=23 / scenecut=40 / intra_refresh=0 / \
rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=50 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=81 \
/ qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:0.60

x265 encoding settings (note: 10 bits per color channel were chosen, same as for x264):

$mediainfo Piaceː\ Watashi\ no\ Italian\ -\ Episode\ 02-H.265+HEv2AAC-V50kbit-A8kbit.mkv | grep -i \ "encoding settings" Encoding settings : cpuid=1049087 / frame-threads=3 / wpp / pmode / pme / \ no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x1080 / interlace=0 / total-frames=0 \ / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=6 / no-allow-non-conformance / no-repeat-headers / \ annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=23 / \ keyint=250 / bframes=16 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=40 / \ lookahead-slices=0 / scenecut=40 / no-intra-refresh / ctu=32 / min-cu-size=8 / rect / amp / \ max-tu-size=32 / tu-inter-depth=4 / tu-intra-depth=4 / limit-tu=0 / rdoq-level=1 / signhide / \ no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / no-strong-intra-smoothing / max-merge=5 \ / limit-refs=1 / limit-modes / me=3 / subme=4 / merange=57 / temporal-mvp / weightp / weightb / \ no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=6 / no-early-skip / rskip / \ no-fast-intra / no-tskip-fast / no-cu-lossless / b-intra / rdpenalty=0 / psy-rd=1.60 / \ psy-rdoq=5.00 / no-rd-refine / analysis-mode=0 / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / \ bitrate=50 / qcomp=0.75 / qpstep=4 / stats-write=0 / stats-read=2 / stats-file=265/v.stats / \ cplxblur=20.0 / qblur=0.5 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / \ zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / sar=1 / overscan=0 / \ videoformat=5 / range=1 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 \ / max-cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / \ slices=1 / opt-qp-pps / opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 x264 version: $ x264 --version
x264 0.148.x
(libswscale 3.0.0)
(libavformat 56.1.0)
built on Sep  6 2016, gcc: 6.2.0
x264 configuration: --bit-depth=10 --chroma-format=all
libx264 configuration: --bit-depth=10 --chroma-format=all
x264 license: GPL version 2 or later
WARNING: This binary is unredistributable!

x265 version:

$x265 --version x265 [info]: HEVC encoder version 2.2+23-58dddcf01b7d x265 [info]: build info [Linux][GCC 6.2.0][64 bit] 8bit+10bit+12bit x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 Encoding & testing platform: $ uname -sr
Linux 2.6.32-573.8.1.el6.x86_64
$cat /etc/redhat-release CentOS release 6.8 (Final)$ cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Core(TM) i7 CPU       X 980  @ 3.33GHz

Q: “Can H.265/HEVC enable an ISDN user to stream 1080p content in any useful form?

A: It can probably stream something that at least resembles the original source in a recognizable fashion, but… whether you can call that “useful” or not is another thing entirely…

Q: “What would H.264/AVC look like in that case?”

A: Like shit!

Since I’ve been doing a bit of Anime batch video transcoding with x264 and x265 in the last few months, I thought I’d document this for myself here. My goal was to loop over an arbitrary amount of episodes and just batch-transcode them all at once. And that on three different operating systems: Windows (XP x64), Linux (CentOS 6.8 x86_64) and FreeBSD 10.3 UNIX, x86_64. Since I’ve started to split the work across multiple machines, I lost track of what was where and which machine finished what, and when.

So I thought, why not let the loop send me a small notification email upon completion? And that’s what I did. On Linux and UNIX this relies on the bash shell and the mailx command. Please note that I’m talking about [Heirloom mailx], not some other mail program by the same name! I’m mentioning this, because there is a different default mailx on FreeBSD, that won’t work for this. That’s why I put alias mailx='/usr/local/bin/mailx' in my ~/.bash_profile on that OS after installing the right program to make it the default for my user.

On Windows, my loops depend on my own [colorecho] command (you can replace that with cmds’ ECHO if you want) as well as the command line mailer [blat]. Note that, if you need to use SSL/TLS encryption when mailing, blat can’t do that. A suitable replacement could be [mailsend]. Please note, that mailsend does not work on Windows XP however.

In the x265 case, avconv (from the [libav] package) is required on all platforms. You can get my build for Windows [here]. If you don’t like it, the wide-spread [ffmpeg] can be a suitable drop-in replacement.

Now, when setting up blat on Windows, make sure to run blat -help first, and learn the details about blat -install. You need to run that with certain parameters to set it up for your SMTP mail server. For whatever reason, blat reads some of that data from the registry (ew…), and blat -install will set that up for you.

Typically, when I transcode, I do so on the elementary streams rather than .mkv files directly. So I’d loop through some source files and extract the needed streams. Let’s say we have “A series – episode 01.mkv” and some more, all the way up to “A series – episode 13.mkv”, then, assuming track #0 is the video stream…

On Windows:

FOR %I IN (01,02,03,04,05,06,07,08,09,10,11,12,13) DO mkvextract tracks "A series - episode %I.mkv" ^
0:%I\video.h264

On Linux/UNIX:

for i in {01..13}; do mkvextract tracks "A series - episode $i.mkv" 0:$i/video.h264; done

mkvextract will create the non-existing subfolder for us, and a x264 transcoding loop would then look like this on Windows:

expand/collapse source code
cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^
(01,02,03,04,05,06,07,08,09,10,11,12,13) DO "c:\Program Files\VFX\x264cli\x264-10b.exe" --fps ^
24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f ^
-2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats %I\v.stats -t 2 --no-fast-pskip ^
--cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o %I\pass1.264 & colorecho "Pass 1 ^
done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & ^
"c:\Program Files\VFX\x264cli\x264-10b.exe" --fps 24000/1001 --preset veryslow --tune animation ^
--open-gop -b 16 --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats ^
%I\v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o ^
%I\pass2.264 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10) & echo !SERIES! ^
transcoding complete | blat - -t "myself@another.mailhost.com" -c "myself@mailhost.com" -s "x264 ^
notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET SERIES="

Note that I always write all the iteration out in full here. That’s because cmd can’t do loops with leading zeroes in the iterator. The reason for this is that those source files usually have them in their lower episode numbers. If it wasn’t 01,02, … ,12,13, but 1,2, … ,12,13 instead, you could do FOR /L %I IN (1,1,13) DO. But this isn’t possible in my case. Even if elements need alphanumeric names like here,  FOR %I IN (01,02,03,special1,special2,ova1,ova2) DO, you still won’t need that syntax on Linux/UNIX because the bash can have iterator groups like for i in {{01..13},special1,special2,ova1,ova2}; do. Makes me despise the cmd once more.

Edit:

Ah, according to [this], you can actually do something like cmd /V /C "FOR /L %I IN (1,1,13) DO (SET "fI=00%I" & echo "!fI!:~-2")", holy shit. It actually works and gives you leading zeroes. :~-2 for 2 digits, :~-3 for three. Expand fI for more in this example. I mean, what is this even? Some number formatting magic? I probably don’t even wanna know… Couldn’t find any way of having several groups for the iterator however. Meh. Still don’t like it.

So, well, it’s like this on Linux/UNIX:

expand/collapse source code
(export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do nice -n19 x264 --fps \
24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f \
-2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats $i/v.stats -t 2 --no-fast-pskip \ --cqm flat --non-deterministic --demuxer lavf$i/video.h264 -o $i/pass1.264 && echo && echo -e \ "\e[1;31mdate +%H:%M, pass 1 done for episode$i/$EPNUM of$SERIES\e[0m" && echo && nice -n19 \
x264 --fps 24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid \
normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats $i/v.stats -t 2 --no-fast-pskip --cqm flat \ --non-deterministic --demuxer lavf$i/video.h264 -o $i/pass2.264 && echo && echo -e \ "\e[1;31mdate +%H:%M, pass 2 done for episode$i/$EPNUM of$SERIES\e[0m" && echo; done && echo \
"$SERIES transcoding complete" | mailx -s "x264 notification from$MACHINE" -r \
"myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \
myself@mailhost.com)

The variable $MACHINE or %MACHINE%/!MACHINE! specifies the machines’ host name. This will be noted in the email, so I know which machine just completed something. $EPNUM – or %EPNUM%/!EPNUM! on Windows – is used for periodic updates on the shell. The output would be like “Pass 1 done for Episode 07/13 of AnimeX” in green on Windows and bold red on Linux/UNIX (just change the color to your liking).

Finally, $SERIES aka %SERIES%/!SERIES! would be the series’ name. So say, the UNIX machine named “BEAST” above is done with this loop. The email would come with the subject line “x264 notification from BEAST” and would read “AnimeX transcoding complete” in plain text. That’s all. Please note, that cmd batch on Windows is extremely creepy. Every whitespace (especially the leading ones when doing multi-line like this for display) needs to be exactly where it is. The same goes for double quotes where you might think they aren’t needed. They are! Also, this needs delayed variable expansion once again, which is why we see variables like !EPNUM! instead of %EPNUM% and why it’s called in a subshell by running cmd /V /C. On Linux/UNIX we don’t need to rely on some specific API like cmds’ SetConsoleTextAttribute() to print colors, as most terminals understand ANSI color codes. And this is what it looks like for x265: Windows: expand/collapse source code cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^ (01,02,03,04,05,06,07,08,09,10,11,12,13) DO avconv -r 24000/1001 -i %I\video.h264 -f yuv4mpegpipe ^ -pix_fmt yuv420p -r 24000/1001 - 2>NUL | "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 ^ --fps 24000/1001 -p veryslow --pmode --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate ^ 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 ^ --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^ --pass 1 --slow-firstpass --stats %I\v.stats --sar 1 --range full -o %I\pass1.h265 & colorecho ^ "Pass 1 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & avconv -r 24000/1001 -i ^ %I\video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>;NUL | ^ "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 --fps 24000/1001 -p veryslow --pmode ^ --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 ^ --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^ --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats %I\v.stats --sar ^ 1 --range full -o %I\pass2.h265 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" ^ 10) & echo !SERIES! transcoding complete | blat - -t "myself@another.mailhost.com" -c ^ "myself@mailhost.com" -s "x265 notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET ^ SERIES=" Linux/UNIX: expand/collapse source code (export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do avconv -r 24000/1001 -i \$i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice -19 x265 - --y4m \
-D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 \
--rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq \
5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 \
--slow-firstpass --stats $i/v.stats --sar 1 --range full -o$i/pass1.h265 && echo && echo -e \
"\e[1;31mdate +%H:%M, pass 1 done for episode $i/$EPNUM of $SERIES\e[0m" && echo && avconv -r \ 24000/1001 -i$i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice \
-19 x265 - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid \
--bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd \
1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 \
--pass 2 --stats $i/v.stats --sar 1 --range full -o$i/pass2.h265 && echo && echo -e \
"\e[1;31mdate +%H:%M, pass 2 done for episode $i/$EPNUM of $SERIES\e[0m" && echo; done && echo \ "$SERIES transcoding complete" | mailx -s "x265 notification from $MACHINE" -r \ "myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \ smtp="smtp.mailhost.com" -S smtp-auth-user="myuser" -S smtp-auth-password="mysecurepassword" \ myself@mailhost.com) And that’s it. The loops for audio transcoding are simpler, as that part is so fast, it doesn’t need email notifications. Runs for minutes rather than days. When all is done, I’d usually fire up the MKVToolnix GUI, and prepare a mux for the first episode. There is a nice “copy command line to clipboard” function there when you click on “Muxing” after everything is set up. With that I can build another loop that muxes everything to final .mkv files. On Windows that part is more complicated if you want Unicode support, so I needed to create input files by using a generator I wrote in Perl for that, but that’s for another day… Oh, and if you wanna ssh into your Linux or UNIX boxes from afar to check on your transcoders, consider launching them on a GNU screen] session. It’s immensely useful! Too bad it won’t work on the Windows cmd. Recently, after [successfully compiling] the next generation x265 H.265/HEVC video encoder on Windows, Linux and FreeBSD, I decided to ask for guidance when it comes to compressing Anime (live action will follow at a later time) in the Doom9 forums, [see here]. Thing is, I didn’t understand all of the knobs x265 has to offer, and some of the convenient presets of x264 didn’t exist here (like --tune film and --tune animation). So for a newbie it can be quite hard to make x265 perform well without sacrificing far too much CPU power, as x265 is significantly more taxing on the processor than x264. Thanks to [Asmodian] and [MeteorRain]/[LittlePox] I got rid of x265s’ blurring issues, and I took their settings and turned them up to achieve more quality while staying within sane encoding times. My goal was to be able to encode 1080p ~24fps videos on an Intel Xeon X5690 hexcore @ 3.6GHz all-core boost clock at >=1fps for a target bitrate of 2.5Mbit. In this post, I’d like to compare 7 scenes from the highly opulent Anime [The Garden of Words] (言の葉の庭) by [Makoto Shinkai] (新海 誠) at three different average bitrates, 1Mbit, 2.5Mbit (my current x264 default) and 5Mbit. The Blu-Ray source material is H.264/AVC at roughly 28Mbit on average. Also, both encoders are running in 10-bit color depth mode instead of the common 8-bit, meaning that the internal arithmetic precision is boosted from 8- to 16-bit integers as well. While somewhat “non-standard” for H.264/AVC, this is officially supported by H.265/HEVC for Blu-Ray 4K. The mode of operation is 2-pass to aim for comparable file sizes and bitrates. The encoding speed penalty for switching from x264 to x265 at the given settings is around a factor of 8. Somewhat. The screenshots below are losslessly compressed 1920×1080 images. Since this is all about compression, I chose to serve the large versions of the images in WebP format to all browsers which support it (Opera 11+, Chromium-based Browsers like Chrome, Iron, Vivaldi, the Android Browser or Pale Moon as the only Gecko browser). This is done, because at maximum level, WebP does lossless compression much more efficiently, so the pictures are smaller. This helps, because my server only has 8Mbit/s upstream. If your browser doesn’t support WebP (like Firefox, IE, Edge, Safari), it’ll be fed lossless PNG instead. All of this happens automatically, you don’t need to do anything! Now, let’s start with the specs. Specifications: Here are the source material encoding settings according to the video stream header: cabac=1 / ref=4 / deblock=1:1:1 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=0.40:0.00 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=1 / sliced_threads=0 / slices=4 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=1 / constrained_intra=0 / bframes=3 / b_pyramid=1 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=1 / weightp=1 / keyint=24 / keyint_min=1 / scenecut=40 / intra_refresh=0 / rc_lookahead=24 / rc=2pass / mbtree=1 / bitrate=28229 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / vbv_maxrate=31600 / vbv_bufsize=30000 / nal_hrd=vbr / filler=0 / ip_ratio=1.40 / aq=1:0.60 x264 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example: --fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic --fps 24000/1001 --preset veryslow --tune animation --open-gop --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic x265 10-bit encoding settings (pass 1 & pass 2), 2.5Mbit example: --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 --slow-firstpass --stats v.stats --sar 1 --range full --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats v.stats --sar 1 --range full Since x265 can only read raw YUV and Y4M, the source video is being fed to it via [libavs’] avconv tool, piping it into x265. The avconv commandline for that looks as follows: $ avconv -r 24000/1001 -i input.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null

If you want to do something similar, but you don’t like avconv, you can use [ffmpeg] as a replacement, the options are completely the same. Note that you should always specify the correct frame rates (-r) for input and output, or the bitrate setting of the encoder will be applied wrongly!

x264 on the other hand was linked against libav directly, using its decoding capabilities without any workarounds.

Let’s compare:

“The Garden of Words” has a lot of rain. This is a central story element of the 46 minute movie, and it’s hard on any encoder, because a lot of stuff is moving on screen all the time. Let’s take a look at such a scene for our first side-by-side comparison. Each comparison is done in two rows: H.264/AVC first (including the source material), and below that H.265/HEVC, also including the source.

Let’s go:

Scene 1, H.264/AVC encoded by x264 0.148.x:

Scene 1, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

It has been said that x265 performs specifically well at two things: Very high resolutions (which we don’t have here) and low bitrates. And yep, it shows. When comparing the 1Mbit shots, it becomes clear pretty quickly that x265 manages to preserve more detail for the parts with lots of motion. x264 on the other hand starts to wash out the scene pretty severely, smearing out some raindrops, spray water and parts of the foliage. Also, it’s pretty bad around the outlines as well, but that’s true for both encoders. You can spot that easily with all the aliasing artifacts on the raindrops.

Moving up a notch, it becomes very hard to distinguish between the two. When zooming in you can still spot a few minor differences (note the kids umbrella, even if it’s not marked), but it’s quite negligible. Here I’m already starting to think x265 might not be worth it in all cases. There are still differences between the two 2.5Mbit shots and the original however, see the red areas of the umbrella and the most low-contrast, dark parts of the foliage.

At 5Mbit, I really can’t see any difference anymore. Maybe the colors are a little off or something, but when seen in motion, distinguishing between the two and the original becomes virtually impossible. Given that we just threw a really difficult scene at x264 and x265, this should be a trend to continue throughout the whole test.

Now, even more rain:

Scene 2, H.264/AVC encoded by x264 0.148.x:

Scene 2, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Now this is extreme at 1Mbit! Looking at H.264, the spray water on top of the cable can’t even be told apart from the cloud in the background anymore. Detail loss all over the scene is catastrophic in comparison to the original. Tons of raindrops are simply gone entirely, and the texture details on the tower and the angled brick wall of the house to the left? Almost completely washed out and smeared.

Now, let’s look at H.265 @ 1Mbit. The spray water is also pretty bad, but it’s amazing how much more detail was preserved overall. Sure, there are still parts of the raindrops missing, but it’s much, much closer to the original. We can now see details on the walls as well, even the steep angle one on the left. The only serious issue is the red light bleeding at the tower. There is very little red there in the original, so I’m not sure what happened there. x264 does this as well, but x265 is a bit worse.

At the next level, the differences are less pronounced again, but there is still a significant enough improvement when going from x264 to x265 at 2.5Mbit: The spray water on the cable becomes more well-defined, and more rain is being preserved. Also, the textures on the walls are a tiny little bit more detailed and crisp. Once again though, x265 is bleeding too much red light at the tower.

Since it’s noticeably not fully on the level of the source still, let’s look at 5Mbit briefly. x265 is able to preserve a tiny little bit more rain detail here, coming extremely close to the original. In motion, you can’t really see the difference however.

Now, let’s get steamy:

Scene 3, H.264/AVC encoded by x264 0.148.x:

Scene 3, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

1Mbit first again: Let me just say: It’s ugly. x264 pretty much messes up the steam coming from the iron. We get lots of block artifacts now. Some of the low-contrast patterns on the ironing board are being smeared out at a pretty terrible level. Also, the bokeh background partly shows block artifacts and banding. x265 produces quite a lot of banding here itself, but no blocks. Also, outlines and sharp contrasts are more well-defined, and the low contrast part is done noticeably better.

At 2.5Mbit, the patterns repeat themselves now. The steam is only slightly better with x265, outlines are slightly more well-defined, and the low-contrast patterns are slightly more visible. For some of the blurred parts, x265 seems to be a tiny little bit to prone to banding though, in a very few spots, x264 might be just that 1% better. Overall however, x265 wins this, and even if it’s just for the better outlines.

At 5Mbit, you really need to zoom in and analyze very small details, e.g. around the outer border of the steam. Yes, x265 does better again. But you’d not really be able to notice this when watching.

How about we go cry a little bit:

Scene 4, H.264/AVC encoded by x264 0.148.x:

Scene 4, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Cutting onions would be a classic fun part in a slice-of-life anime. Here, it’s just kitchen work. And quite the bad looking one for H.264 at 1Mbit. The letters on the knife are partly lost completely, becoming unreadable. The onion parts that fly off are visibly worse than when encoded with x265 at the same bitrate. Also, x264 produced block artifacts in the blurred bokeh areas again, that simply aren’t there with x265.

On the next level, the two come much closer to each other. However, x265 simply does the outlines better. Less artifacts and sharper, just like with the writing on the knifes’ blade as well. The issues with the bokeh are nearly gone. What’s left is a negligible amount of blocking for x264 and banding for x265. Not really noticeable however.

Finally, at 5Mbit, x265 shows those ever so slightly more well-done outlines. But that’s about it, the rest looks nice for both, and pretty much identical to the source.

Scene 5, H.264/AVC encoded by x264 0.148.x:

Scene 5, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Let’s keep this short: x264 does blocking, and bad transitions/outlines. x265 does it better. Plain and simple.

At 2.5Mbit, x265 nearly reaches quality transparency when compared to the original, something x264 falls short of, just a bit. While x265 does the outlines and the steam part quite like in the original frame, x264 rips the outlines apart a bit too much, and slight block artifacts can again be seen for the steam part.

At 5Mbit, x264 still shows some blocking artifacts in a part that most lossy image/video compression algorithms traditionally suck at: The reds. While not true for all human beings, most eyes perceive much finer gradients for greens, then blues, and do worst with reds. Meaning, our eyes have an unequal sensitivity distribution when it comes to colors. So image and video codecs try to save bitrate in the reds first, because supposedly it’d be less noticeable. To me subjectively, x265 achieves transparency here, meaning it looks just like the original. x264 doesn’t manage entirely. Close, but not not entirely.

Next:

Scene 6, H.264/AVC encoded by x264 0.148.x:

Scene 6, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

This is a highly static scene, with only few moving parts, so there is some rain again, and some shadow cast by raindrops as well. Now, for the static parts, incremental B frames really work wonders here. Most detail is being preserved by both encoders. What’s supposed to be happening is happening: The encoders save bit rate where the human eye can’t easily tell: In the parts where stuff is moving around very quickly. That’s how we lose a lot of raindrop shadows and some drops as well. x264 seems to have trouble separating the scene into even smaller macro blocks though? Not sure if that’s the reason, but a lot of mesh detail for the basket on the balcony on the top right is lost – x265 does better there! This is maybe because x264 couldn’t distinct the moving drops from the static background so well anymore?

At 2.5Mbit, the scenes become almost indistinguishable. The more static content we have, the easier it gets of course, so the transparency threshold becomes lower. And if you ask me, both of them reach perfect quality at 5Mbit.

Let’s throw another hard one at them for the last round:

Scene 7, H.264/AVC encoded by x264 0.148.x:

Scene 7, H.265/HEVC encoded by x265 1.9+15-425b583f25db:

Enough rain already? Pffh! Here we have a lot of foliage and low contrast added to the mix. And it gets smeared a lot by x264, rain detail lost, fine details of the bushes turning into green mud, that’s how it goes. x265 also loses too much detail here (I mean, 1Mbit is really NOT much), but again, it fares quite a bit better.

At 2.5Mbit, both encoders do very well. Somehow, this scene doesn’t seem to penalize x264 that much at the medium level. You’d really need your magnifying glass to find the spots where x265 still does better, which surprises me a bit for this scene. And finally, at 5Mbit – if you ask me – visual transparency is reached for both x264 and x265.

Final thoughts:

Clearly it’s true what a lot of people have been saying. x265 rocks at low bitrates, if configured correctly. But that isn’t gonna give me perfect quality or anything. Yeah, it sucks less – much less – than x264 in that department, but at a higher 2.5Mbit, where both start looking quite decent, x265 having just a slight edge… it becomes hardly justifiable to use it, simply because it’s that much slower to run it at decent settings.

Also, you need to take device compatibility into account. Sure, a powerful PC can always play the stuff. No matter if it’s some UNIX, Linux, MacOS X or Windows. But how about video game consoles? Older TVs? That kind of thing. Most of those can only play H.264/AVC. Or course, if you’re only using your PC and you have a lot of time and electricity to burn, then hey – why not?

But I’ll have to think really long and really hard about whether I want to replace x264 with x265 at this given point in time. Overall, it might not be practical enough on my current hardware yet. Maybe I’d need an AVX/AVX2-capable processor, as x265 has tons of optimizations for those instruction set extensions. But I’m gonna stay on my Xeon X5690 for quite a while, so SSE 4.2 is the latest I have.

I’d say, if you can stand some quality degradation, then x265 might be the way to go, as it can give you much smaller file sizes at lower bitrates with slight degradation.

If you’re aiming for high bitrates and quality, it might not be worth it right now, at least for 1080p. It’s been said the tables are turning once more when going up to 4K and UHD, but I haven’t tested that yet, as all my content – both Anime and live action movies – are “low resolution” *cough* 1080p or 720p.

Screenshots were taken using the latest stable mplayer 1.3.0 on Linux. Thank god they’re bundling it with ffmpeg now, making things much easier. This choice was made because mplayer can easily grab screenshots from specific spots in a video in an automated fashion. I used framesteps for this, like this:

$mplayer ./TEST-H.265／HEVC-1mbit.mkv -nosound -vf framestep=24 \ -vo png:z=9:outdir=./screenshots/1mbit/H.265／HEVC/:prefix=H.265／HEVC-1mbit- This will decode every 24th frame of the video file TEST-H.265／HEVC-1mbit.mkv, and grab it into a .png file with maximum lossless compression as supported by mplayer. The .png files will be prefixed with a user-defined string and numbered, like H.265／HEVC-1mbit-00000001.pngH.265／HEVC-1mbit-00000002.png and so on, until the end of file is reached. To encode the full size screenshots to WebP, the most recent [libwebp-0.5.0], or rather one of its companion tools – cwebp – was used as follows: $ cwebp -z 9 -lossless -q 100 -noalpha -mt ./input.png -o ./output.webp

No?

Meh…

Sure there are ways to compile the components of my x264 benchmark on [almost any platform]. But you never get the “reference” version of it. The one originally published for Microsoft Windows and the one really usable for direct comparisons. A while back I tried to run that Windows version on Linux using [Wine], but it wouldn’t work because it needs a shell. It never occurred to me that I could maybe just copy over a real cmd.exe from an actual Windows. A colleague looked it up in the Wine AppDB, and it seems the cmd.exe only has [bronze support status] as of Wine version 1.3.35, suggesting some major problems with the shell.

Nevertheless, I just tried using my Wine 1.4.1 on CentOS 6.3 Linux, and it seems support has improved drastically. All cmd.exe shell builtins seem to work nicely. It was just a few tools that didn’t like Wines userspace Windows API, especially timethis.exe, which also had problems talking to ReactOS. I guess it wants something from the Windows NT kernel API that Wine cannot provide in its userspace reimplementation.

But: You can make cmd.exe just run one subcommand and then terminate using the following syntax:

cmd.exe /c <command to run including switches>

Just prepend the Unix time command plus the wine invocation and you’ll get a single Windows command (or batch script) run within cmd.exe on Wine, and get the runtime out of it at the end. Somewhat like this:

time wine cmd.exe /c <command to run including switches>

Easy enough, right? So does this work with the Win32 version of x264? Look for yourself:

So as you can see it does work. It runs, it detects all instruction set extensions (SSE…) just as if it was 100% native, and as you can see from the htop and Linux system monitor screens, it utilizes all four CPU cores or all eight threads / logical CPUs to be more precise. By now this runs at around 3fps+ on a Core i7 950, so I assume it’s slower than on native Windows.

Actually, the benchmark publication itself currently knows several flags for making results “not reference / not comparable”. One is the flag for custom x264 versions / compilations, one is for virtualized systems and one for systems below minium specifications. The Wine on Linux setup wouldn’t fit into any of those. Definitely not a custom version, running on a machine that satisfies my minimum system specs, leaving the VM stuff to debate. Wine is per definition a runtime environment, not an emulator, not a VM hypervisor or paravirtualizer. It just reimplements the Win32/64 API, mapping certain function calls to real Linux libraries or (where the user configures it as such) to real Microsoft or 3rd party DLLs copied over. That’s not emulation. But it’s not quite the same as running on native Windows either.

I haven’t fully decided yet, but I think I will mark those results as “green” in the [results list], extending the meaning of that flag from virtual machines to virtual machines AND Wine, otherwise it doesn’t quite seem right.

gt;

Now if you decide to read this post, you’ll be going through a lot of technical crap again, so to make it easier for you, click on the posts logo to the left. There you get an interesting picture to look at if you get bored by me talking, and if you apply the cross-eye technique (although you’ll have to cross your eyes up to a level where it might be a bit of a strain, just try it until you get a “lock” on the “middle” picture), you’ll see the image in sterescopic 3D. Yay, boobies in stereoscopic 3D, not real ones though, very unfortunate. Better than nothing though, eh?

Now, where was I? Ah yes, stereoscopy. These days it seems to be a big thing, not just in movie theatres, but also for the home user with 3D Blu-Rays and 3D HDTVs. I’m quite into transcoding Blu-Rays using the x264 encoder as well as a bunch of other helper tools, and I’m archiving all my discs to nice little MKV files, that I prefer to having to actually insert the BD and deal with all the DRM crap like ACSS encryption and BD+. Ew.

But, with 3D, there is an entirely new challenge to master. On a Blu-Ray, 3D information is stored as a so-called MVC stream, which is an extension to H.264/AVC, which also means, there is no Microsoft Codec involved anymore, so there is no VC-1 3D. Only H.264. This extension is for the right eye and works like a diff stream or like DTS MLP data roughly, so while the full 2D left eye stream may be 10GB in size, the right eye one (as it only contains parts that are different from the left eye stream) will be maybe around 5GB.  Usually, stereoscopic 3D works by giving you one image for your left and one for your right eye, while filtering the left eye stream for the right eye (so the right eye never sees the left eye stream) and vice-versa. There are slight differences in the streams, which trick your brain into interpreting the information presented to it as 3D depth.

A cheap version of “filtering” is the cross-eye technique, that works best for plain images, like the logo above, which is a so called side-by-side 3D image. Side-by-side is cheap, easy, and for movies it’s very compatible, because pretty much any 3D screen supports it. Also, there are free tools available to work with SBS 3D. But, before SBS 3D can be created, the method of the Blu-Ray (which is not SBS but “frame packing”, see my description of MVC above) needs to be decoded first. Since the MVC right eye stream is a diff, and not a complete one, the complete SBS right eye stream needs to be constructed from the MVC and the left-eye H.264/AVC first. That can be done using certain plugins in the AviSynth frameserver under Windows. One of those plugins is H264StereoSource, which offers the function H264StereoSource().

Using this, i successfully transcoded Prometheus 3D. However, that was the only success ever. I tried it again as I got new 3D Blu-Rays, but all of a sudden, this (click to enlarge):

As you can see in the shell window, there is lots of weird crap instead of regular x264 output. At some point, rather sooner than later, x264 would just crash. Taking whatever output it has generated by that time, you get something that looks like this (click to enlarge):

Clearly, this is bad. As an SBS, the right frame should look awfully familiar, almost mirroring the left one. As you can see, this looks awfully not very much like anything. Fragments of the shapes of the left eye stream can be recognized in the right eye one when the movie plays, but it’s totally garbled and a bit too.. well… green. Whatever. H264StereoSource() fucked up big time, so I had to look into another solution. Luckily, [Sharc], a [Doom9] user pointed me in the right direction by mentioning [BD3D2MK3D].

This is yet another GUI with some of the tools I already use in the background plus some additional ones, most significantly another AviSynth plugin called SSIFSource2.dll, offering the function ssifSource2(). That one reads an SSIF container (which is H.264/AVC and MVC linked together) directly, and can not only decode it, but also generate proper left/right eye output from it directly.

Now first, this is what my old AviSynth script looked like, the now broken one:

1. LoadPlugin("C:\Program Files (x86)\VFX\AviSynth 2.5\plugins\DGAVCDecode.dll")
2. LoadPlugin("C:\Program Files (x86)\VFX\BDtoAVCHD\H264StereoSource\H264StereoSource.dll")
3. left = AVCSource("D:\left.dga")
4. right = H264StereoSource("D:\crash.cfg", 132566).AssumeFPS(24000,1001)
5. return StackHorizontal(HorizontalReduceBy2(left), HorizontalReduceBy2(right)).Trim(0, 132516)

This loads the H.264/AVC decoder, the stereo plugin, reads the left-eye stream index file, a dga created by DGAVCIndex, and loads a configuration file that defines both streams as they were demuxed by using eac3to. Then it returns a shrinked version of the output (compressing the 3840×1080 full SBS stream to 1920×1080 half SBS, that most TVs can handle), trimming the last 50 frames away to work around a frame alignment bug between decoder and encoder. The cfg file looks somewhat like this:

1. InputFile = "D:\left.h264"
2. InputFile2 = "D:\right.h264"
3. FileFormat = 0
4. POCScale = 1
5. DisplayDecParams = 1
6. ConcealMode = 0
7. RefPOCGap = 2
8. POCGap = 2
9. IntraProfileDeblocking = 1
10. DecFrmNum = 0

By playing around with it I found out that it was truly the right-eye MVC decoding that did no longer work, H264StereoSource() was just broken, so ssifSource2() then! BD3D2MK3D will allow you to use its GUI to analyze a decrypted 3D Blu-Ray folder structure, and generate an AviSynth script for you. There was only one part that did not work for me, and that was the determination of the frame count. But there’s an easy fix. The AviSynth script that BD3D2MK3D will generate will look somewhat like this:

expand/collapse source code
1. # Avisynth script generated Wed Mar 06 18:07:23 CET 2013 by BD3D2MK3D v0.13
2. # to convert "D:\0_ADVDWORKDIR\My Movie\3D\MYMOVIE\BDMV\STREAM\SSIF\00001.ssif"
3. # (referenced by playlist 00001.mpls)
4. # to Half Side by Side, Left first.
5. # Movie title: My Movie 3D
6. #
7. # Source MPLS information:
8. # MPLS file: 00001.mpls
9. # 1: Chapters, 16 chapters
10. # 2: h264/AVC  (left eye), 1080p, fps: 24 /1.001 (16:9)
11. # 3: h264/AVC (right eye), 1080p, fps: 24 /1.001 (16:9)
12. # 4: DTS Master Audio, English, 7.1 channels, 16 bits, 48kHz
13. # (core: DTS, 5.1 channels, 16 bits, 1509kbps, 48kHz)
14. # 5: AC3, English, 5.1 channels, 640kbps, 48kHz, dialnorm: 28dB
15. # 6: DTS Master Audio, German, 5.1 channels, 16 bits, 48kHz
16. # (core: DTS, 5.1 channels, 16 bits, 1509kbps, 48kHz)
17. # 7: AC3, Turkish, 5.1 channels, 640kbps, 48kHz
18. # 8: Subtitle (PGS), English
19. # 9: Subtitle (PGS), German
20. # 10: Subtitle (PGS), Turkish
21. 
22. #LoadPlugin("C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe\DirectShowMVCSource.dll")
23. ## Alt method using ssifSource2 (for SBS or T&amp;B only):
24. LoadPlugin("C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe\ssifSource2.dll")
25. 
26. SIDEBYSIDE_L     = 14 # Half Side by Side, Left first
27. SIDEBYSIDE_R     = 13 # Half Side by Side, Right first
28. OVERUNDER_L      = 16 # Half Top/Bottom, Left first
29. OVERUNDER_R      = 15 # Half Top/Bottom, Right first
30. SIDEBYSIDE_L     = 14 # Full Side by Side, Left first
31. SIDEBYSIDE_R     = 13 # Full Side by Side, Right first
32. OVERUNDER_L      = 16 # Full Top/Bottom, Left first
33. OVERUNDER_R      = 15 # Full Top/Bottom, Right first
34. ROWINTERLEAVED_L = 18 # Row interleaved, Left first
35. ROWINTERLEAVED_R = 17 # Row interleaved, Right first
36. COLINTERLEAVED_L = 20 # Column interleaved, Left first
37. COLINTERLEAVED_R = 19 # Column interleaved, Right first
38. OPTANA_RC        = 45 # Anaglyph: Optimised, Red - Cyan
39. OPTANA_CR        = 46 # Anaglyph: Optimised, Cyan - Red
40. OPTANA_YB        = 47 # Anaglyph: Optimised, Yellow - Blue
41. OPTANA_BY        = 48 # Anaglyph: Optimised, Blue - Yellow
42. OPTANA_GM        = 49 # Anaglyph: Optimised, Green - Magenta
43. OPTANA_MG        = 50 # Anaglyph: Optimised, Magenta - Green
44. COLORANA_RC      = 39 # Anaglyph: Colour, Red - Cyan
45. COLORANA_CR      = 40 # Anaglyph: Colour, Cyan - Red
46. COLORANA_YB      = 41 # Anaglyph: Colour, Yellow - Blue
47. COLORANA_BY      = 42 # Anaglyph: Colour, Blue - Yellow
48. COLORANA_GM      = 43 # Anaglyph: Colour, Green - Magenta
49. COLORANA_MG      = 44 # Anaglyph: Colour, Magenta - Green
50. SHADEANA_RC      = 33 # Anaglyph: Half-colour, Red - Cyan
51. SHADEANA_CR      = 34 # Anaglyph: Half-colour, Cyan - Red
52. SHADEANA_YB      = 35 # Anaglyph: Half-colour, Yellow - Blue
53. SHADEANA_BY      = 36 # Anaglyph: Half-colour, Blue - Yellow
54. SHADEANA_GM      = 37 # Anaglyph: Half-colour, Green - Magenta
55. SHADEANA_MG      = 38 # Anaglyph: Half-colour, Magenta - Green
56. GREYANA_RC       = 27 # Anaglyph: Grey, Red - Cyan
57. GREYANA_CR       = 28 # Anaglyph: Grey, Cyan - Red
58. GREYANA_YB       = 29 # Anaglyph: Grey, Yellow - Blue
59. GREYANA_BY       = 30 # Anaglyph: Grey, Blue - Yellow
60. GREYANA_GM       = 31 # Anaglyph: Grey, Green - Magenta
61. GREYANA_MG       = 32 # Anaglyph: Grey, Magenta - Green
62. PUREANA_RB       = 23 # Anaglyph: Pure, Red - Blue
63. PUREANA_BR       = 24 # Anaglyph: Pure, Blue - Red
64. PUREANA_RG       = 25 # Anaglyph: Pure, Red - Green
65. PUREANA_GR       = 26 # Anaglyph: Pure, Green - Red
66. MONOSCOPIC_L     = 1 # Monoscopic 2D: Left only
67. MONOSCOPIC_R     = 2 # Monoscopic 2D: Right only
68. 
69. # Load main video (0 frames)
70. #DirectShowMVCSource("D:\0_ADVDWORKDIR\My Movie\3D\MYMOVIE\BDMV\STREAM\SSIF\00001.ssif", seek=false, seekzero=true, stf=SIDEBYSIDE_L)
71. ## Alt method using ssifSource2
72. ssifSource2("D:\0_ADVDWORKDIR\My Movie\3D\MYMOVIE\BDMV\STREAM\SSIF\00001.ssif", 153734, left_view = true, right_view = true, horizontal_stack = true)
73. 
74. AssumeFPS(24000,1001)
75. # Resize is necessary only for Half-SBS or Half-Top/Bottom,
76. # or when the option to Resize to 720p is on.
77. LanczosResize(1920, 1080)
78. # Anaglyph and Interleaved modes are in RGB32, and must be converted to YV12.
79. #ConvertToYV12(matrix="PC.709")

So here, when ssifSource2() is being called, there is a number “153734”, where initially there will only be “0”. I got the run time and frame rate of the movie from the file eac3to_demux.log, that BD3D2MK3D generates. Using that, you can easily calculate your total frame number to enter there, like I have already done in this example. Also enter the framerate in the AssumeFPS() function. Here, 24000,1001 means 24000/1001p or in other words roughly 23.976fps, which is the NTSC film frame rate. Now, the tricky part: For this function to operate, the libraries and binaries of BD3D2MK3D must be in the systems search path and our x264 encoder must sit in the same directory as the libs, also you will want to make sure you’re using a 32-Bit version of x264. In our example, the installation path of BD3D2MK3D is C:\Program Files (x86)\VFX\BD3D2MK3D\, and the tools and libs are in C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe\. So, open a cmd shell, copy your preferred 32-bit x264.exe to C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe\ and then add the necessary paths to the search path:

set %PATH%=C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe;C:\Program Files (x86)\VFX\BD3D2MK3D\toolset;%PATH%

Of course, you’ll need to adjust the paths to reflect your local BD3D2MK3D installation folder. Now, you can call x264 using the generated AviSynth script as its input (usually called _ENCODE_3D_MOVIE.avs), and it should work! Just make sure you call x264.exe from that folder C:\Program Files (x86)\VFX\BD3D2MK3D\toolset\stereoplayer.exe\, and not from somewhere else on the system, or you’ll get this: avs [error]: Can't create the graph!

For me, x264 using ssifSource2() sometimes talks about adding tons of duplicate frames while transcoding. I am not sure what this means and how it may affect things, but for now output seems to work just fine regardless. When it does happen, it looks like this:

Nonetheless, it creates half-SBS output, and using MKVtoolnix you can multiplex that into an MKV/MK3D container setting the proper stereoscopy flags (usually half SBS, left eye first) and everything will work just fine.

My method using half-SBS will cost half the horizontal resolution unfortunately, but few TVs support full SBS, plus it’s costly because files would become far larger at similar quality settings. So far I’ve been pretty surprised how good, sharp and detailed it still looks, despite the halved resolution. So far it’s the only real possibility anyway, as full-SBS (using two 1920×1080 frames besides each other) is not very wide-spread, large and bandwidth-hungry, and for the frame packed mode that Blu-Rays use, there are simply no free tools available so far, so direct MVC transcoding is not an option.

But, at least it works now, thanks to the [help from the Doom9 forums]!

Update: It seems the frame duplicates are a really weird bug, that makes the output very jittery and no longer as smooth as you would expect. Even stranger is the fact, that it is not deterministic. Sometimes it happens a lot, adding tens of thousands of frame duplicates. Then, abort, restart the process and the problem is magically gone. Sometimes you might have to restart 10-20 times and all of a sudden it works. How is this even possible? I do not know. Luckily, BD3D2MK3D also comes with a second Avisynth plugin that you can use instead of ssifSource2.dll, this one is called DirectShowMVCSource.dll and gives you the corresponding function DirectShowMVCSource().

For DirectShowMVCSource() you will also no longer need to determine a frame count, it will be able to do that all by itself. Also, BD3D2MK3D will automatically add it to the Avisynth script that it generates, it’s just commented out. So, uncomment it, comment the ssifSource2 lines instead, and you’re good to go. So far, the error rate seems to be far lower with DirectShowMVCSource(), although I had a problem once, with a very long movie / high frame count, here a 2nd pass encode would just stall at 0% CPU usage.

But other than that it seems to be more solid, even if it has to suboptimally filter stuff through DirectShow. There is another more significant drawback though: DirectShowMVCSource() is a lot slower than ssifSource2(). I would say, it’s a factor of 3-4 even. But at least it’s giving us another option!

Ok ok, I guess whoever is reading this (probably nobody anyway) will most likely already be tired of all this x264 stuff. But this one I need as documentation for myself anyway, because the experiment might be repeated later. So, the [chair for simulation and modelling of metallurgic processes] here at my university has allowed me to try and play with a distributed grid-style Linux cluster built by Supermicro. It’s basically one full rack cabinet with one Pentium 4 3.2GHz processor and 1GB RAM per node, with Hyper-Threading being disabled because it slowed down the simulation jobs that were originally being run on the cluster. Operating system for the nodes was OpenSuSE 10.3. Also, the head node was very similar to the compute nodes, which made it easy to compile libav and x264 on the head node and let the compute nodes just use those binaries.

The software installed for using it is the so called Sun GRID engine. I  have once already set up my own distributed cluster based on an OpenPBS style system called Torque, together with the Maui scheduler. When I was introduced to this Sun GRID engine, most of the stuff seemed awfully familiar, even the job submission tools and scripting system were quite the same actually. So this system uses tools like qsub, qdel, qstat plus some additional ones not found in the open source Torque system, like sns.

Now since x264 is not cluster-aware and not MPI capable, how DO we actually distribute the work across several physical machines? Lacking any more advanced approaches, I chose a very crude way to do it. Basically, I just cut the input video into n slices, where n is the number of cluster nodes. Since the cluster nodes all have access to the same storage backend via NFS, there was no need to send the files to the nodes, as access to the users home directory was a given.

Now, to make the job more easy, all slices were numbered serially, and I wrote a qsub job array script, where the array id would be used to specify the input file. So node[2] would get file[2], node[15] would get file[15] to encode etc. The job array script would then invoke the actual worker script. This is what the sliced input files look like before starting the computation:

And here, the qsub job array script that I sent to the cluster, called benchmark-qsub.sh, the directory /SAS/home/autumnf is the users home directory:

#$-N x264benchmark #$ -t 1-19

export PATH=$PATH:/SAS/home/autumnf/usr/bin echo$PATH

cd /SAS/home/autumnf/x264benchmark

time transcode.sh

And the actual worker script, transcode.sh:

#!/bin/bash
# Pass 1:
x264 --preset veryslow --tune film --b-adapt 2 --b-pyramid normal -r 3 -f -2:0 --bitrate 10000 --aq-mode 1 -p 1 --slow-firstpass --stats benchmark_slice$SGE_TASK_ID.stats -t 2 --no-fast-pskip --cqm flat slice$SGE_TASK_ID.264 -o benchmark_1stpass_slice$SGE_TASK_ID.264 # Pass 2: x264 --preset veryslow --tune film --b-adapt 2 --b-pyramid normal -r 3 -f -2:0 --bitrate 10000 --aq-mode 1 -p 2 --stats benchmark_slice$SGE_TASK_ID.stats -t 2 --no-fast-pskip --cqm flat slice$SGE_TASK_ID.264 -o benchmark_2ndpass_slice$SGE_TASK_ID.264

As you can see, the worker is using the environment variable $SGE_TASK_ID as a part of the input and output file names. This variable contains the job array id passed down from the job submission system of the Sun GRID engine. The actual job submission script contains the line #$ -t 1-19 which tells the system, that the job array consists of 19 jobs, as the cluster had 19 working nodes left, the rest was already dead as the cluster was pretty much out of service and hence unmaintained. Let’s see how the Sun tool sns reports the current status of the grid cluster:

So, some nodes are in “au” or “E” status. While I do not know the exact meaning of the status abbreviations, that basically means that those nodes are non-functional. Taking the broken nodes into account we have 19 working ones left. Now every node invokes its own x264 job and gives to it the proper input file from slice1.264 to slice19.264, writing correspondingly named outputs for both passes. Now let’s send the script to the Sun GRID engine using qsub ./benchmark-qsub.sh and check what sns has to say about this afterwards:

Hurray! Now if you’re more used to OpenPBS style tools, we can also use qstat to report the current job status on the cluster:

As you can see,  qstat also reports a “ja-task-ID”, which is essentially our job array id or in other words \$SGE_TASK_ID. So thats basically one job with one job id, but 19 “daughter” processes, each having its own array id. Using tools like qdel or qalter you can either modify the entire job, or only subprocesses on specific nodes. Pretty handy. Now the Pentium 4 processor might suck ass, but 19 of them are still pretty damn powerful when combined, at the moment of writing you can find the cluster on [place #4 on the results list]! Here the Voodooalert style result, just under one hour:

0:58:01.600 | SMMP | 19/1/1 | Intel Pentium 4 (no HT) 3.20GHz | 1GB DDR-I/266 (per node) | SuperMicro/SGE GRID Cluster | OpenSuSE 10.3 Linux (Custom GCC Build)

To ensure that this is actually working, I have recombined the output slices of pass 2, and tried to play that file. To my surprise it worked and would also allow seeking. Pretty nice considering that there is quite some bogus data in the file, like multiple H.264/AVC headers or cut up frames. I originally tried to split the input file into slices at keyframes in a clean fashion using ffmpeg, but that just wouldn’t work for that type of input, so I had to use dd, resulting in some frames being cut up (and hence dropped), and slices 2-19 having no headers. That required very specific versions of libav and x264, as not all versions can accept garbled files like this.

Also, the output files have been recombined using dd. Luckily, mplayer using libav/ffmpeg would play that stuff nicely, but there’s simply no guarantee that every player and/or decoder would. So that’s why it cannot be considered a clean solution. Also, since motion estimation is less efficient for this setup at the cutting points, it’s not directly comparable to a non-clustered run. So there are some drawbacks. But if you would cluster x264 for productive work, you’d still do it kind of like that. Here, the final output, already containing the final concatenated file, quite a mess of files right there:

So this is it. The clustered x264. I hope to be able to test this approach on another cluster at the Metallurgy chair in the next months, a Nehalem-based machine with far more cores, so that’d be really massive. Also, access to a Sandy Bridge-E cluster is possible, although not really probable. But we’ll see. If you’re interested in using x264 in a similar approach, you might want to check out the software versions that I used, these should be able to cope with rudely cut up slices quite well:

Also, if you require some guidance building that source code on Linux, please check out my guide:

If anybody knows a better way to slice up H.264/AVC elementary video streams, by all means, let me know! I would love to be able to have slices cut at proper keyframe positions including their own header, and I would also like to be able to reconcatenate slices to one file that is clean, having only one header at the beginning of the file and no damaged / to be dropped frames at their joints. So if you know how to do that – preferrably using Linux command line tools – just tell me, I’d be happy to learn that!

Edit: Thanks to [LoRd_MuldeR] from the [Doom9 Forums] I now have a way of splitting the input stream cleanly at GOP (group of pictures) boundaries, as the Elephants Dream movie is luckily using closed GOPs. Basically, it involves the widely-used [MKVtoolnix]. With that tool, you can just take the stream, split it to n MKV slices, and then either use those or extract the H.264/AVC streams from those slices, maybe using [tsMuxer]. Just make as many as your cluster has compute nodes, and you’re done!

By the way, both MKVtoolnix and tsMuxer are available for Windows and Linux, also MacOS X.

This is clean, safe and proper, other than my dirty previous approach!

At first I thought there were only three big BSD distributions, namely FreebBSD, OpenBSD and NetBSD. But there is a fourth one, which might very well become the last for me to port the x264 benchmark to. DragonFly BSD. Actually a pretty cool system with its own, quite modern file system called “HAMMER”. Surprisingly, like NetBSD it made my life [quite easy], so now 50% of the major BSD systems are covered successfully. OpenBSD might eventually join the ranks as soon as version 5.2 is being released, supposedly including a more modern compiler and hopefully assembler. But so far, it’s NetBSD and DragonFly BSD.

Interestingly, DragonFly BSD not only features its own file system, but also its own light-weight threading implementation. Of course I cannot use that, as I have to stick to a threading model supported by libav/x264 (either BeOS threads which are broken anyway, win32 or posix, so posix it is), but it really looks like a lot of work went into this project. On the down side, it only supports x86_32 and x86_64, where the other BSDs support a sometimes extremely wide range of microarchitectures, with NetBSD clearly [in the lead] even over any Linux distribution.

Now it seems almost everything that can be conquered, has been conquered. From oddballs like the Win32 ReactOS to strange alien systems like Haiku and Unices like Solaris and different BSDs. And Linux of course, on quite some different architectures (MIPS, PPC, ARM, IA-64, …). While OpenBSD might still follow, and someday maybe even FreeBSD, I think the exploration time is pretty much over.

I did try to get on to [Fafner], a VAX running in the cellar of a crazy university professor, but it seems  the OpenVMS operating system with its compilers and build toolchains is far, far beyond my porting skills. Same goes for weirdos like Tanenbaums Minix. So for now, this is it:

It seems that some miracle has happened. I have now tried quite some builds of ReactOS when i stumbled over a certain [v0.4-SVN r57481] just two days ago. Now previously, that OS would just bluescreen on larger data transfers. Older builds would give an NTOSKRNL.EXE bluescreen, newer ones would show the tcpip.sys kernel driver faulting. Also, on x264 CPU/RAM load the kernel would just die after very few minutes, sometimes seconds. So my little x264 benchmark would reach something like 300-400 (of 15691) frames, and then everything would go right down to hell.

As you can imagine, I was quite pleasantly surprised when all of a sudden I could easily download an almost 700MB large file from the internet with v0.4-SVN r57481 without the transfer stalling (which also happened sometimes previously) or the machine bluescreening. I was even more suprised when i ran x264 and found the box to gnaw its way through 8 hours of high load without a single glitch! Now dear developers, that’s quite some progress right there in kernel space, I am impressed!

As for  the x264 benchmark thing, there are a few minor modifications necessary for it to do its time measurement correctly, but that’s no big issue. Links to corresponding guides have already been [added]. Here you can see the thing running for quite some time already, with no crash. I still can’t believe it, so many problems gone all of a sudden:

Maybe ReactOS can become a real open source replacement for Windows one day? The sudden leap in stability is kind of reassuring. If only they could get more developers working on this stuff.

Now this was freaky. Very, very freaky.  I don’t even want to begin talking about all the dirty, crappy details, but I managed to “defeat” both Haiku and also the even more nasty OpenSolaris 11 when it comes to compiling and linking libav and x264 on those operating systems. And yes, as for x264+libav, I have even used the newest versions successfully (nightly snapshots). So, no dirty details, let’s just briefly sum up what went so terribly wrong on each one of the two weirdos.

Oh, and before that, just in case you don’t know the two: Haiku is an open source operating system unrelated to any other strain of OS. Derived from the multimedia-centered BeOS it has been reinforced with Posix and some GNU tools. Haiku runs native BeOS programs. OpenSolaris on the other hand is Suns (now Oracles) UNIX derivative. It is a bit strange in actually being open source too, despite it being a UNIX system. Well, here we go:

Haiku problems:

• Need developer version with GCC4.
• yasm compilers configure scripts need adjustment for operating system detection.
• Needs fixing of linker flags in LDFLAGS and some scripts, because OS functions are implemented in non-standard libraries (libroot, libnetwork).
• libav needs entirely manual CPU instruction extension configuration, or code will break (config.mak, config.asm, config.h).
• libav warning/error flags need adjustment, or compilation will break.
• x264 OS detection uses wrong linker flags and wrong threading model (BeOS instead of Posix). Needs to be fixed in the script.
• x264 needs manual source code patch to remove invocation of nice() system call, which does not exist on Haiku.
• x264 needs manual thread count specification, logical CPU count detection failing.

OpenSolaris problems:

• Has lots of tools that look like GNU tools (awk, sed, make, echo, rm, cp,  cat, test, cmp, sort, …) but don’t work like them. Breaks EVERYTHING. Needs manual fixes in MANY scripts and files all over the place to replace them with GNU tools (gawk, gsed, gmake, gecho, …).
• Needs fixing of linker flags in LDFLAGS and some scripts, because OS functions are implemented in non-standard libraries (libxnet, libresolv).
• libc is not standard compliant. Need to link against supplemental library “values-xpg6.o” to extend libc functionality to reach standard compliance.
• libav needs all assembler code (external and inline) disabled, or libraries will break with segmentation fault.
• libav needs manual removal of certain instruction extensions from multiple post-configure files.
• x264 needs manually set linker and include paths even if they’re in default search paths. Reason unknown.
• x264 OS detection routine in configure script wants to do it right, but fails. Needs manual fix.
• x264 needs manual removal of certain compiler flags that will break the codes capability to link against libc (-O3 becomes -O1, and -ffast-math, -mfpmath=sse, -msse need to go altogether). Otherwise: Segmentation fault.
• x264 needs one hell of a lot of fixing of its “install” target in the Makefile. Another Sun<->GNU fuckup.
• x264 needs manual thread count specification, logical CPU count detection failing.

I have now updated [this post with all the guide links], it now offers Linux, MacOS X @ Intel, MacOS X @ PowerPC, OpenSolaris and Haiku guides in both english and german, in case you’re interested. And here are a few screenshots of those OS weirdos, the newest freaky additions to the [x264 benchmark project]:

Now I would absolutely love to take on ReactOS again, but I tried a few development builds and they just fail in every way thinkable. The kernel is a mess, bluescreens all the time. Pity, because it would be an easy platform to port to, because as a matter of fact you don’t need to “port” anything, just use the Windows version. Sounds so beautifully easy and it IS. But it also bluescreens. All the time. Maybe I’ll try that one later, like in a few months or so. But I doubt x264 will ever work on ReactOS, those guys have been working on that OS for 14 years, and it even bluescreens when downloading a simple file over HTTP. Or FTP. Or anything actually. *sigh*

Oh, and before that, just in case you don’t know the two: Haiku is an open source operating system unrelated to any other strain of OS. Derived from the multimedia-centered BeOS it has been reinforced with Posix and some GNU tools. Haiku runs native BeOS programs. OpenSolaris on the other hand is Suns (now Oracles) UNIX derivative. It is a bit strange in actually being open source too, despite it being a UNIX system. Well, here we go:

And so the quest continues, today, a new operating system type: BeOS, in its modern open source form: Haiku. BeOS was originally developed in the 90s as a multimedia operating system to compete with Microsoft Windows and Apple MacOS. However, the operating system never quite took off. In more recent times, the entire OS has been resurrected under a new, japanese name: Haiku.

Still featuring a BeOS style kernel, the OS has been equipped with a lot of Posix & GNU tools to easy the pain of porting and developing software for the OS. It actually looks and feels very tidy, fast and cool.

In recent development builds, there are even versions with GCC 4.6 shipped, but on top of that, what you get is also a complete modern GNU build toolchain, including but not limited to yasm, autoconf, make, imake etc. So I tried to build and link libav and x264 obviously, but failed. One problem was that some OS functions (like Posix thread implementation) are implemented in different libraries (libroot instead of libm or libpthread), so modifications of linker flags are necessary, e.g. -lroot instead of -lm.

But there were more severe problems prohibiting me from linking x264 against libav or ffmpeg itself. I have as of yet not been able to fully figure out why, but it’s most definitely a linker problem with failing library/header detections and even missing references on linking. Maybe some of the libs are even actually missing, but I am not sure why they wouldn’t have built when compiling libav and/or ffmpeg. Well, maybe I’ll manage in the future, meanwhile, check out this Haiku screenshot, it does look rather cool: