Jul 272016
 

x264 logoSince I’ve been doing a bit of Anime batch video transcoding with x264 and x265 in the last few months, I thought I’d document this for myself here. My goal was to loop over an arbitrary amount of episodes and just batch-transcode them all at once. And that on three different operating systems: Windows (XP x64), Linux (CentOS 6.8 x86_64) and FreeBSD 10.3 UNIX, x86_64. Since I’ve started to split the work across multiple machines, I lost track of what was where and which machine finished what, and when.

So I thought, why not let the loop send me a small notification email upon completion? And that’s what I did. On Linux and UNIX this relies on the bash shell and the mailx command. Please note that I’m talking about [Heirloom mailx], not some other mail program by the same name! I’m mentioning this, because there is a different default mailx on FreeBSD, that won’t work for this. That’s why I put alias mailx='/usr/local/bin/mailx' in my ~/.bash_profile on that OS after installing the right program to make it the default for my user.

On Windows, my loops depend on my own [colorecho] command (you can replace that with cmds’ ECHO if you want) as well as the command line mailer [blat]. Note that, if you need to use SSL/TLS encryption when mailing, blat can’t do that. A suitable replacement could be [mailsend]. Please note, that mailsend does not work on Windows XP however.

In the x265 case, avconv (from the [libav] package) is required on all platforms. You can get my build for Windows [here]. If you don’t like it, the wide-spread [ffmpeg] can be a suitable drop-in replacement.

Now, when setting up blat on Windows, make sure to run blat -help first, and learn the details about blat -install. You need to run that with certain parameters to set it up for your SMTP mail server. For whatever reason, blat reads some of that data from the registry (ew…), and blat -install will set that up for you.

Typically, when I transcode, I do so on the elementary streams rather than .mkv files directly. So I’d loop through some source files and extract the needed streams. Let’s say we have “A series – episode 01.mkv” and some more, all the way up to “A series – episode 13.mkv”, then, assuming track #0 is the video stream…

On Windows:

FOR %I IN (01,02,03,04,05,06,07,08,09,10,11,12,13) DO mkvextract tracks "A series - episode %I.mkv" ^
 0:%I\video.h264

On Linux/UNIX:

for i in {01..13}; do mkvextract tracks "A series - episode $i.mkv" 0:$i/video.h264; done

mkvextract will create the non-existing subfolder for us, and a x264 transcoding loop would then look like this on Windows:

expand/collapse source code
cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^
 (01,02,03,04,05,06,07,08,09,10,11,12,13) DO "c:\Program Files\VFX\x264cli\x264-10b.exe" --fps ^
 24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f ^
 -2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats %I\v.stats -t 2 --no-fast-pskip ^
 --cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o %I\pass1.264 & colorecho "Pass 1 ^
 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & ^
 "c:\Program Files\VFX\x264cli\x264-10b.exe" --fps 24000/1001 --preset veryslow --tune animation ^
 --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats ^
 %I\v.stats -t 2 --no-fast-pskip --cqm flat --non-deterministic --demuxer lavf %I\video.h264 -o ^
 %I\pass2.264 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10) & echo !SERIES! ^
 transcoding complete | blat - -t "myself@another.mailhost.com" -c "myself@mailhost.com" -s "x264 ^
 notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET SERIES="

Note that I always write all the iteration out in full here. That’s because cmd can’t do loops with leading zeroes in the iterator. The reason for this is that those source files usually have them in their lower episode numbers. If it wasn’t 01,02, … ,12,13, but 1,2, … ,12,13 instead, you could do FOR /L %I IN (1,1,13) DO. But this isn’t possible in my case. Even if elements need alphanumeric names like here,  FOR %I IN (01,02,03,special1,special2,ova1,ova2) DO, you still won’t need that syntax on Linux/UNIX because the bash can have iterator groups like for i in {{01..13},special1,special2,ova1,ova2}; do. Makes me despise the cmd once more. ;)

Edit:

Ah, according to [this], you can actually do something like cmd /V /C "FOR /L %I IN (1,1,13) DO (SET "fI=00%I" & echo "!fI!:~-2")", holy shit. It actually works and gives you leading zeroes. :~-2 for 2 digits, :~-3 for three. Expand fI for more in this example. I mean, what is this even? Some number formatting magic? I probably don’t even wanna know… Couldn’t find any way of having several groups for the iterator however. Meh. Still don’t like it.

So, well, it’s like this on Linux/UNIX:

expand/collapse source code
(export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do nice -n19 x264 --fps \
24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid normal -f \
-2:0 --bitrate 2500 --aq-mode 1 -p 1 --slow-firstpass --stats $i/v.stats -t 2 --no-fast-pskip \
--cqm flat --non-deterministic --demuxer lavf $i/video.h264 -o $i/pass1.264 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 1 done for episode $i/$EPNUM of $SERIES\e[0m" && echo && nice -n19 \
x264 --fps 24000/1001 --preset veryslow --tune animation --open-gop -b 16 --b-adapt 2 --b-pyramid \
normal -f -2:0 --bitrate 2500 --aq-mode 1 -p 2 --stats $i/v.stats -t 2 --no-fast-pskip --cqm flat \
--non-deterministic --demuxer lavf $i/video.h264 -o $i/pass2.264 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 2 done for episode $i/$EPNUM of $SERIES\e[0m" && echo; done && echo \
"$SERIES transcoding complete" | mailx -s "x264 notification from $MACHINE" -r \
"myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \
smtp="smtp.mailhost.com" -S smtp-auth-user="myuser" -S smtp-auth-password="mysecurepassword" \
myself@mailhost.com)

The variable $MACHINE or %MACHINE%/!MACHINE! specifies the machines’ host name. This will be noted in the email, so I know which machine just completed something. $EPNUM – or %EPNUM%/!EPNUM! on Windows – is used for periodic updates on the shell. The output would be like “Pass 1 done for Episode 07/13 of AnimeX” in green on Windows and bold red on Linux/UNIX (just change the color to your liking).

Finally, $SERIES aka %SERIES%/!SERIES! would be the series’ name. So say, the UNIX machine named “BEAST” above is done with this loop. The email would come with the subject line “x264 notification from BEAST” and would read “AnimeX transcoding complete” in plain text. That’s all.

Please note, that cmd batch on Windows is extremely creepy. Every whitespace (especially the leading ones when doing multi-line like this for display) needs to be exactly where it is. The same goes for double quotes where you might think they aren’t needed. They are! Also, this needs delayed variable expansion once again, which is why we see variables like !EPNUM! instead of %EPNUM% and why it’s called in a subshell by running cmd /V /C.

On Linux/UNIX we don’t need to rely on some specific API like cmds’ SetConsoleTextAttribute() to print colors, as most terminals understand ANSI color codes.

And this is what it looks like for x265:

Windows:

expand/collapse source code
cmd /V /C "ECHO OFF & SET MACHINE=NOVASTORM& SET EPNUM=13& SET SERIES="AnimeX"& (FOR %I IN ^
 (01,02,03,04,05,06,07,08,09,10,11,12,13) DO avconv -r 24000/1001 -i %I\video.h264 -f yuv4mpegpipe ^
 -pix_fmt yuv420p -r 24000/1001 - 2>NUL | "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 ^
 --fps 24000/1001 -p veryslow --pmode --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate ^
 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 ^
 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 ^
 --pass 1 --slow-firstpass --stats %I\v.stats --sar 1 --range full -o %I\pass1.h265 & colorecho ^
 "Pass 1 done for Episode %I/"!EPNUM!" of "!SERIES!"" 10 & ECHO. & avconv -r 24000/1001 -i ^
 %I\video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>;NUL | ^
 "C:\Program Files\VFX\x265cli-mb\x265.exe" - --y4m -D 10 --fps 24000/1001 -p veryslow --pmode ^
 --pme --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 --rect --amp --aq-mode 3 ^
 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq 5.0 --rdoq-level 1 ^
 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 2 --stats %I\v.stats --sar ^
 1 --range full -o %I\pass2.h265 & colorecho "Pass 2 done for Episode %I/"!EPNUM!" of "!SERIES!"" ^
 10) & echo !SERIES! transcoding complete | blat - -t "myself@another.mailhost.com" -c ^
 "myself@mailhost.com" -s "x265 notification from !MACHINE!" & SET MACHINE= & SET EPNUM= & SET ^
 SERIES="

Linux/UNIX:

expand/collapse source code
(export MACHINE=BEAST EPNUM=13 SERIES='AnimeX'; for i in {01..13}; do avconv -r 24000/1001 -i \
$i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice -19 x265 - --y4m \
-D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid --bitrate 2500 \
--rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd 1.6 --psy-rdoq \
5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 --pass 1 \
--slow-firstpass --stats $i/v.stats --sar 1 --range full -o $i/pass1.h265 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 1 done for episode $i/$EPNUM of $SERIES\e[0m" && echo && avconv -r \
24000/1001 -i $i/video.h264 -f yuv4mpegpipe -pix_fmt yuv420p -r 24000/1001 - 2>/dev/null | nice \
-19 x265 - --y4m -D 10 --fps 24000/1001 -p veryslow --open-gop --ref 6 --bframes 16 --b-pyramid \
--bitrate 2500 --rect --amp --aq-mode 3 --no-sao --qcomp 0.75 --no-strong-intra-smoothing --psy-rd \
1.6 --psy-rdoq 5.0 --rdoq-level 1 --tu-inter-depth 4 --tu-intra-depth 4 --ctu 32 --max-tu-size 16 \
--pass 2 --stats $i/v.stats --sar 1 --range full -o $i/pass2.h265 && echo && echo -e \
"\e[1;31m`date +%H:%M`, pass 2 done for episode $i/$EPNUM of $SERIES\e[0m" && echo; done && echo \
"$SERIES transcoding complete" | mailx -s "x265 notification from $MACHINE" -r \
"myself@mailhost.com" -c "myself@another.mailhost.com" -S smtp-auth="login" -S \
smtp="smtp.mailhost.com" -S smtp-auth-user="myuser" -S smtp-auth-password="mysecurepassword" \
myself@mailhost.com)

And that’s it. The loops for audio transcoding are simpler, as that part is so fast, it doesn’t need email notifications. Runs for minutes rather than days. When all is done, I’d usually fire up the MKVToolnix GUI, and prepare a mux for the first episode. There is a nice “copy command line to clipboard” function there when you click on “Muxing” after everything is set up. With that I can build another loop that muxes everything to final .mkv files. On Windows that part is more complicated if you want Unicode support, so I needed to create input files by using a generator I wrote in Perl for that, but that’s for another day… :)

Oh, and if you wanna ssh into your Linux or UNIX boxes from afar to check on your transcoders, consider launching them on a GNU screen] session. It’s immensely useful! Too bad it won’t work on the Windows cmd. :(

Jul 092012
 

PHP & MySQL LogoServing UTF-8 can be useful. Why? For instance to display text in several languages on your website. If you want to mix different European and for instance Asian languages, more limited character sets won’t suffice anymore. Thus, UTF-8. In my case, I wanted to display a mix of European and Chinese Mandarin. However, my PHP scripts, MySQL database and even the script textfiles themselves were not prepared. So what do you need to do? First, make sure your database is encoded in UTF-8, using a proper collation too:

ALTER DATABASE `dbname` CHARACTER SET `utf8` COLLATE `utf8_general_ci`;
ALTER TABLE `tablename` CONVERT TO CHARACTER SET `utf8` COLLATE `utf8_general_ci`;

Now your raw data will be stored in UTF-8. The next step is rather easy; Ensure that your script files themselves (also any raw HTML present) are encoded in UTF-8. For that – on MS Windows – you can just use [Notepad++], it can autodetect your source character set and convert to UTF-8.

Additionally, any raw HTML that you might have or that is written by PHP needs a proper encoding definition in the <head></head> section to tell the browser “My text is UTF-8”. Like this:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Your Title</title>
</head>

Good. Now the hard part. Since PHP sucks a bit, it will default to ISO-8859-1 in pretty much everything it’s doing, that’s basically some modern Latin1 character set including the ‘€’ symbol. But that ain’t good enough for displaying some Japanese or Chinese, so we need to take special care here. The first important thing is to make sure that PHP will read data from MySQL in the properly encoded fashion. Otherwise it will read UTF-8 assuming it’s ISO-8859-1, hence garbling most of the text. So when defining your MySQL connection in PHP, you need to add the mysql_set_charset() function. Could look like this:

$connection = mysql_connect($sql_host,$sql_user,$sql_pass)
mysql_select_db($sql_dbname, $connection)
mysql_set_charset('utf8',$connection);

Now PHP will read and output UTF-8 by default. But be aware that there are additional caveats in PHP! Most string processing functions like e.g. substr() won’t process strings with large multibyte characters correctly. They are hardcoded for ISO-8859-1! Luckily you can load the PHP extension mb_string, which provides multibyte-capable string processing functions. They just use a mb_ prefix, so substr() simply becomes mb_substr()! Still, that means you might need to update your PHP code for UTF-8 capability, sometimes significantly. A documentation of all multibyte-related functions in PHP can be found [here]. With that, handling UTF-8 in both input, processing and output shouldn’t be a problem anymore.