Thursday, November 26, 2009

Profiling ffmpeg on Linux x86_64

In this post I'll walk through profiling ffmpeg v0.5 on Slamd64 12.1 with the GNU profiler (gprof). The goal is to find ffmpeg's bottlenecks (and probably do something about them in the future).
First, rebuild the ffmpeg package with profiling enabled. This is the important excerpt from my build script:

...
./configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--mandir=/usr/man \
--shlibdir=/usr/lib64 \
--disable-debug \
...
--enable-gprof \
--disable-stripping \
...

#REMEMBER NOT TO STRIP THE FINAL BINARY -- SO COMMENT THE LINES BELOW
#find $PKG | xargs file | grep -e "executable" -e "shared object" | grep ELF \
# | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
...

Now, rebuild the ffmpeg package. Install it afterwards.

Second, run ffmpeg to transcode a DVD VOB file into an MPEG-4 AVI file. For example:

ffmpeg -i /mnt/dvd/VIDEO_TS/VTS_01_4.VOB -f avi -vcodec mpeg4 -b 800k -g 300 -bf 2 -acodec libfaac -ab 128k outlander_01_4.avi

After the transcoding completes, the profiling information is in a file named gmon.out in the directory where ffmpeg was executed.

Next, use gprof to generate human-readable statistics for the previous ffmpeg run:

gprof /usr/bin/ffmpeg gmon.out > stats.txt
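
If only one of the two report sections is of interest, gprof can print them separately. The following is a minimal sketch, assuming the same /usr/bin/ffmpeg binary and gmon.out as in the command above. The function name split_profile and the output file names flat.txt and graph.txt are just illustrative choices. The -b flag drops the explanatory text, -p selects the flat profile and -q selects the call graph.

```shell
#!/bin/sh
# split_profile BINARY [GMON_FILE]
# Writes the flat profile to flat.txt and the call graph to graph.txt.
# (split_profile, flat.txt and graph.txt are illustrative names.)
split_profile() {
    binary=$1
    gmon=${2:-gmon.out}

    # gprof needs both the profiled binary and the data file from its run.
    if [ ! -f "$binary" ] || [ ! -f "$gmon" ]; then
        echo "split_profile: missing $binary or $gmon" >&2
        return 1
    fi

    gprof -b -p "$binary" "$gmon" > flat.txt  || return 1  # flat profile only
    gprof -b -q "$binary" "$gmon" > graph.txt || return 1  # call graph only
}

# Example: split_profile /usr/bin/ffmpeg gmon.out
```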


At this point we have the profiling information in stats.txt. The following are snippets from my stats.txt:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 45.46      0.20     0.20                             main
 40.92      0.38     0.18    68375     0.00     0.00  output_packet
  6.82      0.41     0.03    38468     0.00     0.00  do_audio_out
  3.41      0.43     0.02    87594     0.00     0.00  write_frame
  2.27      0.44     0.01   107095     0.00     0.00  print_report
  1.14      0.44     0.01        1     5.00     5.00  opt_format
  0.00      0.44     0.00  1982784     0.00     0.00  data_start
  0.00      0.44     0.00        7     0.00     0.00  find_codec_or_die
  0.00      0.44     0.00        7     0.00     0.00  set_context_opts
  0.00      0.44     0.00        4     0.00     0.00  opt_default
  0.00      0.44     0.00        2     0.00     0.00  opt_bitrate
  0.00      0.44     0.00        1     0.00     0.00  av_exit
  0.00      0.44     0.00        1     0.00     0.00  new_audio_stream
  0.00      0.44     0.00        1     0.00     0.00  new_video_stream
  0.00      0.44     0.00        1     0.00     0.00  opt_audio_codec
  0.00      0.44     0.00        1     0.00     0.00  opt_input_file
  0.00      0.44     0.00        1     0.00     0.00  opt_output_file
  0.00      0.44     0.00        1     0.00     0.00  opt_video_codec
  0.00      0.44     0.00        1     0.00     5.00  parse_options
  0.00      0.44     0.00        1     0.00     0.00  print_all_lib_versions
  0.00      0.44     0.00        1     0.00     0.00  show_banner

 %         the percentage of the total running time of the
time       program used by this function.
...
granularity: each sample hit covers 2 byte(s) for 2.27% of 0.44 seconds

index % time    self  children    called     name

[1]    100.0    0.20    0.24                 main [1]
                0.18    0.05   68375/68375       output_packet [2]
                0.01    0.00  107095/107095      print_report [5]
                0.00    0.01       1/1           parse_options [7]
                0.00    0.00  715631/1982784     data_start [8]
                0.00    0.00       1/1           show_banner [21]
                0.00    0.00       1/1           av_exit [13]
-----------------------------------------------
                0.18    0.05   68375/68375       main [1]
[2]     51.1    0.18    0.05   68375         output_packet [2]
                0.03    0.01   38468/38468       do_audio_out [3]
                0.01    0.00   29892/87594       write_frame [4]
                0.00    0.00  512643/1982784     data_start [8]
...
Index by function name

[13] av_exit            [16] opt_audio_codec   [2] output_packet
 [8] data_start         [12] opt_bitrate       [7] parse_options
 [3] do_audio_out       [11] opt_default      [20] print_all_lib_versions
 [9] find_codec_or_die   [6] opt_format        [5] print_report
 [1] main               [17] opt_input_file   [10] set_context_opts
[14] new_audio_stream   [18] opt_output_file  [21] show_banner
[15] new_video_stream   [19] opt_video_codec   [4] write_frame


gprof follows every statistics section it generates with an explanation of the columns, so it shouldn't be hard to analyze the profiling results. That's it for the moment.
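
As a starting point for that analysis, the heaviest entries can be pulled out of the flat profile with a few lines of awk. This is only a sketch: top_functions is a hypothetical helper name, and it assumes the flat-profile layout shown above, where the data rows sit between the "time seconds ..." header row and the next blank line, and the function name is the last field on each row (true for C symbols like ffmpeg's, not necessarily for C++ ones).

```shell
#!/bin/sh
# top_functions FILE [MIN_PERCENT]
# Prints "<% time> <function>" for each flat-profile row at or above
# MIN_PERCENT (default 1). top_functions is an illustrative helper name.
top_functions() {
    awk -v min="${2:-1}" '
        /^ *time +seconds/ { in_table = 1; next }  # header row right above the data
        in_table && NF == 0 { exit }               # a blank line ends the flat profile
        in_table && $1 + 0 >= min + 0 { print $1, $NF }
    ' "$1"
}

# Example: top_functions stats.txt 5
```

With the stats.txt above and a threshold of 5, this would list main, output_packet and do_audio_out.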

Monday, November 23, 2009

OpenMP vs OpenMPI

For a beginner "computationalist" like me, it's quite hard to understand the difference between OpenMP and OpenMPI. At first, I thought both of them tackled the same problem, parallel execution, in the same way. However, after studying them further, it's clear that OpenMPI targets a distributed-memory model while OpenMP targets a shared-memory model. The two memory models can be explained as follows:

  • In a distributed-memory architecture, each process has its own address space, separate from that of the other processes (which quite possibly run on different machines). This means a process cannot "see" another process's variables. A process must "send a message" to another process to change a variable there. Hence the name "Message Passing Interface (MPI)". An MPI library such as OpenMPI is basically a sort of "middleware" that facilitates message passing between the processes, process migration, initialization and tear-down.
  • In a shared-memory architecture, there is usually one process containing several threads that share the same memory address space, file handles and so on. Hence the name "shared memory". In this architecture, each thread can modify the "process"-global data. Therefore, a synchronization mechanism such as a semaphore must be used to protect that data. OpenMP simplifies programming for shared-memory architectures by providing compiler "extensions" in the form of various standardized "pragma"s.

From the two simplified explanations above, it's clear that we can combine OpenMPI and OpenMP for parallel execution of code. Say, use OpenMP for "local execution within a machine" and OpenMPI for inter-machine process communication.
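
On the command line, such a hybrid run might look like the sketch below. It assumes a program (hypothetically named ./hybrid) that uses both MPI calls and OpenMP pragmas and was built with something like `mpicc -fopenmp hybrid.c -o hybrid`. hybrid_cmd is an illustrative helper that only prints the mpirun command rather than running it. -np sets the number of MPI processes, and -x exports an environment variable to them, here OMP_NUM_THREADS, which OpenMP reads to decide how many threads each process uses.

```shell
#!/bin/sh
# hybrid_cmd PROCS THREADS PROGRAM [ARGS...]
# Prints an Open MPI launch line: PROCS MPI processes, each running
# THREADS OpenMP threads. (hybrid_cmd and ./hybrid are hypothetical names.)
hybrid_cmd() {
    procs=$1
    threads=$2
    shift 2
    # -np: number of MPI processes; -x: export a variable to every process
    echo mpirun -np "$procs" -x "OMP_NUM_THREADS=$threads" "$@"
}

hybrid_cmd 2 4 ./hybrid
# prints: mpirun -np 2 -x OMP_NUM_THREADS=4 ./hybrid
```

Dropping the echo would actually start the job. Spreading the processes over several machines additionally needs a host list, which is left out here.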

OpenMPI Slackbuild Script for Slamd64 12.1

I've just gotten OpenMPI to work on my Slamd64 12.1 system. I really hate cluttering the system, so I built it as a package, which makes it easy to remove when I want to upgrade to a newer OpenMPI version later. This is the SlackBuild script:

#!/bin/sh

# Slackware build script for Open MPI

# Written by Aleksandar Samardzic
# Modified for Slamd64 12.1 by Darmawan Salihun

PRGNAM=openmpi
VERSION=${VERSION:-1.3.3}
ARCH=${ARCH:-x86_64}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}

CWD=$(pwd)
TMP=${TMP:-/tmp/SBo}
PKG=$TMP/package-$PRGNAM
OUTPUT=${OUTPUT:-/tmp}

if [ "$ARCH" = "i486" ]; then
  SLKCFLAGS="-O2 -march=i486 -mtune=i686"
elif [ "$ARCH" = "i686" ]; then
  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
elif [ "$ARCH" = "x86_64" ]; then
  SLKCFLAGS="-O2 -fPIC"
fi

set -e

rm -rf $PKG
mkdir -p $TMP $PKG $OUTPUT
cd $TMP
rm -rf $PRGNAM-$VERSION
tar xvf $CWD/$PRGNAM-$VERSION.tar.bz2
cd $PRGNAM-$VERSION
chown -R root:root .
find . \
  \( -perm 777 -o -perm 775 -o -perm 711 -o -perm 555 -o -perm 511 \) \
  -exec chmod 755 {} \; -o \
  \( -perm 666 -o -perm 664 -o -perm 600 -o -perm 444 -o -perm 440 -o -perm 400 \) \
  -exec chmod 644 {} \;


CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS" \
./configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--mandir=/usr/man \
--sysconfdir=/etc \
--localstatedir=/var \
--docdir=/usr/doc/$PRGNAM-$VERSION \
--enable-static \
--enable-mpirun-prefix-by-default \
--build=$ARCH-slamd64-linux

make -j4
make install DESTDIR=$PKG

find $PKG | xargs file | grep -e "executable" -e "shared object" | grep ELF \
| cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true

( cd $PKG/usr/man
find . -type f -exec gzip -9 {} \;
for i in $(find . -type l) ; do ln -s $(readlink $i).gz $i.gz ; rm $i ; done
)

# Let's not clobber config files
mv $PKG/etc/openmpi-totalview.tcl $PKG/etc/openmpi-totalview.tcl.new
mv $PKG/etc/openmpi-mca-params.conf $PKG/etc/openmpi-mca-params.conf.new
mv $PKG/etc/openmpi-default-hostfile $PKG/etc/openmpi-default-hostfile.new

mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
cp -a AUTHORS INSTALL LICENSE NEWS README VERSION examples \
$PKG/usr/doc/$PRGNAM-$VERSION
cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild

mkdir -p $PKG/install
cat $CWD/slack-desc > $PKG/install/slack-desc

cd $PKG
/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.tgz
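
To install the result, something along these lines should work. It's a sketch that assumes the script is saved as openmpi.SlackBuild next to the openmpi-1.3.3.tar.bz2 source tarball and a slack-desc file, and that it has already been run as root with `sh openmpi.SlackBuild`. PKGFILE is just an illustrative name for the resulting package path, derived from the same defaults the script uses.

```shell
#!/bin/sh
# Same defaults as the SlackBuild above, so the package path matches its output.
PRGNAM=openmpi
VERSION=${VERSION:-1.3.3}
ARCH=${ARCH:-x86_64}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}
OUTPUT=${OUTPUT:-/tmp}
PKGFILE=$OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.tgz

echo "Expecting package at $PKGFILE"

# Only act if the SlackBuild has already produced the package (run as root).
if [ -f "$PKGFILE" ]; then
    # Installs the package, or upgrades an already installed openmpi.
    /sbin/upgradepkg --install-new "$PKGFILE"
    # Smoke test: start two processes that just print the host name.
    mpirun -np 2 hostname
fi
```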


That's it. Now I can easily upgrade the package later. I'm using OpenMPI 1.3.3 for my current experimental side projects.

Thursday, November 5, 2009

AMI BIOS Reverse Engineering Article

AMI BIOS Reverse Engineering article is up. Check out the details here.