Tips and Tricks by Thomas



I am an avid user of Linux and other derivatives of Unix.

Once in a while I come up with a few-line solution that other people like a lot.

I put those here that I believe might be valuable for a wider audience.

LaTeX-Tricks

Squeezing papers into X pages

My collection of tricks for making papers fit the page limits that conferences require is assembled here

LaTeX beamerposter

Jointly with Philippe Dreuw, I created the beamerposter package. It is based on the beamer and a0poster classes and makes it possible to create posters of arbitrary size with LaTeX that even look nice.

The rounding package

Jointly with Philippe Dreuw, I created macros for rounding numbers in LaTeX documents. This is very handy if you have a table but do not yet know the precision at which you want to show the numbers in the end. Just put in the full precision and decide later.

Download: style file example

more coming soon

Shell-Tricks and tools

rowavg.py and colavg.py

Two very simple tools I frequently use are rowavg.py and colavg.py. They compute the average of each row or each column of their stdin, respectively. I use them, for example, to average the results of a series of experiments that I ran in parallel on a cluster:

grep '^ERROR RATE' *.log | colavg.py
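As a rough illustration of the idea, here is a minimal Python sketch of row and column averaging. It is not the code of rowavg.py or colavg.py: the function names are made up for this example, and the handling of non-numeric fields (such as the file name that grep prepends) is an assumption.

```python
def _numbers(line):
    """Split a line on whitespace; numeric fields become floats, others None."""
    fields = []
    for token in line.split():
        try:
            fields.append(float(token))
        except ValueError:
            fields.append(None)  # assumption: non-numeric fields are ignored
    return fields


def column_averages(lines):
    """Average each column position over all lines, skipping non-numeric fields."""
    sums, counts = {}, {}
    for line in lines:
        for index, value in enumerate(_numbers(line)):
            if value is not None:
                sums[index] = sums.get(index, 0.0) + value
                counts[index] = counts.get(index, 0) + 1
    return [sums[i] / counts[i] for i in sorted(sums)]


def row_averages(lines):
    """Average the numeric fields of each line; lines without numbers are skipped."""
    result = []
    for line in lines:
        values = [v for v in _numbers(line) if v is not None]
        if values:
            result.append(sum(values) / len(values))
    return result


if __name__ == "__main__":
    sample = ["a.log:ERROR RATE 0.25 0.5", "b.log:ERROR RATE 0.75 1.5"]
    print(column_averages(sample))
    print(row_averages(sample))
```

In this sketch, columns that never contain a number simply do not show up in the result, so the two text fields in front of the error rates are dropped.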

Download: rowavg.py colavg.py

joinlines.py and joinprefixlines.py

Imagine you have multiple results per file, e.g. from cross-validation, and you want the average (see above) for each file individually.

For this I use joinprefixlines.py:

grep '^ERROR RATE' *.log | joinprefixlines.py -d : | colavg.py

This gets all lines starting with ERROR RATE from all log files, then concatenates all lines from one file and averages over the columns.

joinlines.py <n> reads stdin and concatenates every n lines.
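The joining step can be sketched in a few lines of Python. This is only an illustration under assumptions, not the downloadable scripts: the function names, the space used as separator, and the exact output layout (prefix first, then the joined remainders) are guesses at reasonable behaviour.

```python
def join_lines(lines, n):
    """Concatenate every n consecutive lines into one, separated by spaces."""
    stripped = [line.rstrip("\n") for line in lines]
    return [" ".join(stripped[i:i + n]) for i in range(0, len(stripped), n)]


def join_prefix_lines(lines, delimiter=":"):
    """Concatenate all lines that share the prefix before the first delimiter."""
    groups = {}  # dicts keep insertion order, so prefixes stay in input order
    for line in lines:
        prefix, _, rest = line.rstrip("\n").partition(delimiter)
        groups.setdefault(prefix, []).append(rest.strip())
    return [prefix + " " + " ".join(parts) for prefix, parts in groups.items()]


if __name__ == "__main__":
    grep_output = [
        "fold.log:ERROR RATE 0.25",
        "fold.log:ERROR RATE 0.75",
        "other.log:ERROR RATE 0.5",
    ]
    for joined in join_prefix_lines(grep_output):
        print(joined)
```

With grep output as input, the prefix is the file name, so each file ends up on a single line that can then be averaged row-wise or column-wise.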

Download: joinlines.py joinprefixlines.py

todeadline.sh

Ever wondered how long you really have until the deadline of a conference? Luckily, bash has all the tools on board to figure it out… including time zone conversion. Just change the DEADLINE variable to your deadline and run the script.
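For comparison, the same calculation can be sketched in Python. This is not a translation of todeadline.sh, whose contents are not shown here. The deadline value and the "Anywhere on Earth" (UTC-12) time zone are made-up examples, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example deadline: 23:59 "Anywhere on Earth" (UTC-12).
AOE = timezone(timedelta(hours=-12))
DEADLINE = datetime(2030, 1, 15, 23, 59, tzinfo=AOE)


def time_until(deadline, now=None):
    """Return the time left until a timezone-aware deadline as a timedelta."""
    if now is None:
        now = datetime.now(timezone.utc)  # aware "now", so zones convert correctly
    return deadline - now


def format_remaining(delta):
    """Format a timedelta as days, hours and minutes."""
    if delta.total_seconds() < 0:
        return "deadline has passed"
    minutes = int(delta.total_seconds()) // 60
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days}d {hours}h {mins}m"


if __name__ == "__main__":
    local = DEADLINE.astimezone()  # the deadline expressed in the local time zone
    print("Deadline (local time):", local.strftime("%Y-%m-%d %H:%M %Z"))
    print("Time left:", format_remaining(time_until(DEADLINE)))
```

Because both datetimes carry their time zone, the subtraction handles the conversion, which is the part that is easy to get wrong by hand.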

Download: todeadline.sh

tablify.py

One of my most useful Python scripts is tablify.py, which I use to create tables out of raw data. I use this a lot to summarize the results of a series of experiments.

Say I have my results in a set of files and, using a series of grep, awk, sed, and tr calls,

 # grep ^EER *.out | grep -v svm | awk '{print $1,$2,$4}' | sed -e 's/semCon-//' -e 's/\.[0-9]*\.0\.out:EER:/ /' | tr - ' ' > RESULTS.txt

I bring them into a format like this:

   # cat RESULTS.txt
   m0 b1.1  0.660566 0.723227
   m0 b1.2  0.665577 0.731097
   m0 b1.5  0.672113 0.740922
   m0 b2.0  0.671024 0.744400
   m1 b1.1  0.660566 0.723227
   m1 b1.2  0.665577 0.731097
   m1 b1.5  0.672113 0.740922
   m1 b2.0  0.671024 0.744400
   m2 b1.1  0.660566 0.723227
   m2 b1.2  0.665577 0.731097
   m2 b1.5  0.670153 0.740704
   m2 b2.0  0.672767 0.744604

where m and b are parameters of my algorithm that I want to analyse, and the other two columns are two different evaluation measures.

Then, I call

# cat RESULTS.txt | tablify.py -n 2 -avg

   |               b1.1               b1.2               b1.5               b2.0 
---+----------------------------------------------------------------------------
m0 |  0.660566 0.723227  0.665577 0.731097  0.672113 0.740922  0.671024 0.744400 0.6673 0.7349
m1 |  0.660566 0.723227  0.665577 0.731097  0.672113 0.740922  0.671024 0.744400 0.6673 0.7349
m2 |  0.660566 0.723227  0.665577 0.731097  0.670153 0.740704  0.672767 0.744604 0.6673 0.7349
AV |  0.6606   0.7232    0.6656   0.7311    0.6715   0.7408    0.6716   0.7445 

This table makes it easy to see that the m parameter has hardly any impact, whereas the b parameter does.

The last row and the last columns contain the average values of their respective columns and rows.

Tablify can work with arbitrarily many dimensions for both input and output variables. However, tables with more than 3 or 4 input variables become hard to read.
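To show the idea behind such a table, here is a minimal Python sketch of the pivoting and averaging for the simple case of two input variables. It is not tablify.py itself: all function names are hypothetical, the n_values parameter (number of measure columns per line) is an assumption and not necessarily what the -n option means, and the output layout only loosely follows the table above.

```python
def pivot(lines, n_values=2):
    """Read 'row-key col-key v1 ... vn' lines into rows, columns and cells."""
    rows, cols, cells = [], [], {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2 + n_values:
            continue  # assumption: blank or malformed lines are skipped
        row, col = fields[0], fields[1]
        if row not in rows:
            rows.append(row)
        if col not in cols:
            cols.append(col)
        cells[(row, col)] = [float(v) for v in fields[2:]]
    return rows, cols, cells


def mean_vectors(vectors):
    """Element-wise mean of equally long lists of floats."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def averages(rows, cols, cells):
    """Return per-row and per-column averages over the cells that exist."""
    row_avg = {r: mean_vectors([cells[(r, c)] for c in cols if (r, c) in cells])
               for r in rows}
    col_avg = {c: mean_vectors([cells[(r, c)] for r in rows if (r, c) in cells])
               for c in cols}
    return row_avg, col_avg


def render(lines, n_values=2):
    """Render the pivoted table with an average column and an 'AV' row."""
    rows, cols, cells = pivot(lines, n_values)
    row_avg, col_avg = averages(rows, cols, cells)

    def fmt(vector):
        return " ".join(f"{v:.4f}" for v in vector)

    table = [[""] + cols + ["AV"]]
    for r in rows:
        body = [fmt(cells[(r, c)]) if (r, c) in cells else "" for c in cols]
        table.append([r] + body + [fmt(row_avg[r])])
    table.append(["AV"] + [fmt(col_avg[c]) for c in cols] + [""])
    widths = [max(len(line[i]) for line in table) for i in range(len(table[0]))]
    return "\n".join(
        " | ".join(cell.rjust(w) for cell, w in zip(line, widths)).rstrip()
        for line in table
    )


if __name__ == "__main__":
    sample = [
        "m0 b1.5  0.672113 0.740922",
        "m0 b2.0  0.671024 0.744400",
        "m2 b1.5  0.670153 0.740704",
        "m2 b2.0  0.672767 0.744604",
    ]
    print(render(sample))
```

The sketch stores cells under a (row, column) key, which keeps the two-variable case short. Supporting more input variables would mean generalising that key and the header layout.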

Download: tablify.py

more coming soon