My LaTeX, Pandoc and Makefile workflow for writing papers in 2022
Contents
- Install base Tex system
- Pandoc
- Author paper
- Create bibliography (BibTeX)
- Render the paper as PDF
- Use Git
- Resources
LaTeX is a high-quality typesetting system; features are designed for the production of technical and scientific documentation. It’s the de-facto standard for the communication and publication of scientific documents, and available as free software. LaTeX is actually built on the TeX typesetting system created by the legendary Donald Knuth.
LaTeX is nothing more than a series of TeX macros, providing ready made commands for common formatting and layout needs, such as section headings, footnotes, bibliographies and cross references.
The name LaTeX is pronounced lah-teck or lay-teck, and NOT lay-tecks. The characters in TeX, on which LaTeX is based, on the greek word τέχνη (skill, art, technique). Creator Donald Knuth promotes a pronunciation of /tɛx/ (tekh) (using a voiceless velar fricative as in Modern Greek, similar to the ch in loch).
Install base Tex system⌗
Tex Live is one great (GPLv2 licenced) way to get up and running with a completely standalone TeX system. In addition to the core TeX compiler, can bundle in optional packages and macros. These really are essential if you want to be productive.
On Arch, you can installed the everything with sudo pacman -S textlive-most
. For Debian based distros sudo apt install textlive-full
Alternatively cherry pick the individual packages (e.g. I went with these on my Arch system):
texlive-core
base system and libstexlive-bin
CLI programs and compilertexlive-latexextra
LaTeX macrostexlive-publishers
texlive-fontsextra
fontspdflatex
for rendering LaTeX to PDF
Pandoc⌗
Pandoc is an amazing program (written in haskell) that converts document formats from one format to another. Some examples are Markdown to reveal.js. HTML to docx. In the specific case of writing technical reports, I’m most interesting in going from Markdown to LaTeX.
On Arch, install these packages:
pandoc
base Pandoc systempandoc-citeproc
bibliography supportpandoc-crossref
cross referencing of figures and tables
Author paper⌗
Write the paper in plain old markdown. Use YAML frontmatter to specify variables for Pandoc to use in its LaTeX template. By default Pandoc will produce its own template pandoc -D latex
. Supported front matter options for a LaTeX targetted output format are documented here (along with others). Here’s an example from one of my papers:
---
title: "Semantic Similarity in Binaries with BinHunt"
date: "May 2019"
author: "Benjamin Simmonds, UNSW Canberra"
abstract: Extracting meaningful semantic differences between software binaries without source code is difficult. This is a challenging problem due to the overwhelming amount of syntactic noise that small changes can result in at the assembly level.
---
# Introduction
Software binaries, or architecture specific compiled programs are often
Create bibliography (BibTeX)⌗
If not familiar with BibTeX, don’t worry the basics are easy.
BibTeX is reference management software for formatting lists of references. The BibTeX tool often coupled with the LaTeX document preparation system.
Create a BibTeX file with all the references you consumed to produce your research. I for example always create a file called paper.bib
, and cite every reference I use in this file. An excerpt from one:
@misc{tardos2006algorithm,
title={Algorithm Design},
author={Tardos, Eva and Kleinberg, Jon},
year={2006},
publisher={Reading (MA): Addison-Wesley}
}
@ONLINE {symbolicexecution2019,
author = "Schroeder, Brian and Burget, Joel",
title = "A gentle introduction to symbolic execution",
month = "apr",
year = "2019",
url = "https://blog.monic.co/a-gentle-introduction-to-symbolic-execution/"
}
@article{eskandari2012graph,
title={A graph mining approach for detecting unknown malwares},
author={Eskandari, Mojtaba and Hashemi, Sattar},
journal={Journal of Visual Languages \& Computing},
volume={23},
number={3},
pages={154--162},
year={2012},
publisher={Elsevier}
}
@inproceedings{jin2012binary,
title={Binary function clustering using semantic hashes},
author={Jin, Wesley and Chaki, Sagar and Cohen, Cory and Gurfinkel, Arie and Havrilla, Jeffrey and Hines, Charles and Narasimhan, Priya},
booktitle={2012 11th International Conference on Machine Learning and Applications},
volume={1},
pages={386--391},
year={2012},
organization={IEEE}
}
Don’t worry, most academic and journal databases support BibTeX formatting out of the box. Using Google Scholar for example locate the relevant paper you want to cite, click the Cite option and then the BibTeX link.
The pandoc-citeproc
filter automatically creates a references section at the end of the document, and will substitute all references in the markdown source in academic citation style. To reference a BibTeX entry in the content of the paper, simply prefix the BibTeX tag with an ampersand @
, for example:
The novel concept first introduced (@jin2012binary, P42) blah blah blah
References with BibTeX entry:
@inproceedings{jin2012binary,
title={Binary function clustering using semantic hashes},
author={Jin, Wesley and Chaki, Sagar and Cohen, Cory and Gurfinkel, Arie and Havrilla, Jeffrey and Hines, Charles and Narasimhan, Priya},
booktitle={2012 11th International Conference on Machine Learning and Applications},
volume={1},
pages={386--391},
year={2012},
organization={IEEE}
}
Render the paper as PDF⌗
You could run the following pandoc
command on the shell. I prefer to use make
, which will happily take on the tedious task of building the PDF version of your paper. Simply run make
.
To do this create a new file called Makefile
, placing it in the same directory as paper.md
(the Markdown file to be converted into LaTeX by pandoc) and paper.bib
(a vanilla BibTeX file) files:
all: clean paper
paper:
pandoc -s -F pandoc-crossref -F pandoc-citeproc --bibliography=paper.bib \
--variable papersize=a4paper \
--variable classoption=twocolumn \
-s paper.md -o paper.pdf
tex:
pandoc -s -F pandoc-crossref -F pandoc-citeproc --bibliography=paper.bib \
--variable papersize=a4paper \
-s paper.md -t latex -o paper.tex
clean:
rm -f *.pdf *.log *.tex
By default, running make
will kick off the all
target, first cleaning up old output files, following up by creating a fresh PDF output. Optionally if you want the raw TeX output, can do so with a make tex
.
Use Git⌗
Authoring documents using Markdown and LaTeX is absolutely life changing. You end up with the most beautifully typeset document output known to man, while maintaining the source for the report in a plain text format. Plain text (unlike nasty proprietary binary formats like docx) lends itself to:
- Diffing (with
diff
) and patching (withpatch
) - Keeping in a version control system (VCS) like Git, to track the when, who, what of all changes.
- Can use your favourite text editor (i.e. Vim)
I personally have a single Git repo for all my papers. In the top level of the repo, create a .gitignore
based on the useful TeX.gitignore
template from github/gitignore, to prevent generated or temporary files for being commmitted into the repo.
Periodically push the repo to GitHub as a convenient backup and distribution mechanism.