Contents

LaTeX is a high-quality typesetting system; features are designed for the production of technical and scientific documentation. It’s the de-facto standard for the communication and publication of scientific documents, and available as free software. LaTeX is actually built on the TeX typesetting system created by the legendary Donald Knuth.

LaTeX is nothing more than a series of TeX macros, providing ready made commands for common formatting and layout needs, such as section headings, footnotes, bibliographies and cross references.

The name LaTeX is pronounced lah-teck or lay-teck, and NOT lay-tecks. The characters in TeX, on which LaTeX is based, on the greek word τέχνη (skill, art, technique). Creator Donald Knuth promotes a pronunciation of /tɛx/ (tekh) (using a voiceless velar fricative as in Modern Greek, similar to the ch in loch).

Install base Tex system

Tex Live is one great (GPLv2 licenced) way to get up and running with a completely standalone TeX system. In addition to the core TeX compiler, can bundle in optional packages and macros. These really are essential if you want to be productive.

On Arch, you can installed the everything with sudo pacman -S textlive-most. For Debian based distros sudo apt install textlive-full

Alternatively cherry pick the individual packages (e.g. I went with these on my Arch system):

  • texlive-core base system and libs
  • texlive-bin CLI programs and compiler
  • texlive-latexextra LaTeX macros
  • texlive-publishers
  • texlive-fontsextra fonts
  • pdflatex for rendering LaTeX to PDF

Pandoc

Pandoc is an amazing program (written in haskell) that converts document formats from one format to another. Some examples are Markdown to reveal.js. HTML to docx. In the specific case of writing technical reports, I’m most interesting in going from Markdown to LaTeX.

On Arch, install these packages:

  • pandoc base Pandoc system
  • pandoc-citeproc bibliography support
  • pandoc-crossref cross referencing of figures and tables

Author paper

Write the paper in plain old markdown. Use YAML frontmatter to specify variables for Pandoc to use in its LaTeX template. By default Pandoc will produce its own template pandoc -D latex. Supported front matter options for a LaTeX targetted output format are documented here (along with others). Here’s an example from one of my papers:

---
title: "Semantic Similarity in Binaries with BinHunt"
date: "May 2019"
author: "Benjamin Simmonds, UNSW Canberra"
abstract: Extracting meaningful semantic differences between software binaries without source code is difficult. This is a challenging problem due to the overwhelming amount of syntactic noise that small changes can result in at the assembly level.
---

# Introduction

Software binaries, or architecture specific compiled programs are often

Create bibliography (BibTeX)

If not familiar with BibTeX, don’t worry the basics are easy.

BibTeX is reference management software for formatting lists of references. The BibTeX tool often coupled with the LaTeX document preparation system.

Create a BibTeX file with all the references you consumed to produce your research. I for example always create a file called paper.bib, and cite every reference I use in this file. An excerpt from one:

@misc{tardos2006algorithm,
  title={Algorithm Design},
  author={Tardos, Eva and Kleinberg, Jon},
  year={2006},
  publisher={Reading (MA): Addison-Wesley}
}

@ONLINE {symbolicexecution2019,
  author = "Schroeder, Brian and Burget, Joel",
  title  = "A gentle introduction to symbolic execution",
  month  = "apr",
  year   = "2019",
  url    = "https://blog.monic.co/a-gentle-introduction-to-symbolic-execution/"
}

@article{eskandari2012graph,
  title={A graph mining approach for detecting unknown malwares},
  author={Eskandari, Mojtaba and Hashemi, Sattar},
  journal={Journal of Visual Languages \& Computing},
  volume={23},
  number={3},
  pages={154--162},
  year={2012},
  publisher={Elsevier}
}

@inproceedings{jin2012binary,
  title={Binary function clustering using semantic hashes},
  author={Jin, Wesley and Chaki, Sagar and Cohen, Cory and Gurfinkel, Arie and Havrilla, Jeffrey and Hines, Charles and Narasimhan, Priya},
  booktitle={2012 11th International Conference on Machine Learning and Applications},
  volume={1},
  pages={386--391},
  year={2012},
  organization={IEEE}
}

Don’t worry, most academic and journal databases support BibTeX formatting out of the box. Using Google Scholar for example locate the relevant paper you want to cite, click the Cite option and then the BibTeX link.

The pandoc-citeproc filter automatically creates a references section at the end of the document, and will substitute all references in the markdown source in academic citation style. To reference a BibTeX entry in the content of the paper, simply prefix the BibTeX tag with an ampersand @, for example:

The novel concept first introduced (@jin2012binary, P42) blah blah blah

References with BibTeX entry:

@inproceedings{jin2012binary,
  title={Binary function clustering using semantic hashes},
  author={Jin, Wesley and Chaki, Sagar and Cohen, Cory and Gurfinkel, Arie and Havrilla, Jeffrey and Hines, Charles and Narasimhan, Priya},
  booktitle={2012 11th International Conference on Machine Learning and Applications},
  volume={1},
  pages={386--391},
  year={2012},
  organization={IEEE}
}

Render the paper as PDF

You could run the following pandoc command on the shell. I prefer to use make, which will happily take on the tedious task of building the PDF version of your paper. Simply run make.

To do this create a new file called Makefile, placing it in the same directory as paper.md (the Markdown file to be converted into LaTeX by pandoc) and paper.bib (a vanilla BibTeX file) files:

all: clean paper

paper:
	pandoc -s -F pandoc-crossref -F pandoc-citeproc --bibliography=paper.bib \
	--variable papersize=a4paper \
	--variable classoption=twocolumn \
	-s paper.md -o paper.pdf

tex:
	pandoc -s -F pandoc-crossref -F pandoc-citeproc --bibliography=paper.bib \
	--variable papersize=a4paper \
	-s paper.md -t latex -o paper.tex

clean:
	rm -f *.pdf *.log *.tex

By default, running make will kick off the all target, first cleaning up old output files, following up by creating a fresh PDF output. Optionally if you want the raw TeX output, can do so with a make tex.

Use Git

Authoring documents using Markdown and LaTeX is absolutely life changing. You end up with the most beautifully typeset document output known to man, while maintaining the source for the report in a plain text format. Plain text (unlike nasty proprietary binary formats like docx) lends itself to:

  • Diffing (with diff) and patching (with patch)
  • Keeping in a version control system (VCS) like Git, to track the when, who, what of all changes.
  • Can use your favourite text editor (i.e. Vim)

I personally have a single Git repo for all my papers. In the top level of the repo, create a .gitignore based on the useful TeX.gitignore template from github/gitignore, to prevent generated or temporary files for being commmitted into the repo.

Periodically push the repo to GitHub as a convenient backup and distribution mechanism.

Resources

Gist by macogden