Managing my Annotated Bibliography with Emacs' Org Mode

31 Mar 2020 Gregory J. Stein

As an academic, I read a lot of papers. Part of my job is retaining what I read, since deeply understanding the work of others and building upon it is one way I come up with new research ideas. When it comes time to sit down and write a paper, I need to contextualize my ideas in a Related Work section: I include a discussion of other research that has touched upon similar problems or inspired my own work. Over my years in academia, I have settled on an annotated bibliography to manage my own knowledge base of papers.

When I started collecting papers and other references, I added annotations to PDFs. However, this didn't scale well: my comments and the documents themselves were difficult to search and lacked easy access to important metadata. I used Zotero for a while, but that didn't mesh with my workflow either. It was nice enough, but it still required that I leave my Emacs environment. In addition, I don't really need the ability to mark up PDFs. For a paper I think I may want to find again, I only need a couple of things:

An annotated bibliography is perfect for this. Everything is in one place. It’s easy to edit and share. And I can manage the entire thing from within Emacs with ease.

If I like a paper, I add it to my annotated bibliography. Each entry typically includes a single paragraph and must answer a simple question: why might I want to remember having read this paper? Useful algorithm? Add it. A new way of thinking about a problem I'm interested in? Added. Maybe I've been thinking of writing up something in this area for my blog? Goes on the list. Pretty pictures or figures? Why not? I'm not especially selective about what I include in my annotated bibliography: as long as I'm willing to take the time to understand the paper well enough to write an informative short summary of it, it's probably worth adding.

Org Mode for Reference Management

The biggest hurdle is organization: how do I manage hundreds of references? For this, Org Mode is fantastically powerful. A minimal working example of my setup looks something like this:

#+PROPERTY: header-args :exports none :tangle "~/org/resources/bibliography/refs.bib"
#+LATEX_CLASS_OPTIONS: [12pt]
#+LATEX_HEADER: \usepackage[natbib=true]{biblatex} \DeclareFieldFormat{apacase}{#1} \addbibresource{~/org/resources/bibliography/refs.bib}
#+LATEX_HEADER: \usepackage{parskip}
#+OPTIONS: <:nil c:nil todo:nil H:5

* Computer Vision
** Image-to-Image Translation
*** CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks citep:zhu2017CycleGAN
    #+begin_src bibtex
      @inproceedings{zhu2017CycleGAN,
        title =        {Unpaired Image-to-Image Translation using
                        Cycle-Consistent Adversarial Networks},
        author =       {Zhu, Jun-Yan and Park, Taesung and Isola,
                        Phillip and Efros, Alexei A},
        booktitle =    {International Conference on Computer
                        Vision (ICCV)},
        keywords =     {GANs, computer vision},
        year =         {2017},
      }
    #+end_src
    This is one of my favorite papers. The authors extend some
    of the classic work done in Pix2Pix citep:isola2017pix2pix to
    /unpaired/ sets of images. At the core of the CycleGAN
    procedure are two Generative Adversarial Networks that learn
    to map images between two domains. The key ingredient that
    makes this process work is an additional loss term, which
    enforces that images passed through both generators should be
    as close as possible to the input image. This has practical
    motivation: if we translate one way and then translate back,
    we should expect the input to be unchanged. The results are
    impressive and eye-catching. This work inspired a paper of
    mine: GeneSIS-RT citep:stein2018genesisrt.

* References
  :PROPERTIES:
  :UNNUMBERED: t
  :END:
  #+LaTeX: \printbibliography[heading=none]

The call to \addbibresource at the top of the example file tells org-ref where to look for references. I use Helm to navigate through the list of references and can easily narrow down my search until I find what I'm looking for.

The entry is entirely self-contained: it includes both my summary of the paper and the BibTeX entry I'll use for citing it in my LaTeX exports. I've also included inline citation commands (of the form citep:stein2018genesisrt) to reference other papers. For these, I'm taking advantage of the fantastic org-ref, which resolves inline citations properly when I export my notes to HTML or LaTeX and adds each cited work to a bibliography at the end of the document. When I export my notes to LaTeX via the built-in Org exporter, the resulting file looks great:

The resulting LaTeX-generated PDF exported by Org from our example above. Notice that the inline references have been replaced by their proper citations and automatically added to the references at the end of the document.
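org-ref does the real work of resolving these links, but the idea underneath is simple: every inline citep: key should match the key of some BibTeX entry. As a rough illustration (a minimal Python sketch, not how org-ref works internally), the snippet below pulls cite keys out of Org text with a simplified, assumed regular expression and reports any that have no matching entry in the same text. The helper names cited_keys and unresolved_citations are hypothetical.

```python
import re

# Simplified pattern for org-ref style links such as "citep:stein2018genesisrt"
# (also "cite:" and "citet:"). This is an assumption for illustration; org-ref
# itself supports many more link types and key formats.
CITE_LINK = re.compile(r"\bcite[pt]?:([A-Za-z0-9_,\-]+)")

# Captures the key from a BibTeX header such as "@inproceedings{zhu2017CycleGAN,".
BIB_KEY = re.compile(r"@\w+\{\s*([^,\s]+)\s*,")


def cited_keys(org_text):
    """Return the cite keys used in org_text, in order of first appearance."""
    keys = []
    for match in CITE_LINK.finditer(org_text):
        # A single link may cite several comma-separated keys.
        for key in match.group(1).split(","):
            if key and key not in keys:
                keys.append(key)
    return keys


def unresolved_citations(org_text):
    """Return cited keys that have no BibTeX entry defined in org_text."""
    defined = set(BIB_KEY.findall(org_text))
    return [key for key in cited_keys(org_text) if key not in defined]


example = """\
*** CycleGAN citep:zhu2017CycleGAN
    #+begin_src bibtex
      @inproceedings{zhu2017CycleGAN,
        year = {2017},
      }
    #+end_src
    Extends Pix2Pix citep:isola2017pix2pix; inspired citep:stein2018genesisrt.
"""

print(cited_keys(example))
print(unresolved_citations(example))
```

On this fragment, isola2017pix2pix and stein2018genesisrt are reported as unresolved only because their entries live elsewhere in the full bibliography; run against the whole file, the list should come back empty.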

We must also be able to generate a standalone BibTeX file that org-ref and other software can consume. Again, Org makes this easy. A single call to org-babel-tangle collects the BibTeX blocks from throughout the document and writes them to the file named by the :tangle header argument at the top of the .org file.

The term tangle comes from literate programming, in which code and documentation live in a single document. The tangle step refers to extracting the source code so that it can subsequently be compiled or executed.
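To make the tangle step concrete for this particular file, here is a rough Python approximation of what it amounts to: find every bibtex source block, strip its indentation, and concatenate the bodies into one .bib file. This is a minimal sketch under simplifying assumptions (it ignores per-block header arguments such as :tangle no, and the function names extract_bibtex and tangle_bibtex are hypothetical); in practice, org-babel-tangle is the tool to use.

```python
import re
import textwrap
from pathlib import Path

# Matches each "#+begin_src bibtex ... #+end_src" block. Org accepts upper- or
# lower-case keywords, hence IGNORECASE. Anything after the language name on
# the begin line (header arguments) is skipped, not interpreted.
BIBTEX_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+bibtex\b[^\n]*\n(.*?)^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_bibtex(org_text):
    """Return the body of each bibtex source block, with indentation removed."""
    return [
        textwrap.dedent(match.group(1)).strip()
        for match in BIBTEX_BLOCK.finditer(org_text)
    ]


def tangle_bibtex(org_path, bib_path):
    """Write every bibtex block in org_path to bib_path; return the entry count."""
    entries = extract_bibtex(Path(org_path).read_text(encoding="utf-8"))
    Path(bib_path).write_text("\n\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)


sample = """\
* Computer Vision
** Image-to-Image Translation
*** CycleGAN citep:zhu2017CycleGAN
    #+begin_src bibtex
      @inproceedings{zhu2017CycleGAN,
        year = {2017},
      }
    #+end_src
    This is one of my favorite papers.
"""

print(extract_bibtex(sample)[0])
```

Calling tangle_bibtex with the notes file and a target path (both placeholders here) would write one entry per block, separated by blank lines. The prose summaries are left behind, which is exactly the split between documentation and extracted code that the term tangle describes.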

Including BibTeX inside each entry has an additional advantage: searching. I use org-rifle to search my notes, which lets me look up Org headings by both their titles and their contents, so I can find a paper by its title, authors, keywords, or anything else in its entry.
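This works because the BibTeX block lives inside the heading's body, so any tool that searches heading contents also sees the authors and keywords. The toy Python sketch below illustrates the idea; it is not how org-rifle is implemented. It splits an Org file at each headline and returns the headlines whose title or body contains every search term, ignoring case. The function names split_entries and search are hypothetical.

```python
import re

# An Org headline: one or more stars at the start of a line, then a space.
HEADLINE = re.compile(r"^\*+ ")


def split_entries(org_text):
    """Split Org text into (title, body) pairs, one per headline.

    Text before the first headline is ignored in this simplified sketch.
    """
    entries = []
    title, body = None, []
    for line in org_text.splitlines():
        if HEADLINE.match(line):
            if title is not None:
                entries.append((title, "\n".join(body)))
            title, body = line.lstrip("* ").strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        entries.append((title, "\n".join(body)))
    return entries


def search(org_text, *terms):
    """Return titles of entries whose title or body contains every term."""
    results = []
    for title, body in split_entries(org_text):
        haystack = (title + "\n" + body).lower()
        if all(term.lower() in haystack for term in terms):
            results.append(title)
    return results


notes = """\
* Computer Vision
** Image-to-Image Translation
*** CycleGAN citep:zhu2017CycleGAN
    #+begin_src bibtex
      @inproceedings{zhu2017CycleGAN,
        author = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
        keywords = {GANs, computer vision},
      }
    #+end_src
    This is one of my favorite papers.
"""

print(search(notes, "isola"))            # matched via the author field
print(search(notes, "computer vision"))  # matched via a title and a keyword
```

Without the embedded BibTeX, a search for an author's name would only find papers whose summaries happened to mention that author.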

Conclusion

Org mode makes for fantastic paper organization. With it I’m able to quickly add new papers, include inline citations, and search my bibliography—all without leaving the comfort of my Emacs environment.

Questions? Comments? Think your setup is better? I welcome comments below, on the Emacs Reddit, or on Hacker News.