<h1>The Bottom Third</h1>
<p><em>2020-04-08</em></p>
<p>When I first arrived at graduate school, one of my professors shared his experience managing students. He told us that his students tend to fall into three bins of roughly equal measure: one third of students earn their keep, one third don’t, and a final third produce enough work to fund both themselves and the <em>underperformers</em>. The implication was clear: <em>Work hard and make sure you don’t end up in the bottom third.</em></p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">My graduate school experience was in the sciences, in which my advisor was my boss and paid the bills. The social sciences have less direct management, but these lessons about effective management are no less important.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">My graduate school experience was in the sciences, in which my advisor was my boss and paid the bills. The social sciences have less direct management, but these lessons about effective management are no less important.
</p>
<p>What has bothered me the most in the years since is the idea that these categories are fixed. That students who find themselves in <em>the bottom third</em> are stuck there. That the advisor is somehow abstracted away from how their students perform. How they treat underperforming students is what separates the good mentors from the bad.</p>
<p>Too many times, I have seen hard-working, brilliant students overlooked by their advisors because they didn’t show enough immediate progress. These students slowly receive less attention, despite often needing more, and their access to experimental resources wanes. Students who adapt quickly to graduate school or get lucky with an early project win favor, while the remaining students may not receive another chance to succeed. For advisors, this <em>bottom third mentality</em> is a self-fulfilling prophecy.</p>
<p>Mentorship is difficult. As educators and mentors, it is our responsibility to work to make sure that every student succeeds. Playing favorites isn’t always conscious, but good mentors look for other ways to engage with <em>all</em> their students and work <em>with them</em> to help them succeed. Here are some management lessons that good mentors keep in mind.</p>
<p><strong>Different people may need different management styles to succeed.</strong> There is no one-size-fits-all approach to management. I have seen too many faculty manage as if every student was a cookie-cutter image of themselves. Yet everyone has different interests, different reasons for having pursued their career path, different strengths and weaknesses. Good advisors recognize that their students may need different levels of external pressure: while some students may need to meet multiple times a week, many are extremely productive for weeks at a time without supervision.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Even if students are managed differently, it is important that different standards do not apply to different people. Particularly in an academic environment, it is important to ensure that criteria for completing degree requirements or passing classes be clearly defined.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Even if students are managed differently, it is important that different standards do not apply to different people. Particularly in an academic environment, it is important to ensure that criteria for completing degree requirements or passing classes be clearly defined.
</p>
<p><strong>A lack of early success does not imply a lack of future success.</strong> I changed fields before earning my PhD. I was helpful enough to other students in the lab, but, for a while, I didn’t have the domain-specific knowledge to make independent progress towards my own research goals. Some students take longer to spin up than others.</p>
<p><strong>Burnout is different from laziness.</strong> Sometimes, burnout happens. It’s not great, and we should do what we can to identify <a href="https://www.forbes.com/sites/learnvest/2013/04/01/10-signs-youre-burning-out-and-what-to-do-about-it/#7b47af56625b">the signs of burnout</a>. Everyone needs downtime. Constant sprinting towards deadlines and not taking time to rest and recover creates incredible stress, and we all react to stress in different ways. It is important to ensure that students have guilt-free time to relax.</p>
<p><strong>Changing labs/managers/jobs is not a sign of failure.</strong> We are imperfect. Even the best advisors have trouble managing some students. All of us—advisors and students—should be willing to have an open conversation about <em>what’s not working</em>. I changed advisors during my PhD; the lab’s research was simply not what I wanted to work on for the next four or five years. Too many friends of mine didn’t listen to that voice in their heads, and struggled through much of their PhD as a result. Advisors should make the option of switching to another lab clear, but do so without pressuring the student.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">The book <a href="https://www.amazon.com/Difficult-Conversations-Discuss-What-Matters/dp/0143118447">Difficult Conversations</a> is a great resource for “how to discuss what matters most.”</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The book <a href="https://www.amazon.com/Difficult-Conversations-Discuss-What-Matters/dp/0143118447">Difficult Conversations</a> is a great resource for “how to discuss what matters most.”
</p>
<p>As always, I welcome your thoughts in the comments below and on <a href="https://news.ycombinator.com/item?id=22813994">Hacker News</a>.</p>
<p><em>Gregory J. Stein</em></p>
<h1>Managing my Annotated Bibliography with Emacs’ Org Mode</h1>
<p><em>2020-03-31</em></p>
<style>
org-cite {border-bottom: 1px solid #009933;}
</style>
<p>As an academic, I read a lot of papers. Part of my job is retaining what I read, since deeply understanding the work of others and building upon it is one way I come up with new research ideas. When it comes time to sit down and write a paper, I need to contextualize my ideas in a <em>Related Work</em> section: I include a discussion of other research that has touched upon similar problems or inspired my own work. Over my years in academia, I have settled on an <strong>annotated bibliography</strong> to manage my own knowledge base of papers.</p>
<p>When I started collecting papers and other references, I added annotations to PDFs. However, this didn’t scale well, since my comments and the documents themselves were difficult to search and lacked easy access to important metadata. I used Zotero for a while, but that didn’t mesh with my workflow either. It was nice enough, but still required that I leave my Emacs environment. In addition, I don’t really need the ability to mark up PDFs. For a paper I think I may want to find again, I only need a couple of things:</p>
<ul>
<li>A paragraph-long summary of the paper.</li>
<li>A BibTeX entry for the paper.</li>
</ul>
<p><em>An annotated bibliography is perfect for this.</em> Everything is in one place. It’s easy to edit and share. And I can manage the entire thing from within Emacs with ease.</p>
<p>If I like a paper, I add it to my annotated bibliography. Each entry typically includes a single paragraph and must answer a simple question: <em>why might I want to remember having read this paper?</em> Useful algorithm? <em>Add it.</em> A new way of thinking about a problem I’m interested in? <em>Added.</em> Maybe I’ve been thinking of writing up something in this area for my blog? <em>Goes on the list.</em> Pretty pictures or figures? <em>Why not?</em> I’m relatively indiscriminate about what I decide to include in my annotated bibliography—as long as I’m willing to take the time to understand the paper well enough to write an informative short summary of it, it’s probably worth adding.</p>
<h2 id="org-mode-for-reference-management">Org Mode for Reference Management</h2>
<p>The biggest hurdle is organization: how do I manage hundreds of references? For this, Org Mode is fantastically powerful. A minimal working example of my setup looks something like this:</p>
<pre class="pre-scrollable"><code>#+PROPERTY: header-args :exports none :tangle "~/org/resources/bibliography/refs.bib"
#+LATEX_CLASS_OPTIONS: [12pt]
#+LATEX_HEADER: \usepackage[natbib=true]{biblatex} \DeclareFieldFormat{apacase}{#1} \addbibresource{~/org/resources/bibliography/refs.bib}
#+LATEX_HEADER: \usepackage{parskip}
#+OPTIONS: <:nil c:nil todo:nil H:5
* Computer Vision
** Image-to-Image Translation
***** CycleGAN: Unpaired Image-to-Image Translation using
Cycle-Consistent Adversarial Networks <org-cite>citep:zhu2017CycleGAN</org-cite>.
#+begin_src bibtex
@inproceedings{zhu2017CycleGAN,
title = {Unpaired Image-to-Image Translation using
Cycle-Consistent Adversarial Networks},
author = {Zhu, Jun-Yan and Park, Taesung and Isola,
Phillip and Efros, Alexei A},
booktitle = {International Conference on Computer
Vision (ICCV)},
keywords = {GANs, computer vision},
year = {2017},
}
#+end_src
This is one of my favorite papers. The authors extend some
of the classic work done in Pix2Pix <org-cite>citep:isola2017pix2pix</org-cite> to
/unpaired/ sets of images. At the core of the CycleGAN
procedure are two Generative Adversarial Networks that learn
to map images between two domains. The key addition that makes
this process work is an additional loss term, which enforces
that images passed through both generators should be as close
as possible to the input image. This has practical motivation:
if we translate one way and then translate back, we should
expect the input to be unchanged. The results are impressive
and eye catching. This work inspired a paper of mine:
GeneSIS-RT <org-cite>citep:stein2018genesisrt</org-cite>.
* References
:PROPERTIES:
:UNNUMBERED: t
:END:
#+LaTeX: \printbibliography[heading=none]</code></pre>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-50em;" aria-label="margin note">The call to <code class="language-plaintext highlighter-rouge">addbibresource</code> at the top of the example file is where <code class="language-plaintext highlighter-rouge">org-ref</code> will look for references. I use <code class="language-plaintext highlighter-rouge">helm</code> to navigate through the list and can narrow down my search easily until I find what I’m looking for.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The call to <code class="language-plaintext highlighter-rouge">addbibresource</code> at the top of the example file is where <code class="language-plaintext highlighter-rouge">org-ref</code> will look for references. I use <code class="language-plaintext highlighter-rouge">helm</code> to navigate through the list and can narrow down my search easily until I find what I’m looking for.
</p>
<p>The entry is entirely self-contained. It includes both my summary of it and the BibTeX entry I’ll use for citing it in my LaTeX exports. I’ve included citation commands inline—of the form <code><org-cite>citep:<span></span>stein2018genesisrt</org-cite></code>—to reference other papers. In this, I’m taking advantage of the fantastic <a href="https://github.com/jkitchin/org-ref"><code class="language-plaintext highlighter-rouge">org-ref</code></a>, which allows me to include inline citations that will be properly resolved when exporting my notes to HTML and LaTeX and added to a bibliography at the end of the document. When I export my notes to LaTeX via the built-in Org exporter, the resulting file looks great:</p>
<p>
<note class="marginnote img-caption invisible-sm" aria-label="image caption">
The resulting LaTeX-generated PDF exported by Org from our example above. Notice that the inline references have been replaced by their proper citations and automatically added to the references at the end of the document.
</note>
<img src="/assets/posts/org-bibliography-export-latex.png" class="img-responsive center-block " title="An auto-generated LaTeX export of the example bibliography." />
<note class="img-caption visible-sm" aria-label="image caption">The resulting LaTeX-generated PDF exported by Org from our example above. Notice that the inline references have been replaced by their proper citations and automatically added to the references at the end of the document.
</note>
</p>
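<p>Resolving those <code class="language-plaintext highlighter-rouge">citep:</code> links requires telling <code class="language-plaintext highlighter-rouge">org-ref</code> where the bibliography lives. A minimal sketch of such a configuration (the relevant variable names differ between <code class="language-plaintext highlighter-rouge">org-ref</code> versions; the path matches the example file above):</p>
<pre class="pre-scrollable"><code>;; Sketch of an org-ref 2.x (helm-based) setup; adjust for your version.
(require 'org-ref)
(setq org-ref-default-bibliography
      '("~/org/resources/bibliography/refs.bib"))
;; helm-bibtex, which backs org-ref's completion, reads this variable:
(setq bibtex-completion-bibliography
      '("~/org/resources/bibliography/refs.bib"))</code></pre>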
<p>We must also be able to generate a standalone BibTeX file that <code class="language-plaintext highlighter-rouge">org-ref</code> and other software can consume. Again, Org makes this easy. A single call to <a href="https://orgmode.org/manual/Extracting-Source-Code.html"><code class="language-plaintext highlighter-rouge">org-babel-tangle</code></a> combines and exports the BibTeX entries throughout the document and adds them to a file of my choosing specified at the top of the <code class="language-plaintext highlighter-rouge">.org</code> file.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">The term <em>tangle</em> comes from the ideas of <a href="https://en.wikipedia.org/wiki/Literate_programming">literate programming</a>, in which code and documentation live in a single document. The <em>tangle</em> step refers to the extraction of source code that is subsequently compiled or executed.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The term <em>tangle</em> comes from the ideas of <a href="https://en.wikipedia.org/wiki/Literate_programming">literate programming</a>, in which code and documentation live in a single document. The <em>tangle</em> step refers to the extraction of source code that is subsequently compiled or executed.
</p>
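<p>Tangling is ordinarily run by hand (<code class="language-plaintext highlighter-rouge">M-x org-babel-tangle</code>, bound to <code class="language-plaintext highlighter-rouge">C-c C-v t</code> in Org buffers), but it can also be automated. A sketch, assuming the bibliography lives in a hypothetical <code class="language-plaintext highlighter-rouge">annotated-bibliography.org</code>:</p>
<pre class="pre-scrollable"><code>;; Re-tangle the bibliography whenever the file is saved.
;; The file name here is illustrative; substitute your own.
(defun my/tangle-bibliography ()
  (when (and buffer-file-name
             (string= (file-name-nondirectory buffer-file-name)
                      "annotated-bibliography.org"))
    (org-babel-tangle)))
(add-hook 'after-save-hook #'my/tangle-bibliography)</code></pre>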
<p>Including BibTeX inside each entry has an additional advantage: searching. I use <a href="https://github.com/alphapapa/org-rifle"><code class="language-plaintext highlighter-rouge">org-rifle</code></a> to search my notes, which matches Org headings by both their titles <em>and</em> their contents, allowing me to look up a paper by its title, authors, keywords, whatever.</p>
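<p>To keep this search close at hand, the relevant command can be bound to a global key. A sketch (the keybinding is a personal choice, and the command name is from the helm-based interface of the package):</p>
<pre class="pre-scrollable"><code>;; Search every file in `org-agenda-files' by heading and body text.
(global-set-key (kbd "C-c r") #'helm-org-rifle-agenda-files)</code></pre>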
<h2 id="conclusion">Conclusion</h2>
<p>Org mode makes for fantastic paper organization. With it I’m able to quickly add new papers, include inline citations, and search my bibliography—all without leaving the comfort of my Emacs environment.</p>
<p>Questions? Comments? Think your setup is better? I welcome comments below, on the <a href="https://www.reddit.com/r/emacs/comments/fsjins/managing_my_annotated_bibliography_with_org_mode/">Emacs Reddit</a>, or on <a href="https://news.ycombinator.com/item?id=22741540">Hacker News</a>.</p>
<h1>A Guide to My Organizational Workflow: How to Streamline Your Life</h1>
<p><em>2020-03-22</em></p>
<style>
kwd-todo {border-bottom: 1px solid #ff0000;}
kwd-proj {border-bottom: 1px solid #ff00ff;}
kwd-next {border-bottom: 1px solid #0000ff;}
kwd-done {border-bottom: 1px solid #009933;}
kwd-waiting {border-bottom: 1px solid #ff6600;}
kwd-canceled {border-bottom: 1px solid #003300;}
kwd-someday {border-bottom: 1px solid #66ccff;}
kwd-inactive {border-bottom: 1px solid #aaaaaa;}
kwd-faded {color: #aaaaaa;}
kwd-todo {border-top: 1px solid #ff0000;}
kwd-proj {border-top: 1px solid #ff00ff;}
kwd-next {border-top: 1px solid #0000ff;}
kwd-done {border-top: 1px solid #009933;}
kwd-waiting {border-top: 1px solid #ff6600;}
kwd-canceled {border-top: 1px solid #003300;}
kwd-someday {border-top: 1px solid #66ccff;}
kwd-inactive {border-top: 1px solid #aaaaaa;}
</style>
<p>Five years ago, my life exploded in complexity. I had just started a new position in a new field. I was planning my wedding. And my inability to say <code class="language-plaintext highlighter-rouge">NO</code> to anyone and everyone had culminated in my serving on the board of <em>three</em> graduate student organizations. Inevitably, cracks began to form, and my finite brain started to lose track of tasks. My calendar was sufficient to ensure that I wouldn’t miss meetings, but I would often only prepare for those meetings at the eleventh hour. My productivity and the quality of my work both suffered. Something needed to change.</p>
<p>This guide is devoted to a discussion of the organizational system that I have honed in the time since. With it, I have found that my time is spent more wisely. Better organization means that I can consciously devote effort where it is needed early on, as opposed to scrambling to keep up, and deliver higher quality work without expending more energy.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Many of the ideas presented here derive from the <a href="https://gettingthingsdone.com/what-is-gtd/">Getting Things Done</a> methodology, but adapted and expanded to meet my personal needs.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Many of the ideas presented here derive from the <a href="https://gettingthingsdone.com/what-is-gtd/">Getting Things Done</a> methodology, but adapted and expanded to meet my personal needs.
</p>
<p>You too can streamline your process. This guide is meant to serve as an example of how you might reorganize your workflow and find order through the chaos of your busy life. Yet different lifestyles have different demands: what works for me may not work as well for you. As such, I do not expect that you will replicate this system in its entirety. Instead, I hope you will take inspiration from my system and use elements of it to build a workflow that works for you.</p>
<p>This document is broken into three main parts:</p>
<ul>
<li><a href="#process-doc:goals"><strong>Goals</strong></a>: in which I dive into more detail about what it is I have tried to accomplish with my system.</li>
<li><a href="#process-doc:framework"><strong>Framework</strong></a>: in which I describe the core ideas and systems I employ to record information and keep track of my tasks.</li>
<li><a href="#process-doc:tooling"><strong>Tooling</strong></a>: in which I discuss the tools—including hardware, software, whatever—that I use to implement the framework.</li>
</ul>
<p>In addition, I conclude with two sections in which I describe what I see as <a href="#process-doc:limitations">limitations of my existing system</a> and <a href="#process-doc:appendix">some other technical details</a>.</p>
<p>Let’s dive in.</p>
<h2 id="goals"><a name="process-doc:goals"></a>Goals</h2>
<p>In order to determine how effective an organizational system is, it is important to have clearly enumerated aims: what is the system being designed to enable? Your goals may be different, but my system is structured to prioritize the following:</p>
<ul>
<li>Ensure that I never miss a task, meeting, or deadline;</li>
<li>Manage tasks that I share with and have assigned to others;</li>
<li>Keep a permanent record of my work and research, and in a way that can be easily shared with others if necessary;</li>
<li>Collect my thoughts, writings, and half-baked ideas;</li>
<li>Ensure that my local progress—on a daily or monthly basis—is in service of my long-term goals;</li>
</ul>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Contemplating how my day-to-day work aligns with my long-term goals is perhaps the most difficult objective, since long-term goals are difficult to even enumerate; I discuss how I think about and manage long-term goals <a href="#process-doc:long-term-goals">in a later section</a>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Contemplating how my day-to-day work aligns with my long-term goals is perhaps the most difficult objective, since long-term goals are difficult to even enumerate; I discuss how I think about and manage long-term goals <a href="#process-doc:long-term-goals">in a later section</a>.
</p>
<ul>
<li>Finally: ensure I enjoy my life and make time for fun and friends! <em>My system is meant to organize, not confine.</em></li>
</ul>
<h3 id="a-note-on-flexibility-and-robustness">A note on flexibility and robustness</h3>
<p>One key idea to keep in mind is that an organizational system should be <strong>flexible</strong>. Life is complicated, and unexpected items—both good and bad—can appear at a moment’s notice. While it may not be immediately clear where a new task or project belongs, one should always have the ability to add new files or lists to hold the new items. Additionally, I occasionally discover that a file or project has outgrown the way I structured it at its inception. My tools support fast and easy refactoring, so that I can restructure a project to reflect my updated understanding of the problems it was intended to solve. The greater the effort required to reorganize, the less frequently it will happen, and the effectiveness of the organizational system will decline. The tools I describe later work well for me, but you should find those that work best for you.</p>
<p>Similarly, an effective organizational system is <strong>robust</strong>: I am human and will occasionally forget to write something down or call someone back. The <em>Getting Things Done</em> methodology—upon which much of my workflow is based—recommends a weekly time of reflection during which I read through my outstanding tasks and rack my brain for anything I may have missed; I find this weekly effort an invaluable way of ensuring that my todo lists are complete. However you decide to organize your life, building in extra redundancies and time to tidy up is incredibly important.</p>
<blockquote>
Flexibility and robustness are properties that any organizational workflow should have if it is to be effective.
</blockquote>
<h2 id="framework-getting-things-done-with-extensions"><a name="process-doc:framework"></a>Framework: Getting Things Done (with Extensions)</h2>
<p>One of the core principles of the <em>Getting Things Done</em> methodology, around which my system is built, is that <em>the mind is for thinking, not remembering</em>. Everything that may need to be accomplished—or that you might someday want to accomplish—should be written down. <em>Getting Things Done</em> (or GTD) is all about making comprehensive lists of things that need to get done. Things that have deadlines. Things that rely on other people. Things that have sub-things that must be completed first. Lists of tasks give an at-a-glance overview of the big picture and can be extremely useful for accomplishing long-term goals.</p>
<p>Yet <em>Getting Things Done</em> treats notes and other information as if they are separate from the tasks that motivated their creation—<em>but this is not how most people think and work</em>. My usual workflow for solving a problem involves breaking up a high-level objective into increasingly smaller goals until I can make progress towards accomplishing it. As I work, I log my progress, my thoughts, where I get stuck, temporary images or figures, intermediate results, how to reproduce my work, and so on. Note taking is a critical part of my thinking process, so notes and their parent tasks should coexist.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Software—the subject of a later section—allows me to collect my tasks and projects when needed and build the lists around which <em>Getting Things Done</em> (and my workflow) is centered.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Software—the subject of a later section—allows me to collect my tasks and projects when needed and build the lists around which <em>Getting Things Done</em> (and my workflow) is centered.
</p>
<blockquote>
Detailed notes are needed to accomplish individual projects and tasks, but they obscure the big picture. Task lists and calendars capture the big picture at the expense of detail. An effective organizational system requires both.
</blockquote>
<h3 id="a-illustrative-example-taking-a-class">An illustrative example: taking a class</h3>
<p>Imagine you’re taking a class.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:0em;" aria-label="margin note">My system uses the <a href="https://orgmode.org/worg/dev/org-syntax.html">Org syntax</a>, similar to markdown, to store information. Asterisks are used to indicate the <em>level</em> of a task, so a task marked by <code class="language-plaintext highlighter-rouge">**</code> <em>belongs</em> to another task heading with <code class="language-plaintext highlighter-rouge">*</code>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">My system uses the <a href="https://orgmode.org/worg/dev/org-syntax.html">Org syntax</a>, similar to markdown, to store information. Asterisks are used to indicate the <em>level</em> of a task, so a task marked by <code class="language-plaintext highlighter-rouge">**</code> <em>belongs</em> to another task heading with <code class="language-plaintext highlighter-rouge">*</code>.
</p>
<p>Succeeding in your class requires that you keep track of a plethora of information—including lecture times, class notes, assignments and their due dates, and term projects. Here’s what an example file for this class might look like during the term:</p>
<pre class="pre-scrollable"><code>* Class Lectures <kwd-faded>:NOTES:</kwd-faded>
** Lecture: Linear Regression
<2020-03-17 Tue 10:00-11:30>
This task has a date and time.
<em>[Class notes will go here.]</em>
** Lecture: Support Vector Machine
<2020-03-19 Thu 10:00-11:30>
<em>[Class notes will go here.]</em>
* Homework
** <kwd-next>NEXT</kwd-next> Homework Assignment 2
DEADLINE: <2020-03-17 Tue>
** <kwd-next>NEXT</kwd-next> Homework Assignment 3
DEADLINE: <2020-03-24 Tue>
* <kwd-proj>PROJ</kwd-proj> Mid-Term Project
DEADLINE: <2020-03-21 Fri>
This is marked as a project, since it has multiple
sub-tasks.
** <kwd-next>NEXT</kwd-next> Find four reference papers
We need to find four papers to write about. We can
make immediate progress towards this task, so it is
marked as NEXT (instead of TODO).
** <kwd-todo>TODO</kwd-todo> Choose one paper to implement
One of the papers we find, we will need to implement
for our project. This task requires that we have
completed the previous task, so it is marked as
TODO instead of NEXT.
** <kwd-waiting>WAITING</kwd-waiting> Check with Prof. paper topic okay
Sent email to Professor. Waiting for reply.
* Reading
** <kwd-someday>SOMEDAY</kwd-someday> Read ML Textbook <kwd-faded>:SOMEDAY:</kwd-faded>
When I find the time, I would like to read the course
textbook in more depth. The SOMEDAY tag means that
I hope to accomplish this task eventually or may
never make time for it.</code></pre>
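<p>The keyword set used above (<code class="language-plaintext highlighter-rouge">NEXT</code>, <code class="language-plaintext highlighter-rouge">PROJ</code>, <code class="language-plaintext highlighter-rouge">WAITING</code>, <code class="language-plaintext highlighter-rouge">SOMEDAY</code>) goes beyond Org’s built-in <code class="language-plaintext highlighter-rouge">TODO</code>/<code class="language-plaintext highlighter-rouge">DONE</code>. A sketch of how such a set might be declared (the fast-selection letters here are illustrative):</p>
<pre class="pre-scrollable"><code>;; Keywords after the "|" count as terminal (done) states.
;; The letters in parentheses enable fast state selection via C-c C-t.
(setq org-todo-keywords
      '((sequence "TODO(t)" "NEXT(n)" "PROJ(p)" "WAITING(w)" "SOMEDAY(s)"
                  "|" "DONE(d)" "CANCELED(c)")))</code></pre>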
<p>The structure of the file is simple enough. Top-level items like <em>Homework</em> and <em>Class Lectures</em> collect calendar items and standalone tasks as they arise. So where is the difficulty?</p>
<p>Well, if you’re only taking this one class, managing your course load is pretty straightforward. But most students are taking multiple classes at once. They participate in social events and clubs. They have lives outside of the university. Rent and tuition need to get paid. The number of different organizations to which a typical student is somehow beholden can be overwhelming, and a separate log file may exist for each and every one of them. The mental burden of keeping everything straight—much less accomplishing anything—introduces massive cognitive load.</p>
<p>That’s where the difficulty lies.</p>
<p>The second part of my system involves building calendars and todo lists that aggregate tasks from across these files—usually automating the process in software. A <code class="language-plaintext highlighter-rouge">Weekly Calendar</code> view might look something like this:</p>
<pre class="pre-scrollable"><code><kwd-faded>
Monday 16 March 2020</kwd-faded>
<kwd-faded>Tuesday 17 March 2020</kwd-faded>
ML Class: 10:00-11:30 Lecture: Linear Regression <kwd-faded>:NOTES:</kwd-faded>
Phys Class: 12:00-13:30 Lecture: Incompressible Fluids <kwd-faded>:NOTES:</kwd-faded>
ML Class: DEADLINE: <kwd-next>NEXT</kwd-next> Homework Assignment 2
<kwd-faded>Wednesday 18 March 2020</kwd-faded>
Dance Team: 19:00-22:00 Practice
<kwd-faded>Thursday 19 March 2020</kwd-faded>
ML Class: 10:00-11:30 Lecture: Support Vector Machine <kwd-faded>:NOTES:</kwd-faded>
Phys Class: 12:00-13:30 Lecture: Laminar Flow <kwd-faded>:NOTES:</kwd-faded>
<kwd-faded>Friday 20 March 2020</kwd-faded>
Phys Class: DEADLINE: <kwd-next>NEXT</kwd-next> Homework Assignment 2
ML Class: DEADLINE: <kwd-proj>PROJ</kwd-proj> Mid-Term Project
<kwd-faded>Saturday 21 March 2020</kwd-faded>
Dance Team: 12:00--18:00 Competition
<kwd-faded>Sunday 22 March 2020</kwd-faded></code></pre>
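<p>In Org, this kind of aggregation is what the agenda is for: point it at every file, and it builds calendar and task views automatically. A sketch (the directories and command key are illustrative):</p>
<pre class="pre-scrollable"><code>;; Collect tasks from every Org file under these directories.
(setq org-agenda-files '("~/org/classes/" "~/org/clubs/"))
;; A composite view: the week's calendar followed by all NEXT tasks.
(setq org-agenda-custom-commands
      '(("w" "Weekly calendar and NEXT tasks"
         ((agenda "" ((org-agenda-span 'week)))
          (todo "NEXT")))))</code></pre>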
<p>Notice that the calendar collects tasks, meetings and deadlines from across multiple areas of focus. At a glance, you can see what the week is going to look like and what upcoming deadlines may require your immediate focus. The list of <code class="language-plaintext highlighter-rouge">NEXT Tasks</code> may look something like this:</p>
<pre class="pre-scrollable"><code>Dance Team: <kwd-next>NEXT</kwd-next> Download video of new routine
Dance Team: <kwd-next>NEXT</kwd-next> Register for competition
Hobby: <kwd-next>NEXT</kwd-next> Sign up for guitar lessons
ML Class: <kwd-next>NEXT</kwd-next> Homework Assignment 2
ML Class: <kwd-next>NEXT</kwd-next> Homework Assignment 3
Phys Class: <kwd-next>NEXT</kwd-next> Homework Assignment 2
Phys Class: <kwd-next>NEXT</kwd-next> Homework Assignment 3</code></pre>
<p>These task lists are useful for gaining perspective and focusing effort where it is needed. Reviewing various task lists as part of a weekly agenda review is an effective way to ensure that tasks are not missed and that progress is being made towards longer-term goals. The remainder of this section will discuss each of these pieces in more detail.</p>
<h3 id="overall-structure-logs-and-lists">Overall structure: logs and lists</h3>
<p>In my system, everything is a type of <em>task</em>. Every task corresponds to something I should do, or somewhere I should be, or something I should remember. As such, a task can have many different types of data:</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Strictly speaking, my system also has support for <em>reference material</em>: notes that do not pertain to a specific task, such as course notes.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Strictly speaking, my system also has support for <em>reference material</em>: notes that do not pertain to a specific task, such as course notes.
</p>
<ul>
<li><em>Tasks have notes.</em> Every task can have any kind of notes associated with it. Simple tasks, like chores or errands, typically have no notes. More complex tasks—especially items associated with my job—can have extensive logs that include external links, images, or executable code snippets.</li>
<li><em>Tasks have state.</em> Keywords like <code><kwd-todo>TODO</kwd-todo></code>, <code><kwd-done>DONE</kwd-done></code>, and <code><kwd-canceled>CANCELED</kwd-canceled></code> are used to indicate how complete a task is. <code><kwd-waiting>WAITING</kwd-waiting></code> is used to indicate tasks that are waiting on someone else’s action before I can make progress towards them. Items can also be created without any state at all, and are useful for aggregating information or tasks of a particular type, like <em>chores</em>.</li>
<li><em>Tasks can have times and deadlines.</em> Adding time information to a task should be simple. Tasks associated with meetings or events have a time or time range associated with them. Deadlines can be added as well: times by which a task is expected to be marked as either <code><kwd-done>DONE</kwd-done></code> or <code><kwd-canceled>CANCELED</kwd-canceled></code>. Additionally, tasks can have a <em>defer date</em> until which they won’t appear on any list of incomplete tasks.</li>
<li><em>Projects are just tasks with subtasks.</em> If I feel a task is too complicated to complete without further decomposition, I may create subtasks within that task. All subtasks need to be completed before a project can be marked as <code><kwd-done>DONE</kwd-done></code>. <em>Note</em>: I usually use the <code><kwd-proj>PROJ</kwd-proj></code> state to indicate projects for easier searching when building lists.</li>
<li><em>Tasks can have additional metadata.</em> This might include <em>where</em> the task will occur (useful for meetings) or <em>tags</em> that I can use for sorting and filtering of information. I have included a more complete discussion of how I use <em>tags</em> <a href="#process-doc:appendix:tags">in the appendix</a>.</li>
</ul>
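<p>Putting these pieces together, a single project in one of my log files might look something like the following sketch (the headings, dates, and notes here are hypothetical, but the structure mirrors the conventions above):</p>
<pre class="pre-scrollable"><code>* <kwd-proj>PROJ</kwd-proj> Plan end-of-semester showcase
DEADLINE: &lt;2020-05-01 Fri&gt;
** <kwd-done>DONE</kwd-done> Reserve the auditorium
** <kwd-next>NEXT</kwd-next> Email performers about scheduling
** <kwd-waiting>WAITING</kwd-waiting> Confirm budget with the treasurer
Sent a request on Tuesday; still awaiting a reply.
** <kwd-todo>TODO</kwd-todo> Print programs</code></pre>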
<p>My organizational system consists of collections of tasks with varying amounts of metadata associated with them. As I get new tasks and projects, I write them down. Making progress towards projects or higher-level objectives often reveals new tasks. I usually have a separate file for each organization or area of focus, and each file is populated with its own projects and tasks. When I’ve decided to work towards a particular project or area of focus, I’ll open that file and get to work.</p>
<p>But how do I know what needs to get done? For this, the <em>task lists</em> are very important.</p>
<p>My standard <em>agenda view</em> shows the <code class="language-plaintext highlighter-rouge">Weekly Calendar</code> and the <code class="language-plaintext highlighter-rouge">NEXT Tasks</code> list. When I start my day, I look at upcoming meetings and deadlines and prioritize those, since they are time-sensitive. With those out of the way, I look over all <code><kwd-next>NEXT</kwd-next></code> tasks to find ones I feel like doing, often using <em>tags</em> to filter by context: when I’m at work, I only select tasks relevant to my job. My goals and deadlines are usually sparse: I have bigger projects that I need to deliver on, but their deadlines are often months apart. Every day, I will usually pick one or two projects I want to make progress on. Since projects are made up of smaller tasks, I can pick one of those tasks and get started, continuing to work through tasks and projects until the day is done.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Ranking tasks by <em>urgency</em> is generally more trouble than it is worth. My list of <code><kwd-next>NEXT</kwd-next></code> tasks is usually long but rarely is it terribly overwhelming. Filtering by <em>tags</em> is also occasionally useful.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Ranking tasks by <em>urgency</em> is generally more trouble than it is worth. My list of <code><kwd-next>NEXT</kwd-next></code> tasks is usually long but rarely is it terribly overwhelming. Filtering by <em>tags</em> is also occasionally useful.
</p>
<h3 id="the-weekly-review">The weekly review</h3>
<p>On a day-to-day basis, new tasks are easy to keep track of. I write down tasks and projects as I become aware of them. Emails, conversations I have with friends or colleagues, random thoughts or blog posts—all of these can become new items to add to my logs. Even movies I want to watch or restaurants I want to go to can be added as <code><kwd-someday>SOMEDAY</kwd-someday></code> items. But reactively capturing tasks is only part of a well-organized existence. To facilitate progress towards goals and to encourage long-term personal growth, periodic reflection is required.</p>
<p>Consistent with the <em>Getting Things Done</em> methodology, every week I conduct a complete <strong>Weekly Review</strong> of my logs. The GTD weekly review consists of three phases:</p>
<p><code class="language-plaintext highlighter-rouge">1.</code> <strong>Get Clear</strong> First, I ensure that everything not in my logs is added to my logs. Loose scraps of paper and notes are collected, processed, and discarded. My email inbox is emptied. Open browser tabs are reviewed and closed if possible. If I can clean it up and put it in its place, I do so.</p>
<p><code class="language-plaintext highlighter-rouge">2.</code> <strong>Get Current</strong> Second, I look over <em>all of my tasks</em> and try to identify places where my logs might be incomplete or out-of-date. I look at all my <code><kwd-next>NEXT</kwd-next></code> tasks and remove any that have been completed. I review my <code><kwd-waiting>WAITING</kwd-waiting></code> tasks and follow up on any that may be overdue. Projects with no <code><kwd-next>NEXT</kwd-next></code> items are considered <em>stuck</em>; I update these if possible or flag them for later review. I reexamine tasks that have gone untouched for a long time: will I <em>ever</em> do these? I even review my calendar. Items from the last two weeks may need additional follow-up tasks, and deadlines or meetings for the upcoming week may require auxiliary tasks for preparation.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">The <em>Get Current</em> phase is designed to make the system more robust. It’s hard for items to become stale if I look them over at least once a week.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The <em>Get Current</em> phase is designed to make the system more robust. It’s hard for items to become stale if I look them over at least once a week.
</p>
<p><code class="language-plaintext highlighter-rouge">3.</code> <strong>Get Creative</strong> Finally, I look beyond my existing projects and tasks for new opportunities. I review my list of long-term goals—more on <em>goals</em> and <em>motivators</em> in the next section—in search of new tasks or projects. Can repetitive tasks be automated? Do certain aspects of my job feel overly stressful? Would gaining a new hobby or skill make me happier or more productive? Do I relax enough? I take a look at items tagged with <code><kwd-someday>SOMEDAY</kwd-someday></code> for TV shows, movies, games, or restaurants I’ve wanted to enjoy. In dedicating time to introspection, I can identify areas in which I feel I have not been making much progress and try to find ways my life might be improved.</p>
<p><em>The Weekly Review</em> is essential for keeping my system up-to-date. Yet more important is ensuring my projects and tasks align with my long-term goals: if my system isn’t making me happier or more fulfilled, why bother?</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">I have included an <em>exhaustively detailed</em> discussion of my personal weekly review <a href="#process-doc:appendix:review">in the appendix</a>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">I have included an <em>exhaustively detailed</em> discussion of my personal weekly review <a href="#process-doc:appendix:review">in the appendix</a>.
</p>
<h3 id="from-short-term-tasks-to-long-term-progress"><a name="process-doc:long-term-goals"></a>From short-term tasks to long-term progress</h3>
<p>When I started organizing my life, long-term goals were not on my mind. I was interested in little more than keeping my head above water—in finding some semblance of order through the chaos. Yet as I became more organized and began to look more than a week into the future, long-term goals helped put my life into context; without that context, my accomplishments had started to feel short-lived. Asking deep questions about motivation can help set direction, focusing time where it will be most useful in the long term. What sort of work did I want to do after graduation? How could I better spend my time during my PhD to get the job <em>I wanted</em>?</p>
<p>A key part of my organization system is my <strong>List of Motivators</strong>. Anything that might potentially generate new tasks or projects belongs on this list. In my system, motivators fall into three broad categories:</p>
<p><code class="language-plaintext highlighter-rouge">1.</code> <strong>Active Motivators</strong> Any regular commitment, organization I’m a part of, or ongoing activity is an <em>active motivator</em>. As a PhD student, I was a part of a number of student organizations; dedicating thought to each during the weekly review, I would occasionally come up with new events or identify ways in which those organizations might improve. This blog is also one of my <em>active motivators</em>—I try to think of new content or ways of improving the quality or visibility of my online presence. Hobbies can fall in this category as well, and my <em>learn guitar</em> motivator often inspires new songs to learn or techniques I would like to try.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">In <em>Getting Things Done</em> parlance, what I call <em>active motivators</em> are referred to as <em>areas of focus</em>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">In <em>Getting Things Done</em> parlance, what I call <em>active motivators</em> are referred to as <em>areas of focus</em>.
</p>
<p><code class="language-plaintext highlighter-rouge">2.</code> <strong>Tangible Motivators</strong> Sometimes a project is too small in scope to capture a milestone you would like to achieve. For me, graduating with my PhD was an example of this. Graduation required <code class="language-plaintext highlighter-rouge">(1)</code> writing my thesis, <code class="language-plaintext highlighter-rouge">(2)</code> passing my <em>defense</em>, an hour-long oral presentation, <code class="language-plaintext highlighter-rouge">(3)</code> satisfying course requirements, and <code class="language-plaintext highlighter-rouge">(4)</code> publishing the research upon which much of the thesis was built. <em>Tangible Motivators</em> are typically milestones that are greater than a few months into the future, yet might require indeterminate intermediate steps to accomplish. Writing a book or getting a promotion are other good examples of <em>tangible motivators</em>.</p>
<p><code class="language-plaintext highlighter-rouge">3.</code> <strong>Conceptual Motivators</strong> The broadest long-term goals—like creating a successful business or becoming a thought leader in my field—belong to this class of motivator. I often consider my <em>tangible motivators</em> in the context of their <em>conceptual</em> counterparts. My goal to graduate from my PhD was connected to my desire to become a professor and to create an influential body of research in my chosen discipline.</p>
<p>The weekly review has a dedicated time during which I think critically about my different motivators. I identify ways I might make progress towards them, adding new tasks or projects to my logs in the process. Reviewing this list also reveals when my goals are not being met. Yet finding new tasks is only part of the process of reflection: I need to decide whether my motivators themselves reflect what I want. Are any missing? Are there any I no longer care about? Importantly, are there tasks I have enjoyed working on that don’t have a corresponding motivator? The weekly review should not feel utilitarian: if I find myself with a handful of projects that lack a corresponding motivator, perhaps a motivator is missing.</p>
<p>Some soul-searching is obviously required here, since I cannot tell you what your goals are or how you should make progress towards them. Habit is the key: <em>regular</em> self-reflection has helped me find direction.</p>
<h2 id="tooling-emacs-and-org-mode"><a name="process-doc:tooling"></a>Tooling: Emacs and Org-mode</h2>
<p>Understanding how my organizational workflow is <em>meant</em> to work is only part of the picture. The workhorses of my system are the tools—the services, software, and hardware—that make everything work in practice.</p>
<h3 id="project-management-with-org-mode">Project Management with Org mode</h3>
<p>My primary tool is <a href="https://orgmode.org">Org mode</a>, a notetaking syntax and surrounding software environment built into the customizable text editor <a href="https://www.gnu.org/software/emacs/">Emacs</a>. To be clear: <em>Org mode and Emacs are not tools for everyone</em>. Here is an overview of the highlights:</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">I have a slightly more technical writeup about my Org mode and Org Agenda workflow <a href="http://www.cachestocaches.com/2016/9/my-workflow-org-agenda/">in another blog post</a> as part of my <a href="http://www.cachestocaches.com/series/emacs-productivity/">Emacs for Productivity</a> series.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">I have a slightly more technical writeup about my Org mode and Org Agenda workflow <a href="http://www.cachestocaches.com/2016/9/my-workflow-org-agenda/">in another blog post</a> as part of my <a href="http://www.cachestocaches.com/series/emacs-productivity/">Emacs for Productivity</a> series.
</p>
<p><strong>Logs are written in plain text with simple markup.</strong> The <em>class example</em> from earlier shows a minimal Org mode file. <code class="language-plaintext highlighter-rouge">*</code>’s at the beginning of a line are used to indicate new headings, projects, or tasks. Other markup is used to stylize text: <code class="language-plaintext highlighter-rouge">*bold text*</code> becomes <strong>bold text</strong> and <code class="language-plaintext highlighter-rouge">/italics/</code> becomes <em>italics</em>.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Org mode even supports inline code snippets that can be run without leaving the file. I wrote about this feature in <a href="http://www.cachestocaches.com/2018/6/org-literate-programming/">a post on Literate Programming with Org Mode</a>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Org mode even supports inline code snippets that can be run without leaving the file. I wrote about this feature in <a href="http://www.cachestocaches.com/2018/6/org-literate-programming/">a post on Literate Programming with Org Mode</a>.
</p>
<p><strong>Keywords are used to indicate state.</strong> Task states like <code><kwd-next>NEXT</kwd-next></code> or <code><kwd-someday>SOMEDAY</kwd-someday></code> are identified automatically by their context, as in the <em>class example</em> above. A single function, <code class="language-plaintext highlighter-rouge">org-todo</code>, is used to quickly change the state of a task. These same keywords are what the agenda uses to build the task lists I rely on during the weekly review.</p>
<p><strong>Moving tasks around is fast.</strong> Sometimes I want to move tasks to another location, particularly during the weekly review. With Org mode, this is done with a simple function: <code class="language-plaintext highlighter-rouge">org-refile</code>.</p>
<p><strong>Capturing new tasks is simple.</strong> Many new tasks come in as emails or during meetings. For this, <code class="language-plaintext highlighter-rouge">org-capture</code> is incredibly useful. With it, I can create a new task with a single function call from one of a number of custom templates and then move it to where I would like it to go using <code class="language-plaintext highlighter-rouge">org-refile</code>.</p>
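<p>As a rough sketch of what such a capture template can look like (the dispatcher key, file path, and template text here are hypothetical):</p>
<pre class="pre-scrollable"><code>;; A hypothetical capture template: files a new TODO under a
;; "Refile List" heading, to be refiled later.  In the template,
;; %? places the cursor, %U adds a timestamp, and %a links back
;; to the context (e.g. the email or file I was looking at).
(setq org-capture-templates
      '(("t" "New task" entry
         (file+headline "~/org/refile.org" "Refile List")
         "* TODO %?\n%U\n%a")))</code></pre>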
<p><strong>Searching is rich and powerful.</strong> Since my Org mode files are just plain text, searching through them is straightforward. Org has a couple of extensions that allow me to search across all of my log files. I can even search by date, title, file, content, tags, or other metadata with relative ease.</p>
<p><strong>Clocking my time is simple.</strong> Clocking my time is not essential to my workflow. However, I find that it is a great way to keep track of how much effort I put towards different tasks and to reevaluate whether I’m appropriately allocating my time to one project over another. During my weekly agenda review, looking back over the last couple weeks of clocked time is also useful for reminding me what I have been working on recently. I usually only use this for work items—it doesn’t really matter to me how much time I spend on my hobbies or playing video games—but I do still record how long it takes me to write blog posts or do other professional development activities. Clocking my time with Org mode is trivial and requires only a simple key sequence to enable or disable the clock for a given task.</p>
<p><strong>Inline citations and references are easy with org-ref.</strong> It is very important to me that I can include references to papers and other publications in my notes. Since I work in academic computer science, I frequently use LaTeX for composing documents and BibTeX for managing references. With <a href="https://github.com/jkitchin/org-ref">org-ref</a>, written by the fantastic <a href="http://kitchingroup.cheme.cmu.edu">John Kitchin</a>, I can add citations from my BibTeX file to my org-mode notes with a single command. Whenever I export my notes to a PDF via LaTeX, the citations appear inline and optionally populate a bibliography auto-generated at the end of the file.</p>
<p><strong>Exporting and sharing my notes is simple and flexible.</strong> As you might have guessed, I do nearly everything in Org mode. Sharing notes with my colleagues is a critical part of my workflow and Org mode makes that easy with its flexible support for <a href="https://orgmode.org/manual/Exporting.html">exporting and publishing notes</a>. Turning a project into a PDF, via LaTeX, or a stylized web page is never more than a few commands away. I even have a custom exporter for my blog so that I can automatically insert the <a href="https://rstudio.github.io/tufte/">Tufte-style</a> figure captions and margin notes I love to use.</p>
<h3 id="generating-task-lists-with-org-agenda">Generating task lists with Org Agenda</h3>
<p>Org mode is what provides the core functionality of my logs, but Org Agenda is what makes it all work. The agenda is <em>the killer feature of Org mode</em>. It is through Org Agenda views that I can build my calendar directly from my logs and create the task lists needed for daily operation—via the <code class="language-plaintext highlighter-rouge">NEXT Tasks</code> list and the <code class="language-plaintext highlighter-rouge">Projects</code> list—and my weekly review. I have custom task list generators for nearly all elements of the <em>Get Current</em> phase of my weekly review. In addition to simply aggregating tasks, the agenda view can automatically determine when a <code><kwd-proj>PROJ</kwd-proj></code> is <em>stuck</em> (does not have any <code><kwd-next>NEXT</kwd-next></code> tasks) or is <em>waiting</em> (is blocked by a task that is in the <code><kwd-waiting>WAITING</kwd-waiting></code> state). The views the system produces mirror those from the <em>class example</em> above.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Note: My more in-depth guide on creating custom agenda views and task lists is still a work in progress. For now, you can look at <a href="http://www.cachestocaches.com/2016/9/my-workflow-org-agenda/">my technical overview</a> or the incredibly detailed <a href="http://doc.norang.ca/org-mode.html">guide by Bernt Hansen</a>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Note: My more in-depth guide on creating custom agenda views and task lists is still a work in progress. For now, you can look at <a href="http://www.cachestocaches.com/2016/9/my-workflow-org-agenda/">my technical overview</a> or the incredibly detailed <a href="http://doc.norang.ca/org-mode.html">guide by Bernt Hansen</a>.
</p>
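<p>To give a flavor of the configuration involved, the custom views and stuck-project detection can be sketched in a few lines of Emacs Lisp. This is a minimal, illustrative configuration (the dispatcher key and titles are arbitrary), not my exact setup:</p>
<pre class="pre-scrollable"><code>;; A sketch of a custom agenda view: the weekly calendar
;; followed by a list of every NEXT task.
(setq org-agenda-custom-commands
      '(("d" "Weekly Calendar and NEXT Tasks"
         ((agenda "" ((org-agenda-span 'week)))
          (todo "NEXT")))))

;; A sketch of stuck-project detection: any PROJ heading
;; without a NEXT subtask is flagged as stuck.
(setq org-stuck-projects
      '("/PROJ" ("NEXT") nil ""))</code></pre>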
<p>Agenda views themselves are not merely plain text outputs: they are interactive. Each task is a link that, when opened, brings me to its corresponding location in my Org mode logs. The agenda can be filtered as well, and I can easily narrow down which items I’m shown by their name, tag, or other metadata—useful when I’m at work and probably shouldn’t be working on my hobby projects.</p>
<p><em>Note</em>: My workflow for managing tasks assigned to others or collaborative projects is still a work in progress. The system is tentatively working as expected, but I am not comfortable writing it up just yet. For an idea of some of the functionality I use, see this <a href="http://juanreyero.com/article/emacs/org-teams.html">guide to team management with Org</a>.</p>
<h3 id="other-tools">Other tools</h3>
<p>Org and the Org Agenda are incredibly powerful, but general-purpose software isn’t always the easiest to use. Sometimes other tools are more appropriate.</p>
<p><strong>My smartphone</strong> I can’t always be near my laptop to add new tasks. Fortunately, I have another tiny computer in my pocket that allows me to write down short notes that can be added to my logs at a later time. I usually just use the Notes application on my iPhone. The <a href="https://beorgapp.com">beorg</a> iOS application is very easy to use and syncs with my Org files over Dropbox, allowing me to perform more complex operations on the go if necessary.</p>
<p><strong>My email</strong> My relationship with email is constantly evolving. For a long time, I used the built-in desktop Mail client on my MacBook Pro. There was a time when I used Emacs to manage my email, something I wrote about extensively <a href="http://www.cachestocaches.com/2017/3/complete-guide-email-emacs-using-mu-and-/">in another blog post</a>. Now, I mostly use my desktop mail client again, though I occasionally use my Emacs-based interface to sort through my mail and even link tasks with their associated mail messages.</p>
<p><strong>Staying in touch with friends and colleagues using “Monica”</strong> Though I once managed all my contacts in Emacs and Org mode, I have since found it easier to use a dedicated, specialized tool. <a href="https://www.monicahq.com">Monica</a> has been my go-to for this purpose. It’s open source and optionally web-hosted for a small monthly fee. Monica is advertised as <em>personal Customer Relationship Management (CRM) software</em> and, with a clean web interface and flexible information storage, it delivers.</p>
<p><strong>Personal finances with Ledger CLI</strong> My finances are not terribly complicated. I use <a href="https://www.ledger-cli.org">Ledger</a>, a command-line-based utility for double-entry accounting, to manage my money and budgets. Like Org mode, Ledger has a pretty simple syntax for storing entries. An example file looks something like this:</p>
<pre class="pre-scrollable"><code>2019/11/01 iTunes
Expenses:Media:Music $9.99
Liabilities:MasterCard
2019/11/01 Safeway
Expenses:Food:Groceries $35.67
Liabilities:MasterCard
2019/11/02 Compass Coffee
;; Coffee and lunch with my wife
Expenses:Food:Coffee $8.00
Expenses:Food:Snack $7.88
Liabilities:MasterCard</code></pre>
<p>Computing the balance of each account is simple:</p>
<pre class="pre-scrollable"><code>gjstein $ ledger -f example.ledger balance
$61.54 Expenses
$51.55 Food
$8.00 Coffee
$35.67 Groceries
$7.88 Snack
$9.99 Media:Music
$-61.54 Liabilities:MasterCard
====================
0</code></pre>
<p>So too is generating the history for a particular account, like <code class="language-plaintext highlighter-rouge">Food</code>:</p>
<pre class="pre-scrollable"><code>gjstein $ ledger -f example.ledger register Food
19-Nov-01 Safeway         Ex:Foo:Groceries   $35.67  $35.67
19-Nov-02 Compass Coffee  Expe:Food:Coffee    $8.00  $43.67
                          Expen:Food:Snack    $7.88  $51.55</code></pre>
<p>Ledger is a simple yet powerful tool that I strongly recommend.</p>
<h2 id="limitations"><a name="process-doc:limitations"></a>Limitations</h2>
<p>As with any system, there exist shortcomings. There aren’t many—certainly not enough for me to consider another major overhaul—but what shortcomings I’ve identified I include here.</p>
<ul>
<li><em>Archiving</em>: I do not have a very principled way of deciding when material should be archived, yet if material is never archived, clutter can quickly become overwhelming.</li>
<li><em>Reference material</em>: It is hard for me to know where new knowledge should be kept. One of my log files is for an expansive annotated bibliography, separated by category. As a researcher in an active field, many new publications blur the lines between subfields, making it challenging to know where to put that information. More specialized programs like <a href="https://www.papersapp.com">Papers</a> or <a href="https://www.mendeley.com/">Mendeley</a> seem great at managing this sort of thing, but I haven’t had the impetus to leave the comfort of my system just yet.</li>
<li><em>Laziness</em>: I am still human, and I occasionally fail to perform the weekly review. For one week, maybe two, this isn’t a problem. Over longer periods, cracks form. I can no longer trust that my system is a complete list of my tasks, and that lack of confidence makes the system significantly less effective.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>I hope to have inspired you. My organizational workflow transformed my life, empowering me with the ability to plan for the future, rather than reactively making progress towards short-term goals. You should now have all the ideas you need to put in place a system of your own. Take control of your life and find order through the chaos.</p>
<p>Good luck.</p>
<p>As always, I welcome discussion in the comments below or on <a href="https://news.ycombinator.com/item?id=22676392">Hacker News</a>. I encourage you to ask questions, provide feedback, or let me know if you might like more detail in a follow-up guide on a particular area.</p>
<hr />
<h2 id="appendix-supplementary-details"><a name="process-doc:appendix"></a>Appendix: supplementary details</h2>
<h3 id="a-complete-list-of-task-states">A complete list of task states</h3>
<p>Without any customization, Org allows you to assign a <code><kwd-todo>TODO</kwd-todo></code> or <code><kwd-done>DONE</kwd-done></code> state to any heading, either manually or using <code class="language-plaintext highlighter-rouge">C-c C-t</code>. However, I’ve found that these aren’t typically expressive enough for handling very large projects. I use the following keywords (and shortcut keys) in my documents:</p>
<ul>
<li><code><kwd-next>NEXT</kwd-next></code> <code class="language-plaintext highlighter-rouge">(n)</code> This is for tasks that can be immediately worked on.</li>
<li><code><kwd-todo>TODO</kwd-todo></code> <code class="language-plaintext highlighter-rouge">(t)</code> Used to indicate blocked tasks, which require another task to be completed before I can start working on them.</li>
<li><code><kwd-proj>PROJ</kwd-proj></code> <code class="language-plaintext highlighter-rouge">(p)</code> Denotes a project, which usually contains other <code><kwd-todo>TODO</kwd-todo></code> or <code><kwd-next>NEXT</kwd-next></code> tasks. Projects without any <code><kwd-next>NEXT</kwd-next></code> tasks are considered stuck.</li>
<li><code><kwd-waiting>WAITING</kwd-waiting></code> <code class="language-plaintext highlighter-rouge">(w)</code> Whenever I have a task that is waiting on someone else, I’ll assign it this keyword.</li>
<li><code><kwd-inactive>INACTIVE</kwd-inactive></code> <code class="language-plaintext highlighter-rouge">(i)</code> As an academic, I will occasionally have ideas or projects that I’ll want to get done eventually but won’t have time to work on at the moment. I use the <code><kwd-inactive>INACTIVE</kwd-inactive></code> state to signify that I’d like to come back to such a task eventually, and I’ll occasionally search my files for <code><kwd-inactive>INACTIVE</kwd-inactive></code> projects when I have time or during my weekly review.</li>
<li><code><kwd-canceled>CANCELED</kwd-canceled></code> <code class="language-plaintext highlighter-rouge">(c)</code> This is self-explanatory. If I’m working on something for a while, but it fizzles, it’s sometimes useful to mark it as canceled rather than delete it outright.</li>
<li><code><kwd-done>DONE</kwd-done></code> <code class="language-plaintext highlighter-rouge">(d)</code> This should be self-explanatory.</li>
</ul>
<p>When changing between these states using <code class="language-plaintext highlighter-rouge">C-c C-t</code>, it’s possible to have Org automatically ask for a note explaining the change. For example, whenever a task is set to <code><kwd-waiting>WAITING</kwd-waiting></code>, I have an opportunity to enter a short message detailing whom I’m waiting on and any additional information I may want to add. This command may also automatically add tags (e.g. <code class="language-plaintext highlighter-rouge">CANCELED</code>) to the heading to help with sorting and searching.</p>
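<p>For reference, the keyword sequence and shortcut keys above can be sketched in a single setting. This is an illustrative configuration, not necessarily my exact one; in the keyword syntax, <code class="language-plaintext highlighter-rouge">@</code> prompts for a note when entering a state, and states after the <code class="language-plaintext highlighter-rouge">|</code> count as “done”:</p>
<pre class="pre-scrollable"><code>;; A sketch of the keyword sequence described above.  "@" asks
;; for an explanatory note when the state is entered.
(setq org-todo-keywords
      '((sequence "TODO(t)" "NEXT(n)" "PROJ(p)"
                  "WAITING(w@)" "INACTIVE(i)"
                  "|" "DONE(d)" "CANCELED(c@)")))</code></pre>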
<h3 id="my-weekly-review-in-exhaustive-detail"><a name="process-doc:appendix:review"></a>My weekly review in exhaustive detail</h3>
<p>Much of this section is inspired by, or derives directly from, the principles outlined in <em>Getting Things Done</em>. However, there are a number of things I’ve added or expanded upon to really make it my own. Your weekly review likely won’t match mine exactly, but hopefully it will give you an idea of what an exhaustive review requires.</p>
<p><strong>Get Clear</strong> Everything that I want to be in my logs should be added to the logs.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Clean out my email inbox</em>: I habitually use my email inbox to store messages that I should probably respond to. Either respond to it now, add a task to respond later, or forget about it. Regardless, it should be moved out of my inbox if possible.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Prune browser tabs</em>: If you’re like me, you probably have dozens (hundreds?) of browser tabs open at a time. Many of these are papers or blog posts I may never read. I figure out the ones I want to read and write them down. The rest are closed.</li>
</ul>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Sometimes, it may not be clear where a new item belongs. For this, I have a <code class="language-plaintext highlighter-rouge">Refile List</code>, a specific heading with the <code class="language-plaintext highlighter-rouge">REFILE</code> tag where I put tasks for organization at a later date.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Sometimes, it may not be clear where a new item belongs. For this, I have a <code class="language-plaintext highlighter-rouge">Refile List</code>, a specific heading with the <code class="language-plaintext highlighter-rouge">REFILE</code> tag where I put tasks for organization at a later date.
</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Collect loose papers and materials</em>: Any handwritten notes or items I’ve added to my iPhone’s Notes app should be transferred into the system.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Empty your head</em>: Anything that’s been on my mind should also be added. Usually, this step doesn’t mean much, but occasionally I’ll remember something I had to get done or someone I wanted to follow up with.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Clean my refile list</em>: Any new tasks that I didn’t know where to put during the week end up in my temporary refile list. Here, I look through that list and move each task to where it belongs, creating new headings, files, or projects as necessary.</li>
</ul>
<p><strong>Get Current</strong> Every open item in my logs—incomplete tasks or projects—should be looked at.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review my daily log from the last week</em>: I like to keep a pretty simple log of things I accomplished at the end of any given day, including what went well or where I got stuck. Looking over that log helps me come up with tasks I may not have thought of at the time.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review the last 2 weeks of calendar entries</em>: Similar to the daily log review, this occasionally reminds me of follow-up tasks or items that I may have missed.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review the next week of calendar entries</em>: Any upcoming deadlines may need additional tasks.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review the <code class="language-plaintext highlighter-rouge">NEXT Tasks</code> list</em>: Sometimes I forget to mark a task as complete; looking through all my <code><kwd-next>NEXT</kwd-next></code> items prevents the system from getting stale.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review the <code class="language-plaintext highlighter-rouge">WAITING Tasks</code> list</em>: The weekly review is a good time to make sure I reach out to people who may need prodding to get something done.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review my list of open projects</em>: I should spend a couple of minutes looking over every project, marked with a <code><kwd-proj>PROJ</kwd-proj></code> task.</li>
</ul>
<p><strong>Get Creative</strong> Finally, I look for opportunities for personal growth. <a href="https://gettingthingsdone.com">The Getting Things Done book</a> has some great suggestions for how to think about finding new projects or motivators. I highly recommend picking up a copy—and reading Chapter 7 in particular.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review my inactive and “someday” lists</em>: First, I take a look at projects that I’ve put on the backburner. The <code><kwd-someday>SOMEDAY</kwd-someday></code> state is devoted to projects, tasks, and other items associated with <em>wishful thinking</em>. I move any <code><kwd-someday>SOMEDAY</kwd-someday></code> or <code><kwd-inactive>INACTIVE</kwd-inactive></code> items to a more active state or cancel them as I see fit.</li>
<li><code class="language-plaintext highlighter-rouge">[ ]</code> <em>Review my motivators</em>: This task is by far the most difficult part of my review, which is why I have saved it for last. I go over my <em>Motivators List</em> and ask myself whether I’m making sufficient progress towards each. What might I do to improve? Are there any motivators that I’m no longer excited about? Are any missing?</li>
</ul>
<h3 id="tags"><a name="process-doc:appendix:tags"></a>Tags</h3>
<p>Many of my projects and tasks have metadata associated with them in the form of <em>Tags</em>. Tags are not <em>essential</em> to my workflow, but they are a useful way of adding some additional information to tasks that helps with filtering and sorting. Here are some of my commonly used tags and their use cases:</p>
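<p>As a concrete (and entirely hypothetical) illustration, tags appear at the end of org-mode headlines, and sub-headings inherit the tags of their parents, so a search for <code class="language-plaintext highlighter-rouge">RESEARCH</code> also matches the subtasks below:</p>

```org
* PROJ Submit the navigation paper                             :RESEARCH:
** NEXT Rerun the simulated experiments                            :INFO:
** WAITING Feedback from my co-authors
* TODO Fix the leaky faucet                                       :@HOME:
```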
<ul>
<li>I often append the <code class="language-plaintext highlighter-rouge">RESEARCH</code> tag to projects relevant for work. During my weekly review, I can use such tags to focus only on one set of tasks at a time, reducing context switching. Tags like <code class="language-plaintext highlighter-rouge">BLOG</code> or <code class="language-plaintext highlighter-rouge">HOBBY</code> serve a similar purpose.</li>
<li>Some tags are location-specific; the <code class="language-plaintext highlighter-rouge">@HOME</code> tag is used to denote activities that I need to be at home to accomplish (many chores have this tag).</li>
<li>I use the tags <code class="language-plaintext highlighter-rouge">TENT</code> (tentative) and <code class="language-plaintext highlighter-rouge">OPT</code> (optional) to denote meetings or events that I may not have to attend. My agenda view is set up so that any tasks or events with these tags are grayed out so that I can recognize them at a glance.</li>
<li>I have a <code class="language-plaintext highlighter-rouge">INFO</code> tag that indicates when a task needs additional information. There are two reasons I use this tag: (1) the task is just a stub and needs some details, including metadata like due dates or location; (2) the task is completed, but I have yet to add any details. This is particularly important for my research, for which ensuring that I have a complete record of my tests and experimental results is essential.</li>
<li>Finally, I occasionally use a <code class="language-plaintext highlighter-rouge">DOC</code> tag to denote <em>documentation</em> or other reference material. The tag both highlights information that I might want to keep at the top of a project for reference and signals that the material may change, since documentation should be updated whenever procedures or other details do.</li>
</ul>

Gregory J. Stein

The Valley of AI Trust (2020-03-13) https://gjstein.github.io/2020/3/valley-of-ai-trust

<p>As a researcher at the intersection of Robotics and Machine Learning, the most surprising shift I have seen over my five years in the field is how quickly people have warmed to the idea of having AI impact their lives. <a href="https://store.google.com/us/category/google_nest">Learning thermostats</a> are becoming increasingly popular (probably good), digital voice assistants pervade our technology (probably less good), and self-driving vehicles populate our roads (about which I have mixed feelings). Along with this rapid adoption, fueled largely by the hype associated with artificial intelligence and recent progress in machine learning, we as a society are opening ourselves up to the risks of using this technology in circumstances it is not yet prepared to handle. Particularly for safety-critical applications or the automation of tasks that can directly impact quality of life, we must be careful to avoid what I call the <em>valley of AI trust</em>—the dip in overall safety caused by premature adoption of automation.</p>
<p>In light of the potential risks, the widespread adoption of AI is perhaps surprising at first glance. Even as late as two years ago, researchers and technologists were predicting that we would need massive performance improvements in AI and the capabilities of autonomous systems before humans would become comfortable with them:</p>
<div class="quote-wrapper">
<note class="marginnote quote-caption invisible-sm">
<div class="heading-font">Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua</div>
<div><a href="https://arxiv.org/pdf/1708.06374.pdf">On a Formal Model of Safe and Scalable Self-driving Cars</a></div>
<div>Mobileye, 2017</div>
</note>
<div class="quote">
<p>[…] the probability of a fatality caused by an accident per one hour of (human) driving is known to be $10^{−6}$. It is reasonable to assume that for society to accept machines to replace humans in the task of driving, the fatality rate should be reduced by three orders of magnitude, namely a probability of $10^{−9}$ per hour <strong>[1]</strong>.</p>
<p><strong>[1]</strong> This estimate is inspired from the fatality rate of air bags and from aviation standards. In particular, $10^{−9}$ is the probability that a wing will spontaneously detach from the aircraft in mid air.</p>
<note class="quote-caption visible-sm">
<div class="heading-font">Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua</div>
<div><a href="https://arxiv.org/pdf/1708.06374.pdf">On a Formal Model of Safe and Scalable Self-driving Cars</a></div>
<div>Mobileye, 2017</div>
</note></div>
</div>
<p>This three-orders-of-magnitude improvement threshold for acceptance has not been borne out by reality, as the relatively widespread use of Tesla’s autopilot feature makes obvious.</p>
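<p>To make the scale of the quoted threshold tangible, here is a quick back-of-the-envelope sketch. Only the two rates come from the quote above; the fleet size and usage are hypothetical numbers chosen purely for illustration:</p>

```python
# Back-of-the-envelope comparison of the two fatality rates quoted above.
human_rate = 1e-6    # fatalities per hour of human driving (from the quote)
target_rate = 1e-9   # proposed acceptance threshold for machine driving

# Suppose (hypothetically) a million vehicles each drive one hour per day
# for a year:
fleet_hours = 1_000_000 * 365

expected_human = human_rate * fleet_hours      # ~365 fatalities per year
expected_machine = target_rate * fleet_hours   # under 1 fatality per year
```

<p>Under the proposed threshold, a fleet that would otherwise see hundreds of fatalities per year would be expected to see fewer than one.</p>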
<p>So how did we get here? Why are we seeing increasingly widespread acceptance of AI technologies despite the potential for risk?</p>
<p>Corporate financial incentives clearly play a role, since there is money to be made by being the first company to provide a certain AI service or to automate consumer-facing elements; the rise of <a href="https://www.forbes.com/sites/christopherelliott/2018/08/27/chatbots-are-killing-customer-service-heres-why/#33e9bee513c5">digital chatbots for customer service</a> is a notable example. But I might argue that deeper effects are at play. Consumers are often <em>choosing</em> to use these technologies, and Amazon and Facebook are pushing quite hard to win over our affection (though why anyone would pay money to add a <a href="https://www.theverge.com/2018/11/8/18072998/facebook-portal-plus-smart-display-messenger-review-price-specs">Facebook-brand camera</a> in their living room is beyond me). The issue is that most consumers lack an understanding of how much they can <em>trust</em> these automated systems to make effective decisions and under what conditions they might fail.</p>
<p>The hype surrounding “AI” these days is certainly tipping the scales in a potentially dangerous direction. Many big tech companies, including Google, Microsoft, and Amazon Web Services, have positioned themselves as <em>AI companies</em>, and their marketing preaches that the future they envision this technology will bring is very nearly upon us. In many ways, this narrative is supported by the incredible rate of progress in the research community over the last eight years—AI has achieved superhuman performance at tasks like language translation, playing the game of Go, and object detection, and has enabled mind-bending applications ranging from generative art to modeling protein folding. Referencing its potential for transformative impact, Andrew Ng has even been so ambitious as to call AI “the new electricity”. The hype surrounding the incredible capabilities of AI is so strong that a nontrivial portion of the conversation surrounding machine learning in the popular media is devoted to the prospect of an <em>AI Superintelligence</em>. As a researcher, I occasionally get asked <em>How much should we be worried about AI taking over?</em> One thing I always make clear to people: <em>AI is a lot less intelligent than you think.</em></p>
<p>For the foreseeable future, society is substantially more likely to <em>overestimate</em> AI’s capabilities than to <em>underestimate</em> them. Progress on the specific applications that have captured public imagination does not mean that these individual systems can be composed into an AI capable of rivaling human performance in general. IBM Watson is a poignant example of a <a href="https://spectrum.ieee.org/biomedical/diagnostics/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care">system that illustrates this failing</a>. Welcomed into hospitals to discover connections between patient symptoms that medical staff may have missed, Watson failed to deliver on its promises: the system made a number of dangerous predictions and recommendations that, if trusted implicitly, could have resulted in serious harm to patients.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-18em;" aria-label="margin note">The popular media also has a problematic tendency to overhype AI technologies as well, a point covered in a fantastic article in The Guardian entitled “<a href="https://www.theguardian.com/technology/2018/jul/25/ai-artificial-intelligence-social-media-bots-wrong">The discourse is unhinged</a>”.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The popular media also has a problematic tendency to overhype AI technologies as well, a point covered in a fantastic article in The Guardian entitled “<a href="https://www.theguardian.com/technology/2018/jul/25/ai-artificial-intelligence-social-media-bots-wrong">The discourse is unhinged</a>”.
</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-10em;" aria-label="margin note">In doing some deeper research on Watson I should point out that IBM has <a href="https://www.ibm.com/blogs/watson-health/ai-healthcare-challenges/">put out a statement</a> that asserts they are “100% focused on patient safety.” For all the system’s shortcomings, I do not doubt that the researchers and engineers value human life.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">In doing some deeper research on Watson I should point out that IBM has <a href="https://www.ibm.com/blogs/watson-health/ai-healthcare-challenges/">put out a statement</a> that asserts they are “100% focused on patient safety.” For all the system’s shortcomings, I do not doubt that the researchers and engineers value human life.
</p>
<p>Despite the hype, there remains a lack of clear delineation between tasks that AI is good at and those on which it is catastrophically bad. For example, consider the task of <em>image generation</em>, which has received nontrivial popular attention in the last year or so. Between being asked to generate a “random human face” and being asked to create an image of “A sheep by another sheep standing on the grass with the sky above and a boat in the ocean by a tree behind the sheep.”, on which might you expect a system to do a better job? Perhaps generating faces is harder, because humans are incredibly discerning when identifying “normal looking” faces. Perhaps the specificity of the <em>sheep</em> example makes <em>it</em> the more challenging task. The results speak for themselves:</p>
<p>
<note class="marginnote img-caption invisible-sm" aria-label="image caption">
Images from <a href="https://arxiv.org/pdf/1912.04958.pdf">StyleGAN2</a> and <a href="http://openaccess.thecvf.com/content_cvpr_2018/papers/Johnson_Image_Generation_From_CVPR_2018_paper.pdf">Image Generation from Scene Graphs</a> illustrating some state-of-the-art results in the space of image generation using generative adversarial networks.
</note>
<img src="/assets/posts/gan_quality_face_vs_sheep.jpg" class="img-responsive center-block " title="Image Generation Quality: Faces vs. Sheep" />
<note class="img-caption visible-sm" aria-label="image caption">Images from <a href="https://arxiv.org/pdf/1912.04958.pdf">StyleGAN2</a> and <a href="http://openaccess.thecvf.com/content_cvpr_2018/papers/Johnson_Image_Generation_From_CVPR_2018_paper.pdf">Image Generation from Scene Graphs</a> illustrating some state-of-the-art results in the space of image generation using generative adversarial networks.
</note>
</p>
<p>So why is the <em>sheep example perceptually much worse</em>? The volume of query-specific data required to train the system plays a non-trivial role: the <em>face generation</em> system has access to many thousands of high-resolution faces, yet the <em>scene generation</em> system is provided with a much more diverse set of images and accompanying natural language descriptions; I would be surprised if <em>any</em> of those images contained a field with precisely two sheep in it. In light of this change of perspective, these results are actually quite impressive, since the system generated an image that was potentially <em>completely unlike</em> any image it had ever seen before. However, understanding how and why these systems might perform poorly on a particular task often requires a deep, nuanced understanding of the AI system and the data used to train it. For the average consumer, this level of understanding should not be a requirement.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">If you want to better understand the <em>sheep example</em> I highly recommend you read the <a href="https://arxiv.org/pdf/1804.01622.pdf">Image Generation from Scene Graphs</a> paper from which this result is taken. I have done a poor job here of emphasizing how impressive it is that an AI can be made to generate custom images from complex inputs.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">If you want to better understand the <em>sheep example</em> I highly recommend you read the <a href="https://arxiv.org/pdf/1804.01622.pdf">Image Generation from Scene Graphs</a> paper from which this result is taken. I have done a poor job here of emphasizing how impressive it is that an AI can be made to generate custom images from complex inputs.
</p>
<p>To be clear, there are spaces in which AI is already purportedly improving safety: Tesla’s self-published safety report even <a href="https://www.tesla.com/VehicleSafetyReport">claims significant safety improvements</a> when using its “autopilot”, though the reality is <a href="https://www.nytimes.com/2020/02/25/business/tesla-autopilot-ntsb.html">perhaps more nuanced</a>. Regardless, Elon Musk’s claim that <em>full autonomy</em> is coming to Tesla’s vehicles <a href="https://www.theverge.com/2019/4/22/18510828/tesla-elon-musk-autonomy-day-investor-comments-self-driving-cars-predictions">as early as late 2020</a> is likely misleading: it lends false credibility to the idea that quick progress on highway driving or parking-lot navigation implies that the same technology will just as quickly enable full <em>Level 5</em> autonomy. For example, driving on the highway is relatively easy, but taking an off-ramp is a much more challenging problem, because it requires more reliable predictions of the behavior of the other cars on the road and deeper knowledge of social convention.</p>
<p>So where does this supposed boundary exist? On what tasks should we trust AI? It is difficult to give general instruction about where AI may fail because rarely is the answer to these questions obvious. This deficiency is in part because even the research community often does not know where the lines are. There exist many open questions about the potential capacity for AI to revolutionize our everyday experience, and bold claims of its transformative power are difficult to refute when clear answers do not exist. Additionally, AI often fails in ways that humans do not expect, because its internal representation of the problem or of the data it uses for training is quite different from ours.</p>
<p>In light of these difficulties, our attitude towards new AI systems should be straightforward: <strong>be skeptical</strong>. We are on the cusp of a future in which AI will augment human capacity. Yet, for now, we need to ensure that these technologies are advertised in a way that makes clear what they can and cannot do and what it looks like when they fail. <strong>Trust should be earned</strong>, and must be re-earned when the scope of automation increases.</p>
<p>As always, I welcome discussion in the comments below or on <a href="https://news.ycombinator.com/item?id=22569983">Hacker News</a>.</p>
<h2 id="references">References</h2>
<ul>
<li id="shalev2017formal">Shai Shalev-Shwartz, Shaked Shammah & Amnon Shashua, On a Formal Model of Safe and Scalable Self-driving Cars, <i>arXiv preprint arXiv:1708.06374</i>, 2017.</li>
<li id="johnson2018scenegraph">Justin Johnson, Agrim Gupta & Li Fei-Fei, Image Generation from Scene Graphs, <i>in: Conference on Computer Vision and Pattern Recognition (CVPR)</i>, 2018.</li>
<li id="karras2019stylegan2">Tero Karras et al., Analyzing and Improving the Image Quality of StyleGAN, <i>arXiv preprint arXiv:1912.04958</i>, 2019.</li>
</ul>

Gregory J. Stein

Talk figures are different from paper figures (2020-01-06) https://gjstein.github.io/2020/1/talk-figures-are-different-paper-figures

<p>As an academic, I see a lot of talks. In general, good presentations tend to be based on a good slide deck; even very capable speakers have a tough time reaching their audience when their slides are a mess. One common pitfall I often see is that many researchers will take figures or diagrams <em>directly</em> from their papers, upon which the talk is usually based, and paste them into their slides. It’s often clear to the audience when this happens, since figures in papers tend to be rich with information that can be distracting in a talk. My advice:</p>
<blockquote>
Avoid using unedited paper figures in talks.
</blockquote>
<p>For an example of what you might do when moving a figure from a paper to a talk, let’s take a look at an example from my own work. First, here’s a figure from my paper:</p>
<p><img src="/assets/posts/figures_gjs_paper_figure.png" class="img-responsive center-block " title="A figure of mine from a recent paper I published" /></p>
<p>It’s a pretty clean figure (and one I’m pretty happy with), but there’s simply too much information being shown all at once here. In my talk, I split the figure into two slides:</p>
<p><img src="/assets/posts/figures_gjs_slide_figure_1.png" class="img-responsive center-block " title="One of two slides I included in a recent talk." /></p>
<p><img src="/assets/posts/figures_gjs_slide_figure_2.png" class="img-responsive center-block " title="The second of two slides I included in a recent talk." /></p>
<p>Notice that in both of the slide figures, a few annotations have been added to guide the eye towards what I want the audience to see and a couple other annotations have been removed so as to reduce visual noise. It’s a relatively simple change, but one that makes a ton of difference when compounded across a lengthy technical talk.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Notice also that the slide titles are declarative statements rather than the generic and uninformative title <code class="language-plaintext highlighter-rouge">Results</code>; this is something else that tends to make for more effective presentations.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Notice also that the slide titles are declarative statements rather than the generic and uninformative title <code class="language-plaintext highlighter-rouge">Results</code>; this is something else that tends to make for more effective presentations.
</p>
<p>Depending on the paper and subject material, the differences between the talk figure and paper figure can be significant. One community in which I see this a lot is the physics community, when individual plots can represent months of data collection and processing, and so can be incredibly rich with information. Here’s a figure from an old physics paper of mine that is not at all well-suited to appear in a talk:</p>
<p><img src="/assets/posts/figures_gjs_physics_figure.png" class="img-responsive center-block " title="A figure of mine from a physics paper I published in 2014" /></p>
<h2 id="why-should-the-figures-be-different">Why should the figures be different?</h2>
<p>In general, the <em>goals</em> of a figure that appear in a paper are different from those that appear in a talk.</p>
<ul>
<li><strong>Figures that exist in a paper are often optimized for space</strong>. It’s simply a reality of most research these days that the combination of space constraints for publication and ever-higher expectations for acceptance mean that one needs to put a relatively high-density of information in each figure, since a figure needs to be at least as valuable as the text it replaces. Figures should still be as clear as possible, but the amount of <em>stuff</em> one can include for a paper figure is typically relatively high: the reader can spend some time looking over the figure and understanding what it says.</li>
<li><strong>Talk slides should be optimized so that they communicate only one thing</strong>. This is something I often tell clients or students of mine when they show me a presentation draft. If there are too many things to look at, the audience won’t know where to focus: either the figure needs to be trimmed down or additional annotations need to be added. Including a full figure caption is a serious communication faux pas; instead, small (short) text bubbles on top of the figure that describe key features can go a long way towards improving clarity.</li>
</ul>
<h2 id="how-should-i-modify-my-figures-for-talks">How should I modify my figures for talks?</h2>
<p>One of my top suggestions when making slides is to ask yourself “What is the point of adding this slide?” If your answer to that question is more than one sentence, you’re probably trying to communicate too much all at once. The same is true for figures: communicate only one thing at a time. Sometimes achieving this goal is a matter of splitting the figure into multiple parts. Other times, splitting the figure may be impossible, but simple annotations can help: e.g., a circle or arrow indicating the important part of the figure, or a text bubble giving important context. <strong>Don’t be afraid to mark up your figures for a talk!</strong> The audience will thank you for the improved clarity, and those listeners who are sufficiently interested will come find you afterwards or look up your paper.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Also helpful is automating your figure-generation process. If you can remake your figure with minor changes at the push of a button, modifying them will be easy.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Also helpful is automating your figure-generation process. If you can remake your figure with minor changes at the push of a button, modifying them will be easy.
</p>
<p>As always, I welcome your thoughts (and personal anecdotes) in the comments below or on <a href="https://news.ycombinator.com/item?id=21972242">Hacker News</a>.</p>Gregory J. SteinAs an academic, I see a lot of talks. In general, good presentations tend to be based on a good slide deck; even very capable speakers have a tough time reaching their audience when their slides are a mess. One common pitfall I often see is that many researchers will take figures or diagrams directly from their papers, upon which the talk is usually based, and paste them into their slides. It’s often clear to the audience when this happens, since figures in papers tend to be rich with information that can be distracting in a talk. My advice:Machine Learning & Robotics: My (biased) 2019 State of the Field2019-12-30T23:31:00+00:002019-12-30T23:31:00+00:00https://https//gjstein.github.io/2019/12/my-state-of-the-field<p>At the end of every year, I like to take a look back at the different trends or papers that inspired me the most. As a researcher in the field, I find it can be quite productive to take a deeper look at where I think the research community has made surprising progress or to identify areas where, perhaps unexpectedly, we did not advance.</p>
<p>Here, I hope to give my perspective on the state of the field. This post will no doubt be a biased sample of what I think is progress in the field. Not only is covering <em>everything</em> effectively impossible, but my views on what may constitute <em>progress</em> may differ from yours. Hopefully all of you reading will glean something from this post, or see a paper you hadn’t heard about. Better yet, feel free to disagree: I’d love to discuss my thoughts further and hear alternate perspectives in the comments below or on <a href="https://news.ycombinator.com/item?id=21915991">Hacker News</a>.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">As <a href="https://www2019.thewebconf.org/media/Deep_Learning_for_Solving_Important_Problems.pdf">Jeff Dean</a> points out, there are roughly 100 machine learning papers posted to the Machine Learning ArXiv <em>per day</em>!</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">As <a href="https://www2019.thewebconf.org/media/Deep_Learning_for_Solving_Important_Problems.pdf">Jeff Dean</a> points out, there are roughly 100 machine learning papers posted to the Machine Learning ArXiv <em>per day</em>!
</p>
<h2 id="from-alphazero-to-muzero">From AlphaZero to MuZero</h2>
<p>AlphaZero was <a href="http://www.cachestocaches.com/2017/12/favorite-deep-learning-2017/#most-impressive-googles-go-playing-ai-learns-from-">one of my favorite papers from 2017</a>. DeepMind’s world-class Chess- and Go-playing AI got a serious upgrade this year in the form of <a href="https://deepmind.com/research/publications/Mastering-Atari-Go-Chess-and-Shogi-by-Planning-with-a-Learned-Model">MuZero</a>, which added Atari games to its roster of superhuman-performing tasks. Atari had previously been out of reach for AlphaZero because the observation space is incredibly large, making it difficult for AlphaZero to build a tree of actions and the outcomes that result. In Go, predicting the outcome of an action is easy, since the board follows a set of rules about what it will look like after taking an action. For Atari, however, predicting the outcome of an action in principle requires predicting what the next frame will look like. This very high-dimensional state space and hard-to-define observation model create challenges when the system tries to estimate the impact of its actions a mere few frames into the future.</p>
<p>MuZero circumvents this problem by learning a latent (low-dimensional) representation of the state space, including the current frame, and then <em>planning in that learned space</em>. With this shift, taking an action moves the agent around in this compact latent space, allowing it to imagine the impact of many different actions and evaluate the tradeoffs that may occur, a hallmark feature of the Monte-Carlo Tree Search (MCTS) algorithm upon which both AlphaZero and MuZero are based. This approach <em>feels</em> much more like what I might expect from a truly intelligent decision-making system: the ability to weigh different options without the need to predict precisely what the world will look like upon selecting each. Of course, the complication here is how they simultaneously learn this latent space while also learning to plan in it, but I’ll refer you to <a href="https://arxiv.org/abs/1911.08265">their paper</a> for more details.</p>
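<p>To make the idea concrete, here is a heavily simplified sketch of planning in a learned latent space. The three functions below stand in for MuZero’s trained networks (representation, dynamics, and prediction); here they are hard-coded toys so the structure of the planner runs on its own, and the exhaustive rollout stands in for full MCTS. None of this is DeepMind’s actual code:</p>

```python
# Toy sketch of MuZero-style planning in a learned latent space.

def represent(observation):
    # Representation network h: map a raw observation (e.g., pixels)
    # to a compact latent state (here, a toy hash of the input).
    return sum(observation) % 10

def dynamics(latent, action):
    # Dynamics network g: advance the latent state and predict the
    # immediate reward, without ever reconstructing the next frame.
    next_latent = (latent + action) % 10
    reward = 1.0 if next_latent == 0 else 0.0
    return next_latent, reward

def value(latent):
    # Prediction network f: estimate the future "goodness" of a latent
    # state (toy heuristic: lower latent values are better).
    return -latent / 10.0

def plan(observation, actions, depth=3, discount=0.99):
    """Exhaustively roll out action sequences in latent space and return
    the first action of the best sequence (a stand-in for MCTS)."""

    def rollout(latent, d):
        if d == 0:
            return value(latent)
        results = []
        for action in actions:
            nxt, reward = dynamics(latent, action)
            results.append(reward + discount * rollout(nxt, d - 1))
        return max(results)

    root = represent(observation)
    scores = {}
    for action in actions:
        nxt, reward = dynamics(root, action)
        scores[action] = reward + discount * rollout(nxt, depth - 1)
    return max(scores, key=scores.get)

# The planner weighs a small immediate reward against longer-term value,
# all without predicting raw frames.
best_action = plan(observation=[3, 4], actions=[1, 2, 3])
```

<p>In the real system, the three functions are jointly trained neural networks and the exhaustive rollout is replaced by MCTS guided by a learned policy prior; this sketch only mirrors the structure of acting and evaluating entirely inside the latent space.</p>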
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Another honorable mention in this space is Facebook AI’s <a href="https://ai.facebook.com/blog/building-ai-that-can-master-complex-cooperative-games-with-hidden-information/">Hanabi playing AI</a>, in which the system needed to play a cooperative, partially observable card game.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Another honorable mention in this space is Facebook AI’s <a href="https://ai.facebook.com/blog/building-ai-that-can-master-complex-cooperative-games-with-hidden-information/">Hanabi playing AI</a>, in which the system needed to play a cooperative, partially observable card game.
</p>
<p>What really struck me about this work is how it composes individual ideas into a larger working system. This paper is as much of a systems paper as any machine learning work I’ve seen, but beyond the dirty tricks that perennially characterize neural network training, the ideas presented in MuZero help answer deep questions about how one might build AI for increasingly complex problems. We are, as a community, making progress towards composing individual ideas to create more powerful decision-making systems. Both AlphaZero and MuZero make progress in this direction, recognizing that the structure of the MCTS tree-building (to simulate the impact of selecting different actions) coupled with the ability to predict the future goodness of each action would result in more powerful learning systems. MuZero’s addition of a learned compact representation (effectively a <em>model</em> of the system’s dynamics), in which actions and subsequent observations can be simulated for the purposes of planning, gives me hope that such systems may one day be able to tackle real-world robotics problems. As we strive to make increasingly intelligent AI, this work moves us in the direction of better understanding what ideas and tools are likely to bring about that reality.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">See <a href="http://www.cachestocaches.com/2018/12/toward-real-world-alphazero/">this post of mine</a> for a discussion of what’s still missing for AlphaZero and MuZero to approach real-world problems</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">See <a href="http://www.cachestocaches.com/2018/12/toward-real-world-alphazero/">this post of mine</a> for a discussion of what’s still missing for AlphaZero and MuZero to approach real-world problems
</p>
<p>Great work (as always) from the folks at DeepMind!</p>
<h2 id="representation-learning-long-live-symbolic-ai">Representation Learning (Long Live Symbolic AI)</h2>
<p>Perhaps the area of progress I am most excited to see is in the space of <em>Representation Learning</em>. I’m a big fan of old-school classical planning and so-called <em>symbolic AI</em>, in which agents interface with the world by thinking about <em>symbols</em>, like objects or people. Humans do this all the time, but in translating our capacity to robotic or artificially intelligent agents, we often have to specify <em>what</em> objects or other predicates we want the agent to reason about. But one question has largely eluded precise answers: <em>Where do symbols come from?</em> And, more generally: how should we go about representing the world so that a robot can make quick and effective decisions when solving complex, real-world problems?</p>
<p>Some recent work has started to make real progress towards being able to learn such representations from data, enabling learned systems to infer objects on their own or to build a “relation graph” of objects and places that they can use to interact with a never-before-seen location. This research is still young, but I am eager to see it progress, since I am largely convinced that progress towards more capable robotics will require deeper understanding and significant advances in this space. A couple of papers I’ve found particularly interesting include:</p>
<ul>
<li><strong><a href="https://arxiv.org/pdf/1910.12827.pdf">Entity Abstraction in Visual Model-Based Reinforcement Learning</a></strong> This paper is one of a handful of works recently trying to structure the learning problem in a way such that the system <em>learns</em> what objects are and can then forward simulate the behavior of those objects using a learned model of their dynamics. From the paper: “OP3 enforces the <em>entity abstraction</em>, factorizing the latent state into local entity states, each of which are symmetrically processed with the same function that takes in a generic entity as an argument.” This line of work is in its infancy, but I look forward to seeing how the community will continue to investigate using novel structures for learning that encourage the system to tease out entities of interest that can then be used in subsequent planning pipelines.</li>
</ul>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Here is an example from the Entity Abstraction paper, showing how this process can be used to make predictions about the future: #[image===entity-abstraction-example]#</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Here is an example from the Entity Abstraction paper, showing how this process can be used to make predictions about the future: #[image===entity-abstraction-example]#
</p>
<ul>
<li><strong><a href="https://arxiv.org/pdf/1909.04306.pdf">Bayesian Relational Memory for Semantic Visual Navigation</a></strong> This paper involves building a <a href="https://web.eecs.umich.edu/~kuipers/papers/Kuipers-NZ-08.pdf">topological map</a> online as an agent navigates in search of a semantic goal—e.g., find the kitchen. As it navigates, it periodically identifies new rooms and adds them to its growing <em>relation graph</em> when it becomes sufficiently confident. Everything here is done from vision, meaning that the system has to deal with considerable uncertainty and high-dimensional inputs. This paper has some similar ideas to an influential paper from ICLR 2018: <a href="https://arxiv.org/abs/1803.00653">Semi-parametric Topological Memory for Navigation</a>, which required a prior demonstration of the environment to build its map.</li>
</ul>
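<p>The second paper’s graph-building loop has a simple flavor that is worth sketching. The code below is not the paper’s Bayesian machinery, just the bookkeeping it implies: rooms become nodes of the relation graph only once the agent’s confidence clears a threshold, and edges record which rooms connect (the threshold value and class names here are my own invention):</p>

```python
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff for accepting a detection

class RelationGraph:
    """A minimal online topological map: nodes are rooms, edges are adjacency."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def observe(self, room, confidence, connected_to=None):
        """Add `room` once confidence clears the threshold; link it to a neighbor."""
        if confidence < CONFIDENCE_THRESHOLD:
            return False  # too uncertain: do not commit this room to the map
        self.nodes.add(room)
        if connected_to in self.nodes and connected_to != room:
            self.edges.add(frozenset((room, connected_to)))
        return True

graph = RelationGraph()
graph.observe("hallway", 0.95)
graph.observe("kitchen", 0.4, connected_to="hallway")   # too uncertain: ignored
graph.observe("kitchen", 0.9, connected_to="hallway")   # confident now: added
print(sorted(graph.nodes))  # ['hallway', 'kitchen']
```

<p>The hard part, of course, is producing those confidence values from raw vision, which is precisely what the paper’s Bayesian relational memory is for.</p>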
<p>I look forward to seeing how the community will continue to blur the lines between model-based and model-free techniques in the next couple of years. More generally, I’d like to see more progress at the intersection of symbolic AI and more “modern” deep learning approaches to tackle problems of interest to the robotics community—e.g., vision-based map-building, planning under uncertainty, and life-long learning.</p>
<h2 id="supervised-computer-vision-research-cools-somewhat">Supervised Computer Vision Research Cools (somewhat)</h2>
<p>That’s not to say that work in this space isn’t important, but since <a href="https://research.fb.com/publications/mask-r-cnn/">Mask R-CNN</a> from Facebook Research made waves in early 2018, I haven’t been particularly inspired by research in this space. Tasks like <a href="http://www.cachestocaches.com/2018/9/ai-translation-more-language/">semantic segmentation</a> and object detection have matured considerably. The <a href="http://www.image-net.org/challenges/LSVRC/">ImageNet challenge</a> for object detection has largely faded into the background, as only companies (which often have superior datasets or financial resources) still bother to try to top the leaderboard in related competitions.</p>
<p>But this isn’t a bad thing! In fact, it’s a particularly good time to be a robotics researcher: the community has eked nearly as much performance as it can out of the datasets available to us and has started to focus more on widespread adoption of these tools and the “convenience features” associated with that process. There are now a variety of new techniques for training these systems more quickly and, more importantly, for making them faster and more efficient without compromising accuracy. As someone interested in real-world and often real-time use of these technologies—particularly on resource-constrained systems like smartphones and small autonomous robots—I am particularly drawn to this line of research, which will enable more widespread adoption of these tools and the capabilities they unlock.</p>
<p>Of note has been some cool work on <em>network pruning</em>, in which optimization techniques are used after training to remove portions of a neural network that have little bearing on overall performance (and therefore do little more than increase the amount of computation). Though it has not yet seen widespread practical impact, <a href="https://arxiv.org/abs/1803.03635">The Lottery Ticket Hypothesis</a> paper has some fascinating ideas about how to initialize and train small neural networks that circumvent the need for pruning. <a href="https://github.com/he-y/Awesome-Pruning#2019">This “Awesome” GitHub page</a> has as complete a list as I’ve seen of different approaches to network pruning. A related technology of interest is <em>network compilation</em>, in which hardware-specific functions are used to further accelerate evaluation; the <a href="http://fastdepth.mit.edu">FastDepth</a> paper is a good example of using a combination of these techniques on the task of monocular depth estimation.</p>
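<p>The core of magnitude-based pruning fits in a few lines. This sketch (pure NumPy, not any particular library’s API) zeroes out the fraction of weights with the smallest absolute values; real pipelines typically prune iteratively and fine-tune between rounds:</p>

```python
import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero out the `fraction` of weights with smallest absolute value."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned = prune_by_magnitude(W, 0.5)
sparsity = float((W_pruned == 0).mean())
print(sparsity)  # half of the weights are now zero
```

<p>Pruning alone only buys speed if the hardware or runtime exploits the resulting sparsity, which is part of why compilation-style approaches like FastDepth pair it with hardware-specific optimization.</p>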
<h2 id="maturing-technologies">Maturing Technologies</h2>
<p>Though new techniques and domains can be exciting, I am at least as interested in seeing which technologies start to slow down. Many areas of research become most interesting once most of the low-hanging fruit has been picked, leading to investigations into deeper questions as the <em>real challenges</em> stymieing the field become clear. As someone who does research at the intersection of robotics and machine learning, I find that it is often at this point that the technologies become robust enough that one might trust them to inform decision-making on actual hardware.</p>
<h3 id="graph-neural-networks">Graph Neural Networks</h3>
<p>I am a <strong>huge</strong> fan of Graph Neural Networks. Ever since the paper <a href="https://arxiv.org/abs/1806.01261">Relational inductive biases, deep learning, and graph networks</a> came out, I have been thinking deeply about how to integrate GNNs as a learning backend for my own work. The general idea is quite elegant: build a graph in which the nodes correspond to individual entities—objects, regions of space, semantic locations—and connect them to one another according to their ability to impact each other. In short, impose structure on the problem wherever you can most easily define it, and then let deep neural networks learn the relationships between entities according to that structure (similar in concept to some of the work I discussed above in Representation Learning).</p>
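<p>The basic operation is easy to write down. Below is a single round of message passing on a three-node graph, with random matrices standing in for learned parameters; most GNN variants are elaborations on this aggregate-then-update pattern:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 4

# Adjacency matrix for a 3-node chain: 0 -- 1 -- 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, FEAT))  # per-node features (the "entities")

# Random stand-ins for learned weight matrices.
W_self = rng.normal(size=(FEAT, FEAT))
W_nbr = rng.normal(size=(FEAT, FEAT))

def message_pass(A, X):
    """One GNN layer: sum neighbors' features, combine with own, nonlinearity."""
    messages = A @ X  # row i becomes the sum of node i's neighbors' features
    return np.tanh(X @ W_self + messages @ W_nbr)

X1 = message_pass(A, X)
print(X1.shape)  # (3, 4) -- same nodes, updated features
```

<p>Stacking such layers lets information propagate along edges, so after <em>k</em> rounds each node’s features reflect its <em>k</em>-hop neighborhood, which is exactly the relational inductive bias the paper argues for.</p>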
<p>Graphical models have been used in AI for decades, but the problem of how to process high-dimensional observations created a bottleneck that, for a time, only hand-crafted features seemed able to overcome. With GNNs, high-dimensional inputs became a feature rather than a bug, and last year we saw an explosion of tools for using GNNs to accomplish interesting tasks that have proven challenging for other learning representations, like <a href="https://arxiv.org/pdf/1704.01212">quantum chemistry</a>.</p>
<p>This year, as <a href="https://github.com/deepmind/graph_nets">tools for building and using Graph Nets</a> matured, researchers started to apply GNNs to their own problems, leading to some interesting work at the intersection of machine learning and robotics (a place I tend to call home). I am particularly interested in robots capable of making good navigation decisions (especially when they only have incomplete knowledge about their surroundings) and a few papers—particularly <a href="https://www.researchgate.net/profile/Fanfei_Chen/publication/335610894_Autonomous_Exploration_Under_Uncertainty_via_Graph_Convolutional_Networks/links/5d6ff2a0299bf1cb80883dfa/Autonomous-Exploration-Under-Uncertainty-via-Graph-Convolutional-Networks.pdf">Autonomous Exploration Under Uncertainty via Graph Convolutional Networks</a> and <em>Where are the Keys?</em> by Niko Sünderhauf—have proven quite thought-provoking.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">If you are interested in playing around with Graph Neural Networks, I recommend taking the plunge via a <a href="https://colab.research.google.com/">Colaboratory notebook</a> provided by DeepMind that walks through a bunch of demos.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">If you are interested in playing around with Graph Neural Networks, I recommend taking the plunge via a <a href="https://colab.research.google.com/">Colaboratory notebook</a> provided by DeepMind that walks through a bunch of demos.
</p>
<h3 id="explainable--interpretable-ai">Explainable & Interpretable AI</h3>
<p>As excited as I am by the promise of deep learning and approaches to representation learning, the systems that result from these techniques are often opaque, a problematic property as such systems are increasingly human-facing. Fortunately, there has been increased attention and progress in the space of Explainable and Interpretable AI and, in general, working towards AI that humans might be comfortable trusting and co-existing with.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Relatedly: though it came out in 2018, I read the book <em><a href="https://virginia-eubanks.com/books/">Automating Inequality</a></em> this past year and think that it should be required reading for all AI researchers so that we might think more critically about how the decisions we make (often unilaterally) have downstream consequences when these systems are deployed in real-world settings.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Relatedly: though it came out in 2018, I read the book <em><a href="https://virginia-eubanks.com/books/">Automating Inequality</a></em> this past year and think that it should be required reading for all AI researchers so that we might think more critically about how the decisions we make (often unilaterally) have downstream consequences when these systems are deployed in real-world settings.
</p>
<p>One of the more interesting papers in the vein of <em>interpretable AI</em> that caught my eye recently is <a href="https://arxiv.org/pdf/1806.10574.pdf">This Looks Like That: Deep Learning for Interpretable Image Recognition</a> by Chaofan Chen and Oscar Li from Cynthia Rudin’s lab at Duke. In the paper, the authors set up an image classification pipeline that works by identifying regions of the input image that resemble prototypical regions from training images and using those matches to determine the class. The classification is therefore more interpretable than that of other competitive techniques, since it provides, by design, a direct comparison to similar images and features from the training set. Here is an image from the paper showing how the system classified an image of a <em>clay-colored sparrow</em>:</p>
<p>
<note class="marginnote img-caption invisible-sm" aria-label="image caption">
An example of how image classification is done in the paper <a href="https://arxiv.org/abs/1806.10574">This Looks Like That</a>.
</note>
<img src="/assets/posts/rudin-this-looks-like-that-example.png" class="img-responsive center-block " title="This Looks Like That: Classification Example" />
<note class="img-caption visible-sm" aria-label="image caption">An example of how image classification is done in the paper <a href="https://arxiv.org/abs/1806.10574">This Looks Like That</a>.
</note>
</p>
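<p>The scoring rule behind this style of prototype-based classification can be sketched in a few lines. This is a schematic only (the real model learns its prototypes end-to-end from convolutional features); here I use one-hot vectors as fake patch embeddings and prototypes so the arithmetic is easy to follow:</p>

```python
import numpy as np

EMB = 5
eye = np.eye(EMB)

def classify(patch_embeddings, prototypes_per_class):
    """Return (predicted_class, evidence), where evidence[c] sums, over each of
    class c's prototypes, the best similarity achieved by any image patch."""
    evidence = []
    for prototypes in prototypes_per_class:
        sims = patch_embeddings @ prototypes.T  # patch-vs-prototype similarity
        evidence.append(float(sims.max(axis=0).sum()))
    return int(np.argmax(evidence)), evidence

prototypes = [eye[:2], eye[2:4]]  # class 0 owns e0, e1; class 1 owns e2, e3
patches = eye[[2, 3, 4]]          # the "image" contains class-1 evidence
pred, evidence = classify(patches, prototypes)
print(pred)  # 1
```

<p>Because the decision is a sum of explicit patch-to-prototype matches, each prediction comes with a built-in “this looks like that” explanation: point at the patch and the prototype that produced each term.</p>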
<p>This year Cynthia Rudin also published her well-known work <a href="https://arxiv.org/pdf/1811.10154.pdf">Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead</a>. In it, she argues that we should stop “explaining” the decisions of black-box models post hoc and should instead build models that are interpretable by construction. While I don’t necessarily agree that this is cause to immediately stop using black-box models, she makes some well-reasoned points in her paper. In particular, her voice is of critical importance to a field dominated by the development of black-box models.</p>
<p>There has also been some good work this past year, like <a href="https://arxiv.org/pdf/1901.06560.pdf">Explaining Explanations to Society</a> (by some friends and colleagues of mine, Leilani H. Gilpin and Cecilia Testart et al.), that focuses on broader questions about which types of explanations are most useful to society and how we might overcome the limitations of existing deep learning systems to extract such outputs.</p>
<p>In short, one of my biggest takeaways from 2019 is that researchers in particular should be conscious about how we develop models and try to build systems that are interpretable-by-design whenever possible. I am interested in such applications of <a href="http://www.cachestocaches.com/2018/12/toward-real-world-alphazero/#navigation-in-unknown-environments">recent work of mine</a> and hope that an increasing portion of the community makes the design of such systems a priority.</p>
<h3 id="continued-growth-of-simulation-tools-and-progress-in-sim-to-real">Continued Growth of Simulation Tools and Progress in Sim-to-Real</h3>
<p>Simulation is an incredibly useful tool, since data is cheap and effectively infinite, if not particularly diverse. Last year (2018) saw an explosion of simulation environments, many providing <em>photorealistic images</em> of real-world settings and aimed at directly enabling real-world capabilities; these included <a href="https://interiornet.org">InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset</a> and the fantastic <a href="http://gibsonenv.stanford.edu">GibsonEnv</a>, which includes “572 full buildings composed of 1447 floors covering a total area of 211k [square meters]”. This year has seen continued growth in this space, including a <a href="http://svl.stanford.edu/gibson2/">new, interactive Gibson Environment</a> and <a href="https://ai.facebook.com/blog/open-sourcing-ai-habitat-an-simulation-platform-for-embodied-ai-research/">Facebook’s gorgeous AI Habitat</a> environment:</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">There’s also been continued interest in using video games as platforms for studying AI. Most recently <em>Facebook has open sourced CraftAssist</em>: “a platform for studying collaborative AI bots in Minecraft”.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">There’s also been continued interest in using video games as platforms for studying AI. Most recently <em>Facebook has open sourced CraftAssist</em>: “a platform for studying collaborative AI bots in Minecraft”.
</p>
<p>
<note class="marginnote img-caption invisible-sm" aria-label="image caption">
These images are taken from <a href="https://arxiv.org/pdf/1904.01201.pdf">Facebook’s technical report</a> on their AI Habitat photorealistic simulated environment that was open-sourced this year. The images really do look incredible.
</note>
<img src="/assets/posts/ai-habitat-example-images.png" class="img-responsive center-block " title="Example images from Facebook's AI Habitat" />
<note class="img-caption visible-sm" aria-label="image caption">These images are taken from <a href="https://arxiv.org/pdf/1904.01201.pdf">Facebook’s technical report</a> on their AI Habitat photorealistic simulated environment that was open-sourced this year. The images really do look incredible.
</note>
</p>
<p>There are an ever-increasing number of technologies for using simulation tools to enable good performance in the real world. Admittedly, I’ve never been totally sold on the promise of <em>domain randomization</em>, in which elements of a simulated scene (texture, lighting, color, etc.) are randomly changed so that the learning algorithm learns to ignore those often-irrelevant details. For many robotics applications, specific textures and lighting may actually matter for planning, and domain-specific techniques may be more appropriate; randomization, like <a href="https://towardsdatascience.com/when-conventional-wisdom-fails-revisiting-data-augmentation-for-self-driving-cars-4831998c5509">some data augmentation procedures</a>, may introduce problems of its own. That being said, recent efforts—including <a href="https://arxiv.org/pdf/1812.07252.pdf">Sim-to-Real via Sim-to-Sim</a>—and the widespread use of these techniques to improve performance throughout various subfields this past year are starting to convince me of its general utility. OpenAI also used domain randomization over both visual appearance <em>and physics</em> to <a href="https://openai.com/blog/solving-rubiks-cube/">learn to manipulate a Rubik’s Cube</a>, thus proving their robot hand more dexterous than I am. Beyond <em>randomization</em>, domain <em>adaptation</em> procedures, which actively transfer knowledge between domains, have also seen some progress in the last year. I am particularly interested in work like <a href="https://arxiv.org/pdf/1810.05687.pdf">Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience</a>, in which a handful of real-world rollouts allow an RL agent to adapt its experience from simulation.</p>
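<p>At its simplest, visual domain randomization amounts to aggressive, randomized augmentation of simulator frames. A crude sketch (random brightness, per-channel color shift, and noise; real pipelines also randomize textures, lighting, and camera parameters inside the simulator itself):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_appearance(image, rng):
    """Crude visual domain randomization for an image in [0, 1]:
    random brightness, per-channel color shift, and additive noise."""
    brightness = rng.uniform(0.6, 1.4)
    color_shift = rng.uniform(-0.1, 0.1, size=3)
    noise = rng.normal(0.0, 0.02, size=image.shape)
    out = image * brightness + color_shift + noise
    return np.clip(out, 0.0, 1.0)

sim_image = rng.uniform(size=(8, 8, 3))  # stand-in for a simulator frame
batch = [randomize_appearance(sim_image, rng) for _ in range(4)]
print(all(img.shape == (8, 8, 3) for img in batch))  # True
```

<p>Training on many such perturbed copies of each frame is what encourages the learner to ignore appearance details, for better or, as noted above, sometimes for worse.</p>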
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Also, the 2019 RSS conference had a <a href="https://sim2real.github.io">full day workshop</a> devoted to “Closing the Reality Gap in Sim2Real Transfer for Robotic Manipulation” that’s worth looking at if you’re interested.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Also, the 2019 RSS conference had a <a href="https://sim2real.github.io">full day workshop</a> devoted to “Closing the Reality Gap in Sim2Real Transfer for Robotic Manipulation” that’s worth looking at if you’re interested.
</p>
<h2 id="bittersweet-lessons">Bittersweet Lessons</h2>
<p>No discussion of AI in 2019 would be complete without a mention of the debate surrounding “<a href="http://incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a>”. In March, Rich Sutton (a famous and well-respected researcher in AI) published a post to his website in which he observes how, repeatedly over the course of AI’s history and his career, methods built on rigid, hand-designed structure have been overtaken by general-purpose methods that scale with computation, like deep learning. He cites “SIFT” features for object detection as one example: though they were the state of the art for 20 years, deep learning has blown all of those results out of the water. He goes on:</p>
<div class="quote-wrapper">
<note class="marginnote quote-caption invisible-sm">
<div class="heading-font">Rich Sutton</div>
<div><a href="http://incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a></div>
<div>March 13, 2019</div>
</note>
<div class="quote">
<p>This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.</p>
<p>One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.</p>
<note class="quote-caption visible-sm">
<div class="heading-font">Rich Sutton</div>
<div><a href="http://incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson</a></div>
<div>March 13, 2019</div>
</note></div>
</div>
<p>His perspective spawned much debate in the AI research community, and some incredibly engaging rebuttals from the likes of <a href="https://rodneybrooks.com/a-better-lesson/">Rodney Brooks</a> and <a href="https://staff.fnwi.uva.nl/m.welling/wp-content/uploads/Model-versus-Data-AI-1.pdf">Max Welling</a>. My take? There are always prior assumptions baked into our learning algorithms and we are only just scratching the surface of understanding how our data and learning representations translate into an ability to generalize. This is one of the reasons I am so excited by <em>representation learning</em> and by research at the intersection of deep learning and classical planning techniques. Only through being explicit about how we encode an agent’s ability to reuse its knowledge can we hope to achieve trustworthy generalization on complex, multi-sequence planning tasks. We should expect AI to exhibit <em>combinatorial generalization</em>, as humans do, in which we can achieve effective generalization without the need for exponentially growing datasets.</p>
<h2 id="conclusion">Conclusion</h2>
<p>With as much progress as there was in 2019, there are yet areas ripe for growth in the coming years. I’d like to see more applications to <em>Partially Observable Domains</em>, which require that an agent have a deep understanding of its environment so that it may make predictions about the future (this is something I’m actively working on). I’m also interested in seeing more progress in so-called <em>long-lived AI</em>: systems that continue to learn and grow as they spend more time interacting with their surroundings. For now, many systems that interact with the world have a tough time handling noise in an elegant way and, except for the simplest of applications, most learned models will break down as the number of sensor observations grows.</p>
<p>As I mentioned earlier, I welcome discussion: let me know if there’s anything you thought I missed or feel strongly about. Feel free to drop me a line or comment on this review in the comments below or on <a href="https://news.ycombinator.com/item?id=21915991">Hacker News</a>.</p>
<p>Wishing you all health and happiness in 2020!</p>
<h2 id="references">References</h2>
<ul>
<li id="Silver2018alphazero">David Silver et al., A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play, <i>Science</i>, 2018.</li>
<li id="schrittwieser2019muzero">Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap & David Silver, Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, <i>in: Advances in Neural Information Processing Systems (NeurIPS)</i>, 2019.</li>
<li id="frankle2018lottery">Jonathan Frankle & Michael Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, <i>in: International Conference on Learning Representations</i>, 2019.</li>
<li id="wofk2019fastdepth">Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman & Vivienne Sze, FastDepth: Fast Monocular Depth Estimation on Embedded Systems, <i>in: International Conference on Robotics and Automation (ICRA)</i>, 2019.</li>
<li id="wu2019bayesian">Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari & Yuandong Tian, Bayesian Relational Memory for Semantic Visual Navigation, <i>in: International Conference on Computer Vision (ICCV)</i>, 2019.</li>
<li id="veerapaneni2019entity">Rishi Veerapaneni, John D. Co-Reyes, Michael Chang, Michael Janner, Chelsea Finn, Jiajun Wu, Joshua B. Tenenbaum & Sergey Levine, Entity Abstraction in Visual Model-Based Reinforcement Learning, <i>in: Conference on Robot Learning (CoRL)</i>, 2019.</li>
<li id="battaglia2018relational">Peter W. Battaglia et al., Relational inductive biases, deep learning, and graph networks, <i>arXiv preprint arXiv:1806.01261</i>, 2018.</li>
<li id="chen2019gnnexplore">Fanfei Chen, Jinkun Wang, Tixiao Shan & Brendan Englot, Autonomous Exploration Under Uncertainty via Graph Convolutional Networks, <i>in: International Symposium on Robotics Research (ISRR)</i>, 2019.</li>
<li id="eubanks2018automatinginequality">Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, 2018.</li>
<li id="chen2019thisthat">Chaofan Chen, Oscar Li, Alina Barnett, Jonathan Su & Cynthia Rudin, This Looks Like That: Deep Learning for Interpretable Image Recognition, <i>in: Neural Information Processing Systems (NeurIPS)</i>, 2019.</li>
<li id="xiagibson2019">Fei Xia et al., Gibson Env V2: Embodied Simulation Environments for Interactive Navigation, 2019.</li>
<li id="james2019simtosim">Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell & Konstantinos Bousmalis, Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks, <i>in: Conference on Computer Vision and Pattern Recognition (CVPR)</i>, 2019.</li>
<li id="openai2019rubikscube">OpenAI et al., Solving Rubik's Cube with a Robot Hand, <i>arXiv preprint arXiv:1910.07113</i>, 2019.</li>
</ul>

<p><em>Gregory J. Stein</em></p>

<h2 id="sufficiently-experienced-magician">Anyone sufficiently experienced is indistinguishable from a magician</h2>

<p><em>October 31, 2019</em></p>

<p>There’s a story I retell about an incredibly talented researcher friend of mine from time to time. Though the exact details elude me now, since it was a number of years ago, the story goes something like this:</p>
<p>My friend and I were on our way to lunch when we ran into someone he knew in the hallway, who we’ll call <em>Stumped Researcher</em>. He was having some odd issue with a measurement apparatus he’d built; we were all physicists, and every lab has their own custom setup of sensors, signal analyzers, etc. to probe physical phenomena. After a lengthy description, <em>stumped researcher</em> was clearly distraught, unable to collect any data that made sense, indicating that something was wrong with his setup. Without ever having seen the measurement setup and without an understanding of the experimental goals, my friend asked a question that astonished me in its specificity, wanting to know the brand of <em>lock-in amplifier</em> that was being used. <em>Stumped researcher</em> (a bit lost, having not mentioned that any <em>lock-in amplifier</em> was even being used) didn’t remember. My friend responded “Yeah, the older model <em>lock-in amplifiers</em> produced by <code class="language-plaintext highlighter-rouge">$COMPANY_NAME</code> ship with cables that are known to fail sometimes. I’ll bet that’s the problem.” Sure enough, a couple days later, upon running into <em>no-longer-stumped researcher</em>, that was indeed the problem; a quick change of cable remedied the issue.</p>
<p>To this day, it remains one of the most incredible instances of remote problem-solving I’ve ever seen. The key enabler of this ability: experience. I know that my friend thought that might be the problem because he’d seen it before in the wild. Tinkering was his passion, and with the number of things he’d bought online, taken apart, and sold for parts, he’d no doubt <em>seen it all</em>. And yet, despite knowing how the trick was done, it certainly seemed like magic to me at the time. I find good doctors also have this ability: such a deep understanding of the body as an <em>entire system</em> that a symptom in one region points them straight to the underlying cause. Recently, it occurred to me that I occasionally do the same thing to the undergraduate researchers I work with, asking an obscure question about their code or data or algorithm and then remotely solving the problem that’s vexed them for days.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">The title is an allusion to the perhaps overused Arthur C. Clarke quote: <em>Any sufficiently advanced technology is indistinguishable from magic.</em></span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">The title is an allusion to the perhaps overused Arthur C. Clarke quote: <em>Any sufficiently advanced technology is indistinguishable from magic.</em>
</p>
<p>I have the privilege of being surrounded by brilliant scientists, philosophers, and thinkers of all kinds, so I witness this phenomenon with relative frequency. Yet every time I see someone who surprises me in this way, I try to remember that these circumstances don’t <em>just happen</em>: only through dedication to a craft can one gain the depth of understanding necessary to demonstrate this level of mastery. The pull of <a href="https://www.nature.com/naturejobs/science/articles/10.1038/nj7587-555a">impostor syndrome</a> is real, but I try to be inspired by these moments whenever I can. Perhaps someday I’ll feature in someone else’s anecdotes.</p>
<p>As always, I welcome your thoughts (and personal anecdotes) in the comments below or on <a href="https://news.ycombinator.com/item?id=21409018">Hacker News</a>.</p>Gregory J. SteinThere’s a story I retell about an incredibly talented researcher friend of mine from time-to-time. Though the exact details elude me now, since it was a number of years ago, the story goes something like this:The “Myths List” is a communication antipattern2019-08-08T23:24:40+00:002019-08-08T23:24:40+00:00https://https//gjstein.github.io/2019/8/myths-list-antipattern<p>I can’t tell you the number of articles I’ve read devoted to “debunking myths”. They try to communicate the author’s opinion by listing a set of negative examples, often with section headings labeled <strong>Myth #1</strong>, <strong>Myth #2</strong>, etc. At best, it’s an easy way of building up a straw-man argument, yet at worst, such an article confuses the reader, filling their screen with potentially contentious or confusing statements. Try as I might, I rarely find these <em>Myth List</em> articles compelling. One particularly problematic article I recently came across boasted a headline of the form “10 Myths about […]” whose in-article headings were simply all the myths. At the start of every new section, I needed to remind myself that the author’s belief was <em>opposite</em> to what was written on the page. As you might imagine, the article was far from compelling.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Worse still are articles in which the author’s goal is to persuade rather than inform, and whether or not <em>myths</em> are actually <em>myths</em> is a contentious point.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Worse still are articles in which the author’s goal is to persuade rather than inform, and whether or not <em>myths</em> are actually <em>myths</em> is a contentious point.
</p>
<p>The mental hoops I sometimes have to jump through to figure out what the author is trying to communicate rarely outweigh the benefits they might have gotten by introducing an opposing viewpoint. In succinctly summarizing <em>only</em> a point of view that is not being argued for, the author introduces a cognitive dissonance in the reader that need not exist. <strong>Many such articles could benefit from a more clearly presented statement of the author’s viewpoint.</strong> Even having both views side-by-side would be a massive improvement, and could be made even clearer by adding visual markers to indicate which statement agrees with the author’s. Particularly in the modern era in which online attention span is limited and skimming is the norm, <strong>it is to the author’s benefit to make their article as skimmable as possible</strong>. Myth lists are in direct conflict with this goal, since the author’s perspective is often only fleshed out in the body of the text.</p>
<p>As always, I welcome your thoughts (and personal anecdotes) in the comments below or on <a href="https://news.ycombinator.com/item?id=20647760">Hacker News</a>.</p>Gregory J. SteinI can’t tell you the number of articles I’ve read devoted to “debunking myths”. They try to communicate the author’s opinion by listing a set of negative examples, often with section headings labeled Myth #1, Myth #2, etc. At best, it’s an easy way of building up a straw-man argument, yet at worst, such an article confuses the reader, filling their screen with potentially contentious or confusing statements. Try as I might, I rarely find these Myth List articles compelling. One particularly problematic article I recently came across boasted a headline of the form “10 Myths about […]” whose in-article headings were simply all the myths. At the start of every new section, I needed to remind myself that the author’s belief was opposite to what was written on the page. As you might imagine, the article was far from compelling.On the efficiency of Artificial Neural Networks versus the Brain2019-08-07T07:06:08+00:002019-08-07T07:06:08+00:00https://https//gjstein.github.io/2019/8/efficiency-artificial-neural-networks-ve<p><strong>Summary</strong>: <em>Recent ire from the media has focused on the high-power consumption of artificial neural nets (ANNs), yet popular discussion frequently conflates training and testing. Here, I aim to clarify the ways in which conversations involving the relative efficiency of ANNs and the human brain often miss the mark.</em></p>
<p>I recently saw <a href="https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/">an article in the MIT Tech Review</a> about the “Carbon Footprint” of training deep neural networks that ended with a peculiar line from one of the researchers quoted in the article:</p>
<blockquote>
'Human brains can do amazing things with little power consumption,' he says. 'The bigger question is how can we build such machines.'
</blockquote>
<p>Now I want to avoid putting this particular researcher on the spot since his meta-point is a good one: there are <em>absolutely</em> things that the human brain is readily capable of for which the field of Artificial Intelligence has only just begun to scratch the surface. There are certain classes of problems, e.g. <a href="http://www.cachestocaches.com/2018/12/toward-real-world-alphazero/">navigation under uncertainty</a>, that require massive computational resources to solve in general, yet humans are capable of solving very well with little effort. Our ability to solve complex problems from limited examples, also known as <em><a href="https://arxiv.org/pdf/1806.01261.pdf">combinatorial generalization</a></em>, is unmatched in general by machine intelligence. Relatedly, humans have incredibly high <em>sample efficiency</em>, and require only a few training instances to generalize performance on tasks like <a href="http://www.ifaamas.org/Proceedings/aamas2016/pdfs/p485.pdf">video game playing</a> and skill learning.</p>
<p>Yet commenting on the relative inefficiency of the neural net <strong>training</strong>, particularly for <em>supervised learning</em> problems, misses the point slightly. Deep learning has been shown to match and even (arguably) surpass human performance on many supervised tasks, including object detection and semantic segmentation. For such problems, the conversation about relative energy expenditure — as compared to the human brain — becomes more nuanced.</p>
<h2 id="the-cost-of-training-versus-evaluation">The cost of training versus evaluation</h2>
<p>What many popular articles omit is the massive difference between the amount of computation required to train a neural network versus the computational requirements of using it in production. When considering the computational cost of a deep learning system, it is important to recognize there are two different costs incurred when using deep learning: <em>training</em> and <em>evaluation</em>. The vast number of parameters that need to be tuned to make a modern deep neural network work effectively makes the training phase, relatively speaking, extremely expensive. Training involves a guided walk through a massively-high-dimensional parameter space in an effort to eventually settle on a configuration that performs well on the provided data.</p>
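<p>To make the training-versus-evaluation gap concrete, here is a rough back-of-the-envelope sketch. The network shape, dataset size, and epoch count are illustrative assumptions (an MNIST-scale classifier), and the “backward pass costs about twice the forward pass” rule is a common approximation rather than a measurement:</p>

```python
# Back-of-the-envelope FLOP estimate for a small fully-connected network.
# All numbers here are illustrative assumptions, not measurements.

def forward_flops(layer_sizes):
    """Multiply-accumulate cost of one forward pass (2 FLOPs per weight)."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def training_flops(layer_sizes, n_examples, n_epochs):
    """Training cost: forward plus backward (~2x forward) per example per epoch."""
    return 3 * forward_flops(layer_sizes) * n_examples * n_epochs

layers = [784, 256, 128, 10]          # an MNIST-sized classifier (assumed)
eval_cost = forward_flops(layers)      # one prediction in production
train_cost = training_flops(layers, n_examples=60_000, n_epochs=50)

print(f"evaluation: {eval_cost:,} FLOPs")
print(f"training:   {train_cost:,} FLOPs")
print(f"ratio:      {train_cost // eval_cost:,}x")
```

<p>Even for this toy network, training costs millions of times more than a single evaluation; scaling the architecture and dataset to modern sizes only widens the gap.</p>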
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Note that for <em>Reinforcement Learning</em> problems, the concern about training time is much more appropriate. In such problems, the robot influences its own training, and therefore the rate at which it learns is one way to measure performance. For tasks that require constant parameter updating or recurrence, AI is reliably outperformed by the brain.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Note that for <em>Reinforcement Learning</em> problems, the concern about training time is much more appropriate. In such problems, the robot influences its own training, and therefore the rate at which it learns is one way to measure performance. For tasks that require constant parameter updating or recurrence, AI is reliably outperformed by the brain.
</p>
<p>Yet machine learning experts are still in the midst of a seemingly endless and expensive ‘evolution’ phase of neural network design. Every time a new network architecture is proposed, it must be retrained. Similarly, the brain has evolved to solve a particular set of challenging problems relating to survival: complex locomotion, object detection, and general pattern recognition are all key skills for surviving in a dangerous and ever-changing world. The process of trial-and-error that has resulted in the modern human brain has taken millions of years and proven incredibly — and perhaps immeasurably — expensive. In short: <strong>regardless of context, structure learning is incredibly expensive</strong>, yet once the structure of a problem is learned and codified in the parameters of an artificial neural network, there are many opportunities for improvements in efficiency.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">As I discuss <a href="/2019/5/neural-network-structure-and-no-free-lun/">in another article</a>, neural networks can’t extract benefit from nothing: machine learning must always balance flexibility and prior assumptions about the data codified in ANN structure.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">As I discuss <a href="/2019/5/neural-network-structure-and-no-free-lun/">in another article</a>, neural networks can’t extract benefit from nothing: machine learning must always balance flexibility and prior assumptions about the data codified in ANN structure.
</p>
<p>As particular structures prove useful for solving particular tasks, as convolutional neural networks (CNN’s) have for object detection, researchers can settle on these designs and put effort into optimizing them for performance. There are a slew of popular techniques for network optimization that have received attention lately as machine learning has increasingly appeared in low power consumer hardware, like <a href="https://developer.apple.com/machine-learning/">smartphones</a> and, yes, <a href="http://openaccess.thecvf.com/content_ICCV_2017/papers/Kendall_End-To-End_Learning_of_ICCV_2017_paper.pdf">even Skydio’s fancy “self-flying camera” drone</a>. One popular neural network optimization technique is known as <em>network pruning</em>, in which parts of the network found to be generally unhelpful for accuracy are removed. Research in this area typically focuses on approaches for efficiently identifying which regions are least impactful if removed. Though potentially expensive on their own, such optimization techniques pay dividends over time, in both cost and runtime.</p>
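<p>As a minimal sketch of the idea behind pruning: the simplest criterion just removes the weights with the smallest magnitudes. Real methods (Fisher pruning among them) use more careful estimates of each weight’s importance; the weight values below are made up for illustration:</p>

```python
# Minimal sketch of magnitude-based network pruning: zero out the fraction of
# weights with the smallest absolute values. Ties at the threshold are kept.

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to zero."""
    n_prune = int(len(weights) * sparsity)
    # The magnitude threshold below which weights are removed.
    threshold = sorted(abs(w) for w in weights)[n_prune] if n_prune else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 1.1, -0.4]   # made-up weights
pruned = prune_by_magnitude(w, sparsity=0.5)
print(pruned)  # the four smallest-magnitude weights become exactly zero
```

<p>The zeroed weights can then be skipped entirely at evaluation time, which is where the efficiency gains come from.</p>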
<h2 id="the-role-of-specialized-hardware">The role of specialized hardware</h2>
<p>Algorithmic approaches are only one way to make neural network evaluation more efficient. The CPU is a miracle of modern engineering, yet it is designed to be <em>general purpose</em>, capable of running machine code for any application. GPUs are an example of how specialized hardware structures can be more efficient for certain applications. Convolutional Neural Networks, for example, can be evaluated much more quickly using the massively parallelizable structures of a GPU. Just as GPUs are more efficient than CPUs for certain classes of problems, so too are other forms of specialized hardware. Google, with its Tensor Processing Unit (TPU), was the first to put out a production-ready chip for general-purpose artificial neural network evaluation. Equipped with special hardware structures for matrix multiplication and other tensor operations, TPUs further raised the bar for speed and efficiency in artificial neural networks.</p>
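<p>One way to see why convolutions benefit so much from this hardware: a convolution <em>is</em> a highly structured (Toeplitz) matrix multiplication, and matrix multiplication is exactly the operation GPUs and TPUs are built to parallelize. A toy 1-D sketch, with made-up signal and kernel values:</p>

```python
# A 1-D "valid" convolution computed two ways: directly as a sliding window,
# and as multiplication by an equivalent Toeplitz matrix. The two agree, which
# is why conv layers map so naturally onto matrix-multiply hardware.

def conv1d(signal, kernel):
    """Direct sliding-window correlation ('valid' mode, no kernel flip)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv_matrix(kernel, n):
    """Toeplitz matrix M such that M @ signal equals conv1d(signal, kernel)."""
    k = len(kernel)
    return [[kernel[i - row] if row <= i < row + k else 0.0 for i in range(n)]
            for row in range(n - k + 1)]

def matvec(M, v):
    return [sum(m, start=0.0) if False else sum(mi * vi for mi, vi in zip(row, v))
            for row, m in ((r, r) for r in M)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up signal
w = [1.0, 0.0, -1.0]               # a simple edge-detecting kernel
assert conv1d(x, w) == matvec(conv_matrix(w, len(x)), x)
print(conv1d(x, w))                # [-2.0, -2.0, -2.0]
```
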
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">There are also strategies to accelerate neural network evaluation to take advantage of hardware features just as one might compile code. Some of these strategies are enumerated in <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf">this tutorial from Google</a>.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">There are also strategies to accelerate neural network evaluation to take advantage of hardware features just as one might compile code. Some of these strategies are enumerated in <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf">this tutorial from Google</a>.
</p>
<p>Yet further opportunities remain as applications become more specific and hardware can be custom-tailored for particular problems once a neural network and its parameters are chosen. FPGAs are effectively programmable circuits that can be configured to implement lightning-fast neural network evaluation. Some startup companies are even starting to <a href="https://rtr.ai">design custom chips</a> for machine learning and robotics applications. And progress in hardware acceleration doesn’t stop at custom computer chips either: some recent research leverages silicon photonics to <strong><a href="https://arxiv.org/pdf/1810.07815.pdf">implement a neural network in a single sheet of doped glass</a></strong>, thereby allowing evaluation of the network in as much time as it takes light to pass through the material and without any external power. For supervised learning tasks, these systems provide the brain with some fierce competition.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Microsoft is surprisingly ahead of the curve on FPGAs. Their <a href="https://www.microsoft.com/en-us/research/project/project-catapult/">Project Catapult</a> initiative involved adding an FPGA layer to their cloud computing servers, thereby enabling hardware-level acceleration of encryption and machine learning for the Bing search engine.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Microsoft is surprisingly ahead of the curve on FPGAs. Their <a href="https://www.microsoft.com/en-us/research/project/project-catapult/">Project Catapult</a> initiative involved adding an FPGA layer to their cloud computing servers, thereby enabling hardware-level acceleration of encryption and machine learning for the Bing search engine.
</p>
<h2 id="efficiency-should-still-be-a-priority">Efficiency should still be a priority</h2>
<p>Certainly it is not my goal to <em>brush off</em> the environmental and economic impacts of deep learning. As convenient as it might be to think of the training phase as <em>an investment</em>, computation and energy are sufficiently cheap that rarely do individual trained network models stay in production for very long. Companies are constantly retraining their models: even small gains in performance may correspond to a competitive advantage and large research divisions at companies like Google and Facebook compete for bragging rights on AI benchmarks like <a href="http://www.image-net.org">ImageNet</a>. <a href="https://www.openai.com/blog/ai-and-compute/">A recent report from OpenAI</a> has computed a 3.5 month Moore’s-law-esque doubling time for the amount of computation used in landmark AI experiments — a worrying number from the corporate world as many researchers in the academic community focus on trying to learn more with fewer training instances. In summary:</p>
<blockquote>
Data efficiency and skill reuse are key features of general intelligence and, here, the brain is solidly in the lead.
</blockquote>
<p>As always, I welcome discussion in the comments below or <a href="https://news.ycombinator.com/item?id=21158334">on Hacker News</a>. Feel free to ask questions, share your thoughts, or let me know of some research you would like to share.</p>
<h3 id="references">References</h3>
<ul>
<li id="strubell2019nlpenergy">Emma Strubell, Ananya Ganesh & Andrew McCallum, Energy and Policy Considerations for Deep Learning in NLP, <i>in: Annual Meeting of the Association for Computational Linguistics (ACL short)</i>, 2019.</li>
<li id="theis2018fisherpruning">Lucas Theis, Iryna Korshunova, Alykhan Tejani & Ferenc Huszár, Faster Gaze Prediction with Dense Networks and Fisher Pruning, <i>arXiv preprint arXiv:1801.05787</i>, 2018.</li>
<li id="battaglia2018relational">Peter W. Battaglia et al., Relational inductive biases, deep learning, and graph networks, <i>arXiv preprint arXiv:1806.01261</i>, 2018.</li>
<li id="openai2018compute">Dario Amodei & Danny Hernandez, AI and Compute, 2018.</li>
<li id="khoram19opticalann">Erfan Khoram, Ang Chen, Dianjing Liu, Lei Ying, Qiqi Wang, Ming Yuan & Zongfu Yu, Nanophotonic media for artificial neural inference, <i>Photon. Res.</i>, 2019.</li>
</ul>Gregory J. SteinSummary: Recent ire from the media has focused on the high-power consumption of artificial neural nets (ANNs), yet popular discussion frequently conflates training and testing. Here, I aim to clarify the ways in which conversations involving the relative efficiency of ANNs and the human brain often miss the mark.No Free Lunch and Neural Network Architecture2019-05-24T21:48:59+00:002019-05-24T21:48:59+00:00https://https//gjstein.github.io/2019/5/neural-network-structure-and-no-free-lun<p><strong>Summary</strong>: <em>Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.</em></p>
<p>Since I entered the machine learning community, I have frequently found myself engaging in conversation with researchers or <em>startup-types</em> from other communities about how they can get the most out of their data and, more often than not, we end up talking about neural networks. I get it: the allure is strong. The ability of a neural network to learn complex patterns from massive amounts of data has enabled computers to challenge (and even outperform) humans on general tasks like object detection and games like Go. But reading about the successes of these newer machine learning techniques rarely makes clear one important point:</p>
<blockquote>
Nothing is ever free.
</blockquote>
<p>When training a neural network — or any machine learning system — a tradeoff is always made between flexibility in the sorts of things the system can learn and the amount of data necessary to train these systems. Yet in practice, the precise nature of this tradeoff is opaque. That a neural network is capable of learning complex concepts — like what an object looks like from a bunch of images — means that training it effectively requires a large amount of data to convincingly rule out other interpretations of the data and reject the impact of noise. On the face of it, this statement is perhaps obvious: of course it requires more work/data/effort to extract meaning out of more complex problems. Yet, perhaps counterintuitive to the thinking of many machine learning outsiders, the way in which these systems are designed and the relationship between the many complex hyperparameters that define them has a profound impact on how well the system performs.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Noise takes many forms. In the case of object detection, noise might include the color of the object: I should be able to identify that a car is a car regardless of its color.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Noise takes many forms. In the case of object detection, noise might include the color of the object: I should be able to identify that a car is a car regardless of its color.
</p>
<p>On its own, data cannot provide all the answers. The goal of <em>learning</em> is to extract more general meaning from a limited amount of data, discovering properties about the data that allow new data to be understood. Generalizing to unseen data requires making a decision, whether explicitly or implicitly, about the nature of the data beyond what is provided in the dataset. Consider a very simple dataset consisting of 10 points evenly spaced along the x-axis:</p>
<p><img src="/assets/posts/example_data.svg" class="img-responsive center-block no-shadow" title="Example: Noisy Data" /></p>
<p>Suppose your goal is to fit a curve to this data: what might you do? Well, from inspection, the “curve” looks pretty much like a line, so you fit a line to the points and hand off the resulting fit to another system. But what if you were to collect more data and you discover that, instead of falling close to a line (left), the new data has a rather dramatic periodic component (right):</p>
<p><img src="/assets/posts/example_more_data.svg" class="img-responsive center-block no-shadow" title="Example: More Noisy Data" /></p>
<p>You might (justifiably) claim that more data is necessary, and insist that a separate test set be provided so that you can choose between these different hypotheses. But at what point should you stop? Collecting data ad infinitum in order to rule out <em>all</em> possible hypotheses is impractical, so the learning algorithm <em>must</em> impose some assumptions about the structure of the data to be effective. Prior knowledge of the structure of the data is often essential to good machine learning performance: if you know that the data is linear with some noise added to it, you would know that fitting more complex curves is unnecessary. But if the underlying structure of the data is a mystery, as is often the case for very high-dimensional learning algorithms like a neural network, this problem is impossible to avoid.</p>
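<p>This tension is easy to reproduce in a few lines of Python. A two-parameter line and a ten-parameter interpolating polynomial both “explain” ten training points (the polynomial perfectly), yet they can disagree between those points, and the training data alone cannot tell us which to trust. The synthetic “line plus wiggle” data below is a made-up stand-in for the plots above:</p>

```python
import math

# A 2-parameter line and a 10-parameter interpolating polynomial both fit the
# same ten training points; choosing between them requires an assumption about
# the data that the data itself cannot provide.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = list(range(10))
ys = [x + 0.3 * math.sin(5 * x) for x in xs]   # made-up "line plus wiggle" data

a, b = fit_line(xs, ys)
# The polynomial reproduces every training point exactly...
assert all(abs(lagrange(xs, ys, x) - y) < 1e-9 for x, y in zip(xs, ys))
# ...but the two models can predict different values between the points.
print(f"line at x=4.5:       {a * 4.5 + b:.2f}")
print(f"polynomial at x=4.5: {lagrange(xs, ys, 4.5):.2f}")
```

<p>Held-out test data helps adjudicate between the two, but only within the space of hypotheses we chose to consider in the first place.</p>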
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">Of course, having <em>test data</em> in addition to the training dataset is essential, and would mitigate the risk of a dramatic surprise.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">Of course, having <em>test data</em> in addition to the training dataset is essential, and would mitigate the risk of a dramatic surprise.
</p>
<p>Neural networks are vastly more complex and opaque than linear curve-fitting, exacerbating the challenges associated with constraining the space of possible solutions. The parameters that define a neural network — including the <em>number of layers</em> and the <em>learning rate</em> — do bias the system towards producing certain types of output, but often <em>precisely how</em> those parameters are related to the structure of the data is unclear. Unfortunately, detailed inspection of the performance of the network is largely impossible, owing to the complexity of the system being trained; most modern deep learning systems have <em>millions of free parameters</em> and no longer are a few coefficients enough to understand how the system performs.</p>
<p><span class="marginnote note no-word-break invisible-sm" style="margin-top:-8em;" aria-label="margin note">There are methods for neural network inspection, but they remain an area of active research.</span></p>
<p class="note no-word-break visible-sm" aria-label="footnote">There are methods for neural network inspection, but they remain an area of active research.
</p>
<p>Relatedly, <em>prior knowledge</em> takes a markedly different form for more complex types of data, like images. For the dataset above, it was pretty clear from quick inspection of the data that a <em>linear model</em> was probably the right representation for the learning problem. If instead your dataset is billions of stock photos and your objective is to identify which of these contain household pets, it is not even clear how one should go about turning intuition into mathematical operations. In fact, this is ultimately the draw of such systems — the utility of neural network models is their ability to learn otherwise impossible-to-write-down features in the data.</p>
<p>Broadly, there is some well-validated intuition about the differences between <em>vastly different</em> neural network structures. Convolutional Neural Networks (CNNs), for example, are often used for image data, since the primary mode of their operation processes local patches of texture rather than the entire image all at once; this dramatically reduces the number of free parameters, and greatly restricts what the system can learn in a way that has proven effective for many image-based tasks. Yet whether I use six layers in my neural network instead of five, for example, is often decided by how well the system performs on the data, rather than by an understanding of how that number of layers reflects the structure of the data.</p>
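<p>The parameter-count difference is easy to quantify. Assuming a modest 224×224 RGB input (a common benchmark size), a single 3×3, 64-channel convolution has under two thousand parameters, while a fully-connected layer producing a same-sized feature map would need hundreds of billions:</p>

```python
# Parameter counts: a 3x3 convolution versus a fully-connected (dense) layer
# producing a feature map of comparable size. Input dimensions are assumed.

def conv_params(k, c_in, c_out):
    """A k x k kernel per (input, output) channel pair, plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Every input connected to every output, plus biases."""
    return n_in * n_out + n_out

h = w = 224
c_in, c_out = 3, 64
conv = conv_params(3, c_in, c_out)
# A dense layer mapping the flattened image to a same-sized 64-channel map:
dense = dense_params(h * w * c_in, h * w * c_out)

print(f"conv:  {conv:,} parameters")     # 1,792
print(f"dense: {dense:,} parameters")
print(f"ratio: {dense // conv:,}x")
```

<p>Weight sharing and locality are thus a prior assumption baked into the architecture itself, which is exactly what makes CNNs trainable on realistic amounts of image data.</p>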
<p>I conclude with a word of caution: <strong>neural networks are far from an out-of-the-box solution for most learning problems</strong> and the process required to reach near-state-of-the-art performance often comes as a surprise to non-experts. The design of machine learning systems remains an active area of research, and <strong>no one size fits all for neural networks</strong>. Many off-the-shelf neural networks are designed or pretrained with a number of unwritten biases or priors about the nature of the dataset. Whether or not a particular learning strategy will work on one dataset is often a judgment call until validated with experiments, making iterating over the design an intensive process that often requires “expert” intuition or guidance.</p>
<p>As always, I welcome your thoughts in the comments below or on <a href="https://news.ycombinator.com/item?id=20005204">Hacker News</a>.</p>Gregory J. SteinSummary: Machine learning must always balance flexibility and prior assumptions about the data. In neural networks, the network architecture codifies these prior assumptions, yet the precise relationship between them is opaque. Deep learning solutions are therefore difficult to build without a lot of trial and error, and neural nets are far from an out-of-the-box solution for most applications.