On the efficiency of Artificial Neural Networks versus the Brain

07 Aug 2019 Gregory J. Stein

Summary: Recent ire from the media has focused on the high power consumption of artificial neural nets (ANNs), yet popular discussion frequently conflates training and evaluation. Here, I aim to clarify the ways in which conversations about the relative efficiency of ANNs and the human brain often miss the mark.

I recently saw an article in the MIT Tech Review about the “Carbon Footprint” of training deep neural networks that ended with a peculiar line from one of the researchers quoted in the article:

“Human brains can do amazing things with little power consumption,” he says. “The bigger question is how can we build such machines.”

Now I want to avoid putting this particular researcher on the spot, since his meta-point is a good one: there are absolutely things the human brain is readily capable of for which the field of Artificial Intelligence has only just begun to scratch the surface. There are certain classes of problems, e.g. navigation under uncertainty, that require massive computational resources to solve in general, yet which humans solve very well with little effort. Our ability to solve complex problems from limited examples, also known as combinatorial generalization, remains unmatched by machine intelligence. Relatedly, humans have incredibly high sample efficiency, requiring only a few training instances to generalize on tasks like video game playing and skill learning.

Yet commenting on the relative inefficiency of neural net training, particularly for supervised learning problems, misses the point slightly. Deep learning has been shown to match and even (arguably) surpass human performance on many supervised tasks, including object detection and semantic segmentation. For such problems, the conversation about relative energy expenditure — as compared to the human brain — becomes more nuanced.

The cost of training versus evaluation

What many popular articles omit is the massive difference between the amount of computation required to train a neural network and the computational requirements of using it in production. When considering the computational cost of a deep learning system, it is important to recognize that there are two distinct costs incurred when using deep learning: training and evaluation. The vast number of parameters that must be tuned to make a modern deep neural network work effectively makes the training phase, relatively speaking, extremely expensive. Training involves a guided walk through a massively high-dimensional parameter space in an effort to eventually settle on a configuration that performs well on the provided data.
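
To make the asymmetry concrete, here is a minimal sketch in PyTorch with a toy fully-connected classifier (all sizes and iteration counts are placeholder values): training repeats a forward pass, a backward pass, and a parameter update over the entire dataset many times, while deployment needs only a single gradient-free forward pass per input.

```python
# Toy illustration of training cost vs. evaluation cost (sizes are placeholders).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: many epochs over many minibatches, each needing a forward pass,
# a backward pass (gradient computation), and a parameter update.
for epoch in range(10):               # real training runs use far more epochs
    for _ in range(100):              # and far more minibatches per epoch
        x = torch.randn(64, 256)      # stand-ins for real training data
        y = torch.randint(0, 10, (64,))
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()               # roughly doubles the work of the forward pass
        optimizer.step()

# Evaluation (production): one gradient-free forward pass per input.
with torch.no_grad():
    prediction = model(torch.randn(1, 256)).argmax(dim=1)
```

Multiply the inner loop counts by realistic dataset sizes, epoch counts, and hyperparameter searches, and the training-side cost quickly dwarfs the cost of any single prediction.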

Note that for Reinforcement Learning problems, the concern about training time is much more appropriate. In such problems, the robot influences its own training, and therefore the rate at which it learns is one way to measure performance. For tasks that require constant parameter updating or recurrence, AI is reliably outperformed by the brain.
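
A rough sketch of why this is so: in the reinforcement learning setting, the data the system learns from must be gathered through interaction, so training cost cannot be amortized away as a one-time offline expense. The environment and random policy below are hypothetical stand-ins, not any particular benchmark.

```python
# Hypothetical interaction loop: in RL, every learning step requires acting in the world.
import random

class ToyEnvironment:
    """Stand-in environment; a real robot's world is vastly more expensive to query."""
    def reset(self):
        return 0  # initial state
    def step(self, action):
        next_state = random.randint(0, 9)
        reward = 1.0 if action == next_state else 0.0
        return next_state, reward

env = ToyEnvironment()
state = env.reset()
total_reward = 0.0
for interaction in range(1000):       # each iteration is real interaction, not free computation
    action = random.randint(0, 9)     # placeholder for a learned policy
    state, reward = env.step(action)
    total_reward += reward            # sample efficiency: reward earned per interaction

print(f"Reward per interaction: {total_reward / 1000:.3f}")
```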

Yet machine learning experts are still in the midst of a seemingly endless and expensive ‘evolution’ phase of neural network design. Every time a new network architecture is proposed, it must be retrained. Similarly, the brain has evolved to solve a particular set of challenging problems relating to survival: complex locomotion, object detection, and general pattern recognition are all key skills for surviving in a dangerous and ever-changing world. The process of trial-and-error that has resulted in the modern human brain has taken millions of years and proven incredibly — and perhaps immeasurably — expensive. In short: regardless of context, structure learning is incredibly expensive, yet once the structure of a problem is learned and codified in the parameters of an artificial neural network, there are many opportunities for improvements in efficiency.

As I discuss in another article, neural networks can’t extract benefit from nothing: machine learning must always balance flexibility against the prior assumptions about the data that are codified in ANN structure.

As particular structures prove useful for solving particular tasks, as convolutional neural networks (CNNs) have for object detection, researchers can settle on these designs and put effort into optimizing them for performance. A slew of popular techniques for network optimization have received attention lately as machine learning has increasingly appeared in low-power consumer hardware, like smartphones and, yes, even Skydio’s fancy “self-flying camera” drone. One popular neural network optimization technique is known as network pruning, in which parts of the network found to be generally unhelpful for accuracy are removed. Research in this area typically focuses on approaches for efficiently identifying which regions are least impactful if removed. Though potentially expensive on their own, such optimization techniques pay dividends over time, in both cost and runtime.
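
As a concrete (if simplified) illustration, here is a minimal sketch of magnitude-based pruning using PyTorch's built-in pruning utilities; the toy model and the 30% pruning fraction are arbitrary choices for demonstration, and real pruning pipelines are considerably more involved.

```python
# Sketch of magnitude-based (L1) pruning with PyTorch's pruning utilities.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))

# Zero out the 30% of weights with the smallest magnitude in each linear layer,
# i.e. the connections judged least impactful if removed.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# In practice, the pruned network is then fine-tuned to recover any lost accuracy,
# and the resulting sparsity can be exploited for smaller, faster deployments.
```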

The role of specialized hardware

Algorithmic approaches are only one way to make neural network evaluation more efficient. The CPU is a miracle of modern engineering, yet it is designed to be general purpose, capable of running machine code for any application. GPUs illustrate how specialized hardware can be more efficient for certain applications: Convolutional Neural Networks, for example, can be evaluated much more quickly using the massively parallel architecture of a GPU. Just as GPUs are more efficient than CPUs for certain classes of problems, so too are other forms of specialized hardware. Google, with its Tensor Processing Unit (TPU), was the first to put out a production-ready chip for general-purpose artificial neural network evaluation. Equipped with special hardware structures for matrix multiplication and other tensor operations, TPUs further raised the bar for speed and efficiency in artificial neural networks.
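
A quick way to see the difference for yourself is to time the same operation on both devices. The sketch below (assuming PyTorch and, for the second measurement, a CUDA-capable GPU) times a single convolution layer on each; the exact speedup depends entirely on the hardware at hand.

```python
# Time one forward pass of the same convolution on the CPU and, if present, a GPU.
import time
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
batch = torch.randn(16, 64, 128, 128)

def time_forward(model, inputs, device):
    model, inputs = model.to(device), inputs.to(device)
    with torch.no_grad():
        model(inputs)                          # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()           # GPU kernels launch asynchronously
        start = time.time()
        model(inputs)
        if device == "cuda":
            torch.cuda.synchronize()
    return time.time() - start

print("CPU seconds:", time_forward(conv, batch, "cpu"))
if torch.cuda.is_available():
    print("GPU seconds:", time_forward(conv, batch, "cuda"))
```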

There are also strategies to accelerate neural network evaluation by taking advantage of hardware features, much as one might optimize code during compilation. Some of these strategies are enumerated in this tutorial from Google.
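
The linked tutorial is TensorFlow-oriented; just to give a flavor of what such post-training optimizations look like in code, here is a sketch using two PyTorch analogues (my choice of tools, not necessarily those the tutorial covers): tracing a trained model into a TorchScript graph and applying dynamic 8-bit quantization to its weights.

```python
# Two post-training optimizations applied to an already-trained model (toy example).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
example_input = torch.randn(1, 256)

# (1) Trace the model into a TorchScript graph, which can be optimized and
#     deployed without the Python interpreter.
traced = torch.jit.trace(model, example_input)

# (2) Dynamically quantize the linear layers so weights are stored as 8-bit
#     integers, shrinking the model and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    print(traced(example_input).shape)
    print(quantized(example_input).shape)
```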

Yet further opportunities remain as applications become more specific: once a neural network and its parameters are chosen, hardware can be custom-tailored to the particular problem. FPGAs are effectively programmable circuits that can be configured to implement lightning-fast neural network evaluation. Some startup companies are even designing custom chips for machine learning and robotics applications. And progress in hardware acceleration doesn’t stop at custom computer chips either: some recent research leverages silicon photonics to implement a neural network in a single sheet of doped glass, allowing evaluation of the network in the time it takes light to pass through the material, without any external power. For supervised learning tasks, these systems provide the brain with some fierce competition.

Microsoft is surprisingly ahead of the curve on FPGAs. Their Project Catapult initiative involved adding an FPGA layer to their cloud computing servers, thereby enabling hardware-level acceleration of encryption and machine learning for the Bing search engine.

Efficiency should still be a priority

Certainly it is not my goal to brush off the environmental and economic impacts of deep learning. As convenient as it might be to think of the training phase as a one-time investment, computation and energy are sufficiently cheap that individual trained network models rarely stay in production for very long. Companies are constantly retraining their models: even small gains in performance may correspond to a competitive advantage, and large research divisions at companies like Google and Facebook compete for bragging rights on AI benchmarks like ImageNet. A recent report from OpenAI computed a 3.5-month Moore’s-law-esque doubling time for the amount of computation used in landmark AI experiments — a worrying number from the corporate world, as many researchers in the academic community focus on trying to learn more with fewer training instances.
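
To put that doubling time in perspective, a quick back-of-the-envelope calculation (using the 3.5-month figure cited above) shows how quickly such exponential growth compounds:

```python
# Back-of-the-envelope: compute growth implied by a 3.5-month doubling time.
doubling_time_months = 3.5
for years in (1, 2, 5):
    growth = 2 ** (12 * years / doubling_time_months)
    print(f"{years} year(s): roughly {growth:,.0f}x more compute")
# Prints growth factors on the order of 11x after one year and over 100x after two.
```

In summary: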

Data efficiency and skill reuse are key features of general intelligence and, here, the brain is solidly in the lead.

As always, I welcome discussion in the comments below or on Hacker News. Feel free to ask questions, share your thoughts, or let me know of some research you would like to share.
