content/02.intro.md
## Introduction to deep learning
Biology and medicine are rapidly becoming data-intensive.
A recent comparison of genomics with social media, online videos, and other data-intensive disciplines suggests that genomics alone will equal or surpass other fields in data generation and analysis within the next decade [@doi:10.1371/journal.pbio.1002195].
The volume and complexity of these data present new opportunities, but also pose new challenges.
Automated algorithms that extract meaningful patterns could lead to actionable knowledge and change how we develop treatments, categorize patients, or study diseases, all within privacy-critical environments.
The term _deep learning_ has come to refer to a collection of new techniques that, together, have demonstrated breakthrough gains over existing best-in-class machine learning algorithms across several fields.
For example, over the past five years these methods have revolutionized image classification and speech recognition due to their flexibility and high accuracy [@doi:10.1038/nature14539].
More recently, deep learning algorithms have shown promise in fields as diverse as high-energy physics [@doi:10.1038/ncomms5308], computational chemistry [@doi:10.1002/jcc.24764], dermatology [@doi:10.1038/nature21056], and translation among written languages [@arxiv:1609.08144].
Across fields, "off-the-shelf" implementations of these algorithms have produced comparable or higher accuracy than previous best-in-class methods that required years of extensive customization, and specialized implementations are now being used at industrial scales.
Deep learning approaches grew from research on artificial neurons, which were first proposed in 1943 [@doi:10.1007/BF02478259] as a model for how the neurons in a biological brain process information.
The history of artificial neural networks---referred to as "neural networks" throughout this article---is interesting in its own right [@doi:10.1103/RevModPhys.34.135].
In neural networks, inputs are fed into the input layer, which feeds into one or more hidden layers, which eventually link to an output layer.
A layer consists of a set of nodes, sometimes called "features" or "units," which are connected via edges to the immediately earlier and the immediately deeper layers.
In some special neural network architectures, nodes can connect to themselves with a delay.
The nodes of the input layer generally consist of the variables being measured in the dataset of interest---for example, each node could represent the intensity value of a specific pixel in an image or the expression level of a gene in a specific transcriptomic experiment.
The neural networks used for deep learning have multiple hidden layers.
Each layer essentially performs feature construction for the layers before it.
The training process used often allows layers deeper in the network to contribute to the refinement of earlier layers.
For this reason, these algorithms can automatically engineer features that are suitable for many tasks and customize those features for one or more specific tasks.
Deep learning does many of the same things as more familiar machine learning approaches.
In particular, deep learning approaches can be used both in *supervised* applications---where the goal is to accurately predict one or more labels or outcomes associated with each data point---in the place of regression approaches, as well as in *unsupervised*, or "exploratory" applications---where the goal is to summarize, explain, or identify interesting patterns in a data set---as a form of clustering.
Deep learning methods may in fact combine both of these steps.
When sufficient data are available and labeled, these methods construct features tuned to a specific problem and combine those features into a predictor.
In fact, if the dataset is "labeled" with binary classes, a simple neural network with no hidden layers and no cycles between units is equivalent to logistic regression if the output layer is a sigmoid (logistic) function of the input layer.
Similarly, for continuous outcomes, linear regression can be seen as a single-layer neural network.
Thus, in some ways, supervised deep learning approaches can be seen as an extension of regression models that allow for greater flexibility and are especially well-suited for modeling non-linear relationships among the input features.
Recently, hardware improvements and very large training datasets have allowed these deep learning techniques to surpass other machine learning algorithms for many problems.
In a famous and early example, scientists from Google demonstrated that a neural network "discovered" that cats, faces, and pedestrians were important components of online videos [@url:http://research.google.com/archive/unsupervised_icml2012.html] without being told to look for them.
What if, more generally, deep learning takes advantage of the growth of data in biomedicine to tackle challenges in this field? Could these algorithms identify the "cats" hidden in our data---the patterns unknown to the researcher---and suggest ways to act on them? In this review, we examine deep learning's application to biomedical science and discuss the unique challenges that biomedical data pose for deep learning methods.
Several important advances make the current surge of work done in this area possible.
Easy-to-use software packages have brought the techniques of the field out of the specialist's toolkit to a broad community of computational scientists.
Additionally, new techniques for fast training have enabled their application to larger datasets [@arxiv:1106.5730].
Dropout of nodes, edges, and layers makes networks more robust, even when the number of parameters is very large.
Finally, the larger datasets now available are also sufficient for fitting the many parameters that exist for deep neural networks.
The convergence of these factors currently makes deep learning extremely adaptable and capable of addressing the nuanced differences of each domain to which it is applied.
![Neural networks come in many different forms.
Left: a key for the various types of nodes used in neural networks.
Simple FFNN: a feed forward neural network in which inputs are connected via some function to an output node and the model is trained to produce some output for a set of inputs.
MLP: the multi-layer perceptron is a feed forward neural network in which there is at least one hidden layer between the input and output nodes.
CNN: the convolutional neural network is a feed forward neural network in which the inputs are grouped spatially into hidden nodes.
In the case of this example, each input node is only connected to hidden nodes alongside their neighboring input node.
Autoencoder: a type of MLP in which the neural network is trained to produce an output that matches the input to the network.
RNN: a deep recurrent neural network is used to allow the neural network to retain memory over time or sequential inputs.
This figure was inspired by the [Neural Network Zoo](http://www.asimovinstitute.org/neural-network-zoo/) by Fjodor Van Veen.
<!-- Figure source at https://docs.google.com/drawings/d/1TEzlqiCewmJ-EITvogMqDz7N6CizMtBvWE4H_BXqnuQ -->
](images/petting-zoo.svg){#fig:nn-petting-zoo .white}
This review discusses recent work in the biomedical domain, and most successful applications select neural network architectures that are well suited to the problem at hand.
We sketch out a few simple example architectures in Figure @fig:nn-petting-zoo.
If data have a natural adjacency structure, a convolutional neural network (CNN) can take advantage of that structure by emphasizing local relationships, especially when convolutional layers are used in early layers of the neural network.
Other neural network architectures such as autoencoders require no labels and are now regularly used for unsupervised tasks.
In this review, we do not exhaustively discuss the different types of deep neural network architectures; an overview of the principal terms used herein is given in Table @tbl:glossary.
Table @tbl:glossary also provides select example applications, though in practice each neural network architecture has been broadly applied across multiple types of biomedical data.
A recent book from Goodfellow et al. covers neural network architectures in detail [@tag:goodfellow2016deep], and LeCun et al. provide a more general introduction [@doi:10.1038/nature14539].
| Term | Definition | Example applications |
|-----|----------|----------|
| Supervised learning | Machine-learning approaches with goal of prediction of labels or outcomes | |
| Unsupervised learning | Machine-learning approaches with goal of data summarization or pattern identification | |
| Neural network (NN) | Machine-learning approach inspired by biological neurons where inputs are fed into one or more layers, producing an output layer | |
| Deep neural network | NN with multiple hidden layers. Training happens over the network, and consequently such architectures allow for feature construction to occur alongside optimization of the overall training objective. | |
| Feed-forward neural network (FFNN) | NN that does not have cycles between nodes in the same layer | Most of the examples below are special cases of FFNNs, except recurrent neural networks. |
| Multi-layer perceptron (MLP) | Type of FFNN with at least one hidden layer where each deeper layer is a nonlinear function of each earlier layer | MLPs do not impose structure and are frequently used when there is no natural ordering of the inputs (e.g. as with gene expression measurements). |
| Convolutional neural network (CNN) | A NN with layers in which connectivity preserves local structure. _If the data meet the underlying assumptions_ performance is often good, and such networks can require fewer examples to train effectively because they have fewer parameters and also provide improved efficiency. | CNNs are used for sequence data---such as DNA sequences---or grid data---such as medical and microscopy images. |
| Recurrent neural network (RNN) | A neural network with cycles between nodes within a hidden layer. | The RNN architecture is used for sequential data---such as clinical time series and text or genome sequences. |
| Long short-term memory (LSTM) neural network | This special type of RNN has features that enable models to capture longer-term dependencies. | LSTMs are gaining a substantial foothold in the analysis of natural language, and may become more widely applied to biological sequence data. |
| Autoencoder (AE) | A NN where the training objective is to minimize the error between the output layer and the input layer. Such neural networks are unsupervised and are often used for dimensionality reduction. | Autoencoders have been used for unsupervised analysis of gene expression data as well as data extracted from the electronic health record. |
| Variational autoencoder (VAE) | This special type of generative AE learns a probabilistic latent variable model. | VAEs have been shown to often produce meaningful reduced representations in the imaging domain, and some early publications have used VAEs to analyze gene expression data. |
| Denoising autoencoder (DA) | This special type of AE includes a step where noise is added to the input during the training process. The denoising step acts as smoothing and may allow for effective use on input data that is inherently noisy. | Like AEs, DAs have been used for unsupervised analysis of gene expression data as well as data extracted from the electronic health record. |
| Generative neural network | Neural networks that fall into this class can be used to generate data similar to input data. These models can be sampled to produce hypothetical examples. | A number of the unsupervised learning neural network architectures that are summarized here can be used in a generative fashion. |
| Restricted Boltzmann machine (RBM) | A generative NN that forms the building block for many deep learning approaches, having a single input layer and a single hidden layer, with no connections between the nodes within each layer | RBMs have been applied to combine multiple types of omic data (e.g. DNA methylation, mRNA expression, and miRNA expression). |
| Deep belief network (DBN) | Generative NN with several hidden layers, which can be obtained from combining multiple RBMs | DBNs can be used to predict new relationships in a drug-target interaction network. |
| Generative adversarial network (GAN) | A generative NN approach where two neural networks are trained. One neural network, the generator, is provided with a set of randomly generated inputs and tasked with generating samples. The second, the discriminator, is trained to differentiate real and generated samples. After the two neural networks are trained against each other, the resulting generator can be used to produce new examples. | GANs can synthesize new examples with the same statistical properties of datasets that contain individual-level records and are subject to sharing restrictions. They have also been applied to generate microscopy images. |
| Adversarial training | A process by which artificial training examples are maliciously designed to fool a NN and then input as training examples to make the resulting NN robust (no relation to GANs) | Adversarial training has been used in image analysis. |
| Data augmentation | A process by which transformations that do not affect relevant properties of the input data (e.g. arbitrary rotations of histopathology images) are applied to training examples to increase the size of the training set. | Data augmentation is widely used in the analysis of images because rotation transformations for biomedical images often do not change relevant properties of the image. |
Table: Glossary.
{#tbl:glossary}
While deep learning shows increased flexibility over other machine learning approaches, as seen in the remainder of this review, it requires large training sets in order to fit the hidden layers, as well as accurate labels for the supervised learning applications.
For these reasons, deep learning has recently become popular in some areas of biology and medicine, while having lower adoption in other areas.
At the same time, this highlights the potentially even larger role that it may play in future research, given the increases in data in all biomedical fields.
It is also important to see it as a branch of machine learning and acknowledge that it has the same limitations as other approaches in that field.
In particular, the results are still dependent on the underlying study design and the usual caveats of correlation versus causation still apply---a more precise answer is only better than a less precise one if it answers the correct question.
### Will deep learning transform the study of human disease?
With this review, we ask the question: what is needed for deep learning to transform how we categorize, study, and treat individuals to maintain or restore health? We choose a high bar for "transform." Andrew Grove, the former CEO of Intel, coined the term Strategic Inflection Point to refer to a change in technologies or environment that requires a business to be fundamentally reshaped [@url:http://www.intel.com/pressroom/archive/speeches/ag080998.htm].
Here, we seek to identify whether deep learning is an innovation that can induce a Strategic Inflection Point in the practice of biology or medicine.
There are already a number of reviews focused on applications of deep learning in biology [@doi:10.1038/nbt.3313; @doi:10.1021/acs.molpharmaceut.5b00982; @tag:Angermueller2016_dl_review; @doi:10.1093/bib/bbw068; @doi:10.3109/10409238.2015.1135868], healthcare [@doi:10.1093/bib/bbx044; @tag:Litjens2017_medimage_survey; @tag:Kalinin2018_pgx], and drug discovery [@doi:10.1002/minf.201501008; @doi:10.1002/jcc.24764; @tag:PerezSianes2016_screening; @tag:Baskin2015_drug_disc].
Under our guiding question, we sought to highlight cases where deep learning enabled researchers to solve challenges that were previously considered infeasible or makes difficult, tedious analyses routine.
We also identified approaches that researchers are using to sidestep challenges posed by biomedical data.
We find that domain-specific considerations have greatly influenced how to best harness the power and flexibility of deep learning.
Model interpretability is often critical.
Understanding the patterns in data may be just as important as fitting the data.
In addition, there are important and pressing questions about how to build networks that efficiently represent the underlying structure and logic of the data.
Domain experts can play important roles in designing networks to represent data appropriately, encoding the most salient prior knowledge and assessing success or failure.
There is also great potential to create deep learning systems that augment biologists and clinicians by prioritizing experiments or streamlining tasks that do not require expert judgment.
We have divided the large range of topics into three broad classes: Disease and Patient Categorization, Fundamental Biological Study, and Treatment of Patients.
Below, we briefly introduce the types of questions, approaches and data that are typical for each class in the application of deep learning.
#### Disease and patient categorization
A key challenge in biomedicine is the accurate classification of diseases and disease subtypes.
In oncology, current "gold standard" approaches include histology, which requires interpretation by experts, or assessment of molecular markers such as cell surface receptors or gene expression.
One example is the PAM50 approach to classifying breast cancer where the expression of 50 marker genes divides breast cancer patients into four subtypes.
Substantial heterogeneity still remains within these four subtypes [@doi:10.1200/JCO.2008.18.1370; @doi:10.1158/1078-0432.CCR-13-0583].
Given the increasing wealth of molecular data available, a more comprehensive subtyping seems possible.
Several studies have used deep learning methods to better categorize breast cancer patients: For instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients [@doi:10.1142/9789814644730_0014], and CNNs can help count mitotic divisions, a feature that is highly correlated with disease outcome in histological images [@doi:10.1007/978-3-642-40763-5_51].
Despite these recent advances, a number of challenges exist in this area of research, most notably the integration of molecular and imaging data with other disparate types of data such as electronic health records (EHRs).
#### Fundamental biological study
Deep learning can be applied to answer more fundamental biological questions; it is especially suited to leveraging large amounts of data from high-throughput
"omics" studies.
One classic biological problem where machine learning, and now deep learning, has been extensively applied is molecular target prediction.
For example, deep recurrent neural networks (RNNs) have been used to predict gene targets of microRNAs [@doi:10.1109/icnn.1994.374637], and CNNs have been applied to predict protein residue-residue contacts and secondary structure [@doi:10.1371/journal.pcbi.1005324; @doi:10.1109/TCBB.2014.2343960; @doi:10.1038/srep18962].
Other recent exciting applications of deep learning include recognition of functional genomic elements such as enhancers and promoters [@doi:10.1101/036129; @doi:10.1007/978-3-319-16706-0_20; @doi:10.1093/nar/gku1058] and prediction of the deleterious effects of nucleotide polymorphisms [@doi:10.1093/bioinformatics/btu703].
#### Treatment of patients
Although the application of deep learning to patient treatment is just beginning, we expect new methods to recommend patient treatments, predict treatment outcomes, and guide the development of new therapies.
One type of effort in this area aims to identify drug targets and interactions or predict drug response.
Another uses deep learning on protein structures to predict drug interactions and drug bioactivity [@tag:Wallach2015_atom_net].
Drug repositioning using deep learning on transcriptomic data is another exciting area of research [@doi:10.1021/acs.molpharmaceut.6b00248].
Restricted Boltzmann machines (RBMs) can be combined into deep belief networks (DBNs) to predict novel drug-target interactions and formulate drug repositioning hypotheses [@doi:10.1093/bioinformatics/btt234; @doi:10.1021/acs.jproteome.6b00618].
Finally, deep learning is also prioritizing chemicals in the early stages of drug discovery for new targets [@tag:Baskin2015_drug_disc].