<!--#include virtual="/header-start.html" -->
<title>A Visual And Interactive Look at Basic Neural Network Math</title>
<meta content="https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/" name="url">
<meta content="Alammar, Jay" name="author">
<meta content="Jay Alammar" name="copyright">
<meta content="A Visual And Interactive Look at Basic Neural Network Math" property="og:title"/>
<meta content="A Visual And Interactive Look at Basic Neural Network Math" property="twitter:title"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.16.0/d3.js" type="text/javascript"></script>
<script src="../js/d3-selection-multi.v0.4.min.js" type="text/javascript"></script>
<script src="../js/d3-jetpack.js" type="text/javascript"></script>
<link href="../style.css" rel="stylesheet" type="text/css"/>
<link href="https://jalammar.github.io/feed.xml" rel="alternate"
title="Jay Alammar - Visualizing machine learning one concept at a time." type="application/rss+xml"/>
<script type="text/javascript"> var _paq = _paq || [];</script>
<!--#include virtual="/header-end.html" -->
<div class="prediction">
<p>In the <a href="https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/">previous post, we
    looked at the basic concepts of neural networks</a>. Let us now work through another example to explore some of
    the basic mathematical ideas involved in making predictions with neural networks.</p>
<video autoplay="" class="img-div-large" controls="" loop="">
<source src="titanic_nn_calculation.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<p>If you had been aboard the Titanic, would you have survived the sinking? Let’s build a model to predict one’s
    odds of survival.</p>
<p>This will be a neural network model building on what we discussed in the <a
        href="https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/">previous post</a>, but it
    will have a higher prediction accuracy because it utilizes hidden layers and activation functions.</p>
<p>The dataset we’ll use this time will be the <a
        href="https://www.kaggle.com/c/titanic/data">Titanic passenger list</a> from
    Kaggle. It lists the names and other information of the passengers and shows whether each passenger survived the
    sinking or not.</p>
<p>The raw dataset looks like this:</p>
<div class="titanic-dataset">
<table>
<thead>
<tr>
<th>PassengerId</th>
<th>Survived</th>
<th>Pclass</th>
<th>Name</th>
<th>Sex</th>
<th>Age</th>
<th>SibSp</th>
<th>Parch</th>
<th>Ticket</th>
<th>Fare</th>
<th>Cabin</th>
<th>Embarked</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>3</td>
<td>Braund, Mr. Owen Harris</td>
<td>male</td>
<td>22.0</td>
<td>1</td>
<td>0</td>
<td>A/5 21171</td>
<td>7.2500</td>
<td>NaN</td>
<td>S</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>1</td>
<td>Cumings, Mrs. John Bradley (Florence Briggs Th…</td>
<td>female</td>
<td>38.0</td>
<td>1</td>
<td>0</td>
<td>PC 17599</td>
<td>71.2833</td>
<td>C85</td>
<td>C</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>3</td>
<td>Heikkinen, Miss. Laina</td>
<td>female</td>
<td>26.0</td>
<td>0</td>
<td>0</td>
<td>STON/O2. 3101282</td>
<td>7.9250</td>
<td>NaN</td>
<td>S</td>
</tr>
</tbody>
</table>
</div>
<p>We won’t bother with most of the columns for now. We’ll just use the sex and age columns as our features, and
survival as our label that we’ll try to predict.</p>
<div class="two_variables">
<table>
<thead>
<tr>
<th>Age</th>
<th>Sex</th>
<th>Survived?</th>
</tr>
</thead>
<tbody>
<tr>
<td>22</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>38</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>26</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td colspan="3">… 891 rows total</td>
</tr>
</tbody>
</table>
</div>
<p>We’ll attempt to build a network that predicts whether a passenger survived or not.</p>
<p>Neural networks need their inputs to be numeric. So we had to change the sex column – male is now 0, female is 1.
You’ll notice the dataset already uses something similar for the survival column – survived is 1, did not survive is
0.</p>
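<p>As a minimal sketch of that preprocessing step (assuming pandas and that the Kaggle file is saved locally as
    train.csv – neither is shown in this post, so treat both as assumptions):</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>import pandas as pd

passengers = pd.read_csv("train.csv")  # hypothetical local copy of the Kaggle file

# Make the sex column numeric: male -> 0, female -> 1
passengers["Sex"] = passengers["Sex"].map({"male": 0, "female": 1})

# Keep just our two features and the label
dataset = passengers[["Age", "Sex", "Survived"]]
</code></pre>
</div>
</div>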
<p>The simplest neural network we can use to train to make this prediction looks like this:</p>
<div class="img-div">
<img alt="neural netowrk with two inputs, one output, and sigmoid output activation"
src="./index_files/two-input-one-output-sigmoid-network.png"> Calculating a prediction is done by plugging in a
value for "age" and "sex". The calculation then flows from the left to the right. Before we can use this net for
prediction, however, we'll have to run a "training" process that will give us the values for the weights (<span
class="weight-node-text">w</span>) and bias (<span class="bias-node-text">b</span>). <br> Note: we have slightly
adjusted the way we represent the networks from the previous post. The bias node specifically is more commonly
represented like this.
</div>
<p>Let’s recap the elements that make up this network and how they work:</p>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="input neuron" src="./index_files/input-neuron.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>An input neuron is where we plug in an input value (e.g. the age of a person). It’s where the calculation
starts. The outgoing connection and the rest of the graph tell us what other calculations we need to do to
calculate a prediction.</p>
</div>
</div>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="weighted neuron image" src="./index_files/weight.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>If a connection has a weight, then the value is multiplied by that weight as it passes through it.</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>connection_output = weight * connection_input
</code></pre>
</div>
</div>
</div>
</div>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<img alt=" neuron image" src="./index_files/neuron.png">
</div>
<div class="col-sm-8 side-column">
<p>If a neuron has inputs, it sums their values and sends that sum along its outgoing connection(s).</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>node_output = input_1 + input_2
</code></pre>
</div>
</div>
</div>
</div>
<section>
<h2>Sigmoid</h2>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="sigmoid neuron" src="./index_files/sigmoid-neuron.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>To turn the network’s calculation into a probability value between 0 and 1, we have to pass the value from
the output layer through a “sigmoid” formula. Sigmoid squashes the output value of a neuron to between 0 and 1
according to a specific curve.</p>
<p>`f(x)=1/(1+e^-x)`</p>
<p>where e is the mathematical constant, approximately equal to 2.71828.</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>def sigmoid(x):
return 1/(1 + math.exp(-x))
output = sigmoid(value)
</code></pre>
</div>
</div>
</div>
</div>
</section>
<section>
<h2>Sigmoid Visualization</h2>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<!-- ==== SIGMOID ACTIVATION GRAPH ==== -->
<table>
<tbody>
<tr>
<td class="sigmoid-input-value explicit-slider-weight-value">-1.63</td>
<td><img alt="sigmoid neuron" src="./index_files/sigmoid-neuron.png"></td>
<td class="explicit-activation-output-value">0.16383036122</td>
</tr>
</tbody>
</table>
</div>
<div class="col-sm-8 side-column">
<p>Interact a little with sigmoid to see how it transforms various values:</p>
<!-- ==== SIGMOID SLIDER ==== -->
<table class="activation-graph-slider">
<tbody>
<tr>
<td>
<input class="weight" id="sigmoid-slider" max="20" min="-20" step="0.01"
style="width: 320px; margin-left: 40px;" type="range">
</td>
<td class="slider-value">
<span class="weight sigmoid-input-value">-1.63</span>
</td>
</tr>
</tbody>
</table>
<!-- ==== SIGMOID FORMULA ==== -->
<p style="margin-left:40px">f(<span class="slider-value"><span
class="sigmoid-input-value weight">-1.63</span></span>) = <span
id="sigmoid_function" style="font-size:200%">`1/(1+e^-(x))`</span> = <span
id="sigmoid-result">0.16383036122</span></p>
<!-- ==== SIGMOID GRAPH ==== -->
<div id="sigmoid-graph" style="width:100%">
<svg class="activation-graph" height="160" width="400">
<g transform="translate(60,30)">
<g class="x-axis" fill="none" font-family="sans-serif" font-size="10" text-anchor="middle"
transform="translate(0,110)">
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-20</text>
</g>
<g class="tick" opacity="1" transform="translate(35,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-15</text>
</g>
<g class="tick" opacity="1" transform="translate(70,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-10</text>
</g>
<g class="tick" opacity="1" transform="translate(105,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-5</text>
</g>
<g class="tick" opacity="1" transform="translate(140,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">0</text>
</g>
<g class="tick" opacity="1" transform="translate(175,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">5</text>
</g>
<g class="tick" opacity="1" transform="translate(210,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">10</text>
</g>
<g class="tick" opacity="1" transform="translate(245,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">15</text>
</g>
<g class="tick" opacity="1" transform="translate(280,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">20</text>
</g>
</g>
<g class="y-axis" fill="none" font-family="sans-serif" font-size="10" text-anchor="end">
<path class="domain" d="M-6,110.5H0.5V0.5H-6" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(0,110)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">0.0</text>
</g>
<g class="tick" opacity="1" transform="translate(0,55)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">0.5</text>
</g>
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">1.0</text>
</g>
<text dy="0.71em" fill="#000" text-anchor="end" transform="rotate(-90)" y="6">Output</text>
</g>
<g class="grid" fill="none" font-family="sans-serif" font-size="10" text-anchor="middle"
transform="translate(0,110)">
<path class="domain" d="M0.5,-110V0.5H280.5V-110" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(140,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="-110"></line>
<text dy="0.71em" fill="#000" x="0.5" y="3"></text>
</g>
</g>
<g class="grid" fill="none" font-family="sans-serif" font-size="10" text-anchor="end">
<path class="domain" d="M280,110.5H0.5V0.5H280" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(0,110)">
<line stroke="#000" x2="280" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-3" y="0.5"></text>
</g>
<g class="tick" opacity="1" transform="translate(0,55)">
<line stroke="#000" x2="280" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-3" y="0.5"></text>
</g>
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x2="280" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-3" y="0.5"></text>
</g>
</g>
<path class="sigmoid-line"
d="M0,110C0,110,5.833333333333334,110,7,110C8.166666666666666,110,12.833333333333334,110,14,110C15.166666666666666,110,19.833333333333332,110,21,110C22.166666666666668,110,26.833333333333332,110,28,110C29.166666666666668,110,33.833333333333336,110,35,110C36.166666666666664,110,40.833333333333336,110,42,110C43.166666666666664,110,47.833333333333336,110,49,110C50.166666666666664,110,54.833333333333336,110,56,110C57.166666666666664,110,61.833333333333336,110,63,110C64.16666666666667,110,68.83333333333333,110,70,110C71.16666666666667,110,75.83333333333333,110,77,110C78.16666666666667,110,82.83333333333333,110,84,110C85.16666666666667,110,89.83333333333333,110,91,110C92.16666666666667,110,96.83333333333333,110.08333333333333,98,110C99.16666666666667,109.91666666666667,103.83333333333333,109.16666666666667,105,109C106.16666666666667,108.83333333333333,110.83333333333333,108.33333333333333,112,108C113.16666666666667,107.66666666666667,117.83333333333333,105.91666666666667,119,105C120.16666666666667,104.08333333333333,124.83333333333333,99.08333333333333,126,97C127.16666666666667,94.91666666666667,131.83333333333334,83.5,133,80C134.16666666666666,76.5,138.83333333333334,59.166666666666664,140,55C141.16666666666666,50.833333333333336,145.83333333333334,33.5,147,30C148.16666666666666,26.5,152.83333333333334,15.083333333333332,154,13C155.16666666666666,10.916666666666668,159.83333333333334,5.916666666666667,161,5C162.16666666666666,4.083333333333333,166.83333333333334,2.3333333333333335,168,2C169.16666666666666,1.6666666666666667,173.83333333333334,1.1666666666666667,175,1C176.16666666666666,0.8333333333333334,180.83333333333334,0.08333333333333333,182,0C183.16666666666666,-0.08333333333333333,187.83333333333334,0,189,0C190.16666666666666,0,194.83333333333334,0,196,0C197.16666666666666,0,201.83333333333334,0,203,0C204.16666666666666,0,208.83333333333334,0,210,0C211.16666666666666,0,215.83333333333334,0,217,0C218.16666666666666,0,222.83333333333334,0,224,0C225.16666666666666,0,229.83333333333334,0,231,0C232.16666666666666,0,236.83333333333334,0,238,0C239.16666666666666,0,243.83333333333334,0,245,0C246.16666666666666,0,250.83333333333334,0,252,0C253.16666666666666,0,257.8333333333333,0,259,0C260.1666666666667,0,264.8333333333333,0,266,0C267.1666666666667,0,271.8333333333333,0,273,0C274.1666666666667,0,280,0,280,0"></path>
<g class="value-point" transform="translate(129,92)">
<ellipse class="sigmoid-value-dot" cx="0" cy="0" rx="5" ry="5"></ellipse>
<text class="sigmoid-value-text" fill="red" font-size="12" text-anchor="middle" y="-8">0.16383036122
</text>
</g>
</g>
</svg>
</div>
</div>
</div>
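<p>The value shown above can be reproduced directly with the sigmoid function from earlier – a small self-contained
    check:</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

print(sigmoid(-1.63))  # ≈ 0.16383..., matching the value plotted on the curve above
</code></pre>
</div>
</div>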
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="weighted neuron image" src="./index_files/two-input-one-output-sigmoid-network.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>To bring it all together, calculating a prediction with this shallow network looks like this:</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>def sigmoid(x):
return 1/(1 + math.exp(-x))
def calculate_prediction(age, sex, weight_1, weight_2, bias):
# Multiply the inputs by their weights, sum the results up
layer_2_node = age * weight_1 + sex * weight_2 + 1 * bias
prediction = sigmoid(layer_2_node)
return prediction
</code></pre>
</div>
</div>
</div>
</div>
<p>Now that we know the structure of our network, we can train it using gradient descent running on the first 600
rows of the 891-row dataset. I will not be addressing the training process in this post because that’s a separate
concern at the moment. For now, I just want you to be comfortable with how a trained network calculates a
prediction. Once you get this intuition down, we’ll proceed to training in a future post.</p>
<p>The training process gives us the following values (with an accuracy of 73.20%):</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>weight_1 = -0.016852 # Associated with "Age"
weight_2 = 0.704039 # Associated with "Sex" (where male is 0, female is 1)
bias = -0.116309
</code></pre>
</div>
</div>
<p>Intuitively, the weights indicate how much their associated property contributes to the prediction – the odds of
    survival improve the younger a person is (since a larger age multiplied by the negative weight value gives a
    bigger negative number), and they improve further if the person is female.</p></section>
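<p>As a quick sanity check of these numbers, here is the prediction for the first passenger in the table below
    (age 22, male) – a sketch that simply reuses the calculate_prediction function from earlier with the trained
    values:</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def calculate_prediction(age, sex, weight_1, weight_2, bias):
    return sigmoid(age * weight_1 + sex * weight_2 + 1 * bias)

# Trained values from above
weight_1, weight_2, bias = -0.016852, 0.704039, -0.116309

# 22-year-old male passenger (sex = 0)
print(calculate_prediction(22, 0, weight_1, weight_2, bias))  # ≈ 0.38
</code></pre>
</div>
</div>
<p>A value below 0.5 can be read as the network leaning toward “did not survive” for this passenger, which matches
    the label in the dataset.</p>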
<section>
<h2>Prediction Calculation</h2>
<p>The trained network now looks like this: (hover or click on values in the table to see their individual
predictions)</p>
<div class="row vertical-align">
<div class="col-sm-3" id="neural-network-calculation-table">
<table class="collapsed-style">
<thead>
<tr>
<th class="title"></th>
<th class="title">Age</th>
<th class="center">Sex</th>
<th class="center">Survived</th>
</tr>
</thead>
<tbody>
<tr>
<td class="title"><input class="radio_0" name="person" type="radio"></td>
<td class="title">22</td>
<td class="center">0</td>
<td class="center">0</td>
</tr>
<tr>
<td class="title"><input class="radio_1" name="person" type="radio"></td>
<td class="title">38</td>
<td class="center">1</td>
<td class="center">1</td>
</tr>
<tr>
<td class="title"><input class="radio_2" name="person" type="radio"></td>
<td class="title">26</td>
<td class="center">1</td>
<td class="center">1</td>
</tr>
<tr>
<td class="title"><input class="radio_3" name="person" type="radio"></td>
<td class="title">35</td>
<td class="center">1</td>
<td class="center">1</td>
</tr>
<tr>
<td class="title"><input class="radio_4" name="person" type="radio"></td>
<td class="title">35</td>
<td class="center">0</td>
<td class="center">0</td>
</tr>
<tr>
<td class="title"><input class="radio_5" name="person" type="radio"></td>
<td class="title">14</td>
<td class="center">1</td>
<td class="center">0</td>
</tr>
<tr>
<td class="title"><input class="radio_6" name="person" type="radio"></td>
<td class="title">25</td>
<td class="center">0</td>
<td class="center">0</td>
</tr>
<tr>
<td class="title"><input class="radio_7" name="person" type="radio"></td>
<td class="title">54</td>
<td class="center">0</td>
<td class="center">0</td>
</tr>
</tbody>
</table>
</div>
<div class="col-sm-9" id="neural-network-calculation-viz">
<svg height="300" width="500">
<g>
<line class="nn-arrow softmax-to-output-line" marker-end="url(#arrow)" x1="413.5" x2="440" y1="150"
y2="150"></line>
<text class="softmax-output-class-name" x="413.5" y="175">Survived</text>
<line class="nn-arrow bias-to-softmax-line" marker-end="url(#arrow)" x1="300" x2="357.5" y1="150"
y2="150"></line>
<line class="nn-arrow input-to-bias-line input-0 output-0" x1="35" x2="300" y1="35" y2="150"></line>
<line class="nn-arrow input-to-bias-line input-1 output-0" x1="35" x2="300" y1="145" y2="150"></line>
<line class="nn-arrow input-to-bias-line input-2 output-0" x1="35" x2="300" y1="255" y2="150"></line>
<g class="input-group" transform="translate(35,35)">
<circle class="outlined-input-node nn-node" cx="0" cy="0" r="25"></circle>
<text class="node-text" id="input-name" text-anchor="middle" x="0" y="5">Age</text>
</g>
<g class="input-group" transform="translate(35,145)">
<circle class="outlined-input-node nn-node" cx="0" cy="0" r="25"></circle>
<text class="node-text" id="input-name" text-anchor="middle" x="0" y="5">Sex</text>
</g>
<g class="input-group" transform="translate(35,255)">
<circle class="outlined-bias-node nn-node" cx="0" cy="0" r="25"></circle>
<text class="node-text" id="input-name" text-anchor="middle" x="0" y="5">Bias</text>
</g>
<g class="weight-group input-0-weight output-0-weight" transform="translate(100,63.20754716981132)">
<ellipse class="outlined-weight-node nn-node" cx="0" cy="0" rx="22" ry="10"></ellipse>
<text class="weightNodeText" id="weight0Value" text-anchor="middle" y="3">-0.0169</text>
</g>
<g class="weight-group input-1-weight output-0-weight" transform="translate(100,146.22641509433961)">
<ellipse class="outlined-weight-node nn-node" cx="0" cy="0" rx="22" ry="10"></ellipse>
<text class="weightNodeText" id="weight0Value" text-anchor="middle" y="3">0.704</text>
</g>
<g class="weight-group input-2-weight output-0-weight" transform="translate(100,229.24528301886792)">
<ellipse class="outlined-bias-node nn-node" cx="0" cy="0" rx="22" ry="10"></ellipse>
<text class="weightNodeText" id="weight0Value" text-anchor="middle" y="3">-0.1163</text>
</g>
<g class="output-group" transform="translate(300,150)">
<circle class="outlined-output-node nn-node" cx="0" cy="0" r="25"></circle>
<text class="node-text" id="output-name" text-anchor="middle" x="0" y="5"></text>
</g>
<g class="sigmoid activation" transform="translate(382.5,150)">
<rect class="outlined-sigmoid-node nn-node" height="140" rx="6.25" ry="6.25" width="37.5" x="-18.75"
y="-70"></rect>
<text id="sigmoid-label" text-anchor="middle" x="0" y="-2">σ</text>
</g>
</g>
<defs>
<marker id="arrow" markerHeight="4" markerWidth="4" orient="auto" refX="5" refY="0" viewBox="0 -5 10 10">
<path class="arrowHead" d="M0,-5L10,0L0,5"></path>
</marker>
</defs>
</svg>
</div>
</div>
<div class="nn-tooltip" style="opacity: 0"></div>
<p>An accuracy of 73.20% isn’t very impressive. This is a case that can benefit from adding a hidden layer. Hidden
layers give the model more capacity to represent more sophisticated prediction functions that may do a better job
(<a href="https://www.deeplearningbook.org/contents/ml.html">Deep Learning ch.5 page 113</a>).</p>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="weighted neuron with activation" src="./index_files/neuron_with_activation.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>It’s often useful to apply certain math functions to the weighted outputs. These are called “activation
functions” because historically they translated the output of the neuron into either 1 (On/active) or 0
(Off).</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>def activation_function(x):
# Do something to the value
...
weighted_sum = weight * (input_1 + input_2)
output = activation_function(weighted_sum)
</code></pre>
</div>
</div>
<p>Activation functions are vital for hidden layers. Without them, deep networks would be no better than a
shallow linear network. Read the “Commonly used activation functions” section from <a
href="https://cs231n.github.io/neural-networks-1/">Neural Networks Part 1: Setting up the Architecture</a>
for a look at various activation functions.</p>
</div>
</div>
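<p>A tiny numeric check of that claim, using made-up scalar weights: two stacked linear layers with no activation
    in between compute exactly the same function as a single linear layer.</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code># Made-up scalar weights, for illustration only
w1, b1 = 0.8, 0.3    # "layer 1"
w2, b2 = -1.5, 0.7   # "layer 2"

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

# The same computation collapsed into a single linear layer
w_combined = w2 * w1
b_combined = w2 * b1 + b2

for x in (-2.0, 0.5, 3.0):
    assert abs(two_linear_layers(x) - (w_combined * x + b_combined)) < 1e-12
</code></pre>
</div>
</div>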
</section>
<section>
<h3 id="relu-">ReLU</h3>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<p><img alt="weighted neuron with activation" src="./index_files/relu.png"></p>
</div>
<div class="col-sm-8 side-column">
<p>A leading choice of activation function is ReLU. It returns 0 if its input is negative, and the number itself
      otherwise. Very simple!</p>
<p>f(x) = max(0, x)</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code># Naive scalar relu implementation. In the real world, most calculations are done on vectors
def relu(x):
if x < 0:
return 0
else:
return x
output = relu(value)
</code></pre>
</div>
</div>
</div>
</div>
</section>
<section>
<h2 id="relu-visualization-">ReLU Visualization</h2>
<div class="row neuron-expo vertical-align">
<div class="col-sm-4 small-column">
<!-- ==== RELU ACTIVATION GRAPH ==== -->
<table>
<tbody>
<tr>
<td class="relu-input-value explicit-slider-weight-value">0</td>
<td>
<img alt="Relu" src="./index_files/relu.png">
</td>
<td class="explicit-relu-activation-output-value">0</td>
</tr>
</tbody>
</table>
</div>
<div class="col-sm-8 side-column">
<p>Interact a little with ReLU to see how it transforms various values:</p>
<!-- ==== RELU SLIDER ==== -->
<table class="activation-graph-slider">
<tbody>
<tr>
<td>
<input class="weight" id="relu-slider" max="20" min="-20" step="0.01"
style="width: 320px; margin-left: 40px;" type="range">
</td>
<td class="slider-value">
<span class="weight relu-input-value">0</span>
</td>
</tr>
</tbody>
</table>
<!-- ==== RELU FORMULA ==== -->
<p style="margin-left:40px">f(<span class="slider-value"><span class="relu-input-value weight">0</span></span>)
= max( 0, <span class="mord mathit" id="relu-formula-input"><span
class="relu-value-input-number">0.00</span></span>) = <span id="relu-result">0</span></p>
<!-- ==== RELU GRAPH ==== -->
<div id="relu-graph" style="width:100%">
<svg class="activation-graph" height="160" width="400">
<g transform="translate(60,30)">
<g class="x-axis" fill="none" font-family="sans-serif" font-size="10" text-anchor="middle"
transform="translate(0,110)">
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-20</text>
</g>
<g class="tick" opacity="1" transform="translate(35,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-15</text>
</g>
<g class="tick" opacity="1" transform="translate(70,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-10</text>
</g>
<g class="tick" opacity="1" transform="translate(105,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">-5</text>
</g>
<g class="tick" opacity="1" transform="translate(140,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">0</text>
</g>
<g class="tick" opacity="1" transform="translate(175,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">5</text>
</g>
<g class="tick" opacity="1" transform="translate(210,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">10</text>
</g>
<g class="tick" opacity="1" transform="translate(245,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">15</text>
</g>
<g class="tick" opacity="1" transform="translate(280,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="6"></line>
<text dy="0.71em" fill="#000" x="0.5" y="9">20</text>
</g>
</g>
<g class="y-axis" fill="none" font-family="sans-serif" font-size="10" text-anchor="end">
<path class="domain" d="M-6,110.5H0.5V0.5H-6" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(0,110)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">0</text>
</g>
<g class="tick" opacity="1" transform="translate(0,99)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">2</text>
</g>
<g class="tick" opacity="1" transform="translate(0,88)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">4</text>
</g>
<g class="tick" opacity="1" transform="translate(0,77)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">6</text>
</g>
<g class="tick" opacity="1" transform="translate(0,66)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">8</text>
</g>
<g class="tick" opacity="1" transform="translate(0,55)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">10</text>
</g>
<g class="tick" opacity="1" transform="translate(0,44)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">12</text>
</g>
<g class="tick" opacity="1" transform="translate(0,33)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">14</text>
</g>
<g class="tick" opacity="1" transform="translate(0,22)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">16</text>
</g>
<g class="tick" opacity="1" transform="translate(0,11)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">18</text>
</g>
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x2="-6" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-9" y="0.5">20</text>
</g>
<text dy="0.71em" fill="#000" text-anchor="end" transform="rotate(-90)" y="6">Output</text>
</g>
<g class="grid" fill="none" font-family="sans-serif" font-size="10" text-anchor="middle"
transform="translate(0,110)">
<path class="domain" d="M0.5,-110V0.5H280.5V-110" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(140,0)">
<line stroke="#000" x1="0.5" x2="0.5" y2="-110"></line>
<text dy="0.71em" fill="#000" x="0.5" y="3"></text>
</g>
</g>
<g class="grid" fill="none" font-family="sans-serif" font-size="10" text-anchor="end">
<path class="domain" d="M280,110.5H0.5V0.5H280" stroke="#000"></path>
<g class="tick" opacity="1" transform="translate(0,110)">
<line stroke="#000" x2="280" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-3" y="0.5"></text>
</g>
<g class="tick" opacity="1" transform="translate(0,0)">
<line stroke="#000" x2="280" y1="0.5" y2="0.5"></line>
<text dy="0.32em" fill="#000" x="-3" y="0.5"></text>
</g>
</g>
<path class="relu-line activation-line-straight" d="M0,110L70,110L140,110L210,55L280,0"></path>
<g class="value-point" transform="translate(140,110)">
<ellipse class="relu-value-dot activation-value-dot" cx="0" cy="0" rx="5" ry="5"></ellipse>
<text class="relu-value-text activation-value-text" fill="red" font-size="12" text-anchor="middle"
y="-8">0
</text>
</g>
</g>
</svg>
</div>
</div>
</div>
</section>
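<p>Putting these pieces together: in a network with a hidden layer, each hidden node computes its own weighted sum
    of the inputs plus a bias and applies ReLU, and the output node then combines the hidden outputs and squashes the
    result with sigmoid. The sketch below uses a hypothetical two-node hidden layer with made-up, untrained parameter
    values – this post stops before training such a network, so treat it purely as an illustration of the
    calculation:</p>
<div class="language-plaintext highlighter-rouge">
<div class="highlight"><pre class="highlight"><code>import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict_with_hidden_layer(age, sex, hidden_params, output_weights, output_bias):
    # Hidden layer: one weighted sum + ReLU per hidden node
    hidden_outputs = [relu(age * w_age + sex * w_sex + b)
                      for (w_age, w_sex, b) in hidden_params]
    # Output layer: weighted sum of the hidden outputs, squashed into a probability
    output_sum = sum(h * w for h, w in zip(hidden_outputs, output_weights)) + output_bias
    return sigmoid(output_sum)

# Made-up (untrained) parameters for a two-node hidden layer – illustration only
hidden_params = [(-0.02, 0.7, 0.1),    # node 1: (weight for age, weight for sex, bias)
                 (0.01, -0.3, -0.1)]   # node 2
print(predict_with_hidden_layer(22, 0, hidden_params, [0.5, -0.4], 0.05))
</code></pre>
</div>
</div>
<p>With trained values for all of these parameters, this is the structure that could push the accuracy beyond the
    73.20% of the shallow model above.</p>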
<section>
<h2>Closing</h2>
<p>This post has been parked for more than a year. I had attempted to visualize a deeper network after this point,
but that never materialized. I hope you enjoyed it. Let me know on <a href="https://twitter.com/JayAlammar">@JayAlammar
on Twitter</a> if you have any feedback.</p>
</section>
</div>
<div class="date">Written on December 14, 2016</div>
<div>
<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" rel="license"><img alt="Creative Commons License"
src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" style="border-width:0"/></a><br/>This work is
licensed under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" rel="license">Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License</a>. <br/> Attribution example: <br/> <i>Alammar, Jay
(2018). The Illustrated Transformer [Blog post]. Retrieved from <a
href="https://jalammar.github.io/illustrated-transformer/">https://jalammar.github.io/illustrated-transformer/</a></i>
<br/><br/> Note: If you translate any of the posts, let me know so I can link your translation to the original post.
My email is in the <a href="https://jalammar.github.io/about">about page</a>.
</div>
<script>
(function (i, s, o, g, r, a, m) {
i['GoogleAnalyticsObject'] = r;
i[r] = i[r] || function () {
(i[r].q = i[r].q || []).push(arguments)
};
i[r].l = 1 * new Date();
a = s.createElement(o);
m = s.getElementsByTagName(o)[0];
a.async = 1;
a.src = g;
m.parentNode.insertBefore(a, m)
})(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga');
ga('create', 'UA-71956058-1', 'auto');
ga('send', 'pageview', {
'page': '/feedforward-neural-networks-visual-interactive/',
'title': 'A Visual And Interactive Look at Basic Neural Network Math'
});
</script>
<!--#include virtual="/footer.html" -->
<script src="../js/bootstrap.min.js"></script>
<script>
$(document).ready(function () {
setTimeout(() => {
$.getScript("../js/nnVizUtils.js");
$.getScript("../js/sigmoid_graph.js");
$.getScript("../js/nn_calc.js");
$.getScript("../js/relu_graph.js");
$.getScript("../js/accuracy-graph.js");
}, 4000) // Wait for angular to set the DOM
});
</script>
<style>.mjx-math * {
line-height: 0;
}</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/latest.js?config=AM_CHTML"></script>