A multi-layer perceptron (MLP) consists of at least three sets of nodes: an input layer, one or more hidden layers, and an output layer. Each node, except for the input nodes, is a neuron that uses a nonlinear activation function. The multiple layers and non-linearities allow a trained MLP to distinguish data that is not linearly separable.
In this example, we create an MLP that classifies handwritten digits from the MNIST dataset.
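Before the model sees them, the 28×28 MNIST images are flattened into vectors and the digit labels are one-hot encoded. The sketch below shows roughly what that preparation can look like; it assumes the MLDatasets package for the data and is illustrative rather than the exact code from mlp_mnist.jl.

```julia
# Minimal sketch of loading and reshaping MNIST for an MLP
# (assumes MLDatasets; not the exact code from mlp_mnist.jl).
using Flux, MLDatasets

# Load the training split: 28×28 grayscale images and digit labels 0–9.
train_data = MLDatasets.MNIST(split=:train)

# Flatten each 28×28 image into a 784-element column vector,
# and one-hot encode the labels for a 10-class classifier.
x_train = Flux.flatten(train_data.features)          # 784 × 60000 matrix
y_train = Flux.onehotbatch(train_data.targets, 0:9)  # 10 × 60000 one-hot matrix
```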
Our model uses the simplest Flux layers, namely Dense and Chain. It applies softmax to its outputs and uses crossentropy as the loss function.
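The sketch below shows what such a model might look like in Flux; the layer sizes (784 inputs, 32 hidden units, 10 outputs) are assumptions for illustration and may differ from the settings in mlp_mnist.jl.

```julia
# Rough sketch of the model and its loss; layer sizes are assumptions,
# not necessarily those used in mlp_mnist.jl.
using Flux

model = Chain(
    Dense(784 => 32, relu),  # hidden layer with a nonlinear activation
    Dense(32 => 10),         # output layer, one neuron per digit class
    softmax,                 # turn the outputs into class probabilities
)

# Crossentropy compares the predicted probabilities with one-hot labels.
loss(x, y) = Flux.crossentropy(model(x), y)
```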
For simplicity, this model does not use a graphics card, since an ordinary CPU is fast enough. See, for example, the LeNet convolutional network for GPU usage.
You can copy and paste the example into the Julia REPL to see what each part does. Or you can run it all at once from the terminal, like this:
```bash
cd vision/mlp_mnist
julia --project mlp_mnist.jl
```