Looking inside a digit recognizer

Let's take a look at the internals of the model we trained on the MNIST dataset previously.

Setup

We'll load our data and train our model from scratch since it trains pretty quickly.

Our Model

We can use the show.modelSummary function to see a summary of the layers present in our model.

This model is a 'convolutional' model, a common architecture for image-processing applications. Convolutional models get their name from the 'convolution' operation implemented by some of their layers. There are three layers of interest we will focus on in this example:

Conv2d1: The first convolution

Convolutional layers are able to capture local spatial patterns in images because of how the convolution operation works. Convolution works similarly to image filters in common image-editing tools or CSS filters; in this case, the filters are applied to the image to extract information useful for classification. You can learn more about how convolution works from this interactive explanation. The training process learns appropriate 'filters' to help with the digit recognition task.
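To make the operation concrete, here is a minimal plain-JavaScript sketch of a single 2D convolution ('valid' padding, stride 1) applying a vertical-edge filter. The image and filter values are made up for illustration; in the trained model, the filter values are learned.

```javascript
// Minimal 2D convolution ('valid' padding, stride 1) in plain JavaScript.
// `image` is a 2D array of pixel values; `filter` is a small 2D kernel.
function conv2d(image, filter) {
  const outH = image.length - filter.length + 1;
  const outW = image[0].length - filter[0].length + 1;
  const out = [];
  for (let y = 0; y < outH; y++) {
    const row = [];
    for (let x = 0; x < outW; x++) {
      // Slide the filter over the image and sum the elementwise products.
      let sum = 0;
      for (let fy = 0; fy < filter.length; fy++) {
        for (let fx = 0; fx < filter[0].length; fx++) {
          sum += image[y + fy][x + fx] * filter[fy][fx];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// A 4x4 'image' with a bright right half, and a 3x3 vertical-edge filter.
const image = [
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
  [0, 0, 1, 1],
];
const verticalEdge = [
  [-1, 0, 1],
  [-1, 0, 1],
  [-1, 0, 1],
];
console.log(conv2d(image, verticalEdge)); // → [[3, 3], [3, 3]]
```

The output is largest where the filter's pattern (dark-to-bright from left to right) matches the image, which is exactly how a learned edge filter responds to strokes in a digit.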

Before we take an in-depth look at the filters, we can get some summary information about this layer using the show.layer API.

The conv2d_Conv2D1/kernel weight represents the filters used in this convolution. Let's take a look at these filters and how they transform their input.

First we will define some helper functions.

With that we can look at how this layer transforms some example images.

This layer learns filters that tend to detect lower-level features such as edges.

Conv2d2: The second convolution

The second convolutional layer extracts higher-level features from the output of the prior layer. It has twice as many filters as the first convolutional layer, so 16 different 'images' (activation maps) will be produced for each input tensor to this layer. We can see a summary of this layer and then look at the resulting activations.
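You can work out the sizes of these activation maps by hand. The sketch below traces a 28x28 MNIST input through two conv + max-pool stages; the 5x5 kernels, 'valid' padding, and 2x2 pooling are assumptions based on the standard tfjs MNIST example, so check them against the layer summaries above.

```javascript
// Output size of a 'valid' convolution or pooling step:
// floor((in - window) / stride) + 1.
function outSize(inSize, window, stride) {
  return Math.floor((inSize - window) / stride) + 1;
}

// Trace one spatial dimension of a 28x28 grayscale input.
let size = 28;
size = outSize(size, 5, 1); // first convolution, 5x5 kernel -> 24
size = outSize(size, 2, 2); // 2x2 max pooling               -> 12
size = outSize(size, 5, 1); // second convolution, 5x5 kernel -> 8
size = outSize(size, 2, 2); // 2x2 max pooling               -> 4
console.log(size); // → 4
// With 16 filters in the second convolution, its pooled output is a
// 4x4x16 tensor: 16 small 'images', one per filter.
```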

dense_Dense1: The Dense layer

The dense layer is responsible for combining all the information from the previous layers, multiplying those values by its weights to generate the final prediction.

The 'activations' from this layer are actually our model's final output: 10 values that represent the probability that the image depicts each digit (0-9).
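Here is a plain-JavaScript sketch of what such a layer computes: a weighted sum per output class, followed by softmax to turn the raw scores into probabilities. The input activations and weights below are made up for illustration and are much smaller than the real layer's.

```javascript
// A dense layer: output[j] = sum_i(input[i] * weights[i][j]) + bias[j].
function dense(input, weights, bias) {
  return bias.map((b, j) =>
    input.reduce((sum, x, i) => sum + x * weights[i][j], b)
  );
}

// Softmax turns raw scores (logits) into probabilities that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((v) => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / total);
}

// Tiny made-up example: 3 input activations, 10 output classes, with
// weights that favor class 7.
const input = [0.5, 1.0, 0.2];
const weights = Array.from({ length: 3 }, () =>
  Array.from({ length: 10 }, (_, j) => (j === 7 ? 1 : 0.1))
);
const bias = new Array(10).fill(0);
const probs = softmax(dense(input, weights, bias));
console.log(probs.indexOf(Math.max(...probs))); // → 7, the predicted digit
```

The predicted digit is simply the index of the largest probability, which is what the classifier reports for each input image.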

That's it

An interesting exercise for you is to reload this page, load the data, and then look at the activations before training the model. This will show you what those activations look like when the model is randomly initialized. Compare them to the activations produced after the model is trained.