In this article we will walk step by step through creating a deep learning program using Python and the Keras library.

We hope this guide will serve as a starting point for building your own models in the future.

The steps we are going to illustrate in detail are the following:

  1. Load data
  2. Define the Keras model
  3. Compile the Keras model
  4. Train the Keras model
  5. Evaluate the Keras model
  6. Tie it all together
  7. Make predictions

First things first: create a new file called my_first_neural_network.py and type or copy and paste the code into the file as you go.

1. Load data

The first step is to import the functions and classes we intend to use in this tutorial.

We will use the NumPy library to load our dataset and we will use two classes from the Keras library to define our model.

The required imports are listed below.

# my neural network in Python with Keras
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense

Alternatively, you may import the same classes from TensorFlow:

# My first neural network
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

We can now load our dataset.

In this tutorial on Keras, we will use the Pima Indians Diabetes Database available on GitHub. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

As such, it is a binary classification problem (onset of diabetes as 1, or not as 0). All input variables describing each patient are numeric. This makes the data easy to use directly with neural networks, which expect numeric input and output values, and ideal for your first neural network in Keras.

Download the dataset available here and place it in your local working directory, in the same location as your Python file.

Save it with the file name:

pima-indians-diabetes.csv

Take a look inside the file. You should see rows of data like the following:

7,107,74,0,0,29.6,0.254,31,1
1,103,30,38,83,43.3,0.183,33,0
1,115,70,30,96,34.6,0.529,32,1
3,126,88,41,235,39.3,0.704,27,0
8,99,84,0,0,35.4,0.388,50,0
7,196,90,0,0,39.8,0.451,41,1
9,119,80,35,0,29.0,0.263,29,1
11,143,94,33,146,36.6,0.254,51,1
10,125,70,26,115,31.1,0.205,41,1
7,147,76,0,0,39.4,0.257,43,1
1,97,66,15,140,23.2,0.487,22,0
13,145,82,19,110,22.2,0.245,57,0

Now we can load the file as an array of numbers using the NumPy loadtxt() function.

There are eight input variables and one output variable (the last column).

We will learn a model that maps rows of input variables (X) to an output variable (y), which we often summarize as y = f(X).

The variables can be summarized as follows:

Input variables (X):

  1. Number of pregnancies
  2. Glucose (2-hour plasma glucose concentration in an oral glucose tolerance test)
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. Insulin (2-hour serum insulin (mu U / mL))
  6. BMI, Body mass index (weight in kg / (height in m) ^ 2)
  7. Diabetes pedigree function
  8. Age (years)

Output variable (y):

  1. Class variable (0 or 1)

Once the CSV file has been loaded into memory, we can split the data columns into input and output variables.

The data will be stored in a 2D array where the first dimension is rows and the second dimension is columns, for example [rows, columns].

We can split the array into two arrays by selecting subsets of columns with the standard NumPy slice operator ":". We can select the first 8 columns, from index 0 to index 7, using the slice 0:8. We can then select the output column (the 9th variable) using index 8, because indices start from zero, not from one.

# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
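
To see what the slicing does without needing the CSV file on disk, here is a minimal sketch. It assumes three rows copied from the sample shown earlier and reads them from an in-memory string (an illustrative stand-in, not part of the tutorial script).

```python
from io import StringIO

from numpy import loadtxt

# Three rows copied from the sample shown earlier, held in memory
# instead of being read from pima-indians-diabetes.csv.
sample = StringIO(
    "7,107,74,0,0,29.6,0.254,31,1\n"
    "1,103,30,38,83,43.3,0.183,33,0\n"
    "1,115,70,30,96,34.6,0.529,32,1\n"
)
dataset = loadtxt(sample, delimiter=',')

# Columns 0 to 7 are the inputs, column 8 is the class label.
X = dataset[:, 0:8]
y = dataset[:, 8]

print(dataset.shape)  # (3, 9)
print(X.shape)        # (3, 8)
print(y)              # [1. 0. 1.]
```

With the full file, the same two slices give one row of X and one entry of y per patient.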

Note that the dataset has 9 columns, and the slice 0:8 selects columns 0 to 7, stopping before index 8.

We are now ready to define our neural network model.

2. Define the Keras model

Models in Keras are defined as a sequence of layers.

We create a sequential model and add layers one at a time until we are happy with our network architecture.

The first thing to do is to make sure that the input layer has the right number of input features. This can be specified when creating the first layer with the input_dim argument, setting it to 8 for the 8 input variables.

How do we know the number of layers and their types?

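This is a difficult question with no single answer. In practice, a suitable network structure is usually found through trial and error.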

In this example, we will use a fully connected network structure with three layers.

Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument and specify the activation function using the activation argument.

We will use the rectified linear unit activation function named ReLU on the first two layers and the Sigmoid function in the output layer.

In the past, the Sigmoid and Tanh activation functions were preferred for all layers. Nowadays, better performance is usually achieved with the ReLU activation function. We use a sigmoid on the output layer to ensure that the network output is between 0 and 1, so that it is easy to map either to a probability of class 1 or to a hard classification of either class with a default threshold of 0.5.
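
As a rough illustration of the two activation functions, here is a minimal plain-Python sketch. The helper names relu and sigmoid and the input values are chosen for this example only. Keras provides its own implementations, selected through the activation argument.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


for value in (-2.0, 0.0, 3.0):
    probability = sigmoid(value)
    # Default threshold of 0.5 to turn a probability into a class
    predicted_class = 1 if probability > 0.5 else 0
    print(value, relu(value), round(probability, 3), predicted_class)
```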

We can put it all together by adding each layer:

  • The model expects rows of data with 8 variables (the argument input_dim=8)
  • The first hidden layer has 12 nodes and uses the ReLU activation function
  • The second hidden layer has 8 nodes and uses the ReLU activation function
  • The output layer has one node and uses the sigmoid activation function

# Define the Keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Note that the most confusing thing here is that the shape of the model input is defined as an argument on the first hidden layer. This means that the line of code adding the first Dense layer does two things: it defines the input (or visible) layer and the first hidden layer.
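
To get a feel for the size of this network, we can count its trainable parameters by hand. The sketch below assumes each Dense layer has one weight per input-unit pair plus one bias per unit, which is the default behavior. The helper dense_parameters is a name made up for this example.

```python
def dense_parameters(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# (inputs, units) for the three Dense layers defined above
layers = [(8, 12), (12, 8), (8, 1)]
counts = [dense_parameters(n_in, n_out) for n_in, n_out in layers]

print(counts)       # [108, 104, 9]
print(sum(counts))  # 221
```

If the assumption holds, the total should agree with the parameter count reported by model.summary().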

3. Compile the Keras model

Now that the model is defined, we can compile it.

Compiling the model uses efficient numerical libraries under the covers (the so-called backend), such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions on your hardware, such as a CPU or GPU.

During compiling, you need to specify some additional properties required during network training. Remember that training a network means finding the best set of weights to map inputs to outputs in our dataset.

We need to specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.

In this case, we will use cross-entropy as the loss argument. This loss is for binary classification problems and is defined in Keras as "binary_crossentropy".

We will define the optimizer as the efficient stochastic gradient descent algorithm "adam". This is a popular version of gradient descent because it tunes itself automatically and gives good results on a wide range of problems.

Finally, since this is a classification problem, we will collect and report the accuracy of the classification, defined via the metrics argument.

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
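
To make the cross-entropy loss more concrete, here is a minimal plain-Python sketch of the usual formula, the mean of -[y*log(p) + (1-y)*log(1-p)]. The labels and probabilities are made up for illustration. Library implementations typically also clip the probabilities to avoid log(0), which this sketch does not do.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)


y_true = [1, 0, 1, 0]  # made-up labels
confident_and_right = [0.9, 0.1, 0.8, 0.2]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]

# The loss is small when predictions agree with the labels
# and grows quickly when confident predictions are wrong.
print(binary_crossentropy(y_true, confident_and_right))
print(binary_crossentropy(y_true, confident_and_wrong))
```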

4. Train the Keras model

We have defined our model and compiled it, so it is now ready for efficient computation.

Now it’s time to run the model on some data.

We can train, or fit, our model on our loaded data by calling the fit() function on the model.

Training takes place over epochs, and each epoch is divided into batches.

Epoch: one pass through all the rows in the training dataset.

Batch: one or more samples considered by the model within an epoch before the weights are updated.

An epoch is made up of one or more batches, depending on the chosen batch size, and the model is fit for many epochs.

The training process will run for a fixed number of iterations through the dataset called epochs, which we must specify using the epochs argument. We also need to set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size, and set using the batch_size argument.

For this problem, we will run a limited number of epochs (150) and use a relatively small batch size of 10.

These configurations can be chosen experimentally by trial and error. We want to train the model enough that it learns a good (or good enough) mapping from rows of input data to the output classification. The model will always have some error, but the amount of error will level out after a certain point for a given model configuration. This is called model convergence.

# Train the network
model.fit(X, y, epochs=150, batch_size=10)

This is where the work on your CPU or GPU takes place.
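
The relationship between rows, batch size, and epochs can be worked out by hand. The sketch below assumes the dataset has 768 rows, which is the usual size of this file but is not stated above. Under that assumption the result is consistent with the 77 steps per epoch shown in the training log later in this article.

```python
import math

n_rows = 768      # assumed number of rows in the dataset
batch_size = 10
epochs = 150

# The last batch of an epoch may be smaller than batch_size.
batches_per_epoch = math.ceil(n_rows / batch_size)
batch_starts = range(0, n_rows, batch_size)
last_batch_rows = n_rows - batch_starts[-1]

print(batches_per_epoch)           # 77
print(last_batch_rows)             # 8
print(batches_per_epoch * epochs)  # 11550 weight updates in total
```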

5. Evaluate the Keras model

We have trained our neural network on the entire dataset and can evaluate the network’s performance on the same dataset.

This will only give us an idea of how well we modeled the data set (e.g. train accuracy), but we have no idea how well the algorithm could work on new data. We did this for simplicity, but ideally, you could separate your data into training and test datasets for training and evaluating your model.

You can evaluate your model on your training dataset using the evaluate() function on your model and pass it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect the scores, including the average loss and any metrics you have configured, such as accuracy.

The evaluate() function will return a list with two values. The first will be the loss of the model on the dataset and the second will be the accuracy of the model on the dataset. We are only interested in reporting accuracy, so we will ignore the loss value.

# Evaluate the performance of our network
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
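
For readers who want to go beyond evaluating on the training data, here is one possible way to split rows into train and test sets with NumPy. This is a sketch under stated assumptions, not part of the tutorial script: it uses random stand-in data with the same column layout, and a 75/25 split chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # fixed seed so the split is repeatable

# Stand-in data with the same layout as the tutorial: 8 inputs, 1 label.
X = rng.random((20, 8))
y = rng.integers(0, 2, size=20)

# Shuffle the row indices, then hold out the last 25% as a test set.
indices = rng.permutation(len(X))
n_test = len(X) // 4
train_idx, test_idx = indices[:-n_test], indices[-n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (15, 8) (5, 8)
```

The model would then be fit on X_train and y_train and evaluated on X_test and y_test.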

6. Let’s put it all together

You just saw how to easily create your first neural network model in Keras.

Let’s tie it all together in a complete code example.

# My first neural network
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# Load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# Divide the variables in input and output
X = dataset[:,0:8]
y = dataset[:,8]
# Define the model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model
model.fit(X, y, epochs=150, batch_size=10)
# Evaluate the performance
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

You can copy all the code into your Python file and save it in the same directory as your “pima-indians-diabetes.csv” data file. You can then run the Python file as a script from the command line (command prompt) as follows:

python3 my_first_neural_network.py

Running this example, you should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.

It takes about 10 seconds to run.

Ideally, we would like the loss to go to zero and the accuracy to go to 1.0 (i.e. 100%). Instead, we will always have some errors in our model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest possible accuracy for a given data set.

Epoch 145/150
77/77 [==============================] - 0s 499us/step - loss: 0.4850 - accuracy: 0.7617
Epoch 146/150
77/77 [==============================] - 0s 1ms/step - loss: 0.4856 - accuracy: 0.7747
Epoch 147/150
77/77 [==============================] - 0s 2ms/step - loss: 0.4798 - accuracy: 0.7747
Epoch 148/150
77/77 [==============================] - 0s 2ms/step - loss: 0.4908 - accuracy: 0.7604
Epoch 149/150
77/77 [==============================] - 0s 2ms/step - loss: 0.4994 - accuracy: 0.7617
Epoch 150/150
77/77 [==============================] - 0s 2ms/step - loss: 0.4755 - accuracy: 0.7682
24/24 [==============================] - 0s 461us/step - loss: 0.4892 - accuracy: 0.7708
Accuracy: 77.08

Note that if you try to run this example in an IPython or Jupyter notebook, you may get an error.

The cause is the progress bars printed during training. You can easily turn them off by setting verbose=0 in the calls to fit() and evaluate(), for example:

# Train the model without the progress bars
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# Evaluate the Keras model
_, accuracy = model.evaluate(X, y, verbose=0)

Note that the results may vary due to the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average result.

Neural networks are stochastic algorithms, which means that the same algorithm on the same data can train a different model with different abilities every time the code is run.

The variance in model performance means that, to get a reasonable approximation of how well the model performs, you may need to fit it multiple times and average the accuracy scores.

For example, below are the accuracy scores of repeating the example 5 times:

Accuracy: 77.08
Accuracy: 77.69
Accuracy: 78.52
Accuracy: 78.78
Accuracy: 78.02

We can see that all accuracy scores are around 78% and the average is 78.018%.
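
The average can be checked with a few lines of Python. This small sketch uses the standard statistics module and the five scores listed above. The standard deviation is added here only as a rough measure of how much the runs differ.

```python
from statistics import mean, stdev

# Accuracy scores from the five runs listed above
scores = [77.08, 77.69, 78.52, 78.78, 78.02]

print('Mean: %.3f' % mean(scores))  # Mean: 78.018
print('Std:  %.3f' % stdev(scores))
```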

7. How to make predictions after training the model

So we’ve trained our model. What now?

We can adapt the example above and use it to generate predictions on the training dataset, pretending it’s a new dataset we’ve never seen before.

Making predictions is as easy as calling the predict() function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be a probability between 0 and 1. We can easily convert them to a neat binary prediction for this classification task by rounding them.

For instance:

# Make predictions using dataset X
predictions = model.predict(X)
# Round predictions
rounded = [round(x[0]) for x in predictions]

Alternatively, we can apply a threshold to the probabilities to predict crisp classes directly, for example:

# Predict class 1 (onset of diabetes) when the probability is greater than 0.5, otherwise class 0
predictions = (model.predict(X) > 0.5).astype(int)
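
To compare the two approaches without training a model, here is a small sketch with made-up probabilities. It assumes the array has one column per output node, as with the single sigmoid output used here.

```python
import numpy as np

# Made-up sigmoid outputs, one row per sample and one column for the output node
probabilities = np.array([[0.12], [0.85], [0.49], [0.51]])

# Approach 1: round each probability to the nearest integer
rounded = [round(float(p[0])) for p in probabilities]

# Approach 2: compare against the 0.5 threshold and cast to int
thresholded = (probabilities > 0.5).astype(int)

print(rounded)                # [0, 1, 0, 1]
print(thresholded.flatten())  # [0 1 0 1]
```

Both approaches give the same classes here. The first returns a plain list, while the second keeps the two-dimensional array shape.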

The complete example below makes predictions for each example in the dataset, then prints the input data, the predicted class, and the expected class for the first 5 examples in the dataset.

# My first neural network
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# Load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# Divide the variables in input and output
X = dataset[:,0:8]
y = dataset[:,8]
# Define the model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model without the progress bars
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# Evaluate the Keras model
_, accuracy = model.evaluate(X, y, verbose=0)
print('Accuracy: %.2f' % (accuracy*100))
# Make class predictions using dataset X (threshold the probabilities at 0.5)
predictions = (model.predict(X) > 0.5).astype(int)
# Print the first 5 examples
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i][0], y[i]))

Running the example doesn’t show the progress bar as before since we set the verbose argument to 0.

After the model is fitted, predictions are made for all examples in the dataset, and the input rows and the predicted class value for the first 5 examples are printed and compared to the expected class value.

We can see that most of the rows are predicted correctly. In fact, we would expect approximately 78% of the rows to be predicted correctly based on the model’s estimated performance in the previous section.

[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)
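
As a quick sanity check, we can count how many of these five rows were predicted correctly. The sketch below simply copies the predicted and expected classes from the output above.

```python
# Predicted and expected classes copied from the five rows printed above
predicted = [0, 0, 1, 0, 1]
expected = [1, 0, 1, 0, 1]

correct = sum(p == e for p, e in zip(predicted, expected))
accuracy = correct / len(expected)

print('%d of %d correct (%.0f%%)' % (correct, len(expected), accuracy * 100))
# 4 of 5 correct (80%)
```

Five rows are far too few to estimate accuracy reliably, but the result is in line with the roughly 78% measured on the whole dataset.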

Keras tutorial summary

In this article, you discovered how to build your first neural network model using the powerful Keras Python library for deep learning.

Specifically, you learned the six key steps in using Keras to build a neural network or deep learning model, step-by-step including:

  1. How to load the data;
  2. How to define a neural network in Keras;
  3. How to compile a Keras model using the efficient numeric backend;
  4. How to train a model on the data;
  5. How to evaluate a model on the data;
  6. How to make predictions with the model.

Well done, you have successfully developed your first neural network using the Keras deep learning library in Python.

Some possible extensions

Here we provide some possible extensions you may want to explore.

Tune the model. Change the model configuration or training process and see if you can improve model performance, such as achieving greater than 76% accuracy.

Save the model. Update the tutorial to save the model to a file, then load it later and use it to make predictions.

Separate train and test data sets. Break up the loaded dataset into a train and a test set (split by rows) and use one set to train the model and the other to estimate the model’s performance on the new data.

Plot learning curves. The fit() function returns a history object that summarizes the loss and accuracy at the end of each epoch. Create line plots of this data, called learning curves.

Discover a new dataset. Update the tutorial to use a different tabular dataset, perhaps from the UCI Machine Learning Repository.