Transfer learning with Keras using DenseNet121

Bouzouitina Hamdi
6 min readFeb 18, 2021

Abstract

This article walks through the steps of training a convolutional neural network to classify the CIFAR-10 dataset using the DenseNet121 architecture. The task is to transfer the learning of a DenseNet121 trained on ImageNet to a model that identifies images from the CIFAR-10 dataset. The pre-trained weights for DenseNet121 can be downloaded through Keras, which also provides other architectures such as VGG16, VGG19, ResNet50, and Inception V3.

Introduction

Machine Learning and Deep Learning are vast fields with applications ranging from image recognition and speech recognition to recommendation and association systems. Building a model from scratch requires a lot of storage and computing power, which may not always be available. We can also face situations where we know how to improve an existing model, but the cost of training it from scratch once again prevents us from doing so. To handle such cases, we can use the concept of Transfer Learning.

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. In this method, pre-trained models are used as the starting point on computer vision and natural language processing tasks instead of developing models from the very beginning. This allows us to handle the challenge of the large amount of computing and storage resources required to develop Deep Learning models. However, it should also be noted that transfer learning only works in deep learning if the model features learned from the first task are general.

Material and Methods

For this project, I used:

DataSet

The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images commonly used to train machine learning and computer vision algorithms. It contains 60,000 32x32 color images in 10 different classes, representing airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 6,000 images per class.
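As a quick sanity check (a minimal sketch, not from the original article, assuming the tf.keras API), loading the dataset and printing its shapes confirms this layout:

from tensorflow import keras as K

(trainX, trainy), (testX, testy) = K.datasets.cifar10.load_data()
print(trainX.shape)  # (50000, 32, 32, 3): 50,000 training images of 32x32 pixels, 3 channels
print(testX.shape)   # (10000, 32, 32, 3): 10,000 test images
print(trainy.shape)  # (50000, 1): integer class labels from 0 to 9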

Keras 1.12

Keras is an open-source software library that provides a Python interface for artificial neural networks.

Colab using GPU

Colab is a Jupyter notebook environment that runs in the browser using Google Cloud. For me it is the most cost-effective option I have found for building and training a model, and the notebooks can be saved to Google Drive or uploaded to GitHub.

DenseNet-121

Densely Connected Convolutional Networks. I chose this model because DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

Preprocess

The first thing we will do is load the CIFAR-10 data into our environment so we can make use of it.

"""load dataset"""(trainX, trainy), (testX, testy) = K.datasets.cifar10.load_data()x_train, y_train = preprocess_data(trainX, trainy)x_test, y_test = preprocess_data(testX, testy)

Here we use the following preprocess_data function:

def preprocess_data(X, Y):
    """pre-processes the data"""
    # apply the DenseNet-specific ImageNet preprocessing
    X_p = K.applications.densenet.preprocess_input(X)
    # one hot encode target values
    Y_p = K.utils.to_categorical(Y, 10)
    return X_p, Y_p

It returns a preprocessed numpy.array or tf.Tensor of type float32.

The pixel values are scaled to the range 0–1, then each color channel is normalized with respect to the ImageNet dataset's channel statistics, and the labels are converted to one-hot encoded vectors.
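To see concretely what this preprocessing does, here is a small verification sketch (not part of the original article; the mean/std values are the ImageNet statistics used by Keras' 'torch' preprocessing mode):

import numpy as np
from tensorflow import keras as K   # assuming K is the Keras alias used in the article

# take one raw CIFAR-10 image (pixel values 0..255)
(trainX, _), _ = K.datasets.cifar10.load_data()
img = trainX[:1].astype('float32')

# the DenseNet preprocessing function
x_p = K.applications.densenet.preprocess_input(img.copy())

# manual equivalent: scale to [0, 1], then normalize with the ImageNet channel statistics
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
manual = (img / 255.0 - mean) / std

print(np.allclose(x_p, manual, atol=1e-5))  # expected: True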

The next thing we will do is to define our DenseNet121 and freeze the first 149 layers.

""" USE DenseNet121"""OldModel = K.applications.DenseNet121(include_top=False,input_tensor=input_x,weights='imagenet')for layer in OldModel.layers[:149]:    layer.trainable = Falsefor layer in OldModel.layers[149:]:    layer.trainable = True

Later, we need to connect our pretrained model with the new layers of our model.

model = K.models.Sequential()

The input images of CIFAR-10 are 32x32, so we make the first layer a Lambda layer that scales the data up to 224x224 (the input shape DenseNet121 was trained on) before passing it through the network.

"""a lambda layer that scales up the data to the correct size"""model.add(K.layers.Lambda(lambda x:K.backend.resize_images(x,                     height_factor=7,width_factor=7,data_format='channels_last')))

Then the images go through DenseNet121, after which we flatten the output and pass it through three dense layers (with 256, 128, and 64 neurons) using ReLU as the activation function.

Each of these dense layers is preceded by batch normalization and followed by dropout (with rates 0.7, 0.5, and 0.3 respectively) to avoid overfitting, before the output reaches the final dense layer of 10 neurons with a softmax activation.

model.add(OldModel)
model.add(K.layers.Flatten())
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(256, activation='relu', kernel_initializer=kernel_init))
model.add(K.layers.Dropout(0.7))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(128, activation='relu', kernel_initializer=kernel_init))
model.add(K.layers.Dropout(0.5))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(64, activation='relu', kernel_initializer=kernel_init))
model.add(K.layers.Dropout(0.3))
model.add(K.layers.Dense(10, activation='softmax', kernel_initializer=kernel_init))

Then we add callbacks that act during training: EarlyStopping to avoid overfitting, and ModelCheckpoint to save the best model.

"""callbacks"""CALLBACKS.append(K.callbacks.ModelCheckpoint(filepath='cifar10.h5',monitor='val_accuracy',save_best_only=True))CALLBACKS.append(K.callbacks.EarlyStopping(monitor='val_accuracy',patience=2))

Finally, once the model is defined, we compile it, specifying the optimizer, the loss function, and the metric to use.

In this case we use Adam as the optimizer, categorical_crossentropy as the loss function, and accuracy as the metric.

# Adam optimizer (as described above; default hyperparameters assumed)
optimizer = K.optimizers.Adam()
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
"""train"""
model.fit(x=x_train, y=y_train, batch_size=128, epochs=5,
          callbacks=CALLBACKS, validation_data=(x_test, y_test))
model.summary()
model.save('cifar10.h5')

We also used the ModelCheckpoint callback in conjunction with model.fit() to save the model to a checkpoint file during training, so it can be loaded later to continue training from the saved state.
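For example, if training is interrupted, the checkpoint written by ModelCheckpoint can be reloaded and training resumed; a minimal sketch (assuming the cifar10.h5 file and the x_train/CALLBACKS variables from the snippets above, and that the Lambda layer deserializes in the same environment) would look like this:

# reload the best saved model and continue training from the saved state
restored = K.models.load_model('cifar10.h5')
restored.fit(x=x_train, y=y_train, batch_size=128, epochs=5,
             callbacks=CALLBACKS, validation_data=(x_test, y_test))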

Results

In the results we can look at the training loss, training accuracy, validation loss, and validation accuracy.

At first, we got an accuracy of 91% but with a loss of 0.6, which is why we used K.callbacks.EarlyStopping and changed the dropout values from (0.5, 0.5, 0.5) to (0.7, 0.5, 0.3).

In the end, we got an accuracy of 89% and a loss of 0.3.

Discussion

As is always the case in machine learning, it is hard to form rules that are generally applicable, but here are some guidelines on when transfer learning might be used:

  • There isn’t enough labeled training data to train your network from scratch.
  • There already exists a network that is pre-trained on a similar task, which is usually trained on massive amounts of data.
  • When task 1 and task 2 have the same input.

If the original model was trained using TensorFlow, you can simply restore it and retrain some layers for your task. Keep in mind, however, that transfer learning only works if the features learned from the first task are general, meaning they can be useful for another related task as well. Also, the input of the model needs to have the same size as it was initially trained with. If you don’t have that, add a pre-processing step to resize your input to the needed size.

We confirmed that DenseNet121 works best with input images of 224 x 224. Since CIFAR-10 images are 32 x 32, it was necessary to resize them, and in my opinion this adjustment was the single most important step in getting a high accuracy out of DenseNet121.
