Transfer Learning for Kaggle Facial Expression Recognition Challenge

This tutorial introduces transfer learning and applies it to the Kaggle "Emotion Detection From Facial Expressions" challenge. The challenge is three years old and fairly simple, which makes it a good example for showcasing how to fine-tune a pre-trained neural network for a new purpose. We will also introduce Keras' Sequential API, as well as Keras' data generation and image manipulation APIs.

Introduction

When using a neural network to solve a problem, one can design a network from scratch and spend time training it on a large amount of data. A quicker method is to use an existing neural network that has already been optimized. There are many high-performing networks, such as VGG, ResNet, and Inception, that have been extensively trained on vast amounts of data. However, a problem with using an existing network is that it was optimized on a dataset different from your own and tries to solve a problem that is also different. For example, the VGG16 network is a highly optimized neural network trained on the ImageNet dataset, and it tries to identify which of 1000 categories an image belongs to. The problem we are trying to solve is often different, so we cannot simply use VGG16 as-is. Instead, we need to fine-tune the network on our own dataset and change the categories it identifies to the ones we need. This process of fine-tuning an existing network to serve another purpose is called transfer learning: we are transferring an existing network to our own machine-learning task.

In transfer learning, the last prediction layer is often replaced by a new prediction layer suited to the current problem. Most of the layers in the pre-trained network are frozen, and their weights will not change when training on the new dataset; only the top few layers are allowed to change. Often, a few extra layers are also added on top. The modified network can be used by itself, or become part of a larger network.
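
As a rough sketch of this recipe in Keras (the details of which we will work through below), one might freeze a pre-trained base and stack a new prediction head on top of it; the layer sizes here are placeholders:

In [ ]:
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Load a pre-trained base, dropping its original 1000-class prediction layers
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained weights so they are not updated during training
for layer in base.layers:
    layer.trainable = False

# Stack a new prediction head suited to our own categories on top
model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))   # placeholder layer size
model.add(Dense(8, activation='softmax'))  # placeholder number of categories
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])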

Kaggle Facial Expression Recognition Challenge

The Kaggle "Emotion Detection From Facial Expressions" challenge was introduced in 2016. Its goal is to identify the expression of an image of a face. The expression could be "sad", "neutral", "happy" among others. The dataset of the challenge is available at a github repository. The competition is already closed. The leaderboard is calculated with approximately 50% of the test data. The top standing has 100% accurace, while the second place has 77& accuracy on 50% of the test data.

Data Analysis

The dataset of the competition was made available in a github repository after the competition. To download only part of a github repository from the command line, pipe a tarball from GitHub's codeload endpoint through tar and extract the folder you need:

Command line: curl https://codeload.github.com/[owner]/[repo]/tar.gz/master | tar -xz --strip=2 [repo]-master/[folder_path]

In our case we need the "data" and the "images" folders.

curl https://codeload.github.com/muxspace/facial_expressions/tar.gz/master | tar -xz --strip=2 facial_expressions-master/data

The data folder contains a csv file that lists the image names and their labels. The images folder stores all the images.

Let's first examine the data. We load it into data frames and do some cleaning. We also reserve a portion of the data as validation data, and another portion as test data, since no separate test set is available.

In [6]:
import pandas as pd
df = pd.read_csv('labels/legend.csv')  # read the labels into a data frame

df.drop(columns="user.id", inplace=True)  # drop the user.id column, since it is not used
df['emotion'] = df['emotion'].str.lower()  # make all the labels lowercase

valid_df = df.sample(frac=0.1)  # randomly sample 10% of the data to use as validation data
train_df = df[~df['image'].isin(valid_df['image'])]  # remove validation images from the training data

test_df = train_df.sample(frac=0.05)  # randomly sample 5% of the remaining data as test data, since no separate test set exists
train_df = train_df[~train_df['image'].isin(test_df['image'])]  # remove test images from the training data

Next we examine how many categories there are and how many images belong to each category. We immediately see two problems:

  1. The number of images in the data set is very low. There are only about 13,000 images in total.
  2. Most categories are under-represented, especially the "contempt" and "fear" categories, while the "neutral" and "happiness" categories together make up almost the entire data set.

These two issues are significant roadblocks. Some categories have so little data that the network will not be able to learn them, while the "happiness" and "neutral" categories contain so many images that a network which simply predicted one of those two categories all the time would still reach roughly 50% accuracy during training.

To resolve these issues, we can do two things:

  1. Increase the number of training images by artificially generating more images through image manipulation.

    • We can flip the image horizontally, which doubles the number of samples; a facial expression remains unchanged when the image is mirrored.
    • We can also shift the image up, down, left, or right a little, rotate it slightly, or skew it slightly. These manipulations do not alter the facial expression, but they do increase our sample size.
  2. Weight the categories with fewer images more heavily. That is, instead of treating every training image equally, we give more weight to images from the under-represented categories, so that each category contributes approximately the same amount during training.

In [7]:
categories = train_df['emotion'].unique() #examine how many categories are there
print(categories)
df.groupby('emotion').count() #examine how many samples are in each category
['anger' 'surprise' 'disgust' 'fear' 'neutral' 'happiness' 'sadness'
 'contempt']
Out[7]:
           image
emotion
anger        252
contempt       9
disgust      208
fear          21
happiness   5697
neutral     6868
sadness      268
surprise     368

Training Data Generation

Keras provides many data preprocessing APIs that we can use. In particular, we will use Keras' data generation API to feed our models. In order to use these APIs, however, the dataset must be stored in a specific directory layout: Keras' data generation API expects images to be sorted into separate "training" and "validation" directories, and within each directory the images are stored in separate sub-directories according to their category. For example, the following directory structure is used:

  • Data
    • Training
      • Category_1
        • image 1
        • image 2
        • image n
      • Category_2
        • image 1
        • image 2
        • image n
      • Category_3
        • image 1
        • image 2
        • image n
    • Validation
      • Category_1
        • image 1
        • image 2
        • image n
      • Category_2
        • image 1
        • image 2
        • image n
      • Category_3
        • image 1
        • image 2
        • image n
In [19]:
import os
os.mkdir('data')
os.mkdir('data/train')
os.mkdir('data/valid')
os.mkdir('data/test')
for emotion in categories:
    train_cat = 'data/train/'+emotion
    valid_cat = 'data/valid/'+emotion
    test_cat  = 'data/test/'+emotion
    os.mkdir(train_cat)
    os.mkdir(valid_cat)
    os.mkdir(test_cat)
In [20]:
import shutil
for index, row in train_df.iterrows():
    file = row['image']
    emotion = row['emotion']
    
    dest_img_path = 'data/train/' + emotion + '/' + file
    src_img_path = 'images/' + file
    
    shutil.copy(src_img_path,dest_img_path)
    
for index, row in valid_df.iterrows():
    file = row['image']
    emotion = row['emotion']
    
    dest_img_path = 'data/valid/' + emotion + '/' + file
    src_img_path = 'images/' + file
    
    shutil.copy(src_img_path,dest_img_path)

for index, row in test_df.iterrows():
    file = row['image']
    emotion = row['emotion']
    
    dest_img_path = 'data/test/' + emotion + '/' + file
    src_img_path = 'images/' + file
    
    shutil.copy(src_img_path,dest_img_path)

After the images are copied to their respective places, we define image generators to preprocess them. Keras provides the ImageDataGenerator API for image preprocessing. The input parameters we use for the training image generator are listed below:

  • rotation_range: the maximum angle, in degrees, by which an image may be randomly rotated
  • width_shift_range: the fraction of the image width by which an image may be shifted left or right
  • height_shift_range: the fraction of the image height by which an image may be shifted up or down
  • rescale: a factor applied to every pixel value (here 1/255, scaling inputs to [0, 1])
  • shear_range: the intensity of random shearing applied to an image
  • zoom_range: the range of random zoom applied to an image
  • horizontal_flip: whether images may be randomly flipped horizontally
  • fill_mode: how to fill the empty space created by shifts, rotations, and shears

Notice that we are not using the vertical_flip option here: unlike a horizontally mirrored face, an upside-down face is not a realistic input.

For the validation data we only rescale the images; no other image manipulation is applied.

An excellent explanation of these parameters, with photo illustrations, is given by this website.

After the image generator is defined, we define the training and validation data generators by pointing the image generators at the correct directories.

The image generators increase the amount of data available for training, but we still need to oversample the under-represented categories; we will handle that with class weights during training.

In [78]:
from keras.preprocessing.image import ImageDataGenerator

batch_size = 64
img_h, img_w = 224, 224

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.1,
        height_shift_range=0.1,
        rescale=1./255,
        shear_range=0.01,
        zoom_range=0.1,
        horizontal_flip=True,
        fill_mode='constant')

# this is the augmentation configuration we will use for testing:
# only rescaling
valid_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        'data/train',  # this is the target directory
        target_size=(img_h, img_w),  # all images will be resized to 224x224
        batch_size=batch_size,
        class_mode='categorical')  # labels are returned one-hot encoded, matching categorical_crossentropy

# this is a similar generator, for validation data
validation_generator = valid_datagen.flow_from_directory(
        'data/valid',
        target_size=(img_h, img_w),
        batch_size=batch_size,
        class_mode='categorical')
Found 11698 images belonging to 8 classes.
Found 1369 images belonging to 8 classes.
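
As a quick sanity check of the augmentation settings, one can pull a single batch from train_generator and look at a few of the resulting images; a minimal sketch:

In [ ]:
import matplotlib.pyplot as plt

# Draw one augmented batch from the training generator and display a few images
x_batch, y_batch = next(train_generator)
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for i, ax in enumerate(axes):
    ax.imshow(x_batch[i])  # pixel values are already rescaled to [0, 1]
    ax.axis('off')
plt.show()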

Transfer Model Definition

We use the VGG16 model as our base model. This model does not perform as well as ResNet or some of the other newer models, but it has fewer layers and is faster to train. We import the VGG16 model and drop its fully connected top layers by setting "include_top=False". The base model is summarized below.

In [73]:
from keras.applications import VGG16
#Load the VGG model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_h, img_w, 3))
base_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_11 (InputLayer)        (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

We then freeze all the layers of the base model except the last 4. On top of the base model we add a flatten layer followed by an affine (fully connected) layer. Finally, a softmax layer is added after a dropout layer.

In [75]:
for layer in base_model.layers[:-4]:
    layer.trainable = False
 
# Check the trainable status of the individual layers
for layer in base_model.layers:
    print(layer, layer.trainable)
<keras.engine.input_layer.InputLayer object at 0x7f0dd41b3550> False
<keras.layers.convolutional.Conv2D object at 0x7f0e8a186eb8> False
<keras.layers.convolutional.Conv2D object at 0x7f0e8a186c50> False
<keras.layers.pooling.MaxPooling2D object at 0x7f0e8a1921d0> False
<keras.layers.convolutional.Conv2D object at 0x7f0e856807f0> False
<keras.layers.convolutional.Conv2D object at 0x7f0e85685be0> False
<keras.layers.pooling.MaxPooling2D object at 0x7f0e8562f8d0> False
<keras.layers.convolutional.Conv2D object at 0x7f0e8562fef0> False
<keras.layers.convolutional.Conv2D object at 0x7f0e85615780> False
<keras.layers.convolutional.Conv2D object at 0x7f0e856274a8> False
<keras.layers.pooling.MaxPooling2D object at 0x7f0e855bfc18> False
<keras.layers.convolutional.Conv2D object at 0x7f0e855d8550> False
<keras.layers.convolutional.Conv2D object at 0x7f0e85582198> False
<keras.layers.convolutional.Conv2D object at 0x7f0e85582a20> False
<keras.layers.pooling.MaxPooling2D object at 0x7f0e85534a58> False
<keras.layers.convolutional.Conv2D object at 0x7f0e85534128> True
<keras.layers.convolutional.Conv2D object at 0x7f0e8555b4e0> True
<keras.layers.convolutional.Conv2D object at 0x7f0e8555beb8> True
<keras.layers.pooling.MaxPooling2D object at 0x7f0e8550bdd8> True
In [76]:
import numpy as np
from keras.models import Model
from keras.models import Sequential
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.utils import plot_model
def define_model(base_model, num_cat):
    # Create the model
    model = Sequential()

    # Add the VGG convolutional base model
    model.add(base_model)

    # Add new layers on top of the base model
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_cat, activation='softmax'))

    # summarize the model and save a diagram of it
    print(model.summary())
    plot_model(model, to_file='model.png', show_shapes=True)
    return model
In [77]:
model = define_model(base_model, 8)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 7, 7, 512)         14714688  
_________________________________________________________________
flatten_9 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_17 (Dense)             (None, 1024)              25691136  
_________________________________________________________________
dropout_9 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_18 (Dense)             (None, 8)                 8200      
=================================================================
Total params: 40,414,024
Trainable params: 32,778,760
Non-trainable params: 7,635,264
_________________________________________________________________
None

Model Training

After the model is defined, we need to train it. The image generators increase the amount of data available for training, but we still need to oversample the under-represented categories. Therefore we set per-category weights, capping the weight of the under-represented categories at 3.

In [79]:
from collections import Counter

counter = Counter(train_generator.classes)                          
max_val = float(max(counter.values()))       
class_weights = {class_id : np.minimum(max_val/num_images,3) for class_id, num_images in counter.items()}     
print(class_weights)
{0: 3.0, 1: 3.0, 2: 3.0, 3: 3.0, 4: 1.2150226991333057, 5: 1.0, 6: 3.0, 7: 3.0}

We also create checkpoints during training, because training may take a long time. If the computer crashes, we can restart from a saved checkpoint instead of starting from scratch. We may also be interested in the model from an earlier training epoch. Checkpoints can be configured so that a copy of the model is saved after every epoch, or so that only the best-performing model is kept.

In [ ]:
from keras.callbacks import ModelCheckpoint
filepath="toy-model_1-epoch-{epoch:02d}-val_acc-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=False, mode='auto', period=1)
callbacks_list = [checkpoint]

nb_train_samples = len(train_df)        # number of training images
nb_validation_samples = len(valid_df)   # number of validation images
epochs = 10
history = model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples // batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=nb_validation_samples // batch_size,
        callbacks=callbacks_list,
        class_weight=class_weights)

model.save('toy_model1.h5')
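
If training is interrupted, a saved checkpoint can be reloaded and training resumed from it; a minimal sketch, where the checkpoint filename is just an example of the pattern produced by the ModelCheckpoint callback above:

In [ ]:
from keras.models import load_model

# Reload a previously saved checkpoint (substitute the filename of an actual saved file)
model = load_model('toy-model_1-epoch-05-val_acc-0.80.hdf5')

# Training can then be resumed by calling fit_generator again, optionally with
# initial_epoch set so the epoch numbering continues from the checkpoint.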
In [110]:
import matplotlib.pyplot as plt
# per-epoch loss and accuracy values recorded from the training run above
train_loss=[1.8026,1.3262,1.0748,0.9693,0.9058,0.8696,0.8459]
train_acc=[0.4635,0.7228,0.7935,0.8011,0.8127,0.8226,0.8257]
val_loss=[1.0213,0.9180,0.6206,0.6106,0.5370,0.4995,0.5248]
val_acc=[0.6372,0.8265,0.7939,0.8031,0.8188,0.8337,0.8318]
plt.plot(train_loss)
plt.plot(val_loss)
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')
plt.show()
plt.plot(train_acc)
plt.plot(val_acc)
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='lower right')
plt.show()

# With the History object returned by fit_generator, the same plots can be
# produced directly from history.history['acc'], history.history['val_acc'],
# history.history['loss'], and history.history['val_loss'].

The loss and accuracy for training and validation are plotted for each epoch. Note that the training loss is higher than the validation loss, and the training accuracy is lower than the validation accuracy. This is because the training loss is weighted by category (the rare categories count more), whereas the validation loss weights every sample equally; in addition, dropout and image augmentation are applied only to the training data.

In [95]:
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    directory="data/test/",
    target_size=(img_h, img_w),
    color_mode="rgb",
    batch_size=77,
    class_mode=None,
    shuffle=False,
    seed=42
)

STEP_SIZE_TEST=test_generator.n//test_generator.batch_size
test_generator.reset()
pred=model.predict_generator(test_generator,
                                steps=STEP_SIZE_TEST,
                                verbose=1)

predicted_class_indices=np.argmax(pred,axis=1)

labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
Found 616 images belonging to 1 classes.
8/8 [==============================] - 69s 9s/step
In [100]:
filenames = [i.split('/')[1] for i in test_generator.filenames]  # strip the class sub-directory from the paths
results=pd.DataFrame({"image":filenames,
                      "emotion":predictions})
results.to_csv("results.csv",index=False)
test_df.to_csv("ground_truth.csv")
In [ ]:
result_test_merged = pd.merge(results, test_df, left_on=['image'],
              right_on=['image'],
              how='inner')
compare=result_test_merged['emotion_x']==result_test_merged['emotion_y']
acc = sum(compare)/len(compare)
acc
In [123]:
acc = sum(compare)/len(compare)
acc
Out[123]:
0.8392857142857143

We examine the accuracy of our model on the test data. The accuracy is about 84%, higher than the 77% of the second-place entry on the Kaggle leaderboard, although our test split is not the official one, so the comparison is only approximate.
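
Because the categories are so imbalanced, the overall accuracy can hide poor performance on the rare categories. A short sketch that computes per-category accuracy from the merged results produced above:

In [ ]:
# Fraction of correct predictions, grouped by the ground-truth emotion
result_test_merged['correct'] = compare
print(result_test_merged.groupby('emotion_y')['correct'].mean())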
