This tutorial introduces transfer learning and applies it to the Kaggle "Emotion Detection From Facial Expressions" challenge. The challenge is three years old and fairly simple, which makes it a good example for showcasing how to fine-tune a pre-trained neural network for a new purpose. Along the way, we will also introduce Keras' Sequential API, as well as its data generation and image manipulation APIs.
When using a neural network to solve a problem, one can design a network from scratch and spend time training it on a large amount of data. A quicker method is to start from an existing network that has already been optimized. There are many high-performing networks, such as VGG, ResNet, and Inception, that have been extensively trained on vast amounts of data. However, these networks were optimized on data that is different from our own, to solve a problem that is also different. For example, the VGG16 network is a highly optimized neural network trained on the ImageNet dataset to identify which of 1,000 categories an image belongs to. The problem we are trying to solve is different, so we cannot simply use VGG16 as-is. Instead, we need to fine-tune the network on our own dataset and change the categories it identifies to the ones we need. This process of adapting an existing network to serve another purpose is called transfer learning: we are transferring a network trained on one task to a different machine-learning task.
In transfer learning, the last prediction layer is typically replaced by a new prediction layer suited to the current problem. Most of the layers in the pre-trained network are also frozen, so their weights will not change when training on the new dataset; only the top few layers are allowed to change. Often, a few extra layers are added on top as well. This modified network can be used by itself, or as part of a larger network.
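To make the pattern concrete, here is a minimal sketch in Keras. The dense-layer size and the ten-class output below are illustrative placeholders; the full implementation for our problem appears later in this tutorial.

from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Dense, Flatten

# Load a pre-trained base network without its original prediction layer
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze all but the top few layers so their weights stay fixed during training
for layer in base.layers[:-4]:
    layer.trainable = False

# Stack a new prediction head on top of the (mostly frozen) base
model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))    # illustrative extra layer
model.add(Dense(10, activation='softmax'))  # illustrative new prediction layer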
The Kaggle "Emotion Detection From Facial Expressions" challenge was introduced in 2016. Its goal is to identify the expression shown in an image of a face, such as "sad", "neutral", or "happy". The competition is now closed, and its dataset is available in a github repository. The leaderboard is calculated with approximately 50% of the test data: the top entry has 100% accuracy, while the second place has 77% accuracy.
The dataset of the competition was made available in a github repository after the competition closed. To download a specific folder from a github repository on Unix, pipe the repository archive from GitHub's codeload endpoint through tar, extracting only the folder you need:
curl https://codeload.github.com/[owner]/[repo]/tar.gz/master | \
    tar -xz --strip=2 [repo]-master/[folder_path]
In our case we need the "data" and "images" folders:
curl https://codeload.github.com/muxspace/facial_expressions/tar.gz/master | \
    tar -xz --strip=2 facial_expressions-master/data
curl https://codeload.github.com/muxspace/facial_expressions/tar.gz/master | \
    tar -xz --strip=2 facial_expressions-master/images
The data folder contains a csv file that lists the image names and their labels. The images folder stores all the images.
Let's first examine the data. We load the labels into a data frame and do some cleaning. We also reserve a portion of the data for validation and, since the competition's labeled test data is no longer available, another portion for testing.
import pandas as pd

df = pd.read_csv('labels/legend.csv')      # read the labels into a data frame
df.drop(columns="user.id", inplace=True)   # drop the user id column, since it is not used
df['emotion'] = df['emotion'].str.lower()  # make all the labels lowercase

valid_df = df.sample(frac=0.1)                       # randomly sample 10% of the data for validation
train_df = df[~df['image'].isin(valid_df['image'])]  # remove validation data from training data
test_df = train_df.sample(frac=0.05)                 # randomly sample 5% of the remaining data for testing
train_df = train_df[~train_df['image'].isin(test_df['image'])]  # remove test data from training data
Next we examine how many categories there are and how many images belong to each. Two problems are immediately apparent:

1. Some categories have so little data that the network will not be able to train on them.
2. The "happy" and "neutral" categories contain so many images that a network which simply predicts one of these two categories all the time would already achieve about 50% accuracy.

These two issues are significant roadblocks.
To resolve these issues, we can do two things:
1. Increase the number of training images by artificially generating more images through image manipulation.
2. Weight the categories with fewer images more heavily. That is, instead of presenting training data to the network uniformly at random, we pick training images from the smaller categories more often, so that the network sees approximately the same number of images from each category during training (see the worked example below).
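As a worked example with hypothetical counts: if "happy" had 6,000 images and "fear" had 600, each "fear" image would carry a weight of 6000/600 = 10, so that the two categories contribute equally during training. In practice we will cap the weights (at 3, later in this tutorial) so that the rarest categories are not over-amplified.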
categories = train_df['emotion'].unique()  # list the categories
print(categories)
df.groupby('emotion').count()  # count how many samples are in each category
Keras has many data preprocessing APIs which we can use. In particular, we will use Keras' data generation API to feed the models. To use these APIs, however, the dataset must be stored in a specific directory layout: Keras' data generation API expects the images to be split into separate "training" and "validation" directories, and within each directory the images must be stored in sub-directories according to their category. We will use the following directory structure:
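data/
    train/
        <one sub-directory per emotion>
    valid/
        <one sub-directory per emotion>
    test/
        <one sub-directory per emotion>

The following code creates these directories: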
import os
os.mkdir('data')
os.mkdir('data/train')
os.mkdir('data/valid')
os.mkdir('data/test')
for emotion in categories:
    train_cat = 'data/train/' + emotion
    valid_cat = 'data/valid/' + emotion
    test_cat = 'data/test/' + emotion
    os.mkdir(train_cat)
    os.mkdir(valid_cat)
    os.mkdir(test_cat)
import shutil

# Copy each image into the sub-directory for its split and emotion
for split, split_df in [('train', train_df), ('valid', valid_df), ('test', test_df)]:
    for index, row in split_df.iterrows():
        src_img_path = 'images/' + row['image']
        dest_img_path = 'data/' + split + '/' + row['emotion'] + '/' + row['image']
        shutil.copy(src_img_path, dest_img_path)
After the images are copied to their respective places, we define the image generators that will preprocess our images. Keras provides the ImageDataGenerator API for this. The input parameters we use for the training image generator are listed below:
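- rotation_range=20: randomly rotate images by up to 20 degrees
- width_shift_range=0.1, height_shift_range=0.1: randomly shift images horizontally and vertically by up to 10% of their size
- rescale=1./255: scale pixel values from [0, 255] down to [0, 1]
- shear_range=0.01: apply a small random shearing transformation
- zoom_range=0.1: randomly zoom in or out by up to 10%
- horizontal_flip=True: randomly flip images horizontally
- fill_mode='constant': fill pixels created by the transformations with a constant value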
Notice that we are not using the vertical_flip option here, since vertically flipped faces would not be realistic inputs for this problem.
For the validation data, we only rescale the images; no other image manipulation is applied.
An excellent explanation of these parameters, with photo illustrations, is given by this website.
After the image generators are defined, we create the training and validation data generators by pointing each image generator at the correct directory.
The image generators increase the amount of data available for training, but we still need to oversample the under-represented categories; we will handle that with class weights when we train the model.
from keras.preprocessing.image import ImageDataGenerator

batch_size = 64
img_h, img_w = 224, 224

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rescale=1./255,
    shear_range=0.01,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='constant')

# this is the augmentation configuration we will use for validation:
# only rescaling
valid_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of 'data/train', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    'data/train',                # this is the target directory
    target_size=(img_h, img_w),  # all images will be resized to 224x224
    batch_size=batch_size,
    class_mode='categorical')

# this is a similar generator, for validation data
validation_generator = valid_datagen.flow_from_directory(
    'data/valid',
    target_size=(img_h, img_w),
    batch_size=batch_size,
    class_mode='categorical')
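Note that flow_from_directory infers the class labels from the sub-directory names, in alphabetical order. The resulting label-to-index mapping is available as class_indices; we will use it later to decode the model's predictions back into emotion names. The example output below is illustrative:

# the label-to-index mapping inferred from the sub-directory names
print(train_generator.class_indices)  # e.g. {'anger': 0, 'happiness': 1, ...}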
We use the VGG16 model as our base model. It does not perform as well as ResNet or some of the newer models, but it has fewer layers and is faster to train. We import the VGG16 model without its fully connected classification layers by setting include_top=False. The base model is summarized here.
from keras.applications import VGG16
#Load the VGG model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_h, img_w, 3))
base_model.summary()
We then freeze all the layers of the base model except the last 4 layers. On top of the base model we add a flatten layer followed by an affine (fully connected) layer, and finally a softmax layer after a dropout layer.
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Check the trainable status of the individual layers
for layer in base_model.layers:
    print(layer, layer.trainable)
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.utils import plot_model

def define_model(base_model, num_cat):
    # Create the model
    model = Sequential()
    # Add the VGG convolutional base model
    model.add(base_model)
    # Add new layers
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_cat, activation='softmax'))
    # Summarize the model
    print(model.summary())
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

model = define_model(base_model, 8)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])
After the model is defined, we need to train it. As noted earlier, the image generators increase the amount of data available for training, but we still need to oversample the under-represented categories. We therefore set per-category weights, capping the weight at 3 for the most under-represented categories.
from collections import Counter
counter = Counter(train_generator.classes)
max_val = float(max(counter.values()))
class_weights = {class_id : np.minimum(max_val/num_images,3) for class_id, num_images in counter.items()}
print(class_weights)
We also create checkpoints during training, because the training process may take a long time: if the computer crashes, we can restart from a saved checkpoint instead of starting from scratch, and we may also be interested in the model from an earlier training epoch. Checkpoints can be configured to save a copy of the model at every epoch, or to save only the best-performing model.
from keras.callbacks import ModelCheckpoint
filepath="toy-model_1-epoch-{epoch:02d}-val_acc-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=False, mode='auto', period=1)
callbacks_list = [checkpoint]
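As an aside, a saved checkpoint can later be reloaded to resume training or to run predictions. A minimal sketch, where the filename is an illustrative placeholder:

from keras.models import load_model

# load a previously saved checkpoint (hypothetical filename)
model = load_model('toy-model_1-epoch-05-val_acc-0.80.hdf5')
# training can then continue by calling fit_generator again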
nb_train_samples = len(train_df)       # number of training images (df.size would count cells, not rows)
nb_validation_samples = len(valid_df)  # number of validation images
epochs = 10
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=callbacks_list,
    class_weight=class_weights)
model.save('toy_model1.h5')
import matplotlib.pyplot as plt

# loss and accuracy values recorded from the training run
# (the same values are available in history.history after training)
train_loss = [1.8026, 1.3262, 1.0748, 0.9693, 0.9058, 0.8696, 0.8459]
train_acc = [0.4635, 0.7228, 0.7935, 0.8011, 0.8127, 0.8226, 0.8257]
val_loss = [1.0213, 0.9180, 0.6206, 0.6106, 0.5370, 0.4995, 0.5248]
val_acc = [0.6372, 0.8265, 0.7939, 0.8031, 0.8188, 0.8337, 0.8318]
plt.plot(train_loss)
plt.plot(val_loss)
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')
plt.show()
plt.plot(train_acc)
plt.plot(val_acc)
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='lower right')
plt.show()
The loss and accuracy for training and validation are plotted for each epoch. Note that the training loss is higher than the validation loss, and the training accuracy is lower than the validation accuracy. This is because the training loss is weighted by category, whereas the validation loss is uniformly weighted; in addition, dropout is active during training but disabled during validation.
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    directory="data/test/",
    target_size=(img_h, img_w),
    color_mode="rgb",
    batch_size=77,
    class_mode=None,
    shuffle=False,
    seed=42)
STEP_SIZE_TEST = test_generator.n // test_generator.batch_size
test_generator.reset()
pred = model.predict_generator(test_generator,
                               steps=STEP_SIZE_TEST,
                               verbose=1)
predicted_class_indices=np.argmax(pred,axis=1)
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
filenames= [i.split('/')[1] for i in test_generator.filenames]
results = pd.DataFrame({"image": filenames,
                        "emotion": predictions})
results.to_csv("results.csv",index=False)
test_df.to_csv("ground_truth.csv")
result_test_merged = pd.merge(results, test_df, left_on=['image'],
                              right_on=['image'], how='inner')
compare = result_test_merged['emotion_x'] == result_test_merged['emotion_y']
acc = sum(compare) / len(compare)
acc
We examine the accuracy of our model on the test data. The accuracy is 84%, which is higher than the 77% achieved by the second-place entry on the Kaggle leaderboard (though measured on our own held-out test split rather than the competition's).