Emotion Recognition With Python, OpenCV and a Face Dataset

Datetime: 2016-08-22 23:40:10 | Topic: OpenCV, Python

Having your computer know how you feel? Madness!

Or actually not madness, but OpenCV and Python. In this tutorial we’ll write a little program to see if we can recognise emotions from images.

How cool would it be to have your computer recognize the emotion on your face? You could make all sorts of things with this, from a dynamic music player that plays music fitting with what you feel, to an emotion-recognizing robot.

For this tutorial I assume that you have:

– Python 2.7 installed, together with OpenCV (2.4.x) and its Python bindings, and NumPy;
– a basic understanding of Python.

Important: The code in this tutorial is licensed under the GNU 3.0 open source license and you are free to modify and redistribute the code, given that you give others you share the code with the same right, and cite my name (use citation format below). You are not free to redistribute or modify the tutorial itself in any way. By reading on you agree to these terms. If you disagree, please navigate away from this page.

Citation format

van Gent, P. (2016). Emotion Recognition With Python, OpenCV and a Face Dataset. A tech blog about fun things with Python and embedded electronics. Retrieved from:

http://www.paulvangent.com/2016/04/01/emotion-recognition-with-python-opencv-and-a-face-dataset/


Getting started

To be able to recognize emotions on images we will use OpenCV. OpenCV has a few ‘facerecognizer’ classes that we can also use for emotion recognition. They use different techniques, of which we’ll mostly use the Fisher Face one. For those interested in more background, this page has a clear explanation of what a Fisher Face is.

Request and download the dataset here (get the CK+). I cannot distribute it, so you will have to request it yourself, or of course create and use your own dataset.

Once you have the CK+ dataset, extract it and look at the readme. It is organised into two folders, one containing the images, the other containing txt files with an encoded emotion label for each image sequence. From the readme of the dataset, the encoding is: {0=neutral, 1=anger, 2=contempt, 3=disgust, 4=fear, 5=happy, 6=sadness, 7=surprise}.
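
As a quick check of this encoding, a minimal sketch like the one below reads one label file and maps the encoded value to its name (the label file path here is hypothetical; the files store the code as a float):

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

#Hypothetical example path; point this at one of your own emotion label files
with open("source_emotion\\S005\\001\\S005_001_00000011_emotion.txt") as f:
    code = int(float(f.readline())) #read the float, convert to integer

print emotions[code] #e.g. 7 -> "surprise"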

Let’s go!

Organising the dataset

First we need to organise the dataset. In the directory you’re working in, make two folders called “source_emotion” and “source_images”. Extract the dataset and put all folders containing the txt files (S005, S010, etc.) in “source_emotion”. Put the folders containing the images in “source_images”. Also create a folder named “sorted_set” to house our sorted emotion images, and within it create folders for the emotion labels (“neutral”, “anger”, etc.).
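
If you’d rather not create all of those folders by hand, a minimal sketch like this one sets them up (assuming you run it from your working directory; existing folders are left untouched):

import os

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

for folder in ["source_emotion", "source_images"]: #top-level folders for the extracted dataset
    if not os.path.exists(folder):
        os.makedirs(folder)

for emotion in emotions: #one "sorted_set" subfolder per emotion label
    path = os.path.join("sorted_set", emotion)
    if not os.path.exists(path):
        os.makedirs(path)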

In the readme file, the authors mention that only a subset (327 of the 593) of the emotion sequences actually contains archetypical emotions. Each image sequence shows the forming of an emotional expression, starting with a neutral face and ending with the emotion. So, from each labelled image sequence we want to extract two images: one neutral (the first image) and one with an emotional expression (the last image). To help, let’s write a small Python snippet to do this for us:

import glob
import os
from shutil import copyfile

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Define emotion order
participants = glob.glob("source_emotion\\*") #Returns a list of all folders with participant numbers

for x in participants:
    part = os.path.basename(x) #store current participant number, e.g. "S005"
    for sessions in glob.glob("%s\\*" %x): #Store list of sessions for current participant
        for files in glob.glob("%s\\*" %sessions): #The emotion label file(s) for this session
            current_session = os.path.basename(sessions) #store current session number, e.g. "001"
            with open(files, 'r') as f:
                emotion = int(float(f.readline())) #emotions are encoded as a float, read as float, then convert to integer

            sourcefile_emotion = sorted(glob.glob("source_images\\%s\\%s\\*" %(part, current_session)))[-1] #get path for last image in sequence, which contains the emotion
            sourcefile_neutral = sorted(glob.glob("source_images\\%s\\%s\\*" %(part, current_session)))[0] #do same for neutral image (first in sequence)

            dest_neut = "sorted_set\\neutral\\%s" %os.path.basename(sourcefile_neutral) #Generate path to put neutral image
            dest_emot = "sorted_set\\%s\\%s" %(emotions[emotion], os.path.basename(sourcefile_emotion)) #Do same for emotion containing image

            copyfile(sourcefile_neutral, dest_neut) #Copy file
            copyfile(sourcefile_emotion, dest_emot) #Copy file

Extracting faces

The classifier will work best if the training and classification images are all of the same size and contain (almost) only a face (no clutter). We need to find the face in each image, convert it to grayscale, crop it and save it to the dataset. We can use a HAAR cascade classifier from OpenCV to automate the face finding. OpenCV actually provides four pre-trained frontal-face classifiers, so to be sure we detect as many faces as possible let’s use all of them in sequence, and abort the face search once we have found one. Get them from the OpenCV data directory or from here, and extract them to the same folder as your Python files.

Create another folder called “dataset”, and in it create subfolders for each emotion (“neutral”, “anger”, etc.). The dataset we can use will live in these folders. Then detect, crop and save the faces as follows:

import cv2
import glob

faceDet = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
faceDet2 = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")
faceDet3 = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
faceDet4 = cv2.CascadeClassifier("haarcascade_frontalface_alt_tree.xml")

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Define emotions

def detect_faces(emotion):
    files = glob.glob("sorted_set\\%s\\*" %emotion) #Get list of all images with emotion

    filenumber = 0
    for f in files:
        frame = cv2.imread(f) #Open image
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert image to grayscale
        
        #Detect face using 4 different classifiers
        face = faceDet.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face2 = faceDet2.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face3 = faceDet3.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face4 = faceDet4.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)

        #Go over detected faces, stop at first detected face, return empty if no face.
        if len(face) == 1:
            facefeatures = face
        elif len(face2) == 1:
            facefeatures = face2
        elif len(face3) == 1:
            facefeatures = face3
        elif len(face4) == 1:
            facefeatures = face4
        else:
            facefeatures = ""
        
        #Cut and save face
        for (x, y, w, h) in facefeatures: #get coordinates and size of rectangle containing face
            print "face found in file: %s" %f
            gray = gray[y:y+h, x:x+w] #Cut the frame to size
            
            try:
                out = cv2.resize(gray, (350, 350)) #Resize face so all images have same size
                cv2.imwrite("dataset\\%s\\%s.jpg" %(emotion, filenumber), out) #Write image
            except:
               pass #If error, pass file
        filenumber += 1 #Increment image number

for emotion in emotions: 
    detect_faces(emotion) #Call function

The last step is to clean up the “neutral” folder. Because most participants have expressed more than one emotion, we have more than one neutral image of the same person, so keep only one neutral image per participant. Leaving the duplicates in could (not sure if it will, but let’s be conservative) bias the classifier accuracy unfairly: it might learn to recognize the same person in another picture, or be triggered by characteristics other than the emotion displayed. A small sketch for this cleanup step follows below.
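
A minimal sketch for thinning the “neutral” folder, assuming the copied files kept the CK+ naming convention, in which the filename starts with the participant ID (e.g. “S005_001_00000001.png”):

import glob
import os

neutral_files = sorted(glob.glob("sorted_set\\neutral\\*")) #all copied neutral images
kept_participants = [] #participants we already kept one neutral image for

for f in neutral_files:
    participant = os.path.basename(f)[:4] #first four characters are the participant ID, e.g. "S005"
    if participant in kept_participants:
        os.remove(f) #second or later neutral image of this participant, remove it
    else:
        kept_participants.append(participant) #first neutral image of this participant, keep it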

Creating the training and classification set

Now we get to the fun part! The dataset has been organised and is ready to be recognized, but first we need to actually teach the classifier what certain emotions look like. The usual approach is to split the complete dataset into a training set and a classification set. We use the training set to teach the classifier to recognize the to-be-predicted labels, and use the classification set to estimate the classifier performance.

Note the reason for splitting the dataset: estimating a classifier’s performance on the same set it has been trained on is unfair, because we are not interested in how well the classifier memorizes the training set. Rather, we are interested in how well the classifier generalizes its recognition capability to never-seen-before data.

In any classification problem, the sizes of both sets depend on what you’re trying to classify, the size of the total dataset, the number of features, and the number of classification targets (categories). It’s a good idea to plot a learning curve. We’ll get into this in another tutorial.

For now, let’s create the training and classification sets: we randomly sample and train on 80% of the data, classify the remaining 20%, and repeat the process 10 times. Afterwards we’ll play around with a few settings and see what useful results we can get.

import cv2
import glob
import random
import numpy as np

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Emotion list
fishface = cv2.createFisherFaceRecognizer() #Initialize fisher face classifier (note: this is the OpenCV 2.4.x API)

data = {}

def get_files(emotion): #Define function to get file list, randomly shuffle it and split 80/20
    files = glob.glob("dataset\\%s\\*" %emotion)
    random.shuffle(files)
    training = files[:int(len(files)*0.8)] #get first 80% of the (shuffled) file list
    prediction = files[int(len(files)*0.8):] #get the remaining 20%
    return training, prediction

def make_sets():
    training_data = []
    training_labels = []
    prediction_data = []
    prediction_labels = []
    for emotion in emotions:
        training, prediction = get_files(emotion)
        #Append data to training and prediction list, and generate labels 0-7
        for item in training:
            image = cv2.imread(item) #open image
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) #convert to grayscale
            training_data.append(gray) #append image array to training data list
            training_labels.append(emotions.index(emotion))
    
        for item in prediction: #repeat above process for prediction set
            image = cv2.imread(item)
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            prediction_data.append(gray)
            prediction_labels.append(emotions.index(emotion))

    return training_data, training_labels, prediction_data, prediction_labels

def run_recognizer():
    training_data, training_labels, prediction_data, prediction_labels = make_sets()
    
    print "training fisher face classifier"
    print "size of training set is:", len(training_labels), "images"
    fishface.train(training_data, np.asarray(training_labels))

    print "predicting classification set"
    cnt = 0
    correct = 0
    incorrect = 0
    for image in prediction_data:
        pred, conf = fishface.predict(image)
        if pred == prediction_labels[cnt]:
            correct += 1
            cnt += 1
        else:
            incorrect += 1
            cnt += 1
    return ((100.0*correct)/(correct + incorrect)) #use a float so the percentage isn't truncated by integer division

#Now run it
metascore = []
for i in range(0,10):
    correct = run_recognizer()
    print "got", correct, "percent correct!"
    metascore.append(correct)

print "\n\nend score:", np.mean(metascore), "percent correct!"

Let it run for a while. In the end, on my machine this returned 69.3% correct. This may not seem like a lot at first, but remember we have 8 categories. If the classifier learned absolutely nothing and just assigned class labels randomly, we would expect on average (1/8)*100 = 12.5% correct. So actually it is already performing really well. Now let’s see if we can optimize it.

Optimizing Dataset

Let’s look critically at the dataset. The first thing to notice is that we have very few examples for “contempt” (18), “fear” (25) and “sadness” (28). I mentioned it’s not fair to predict the same dataset as the classifier has been trained on, and similarly it’s also not fair to give the classifier only a handful of examples and expect it to generalize well.
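
You can verify these counts on your own copy with a quick sketch like this, which simply counts the files in each “dataset” subfolder:

import glob

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]
for emotion in emotions:
    print emotion, len(glob.glob("dataset\\%s\\*" %emotion)) #number of face images for this emotion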

Change the emotion list so that “contempt”, “fear” and “sadness” are no longer in it, because we really don’t have enough examples for them:

#Change from:
emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

#To:
emotions = ["neutral", "anger", "disgust", "happy", "surprise"]

Let it run for a while again. On my computer this results in 82.5% correct. Purely by chance we would expect on average (1/5)*100 = 20%, so the performance is not bad at all. However, something can still be improved.

Providing a more realistic estimate

Performance so far is pretty neat! However, the numbers might not be very reflective of a real-world application. The dataset we use is very standardized: all faces point exactly at the camera, and the emotional expressions are pretty exaggerated and even comical in some situations. Let’s see if we can extend the dataset with some more natural images. For this I used Google image search and the Chrome plugin ZIG lite to batch-download the images from the results.

If you want to do this yourself, clean up the images: make sure no image has text overlaid on the face, that the emotion is recognizable, and that the face is pointed mostly at the camera. Then adapt the face-cropper script a bit and generate standardized face images.

Alternatively, save yourself an hour of work and download the set I generated and cleaned.

Merge both datasets and run again on all emotion categories except “contempt” (so re-include “fear” and “sadness”); I could not find any convincing source images for that emotion.
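
How you merge them is up to you; below is a minimal sketch, assuming the downloaded-and-cropped faces live in a folder I’ll call “google_dataset” (a hypothetical name) with the same per-emotion subfolders as “dataset”. Files are renumbered past the existing ones so nothing gets overwritten:

import glob
from shutil import copyfile

emotions = ["neutral", "anger", "disgust", "fear", "happy", "sadness", "surprise"] #all categories except "contempt"
for emotion in emotions:
    existing = len(glob.glob("dataset\\%s\\*" %emotion)) #number of CK+ faces already in place
    for number, f in enumerate(glob.glob("google_dataset\\%s\\*" %emotion)):
        copyfile(f, "dataset\\%s\\%s.jpg" %(emotion, existing + number)) #continue numbering after the CK+ files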

This gave 61.6% correct. Not bad, but not great either. Although well above chance level (14.3%), this still means the classifier will be wrong 38.4% of the time. I think the performance is actually really impressive, considering that emotion recognition is quite a complex task. However impressive, I admit an algorithm that is wrong more than a third of the time is not very practical.

Speaking of a practical perspective: depending on the goal, an emotion classifier might not actually need so many categories. For example, a dynamic music player that plays songs fitting your mood would already work well if it recognized anger, happiness and sadness. Using only these categories I get 77.2% accuracy. That is a more useful number! It means that almost 4 out of 5 times it will play a song fitting your emotional state. In a next tutorial we will build such a player.
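
In code, this three-category run is again just a change to the emotion list; using the label names from this tutorial it would look like:

#For a three-category "mood player" setup:
emotions = ["anger", "happy", "sadness"]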

The spread of accuracies between different runs is still quite large, however. This either indicates the dataset is too small to accurately learn to predict emotions, or the problem is simply too complex. My money is mostly on the former. Using a larger dataset will probably enhance the detection quite a bit.
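To see how large that spread actually is, you could append a few summary statistics over the metascore list to the end of the script above, for example:

#Summarize the spread of the 10 run accuracies collected in metascore
print "mean:", np.mean(metascore), "std:", np.std(metascore)
print "min:", np.min(metascore), "max:", np.max(metascore)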

Looking at mistakes

The last thing that might be nice to look at is what mistakes the algorithm makes. Maybe the mistakes are understandable, maybe not. Add an extra line to the last part of the function run_recognizer() to copy images that are wrongly classified, and create a folder “difficult” in your root working directory to house these images:

def run_recognizer():
    training_data, training_labels, prediction_data, prediction_labels = make_sets()
    
    print "training fisher face classifier"
    print "size of training set is:", len(training_labels), "images"
    fishface.train(training_data, np.asarray(training_labels))

    print "predicting classification set"
    cnt = 0
    correct = 0
    incorrect = 0
    for image in prediction_data:
        pred, conf = fishface.predict(image)
        if pred == prediction_labels[cnt]:
            correct += 1
            cnt += 1
        else:
            cv2.imwrite("difficult\\%s_%s_%s.jpg" %(emotions[prediction_labels[cnt]], emotions[pred], cnt), image) #<-- this one is new
            incorrect += 1
            cnt += 1
    return ((100.0*correct)/(correct + incorrect)) #use a float so the percentage isn't truncated by integer division

I ran it on all emotions except “contempt”, and ran it only once (for i in range(0,1)).

Some mistakes are understandable, for instance:

“Surprise”, classified as “Happy”; honestly it’s a bit of both.

“Disgust”, classified as “Sadness”; he could also be starting to cry.

“Sadness”, classified as “Disgust”

But most are less understandable, for example:

“Anger”, classified as “Happy”

“Happy”, classified as “Neutral”

It’s clear that emotion recognition is a complex task, more so when only using images. Even for us humans this is difficult because the correct recognition of a facial emotion often depends on the context within which the emotion originates and is expressed.

I hope this tutorial gave you some insight into emotion recognition, and hopefully some ideas to do something with it. Did you do anything cool with it, or do you want to try something? Let me know below in the comments!

The dataset used in this article is the CK+ dataset, based on the work of:

– Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG’00), Grenoble, France, 46-53.

– Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.
