Real Time Facial Expressions/Emotions Recognition on a Web Interface using Python

Jul 7, 2020

A Real-Time Web-Based Application for Detection of Faces on a Web Camera using Flask, OpenCV, and face_recognition.

(A Deep Learning Case Study)

Introduction

Human facial expressions convey a great deal of information visually. Facial expression recognition (FER) plays an essential role in human-machine interaction. Automated FER systems have many applications, including understanding human behavior, diagnosing mental disorders, and synthesizing human expressions. Recognizing facial expressions by computer with a high detection rate is still a challenging task.

Two of the most popular approaches in the literature for automated FER systems are geometry-based and appearance-based methods. Facial expression recognition is usually performed in four steps: pre-processing, face detection, feature extraction, and expression classification.

In this project, we used deep learning techniques (convolutional neural networks) to identify the seven main human emotions: ANGER, DISGUST, FEAR, HAPPY, NEUTRAL, SAD, SURPRISE.

What is Facial Recognition?
A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame. Such systems work in several ways, but in general they compare selected facial features from a given image with faces in a database.
It is also described as a biometric, artificial-intelligence-based application that can uniquely identify a person by analyzing patterns in the person’s facial textures and shape.

1. PROBLEM

TO CLASSIFY THE EXPRESSION OF A FACE IN AN IMAGE AS ONE OF SEVEN BASIC HUMAN EXPRESSIONS.

This model can be used to predict expressions in both still images and real-time video.

Our goal here is to predict human expressions. We trained our model on data consisting of 48x48 pixel grayscale images of human faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories.

For prediction, we also keep the size of each image at 48*48.

2. Objectives & Constraints

Objective:

Our objective is to predict the expression of a human face in real time as quickly and as accurately as possible.

Constraints:

1. Latency: Given an image, the system should be able to predict the expression immediately and transfer the result. Hence, there is a low latency requirement.

2. Interpretability: Interpretability is important for still images but not in real time. For still images, the probability of each predicted expression can be reported.

3. Accuracy: Our goal is to predict the expression of a face in the image as accurately as possible. The higher the test accuracy, the better our model will perform in the real world.

3. Performance Metric

This is a multi-class classification problem with 7 different classes, so we have considered three performance metrics:

1. Multi-Class Log-Loss: We used a deep learning model that ends in a layer of seven softmax units trained with cross-entropy loss, so our goal is to reduce the multi-class log loss (cross-entropy loss).

2. Accuracy: This tells us how accurately our model performs in predicting the expressions.

3. Confusion Matrix: Since this is a multi-class classification problem, the confusion matrix helps us see which classes are confused with one another and toward which classes the model is biased. This gives a clear picture of the model’s prediction results.
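All three metrics can be computed directly from the predicted class probabilities. The following is a minimal NumPy sketch, not the project's evaluation code; the function names and the toy arrays are made up for illustration.

```python
import numpy as np

NUM_CLASSES = 7  # seven expression classes

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))

def accuracy(y_true, probs):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == y_true))

def confusion_matrix(y_true, probs, num_classes=NUM_CLASSES):
    """Rows are true classes, columns are predicted classes."""
    y_pred = np.argmax(probs, axis=1)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy example (made-up values): 3 samples with true labels 3, 0 and 6.
y_true = np.array([3, 0, 6])
probs = np.full((3, NUM_CLASSES), 0.05)
probs[0, 3] = 0.70  # correct prediction
probs[1, 0] = 0.70  # correct prediction
probs[2, 4] = 0.70  # wrong: class 4 is predicted instead of class 6

loss = multiclass_log_loss(y_true, probs)
acc = accuracy(y_true, probs)
cm = confusion_matrix(y_true, probs)
```

In this toy case two of three samples are correct, and the single off-diagonal entry of `cm` (row 6, column 4) shows which true class was mistaken for which predicted class.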

4. Source Dataset Description

The dataset used in this project is the FER dataset from Kaggle, available at (https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data). The data consists of 48x48 pixel grayscale images of faces.

The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

train.csv contains two columns, “emotion” and “pixels”. The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image. The “pixels” column contains a string surrounded in quotes for each image. The contents of this string are space-separated pixel values in row-major order. test.csv contains only the “pixels” column and your task is to predict the emotion column.
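To make the file format concrete, here is a small sketch of how one row could be turned into an image array. It assumes only the two column names described above; the helper names and the synthetic one-row file are illustrative and not taken from the project code.

```python
import csv
import io
import numpy as np

IMG_SIZE = 48  # images are 48x48 grayscale

def pixels_to_image(pixel_string):
    """Convert a space-separated pixel string into a 48x48 uint8 array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    if values.size != IMG_SIZE * IMG_SIZE:
        raise ValueError(f"expected {IMG_SIZE * IMG_SIZE} pixels, got {values.size}")
    # The values are in row-major order, so a plain reshape restores the layout.
    return values.reshape(IMG_SIZE, IMG_SIZE)

def load_rows(csv_file):
    """Yield (emotion_code, image) pairs from a file-like object in train.csv format."""
    for row in csv.DictReader(csv_file):
        yield int(row["emotion"]), pixels_to_image(row["pixels"])

# Synthetic one-row file standing in for train.csv (pixel values are made up).
fake_pixels = " ".join(str(i % 256) for i in range(IMG_SIZE * IMG_SIZE))
fake_csv = io.StringIO(f'emotion,pixels\n3,"{fake_pixels}"\n')

emotion, image = next(load_rows(fake_csv))
```

With the real file, `open("train.csv", newline="")` would take the place of the `io.StringIO` object.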

The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.

5. Libraries

We used most of the libraries common to ML/DL projects, such as pandas, numpy, matplotlib, and sklearn. Here I want to highlight three important libraries.

1. OpenCV: A library widely used for processing images, particularly real-time images.

pip install opencv-python

2. Keras: A library used to build deep learning models. It uses TensorFlow as its back end.

pip install keras

3. Flask: Flask is a popular Python web framework, meaning it is a third-party Python library used for developing web applications.

pip install Flask

6. Project Formulation

The hands-on work of building this Facial Expression Recognition project is divided into the following tasks/steps:

A. Task 1: Introduction

· Introduction to the dataset

· Import essential modules and helper functions from NumPy, Matplotlib, and Keras.

B. Task 2: Exploring the Dataset

· Display some images from every expression type in the Emotion FER dataset.

C. Task 3: Generating Training and Validation Batches

· Generate batches of tensor image data with real-time data augmentation.

· Specify paths to training and validation image directories and generate batches of augmented data.

D. Task 4: Creating a Convolutional Neural Network (CNN) Model

· Design a convolutional neural network with 4 convolution layers and 2 fully connected layers to predict 7 types of facial expressions.

· Used Adam as the optimizer, categorical crossentropy as the loss function, and accuracy as the evaluation metric.

CNN (Convolutional Neural Network) Model

E. Task 5: Training and Evaluating Model

· Training the CNN by invoking the model.fit() method.

· Used ModelCheckpoint() to save the weights associated with the highest validation accuracy.

· Observed live training loss and accuracy plots in Jupyter Notebook for Keras.

F. Task 6: Saving and Serializing Model as JSON String

· Used to_json(), which returns the model architecture as a JSON string, to store the architecture.

G. Task 7: Creating a Flask App to Serve Predictions

· We used the open-source code from “Video Streaming with Flask Example” to create a flask app to serve the model’s prediction images directly to a web interface.

H. Task 8: Creating a Class to Output Model Predictions

· Created a FacialExpressionModel class to load the model from the JSON file, load the trained weights into the model, and predict facial expressions.

I. Task 9: Designed an HTML Template for the Flask App

· Designed a basic template in HTML to create the layout for the Flask app.

J. Task 10: Used the Model to Recognize Facial Expressions in Real Time using the Laptop’s Web Camera

· We then ran the main.py script to create the Flask app and serve the model’s predictions to a web interface.

· Applied the model for real-time recognition of users’ facial expressions using the laptop’s webcam.
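The streaming pattern behind Task 7 can be sketched as follows. This is a simplified illustration of the general multipart streaming technique, not the project's main.py: the `/video_feed` route name, the `frame_source` callable, and the stand-in frames are assumptions made for the example. Flask is imported inside `create_app` only so that the frame generator can be tried without Flask installed.

```python
def mjpeg_stream(jpeg_frames):
    """Wrap each JPEG-encoded frame as one part of a multipart HTTP response."""
    for jpeg_bytes in jpeg_frames:
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg_bytes + b"\r\n")

def create_app(frame_source):
    """Build a Flask app that streams frames from frame_source().

    frame_source is a hypothetical callable returning an iterable of
    JPEG-encoded bytes, e.g. webcam frames annotated with the prediction.
    """
    from flask import Flask, Response  # local import: mjpeg_stream needs no Flask

    app = Flask(__name__)

    @app.route("/video_feed")
    def video_feed():
        # The browser replaces the displayed image each time a new part arrives.
        return Response(mjpeg_stream(frame_source()),
                        mimetype="multipart/x-mixed-replace; boundary=frame")

    return app

# Stand-in byte strings (not real JPEG data) to show the wire format.
parts = list(mjpeg_stream([b"frame-one", b"frame-two"]))
```

In the HTML template, an image tag whose source points at the streaming route would then display the annotated webcam feed.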

7. Project Structure

Facial expression recognition is a process performed by humans or computers, which consists of:

1. Locating faces in the scene (e.g., in an image; this step is also referred to as face detection)

2. Extracting facial features from the detected face region (e.g., detecting the shape of facial components or describing the texture of the skin in a facial area; this step is referred to as facial feature extraction),

3. Analyzing the motion of facial features and/or the changes in the appearance of facial features

8. Final Processing of images

According to various surveys, implementing a project like this requires four basic steps:

i.) Preprocessing

ii.) Face registration

iii.) Facial feature extraction

iv.) Emotion classification

9. Creating Bottleneck Features from a CNN Model (Transfer Learning)

We passed each image one by one through this network, generated its bottleneck features, and stored them in a NumPy array. We only need to call the CNN’s model.predict() function to generate bottleneck features for our images.

Finally, we generated bottleneck features for all of our images and saved them to disk. In this way, we used transfer learning from a CNN model for our own task.
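The extract-and-save step could look roughly like the sketch below. It only assumes an object with a predict() method; the `DummyExtractor` class, the file name, and the random input images are placeholders so the example runs without a deep learning framework, and they are not the network or data used in this project.

```python
import os
import tempfile
import numpy as np

def extract_bottleneck_features(model, images, batch_size=32):
    """Run images through model.predict() in batches and stack the outputs."""
    features = [model.predict(images[start:start + batch_size])
                for start in range(0, len(images), batch_size)]
    return np.concatenate(features, axis=0)

class DummyExtractor:
    """Stand-in for a pretrained CNN without its classification head.

    Hypothetical: it averages each image over 4x4 blocks and flattens the result.
    """
    def predict(self, batch):
        n, h, w, c = batch.shape
        pooled = batch.reshape(n, h // 4, 4, w // 4, 4, c).mean(axis=(2, 4))
        return pooled.reshape(n, -1)

images = np.random.rand(10, 48, 48, 1).astype("float32")  # fake batch of faces
features = extract_bottleneck_features(DummyExtractor(), images, batch_size=4)

# Save the features so the top model can be trained without re-running the CNN.
path = os.path.join(tempfile.mkdtemp(), "bottleneck_features_train.npy")
np.save(path, features)
reloaded = np.load(path)
```

With a real Keras model, predict() can also batch internally through its own batch_size argument, so the explicit loop here is just one way to keep memory use bounded.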

10. Modelling and Training

We now have bottleneck features for each of our images. Our next task is to create a top model: an MLP that takes the bottleneck features of each image and is trained to reduce the multi-class log loss (cross-entropy loss). For this, we designed the following neural network.

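from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization

# input_shape must be set to the length of one flattened bottleneck feature vector
model = Sequential()
model.add(Dense(512, activation='relu', input_dim = input_shape))
model.add(Dropout(0.1))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
# Output layer: 7 softmax units, one per expression class
model.add(Dense(7, activation='softmax'))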

As shown above, the model has five fully connected (dense) layers. The first four use ReLU activation units: the first has 512 units, the second 256, the third 128, and the fourth 64. The fifth is the output layer, which has 7 softmax units. Softmax is a generalization of logistic regression to the multi-class setting. It produces 7 probability values, one for each class, and these values sum to one. The probabilities are fed to the final cross-entropy loss (the multi-class log loss), which is minimized through back-propagation. In this way, our MLP model is trained to classify facial expressions in images.

You may have noticed that the above model uses very little dropout. Initially, we used a dropout rate of 0.5 between the first four layers, but after 12 epochs our training and CV losses were not decreasing. We gradually lowered the dropout rate and observed that both training and CV loss decreased, and training and CV accuracy increased accordingly.

We ran our model for 15 epochs and got the following figures from epoch 1 to epoch 20:
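To illustrate what the output layer computes, the sketch below applies softmax to a set of raw scores and evaluates the cross-entropy loss for one sample. The scores are made up, and the label order is taken from the dataset description (0=Angry through 6=Neutral); this is a plain NumPy illustration rather than what Keras executes internally.

```python
import numpy as np

# Label order as listed in the dataset description.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def cross_entropy(probs, true_index):
    """Loss for one sample: negative log-probability of the true class."""
    return float(-np.log(probs[true_index]))

# Made-up raw scores from the last layer for a single face.
logits = np.array([0.2, -1.0, 0.1, 2.5, 0.3, 0.9, 1.1])
probs = softmax(logits)
predicted = EMOTIONS[int(np.argmax(probs))]

# The loss is small when the true class receives high probability, large otherwise.
loss_if_happy = cross_entropy(probs, EMOTIONS.index("Happy"))
loss_if_sad = cross_entropy(probs, EMOTIONS.index("Sad"))
```

Here the largest score belongs to "Happy", so that class gets the highest probability, and the loss would be lower if the true label were "Happy" than if it were "Sad".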

11. Test Results/ Output Screenshots

We also tested our model in real time, where it performed well, if not perfectly.

Check out the video below.

Real Time Testing of Facial Expressions Recognition Model

Screenshots of Prediction of the proposed model using the Web Camera of the Laptop

12. Further scope

We got a pretty good result, but there is still considerable room for improvement.

1. To get better accuracy, we need many more face images with good variance among them.

2. We can also fine-tune the last 2 or 3 convolution blocks of the CNN to increase accuracy.

3. We can also design our own CNN model if we have the time and computation power. Of course, we would need many more images for this. With careful hyper-parameter tuning, training on 100k face images with good variance among them, and an image size larger than 400*400, we believe accuracy close to 99% in the real world and in real time may be achievable.

13. Conclusion

In this project, a facial emotion recognition model has been trained and saved. It can recognize an individual’s facial expression in real time, classifying it as Neutral, Angry, Disgust, Fear, Happy, Sad, or Surprised.

The entire project code is available in the following Github Repository: Real-Time-Facial-Emotions-Detection-Model-using-a-Web-Interface.

14. References

https://medium.com/@hinasharma19se/facial-expressions-recognition-b022318d842a
https://www.kaggle.com/ashishpatel26/tutorial-facial-expression-classification-keras

15. Full Code can be found at the link below

https://github.com/MayankBimbra/Real-Time-Facial-Emotions-Detection-Model-using-a-Web-Interface

Thank you for your time.