John Kramer runs a chain of restaurants catering to executives and business travelers. His focus has always been high-quality customer engagement. To enhance the customer experience, John installed a system in each of his restaurants that greeted every visitor. Now he wants to leverage the latest technologies and welcome frequent visitors by name. He hopes that the new system will also overcome the shortcomings of the existing one. For example, the current system broadcasts a welcome message each time the restaurant door is opened, even for employees. This drains the system's power quickly and also irritates customers who are dining.
John wants a system that identifies the person walking through the door and greets them by name if they are regulars. The system should not say anything to the employees when they walk in, reducing unnecessary noise. Any new customer can be welcomed with the general welcome message.
What John needs is a facial recognition system. The technology that identifies or verifies a face in an image or a video is called face recognition. Besides the example scenario discussed here, there are situations where it is essential to automatically recognize people, whether in a photograph, a video or in-person. Some of these instances could be:
- Restricting access to online resources like file servers
- Matching an ID with a face for verification
- Identifying a person in an uploaded image or video
- Using biometric security for devices like laptops and smartphones
- Validating online transactions
- Conducting mass surveillance in sensitive spaces like airports, sports stadiums, university campuses, etc.
Facial recognition is a more effective tool than other biometric recognition systems like fingerprinting because it is contactless, a quality that has become even more relevant since the COVID-19 pandemic.
As humans, recognizing faces comes naturally to us. But think of a child who is still growing and learning to identify people. If you observe carefully, you will notice that small children tend to confuse people who share similar facial features, like thick eyebrows, a long beard, a broad forehead or a cleft lip. A machine has to be trained step by step, much as a child learns, and this is achieved through deep learning technologies. Before getting into how a facial recognition system can be built into an app, let's see how the technology developed.
Early days of facial recognition
Facial recognition is not a new phenomenon. For decades it has been one of the most important biometric techniques for identity authentication, especially in critical areas like the military, finance and public security that place a premium on security.
Manual facial recognition system
Woodrow Wilson Bledsoe, who manually implemented facial recognition in the 1960s, is considered the father of facial recognition. He developed a system that classified photos through a RAND tablet, a graphical computer input device. Bledsoe manually recorded the coordinates of facial features like the eyes, nose, mouth and hairline from photos. These coordinates were then plotted and saved in a database. New photographs were plotted against the database to identify the individuals with the closest numerical resemblance. Facial recognition was further refined in the early 1970s by Goldstein, Harmon and Lesk, but it was still a mostly manual process.
Computer assisted facial recognition system
The first computer-assisted facial recognition technique was the eigenface approach, which used linear algebra to build low-dimensional representations of facial images. It was shown that fewer than one hundred values were required to accurately code a normalized image of a face. This method still underpins many deep learning algorithms today.
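The eigenface idea can be sketched in a few lines of linear algebra. The sketch below is illustrative only: the "images" are tiny random vectors standing in for flattened face photos, and the dimensions are made up for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 "face images", each flattened to a 64-pixel vector (synthetic stand-ins)
faces = rng.normal(size=(20, 64))

# Centre the data, then take the top principal components (the "eigenfaces")
mean_face = faces.mean(axis=0)
centred = faces - mean_face
# SVD of the centred data: the rows of vt are the eigenfaces
_, _, vt = np.linalg.svd(centred, full_matrices=False)
k = 8                            # keep far fewer values than pixels
eigenfaces = vt[:k]              # shape (8, 64)

# Each face is now coded by just k coefficients instead of 64 pixel values
codes = centred @ eigenfaces.T   # shape (20, 8)

# Reconstruct an approximation of every face from its short code
reconstructed = codes @ eigenfaces + mean_face
print(codes.shape)               # (20, 8)
```

The key point is the compression: a face is matched by comparing a handful of coefficients rather than every pixel, which is exactly the "fewer than one hundred values" result mentioned above.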
Face Recognition using Deep learning algorithms
Deep learning algorithms use very large datasets of faces to train facial recognition systems to detect and identify new faces. Much of the credit for modern facial recognition systems goes to the annual ImageNet Large Scale Visual Recognition Challenge, established in 2010. ImageNet is a large visual database developed specifically for visual object recognition research. The point to note here is that facial recognition is a special case of object recognition, where only faces need to be recognized.
To date, more than 14 million images have been hand-annotated by the ImageNet project, and at least a million of them come with bounding boxes. You will see why bounding boxes matter later, when we discuss how facial recognition algorithms work.
The figure shows that in 2015, for the first time, deep learning algorithms achieved better recognition rates than humans in the annual competition.
How apps can integrate facial recognition
Demand for facial recognition in applications is going up because it is an effective way of ensuring system security, user safety and user engagement. The growing interest can be gauged from the fact that the face recognition market, worth USD 3.2 billion in 2019, is projected to grow at a CAGR of 16.6% and reach USD 7.0 billion by 2024.
This growth is driven by the expanding surveillance market, technological advancements and rising government and defence deployments. The COVID-19 pandemic is expected to push demand further as more businesses adopt digital transformation and look for secure ways to authenticate access to online assets, validate financial transactions and ramp up cybersecurity.
Different approaches to identifying a human face
- 2D recognition — it is much in demand, especially for biometric identification, to secure anything from a smartphone to a military facility.
- 3D recognition — it is still inferior to 2D recognition, but it is gaining popularity because it can reconstruct a 3D image of the subject.
- Color-based/texture-based face search — areas with typical skin color or texture are identified first, and the face is then localized within them.
- Face recognition in a controlled background — this is used in scenarios where it is known beforehand that new images will be captured against a solid background. As we discuss how the algorithms work, you will realise that the richer the background, the more effort is needed to identify a face.
- Face search by motion — this is typically used with video, where reference points like blinking eyes, nostrils, the forehead, mouth or eyebrows can be used to localize the face.
- Thermal imaging — thermal imaging adds information about a face in the form of heat signatures. When both visual and thermal imagers are used, the chances of getting a match increase. It is especially useful when visual images are of poor quality due to poor lighting, aging, different poses, etc.
The method integrated into the application depends upon the requirement.
How to integrate facial recognition into a new or existing app
To integrate facial recognition into a new or existing application, you need:
- Video camera
- A powerful server to store data
- Detection, comparison and recognition algorithms
- Trained neural networks with access to images
In the age of smartphones, when nearly every person carries a high-quality HD camera, businesses need not worry about how users will access a facial recognition app. Video cameras for business use, such as in restaurants, convention centres and research facilities, are also very affordable. Cloud computing has made powerful servers that can store, process and serve high data volumes both accessible and economical.
It's the last two components — the algorithms and the trained neural networks — that need the most work. Here is how these algorithms identify faces.
Face detection
The first step is detecting the faces present in the input, which could be images or videos and may contain one or more faces. Face detection is a special case of object detection: the system identifies an object as a face and demarcates it, that is, localizes its extent with a box. Face detection is the most critical step because a face that is not detected can never be identified.
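Detection quality is commonly measured by comparing a predicted box against a hand-annotated bounding box (like the ones ImageNet provides) using intersection over union (IoU). A minimal pure-Python sketch, with made-up box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the overlapping rectangle, if any
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection that overlaps half of the annotated face region
print(iou((0, 0, 10, 10), (0, 5, 10, 15)))
```

An IoU of 1.0 means a perfect detection and 0.0 means no overlap; detectors are typically judged against a minimum IoU cutoff.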
Data normalization / alignment
The faces detected in the previous step often need to be normalized so that they are consistent with the database. Detected faces are not always front-facing; they could be side profiles, looking in different directions or shot in poor lighting. The system should be able to identify a person's face even when the pose, illumination and expression differ.
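One common alignment step is rotating the face so that the line between the eye centres is horizontal. The sketch below shows only the angle computation; the eye coordinates are hypothetical values that would normally come from a facial landmark detector.

```python
import math

def eye_alignment_angle(left_eye, right_eye):
    """Angle (degrees) to rotate the image so the eyes sit on a horizontal line."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical landmarks: the right eye sits lower than the left,
# so the face is tilted and needs a corrective rotation.
angle = eye_alignment_angle((30, 40), (70, 48))
print(round(angle, 1))
```

In a real pipeline this angle would feed into an affine warp of the cropped face so that every face enters the feature-extraction step in a consistent pose.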
Feature extraction
The next step in facial recognition is to extract features that can be used to identify the face. Convolutional and auto-encoder networks are used here. Each database has a predefined set of features that must be extracted from each detected face so that it can be identified successfully.
In the early 1970s, when Goldstein, Harmon and Lesk refined manual facial recognition systems, they used 21 facial markers like lip thickness, hairline and hair colour to detect faces. Modern algorithms extract 64 or 128 facial markers, also called embeddings. This is also the step where alternate faces with the same collection of features can be generated for future reference.
Recognition
This is the step where the actual recognition happens, by comparing the extracted features with the database. In practice there is never a 100% match; each system has to define a threshold above which a face is considered recognized, usually 80%. If there is a match of 80% or more, the system returns an "identified" status; anything below that returns an "unidentified" status.
Applications can increase or decrease this threshold depending on their requirements. Military installations, sensitive research facilities, financial transaction systems and other settings that need a very high level of security may raise the threshold, but for most other purposes the 80% threshold works just fine. Lowering the threshold increases the risk of false matches, where people who are not authorised are treated as identified.
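The matching step can be sketched as computing a similarity score between embedding vectors and comparing it against a configurable threshold. The embeddings below are made-up 4-dimensional toy vectors (real systems use 64 or 128 dimensions), and cosine similarity stands in for whatever metric a given system uses:

```python
import math

THRESHOLD = 0.80  # typical default; raise for high-security deployments

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match(probe, enrolled, threshold=THRESHOLD):
    """Return 'identified' if the probe embedding is close enough to an enrolled one."""
    return "identified" if cosine_similarity(probe, enrolled) >= threshold else "unidentified"

# Made-up embeddings for illustration
alice_enrolled = [0.9, 0.1, 0.3, 0.4]
alice_today = [0.88, 0.12, 0.33, 0.41]   # same person, slightly different photo
stranger = [0.1, 0.9, 0.2, 0.1]

print(match(alice_today, alice_enrolled))   # similar vectors clear the threshold
print(match(stranger, alice_enrolled))
```

Raising `THRESHOLD` makes the system stricter (fewer false matches, more missed matches); lowering it does the opposite.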
Deep learning to train facial recognition systems
The algorithms need to be trained on how to localize faces, extract features from them and then identify them. Deep learning systems use existing databases of millions of images to train face recognition systems using detection, extraction and comparison algorithms.
These algorithms are designed like an animal’s neural network and are called Artificial Neural Networks (ANN). Two types of ANNs – convolutional neural networks and deep auto-encoder networks – are the most popular algorithms for training facial recognition systems.
Convolutional neural networks
A convolutional neural network (CNN) is a deep feed-forward artificial neural network used to analyze visual images. A CNN takes into account the 2D topology of an image and minimizes the effect of scale changes, rotations, shifts and other distortions in the input image.
CNNs are used in:
- Supervised learning for object classification, recognition and detection
- Unsupervised learning for image segmentation
- Image compression
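The core operation of a CNN is the convolution, which slides a small filter over the image and responds wherever the local pattern matches. A minimal numpy sketch of one convolution (no learning, no pooling, a hand-picked filter rather than a trained one):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as most deep learning libraries compute it)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter responds strongly where intensity jumps sideways
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_filter = np.array([[1, -1],
                        [1, -1]], dtype=float)
response = conv2d(image, edge_filter)
print(response)
```

In a trained CNN, many such filters are learned automatically, and stacking convolution layers lets the network respond to progressively larger facial structures, from edges up to eyes, noses and whole faces.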
Deep auto-encoder networks
These networks are used in an unsupervised learning mode to reduce the dimensionality of the input, which helps optimize the time required to match a given input against the database. An auto-encoder is a pair of networks: an encoder and a decoder. The encoder takes the input and compresses it into a compact vector, which then passes through the CNN. The output of the CNN is then decoded and returned. Because the convolutional neural network receives a compressed feature vector, the time and resources required to identify the face are reduced.
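The encode/shrink/decode idea can be sketched with a linear encoder and decoder. The weights below are random (i.e. untrained) and the dimensions are made up, so only the shapes are meaningful; training would tune the weights so that the reconstruction closely matches the input.

```python
import numpy as np

rng = np.random.default_rng(1)

input_dim, code_dim = 64, 8          # compress 64 values down to 8

# Untrained, randomly initialised weights; training would adjust these so
# that decode(encode(x)) reproduces x as closely as possible.
w_enc = rng.normal(size=(input_dim, code_dim))
w_dec = rng.normal(size=(code_dim, input_dim))

def encode(x):
    return np.tanh(x @ w_enc)        # shrink the input into a compact code

def decode(code):
    return code @ w_dec              # expand the code back to the input size

x = rng.normal(size=(1, input_dim))  # one flattened "face"
code = encode(x)
reconstruction = decode(code)
print(code.shape, reconstruction.shape)  # (1, 8) (1, 64)
```

The 8-value code is what downstream matching operates on, which is why the auto-encoder saves time and resources compared with comparing raw pixels.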
Face recognition frameworks
Facial recognition is only one of the features of many apps being developed. Developers therefore cannot be expected to build facial recognition from scratch, writing their own algorithms and training them on huge datasets; access to such high volumes of images would itself be a challenge. Also, most algorithms are specialized, doing just two or three of the steps really well, so to build a facial recognition system, developers typically need to combine at least two algorithms.
There are many facial recognition frameworks and APIs available that can be used to build facial recognition systems. Some of the most popular ones in 2020 include:
- Microsoft computer vision
- Lambda Labs
- OpenCV
- Kairos
- Face++
- Inferdo
Out of these, OpenCV and Face++ are free to use.
How TechAhead can help develop apps with facial recognition capabilities
TechAhead has a team of consultants and developers that provides end-to-end facial recognition app integration and development as per client requirements. Because facial recognition in apps is a relatively new technology, businesses are often unclear about what they need. Our experts understand your unique requirements, bridge the gap between what is required and what is achievable, and design the system accordingly. We apply industry benchmarks in developing secure and robust facial recognition systems that are easy to train, install and use.
Summary
Technology that identifies or verifies a face in an image or video is called face recognition technology. There are many use cases for facial recognition systems, as it is often imperative to recognize people automatically, be it in a photograph, a video or in person. It is an especially effective biometric tool in the current COVID-19 scenario because it is both robust and contactless. However, a machine has to be trained step by step, and this is done most effectively using deep learning technologies.
Woodrow Wilson Bledsoe, the father of facial recognition, implemented facial recognition manually in the 1960s, recording the coordinates of facial features like the eyes, nose, mouth and hairline by hand.
The first computer-assisted facial recognition technique used linear algebra for low-dimensional representation of facial images. These low-dimensional representations were called eigenfaces, and the method is still the basis of many deep learning algorithms. The difference is that modern deep learning algorithms use very large datasets of faces to train facial recognition systems.
ImageNet is a large database of 14 million images for visual object recognition research. The point to note is that facial recognition is simply a special case of object recognition.
Demand for facial recognition in applications is going up because it is an effective way of ensuring system security, user safety as well as user engagement. There are different approaches to identifying a human face, like 2D/3D recognition, color- or texture-based face search, search by motion, thermal imaging, etc.
To integrate facial recognition into a new or existing application, just four things are required – a video camera; a powerful server to store data; detection, comparison and recognition algorithms; and a trained neural network with access to images.
The four main steps of facial recognition are face detection, data normalization / alignment, feature extraction and recognition. The deep learning algorithms train the systems in how to localize faces, extract features, compare images and finally identify the faces. These algorithms are designed like an animal’s neural network and are called Artificial Neural Networks (ANN). Two types of ANNs – convolutional neural networks and deep auto-encoder networks – are the most popular algorithms for training facial recognition systems.
There are many facial recognition APIs and frameworks to train the facial recognition systems. Some of the most popular ones include Microsoft computer vision, OpenCV, Kairos, Inferdo, Face++, etc.