Aaron runs a facility maintenance firm that specializes in office complexes. Maintaining infrastructure inside a building is easy because the occupants lodge complaints whenever an issue crops up. It is the external elements, like rooftops, chimneys, and exterior paint, that are difficult to assess. The firm uses drones to take images every quarter, which are then assessed by experts who recommend the maintenance work required. Six months into operation, Aaron felt that the firm was ready to scale, but hiring enough experts to assess the drone images seemed prohibitively expensive.
So, he turned to technology to solve the problem. He discussed his challenges with software development companies and hired one of them to develop a computer vision system that performed the assessment conducted by human experts. Training the software took six months and by that time the error percentage of the system had come down to acceptable limits.
What is computer vision?
Computer vision is a subset of artificial intelligence (AI) that equips a machine to see, i.e., to detect, identify, and label objects as humans do. When human beings see an image, they take in much more information than just the main object. Take, for instance, the fruit basket.
When you see this image, besides identifying the fruits, you also notice that they are kept on a white plate, that they look fresh, and that the pineapple is outside the plate. This means that computer vision systems must be able to recognize objects as well as their characteristics, like shape, color, size, texture, spatial arrangement, and background objects.
How a computer “sees”
To understand how a computer learns to see, we need to remind ourselves how a child learns to identify objects. When children are born, they do not know anything about any object around them. As they grow up, people point to these objects or show them picture books and repeat their names again and again. Before long, children start identifying the common objects around them.
As they grow up, they read books, watch television, see videos on smartphones or tablets and learn to identify more objects, which could be household items, places, means of transport, body parts, etc.
A system that needs to acquire computer vision is trained in the same way. It is shown thousands and thousands of images that have been labeled with the names, characteristics, and descriptions of the objects present in them.
Using deep learning algorithms, the system learns to detect, identify, and classify objects. When a new image is shown to a trained computer vision system, it can identify the objects in that image. At the most fundamental level, computer vision is pattern matching: the system compares the new image with what it learned from its training data, finds the patterns, and identifies objects on that basis.
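The idea of learning from labeled images and then matching a new one can be illustrated with a deliberately tiny sketch. It is not deep learning: it swaps in a simple nearest-neighbor comparison on hand-made 3x3 "images" purely to show how labeled examples let a program label an unseen image. The data, labels, and function names are all invented for illustration.

```python
# Toy illustration of "learn from labeled images, then match a new one".
# Assumption: images are tiny 3x3 grayscale grids flattened to 9 numbers.
# Real systems use deep neural networks trained on far larger images.

def distance(a, b):
    """Sum of squared pixel differences between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(image, labeled_examples):
    """Return the label of the stored example that is closest to `image`."""
    best_label, _ = min(
        ((label, distance(image, example)) for example, label in labeled_examples),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical training set of (pixels, label) pairs.
TRAINING_SET = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "vertical line"),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], "horizontal line"),
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], "diagonal line"),
]

# A new, slightly noisy image that is not in the training set.
new_image = [0, 1, 0, 0, 1, 0, 0, 0.8, 0.1]
print(classify(new_image, TRAINING_SET))  # prints "vertical line"
```

Even with noise in the bottom row, the new image is closest to the stored vertical line, so it receives that label. A deep learning model replaces the raw pixel comparison with learned features, but the principle of generalizing from labeled examples is the same.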
Typical applications of computer vision
This seemingly easy act of identifying patterns can be put to great use by computer vision systems. Here are the most common applications:
Object classification – Objects can be classified into broad categories. For instance, a system can classify an image as showing an animal, an automobile, or a human. It can then go deeper and classify animals as cats, dogs, lions, etc.
Object identification – Identifying the specific object in the image.
Object verification – Checking whether a required object is present in the image provided.
Object detection – Finding the location of a specific object in the image.
Object recognition – A typical image has multiple objects. Computer vision systems can recognize all of them along with their locations.
Object tracking – A specific object can be tracked across a series of images. That object could be a human being, tracked by computer vision systems spread across an area, which can be unethical in certain cases.
Object counting – Computer vision systems can break down any object into its components, which helps them recognize differing images of the same kind of object, such as people with different physical features and clothing. This greatly helps in counting and has been a great asset in restricting the number of people in closed areas during the COVID-19 pandemic.
Facial recognition – Computer vision systems can estimate the gender, age, emotions, and cultural appearance of people. Facial recognition is also used in biometric identification systems.
Action recognition – Computer vision systems can identify an action or gesture of a person or animal in the image.
Forecasting behavior – Computer vision systems can study the mood and sentiments of people in the image and forecast their reactions to new situations.
Crowd dynamics – Computer vision systems can not only count people passing through but also track their direction and density.
Optical character recognition (OCR) – Computer vision systems can identify text and numbers in an image. This is widely used in applications that extract information from uploaded photos of things like business cards.
Document analysis – Computer vision systems can analyze a document based on the criteria provided and extract the required information.
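Two of the tasks listed above, object detection and object counting, can be sketched in their simplest classical form without any machine learning. The example below assumes an image has already been reduced to a binary mask (1 for object pixels, 0 for background). It finds each connected group of object pixels with a flood fill and reports its bounding box. The function name and the mask are hypothetical, and production systems typically rely on trained detectors rather than this technique.

```python
# Sketch: locate and count objects in a binary mask via flood fill.
# Assumption: 1 marks object pixels, 0 marks background, and pixels that
# touch horizontally or vertically belong to the same object.

def find_objects(mask):
    """Return one (top, left, bottom, right) bounding box per object."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Start a flood fill from this unvisited object pixel.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

MASK = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
boxes = find_objects(MASK)
print(len(boxes))  # prints 3
print(boxes)       # prints [(0, 0, 1, 1), (1, 4, 2, 4), (3, 1, 3, 1)]
```

The number of boxes is the object count, and each box is the detected location of one object.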
How to use computer vision in various industries
Computer vision systems have found use in object detection and tracking, facial recognition, crowd dynamics, document analysis, etc.
Let’s look at some specific use cases in different industries:
Computer vision systems can help with two major problems faced by manufacturing units: disruption of assembly lines and product defects. They can analyze visual information to predict machine downtime or disruption among shop floor employees.
Computer vision systems can also monitor the production line to spot defects and alert supervisors to take action as soon as a defect is detected.
Computer vision systems have found widespread acceptance and implementation in the retail industry. Here are some ways in which retailers can use computer vision:
- Human characteristics like age or gender can be estimated to understand customer demographics.
- Customers' movements can be tracked throughout the store to get insights into product visibility and the efficacy of aisle arrangements.
- Eye movements, facial expressions, and hand gestures can be analyzed to recognize the products that attract the most customers, whether or not they purchase them.
- Strategically placed computer vision systems can support anti-theft measures.
- Computer vision algorithms can generate an accurate picture of inventory and can be integrated with product management systems to place orders.
The healthcare sector has been one of the early adopters of computer vision systems.
- Imaging tools equipped with computer vision help diagnose ailments like tumors, neurological disorders, and cancers.
- Computer vision tools can help identify signs of autism or dyslexia early in a child's life.
- Computer vision tools can help visually impaired people navigate indoor areas safely.
The biggest headache of the insurance industry is verifying the authenticity of insurance claims. Computer vision can assist by analyzing images to identify legitimate claims and forward them to the right person.
Insurance companies are also making risk management preemptive by developing applications that prevent collisions or send breakdown alerts.
Self-driving cars have been on the horizon for a long time, and they use computer vision to detect objects in their path and take action accordingly.
Computer vision systems can alleviate recurring problems of the agriculture sector like weed control, disease and insect infestation, soil quality, etc. Insights generated by these systems can be used by farmers to take action quickly. They can even use agricultural robots equipped with computer vision to spray herbicides and pesticides in the relevant areas only.
Law and order/public safety
Strategically placed computer vision systems can ensure law and order in public places. In case of accidents and criminal activities, the systems can also help identify the culprit and bring them to book.
Counting the number of people passing through an area is an important application of computer vision systems. During the current pandemic, it is proving very useful in enforcing social distancing measures.
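Once people have been detected, the counting and distancing checks described above come down to simple arithmetic. The sketch below assumes an upstream detector has already produced one ground position per person, in metres. The function names, the 2.0 m threshold, and the capacity of 3 are illustrative assumptions, not values taken from any regulation or product.

```python
# Sketch: occupancy and distancing checks on top of person detections.
# Assumptions: each person is an (x, y) ground position in metres, already
# produced by a detector; the thresholds below are purely illustrative.
from itertools import combinations
from math import hypot

def too_close_pairs(positions, min_distance=2.0):
    """Return index pairs of people standing closer than `min_distance`."""
    pairs = []
    for (i, (x1, y1)), (j, (x2, y2)) in combinations(enumerate(positions), 2):
        if hypot(x1 - x2, y1 - y2) < min_distance:
            pairs.append((i, j))
    return pairs

def over_capacity(positions, capacity):
    """Return True when more people are present than the area allows."""
    return len(positions) > capacity

people = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
print(len(people))                        # prints 4
print(too_close_pairs(people))            # prints [(0, 1), (2, 3)]
print(over_capacity(people, capacity=3))  # prints True
```

The hard part in practice is the detection and the mapping from camera pixels to ground positions. The checks themselves stay this simple.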
Top computer vision tools
Cloud technologies have played an important role in the widespread adoption of computer vision. Needless to add, major cloud service providers like Microsoft, Google, and IBM have their own computer vision solutions. However, there are also many other tools available for developing computer vision systems, most of them open source.
Let’s take a look at the most popular ones:
OpenCV is the most popular library of highly optimized programming functions for developing solutions to real-life computer vision problems. The library is cross-platform and open source. It works well with C++ and Python, which makes it an attractive option for beginners as well.
SimpleCV is an open-source machine vision framework that uses the OpenCV library and Python as the programming language. It is designed for casual users who have no experience in writing programs. Cameras, images, video streams, and video files are interoperable in SimpleCV, and manipulations are very fast.
TensorFlow is the most popular deep learning library because of the simplicity of its API. It is a free, open-source library for dataflow and differentiable programming. TensorFlow 2.0 supports image and speech recognition, object detection, reinforcement learning, and recommendations. Its reference models make it easier to start building solutions.
MATLAB is a multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks. It allows matrix manipulation, plotting of functions and data, and creation of user interfaces and implementation of algorithms. It also allows integration with programs written in other languages. It is widely used in research as prototyping is very easy and quick.
CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA, the market leader in GPUs. It delivers high performance by running computations on the GPU. The NVIDIA Performance Primitives (NPP) library is a part of CUDA and contains a set of image, signal, and video processing functions.
Keras is an open-source neural network library written in Python. It is designed to reduce cognitive load and concentrates on being user-friendly, modular, and extensible. It can run on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML, and it is also accessible from R.
You Only Look Once (YOLO) is an advanced object detection system designed for real-time processing.
BoofCV is an open-source Java library written from scratch for real-time robotics and computer vision applications, for both academic and business use. It is released under the Apache License 2.0 and includes functionality like low-level image processing, feature detection and tracking, camera calibration, classification, and recognition.
Computer vision ethics
The training data sets used in computer vision systems are taken from the public domain, which makes the privacy and security of citizens very important. All stakeholders, including management, employees, customers, and regulators, should be aware of their responsibility in developing and using these systems, and security should be considered at every stage, from strategy through execution to deployment. Businesses need to establish continued governance and regulatory compliance to ensure responsible and ethical use.
Because computer vision systems can be used to track humans, it is important to ensure that they are not used to track employees' activities and tie their performance reviews, appraisals, or incentives to the results.
Challenges in using computer vision
Although computer vision has been around since the 1950s, it has picked up only in this millennium. This is because implementing computer vision has some inherent challenges that could be overcome only recently:
Millions of training images required
To train any computer vision system, millions of images are required. With the advent of smartphones, the number of images generated and shared on the Internet is rising every day. More than 300 million images are uploaded to Facebook alone. Some 95 million photos and videos are uploaded to Instagram, while 100 million people use its Stories feature. These are just two platforms. Billions of images are uploaded to the Internet every day, a veritable treasure trove for training computer vision algorithms.
Processing these millions of images with neural networks requires an enormous amount of computing capacity. Older computers could not handle this, so computer vision could not progress. GPUs (graphics processing units) have equipped computer systems to run neural networks and deep learning algorithms at speed. The advent of cloud technologies has sped up adoption further because storage is cheaper and billed on a pay-as-you-go basis.
As discussed just now, computer vision is possible because of the huge number of images and videos available. But this has a flip side too. Identifying information about individuals is stored in the cloud, and it could be harvested by governments and private institutions in invasive ways. It is a big social threat that must be debated openly so that the privacy and safety of citizens are ensured.
How TechAhead can help
TechAhead has a team of experts adept at developing computer vision systems for multiple industries like sports, retail, and manufacturing. Here is what our experts do to develop a fully customized computer vision system for your business:
1. Create a data set of annotated images
2. Create the model for solving the problem at hand by extracting relevant features from these images
3. Train a deep learning model based on isolated features
4. Evaluate the model using images that were not part of the training data set
5. Repeat steps 2 to 4 till an acceptable level of accuracy is achieved
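The workflow above can be sketched in miniature. This is a toy stand-in under stated assumptions, not TechAhead's actual process: the "images" are four-pixel grayscale lists, the "model" is a nearest-centroid classifier rather than a deep network, and every name in the code is hypothetical. What it does show faithfully is the shape of the loop: extract features, train, evaluate on held-out images, and repeat until accuracy is acceptable.

```python
# Sketch of the iterate-until-accurate workflow with a toy stand-in model.
# Assumptions: 4-pixel grayscale "images", two labels, and a nearest-centroid
# classifier instead of a deep network. Every name here is hypothetical.

# Step 1: annotated images, split into training and held-out sets.
TRAIN = [
    ([0.1, 0.2, 0.1, 0.2], "dark"),
    ([0.3, 0.1, 0.2, 0.0], "dark"),
    ([0.9, 0.8, 0.7, 0.8], "bright"),
    ([0.7, 0.9, 0.9, 0.7], "bright"),
]
HELD_OUT = [
    ([0.9, 0.1, 0.1, 0.1], "dark"),
    ([0.1, 0.9, 0.9, 0.9], "bright"),
    ([0.2, 0.1, 0.2, 0.1], "dark"),
    ([0.8, 0.9, 0.8, 0.9], "bright"),
]

# Step 2: candidate feature extractors, tried in order.
def first_pixel(pixels):
    return pixels[0]

def mean_brightness(pixels):
    return sum(pixels) / len(pixels)

# Step 3: "training" stores the average feature value seen for each label.
def train(samples, extract):
    grouped = {}
    for pixels, label in samples:
        grouped.setdefault(label, []).append(extract(pixels))
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}

def predict(model, feature):
    """Pick the label whose stored average is closest to `feature`."""
    return min(model, key=lambda label: abs(model[label] - feature))

# Step 4: measure accuracy on images the model has not seen.
def accuracy(model, samples, extract):
    hits = sum(predict(model, extract(p)) == label for p, label in samples)
    return hits / len(samples)

# Step 5: repeat steps 2 to 4 until the target accuracy is reached.
def develop(extractors, target=0.9):
    history = []
    for extract in extractors:
        model = train(TRAIN, extract)
        score = accuracy(model, HELD_OUT, extract)
        history.append((extract.__name__, score))
        if score >= target:
            break
    return history

print(develop([first_pixel, mean_brightness]))
# prints [('first_pixel', 0.5), ('mean_brightness', 1.0)]
```

The first feature looks fine on the training images but fails on half of the held-out ones, which is exactly why step 4 uses images that were not part of training.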
Computer vision is a branch of artificial intelligence that enables machines to detect, identify, and label objects. Computer vision systems can also identify characteristics of objects, like size, texture, color, and spatial arrangement. They can estimate the age, gender, and cultural heritage of human beings, count them, and detect their mood and sentiments.
Any computer vision system is trained on millions of images that have been already labeled with names, characteristics, and descriptions. Using deep learning algorithms and neural networks, these systems learn to “see” objects.
There are many tools available for building computer vision systems, many of them open source. These include OpenCV, SimpleCV, TensorFlow, MATLAB, CUDA, etc.
Computer vision has found use in various industries like retail, healthcare, manufacturing, automobiles, education, etc. Using images generated by the citizens as a training dataset has inherent privacy and security issues because they make the people and their personal data identifiable.
It is important for those using computer vision systems to understand this and treat security as a priority from strategy through execution to deployment. There needs to be continued governance and regulatory compliance to ensure that these data are being used ethically.
Businesses that need computer vision solutions can get customized solutions developed with complete security and privacy built into them.