CN113052111A - Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Info

Publication number
CN113052111A
Authority
CN
China
Prior art keywords: information, image, module, facial expression, machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110360368.9A
Other languages
Chinese (zh)
Inventor
王立军 (Wang Lijun)
蒋林 (Jiang Lin)
李争平 (Li Zhengping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202110360368.9A
Publication of CN113052111A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; face representation
    • G06V 40/174: Facial expression recognition
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses an intelligent communication auxiliary system based on gesture recognition and facial expression detection, and an implementation method thereof. The system comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system. The invention establishes a system with gesture understanding and motion capture functions that converts gestures into sound, helping the deaf-mute communicate with others. Compared with a traditional communication system, the invention adds a facial recognition function: facial expression recognition is incorporated into the information recognition module, which improves system accuracy.

Description

Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection
Technical Field
The invention belongs to the technical field of human-computer interaction; it relates to gesture recognition, facial recognition and machine learning, and in particular to an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof.
Background
China has more than 20 million deaf-mute people. Sign language entered society for ease of communication; it is the nonverbal communication method commonly used by the deaf-mute. However, people who have not learned sign language cannot understand it, which creates a barrier between hearing people and the deaf. Many systems for deaf-mute communication have been invented, yet some information is ignored by them: when the deaf-mute communicates in sign language, the facial expression is closely related to the content being expressed.
Existing intelligent communication systems for the deaf-mute generally fall into two types: one is based on a smart glove, the other on a smart bracelet worn on the arm. These conventional systems extract only information about the hands and ignore facial information. Moreover, gestures may differ between users, which reduces the accuracy of such systems.
Disclosure of Invention
In order to solve these problems, the invention discloses an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof, and establishes a system with gesture understanding and motion capture functions that converts gestures into sound, helping the deaf-mute communicate with others.
In order to achieve the purpose, the invention provides the following technical scheme:
the intelligent communication auxiliary system based on the gesture recognition and the facial expression detection comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system;
the image acquisition module reads an image acquired by image acquisition equipment by adopting OpenCV;
the image processing module processes the image acquired by the image acquisition module;
the physiological information monitoring system comprises a sign language information processing module and a facial expression information processing module; the sign language information processing module is used for sign language information acquisition and feature extraction, and the facial expression processing module is used for facial expression information acquisition and feature extraction;
the machine learning module is used for comparing and matching sign language information extracted by the physiological information monitoring system with gesture information and comparing and matching expression information extracted by the physiological information monitoring system with information in a database;
the information output module is used for receiving the gesture, mouth and eye information matched by the machine learning module, integrating the final information and checking the information;
the voice system receives the information confirmed as correct by the information output module and converts the recognized information into speech.
Further, the image processing module specifically implements the following functions:
converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and performing noise removal, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold.
Further, the machine learning module takes the gesture information as a first reference basis, the mouth information as a second reference basis, and the eye information as a third reference basis when performing comparison and matching.
Furthermore, the machine learning module has relevant feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading.
Further, the feedback between the machine learning module and the physiological information monitoring system and the information output system comprises:
the physiological information monitoring system and the machine learning module have feedback, and the machine learning module feeds back the screened and collected information to the collection module after extracting characteristics, so that unnecessary information points are removed;
the information output module and the machine learning module have feedback, and when the information output module finds that the final information has conflict, the information is input back to the machine learning module to be compared and matched again.
An intelligent communication auxiliary method based on gesture recognition and facial expression detection comprises the following steps:
step one, reading an image acquired by the image acquisition equipment by adopting OpenCV (the open-source computer vision library);
step two, processing the image obtained in step one: converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and removing noise, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold;
step three, performing sign language information acquisition and feature extraction, and facial expression information acquisition and feature extraction, on the image processed in step two;
step four, comparing and matching the sign language information extracted in step three with the gesture information in the database by machine learning, and comparing and matching the expression information extracted in step three with information in the database;
step five, receiving the information of the gesture, the mouth and the eyes matched in the step four, integrating the final information and checking the information;
and step six, receiving the information which is checked to be correct in the step five, and converting the identified information into voice.
Furthermore, the gesture information is used as a first reference basis, the mouth information is used as a second reference basis, and the eye information is used as a third reference basis when the comparison and the matching are performed in the fourth step.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with a traditional communication system, the invention adds a facial recognition function: facial expression recognition is incorporated into the information recognition module, improving system accuracy.
2. The present invention establishes a system that recognizes gestures without using sensors. The system effectively recognizes keywords needed in daily life. Its range of application is further expanded: the system can easily be installed on a mobile phone, providing more convenience for the communication of the deaf-mute.
Drawings
Fig. 1 is an architecture diagram of an intelligent communication assistance system based on gesture recognition and facial expression detection according to the present invention.
Fig. 2 is a human body image after image processing.
Fig. 3 is a schematic diagram of the physiological information monitoring system.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention. Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.
The invention adds facial expression recognition of the deaf-mute to the communication system. Facial expression information is divided into two types: mouth expression information and eye expression information. When the deaf-mute communicates in sign language, the facial expression is often very rich, but gestures may occlude the face, in which case a standard eigenface method cannot be used to extract the relevant image. The eigenface method is therefore adapted and modified for the occlusion condition: extraction of the global face region is abandoned, and only mouth feature values are extracted for computation, forming a mouth-feature-based method for recognizing the emotion of the deaf-mute. As shown in figure 1, the invention first performs image acquisition, then physiological information detection, acquisition and processing; after machine learning and feedback it outputs the information, and finally broadcasts the information as speech. Adding facial expression recognition to the original intelligent communication system for the deaf-mute improves system accuracy.
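As a concrete illustration of the mouth-only feature idea, the sketch below crops a mouth region with OpenCV's stock Haar cascades. The patent does not name a specific detector, so the cascade files, the detector parameters and the lower-half-of-face heuristic are assumptions for illustration, not the claimed method.

```python
import cv2

# Stock OpenCV cascades; stand-ins chosen for illustration only.
face_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mouth_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def mouth_roi(gray):
    """Return the first detected mouth patch from a grayscale frame, or None."""
    faces = face_cc.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Search only the lower half of the face, where the mouth must lie;
        # this also sidesteps some hand occlusion of the upper face.
        lower = gray[y + h // 2 : y + h, x : x + w]
        mouths = mouth_cc.detectMultiScale(lower, scaleFactor=1.7, minNeighbors=11)
        for (mx, my, mw, mh) in mouths:
            return lower[my : my + mh, mx : mx + mw]
    return None
```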
Specifically, the invention provides an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof.
The system is described first. The intelligent communication auxiliary system based on gesture recognition and facial expression detection comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system.
The image acquisition module is used for acquiring human body images. Various input devices are commonly used in the art as image grabbers; the inputs are often hand images, data gloves, markers and drawings. Here, images are acquired with the ordinary cameras of mobile devices of different specifications. The method uses the OpenCV library for acquisition; by default, the webcam captures images continuously at 30-40 frames per second, and the images are read with OpenCV's built-in 'read' function.
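A minimal sketch of this acquisition step, assuming a standard webcam at device index 0; note that OpenCV's 'read' returns frames in BGR channel order, and the achievable frame rate is set by the camera, not by this code.

```python
import cv2

cap = cv2.VideoCapture(0)  # open the default camera (device index 0 is an assumption)
if not cap.isOpened():
    raise RuntimeError("camera not available")

ok, frame = cap.read()  # the built-in 'read' function mentioned above
if ok:
    print("captured BGR frame of shape", frame.shape)
cap.release()
```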
The image processing module processes the image obtained by the image acquisition module and performs image enhancement to obtain a good result. The deaf-mute stands in front of the system, and the camera acquires RGB images. The module converts the input RGB image sequence into grayscale images using OpenCV, then performs background segmentation, i.e. separates the hand object and face image in an image from their background. Noise removal follows, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold. The image is captured in the RGB color space but is converted to grayscale for two main reasons: first, many OpenCV functions are optimized to work only on grayscale images; second, an RGB image carries three times as many values as its grayscale counterpart, which multiplies the cost of the convolutional neural network.
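The three processing steps can be sketched as follows. Otsu thresholding is one plausible choice for the background segmentation, since the patent does not fix a segmentation algorithm, and P is left as the adjustable threshold named above.

```python
import cv2
import numpy as np

P = 200  # adjustable threshold: components smaller than P pixels count as noise

def preprocess(frame_bgr):
    # 1. grayscale conversion of the captured colour frame
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # 2. background segmentation (Otsu thresholding as a simple stand-in)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 3. noise removal: drop connected components with fewer than P pixels
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    clean = np.zeros_like(mask)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= P:
            clean[labels == i] = 255
    return gray, clean
```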
As shown in fig. 3, the physiological information monitoring system is mainly divided into a sign language information processing module and a facial expression information processing module for the deaf-mute. The facial expression processing module performs facial expression information acquisition and feature extraction; the sign language information processing module performs sign language information acquisition and feature extraction. Feature extraction computes different features, such as binary regions, centroids, peak calculation, angle calculation, thumb detection, and edge detection of finger or hand regions. Starting from the initial set, the data is processed to construct derived values for the feature extraction step. The purpose of feature extraction is a non-redundant, informative representation: it facilitates subsequent learning and generalization and, in some cases, yields better human interpretability. The original set of variables is reduced to groups of features that accurately and completely describe the original set, and useful sign language information and facial expression information are extracted. The physiological information monitoring system and the machine learning module exchange feedback: after feature extraction, the screened and collected information is fed back to the acquisition module of the physiological information monitoring system, and unnecessary information points are removed.
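A toy version of the hand-feature step is sketched below: the centroid via image moments, the contour area as a binary-region feature, and convexity defects as a stand-in for the peak and finger-region calculations. The feature list above is the patent's; this particular realization is an assumption.

```python
import cv2

def hand_features(mask):
    """Extract simple features from a binary hand mask (illustrative only)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
    m = cv2.moments(hand)
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)  # gaps between extended fingers
    n_defects = 0 if defects is None else len(defects)
    return {"centroid": (cx, cy), "area": cv2.contourArea(hand), "defects": n_defects}
```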
The machine learning module implements two functions: first, it compares and matches the extracted sign language information with the gesture information stored in the database; second, it compares and matches the extracted expressions (mainly mouth and eye expressions) against the database. The gesture information serves as the first reference basis, the mouth information as the second, and the eye information as the third; combining multiple reference bases yields higher matching accuracy. In recent years, research on dynamic gesture recognition has innovated in feature extraction and optimized the recognition algorithms, greatly improving the gesture recognition rate; gesture recognition covers the whole chain from hand tracking to gesture representation and conversion into semantic commands. Based on facial expression recognition technology, the facial expression of the deaf-mute can be recognized effectively.
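One way to encode the three-level priority is a weighted nearest-neighbour match over the database, as sketched below. The weights, the feature-vector layout and the distance metric are illustrative assumptions; the patent only fixes the priority order (gesture, then mouth, then eye).

```python
import numpy as np

# Assumed weights, ordered by the stated priority: gesture > mouth > eye.
WEIGHTS = {"gesture": 0.6, "mouth": 0.25, "eye": 0.15}

def match(sample, database):
    """sample and each database entry: {'label': str, 'gesture'/'mouth'/'eye': np.ndarray}."""
    def distance(entry):
        # Weighted sum of per-channel feature distances.
        return sum(w * np.linalg.norm(sample[k] - entry[k]) for k, w in WEIGHTS.items())
    best = min(database, key=distance)
    return best["label"], distance(best)
```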
The machine learning module exchanges feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading, which improves system accuracy.
The machine learning module transmits the matched gesture, mouth and eye information to the information output module, which performs the final information integration and check. The information output module comprises a sign language information output submodule and an expression information output submodule, which output sign language information and expression information respectively. If the final information conflicts, it is fed back to the machine learning module for renewed comparison and matching. This is the main feedback loop between the information output module and the machine learning module: information that passes the error check is output; if errors exist, they are fed back to the machine learning module to optimize the learning.
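The output-side consistency check and its feedback hook can be sketched as below; the agreement-of-labels criterion and the `rematch` callback are hypothetical, since the patent does not specify how a conflict is detected or how re-matching is triggered.

```python
def integrate_and_check(gesture_label, mouth_label, eye_label, rematch):
    """Integrate the three matched channels; `rematch` is a hypothetical hook
    back into the machine learning module, per the feedback described above."""
    if gesture_label == mouth_label == eye_label:
        return gesture_label  # consistent: pass the result to the voice system
    return rematch()          # conflict: compare and match again
```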
The information found to be error-free after the check is transmitted to the voice system, which converts the recognized information into speech using the gTTS (Google Text-to-Speech) library.
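This last step maps directly onto the open-source gTTS Python package, which wraps Google's text-to-speech service (network access is required). A minimal sketch, with the Mandarin language code as an assumption:

```python
from gtts import gTTS

def speak(text, out_path="output.mp3", lang="zh-CN"):
    # synthesize the recognized sentence and save it as an MP3 file
    gTTS(text=text, lang=lang).save(out_path)

speak("你好")  # e.g. a recognized greeting
```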
The intelligent communication auxiliary method based on gesture recognition and facial expression detection corresponds to the system above and comprises the following steps:
step one, image acquisition, namely realizing the function of an image acquisition module;
step two, image processing, namely realizing the function of an image processing module;
monitoring physiological information, namely realizing the function of a physiological information monitoring system;
step four, machine learning, namely realizing the function of a machine learning module;
step five, information output, namely the function of an information output module is realized;
and step six, voice broadcasting is carried out, and the function of a voice system is realized.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (7)

1. An intelligent communication auxiliary system based on gesture recognition and facial expression detection, characterized in that: the system comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system;
the image acquisition module reads an image acquired by image acquisition equipment by adopting OpenCV;
the image processing module processes the image acquired by the image acquisition module;
the physiological information monitoring system comprises a sign language information processing module and a facial expression information processing module; the sign language information processing module is used for sign language information acquisition and feature extraction, and the facial expression processing module is used for facial expression information acquisition and feature extraction;
the machine learning module is used for comparing and matching sign language information extracted by the physiological information monitoring system with gesture information and comparing and matching expression information extracted by the physiological information monitoring system with information in a database;
the information output module is used for receiving the gesture, mouth and eye information matched by the machine learning module, integrating the final information and checking the information;
the voice system receives the information confirmed as correct by the information output module and converts the recognized information into speech.
2. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: the image processing module specifically realizes the following functions:
converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and performing noise removal, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold.
3. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: when the machine learning module is used for comparing and matching, the gesture information is used as a first reference basis, the mouth information is used as a second reference basis, and the eye information is used as a third reference basis.
4. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: the machine learning module has relevant feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading.
5. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 4, wherein: the feedback between the machine learning module and the physiological information monitoring system and the information output system comprises:
the physiological information monitoring system and the machine learning module have feedback, and the machine learning module feeds back the screened and collected information to the collection module after extracting characteristics, so that unnecessary information points are removed;
the information output module and the machine learning module have feedback, and when the information output module finds that the final information has conflict, the information is input back to the machine learning module to be compared and matched again.
6. An intelligent communication auxiliary method based on gesture recognition and facial expression detection is characterized in that: the method comprises the following steps:
step one, reading an image acquired by the image acquisition equipment by adopting OpenCV (the open-source computer vision library);
step two, processing the image obtained in step one: converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and removing noise, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold;
step three, performing sign language information acquisition and feature extraction, and facial expression information acquisition and feature extraction, on the image processed in step two;
step four, comparing and matching the sign language information extracted in step three with the gesture information in the database by machine learning, and comparing and matching the expression information extracted in step three with information in the database;
step five, receiving the information of the gesture, the mouth and the eyes matched in the step four, integrating the final information and checking the information;
and step six, receiving the information which is checked to be correct in the step five, and converting the identified information into voice.
7. The intelligent communication assistance method based on gesture recognition and facial expression detection according to claim 6, wherein: in step four, when comparing and matching, the gesture information is used as the first reference basis, the mouth information as the second reference basis, and the eye information as the third reference basis.
CN202110360368.9A 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection Pending CN113052111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110360368.9A CN113052111A (en) 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Publications (1)

Publication Number Publication Date
CN113052111A true CN113052111A (en) 2021-06-29

Family

ID=76517541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110360368.9A Pending CN113052111A (en) 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Country Status (1)

Country Link
CN (1) CN113052111A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101759444B1 * 2016-08-25 2017-07-18 연세대학교 산학협력단 Expression recognition system and method using a head mounted display
CN110989835A (en) * 2017-09-11 2020-04-10 大连海事大学 Working method of holographic projection device based on gesture recognition
CN111582039A (en) * 2020-04-13 2020-08-25 清华大学 Sign language recognition and conversion system and method based on deep learning and big data

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination