CN113052111A - Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Info

Publication number
CN113052111A
Authority
CN
China
Prior art keywords: information, image, module, facial expression, machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110360368.9A
Other languages
Chinese (zh)
Inventor
王立军 (Wang Lijun)
蒋林 (Jiang Lin)
李争平 (Li Zhengping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202110360368.9A
Publication of CN113052111A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; face representation
    • G06V 40/174: Facial expression recognition
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses an intelligent communication auxiliary system based on gesture recognition and facial expression detection, and an implementation method thereof. The system comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system. The invention establishes a system with gesture understanding and motion capture functions that converts gestures into sound, helping the deaf-mute communicate with others. Compared with a traditional communication system, the invention adds a facial recognition function: facial expression recognition is incorporated into the information recognition module, which improves system accuracy.

Description

Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection
Technical Field
The invention belongs to the technical field of human-computer interaction; it relates to gesture recognition, facial recognition and machine learning, and in particular to an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof.
Background
China has more than 20 million deaf-mute people. Sign language entered society for ease of communication; it is the nonverbal communication method commonly used by the deaf-mute. However, people who have not learned sign language cannot understand it, which creates a barrier between hearing people and the deaf. Many systems for deaf-mute communication have been invented, yet some information is ignored by them: when the deaf-mute communicates in sign language, the facial expression is closely related to the content being expressed.
Existing intelligent communication systems for the deaf-mute generally fall into two types: one is based on a smart glove, the other on a smart bracelet worn on the arm. These conventional systems extract only information about the hands and ignore facial information. Moreover, gestures may differ between users, which reduces the accuracy of such systems.
Disclosure of Invention
In order to solve these problems, the invention discloses an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof, and establishes a system with gesture understanding and motion capture functions that converts gestures into sound, helping the deaf-mute communicate with others.
In order to achieve the purpose, the invention provides the following technical scheme:
the intelligent communication auxiliary system based on the gesture recognition and the facial expression detection comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system;
the image acquisition module reads an image acquired by image acquisition equipment by adopting OpenCV;
the image processing module processes the image acquired by the image acquisition module;
the physiological information monitoring system comprises a sign language information processing module and a facial expression information processing module; the sign language information processing module is used for sign language information acquisition and feature extraction, and the facial expression processing module is used for facial expression information acquisition and feature extraction;
the machine learning module is used for comparing and matching sign language information extracted by the physiological information monitoring system with gesture information and comparing and matching expression information extracted by the physiological information monitoring system with information in a database;
the information output module is used for receiving the gesture, mouth and eye information matched by the machine learning module, integrating the final information and checking the information;
the voice system receives the information confirmed as correct by the information output module and converts the recognized information into speech.
Further, the image processing module specifically implements the following functions:
converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and performing noise removal, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold.
Further, the machine learning module takes the gesture information as a first reference basis, the mouth information as a second reference basis, and the eye information as a third reference basis when performing comparison and matching.
Furthermore, the machine learning module has relevant feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading.
Further, the feedback between the machine learning module and the physiological information monitoring system and the information output system comprises:
the physiological information monitoring system and the machine learning module have feedback, and the machine learning module feeds back the screened and collected information to the collection module after extracting characteristics, so that unnecessary information points are removed;
the information output module and the machine learning module have feedback, and when the information output module finds that the final information has conflict, the information is input back to the machine learning module to be compared and matched again.
An intelligent communication auxiliary method based on gesture recognition and facial expression detection comprises the following steps:
step one, reading an image acquired by the image acquisition equipment by adopting OpenCV (the open-source computer vision library);
step two, processing the image obtained in step one: converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and removing noise, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold;
step three, performing sign language information acquisition and feature extraction, and facial expression information acquisition and feature extraction, on the image processed in step two;
step four, comparing and matching the sign language information extracted in step three with the gesture information in the database by machine learning, and comparing and matching the expression information extracted in step three with information in the database;
step five, receiving the information of the gesture, the mouth and the eyes matched in the step four, integrating the final information and checking the information;
and step six, receiving the information which is checked to be correct in the step five, and converting the identified information into voice.
Furthermore, the gesture information is used as a first reference basis, the mouth information is used as a second reference basis, and the eye information is used as a third reference basis when the comparison and the matching are performed in the fourth step.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with a traditional communication system, the invention adds a facial recognition function: facial expression recognition is incorporated into the information recognition module, improving system accuracy.
2. The present invention establishes a system that recognizes gestures without using sensors. The system effectively recognizes keywords needed in daily life. Its range of application is further expanded: the system can easily be installed on a mobile phone, providing more convenience for the communication of the deaf-mute.
Drawings
Fig. 1 is an architecture diagram of an intelligent communication assistance system based on gesture recognition and facial expression detection according to the present invention.
Fig. 2 is a human body image after image processing.
Fig. 3 is a schematic diagram of the physiological information monitoring system.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention. Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.
The invention adds facial expression recognition of the deaf-mute to the communication system. Facial expression information is divided into two types: mouth expression information and eye expression information. When the deaf-mute communicates in sign language, the facial expression is often very rich, but gestures may occlude the face, in which case a standard eigenface method cannot be used to extract the relevant image. The eigenface method is therefore adapted and modified for the occlusion condition: extraction of the global face region is abandoned, and only mouth feature values are extracted for computation, forming a mouth-feature-based method for recognizing the emotion of the deaf-mute. As shown in figure 1, the invention first performs image acquisition, then physiological information detection, acquisition and processing; after machine learning and feedback it outputs the information, and finally broadcasts the information as speech. Adding facial expression recognition to the original intelligent communication system for the deaf-mute improves system accuracy.
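As a concrete illustration of the mouth-only feature idea, the sketch below crops a mouth region with OpenCV's stock Haar cascades. The patent does not name a specific detector, so the cascade files, the detector parameters and the lower-half-of-face heuristic are assumptions for illustration, not the claimed method.

```python
import cv2

# Stock OpenCV cascades; stand-ins chosen for illustration only.
face_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mouth_cc = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def mouth_roi(gray):
    """Return the first detected mouth patch from a grayscale frame, or None."""
    faces = face_cc.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Search only the lower half of the face, where the mouth must lie;
        # this also sidesteps some hand occlusion of the upper face.
        lower = gray[y + h // 2 : y + h, x : x + w]
        mouths = mouth_cc.detectMultiScale(lower, scaleFactor=1.7, minNeighbors=11)
        for (mx, my, mw, mh) in mouths:
            return lower[my : my + mh, mx : mx + mw]
    return None
```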
Specifically, the invention provides an intelligent communication auxiliary system based on gesture recognition and facial expression detection and an implementation method thereof.
The system is described first. The intelligent communication auxiliary system based on gesture recognition and facial expression detection comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system.
The image acquisition module is used for acquiring human body images. Various input devices are commonly used in the art as image grabbers; the inputs are often hand images, data gloves, markers and drawings. Here, images are acquired with the ordinary cameras of mobile devices of different specifications. The method uses the OpenCV library for acquisition; by default, the webcam captures images continuously at 30-40 frames per second, and the images are read with OpenCV's built-in 'read' function.
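A minimal sketch of this acquisition step, assuming a standard webcam at device index 0; note that OpenCV's 'read' returns frames in BGR channel order, and the achievable frame rate is set by the camera, not by this code.

```python
import cv2

cap = cv2.VideoCapture(0)  # open the default camera (device index 0 is an assumption)
if not cap.isOpened():
    raise RuntimeError("camera not available")

ok, frame = cap.read()  # the built-in 'read' function mentioned above
if ok:
    print("captured BGR frame of shape", frame.shape)
cap.release()
```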
The image processing module processes the image obtained by the image acquisition module and performs image enhancement to obtain a good result. The deaf-mute stands in front of the system, and the camera acquires RGB images. The module converts the input RGB image sequence into grayscale images using OpenCV, then performs background segmentation, i.e. separates the hand object and face image in an image from their background. Noise removal follows, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold. The image is captured in the RGB color space but is converted to grayscale for two main reasons: first, many OpenCV functions are optimized to work only on grayscale images; second, an RGB image carries three times as many values as its grayscale counterpart, which multiplies the cost of the convolutional neural network.
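The three processing steps can be sketched as follows. Otsu thresholding is one plausible choice for the background segmentation, since the patent does not fix a segmentation algorithm, and P is left as the adjustable threshold named above.

```python
import cv2
import numpy as np

P = 200  # adjustable threshold: components smaller than P pixels count as noise

def preprocess(frame_bgr):
    # 1. grayscale conversion of the captured colour frame
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # 2. background segmentation (Otsu thresholding as a simple stand-in)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 3. noise removal: drop connected components with fewer than P pixels
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    clean = np.zeros_like(mask)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= P:
            clean[labels == i] = 255
    return gray, clean
```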
As shown in fig. 3, the physiological information monitoring system is mainly divided into a sign language information processing module and a facial expression information processing module for the deaf-mute. The facial expression processing module performs facial expression information acquisition and feature extraction; the sign language information processing module performs sign language information acquisition and feature extraction. Feature extraction computes different features, such as binary regions, centroids, peak calculation, angle calculation, thumb detection, and edge detection of finger or hand regions. Starting from the initial set, the data is processed to construct derived values for the feature extraction step. The purpose of feature extraction is a non-redundant, informative representation: it facilitates subsequent learning and generalization and, in some cases, yields better human interpretability. The original set of variables is reduced to groups of features that accurately and completely describe the original set, and useful sign language information and facial expression information are extracted. The physiological information monitoring system and the machine learning module exchange feedback: after feature extraction, the screened and collected information is fed back to the acquisition module of the physiological information monitoring system, and unnecessary information points are removed.
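A toy version of the hand-feature step is sketched below: the centroid via image moments, the contour area as a binary-region feature, and convexity defects as a stand-in for the peak and finger-region calculations. The feature list above is the patent's; this particular realization is an assumption.

```python
import cv2

def hand_features(mask):
    """Extract simple features from a binary hand mask (illustrative only)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
    m = cv2.moments(hand)
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)  # gaps between extended fingers
    n_defects = 0 if defects is None else len(defects)
    return {"centroid": (cx, cy), "area": cv2.contourArea(hand), "defects": n_defects}
```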
The machine learning module implements two functions: first, it compares and matches the extracted sign language information with the gesture information stored in the database; second, it compares and matches the extracted expressions (mainly mouth and eye expressions) against the database. The gesture information serves as the first reference basis, the mouth information as the second, and the eye information as the third; combining multiple reference bases yields higher matching accuracy. In recent years, research on dynamic gesture recognition has innovated in feature extraction and optimized the recognition algorithms, greatly improving the gesture recognition rate; gesture recognition covers the whole chain from hand tracking to gesture representation and conversion into semantic commands. Based on facial expression recognition technology, the facial expression of the deaf-mute can be recognized effectively.
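One way to encode the three-level priority is a weighted nearest-neighbour match over the database, as sketched below. The weights, the feature-vector layout and the distance metric are illustrative assumptions; the patent only fixes the priority order (gesture, then mouth, then eye).

```python
import numpy as np

# Assumed weights, ordered by the stated priority: gesture > mouth > eye.
WEIGHTS = {"gesture": 0.6, "mouth": 0.25, "eye": 0.15}

def match(sample, database):
    """sample and each database entry: {'label': str, 'gesture'/'mouth'/'eye': np.ndarray}."""
    def distance(entry):
        # Weighted sum of per-channel feature distances.
        return sum(w * np.linalg.norm(sample[k] - entry[k]) for k, w in WEIGHTS.items())
    best = min(database, key=distance)
    return best["label"], distance(best)
```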
The machine learning module exchanges feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading, which improves system accuracy.
The machine learning module transmits the matched gesture, mouth and eye information to the information output module, which performs the final information integration and check. The information output module comprises a sign language information output submodule and an expression information output submodule, which output sign language information and expression information respectively. If the final information conflicts, it is fed back to the machine learning module for renewed comparison and matching. This is the main feedback loop between the information output module and the machine learning module: information that passes the error check is output; if errors exist, they are fed back to the machine learning module to optimize the learning.
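The output-side consistency check and its feedback hook can be sketched as below; the agreement-of-labels criterion and the `rematch` callback are hypothetical, since the patent does not specify how a conflict is detected or how re-matching is triggered.

```python
def integrate_and_check(gesture_label, mouth_label, eye_label, rematch):
    """Integrate the three matched channels; `rematch` is a hypothetical hook
    back into the machine learning module, per the feedback described above."""
    if gesture_label == mouth_label == eye_label:
        return gesture_label  # consistent: pass the result to the voice system
    return rematch()          # conflict: compare and match again
```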
The information found to be error-free after the check is transmitted to the voice system, which converts the recognized information into speech using the gTTS (Google Text-to-Speech) library.
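This last step maps directly onto the open-source gTTS Python package, which wraps Google's text-to-speech service (network access is required). A minimal sketch, with the Mandarin language code as an assumption:

```python
from gtts import gTTS

def speak(text, out_path="output.mp3", lang="zh-CN"):
    # synthesize the recognized sentence and save it as an MP3 file
    gTTS(text=text, lang=lang).save(out_path)

speak("你好")  # e.g. a recognized greeting
```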
The intelligent communication auxiliary method based on gesture recognition and facial expression detection corresponds to the system above and comprises the following steps:
step one, image acquisition, namely realizing the function of an image acquisition module;
step two, image processing, namely realizing the function of an image processing module;
monitoring physiological information, namely realizing the function of a physiological information monitoring system;
step four, machine learning, namely realizing the function of a machine learning module;
step five, information output, namely the function of an information output module is realized;
and step six, voice broadcasting is carried out, and the function of a voice system is realized.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (7)

1. An intelligent communication auxiliary system based on gesture recognition and facial expression detection, characterized in that: the system comprises an image acquisition module, an image processing module, a physiological information monitoring system, a machine learning module, an information output module and a voice system;
the image acquisition module reads an image acquired by image acquisition equipment by adopting OpenCV;
the image processing module processes the image acquired by the image acquisition module;
the physiological information monitoring system comprises a sign language information processing module and a facial expression information processing module; the sign language information processing module is used for sign language information acquisition and feature extraction, and the facial expression processing module is used for facial expression information acquisition and feature extraction;
the machine learning module is used for comparing and matching sign language information extracted by the physiological information monitoring system with gesture information and comparing and matching expression information extracted by the physiological information monitoring system with information in a database;
the information output module is used for receiving the gesture, mouth and eye information matched by the machine learning module, integrating the final information and checking the information;
the voice system receives the information confirmed as correct by the information output module and converts the recognized information into speech.
2. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: the image processing module specifically realizes the following functions:
converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and performing noise removal, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold.
3. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: when the machine learning module is used for comparing and matching, the gesture information is used as a first reference basis, the mouth information is used as a second reference basis, and the eye information is used as a third reference basis.
4. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 1, wherein: the machine learning module has relevant feedback with the physiological information monitoring system and the information output system, and performs information screening and proofreading.
5. The intelligent communication assistance system based on gesture recognition and facial expression detection according to claim 4, wherein: the feedback between the machine learning module and the physiological information monitoring system and the information output system comprises:
the physiological information monitoring system and the machine learning module have feedback, and the machine learning module feeds back the screened and collected information to the collection module after extracting characteristics, so that unnecessary information points are removed;
the information output module and the machine learning module have feedback, and when the information output module finds that the final information has conflict, the information is input back to the machine learning module to be compared and matched again.
6. An intelligent communication auxiliary method based on gesture recognition and facial expression detection is characterized in that: the method comprises the following steps:
step one, reading an image acquired by the image acquisition equipment by adopting OpenCV (the open-source computer vision library);
step two, processing the image obtained in step one: converting the input RGB image sequence into grayscale images by utilizing OpenCV; performing background segmentation, separating the hand object in an image from its background; and removing noise, deleting connected components or insignificant specks smaller than P pixels, where P is an adjustable threshold;
step three, performing sign language information acquisition and feature extraction, and facial expression information acquisition and feature extraction, on the image processed in step two;
step four, comparing and matching the sign language information extracted in step three with the gesture information in the database by machine learning, and comparing and matching the expression information extracted in step three with information in the database;
step five, receiving the information of the gesture, the mouth and the eyes matched in the step four, integrating the final information and checking the information;
and step six, receiving the information which is checked to be correct in the step five, and converting the identified information into voice.
7. The intelligent communication assistance method based on gesture recognition and facial expression detection according to claim 6, wherein: in step four, when comparing and matching, the gesture information is used as the first reference basis, the mouth information as the second reference basis, and the eye information as the third reference basis.
CN202110360368.9A 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection Pending CN113052111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110360368.9A CN113052111A (en) 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Publications (1)

Publication Number Publication Date
CN113052111A true CN113052111A (en) 2021-06-29

Family

ID=76517541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110360368.9A Pending CN113052111A (en) 2021-04-02 2021-04-02 Intelligent communication auxiliary system and method based on gesture recognition and facial expression detection

Country Status (1)

Country Link
CN (1) CN113052111A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101759444B1 * 2016-08-25 2017-07-18 연세대학교 산학협력단 Expression recognition system and method using a head mounted display
CN110989835A (en) * 2017-09-11 2020-04-10 大连海事大学 Working method of holographic projection device based on gesture recognition
CN111582039A (en) * 2020-04-13 2020-08-25 清华大学 Sign language recognition and conversion system and method based on deep learning and big data

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination