CN111325139A - Lip language identification method and device - Google Patents

Lip language identification method and device

Info

Publication number
CN111325139A
CN111325139A (application CN202010099127.9A)
Authority
CN
China
Prior art keywords
face
video frame
visible light
library
target group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010099127.9A
Other languages
Chinese (zh)
Other versions
CN111325139B (en)
Inventor
刘晓成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010099127.9A priority Critical patent/CN111325139B/en
Publication of CN111325139A publication Critical patent/CN111325139A/en
Application granted granted Critical
Publication of CN111325139B publication Critical patent/CN111325139B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a lip language identification method and device. The method comprises the following steps: receiving a visible light video frame collected by a terminal and a thermal imaging video frame corresponding to the visible light video frame; determining an identification area in the visible light video frame according to the thermal imaging video frame, and identifying a human face in the identification area; matching the recognized face with the faces in the target group face library, and if the recognized face is matched with at least one face in the target group face library, determining that the recognized face is effective; and performing lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.

Description

Lip language identification method and device
Technical Field
The application relates to the technical field of image processing, in particular to a lip language identification method and device.
Background
In the fields of artificial intelligence and image processing, various functions (such as analyzing a user's facial expressions and actions) can be realized by using image information of a target. Image acquisition and recognition have long been popular research topics, touching many aspects of daily life and scientific research. For example, the accuracy of somatosensory interaction and semantic recognition can be improved by recognizing the lip language of a user's face, bringing a more comfortable interaction experience.
At present, a typical lip language identification process is as follows: image information of a target human body is acquired using a depth camera, an infrared camera, or a combination of the two; the position of the face is determined from the synthesized image; and lip action features are extracted for lip language recognition. This process mainly focuses on how to train a lip recognition model and accurately recognize lips in the video, but pays no attention to the person to whom the lips belong, so the usability of the lip language recognition device is poor.
How to improve the usability and applicability of lip language recognition in complex dynamic environments is an urgent problem for the industry to research and solve.
Disclosure of Invention
The embodiment of the application provides a lip language identification method and device, which improve the complex dynamic environment adaptability of lip language identification and ensure the consistency of the lip language identification from beginning to end.
In a first aspect, an embodiment of the present application provides a lip language identification method, including:
receiving a visible light video frame and a thermal imaging video frame corresponding to the visible light video frame, which are collected by a terminal;
determining an identification area in the visible light video frame according to the thermal imaging video frame, and identifying the face in the identification area;
matching the recognized face with the faces in the target group face library, and if the recognized face is matched with at least one face in the target group face library, determining that the recognized face is effective;
and performing lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
Optionally, determining an identification area in the visible light video frame according to the thermal imaging video frame, and identifying a face in the identification area, includes:
determining a background area in a thermal imaging video frame, and shielding the background;
carrying out differential operation on the visible light video frame and the thermal imaging video frame with the shielded background area to obtain a differential video frame;
and determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
Optionally, if the identified face matches at least one face in the face library of the target group, determining that the identified face is valid includes:
comparing the recognized face with the faces in the face library of the target group;
and if the similarity between the recognized face and at least one face in the target group face library is greater than or equal to a first threshold value, determining that the recognized face is effective.
Optionally, the method in the embodiment of the present application further includes: if the similarity between the recognized face and the faces in the target group face library is smaller than the first threshold but greater than or equal to a second threshold, adding the recognized face to the target group face library, or using the recognized face to replace the library face with the highest similarity to it; wherein the first threshold is greater than the second threshold.
Optionally, if a plurality of faces are recognized and determined to be valid, performing lip language recognition on the faces determined to be valid includes: performing lip language recognition on each of the plurality of faces determined to be valid.
In a second aspect, an embodiment of the present application provides a server, including:
the receiving module is used for receiving the visible light video frames collected by the terminal and the thermal imaging video frames corresponding to the visible light video frames;
the face recognition module is used for determining a recognition area in the visible light video frame according to the thermal imaging video frame and recognizing a face in the recognition area;
the effective face determining module is used for matching the face obtained by recognition with the faces in the target group face library, and if the face obtained by recognition is matched with at least one face in the target group face library, determining that the recognized face is effective;
and the lip language recognition module is used for carrying out lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
Optionally, the face recognition module is specifically configured to:
determining a background area in a thermal imaging video frame, and shielding the background;
carrying out differential operation on the visible light video frame and the thermal imaging video frame with the shielded background area to obtain a differential video frame;
and determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
Optionally, the valid face determination module is specifically configured to: comparing the recognized face with the faces in the face library of the target group; and if the similarity between the recognized face and at least one face in the target group face library is greater than or equal to a first threshold value, determining that the recognized face is effective.
Optionally, the apparatus in the embodiment of the present application further includes:
the target group face library updating module is used for adding the recognized face into the target group face library or replacing the face with the highest similarity to the recognized face in the target group library by using the recognized face under the condition that the similarity between the recognized face and the face in the target group face library is smaller than a first threshold value but larger than or equal to a second threshold value; wherein the first threshold is greater than the second threshold.
Optionally, the lip language recognition module is specifically configured to: if the valid face determination module determines that a plurality of the faces identified by the face recognition module are valid, perform lip language recognition on each of the plurality of faces determined to be valid.
In a third aspect, an embodiment of the present application provides a server, including a processor and a memory; the memory is coupled to the processor and configured to store computer instructions, and the processor is configured to execute the computer instructions to cause the server to perform the method of any one of the first aspects described above.
In a fourth aspect, embodiments of the present application provide a computer storage medium having computer program instructions stored therein, which when executed on a computer, cause the computer to perform the method of any of the above first aspects.
In the embodiment of the application, when lip language is identified, based on the target group face library, lip language identification can be performed on members in a specific group, face interference of people who do not need to be concerned is eliminated, the complex dynamic environment adaptability of lip language identification is improved, and the consistency of the target from beginning to end in the lip language identification is ensured.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a diagram illustrating a lip language recognition system architecture provided by an embodiment of the present application;
Fig. 2 is a flowchart illustrating a lip language identification method provided by an embodiment of the present application;
Fig. 3 schematically illustrates a structural diagram of a server 300 provided in an embodiment of the present application;
Fig. 4 schematically shows a structural diagram of a server 400 according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 schematically shows a lip language recognition system architecture diagram provided in an embodiment of the present application. As shown in fig. 1, the architecture includes: a terminal 101, a server 102, and a network 103. The terminal is provided with a thermal imaging camera 1011 and a visible light camera 1012, and is used for acquiring a thermal imaging video frame sequence and a visible light video frame sequence of a monitored place in real time and sending them to the server. The server 102 may be a common web server, an enterprise server, or the like, and is used for implementing the lip language identification method. The network 103 may be the internet, a local area network, or the like, and connects the terminal and the server for data communication.
For clarity of description of the embodiments of the present application, a detailed description of the thermal imaging camera and the visible light camera will be provided below.
A thermal imaging camera: a thermal imaging camera detects infrared energy (heat) without contact, converts it into an electrical signal, and generates a thermal image and temperature values, which can then be read out. Its principle is as follows: the human body is a natural source of infrared radiation and continuously radiates and absorbs infrared energy. The temperature of each part of a normal human body is stable and characteristic, and different temperatures produce different thermal fields; when a part becomes diseased or abnormal, the blood flow there changes, so the local temperature changes. Based on this principle, infrared thermal imaging collects the infrared radiation of the human body through a thermal imager, converts it into a digital signal, and generates a color thermal image. For example, experts in a physical examination center can analyze such heat maps to judge the location, nature, and degree of a lesion of the human body.
Visible light camera: visible light imaging technology works within the range of human vision, operates in the visible light band, and depends on natural illumination. In terms of the visual effect on the human eye, visible radiation of different wavelengths is perceived as different colors; a color corresponding to a single wavelength of light radiation is called monochromatic light or a spectral color, and the perceived color varies mostly with the intensity of the light. The visible light camera is built on visible light imaging technology, supports real-time transmission and information processing, and offers high resolution.
Fig. 2 is a flowchart illustrating a lip language identification method provided in an embodiment of the present application, where the flowchart includes the following steps:
s201: the server receives the visible light video frames collected by the terminal and the thermal imaging video frames corresponding to the visible light video frames.
The visible light video frame and the corresponding thermal imaging video frame are acquired synchronously, that is, the acquisition time of the visible light video frame is the same as the acquisition time of the thermal imaging video frame.
In the step, the thermal imaging video frame is collected and sent by the thermal imaging camera, and the visible light video frame is collected and sent by the visible light camera. Wherein, the video frame frequency of the thermal imaging camera can be set to be the same as that of the visible light camera.
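The synchronous acquisition described above can be sketched as pairing the two streams by capture timestamp. In this minimal sketch, the `Frame` type, its field names, and the 5 ms tolerance are illustrative assumptions, not details from the patent:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp_ms: int   # capture time reported by the camera
    data: bytes         # raw frame payload (placeholder)

def pair_frames(visible, thermal, tolerance_ms=5):
    """Pair each visible-light frame with the thermal frame captured at
    (approximately) the same time. Both lists are assumed sorted by time."""
    pairs = []
    j = 0
    for v in visible:
        # advance the thermal cursor while the next thermal frame is closer
        while j + 1 < len(thermal) and \
                abs(thermal[j + 1].timestamp_ms - v.timestamp_ms) <= \
                abs(thermal[j].timestamp_ms - v.timestamp_ms):
            j += 1
        if abs(thermal[j].timestamp_ms - v.timestamp_ms) <= tolerance_ms:
            pairs.append((v, thermal[j]))
    return pairs
```

When the two cameras are configured with the same frame rate, as the step suggests, nearly every visible frame finds a thermal partner within the tolerance.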
S202: and the server determines an identification area in the visible light video frame according to the thermal imaging video frame and identifies the face in the identification area.
In an actual application scenario, the visible light video frame may contain some invalid faces, such as a face in a poster or a face on an advertisement screen, which interfere with face recognition. In the thermal imaging video frame collected by the thermal imaging camera, faces in posters, on advertisement screens, and the like do not appear, because they radiate no infrared signal outward. In addition, although a human face far from the thermal imaging camera does radiate an infrared signal, the signal is weak because of the distance, so such a face is usually ignored. Therefore, according to the thermal imaging video frame, the regions containing invalid faces or faces with weak infrared signals can be treated as background and excluded from identification, while the regions with strong infrared signals serve as the identification area. In this way invalid faces are filtered out, reducing face recognition overhead and eliminating interference factors.
In the step, a background area in a thermal imaging video frame can be determined firstly, and the background is shielded; then, carrying out differential operation on the visible light video frame and the thermal imaging video frame with the shielded background area to obtain a differential video frame; and then determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
Specifically, in the above process, the non-sensitive or weakly sensitive regions in the thermal imaging video frame may be determined as the background region according to the characteristics of the heat source. The visible light video frame is taken as a static image of the current environment and the thermal imaging video frame as an active source image, and a differential operation on the two yields a difference image. The distribution of the differences is then estimated to determine the faces to be recognized in the image.
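The background-shielding and differential-operation steps above can be sketched as follows. The array representation, the `heat_thresh` value, and the function name are illustrative assumptions; a real system would also first register (align) the thermal frame to the visible one:

```python
import numpy as np

def recognition_region(visible, thermal, heat_thresh=0.3):
    """Derive the recognition area of a visible-light frame from the
    corresponding thermal frame.

    visible, thermal: float arrays in [0, 1] of the same H x W, assumed
    already registered to each other.
    Returns (roi_mask, diff): a boolean mask of the recognition area and
    the differential video frame.
    """
    # 1. Background = pixels with no or only weak infrared response.
    background = thermal < heat_thresh
    # 2. Shield the background in the thermal frame.
    masked_thermal = np.where(background, 0.0, thermal)
    # 3. Differential operation between the visible frame and the
    #    background-shielded thermal frame.
    diff = np.abs(visible - masked_thermal)
    # 4. Only warm regions are searched for faces; faces printed on posters
    #    or shown on screens fall into the cold background and are filtered.
    roi_mask = ~background
    return roi_mask, diff
```

Face detection would then run only inside `roi_mask`, which is how the invalid faces described above are skipped.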
S203: and the server matches the recognized face with the faces in the target group face library, and if the recognized face is matched with at least one face in the target group face library, the recognized face is determined to be effective.
The faces of the members in the target group (target crowd) may be collected in advance, and the collected face data may be stored in the target group face library. The members of the target group are persons who need lip language recognition.
In this step, whether the identified face matches a face in the target group face library can be judged as follows: the recognized face is compared with the faces in the target group face library to obtain similarity scores; if the similarity between the recognized face and at least one face in the library is greater than or equal to a first threshold, the recognized face is determined to be valid, and lip language recognition is performed on it.
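A minimal sketch of this validity check, assuming faces are compared as embedding vectors under cosine similarity (the embedding representation and the threshold value 0.8 are assumptions; the patent specifies neither):

```python
import numpy as np

FIRST_THRESHOLD = 0.8   # illustrative value; the patent does not fix one

def cosine_sim(a, b):
    # similarity between two face embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_valid_face(face_emb, library):
    """The face is valid if it matches at least one library face, i.e. its
    similarity to some library entry reaches the first threshold."""
    return any(cosine_sim(face_emb, ref) >= FIRST_THRESHOLD
               for ref in library)
```

Any face-embedding model could supply the vectors; the decision rule itself is independent of how the embeddings are produced.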
The usage scene of the lip language recognition device is not necessarily a stable environment: people may move, and other people may pass through the lens, so it cannot be guaranteed that the face in the video always belongs to the same person. Lip language identification is a continuous process, and its consistency from beginning to end needs to be maintained. Based on the target group face library, attention can be restricted to a specific group, i.e., lip language recognition is performed only on members of that group, eliminating interference.
For example, when the server is performing lip language identification analysis on the person a, the person B passes through the lens and speaks a few words, but the face of the person B is not recorded in the face library of the target group, at this time, the server filters the face of the person B and does not identify the lip language of the person B, so that the lip language identification result of the person B is prevented from polluting the lip language identification result of the person a, and the consistency of the target from beginning to end in the lip language identification process is ensured.
Optionally, if the similarity between the identified face and the faces in the target group face library is smaller than the first threshold but greater than or equal to a second threshold, the identified face is added to the target group face library, or is used to replace the library face with the highest similarity to it, where the first threshold is greater than the second threshold.
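The two-threshold update rule can be sketched as follows, again assuming embedding vectors under cosine similarity and illustrative threshold values; the patent leaves both the similarity measure and the concrete thresholds open:

```python
import numpy as np

def cosine_sim(a, b):
    # standard cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def update_library(face_emb, library, first=0.8, second=0.6, replace=False):
    """Two-threshold rule of the method (threshold values illustrative):
    best >= first            -> face valid, library unchanged
    second <= best < first   -> add the face, or replace its closest entry
    best < second            -> treated as a stranger, library unchanged
    Returns (valid, library).
    """
    sims = [cosine_sim(face_emb, ref) for ref in library]
    best = max(sims) if sims else 0.0
    if best >= first:
        return True, library
    if best >= second:
        if replace:
            library = list(library)
            library[sims.index(best)] = face_emb   # replace the closest face
        else:
            library = library + [face_emb]         # add as a new sample
    return False, library
```

The `replace` flag selects between the two variants the method describes: growing the library with a new sample or overwriting the most similar existing entry.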
In an actual application scene, a human face changes over time and with age, so the faces in the target group face library need to be updated according to the recognized faces, thereby maintaining attention to the specific group.
For example, suppose the face of person C stored in the target group face library has bangs, and when person C later appears in a captured video frame the hairstyle has changed and the former bangs have been combed back. When the recognized face of person C without bangs is compared with the stored face of person C, if the similarity is smaller than the first threshold but greater than or equal to the second threshold, the recognized face of person C is added to the target group face library, or is used to replace the library face with the highest similarity to it, thereby maintaining attention to the specific group.
S204: and the server performs lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
In the actual lip language identification process, a conversation may involve two or more people. If a separate device were used to identify each person's lip language, multiple devices would need to work simultaneously while ensuring that no second person appears in any lens; since real scenes are complex and changeable, such operation is very inconvenient.
Optionally, in the above flow, if a plurality of faces are recognized and determined to be valid, lip language recognition may be performed on each of them. Specifically, when a plurality of faces are recognized, whether each recognized face is a valid face can be judged in turn; when a plurality of valid faces are found, each valid face is numbered, and lip language recognition is performed on each separately.
In this step, a pre-trained lip language recognition model can be used to perform lip language recognition and output an analysis result; the model may be a deep neural network model. Specifically, the video frames input to the model may be preprocessed to form input data, and then the trained lip language recognition model predicts the spoken content of the input data and outputs an analysis result. Finally, the analysis result is converted into storable text information and recorded under the corresponding person's name according to the face number information.
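A schematic sketch of this per-face pipeline, with the model and the preprocessing left as stubs (all names here are illustrative; the patent does not define this API):

```python
def preprocess(clip):
    """Placeholder preprocessing; a real system would crop, align, and
    normalize the lip region of every frame in the clip."""
    return clip

def recognize_all(clips_by_face, names_by_face, model):
    """Run lip language recognition per numbered valid face.

    clips_by_face: {face_number: video clip of that valid face}
    names_by_face: {face_number: person name from the target group library}
    model:         callable clip -> recognized text (in practice a trained
                   deep neural network; here any stub will do)
    """
    transcript = {}
    for face_no, clip in clips_by_face.items():
        text = model(preprocess(clip))
        # record the text under the corresponding person's name,
        # keyed by the face number information
        transcript[names_by_face[face_no]] = text
    return transcript
```

Keying the output by person name rather than by raw face number is what lets the results stay attributed correctly across a whole session.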
Performing lip language identification on the lips of a plurality of valid faces improves the usability of the system and its adaptability to complex dynamic environments.
In the embodiment of the application, before lip language recognition, the thermal imaging technology and the face recognition technology are combined to filter invalid faces, so that the consistency of the target from beginning to end in the lip language recognition is ensured.
Based on the same technical concept, the embodiment of the application also provides a server, and the server executes the method in the embodiment.
Referring to fig. 3, a server structure provided in the embodiment of the present application is shown. As shown in fig. 3, the server 300 includes a receiving module 301, a face recognition module 302, a valid face determination module 303, and a lip language recognition module 304.
The receiving module 301 is configured to receive a visible light video frame and a thermal imaging video frame corresponding to the visible light video frame, which are acquired by a terminal;
the face recognition module 302 is configured to determine a recognition area in the visible light video frame according to the thermal imaging video frame, and recognize a face in the recognition area;
an effective face determining module 303, configured to match the identified face with faces in a target group face library, and determine that the identified face is effective if the identified face is matched with at least one face in the target group face library;
and the lip language recognition module 304 is configured to perform lip language recognition on the face determined to be valid according to the visible light video frame containing the face determined to be valid.
Optionally, the embodiment of the present application further includes a target group face library updating module 305, where the target group face library updating module 305 is configured to, if the similarity between the identified face and the face in the target group face library is smaller than a first threshold but greater than or equal to a second threshold, add the identified face to the target group face library, or replace the face with the highest similarity between the identified face and the face in the target group library with the identified face; wherein the first threshold is greater than the second threshold.
Optionally, the face recognition module 302 is specifically configured to: determining a background area in a thermal imaging video frame, and shielding the background; carrying out differential operation on the visible light video frame and the thermal imaging video frame with the shielded background area to obtain a differential video frame; and determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
Optionally, the effective face determining module 303 is specifically configured to: comparing the recognized face with the faces in the face library of the target group; and if the similarity between the recognized face and at least one face in the target group face library is greater than or equal to a first threshold value, determining that the recognized face is effective.
Optionally, the lip language recognition module 304 is further configured to perform lip language recognition on a plurality of faces determined to be valid if the valid face determination module determines that a plurality of faces identified by the face recognition module are valid.
Based on the same technical concept, the embodiment of the application also provides a server, and the server executes the method in the embodiment.
Fig. 4 shows a schematic structural diagram of a server 400 provided in an embodiment of the present application. Referring to fig. 4, the server 400 includes a processor 401 and a network interface 402. The processor 401 may also be a controller. The processor 401 is configured to perform the functions referred to in fig. 2. The network interface 402 is configured to support messaging functionality. The server 400 may also include a memory 403, the memory 403 being coupled to the processor 401 and storing program instructions and data necessary for the device. The processor 401, the network interface 402 and the memory 403 are connected, the memory 403 is used for storing instructions, and the processor 401 is used for executing the instructions stored in the memory 403 to control the network interface 402 to send and receive messages, so as to complete the steps of the method for executing corresponding functions.
In the embodiment of the present application, for concepts, explanations, details, and other steps related to the technical solution provided in the embodiment of the present application, reference is made to the description of the foregoing method or the related steps in other embodiments, and details are not described herein.
It should be noted that the processor referred to in the embodiments of the present application may be a Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic devices, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like. Wherein the memory may be integrated in the processor or may be provided separately from the processor.
Embodiments of the present application also provide a computer storage medium for storing instructions that, when executed, may perform the method of the foregoing embodiments.
The embodiments of the present application also provide a computer program product comprising a computer program, where the computer program is used to execute the method of the foregoing embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A lip language identification method is characterized by comprising the following steps:
receiving a visible light video frame collected by a terminal and a thermal imaging video frame corresponding to the visible light video frame;
determining an identification area in the visible light video frame according to the thermal imaging video frame, and identifying a human face in the identification area;
matching the recognized face with the faces in the target group face library, and if the recognized face matches at least one face in the target group face library, determining that the recognized face is effective;
and performing lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
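For illustration only (this sketch is not part of the claims), the steps of claim 1 might be arranged as below. All names, thresholds, and the choice of a mean-based thermal mask are hypothetical assumptions; the claim does not fix any concrete detector, embedding, or similarity measure:

```python
import numpy as np

def locate_recognition_region(visible, thermal):
    """Keep only the visible-light pixels whose thermal counterpart is
    warmer than average -- an assumed stand-in for the claimed step of
    determining the recognition area from the thermal imaging frame."""
    foreground = thermal > thermal.mean()
    return np.where(foreground, visible, 0)

def is_valid(face_vec, library, similarity, threshold=0.8):
    """Claim 1: the recognized face is effective (valid) if it matches
    at least one face in the target group face library."""
    return any(similarity(face_vec, g) >= threshold for g in library)
```

A face found valid would then be passed, together with the visible-light frames containing it, to whatever lip-reading model the implementation uses.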
2. The method of claim 1, wherein determining a recognition area in the visible light video frame from the thermal imaging video frame and recognizing a human face in the recognition area comprises:
determining a background area in the thermal imaging video frame, and masking the background area;
performing a difference operation on the visible light video frame and the thermal imaging video frame whose background area has been masked, to obtain a difference video frame;
and determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
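Not part of the claim, but as a rough numerical sketch of the masking and difference steps above (the `background_temp` threshold and the use of a simple per-pixel absolute difference are assumptions made for the example):

```python
import numpy as np

def masked_difference(visible, thermal, background_temp=30):
    """Mask the thermal background (claim 2, step 1), then take the
    per-pixel absolute difference with the visible-light frame
    (claim 2, step 2) to obtain the difference video frame."""
    masked = np.where(thermal > background_temp, thermal, 0)  # background -> 0
    diff = np.abs(visible.astype(np.int32) - masked.astype(np.int32))
    return diff.astype(np.uint8)
```

The identification area would then be taken from the high-response region of `diff`, and face detection run only there.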
3. The method of claim 1, wherein determining that the recognized face is effective if the recognized face matches at least one face in the target group face library comprises:
comparing the recognized face with the faces in the target group face library;
and if the similarity between the recognized face and at least one face in the target group face library is greater than or equal to a first threshold value, determining that the recognized face is effective.
4. The method of claim 3, further comprising:
if the similarity between the recognized face and a face in the target group face library is smaller than the first threshold but greater than or equal to a second threshold, adding the recognized face to the target group face library, or replacing, with the recognized face, the face in the target group face library that has the highest similarity to the recognized face; wherein the first threshold is greater than the second threshold.
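Illustratively (not part of the claims), the two-threshold logic of claims 3 and 4 can be sketched as follows; the concrete threshold values and the scalar similarity function are invented for the example:

```python
def classify_and_update(face, library, similarity, first=0.9, second=0.7):
    """Claims 3-4: similarity >= first threshold -> the face is valid;
    between the two thresholds -> the face is added to the target group
    face library (one of the two claimed update options); below the
    second threshold -> the face is neither valid nor enrolled."""
    best = max(similarity(face, g) for g in library)
    if best >= first:
        return "valid", library
    if best >= second:
        return "added", library + [face]
    return "rejected", library
```

The alternative update option of claim 4, replacing the most similar library face instead of appending, would swap the `library + [face]` line for an in-place replacement of the best-matching entry.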
5. The method according to any one of claims 1 to 4, wherein if a plurality of faces are recognized and determined to be valid, performing lip language recognition on the faces determined to be valid comprises:
and respectively carrying out lip language recognition on a plurality of faces determined to be effective.
6. A server, comprising:
the receiving module is used for receiving a visible light video frame collected by a terminal and a thermal imaging video frame corresponding to the visible light video frame;
the face recognition module is used for determining a recognition area in the visible light video frame according to the thermal imaging video frame and recognizing a face in the recognition area;
the effective face determining module is used for matching the recognized face with the faces in the target group face library, and, if the recognized face matches at least one face in the target group face library, determining that the recognized face is effective;
and the lip language recognition module is used for carrying out lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
7. The server of claim 6, wherein the face recognition module is specifically configured to:
determining a background area in the thermal imaging video frame, and masking the background area;
performing a difference operation on the visible light video frame and the thermal imaging video frame whose background area has been masked, to obtain a difference video frame;
and determining an identification area in the visible light video frame according to the difference video frame, and identifying the face in the identification area.
8. The server of claim 6, wherein the effective face determining module is specifically configured to:
comparing the recognized face with the faces in the target group face library;
and if the similarity between the recognized face and at least one face in the target group face library is greater than or equal to a first threshold value, determining that the recognized face is effective.
9. The server of claim 6, further comprising:
a target group face library updating module, configured to: when the similarity between the recognized face and a face in the target group face library is smaller than the first threshold but greater than or equal to a second threshold, add the recognized face to the target group face library, or replace, with the recognized face, the face in the target group face library that has the highest similarity to the recognized face; wherein the first threshold is greater than the second threshold.
10. The server according to any one of claims 6 to 9, wherein the lip language recognition module is specifically configured to:
if the effective face determining module determines that a plurality of the faces recognized by the face recognition module are effective, perform lip language recognition on each of the faces determined to be effective.
11. A server, comprising: a processor and a memory;
the memory is coupled to the processor and configured to store computer instructions; the processor is coupled to the memory and configured to execute the computer instructions to cause the server to:
receiving a visible light video frame collected by a terminal and a thermal imaging video frame corresponding to the visible light video frame;
determining an identification area in the visible light video frame according to the thermal imaging video frame, and identifying a human face in the identification area;
matching the recognized face with the faces in the target group face library, and if the recognized face matches at least one face in the target group face library, determining that the recognized face is effective;
and performing lip language recognition on the face determined to be effective according to the visible light video frame containing the face determined to be effective.
12. A computer storage medium having computer program instructions stored therein, which when run on a computer, cause the computer to perform the method of any one of claims 1-5.
CN202010099127.9A 2020-02-18 2020-02-18 Lip language identification method and device Active CN111325139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099127.9A CN111325139B (en) 2020-02-18 2020-02-18 Lip language identification method and device


Publications (2)

Publication Number Publication Date
CN111325139A (en) 2020-06-23
CN111325139B CN111325139B (en) 2023-08-04

Family

ID=71172135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099127.9A Active CN111325139B (en) 2020-02-18 2020-02-18 Lip language identification method and device

Country Status (1)

Country Link
CN (1) CN111325139B (en)

Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279425A1 (en) * 2007-04-13 2008-11-13 Mira Electronics Co., Ltd. Human face recognition and user interface system for digital camera and video camera
CN102982321A (en) * 2012-12-05 2013-03-20 深圳Tcl新技术有限公司 Acquisition method and device for face database
CN103605969A (en) * 2013-11-28 2014-02-26 Tcl集团股份有限公司 Method and device for face inputting
CN104050449A (en) * 2014-06-13 2014-09-17 无锡天脉聚源传媒科技有限公司 Face recognition method and device
CN104361276A (en) * 2014-11-18 2015-02-18 新开普电子股份有限公司 Multi-mode biometric authentication method and multi-mode biometric authentication system
CN104966086A (en) * 2014-11-14 2015-10-07 深圳市腾讯计算机系统有限公司 Living body identification method and apparatus
US20160125404A1 (en) * 2014-10-31 2016-05-05 Xerox Corporation Face recognition business model and method for identifying perpetrators of atm fraud
WO2016197765A1 (en) * 2015-06-11 2016-12-15 腾讯科技(深圳)有限公司 Human face recognition method and recognition system
CN106774856A (en) * 2016-08-01 2017-05-31 深圳奥比中光科技有限公司 Exchange method and interactive device based on lip reading
CN106778518A (en) * 2016-11-24 2017-05-31 汉王科技股份有限公司 A kind of human face in-vivo detection method and device
CN106874871A (en) * 2017-02-15 2017-06-20 广东光阵光电科技有限公司 A kind of recognition methods of living body faces dual camera and identifying device
US20170213074A1 (en) * 2016-01-27 2017-07-27 Intel Corporation Decoy-based matching system for facial recognition
CN107133608A (en) * 2017-05-31 2017-09-05 天津中科智能识别产业技术研究院有限公司 Identity authorization system based on In vivo detection and face verification
CN108090888A (en) * 2018-01-04 2018-05-29 北京环境特性研究所 The infrared image of view-based access control model attention model and the fusion detection method of visible images
CN108470169A (en) * 2018-05-23 2018-08-31 国政通科技股份有限公司 Face identification system and method
CN108875546A (en) * 2018-04-13 2018-11-23 北京旷视科技有限公司 Face auth method, system and storage medium
CN208351494U (en) * 2018-05-23 2019-01-08 国政通科技股份有限公司 Face identification system
CN109190561A (en) * 2018-09-04 2019-01-11 四川长虹电器股份有限公司 Face identification method and system in a kind of video playing
CN109325413A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face identification method, device and terminal
US20190141297A1 (en) * 2017-11-07 2019-05-09 Ooma, Inc. Systems and Methods of Activity Based Recording for Camera Applications
WO2019128362A1 (en) * 2017-12-28 2019-07-04 北京京东尚科信息技术有限公司 Human facial recognition method, apparatus and system, and medium
CN110163806A (en) * 2018-08-06 2019-08-23 腾讯科技(深圳)有限公司 A kind of image processing method, device and storage medium
CN110245630A (en) * 2019-06-18 2019-09-17 广东中安金狮科创有限公司 Monitoring data processing method, device and readable storage medium storing program for executing
CN110268419A (en) * 2019-05-08 2019-09-20 深圳市汇顶科技股份有限公司 A kind of face identification method, face identification device and computer readable storage medium
CN110443109A (en) * 2019-06-11 2019-11-12 万翼科技有限公司 Abnormal behaviour monitor processing method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN111325139B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
US10936919B2 (en) Method and apparatus for detecting human face
CN108197592B (en) Information acquisition method and device
Durga et al. A ResNet deep learning based facial recognition design for future multimedia applications
Ruminski et al. Interactions with recognized patients using smart glasses
CN111626126A (en) Face emotion recognition method, device, medium and electronic equipment
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN109583364A (en) Image-recognizing method and equipment
Revanur et al. Instantaneous physiological estimation using video transformers
CN112699758A (en) Sign language translation method and device based on dynamic gesture recognition, computer equipment and storage medium
CN111654694A (en) Quality evaluation method and device of image processing algorithm and electronic equipment
Maiano et al. Depthfake: a depth-based strategy for detecting deepfake videos
CN108460364B (en) Method and apparatus for generating information
Viedma et al. Relevant features for gender classification in NIR periocular images
Reddi et al. CNN Implementing Transfer Learning for Facial Emotion Recognition
Liang et al. Real time hand movement trajectory tracking for enhancing dementia screening in ageing deaf signers of British sign language
Farooq et al. ChildGAN: Large Scale Synthetic Child Facial Data Using Domain Adaptation in StyleGAN
CN111814738A (en) Human face recognition method, human face recognition device, computer equipment and medium based on artificial intelligence
Wang et al. Heart rate estimation from facial videos with motion interference using T-SNE-based signal separation
Kwaśniewska et al. Real-time facial features detection from low resolution thermal images with deep classification models
CN111325139B (en) Lip language identification method and device
KR101126704B1 (en) Online client diagnosis system and method thereof
Malgheet et al. MS-net: Multi-segmentation network for the iris region using deep learning in an unconstrained environment
Sadhana et al. Prediction of Skin Cancer using Convolutional Neural Network
CN114550249A (en) Face image generation method and device, computer readable medium and electronic equipment
Pranathi et al. A review on various facial expression recognition techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant