CN116092167A - Face liveness detection method based on reading aloud - Google Patents

Face liveness detection method based on reading aloud

Info

Publication number
CN116092167A
Authority
CN
China
Prior art keywords
reading
sound
video
face
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310206478.9A
Other languages
Chinese (zh)
Inventor
谢华
陈书东
彭汉迎
席锋
陈晓念
马宇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weisi E Commerce Shenzhen Co ltd
Original Assignee
Weisi E Commerce Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weisi E Commerce Shenzhen Co ltd filed Critical Weisi E Commerce Shenzhen Co ltd
Priority to CN202310206478.9A
Publication of CN116092167A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Abstract

The invention provides a face liveness detection method based on reading aloud, and relates to the technical field of face liveness detection. The method comprises the following steps: S1, face detection; S2, reading prompt; S3, collecting reading data; S4, reading judgment; and S5, liveness detection. The invention recognizes the lip motion and the spoken content to judge whether the motion and the sound are consistent, and performs video-based motion liveness detection and sound liveness detection only after this requirement is met.

Description

Face liveness detection method based on reading aloud
Technical Field
The invention relates to the technical field of face liveness detection, and in particular to a face liveness detection method based on reading aloud.
Background
With the wide use of face recognition technology in various identity authentication scenarios, fraud such as organized black-market attacks and impersonation of others has become increasingly rampant, and the requirements on face liveness detection have grown accordingly.
Initially, single face images were used for face liveness detection; later, motion-based dynamic face liveness detection methods also came into wide use. Meanwhile, with the development of technologies such as speech recognition and voiceprint recognition, the human voice is also used for identity verification, giving rise to the need for voice liveness detection.
However, existing detection methods have low accuracy and weak liveness detection capability, which compromises the detection result.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a face liveness detection method based on reading aloud, which solves the problem of the low accuracy of existing detection methods.
(II) Technical solution
In order to achieve the above purpose, the invention is realized by the following technical solution: a face liveness detection method based on reading aloud, comprising the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture;
S2, reading prompt
After a face is detected, the reading prompt stage is entered;
S3, collecting reading data
After the prompt finishes, the reading stage is entered;
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides re-collection;
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection.
Preferably, step S1 further includes: if there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
Preferably, step S2 further includes teaching the user, through a suitable interaction, how to complete the data collection at this stage.
Preferably, in step S3, the user is required to read the prompted number within a period of time, and the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
Preferably, in step S4, a voice activity detection (VAD) algorithm is further used to detect whether a person is speaking; the VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
Preferably, in step S5, "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" are used to perform liveness detection, and verification passes only if both modules pass.
Preferably, step S5 specifically includes the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected; the Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails, and only if the sound and the video match can it pass;
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves; optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object, and when the mouth moves, its position shifts, so the motion characteristics between images can be represented by optical flow.
(III) Beneficial effects
The invention provides a face liveness detection method based on reading aloud, with the following beneficial effects:
the invention provides a human face living body detection method based on reading, which prompts a person to read numbers through interaction with the person, lips of the person can make corresponding actions and make sounds, and the actions and the sounds are judged to be consistent through identifying the actions and the sounds, so that the actions and the sounds are detected after the requirements are met.
Drawings
FIG. 1 is a flow chart of the face liveness detection of the present invention;
FIG. 2 is a structural diagram of the multimodal Transformer-based video-sound content recognition model of the present invention;
FIG. 3 is a structural diagram of the video-based motion liveness detection model of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
Examples:
As shown in FIGS. 1 to 3, an embodiment of the present invention provides a face liveness detection method based on reading aloud, which specifically includes the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture. If there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
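For illustration only, not as part of the disclosed scheme, such a face gate might look as follows. The patent specifies a neural network face detection model, so the Haar cascade and the minimum-size quality check below are stand-in assumptions:

    import cv2

    # Sketch only: the scheme calls for a neural network detector; a Haar
    # cascade is used here as a readily available stand-in, and the minimum
    # face size acts as a crude image-quality gate (assumed parameter).
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def face_present(frame, min_size=(80, 80)):
        """Return True if a sufficiently large face is found in the frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5, minSize=min_size)
        return len(faces) > 0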
S2, reading prompt
After a face is detected, the reading prompt stage is entered. Through a suitable interaction, the user is taught how to complete the data collection at this stage. For example, an animated reading prompt is displayed on the device screen while an audio prompt is played, showing the user how to operate correctly.
Then, the user reads the specified number aloud according to the prompt. This scheme uses Arabic numerals; the pronunciation of the same numeral differs across languages, but for any given number the variety of pronunciations is limited.
S3, collecting reading data
After the prompt finishes, the reading stage is entered. The user is required to read the prompted number within a period of time. Meanwhile, the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
When the device stores the sound and video data, the sound and the video must be aligned in time and kept synchronized.
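As a minimal sketch of the timestamp alignment this implies (the frame rate and audio sample rate below are assumed values, not taken from the scheme):

    def audio_span_for_frame(frame_idx, fps=30, sample_rate=16000):
        """Map one video frame to the (start, end) audio sample indices that
        cover the same time span, keeping sound and video on one timeline."""
        start = int(frame_idx / fps * sample_rate)
        end = int((frame_idx + 1) / fps * sample_rate)
        return start, end

    # Example: at 30 fps and 16 kHz, frame 30 covers samples 16000..16533.
    print(audio_span_for_frame(30))  # (16000, 16533)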
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides the user to collect again.
In this scheme, a voice activity detection (Voice Activity Detection, VAD) algorithm is used to detect whether someone is speaking. The VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
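An energy-based VAD of the kind described can be sketched as follows; the frame length and both thresholds are assumed values:

    import numpy as np

    def energy_vad(samples, frame_len=400, energy_threshold=0.01):
        """Flag voiced frames by short-time energy. `samples` is a 1-D float
        array in [-1, 1]; 400 samples is 25 ms at a 16 kHz sample rate."""
        n = len(samples) // frame_len
        frames = samples[:n * frame_len].reshape(n, frame_len)
        energy = np.mean(frames ** 2, axis=1)   # short-time energy per frame
        return energy > energy_threshold

    def speech_detected(samples, min_voiced_frames=8):
        """Judge that someone spoke if enough frames carry voice energy."""
        return int(energy_vad(samples).sum()) >= min_voiced_frames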
In this scheme, a deep-learning-based motion detection model is used to detect whether the person's mouth makes a reading motion. Typically, the mouth opens, closes, or changes shape when a person reads normally. By detecting whether there is mouth motion in the video, the device can determine whether the user is reading normally.
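The scheme does not disclose this motion model; purely as an illustration, mouth motion can be approximated by tracking a mouth-opening ratio from facial landmarks (the indices follow the common 68-point convention and are an assumption here):

    import numpy as np

    def mouth_aspect_ratio(landmarks):
        """Vertical lip opening divided by mouth width, using the common
        68-point landmark layout (points 60-67 are the inner lip)."""
        lm = np.asarray(landmarks, dtype=float)
        vertical = np.linalg.norm(lm[62] - lm[66])    # inner top to bottom lip
        horizontal = np.linalg.norm(lm[60] - lm[64])  # inner mouth corners
        return vertical / horizontal

    def mouth_is_moving(ratios, var_threshold=0.001):
        """Judge mouth motion from how much the opening ratio varies over frames."""
        return float(np.var(ratios)) > var_threshold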
To ensure that normal use is not disturbed, satisfying either of the two conditions indicates that the user has performed the correct action, and the device uploads the collected reading video-sound data.
To keep the device-side judgment of the reading behavior robust, the detection thresholds are set as low as possible. The device-side judgment filters out part of the data that does not meet the requirements and gives feedback promptly, without waiting for the back-end result; the main purpose is to protect the experience of first-time users at this early stage.
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection. This scheme uses "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" to perform liveness detection, and verification passes only if both modules pass.
Step S5 specifically comprises the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected. The Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails; only if the sound and the video match can it pass.
Meanwhile, the model can recognize the content of the sound and the motions in the video, and judge whether what the person read is consistent with the required reading. If consistent, verification passes; otherwise it does not.
In this scheme, the structure of the multimodal Transformer-based video-sound content recognition model is shown in FIG. 2. The model takes sound data and video data as input and outputs the liveness category; a Transformer is trained on data to obtain the recognition model.
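The patent does not disclose the architecture details; the PyTorch sketch below only illustrates the general shape of such a model, and every dimension, the layer count, and the CLS-token fusion are assumptions:

    import torch
    import torch.nn as nn

    class AVTransformer(nn.Module):
        """Toy multimodal Transformer: audio and video token sequences are
        projected to one width, position-encoded, and classified live/spoof."""
        def __init__(self, audio_dim=40, video_dim=512, d_model=256,
                     n_heads=4, n_layers=4, max_len=512, n_classes=2):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, d_model)
            self.video_proj = nn.Linear(video_dim, d_model)
            self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
            self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, audio_feats, video_feats):
            # audio_feats: (B, Ta, audio_dim); video_feats: (B, Tv, video_dim)
            a = self.audio_proj(audio_feats)
            v = self.video_proj(video_feats)
            cls = self.cls.expand(a.size(0), -1, -1)
            x = torch.cat([cls, a, v], dim=1)
            x = x + self.pos[:, :x.size(1)]   # positions record where data sits
            return self.head(self.encoder(x)[:, 0])  # classify from CLS token

    # Example: 100 audio feature frames and 30 video feature frames per clip.
    logits = AVTransformer()(torch.randn(2, 100, 40), torch.randn(2, 30, 512))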
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves. Optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object; when the mouth moves, its position shifts, and the motion characteristics between images can be represented by optical flow.
For a video there are many frames [I1, I2, ..., In]. The optical flow between each pair of consecutive frames is computed to obtain the optical flow sequence [F1, F2, ..., Fn-1]; interleaving the two sequences gives [I1, F1, I2, F2, ..., In-1, Fn-1, In], a sequence that contains the face images together with the motion (optical flow) between every two consecutive images.
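A sketch of building that interleaved sequence; the patent does not name the flow algorithm, so OpenCV's Farneback dense flow is used here as a stand-in:

    import cv2

    def interleave_frames_and_flow(frames):
        """Given grayscale frames [I1..In], return [I1, F1, I2, F2, ..., In],
        where Fk is the dense optical flow between Ik and Ik+1."""
        seq = [frames[0]]
        for prev, nxt in zip(frames, frames[1:]):
            flow = cv2.calcOpticalFlowFarneback(
                prev, nxt, None, pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            seq.extend([flow, nxt])   # flow has shape (H, W, 2): dx, dy per pixel
        return seq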
A threshold is set; if the liveness score is greater than the threshold, the model predicts a live body and verification passes; otherwise it does not.
In this scheme, the structure of the video-based motion liveness detection model is shown in FIG. 3. The model takes the video frames and the optical flow sequence as input and outputs the liveness category; a neural network is trained on data to obtain the liveness detection model.
The invention further provides the reading-based face liveness detection flow shown in FIG. 1. After the face-scanning flow starts, the device captures camera data and performs face detection on the acquired image; if a face is detected, the next stage is entered, otherwise face detection continues in a loop. Once a face is detected, the user is prompted to read the number aloud, and the sound and video of the user's reading are cached. Motion detection and voice activity detection are performed on the stored sound and video; when mouth motion or sound is present, the next stage is entered, otherwise a new round of collection starts. The collected data is uploaded to the back end, where the liveness detection model judges whether the face to be verified belongs to a real person; if so, the liveness detection passes, otherwise it does not.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A face liveness detection method based on reading aloud, characterized by comprising the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture;
S2, reading prompt
After a face is detected, the reading prompt stage is entered;
S3, collecting reading data
After the prompt finishes, the reading stage is entered;
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides re-collection;
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection.
2. The face liveness detection method based on reading aloud according to claim 1, wherein step S1 further includes: if there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
3. The face liveness detection method based on reading aloud according to claim 1, wherein step S2 further includes teaching the user, through a suitable interaction, how to complete the data collection at this stage.
4. The face liveness detection method based on reading aloud according to claim 1, wherein in step S3 the user is required to read the prompted number within a period of time, and the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
5. The face liveness detection method based on reading aloud according to claim 1, wherein in step S4 a voice activity detection (VAD) algorithm is further used to detect whether a person is speaking; the VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
6. The face liveness detection method based on reading aloud according to claim 1, wherein in step S5 "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" are used to perform liveness detection, and verification passes only if both modules pass.
7. The face liveness detection method based on reading aloud according to claim 1, wherein step S5 specifically includes the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected; the Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails, and only if the sound and the video match can it pass;
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves; optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object, and when the mouth moves, its position shifts, so the motion characteristics between images can be represented by optical flow.
CN202310206478.9A 2023-02-23 2023-02-23 Face liveness detection method based on reading aloud Pending CN116092167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310206478.9A CN116092167A (en) Face liveness detection method based on reading aloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310206478.9A CN116092167A (en) Face liveness detection method based on reading aloud

Publications (1)

Publication Number Publication Date
CN116092167A 2023-05-09

Family

ID=86214181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310206478.9A Pending CN116092167A (en) Face liveness detection method based on reading aloud

Country Status (1)

Country Link
CN (1) CN116092167A (en)

Similar Documents

Publication Publication Date Title
Fisher et al. Speaker association with signal-level audiovisual fusion
CN104361276B (en) A kind of multi-modal biological characteristic identity identifying method and system
US8861779B2 (en) Methods for electronically analysing a dialogue and corresponding systems
CN105718874A (en) Method and device of in-vivo detection and authentication
JP2001092974A (en) Speaker recognizing method, device for executing the same, method and device for confirming audio generation
CN109410954A (en) A kind of unsupervised more Speaker Identification device and method based on audio-video
CN109829691B (en) C/S card punching method and device based on position and deep learning multiple biological features
CN111341350A (en) Man-machine interaction control method and system, intelligent robot and storage medium
CN111027400A (en) Living body detection method and device
CN112232276A (en) Emotion detection method and device based on voice recognition and image recognition
CN113920568A (en) Face and human body posture emotion recognition method based on video image
CN112286364A (en) Man-machine interaction method and device
CN113920560A (en) Method, device and equipment for identifying identity of multi-modal speaker
CN114242235A (en) Autism patient portrait method based on multi-level key characteristic behaviors
CN110460809A (en) A kind of vagitus method for detecting, device and intelligent camera head apparatus
CN114582355A (en) Audio and video fusion-based infant crying detection method and device
CN111950480A (en) English pronunciation self-checking method and system based on artificial intelligence
CN116092167A (en) Face liveness detection method based on reading aloud
CN108197593B (en) Multi-size facial expression recognition method and device based on three-point positioning method
CN114494930B (en) Training method and device for voice and image synchronism measurement model
CN115905977A (en) System and method for monitoring negative emotion in family sibling interaction process
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
CN114466179A (en) Method and device for measuring synchronism of voice and image
CN114466178A (en) Method and device for measuring synchronism of voice and image
CN111159676A (en) Multi-dimensional identity authentication system and method based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination