CN116092167A - Face liveness detection method based on reading aloud - Google Patents

Face liveness detection method based on reading aloud

Info

Publication number
CN116092167A
Authority
CN
China
Prior art keywords
reading
sound
video
face
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310206478.9A
Other languages
Chinese (zh)
Inventor
谢华
陈书东
彭汉迎
席锋
陈晓念
马宇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weisi E Commerce Shenzhen Co ltd
Original Assignee
Weisi E Commerce Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weisi E Commerce Shenzhen Co ltd filed Critical Weisi E Commerce Shenzhen Co ltd
Priority to CN202310206478.9A
Publication of CN116092167A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Abstract

The invention provides a face liveness detection method based on reading aloud, and relates to the technical field of face liveness detection. The method comprises the following steps: S1, face detection; S2, reading prompt; S3, collecting reading data; S4, reading judgment; and S5, liveness detection. The invention recognizes the lip motion and the spoken content to judge whether the motion and the sound are consistent, and performs video-based motion liveness detection and sound liveness detection only after this requirement is met.

Description

Face liveness detection method based on reading aloud
Technical Field
The invention relates to the technical field of face liveness detection, and in particular to a face liveness detection method based on reading aloud.
Background
With the wide use of face recognition technology in various identity authentication scenarios, fraud such as organized black-market attacks and impersonation of others has become increasingly rampant, and the requirements on face liveness detection have grown accordingly.
Initially, single face images were used for face liveness detection; later, motion-based dynamic face liveness detection methods also came into wide use. Meanwhile, with the development of technologies such as speech recognition and voiceprint recognition, the human voice is also used for identity verification, giving rise to the need for voice liveness detection.
However, existing detection methods have low accuracy and weak liveness detection capability, which compromises the detection result.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a face liveness detection method based on reading aloud, which solves the problem of the low accuracy of existing detection methods.
(II) Technical solution
In order to achieve the above purpose, the invention is realized by the following technical solution: a face liveness detection method based on reading aloud, comprising the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture;
S2, reading prompt
After a face is detected, the reading prompt stage is entered;
S3, collecting reading data
After the prompt finishes, the reading stage is entered;
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides re-collection;
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection.
Preferably, step S1 further includes: if there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
Preferably, step S2 further includes teaching the user, through a suitable interaction, how to complete the data collection at this stage.
Preferably, in step S3, the user is required to read the prompted number within a period of time, and the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
Preferably, in step S4, a voice activity detection (VAD) algorithm is further used to detect whether a person is speaking; the VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
Preferably, in step S5, "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" are used to perform liveness detection, and verification passes only if both modules pass.
Preferably, step S5 specifically includes the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected; the Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails, and only if the sound and the video match can it pass;
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves; optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object, and when the mouth moves, its position shifts, so the motion characteristics between images can be represented by optical flow.
(III) Beneficial effects
The invention provides a face liveness detection method based on reading aloud, with the following beneficial effects:
the invention provides a human face living body detection method based on reading, which prompts a person to read numbers through interaction with the person, lips of the person can make corresponding actions and make sounds, and the actions and the sounds are judged to be consistent through identifying the actions and the sounds, so that the actions and the sounds are detected after the requirements are met.
Drawings
FIG. 1 is a flow chart of the face liveness detection of the present invention;
FIG. 2 is a structural diagram of the multimodal Transformer-based video-sound content recognition model of the present invention;
FIG. 3 is a structural diagram of the video-based motion liveness detection model of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
Examples:
As shown in FIGS. 1 to 3, an embodiment of the present invention provides a face liveness detection method based on reading aloud, which specifically includes the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture. If there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
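For illustration only, not as part of the disclosed scheme, such a face gate might look as follows. The patent specifies a neural network face detection model, so the Haar cascade and the minimum-size quality check below are stand-in assumptions:

    import cv2

    # Sketch only: the scheme calls for a neural network detector; a Haar
    # cascade is used here as a readily available stand-in, and the minimum
    # face size acts as a crude image-quality gate (assumed parameter).
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def face_present(frame, min_size=(80, 80)):
        """Return True if a sufficiently large face is found in the frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5, minSize=min_size)
        return len(faces) > 0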
S2, reading prompt
After a face is detected, the reading prompt stage is entered. Through a suitable interaction, the user is taught how to complete the data collection at this stage. For example, an animated reading prompt is displayed on the device screen while an audio prompt is played, showing the user how to operate correctly.
Then, the user reads the specified number aloud according to the prompt. This scheme uses Arabic numerals; the pronunciation of the same numeral differs across languages, but for any given number the variety of pronunciations is limited.
S3, collecting reading data
After the prompt finishes, the reading stage is entered. The user is required to read the prompted number within a period of time. Meanwhile, the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
When the device stores the sound and video data, the sound and the video must be aligned in time and kept synchronized.
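As a minimal sketch of the timestamp alignment this implies (the frame rate and audio sample rate below are assumed values, not taken from the scheme):

    def audio_span_for_frame(frame_idx, fps=30, sample_rate=16000):
        """Map one video frame to the (start, end) audio sample indices that
        cover the same time span, keeping sound and video on one timeline."""
        start = int(frame_idx / fps * sample_rate)
        end = int((frame_idx + 1) / fps * sample_rate)
        return start, end

    # Example: at 30 fps and 16 kHz, frame 30 covers samples 16000..16533.
    print(audio_span_for_frame(30))  # (16000, 16533)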
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides the user to collect again.
In this scheme, a voice activity detection (Voice Activity Detection, VAD) algorithm is used to detect whether someone is speaking. The VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
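An energy-based VAD of the kind described can be sketched as follows; the frame length and both thresholds are assumed values:

    import numpy as np

    def energy_vad(samples, frame_len=400, energy_threshold=0.01):
        """Flag voiced frames by short-time energy. `samples` is a 1-D float
        array in [-1, 1]; 400 samples is 25 ms at a 16 kHz sample rate."""
        n = len(samples) // frame_len
        frames = samples[:n * frame_len].reshape(n, frame_len)
        energy = np.mean(frames ** 2, axis=1)   # short-time energy per frame
        return energy > energy_threshold

    def speech_detected(samples, min_voiced_frames=8):
        """Judge that someone spoke if enough frames carry voice energy."""
        return int(energy_vad(samples).sum()) >= min_voiced_frames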
In this scheme, a deep-learning-based motion detection model is used to detect whether the person's mouth makes a reading motion. Typically, the mouth opens, closes, or changes shape when a person reads normally. By detecting whether there is mouth motion in the video, the device can determine whether the user is reading normally.
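The scheme does not disclose this motion model; purely as an illustration, mouth motion can be approximated by tracking a mouth-opening ratio from facial landmarks (the indices follow the common 68-point convention and are an assumption here):

    import numpy as np

    def mouth_aspect_ratio(landmarks):
        """Vertical lip opening divided by mouth width, using the common
        68-point landmark layout (points 60-67 are the inner lip)."""
        lm = np.asarray(landmarks, dtype=float)
        vertical = np.linalg.norm(lm[62] - lm[66])    # inner top to bottom lip
        horizontal = np.linalg.norm(lm[60] - lm[64])  # inner mouth corners
        return vertical / horizontal

    def mouth_is_moving(ratios, var_threshold=0.001):
        """Judge mouth motion from how much the opening ratio varies over frames."""
        return float(np.var(ratios)) > var_threshold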
To ensure that normal use is not disturbed, satisfying either of the two conditions indicates that the user has performed the correct action, and the device uploads the collected reading video-sound data.
To keep the device-side judgment of the reading behavior robust, the detection thresholds are set as low as possible. The device-side judgment filters out part of the data that does not meet the requirements and gives feedback promptly, without waiting for the back-end result; the main purpose is to protect the experience of first-time users at this early stage.
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection. This scheme uses "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" to perform liveness detection, and verification passes only if both modules pass.
Step S5 specifically comprises the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected. The Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails; only if the sound and the video match can it pass.
Meanwhile, the model can recognize the content of the sound and the motions in the video, and judge whether what the person read is consistent with the required reading. If consistent, verification passes; otherwise it does not.
In this scheme, the structure of the multimodal Transformer-based video-sound content recognition model is shown in FIG. 2. The model takes sound data and video data as input and outputs the liveness category; a Transformer is trained on data to obtain the recognition model.
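The patent does not disclose the architecture details; the PyTorch sketch below only illustrates the general shape of such a model, and every dimension, the layer count, and the CLS-token fusion are assumptions:

    import torch
    import torch.nn as nn

    class AVTransformer(nn.Module):
        """Toy multimodal Transformer: audio and video token sequences are
        projected to one width, position-encoded, and classified live/spoof."""
        def __init__(self, audio_dim=40, video_dim=512, d_model=256,
                     n_heads=4, n_layers=4, max_len=512, n_classes=2):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, d_model)
            self.video_proj = nn.Linear(video_dim, d_model)
            self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
            self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, audio_feats, video_feats):
            # audio_feats: (B, Ta, audio_dim); video_feats: (B, Tv, video_dim)
            a = self.audio_proj(audio_feats)
            v = self.video_proj(video_feats)
            cls = self.cls.expand(a.size(0), -1, -1)
            x = torch.cat([cls, a, v], dim=1)
            x = x + self.pos[:, :x.size(1)]   # positions record where data sits
            return self.head(self.encoder(x)[:, 0])  # classify from CLS token

    # Example: 100 audio feature frames and 30 video feature frames per clip.
    logits = AVTransformer()(torch.randn(2, 100, 40), torch.randn(2, 30, 512))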
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves. Optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object; when the mouth moves, its position shifts, and the motion characteristics between images can be represented by optical flow.
For a video there are many frames [I1, I2, ..., In]. The optical flow between each pair of consecutive frames is computed to obtain the optical flow sequence [F1, F2, ..., Fn-1]; interleaving the two sequences gives [I1, F1, I2, F2, ..., In-1, Fn-1, In], a sequence that contains the face images together with the motion (optical flow) between every two consecutive images.
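A sketch of building that interleaved sequence; the patent does not name the flow algorithm, so OpenCV's Farneback dense flow is used here as a stand-in:

    import cv2

    def interleave_frames_and_flow(frames):
        """Given grayscale frames [I1..In], return [I1, F1, I2, F2, ..., In],
        where Fk is the dense optical flow between Ik and Ik+1."""
        seq = [frames[0]]
        for prev, nxt in zip(frames, frames[1:]):
            flow = cv2.calcOpticalFlowFarneback(
                prev, nxt, None, pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            seq.extend([flow, nxt])   # flow has shape (H, W, 2): dx, dy per pixel
        return seq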
A threshold is set; if the liveness score is greater than the threshold, the model predicts a live body and verification passes; otherwise it does not.
In this scheme, the structure of the video-based motion liveness detection model is shown in FIG. 3. The model takes the video frames and the optical flow sequence as input and outputs the liveness category; a neural network is trained on data to obtain the liveness detection model.
The invention further provides the reading-based face liveness detection flow shown in FIG. 1. After the face-scanning flow starts, the device captures camera data and performs face detection on the acquired image; if a face is detected, the next stage is entered, otherwise face detection continues in a loop. Once a face is detected, the user is prompted to read the number aloud, and the sound and video of the user's reading are cached. Motion detection and voice activity detection are performed on the stored sound and video; when mouth motion or sound is present, the next stage is entered, otherwise a new round of collection starts. The collected data is uploaded to the back end, where the liveness detection model judges whether the face to be verified belongs to a real person; if so, the liveness detection passes, otherwise it does not.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A face liveness detection method based on reading aloud, characterized by comprising the following steps:
S1, face detection
After entering the face identity verification process, a picture captured by the camera is acquired, and a neural network face detection model is used to detect whether a face exists in the picture;
S2, reading prompt
After a face is detected, the reading prompt stage is entered;
S3, collecting reading data
After the prompt finishes, the reading stage is entered;
S4, reading judgment
Because the user cannot be guaranteed to read the specified number aloud as required, the device needs to judge whether the user has completed the "reading aloud" action; if no spoken number is detected, the device prompts the user that the operation failed and guides re-collection;
S5, liveness detection
The device uploads the collected reading video data to the back end for liveness detection.
2. The face liveness detection method based on reading aloud according to claim 1, wherein step S1 further includes: if there is no face or the face image quality is low, the next stage is not entered; when a face is present, the next stage is entered.
3. The face liveness detection method based on reading aloud according to claim 1, wherein step S2 further includes teaching the user, through a suitable interaction, how to complete the data collection at this stage.
4. The face liveness detection method based on reading aloud according to claim 1, wherein in step S3 the user is required to read the prompted number within a period of time, and the device stores the video and sound data collected by the camera; whether or not the user reads as required, the data is cached.
5. The face liveness detection method based on reading aloud according to claim 1, wherein in step S4 a voice activity detection (VAD) algorithm is further used to detect whether a person is speaking; the VAD algorithm determines whether the sound changes by detecting changes in sound energy. In general, the energy of the sound signal changes when a person reads aloud, so the device can judge whether sound is produced by detecting changes in the energy of the sound signal.
6. The face liveness detection method based on reading aloud according to claim 1, wherein in step S5 "multimodal Transformer-based video-sound content recognition" and "video-based motion liveness detection" are used to perform liveness detection, and verification passes only if both modules pass.
7. The face liveness detection method based on reading aloud according to claim 1, wherein step S5 specifically includes the following steps:
a. Multimodal Transformer-based video-sound content recognition
In forged video-sound data, the sound and the video are often mismatched, and such mismatched data should be rejected; the Transformer architecture can encode the positional information of the data, and this capability of the model is used to identify mismatched data: if the sound and the video do not match, verification fails, and only if the sound and the video match can it pass;
b. Video-based motion liveness detection
When a person reads aloud, the mouth moves; optical flow can represent the magnitude and direction of pixel changes between consecutive frames of a moving object, and when the mouth moves, its position shifts, so the motion characteristics between images can be represented by optical flow.
CN202310206478.9A 2023-02-23 2023-02-23 Face liveness detection method based on reading aloud Pending CN116092167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310206478.9A CN116092167A (en) Face liveness detection method based on reading aloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310206478.9A CN116092167A (en) Face liveness detection method based on reading aloud

Publications (1)

Publication Number Publication Date
CN116092167A 2023-05-09

Family

ID=86214181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310206478.9A Pending CN116092167A (en) Face liveness detection method based on reading aloud

Country Status (1)

Country Link
CN (1) CN116092167A (en)

Similar Documents

Publication Publication Date Title
Fisher et al. Speaker association with signal-level audiovisual fusion
CN104361276B (en) A kind of multi-modal biological characteristic identity identifying method and system
US8861779B2 (en) Methods for electronically analysing a dialogue and corresponding systems
CN105718874A (en) Method and device of in-vivo detection and authentication
JP2001092974A (en) Speaker recognizing method, device for executing the same, method and device for confirming audio generation
CN109410954A (en) A kind of unsupervised more Speaker Identification device and method based on audio-video
CN109829691B (en) C/S card punching method and device based on position and deep learning multiple biological features
CN111341350A (en) Man-machine interaction control method and system, intelligent robot and storage medium
CN111027400A (en) Living body detection method and device
CN112232276A (en) Emotion detection method and device based on voice recognition and image recognition
CN113920568A (en) Face and human body posture emotion recognition method based on video image
CN112286364A (en) Man-machine interaction method and device
CN113920560A (en) Method, device and equipment for identifying identity of multi-modal speaker
CN114242235A (en) Autism patient portrait method based on multi-level key characteristic behaviors
CN110460809A (en) A kind of vagitus method for detecting, device and intelligent camera head apparatus
CN114582355A (en) Audio and video fusion-based infant crying detection method and device
CN111950480A (en) English pronunciation self-checking method and system based on artificial intelligence
CN116092167A (en) Face liveness detection method based on reading aloud
CN108197593B (en) Multi-size facial expression recognition method and device based on three-point positioning method
CN114494930B (en) Training method and device for voice and image synchronism measurement model
CN115905977A (en) System and method for monitoring negative emotion in family sibling interaction process
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
CN114466179A (en) Method and device for measuring synchronism of voice and image
CN114466178A (en) Method and device for measuring synchronism of voice and image
CN111159676A (en) Multi-dimensional identity authentication system and method based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination