CN113420667B - Face liveness detection method, apparatus, device and medium - Google Patents

Face liveness detection method, apparatus, device and medium

Info

Publication number
CN113420667B
Authority
CN
China
Prior art keywords
images
voting
face
group
video stream
Prior art date
Legal status: Active
Application number
CN202110700867.8A
Other languages
Chinese (zh)
Other versions
CN113420667A (en)
Inventors
李敏 (Li Min)
徐春艳 (Xu Chunyan)
豆风雷 (Dou Fenglei)
民尧 (Min Yao)
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC and ICBC Technology Co Ltd
Priority to CN202110700867.8A
Publication of CN113420667A
Application granted
Publication of CN113420667B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods


Abstract

The present disclosure provides a face liveness detection method, apparatus, device, and medium, applicable to the field of artificial intelligence. The method comprises the following steps: when a mask-wearing user performs a mouth-opening action within a preset time range, obtaining a face video stream to be detected; grouping consecutive frame pairs of the video stream to form a voting queue containing multiple groups of images; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.

Description

Face liveness detection method, apparatus, device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a face liveness detection method, apparatus, device, and medium.
Background
Face recognition is widely applied in finance, social services, e-commerce, and other fields. However, a face is easily spoofed with videos, photographs, and the like, so liveness detection is a precondition for the effective application of face recognition.
Face liveness detection is generally divided into four action checks: nodding, head shaking, mouth opening, and blinking. In public places, wearing a mask covering the mouth and nose is commonly required to block the transmission of viruses. Wearing a mask does not affect liveness detection based on nodding, head shaking, or blinking, but the mask occludes most facial features around the mouth, so mouth key points cannot be acquired accurately and the accuracy of mouth-opening face liveness detection cannot be guaranteed.
Disclosure of Invention
In view of this, the present disclosure provides a face liveness detection method, apparatus, device, and medium, to solve the technical problem in the prior art that the recognition accuracy of mouth-opening liveness detection is low when a mask is worn.
One aspect of the present disclosure provides a face liveness detection method, including: when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected; taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and recording the number of mask feature points in each frame; grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
According to an embodiment of the present disclosure, the preset time range is 2-10 s.
According to the embodiment of the disclosure, the mask feature points in each frame of face image include mask outer-edge points and mask middle-crease points.
According to an embodiment of the present disclosure, the first weight increases with an increase of the preset distance threshold; the second weight increases with an increase in the preset number threshold.
According to the embodiment of the disclosure, the mandible feature point and the pupil feature point have the same abscissa, and the coordinates of the pupil feature point are the coordinates of the midpoint between the left and right pupil feature points of the face.
According to the embodiment of the disclosure, the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
According to the embodiment of the disclosure, performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images and outputting a liveness detection result of the face video stream to be detected includes: calculating the voting sum of the voting queue according to the voting weights of the multiple groups of images; and determining whether the voting sum is greater than a preset voting-sum threshold, and if so, judging that a mouth-opening action of the user has been detected.
According to an embodiment of the present disclosure, the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
According to an embodiment of the present disclosure, the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; η denotes a correction coefficient with a value between 0 and 1; w₁ denotes the first weight; and w₂ denotes the second weight.
According to the embodiment of the disclosure, the number of frames N of the face video stream is less than 10.
According to an embodiment of the present disclosure, the facial feature detection method includes: training and applying a deep learning model to each frame of face image; or classifying target features in each frame of face image with a classifier.
According to the embodiment of the present disclosure, before the step of taking each frame of face image of the face video stream to be detected as input, the method further includes: preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing includes at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
Another aspect of the present disclosure provides a face liveness detection apparatus, including: an image acquisition module, configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected; a facial feature extraction module, configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and record the number of mask feature points in each frame; a voting queue formation module, configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0; a voting weight setting module, configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group to a first weight, and to calculate the difference in the number of mask feature points between the earlier frame and the later frame of each group, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group to a second weight; and a liveness detection output module, configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described face liveness detection method.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the above-mentioned face liveness detection method when executed.
Another aspect of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described face liveness detection method.
Compared with the prior art, the face liveness detection method, apparatus, device, and medium provided by the present disclosure have the following beneficial effects:
(1) Users who need to pass mouth-opening liveness detection in public places can be verified without removing their masks, which saves the time spent taking the mask off and protects public health and safety.
(2) The accuracy of face liveness detection is improved in face-occlusion scenarios such as mask wearing.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of the face liveness detection method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a face liveness detection method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an operational flow diagram of voting weight setting according to an embodiment of the present disclosure;
fig. 4 schematically illustrates a mask feature point selection manner in a face image according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an operational flow diagram of a liveness detection output according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a face liveness detection apparatus according to an embodiment of the present disclosure; and
fig. 7 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
It should be noted that, in the technical solution of the present disclosure, the collection, storage, and use of the personal information of the users involved comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Existing mouth-opening liveness detection usually acquires three points: the mouth center point, the upper mouth edge point, and the lower mouth edge point. It then judges whether the user has completed a mouth-opening action from the changes in distance from the mouth center to the upper edge and from the mouth center to the lower edge during detection.
At present, mouth-opening liveness detection within overall face recognition achieves a fairly high accuracy; however, when a mask occludes most features of the mouth, the accuracy of face liveness recognition based on mouth-opening detection cannot be guaranteed.
In view of this, the present disclosure provides a face liveness detection method, which can be applied to the technical field of artificial intelligence. The face liveness detection method comprises the following steps: when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected; taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and recording the number of mask feature points in each frame; grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
Fig. 1 schematically illustrates an application scenario 100 of a human face live detection method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a mask user 101, an image recognition terminal 102, and a server 103. The mask user 101 and the image recognition terminal 102 can communicate with each other via a communication link, and the image recognition terminal 102 and the server 103 can also communicate with each other via a communication link. The communication link may include various connection types, such as wired and/or wireless communication links, and so forth.
The mask-wearing user 101 interacts with the server 103 through the image recognition terminal 102 to receive or transmit messages and the like. The image recognition terminal 102 may be any of a variety of electronic devices having a display screen and supporting face image capture or recognition, including but not limited to cameras, video recorders, smartphones, tablet computers, laptop computers, desktop computers, and the like.
The server 103 may be a server that provides various services, such as a background management server (for example only) that provides support for image or video streams recognized by the image recognition terminal 102. The background management server may analyze and process the received data such as the image or the video stream, and feed back the processing result to the image recognition terminal 102.
When the mask-wearing user 101 stands in front of the image recognition terminal 102, the terminal captures a masked face image or video stream to be detected and uploads it to the server 103 for recognition, analysis, or processing. The server 103 feeds the recognition result back to the image recognition terminal 102, which displays it in its display interface. The recognition result may be a judgment of whether a living or non-living body has been detected.
It should be noted that the face liveness detection method provided by the embodiments of the present disclosure may generally be executed by the server 103; accordingly, the face liveness detection apparatus provided by the embodiments of the present disclosure may generally be disposed in the server 103. The method may also be executed by a server or server cluster that is different from the server 103 and can communicate with the image recognition terminal 102 and/or the server 103; accordingly, the apparatus may also be disposed in such a server or server cluster. Alternatively, the method may be executed by the image recognition terminal 102, or by another recognition terminal different from the image recognition terminal 102; accordingly, the apparatus may also be disposed in the image recognition terminal 102 or in another recognition terminal different from it.
It should be understood that the numbers of mask-wearing users, image recognition terminals, and servers in fig. 1 are merely illustrative. There may be any number of mask-wearing users, image recognition terminals, and servers, as required by the implementation.
Fig. 2 schematically shows a flow chart of a face liveness detection method according to an embodiment of the present disclosure. Fig. 3 schematically illustrates an operational flow diagram of voting weight setting according to an embodiment of the present disclosure.
The method shown in fig. 2 will be described in detail with reference to fig. 3. In the embodiment of the present disclosure, the face live detection method may include operations S201 to S205.
In operation S201, when a mask-wearing user performs a mouth-opening action within a preset time range, the user's face images are tracked and collected to obtain a face video stream to be detected.
In the embodiment of the present disclosure, the preset time range may be 2-10 s. The video stream collected within this window, in which the mask-wearing user completes the mouth-opening liveness action, serves as the basis for the subsequent single-frame face image processing.
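As a rough sketch, the capture step could look as follows, assuming an OpenCV camera source; the duration and frame count here are example values chosen within the ranges stated in this embodiment (a 2-10 s window, fewer than 10 frames, an even N):

```python
import time
import cv2

def capture_face_video(duration_s=3.0, n_frames=8):
    """Collect N frames spread across the mouth-opening prompt window."""
    cap = cv2.VideoCapture(0)          # camera index 0 is an assumption
    frames = []
    interval = duration_s / n_frames
    try:
        for _ in range(n_frames):
            ok, frame = cap.read()
            if ok:
                frames.append(frame)   # one frame of the face video stream
            time.sleep(interval)       # spread frames across the window
    finally:
        cap.release()
    return frames
```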
In operation S202, each frame of face image of the face video stream to be detected is taken as input, the coordinates of the mandible feature point and the pupil feature point in each frame are marked by a facial feature detection method, and the number of mask feature points in each frame is recorded.
In order to extract facial feature points, the facial feature detection method adopted by an embodiment of the present disclosure may include: training and applying a deep learning model to each frame of face image, or classifying target features in each frame of face image with a classifier. Since using a classifier or a deep learning model to identify facial feature points is prior art in the field, it is not described here again.
In the embodiment of the present disclosure, the mandible feature point and the pupil feature point have the same abscissa, and the pupil feature point is taken at the midpoint between the left and right pupil feature points of the face.
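A minimal sketch of this landmark-marking step is given below. It assumes dlib's 68-point shape predictor purely for illustration: the patent does not mandate a particular detector, the model file name is hypothetical, and the pupil points are approximated by eye-landmark centroids. In practice a mask-robust landmark model would be needed, since standard predictors degrade on occluded faces.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# hypothetical model file; any landmark model exposing chin and eye points would do
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mark_feature_points(gray):
    """Return the ordinates (mandible_y, pupil_y) for one frame, or None."""
    faces = detector(gray)
    if not faces:
        return None
    pts = predictor(gray, faces[0])
    coords = np.array([[p.x, p.y] for p in pts.parts()])
    mandible = coords[8]                        # chin tip in the 68-point scheme
    left_pupil = coords[36:42].mean(axis=0)     # approximated as left-eye centroid
    right_pupil = coords[42:48].mean(axis=0)    # approximated as right-eye centroid
    pupil_mid = (left_pupil + right_pupil) / 2  # midpoint of the two pupil points
    return mandible[1], pupil_mid[1]            # only ordinates are needed later
```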
In operation S203, consecutive frame pairs of the face video stream to be detected are grouped to form a voting queue containing multiple groups of images, and the voting weight of each group of images is initialized to 0.
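Read as sliding pairs (frame 1 with frame 2, frame 2 with frame 3, and so on), the grouping yields the N-1 groups implied by the threshold formula used later; this pairing scheme is an interpretation, since the text only states that consecutive frames are grouped. A sketch:

```python
def build_voting_queue(frames):
    """Pair each frame with its successor; weights start at 0."""
    groups = list(zip(frames[:-1], frames[1:]))  # (earlier, later) frame pairs
    weights = [0] * len(groups)                  # voting weight per group, initialized to 0
    return groups, weights
```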
In operation S204, the difference in distance between the mandible feature point and the pupil feature point is calculated for each group of images; if the distance difference is greater than a preset distance threshold, the voting weight of the group is updated to a first weight. The difference in the number of mask feature points between the earlier frame and the later frame of each group is then calculated; if the number difference is not less than a preset number threshold, the voting weight of the group is updated to a second weight.
In the embodiment of the present disclosure, the first weight increases with an increase of the preset distance threshold, and the second weight increases with an increase of the preset number threshold.
In some embodiments, as shown in fig. 3, the difference in the number of mask feature points between the two frames of a group is calculated only when the distance difference has already exceeded the preset distance threshold; accordingly, the first weight is generally smaller than the second weight.
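A sketch of the per-group weight update following the Fig. 3 flow, where the mask-count check is reached only after the distance check passes, so each group ends with weight 0, w₁, or w₂. The weight and threshold values below are illustrative assumptions, and each frame is summarized by its mandible ordinate, pupil ordinate, and mask feature point count:

```python
W1, W2 = 1, 2          # first and second voting weights (assumed, with W1 < W2)
DIST_THRESH = 5.0      # preset distance threshold in pixels (assumed)
COUNT_THRESH = 2       # preset mask-feature-count threshold (assumed)

def update_weight(prev, nxt):
    """prev/nxt: dicts with 'mandible_y', 'pupil_y', 'mask_point_count'."""
    # |(y12 - y11) - (y22 - y21)|: change in the mandible-to-pupil distance
    delta_d = abs((prev["pupil_y"] - prev["mandible_y"])
                  - (nxt["pupil_y"] - nxt["mandible_y"]))
    if delta_d <= DIST_THRESH:
        return 0                       # distance check failed, weight stays 0
    weight = W1                        # distance check passed: first weight
    # middle-crease points decrease as the mouth opens under the mask
    if prev["mask_point_count"] - nxt["mask_point_count"] >= COUNT_THRESH:
        weight = W2                    # count check passed: second weight
    return weight
```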
Because the mandible feature point and the pupil feature point have the same abscissa but different ordinates, in the embodiment of the disclosure the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
In operation S205, liveness voting judgment is performed on the voting queue according to the voting weights of the multiple groups of images, and the liveness detection result of the face video stream to be detected is output.
The embodiments of the present disclosure address the inconvenience and low recognition accuracy of face detection for mask-wearing users. The liveness detection result of the face video stream to be detected is judged from the change in the distance between the mandible feature point and the pupil feature point, combined with the change in the number of mask feature points, while the mask-wearing user completes a mouth-opening action. This avoids the conventional mouth-opening liveness check that relies on changes in mouth feature points, can effectively distinguish live faces from non-live ones, and improves the accuracy of face liveness detection in face-occlusion scenarios such as mask wearing.
In addition, the method allows users who need to pass mouth-opening liveness detection in public places to be verified without removing their masks, saving the time spent taking the mask off and protecting public health and safety.
Fig. 4 schematically illustrates a mask feature point selection manner in a face image according to an embodiment of the present disclosure.
As shown in fig. 4, in the embodiment of the present disclosure, for each frame of face image of the face video stream to be detected, the mask feature points may include mask outer-edge points and mask middle-crease points, and several of each may be selected according to the available image-processing capability. Since the mask outer edge comprises upper, lower, left, and right edges, and both the outer edge and the middle creases usually appear as curves, points can be sampled uniformly at fixed intervals along these curves, which keeps the selection of mask feature points consistent across frames.
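A sketch of such fixed-interval sampling is shown below; it assumes the mask region has already been segmented into a binary image, which is outside what the patent specifies:

```python
import cv2
import numpy as np

def sample_mask_edge_points(mask_binary, step=15):
    """Take every step-th point along the largest contour of the mask region."""
    contours, _ = cv2.findContours(mask_binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=int)
    edge = max(contours, key=cv2.contourArea).reshape(-1, 2)
    return edge[::step]  # uniform spacing keeps the selection fixed per frame
```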
It can be understood that when a mask-wearing user completes a mouth-opening action, the mandible moves downward while the pupil positions generally change little; moreover, because the mouth opens under the mask, the mask is stretched, so the number of middle-crease points on the mask decreases.
Because each group consists of two consecutive frames, the difference in the number of mask feature points between the earlier frame and the later frame of each group can be calculated and used as a basis for detecting whether the user has performed a mouth-opening action.
It should be noted that each frame is a face image with a mask worn; fig. 4 shows only the mask portion of such an image for convenience in explaining how mask feature points are selected, and does not imply that the frames of the face video stream to be detected contain only a mask and no unoccluded facial regions.
Fig. 5 schematically illustrates an operation flowchart of the living body detection output according to an embodiment of the present disclosure.
As shown in FIG. 5, in the disclosed embodiment, operation S205 may include sub-operations S510-S520.
In operation S510, the voting sum of the voting queue is calculated according to the voting weights of the multiple groups of images.

In operation S520, it is determined whether the voting sum is greater than the preset voting-sum threshold, and if so, it is judged that a mouth-opening action of the user has been detected.
In the embodiment of the present disclosure, the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
In the embodiment of the present disclosure, the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; and η denotes a correction coefficient with a value between 0 and 1.
Optionally, the number of frames N of the face video stream is less than 10.
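Putting the two formulas together, a sketch of the final decision might look as follows; the η value is an assumed example within the stated 0-1 range:

```python
def liveness_vote(weights, w1, w2, eta=0.5):
    """weights: one voting weight per group (N - 1 groups for N frames)."""
    n = len(weights) + 1                 # number of frames N
    t = sum(weights)                     # voting sum T of the queue
    r = (n - 1) * (w1 + w2) * eta        # preset voting-sum threshold R
    return t > r                         # True: mouth-opening action detected
```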
The above is merely an exemplary description, and the present embodiment is not limited thereto. For example, in some embodiments, before the step of taking each frame of face image of the face video stream to be detected as input, the method may further include: preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing includes at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
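A sketch of this optional preprocessing chain, with every parameter value (rotation direction, target size, crop window, kernel size) chosen purely for illustration:

```python
import cv2

def preprocess(frame):
    frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)  # rotation, if the camera is sideways
    frame = cv2.resize(frame, (480, 640))               # scaling to a fixed size
    frame = frame[80:560, 40:440]                       # cropping toward the face region
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # grayscale conversion
    return cv2.GaussianBlur(gray, (5, 5), 0)            # filtering / denoising
```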
Based on the above face liveness detection method, the present disclosure also provides a face liveness detection apparatus, which is described in detail below with reference to fig. 6.
Fig. 6 schematically shows a block diagram of a face liveness detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the face liveness detection apparatus 500 may include an image acquisition module 510, a facial feature extraction module 520, a voting queue formation module 530, a voting weight setting module 540, and a liveness detection output module 550.
The image acquisition module 510 is configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected.

The facial feature extraction module 520 is configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and record the number of mask feature points in each frame.

The voting queue formation module 530 is configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0.

The voting weight setting module 540 is configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group to a first weight; and to calculate the difference in the number of mask feature points between the earlier frame and the later frame of each group, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group to a second weight.

The liveness detection output module 550 is configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
It should be noted that the face liveness detection apparatus in the embodiments of the present disclosure corresponds to the face liveness detection method; for details of the apparatus, refer to the description of the method, which is not repeated here.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be divided into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the image capturing module 510, the facial feature extracting module 520, the voting queue forming module 530, the voting weight setting module 540, and the living body detection output module 550 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any several of them. Alternatively, at least one of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Fig. 7 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform the face liveness detection method according to the embodiments of the present disclosure.
In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. The processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 620, for example, may be a non-volatile computer-readable storage medium, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 620 may include a computer program 621, which computer program 621 may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 621 may include one or more program modules, for example, modules 621A, 621B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations thereof according to the actual situation, so that when these program modules are executed by the processor 610, the processor 610 can execute the method according to the embodiments of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be implemented as a computer program module described with reference to fig. 7, which, when executed by the processor 610, may implement the corresponding operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the face liveness detection method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flowchart. When the computer program product runs in a computer system, the program code causes the computer system to implement the face liveness detection method provided by the embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A face liveness detection method, characterized by comprising the following steps:
when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected;
taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame of face image by a facial feature detection method, and recording the number of mask feature points in each frame of face image;
grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0;
calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group of images to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame in each group of images, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group of images to a second weight; and
performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
2. The method of claim 1, wherein the preset time range is 2-10 s.
3. The method according to claim 1, wherein the mask feature points in each frame of face image comprise: mask outer-edge points and mask middle-crease points.
4. The method of claim 1, wherein the first weight increases with increasing the preset distance threshold;
the second weight increases with an increase of the preset number threshold.
5. The method according to claim 1, wherein the mandible feature point and the pupil feature point have the same abscissa, and the coordinates of the pupil feature point are the coordinates of the midpoint between the left and right pupil feature points of the face.
6. The method of claim 5, wherein the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
7. The method according to claim 1, wherein performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images and outputting a liveness detection result of the face video stream to be detected comprises:
calculating the voting sum of the voting queue according to the voting weights of the multiple groups of images; and
determining whether the voting sum is greater than a preset voting-sum threshold, and if so, judging that a mouth-opening action of the user has been detected.
8. The method of claim 7, wherein the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
9. The method of claim 7, wherein the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; η denotes a correction coefficient with a value between 0 and 1; w₁ denotes the first weight; and w₂ denotes the second weight.
10. The method of claim 9, wherein the number of frames N of the face video stream is less than 10.
11. The method of claim 1, wherein the facial feature detection method comprises:
training and applying a deep learning model to each frame of face image; or
classifying target features in each frame of face image with a classifier.
12. The method of claim 1, wherein before the step of taking each frame of face image of the face video stream to be detected as input, the method further comprises:
preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing comprises at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
13. A face liveness detection apparatus, characterized by comprising:
an image acquisition module, configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected;
a facial feature extraction module, configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame of face image by a facial feature detection method, and record the number of mask feature points in each frame of face image;
a voting queue formation module, configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0;
a voting weight setting module, configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group of images to a first weight, and to calculate the difference in the number of mask feature points between the earlier frame and the later frame in each group of images, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group of images to a second weight; and
a liveness detection output module, configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
14. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-12.
15. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 12 when executed.
CN202110700867.8A 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium Active CN113420667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110700867.8A CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110700867.8A CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Publications (2)

Publication Number Publication Date
CN113420667A CN113420667A (en) 2021-09-21
CN113420667B true CN113420667B (en) 2022-08-02

Family

ID=77716475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110700867.8A Active CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN113420667B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971841A (en) * 2021-10-28 2022-01-25 Beijing SenseTime Technology Development Co., Ltd. Living body detection method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106027543A (en) * 2016-06-23 2016-10-12 北京孔方同鑫科技有限公司 Identification method and apparatus based on weight calculation
CN111414831A (en) * 2020-03-13 2020-07-14 深圳市商汤科技有限公司 Monitoring method and system, electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8942438B2 (en) * 2010-07-19 2015-01-27 The University Of Maryland, College Park Method and apparatus for authenticating swipe biometric scanners
US10331942B2 (en) * 2017-05-31 2019-06-25 Facebook, Inc. Face liveness detection
CN108171215B (en) * 2018-01-25 2023-02-03 河南大学 Face camouflage detection and camouflage type detection method based on low-rank variation dictionary and sparse representation classification
CN111898569B (en) * 2020-08-05 2023-05-09 福建工程学院 Face identification method based on living body detection
CN112906571B (en) * 2021-02-20 2023-09-05 成都新希望金融信息有限公司 Living body identification method and device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106027543A (en) * 2016-06-23 2016-10-12 北京孔方同鑫科技有限公司 Identification method and apparatus based on weight calculation
CN111414831A (en) * 2020-03-13 2020-07-14 深圳市商汤科技有限公司 Monitoring method and system, electronic device and storage medium

Also Published As

Publication number Publication date
CN113420667A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN109117827B (en) Video-based method for automatically identifying wearing state of work clothes and work cap and alarm system
US20190130580A1 (en) Methods and systems for applying complex object detection in a video analytics system
US20170344830A1 (en) System and method for automatic detection of spherical video content
CN108229376B (en) Method and device for detecting blinking
CN110945522B (en) Learning state judging method and device and intelligent robot
US20190138748A1 (en) Removing personally identifiable data before transmission from a device
JP6397581B2 (en) Congestion status visualization device, congestion status visualization system, congestion status visualization method, and congestion status visualization program
CN105160318A (en) Facial expression based lie detection method and system
CN105426827A (en) Living body verification method, device and system
CN107844742B (en) Facial image glasses minimizing technology, device and storage medium
CN108197318A (en) Face identification method, device, robot and storage medium
CN112101123B (en) Attention detection method and device
Huszár et al. Live spoofing detection for automatic human activity recognition applications
CN111259763A (en) Target detection method and device, electronic equipment and readable storage medium
CN111259757B (en) Living body identification method, device and equipment based on image
CN113420667B (en) Face living body detection method, device, equipment and medium
Eyiokur et al. A survey on computer vision based human analysis in the COVID-19 era
KR102248706B1 (en) System for intergrated education management based on intelligent image analysis technology and method thereof
CN108197608A (en) Face identification method, device, robot and storage medium
CN114549371A (en) Image analysis method and device
CN105844204A (en) Method and device for recognizing behavior of human body
CN105809183A (en) Video-based human head tracking method and device thereof
CN111860057A (en) Face image blurring and living body detection method and device, storage medium and equipment
EP4198772A1 (en) Method and device for making music recommendation
CN115578668A (en) Target behavior recognition method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant