CN113420667B - Face liveness detection method, apparatus, device and medium - Google Patents

Face liveness detection method, apparatus, device and medium

Info

Publication number
CN113420667B
Authority
CN
China
Prior art keywords
images
voting
face
group
video stream
Prior art date
Legal status: Active
Application number
CN202110700867.8A
Other languages
Chinese (zh)
Other versions
CN113420667A (en)
Inventors
李敏 (Li Min)
徐春艳 (Xu Chunyan)
豆风雷 (Dou Fenglei)
民尧 (Min Yao)
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC and ICBC Technology Co Ltd
Priority to CN202110700867.8A
Publication of CN113420667A
Application granted
Publication of CN113420667B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods


Abstract

The present disclosure provides a face liveness detection method, apparatus, device, and medium, applicable to the field of artificial intelligence. The method comprises the following steps: when a mask-wearing user performs a mouth-opening action within a preset time range, obtaining a face video stream to be detected; grouping consecutive frame pairs of the video stream to form a voting queue containing multiple groups of images; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.

Description

Face liveness detection method, apparatus, device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a face liveness detection method, apparatus, device, and medium.
Background
Face recognition is widely applied in finance, social services, e-commerce, and other fields. However, a face is easily spoofed with videos, photographs, and the like, so liveness detection is a precondition for the effective application of face recognition.
Face liveness detection is generally divided into four action checks: nodding, head shaking, mouth opening, and blinking. In public places, wearing a mask covering the mouth and nose is commonly required to block the transmission of viruses. Wearing a mask does not affect liveness detection based on nodding, head shaking, or blinking, but the mask occludes most facial features around the mouth, so mouth key points cannot be acquired accurately and the accuracy of mouth-opening face liveness detection cannot be guaranteed.
Disclosure of Invention
In view of this, the present disclosure provides a face liveness detection method, apparatus, device, and medium, to solve the technical problem in the prior art that the recognition accuracy of mouth-opening liveness detection is low when a mask is worn.
One aspect of the present disclosure provides a face liveness detection method, including: when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected; taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and recording the number of mask feature points in each frame; grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
According to an embodiment of the present disclosure, the preset time range is 2-10 s.
According to the embodiment of the disclosure, the mask feature points in each frame of face image include mask outer-edge points and mask middle-crease points.
According to an embodiment of the present disclosure, the first weight increases with an increase of the preset distance threshold; the second weight increases with an increase in the preset number threshold.
According to the embodiment of the disclosure, the mandible feature point and the pupil feature point have the same abscissa, and the coordinates of the pupil feature point are the coordinates of the midpoint between the left and right pupil feature points of the face.
According to the embodiment of the disclosure, the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
According to the embodiment of the disclosure, performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images and outputting a liveness detection result of the face video stream to be detected includes: calculating the voting sum of the voting queue according to the voting weights of the multiple groups of images; and determining whether the voting sum is greater than a preset voting-sum threshold, and if so, judging that a mouth-opening action of the user has been detected.
According to an embodiment of the present disclosure, the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
According to an embodiment of the present disclosure, the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; η denotes a correction coefficient with a value between 0 and 1; w₁ denotes the first weight; and w₂ denotes the second weight.
According to the embodiment of the disclosure, the number of frames N of the face video stream is less than 10.
According to an embodiment of the present disclosure, the facial feature detection method includes: training and applying a deep learning model to each frame of face image; or classifying target features in each frame of face image with a classifier.
According to the embodiment of the present disclosure, before the step of taking each frame of face image of the face video stream to be detected as input, the method further includes: preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing includes at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
Another aspect of the present disclosure provides a face liveness detection apparatus, including: an image acquisition module, configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected; a facial feature extraction module, configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and record the number of mask feature points in each frame; a voting queue formation module, configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0; a voting weight setting module, configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group to a first weight, and to calculate the difference in the number of mask feature points between the earlier frame and the later frame of each group, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group to a second weight; and a liveness detection output module, configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described face liveness detection method.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the above-mentioned face liveness detection method when executed.
Another aspect of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described face liveness detection method.
Compared with the prior art, the face liveness detection method, apparatus, device, and medium provided by the present disclosure have the following beneficial effects:
(1) Users who need to pass mouth-opening liveness detection in public places can be verified without removing their masks, which saves the time spent taking the mask off and protects public health and safety.
(2) The accuracy of face liveness detection is improved in face-occlusion scenarios such as mask wearing.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of the face liveness detection method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a face liveness detection method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an operational flow diagram of voting weight setting according to an embodiment of the present disclosure;
fig. 4 schematically illustrates a mask feature point selection manner in a face image according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an operational flow diagram of a liveness detection output according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a face liveness detection apparatus according to an embodiment of the present disclosure; and
fig. 7 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
It should be noted that, in the technical solution of the present disclosure, the collection, storage, and use of the personal information of the users involved comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Existing mouth-opening liveness detection usually acquires three points: the mouth center point, the upper mouth edge point, and the lower mouth edge point. It then judges whether the user has completed a mouth-opening action from the changes in distance from the mouth center to the upper edge and from the mouth center to the lower edge during detection.
At present, mouth-opening liveness detection within overall face recognition achieves a fairly high accuracy; however, when a mask occludes most features of the mouth, the accuracy of face liveness recognition based on mouth-opening detection cannot be guaranteed.
In view of this, the present disclosure provides a face liveness detection method, which can be applied to the technical field of artificial intelligence. The face liveness detection method comprises the following steps: when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected; taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and recording the number of mask feature points in each frame; grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0; calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame of each group, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group to a second weight; and performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
Fig. 1 schematically illustrates an application scenario 100 of a human face live detection method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a mask user 101, an image recognition terminal 102, and a server 103. The mask user 101 and the image recognition terminal 102 can communicate with each other via a communication link, and the image recognition terminal 102 and the server 103 can also communicate with each other via a communication link. The communication link may include various connection types, such as wired and/or wireless communication links, and so forth.
The mask-wearing user 101 interacts with the server 103 through the image recognition terminal 102 to receive or transmit messages and the like. The image recognition terminal 102 may be any of a variety of electronic devices having a display screen and supporting face image capture or recognition, including but not limited to cameras, video recorders, smartphones, tablet computers, laptop computers, desktop computers, and the like.
The server 103 may be a server that provides various services, such as a background management server (for example only) that provides support for image or video streams recognized by the image recognition terminal 102. The background management server may analyze and process the received data such as the image or the video stream, and feed back the processing result to the image recognition terminal 102.
When the mask-wearing user 101 stands in front of the image recognition terminal 102, the terminal captures a masked face image or video stream to be detected and uploads it to the server 103 for recognition, analysis, or processing. The server 103 feeds the recognition result back to the image recognition terminal 102, which displays it in its display interface. The recognition result may be a judgment of whether a living or non-living body has been detected.
It should be noted that the face liveness detection method provided by the embodiments of the present disclosure may generally be executed by the server 103; accordingly, the face liveness detection apparatus provided by the embodiments of the present disclosure may generally be disposed in the server 103. The method may also be executed by a server or server cluster that is different from the server 103 and can communicate with the image recognition terminal 102 and/or the server 103; accordingly, the apparatus may also be disposed in such a server or server cluster. Alternatively, the method may be executed by the image recognition terminal 102, or by another recognition terminal different from the image recognition terminal 102; accordingly, the apparatus may also be disposed in the image recognition terminal 102 or in another recognition terminal different from it.
It should be understood that the numbers of mask-wearing users, image recognition terminals, and servers in fig. 1 are merely illustrative. There may be any number of mask-wearing users, image recognition terminals, and servers, as required by the implementation.
Fig. 2 schematically shows a flow chart of a face liveness detection method according to an embodiment of the present disclosure. Fig. 3 schematically illustrates an operational flow diagram of voting weight setting according to an embodiment of the present disclosure.
The method shown in fig. 2 will be described in detail with reference to fig. 3. In the embodiment of the present disclosure, the face live detection method may include operations S201 to S205.
In operation S201, when a mask-wearing user performs a mouth-opening action within a preset time range, the user's face images are tracked and collected to obtain a face video stream to be detected.
In the embodiment of the present disclosure, the preset time range may be 2-10 s. The video stream collected within this window, in which the mask-wearing user completes the mouth-opening liveness action, serves as the basis for the subsequent single-frame face image processing.
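As a rough sketch, the capture step could look as follows, assuming an OpenCV camera source; the duration and frame count here are example values chosen within the ranges stated in this embodiment (a 2-10 s window, fewer than 10 frames, an even N):

```python
import time
import cv2

def capture_face_video(duration_s=3.0, n_frames=8):
    """Collect N frames spread across the mouth-opening prompt window."""
    cap = cv2.VideoCapture(0)          # camera index 0 is an assumption
    frames = []
    interval = duration_s / n_frames
    try:
        for _ in range(n_frames):
            ok, frame = cap.read()
            if ok:
                frames.append(frame)   # one frame of the face video stream
            time.sleep(interval)       # spread frames across the window
    finally:
        cap.release()
    return frames
```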
In operation S202, each frame of face image of the face video stream to be detected is taken as input, the coordinates of the mandible feature point and the pupil feature point in each frame are marked by a facial feature detection method, and the number of mask feature points in each frame is recorded.
In order to extract facial feature points, the facial feature detection method adopted by an embodiment of the present disclosure may include: training and applying a deep learning model to each frame of face image, or classifying target features in each frame of face image with a classifier. Since using a classifier or a deep learning model to identify facial feature points is prior art in the field, it is not described here again.
In the embodiment of the present disclosure, the mandible feature point and the pupil feature point have the same abscissa, and the pupil feature point is taken at the midpoint between the left and right pupil feature points of the face.
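A minimal sketch of this landmark-marking step is given below. It assumes dlib's 68-point shape predictor purely for illustration: the patent does not mandate a particular detector, the model file name is hypothetical, and the pupil points are approximated by eye-landmark centroids. In practice a mask-robust landmark model would be needed, since standard predictors degrade on occluded faces.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# hypothetical model file; any landmark model exposing chin and eye points would do
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mark_feature_points(gray):
    """Return the ordinates (mandible_y, pupil_y) for one frame, or None."""
    faces = detector(gray)
    if not faces:
        return None
    pts = predictor(gray, faces[0])
    coords = np.array([[p.x, p.y] for p in pts.parts()])
    mandible = coords[8]                        # chin tip in the 68-point scheme
    left_pupil = coords[36:42].mean(axis=0)     # approximated as left-eye centroid
    right_pupil = coords[42:48].mean(axis=0)    # approximated as right-eye centroid
    pupil_mid = (left_pupil + right_pupil) / 2  # midpoint of the two pupil points
    return mandible[1], pupil_mid[1]            # only ordinates are needed later
```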
In operation S203, consecutive frame pairs of the face video stream to be detected are grouped to form a voting queue containing multiple groups of images, and the voting weight of each group of images is initialized to 0.
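Read as sliding pairs (frame 1 with frame 2, frame 2 with frame 3, and so on), the grouping yields the N-1 groups implied by the threshold formula used later; this pairing scheme is an interpretation, since the text only states that consecutive frames are grouped. A sketch:

```python
def build_voting_queue(frames):
    """Pair each frame with its successor; weights start at 0."""
    groups = list(zip(frames[:-1], frames[1:]))  # (earlier, later) frame pairs
    weights = [0] * len(groups)                  # voting weight per group, initialized to 0
    return groups, weights
```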
In operation S204, the difference in distance between the mandible feature point and the pupil feature point is calculated for each group of images; if the distance difference is greater than a preset distance threshold, the voting weight of the group is updated to a first weight. The difference in the number of mask feature points between the earlier frame and the later frame of each group is then calculated; if the number difference is not less than a preset number threshold, the voting weight of the group is updated to a second weight.
In the embodiment of the present disclosure, the first weight increases with an increase of the preset distance threshold, and the second weight increases with an increase of the preset number threshold.
In some embodiments, as shown in fig. 3, the difference in the number of mask feature points between the two frames of a group is calculated only when the distance difference has already exceeded the preset distance threshold; accordingly, the first weight is generally smaller than the second weight.
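A sketch of the per-group weight update following the Fig. 3 flow, where the mask-count check is reached only after the distance check passes, so each group ends with weight 0, w₁, or w₂. The weight and threshold values below are illustrative assumptions, and each frame is summarized by its mandible ordinate, pupil ordinate, and mask feature point count:

```python
W1, W2 = 1, 2          # first and second voting weights (assumed, with W1 < W2)
DIST_THRESH = 5.0      # preset distance threshold in pixels (assumed)
COUNT_THRESH = 2       # preset mask-feature-count threshold (assumed)

def update_weight(prev, nxt):
    """prev/nxt: dicts with 'mandible_y', 'pupil_y', 'mask_point_count'."""
    # |(y12 - y11) - (y22 - y21)|: change in the mandible-to-pupil distance
    delta_d = abs((prev["pupil_y"] - prev["mandible_y"])
                  - (nxt["pupil_y"] - nxt["mandible_y"]))
    if delta_d <= DIST_THRESH:
        return 0                       # distance check failed, weight stays 0
    weight = W1                        # distance check passed: first weight
    # middle-crease points decrease as the mouth opens under the mask
    if prev["mask_point_count"] - nxt["mask_point_count"] >= COUNT_THRESH:
        weight = W2                    # count check passed: second weight
    return weight
```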
Because the mandible feature point and the pupil feature point have the same abscissa but different ordinates, in the embodiment of the disclosure the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
In operation S205, liveness voting judgment is performed on the voting queue according to the voting weights of the multiple groups of images, and the liveness detection result of the face video stream to be detected is output.
The embodiments of the present disclosure address the inconvenience and low recognition accuracy of face detection for mask-wearing users. The liveness detection result of the face video stream to be detected is judged from the change in the distance between the mandible feature point and the pupil feature point, combined with the change in the number of mask feature points, while the mask-wearing user completes a mouth-opening action. This avoids the conventional mouth-opening liveness check that relies on changes in mouth feature points, can effectively distinguish live faces from non-live ones, and improves the accuracy of face liveness detection in face-occlusion scenarios such as mask wearing.
In addition, the method allows users who need to pass mouth-opening liveness detection in public places to be verified without removing their masks, saving the time spent taking the mask off and protecting public health and safety.
Fig. 4 schematically illustrates a mask feature point selection manner in a face image according to an embodiment of the present disclosure.
As shown in fig. 4, in the embodiment of the present disclosure, for each frame of face image of the face video stream to be detected, the mask feature points may include mask outer-edge points and mask middle-crease points, and several of each may be selected according to the available image-processing capability. Since the mask outer edge comprises upper, lower, left, and right edges, and both the outer edge and the middle creases usually appear as curves, points can be sampled uniformly at fixed intervals along these curves, which keeps the selection of mask feature points consistent across frames.
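A sketch of such fixed-interval sampling is shown below; it assumes the mask region has already been segmented into a binary image, which is outside what the patent specifies:

```python
import cv2
import numpy as np

def sample_mask_edge_points(mask_binary, step=15):
    """Take every step-th point along the largest contour of the mask region."""
    contours, _ = cv2.findContours(mask_binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=int)
    edge = max(contours, key=cv2.contourArea).reshape(-1, 2)
    return edge[::step]  # uniform spacing keeps the selection fixed per frame
```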
It can be understood that when a mask-wearing user completes a mouth-opening action, the mandible moves downward while the pupil positions generally change little; moreover, because the mouth opens under the mask, the mask is stretched, so the number of middle-crease points on the mask decreases.
Because each group consists of two consecutive frames, the difference in the number of mask feature points between the earlier frame and the later frame of each group can be calculated and used as a basis for detecting whether the user has performed a mouth-opening action.
It should be noted that each frame is a face image with a mask worn; fig. 4 shows only the mask portion of such an image for convenience in explaining how mask feature points are selected, and does not imply that the frames of the face video stream to be detected contain only a mask and no unoccluded facial regions.
Fig. 5 schematically illustrates an operation flowchart of the living body detection output according to an embodiment of the present disclosure.
As shown in FIG. 5, in the disclosed embodiment, operation S205 may include sub-operations S510-S520.
In operation S510, the voting sum of the voting queue is calculated according to the voting weights of the multiple groups of images.

In operation S520, it is determined whether the voting sum is greater than the preset voting-sum threshold, and if so, it is judged that a mouth-opening action of the user has been detected.
In the embodiment of the present disclosure, the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
In the embodiment of the present disclosure, the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; and η denotes a correction coefficient with a value between 0 and 1.
Optionally, the number of frames N of the face video stream is less than 10.
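Putting the two formulas together, a sketch of the final decision might look as follows; the η value is an assumed example within the stated 0-1 range:

```python
def liveness_vote(weights, w1, w2, eta=0.5):
    """weights: one voting weight per group (N - 1 groups for N frames)."""
    n = len(weights) + 1                 # number of frames N
    t = sum(weights)                     # voting sum T of the queue
    r = (n - 1) * (w1 + w2) * eta        # preset voting-sum threshold R
    return t > r                         # True: mouth-opening action detected
```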
The above is merely an exemplary description, and the present embodiment is not limited thereto. For example, in some embodiments, before the step of taking each frame of face image of the face video stream to be detected as input, the method may further include: preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing includes at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
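A sketch of this optional preprocessing chain, with every parameter value (rotation direction, target size, crop window, kernel size) chosen purely for illustration:

```python
import cv2

def preprocess(frame):
    frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)  # rotation, if the camera is sideways
    frame = cv2.resize(frame, (480, 640))               # scaling to a fixed size
    frame = frame[80:560, 40:440]                       # cropping toward the face region
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # grayscale conversion
    return cv2.GaussianBlur(gray, (5, 5), 0)            # filtering / denoising
```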
Based on the above face liveness detection method, the present disclosure also provides a face liveness detection apparatus, which is described in detail below with reference to fig. 6.
Fig. 6 schematically shows a block diagram of a face liveness detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the face liveness detection apparatus 500 may include an image acquisition module 510, a facial feature extraction module 520, a voting queue formation module 530, a voting weight setting module 540, and a liveness detection output module 550.
The image acquisition module 510 is configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected.

The facial feature extraction module 520 is configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame by a facial feature detection method, and record the number of mask feature points in each frame.

The voting queue formation module 530 is configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0.

The voting weight setting module 540 is configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group to a first weight; and to calculate the difference in the number of mask feature points between the earlier frame and the later frame of each group, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group to a second weight.

The liveness detection output module 550 is configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
It should be noted that the face liveness detection apparatus in the embodiments of the present disclosure corresponds to the face liveness detection method; for details of the apparatus, refer to the description of the method, which is not repeated here.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be divided into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the image capturing module 510, the facial feature extracting module 520, the voting queue forming module 530, the voting weight setting module 540, and the living body detection output module 550 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any several of them. Alternatively, at least one of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Fig. 7 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform the face liveness detection method according to the embodiments of the present disclosure.
In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. The processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 620, for example, may be a non-volatile computer-readable storage medium, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 620 may include a computer program 621, which computer program 621 may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 621 may include one or more program modules, for example, modules 621A, 621B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations thereof according to the actual situation, so that when these program modules are executed by the processor 610, the processor 610 can execute the method according to the embodiments of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the image acquisition module 510, the facial feature extraction module 520, the voting queue formation module 530, the voting weight setting module 540, and the liveness detection output module 550 may be implemented as a computer program module described with reference to fig. 7, which, when executed by the processor 610, may implement the corresponding operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the face liveness detection method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flowchart. When the computer program product runs in a computer system, the program code causes the computer system to implement the face liveness detection method provided by the embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A face liveness detection method, characterized by comprising the following steps:
when a mask-wearing user performs a mouth-opening action within a preset time range, tracking and collecting face images of the user to obtain a face video stream to be detected;
taking each frame of face image of the face video stream to be detected as input, marking the coordinates of the mandible feature point and the pupil feature point in each frame of face image by a facial feature detection method, and recording the number of mask feature points in each frame of face image;
grouping consecutive frame pairs of the face video stream to be detected to form a voting queue containing multiple groups of images, and initializing the voting weight of each group of images to 0;
calculating the difference in distance between the mandible feature point and the pupil feature point in each group of images, determining whether the distance difference is greater than a preset distance threshold, and if so, updating the voting weight of the group of images to a first weight; calculating the difference in the number of mask feature points between the earlier frame and the later frame in each group of images, determining whether the number difference is not less than a preset number threshold, and if so, updating the voting weight of the group of images to a second weight; and
performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and outputting a liveness detection result of the face video stream to be detected.
2. The method of claim 1, wherein the preset time range is 2-10 s.
3. The method according to claim 1, wherein the mask feature points in each frame of face image comprise: mask outer-edge points and mask middle-crease points.
4. The method of claim 1, wherein the first weight increases with increasing the preset distance threshold;
the second weight increases with an increase of the preset number threshold.
5. The method according to claim 1, wherein the mandible feature point and the pupil feature point have the same abscissa, and the coordinates of the pupil feature point are the coordinates of the midpoint between the left and right pupil feature points of the face.
6. The method of claim 5, wherein the difference in distance between the mandible feature point and the pupil feature point in each group of images is calculated by the following formula:

Δd = |(y₁₂ - y₁₁) - (y₂₂ - y₂₁)|

where Δd denotes the difference in distance between the mandible feature point and the pupil feature point in each group of images; y₁₁ denotes the ordinate of the mandible feature point in the earlier frame of each group; y₁₂ denotes the ordinate of the pupil feature point in the earlier frame; y₂₁ denotes the ordinate of the mandible feature point in the later frame; and y₂₂ denotes the ordinate of the pupil feature point in the later frame.
7. The method according to claim 1, wherein performing liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images and outputting a liveness detection result of the face video stream to be detected comprises:
calculating the voting sum of the voting queue according to the voting weights of the multiple groups of images; and
determining whether the voting sum is greater than a preset voting-sum threshold, and if so, judging that a mouth-opening action of the user has been detected.
8. The method of claim 7, wherein the voting sum of the voting queue is calculated by the following formula:

T = ∑ wᵢ (summed over i = 1, …, N-1)

where T denotes the voting sum of the voting queue; N denotes the number of frames of the face video stream and is an even number; and wᵢ denotes the voting weight of the i-th group of images.
9. The method of claim 7, wherein the preset voting-sum threshold is set by the following formula:

R = (N-1) × (w₁ + w₂) × η

where R denotes the preset voting-sum threshold; N denotes the number of frames of the face video stream; η denotes a correction coefficient with a value between 0 and 1; w₁ denotes the first weight; and w₂ denotes the second weight.
10. The method of claim 9, wherein the number of frames N of the face video stream is less than 10.
11. The method of claim 1, wherein the facial feature detection method comprises:
training and applying a deep learning model to each frame of face image; or
classifying target features in each frame of face image with a classifier.
12. The method of claim 1, wherein before the step of taking each frame of face image of the face video stream to be detected as input, the method further comprises:
preprocessing each frame of face image of the face video stream to be detected, wherein the preprocessing comprises at least one of rotation, scaling, cropping, grayscale conversion, or filtering.
13. A face liveness detection apparatus, characterized by comprising:
an image acquisition module, configured to track and collect face images of a mask-wearing user when the user performs a mouth-opening action within a preset time range, to obtain a face video stream to be detected;
a facial feature extraction module, configured to take each frame of face image of the face video stream to be detected as input, mark the coordinates of the mandible feature point and the pupil feature point in each frame of face image by a facial feature detection method, and record the number of mask feature points in each frame of face image;
a voting queue formation module, configured to group consecutive frame pairs of the face video stream to be detected into a voting queue containing multiple groups of images, and to initialize the voting weight of each group of images to 0;
a voting weight setting module, configured to calculate the difference in distance between the mandible feature point and the pupil feature point in each group of images, determine whether the distance difference is greater than a preset distance threshold, and if so, update the voting weight of the group of images to a first weight, and to calculate the difference in the number of mask feature points between the earlier frame and the later frame in each group of images, determine whether the number difference is not less than a preset number threshold, and if so, update the voting weight of the group of images to a second weight; and
a liveness detection output module, configured to perform liveness voting judgment on the voting queue according to the voting weights of the multiple groups of images, and to output a liveness detection result of the face video stream to be detected.
14. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-12.
15. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 12 when executed.
CN202110700867.8A 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium Active CN113420667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110700867.8A CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110700867.8A CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Publications (2)

Publication Number Publication Date
CN113420667A CN113420667A (en) 2021-09-21
CN113420667B true CN113420667B (en) 2022-08-02

Family

ID=77716475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110700867.8A Active CN113420667B (en) 2021-06-23 2021-06-23 Face liveness detection method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN113420667B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971841A (en) * 2021-10-28 2022-01-25 Beijing SenseTime Technology Development Co., Ltd. Living body detection method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106027543A (en) * 2016-06-23 2016-10-12 北京孔方同鑫科技有限公司 Identification method and apparatus based on weight calculation
CN111414831A (en) * 2020-03-13 2020-07-14 深圳市商汤科技有限公司 Monitoring method and system, electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8942438B2 (en) * 2010-07-19 2015-01-27 The University Of Maryland, College Park Method and apparatus for authenticating swipe biometric scanners
US10331942B2 (en) * 2017-05-31 2019-06-25 Facebook, Inc. Face liveness detection
CN108171215B (en) * 2018-01-25 2023-02-03 河南大学 Face camouflage detection and camouflage type detection method based on low-rank variation dictionary and sparse representation classification
CN111898569B (en) * 2020-08-05 2023-05-09 福建工程学院 Face identification method based on living body detection
CN112906571B (en) * 2021-02-20 2023-09-05 成都新希望金融信息有限公司 Living body identification method and device and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106027543A (en) * 2016-06-23 2016-10-12 北京孔方同鑫科技有限公司 Identification method and apparatus based on weight calculation
CN111414831A (en) * 2020-03-13 2020-07-14 深圳市商汤科技有限公司 Monitoring method and system, electronic device and storage medium

Also Published As

Publication number Publication date
CN113420667A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN109117827B (en) Video-based method for automatically identifying wearing state of work clothes and work cap and alarm system
US20190130580A1 (en) Methods and systems for applying complex object detection in a video analytics system
US20170344830A1 (en) System and method for automatic detection of spherical video content
CN108229376B (en) Method and device for detecting blinking
CN110945522B (en) Learning state judging method and device and intelligent robot
US20190138748A1 (en) Removing personally identifiable data before transmission from a device
JP6397581B2 (en) Congestion status visualization device, congestion status visualization system, congestion status visualization method, and congestion status visualization program
CN105160318A (en) Facial expression based lie detection method and system
CN105426827A (en) Living body verification method, device and system
CN107844742B (en) Facial image glasses minimizing technology, device and storage medium
CN108197318A (en) Face identification method, device, robot and storage medium
CN112101123B (en) Attention detection method and device
Huszár et al. Live spoofing detection for automatic human activity recognition applications
CN111259763A (en) Target detection method and device, electronic equipment and readable storage medium
CN111259757B (en) Living body identification method, device and equipment based on image
CN113420667B (en) Face living body detection method, device, equipment and medium
Eyiokur et al. A survey on computer vision based human analysis in the COVID-19 era
KR102248706B1 (en) System for intergrated education management based on intelligent image analysis technology and method thereof
CN108197608A (en) Face identification method, device, robot and storage medium
CN114549371A (en) Image analysis method and device
CN105844204A (en) Method and device for recognizing behavior of human body
CN105809183A (en) Video-based human head tracking method and device thereof
CN111860057A (en) Face image blurring and living body detection method and device, storage medium and equipment
EP4198772A1 (en) Method and device for making music recommendation
CN115578668A (en) Target behavior recognition method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant