WO2021179719A1 - Face detection method, apparatus, medium, and electronic device - Google Patents

Face detection method, apparatus, medium, and electronic device

Info

Publication number
WO2021179719A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
video stream
stream data
shaking
coordinates
Prior art date
Application number
PCT/CN2020/135548
Other languages
French (fr)
Chinese (zh)
Inventor
蔡中印
陆进
陈斌
宋晨
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021179719A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; localisation; normalisation
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Definitions

  • This application relates to the field of artificial intelligence as applied to face recognition, and in particular to a face liveness detection method, apparatus, medium, and electronic device.
  • Action-based liveness detection is one of the important means of liveness detection. Several actions are randomly selected from head shaking, nodding, opening and closing the mouth, opening and closing the eyes, and so on, and instructions are sent to the user; the user performs the corresponding actions in front of the camera, and the video recorded by the camera is then obtained and analyzed to produce the detection result. Shaking the head is one of the key actions in action-based liveness detection.
  • However, a new attack on liveness detection has emerged: following the instructions, an attacker shakes a sheet of paper or a head model bearing a face to simulate the head-shaking action. Current liveness detection methods cannot identify this tactic, resulting in low detection accuracy and high security risk.
  • The purpose of this application is to provide a face liveness detection method, apparatus, medium, and electronic device.
  • According to one aspect of this application, a face liveness detection method is provided. The method includes: inputting the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, to obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data, and the eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken; and determining, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • According to another aspect of this application, a face liveness detection apparatus is provided. The apparatus includes: an input module configured to input the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, to obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data, and the eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken; and a judgment module configured to determine, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • According to another aspect of this application, a computer-readable storage medium is provided. It stores computer-readable instructions that, when executed by a computer, cause the computer to perform the following method: inputting the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, to obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data, and the eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken; and determining, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • According to another aspect of this application, an electronic device is provided. It includes a processor and a memory storing computer-readable instructions that, when executed by the processor, implement the following method: inputting the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, to obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data, and the eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken; and determining, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • This application uses a face key point detection model combined with an eye gaze offset vector output layer to compute the eye gaze offset vector corresponding to the face region image, and uses that vector for face liveness detection. During liveness detection, fraud that shakes a sheet of paper or a head model bearing a face can therefore be identified, improving the accuracy of liveness detection and reducing security risk.
  • Fig. 1 is a schematic diagram of a system architecture for a face liveness detection method according to an exemplary embodiment.
  • Fig. 2 is a flowchart of a face liveness detection method according to an exemplary embodiment.
  • Fig. 3 is a schematic diagram of at least part of the structure of the preset recognition model used in a face liveness detection method according to an exemplary embodiment.
  • Fig. 4 is a flowchart of the steps preceding step 240 of the embodiment corresponding to Fig. 2.
  • Fig. 5 is a block diagram of a face liveness detection apparatus according to an exemplary embodiment.
  • Fig. 6 is a block diagram of an example electronic device implementing the above face liveness detection method according to an exemplary embodiment.
  • Fig. 7 shows a computer-readable storage medium implementing the above face liveness detection method according to an exemplary embodiment.
  • The technical solution of this application can be applied in the fields of artificial intelligence, smart cities, blockchain, and/or big data to realize liveness detection.
  • The data involved in this application, such as video stream data and/or face region images, may be stored in a database or in a blockchain (for example, via distributed blockchain storage); this application does not limit this.
  • Face liveness detection mainly refers to the process of judging, from a recorded video containing a face, whether that face is a live one.
  • Face liveness detection is one of the important technical means in the field of identity verification.
  • Action-based liveness detection is an important part of face liveness detection.
  • During action-based liveness detection, the user performs actions as directed by voice or text instructions, mainly head shaking, nodding, opening and closing the mouth, and opening and closing the eyes; alternatively, no instructions are issued and the user's actions are observed at random.
  • The implementation terminal of this application can be any device with computing, processing, and storage capabilities.
  • The device can be connected to external devices to receive or send data.
  • It can be a portable mobile device, such as a smartphone, tablet, notebook computer, or PDA (Personal Digital Assistant); a fixed device, such as computer equipment, a field terminal, a desktop computer, a server, or a workstation; or a collection of devices, such as the physical infrastructure of a cloud computing platform or a server cluster.
  • Optionally, the implementation terminal of this application is a server or the physical infrastructure of a cloud computing platform.
  • Fig. 1 is a schematic diagram of a system architecture for a face liveness detection method according to an exemplary embodiment.
  • The system architecture includes a server 110 and a mobile terminal 120.
  • The mobile terminal 120 may be, for example, a smartphone.
  • The mobile terminal 120 is connected to the server 110 through a communication link, so it can send data to, and receive data from, the server 110.
  • The server 110 runs a server-side program and hosts the preset recognition model; client software is installed and running on the mobile terminal 120. The server 110 is the implementation terminal in this embodiment.
  • A specific process may be as follows: by operating the client software on the mobile terminal 120, the user records head-shake video stream data and uploads it to the server 110; after receiving it, the server 110 runs the server-side program to extract the face region images from the head-shake video stream data; the server 110 then inputs the face region images into the preset recognition model to obtain the face key point coordinates and eye gaze offset vectors output by the model; finally, the server 110 uses those outputs to judge, and to output, the detection result for the current stage of liveness detection.
  • Fig. 1 shows only one embodiment of this application.
  • Although in this embodiment the implementation terminal is a server and the terminal providing the head-shake video stream data is a mobile terminal, in other embodiments or in practice the implementation terminal and the terminal providing the head-shake video stream data can be any of the terminals or devices described above.
  • Likewise, although in this embodiment the head-shake video stream data is sent from a terminal other than the implementation terminal, it can in fact be obtained directly by the local terminal.
  • This application does not limit this, and the protection scope of this application should not be restricted in any way.
  • Fig. 2 is a flowchart of a face liveness detection method according to an exemplary embodiment.
  • The face liveness detection method provided in this embodiment can be executed by a server and, as shown in Fig. 2, includes the following steps.
  • Step 240: Input the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, and obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model.
  • The preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer.
  • The face key point detection model includes convolutional layers.
  • The eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, and the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data.
  • The eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken.
  • The face key point coordinates and the eye gaze offset vector correspond to each face image frame; that is, for each face image frame there is a corresponding set of face key point coordinates and an eye gaze offset vector.
  • The eye gaze offset vector has a direction and a length: for example, gaze toward the left is positive and toward the right is negative, and the length can be defined as the normalized relative deviation of the pupil from the center of the eye socket.
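  • As a concrete illustration of that convention, here is a minimal sketch; the function name and the use of 2D landmark coordinates are illustrative assumptions, not details given by the application:

```python
import numpy as np

def eye_gaze_offset(pupil: np.ndarray, socket_left: np.ndarray,
                    socket_right: np.ndarray) -> float:
    """Signed, normalized offset of the pupil from the eye-socket center.

    Positive when the gaze points toward the left socket corner, negative
    toward the right; the magnitude is the pupil's displacement normalized
    by half the socket width, matching the convention described above.
    """
    center = (socket_left + socket_right) / 2.0
    half_width = np.linalg.norm(socket_left - socket_right) / 2.0
    axis = (socket_left - socket_right) / (2.0 * half_width)  # unit "left" axis
    return float(np.dot(pupil - center, axis)) / half_width

# Usage: a pupil one tenth of a half-socket-width left of center -> 0.1
# eye_gaze_offset(np.array([31., 20.]), np.array([40., 20.]), np.array([20., 20.]))
```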
  • Fig. 3 is a schematic diagram of at least part of the structure of the preset recognition model used in a face liveness detection method according to an exemplary embodiment.
  • The preset recognition model 300 includes at least a face key point detection model 310 and an eye gaze offset vector output layer 320.
  • The part framed by the dashed line is the face key point detection model 310, consisting of the convolutional layer 311 and the output part 312 that follows it.
  • The convolutional layer 311 can be a stack of multiple neural network layers.
  • The output part 312 ultimately outputs the face key point coordinates.
  • Other structures may exist before the convolutional layer 311 and between the network layers within it.
  • The eye gaze offset vector output layer 320 receives the output of the last convolutional layer and finally outputs the eye gaze offset vector corresponding to the face image frame.
  • The eye gaze offset vector output layer 320 is usually a fully connected layer.
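  • A minimal sketch of that structure follows, in PyTorch; the channel widths, the number of key points, and the two-component gaze output are illustrative assumptions rather than values given by the application:

```python
import torch
import torch.nn as nn

class PresetRecognitionModel(nn.Module):
    """Sketch of the structure in Fig. 3: a convolutional face key point
    detection backbone whose last convolutional layer also feeds a fully
    connected eye gaze offset vector output layer."""

    def __init__(self, num_keypoints: int = 68):
        super().__init__()
        # Convolutional layer 311: a stacked multilayer structure.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Output part 312: regresses (x, y) for each face key point.
        self.keypoint_head = nn.Linear(128, num_keypoints * 2)
        # Output layer 320: fully connected, fed by the last conv layer,
        # producing the eye gaze offset vector (here one value per eye).
        self.gaze_head = nn.Linear(128, 2)

    def forward(self, face_region: torch.Tensor):
        features = self.conv(face_region).flatten(1)
        return self.keypoint_head(features), self.gaze_head(features)
```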
  • Fig. 4 is a flowchart of the steps preceding step 240 of the embodiment corresponding to Fig. 2. Referring to Fig. 4, these include the following steps.
  • Step 210: Deframe the head-shake video stream data to be subjected to liveness detection, obtaining the face image frames corresponding to that video stream data.
  • Deframing the head-shake video stream data is the process of splitting it into individual face image frames.
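  • For example, a minimal deframing sketch using OpenCV (the function name is an assumption):

```python
import cv2

def deframe(video_path: str) -> list:
    """Split head-shake video stream data into its image frames."""
    frames = []
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:          # end of the video stream
            break
        frames.append(frame)
    capture.release()
    return frames
```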
  • In one embodiment, before deframing the head-shake video stream data to be subjected to liveness detection, the method further includes: obtaining, from a user terminal, the head-shake video stream data to be subjected to liveness detection.
  • In one embodiment, before obtaining the head-shake video stream data from the user terminal, the method further includes: randomly selecting one preset action instruction from a plurality of preset action instructions, which include shaking the head, and sending the selected instruction to the user terminal. The head-shake video stream data is then obtained from the user terminal on the condition that the selected preset action instruction is to shake the head.
  • Step 220: Input a face image frame into a preset face detection model, and obtain the face detection frame coordinates corresponding to that face image frame.
  • The pixel area of a face image frame may be very large, while the face may occupy only a small part of it; to detect the face accurately, the region of the frame corresponding to the face must be identified specifically.
  • The face detection frame coordinates are the position coordinates of the region corresponding to the face within the face image frame.
  • The preset face detection model outputs the corresponding face detection frame coordinates for an input face image frame.
  • The preset face detection model can be implemented with various algorithms or principles, for example general machine learning algorithms or deep learning algorithms.
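  • As one possible stand-in for such a model, a classical detector can produce face detection frame coordinates; OpenCV's bundled Haar cascade is used here purely as an example, since the application does not prescribe a specific detector:

```python
import cv2

# A stand-in for the preset face detection model (illustrative only).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_box(frame):
    """Return (x, y, w, h) for the most prominent face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])   # largest detection
```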
  • Step 230: Extract a face region image from the face image frame according to the face detection frame coordinates.
  • In one embodiment, extracting the face region image from the face image frame according to the face detection frame coordinates includes: determining, in the face image frame, the first face detection frame area corresponding to the face detection frame coordinates; expanding the first face detection frame area by a predetermined expansion ratio to obtain a second face detection frame area; and extracting the face region image from the range defined by the second face detection frame area.
  • The first face detection frame area can be a rectangle, and the face detection frame coordinates are coordinates that uniquely determine that rectangle: they can be the coordinates of the rectangle's four vertices, which fix its range, or the coordinates of the intersection of its two diagonals, which, together with a known length and width, also determine the corresponding rectangle.
  • The predetermined expansion ratio is the proportion by which the coverage area is enlarged beyond the original area.
  • It can be any predetermined ratio, such as 20%.
  • The expansion of the first face detection frame area can be performed in various ways or directions, for example from the center outward, to the left and right, up and down, toward the upper right, toward the lower left, and so on. After the expansion, the second face detection frame area is larger than the first face detection frame area.
  • The face region image is therefore not extracted directly from the range defined by the first face detection frame area; instead, the first face detection frame area is first expanded into the second face detection frame area, and the face region image is extracted from the range the latter defines. This makes the extracted face region image large enough to retain more information about the face, which improves the liveness detection result to a certain extent.
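  • As an illustration, a center-outward expansion by the 20% ratio mentioned above might look like the following sketch; the function name and the clipping to image bounds are assumptions:

```python
def expand_box(x1, y1, x2, y2, ratio=0.20, width=None, height=None):
    """Expand a face detection box from its center by the given ratio
    (20% here), optionally clipped to the image bounds."""
    dw = (x2 - x1) * ratio / 2.0
    dh = (y2 - y1) * ratio / 2.0
    x1, y1, x2, y2 = x1 - dw, y1 - dh, x2 + dw, y2 + dh
    if width is not None:
        x1, x2 = max(0, x1), min(width, x2)
    if height is not None:
        y1, y2 = max(0, y1), min(height, y2)
    return x1, y1, x2, y2

# Usage: crop the second (expanded) face detection frame area.
# x1, y1, x2, y2 = expand_box(100, 80, 220, 240, 0.20, frame_w, frame_h)
# face_region = frame[int(y1):int(y2), int(x1):int(x2)]
```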
  • In one embodiment, inputting the face image frames into the preset face detection model to obtain the face detection frame coordinates includes: inputting each face image frame into the preset face detection model to obtain the face detection frame coordinates corresponding to each frame; and extracting the face region images according to the face detection frame coordinates includes: extracting a face region image from each face image frame according to that frame's face detection frame coordinates.
  • Extracting the face region image from the face image frame is a matting (cropping) operation within the frame.
  • In this embodiment, every face region image is obtained by first determining the face detection frame coordinates with the preset face detection model and then extracting according to those coordinates.
  • In another embodiment, inputting the face image frames into the preset face detection model to obtain face detection frame coordinates includes: inputting at least one face image frame into the preset face detection model to obtain the first face detection frame coordinates corresponding to each such frame; and extracting face region images according to face detection frame coordinates includes: extracting, for each set of first face detection frame coordinates, the corresponding first face region image from its face image frame; inputting each first face region image into the preset recognition model to obtain its face key point coordinates and eye gaze offset vector; determining the circumscribed rectangle of the face corresponding to each first face region image's face key point coordinates; determining, from the circumscribed rectangles and a preset estimation algorithm, the second face detection frame coordinates corresponding to the face image frames that follow the at least one face image frame; and extracting face region images from those frames according to the determined coordinates.
  • The circumscribed rectangle of the face is the rectangle that just covers the face area; at least some points on the edge of the face area lie on it.
  • The preset estimation algorithm may be any algorithm capable of estimating or calculating the motion state of the face, for example a Kalman filter.
  • The Kalman filter, also described by the Kalman motion equations, is an algorithm that uses the state equations of a linear system, together with observations of the system's inputs and outputs, to optimally estimate the system state. Specifically, by feeding the circumscribed rectangles of the faces in at least one preceding face image frame into the Kalman motion equations, the second face detection frame coordinates for the current or subsequent face image frames can be predicted.
  • Two methods are thus used to determine the face detection frame coordinates corresponding to a face image frame. For at least one initial face image frame, the frame is input into the preset face detection model to obtain the face detection frame coordinates, and the corresponding face region image is extracted accordingly. For the current or subsequent face image frames, the coordinates are determined from the previously extracted face region images: each is input into the preset recognition model to obtain face key point coordinates, the corresponding circumscribed rectangle of the face is determined from those coordinates, and finally the circumscribed rectangles are input into the preset estimation algorithm to determine the second face detection frame coordinates for the current and subsequent face image frames.
  • Compared with running the face detection model on every frame, this method consumes less computing resources and is more efficient.
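  • A minimal sketch of the estimation step follows, using OpenCV's Kalman filter over the box center and size; the constant-velocity state layout and the noise values are illustrative assumptions:

```python
import cv2
import numpy as np

def face_bounding_rect(keypoints: np.ndarray):
    """Circumscribed rectangle of the face: the tightest axis-aligned box
    covering the face key points (an N x 2 array); returns (x, y, w, h)."""
    return cv2.boundingRect(keypoints.astype(np.float32))

def make_box_kalman() -> cv2.KalmanFilter:
    """Constant-velocity Kalman filter over (cx, cy, w, h) and velocities."""
    kf = cv2.KalmanFilter(8, 4)
    kf.transitionMatrix = np.eye(8, dtype=np.float32)
    for i in range(4):                     # position += velocity each frame
        kf.transitionMatrix[i, i + 4] = 1.0
    kf.measurementMatrix = np.eye(4, 8, dtype=np.float32)
    kf.processNoiseCov = np.eye(8, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    return kf

# Correct with the circumscribed rectangles of the initial frame(s):
#   kf.correct(np.array([[cx], [cy], [w], [h]], np.float32))
# then predict the second face detection frame coordinates:
#   cx, cy, w, h = kf.predict()[:4, 0]
```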
  • Step 250: Determine, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • In one embodiment, the method further includes: when the current stage of liveness detection is passed, obtaining the face video stream data that follows the head-shake video stream data, and performing silent liveness detection on that face video stream data.
  • Various algorithms or models can be used to perform silent liveness detection on face video stream data.
  • During silent detection the person does not shake the head, and the position and angle of the face remain relatively unchanged.
  • Because the subsequent silent detection is performed only when the current stage of liveness detection is passed, the number of users reaching silent detection is far smaller than the number who would undergo silent detection alone.
  • The current stage of liveness detection therefore filters out a large number of users, which reduces resource consumption to a certain extent.
  • In one embodiment, the part of the preset recognition model related to the eye gaze offset vector output layer is trained as follows: obtain, from a sample data set, the normal face region images corresponding to normal head-shake video stream data and the paper-face region images corresponding to paper-face head-shake video stream data, the sample data set including multiple instances of each; input the normal face region images and the paper-face region images into the preset recognition model to obtain the face key point coordinates and eye gaze offset vectors output by the model for each; use the face key point coordinate sequences corresponding to the normal and paper-face head-shake video stream data to determine the face shaking degree sequence corresponding to each; for each normal and each paper-face head-shake video stream data, determine the face key point coordinates whose face shaking degree falls within a predetermined face shaking degree range as the first target face key point coordinates, and determine the score corresponding to that video stream data from the first target face key point coordinates and the eye gaze offset vectors; determine the score threshold using these scores; and train the preset recognition model based on the score threshold.
  • Determining whether the head-shake video stream data passes the current stage of liveness detection according to the face key point coordinates and eye gaze offset vector corresponding to each face image frame includes: from the face key point coordinates corresponding to the face image frames, determining those whose face shaking degree falls within the predetermined face shaking degree range as the second target face key point coordinates; determining, from the second target face key point coordinates and the eye gaze offset vectors, the score corresponding to the head-shake video stream data under liveness detection; and, if the score reaches the score threshold, determining that the current stage of liveness detection is passed, and otherwise that it is not.
  • The score of normal head-shake video stream data is generally greater than the score of paper-face head-shake video stream data.
  • In one embodiment, determining the score threshold using the scores includes: determining the score threshold from the scores corresponding to the normal head-shake video stream data, such that exactly a predetermined proportion of those scores reaches the threshold. Training the preset recognition model based on the score threshold then includes: determining the ratio of the number of paper-face head-shake videos whose score is below the score threshold to the total number of paper-face head-shake videos, and training the preset recognition model according to that ratio.
  • This ratio measures the proportion of all paper-face head-shake video stream data that is correctly identified as such, i.e. the correct rejection rate, which training seeks to increase.
  • The scores can also be used in other ways to determine the score threshold.
  • For example, the smallest value among a predetermined proportion of the scores, ordered from small to large, can be used as the score threshold; or the threshold can be chosen so that a predetermined proportion of the scores corresponding to the paper-face head-shake video stream data fails to reach it.
  • In one embodiment, the score ranked at the 99% position, counting from the largest downward, is used as the score threshold.
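  • A minimal sketch of such a percentile-style threshold and of the correct rejection rate described above follows; the function names and the NumPy implementation are assumptions:

```python
import numpy as np

def score_threshold(normal_scores, pass_ratio: float = 0.99) -> float:
    """Threshold such that exactly the predetermined proportion of normal
    (genuine) head-shake videos reach it, e.g. the score at the 99% rank
    counting from the largest, as in the example above."""
    ranked = np.sort(np.asarray(normal_scores, dtype=float))[::-1]
    return float(ranked[int(np.ceil(pass_ratio * len(ranked))) - 1])

def correct_rejection_rate(paper_scores, threshold: float) -> float:
    """Proportion of paper-face head-shake videos scoring below the
    threshold, i.e. the ratio that training seeks to increase."""
    return float((np.asarray(paper_scores, dtype=float) < threshold).mean())
```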
  • The face shaking degree is expressed as an angle and measures the size of the head-shake.
  • Changes in the face key point coordinates reflect the face shaking degree, so the corresponding face shaking degree sequence can be determined from the face key point coordinate sequence.
  • Determining the face shaking degree sequence from the face key point coordinate sequence can be implemented with various algorithms or models.
  • The predetermined face shaking degree range may be, for example, within 15 degrees.
  • Each normal face region image or paper-face region image corresponds to one face shaking degree.
  • All the face shaking degrees corresponding to the normal face region images constitute a face shaking degree sequence.
  • Likewise, the face shaking degrees corresponding to the paper-face region images in the paper-face head-shake video stream data form a face shaking degree sequence.
  • Both the normal and the paper-face head-shake video stream data are time-ordered sets of face image frames, so the corresponding normal face region images and paper-face region images take the form of image sequences, and the face key point coordinates corresponding to them can likewise exist as sequences.
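  • By way of illustration only, one simple geometric proxy for the per-frame shaking degree can be derived from the key points; the mapping below is an assumption, not the application's prescribed algorithm:

```python
import numpy as np

def shake_degree(left_eye, right_eye, nose_tip) -> float:
    """Approximate head yaw in degrees from how far the nose tip sits
    off-center between the outer eye corners (a rough proxy)."""
    left_eye, right_eye = np.asarray(left_eye), np.asarray(right_eye)
    eye_center = (left_eye + right_eye) / 2.0
    half_span = np.linalg.norm(right_eye - left_eye) / 2.0
    ratio = np.clip((nose_tip[0] - eye_center[0]) / half_span, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Applying this frame by frame to the face key point coordinate sequence
# yields the face shaking degree sequence discussed above.
```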
  • In summary, a face key point detection model combined with an eye gaze offset vector output layer is used to compute the eye gaze offset vector corresponding to the face region image, and that vector is used for face liveness detection. During liveness detection, fraud that shakes a sheet of paper or a head model bearing a face can therefore be identified, improving the accuracy of liveness detection and reducing security risk.
  • This application also provides a face liveness detection apparatus; the following are the apparatus embodiments of this application.
  • Fig. 5 is a block diagram of a face liveness detection apparatus according to an exemplary embodiment.
  • As shown in Fig. 5, the apparatus 500 includes: an input module 510 configured to input the face region image corresponding to the head-shake video stream data to be subjected to liveness detection into a preset recognition model, obtaining the face key point coordinates and eye gaze offset vector output by the preset recognition model, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vector each correspond to the face image frames included in the head-shake video stream data, and the eye gaze offset vector measures the degree to which the eyes' gaze deviates while the head is shaken; and a judgment module 520 configured to determine, according to the face key point coordinates and eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection.
  • This application also provides an electronic device capable of implementing the above method.
  • The electronic device 600 according to this embodiment of the application is described below with reference to Fig. 6.
  • The electronic device 600 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of this application.
  • The electronic device 600 is represented in the form of a general-purpose computing device.
  • The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
  • The storage unit 620 stores program code that can be executed by the processing unit 610, causing the processing unit 610 to perform the steps of the various exemplary embodiments described in the "Exemplary Method" section of this specification.
  • The storage unit 620 may include readable media in the form of volatile storage, such as a random access memory (RAM) 621 and/or a cache 622, and may further include a read-only memory (ROM) 623.
  • The storage unit 620 may also include a program/utility 624 having a set of (at least one) program modules 625, which include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, the processing unit, or a local bus using any of a variety of bus architectures.
  • The electronic device 600 may also communicate with one or more external devices 800 (such as keyboards, pointing devices, and Bluetooth devices), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650, for example with the display unit 640.
  • The electronic device 600 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 660.
  • The network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630.
  • Other hardware and/or software modules can be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • The example embodiments described here can be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of this application can therefore be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to execute the method according to the embodiments of this application.
  • This application also provides a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a computer, the computer performs the method described above in this specification.
  • The storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
  • Aspects of this application can also be implemented as a program product that includes program code.
  • When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments described in the "Exemplary Method" section of this specification.
  • A program product 700 for implementing the above method according to an embodiment of this application may use a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • However, the program product of this application is not limited to this.
  • A readable storage medium can be any tangible medium that contains or stores a program for use by, or in combination with, an instruction execution system, apparatus, or device.
  • The program product can use any combination of one or more readable media.
  • A readable medium may be a readable signal medium or a readable storage medium.
  • A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code.
  • Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
  • The program code contained on a readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, and so on, or any suitable combination of the above.
  • The program code for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • The program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • The remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Abstract

Provided are a face liveness detection method, apparatus, medium, and electronic device. The method comprises: inputting the face region image corresponding to head-shake video stream data to be subjected to liveness detection into a preset recognition model, to obtain the face key point coordinates and the eye gaze offset vector output by the preset recognition model (240), the preset recognition model being a face key point detection model combined with an eye gaze offset vector output layer, and the eye gaze offset vector measuring the degree to which the eyes' gaze deviates while the head is shaken; and determining, according to the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shake video stream data passes the current stage of liveness detection (250). During liveness detection, the method can identify fraud that shakes paper or head models bearing human faces, improving the accuracy of liveness detection and reducing security risks.

Description

Face liveness detection method, apparatus, medium, and electronic device

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 12, 2020, with application number 202011086784.6 and the invention title "Face liveness detection method, apparatus, medium and electronic device", the entire contents of which are incorporated into this application by reference.

Technical Field

This application relates to the field of artificial intelligence as applied to face recognition, and in particular to a face liveness detection method, apparatus, medium, and electronic device.

Background

Action-based liveness detection is one of the important means of liveness detection. Several actions are randomly selected from head shaking, nodding, opening and closing the mouth, opening and closing the eyes, and so on, and instructions are sent to the user; the user performs the corresponding actions in front of the camera, and the video recorded by the camera is then obtained and analyzed to produce the detection result. Shaking the head is one of the key actions in action-based liveness detection. However, the inventor found that a new attack on liveness detection has emerged: following the instructions, an attacker shakes a sheet of paper or a head model bearing a face to simulate the head-shaking action. Current action-based liveness detection methods cannot identify this tactic, resulting in low detection accuracy and high security risk.
技术问题technical problem
在人工智能和人脸识别技术领域,为了解决上述技术问题,本申请的目的在于提供一种人脸活体检测方法、装置、介质及电子设备。In the field of artificial intelligence and face recognition technology, in order to solve the above technical problems, the purpose of this application is to provide a method, device, medium, and electronic equipment for detecting a living body of a human face.
技术解决方案Technical solutions
根据本申请的一方面,提供了一种人脸活体检测方法,所述方法包括:将待进行活体检测的人脸摇头视频流数据所对应的人脸区域图片输入至预设识别模型,得到由所述预设识别模型输出的人脸关键点坐标和人眼视线偏移矢量,其中,所述预设识别模型为结合了人眼视线偏移矢量输出层的人脸关键点检测模型,所述人脸关键点检测模型包括卷积层,所述人眼视线偏移矢量输出层与所述人脸关键点检测模型中卷积层的最后一层相连,所述人脸关键点坐标和所述人眼视线偏移矢量分别与所述人脸摇头视频流数据所包括的各人脸图像帧相对应,所述人眼视线偏移矢量用于衡量人脸摇头过程中人眼视线的偏移程度;根据与各人脸图像帧对应的所述人脸关键点坐标和所述人眼视线偏移矢量确定所述人脸摇头视频流数据是否通过当前阶段的活体检测。According to one aspect of the present application, there is provided a method for detecting a human face. The method includes: inputting a face region picture corresponding to a face shaking video stream data to be subjected to a living body detection into a preset recognition model to obtain The face key point coordinates and the human eye sight offset vector output by the preset recognition model, wherein the preset recognition model is a face key point detection model combined with the human eye sight offset vector output layer, The face key point detection model includes a convolutional layer, the human eye sight offset vector output layer is connected to the last layer of the convolution layer in the face key point detection model, and the face key point coordinates are The human eye sight deviation vector corresponds to each face image frame included in the face shaking video stream data, and the human eye sight deviation vector is used to measure the degree of deviation of the human eye sight during the face shaking process Determining whether the face shaking video stream data passes the current stage of living body detection according to the face key point coordinates corresponding to each face image frame and the eye sight offset vector.
根据本申请的另一方面,提供了一种人脸活体检测装置,所述装置包括:输入模块,被配置为将待进行活体检测的人脸摇头视频流数据所对应的人脸区域图片输入至预设识别模型,得到由所述预设识别模型输出的人脸关键点坐标和人眼视线偏移矢量,其中,所述预设识别模型为结合了人眼视线偏移矢量输出层的人脸关键点检测模型,所述人脸关键点检测模型包括卷积层,所述人眼视线偏移矢量输出层与所述人脸关键点检测模型中卷积层的最后一层相连,所述人脸关键点坐标和所述人眼视线偏移矢量分别与所述人脸摇头视频流数据所包括的各人脸图像帧相对应,所述人眼视线偏移矢量用于衡量人脸摇头过程中人眼视线的偏移程度;判断模块,被配置为根据与各人脸图像帧对应的所述人脸关键点坐标和所述人眼视线偏移矢量确定所述人脸摇头视频流数据是否通过当前阶段的活体检测。According to another aspect of the present application, there is provided a face living detection device, the device comprising: an input module configured to input the face area picture corresponding to the face shaking video stream data to be subjected to the living detection to A preset recognition model to obtain the key point coordinates of the face and the eye sight offset vector output by the preset recognition model, where the preset recognition model is a face combined with the human eye sight offset vector output layer A key point detection model, the face key point detection model includes a convolutional layer, the human eye sight offset vector output layer is connected to the last layer of the convolution layer in the face key point detection model, and the person The coordinates of the key points of the face and the eye sight offset vector correspond to each face image frame included in the face shaking video stream data, and the eye sight offset vector is used to measure the process of shaking the head of the face The degree of deviation of the human eye line of sight; the judgment module is configured to determine whether the face shaking video stream data passes according to the face key point coordinates corresponding to each face image frame and the human eye line of sight offset vector Live detection at the current stage.
根据本申请的另一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令,当所述计算机可读指令被计算机执行时,使计算机执行以下方法:将待进行活体检测的人脸摇头视频流数据所对应的人脸区域图片输入至预设识别模型,得到由所述预设识别模型输出的人脸关键点坐标和人眼视线偏移矢量,其中,所述预设识别模型为结合了人眼视线偏移矢量输出层的人脸关键点检测模型,所述人脸关键点检测模型包括卷积层,所述人眼视线偏移矢量输出层与所述人脸关键点检测模型中卷积层的最后一层相连,所述人脸关键点坐标和所述人眼视线偏移矢量分别与所述人脸摇头视频流数据所包括的各人脸图像帧相对应,所述人眼视线偏移矢量用于衡量人脸摇头过程中人眼视线的偏移程度;根据与各人脸图像帧对应的所述人脸关键点坐标和所述人眼视线偏移矢量确定所述人脸摇头视频流数据是否通过当前阶段的活体检测。According to another aspect of the present application, there is provided a computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by the computer, the computer executes the following method: The face region picture corresponding to the face shaking video stream data to be subjected to the live detection is input into the preset recognition model, and the key point coordinates of the face and the eye sight offset vector output by the preset recognition model are obtained, where The preset recognition model is a face key point detection model combined with a human eye sight offset vector output layer, the face key point detection model includes a convolutional layer, and the human eye sight offset vector output layer is The last layer of the convolutional layer in the face key point detection model is connected, and the face key point coordinates and the eye sight offset vector are respectively connected to each face included in the face shaking video stream data. Corresponding to the image frame, the human eye sight deviation vector is used to measure the degree of deviation of the human eye sight in the process of shaking the head of the face; according to the face key point coordinates and the human eye corresponding to each face image frame The sight offset vector determines whether the face shaking video stream data passes the current stage of living body detection.
根据本申请的另一方面,提供了一种电子设备,所述电子设备包括:处理器;存储器,所述存储器上存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,实现以下方法:将待进行活体检测的人脸摇头视频流数据所对应的人脸区域图片输入至预设识别模型,得到由所述预设识别模型输出的人脸关键点坐标和人眼视线偏移矢量,其中,所述预设识别模型为结合了人眼视线偏移矢量输出层的人脸关键点检测模型,所述人脸关键点检测模型包括卷积层,所述人眼视线偏移矢量输出层与所述人脸关键点检测模型中卷积层的最后一层相连,所述人脸关键点坐标和所述人眼视线偏移矢量分别与所述人脸摇头视频流数据所包括的各人脸图像帧相对应,所述人眼视线偏移矢量用于衡量人脸摇头过程中人眼视线的偏移程度;根据与各人脸图像帧对应的所述人脸关键点坐标和所述人眼视线偏移矢量确定所述人脸摇头视频流数据是否通过当前阶段的活体检测。According to another aspect of the present application, there is provided an electronic device, the electronic device including: a processor; , To implement the following method: input the face area picture corresponding to the face shaking video stream data to be subjected to the live detection into the preset recognition model, and obtain the key point coordinates of the face and the line of sight of the human eye output by the preset recognition model Offset vector, wherein the preset recognition model is a face key point detection model combined with an output layer of the human eye line of sight offset vector, the face key point detection model includes a convolutional layer, and the human eye line of sight is biased The shift vector output layer is connected to the last layer of the convolutional layer in the face key point detection model. The face key point coordinates and the human eye sight offset vector are respectively compared with the face shaking head video stream data. The included face image frames correspond to each other, and the human eye sight offset vector is used to measure the degree of deviation of the human eye sight during the process of shaking the head of the face; according to the face key point coordinates corresponding to each face image frame Determine whether the video stream data of the human face shaking head passes the current stage of the living body detection by using the sight deviation vector of the human eye.
Beneficial Effects

This application uses a face key point detection model combined with an eye gaze offset vector output layer to compute the eye gaze offset vector corresponding to the face region image, and uses that vector for face liveness detection. During liveness detection, fraud that shakes a sheet of paper or a head model bearing a face can therefore be identified, improving the accuracy of liveness detection and reducing security risk.

It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this application.
附图说明Description of the drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。The drawings herein are incorporated into the specification and constitute a part of the specification, show embodiments that conform to the application, and are used together with the specification to explain the principle of the application.
图1是根据一示例性实施例示出的一种人脸活体检测方法的系统架构示意图。Fig. 1 is a schematic diagram showing a system architecture of a method for detecting a human face according to an exemplary embodiment.
图2是根据一示例性实施例示出的一种人脸活体检测方法的流程图。Fig. 2 is a flow chart showing a method for detecting human face living according to an exemplary embodiment.
图3是根据一示例性实施例示出的用于人脸活体检测方法的预设识别模型的至少部分结构示意图。Fig. 3 is a schematic diagram showing at least part of the structure of a preset recognition model used in a method for detecting a human face according to an exemplary embodiment.
图4是根据图2对应实施例示出的一实施例的步骤240之前步骤的流程图。FIG. 4 is a flowchart of steps before step 240 of an embodiment shown in the embodiment corresponding to FIG. 2.
图5是根据一示例性实施例示出的一种人脸活体检测装置的框图。Fig. 5 is a block diagram showing a device for detecting human face living according to an exemplary embodiment.
图6是根据一示例性实施例示出的一种实现上述人脸活体检测方法的电子设备示例框图。Fig. 6 is a block diagram showing an example of an electronic device for realizing the above method for detecting a human face according to an exemplary embodiment.
图7是根据一示例性实施例示出的一种实现上述人脸活体检测方法的计算机可读存储介质。Fig. 7 shows a computer-readable storage medium for realizing the above-mentioned method for detecting human face living according to an exemplary embodiment.
本发明的实施方式Embodiments of the present invention
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。The exemplary embodiments will be described in detail here, and examples thereof are shown in the accompanying drawings. When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present application. On the contrary, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.
此外,附图仅为本申请的示意性图解,并非一定是按比例绘制。图中相同的附图标记表示相同或类似的部分,因而将省略对它们的重复描述。附图中所示的一些方框图是功能实体,不一定必须与物理或逻辑上独立的实体相对应。In addition, the drawings are only schematic illustrations of the application and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and thus their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities.
本申请的技术方案可应用于人工智能、智慧城市、区块链和/或大数据技术领域,以实现活体检测。可选的,本申请涉及的数据如视频流数据和/或人脸区域图片等可存储于数据库中,或者可以存储于区块链中,比如通过区块链分布式存储,本申请不做限定。The technical solution of the present application can be applied to the fields of artificial intelligence, smart city, blockchain and/or big data technology to realize living body detection. Optionally, the data involved in this application, such as video stream data and/or face area pictures, etc., can be stored in a database, or can be stored in a blockchain, such as distributed storage through a blockchain, which is not limited in this application .
This application first provides a face liveness detection method. Face liveness detection refers to the process of judging, from a recorded video containing a face, whether the face in that video is a live one; it is one of the key technical means in the field of identity verification, and action-based liveness detection is an important part of it. During action-based liveness detection, the user is prompted by voice, text, or other instructions to perform corresponding actions, mainly including shaking the head, nodding, opening and closing the mouth, and opening and closing the eyes; alternatively, no instruction is issued and the user's actions are observed at random. When the user is instructed to shake their head, an attacker may fraudulently complete the action by shaking a sheet of paper or a head model bearing a face and uploading a video of these actions. Related technical means cannot detect this situation, so such fraud easily passes liveness detection, posing a serious security risk. The face liveness detection method provided by this application can identify such fraud, thereby improving the accuracy of liveness detection and reducing losses.
The implementation terminal of this application may be any device with computing, processing, and storage capabilities, which may be connected to external devices to receive or send data. It may be a portable mobile device such as a smartphone, tablet computer, laptop, or PDA (Personal Digital Assistant); a fixed device such as computer equipment, a field terminal, a desktop computer, a server, or a workstation; or a collection of multiple devices, such as the physical infrastructure of a cloud computing platform or a server cluster.
Optionally, the implementation terminal of this application may be a server or the physical infrastructure of a cloud computing platform.
Fig. 1 is a schematic diagram of the system architecture of a face liveness detection method according to an exemplary embodiment. As shown in Fig. 1, the system architecture includes a server 110 and a mobile terminal 120; the mobile terminal 120 may be, for example, a smartphone. The mobile terminal 120 is connected to the server 110 through a communication link, so it can send data to and receive data from the server 110. The server 110 runs a server-side program and hosts the preset recognition model, while client software is installed and running on the mobile terminal 120; the server 110 is the implementation terminal in this embodiment. When the face liveness detection method provided by this application is applied to the system architecture shown in Fig. 1, a specific process may be as follows: the user operates the client software on the mobile terminal 120 to record head-shaking face video stream data and upload it to the server 110; after receiving the data, the server 110 runs the server-side program to extract face region pictures from the video stream; the server 110 then inputs the face region pictures into the preset recognition model to obtain the face key point coordinates and eye gaze offset vectors output by the model; finally, the server 110 uses these outputs to determine and output the detection result of the current stage of liveness detection.
It is worth mentioning that Fig. 1 is only one embodiment of the present application. Although in this embodiment the implementation terminal is a server and the terminal providing the head-shaking face video stream data is a mobile terminal, in other embodiments or practical applications both may be any of the terminals or devices described above. Likewise, although in this embodiment the head-shaking face video stream data is sent from a terminal other than the implementation terminal, it may in fact be obtained directly by the local terminal. This application imposes no limitation in this regard, and its scope of protection should not be restricted accordingly.
Fig. 2 is a flowchart of a face liveness detection method according to an exemplary embodiment. The face liveness detection method provided in this embodiment may be executed by a server and, as shown in Fig. 2, includes the following steps.
Step 240: input the face region pictures corresponding to the head-shaking face video stream data to be subjected to liveness detection into a preset recognition model, and obtain the face key point coordinates and eye gaze offset vectors output by the preset recognition model.
The preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer. The face key point detection model includes convolutional layers, and the eye gaze offset vector output layer is connected to the last of these convolutional layers. The face key point coordinates and the eye gaze offset vectors each correspond to the individual face image frames included in the head-shaking face video stream data, and the eye gaze offset vector measures the degree to which the eye gaze shifts while the head is shaken.
The face key point coordinates and the eye gaze offset vector correspond to each face image frame; that is, every face image frame has its own face key point coordinates and eye gaze offset vector.
An eye gaze offset vector has a direction and a length; for example, gaze toward the left may be taken as positive and gaze toward the right as negative. The length may be defined as the normalized relative distance by which the pupil deviates from the center of the eye socket.
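By way of a non-limiting illustration (this sketch is not part of the original disclosure), one way to obtain such a signed, normalized offset from pupil and eye-corner landmarks is shown below; the landmark layout and the sign convention (left positive) are assumptions:

```python
import numpy as np

def gaze_offset(pupil, eye_left_corner, eye_right_corner):
    """Signed, normalized horizontal offset of the pupil from the eye center.

    Positive values mean the pupil is shifted toward the left of the image,
    negative toward the right; the magnitude is normalized by half the eye
    width so it stays roughly within [-1, 1].
    """
    left = np.asarray(eye_left_corner, dtype=float)
    right = np.asarray(eye_right_corner, dtype=float)
    center = (left + right) / 2.0
    half_width = np.linalg.norm(right - left) / 2.0
    # Horizontal displacement of the pupil relative to the eye center.
    return float((center[0] - pupil[0]) / half_width)
```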
The applicant found that a real person looks at the phone while shaking their head, so the gaze shifts during the motion, whereas the gaze of a face on bent paper is relatively fixed. Because the paper bends and shakes, the gaze model's predictions exhibit some jitter, but this jitter is smaller than the gaze offset increments of real eyes looking left and right. A real face can therefore be distinguished from bent paper by comparing statistics of the gaze offset increments of real head-shaking faces with those of faces on bent paper.
Specifically, the structure of the preset recognition model is shown in Fig. 3. Fig. 3 is a schematic diagram of at least part of the structure of a preset recognition model used in a face liveness detection method according to an exemplary embodiment. As Fig. 3 shows, the preset recognition model 300 includes at least a face key point detection model 310 and an eye gaze offset vector output layer 320. The part framed by the dashed line is the face key point detection model 310, comprising convolutional layers 311 and an output part 312 after them; the convolutional layers 311 may be a stack of multiple neural network layers. The preset recognition model 300 receives a face image frame as input, and the output part 312 ultimately outputs the face key point coordinates. Of course, the preset recognition model 300 may include other structures before the convolutional layers 311 and between their individual layers. The eye gaze offset vector output layer 320 receives the input from the last convolutional layer and ultimately outputs the eye gaze offset vector corresponding to the face image frame; it is usually a fully connected layer.
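The following is a minimal PyTorch sketch of a two-headed model of this shape, given only for illustration; the backbone depth, channel counts, and the choice of 68 key points are assumptions rather than values specified by this application:

```python
import torch
import torch.nn as nn

class PresetRecognitionModel(nn.Module):
    """Face key point detector with an additional eye gaze offset head.

    Both heads read the features produced by the last convolutional layer,
    mirroring the structure described for model 300: the backbone plus
    keypoint head play the roles of parts 311/312, and the fully connected
    gaze head plays the role of layer 320.
    """

    def __init__(self, num_keypoints: int = 68):
        super().__init__()
        self.num_keypoints = num_keypoints
        # Convolutional layers (311): an illustrative small stack.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Output part (312): (x, y) coordinates for each key point.
        self.keypoint_head = nn.Linear(128, num_keypoints * 2)
        # Gaze offset vector output layer (320): a fully connected layer,
        # here producing one signed offset per eye.
        self.gaze_head = nn.Linear(128, 2)

    def forward(self, face_image: torch.Tensor):
        features = self.backbone(face_image).flatten(1)
        keypoints = self.keypoint_head(features).view(-1, self.num_keypoints, 2)
        gaze_offset = self.gaze_head(features)
        return keypoints, gaze_offset
```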
In one embodiment, the steps preceding step 240 may be as shown in Fig. 4. Fig. 4 is a flowchart of the steps preceding step 240 in an embodiment based on the embodiment corresponding to Fig. 2. Referring to Fig. 4, they are as follows.
Step 210: deframe the head-shaking face video stream data to be subjected to liveness detection to obtain the face image frames corresponding to that video stream data.
Deframing the head-shaking face video stream data is the process of splitting it into individual face image frames.
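A minimal sketch of this deframing step using OpenCV, assuming the received video stream has been saved to a file:

```python
import cv2

def deframe(video_path: str):
    """Split a recorded video stream into a list of BGR image frames."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream
            break
        frames.append(frame)
    capture.release()
    return frames
```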
In one embodiment, before deframing the head-shaking face video stream data to be subjected to liveness detection to obtain the corresponding face image frames, the method further includes: acquiring, from a user terminal, the head-shaking face video stream data to be subjected to liveness detection.
In one embodiment, before acquiring the head-shaking face video stream data from the user terminal, the method further includes: randomly selecting one preset action instruction from a plurality of preset action instructions, which include head-shaking, and sending the selected instruction to the user terminal; the head-shaking face video stream data is acquired from the user terminal on the condition that the selected preset action instruction is head-shaking.
Step 220: input the face image frames into a preset face detection model to obtain the face detection box coordinates corresponding to each face image frame.
The pixel area of a face image frame may be large, while the face itself may occupy only a part, or a small part, of it. For accurate face detection, it is therefore necessary to identify the region of the frame corresponding to the face specifically.
The face detection box coordinates are the position coordinates, within the face image frame, of the region corresponding to the face. The preset face detection model outputs the corresponding face detection box coordinates for an input face image frame; it may be implemented with various algorithms or principles, for example a general machine learning algorithm or a deep learning algorithm.
Step 230: extract a face region picture from the face image frame according to the face detection box coordinates.
In one embodiment, extracting a face region picture from the face image frame according to the face detection box coordinates includes: determining, in the face image frame, the first face detection box region corresponding to the face detection box coordinates; expanding the first face detection box region by a predetermined expansion ratio to obtain a second face detection box region; and extracting the face region picture based on the range delimited by the second face detection box region.
For example, the first face detection box region may be a rectangle, and the face detection box coordinates are coordinates that uniquely determine its extent. They may be the coordinates of the rectangle's four vertices, which suffice to fix the rectangle; or they may be the coordinates of the intersection of the rectangle's two diagonals, in which case the rectangle's extent can be determined from that intersection together with a preset length and width.
The predetermined expansion ratio is the proportion by which the coverage is enlarged relative to the original region; it may take various preset values, for example 20%. The expansion of the first face detection box region may be carried out in a variety of ways or directions, for example outward from the center, toward the left and right or top and bottom, or toward the upper right or lower left. After the expansion, the second face detection box region has a larger area than the first.
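A minimal sketch of a center-outward expansion, assuming boxes are given as (x, y, w, h) in pixels and using the 20% example ratio mentioned above:

```python
def expand_box(x, y, w, h, img_w, img_h, ratio=0.20):
    """Expand a detection box outward from its center by `ratio`,
    clamped to the image bounds."""
    dw, dh = w * ratio / 2, h * ratio / 2
    x0 = max(0.0, x - dw)
    y0 = max(0.0, y - dh)
    x1 = min(float(img_w), x + w + dw)
    y1 = min(float(img_h), y + h + dh)
    return x0, y0, x1 - x0, y1 - y0

# Cropping the face region picture from a frame:
# x0, y0, w0, h0 = expand_box(x, y, w, h, frame.shape[1], frame.shape[0])
# face_crop = frame[int(y0):int(y0 + h0), int(x0):int(x0 + w0)]
```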
In this embodiment, after the first face detection box region is determined from the face detection box coordinates, the face region picture is not extracted directly from the range it delimits; instead, the first face detection box region is first expanded into the second face detection box region, and the face region picture is extracted from the range delimited by the latter. This makes the extracted face region picture large enough to retain more information about the face, improving the liveness detection performance to some extent.
In one embodiment, inputting the face image frames into the preset face detection model to obtain the corresponding face detection box coordinates includes: inputting each face image frame into the preset face detection model to obtain the face detection box coordinates corresponding to that frame. Extracting face region pictures from the face image frames according to the face detection box coordinates then includes: extracting a face region picture from each face image frame according to the respective face detection box coordinates.
Extracting a face region picture from a face image frame is the process of cropping that frame. In this embodiment, every face region picture is obtained by first having the preset face detection model determine the face detection box coordinates and then extracting the picture according to those coordinates.
In one embodiment, inputting the face image frames into the preset face detection model to obtain the corresponding face detection box coordinates includes: inputting at least one face image frame into the preset face detection model to obtain the first face detection box coordinates corresponding to each such frame. Extracting face region pictures according to the face detection box coordinates then includes: extracting, for each set of first face detection box coordinates, the corresponding first face region picture from the corresponding face image frame; inputting each first face region picture into the preset recognition model to obtain its face key point coordinates and eye gaze offset vector; determining the face bounding rectangle corresponding to the face key point coordinates of each first face region picture; determining, from the face bounding rectangles and a preset estimation algorithm, the second face detection box coordinates corresponding to at least one face image frame following the at least one face image frame; and extracting, according to the determined second face detection box coordinates, the corresponding second face region picture from the corresponding face image frame.
The face bounding rectangle is the rectangle that just covers the face region: at least some points on the edge of the face region lie on it. The preset estimation algorithm may be any algorithm capable of estimating or extrapolating the motion state of the face, for example a Kalman filter. A Kalman filter, also called the Kalman filter equations or Kalman motion equations, is an algorithm that uses a linear system's state equations to optimally estimate the system state from observed inputs and outputs. Specifically, feeding the face bounding rectangles of at least one preceding face image frame into the Kalman motion equations yields the second face detection box coordinates for the current and even subsequent face image frames; these coordinates are predictions based on the Kalman motion equations.
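A minimal sketch of this prediction step is given below; for brevity it replaces the full Kalman filter with a plain constant-velocity extrapolation over the key-point bounding rectangles, which is a simplification of, not a substitute for, the estimation algorithm described above:

```python
import numpy as np

def keypoints_bounding_rect(keypoints):
    """Axis-aligned rectangle (x, y, w, h) that just covers the key points."""
    pts = np.asarray(keypoints, dtype=float)
    xs, ys = pts[:, 0], pts[:, 1]
    return np.array([xs.min(), ys.min(), xs.max() - xs.min(), ys.max() - ys.min()])

def predict_next_box(prev_boxes):
    """Constant-velocity extrapolation of the next detection box from the
    bounding rectangles of preceding frames (a stand-in for the Kalman
    prediction step)."""
    boxes = np.asarray(prev_boxes, dtype=float)
    if len(boxes) < 2:
        return boxes[-1]
    velocity = boxes[-1] - boxes[-2]  # per-frame change of (x, y, w, h)
    return boxes[-1] + velocity
```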
In this embodiment, two approaches are used to determine the face detection box coordinates across all face image frames. For at least one leading face image frame, the frame is input into the preset face detection model to obtain the face detection box coordinates, and the corresponding face region picture is extracted from the frame accordingly. For the current or subsequent face image frames, the coordinates are derived from the previously extracted face region pictures: the previously extracted face region picture is input into the preset recognition model to obtain the face key point coordinates, the corresponding face bounding rectangle is determined from those coordinates, and finally the face bounding rectangle is fed into the preset estimation algorithm to determine the second face detection box coordinates for the current and subsequent frames. Compared with simply running every face image frame through the preset face detection model, this approach consumes fewer computing resources and is more efficient.
Step 250: determine, from the face key point coordinates and the eye gaze offset vector corresponding to each face image frame, whether the head-shaking face video stream data passes the current stage of liveness detection.
In one embodiment, after determining whether the head-shaking face video stream data passes the current stage of liveness detection from the face key point coordinates and the eye gaze offset vectors corresponding to the face image frames, the method further includes: if the current stage of liveness detection is passed, acquiring face video stream data that follows the head-shaking face video stream data, and performing silent liveness detection on it.
Various algorithms or models may be used to perform silent liveness detection on the face video stream data. In face video stream data used for silent liveness detection, the person does not need to shake their head, and the position and angle of the face remain relatively unchanged.
In this embodiment, the subsequent silent detection is performed only when the current stage of liveness detection is passed. Since far fewer users can pass action-based liveness detection alone than can pass silent detection alone, running the current stage of liveness detection first filters out a large number of users, reducing resource consumption to some extent.
In one embodiment, the part of the preset recognition model related to the eye gaze offset vector output layer is trained as follows: obtain, from a sample data set, the normal face region pictures corresponding to normal head-shaking face video stream data and the face-paper region pictures corresponding to head-shaking face-paper video stream data, the sample data set including multiple items of normal head-shaking face video stream data and multiple items of head-shaking face-paper video stream data; input the normal face region pictures and the face-paper region pictures into the preset recognition model to obtain the face key point coordinates and eye gaze offset vectors that it outputs for each; determine, from the face key point coordinate sequences of the normal head-shaking face video stream data and of the head-shaking face-paper video stream data respectively, the head-shaking degree sequences corresponding to each; for each item of normal head-shaking face video stream data and each item of head-shaking face-paper video stream data, determine the face key point coordinates corresponding to head-shaking degrees within a predetermined head-shaking degree range as the first target face key point coordinates; for each such item, determine the difference between the largest and smallest eye gaze offset vectors among those corresponding to the first target face key point coordinates as that item's score; use the scores to determine a score threshold; and train the preset recognition model based on the score threshold.
Determining whether the head-shaking face video stream data passes the current stage of liveness detection from the face key point coordinates and eye gaze offset vectors corresponding to the face image frames then includes: determining, from the face key point coordinates corresponding to the face image frames, the face key point coordinates corresponding to head-shaking degrees within the predetermined range as the second target face key point coordinates; determining, from the second target face key point coordinates and the eye gaze offset vectors, the score corresponding to the head-shaking face video stream data under detection; and, if the score reaches the score threshold, determining that the current stage of liveness detection is passed, otherwise determining that it is not.
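A minimal sketch of this scoring rule, assuming per-frame head-shaking degrees and scalar gaze offsets have already been computed, and using an assumed 15-degree range (the example value given later in this description); the sketch also assumes at least one frame falls within that range:

```python
import numpy as np

def shake_video_score(shake_degrees, gaze_offsets, max_degree=15.0):
    """Score = (max - min) gaze offset over frames whose head-shaking
    degree lies within the predetermined range."""
    degrees = np.asarray(shake_degrees, dtype=float)
    gaze = np.asarray(gaze_offsets, dtype=float)
    in_range = gaze[np.abs(degrees) <= max_degree]
    return float(in_range.max() - in_range.min())

def passes_current_stage(score, threshold):
    """The video passes the current stage if its score reaches the threshold."""
    return score >= threshold
```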
The score of normal head-shaking face video stream data is generally greater than that of head-shaking face-paper video stream data.
In one embodiment, using the scores to determine the score threshold includes: determining, from the scores corresponding to the items of normal head-shaking face video stream data, a score threshold such that exactly a predetermined proportion of those scores reaches it. Training the preset recognition model based on the score threshold then includes: determining the ratio of the number of head-shaking face-paper video stream data scores below the score threshold to the total number of head-shaking face-paper video stream data scores, and training the preset recognition model according to that ratio.
This ratio measures the proportion of all head-shaking face-paper video stream data that is correctly identified as such, i.e., the correct rejection rate, which can therefore be raised through training.
Of course, the scores may also be used in other ways to determine the score threshold: for example, the smallest score within the leading predetermined proportion when the scores are ranked from small to large may be taken as the threshold, or a threshold may be chosen such that a predetermined proportion of the scores of the head-shaking face-paper video stream data fails to reach it.
Specifically, if there are 100 scores in total and the predetermined proportion is 99%, the score ranked 99th from largest to smallest is taken as the score threshold.
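A minimal sketch of this threshold selection over the scores of the genuine (normal face) videos:

```python
import numpy as np

def score_threshold(normal_scores, pass_proportion=0.99):
    """Threshold such that `pass_proportion` of genuine video scores reach
    it; with 100 scores and 99%, this is the 99th-largest score."""
    ranked = np.sort(np.asarray(normal_scores, dtype=float))[::-1]  # largest first
    k = int(np.ceil(pass_proportion * len(ranked)))
    return float(ranked[k - 1])
```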
The head-shaking degree of a face is measured in degrees and indicates the size of the head-shaking angle. Since changes in the face key point coordinates drive the head-shaking degree, the corresponding head-shaking degree sequence can be determined from the face key point coordinate sequence, and this can be implemented with various algorithms or models. The predetermined head-shaking degree range may be, for example, 15 degrees.
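By way of a rough, non-authoritative illustration of deriving a head-shaking degree from key points, the sketch below maps the horizontal asymmetry of an assumed nose-tip landmark between the two face-contour edges linearly to degrees; both the landmark choice and the linear mapping are assumptions:

```python
import numpy as np

def head_shake_degree(nose_tip, left_contour, right_contour, max_degree=90.0):
    """Crude yaw proxy: how far the nose tip sits from the horizontal
    midpoint of the face contour, mapped linearly to degrees."""
    mid_x = (left_contour[0] + right_contour[0]) / 2.0
    half_width = abs(right_contour[0] - left_contour[0]) / 2.0
    asymmetry = (nose_tip[0] - mid_x) / half_width  # roughly in [-1, 1]
    return float(np.clip(asymmetry, -1.0, 1.0) * max_degree)
```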
Each normal face region picture or face-paper region picture corresponds to one head-shaking degree. The head-shaking degrees of all normal face region pictures in an item of normal head-shaking face video stream data form a head-shaking degree sequence; likewise, the head-shaking degrees of all face-paper region pictures in an item of head-shaking face-paper video stream data form such a sequence.
Since both normal head-shaking face video stream data and head-shaking face-paper video stream data are temporally ordered sets of face image frames, their corresponding normal face region pictures and face-paper region pictures exist as picture sequences; by the same token, the corresponding face key point coordinates can also exist as sequences.
In summary, according to the face liveness detection method provided by the embodiment of Fig. 2, the face key point detection model combined with the eye gaze offset vector output layer is used to compute the eye gaze offset vector corresponding to each face region picture, and the eye gaze offset vectors are used for face liveness detection. In the liveness detection process, fraud carried out by shaking paper or a head model bearing a face can therefore be identified, improving the accuracy of liveness detection and reducing security risks.
This application also provides a face liveness detection apparatus; the following are the apparatus embodiments of this application.
Fig. 5 is a block diagram of a face liveness detection apparatus according to an exemplary embodiment. As shown in Fig. 5, the apparatus 500 includes: an input module 510 configured to input the face region pictures corresponding to the head-shaking face video stream data to be subjected to liveness detection into a preset recognition model and obtain the face key point coordinates and eye gaze offset vectors output by it, where the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model includes convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and eye gaze offset vectors each correspond to the individual face image frames included in the head-shaking face video stream data, and the eye gaze offset vector measures the degree to which the eye gaze shifts while the head is shaken; and a judgment module 520 configured to determine, from the face key point coordinates and eye gaze offset vectors corresponding to the face image frames, whether the head-shaking face video stream data passes the current stage of liveness detection.
According to a third aspect of the present application, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will understand that aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to here as a "circuit", "module", or "system".
The electronic device 600 according to this embodiment of the present application is described below with reference to Fig. 6. The electronic device 600 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application. As shown in Fig. 6, the electronic device 600 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 610 mentioned above, the at least one storage unit 620 mentioned above, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610). The storage unit stores program code executable by the processing unit 610, causing the processing unit 610 to perform the steps according to the various exemplary implementations of the present application described in the "Embodiment Method" section of this specification. The storage unit 620 may include a readable medium in the form of a volatile storage unit, such as a random access memory (RAM) 621 and/or a cache 622, and may further include a read-only memory (ROM) 623. The storage unit 620 may also include a program/utility 624 having a set of (at least one) program modules 625, such program modules 625 including, but not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures. The electronic device 600 may also communicate with one or more external devices 800 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650, for example communication with a display unit 640. Moreover, the electronic device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown in the figure, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the exemplary implementations described here may be realized in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the embodiments of the present application may be embodied in a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, server, terminal apparatus, or network device) to perform the method according to the embodiments of the present application.
According to a fourth aspect of the present application, a computer-readable storage medium is also provided, storing computer-readable instructions that, when executed by a computer, cause the computer to perform the method described above in this specification.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
In some possible implementations, aspects of the present application may also be realized as a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary implementations of the present application described in the "Exemplary Method" section of this specification.
Referring to Fig. 7, a program product 700 for implementing the above method according to an embodiment of the present application is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device, for example a personal computer. However, the program product of the present application is not limited to this: in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination thereof.
Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ as well as conventional procedural languages such as the "C" language or similar. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may connect to an external computing device (for example, through the Internet using an Internet service provider).
Furthermore, the above drawings are merely schematic illustrations of the processing included in the methods according to the exemplary embodiments of the present application and are not intended to be limiting. It is readily understood that the processing shown in the drawings does not indicate or limit the temporal order of the operations; likewise, the operations may be performed, for example, synchronously or asynchronously in multiple modules. It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (20)

  1. A face liveness detection method, wherein the method comprises:
    inputting the face region pictures corresponding to head-shaking face video stream data to be subjected to liveness detection into a preset recognition model, and obtaining the face key point coordinates and eye gaze offset vectors output by the preset recognition model, wherein the preset recognition model is a face key point detection model combined with an eye gaze offset vector output layer, the face key point detection model comprises convolutional layers, the eye gaze offset vector output layer is connected to the last convolutional layer of the face key point detection model, the face key point coordinates and the eye gaze offset vectors respectively correspond to the face image frames comprised in the head-shaking face video stream data, and the eye gaze offset vector is used to measure the degree to which the eye gaze shifts while the head is shaken;
    determining, according to the face key point coordinates and the eye gaze offset vectors corresponding to the face image frames, whether the head-shaking face video stream data passes the current stage of liveness detection.
  2. The method according to claim 1, wherein, before inputting the face region pictures corresponding to the head-shaking face video stream data to be subjected to liveness detection into the preset recognition model and obtaining the face key point coordinates and eye gaze offset vectors output by the preset recognition model, the method further comprises:
    deframing the head-shaking face video stream data to be subjected to liveness detection to obtain the face image frames corresponding to the head-shaking face video stream data;
    inputting the face image frames into a preset face detection model to obtain the face detection box coordinates corresponding to the face image frames;
    extracting face region pictures from the face image frames according to the face detection box coordinates.
  3. The method according to claim 2, wherein extracting a face region picture from a face image frame according to the face detection box coordinates comprises:
    determining, in the face image frame, a first face detection box region corresponding to the face detection box coordinates;
    expanding the first face detection box region by a predetermined expansion ratio to obtain a second face detection box region;
    extracting the face region picture based on the range delimited by the second face detection box region.
  4. The method according to claim 1, wherein, after determining, according to the face key point coordinates and the eye gaze offset vectors corresponding to the face image frames, whether the head-shaking face video stream data passes the current stage of liveness detection, the method further comprises:
    in a case where the current stage of liveness detection is passed, acquiring face video stream data following the head-shaking face video stream data;
    performing silent liveness detection on the face video stream data.
  5. The method according to claim 2, wherein inputting the face image frames into the preset face detection model to obtain the face detection box coordinates corresponding to the face image frames comprises:
    inputting each of the face image frames into the preset face detection model to obtain the face detection box coordinates corresponding to each face image frame;
    and wherein extracting face region pictures from the face image frames according to the face detection box coordinates comprises:
    extracting a face region picture from each face image frame according to the respective face detection box coordinates.
  6. The method according to claim 2, wherein inputting the face image frames into the preset face detection model to obtain the face detection box coordinates corresponding to the face image frames comprises:
    inputting at least one of the face image frames into the preset face detection model to obtain first face detection box coordinates respectively corresponding to the at least one face image frame;
    and wherein extracting face region pictures from the face image frames according to the face detection box coordinates comprises:
    extracting, according to each set of first face detection box coordinates, a corresponding first face region picture from the face image frame corresponding to those first face detection box coordinates;
    inputting each first face region picture into the preset recognition model to obtain the face key point coordinates and the eye gaze offset vector corresponding to each first face region picture;
    determining a face bounding rectangle corresponding to the face key point coordinates of each first face region picture;
    determining, according to the face bounding rectangles and a preset estimation algorithm, second face detection box coordinates corresponding to at least one face image frame following the at least one face image frame;
    extracting, according to the determined second face detection box coordinates, a corresponding second face region picture from the face image frame corresponding to the second face detection box coordinates.
7. The method according to any one of claims 1-6, wherein the part of the preset recognition model related to the human eye sight offset vector output layer is trained in the following manner:
    obtaining, from a sample data set, the normal face region pictures corresponding to normal face shaking video stream data and the face paper region pictures corresponding to face paper shaking video stream data, wherein the sample data set includes a plurality of normal face shaking video stream data and a plurality of face paper shaking video stream data;
    inputting the normal face region pictures and the face paper region pictures into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vectors that are output by the preset recognition model and respectively correspond to the normal face region pictures and the face paper region pictures;
    determining the face shaking degree sequences corresponding to the normal face shaking video stream data and the face paper shaking video stream data, respectively using the face key point coordinate sequence corresponding to the normal face shaking video stream data and the face key point coordinate sequence corresponding to the face paper shaking video stream data;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining the face key point coordinates corresponding to face shaking degrees within a predetermined face shaking degree range as first target face key point coordinates;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining, among the human eye sight offset vectors corresponding to the first target face key point coordinates, the difference between the largest human eye sight offset vector and the smallest human eye sight offset vector as the score of that normal face shaking video stream data or that face paper shaking video stream data;
    determining a score threshold using each of the scores; and
    training the preset recognition model based on the score threshold;
    wherein the determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection comprises:
    determining, from the face key point coordinates corresponding to each face image frame, the face key point coordinates corresponding to face shaking degrees within the predetermined face shaking degree range as second target face key point coordinates;
    determining, according to the second target face key point coordinates and the human eye sight offset vector, the score corresponding to the face shaking video stream data to be subjected to living body detection; and
    if the score reaches the score threshold, determining that the current stage of living body detection is passed; otherwise, determining that the current stage of living body detection is not passed.
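The scoring rule in claim 7 exploits the fact that a live user keeps looking at the camera while turning the head, so the eye sight offset swings widely across the shake, whereas a printed face's gaze stays fixed relative to the head. A sketch of the per-video score, assuming head yaw in degrees as the face shaking degree and Euclidean norms to order the offset vectors (both are assumptions; the claims fix neither):

```python
import numpy as np

def shake_score(yaw_degrees, gaze_offsets, yaw_range=(-30.0, 30.0)):
    """Score one head-shaking video from per-frame model outputs.

    yaw_degrees:  per-frame face shaking degree derived from the face
                  key point coordinates (treated here as head yaw).
    gaze_offsets: per-frame human eye sight offset vectors, shape (T, 2).
    yaw_range:    the predetermined face shaking degree range; these
                  bounds are illustrative, not taken from the claims.
    """
    yaw = np.asarray(yaw_degrees, dtype=float)
    offsets = np.asarray(gaze_offsets, dtype=float)
    in_range = (yaw >= yaw_range[0]) & (yaw <= yaw_range[1])
    if not in_range.any():
        return 0.0  # no frame within the predetermined range
    # The claims compare the largest and smallest offset vectors;
    # ordering them by Euclidean norm is an assumption of this sketch.
    norms = np.linalg.norm(offsets[in_range], axis=1)
    return float(norms.max() - norms.min())

# A live face yields a large score; a rotated printout keeps gaze fixed
# to the head, so its score stays below the learned threshold.
```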
8. An apparatus for detecting a living body of a human face, wherein the apparatus comprises:
    an input module configured to input a face region picture corresponding to face shaking video stream data to be subjected to living body detection into a preset recognition model to obtain face key point coordinates and a human eye sight offset vector output by the preset recognition model, wherein the preset recognition model is a face key point detection model combined with a human eye sight offset vector output layer, the face key point detection model includes convolutional layers, the human eye sight offset vector output layer is connected to the last of the convolutional layers in the face key point detection model, the face key point coordinates and the human eye sight offset vector respectively correspond to the face image frames included in the face shaking video stream data, and the human eye sight offset vector is used to measure the degree to which the human eye sight deviates while the face shakes; and
    a judgment module configured to determine, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection.
9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions which, when executed by a computer, cause the computer to perform the following method:
    inputting a face region picture corresponding to face shaking video stream data to be subjected to living body detection into a preset recognition model to obtain face key point coordinates and a human eye sight offset vector output by the preset recognition model, wherein the preset recognition model is a face key point detection model combined with a human eye sight offset vector output layer, the face key point detection model includes convolutional layers, the human eye sight offset vector output layer is connected to the last of the convolutional layers in the face key point detection model, the face key point coordinates and the human eye sight offset vector respectively correspond to the face image frames included in the face shaking video stream data, and the human eye sight offset vector is used to measure the degree to which the human eye sight deviates while the face shakes; and
    determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection.
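Claims 8 and 9 describe the preset recognition model as a key point detector with a second output head wired to the last convolutional layer. A PyTorch sketch of that two-headed layout, with illustrative layer sizes and an assumed 68-point key point convention:

```python
import torch
import torch.nn as nn

class KeypointGazeNet(nn.Module):
    """Sketch of the preset recognition model: a convolutional face key
    point detector whose last convolutional layer also feeds a human eye
    sight offset vector output layer. Layer sizes and the 68-point key
    point convention are illustrative assumptions."""

    def __init__(self, num_keypoints: int = 68):
        super().__init__()
        self.backbone = nn.Sequential(          # the convolutional layers
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Key point head: one (x, y) pair per key point.
        self.keypoint_head = nn.Linear(128, num_keypoints * 2)
        # Gaze head, connected to the same last-convolutional features.
        self.gaze_head = nn.Linear(128, 2)

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x).flatten(1)
        return self.keypoint_head(feats), self.gaze_head(feats)

# Usage: keypoints, gaze = KeypointGazeNet()(torch.randn(1, 3, 112, 112))
```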
10. The computer-readable storage medium according to claim 9, wherein, before the face region picture corresponding to the face shaking video stream data to be subjected to living body detection is input into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vector output by the preset recognition model, the computer-readable instructions, when executed by the computer, further cause the computer to perform:
    deframing the face shaking video stream data to be subjected to living body detection to obtain the face image frames corresponding to the face shaking video stream data;
    inputting the face image frames into a preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames; and
    extracting face region pictures from the face image frames according to the face detection frame coordinates.
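A sketch of the preprocessing in claim 10: deframe the video with OpenCV and crop one face region picture per frame. The Haar cascade here stands in for the unspecified preset face detection model, and one face per frame is an assumption:

```python
import cv2

# Hypothetical detector built from OpenCV's bundled Haar cascade; it
# stands in for the unspecified preset face detection model.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return _cascade.detectMultiScale(gray)       # boxes as (x, y, w, h)

def extract_face_regions(video_path):
    """Deframe a head-shaking video and crop one face region picture
    per frame (a single face per frame is assumed)."""
    cap = cv2.VideoCapture(video_path)
    regions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break                                # end of stream
        boxes = detect_faces(frame)
        if len(boxes):
            x, y, w, h = boxes[0]
            regions.append(frame[y:y + h, x:x + w])
    cap.release()
    return regions
```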
11. The computer-readable storage medium according to claim 9, wherein, after determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection, the computer-readable instructions, when executed by the computer, further cause the computer to perform:
    in the case that the current stage of living body detection is passed, obtaining the face video stream data subsequent to the face shaking video stream data; and
    performing silent living body detection on the face video stream data.
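Claim 11 chains the stages: the head-shaking (action) check gates a silent check on the footage that follows it. A minimal sketch with both checks as assumed callables:

```python
def liveness_pipeline(shake_stream, follow_up_stream,
                      action_check, silent_check):
    """Chain the two stages: the head-shaking (action) check gates the
    silent check. Both checks are assumed callables returning bool."""
    if not action_check(shake_stream):
        return False                    # fail fast on the action stage
    return silent_check(follow_up_stream)
```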
12. The computer-readable storage medium according to claim 10, wherein the inputting the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames specifically comprises:
    inputting each of the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to each of the face image frames;
    and the extracting face region pictures from the face image frames according to the face detection frame coordinates specifically comprises:
    extracting a face region picture from each of the face image frames according to the respective face detection frame coordinates.
13. The computer-readable storage medium according to claim 10, wherein the inputting the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames specifically comprises:
    inputting at least one of the face image frames into the preset face detection model to obtain first face detection frame coordinates respectively corresponding to the at least one face image frame;
    and the extracting face region pictures from the face image frames according to the face detection frame coordinates specifically comprises:
    extracting, according to each of the first face detection frame coordinates, a corresponding first face region picture from the face image frame corresponding to the first face detection frame coordinates;
    inputting each of the first face region pictures into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vector corresponding to each of the first face region pictures;
    determining the circumscribed rectangle of the face corresponding to the face key point coordinates of each of the first face region pictures;
    determining, according to the circumscribed rectangle of the face and a preset estimation algorithm, second face detection frame coordinates corresponding to at least one face image frame subsequent to the at least one face image frame; and
    extracting, according to the determined second face detection frame coordinates, a corresponding second face region picture from the face image frame corresponding to the second face detection frame coordinates.
14. The computer-readable storage medium according to any one of claims 9-13, wherein the part of the preset recognition model related to the human eye sight offset vector output layer is trained in the following manner:
    obtaining, from a sample data set, the normal face region pictures corresponding to normal face shaking video stream data and the face paper region pictures corresponding to face paper shaking video stream data, wherein the sample data set includes a plurality of normal face shaking video stream data and a plurality of face paper shaking video stream data;
    inputting the normal face region pictures and the face paper region pictures into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vectors that are output by the preset recognition model and respectively correspond to the normal face region pictures and the face paper region pictures;
    determining the face shaking degree sequences corresponding to the normal face shaking video stream data and the face paper shaking video stream data, respectively using the face key point coordinate sequence corresponding to the normal face shaking video stream data and the face key point coordinate sequence corresponding to the face paper shaking video stream data;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining the face key point coordinates corresponding to face shaking degrees within a predetermined face shaking degree range as first target face key point coordinates;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining, among the human eye sight offset vectors corresponding to the first target face key point coordinates, the difference between the largest human eye sight offset vector and the smallest human eye sight offset vector as the score of that normal face shaking video stream data or that face paper shaking video stream data;
    determining a score threshold using each of the scores; and
    training the preset recognition model based on the score threshold;
    wherein the determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection specifically comprises:
    determining, from the face key point coordinates corresponding to each face image frame, the face key point coordinates corresponding to face shaking degrees within the predetermined face shaking degree range as second target face key point coordinates;
    determining, according to the second target face key point coordinates and the human eye sight offset vector, the score corresponding to the face shaking video stream data to be subjected to living body detection; and
    if the score reaches the score threshold, determining that the current stage of living body detection is passed; otherwise, determining that the current stage of living body detection is not passed.
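Claim 14 (like claim 7) leaves open how the score threshold is derived from the training scores. One plausible, assumed rule is to place the threshold in the gap between the score populations of the paper attacks and the live videos:

```python
import numpy as np

def choose_threshold(live_scores, paper_scores):
    """Pick a score threshold from training scores. The claims do not
    fix the rule; splitting the gap between the attack and live score
    populations is one plausible, assumed choice."""
    upper_attack = np.percentile(paper_scores, 95)  # most attacks below
    lower_live = np.percentile(live_scores, 5)      # most live above
    return float((upper_attack + lower_live) / 2.0)
```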
15. An electronic device, wherein the electronic device comprises:
    a processor; and
    a memory storing computer-readable instructions which, when executed by the processor, implement the following method:
    inputting a face region picture corresponding to face shaking video stream data to be subjected to living body detection into a preset recognition model to obtain face key point coordinates and a human eye sight offset vector output by the preset recognition model, wherein the preset recognition model is a face key point detection model combined with a human eye sight offset vector output layer, the face key point detection model includes convolutional layers, the human eye sight offset vector output layer is connected to the last of the convolutional layers in the face key point detection model, the face key point coordinates and the human eye sight offset vector respectively correspond to the face image frames included in the face shaking video stream data, and the human eye sight offset vector is used to measure the degree to which the human eye sight deviates while the face shakes; and
    determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection.
16. The electronic device according to claim 15, wherein, before the face region picture corresponding to the face shaking video stream data to be subjected to living body detection is input into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vector output by the preset recognition model, the computer-readable instructions, when executed by the processor, further implement:
    deframing the face shaking video stream data to be subjected to living body detection to obtain the face image frames corresponding to the face shaking video stream data;
    inputting the face image frames into a preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames; and
    extracting face region pictures from the face image frames according to the face detection frame coordinates.
17. The electronic device according to claim 15, wherein, after determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection, the computer-readable instructions, when executed by the processor, further implement:
    in the case that the current stage of living body detection is passed, obtaining the face video stream data subsequent to the face shaking video stream data; and
    performing silent living body detection on the face video stream data.
18. The electronic device according to claim 16, wherein the inputting the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames specifically comprises:
    inputting each of the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to each of the face image frames;
    and the extracting face region pictures from the face image frames according to the face detection frame coordinates specifically comprises:
    extracting a face region picture from each of the face image frames according to the respective face detection frame coordinates.
19. The electronic device according to claim 16, wherein the inputting the face image frames into the preset face detection model to obtain the face detection frame coordinates corresponding to the face image frames specifically comprises:
    inputting at least one of the face image frames into the preset face detection model to obtain first face detection frame coordinates respectively corresponding to the at least one face image frame;
    and the extracting face region pictures from the face image frames according to the face detection frame coordinates specifically comprises:
    extracting, according to each of the first face detection frame coordinates, a corresponding first face region picture from the face image frame corresponding to the first face detection frame coordinates;
    inputting each of the first face region pictures into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vector corresponding to each of the first face region pictures;
    determining the circumscribed rectangle of the face corresponding to the face key point coordinates of each of the first face region pictures;
    determining, according to the circumscribed rectangle of the face and a preset estimation algorithm, second face detection frame coordinates corresponding to at least one face image frame subsequent to the at least one face image frame; and
    extracting, according to the determined second face detection frame coordinates, a corresponding second face region picture from the face image frame corresponding to the second face detection frame coordinates.
20. The electronic device according to any one of claims 15-19, wherein the part of the preset recognition model related to the human eye sight offset vector output layer is trained in the following manner:
    obtaining, from a sample data set, the normal face region pictures corresponding to normal face shaking video stream data and the face paper region pictures corresponding to face paper shaking video stream data, wherein the sample data set includes a plurality of normal face shaking video stream data and a plurality of face paper shaking video stream data;
    inputting the normal face region pictures and the face paper region pictures into the preset recognition model to obtain the face key point coordinates and the human eye sight offset vectors that are output by the preset recognition model and respectively correspond to the normal face region pictures and the face paper region pictures;
    determining the face shaking degree sequences corresponding to the normal face shaking video stream data and the face paper shaking video stream data, respectively using the face key point coordinate sequence corresponding to the normal face shaking video stream data and the face key point coordinate sequence corresponding to the face paper shaking video stream data;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining the face key point coordinates corresponding to face shaking degrees within a predetermined face shaking degree range as first target face key point coordinates;
    for each normal face shaking video stream data and each face paper shaking video stream data, determining, among the human eye sight offset vectors corresponding to the first target face key point coordinates, the difference between the largest human eye sight offset vector and the smallest human eye sight offset vector as the score of that normal face shaking video stream data or that face paper shaking video stream data;
    determining a score threshold using each of the scores; and
    training the preset recognition model based on the score threshold;
    wherein the determining, according to the face key point coordinates corresponding to each face image frame and the human eye sight offset vector, whether the face shaking video stream data passes the current stage of living body detection specifically comprises:
    determining, from the face key point coordinates corresponding to each face image frame, the face key point coordinates corresponding to face shaking degrees within the predetermined face shaking degree range as second target face key point coordinates;
    determining, according to the second target face key point coordinates and the human eye sight offset vector, the score corresponding to the face shaking video stream data to be subjected to living body detection; and
    if the score reaches the score threshold, determining that the current stage of living body detection is passed; otherwise, determining that the current stage of living body detection is not passed.
PCT/CN2020/135548 2020-10-12 2020-12-11 Face detection method, apparatus, medium, and electronic device WO2021179719A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011086784.6 2020-10-12
CN202011086784.6A CN112149615A (en) 2020-10-12 2020-10-12 Face living body detection method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2021179719A1 (en)

Family

ID=73953002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135548 WO2021179719A1 (en) 2020-10-12 2020-12-11 Face detection method, apparatus, medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112149615A (en)
WO (1) WO2021179719A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668553B (en) * 2021-01-18 2022-05-13 东莞先知大数据有限公司 Method, device, medium and equipment for detecting discontinuous observation behavior of driver
CN113392810A (en) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for in vivo detection
CN113642428B (en) * 2021-07-29 2022-09-27 北京百度网讯科技有限公司 Face living body detection method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169304A1 (en) * 2015-12-09 2017-06-15 Beijing Kuangshi Technology Co., Ltd. Method and apparatus for liveness detection
CN109886087A (en) * 2019-01-04 2019-06-14 平安科技(深圳)有限公司 A kind of biopsy method neural network based and terminal device
CN109977771A (en) * 2019-02-22 2019-07-05 杭州飞步科技有限公司 Verification method, device, equipment and the computer readable storage medium of driver identification
US20190377963A1 (en) * 2018-06-11 2019-12-12 Laurence Hamid Liveness detection
CN111160251A (en) * 2019-12-30 2020-05-15 支付宝实验室(新加坡)有限公司 Living body identification method and device
CN111401127A (en) * 2020-01-16 2020-07-10 创意信息技术股份有限公司 Human face living body detection joint judgment method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116110111A (en) * 2023-03-23 2023-05-12 平安银行股份有限公司 Face recognition method, electronic equipment and storage medium
CN116110111B (en) * 2023-03-23 2023-09-08 平安银行股份有限公司 Face recognition method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112149615A (en) 2020-12-29

Similar Documents

Publication Publication Date Title
KR102063037B1 (en) Identity authentication method, terminal equipment and computer readable storage medium
WO2021179719A1 (en) Face detection method, apparatus, medium, and electronic device
CN108875833B (en) Neural network training method, face recognition method and device
US10832069B2 (en) Living body detection method, electronic device and computer readable medium
WO2018228218A1 (en) Identification method, computing device, and storage medium
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
WO2018177379A1 (en) Gesture recognition, gesture control and neural network training methods and apparatuses, and electronic device
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
WO2020024484A1 (en) Method and device for outputting data
CN111767900B (en) Face living body detection method, device, computer equipment and storage medium
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
WO2022100337A1 (en) Face image quality assessment method and apparatus, computer device and storage medium
WO2022188697A1 (en) Biological feature extraction method and apparatus, device, medium, and program product
WO2020124993A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
WO2020006964A1 (en) Image detection method and device
WO2020238321A1 (en) Method and device for age identification
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
WO2020124994A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
WO2021047069A1 (en) Face recognition method and electronic terminal device
WO2020052062A1 (en) Detection method and device
WO2023124040A1 (en) Facial recognition method and apparatus
WO2021169616A1 (en) Method and apparatus for detecting face of non-living body, and computer device and storage medium
WO2023173646A1 (en) Expression recognition method and apparatus
WO2021159669A1 (en) Secure system login method and apparatus, computer device, and storage medium
US11741986B2 (en) System and method for passive subject specific monitoring

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924217

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924217

Country of ref document: EP

Kind code of ref document: A1