WO2021147434A1 - Face recognition method, device, equipment and medium based on artificial intelligence - Google Patents

Face recognition method, device, equipment and medium based on artificial intelligence

Info

Publication number
WO2021147434A1
WO2021147434A1 (PCT/CN2020/124944)
Authority
WO
WIPO (PCT)
Prior art keywords
living body
face
detection unit
body detection
video frames
Prior art date
Application number
PCT/CN2020/124944
Other languages
English (en)
French (fr)
Inventor
高源
李志锋
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021147434A1
Priority to US17/685,177 (published as US20220309836A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45Structures or tools for the administration of authentication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • This application relates to the field of computer vision technology, and in particular to a face recognition method, device, equipment and medium based on artificial intelligence.
  • Face liveness detection technology refers to using specific detection methods, such as face key point positioning and face tracking, to verify whether the user's operation is actually performed by a living person.
  • In the related art, three-dimensional (3D) structured light is used to verify whether a human face is a living body: a structured light camera projects uniformly spaced striped light onto the target. If the target is a real living human face, the reflected striped light is bound to have uneven intervals because of the three-dimensional structure of the face; otherwise, the reflected structured light keeps the same intervals.
  • However, the above face liveness detection method cannot effectively defend against online face verification attacks of the synthetic-attack and remake-attack types, and is therefore likely to threaten user information security.
  • The embodiments of this application provide a face recognition method, device, equipment and medium based on artificial intelligence, which can defend against online face verification attacks of the synthetic, remake and mask types, and protect user information security.
  • the technical solution is as follows:
  • a face recognition method based on artificial intelligence is provided, which is applied to a computer device, and the method includes:
  • acquiring n groups of input video frames, where at least one group of video frames includes a color video frame and a depth video frame of the target face, and n is a positive integer;
  • invoking a first living body detection unit to identify the color video frames in the n groups of video frames, the first living body detection unit being an interactive living body detection unit;
  • invoking a second living body detection unit to identify the depth video frames in the n groups of video frames, the second living body detection unit being a three-dimensional structured light living body detection unit; and
  • determining, in response to the detection results of the first living body detection unit and the second living body detection unit both being the living body type, that the target face is a living target face.
  • a face recognition device based on artificial intelligence comprising:
  • an acquisition module, configured to acquire n groups of input video frames, where at least one group of video frames includes a color video frame and a depth video frame of the target face, and n is a positive integer;
  • the first living body detection unit is configured to identify the color video frames in the n groups of video frames, and the first living body detection unit is an interactive living body detection unit;
  • the second living body detection unit is configured to identify the depth video frame in the n groups of video frames, and the second living body detection unit is a three-dimensional structured light type living body detection unit;
  • the processing module is configured to determine that the target human face is a living target human face in response that the detection results of the first living body detection unit and the second living body detection unit are both living body types.
  • a computer device includes a processor and a memory.
  • The memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, program, code set, or instruction set is loaded and executed by the processor to implement the artificial intelligence-based face recognition method described in the above aspect.
  • A computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, program, code set, or instruction set is loaded and executed by a processor to implement the artificial intelligence-based face recognition method described in the above aspect.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the artificial intelligence-based face recognition method as described in the above aspect.
  • In this way, it is determined whether the video frames contain a living target face: the first living body detection unit can resist remake attacks and mask attacks, and the second living body detection unit can resist synthetic attacks and remake attacks, which protects the user's information security more comprehensively.
  • FIG. 1 is a schematic diagram of live face detection for different types of attacks provided by an exemplary embodiment of the present application
  • Fig. 2 is a schematic structural diagram of a computer system provided by an exemplary embodiment of the present application.
  • Fig. 3 is a flowchart of a face recognition method based on artificial intelligence provided by an exemplary embodiment of the present application
  • Fig. 4 is a flowchart of a face recognition method based on artificial intelligence provided by another exemplary embodiment of the present application.
  • Fig. 5 is a schematic diagram of facial feature points provided by an exemplary embodiment of the present application.
  • Fig. 6 is a flowchart of a face recognition method based on artificial intelligence combined with a face preprocessing process provided by an exemplary embodiment of the present application;
  • Fig. 7 is a flow chart of face detection through the MTCNN algorithm provided by an exemplary embodiment of the present application.
  • Fig. 8 is a structural block diagram of a VGG-16 deep learning network provided by an exemplary embodiment of the present application.
  • FIG. 9 is a flowchart of a face recognition method based on artificial intelligence combined with a living body face detection system provided by an exemplary embodiment of the present application.
  • FIG. 10 is a structural block diagram of an artificial intelligence-based face recognition device provided by an exemplary embodiment of the present application.
  • Fig. 11 is a structural block diagram of a server provided by an exemplary embodiment of the present application.
  • Fig. 12 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart medical care, smart customer service, identity verification and live face recognition. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
  • Computer vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track and measure targets, and to further process the resulting images so that they become more suitable for human observation or for transmission to instruments for detection.
  • computer vision studies related theories and technologies, trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • the solution provided by the embodiments of the present application relates to the technical field of living body face detection.
  • If both the first living body detection unit and the second living body detection unit recognize a living human face, it is determined that the target face in the detected video frames is a living face.
  • If the recognition result of at least one of the first living body detection unit and the second living body detection unit is a non-living face, it is determined that the target face in the detected video frames is not a living face.
  • Dynamic interactive verification means that the user needs to make corresponding actions according to the system instructions, such as blinking, opening the mouth, reading text or numbers, turning the head, etc.
  • 3D structured light face liveness verification refers to using a structured light camera to project uniformly spaced striped light onto the target and determining whether the target is a real living face according to whether the reflected striped light remains uniformly spaced.
  • Synthetic attack: another person's face is synthesized as an attack sample through deep-learning-based face synthesis technology.
  • the attacker can control the synthesized face of the other person to make a series of specified actions.
  • Mask attack: an attack carried out by wearing a mask made according to another person's appearance.
  • the 3D structured light face live verification method can effectively defend against synthetic attacks and remake attacks, but the 3D structured light face live verification method cannot defend against mask attacks.
  • the interactive live detection method can effectively defend against remake attacks and mask attacks, but the interactive live detection method cannot effectively defend against synthetic attacks.
  • In addition, the interactive liveness detection method needs to train a model to recognize the actions made by the user, and the action data needs to be annotated. During annotation, the person being recorded must not only perform the corresponding actions according to the prompts, but the time required for each action must also be recorded, which increases the difficulty of data collection.
  • the embodiment of the present application provides a face recognition method based on artificial intelligence, which combines the above two living body detection methods.
  • The first living body detection unit and the second living body detection unit identify whether the face in front of the camera is an image presented on other media (such as pictures, videos, photos, printing paper or ID cards), so as to determine whether it is a living body. This improves the accuracy of liveness verification, thereby ensuring user information security.
  • Fig. 1 shows a schematic diagram of live face detection for different types of attacks provided by an exemplary embodiment of the present application.
  • Take a smart phone as an example.
  • the smart phone runs a program that supports the recognition of the face of a living body.
  • the program includes a first living body detection unit and a second living body detection unit.
  • The target object uses the smart phone to capture video frames containing the target object's face.
  • the smart phone recognizes whether the target face is a living target face.
  • the first target object is a living body
  • the first target human face is a living human face.
  • The first living body detection unit can recognize the actions completed by the first target object through the color video frames, and the second living body detection unit can recognize the depth information of the first target face through the depth video frames, so the smart phone recognizes that the target face is a living face.
  • The face of the second target object is a second target face synthesized from the facial features of the first target object shown in Figure 1(a), and it can be controlled to make a series of actions.
  • the first living body detection unit recognizes that the second target face is a living body face through the color video frame
  • the second living body detection unit recognizes that the second target face does not have depth information, that is, it is not a living body face, through the depth video frame
  • the smart phone recognizes that the second target human face is not a living human face.
  • the face of the third target object is the face of the first target object in the photo
  • the first living body detection unit recognizes that the third target face cannot make any action through the color video frame.
  • the second living body detection unit recognizes that the third target face does not have depth information through the depth video frame
  • the smart phone recognizes that the third target face is not a living face.
  • the fourth target object wears a mask made according to the face of the first target object, and the first living body detection unit recognizes that the face of the fourth target cannot make any action through the color video frame.
  • the second living body detection unit recognizes that the fourth target human face has depth information through the depth video frame, and the smart phone recognizes that the fourth target human face is not a living human face.
  • Smartphones running the above programs that support the recognition of live faces can defend against synthetic attacks, remake attacks, and mask attacks, and can more comprehensively respond to various types of sample attacks, ensuring user information security.
  • the embodiment of the application provides a face recognition method based on artificial intelligence.
  • the method can be applied to a server.
  • the user uploads a video frame to the server, and the server performs live face verification and subsequent operations.
  • The method can also be applied to a terminal: a program supporting live face detection runs on the terminal to perform liveness verification on the video frames shot by the user, and the verification result can be uploaded to the server for subsequent operations.
  • Fig. 2 shows a structural diagram of a computer system provided by an exemplary embodiment of the present application.
  • the computer system 100 includes a terminal 120 and a server 140.
  • the terminal 120 installs and runs an application program that supports face living detection.
  • The application program may also be a mini program, a web page, or an information interaction platform (such as an official account).
  • the terminal 120 is provided with a three-dimensional camera (including a color camera and a depth camera) for collecting facial images of the user 160 (including at least one of photos and video frames).
  • the terminal 120 takes pictures of the face of the user 160 or shoots a video at a certain frequency.
  • the face image of the user 160 may be an image with actions such as blinking, turning of the head, smiling, and opening the mouth, or no additional actions.
  • the terminal 120 may generally refer to one of multiple terminals, and this embodiment only uses the terminal 120 as an example for illustration.
  • the types of terminal devices include at least one of smart phones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers, and desktop computers.
  • This embodiment takes a smart phone as an example of the terminal.
  • the terminal 120 is connected to the server 140 through a wireless network or a wired network.
  • the server 140 includes at least one of a server, multiple servers, a cloud computing platform, and a virtualization center.
  • the server 140 includes a processor 144 and a memory 142, and the memory 142 further includes an obtaining module 1421, a processing module 1422, and a receiving module 1423.
  • the server 140 is used to provide background services for programs that support face living detection.
  • The back-end server may provide a storage service for face images, a computing service for live face detection, or a verification service for live face detection.
  • Optionally, the server 140 is responsible for the main calculation work and the terminal 120 for the secondary calculation work; or the server 140 is responsible for the secondary calculation work and the terminal 120 for the main calculation work; or the server 140 and the terminal 120 adopt a distributed computing architecture for collaborative computing.
  • the number of the aforementioned terminals may be more or less. For example, there may be only one terminal, or there may be dozens or hundreds of terminals, or more.
  • the embodiments of the present application do not limit the number of terminals and device types.
  • Fig. 3 shows a face recognition method based on artificial intelligence provided by an exemplary embodiment of the present application.
  • the method is applied to the terminal 120 in the computer system shown in Fig. 2 or other computer systems.
  • the method includes the following steps:
  • Step 301: Obtain n groups of input video frames, where at least one group of video frames includes a color video frame and a depth video frame of the target face, and n is a positive integer.
  • the terminal includes at least one of a smart phone, a tablet computer, a notebook computer, a desktop computer connected with a camera, a camera, and a video camera.
  • The target object uses the terminal to continuously shoot the target object's own face.
  • the continuous shooting is video shooting.
  • Color video frames and depth video frames are shot at the same time at each shooting moment.
  • the color video frames and the depth video frames taken at the same shooting moment constitute a group of video frames.
  • the color video frame is a color video frame in Red Green Blue (RGB) format.
  • Each pixel in the depth (Depth, D) video frame stores the distance (depth) value from the depth camera to each real point in the scene.
  • the real point is a point on the face of the target person.
  • the color video frame and the depth video frame are stored as two associated video frames, for example, the shooting time is used for association.
  • the color video frame and the depth video frame are stored as the same video frame, for example, the video frame includes four channels of R, G, B, and D at the same time.
  • the embodiment of the present application does not limit the specific storage mode of the color video frame and the depth video frame.
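  • For illustration, the following minimal Python sketch shows the two storage modes described above; the array shapes, timestamp and variable names are assumptions made for the example rather than requirements of this application.
```python
import numpy as np

# Illustrative frames standing in for real camera output: an 8-bit RGB color frame
# and a depth frame holding the distance from the depth camera to each point.
rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)    # H x W x (R, G, B)
depth_frame = np.zeros((480, 640), dtype=np.uint16)    # H x W, one distance per pixel

# Storage mode 1: two associated video frames linked by their shooting time.
frame_group = {"timestamp": 1700000000.0, "color": rgb_frame, "depth": depth_frame}

# Storage mode 2: a single frame with four channels R, G, B and D
# (the depth channel is rescaled to 8 bits here purely for illustration).
depth_8bit = (depth_frame >> 8).astype(np.uint8)
rgbd_frame = np.dstack([rgb_frame, depth_8bit])        # H x W x 4
```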
  • Step 302 Invoke the first living body detection unit to identify the color video frames in the n groups of video frames, and the first living body detection unit is an interactive living body detection unit.
  • the first living body detection unit includes a neural network model and a program unit
  • the program unit may be a calculation unit or an artificial intelligence classifier.
  • the neural network model includes a stacked hourglass network.
  • The neural network model in the first living body detection unit can identify the facial feature points of the target face in the color video frame and obtain the position coordinates of these feature points.
  • The facial feature points refer to the feature points corresponding to the locations of the facial features, such as the nose feature point.
  • the facial features include at least one of eyebrows, eyes, nose, mouth, and ears.
  • Schematically, the facial feature points include the left eye, right eye, nose, left mouth corner, and right mouth corner of the target face.
  • The calculation unit calculates the distance change of the facial feature points, or the artificial intelligence classifier classifies the distance change of the facial feature points, so as to determine whether the target face has completed the target action.
  • Step 303 Invoke the second living body detection unit to identify the depth video frames in the n groups of video frames, and the second living body detection unit is a three-dimensional structured light type living body detection unit.
  • the second living body detection unit includes a VGG-16 deep learning network.
  • the second living body detection unit can identify whether there is a living target human face in the depth video frame, and schematically, output the probability that the target human face belongs to a living human face. For example, if the probability output by the second living body detection unit is 96%, the depth video frame recognized by the second living body detection unit contains a living target human face.
  • Step 304: In response to the detection results of the first living body detection unit and the second living body detection unit both being the living body type, determine that the target face is a living target face.
  • If the detection result obtained by either living body detection unit is the non-living type, the target face is a non-living target face; if the detection results obtained by both units are the non-living type, the target face is likewise a non-living target face.
  • the first living body detection unit recognizes the target face in the color video frame to obtain the mouth feature points of the target face, and calculates the position change of the mouth feature points to conclude that the target face has completed the mouth opening action.
  • the second living body detection unit recognizes the target face in the depth video frame, and obtains that the probability that the target face belongs to the living target face is 98%. Therefore, the target face in the video frame composed of the above-mentioned color video frame and the depth video frame is a living target face.
  • The first living body detection unit and the second living body detection unit in the above-mentioned embodiments may be models constructed from any neural network.
  • In summary, the first living body detection unit can resist remake attacks and mask attacks, and the second living body detection unit can resist synthetic attacks and remake attacks, which protects the user's information security more comprehensively.
  • the following describes the training method of the neural network model in the first living body detection unit.
  • the neural network model is obtained by training as follows:
  • the training sample set includes multiple sets of sample face images and sample facial features.
  • The sample face image includes a face image contained in a photo or in n groups of video frames, and the sample facial feature points include the left eye, right eye, nose, left mouth corner, and right mouth corner of the sample face.
  • The sample facial feature points can be labeled manually, or the sample face image can be input into a model with feature point labeling capability to identify the feature points, or an existing public data set with labeled facial feature points can be used.
  • the predicted position coordinates of the nose feature points are (x1, y1), and the actual position coordinates of the nose feature points of the sample face are (x0, y0), and the two are compared to calculate the error loss.
  • an error loss function can be used to perform error calculation on the sample facial features and predicted facial features.
  • The error function can be a smooth L1 (one-norm) loss function, a Euclidean loss function, a normalized exponential (softmax) loss function, or another error loss function.
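  • As a concrete illustration of the error calculation between a labeled feature point and a predicted feature point, the following sketch uses the smooth L1 loss; the coordinate values are made up for the example.
```python
import torch
import torch.nn.functional as F

# Labeled nose feature point (x0, y0) and predicted point (x1, y1); values are illustrative.
target = torch.tensor([112.0, 140.0])   # (x0, y0)
pred = torch.tensor([115.5, 138.0])     # (x1, y1)

# Smooth L1 (one-norm) loss; a Euclidean (MSE) loss would be a drop-in alternative.
loss = F.smooth_l1_loss(pred, target)
print(float(loss))
```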
  • the error loss between the sample facial features of the sample face and the predicted facial features continues to decrease.
  • When the predicted facial feature points output by the first living body detection unit are consistent with the labeled sample facial feature points, training of the first living body detection unit is completed.
  • the training method of the second living body detection unit is described below.
  • the second living body detection unit is obtained by training in the following way:
  • the training sample set includes multiple sets of depth images of sample faces and live results of the sample faces.
  • the depth image of the sample face is acquired by a depth camera. This application does not limit how to obtain the depth image of the sample face.
  • the live results of the sample face include whether the sample face is a live face or a non-living face.
  • the live results of the sample face can be manually annotated or recognized by other models.
  • For example, the liveness result of the sample face is a living sample face (that is, the probability that the sample face belongs to a living face is 100%), while the second living body detection unit predicts that the probability that the sample face belongs to a living face is 95%; the two are compared to calculate the error loss.
  • an error loss function can be used to perform error calculation on the live result of the sample face and the probability that the sample face belongs to the live face.
  • The error function can be a smooth L1 (one-norm) loss function, a Euclidean loss function, a normalized exponential (softmax) loss function, or another error loss function.
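  • The following sketch illustrates one training step of a binary liveness classifier on depth images using a cross-entropy error and error back-propagation; the tiny stand-in model, batch shapes and labels are assumptions for the example and not the actual detection unit of this application.
```python
import torch
import torch.nn as nn

# Stand-in binary classifier over 224x224 depth images (not the real detection unit).
model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()   # one possible choice of error loss function

depth_batch = torch.rand(8, 1, 224, 224)   # sample depth images
labels = torch.randint(0, 2, (8,))         # 1 = living face, 0 = non-living face

optimizer.zero_grad()
loss = criterion(model(depth_batch), labels)
loss.backward()                            # error back-propagation
optimizer.step()                           # update the classifier's parameters
```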
  • the training method of the second living body detection unit may also be different from that of the first living body detection unit.
  • Fig. 4 shows a face recognition method based on artificial intelligence provided by another exemplary embodiment of the present application.
  • the method is applied to the terminal 120 in the computer system shown in Fig. 2 or other computer systems.
  • the method includes the following steps:
  • Step 401: Acquire n groups of input video frames, where at least one group of video frames includes a color video frame and a depth video frame of the target face, and n is a positive integer.
  • Step 401 is consistent with step 301 shown in FIG. 3, and will not be repeated here.
  • Step 402: Invoke the neural network model in the first living body detection unit to obtain the positions of the facial feature points on the color video frame.
  • the first living body detection unit includes a neural network model and a program unit.
  • the program unit may be a calculation unit or a program unit for classification, and the program unit for classification may be an artificial intelligence classifier.
  • the neural network model includes stacked hourglass neural networks.
  • Each stacked hourglass neural network includes a multi-scale bottom-up feature extraction encoder and a multi-scale top-down decoder.
  • the scaling is symmetrical.
  • The combination of symmetric multi-scale encoders and decoders can extract multi-scale features and finally output 106 heat maps (Heatmaps), where each heat map corresponds to one feature point and the final position of each feature point is the coordinate of the maximum-value point of its heat map.
  • the stacked hourglass neural network further refines the extracted feature points by stacking multiple hourglass networks.
  • the refined operation refers to the operation of accurately calculating the feature points to obtain the precise location of the feature points.
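  • The step from heat maps to coordinates can be sketched as follows, assuming the network outputs an array of shape (106, H, W); the random input simply stands in for real network output.
```python
import numpy as np

def heatmaps_to_landmarks(heatmaps: np.ndarray) -> np.ndarray:
    """Convert heat maps of shape (106, H, W) into (106, 2) landmark coordinates (x, y).

    Each feature point is taken as the coordinate of the maximum-value point of its heat map.
    """
    num_points, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_points, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

landmarks = heatmaps_to_landmarks(np.random.rand(106, 64, 64))
print(landmarks.shape)  # (106, 2)
```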
  • Step 403: Invoke the first living body detection unit to determine whether the target face completes the target action according to the distances between the facial feature points on the color video frame.
  • This step includes the following sub-steps:
  • Step 4031: Invoke the first living body detection unit to calculate, for the feature points belonging to the same facial feature, the ratio of the maximum lateral distance to the maximum longitudinal distance, where the facial features include at least one of the eyes and the mouth.
  • Schematically, the facial features include the eyes.
  • The eye feature points include the left eye corner feature point, the right eye corner feature point, the upper eyelid feature point, and the lower eyelid feature point.
  • the facial features are represented by white dots.
  • the maximum horizontal distance is the distance between the feature point of the left corner of the eye and the feature point of the right corner of the eye.
  • the maximum longitudinal distance is the distance between the feature point of the upper eyelid and the feature point of the lower eyelid.
  • The upper eyelid feature point is the feature point at the midpoint of the upper eyelid, and the lower eyelid feature point is the feature point at the midpoint of the lower eyelid.
  • the facial features include the mouth, and the mouth feature points include the left mouth corner feature point, the right mouth corner feature point, the upper lip feature point, and the lower lip feature point.
  • the mouth is represented by white dots.
  • The maximum horizontal distance of the mouth feature points is the distance between the left mouth corner and the right mouth corner, and the maximum vertical distance is the distance between the upper lip and the lower lip.
  • Optionally, the upper lip feature point is the feature point at the midpoint of the upper lip, that is, at the intersection of the two lip peaks, and the lower lip feature point is the feature point at the midpoint of the lower lip.
  • Step 4032 In response to the ratio reaching a preset condition, it is determined that the target face completes the target action.
  • the preset condition includes that the distance ratio reaches a distance threshold.
  • the target action includes at least one of the blinking action and the mouth opening action, and the determination of the blinking action and the mouth opening action are respectively described.
  • the first distance is the distance between the left eye corner feature point and the right eye corner feature point
  • the second distance is the distance between the upper eyelid feature point and the lower eyelid feature point.
  • the distance ratio includes the first distance 501 to the second distance 502, or the second distance 502 to the first distance 501.
  • The first distance 501 and the second distance 502 belong to the same eye; in this embodiment they are marked on two different eyes in the figure only for clarity of illustration.
  • the distance ratio of the left eye of the target face is a1
  • the distance ratio of the right eye is a2
  • the average value of a1 and a2 is calculated.
  • the first distance threshold may be a default setting, or a distance threshold dynamically adjusted according to different target faces, for example, the first distance threshold of an adult's face is greater than the first distance threshold of a child's face.
  • If the distance ratio is the first distance to the second distance, the average value of the distance ratio is required to be greater than the first distance threshold; if the distance ratio is the second distance to the first distance, the average value of the distance ratio is required to be less than the first distance threshold.
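  • A minimal sketch of this blink check follows, taking the distance ratio as the first distance over the second distance and averaging it over the two eyes; the landmark coordinates and the threshold value are illustrative assumptions only.
```python
import numpy as np

def eye_ratio(left_corner, right_corner, upper_lid, lower_lid) -> float:
    """Ratio of the first distance (eye corners) to the second distance (eyelid midpoints)."""
    first = np.linalg.norm(np.asarray(left_corner) - np.asarray(right_corner))
    second = np.linalg.norm(np.asarray(upper_lid) - np.asarray(lower_lid))
    return first / max(second, 1e-6)

def is_blinking(left_eye_pts, right_eye_pts, first_distance_threshold=4.0) -> bool:
    """Average the two eyes' ratios; closing the eyes makes the ratio large."""
    avg = (eye_ratio(*left_eye_pts) + eye_ratio(*right_eye_pts)) / 2.0
    return avg > first_distance_threshold

left_eye = [(10, 20), (40, 20), (25, 18), (25, 21)]    # corner, corner, upper lid, lower lid
right_eye = [(60, 20), (90, 20), (75, 18), (75, 21)]
print(is_blinking(left_eye, right_eye))                # True for these nearly closed eyes
```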
  • the output of the first living body detection unit is a video frame containing the target action.
  • a positive sample video frame means that the video frame is a video frame containing a living human face, and is opposite to an attack sample video frame.
  • the first living body detection unit recognizes a video frame containing a living body face, and the video frame can be used as a sample video frame for subsequent training of the neural network model in the first living body detection unit.
  • step S14 can be alternatively implemented as the following step 14:
  • Step 14: In response to the facial feature being the eyes and the ratio being identified as the first type by the first classifier, determine that the target face completes the blinking action.
  • the first living body detection unit further includes at least one of a first classifier and a second classifier.
  • the above-mentioned ratio can also be identified by a first classifier, which is a machine learning model that has the ability to identify the distance ratio of feature points of the five sense organs, such as a support vector machine.
  • the first type is the type corresponding to the blinking action.
  • the first classifier divides the input distance ratio into the type corresponding to the blinking action (that is, the first type) and the type corresponding to the non-blinking action.
  • the third distance is the distance between the feature point of the left corner of the mouth and the feature point of the right corner of the mouth
  • the fourth distance is the distance between the middle feature point of the upper lip and the middle feature point of the lower lip (The characteristic points are shown in white circles).
  • the distance ratio includes the third distance 503 to the fourth distance 504, or the fourth distance 504 to the third distance 503.
  • the third distance 503 is longer than the fourth distance 504 as an example.
  • the second distance threshold may be a default setting, or a distance threshold dynamically adjusted according to different target faces, for example, the second distance threshold of a man's face is greater than the second distance threshold of a woman's face.
  • If the distance ratio is the third distance to the fourth distance, the average value of the distance ratio is required to be less than the second distance threshold; if the distance ratio is the fourth distance to the third distance, the average value of the distance ratio is required to be greater than the second distance threshold.
  • the video frame is rejected in advance.
  • a positive sample video frame means that the video frame is a video frame containing a living human face, and is opposite to an attack sample video frame.
  • the second living body detection unit recognizes a video frame containing a living human face, and the video frame can be used as a sample video frame for subsequent training of the second living body detection unit.
  • step S23 can also be implemented as step 23 as follows:
  • Step 23: In response to the facial feature being the mouth and the ratio being identified as the second type by the second classifier, determine that the target face completes the mouth opening action.
  • the above-mentioned ratio can also be recognized by a second classifier, which is a machine learning model with the ability to recognize the distance ratio of the feature points of the five sense organs, such as a support vector machine.
  • the second type is the type corresponding to the mouth opening action.
  • the second classifier divides the input distance ratio into the type corresponding to the mouth opening action (that is, the second type) and the type corresponding to the mouth opening action not being performed.
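  • As an alternative to a fixed distance threshold, a classifier such as a support vector machine can label the distance ratio directly; the sketch below trains such a second classifier on made-up ratios (mouth width over lip separation) with made-up labels, purely for illustration.
```python
import numpy as np
from sklearn.svm import SVC

# Made-up distance ratios (third distance / fourth distance); small values ~ open mouth.
ratios = np.array([[3.5], [3.2], [1.1], [0.9], [2.8], [1.0]])
labels = np.array([0, 0, 1, 1, 0, 1])   # 1 = second type (mouth opening), 0 = not opening

second_classifier = SVC(kernel="rbf")
second_classifier.fit(ratios, labels)

print(second_classifier.predict([[1.05]]))  # expected [1] (mouth opening) on this toy data
```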
  • Step 404: Invoke the second living body detection unit to identify the depth video frames in the n groups of video frames, where the second living body detection unit is a three-dimensional structured light living body detection unit.
  • Step 405: In response to the detection results of the first living body detection unit and the second living body detection unit both being the living body type, determine that the target face is a living target face.
  • Otherwise, the output detection result is that the target face is a non-living target face.
  • This step includes the following sub-steps:
  • Step 4051 Obtain the first number of frames and the second number of frames.
  • The first frame number is the number of color video frames in which the first living body detection unit has identified the target action.
  • The second frame number is the number of depth video frames identified by the second living body detection unit as matching the depth information corresponding to a living face.
  • Step 4052 In response to the first frame number being greater than the first preset threshold and the second frame number being greater than the second preset threshold, it is determined that the target face is a living target face.
  • That is, when the detection results of both units are living faces, the number of color video frames containing the target action is greater than the first preset threshold, and the number of depth video frames conforming to the depth information of a living face is greater than the second preset threshold, it is determined that the target face is a living target face.
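  • The decision can be sketched as a simple frame-counting check, with per-frame boolean results from the two detection units as inputs; the flag lists and threshold values below are invented for the example.
```python
def is_live_face(action_flags, depth_flags, first_threshold=3, second_threshold=3) -> bool:
    """action_flags[i]: the first unit found the target action in color frame i;
    depth_flags[i]: the second unit matched living-face depth information in depth frame i."""
    first_frame_count = sum(action_flags)     # frames containing the target action
    second_frame_count = sum(depth_flags)     # frames matching living-face depth information
    return first_frame_count > first_threshold and second_frame_count > second_threshold

print(is_live_face([1, 1, 0, 1, 1], [1, 1, 1, 1, 0]))  # True: both counts exceed their thresholds
```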
  • the living body detection unit in the above embodiment may be any neural network.
  • In summary, the method provided in this embodiment combines the first living body detection unit and the second living body detection unit, and determines whether the target face has completed the target action by calculating the distance ratio of the facial feature points, without the need to annotate action data.
  • the first living body detection unit can resist remake attacks and mask attacks, and the second living body detection unit can resist synthetic attacks and remake attacks, which can more comprehensively protect the user's information security.
  • Before the first living body detection unit and the second living body detection unit recognize the video frames, it is necessary to determine whether the video frames contain a human face.
  • FIG. 6 is a flowchart of a face recognition method based on artificial intelligence combined with preprocessing provided by an exemplary embodiment of the present application.
  • the method is applied to the terminal 120 in the computer system shown in FIG. 2 or other computer systems, The method includes the following steps:
  • Step 601 Invoke a face preprocessing unit to recognize color video frames in n groups of video frames.
  • the face preprocessing unit is a machine learning unit with face recognition capabilities.
  • the face preprocessing unit may be a multi-task convolutional neural network (Multi-Task Convolutional Neural Network, MTCNN).
  • the MTCNN network is cascaded by three sub-networks, including Proposal Network (P-Net), Refine Network (R-Net) and Output Network (O-Net).
  • Fig. 7 shows a flowchart of a method for detecting facial features by an MTCNN network according to an exemplary embodiment of the present application, and the method includes:
  • Step 1 Obtain the image color video frame.
  • Step 2: The proposal network takes the color video frame as input and generates a series of candidate regions that may contain human faces.
  • Step 3: The refine network refines the large number of candidate regions generated by the proposal network and filters out the regions that do not contain human faces.
  • Step 4 The output network outputs the face area and locates the facial features.
  • the facial features include left and right eyes, nose tip, and left and right mouth corners.
  • Step 5 Obtain the face detection frame and facial features.
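  • The cascade of steps 1 to 5 can be sketched as follows; the three sub-network callables and the toy stand-ins in the usage example are placeholders rather than a real MTCNN implementation.
```python
def mtcnn_preprocess(color_frame, p_net, r_net, o_net):
    """Sketch of the P-Net / R-Net / O-Net cascade; the three networks are placeholders."""
    # Step 2: the proposal network (P-Net) generates candidate regions that may contain faces.
    candidates = p_net(color_frame)
    # Step 3: the refine network (R-Net) scores candidates and filters out non-face regions.
    refined = [box for box in candidates if r_net(color_frame, box) > 0.5]
    # Steps 4-5: the output network (O-Net) returns the face box and the five facial
    # feature points (left/right eye, nose tip, left/right mouth corner).
    return [o_net(color_frame, box) for box in refined]

# Toy stand-ins so the sketch runs end to end; a real MTCNN would replace them.
faces = mtcnn_preprocess(
    color_frame="frame",
    p_net=lambda frame: [(0, 0, 64, 64), (10, 10, 96, 96)],
    r_net=lambda frame, box: 0.9,
    o_net=lambda frame, box: {"box": box, "landmarks": [(0, 0)] * 5},
)
print(len(faces))  # 2 candidate regions survive in this toy example
```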
  • Step 602: Acquire m groups of video frames whose recognition result is that a face area exists, where m≤n and m is a positive integer.
  • the output network outputs m groups of video frames containing the face area.
  • The m groups of video frames may be as many as the input n groups of video frames, or fewer. The output network also locates the facial feature points on the face and outputs their position coordinates.
  • Step 603 In response to m being less than the third preset threshold, filter out n groups of video frames.
  • the n groups of video frames are rejected in advance.
  • Step 604 Send the color video frames in the m groups of video frames to the first living body detection unit, and send the depth video frames in the m groups of video frames to the second living body detection unit.
  • the pre-processed video frames are respectively sent to the first living body detection unit and the second living body detection unit.
  • the first living body detection unit recognizes the color video frame
  • the second living body detection unit recognizes the depth video frame.
  • a two-class deep learning model is trained to determine whether the input deep video frame is a real living human face.
  • the two-class deep learning model is the VGG-16 deep learning network, and the structure of the VGG-16 deep learning network is shown in Figure 8.
  • In Figure 8, 3×3 indicates the size of the convolution kernel, the number after each convolution represents the number of output channels, and the number in each fully connected layer is the number of output channels.
  • The depth video frame is first scaled to a size of 224×224 pixels, and the scaled video frame is then used as the input of the VGG-16 deep learning network.
  • After a series of convolution (Convolution), linear rectification activation (ReLU Activation), fully-connected (Fully-Connected) and normalized exponential (Softmax) operations, the VGG-16 deep learning network outputs whether the depth video frame contains a real living human face.
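  • A hedged sketch of such a two-class VGG-16 using torchvision follows (a recent torchvision version is assumed); replacing only the final fully connected layer and repeating the depth channel to three channels are conventions chosen for the example, not details mandated by this application.
```python
import torch
import torch.nn as nn
from torchvision import models

# VGG-16 backbone with the last fully connected layer replaced by a 2-way output
# (living face vs. non-living face).
vgg16 = models.vgg16(weights=None)
vgg16.classifier[6] = nn.Linear(4096, 2)

# A depth video frame scaled to 224x224; VGG-16 expects 3 input channels, so the
# single depth channel is repeated here.
depth_frame = torch.rand(1, 1, 224, 224).repeat(1, 3, 1, 1)

probs = torch.softmax(vgg16(depth_frame), dim=1)
print(probs[0, 1].item())  # probability that the frame contains a living face
```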
  • the deep neural network in the foregoing embodiment may be any neural network.
  • In summary, in the method provided in this embodiment, before the video frames are detected by the first living body detection unit and the second living body detection unit, they are recognized by the face preprocessing unit, and the video frames that do not contain a human face are filtered out. This improves the accuracy of the two living body detection units in detecting living faces and also improves the detection efficiency.
  • the following describes the entire process of the live face detection system identifying the live face in the video frame.
  • the first living body detection unit as the interactive living body detection model
  • the second living body detection unit as the three-dimensional structured light detection model
  • the face preprocessing unit as the face detection preprocessing model as an example for description.
  • FIG. 9 shows a face recognition method based on artificial intelligence provided by an exemplary embodiment of the present application.
  • the method is applied to the terminal 120 in the computer system shown in FIG. 2 or other computer systems.
  • the method includes the following steps:
  • Step 801 Input the color video frame into the face detection preprocessing model.
  • Step 802 Invoke the face detection preprocessing model to detect the face.
  • Step 803 Determine whether the number of video frames containing a human face is greater than n1.
  • The face detection preprocessing model 10 is used to determine whether the input video frames contain a human face. If the number of video frames containing a human face is greater than n1 (the third preset threshold), go to step 806 and step 810; otherwise, go to step 804.
  • Step 804 the input video frame is an attack sample, and the input of the video frame is rejected.
  • Step 805 Input the depth video frame into the three-dimensional structured light inspection model.
  • The three-dimensional structured light detection model 11 detects the depth video frames among the video frames.
  • Step 806: The depth video frames containing a face are retained according to the color video frame detection result, and the face area is cropped.
  • the three-dimensional structured light detection model 11 can determine the face area in the depth video frame.
  • Step 807 Determine whether it is a paper attack through the lightweight classifier.
  • a paper attack refers to an attack formed by a human face on a flat-shaped medium, such as a sample attack formed by a human face on a photo, screen image, ID card, and newspaper. That is, the lightweight classifier can determine whether the target face is a face with a three-dimensional structure.
  • Step 808: Determine whether the number of non-paper-attack video frames is greater than n2 (the second preset threshold).
  • Step 809 The input video frame is an attack sample, and the input of the video frame is rejected.
  • Step 810: The interactive living body detection model is invoked to detect the facial feature points.
  • Step 811 Calculate the aspect ratio of the feature points of the eyes or mouth, and determine whether to perform a blinking or mouth opening action.
  • The aspect ratio of the eye or mouth feature points is the distance ratio described above.
  • the target face can also perform actions such as turning the head or nodding.
  • Step 812 Determine whether the number of video frames containing the blinking or mouth opening motion is greater than n3.
  • The facial feature points of the target face are determined through the interactive living body detection model 12, and the movement changes of these feature points are used to determine whether the target face has completed the target action.
  • Taking the target action of blinking or mouth opening as an example, if the number of video frames containing the blinking or mouth opening action is greater than n3 (the first preset threshold), go to step 814; otherwise, go to step 813.
  • Step 813 The input video frame is an attack sample, and the input of the video frame is rejected.
  • Step 814: The input video frames are positive sample video frames, and the input video frames pass the verification.
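  • The overall flow of steps 801 to 814 can be sketched as the function below; the three model callables, the toy stand-ins and the threshold values n1, n2 and n3 are placeholders for illustration.
```python
def live_face_pipeline(groups, has_face, is_paper_attack, has_target_action, n1, n2, n3):
    """groups: list of (color_frame, depth_frame); the three callables stand in for the
    face detection preprocessing model, the lightweight paper-attack classifier on depth
    frames, and the interactive living body detection model."""
    faces = [(c, d) for c, d in groups if has_face(c)]                 # steps 801-803
    if len(faces) <= n1:
        return "reject: attack sample"                                 # step 804
    non_paper = [d for _, d in faces if not is_paper_attack(d)]        # steps 805-808
    if len(non_paper) <= n2:
        return "reject: attack sample"                                 # step 809
    action_frames = [c for c, _ in faces if has_target_action(c)]      # steps 810-812
    if len(action_frames) <= n3:
        return "reject: attack sample"                                 # step 813
    return "pass: positive sample video frames"                        # step 814

result = live_face_pipeline(
    groups=[("c1", "d1"), ("c2", "d2"), ("c3", "d3")],
    has_face=lambda c: True,
    is_paper_attack=lambda d: False,
    has_target_action=lambda c: True,
    n1=1, n2=1, n3=1,
)
print(result)  # pass: positive sample video frames
```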
  • the positive sample video frame contains a living human face, and the positive sample video frame can be used as a training sample for training the neural network model in the first living body detection unit.
  • the deep neural network in the foregoing embodiment may be any neural network.
  • the method provided in this embodiment combines the first living body detection unit and the second living body detection unit to calculate the distance ratio of the facial features to determine whether the target face has completed the target action.
  • the depth information of the face is used to determine whether the target face is a paper attack.
  • In this way, it is determined whether the video frames contain a living target face: the first living body detection unit can resist remake attacks and mask attacks, and the second living body detection unit can resist synthetic attacks and remake attacks, which protects the user's information security more comprehensively.
  • Fig. 10 shows a block diagram of an artificial intelligence-based face recognition device provided by an exemplary embodiment of the present application.
  • the device includes:
  • the obtaining module 1010 is configured to obtain n groups of input video frames, where at least one group of video frames includes a color video frame and a depth video frame of the target face, and n is a positive integer;
  • the first living body detection unit 1020 is used to identify color video frames in the n groups of video frames, and the first living body detection unit is an interactive living body detection unit;
  • the second living body detection unit 1030 is configured to identify the depth video frames in the n groups of video frames, and the second living body detection unit is a three-dimensional structured light type living body detection unit;
  • the processing module 1040 is configured to determine that the target human face is a living target human face in response to the detection results of the first living body detection unit and the second living body detection unit being both living body types.
  • the first living body detection unit 1020 is configured to call a neural network model to obtain the position of the facial features feature points on the color video frame;
  • the first living body detection unit 1020 is configured to determine whether the target human face completes the target action according to the distance of the facial features points on the color video frame.
  • the device includes a calculation module 1050;
  • the calculation module 1050 is configured to call the first living body detection unit to calculate, for the feature points belonging to the same facial feature, the ratio of the maximum lateral distance to the maximum longitudinal distance, where the facial features include at least one of the eyes and the mouth;
  • the processing module 1040 is configured to determine that the target face completes the target action in response to the ratio reaching a preset condition.
  • Optionally, the processing module 1040 is configured to determine that the target face completes the blinking action in response to the facial feature being the eyes and the ratio reaching the first distance threshold, and to determine that the target face completes the mouth opening action in response to the facial feature being the mouth and the ratio reaching the second distance threshold.
  • the facial feature points include eye feature points, and the eye feature points include a left eye corner feature point, a right eye corner feature point, an upper eyelid feature point, and a lower eyelid feature point;
  • the acquisition module 1010 is used to acquire a first distance and a second distance, where the first distance is the distance between the left eye corner feature point and the right eye corner feature point, and the second distance is the distance between the upper eyelid feature point and the lower eyelid feature point;
  • the calculation module 1050 is configured to calculate the distance ratio between the first distance and the second distance;
  • the calculation module 1050 is configured to calculate the average value of the distance ratio of the two eyes on the target face; in response to the average value being greater than the first distance threshold, it is determined that the target face completes the blinking action.
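For concreteness, a minimal NumPy sketch of the blink check just described is given below; it assumes the landmark model has already returned pixel coordinates for the four eye feature points of each eye, and the value of `first_distance_threshold` is purely illustrative (the application does not fix a number).

```python
import numpy as np

def eye_ratio(left_corner, right_corner, upper_lid, lower_lid):
    """Ratio of the maximum lateral distance to the maximum longitudinal distance of one eye."""
    first_distance = np.linalg.norm(np.asarray(left_corner) - np.asarray(right_corner))   # eye width
    second_distance = np.linalg.norm(np.asarray(upper_lid) - np.asarray(lower_lid))       # eye opening
    return first_distance / max(second_distance, 1e-6)   # guard against a fully closed eye

def is_blinking(left_eye_points, right_eye_points, first_distance_threshold=5.0):
    """Average the two eyes' ratios; a large ratio means the eyelids are (nearly) closed.

    Each *_eye_points argument is (left_corner, right_corner, upper_lid, lower_lid) in pixels.
    The default threshold is an illustrative value only.
    """
    ratios = [eye_ratio(*left_eye_points), eye_ratio(*right_eye_points)]
    return float(np.mean(ratios)) > first_distance_threshold
```

Since the ratio grows as the eyelids close, comparing the two-eye average against the threshold follows the rule stated above; a practical system would typically watch the ratio over consecutive frames so that a blink is counted only when the eyes close and then reopen.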
  • the first living body detection unit 1020 further includes at least one of a first classifier and a second classifier. The processing module 1040 is configured to determine that the target face completes the blinking action in response to the facial part being the eyes and the ratio being identified as the first type by the first classifier, and to determine that the target face completes the mouth-opening action in response to the facial part being the mouth and the ratio being identified as the second type by the second classifier.
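Where the fixed thresholds are replaced by the first and second classifiers, the description elsewhere mentions a support vector machine as one possible choice. The scikit-learn sketch below is a toy illustration of that option; the ratio values, labels and query value are entirely made up.

```python
import numpy as np
from sklearn.svm import SVC

# Toy training data (purely illustrative): one feature per frame = eye distance ratio,
# label 1 = frames in which a blink occurs, label 0 = frames without a blink.
ratios = np.array([[2.8], [3.0], [3.1], [6.5], [7.2], [8.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

first_classifier = SVC(kernel="rbf").fit(ratios, labels)

# The "first type" corresponds to the blink action; an analogous second classifier
# would be trained on mouth distance ratios for the mouth-opening action.
is_first_type = first_classifier.predict([[6.9]])[0] == 1
```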
  • the device includes a training module 1060, and the neural network model is obtained by training in the following manner:
  • the acquiring module 1010 is configured to acquire a training sample set, the training sample set including multiple groups of sample face images and sample facial feature points;
  • the first living body detection unit 1020 is configured to recognize the sample face images to obtain the predicted facial feature points of the sample faces;
  • the calculation module 1050 is configured to compare the sample facial feature points of the sample faces with the predicted facial feature points and calculate the error loss;
  • the training module 1060 is used to train the first living body detection unit according to the error loss through the error back propagation algorithm to obtain the trained first living body detection unit.
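The training procedure is specified only at the level of "predict, compare with the annotations, compute an error loss, back-propagate"; the smooth L1 loss used below is one of the loss functions the description lists as options. The PyTorch snippet is a generic sketch of a single training step, not the applicant's actual training code.

```python
import torch
import torch.nn as nn

def train_step(landmark_model: nn.Module,
               optimizer: torch.optim.Optimizer,
               sample_images: torch.Tensor,       # (N, 3, H, W) sample face images
               sample_landmarks: torch.Tensor):   # (N, K, 2) annotated facial feature points
    """One optimisation step: predict the feature points, compute the error loss, back-propagate."""
    predicted = landmark_model(sample_images).view_as(sample_landmarks)
    loss = nn.functional.smooth_l1_loss(predicted, sample_landmarks)  # one of the listed loss options
    optimizer.zero_grad()
    loss.backward()   # error back-propagation
    optimizer.step()
    return loss.item()
```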
  • the device includes a face preprocessing unit 1070;
  • the face preprocessing unit 1070 is configured to recognize color video frames in the n groups of video frames, and the face preprocessing unit 1070 is a machine learning model with face recognition capability;
  • the acquiring module 1010 is configured to acquire m groups of video frames with a face region as a recognition result, m ⁇ n, and m is a positive integer;
  • the processing module 1040 is configured to send the color video frames in the m groups of video frames to the first living body detection unit, and send the depth video frames in the m groups of video frames to the second living body detection unit.
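A small sketch of this gating-and-routing step, assuming a hypothetical `face_detector` callable (for example an MTCNN-style detector) that returns a face box or None for a color frame:

```python
def preprocess_and_route(frame_groups, face_detector, third_preset_threshold):
    """Keep the m groups whose color frame contains a face, then split the color/depth streams.

    `face_detector(color_frame)` is a hypothetical callable returning a face box or None.
    Returns (color_frames, depth_frames), or None when the whole input should be filtered
    out because m is below the third preset threshold.
    """
    kept = [(c, d) for c, d in frame_groups if face_detector(c) is not None]
    if len(kept) < third_preset_threshold:       # m < third preset threshold
        return None                              # filter out all n groups of video frames
    color_frames = [c for c, _ in kept]          # go to the first living body detection unit
    depth_frames = [d for _, d in kept]          # go to the second living body detection unit
    return color_frames, depth_frames
```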
  • the acquiring module 1010 is configured to acquire a first frame number and a second frame number, where the first frame number is the number of color video frames, identified by the first living body detection unit 1020, that contain the target action, and the second frame number is the number of depth video frames, identified by the second living body detection unit 1030, that match the depth information corresponding to a live face;
  • the processing module 1040 is configured to determine that the target face is a living target face in response to that the first frame number is greater than the first preset threshold and the second frame number is greater than the second preset threshold.
  • the processing module 1040 is configured to determine that the target face is a non-living target face and filter out the n groups of video frames in response to the first frame number being less than the first preset threshold; or, to determine that the target face is a non-living target face and filter out the n groups of video frames in response to the second frame number being less than the second preset threshold.
  • the processing module 1040 is configured to filter out n groups of video frames in response to m being less than a third preset threshold.
  • Fig. 11 shows a schematic structural diagram of a server provided by an exemplary embodiment of the present application.
  • the server may be a server in the background server cluster 140. Specifically:
  • the server 1100 includes a central processing unit (CPU, Central Processing Unit) 1101, a system memory 1104 including a random access memory (RAM, Random Access Memory) 1102 and a read-only memory (ROM, Read Only Memory) 1103, and a system bus 1105 connecting the system memory 1104 and the central processing unit 1101.
  • the server 1100 also includes a basic input/output system (I/O system, Input Output System) 1106 that helps transfer information between various devices in the computer, and a mass storage device 1107 for storing an operating system 1113, application programs 1114, and other program modules 1115.
  • the basic input/output system 1106 includes a display 1108 for displaying information and an input device 1109 such as a mouse and a keyboard for the user to input information.
  • the display 1108 and the input device 1109 are both connected to the central processing unit 1101 through the input and output controller 1110 connected to the system bus 1105.
  • the basic input/output system 1106 may also include an input and output controller 1110 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 1110 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 1107 is connected to the central processing unit 1101 through a mass storage controller (not shown) connected to the system bus 1105.
  • the mass storage device 1107 and its associated computer readable medium provide non-volatile storage for the server 1100. That is, the mass storage device 1107 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read Only Memory (CD-ROM, Compact Disc Read Only Memory) drive.
  • Computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, Digital Versatile Disc (DVD) or Solid State Drives (SSD), other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • the random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • According to various embodiments of the present application, the server 1100 may also run on a remote computer connected to a network such as the Internet. That is, the server 1100 can be connected to the network 1112 through the network interface unit 1111 connected to the system bus 1105, or in other words, the network interface unit 1111 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the foregoing memory also includes one or more programs, and one or more programs are stored in the memory and configured to be executed by the CPU.
  • In an optional embodiment, a computer device is provided that includes a processor and a memory.
  • the memory stores at least one instruction, at least one program, a code set, or an instruction set;
  • the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the above-mentioned artificial intelligence-based face recognition method.
  • a computer-readable storage medium is also provided, which stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the above-mentioned artificial intelligence-based face recognition method.
  • FIG. 12 shows a structural block diagram of a computer device 1200 provided by an exemplary embodiment of the present application.
  • the computer device 1200 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, or an MP4 (Moving Picture Experts Group Audio Layer IV) player.
  • the computer device 1200 may also be called user equipment, portable terminal, and other names.
  • the computer device 1200 includes a processor 1201 and a memory 1202.
  • the processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 1201 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 1201 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 1201 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used to render and draw content that needs to be displayed on the display screen.
  • the processor 1201 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 1202 may include one or more computer-readable storage media, which may be tangible and non-transitory.
  • the memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1201 to implement the artificial intelligence-based face recognition method provided in this application.
  • the computer device 1200 may optionally further include: a peripheral device interface 1203 and at least one peripheral device.
  • the peripheral device includes: at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera component 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
  • the peripheral device interface 1203 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202.
  • In some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202, and the peripheral device interface 1203 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 1204 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network.
  • the radio frequency circuit 1204 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the touch screen 1205 is used to display UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the touch display screen 1205 also has the ability to collect touch signals on or above the surface of the touch display screen 1205.
  • the touch signal may be input to the processor 1201 as a control signal for processing.
  • the touch display screen 1205 is used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • the touch display screen 1205 may be a flexible display screen, which is arranged on the curved surface or the folding surface of the computer device 1200. Furthermore, the touch screen 1205 can also be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • the touch display screen 1205 can be made of materials such as LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode, organic light-emitting diode).
  • the camera assembly 1206 is used to capture images or videos.
  • the camera assembly 1206 includes a front camera and a rear camera.
  • Generally, the front camera is used for video calls or selfies, and the rear camera is used for taking photos or videos.
  • the camera assembly 1206 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • the audio circuit 1207 is used to provide an audio interface between the user and the computer device 1200.
  • the audio circuit 1207 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 1201 for processing, or input to the radio frequency circuit 1204 to implement voice communication.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 1201 or the radio frequency circuit 1204 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • When the speaker is a piezoelectric ceramic speaker, it can not only convert the electrical signal into sound waves audible to humans, but also convert the electrical signal into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 1207 may also include a headphone jack.
  • the positioning component 1208 is used to locate the current geographic location of the computer device 1200 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 1208 may be a positioning component based on the GPS (Global Positioning System, Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of Russia.
  • the power supply 1209 is used to supply power to various components in the computer device 1200.
  • the power source 1209 may be alternating current, direct current, disposable batteries, or rechargeable batteries.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery charged through a wired line
  • a wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • the computer device 1200 further includes one or more sensors 1210.
  • the one or more sensors 1210 include, but are not limited to: an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215, and a proximity sensor 1216.
  • the acceleration sensor 1211 detects the magnitude of acceleration on the three coordinate axes of the coordinate system established by the computer device 1200. For example, the acceleration sensor 1211 is used to detect the components of gravitational acceleration on three coordinate axes.
  • the processor 1201 may control the touch display screen 1205 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1211.
  • the acceleration sensor 1211 may be used for the collection of game or user motion data.
  • the gyroscope sensor 1212 can detect the body direction and rotation angle of the computer device 1200, and the gyroscope sensor 1212 and the acceleration sensor 1211 can collect the user's 3D actions on the computer device 1200 together.
  • the processor 1201 can implement the following functions according to the data collected by the gyroscope sensor 1212: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1213 may be arranged on the side frame of the computer device 1200 and/or the lower layer of the touch screen 1205.
  • When the pressure sensor 1213 is arranged on the side frame of the computer device 1200, it can detect the user's holding signal on the computer device 1200, and left/right hand recognition or quick operations can be performed according to the holding signal.
  • When the pressure sensor 1213 is arranged at the lower layer of the touch display screen 1205, the operability controls on the UI can be controlled according to the user's pressure operation on the touch display screen 1205.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 1214 is used to collect the user's fingerprint to identify the user's identity according to the collected fingerprint. When it is recognized that the user's identity is a trusted identity, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 1214 may be provided on the front, back or side of the computer device 1200. When the computer device 1200 is provided with a physical button or a manufacturer logo, the fingerprint sensor 1214 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 1215 is used to collect the ambient light intensity.
  • the processor 1201 may control the display brightness of the touch display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1205 is decreased.
  • the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
  • the proximity sensor 1216, also called a distance sensor, is usually arranged on the front of the computer device 1200.
  • the proximity sensor 1216 is used to collect the distance between the user and the front of the computer device 1200.
  • When the proximity sensor 1216 detects that the distance between the user and the front of the computer device 1200 gradually decreases, the processor 1201 controls the touch display screen 1205 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1216 detects that the distance between the user and the front of the computer device 1200 gradually increases, the processor 1201 controls the touch display screen 1205 to switch from the off-screen state to the bright-screen state.
  • Those skilled in the art can understand that the structure shown in FIG. 12 does not constitute a limitation on the computer device 1200, which may include more or fewer components than those shown in the figure, combine certain components, or adopt a different component arrangement.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the artificial intelligence-based face recognition method provided in the foregoing optional implementation manner.

Abstract

An artificial intelligence-based face recognition method, apparatus, device and medium, relating to the field of computer vision. The method includes: obtaining n groups of input video frames, at least one group of video frames including a color video frame and a depth video frame of a target face, n being a positive integer (301); calling a first living body detection unit to recognize the color video frames in the n groups of video frames, the first living body detection unit being an interactive living body detection unit (302); calling a second living body detection unit to recognize the depth video frames in the n groups of video frames, the second living body detection unit being a three-dimensional structured light type living body detection unit (303); and in response to the detection results of the first living body detection unit and the second living body detection unit both being the living body type, determining that the target face is a live target face (304).

Description

基于人工智能的人脸识别方法、装置、设备及介质
本申请要求于2020年01月22日提交的申请号为202010075684.7、发明名称为“基于人工智能的人脸识别方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机视觉技术领域,特别涉及一种基于人工智能的人脸识别方法、装置、设备及介质。
背景技术
人脸活体检测技术是指通过特定的检测方式,使用人脸关键点定位和人脸追踪等技术,验证用户的操作是否为真实的活体本人的操作。
相关技术中,采用三维(3Dimensions,3D)结构光进行人脸活体验证,利用结构光摄像头向目标发出间隔一致的条纹状光线,若目标是真实的活体人脸,由于人脸的三维结构,会使反射回来的条纹状光线必然发生间隔不一致的情况;反之,其反射回来的结构光间隔一致。
上述人脸活体检测的方式不能有效防御攻击类型为合成攻击和翻拍攻击的线上人脸验证攻击,易于威胁用户的信息安全。
发明内容
本申请实施例提供了一种基于人工智能的人脸识别方法、装置设备及介质,可防御攻击类型为合成攻击、翻拍攻击和面具攻击的线上人脸验证攻击,保护了用户的信息安全,所述技术方案如下:
根据本申请的一个方面,提供了一种基于人工智能的人脸识别方法,应用于计算机设备中,所述方法包括:
获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数;
调用第一活体检测部对所述n组视频帧中的所述彩色视频帧进行识别,所 述第一活体检测部是交互式活体检测部;
调用第二活体检测部对所述n组视频帧中的所述深度视频帧进行识别,所述第二活体检测部是三维结构光式活体检测部;
响应于所述第一活体检测部和所述第二活体检测部的检测结果均为活体类型,确定所述目标人脸为活体目标人脸。
根据本申请的另一方面,提供了一种基于人工智能的人脸识别装置,所述装置包括:
获取模块,用于获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数;
第一活体检测部,用于对所述n组视频帧中的所述彩色视频帧进行识别,所述第一活体检测部是交互式活体检测部;
第二活体检测部,用于对所述n组视频帧中的所述深度视频帧进行识别,所述第二活体检测部是三维结构光式活体检测部;
处理模块,用于响应于所述第一活体检测部和所述第二活体检测部的检测结果均为活体类型,确定所述目标人脸为活体目标人脸。
根据本申请的另一方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、至少一段程序、代码集或指令集由所述处理器加载并执行以实现如上方面所述的基于人工智能的人脸识别方法。
根据本申请的另一方面,提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行,以实现如上方面所述的基于人工智能的人脸识别方法。
根据本申请的另一方面,提供一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行如上方面所述的基于人工智能的人脸识别方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
通过将第一活体检测部和第二活体检测部进行结合,当两种活体检测部对 视频帧中的目标人脸的检测结果均为活体类型时,该视频帧中含有活体目标人脸。第一活体检测部可抵御翻拍攻击和面具攻击,第二活体检测部可抵御合成攻击和翻拍攻击,能够更全面地保障用户的信息安全。
附图说明
图1是本申请一个示例性实施例提供的对不同类型的攻击的活体人脸检测的示意图;
图2是本申请一个示例性实施例提供的计算机系统的结构示意图;
图3是本申请一个示例性实施例提供的基于人工智能的人脸识别方法的流程图;
图4是本申请另一个示例性实施例提供的基于人工智能的人脸识别方法的流程图;
图5是本申请一个示例性实施例提供的人脸特征点的示意图;
图6是本申请一个示例性实施例提供的结合人脸预处理过程的基于人工智能的人脸识别方法的流程图;
图7是本申请一个示例性实施例提供的通过MTCNN算法进行人脸检测的流程图;
图8是本申请一个示例性实施例提供的VGG-16深度学习网络的结构框图;
图9是本申请一个示例性实施例提供的结合活体人脸检测系统的基于人工智能的人脸识别方法的流程图;
图10是本申请一个示例性实施例提供的基于人工智能的人脸识别装置的结构框图;
图11是本申请一个示例性实施例提供的服务器的结构框图;
图12是本申请一个示例性实施例提供的计算机设备的结构示意图。
具体实施方式
首先,对本申请实施例中涉及的名词进行介绍:
AI(Artificial Intelligence,人工智能)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似 的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服、身份验证、活体人脸识别等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
计算机视觉技术(Computer Vision,CV):计算机视觉是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR(Optical Character Recognition,光学字符识别)、视频处理、视频语义理解、视频内容/行为识别、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
本申请实施例提供的方案涉及活体人脸检测技术领域,通过第一活体检测部和第二活体检测部均识别为活体人脸时,确定所检测的视频帧中的目标人脸是活体人脸。当第一活体检测部和第二活体检测部中至少有一个检测部的识别结果是非活体人脸时,确定所检测的视频帧中的目标人脸不是活体人脸。
相关技术中,通过动态交互验证和3D结构光人脸活体验证两种方式区分目标人脸是否为活体人脸。
动态交互验证指的是:用户需要根据系统指示做出相应动作,譬如眨眼、张嘴、读文字或数字、转头等。3D结构光人脸活体验证指的是:利用结构光摄像头向目标发出间隔一致的条纹状光线,根据反射回来的条纹状光线是否间隔 一致,判断目标是否为真实的活体人脸。
而针对上述两种验证方式存在的主要攻击方式如下:
1、合成攻击,即通过基于深度学习的人脸合成技术来合成他人人脸作为攻击样本,攻击者可以控制所合成的他人人脸做出一系列指定动作。
2、翻拍攻击,即通过翻拍身份证、照片、打印纸、其他播放设备所播放的视频等。
3、面具攻击,即通过佩戴按他人容貌所制作的面具来进行攻击。
因为合成攻击与翻拍攻击没有提供有效的3D深度信息,通过3D结构光人脸活体验证的方式能够有效防御合成攻击和翻拍攻击,但3D结构光人脸活体验证的方式不能防御面具攻击。
因为翻拍攻击和面具攻击中攻击者很难做出要求的动作或动作序列,通过交互式活体检测方式能够有效防御翻拍攻击和面具攻击,但交互式活体检测方式不能有效防御合成攻击。同时,通过交互式活体检测方式需要训练相关模型识别出用户做出的动作,需要对动作数据进行标注,在标注数据的过程中,不但要被采集者根据提示做出相应动作,同时还需要记录每个动作所需的时间,增加了数据采集的难度。
基于上述情况,本申请实施例提供了一种基于人工智能的人脸识别方法,将上述两种活体检测方式结合在一起。通过第一活体检测部和第二活体检测部来识别镜头前的人脸是否为其他媒介(如图片、视频、照片、打印纸、身份证等)所对应的画面来判断其是否为活体,提高了活体验证的准确率,从而保证了用户的信息安全。
图1示出了本申请一个示例性实施例提供的对不同类型的攻击的活体人脸检测的示意图。以终端是智能手机为例,在智能手机中运行有支持识别活体人脸的程序,该程序中包括第一活体检测部和第二活体检测部,目标对象通过智能手机拍摄含有目标对象的面部的视频帧,智能手机识别目标人脸是否为活体目标人脸。
如图1的(a)所示,第一目标对象是活体,第一目标人脸是活体人脸,第一活体检测部通过彩色视频帧可识别第一目标对象完成的动作,第二活体检测部通过深度视频帧可识别第一目标人脸的深度信息,则智能手机可识别出该目标人脸是活体人脸。
如图1的(b)所示,第二目标对象的面部是结合如图1的(a)所示的第一目标对象的五官合成的第二目标人脸,可受控制地做出一系列动作,第一活体检测部通过彩色视频帧识别第二目标人脸是活体人脸,第二活体检测部通过深度视频帧识别出第二目标人脸不具有深度信息,也即不是活体人脸,则智能手机识别出该第二目标人脸不是活体人脸。
如图1的(c)所示,第三目标对象的面部是第一目标对象在照片中的人脸,第一活体检测部通过彩色视频帧识别出第三目标人脸无法做出动作,第二活体检测部通过深度视频帧识别出第三目标人脸不具有深度信息,则智能手机识别出第三目标人脸不是活体人脸。
如图1的(d)所示,第四目标对象佩戴有按照第一目标对象的面容制作成的面具,第一活体检测部通过彩色视频帧是识别出第四目标人脸无法做出动作,第二活体检测部通过深度视频帧识别出第四目标人脸具有深度信息,则智能手机识别出第四目标人脸不是活体人脸。
运行有上述支持识别活体人脸程序的智能手机可防御合成攻击、翻拍攻击和面具攻击,能够更全面地应对各种类型的样本攻击,保障了用户的信息安全。
本申请实施例提供了一种基于人工智能的人脸识别方法,该方法可应用于服务器中,用户将视频帧上传至服务器中,服务器进行活体人脸验证并进行后续操作,该方法还可应用于终端中,通过终端上运行有支持活体人脸检测的程序,对用户拍摄的视频帧进行活体人脸验证,用户可将验证结果上传至服务器以进行后续操作。
图2示出了本申请一个示例性实施例提供的计算机系统结构图。该计算机系统100包括终端120和服务器140。
终端120安装和运行有支持人脸活体检测的应用程序,该应用程序也可以为小程序、网页、信息交互平台(如公众号)中的任意一种。终端120上设置有三维摄像头(包括彩色摄像头和深度摄像头),用于采集用户160的面部图像(包括照片和视频帧中的至少一种)。可选地,终端120以一定的频率对用户160的面部进行照片连拍,或拍摄视频,用户160的面部图像可以是附加有眨眼、转头、微笑、张嘴等动作的图像,或者是无附加动作的图像。终端120可以泛指多个终端中的一个,本实施例仅以终端120来举例说明。终端设备类型包括:智能手机、平板电脑、电子书阅读器、MP3播放器、MP4播放器、膝上型便携 计算机和台式计算机中的至少一种。以下实施例以终端包括智能手机来举例说明。
终端120通过无线网络或有线网络与服务器140相连。服务器140包括一台服务器、多台服务器、云计算平台和虚拟化中心中的至少一种。示意性的,服务器140包括处理器144和存储器142,存储器142又包括获取模块1421、处理模块1422和接收模块1423。服务器140用于为支持人脸活体检测的程序提供后台服务。示意性的,后台服务器可以是面部图像的存储服务,或为活体人脸检测提供计算服务,或为活体人脸检测提供验证服务。可选地,服务器140承担主要计算工作,终端120承担次要计算工作;或者,服务器140承担次要计算工作,终端120承担主要计算工作;或者,服务器140和终端120两者之间采用分布式计算架构进行协同计算。
本领域技术人员可以知晓,上述终端的数量可以更多或更少。比如上述终端可以仅为一个,或者上述终端为几十个或几百个,或者更多数量。本申请实施例对终端的数量和设备类型不加以限定。
图3示出了本申请一个示例性实施例提供的基于人工智能的人脸识别方法,该方法应用于如图2所示的计算机系统中的终端120中或其他计算机系统中,该方法包括如下步骤:
步骤301,获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数。
使用具有图像采集功能的终端采集目标人脸的视频帧。示意性的,该终端包括:智能手机、平板电脑、笔记本电脑、连接有摄像头的台式计算机、照相机、摄像机中的至少一种。使用终端对目标对象的脸部进行连续拍摄。示意性的,该连续拍摄是拍摄视频。
每个拍摄时刻会同时拍摄彩色视频帧和深度视频帧。同一个拍摄时刻拍摄的彩色视频帧和深度视频帧,构成一组视频帧。
彩色视频帧是红绿蓝(Red Green Blue,RGB)格式的彩色视频帧。深度(Depth,D)视频帧中的每个像素点存储有深度摄像头到场景中各个真实点的距离(深度)值。示例性的,真实点是目标人脸上的点。
示意性的,彩色视频帧和深度视频帧存储为两帧关联的视频帧,比如采用拍摄时刻进行关联。或者,彩色视频帧和深度视频帧存储为同一视频帧,比如, 该帧视频同时包含R、G、B、D四个通道。本申请实施例对彩色视频帧和深度视频帧的具体存储方式不加以限定。
步骤302,调用第一活体检测部对n组视频帧中的彩色视频帧进行识别,第一活体检测部是交互式活体检测部。
可选地,第一活体检测部包括神经网络模型和程序单元,程序单元可以是计算单元或人工智能分类器。
可选地,神经网络模型包括堆叠沙漏网络。第一活体检测部中的神经网络模型可识别出彩色视频帧中的目标人脸上的五官特征点,得到目标人脸上的五官特征点的位置坐标。五官特征点是指一个五官所在的位置对应的特征点,如鼻子特征点,五官包括眉毛、眼睛、鼻子、嘴巴、耳朵中的至少一种。本申请实施例中,五官特征点包括目标人脸的左眼睛、右眼睛、鼻子、左嘴角和右嘴角。进一步,通过计算单元计算五官特征点的距离变化,或者,通过人工智能分类器确定五官特征点的距离变化,从而确定目标人脸是否完成了目标动作。
步骤303,调用第二活体检测部对n组视频帧中的深度视频帧进行识别,第二活体检测部是三维结构光式活体检测部。
可选地,第二活体检测部包括VGG-16深度学习网络。第二活体检测部可识别出深度视频帧中的是否含有活体目标人脸,示意性的,输出目标人脸属于活体人脸的概率。如,第二活体检测部输出的概率是96%,则第二活体检测部识别的深度视频帧中含有活体目标人脸。
步骤304,响应于第一活体检测部和第二活体检测部的检测结果均为活体类型,确定目标人脸为活体目标人脸。
当第一活体检测部对彩色频帧中的目标人脸进行识别后,得到的检测结果为非活体类型,则目标人脸为非活体目标人脸;当第二活体检测部对深度频帧中的目标人脸进行识别后,得到的检测结果为非活体类型,则目标人脸为非活体目标人脸;当第一活体检测部和第二活体检测部对视频帧中的目标人脸进行识别后,得到的检测结果均为非活体类型,则目标人脸为非活体目标人脸。
在一个示例中,第一活体检测部对彩色视频帧中的目标人脸进行识别,得到目标人脸的嘴巴特征点,通过计算嘴巴特征点的位置变化,得出目标人脸完成了张嘴动作,第二活体检测部对深度视频帧中的目标人脸进行识别,得到目标人脸属于活体目标人脸的概率是98%。因此,上述彩色视频帧和深度视频帧所组成的视频帧中的目标人脸是活体目标人脸。
可以理解的是,上述实施例中的第一活体检测部和第二活体检测部可以是任意神经网络构建的模型。
综上所述,本实施例提供的方法,通过将第一活体检测部和第二活体检测部进行结合,第一活体检测部可抵御翻拍攻击和面具攻击,第二活体检测部可抵御合成攻击和翻拍攻击,能够更全面地保障用户的信息安全。
下面对第一活体检测部中的神经网络模型的训练方式进行说明,神经网络模型是通过如下方式训练得到的:
S1、获取训练样本集合,训练样本集合包括多组样本人脸图像和样本五官特征点。
样本人脸图像包括照片或n组视频帧中包含的人脸图像,样本五官特征点包括样本人脸的左眼睛、右眼睛、鼻子、左嘴角和右嘴角。样本五官特征点可通过人工方式进行标注,或将样本人脸图像输入至具有特征点标注能力的模型中,识别出五官特征点,或利用现有技术中标注好的五官特征点的公开数据集合。
S2、调用第一活体检测部对样本人脸图像进行识别,得到样本人脸的预测五官特征点。
S3、将样本人脸的样本五官特征点和预测五官特征点进行比较,计算误差损失。
在一个示例中,鼻子特征点的预测位置坐标是(x1,y1),样本人脸的鼻子特征点的实际位置坐标是(x0,y0),将两者进行比较,计算误差损失。
可选地,可利用误差损失函数对样本五官特征点和预测五官特征点进行误差计算。误差函数可以是平滑的一范数损失函数,或欧式损失函数,或归一化损失指数函数或其它误差损失函数。
S4、通过误差反向传播算法根据误差损失对第一活体检测部进行训练,得到训练后的第一活体检测部。
样本人脸的样本五官特征点和预测五官特征点的误差损失不断减小,第一活体检测部输出的预测五官特征点与标注好的样本五官特征点趋于一致,第一活体检测部训练完成。
下面对第二活体检测部的训练方式进行说明,第二活体检测部是通过如下方式训练得到的:
S11、获取训练样本集合,训练样本集合包括多组样本人脸的深度图像和样本人脸的活体结果。
可选地,样本人脸的深度图像是由深度摄像头采集得到的。本申请对如何获得样本人脸的深度图像的方式不加以限定。
样本人脸的活体结果包括样本人脸是活体人脸,或非活体人脸。样本人脸的活体结果可通过人工方式进行标注,或通过其它模型识别。
S22、调用第二活体检测部对样本人脸图像进行识别,得到样本人脸属于活体人脸的概率。
S33、将样本人脸的活体结果和样本人脸属于活体人脸的概率进行比较,计算误差损失。
在一个示例中,样本人脸的活体结果是活体样本人脸(也即样本人脸属于活体人脸的概率是100%),第二活体检测部的预测结果是样本人脸属于活体人脸的概率是95%。将两者进行比较,计算误差损失。
可选地,可利用误差损失函数对样本人脸的活体结果和样本人脸属于活体人脸的概率进行误差计算。误差函数可以是平滑的一范数损失函数,或欧式损失函数,或归一化损失指数函数或其它误差损失函数。
S44、通过误差反向传播算法根据误差损失对第二活体检测部进行训练,得到训练后的第二活体检测部。
可选地,第二活体检测部也可与第一活体检测部的训练方式不同。
图4示出了本申请另一个示例性实施例提供的基于人工智能的人脸识别方法,该方法应用于如图2所示的计算机系统中的终端120中或其他计算机系统中,该方法包括如下步骤:
步骤401,获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数。
步骤401与图3所示的步骤301一致,此处不再赘述。
步骤402,调用第一活体检测部中的神经网络模型获取彩色视频帧上的五官特征点的位置。
可选地,第一活体检测部包括神经网络模型和程序单元,程序单元可以为计算单元,或者用于分类的程序单元,该用于分类的程序单元可以为人工智能分类器。
可选地,神经网络模型包括堆叠沙漏神经网络,每个堆叠沙漏神经网络包含一个多尺度自底向上的特征提取编码器和一个多尺度自顶向下的解码器,该编码器和解码器在尺度放缩上是对称的。对称的多尺度编码器和解码器的组合能够提取多尺度特征,并最终输出106个热度图(Heatmaps),每个热度图对应一个特征点,最终特征点的位置即为每个热度图极大值点对应的坐标。堆叠沙漏神经网络通过堆叠多个沙漏网络来对提取到的特征点进一步精细化(Refine)操作,精细化操作是指对特征点进行精确计算后得到精确的特征点位置的操作。
步骤403,调用第一活体检测部根据彩色视频帧上的五官特征点的距离,确定目标人脸是否完成目标动作。
该步骤包括如下子步骤:
步骤4031,调用第一活体检测部计算属于同一个五官部位的五官特征点的最大横向距离和最大纵向距离的比值,五官部位包括眼睛和嘴巴中的至少一种。
示意性的,五官包括眼睛。眼睛特征点包括左眼特征点、右眼特征点、上眼皮特征点和下眼皮特征点,如图5的(a)所示的眼睛,以白色圆点表示五官特征点,则五官特征点的最大横向距离是左眼角特征点与右眼角特征点的距离,最大纵向距离是上眼皮特征点与下眼皮特征点的距离,上眼皮特征点是上眼皮的中点位置处的特征点,下眼皮特征点是上眼皮的中点位置处的特征点。
示意性的,五官包括嘴巴,嘴巴特征点包括左嘴角特征点、右嘴角特征点、上嘴唇特征点和下嘴唇特征点,如图5的(a)所示嘴巴,以白色圆点表示五官特征点,则五官特征点的最大横向距离是左嘴角特征点与右嘴角特征点的距离,最大纵向距离是上嘴唇特征点和下嘴唇特征点的距离,上嘴唇特征点是上嘴唇的中点位置处的特征点,也即两个唇峰的交点处特征点,下嘴唇特征点是下嘴唇的中点位置处的特征点。
步骤4032,响应于比值达到预设条件,确定目标人脸完成目标动作。
可选地,预设条件包括距离比值达到距离阈值。可选地,目标动作包括眨眼动作和张嘴动作中的至少一种,分别对确定眨眼动作和张嘴动作进行说明。
一、响应于五官部位为眼睛且比值达到第一距离阈值,确定目标人脸完成眨眼动作。
下面对目标人脸完成眨眼动作进行说明,如图5(b)所示。
S11、获取第一距离和第二距离,第一距离是左眼角特征点和右眼角特征点之间的距离,第二距离是上眼皮特征点和下眼皮特征点之间的距离。
S12、计算第一距离与第二距离的距离比值。
可选地,距离比值包括第一距离501比第二距离502,或第二距离502比第一距离501。以计算第一距离501比第二距离502为例,同一眼睛上包括第一距离501和第二距离502,本实施例仅为清楚标示,将第一距离501和第二距离502分开在两只眼睛上标注。
S13、计算目标人脸上的两只眼睛的距离比值的平均值。
在一个示例中,目标人脸的左眼睛的距离比值是a1,右眼睛的距离比值是a2,计算a1和a2的平均值。
S14、响应于平均值大于第一距离阈值,确定目标人脸完成眨眼动作。
可选地,第一距离阈值可以是默认设置,或根据不同的目标人脸动态调整的距离阈值,如成人的脸的第一距离阈值大于儿童的脸的第一距离阈值。
S15、响应于平均值小于第一距离阈值,确定目标人脸未完成眨眼动作。
需要说明的是,若距离比值是以第一距离比第二距离时,需要距离比值的平均值大于第一距离阈值;若距离比值是以第二距离比第一距离时,需要距离比值的平均值小于第一距离阈值。
S16、获取含有目标动作的视频帧的第一帧数。
S17、响应于视频帧的第一帧数小于第一预设阈值,将n组视频帧筛除。
若含有眨眼或张嘴等动作的视频帧数目小于第一预设阈值,则提前将该视频帧拒绝。第一活体检测部输出的是含有目标动作的视频帧。
S18、响应于视频帧的第一帧数大于第一预设阈值,确定n组视频帧是正样本视频帧。
正样本视频帧是指该视频帧是含有活体人脸的视频帧,与攻击样本视频帧相对。第一活体检测部识别出含有活体人脸的视频帧,可将该视频帧作为后续训练第一活体检测部中的神经网络模型的样本视频帧。
需要说明的是,上述步骤S14还可替换实施为如下步骤14:
步骤14,响应于五官部位为眼睛且比值被第一分类器识别为第一类型,确定目标人脸完成眨眼动作。
可选地,第一活体检测部还包括第一分类器和第二分类器中的至少一种。
可选地,还可通过第一分类器识别上述比值,第一分类器是具有对五官特征点的距离比值识别能力的机器学习模型,如支撑向量机。示意性的,第一类型为眨眼动作对应的类型。第一分类器将输入的距离比值分为眨眼动作对应的 类型(也即第一类型)和未进行眨眼动作对应的类型。
二、响应于五官部位为嘴巴且比值达到第二距离阈值,确定目标人脸完成张嘴动作。
下面对目标人脸完成张嘴动作进行说明,如图5的(c)所示。
S21、获取第三距离和第四距离,第三距离是左嘴角特征点和右嘴角特征点之间的距离,第四距离是上嘴唇的中间特征点和下嘴唇的中间特征点之间的距离(特征点以白色圆圈示出)。
S22、计算第三距离与第四距离的距离比值。
可选地,该距离比值包括第三距离503比第四距离504,或第四距离504比第三距离503。本实施例以第三距离503比第四距离504为例。
S23、响应于距离比值小于第二距离阈值,确定目标人脸完成张嘴动作。
可选地,第二距离阈值可以是默认设置,或根据不同的目标人脸动态调整的距离阈值,如男人的脸的第二距离阈值大于女人的脸的第二距离阈值。
需要说明的是,若距离比值是以第三距离比第四距离时,需要距离比值的平均值小于第二距离阈值;若距离比值是以第四距离比第三距离时,需要距离比值的平均值大于第二距离阈值。
S24、响应于距离比值大于第二距离阈值,确定目标人脸未完成张嘴动作。
S25、获取符合活体人脸对应的深度信息的视频帧的第二帧数。
S26、响应于视频帧的第二帧数小于第二预设阈值,将n组视频帧筛除。
若输入的视频帧中符合活体人脸对应的深度信息的视频帧数少于第二预设阈值,则提前将该视频帧拒绝。
S27、响应于视频帧的第二帧数大于第二预设阈值,确定n组视频帧是正样本视频帧。
正样本视频帧是指该视频帧是含有活体人脸的视频帧,与攻击样本视频帧相对。第二活体检测部识别出含有活体人脸的视频帧,可将该视频帧作为后续训练第二活体检测部的样本视频帧。
需要说明的是,上述步骤S23还可以替换实施为步骤23如下:
步骤23,响应于五官部位为嘴巴且比值被第二分类器识别为第二类型,确定目标人脸完成张嘴动作。
可选地,还可通过第二分类器识别上述比值,第二分类器是具有对五官特征点的距离比值识别能力的机器学习模型,如支撑向量机。示意性的,第二类 型为张嘴动作对应的类型。第二分类器将输入的距离比值分为张嘴动作对应的类型(也即第二类型)和未进行张嘴动作对应的类型。
步骤404,调用第二活体检测部对n组视频帧中的深度视频帧进行识别,第二活体检测部是三维结构光式活体检测部。
步骤405,响应于第一活体检测部和第二活体检测部的检测结果均为活体类型,确定目标人脸为活体目标人脸。
当至少存在一个活体检测部的检测结果不是活体类型,则输出的检测结果是该目标人脸为非活体目标人脸。
该步骤包括如下子步骤:
步骤4051,获取第一帧数和第二帧数,第一帧数是第一活体检测部识别出的含有目标动作的彩色视频帧的帧数,第二帧数是第二活体检测部识别出的符合活体人脸对应的深度信息的深度视频帧的帧数;
步骤4052,响应于第一帧数大于第一预设阈值且第二帧数大于第二预设阈值,确定目标人脸为活体目标人脸。
当第一活体检测部与第二活体检测部对视频帧中的人脸进行检测的结果均为活体人脸,且含有活体人脸的帧数大于第一预设阈值,且符合活体人脸对应的深度信息的帧数大于第二预设阈值时,确定目标人脸为活体目标人脸。
可以理解的是,上述实施例中的活体检测部可以是任意神经网络。
综上所述,本实施例提供的方法,通过将第一活体检测部和第二活体检测部进行结合,以计算五官特征点的距离比值来确定目标人脸是否完成了目标动作,无需对动作进行标注,也无需记录每个动作所需的时间,降低了数据采集的难度,使得第一活体检测部易于训练。第一活体检测部可抵御翻拍攻击和面具攻击,第二活体检测部可抵御合成攻击和翻拍攻击,能够更全面地保障用户的信息安全。
在第一活体检测部和第二活体检测部对视频帧进行识别之前,需要先确定视频帧中是否含有人脸。
图6是本申请一个示例性实施例提供的结合预处理基于人工智能的人脸识别方法的流程图,该方法应用于如图2所示的计算机系统中的终端120中或其他计算机系统中,该方法包括如下步骤:
步骤601,调用人脸预处理部对n组视频帧中的彩色视频帧进行识别,人脸 预处理部是具有人脸识别能力的机器学习部。
可选地,人脸预处理部可以是多任务卷积神经网络(Multi-Task Convolutional Neural Network,MTCNN)。MTCNN网络由三个子网络级联而成,包括建议网络(Proposal Network,P-Net),提纯网络(Refine Network,R-Net)和输出网络(Output Network,O-Net)。图7示出了本申请一个示例性实施例提供的MTCNN网络检测五官特征点的方法流程图,该方法包括:
步骤1,获取图像彩色视频帧。
步骤2,建议网络将图像彩色视频帧作为输入,生成一系列可能包含人脸的候选区域。
步骤3,提纯网络对建议网络生成的大量候选区域进行提纯,筛除其中不包含人脸的部分。
步骤4,输出网络输出人脸区域并且定位五官特征点。
示意性的,五官特征点包括左右眼、鼻尖、左右嘴角。
步骤5,得到人脸检测框及五官特征点。
步骤602,获取识别结果为具有人脸区域的m组视频帧,m≤n,且m为正整数。
输出网络输出含有人脸区域的m组视频帧,该m组视频帧可能与输入的n组视频帧一样多,或者少于输入的n组视频帧。并定位人脸上的五官特征点,输出五官特征点的位置坐标。
步骤603,响应于m小于第三预设阈值,将n组视频帧筛除。
如果输入的视频帧中包含人脸区域的视频帧数目m未超过第三预设阈值,则提前将该n组视频帧拒绝。
步骤604,将m组视频帧中的彩色视频帧发送至第一活体检测部,以及将m组视频帧中的深度视频帧发送至第二活体检测部。
将进过预处理后的视频帧分别发送至第一活体检测部和第二活体检测部,第一活体检测部识别彩色视频帧,第二活体检测部识别深度视频帧。
可选地,通过训练一个二分类深度学习模型来判断输入的深度视频帧是否为真实活体人脸。示意性的,二分类深度学习模型是VGG-16深度学习网络,VGG-16深度学习网络的结构如图8所示。
3×3标识卷积核大小,卷积后的数字代表输出通道的个数,全连接层中的数字为输出通道的个数。
深度视频帧首先缩放至224×224(像素)的大小,然后将缩放后的视频帧作为VGG-16深度学习网络的输入,经过一系列卷积(Convolution)、线性整流激活(ReLU Activation)、全连接(Fully-Connected)以及归一化指数(softmax)等网络层的操作,深度学习网络输出该输入视频帧是活体真人还是攻击样本的概率。
可以理解的是,上述实施例中的深度神经网络可以是任意神经网络。
综上所述,本实施例提供的方法,在第一活体检测部和第二活体检测部对视频帧进行检测之前,通过人脸预处理部对视频帧进行识别,将视频帧中不含有人脸的部分视频帧筛除,提高了两种活体检测部检测活体人脸的准确率,也提高了活体人脸的检测效率。
下面对活体人脸检测系统识别视频帧中的活体人脸的整个过程进行说明。以第一活体检测部为交互式活体检测模型,以第二活体检测部为三维结构光式检测模型,以人脸预处理部为人脸检测预处理模型为例进行说明。
图9示出了本申请一个示例性实施例提供的基于人工智能的人脸识别方法,该方法应用于如图2所示的计算机系统中的终端120中或其他计算机系统中,该方法包括如下步骤:
步骤801,将彩色视频帧输入至人脸检测预处理模型中。
步骤802,调用人脸检测预处理模型对人脸进行检测。
步骤803,判断包含人脸的视频帧的帧数是否大于n1。
通过人脸检测预处理模型10来确定输入的视频帧中是否含有人脸,若包含人脸的视频帧的帧数大于n1(第三预设阈值),则进入步骤806和步骤810;反之,则进入步骤804。
步骤804,输入视频帧为攻击样本,拒绝输入该视频帧。
步骤805,将深度视频帧输入至三维结构光式检测模型中。
三维光结构光式检测模型11对视频帧中的深度视频帧进行检测。
步骤806,根据彩色视频帧检测结果保留人脸深度视频帧,并裁剪出人脸区域。
三维结构光式检测模型11可确定深度视频帧中的人脸区域。
步骤807,通过轻量化分类器判断是否为纸片攻击。
纸片攻击是指平面形状的媒介上的人脸所形成的攻击,如照片、屏幕画面、 身份证、报纸上的人脸所形成的样本攻击。也即通过轻量化分类器可判断目标人脸是否是具有三维结构的人脸。
步骤808,判断非纸片攻击的视频帧数是否大于n2。
若非纸片攻击的视频帧数大于n2(第二预设阈值),进入步骤810;反之,进入步骤809。
步骤809,输入的视频帧为攻击样本,拒绝输入该视频帧。
步骤810,调用交互式活体检测模型对五官特征点进行检测。
步骤811,计算眼睛或嘴巴特征点的长宽比,判断是否进行眨眼或张嘴动作。
眼睛或嘴巴特征点的长宽比,也即距离比值。可选地,目标人脸还可进行转头或点头等动作。
步骤812,判断含有眨眼或张嘴动作的视频帧的帧数是否大于n3。
通过交互式活体检测模型12来确定目标人脸的五官特征点,通过五官特征点的移动变化,确定目标人脸是否完成了目标动作。以目标动作包括眨眼或张嘴动作为例,若含有眨眼或张嘴动作的视频帧数大于n3(第一预设阈值),则进入步骤814;反之,则进入步骤813。
步骤813,输入的视频帧为攻击样本,拒绝输入该视频帧。
步骤814,输入的视频帧为正样本视频帧,通过输入的视频帧。
该正样本视频帧中包含活体人脸,该正样本视频帧可作为训练第一活体检测部中的神经网络模型的训练样本。
可以理解的是,上述实施例中的深度神经网络可以是任意神经网络。
综上所述,本实施例提供的方法,通过将第一活体检测部和第二活体检测部进行结合,以计算五官特征点的距离比值来确定目标人脸是否完成了目标动作,以目标人脸的深度信息来确定目标人脸是否为纸片攻击,当两种活体检测模型对视频帧中的目标人脸的检测结果均为活体类型时,该视频帧中含有活体目标人脸,第一活体检测部可抵御翻拍攻击和面具攻击,第二活体检测部可抵御合成攻击和翻拍攻击,能够更全面地保障用户的信息安全。
图10示出了本申请一个示例性实施例提供的基于人工智能的人脸识别装置的框图。该测装置包括:
获取模块1010,用于获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数;
第一活体检测部1020,用于对n组视频帧中的彩色视频帧进行识别,第一活体检测部是交互式活体检测部;
第二活体检测部1030,用于对n组视频帧中的深度视频帧进行识别,第二活体检测部是三维结构光式活体检测部;
处理模块1040,用于响应于第一活体检测部和第二活体检测部的检测结果均为活体类型,确定目标人脸为活体目标人脸。
在一个可选的实施例中,所述第一活体检测部1020,用于调用神经网络模型获取彩色视频帧上的五官特征点的位置;
第一活体检测部1020,用于根据彩色视频帧上的五官特征点的距离,确定目标人脸是否完成目标动作。
在一个可选的实施例中,该装置包括计算模块1050;
所述计算模块1050,用于调用第一活体检测部计算属于同一个五官部位的五官特征点的最大横向距离和最大纵向距离的比值,五官部位包括眼睛和嘴巴中的至少一种;
所述处理模块1040,用于响应于比值达到预设条件,确定目标人脸完成目标动作。
在一个可选的实施例中,所述处理模块1040,用于响应于五官部位为眼睛且比值达到第一距离阈值,确定目标人脸完成眨眼动作;响应于五官部位为嘴巴且比值达到第二距离阈值,确定目标人脸完成张嘴动作。
在一个可选的实施例中,所述五官特征点包括眼睛特征点,眼睛特征点包括左眼角特征点、右眼角特征点、上眼皮特征点和下眼皮特征点;
所述获取模块1010,用于获取第一距离和第二距离,第一距离是左眼角特征点和右眼角特征点之间的距离,第二距离是上眼皮特征点和下眼皮特征点之间的距离;
所述计算模块1050,用于计算第一距离与第二距离的距离比值;
所述计算模块1050,用于计算目标人脸上的两只眼睛的距离比值的平均值;响应于平均值大于第一距离阈值,确定目标人脸完成眨眼动作。
在一个可选的实施例中,所述第一活体检测部1020还包括:第一分类器和第二分类器中的至少一种,所述处理模块1040,用于响应于五官部位为眼睛且比值被第一分类器识别为第一类型,确定目标人脸完成眨眼动作;所述处理模块1040,用于响应于五官部位为嘴巴且比值被第二分类器识别为第二类型,确 定目标人脸完成张嘴动作。
在一个可选的实施例中,该装置包括训练模块1060,所述神经网络模型是通过如下方式训练得到:
所述获取模块1010,用于获取训练样本集合,训练样本集合包括多组样本人脸图像和样本五官特征点;
所述第一活体检测部1020,用于对样本人脸图像进行识别,得到样本人脸的预测五官特征点;
所述计算模块1050,用于将样本人脸的样本五官特征点和预测五官特征点进行比较,计算误差损失;
训练模块1060,用于通过误差反向传播算法根据误差损失对第一活体检测部进行训练,得到训练后的第一活体检测部。
在一个可选的实施例中,该装置包括人脸预处理部1070;
所述人脸预处理部1070,用于对n组视频帧中的彩色视频帧进行识别,所述人脸预处理部1070是具有人脸识别能力的机器学习模型;
所述获取模块1010,用于获取识别结果为具有人脸区域的m组视频帧,m≤n,且m为正整数;
所述处理模块1040,用于将m组视频帧中的彩色视频帧发送至第一活体检测部,以及将m组视频帧中的深度视频帧发送至第二活体检测部。
在一个可选的实施例中,所述获取模块1010,用于获取第一帧数和第二帧数,第一帧数是第一活体检测部1020识别出的含有目标动作的彩色视频帧的帧数,第二帧数是第二活体检测部1030识别出的符合活体人脸对应的深度信息的深度视频帧的帧数;
所述处理模块1040,用于响应于第一帧数大于第一预设阈值且第二帧数大于第二预设阈值,确定目标人脸为活体目标人脸。
在一个可选的实施例中,所述处理模块1040,用于响应于视频帧的第一帧数小于第一预设阈值,确定目标人脸为非活体目标人脸,将n组视频帧筛除;或,响应于视频帧的第二帧数小于第二预设阈值,确定目标人脸为非活体目标人脸,将n组视频帧筛除。
在一个可选的实施例中,所述处理模块1040,用于响应于m小于第三预设阈值,将n组视频帧筛除。
图11示出了本申请一个示例性实施例提供的服务器的结构示意图。该服务器可以是后台服务器集群140中的服务器。具体来讲:
服务器1100包括中央处理单元(CPU,Central Processing Unit)1101、包括随机存取存储器(RAM,Random Access Memory)1102和只读存储器(ROM,Read Only Memory)1103的系统存储器1104,以及连接系统存储器1104和中央处理单元1101的系统总线1105。服务器1100还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统,Input Output System)1106,和用于存储操作系统1113、应用程序1114和其他程序模块1115的大容量存储设备1107。
基本输入/输出系统1106包括有用于显示信息的显示器1108和用于用户输入信息的诸如鼠标、键盘之类的输入设备1109。其中显示器1108和输入设备1109都通过连接到系统总线1105的输入输出控制器1110连接到中央处理单元1101。基本输入/输出系统1106还可以包括输入输出控制器1110以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1110还提供输出到显示屏、打印机或其他类型的输出设备。
大容量存储设备1107通过连接到系统总线1105的大容量存储控制器(未示出)连接到中央处理单元1101。大容量存储设备1107及其相关联的计算机可读介质为服务器1100提供非易失性存储。也就是说,大容量存储设备1107可以包括诸如硬盘或者紧凑型光盘只读存储器(CD-ROM,Compact Disc Read Only Memory)驱动器之类的计算机可读介质(未示出)。
计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、可擦除可编程只读存储器(EPROM,Erasable Programmable Read Only Memory)、带电可擦可编程只读存储器(EEPROM,Electrically Erasable Programmable Read Only Memory)、闪存或其他固态存储其技术,CD-ROM、数字通用光盘(DVD,Digital Versatile Disc)或固态硬盘(SSD,Solid State Drives)、其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。其中,随机存取记忆体可以包括电阻式随机存取记忆体(ReRAM,Resistance Random Access Memory)和动态随机存取存储器(DRAM,Dynamic Random Access Memory)。当然,本领域技术人员可知计算机存储介质不局限于上述几 种。上述的系统存储器1104和大容量存储设备1107可以统称为存储器。
根据本申请的各种实施例,服务器1100还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即服务器1100可以通过连接在系统总线1105上的网络接口单元1111连接到网络1112,或者说,也可以使用网络接口单元1111来连接到其他类型的网络或远程计算机系统(未示出)。
上述存储器还包括一个或者一个以上的程序,一个或者一个以上程序存储于存储器中,被配置由CPU执行。
在一个可选的实施例中,提供了一种计算机设备,该计算机设备包括处理器和存储器,存储器中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现如上所述的基于人工智能的人脸识别方法。
在一个可选的实施例中,提供了一种计算机可读存储介质,该存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现如上所述的基于人工智能的人脸识别方法。
请参考图12,其示出了本申请一个示例性实施例提供的计算机设备1200的结构框图。该计算机设备1200可以是便携式移动终端,比如:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器。计算机设备1200还可能被称为用户设备、便携式终端等其他名称。
通常,计算机设备1200包括有:处理器1201和存储器1202。
处理器1201可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1201可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1201也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1201可以集成有GPU(Graphics Processing Unit,图像处理器), GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器1201还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1202可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是有形的和非暂态的。存储器1202还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1202中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1201所执行以实现本申请中提供的基于人工智能的人脸识别方法。
在一些实施例中,计算机设备1200还可选包括有:外围设备接口1203和至少一个外围设备。具体地,外围设备包括:射频电路1204、触摸显示屏1205、摄像头组件1206、音频电路1207、定位组件1208和电源1209中的至少一种。
外围设备接口1203可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1201和存储器1202。在一些实施例中,处理器1201、存储器1202和外围设备接口1203被集成在同一芯片或电路板上;在一些其他实施例中,处理器1201、存储器1202和外围设备接口1203中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路1204用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1204通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1204将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路1204包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1204可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1204还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
触摸显示屏1205用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。触摸显示屏1205还具有采集在触摸显示屏1205的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1201进行处理。触摸显示屏1205用于提供虚拟按钮和/或 虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,触摸显示屏1205可以为一个,设置在计算机设备1200的前面板;在另一些实施例中,触摸显示屏1205可以为至少两个,分别设置在计算机设备1200的不同表面或呈折叠设计;在另一些实施例中,触摸显示屏1205可以是柔性显示屏,设置在计算机设备1200的弯曲表面上或折叠面上。甚至,触摸显示屏1205还可以设置成非矩形的不规则图形,也即异形屏。触摸显示屏1205可以采用LCD(Liquid Crystal Display,液晶显示器)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件1206用于采集图像或视频。可选地,摄像头组件1206包括前置摄像头和后置摄像头。通常,前置摄像头用于实现视频通话或自拍,后置摄像头用于实现照片或视频的拍摄。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能,主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能。在一些实施例中,摄像头组件1206还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路1207用于提供用户和计算机设备1200之间的音频接口。音频电路1207可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器1201进行处理,或者输入至射频电路1204以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在计算机设备1200的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器1201或射频电路1204的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路1207还可以包括耳机插孔。
定位组件1208用于定位计算机设备1200的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件1208可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统或俄罗斯的伽利略系统的定位组件。
电源1209用于为计算机设备1200中的各个组件进行供电。电源1209可以 是交流电、直流电、一次性电池或可充电电池。当电源1209包括可充电电池时,该可充电电池可以是有线充电电池或无线充电电池。有线充电电池是通过有线线路充电的电池,无线充电电池是通过无线线圈充电的电池。该可充电电池还可以用于支持快充技术。
在一些实施例中,计算机设备1200还包括有一个或多个传感器1210。该一个或多个传感器1210包括但不限于:加速度传感器1211陀螺仪传感器1212、压力传感器1213、指纹传感器1214、光学传感器1215以及接近传感器1216。
加速度传感器1211以检测以计算机设备1200建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器1211以用于检测重力加速度在三个坐标轴上的分量。处理器1201可以根据加速度传感器1211集的重力加速度信号,控制触摸显示屏1205以横向视图或纵向视图进行用户界面的显示。加速度传感器1211可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器1212可以检测计算机设备1200的机体方向及转动角度,陀螺仪传感器1212可以与加速度传感器1211同采集用户对计算机设备1200的3D动作。处理器1201根据陀螺仪传感器1212采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器1213可以设置在计算机设备1200的侧边框和/或触摸显示屏1205的下层。当压力传感器1213设置在计算机设备1200的侧边框时,可以检测用户对计算机设备1200的握持信号,根据该握持信号进行左右手识别或快捷操作。当压力传感器1213设置在触摸显示屏1205的下层时,可以根据用户对触摸显示屏1205的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器1214用于采集用户的指纹,以根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器1201授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器1214可以被设置在计算机设备1200的正面、背面或侧面。当计算机设备1200上设置有物理按键或厂商Logo时,指纹传感器1214可以与物理按键或厂商Logo集成在一起。
光学传感器1215用于采集环境光强度。在一个实施例中,处理器1201可以根据光学传感器1215采集的环境光强度,控制触摸显示屏1205的显示亮度。 具体地,当环境光强度较高时,调高触摸显示屏1205的显示亮度;当环境光强度较低时,调低触摸显示屏1205的显示亮度。在另一个实施例中,处理器1201还可以根据光学传感器1215采集的环境光强度,动态调整摄像头组件1206的拍摄参数。
接近传感器1216,也称距离传感器,通常设置在计算机设备1200的正面。接近传感器1216用于采集用户与计算机设备1200的正面之间的距离。在一个实施例中,当接近传感器1216检测到用户与计算机设备1200的正面之间的距离逐渐变小时,由处理器1201控制触摸显示屏1205从亮屏状态切换为息屏状态;当接近传感器1216检测到用户与计算机设备1200的正面之间的距离逐渐变大时,由处理器1201控制触摸显示屏1205从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图12中示出的结构并不构成对计算机设备1200的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
根据本申请实施例的一个方面,提供一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述可选实现方式中提供的基于人工智能的人脸识别方法。

Claims (14)

  1. 一种基于人工智能的人脸识别方法,其特征在于,应用于计算机设备中,所述方法包括:
    获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数;
    调用第一活体检测部对所述n组视频帧中的所述彩色视频帧进行识别,所述第一活体检测部是交互式活体检测部;
    调用第二活体检测部对所述n组视频帧中的所述深度视频帧进行识别,所述第二活体检测部是三维结构光式活体检测部;
    响应于所述第一活体检测部和所述第二活体检测部的检测结果均为活体类型,确定所述目标人脸为活体目标人脸。
  2. 根据权利要求1所述的方法,其特征在于,所述调用第一活体检测部对所述n组视频帧中的所述彩色视频帧进行识别,包括:
    调用所述第一活体检测部中的神经网络模型获取所述彩色视频帧上的五官特征点的位置;
    调用所述第一活体检测部根据所述彩色视频帧上的五官特征点的距离,确定所述目标人脸是否完成目标动作。
  3. 根据权利要求2所述的方法,其特征在于,所述调用所述第一活体检测部根据所述彩色视频帧上的五官特征点的距离,确定所述目标人脸是否完成目标动作,包括:
    调用所述第一活体检测部计算属于同一个五官部位的所述五官特征点的最大横向距离和最大纵向距离的比值,所述五官部位包括眼睛和嘴巴中的至少一种;
    响应于所述比值达到预设条件,确定所述目标人脸完成所述目标动作。
  4. 根据权利要求3所述的方法,其特征在于,所述响应于所述比值达到预设条件,确定所述目标人脸完成所述目标动作,包括:
    响应于所述五官部位为所述眼睛且所述比值达到第一距离阈值,确定所述 目标人脸完成眨眼动作;
    响应于所述五官部位为所述嘴巴且所述比值达到第二距离阈值,确定所述目标人脸完成张嘴动作。
  5. 根据权利要求4所述的方法,其特征在于,所述五官特征点包括眼睛特征点,所述眼睛特征点包括左眼角特征点、右眼角特征点、上眼皮特征点和下眼皮特征点;
    所述响应于所述五官部位为所述眼睛且所述比值达到第一距离阈值,确定所述目标人脸完成眨眼动作,包括:
    获取第一距离和第二距离,所述第一距离是所述左眼角特征点和所述右眼角特征点之间的距离,所述第二距离是所述上眼皮特征点和所述下眼皮特征点之间的距离;
    计算所述第一距离与所述第二距离的距离比值;
    计算所述目标人脸上的两只眼睛的所述距离比值的平均值;
    响应于所述平均值大于所述第一距离阈值,确定所述目标人脸完成所述眨眼动作。
  6. 根据权利要求3所述的方法,其特征在,所述第一活体检测部还包括:第一分类器和第二分类器中的至少一种,所述响应于所述比值达到预设条件,确定所述目标人脸完成所述目标动作,包括:
    响应于所述五官部位为所述眼睛且所述比值被所述第一分类器识别为第一类型,确定所述目标人脸完成眨眼动作;
    响应于所述五官部位为所述嘴巴且所述比值被所述第二分类器识别为第二类型,确定所述目标人脸完成张嘴动作。
  7. 根据权利要求2至6任一所述的方法,其特征在于,所述神经网络模型通过如下方式训练得到:
    获取训练样本集合,所述训练样本集合包括多组样本人脸图像和样本五官特征点;
    调用所述第一活体检测部对所述样本人脸图像进行识别,得到样本人脸的 预测五官特征点;
    将所述样本人脸的样本五官特征点和所述预测五官特征点进行比较,计算误差损失;
    通过误差反向传播算法根据所述误差损失对所述第一活体检测部进行训练,得到训练后的第一活体检测部。
  8. 根据权利要求1至6任一所述的方法,其特征在于,所述方法还包括:
    调用人脸预处理部对所述n组视频帧中的所述彩色视频帧进行识别,所述人脸预处理部是具有人脸识别能力的机器学习部;
    获取识别结果为具有人脸区域的m组视频帧,m≤n,且m为正整数;
    将所述m组视频帧中的所述彩色视频帧发送至所述第一活体检测部,以及将所述m组视频帧中的所述深度视频帧发送至所述第二活体检测部。
  9. 根据权利要求1至6任一所述的方法,其特征在于,所述响应于所述第一活体检测部和所述第二活体检测部的检测结果均为活体类型,确定所述目标人脸为活体目标人脸,包括:
    获取第一帧数和第二帧数,所述第一帧数是所述第一活体检测部识别出的含有目标动作的彩色视频帧的帧数,所述第二帧数是所述第二活体检测部识别出的符合所述活体人脸对应的深度信息的深度视频帧的帧数;
    响应于所述第一帧数大于第一预设阈值且所述第二帧数大于第二预设阈值,确定所述目标人脸为活体目标人脸。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    响应于所述视频帧的第一帧数小于第一预设阈值,确定所述目标人脸为非活体目标人脸,将所述n组视频帧筛除;
    或,
    响应于所述视频帧的第二帧数小于第二预设阈值,确定所述目标人脸为非活体目标人脸,将所述n组视频帧筛除。
  11. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    响应于所述m小于第三预设阈值,将所述n组视频帧筛除。
  12. 一种基于人工智能的人脸识别装置,其特征在于,所述装置包括:
    获取模块,用于获取输入的n组视频帧,存在至少一组视频帧包括目标人脸的彩色视频帧和深度视频帧,n为正整数;
    第一活体检测部,用于对所述n组视频帧中的所述彩色视频帧进行识别,所述第一活体检测部是交互式活体检测部;
    第二活体检测部,用于对所述n组视频帧中的所述深度视频帧进行识别,所述第二活体检测部是三维结构光式活体检测部;
    处理模块,用于响应于所述第一活体检测部和所述第二活体检测部的检测结果均为活体类型,确定所述目标人脸为活体目标人脸。
  13. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、至少一段程序、代码集或指令集由所述处理器加载并执行以实现如权利要求1至11任一所述的基于人工智能的人脸识别方法。
  14. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行,以实现如权利要求1至11任一所述的基于人工智能的人脸识别方法。
PCT/CN2020/124944 2020-01-22 2020-10-29 基于人工智能的人脸识别方法、装置、设备及介质 WO2021147434A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/685,177 US20220309836A1 (en) 2020-01-22 2022-03-02 Ai-based face recognition method and apparatus, device, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010075684.7 2020-01-22
CN202010075684.7A CN111242090B (zh) 2020-01-22 2020-01-22 基于人工智能的人脸识别方法、装置、设备及介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/685,177 Continuation US20220309836A1 (en) 2020-01-22 2022-03-02 Ai-based face recognition method and apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2021147434A1 true WO2021147434A1 (zh) 2021-07-29

Family

ID=70879808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124944 WO2021147434A1 (zh) 2020-01-22 2020-10-29 基于人工智能的人脸识别方法、装置、设备及介质

Country Status (3)

Country Link
US (1) US20220309836A1 (zh)
CN (1) CN111242090B (zh)
WO (1) WO2021147434A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242090B (zh) * 2020-01-22 2023-06-23 腾讯科技(深圳)有限公司 基于人工智能的人脸识别方法、装置、设备及介质
CN111767845B (zh) * 2020-06-29 2024-03-05 京东科技控股股份有限公司 证件识别方法及装置
CN113139419A (zh) * 2020-12-28 2021-07-20 西安天和防务技术股份有限公司 一种无人机检测方法及装置
CN112906571B (zh) * 2021-02-20 2023-09-05 成都新希望金融信息有限公司 活体识别方法、装置及电子设备
CN113392810A (zh) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 用于活体检测的方法、装置、设备、介质和产品
CN113971841A (zh) * 2021-10-28 2022-01-25 北京市商汤科技开发有限公司 一种活体检测方法、装置、计算机设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622588A (zh) * 2012-03-08 2012-08-01 无锡数字奥森科技有限公司 双验证人脸防伪方法及装置
US20160143584A1 (en) * 2014-11-20 2016-05-26 Seiko Epson Corporation Biological information measuring apparatus
CN105975935A (zh) * 2016-05-04 2016-09-28 腾讯科技(深圳)有限公司 一种人脸图像处理方法和装置
CN108182409A (zh) * 2017-12-29 2018-06-19 北京智慧眼科技股份有限公司 活体检测方法、装置、设备及存储介质
CN109034102A (zh) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 人脸活体检测方法、装置、设备及存储介质
CN111242090A (zh) * 2020-01-22 2020-06-05 腾讯科技(深圳)有限公司 基于人工智能的人脸识别方法、装置、设备及介质

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102420985B (zh) * 2011-11-29 2014-01-22 宁波大学 一种多视点视频对象提取方法
US9529426B2 (en) * 2012-02-08 2016-12-27 Microsoft Technology Licensing, Llc Head pose tracking using a depth camera
CN102761765B (zh) * 2012-07-16 2014-08-20 清华大学 一种用于三维立体视频的深度快速插帧方法
CN103220543B (zh) * 2013-04-25 2015-03-04 同济大学 基于kinect的实时3d视频通信系统及其实现方法
US10475186B2 (en) * 2016-06-23 2019-11-12 Intel Corportation Segmentation of objects in videos using color and depth information
US9965865B1 (en) * 2017-03-29 2018-05-08 Amazon Technologies, Inc. Image data segmentation using depth data
US10217195B1 (en) * 2017-04-17 2019-02-26 Amazon Technologies, Inc. Generation of semantic depth of field effect
CN107480629A (zh) * 2017-08-11 2017-12-15 常熟理工学院 一种基于深度信息的疲劳驾驶检测方法及装置
CN108038453A (zh) * 2017-12-15 2018-05-15 罗派智能控制技术(上海)有限公司 一种基于rgbd的汽车驾驶员状态检测和识别系统
CN108596128B (zh) * 2018-04-28 2020-06-26 京东方科技集团股份有限公司 对象识别方法、装置及存储介质
CN109087351B (zh) * 2018-07-26 2021-04-16 北京邮电大学 基于深度信息对场景画面进行闭环检测的方法及装置
CN109740513B (zh) * 2018-12-29 2020-11-27 青岛小鸟看看科技有限公司 一种动作行为分析方法和装置
CN110634136B (zh) * 2019-09-17 2022-09-13 北京华捷艾米科技有限公司 一种管道壁破损检测方法、装置及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622588A (zh) * 2012-03-08 2012-08-01 无锡数字奥森科技有限公司 双验证人脸防伪方法及装置
US20160143584A1 (en) * 2014-11-20 2016-05-26 Seiko Epson Corporation Biological information measuring apparatus
CN105975935A (zh) * 2016-05-04 2016-09-28 腾讯科技(深圳)有限公司 一种人脸图像处理方法和装置
CN108182409A (zh) * 2017-12-29 2018-06-19 北京智慧眼科技股份有限公司 活体检测方法、装置、设备及存储介质
CN109034102A (zh) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 人脸活体检测方法、装置、设备及存储介质
CN111242090A (zh) * 2020-01-22 2020-06-05 腾讯科技(深圳)有限公司 基于人工智能的人脸识别方法、装置、设备及介质

Also Published As

Publication number Publication date
US20220309836A1 (en) 2022-09-29
CN111242090B (zh) 2023-06-23
CN111242090A (zh) 2020-06-05

Similar Documents

Publication Publication Date Title
CN110647865B (zh) 人脸姿态的识别方法、装置、设备及存储介质
US11678734B2 (en) Method for processing images and electronic device
WO2021147434A1 (zh) 基于人工智能的人脸识别方法、装置、设备及介质
US11288807B2 (en) Method, electronic device and storage medium for segmenting image
US11710351B2 (en) Action recognition method and apparatus, and human-machine interaction method and apparatus
WO2020233464A1 (zh) 模型训练方法、装置、存储介质及设备
WO2020224479A1 (zh) 目标的位置获取方法、装置、计算机设备及存储介质
CN110544272B (zh) 脸部跟踪方法、装置、计算机设备及存储介质
CN111541907B (zh) 物品显示方法、装置、设备及存储介质
CN111382624B (zh) 动作识别方法、装置、设备及可读存储介质
JP2021524957A (ja) 画像処理方法およびその、装置、端末並びにコンピュータプログラム
CN110135336B (zh) 行人生成模型的训练方法、装置及存储介质
CN110807361A (zh) 人体识别方法、装置、计算机设备及存储介质
CN108830186B (zh) 文本图像的内容提取方法、装置、设备及存储介质
CN112749613B (zh) 视频数据处理方法、装置、计算机设备及存储介质
CN108363982B (zh) 确定对象数量的方法及装置
CN112036331A (zh) 活体检测模型的训练方法、装置、设备及存储介质
JP7332813B2 (ja) 画像処理方法、装置、電子デバイス及び記憶媒体
US20210134022A1 (en) Method and electronic device for adding virtual item
CN110675473B (zh) 生成gif动态图的方法、装置、电子设备及介质
CN114741559A (zh) 确定视频封面的方法、设备及存储介质
CN111566693B (zh) 一种皱纹检测方法及电子设备
CN111753813A (zh) 图像处理方法、装置、设备及存储介质
CN110853124A (zh) 生成gif动态图的方法、装置、电子设备及介质
WO2021218823A1 (zh) 指纹活体检测方法、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915540

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20915540

Country of ref document: EP

Kind code of ref document: A1