WO2023098128A1 - Liveness detection method and apparatus, and training method and apparatus for liveness detection system - Google Patents
Liveness detection method and apparatus, and training method and apparatus for liveness detection system
- Publication number
- WO2023098128A1 (PCT/CN2022/110111; CN2022110111W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- image
- living body detection
- face
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/803—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/40—Spoof detection, e.g. liveness detection
- G06V40/45—Detection of the body part being alive
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
- G06T2207/10152—Varying illumination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- the present application relates to the technical field of face liveness detection, and in particular to a liveness detection method and device, a training method and device for a liveness detection system, an electronic device, and a storage medium.
- face liveness detection technology has become a key step in face recognition technology.
- however, the detection results obtained through face liveness detection are not always accurate enough, and there is a risk of recognizing a prosthetic face as a live face.
- the present application proposes a living body detection method, device, electronic equipment and storage medium, which can solve the above problems.
- an embodiment of the present application provides a liveness detection method, including: acquiring a first target image captured by a first sensor of a face to be recognized and a second target image captured by a second sensor of the same face to be recognized; using a pre-trained depth generation network to extract target depth information from the first target image and the second target image; and detecting the target depth information through a pre-trained liveness detection model to obtain a liveness detection result of the face to be recognized.
- the liveness detection model is obtained by training on depth information extracted from sample data.
- the sample data includes a first sample image collected by the first sensor and a second sample image collected by the second sensor under at least two lighting environments, wherein both the first sample image and the second sample image include prosthetic faces of different materials.
- an embodiment of the present application provides a training method for a living body detection system.
- the living body detection system includes a deep generation network and a living body detection model.
- the training method includes: acquiring, under at least two lighting environments, a first sample image obtained by the first sensor capturing a sample face and a second sample image obtained by the second sensor capturing the sample face, wherein the sample faces include prosthetic faces of different materials; inputting the first sample image and the second sample image into an initial generation network to train the initial generation network and obtain the depth generation network; using the depth generation network to extract depth information of the sample face from the first sample image and the second sample image; and inputting the depth information of the sample face into a neural network model to train the neural network model and obtain the liveness detection model.
- an embodiment of the present application provides a living body detection device, the device including: an image acquisition module, a depth generation module, and a living body detection module.
- the image acquisition module is used to acquire a first target image captured by the first sensor of the face to be recognized and a second target image captured by the second sensor of the face to be recognized;
- the depth generation module is used to extract the target depth information from the first target image and the second target image through the pre-trained depth generation network;
- the liveness detection module is used to detect the target depth information through the pre-trained liveness detection model to obtain the liveness detection result of the face to be recognized, wherein the liveness detection model is obtained by training on depth information extracted from sample data, and the sample data includes a first sample image collected by the first sensor and a second sample image collected by the second sensor under at least two lighting environments, wherein both the first sample image and the second sample image include prosthetic human faces made of different materials.
- an embodiment of the present application provides a training device for a living body detection system.
- the living body detection system includes a deep generation network and a living body detection model.
- the training device is used to obtain, under at least two lighting environments, a first sample image acquired by the first sensor of the sample face and a second sample image acquired by the second sensor of the sample face, wherein the sample faces include prosthetic faces of different materials;
- the network training module is used to input the first sample image and the second sample image into an initial generation network to train the initial generation network and obtain the depth generation network;
- the depth extraction module is used to extract the depth information of the sample face from the first sample image and the second sample image by using the depth generation network;
- the model training module is used to input the depth information of the sample face into a neural network model to train the neural network model and obtain the liveness detection model.
- an embodiment of the present application provides an electronic device, including: one or more processors; memory; one or more application programs, wherein the one or more application programs are stored in the memory and configured as Executed by one or more processors, one or more application programs are configured to execute the above-mentioned living body detection method or training method of a living body detection system.
- the embodiment of the present application provides a computer-readable storage medium, in which program code is stored, and the program code can be invoked by a processor to execute the above-mentioned living body detection method or the training method of the living body detection system.
- an embodiment of the present application provides a computer program product containing instructions which, when run on a computer, enable the computer to implement the above-mentioned liveness detection method or the training method of the liveness detection system.
- This application can obtain two target images collected by two sensors for the same face to be recognized, use the pre-trained depth generation network to extract the target depth information (that is, the depth information of the face to be recognized) from the two target images, and then use the pre-trained liveness detection model to perform detection according to the target depth information and obtain the liveness detection result of the face to be recognized.
- the living body detection model is trained by depth information extracted from sample data, and the sample data includes, under at least two lighting environments, a first sample image collected by the first sensor and a second sample image collected by the second sensor, and Both the first sample image and the second sample image include prosthetic human faces made of different materials.
- the technical solution of the present application can quickly obtain the depth information of a face from two images of the same face to be recognized by using a neural network, determine the liveness detection result of the face to be recognized according to the depth information, and thereby realize efficient and highly accurate liveness detection.
- the application can recognize prosthetic human faces under different lighting environments, so that the accuracy of living body detection is higher.
- FIG. 1 shows a schematic diagram of an application environment of a living body detection method provided by an embodiment of the present application
- Fig. 2 shows a schematic diagram of an application scenario of a living body detection method provided by an embodiment of the application
- Fig. 3 shows a schematic flow chart of a living body detection method provided by an embodiment of the present application
- Fig. 4 shows a schematic diagram of imaging of the first sensor and the second sensor provided by an embodiment of the present application
- FIG. 5 shows a schematic diagram of a processing flow of a living body detection system provided by an embodiment of the present application
- Fig. 6 shows a schematic flow chart of a living body detection method provided by another embodiment of the present application.
- FIG. 7 shows a schematic diagram of an extraction process of target depth information provided by an embodiment of the present application.
- FIG. 8 shows a schematic diagram of a processing process of central convolution provided by an embodiment of the present application.
- FIG. 9 shows a schematic flowchart of a training method of a living body detection system provided by an embodiment of the present application.
- Fig. 10 shows a schematic flow chart of the training process of the deep generation network in the living body detection system provided by an embodiment of the present application
- Fig. 11 shows a schematic diagram of a stereo matching algorithm provided by an embodiment of the present application.
- Fig. 12 shows a schematic flow chart of the training process of the living body detection model in the living body detection system provided by an embodiment of the present application
- Fig. 13 shows a schematic diagram of the processing flow of the training device of the living body detection system provided by an embodiment of the present application
- Fig. 14 shows a module block diagram of a living body detection device provided by an embodiment of the present application.
- Fig. 15 shows a module block diagram of a training device of a living body detection system provided by an embodiment of the present application
- Fig. 16 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
- Fig. 17 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
- the inventors of the present application found, after careful research, that a pre-trained depth generation network can be used to extract target depth information from the two images collected by the two sensors, and a pre-trained liveness detection model can then perform liveness detection based on that target depth information, which yields more accurate detection results without increasing hardware costs.
- FIG. 1 shows a schematic diagram of an application environment of a living body detection method provided by an embodiment of the present application.
- the living body detection method and the training method of the living body detection system provided in the embodiments of the present application may be applied to electronic devices.
- the electronic device may be, for example, the server 110 shown in FIG. 1 , and the server 110 may be connected to the image acquisition device 120 through a network.
- the network is a medium used to provide a communication link between the server 110 and the image acquisition device 120 .
- the network may include various connection types, such as wired communication links, wireless communication links, etc., which are not limited in this embodiment of the present application.
- image capture device 120 may include a first sensor and a second sensor.
- the user's face images can be collected by the first sensor and the second sensor, and then the collected face images can be sent to the server 110 through the network.
- the server may perform liveness detection on the user based on these face images through the liveness detection method described in the embodiment of the present application.
- these face images may include a first target image collected by the first sensor for the user, and a second target image collected by the second sensor for the user.
- the server 110, the network and the image acquisition device 120 in FIG. 1 are only schematic. According to the realization requirements, there may be any number of servers, networks and image acquisition devices.
- the server 110 may be a physical server, or may be a server cluster composed of multiple servers
- the image acquisition device 120 may be a mobile phone, a tablet, a camera, a notebook computer, and the like. It can be understood that the embodiment of the present application may also allow multiple image acquisition devices 120 to access the server 110 at the same time.
- the electronic device may also be a smart phone, a tablet, a notebook computer, and the like.
- the image acquisition device 120 may be integrated into an electronic device, for example, an electronic device such as a smart phone, a tablet, or a notebook computer may be equipped with two sensors.
- the electronic device can collect a user's face image through these two sensors, and then perform liveness detection locally based on the collected face image.
- if the liveness detection passes, the user can continue to be further authenticated, and the collected face image and the detection result of the liveness detection can also be displayed synchronously on the display interface of the electronic device.
- the living body detection method and device, the living body detection system training method and device, electronic equipment and storage media provided by the present application will be described in detail below through specific embodiments.
- Fig. 3 shows a schematic flowchart of a living body detection method provided by an embodiment of the present application. As shown in Figure 3, the living body detection method may specifically include the following steps:
- Step S310 Obtain a first target image captured by the first sensor for the face to be recognized, and acquire a second target image captured by the second sensor for the face to be recognized.
- the user's face image is usually collected in real time, and then the face image is recognized, and the user's identity is verified according to the face features in the face image.
- in face recognition, it is necessary to use liveness detection to determine whether the user in the current face image is a real person, so as to prevent others from impersonating the user with photos, face masks, and the like.
- in face liveness detection, by analyzing the face image it can be identified whether the face image was collected from a real person (the corresponding detection result is a living body) or from a prosthetic face (the corresponding detection result is a prosthesis).
- if the detection result is a living body, the liveness detection is passed and other processing procedures can continue, for example, identity verification of the user can be performed.
- the face to be recognized in the liveness detection can be the recognition object during face recognition, such as a face that is close to the image acquisition device for recognition in application scenarios such as security or face payment.
- the recognition object may be a real user's face, or a forged prosthetic face.
- the prosthetic human face may be a photo of a human face, a face mask or a printed paper human face, and the like.
- the prosthetic human face may also be a virtual human face, such as an avatar generated based on a real human face.
- the first sensor and the second sensor may be used to collect a face image of a face to be recognized.
- the first sensor 430 and the second sensor 440 may be separated by a relatively short distance.
- a first target image 450 captured by the first sensor 430 and a second target image 460 captured by the second sensor 440 can be obtained. That is to say, the first target image 450 and the second target image 460 are face images collected at different positions for the same face to be recognized.
- the first target image and the second target image may have the same image size.
- the first sensor 430 and the second sensor 440 may be arranged directly in front of the face to be recognized during image collection.
- both the first sensor and the second sensor are located at the same level as the center point of the eyes of the face to be recognized.
- for example, the distance between the first sensor and the center point of the eyes of the face to be recognized and the distance between the second sensor and that center point may be determined as a face distance.
- both the obtained first target image and the second target image can include the human face image of the human face to be recognized.
- using two sensors to collect images separately can obtain more detailed image information of the face to be recognized, and then use these image information to obtain more accurate detection results during living body detection.
- this image information can include higher-precision texture information, lighting information, and the like; using such texture and lighting information, it is possible to detect prosthetic faces such as face masks made of special materials.
- these images may be transmitted to the electronic device for liveness detection.
- both the first sensor and the second sensor may be visible light cameras, so the first target image and the second target image may be visible light images (which may be RGB images or grayscale images).
- the first sensor and the second sensor may be separated by a relatively short distance, such as 1 decimeter.
- the distance between the first sensor and the face to be recognized may be consistent with the distance between the second sensor and the face to be recognized.
- the shooting angle of the face to be recognized by the first sensor may also be consistent with the shooting angle of the face to be recognized by the second sensor.
- the first sensor and the second sensor may be disposed on the same binocular stereo sensor.
- the first sensor may be the left eye sensor of the binocular stereo sensor
- the second sensor may be the right eye sensor of the binocular stereo sensor.
- Step S320 using a pre-trained depth generation network to extract target depth information from the first target image and the second target image.
- the distance information between the two sensors and the face to be recognized can be determined by using the difference (parallax) between the first target image and the second target image respectively collected by the first sensor and the second sensor, and this distance information can then be used as the depth information of the face to be recognized.
- the depth information of the face to be recognized can be obtained through calculation according to a stereo matching algorithm.
- however, using a stereo matching algorithm to calculate depth information consumes considerable resources and time, which may lead to low detection efficiency and makes it unsuitable for application scenarios that require frequent liveness detection.
- target depth information can be extracted from the first target image and the second target image through a pre-trained depth generation network.
- the target depth information may also represent the distance information between the first sensor and the second sensor and the face to be recognized.
- the depth generation network can use a lightweight generator, whose algorithm complexity is lower than that of the stereo matching algorithm, and depth information can be obtained with less resources, thereby improving the efficiency of liveness detection.
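- As a concrete illustration of this two-stage inference flow, a minimal PyTorch sketch is given below; the module names depth_generator and liveness_model, the 112×112 input size, and the 0.5 threshold are assumptions for illustration, not the exact implementation of this application:

```python
import torch

@torch.no_grad()
def detect_liveness(depth_generator: torch.nn.Module,
                    liveness_model: torch.nn.Module,
                    img_left: torch.Tensor,   # first target image,  shape (1, 3, 112, 112)
                    img_right: torch.Tensor,  # second target image, shape (1, 3, 112, 112)
                    threshold: float = 0.5) -> bool:
    """Return True if the face to be recognized is judged to be a living body."""
    # Stage 1: the pre-trained depth generation network extracts target depth
    # information (a pseudo depth map) from the two sensor images.
    depth_map = depth_generator(img_left, img_right)        # e.g. (1, 1, 56, 56)

    # Stage 2: the pre-trained liveness detection model scores the depth map.
    logits = liveness_model(depth_map)                      # (1, 2): [prosthesis, living body]
    live_score = torch.softmax(logits, dim=1)[0, 1].item()

    # Compare against a preset detection threshold.
    return live_score >= threshold
```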
- by using the target depth information extracted from the first target image and the second target image to perform liveness detection on the face to be recognized, it is possible to distinguish whether the face to be recognized is a living body or a prosthesis.
- the face image of a real person and the face image of a prosthetic face present different features in the target depth information.
- the depth features corresponding to the living face of a real person may be determined as the living body feature
- the depth features corresponding to the prosthetic face may be determined as the prosthesis feature.
- Step S330 Detect the depth information of the target through the pre-trained liveness detection model, and obtain the liveness detection result of the face to be recognized.
- the living body detection model is obtained by training depth information extracted from sample data, and the sample data includes a first sample image collected by the first sensor and a second sample image collected by the second sensor under at least two lighting environments, wherein, Both the first sample image and the second sample image include prosthetic human faces made of different materials.
- the target depth information obtained in the preceding steps and corresponding to the face to be recognized can be input into a pre-trained liveness detection model, so as to perform liveness detection on the face to be recognized.
- the living body detection model can output the detection result of the face to be recognized based on the target depth information. It can be understood that the detection result of the liveness detection of the face to be recognized can be either a living body or a prosthesis.
- a detection result of living body means that the liveness detection model confirms that the face to be recognized is a real face; a detection result of prosthesis means that the liveness detection model considers that the face to be recognized may not be a real face but may be a forged prosthetic face.
- the living body detection model can be trained according to the depth information extracted from the sample data.
- the sample data may be obtained by the first sensor and the second sensor jointly collecting images of the sample faces under different lighting environments. That is to say, the sample data may include a first sample image collected by the first sensor on the sample face and a second sample image collected by the second sensor on the same sample face under at least two lighting environments.
- the different lighting environments may include lighting environments such as strong light, weak light, cloudy sunlight, etc., or may include multiple lighting environments with different color temperatures. Therefore, under different lighting environments, the first sensor and the second sensor collect images of the sample faces, and multiple sets of sample data corresponding to various lighting environments can be obtained, wherein each set of sample data includes the first sample image and the second sample image.
- two sets of sample data can be obtained by collecting the same sample face in two environments of strong light and low light respectively, one set of sample data corresponds to the strong light environment, and the other set of sample data corresponds to the low light environment .
- in this way, the liveness detection model can be adapted to the liveness detection requirements of various lighting environments, and accurate liveness detection results can be obtained under all of these conditions.
- the first sample image and the second sample image used during training may include prosthetic human faces made of different materials.
- the sample faces used for capturing the sample images may include prosthetic faces made of different materials.
- for example, the sample face can be any of various prosthetic faces such as a paper photo, a paper face mask, a plastic face mask, or a headgear made of resin, so the sample data can include face images collected by the first sensor and the second sensor from these prosthetic faces.
- the first sample image and the second sample image used during training may also include the real user's face image.
- the living body detection model may compare the target depth information with living body features corresponding to real people and/or prosthetic features corresponding to prosthetic faces, and then obtain detection results of faces to be recognized.
- generally, the face of a real person is three-dimensional, so the target depth information extracted from the face image of a real person is diverse, while a prosthetic face is usually flat and smooth, so the target depth information extracted from its face image is usually relatively simple. Therefore, if the target depth information extracted from the first target image and the second target image of the face to be recognized is relatively diverse, the face to be recognized can be determined to be a living body (i.e., a real face).
- in some embodiments, the liveness detection model may score the face to be recognized based on the target depth information. In the liveness detection model, a liveness detection score can be calculated based on the target depth information; when the liveness detection score meets a preset detection threshold, it can be determined that the face to be recognized is a living body, and when the liveness detection score meets a preset prosthesis threshold, it can be determined that the face to be recognized is a prosthesis.
- specifically, a target detection score of the face to be recognized can be calculated based on the target depth information, and the target detection score is then compared with the preset detection threshold to determine whether it satisfies a preset condition; if the target detection score satisfies the preset condition, the face to be recognized is determined to be a living body.
- the living body detection model can compare the target depth information with the corresponding depth features of the living body (ie living body features), and obtain the target detection score by calculating the similarity between the target depth information and the living body features. For example, the higher the similarity between the target depth information and the living body features, the higher the target detection score, and the lower the similarity between the target depth information and the living body features, the lower the target detection score.
- the target detection score may be the probability that the living body detection model determines the face to be recognized as a living body. For example, the probability can be obtained by normalizing the similarity between the target depth information and the living body feature through a softmax model.
- the preset detection threshold may be a detection threshold for detecting the face to be recognized as a living body.
- the preset detection threshold may be preset, or the preset detection threshold may also be determined during the training process of the living body detection model. For example, the higher the target detection score obtained by the liveness detection model, the closer the face to be recognized is to a real face. Therefore, as an example, when the target detection score is greater than a preset detection threshold, it may be determined that the face to be recognized is a living body. As another example, when the target detection score is less than a preset detection threshold, it may be determined that the face to be recognized is a fake.
- similarly, a target prosthesis score of the face to be recognized can be calculated based on the target depth information, and the target prosthesis score is then compared with the preset prosthesis threshold to determine whether it satisfies a preset condition; if the target prosthesis score satisfies the preset condition, the face to be recognized is determined to be a prosthesis.
- the score of the target prosthesis can be obtained by calculating the similarity between the target depth information and the corresponding depth features of the prosthesis (i.e. prosthesis features).
- the target prosthesis score may be the probability that the liveness detection model determines the face to be recognized as a prosthesis.
- the preset fake threshold may be a detection threshold for detecting the face to be recognized as a fake.
- the preset dummy threshold may be preset, or the preset dummy threshold may also be determined during the training process of the living body detection model.
- the liveness detection method provided by this application can acquire the first target image captured by the first sensor of the face to be recognized and the second target image captured by the second sensor of the same face to be recognized, use the pre-trained depth generation network to extract the target depth information from the first target image and the second target image, and then detect the target depth information through the pre-trained liveness detection model to obtain the liveness detection result of the face to be recognized.
- the living body detection model is trained by the depth information extracted from the sample data. Based on this, the living body detection method provided by this application extracts the target depth information from the two images collected by the first sensor and the second sensor through the depth generation network, and then uses the living body detection model to detect the target depth information to obtain the living body detection result. It can greatly reduce the consumption of computing resources, shorten the computing time, effectively improve the detection efficiency, and significantly improve the real-time performance of living body detection, especially suitable for actual living body detection scenarios.
- the sample data used in the liveness detection method provided by the present application includes the first sample image collected by the first sensor and the second sample image collected by the second sensor under at least two lighting environments, and both sample images include prosthetic human faces of different materials, so the liveness detection method provided by the present application can recognize prosthetic faces of different materials under different lighting environments, making liveness detection more accurate.
- this embodiment provides a liveness detection method on the basis of the above embodiments, which can fuse the first target image and the second target image to obtain a target fusion image, input the target fusion image into the depth generation network, and process the target fusion image in the depth generation network to obtain the target depth information.
- the target depth information extracted by the deep generative network can more accurately reflect the real features of the face to be recognized.
- FIG. 6 shows a schematic flowchart of a living body detection method provided by another embodiment of the present application. Specifically, the following steps may be included:
- Step S610 The first target image and the second target image are scaled down and then fused to obtain a target fused image.
- if the target depth information were generated directly from the first target image and the second target image through the depth generation network, the size of the target depth information might become too large, causing the target depth information to be distorted and unable to accurately reflect the real features of the face to be recognized. Therefore, in the embodiment of the present application, the first target image and the second target image can be proportionally scaled down and then fused to obtain a target fusion image, which is then input into the depth generation network.
- the first target image and the second target image can be scaled down at the same time by downsampling, and then image fusion is performed on the two scaled down images to obtain the target fusion image .
- the image sizes of the first target image and the second target image may be the same.
- both have a resolution of 112 ⁇ 112 ⁇ 3.
- two feature maps can be generated, and then the two feature maps can be fused to obtain a target fused image with a resolution of, for example, 28 ⁇ 28 ⁇ 48.
- for example, the first target image XL and the second target image XR can be input into the shallow feature extractor of the F-CDCN network, which includes FeatherNet and central difference convolution modules, to obtain two feature maps g(XL) and g(XR) generated after downsampling, and then feature stacking can be used to fuse the two feature maps g(XL) and g(XR) to obtain the target fusion image.
- the skeleton of the network is built with the structure of the lightweight network FeatherNetV2, and all the convolutions in the network are replaced by the central difference convolution.
- the processing method of the central difference convolution can be shown in FIG. 8 .
- the central difference convolution can be expressed as: y(p0) = θ · Σ_{pn ∈ R} w(pn) · (x(p0 + pn) − x(p0)) + (1 − θ) · Σ_{pn ∈ R} w(pn) · x(p0 + pn)
- y(·) is the output feature map
- x(·) is the input feature map
- p0 represents the current position in the input feature map and the output feature map
- pn enumerates the positions in the local receptive field R, and w(·) denotes the convolution weights
- θ ∈ [0, 1] is a hyperparameter that can be used to weigh the contributions of the two terms, i.e., of different semantic information
- Step S620 Input the target fusion image into the depth generation network, and process the target fusion image in the depth generation network to obtain target depth information.
- the target fusion image can be input into the depth generation network to generate target depth information.
- the deep generative network can be a pre-trained lightweight generator.
- the algorithmic complexity of the deep generation network may be less than that of the stereo matching algorithm.
- in the depth generation network, the target fusion image can be bilinearly upsampled and then processed by a Sigmoid activation function to finally generate the target depth information; the target depth information can therefore be expressed as the Sigmoid of the upsampled fused features.
- the generated target depth information may be presented in the form of a pseudo depth map, and the resolution of the pseudo depth map may be 56×56×1.
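- To make the above pipeline concrete, the following is a minimal PyTorch sketch of such a depth-generation head; the plain convolution blocks are stand-ins for the FeatherNet/central-difference shallow feature extractor, and the 112×112 input, 28×28×48 fused feature, and 56×56×1 pseudo depth map sizes simply follow the examples given above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGenerationHead(nn.Module):
    """Shallow features from both views are stacked, then upsampled to a pseudo depth map."""

    def __init__(self):
        super().__init__()
        # Stand-in shallow feature extractor g(.), shared by both views.
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 24, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # 112 -> 56
            nn.Conv2d(24, 24, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 56 -> 28
        )
        # Lightweight generator operating on the fused 28x28x48 feature map.
        self.generator = nn.Sequential(
            nn.Conv2d(48, 24, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(24, 1, 3, padding=1),
        )

    def forward(self, x_left, x_right):
        g_l, g_r = self.extractor(x_left), self.extractor(x_right)   # each (N, 24, 28, 28)
        fused = torch.cat([g_l, g_r], dim=1)                         # feature stacking -> (N, 48, 28, 28)
        depth = self.generator(fused)                                # (N, 1, 28, 28)
        depth = F.interpolate(depth, size=(56, 56), mode="bilinear", align_corners=False)
        return torch.sigmoid(depth)                                  # pseudo depth map (N, 1, 56, 56)
```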
- the process of reducing the first target image and the second target image in equal proportions, merging them, and using the deep generation network to obtain the target depth information can be uniformly processed in the NNB network. That is to say, in the embodiment of the present application, the target depth information can be obtained by directly inputting the first target image and the second target image into the NNB network.
- Such a modular processing method can make the image processing process in liveness detection more concise.
- in this embodiment, the target fusion image is obtained by proportionally reducing and fusing the first target image and the second target image, the target fusion image is then input into the depth generation network, and the target fusion image is processed in the depth generation network to obtain the target depth information; in this way, the target depth information obtained by processing the two images collected by the first sensor and the second sensor with the depth generation network can more accurately reflect the real characteristics of the face to be recognized, which in turn makes the liveness detection results more authentic and reliable.
- sample data collected under at least two lighting environments are used to train the depth generation network and the liveness detection model.
- FIG. 9 shows a schematic flowchart of a training method for a living body detection system provided by an embodiment of the present application.
- the liveness detection system can include a deep generative network and a liveness detection model.
- the training method may include the following steps:
- Step S910 Obtain a first sample image obtained by capturing the sample face by the first sensor under at least two lighting environments and a second sample image obtained by capturing the sample face by the second sensor.
- sample faces include prosthetic faces of different materials.
- sample data for training can be collected in advance.
- the first sensor and the second sensor can be used to capture the same sample face under different lighting environments, so as to obtain a first sample image collected by the first sensor from the sample face and a second sample image collected by the second sensor from the sample face, which serve as sample data for training.
- different lighting environments may include two or more lighting environments such as strong light, weak light, and shade or sunlight, or multiple lighting environments with different color temperatures, which is not limited in this embodiment of the present application.
- the sample data may include a plurality of first sample images and a plurality of second sample images obtained by the first sensor and the second sensor collecting images of the sample faces under different lighting environments.
- for example, images of the sample face x1 can be collected under at least two lighting environments, such as strong light, low light, and cloudy sunlight, to obtain multiple sets of sample data.
- each set of sample data for the sample face x1 may correspond to one lighting environment.
- the first sample data corresponding to the strong light environment, the second sample data corresponding to the low light environment, the third sample data corresponding to the cloudy and sunny environment, etc. may be collected.
- the first sample data may include the first sample image collected from the sample face x1 under strong light and the second sample image of the sample face x1
- the second sample data may include the first sample image collected from the sample face x1 under low light and the second sample image of the sample face x1
- at least another set of sample data such as the third sample data can also be obtained.
- the sample data may include images of multiple sample faces.
- the sample faces may include prosthetic faces of different materials, and may also include faces of multiple real users.
- the sample data can be made more diverse, so that the trained liveness detection model can detect various human faces in different lighting environments.
- Step S920 Input the first sample image and the second sample image into the initial generation network to train the initial generation network to obtain a deep generation network.
- the pre-built initial generation network may be trained by using the first sample image and the second sample image to obtain a deep generation network.
- in some embodiments, the first sample image and the second sample image can be proportionally reduced and then fused to obtain a sample fusion image, and the sample fusion image is then input into the initial generation network for training to obtain the depth generation network. Similar to the process of obtaining the target fusion image in the previous embodiment, the first sample image and the second sample image can be proportionally reduced by downsampling, and image fusion is then performed on the two reduced images to obtain the sample fusion image.
- the first sample image and the second sample image may be input into an image processing unit including a FeatherNet and a central difference convolution module for processing to obtain two feature maps generated after downsampling.
- the specific training process of inputting the first sample image and the second sample image into the initial generation network to train the initial generation network to obtain the depth generation network may include the following steps:
- Step S1010 Based on the first sample image and the second sample image, use a stereo matching algorithm to calculate initial depth information.
- the initial depth information calculated by the stereo matching algorithm may be used as supervision information to train the depth generation network.
- the distance between the first sensor and the sample face is the same as the distance between the second sensor and the sample face.
- the shooting angle between the first sensor and the sample face may also be consistent with the shooting angle between the second sensor and the sample face. Therefore, in the stereo matching algorithm, the initial depth information of the sample face can be calculated according to the intrinsic parameters of the first sensor and the second sensor and the parallax between them, wherein the initial depth information can represent the straight-line distance between the first sensor and the second sensor and the sample face.
- the initial depth information may include distance information from the baseline midpoint of the first sensor and the second sensor to each spatial point on the face of the sample.
- in Fig. 11, Ol is the first sensor and Or is the second sensor
- B is the baseline between the two sensors
- f is the focal length
- P is a position on the sample face to be measured (for example, a spatial point on the sample face)
- P can be called the target point
- D is the straight-line distance from the target point P to the first sensor and the second sensor (for example, to the midpoint of the baseline between the two sensors)
- xl and xr are the image coordinates of the target point P on the imaging planes of the first sensor and the second sensor, respectively
- xol and xor are respectively the intersection points of the optical axes of the first sensor and the second sensor with the two imaging planes
- xol and xor may be called the principal points of the images. If the imaging planes of the first sensor and the second sensor are aligned along a common baseline, the principle of similar triangles gives the disparity d = (xl − xol) − (xr − xor)
- the initial depth information can then be obtained as D = B · f / d
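- As a worked example of this calculation (with illustrative numbers only; the application does not specify a baseline, focal length, or pixel coordinates):

```python
def depth_from_disparity(x_l, x_r, x_ol, x_or, baseline, focal_px):
    """Stereo triangulation: D = B * f / d, with disparity d = (x_l - x_ol) - (x_r - x_or)."""
    disparity = (x_l - x_ol) - (x_r - x_or)    # in pixels
    return baseline * focal_px / disparity      # same length unit as the baseline

# Assumed values: 0.1 m baseline, 800 px focal length, 20 px disparity -> 4.0 m.
print(depth_from_disparity(x_l=420.0, x_r=400.0, x_ol=320.0, x_or=320.0,
                           baseline=0.10, focal_px=800.0))
```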
- each set of sample data (including the first sample image and the second sample image) and corresponding initial depth information can form a piece of training data. Furthermore, all training data can be combined into a training set. By inputting the training set into the initial generation network and the neural network model for training, a deep generation network and a living body detection model can be obtained.
- the training set can be expressed, for example, as T = {(x l, x r, b, y) i | i = 1, 2, …, n}, where:
- x l is the first sample image collected by the first sensor
- x r is the second sample image collected by the second sensor.
- the resolution of both can be 112 ⁇ 112 ⁇ 3, and the two are used as the input of the network.
- b is the initial depth information obtained through the stereo matching algorithm, and the resolution of the initial depth information can be set to 56 ⁇ 56 ⁇ 1.
- b can be used as a depth label.
- y is the classification label of two categories of true and false, which is used to indicate whether the sample face included in the corresponding training data is a living body or a fake (for example, the classification label of "true” can be "1", indicating a living body; The classification label for "fake” can be "0", indicating a fake).
- n is the number of training data in the training set.
- the first sensor and the second sensor may both belong to one binocular stereo sensor.
- the first sensor may be the left eye sensor of the binocular stereo sensor
- the second sensor may be the right eye sensor of the binocular stereo sensor.
- Step S1020 Input the first sample image and the second sample image into the initial generation network, and use the initial depth information as supervision information to train the initial generation network to obtain the depth generation network, so that the difference between the depth information extracted by the depth generation network from the first sample image and the second sample image and the initial depth information satisfies a preset difference condition.
- the initial generation network may be trained using the first sample image and the second sample image in each piece of training data in the training set.
- the initial depth information can be used as supervision information to train the initial generation network.
- a loss function may be constructed in the initial generation network to represent the difference between the depth information extracted by the depth generation network from the first sample image and the second sample image and the initial depth information.
- specifically, two loss functions can be constructed for the initial generation network. The first represents the difference between the depth information generated by the network and the initial depth information Bi (i.e., the depth label b in the training set), for example as a cross-entropy between the two.
- the second is a relative depth loss, denoted L2 NNB, which constrains the local depth contrasts of the generated depth map to match those of the depth label.
- i is used to represent the serial number of each piece of training data in the training set.
- K contrast is a set of convolution kernels used to compute the local depth contrasts in the relative depth loss.
- through this training, the depth information obtained by the depth generation network can more accurately reflect the real characteristics of the sample face, so that accurate and reliable detection results can be obtained when performing face liveness detection.
- in addition, extracting the target depth information through the depth generation network instead of calculating the initial depth information through the stereo matching algorithm can greatly reduce the consumption of computing resources, shorten the computing time, effectively improve detection efficiency, and improve the real-time performance of liveness detection.
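- One common way to implement such depth supervision is sketched below. This is only an assumed formulation: the application does not disclose the exact definitions of the two loss terms or of the K contrast kernels, so the binary-cross-entropy term and the eight-neighbour contrast kernels here are borrowed from the face anti-spoofing literature:

```python
import torch
import torch.nn.functional as F

# Eight 3x3 contrast kernels: each compares the centre pixel with one of its neighbours.
def make_contrast_kernels():
    kernels = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            k = torch.zeros(1, 1, 3, 3)
            k[0, 0, 1, 1] = -1.0
            k[0, 0, 1 + dy, 1 + dx] = 1.0
            kernels.append(k)
    return torch.cat(kernels, dim=0)                  # (8, 1, 3, 3)

K_CONTRAST = make_contrast_kernels()

def depth_generation_loss(pred_depth, depth_label):
    """pred_depth, depth_label: (N, 1, 56, 56) pseudo depth maps with values in [0, 1]."""
    # First term: per-pixel cross-entropy between the generated depth and the depth label.
    l1 = F.binary_cross_entropy(pred_depth, depth_label)
    # Second term: relative (contrastive) depth loss on local depth contrasts.
    contrast_pred = F.conv2d(pred_depth, K_CONTRAST, padding=1)
    contrast_label = F.conv2d(depth_label, K_CONTRAST, padding=1)
    l2 = F.mse_loss(contrast_pred, contrast_label)
    return l1 + l2
```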
- Step S930 Using the depth generation network to extract the depth information of the sample face from the first sample image and the second sample image.
- the training process of the deep generation network and the training process of the living body detection model can be performed simultaneously, that is, the training of the deep generation network and the training of the living body detection network can be performed synchronously by using the training data in the training set.
- during training, the initial generation network at this time can be used to extract depth information from the first sample image and the second sample image, and the depth information is input into the neural network model to train the neural network model. That is to say, the training iteration process of the initial generation network and the training iteration process of the neural network model can be nested together so that they converge together. It should be understood that the initial generation network at this point may not yet have reached the training target, so the depth information may not be optimal and may still differ considerably from the initial depth information. Nevertheless, the depth information generated during each adjustment step can be input into the pre-built neural network model for training. After both training iterations are completed, the current initial generation network and neural network model are determined to be the depth generation network and the liveness detection model, respectively.
- Step S940 Input the depth information of the sample face into the neural network model to train the neural network model to obtain a living body detection model.
- the depth information can be detected in the neural network model to obtain the living body detection result of the sample face.
- the detection result can be compared with the target label pre-marked for the sample face, and the neural network model can be trained based on the comparison result, so that the detection difference between the detection result obtained by the liveness detection model and the target label meets a preset detection condition.
- the specific training process of inputting the depth information into the neural network model for training to obtain the living body detection model may include the following steps:
- Step S1210 Input the depth information of the sample face into the neural network model to obtain the liveness detection score of the sample face.
- the living body detection score is the probability that the neural network model determines the classification label of the sample face as the pre-marked target label.
- the sample faces can be scored based on depth information.
- In this way, the liveness detection score of the sample face can be obtained. Since each sample face in the training set can be labeled with a classification label, during training the liveness detection score can be defined as the probability, determined from the depth information, that the sample face belongs to its pre-marked classification label (i.e., the target label).
- the liveness detection score can be obtained by calculating the similarity between the depth information and the depth features represented by the pre-marked classification labels.
- The liveness detection score of the sample face is obtained from the neural network model, and it is the probability that the sample face is determined to be a living body.
- the living body detection score can be obtained by normalizing the similarity between the depth information and the living body features, for example, through a softmax model.
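- A small sketch of the softmax normalization step described above: the two-way classifier output is normalized into probabilities and the probability of the living-body class is used as the score. The [spoof, live] output ordering is an assumption.

```python
# Hedged sketch: turning 2-way classifier outputs into a liveness detection score via softmax.
import torch
import torch.nn.functional as F

def liveness_score(logits: torch.Tensor) -> torch.Tensor:
    """logits: (N, 2) raw outputs; returns (N,) probability of the living-body class."""
    probs = F.softmax(logits, dim=1)
    return probs[:, 1]   # index 1 assumed to be the living-body class

# Usage: score = liveness_score(classifier(pseudo_depth_map))
```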
- Step S1220 Determine the detection error based on the living body detection score.
- The classification label produced by the neural network model's binary classification of the sample face can be obtained from the output liveness detection score, and the detection error between this classification label and the pre-annotated target label can then be determined.
- Step S1230 Adjust the neural network model based on the detection error to obtain a living body detection model, so that the detection error of the living body detection model satisfies a preset error condition.
- a living body detection score corresponding to a detection error satisfying a preset error condition may also be determined as a preset detection threshold.
- a loss function for liveness detection can be constructed based on the classification labels output by the neural network model and the pre-marked target labels.
- FocalLoss can be used to define the loss function L NNC of liveness detection.
- For example, L_NNC can be expressed in the standard focal loss form, L_NNC = −a_t(1 − p_i)^γ log(p_i), where p_i denotes the probability that the neural network model assigns to the target label of the i-th sample.
- Here, a_t is a custom weighting parameter, Ŷ_i is the classification label output by the neural network model for the sample face contained in the i-th sample data, and Y_i is the target label in the i-th sample data.
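- A minimal sketch of a focal-loss-style L_NNC for the binary live/spoof classification follows; the alpha and gamma values are illustrative defaults, not values taken from the patent.

```python
# Hedged sketch of a focal loss over two classes (0 = spoof, 1 = live).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits: (N, 2) classifier outputs; targets: (N,) int64 labels."""
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)       # probability of the true class
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    loss = -alpha * (1.0 - pt) ** gamma * log_pt                # focal modulation of cross-entropy
    return loss.mean()
```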
- the neural network model can be used to detect the liveness of the training set to obtain the detection error.
- By continuously adjusting the model parameters of the neural network model, a neural network model can finally be obtained whose liveness detection loss function satisfies the preset condition (for example, the detection error specified by the loss function is less than a preset error threshold); the neural network model at this point can be used as the living body detection model.
- a living body detection score corresponding to a detection error satisfying a preset condition may also be determined as a preset detection threshold.
- The classification label obtained from the output of the neural network model can be expressed as Ŷ_i = NNC(·), where NNC(·) represents the output of the neural network model for the depth information of the i-th sample.
- the pre-built neural network model can be an NNC network composed of a bottleneck layer, a downsampling layer, a flow module, and a fully connected layer.
- the depth information may be presented in the form of a pseudo-depth map
- the downsampling layer may sample the resolution of the pseudo-depth map to a size of 7 ⁇ 7.
- a fully connected layer can consist of 2 neurons and a softmax activation function. Then, the NNC network is trained through the pseudo-depth map.
- After the pseudo-depth map is input into the NNC network, the NNC network performs binary classification of the sample face according to the pseudo-depth map and calculates scores for the living label and the fake label. The liveness detection loss function L_NNC is then used to compute the error between the predicted classification label and the real classification label, and the NNC network is optimized by adjusting the model parameters of each layer to reduce the classification error, thereby improving the accuracy of liveness detection.
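- The sketch below mirrors the layer list above (bottleneck, downsampling to 7×7, two-output fully connected layer with softmax) for a 56×56 pseudo-depth map. The channel width is illustrative, and the "flow module" is approximated here by a simple flatten step, which is an assumption.

```python
# Hedged sketch of an NNC-style classifier head over a (N, 1, 56, 56) pseudo-depth map.
import torch
import torch.nn as nn

class NNCHead(nn.Module):
    def __init__(self, in_channels: int = 1, width: int = 32):
        super().__init__()
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )
        self.downsample = nn.AdaptiveAvgPool2d(7)   # pseudo-depth map sampled to 7x7
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(width * 7 * 7, 2)       # two outputs: fake / living

    def forward(self, depth_map: torch.Tensor) -> torch.Tensor:
        x = self.bottleneck(depth_map)
        x = self.downsample(x)                      # (N, width, 7, 7)
        logits = self.fc(self.flatten(x))
        return torch.softmax(logits, dim=1)         # scores for the two classes
```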
- all convolutions in the neural network model can also be replaced by central difference convolution.
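- Central difference convolution is a published operator from the face anti-spoofing literature (see the cited NAS-FAS work); the sketch below follows that standard formulation, with θ as a mixing hyperparameter. The value 0.7 is a commonly used default, not a value specified in this application.

```python
# Hedged sketch of a central difference convolution: vanilla conv minus a theta-weighted
# centre-difference term; theta = 0 recovers an ordinary convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.theta = theta
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)

    def forward(self, x):
        out_normal = self.conv(x)
        if self.theta == 0:
            return out_normal
        # The summed kernel weights act as a 1x1 convolution against the centre pixel.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # (out_ch, in_ch, 1, 1)
        out_center = F.conv2d(x, kernel_sum)
        return out_normal - self.theta * out_center
```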
- In other embodiments, the training process of the depth generation network and the training process of the liveness detection model can be carried out simultaneously; that is, a single objective function can be used to jointly represent the difference between the depth information output by the initial generation network and the initial depth information, and the detection error of the neural network model.
- By continuously adjusting the model parameters, a depth generation network and a liveness detection network can finally be obtained whose objective function satisfies the preset condition (for example, is less than a target threshold).
- Likewise, by continuously adjusting the model parameters of the neural network model, a liveness detection model whose detection error meets the preset error condition can be obtained.
- the preset detection threshold of the liveness detection model can also be obtained. Therefore, when using the liveness detection model to perform liveness detection on the face to be recognized, once it is determined that the target detection score meets the preset conditions (such as greater than the detection threshold), it can be determined that the face to be recognized is a living body, and at the same time, the detection error can be constrained within a reasonable range, so as to achieve the purpose of improving the accuracy of living body detection.
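- At inference time the decision rule described above reduces to a threshold comparison, as in the sketch below; the strict "greater than" comparison is one of the example conditions mentioned above, and the threshold value is assumed to come from the training stage.

```python
# Hedged sketch of applying the preset detection threshold to the target detection score.
def is_live(target_detection_score: float, detection_threshold: float) -> bool:
    """Return True when the face to be recognized is judged to be a living body."""
    return target_detection_score > detection_threshold
```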
- FIG. 13 shows a schematic flowchart of a model training process in a living body detection method provided by an embodiment of the present application.
- As shown in FIG. 13, the first sample image (XL) and the second sample image (XR), collected by the first sensor and the second sensor respectively for the sample face, can be input into two feature extractors F with the same structure.
- the feature extractor can combine the functions of FeatherNet and the central difference convolution module. After downsampling and other processing, two feature maps g(XL) and g(XR) can be obtained.
- the image fusion of g(XL) and g(XR) can be performed by means of feature stacking to obtain the sample fusion image Z.
- the sample fused image Z can be input to the lightweight generator.
- the depth information can be generated by bilinear upsampling and Sigmoid activation function under the supervision of the initial depth information.
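- A sketch of the fusion-plus-generator path described above: the two feature maps g(XL) and g(XR) are stacked along the channel axis to form Z, and a 56×56×1 pseudo-depth map is produced via bilinear upsampling and a Sigmoid. The channel count and the single reduction convolution are illustrative assumptions.

```python
# Hedged sketch of feature stacking and a lightweight generator head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightGenerator(nn.Module):
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(2 * feat_channels, 1, kernel_size=3, padding=1)

    def forward(self, g_xl: torch.Tensor, g_xr: torch.Tensor) -> torch.Tensor:
        z = torch.cat([g_xl, g_xr], dim=1)          # sample fusion image Z (feature stacking)
        depth = self.reduce(z)
        depth = F.interpolate(depth, size=(56, 56), mode="bilinear", align_corners=False)
        return torch.sigmoid(depth)                 # pseudo-depth map, (N, 1, 56, 56)
```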
- the initial depth information may be the distance from the first sensor and the second sensor to the sample face calculated by the stereo matching algorithm for the first sample image (XL) and the second sample image (XR).
- the depth information may be presented in the form of a pseudo depth map, and the initial depth information may be presented in the form of a depth map.
- the resolution of the pseudo depth map is consistent with the resolution of the depth map, for example, it may be 56 ⁇ 56 ⁇ 1.
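- The initial depth information can be produced by any stereo matching algorithm; the sketch below uses OpenCV's semi-global block matching as one possibility. The parameter values, the disparity-to-depth conversion, and the final resize are illustrative assumptions, not details from this application.

```python
# Hedged sketch of computing initial depth from the left/right sample images XL and XR.
import cv2
import numpy as np

def initial_depth(xl_gray: np.ndarray, xr_gray: np.ndarray,
                  focal_px: float, baseline_m: float) -> np.ndarray:
    """xl_gray, xr_gray: 8-bit single-channel rectified images."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = matcher.compute(xl_gray, xr_gray).astype(np.float32) / 16.0  # SGBM is fixed-point
    disparity[disparity <= 0] = 0.1                                          # avoid division by zero
    depth = focal_px * baseline_m / disparity                                # sensor-to-face distance
    return cv2.resize(depth, (56, 56))                                       # match the 56x56 depth map
```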
- the depth information is input into the NNC network, and the depth information can be used in the NNC network to classify the sample faces.
- In order to control the output error of the lightweight generator and the NNC network, two loss functions L1_NNB and L2_NNB can be constructed for the lightweight generator, and a loss function L_NNC can be constructed for the NNC network.
- The four algorithm modules, namely the feature extractor, feature fusion, lightweight generator, and NNC network, are trained using face images collected by the first sensor and the second sensor under different lighting environments, with the depth information obtained by the stereo matching algorithm as supervision. This can address problems such as occlusion areas, color style, and eyebrow ghosting that arise when a prosthetic face is used in place of a real face.
- the first sensor and the second sensor may jointly form a binocular stereo sensor.
- FIG. 14 shows a module block diagram of a living body detection device provided by an embodiment of the present application.
- the device may include: an image acquisition module 1410 , a depth generation module 1420 and a living body detection module 1430 .
- The image acquisition module 1410 is used to acquire the first target image obtained by the first sensor capturing the face to be recognized, and the second target image obtained by the second sensor capturing the face to be recognized;
- the depth generation module 1420 is used to extract the target depth information from the first target image and the second target image by using the pre-trained depth generation network;
- the living body detection module 1430 is used to detect the target depth information through the pre-trained living body detection model to obtain the liveness detection result of the face to be recognized.
- the living body detection model can be trained by depth information extracted from sample data.
- the sample data may include a first sample image captured by the first sensor and a second sample image captured by the second sensor under at least two lighting environments.
- both the first sample image and the second sample image include prosthetic human faces made of different materials.
- The aforementioned living body detection module 1430 may include: a score calculation module, configured to input the target depth information into the living body detection model to obtain the target detection score of the face to be recognized being detected as a living body; and a score judgment module, configured to determine that the face to be recognized is a living body when the target detection score meets the preset detection threshold.
- The above-mentioned depth generation module 1420 may include: an image fusion module, configured to fuse the first target image and the second target image to obtain a target fusion image; and a depth generation submodule, configured to input the target fusion image into the depth generation network and process the target fusion image in the depth generation network to obtain the target depth information.
- the above-mentioned image fusion module may also be used to scale down the first target image and the second target image, and fuse the two scaled down images.
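- One way the scale-down-then-fuse step of the image fusion module might look is sketched below; the target size and the channel-stacking fusion are illustrative assumptions.

```python
# Hedged sketch of proportionally downscaling two images and fusing them by stacking.
import cv2
import numpy as np

def fuse_scaled(first_img: np.ndarray, second_img: np.ndarray, size=(112, 112)) -> np.ndarray:
    a = cv2.resize(first_img, size, interpolation=cv2.INTER_AREA)   # proportional reduction
    b = cv2.resize(second_img, size, interpolation=cv2.INTER_AREA)
    return np.dstack([a, b])                                        # stack along the channel axis
```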
- FIG. 15 shows a block diagram of a training device of a living body detection system provided by an embodiment of the present application.
- the liveness detection system can include a deep generative network and a liveness detection model.
- the device may include: a sample acquisition module 1510 , a network training module 1520 , a depth extraction module 1530 and a model training module 1540 .
- The sample acquisition module 1510 is configured to acquire the first sample image obtained by the first sensor collecting the sample face under at least two lighting environments and the second sample image obtained by the second sensor collecting the sample face, wherein the sample faces include prosthetic faces of different materials;
- the network training module 1520 is used to input the first sample image and the second sample image into the initial generation network to train the initial generation network and obtain the depth generation network;
- the depth extraction module 1530 is used to extract the depth information of the sample face from the first sample image and the second sample image by using the depth generation network;
- the model training module 1540 is used to input the depth information into the pre-built neural network model to train the neural network model and obtain a living body detection model.
- The above-mentioned network training module 1520 may include: a stereo matching module, used to calculate initial depth information with a stereo matching algorithm based on the first sample image and the second sample image; and a supervision module, used to input the first sample image and the second sample image into the initial generation network and train the initial generation network with the initial depth information as supervision information to obtain the depth generation network, so that the difference between the depth information extracted by the depth generation network from the first sample image and the second sample image and the initial depth information satisfies a preset difference condition.
- The network training module 1520 may further include: a sample fusion module, configured to fuse the first sample image and the second sample image to obtain a sample fusion image; and a network training submodule, used to input the sample fusion image into the initial generation network for training to obtain the depth generation network.
- the above sample fusion module may also be used to scale down the first sample image and the second sample image, and fuse the two scaled down images.
- The above-mentioned model training module 1540 may include: a sample score calculation module, configured to input the depth information into the neural network model to obtain a liveness detection score of the sample face, wherein the liveness detection score is the probability that the classification label of the sample face is determined to be the pre-marked target label; and an error determination module, used to determine the detection error based on the liveness detection score and adjust the neural network model based on the detection error to obtain the living body detection model, so that the detection error of the living body detection model satisfies a preset error condition.
- model training module 1540 may further include: a model training sub-module, configured to determine a living body detection score corresponding to a detection error satisfying a preset error condition as a preset detection threshold.
- the first sensor and the second sensor together form a binocular stereo sensor.
- the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or modules may be electrical, mechanical or otherwise.
- each functional module in each embodiment of the present application may be integrated into one processing module, each module may exist separately physically, or two or more modules may be integrated into one module.
- the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
- FIG. 16 shows a structural block diagram of an electronic device provided by an embodiment of the present application.
- the electronic device in this embodiment may include one or more of the following components: a processor 1610, a memory 1620, and one or more application programs, wherein one or more application programs may be stored in the memory 1620 and configured as Executed by one or more processors 1610, one or more application programs are configured to execute the methods described in the foregoing method embodiments.
- the electronic device may be any of various types of computer system devices that are mobile, portable, and perform wireless communication.
- The electronic device can be a mobile phone or smart phone (for example, an iPhone™-based or Android™-based phone), a portable game device (such as a Nintendo DS™, PlayStation Portable™, Gameboy Advance™, or iPhone™), a laptop computer, a PDA, a portable Internet device, a music player or data storage device, or another handheld device such as a smart watch, smart bracelet, earphone, or pendant.
- The electronic device can also be another wearable device (for example, electronic glasses, e-clothes, e-bracelets, e-necklaces, e-tattoos, or a head-mounted device (HMD)).
- The electronic device may also be any of a number of electronic devices, including but not limited to cellular phones, smart phones, smart watches, smart bracelets, other wireless communication devices, personal digital assistants (PDAs), audio players, other media players, music recorders, video recorders, cameras, other media recorders, radios, medical equipment, vehicle transportation instruments, calculators, programmable remote controls, pagers, laptop computers, desktop computers, printers, netbook computers, portable multimedia players (PMPs), MP3 players, portable medical devices, digital cameras, and combinations thereof.
- an electronic device can perform multiple functions (eg, play music, display video, store pictures, and receive and send phone calls).
- the electronic device may be a device such as a cellular phone, media player, other handheld device, wrist watch device, pendant device, earpiece device, or other compact portable device.
- The electronic device can also be a server, for example, an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms, as well as a specialized or platform server for face recognition, autonomous driving, industrial Internet services, and data communications (such as 4G and 5G).
- Processor 1610 may include one or more processing cores.
- The processor 1610 uses various interfaces and lines to connect the various parts of the electronic device, and performs the functions of the electronic device and processes data by running or executing the instructions, application programs, code sets, or instruction sets stored in the memory 1620 and by calling data stored in the memory 1620.
- The processor 1610 may be implemented in hardware in the form of at least one of a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
- The processor 1610 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
- the CPU mainly handles the operating system, user interface and application programs, etc.
- the GPU is used to render and draw the displayed content
- The modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 1610 and may instead be implemented by a separate communication chip.
- The memory 1620 may include random access memory (RAM) and may also include read-only memory (ROM). The memory 1620 may be used to store instructions, applications, code, code sets, or instruction sets.
- the memory 1620 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system and instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.) , instructions for implementing the following method embodiments, and the like.
- The data storage area may also store data created by the electronic device during use (such as phone book, audio and video data, and chat record data) and the like.
- FIG. 17 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
- Program codes are stored in the computer-readable storage medium 1700, and the program codes can be invoked by a processor to execute the methods described in the foregoing method embodiments.
- the computer readable storage medium 1700 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
- the computer-readable storage medium 1700 includes a non-transitory computer-readable storage medium (non-transitory computer-readable storage medium).
- the computer-readable storage medium 1700 has a storage space for program code 1710 for executing any method steps in the above methods. These program codes can be read from or written into one or more computer program products.
- Program code 1710 may, for example, be compressed in a suitable form.
- The computer-readable storage medium 1700 may be, for example, a read-only memory (ROM), a random access memory (RAM), a solid-state drive (SSD), an electrically erasable programmable read-only memory (EEPROM), or a flash memory (Flash).
- a computer program product or computer program comprising computer instructions stored on a computer readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the steps in the foregoing method embodiments.
- The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
- Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, SSD, or Flash) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method of each embodiment of the present application.
- To sum up, the present application provides a living body detection method, device, electronic equipment, and storage medium. Specifically, the present application can obtain the first target image collected by the first sensor for the face to be recognized and the second target image collected by the second sensor for the same face to be recognized, use the pre-trained depth generation network to extract the target depth information from the first target image and the second target image, and then detect the target depth information with a pre-trained liveness detection model to obtain the liveness detection result of the face to be recognized.
- The living body detection model is obtained by training with depth information extracted from sample data; the sample data includes, under at least two lighting environments, a first sample image collected by the first sensor and a second sample image collected by the second sensor, and both the first sample image and the second sample image include prosthetic human faces made of different materials.
- This application extracts the target depth information from the two images collected by the first sensor and the second sensor through the depth generation network, and then uses the living body detection model to detect the target depth information to obtain the living body detection result, which can greatly reduce the consumption of computing resources, shorten the calculation time, effectively improve the detection efficiency, and significantly improve the real-time performance of liveness detection, making it especially suitable for actual liveness detection scenarios.
- the living body detection method provided by the present application can recognize prosthetic human faces made of different materials under different lighting environments, so that the accuracy of living body detection is higher.
Claims (20)
- 1. A liveness detection method, comprising: acquiring a first target image obtained by a first sensor capturing a face to be recognized, and acquiring a second target image obtained by a second sensor capturing the face to be recognized; extracting target depth information from the first target image and the second target image by using a pre-trained depth generation network; and detecting the target depth information with a pre-trained liveness detection model to obtain a liveness detection result of the face to be recognized, wherein the liveness detection model is trained with depth information extracted from sample data, the sample data comprises a first sample image collected by the first sensor and a second sample image collected by the second sensor under at least two lighting environments, and both the first sample image and the second sample image include prosthetic faces made of different materials.
- 2. The liveness detection method according to claim 1, wherein extracting the target depth information from the first target image and the second target image by using the pre-trained depth generation network comprises: fusing the first target image and the second target image to obtain a target fusion image; and inputting the target fusion image into the depth generation network, and processing the target fusion image in the depth generation network to obtain the target depth information.
- 3. The liveness detection method according to claim 2, wherein fusing the first target image and the second target image to obtain the target fusion image comprises: proportionally scaling down the first target image and the second target image and then fusing them to obtain the target fusion image.
- 4. The liveness detection method according to claim 1, wherein the first sensor and the second sensor are the left-eye sensor and the right-eye sensor of a binocular stereo sensor.
- 5. The liveness detection method according to any one of claims 1 to 4, wherein both the first target image and the second target image are visible-light images.
- 6. The liveness detection method according to claim 1, further comprising: inputting the first sample image and the second sample image into an initial generation network to train the initial generation network and obtain the depth generation network; extracting the depth information from the first sample image and the second sample image by using the depth generation network; and inputting the depth information into a neural network model to train the neural network model and obtain the liveness detection model.
- 7. The liveness detection method according to claim 6, wherein inputting the first sample image and the second sample image into the initial generation network to train the initial generation network and obtain the depth generation network comprises: calculating initial depth information with a stereo matching algorithm based on the first sample image and the second sample image; and inputting the first sample image and the second sample image into the initial generation network and training the initial generation network with the initial depth information as supervision information to obtain the depth generation network, so that the difference between the depth information extracted by the depth generation network from the first sample image and the second sample image and the initial depth information satisfies a preset difference condition.
- 8. The liveness detection method according to claim 6, wherein inputting the first sample image and the second sample image into the initial generation network to train the initial generation network and obtain the depth generation network comprises: proportionally scaling down the first sample image and the second sample image and then fusing them to obtain a sample fusion image; and inputting the sample fusion image into the initial generation network to train the initial generation network and obtain the depth generation network.
- 9. The liveness detection method according to claim 6, wherein inputting the depth information into the neural network model to train the neural network model and obtain the liveness detection model comprises: inputting the depth information into the neural network model to obtain a liveness detection score of the sample face, wherein the liveness detection score is the probability that the neural network model determines the classification label of the sample face to be a pre-marked target label; determining a detection error based on the liveness detection score; and adjusting the neural network model based on the detection error to obtain the liveness detection model, so that the detection error of the liveness detection model satisfies a preset error condition.
- 10. A training method for a liveness detection system, wherein the liveness detection system comprises a depth generation network and a liveness detection model, and the training method comprises: acquiring a first sample image obtained by a first sensor capturing a sample face under at least two lighting environments and a second sample image obtained by a second sensor capturing the sample face, wherein the sample faces include prosthetic faces made of different materials; inputting the first sample image and the second sample image into an initial generation network to train the initial generation network and obtain the depth generation network; extracting depth information of the sample face from the first sample image and the second sample image by using the depth generation network; and inputting the depth information of the sample face into a neural network model to train the neural network model and obtain the liveness detection model.
- 11. The training method according to claim 10, wherein both the first sample image and the second sample image are visible-light images.
- 12. The training method according to claim 10, wherein inputting the first sample image and the second sample image into the initial generation network to train the initial generation network and obtain the depth generation network comprises: fusing the first sample image and the second sample image to obtain a sample fusion image; and inputting the sample fusion image into the initial generation network to train the initial generation network and obtain the depth generation network.
- 13. The training method according to claim 12, wherein fusing the first sample image and the second sample image to obtain the sample fusion image comprises: proportionally scaling down the first sample image and the second sample image and then fusing them to obtain the sample fusion image.
- 14. The training method according to claim 10, wherein inputting the first sample image and the second sample image into the initial generation network to train the initial generation network and obtain the depth generation network comprises: calculating initial depth information with a stereo matching algorithm based on the first sample image and the second sample image; and inputting the first sample image and the second sample image into the initial generation network and training the initial generation network with the initial depth information as supervision information to obtain the depth generation network, so that the difference between the depth information extracted by the depth generation network from the first sample image and the second sample image and the initial depth information satisfies a preset difference condition.
- 15. The training method according to claim 10, wherein inputting the depth information of the sample face into the neural network model to train the neural network model and obtain the liveness detection model comprises: inputting the depth information of the sample face into the neural network model to obtain a liveness detection score of the sample face, wherein the liveness detection score is the probability that the neural network model determines the classification label of the sample face to be a pre-marked target label; determining a detection error based on the liveness detection score; and adjusting the neural network model based on the detection error, and determining the current neural network model to be the liveness detection model when the detection error satisfies a preset error condition.
- 16. A liveness detection apparatus, comprising: an image acquisition module, configured to acquire a first target image obtained by a first sensor capturing a face to be recognized and a second target image obtained by a second sensor capturing the face to be recognized; a depth generation module, configured to extract target depth information from the first target image and the second target image by using a pre-trained depth generation network; and a liveness detection module, configured to detect the target depth information with a pre-trained liveness detection model to obtain a liveness detection result of the face to be recognized, wherein the liveness detection model is trained with depth information extracted from sample data, the sample data comprises a first sample image collected by the first sensor and a second sample image collected by the second sensor under at least two lighting environments, and both the first sample image and the second sample image include prosthetic faces made of different materials.
- 17. A training apparatus for a liveness detection system, wherein the liveness detection system comprises a depth generation network and a liveness detection model, and the training apparatus comprises: a sample acquisition module, configured to acquire a first sample image obtained by a first sensor capturing a sample face under at least two lighting environments and a second sample image obtained by a second sensor capturing the sample face, wherein the sample faces include prosthetic faces made of different materials; a network training module, configured to input the first sample image and the second sample image into an initial generation network to train the initial generation network and obtain the depth generation network; a depth extraction module, configured to extract depth information of the sample face from the first sample image and the second sample image by using the depth generation network; and a model training module, configured to input the depth information of the sample face into a neural network model to train the neural network model and obtain the liveness detection model.
- 18. An electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the liveness detection method according to any one of claims 1 to 9 or the training method according to any one of claims 10 to 15.
- 19. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, and the program code can be invoked by a processor to perform the liveness detection method according to any one of claims 1 to 9 or the training method according to any one of claims 10 to 15.
- 20. A computer program product containing instructions, wherein instructions are stored in the computer program product and, when run on a computer, cause the computer to implement the liveness detection method according to any one of claims 1 to 9 or the training method according to any one of claims 10 to 15.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/568,910 US20240282149A1 (en) | 2021-12-01 | 2022-08-03 | Liveness detection method and apparatus, and training method and apparatus for liveness detection system |
EP22899948.8A EP4345777A1 (en) | 2021-12-01 | 2022-08-03 | Living body detection method and apparatus, and training method and apparatus for living body detection system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111454390.6 | 2021-12-01 | ||
CN202111454390.6A CN114333078B (zh) | 2021-12-01 | 2021-12-01 | 活体检测方法、装置、电子设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023098128A1 true WO2023098128A1 (zh) | 2023-06-08 |
Family
ID=81049548
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/110111 WO2023098128A1 (zh) | 2021-12-01 | 2022-08-03 | 活体检测方法及装置、活体检测系统的训练方法及装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240282149A1 (zh) |
EP (1) | EP4345777A1 (zh) |
CN (1) | CN114333078B (zh) |
WO (1) | WO2023098128A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117576791A (zh) * | 2024-01-17 | 2024-02-20 | 杭州魔点科技有限公司 | 基于生机线索和垂直领域大模型范式的活体检测方法 |
CN117688538A (zh) * | 2023-12-13 | 2024-03-12 | 上海深感数字科技有限公司 | 一种基于数字身份安全防范的互动教育管理方法及系统 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114333078B (zh) * | 2021-12-01 | 2024-07-23 | 马上消费金融股份有限公司 | 活体检测方法、装置、电子设备及存储介质 |
CN114841340B (zh) * | 2022-04-22 | 2023-07-28 | 马上消费金融股份有限公司 | 深度伪造算法的识别方法、装置、电子设备及存储介质 |
CN114842399B (zh) * | 2022-05-23 | 2023-07-25 | 马上消费金融股份有限公司 | 视频检测方法、视频检测模型的训练方法及装置 |
CN115116147B (zh) * | 2022-06-06 | 2023-08-08 | 马上消费金融股份有限公司 | 图像识别、模型训练、活体检测方法及相关装置 |
JP7450668B2 (ja) | 2022-06-30 | 2024-03-15 | 維沃移動通信有限公司 | 顔認識方法、装置、システム、電子機器および読み取り可能記憶媒体 |
CN115131572A (zh) * | 2022-08-25 | 2022-09-30 | 深圳比特微电子科技有限公司 | 一种图像特征提取方法、装置和可读存储介质 |
CN116132084A (zh) * | 2022-09-20 | 2023-05-16 | 马上消费金融股份有限公司 | 视频流处理方法、装置及电子设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034102A (zh) * | 2018-08-14 | 2018-12-18 | 腾讯科技(深圳)有限公司 | 人脸活体检测方法、装置、设备及存储介质 |
CN110765923A (zh) * | 2019-10-18 | 2020-02-07 | 腾讯科技(深圳)有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
CN111310724A (zh) * | 2020-03-12 | 2020-06-19 | 苏州科达科技股份有限公司 | 基于深度学习的活体检测方法、装置、存储介质及设备 |
CN112200057A (zh) * | 2020-09-30 | 2021-01-08 | 汉王科技股份有限公司 | 人脸活体检测方法、装置、电子设备及存储介质 |
CN113505682A (zh) * | 2021-07-02 | 2021-10-15 | 杭州萤石软件有限公司 | 活体检测方法及装置 |
CN114333078A (zh) * | 2021-12-01 | 2022-04-12 | 马上消费金融股份有限公司 | 活体检测方法、装置、电子设备及存储介质 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764069B (zh) * | 2018-05-10 | 2022-01-14 | 北京市商汤科技开发有限公司 | 活体检测方法及装置 |
US10956714B2 (en) * | 2018-05-18 | 2021-03-23 | Beijing Sensetime Technology Development Co., Ltd | Method and apparatus for detecting living body, electronic device, and storage medium |
CN112464690A (zh) * | 2019-09-06 | 2021-03-09 | 广州虎牙科技有限公司 | 活体识别方法、装置、电子设备及可读存储介质 |
CN111091063B (zh) * | 2019-11-20 | 2023-12-29 | 北京迈格威科技有限公司 | 活体检测方法、装置及系统 |
CN110909693B (zh) * | 2019-11-27 | 2023-06-20 | 深圳华付技术股份有限公司 | 3d人脸活体检测方法、装置、计算机设备及存储介质 |
CN111597938B (zh) * | 2020-05-07 | 2022-02-22 | 马上消费金融股份有限公司 | 活体检测、模型训练方法及装置 |
CN111597944B (zh) * | 2020-05-11 | 2022-11-15 | 腾讯科技(深圳)有限公司 | 活体检测方法、装置、计算机设备及存储介质 |
CN111814589A (zh) * | 2020-06-18 | 2020-10-23 | 浙江大华技术股份有限公司 | 部位识别方法以及相关设备、装置 |
CN111767879A (zh) * | 2020-07-03 | 2020-10-13 | 北京视甄智能科技有限公司 | 一种活体检测方法 |
CN112200056B (zh) * | 2020-09-30 | 2023-04-18 | 汉王科技股份有限公司 | 人脸活体检测方法、装置、电子设备及存储介质 |
CN113128481A (zh) * | 2021-05-19 | 2021-07-16 | 济南博观智能科技有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
-
2021
- 2021-12-01 CN CN202111454390.6A patent/CN114333078B/zh active Active
-
2022
- 2022-08-03 WO PCT/CN2022/110111 patent/WO2023098128A1/zh active Application Filing
- 2022-08-03 EP EP22899948.8A patent/EP4345777A1/en active Pending
- 2022-08-03 US US18/568,910 patent/US20240282149A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034102A (zh) * | 2018-08-14 | 2018-12-18 | 腾讯科技(深圳)有限公司 | 人脸活体检测方法、装置、设备及存储介质 |
CN110765923A (zh) * | 2019-10-18 | 2020-02-07 | 腾讯科技(深圳)有限公司 | 一种人脸活体检测方法、装置、设备及存储介质 |
CN111310724A (zh) * | 2020-03-12 | 2020-06-19 | 苏州科达科技股份有限公司 | 基于深度学习的活体检测方法、装置、存储介质及设备 |
CN112200057A (zh) * | 2020-09-30 | 2021-01-08 | 汉王科技股份有限公司 | 人脸活体检测方法、装置、电子设备及存储介质 |
CN113505682A (zh) * | 2021-07-02 | 2021-10-15 | 杭州萤石软件有限公司 | 活体检测方法及装置 |
CN114333078A (zh) * | 2021-12-01 | 2022-04-12 | 马上消费金融股份有限公司 | 活体检测方法、装置、电子设备及存储介质 |
Non-Patent Citations (1)
Title |
---|
ZITONG YU; JUN WAN; YUNXIAO QIN; XIAOBAI LI; STAN Z. LI; GUOYING ZHAO: "NAS-FAS: Static-Dynamic Central Difference Network Search for Face Anti-Spoofing", ARXIV.ORG, 3 November 2020 (2020-11-03), XP081807406, DOI: 10.1109/TPAMI.2020.3036338 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688538A (zh) * | 2023-12-13 | 2024-03-12 | 上海深感数字科技有限公司 | 一种基于数字身份安全防范的互动教育管理方法及系统 |
CN117688538B (zh) * | 2023-12-13 | 2024-06-07 | 上海深感数字科技有限公司 | 一种基于数字身份安全防范的互动教育管理方法及系统 |
CN117576791A (zh) * | 2024-01-17 | 2024-02-20 | 杭州魔点科技有限公司 | 基于生机线索和垂直领域大模型范式的活体检测方法 |
CN117576791B (zh) * | 2024-01-17 | 2024-04-30 | 杭州魔点科技有限公司 | 基于生机线索和垂直领域大模型范式的活体检测方法 |
Also Published As
Publication number | Publication date |
---|---|
CN114333078A (zh) | 2022-04-12 |
CN114333078B (zh) | 2024-07-23 |
US20240282149A1 (en) | 2024-08-22 |
EP4345777A1 (en) | 2024-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023098128A1 (zh) | 活体检测方法及装置、活体检测系统的训练方法及装置 | |
US10997787B2 (en) | 3D hand shape and pose estimation | |
JP7130057B2 (ja) | 手部キーポイント認識モデルの訓練方法及びその装置、手部キーポイントの認識方法及びその装置、並びにコンピュータプログラム | |
CN111652121B (zh) | 一种表情迁移模型的训练方法、表情迁移的方法及装置 | |
WO2020103700A1 (zh) | 一种基于微表情的图像识别方法、装置以及相关设备 | |
Houshmand et al. | Facial expression recognition under partial occlusion from virtual reality headsets based on transfer learning | |
CN113395542B (zh) | 基于人工智能的视频生成方法、装置、计算机设备及介质 | |
CN109753875A (zh) | 基于人脸属性感知损失的人脸识别方法、装置与电子设备 | |
US20220309836A1 (en) | Ai-based face recognition method and apparatus, device, and medium | |
CN111444826B (zh) | 视频检测方法、装置、存储介质及计算机设备 | |
CN111598168B (zh) | 图像分类方法、装置、计算机设备及介质 | |
CN111046734A (zh) | 基于膨胀卷积的多模态融合视线估计方法 | |
US12080098B2 (en) | Method and device for training multi-task recognition model and computer-readable storage medium | |
WO2023178906A1 (zh) | 活体检测方法及装置、电子设备、存储介质、计算机程序、计算机程序产品 | |
CN113298018A (zh) | 基于光流场和脸部肌肉运动的假脸视频检测方法及装置 | |
US20230281833A1 (en) | Facial image processing method and apparatus, device, and storage medium | |
CN112257513A (zh) | 一种手语视频翻译模型的训练方法、翻译方法及系统 | |
CN113516665A (zh) | 图像分割模型的训练方法、图像分割方法、装置、设备 | |
Shehada et al. | A lightweight facial emotion recognition system using partial transfer learning for visually impaired people | |
CN117237547B (zh) | 图像重建方法、重建模型的处理方法和装置 | |
CN112528760B (zh) | 图像处理方法、装置、计算机设备及介质 | |
CN117540007A (zh) | 基于相似模态补全的多模态情感分析方法、系统和设备 | |
CN110866508B (zh) | 识别目标对象的形态的方法、装置、终端及存储介质 | |
WO2024059374A1 (en) | User authentication based on three-dimensional face modeling using partial face images | |
CN116959123A (zh) | 一种人脸活体检测方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22899948 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18568910 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022899948 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022899948 Country of ref document: EP Effective date: 20231226 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |