WO2023178906A1 - Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product - Google Patents

Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product

Info

Publication number
WO2023178906A1
WO2023178906A1 (PCT/CN2022/110261)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
facial feature
face
living body
attention
Prior art date
Application number
PCT/CN2022/110261
Other languages
English (en)
Chinese (zh)
Inventor
王柏润
刘建博
张帅
伊帅
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023178906A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to but is not limited to the field of computer vision technology, and in particular, to a living body detection method and device, electronic equipment, storage media, computer programs, and computer program products.
  • Computer vision is a hot research topic. It is a synthesis of image processing, artificial intelligence, pattern recognition and other technologies, and has been widely applied in many fields of society.
  • In these applications, face recognition is indispensable, and a key step in face recognition is liveness detection.
  • Common liveness algorithms can be divided into interactive liveness algorithms and silent liveness algorithms according to the form of liveness detection, and into monocular, binocular and three-dimensional (3-Dimensional, 3D) liveness algorithms according to the type of camera module.
  • Current living body detection algorithms usually take the form of a single model, but in some scenarios the capacity of a single model is insufficient to reach the required living body detection accuracy.
  • Embodiments of the present disclosure provide a living body detection method and device, electronic equipment, storage media, computer programs, and computer program products, which are beneficial to improving the accuracy of binocular living body detection.
  • Embodiments of the present disclosure provide a living body detection method, which includes:
  • acquiring an infrared face image and a color face image of a detection object collected by a binocular camera; performing feature extraction on the infrared face image to obtain a first face feature map, and performing feature extraction on the color face image to obtain a second face feature map;
  • obtaining target category attribute information of the detection object according to the second face feature map; obtaining a third face feature map according to the target category attribute information and the second face feature map;
  • obtaining a living body detection result of the detection object according to the first face feature map and the third face feature map.
  • In this way, the embodiment of the present disclosure acquires the infrared face image and the color face image of the detection object collected by the binocular camera, performs feature extraction on the infrared face image to obtain the first face feature map, and performs feature extraction on the color face image to obtain the second face feature map.
  • The second face feature map extracted from the color face image is then classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and the second face feature map is converted into the third face feature map based on the category attribute information of the detection object.
  • Embodiments of the present disclosure provide a living body detection device, which includes an acquisition unit and a processing unit;
  • An acquisition unit configured to acquire infrared face images and color face images of the detection object collected by the binocular camera
  • the processing unit is configured to perform feature extraction on the infrared face image to obtain a first face feature map, and perform feature extraction on the color face image to obtain a second face feature map;
  • the processing unit is also configured to obtain target category attribute information of the detection object based on the second facial feature map;
  • the processing unit is also configured to obtain a third facial feature map based on the target category attribute information and the second facial feature map;
  • the processing unit is also configured to obtain a liveness detection result of the detection object based on the first facial feature map and the third facial feature map.
  • An embodiment of the present disclosure provides an electronic device.
  • The electronic device includes a processor connected to a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the living body detection method.
  • Embodiments of the present disclosure provide a computer-readable storage medium that stores a computer program, and the computer program causes the computer to perform the living body detection method.
  • Embodiments of the present disclosure provide a computer program that includes computer-readable code.
  • When the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the living body detection method.
  • Figure 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure
  • Figure 2A is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure
  • Figure 2B is a schematic flowchart of a method for determining an attention matrix provided by an embodiment of the present disclosure
  • Figure 2C is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic network structure diagram of a living body detection model provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of selecting a third branch provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of multiple pixels corresponding to a certain feature provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic flow chart of another living body detection method provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure.
  • the application environment at least includes a binocular camera 101 and an electronic device 102.
  • The binocular camera 101 and the electronic device 102 are connected via a wired or wireless network.
  • the binocular camera 101 includes a visible light camera module 1011 and an infrared camera module 1012.
  • The visible light camera module 1011 and the infrared camera module 1012 are used to synchronously collect images of the detection object when the detection object enters the image collection range, obtaining a color image and an infrared image respectively, and to store the color image and the infrared image in the face recognition system or send them directly to the electronic device 102.
  • The electronic device 102 receives the color image and the infrared image, or matches them from the face recognition system, then performs face detection on them, and crops a color face image from the color image and an infrared face image from the infrared image based on the position information of the face detection frame.
  • The electronic device 102 then calls a living body detection model that supports multi-category attribute information to perform living body detection on the color face image and the infrared face image.
  • Since the living body detection model uses the branch corresponding to each category of attribute information for feature extraction, the extracted living body features carry information unique to that category, which improves the accuracy of living body classification and thus the accuracy of living body detection.
  • The electronic device 102 may be an independent physical server or a server cluster, or may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, big data and artificial intelligence platforms.
  • the living body detection method can be implemented by the processor calling computer-readable instructions stored in the memory.
  • Figure 2A is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure. The method can be implemented based on the application environment shown in Figure 1 and applied to electronic devices. As shown in Figure 2A, the method includes Steps 201 to 205:
  • The electronic device can obtain, in real time, the infrared image and color image of the detection object synchronously collected by the binocular camera, or can obtain the synchronously collected infrared image and color image of the detection object from the face recognition system; this is not limited here. For example, when the electronic device acquires an infrared image and a color image, it crops the infrared face image and the color face image from the two images respectively, based on the detection frame generated by a face detection algorithm.
  • obtaining the infrared face image and color face image of the detection object collected by the binocular camera includes:
  • A1 From at least two color images of the detection object stored in the face recognition system, select the one with the highest face quality as the target color image, where the at least two color images are obtained by the visible light camera module of the binocular camera continuously collecting images of the detection object.
  • The electronic device extracts features from the faces in the at least two color images through a pre-trained face quality detection model to obtain features containing face size, angle and sharpness information, then performs classification prediction on the features to obtain a face quality detection score for each of the at least two color images, and selects the one with the highest score as the target color image.
  • A2 Perform face quality detection on the faces in at least two infrared images stored in the face recognition system, obtain the face quality detection score of each of the at least two infrared images, and calculate the difference between each face quality detection score and the face quality detection score of the target color image.
  • the electronic device also extracts features including face size, angle, and sharpness information through the face quality detection model, and then classifies the features to obtain the face quality detection score of each infrared image.
  • A3 Among the at least two infrared images, the one whose face quality detection score differs least from that of the target color image is used as the candidate infrared image.
  • A4 Perform face key point detection on the target color image and the candidate infrared image respectively, obtaining 106 first key points covering the eye, cheekbone, nose, ear, chin and cheek areas in the target color image, and 106 second key points covering the same areas in the candidate infrared image.
  • A5 Calculate the similarity between the 106 first key points and the 106 second key points. If the similarity is less than the preset threshold, the target color image and the candidate infrared image are determined to be an image pair collected from the detection object by the binocular camera at the same time. Based on the detection frames obtained in the target color image and the candidate infrared image during face key point detection, a face area image is cropped from each of them, yielding the infrared face image and the color face image of the detection object.
  • In this way, when the electronic device needs to obtain images from the face recognition system, it may not know which color image and infrared image were collected from the detection object at the same time. For any detection object, the color image with the highest face quality is selected first, then the infrared image whose face quality detection score is closest to that of the color image is selected from all infrared images in the face recognition system as the candidate infrared image, and 106 facial key points are used to match the two images. If the similarity between the key points is less than the preset threshold, the candidate infrared image and the target color image are considered to have been collected at the same time. This helps improve the accuracy of image matching in scenarios where the electronic device needs to obtain images from the face recognition system; a code sketch of this selection flow is given below.
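  • The selection and matching logic of steps A1 to A5 can be summarized as follows. This is a minimal Python sketch, assuming hypothetical helpers quality_score, detect_keypoints_106 and crop_face stand in for the face quality detection model, the 106-point key point detector and the face cropping step; it illustrates the flow only and is not the claimed implementation.

```python
import numpy as np

def select_image_pair(color_images, infrared_images, sim_threshold):
    """Pick the best color image and the matching infrared image (steps A1-A5)."""
    # A1: the color image with the highest face quality score becomes the target.
    color_scores = [quality_score(img) for img in color_images]          # hypothetical helper
    target_color = color_images[int(np.argmax(color_scores))]
    target_score = max(color_scores)

    # A2/A3: the infrared image whose quality score is closest to the target's.
    ir_scores = [quality_score(img) for img in infrared_images]
    candidate_ir = infrared_images[int(np.argmin([abs(s - target_score) for s in ir_scores]))]

    # A4: 106 face key points on both images (eyes, cheekbones, nose, ears, chin, cheeks).
    kp_color = detect_keypoints_106(target_color)                        # shape (106, 2)
    kp_ir = detect_keypoints_106(candidate_ir)                           # shape (106, 2)

    # A5: key point similarity check; mean point-wise distance is used here as a stand-in.
    similarity = float(np.mean(np.linalg.norm(kp_color - kp_ir, axis=1)))
    if similarity < sim_threshold:
        # Treated as an image pair collected at the same time; crop the face regions.
        return crop_face(target_color, kp_color), crop_face(candidate_ir, kp_ir)
    return None  # no synchronized pair found
```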
  • The living body detection model includes a first branch (303), a second branch (304), a category attribute classifier (305), at least two third branches (306) and a living body detection classifier (307). The first branch (303) is used to extract features from the input infrared face image (301) to obtain the first face feature map, and the second branch (304) is used to extract features from the input color face image (302) to obtain the second face feature map, where the first face feature map and the second face feature map cover important areas of the face.
  • the semantic information can be one or at least two of material, texture and gloss.
  • both the first branch and the second branch can use at least two Inception structures in series for feature extraction.
  • The Inception structure uses convolution kernels of different sizes, which means receptive fields of different sizes, so that features of different scales are fused; therefore, the first face feature map and the second face feature map have richer semantic information. A rough sketch of such a block is given below.
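  • The following PyTorch-style sketch shows parallel convolutions with different kernel sizes whose outputs are concatenated; the channel sizes are placeholder assumptions and the actual branches used in the disclosure are not specified at this level of detail.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel convolutions with different receptive fields, concatenated along channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, out_ch, kernel_size=1))

    def forward(self, x):
        # Each branch sees a different scale; concatenation fuses the multi-scale features.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)
```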
  • the living body detection model further includes a category attribute classifier, at least two third branches and a living body detection classifier, wherein the second branch, the attribute classifier and the at least two third branches are connected in sequence, At least two third branches are independent of each other, and each of the at least two third branches corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output is used as the input of the living body detection classifier.
  • In this way, for detection objects with different category attribute information, the corresponding third branch can be used for inference within the same living body detection model. Compared with a solution that adopts a different living body detection model for each category of attribute information, this living body detection solution helps save the memory overhead caused by storing at least two models and is more robust.
  • Each third branch in the model can be migrated from the parameters of an existing model, which enables efficient iteration during the training phase, while adding only a category attribute classifier after the second branch has a negligible impact on the inference speed of the entire model. A sketch of this overall wiring follows.
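  • Putting the branches together, the sketch below outlines one possible skeleton of such a living body detection model under stated assumptions (the InceptionBlock from the previous sketch, placeholder channel counts, a single image per forward pass, and a list of third branches indexed by category attribute); it only illustrates the wiring described here, not the actual trained network.

```python
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    """First/second branch -> attribute classifier -> routed third branch -> liveness classifier."""
    def __init__(self, num_attrs, feat_ch=64):
        super().__init__()
        self.first_branch = nn.Sequential(InceptionBlock(1, 16), InceptionBlock(64, 16))   # infrared
        self.second_branch = nn.Sequential(InceptionBlock(3, 16), InceptionBlock(64, 16))  # color
        self.attr_classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                             nn.Linear(feat_ch, num_attrs))
        # One independent third branch per category attribute (e.g. per location identifier).
        self.third_branches = nn.ModuleList(
            [nn.Sequential(InceptionBlock(feat_ch, 16), InceptionBlock(feat_ch, 16))
             for _ in range(num_attrs)])
        self.liveness_classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                                 nn.Linear(2 * feat_ch, 2))  # live / spoof

    def forward(self, ir_face, color_face):
        f1 = self.first_branch(ir_face)            # first face feature map
        f2 = self.second_branch(color_face)        # second face feature map
        # Assumes a batch of one image for simplicity when routing.
        attr = int(self.attr_classifier(f2).argmax(dim=1)[0].item())
        f3 = self.third_branches[attr](f2)         # third face feature map
        fused = torch.cat([f1, f3], dim=1)         # splice first and third feature maps
        return self.liveness_classifier(fused)
```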
  • Performing feature extraction on the infrared face image to obtain the first face feature map includes: inputting the infrared face image into the first branch of the living body detection model for feature extraction to obtain the first face feature map.
  • Performing feature extraction on the color face image to obtain the second face feature map includes: inputting the color face image into the second branch of the living body detection model for feature extraction to obtain the second face feature map.
  • In this way, different neural network branches are used to extract features from the infrared face image and the color face image respectively. Since the first branch is trained with supervision from infrared face sample images and the second branch with supervision from color face sample images, inputting the infrared face image and the color face image into their corresponding branches for feature extraction helps extract features with richer semantic information.
  • the second facial feature map is input into an attribute classifier, so that one or at least two types of semantic information are classified and predicted by the attribute classifier to obtain target category attribute information.
  • the target category attribute information can be gender, age group, location identification (such as belonging to the first region), etc.
  • The attribute classifier uses category attribute information as supervision during training; therefore, based on the second face feature map, which contains rich semantic information, it can predict the target category attribute information of the detection object, such as which country the detection object is from and which age group the detection object belongs to.
  • The features in the second face feature map include one or at least two kinds of semantic information among material, texture and gloss.
  • Obtaining the target category attribute information of the detection object according to the second face feature map includes: inputting the second face feature map into the attribute classifier, and classifying and predicting the one or at least two kinds of semantic information through the attribute classifier to obtain the target category attribute information, where the target category attribute information includes a location identifier.
  • Obtaining the third face feature map according to the target category attribute information and the second face feature map includes: determining the third branch corresponding to the location identifier from the at least two third branches, and inputting the second face feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third face feature map.
  • In this way, the second face feature map is classified by the attribute classifier in the living body detection model to obtain the target category attribute information of the detection object (such as the location identifier); the third branch corresponding to the target category attribute information is then determined from the at least two third branches and used to extract features from the second face feature map, so that the third face feature map carries features unique to that category of attribute information, which relatively improves the accuracy of living body detection.
  • For example, after the electronic device obtains the target category attribute information of the detection object, it can determine the third branch corresponding to the target category attribute information from the at least two third branches, as shown in Figure 4. If the location identifier of the detection object is the first identifier (401), the first identifier branch (402) is determined from at least two third branches such as the first identifier branch (402), the second identifier branch (403) and the third identifier branch (404), and the second face feature map is input into the first identifier branch (402) for feature extraction to obtain the third face feature map (405). Optionally, the at least two third branches can also use at least two Inception structures in series for feature extraction. Each third branch uses its unique category attribute information as supervision during training; therefore, the third face feature map carries features unique to that category of attribute information, which relatively improves the accuracy of living body detection.
  • Obtaining the living body detection result of the detection object includes:
  • B1 Splice the first face feature map and the third face feature map to obtain a fourth face feature map;
  • B2 Obtain the degree of attention of each facial feature in the fourth face feature map to obtain an attention matrix;
  • B3 Multiply the features in the fourth face feature map by the elements at the corresponding positions of the attention matrix to obtain a first weighted feature map;
  • B4 Classify the first weighted feature map to obtain the living body detection result of the detection object.
  • an attention model may be first used to generate attention coefficients for each facial feature in the fourth facial feature map, and a matrix composed of attention coefficients may be determined as the first attention matrix.
  • The attention model can be any existing attention model. It should be understood that the attention model can predict which targets the human eye pays more attention to when viewing an image, that is, the attention coefficients of the features can be calculated from the features of the image.
  • the electronic device performs key point detection on the color face image, and obtains M key points of the preset area of interest and the coordinate information and category information of the M key points.
  • the preset attention areas refer to the eye area, cheekbone area, nose area, ear area, chin area and cheek area.
  • The M key points refer to the 106 key points in step A4, and M is an integer greater than 1.
  • the electronic device calculates the second attention matrix based on the fourth facial feature map and the M key points, and adds the elements at corresponding positions of the first attention matrix and the second attention matrix to obtain the attention matrix.
  • determining the second attention matrix based on the fourth facial feature map and M key points may include the following steps 211 to 215:
  • For the position of each facial feature in the fourth face feature map, at least two pixel points corresponding to that position in the color face image are determined, and the coordinate information of the at least two pixel points is obtained.
  • Because the fourth face feature map is obtained by splicing the first face feature map and the third face feature map, and the original image has been convolved and pooled by the Inception-based deep learning structure, the feature at a given position is computed from the features of at least two pixel points (502), such as the 9 pixel points (502) in the black rectangular box in Figure 5.
  • The coordinate information of these at least two pixel points (502) in the color face image (501) can therefore be obtained; for example, the coordinate information of a certain pixel point a is (x4, y1).
  • The distance between pixel point a and a key point b can then be calculated from their coordinate information; in this way, the distance between each of the at least two pixel points and each of the M key points is obtained.
  • The embodiments of the present disclosure pre-set weights for the M key points.
  • Each key point can correspond to a different weight. For example, for key points in the eye area, the weights decrease with increasing distance from the center of the eyeball, that is, more attention is paid to key points in the eye area that are closer to the center of the eyeball.
  • Other areas can adopt the same or a similar weight-setting method with reference to the eye area; the weights of the M key points can then be denoted w1, w2, w3, ..., wM.
  • For a pixel point a among the at least two pixel points, if the distance between it and one of the M key points (for example, key point b) is less than a preset distance threshold, the weight of key point b is assigned to pixel point a; if the distance between them is greater than or equal to the preset distance threshold, pixel point a is assigned a weight of 0 for that key point.
  • The same weight can also be set for key points with the same category information; for example, the n key points in the eye area can all be given one weight, and the o key points in the nose area another weight.
  • In this way, each pixel point is assigned M weights, and these M weights are used as the reference weights of that pixel point.
  • Since each facial feature in the fourth face feature map corresponds to at least two pixel points in the color face image, the weight of each facial feature can be calculated from the weights of those pixel points obtained in step 214.
  • For example, the average of the weights of the at least two pixel points can be used as the weight of the corresponding facial feature in the fourth face feature map, or the mode of the weights of the at least two pixel points can be used instead. The matrix composed of the weight of each facial feature is then determined as the second attention matrix.
  • The features in the fourth face feature map are multiplied by the elements at the corresponding positions in the attention matrix to obtain the first weighted feature map, so the features in the first weighted feature map can better express the semantic information of the key areas of interest on the detection object's face.
  • In this way, the attention model is used to generate the first attention matrix, and the second attention matrix is constructed for the fourth face feature map based on key point detection and weight distribution. Since the attention coefficients generated by the attention model may miss a small amount of information about the key areas of interest, assigning weights to the features in the fourth face feature map based on the preset weights can compensate for these possible omissions, so that all key areas of the face receive attention. The resulting first weighted feature map can therefore fully express the semantic information of the key areas of interest, which in turn helps improve the accuracy of living body classification. A sketch of this key-point-based weighting is given below.
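  • The key-point-based weighting just described can be sketched as follows. This is a simplified NumPy illustration assuming the key points, their preset weights and a mapping from each feature position to its corresponding pixel coordinates are already available (the feature_to_pixels helper is hypothetical); it omits the model-generated first attention matrix, which would simply be added element-wise.

```python
import numpy as np

def second_attention_matrix(feat_hw, keypoints, kp_weights, feature_to_pixels, dist_thresh):
    """Build the key-point-based attention matrix for an (H, W) feature map."""
    H, W = feat_hw
    att = np.zeros((H, W), dtype=np.float32)
    for i in range(H):
        for j in range(W):
            pixels = feature_to_pixels(i, j)           # e.g. the 9 pixels behind this feature
            pixel_weights = []
            for (px, py) in pixels:
                # M reference weights: a key point's weight if it is close enough, else 0.
                dists = np.linalg.norm(keypoints - np.array([px, py]), axis=1)
                refs = np.where(dists < dist_thresh, kp_weights, 0.0)
                pixel_weights.append(refs.mean())      # average of the M reference weights
            att[i, j] = float(np.mean(pixel_weights))  # average (the mode could be used instead)
    return att

# The final attention matrix adds the model-generated first attention matrix:
# attention = first_attention_matrix + second_attention_matrix(...)
# first_weighted = fourth_feature_map * attention       # element-wise weighting (step B3)
```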
  • obtaining the living body detection result of the detection object based on the first facial feature map and the third facial feature map includes the following steps 221 to 226:
  • Specifically, an attention model is first used to generate the attention coefficient of each facial feature in the first face feature map, and the matrix composed of these attention coefficients is determined as the third attention matrix.
  • Key point detection is performed on the infrared face image to obtain N key points (such as 106 key points) in the preset areas of interest, together with the coordinate information and category information of the N key points.
  • For the position of each facial feature in the first face feature map, at least two pixel points corresponding to that position in the infrared face image are determined, and the coordinate information of the at least two pixel points is obtained. For each of the at least two pixel points, the coordinate information of the pixel point and the coordinate information of the N key points are used to calculate the distance between the pixel point and each of the N key points; based on these distances and the category information of each key point, a weight is assigned to the pixel point for each key point, yielding N reference weights for the pixel point.
  • The average of the N reference weights is determined as the weight of each pixel point, the average or mode of the weights of the at least two pixel points is determined as the weight of each facial feature in the first face feature map, and the matrix composed of the weight of each facial feature is determined as the fourth attention matrix.
  • The third attention matrix and the fourth attention matrix are added to obtain the attention matrix E.
  • Similarly, an attention model is first used to generate the attention coefficient of each facial feature in the third face feature map, and the matrix composed of these attention coefficients is determined as the fifth attention matrix.
  • Key point detection is performed on the color face image to obtain S key points (such as 106 key points) in the preset areas of interest, together with the coordinate information and category information of the S key points.
  • For the position of each facial feature in the third face feature map, at least two pixel points corresponding to that position in the color face image are determined, and the coordinate information of the at least two pixel points is obtained.
  • For each of the at least two pixel points, the coordinate information of the pixel point and the coordinate information of the S key points are used to calculate the distance between the pixel point and each of the S key points; based on these distances and the category information of each key point, a weight is assigned to the pixel point for each key point, yielding S reference weights for the pixel point, and the average of the S reference weights is determined as the weight of the pixel point.
  • The average or mode of the weights of the at least two pixel points is determined as the weight of each facial feature in the third face feature map, the matrix composed of the weight of each facial feature is determined as the sixth attention matrix, and the fifth attention matrix and the sixth attention matrix are added to obtain the attention matrix F.
  • In this embodiment, another feature splicing method is used: the attention model, key point detection and weight distribution are used to generate the attention matrix E for the first face feature map, and the first face feature map is multiplied by the attention matrix E to obtain a second weighted feature map that can fully express the semantic information of the key areas of interest.
  • Likewise, the attention model, key point detection and weight distribution are used to generate the attention matrix F for the third face feature map, and multiplying the third face feature map by the attention matrix F also yields a third weighted feature map that can fully express the semantic information of the key areas of interest.
  • Splicing the second weighted feature map and the third weighted feature map fully integrates the semantic information of the key areas of interest in the color face image with the semantic information of the key areas of interest in the infrared face image, which is also conducive to improving the accuracy of living body classification; a sketch of this weight-then-splice variant follows.
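  • For contrast with the earlier splice-then-weight order, the weight-then-splice flow can be sketched as follows, reusing second_attention_matrix from the sketch above; model_attention stands for the attention model's coefficient map and is a hypothetical helper, and the feature maps are assumed to be (H, W, C) NumPy arrays.

```python
import numpy as np

def weighted_then_spliced(f1, f3, ir_keypoints, color_keypoints, kp_w_ir, kp_w_color,
                          feature_to_pixels, dist_thresh):
    """Weight each feature map with its own attention matrix (E, F), then splice."""
    hw = f1.shape[:2]
    E = model_attention(f1) + second_attention_matrix(hw, ir_keypoints, kp_w_ir,
                                                      feature_to_pixels, dist_thresh)
    F = model_attention(f3) + second_attention_matrix(hw, color_keypoints, kp_w_color,
                                                      feature_to_pixels, dist_thresh)
    second_weighted = f1 * E[..., None]   # broadcast over the channel dimension
    third_weighted = f3 * F[..., None]
    return np.concatenate([second_weighted, third_weighted], axis=-1)  # classifier input
```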
  • In summary, the embodiment of the present disclosure acquires the infrared face image and the color face image of the detection object collected by the binocular camera, performs feature extraction on the infrared face image to obtain the first face feature map, and performs feature extraction on the color face image to obtain the second face feature map.
  • The second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and the second face feature map is converted into the third face feature map based on the category attribute information of the detection object.
  • In this way, embodiments of the present disclosure can use the corresponding third branch in the same living body detection model to perform inference for detection objects with different category attribute information.
  • Compared with adopting a different living body detection model for each category of attribute information, this living body detection solution helps save the memory overhead caused by storing at least two models and is more robust.
  • Each third branch in the model can be migrated from the parameters of an existing model, which enables efficient iteration during the training phase, while adding only a category attribute classifier after the second branch has a negligible impact on the inference speed of the entire model.
  • Figure 6 is a schematic flow chart of another living body detection method provided by an embodiment of the present disclosure. As shown in Figure 6, the method includes steps 601 to 608:
  • 601 Obtain the infrared face image and color face image of the detection object collected by the binocular camera;
  • 602 Perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map;
  • 603 Obtain the target category attribute information of the detection object according to the second face feature map;
  • 604 Obtain the third face feature map according to the target category attribute information and the second face feature map;
  • 605 Splice the first face feature map and the third face feature map to obtain the fourth face feature map;
  • 606 Obtain the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix;
  • 607 Multiply the features in the fourth face feature map by the elements at the corresponding positions of the attention matrix to obtain the first weighted feature map;
  • 608 Classify the first weighted feature map to obtain the living body detection result of the detection object.
  • The implementation of steps 601 to 608 has been described in the embodiments shown in FIGS. 2A to 5, and can achieve the same or similar beneficial effects; an end-to-end sketch of this flow is given below.
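  • The overall flow of steps 601 to 608 can be tied together as in the short sketch below; it reuses the hypothetical pieces from the earlier sketches (the model branches, attribute routing and attention construction) and is only an orchestration outline, not the patented implementation.

```python
import torch

def detect_liveness(ir_face, color_face, model, attention_fn):
    """Steps 601-608 stitched together; `model` is the earlier sketch, `attention_fn` is hypothetical."""
    f1 = model.first_branch(ir_face)                 # 602: first face feature map (infrared)
    f2 = model.second_branch(color_face)             # 602: second face feature map (color)
    attr = model.attr_classifier(f2).argmax(dim=1)   # 603: target category attribute information
    f3 = model.third_branches[int(attr[0])](f2)      # 604: third face feature map (routed branch)
    f4 = torch.cat([f1, f3], dim=1)                  # 605: fourth face feature map (splicing)
    attention = attention_fn(f4)                     # 606: attention matrix (model + key points),
                                                     #      assumed broadcastable to f4's shape
    weighted = f4 * attention                        # 607: first weighted feature map
    return model.liveness_classifier(weighted)       # 608: living body classification result
```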
  • Figure 7 is a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure. As shown in Figure 7, the device includes an acquisition unit 701 and a processing unit 702, wherein:
  • the acquisition unit 701 is configured to acquire the infrared face image and color face image of the detection object collected by the binocular camera;
  • the processing unit 702 is configured to perform feature extraction on the infrared face image to obtain a first face feature map, and perform feature extraction on the color face image to obtain a second face feature map;
  • the processing unit 702 is also configured to obtain the target category attribute information of the detection object based on the second facial feature map;
  • the processing unit 702 is also configured to obtain a third facial feature map based on the target category attribute information and the second facial feature map;
  • the processing unit 702 is also configured to obtain the living body detection result of the detection object based on the first facial feature map and the third facial feature map.
  • With the living body detection device shown in Figure 7, the infrared face image and the color face image of the detection object collected by the binocular camera are acquired; feature extraction is performed on the infrared face image to obtain a first face feature map, and feature extraction is performed on the color face image to obtain a second face feature map; the target category attribute information of the detection object is obtained according to the second face feature map; a third face feature map is obtained according to the target category attribute information and the second face feature map; and a living body detection result of the detection object is obtained according to the first face feature map and the third face feature map.
  • The second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and the second face feature map is converted into the third face feature map based on the category attribute information of the detection object.
  • the processing unit 702 is configured to:
  • the processing unit 702 is configured to:
  • the attention model is used to generate the attention coefficient of each facial feature in the fourth facial feature map, and the matrix composed of the attention coefficients is determined as the first attention matrix;
  • The M key points also include coordinate information and category information of each key point; when determining the second attention matrix based on the fourth face feature map and the M key points, the processing unit 702 is configured to:
  • for the position of each facial feature in the fourth face feature map, determine at least two pixel points corresponding to that position in the color face image, and obtain the coordinate information of the at least two pixel points;
  • for each pixel point among the at least two pixel points, use the coordinate information of the pixel point and the coordinate information of the M key points to calculate the distance between the pixel point and each of the M key points;
  • a weight is assigned to each pixel to obtain M reference weights for each pixel;
  • the weight of each facial feature is determined, and a matrix composed of the weight of each facial feature is determined as the second attention matrix.
  • When performing feature extraction on the infrared face image to obtain the first face feature map, the processing unit 702 is configured to: input the infrared face image into the first branch of the living body detection model for feature extraction to obtain the first face feature map;
  • when performing feature extraction on the color face image to obtain the second face feature map, the processing unit 702 is configured to:
  • the color face image is input into the second branch of the live body detection model for feature extraction to obtain the second face feature map.
  • the living body detection model further includes a category attribute classifier, at least two third branches and a living body detection classifier, wherein the second branch, the attribute classifier and the at least two third branches are connected in sequence, At least two third branches are independent of each other, and each of the at least two third branches corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output is used as the input of the living body detection classifier.
  • The features in the second face feature map include one or at least two kinds of semantic information among material, texture and gloss. When obtaining the target category attribute information of the detection object based on the second face feature map, the processing unit 702 is configured to: input the second face feature map into the attribute classifier, and classify and predict the one or at least two kinds of semantic information through the attribute classifier to obtain the target category attribute information, where the target category attribute information includes a location identifier.
  • When obtaining the third face feature map according to the target category attribute information and the second face feature map, the processing unit 702 is configured to: determine the third branch corresponding to the location identifier from the at least two third branches, and input the second face feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third face feature map.
  • Each unit in the living body detection device shown in FIG. 7 can be separately or entirely combined into one or several other units, or one (or some) of the units can be further split into at least two functionally smaller units; this can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present disclosure.
  • the above-mentioned units are divided based on logical functions.
  • the function of one unit can also be realized by at least two units, or the functions of at least two units can be realized by one unit.
  • the living body detection device may also include other units.
  • these functions may also be implemented with the assistance of other units, and may be implemented by at least two units in cooperation.
  • A computer program (including program code) capable of executing each step involved in the corresponding method shown in Figure 2A or Figure 6 can be run on a general-purpose computing device, such as a computer that includes processing elements and storage elements such as a central processing unit (Central Processing Unit, CPU), a random access storage medium (Random Access Memory, RAM) and a read-only storage medium (Read-Only Memory, ROM), to construct the living body detection device shown in Figure 7 and to implement the living body detection method of the embodiments of the present disclosure.
  • the computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above-mentioned computing device through the computer-readable recording medium, and run therein.
  • the electronic device includes a transceiver 801, a processor 802, and a memory 803. They are connected via bus 804.
  • the memory 803 is used to store computer programs and data, and can transmit the data stored in the memory 803 to the processor 802.
  • The processor 802 is used to read the computer program in the memory 803 to perform the following operations:
  • acquiring the infrared face image and the color face image of the detection object collected by the binocular camera; performing feature extraction on the infrared face image to obtain the first face feature map, and performing feature extraction on the color face image to obtain the second face feature map;
  • obtaining the target category attribute information of the detection object according to the second face feature map; obtaining the third face feature map according to the target category attribute information and the second face feature map;
  • obtaining the living body detection result of the detection object according to the first face feature map and the third face feature map.
  • The second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and the second face feature map is converted into the third face feature map based on the category attribute information of the detection object.
  • the processor 802 performs the following steps to obtain the living body detection result of the detection object based on the first facial feature map and the third facial feature map, including:
  • The processor 802 performs obtaining the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix, which includes:
  • the attention model is used to generate the attention coefficient of each facial feature in the fourth facial feature map, and the matrix composed of the attention coefficients is determined as the first attention matrix;
  • the M key points also include coordinate information and category information of each key point.
  • the processor 802 determines the second attention matrix based on the fourth facial feature map and the M key points, include:
  • for the position of each facial feature in the fourth face feature map, determine at least two pixel points corresponding to that position in the color face image, and obtain the coordinate information of the at least two pixel points;
  • for each pixel point among the at least two pixel points, use the coordinate information of the pixel point and the coordinate information of the M key points to calculate the distance between the pixel point and each of the M key points;
  • a weight is assigned to each pixel to obtain M reference weights for each pixel;
  • the weight of each facial feature is determined, and a matrix composed of the weight of each facial feature is determined as the second attention matrix.
  • The processor 802 performs feature extraction on the infrared face image to obtain the first face feature map, including: inputting the infrared face image into the first branch of the living body detection model for feature extraction to obtain the first face feature map;
  • the processor 802 performs feature extraction on the color face image to obtain a second face feature map, including:
  • the color face image is input into the second branch of the live body detection model for feature extraction to obtain the second face feature map.
  • the living body detection model further includes a category attribute classifier, at least two third branches and a living body detection classifier, wherein the second branch, the attribute classifier and the at least two third branches are connected in sequence, At least two third branches are independent of each other, and each of the at least two third branches corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output is used as the input of the living body detection classifier.
  • the features in the second face feature map include one or at least two semantic information of material, texture and gloss.
  • The processor 802 performs obtaining the target category attribute information of the detection object according to the second face feature map, which includes: inputting the second face feature map into the attribute classifier, and classifying and predicting the one or at least two kinds of semantic information through the attribute classifier to obtain the target category attribute information, where the target category attribute information includes a location identifier.
  • The processor 802 performs obtaining the third face feature map according to the target category attribute information and the second face feature map, which includes:
  • determining the third branch corresponding to the location identifier from the at least two third branches, and inputting the second face feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third face feature map.
  • the electronic device may include but is not limited to a transceiver 801, a processor 802, and a memory 803.
  • The schematic diagram is only an example of the electronic device and does not constitute a limitation on it; the electronic device may include more or fewer components than shown, a combination of certain components, or different components.
  • Since the processor 802 of the electronic device implements the steps of the living body detection method of the embodiments of the present disclosure when executing the computer program, the embodiments of the living body detection method are all applicable to the electronic device and can achieve the same or similar beneficial effects.
  • Embodiments of the present disclosure also provide a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement some or all of the steps of any living body detection method described in the above method embodiments.
  • the computer-readable storage medium may only store the computer program corresponding to the living body detection method.
  • a computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device, and may be a volatile storage medium or a non-volatile storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard drives, magnetic disks, optical disks, random access memory, read-only memory, erasable programmable read-only memory (EPROM) or flash memory, static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video disc (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punched cards or raised structures in grooves with instructions stored thereon, as well as any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through electrical wires.
  • Embodiments of the present disclosure also provide a computer program.
  • the computer program includes computer readable code.
  • When the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure also provide a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • The computer program is operable to cause the computer to perform some or all of the steps of any living body detection method described in the above method embodiments.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • at least two units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated.
  • The components shown as units may or may not be physical units, that is, they may be located in one place or distributed to at least two network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software program modules.
  • the integrated unit if implemented in the form of a software program module and sold or used as an independent product, may be stored in a computer-readable memory.
  • The technical solution of the present disclosure, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to cause a computer device (which can be a personal computer, a server or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a liveness detection method and apparatus, and an electronic device, a storage medium, a computer program and a computer program product. The method includes: acquiring an infrared face image and a color face image of a detection object collected by a binocular camera (201); performing feature extraction on the infrared face image to obtain a first face feature map, and performing feature extraction on the color face image to obtain a second face feature map (202); obtaining target category attribute information of the detection object according to the second face feature map (203); obtaining a third face feature map according to the target category attribute information and the second face feature map (204); and obtaining a liveness detection result of the detection object according to the first face feature map and the third face feature map (205).
PCT/CN2022/110261 2022-03-22 2022-08-04 Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product WO2023178906A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210283792.2 2022-03-22
CN202210283792.2A CN114677730A (zh) 2022-03-22 2022-03-22 活体检测方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023178906A1 (fr) 2023-09-28

Family

ID=82074801

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110261 WO2023178906A1 (fr) 2022-08-04 Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product

Country Status (2)

Country Link
CN (1) CN114677730A (fr)
WO (1) WO2023178906A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677730A (zh) * 2022-03-22 2022-06-28 北京市商汤科技开发有限公司 活体检测方法、装置、电子设备及存储介质
CN116363762A (zh) * 2022-12-23 2023-06-30 北京百度网讯科技有限公司 活体检测方法、深度学习模型的训练方法及装置
CN116453194B (zh) * 2023-04-21 2024-04-12 无锡车联天下信息技术有限公司 一种人脸属性判别方法及装置


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975908A (zh) * 2016-04-26 2016-09-28 汉柏科技有限公司 人脸识别方法及装置
WO2020134858A1 (fr) * 2018-12-29 2020-07-02 北京市商汤科技开发有限公司 Procédé et appareil de reconnaissance d'attributs de visage, dispositif électronique et support d'informations
CN112818722A (zh) * 2019-11-15 2021-05-18 上海大学 模块化动态可配置的活体人脸识别系统
CN111401134A (zh) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 活体检测方法、装置、电子设备及存储介质
CN113449623A (zh) * 2021-06-21 2021-09-28 浙江康旭科技有限公司 一种基于深度学习的轻型活体检测方法
CN114677730A (zh) * 2022-03-22 2022-06-28 北京市商汤科技开发有限公司 活体检测方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN114677730A (zh) 2022-06-28

Similar Documents

Publication Publication Date Title
WO2023178906A1 (fr) Procédé et appareil de détection de vivacité, et dispositif électronique, support de stockage, programme informatique et produit-programme informatique
CN111062871B (zh) 一种图像处理方法、装置、计算机设备及可读存储介质
EP3968179A1 (fr) Procédé et appareil de reconnaissance de lieu, procédé et appareil d'apprentissage de modèle pour la reconnaissance de lieu et dispositif électronique
WO2023098128A1 (fr) Procédé et appareil de détection de corps vivant, et procédé et appareil d'apprentissage pour système de détection de corps vivant
EP4137991A1 (fr) Procédé et dispositif de réidentification de piéton
CN110503076B (zh) 基于人工智能的视频分类方法、装置、设备和介质
WO2020024484A1 (fr) Procédé et dispositif de production de données
CN111950723A (zh) 神经网络模型训练方法、图像处理方法、装置及终端设备
WO2018196718A1 (fr) Procédé et dispositif de désambiguïsation d'image, support de stockage et dispositif électronique
CN112446322B (zh) 眼球特征检测方法、装置、设备及计算机可读存储介质
US20200285859A1 (en) Video summary generation method and apparatus, electronic device, and computer storage medium
CN111062328B (zh) 一种图像处理方法、装置及智能机器人
CN112188306B (zh) 一种标签生成方法、装置、设备及存储介质
WO2023173646A1 (fr) Procédé et appareil de reconnaissance d'expression
CN112036284B (zh) 图像处理方法、装置、设备及存储介质
WO2021127916A1 (fr) Procédé de reconnaissance d'émotion faciale, dispositif intelligent et support de stockage lisible par ordinateur
CN111553838A (zh) 模型参数的更新方法、装置、设备及存储介质
CN111274946B (zh) 一种人脸识别方法和系统及设备
CN116152938A (zh) 身份识别模型训练和电子资源转移方法、装置及设备
WO2021134485A1 (fr) Procédé et dispositif de notation de vidéo, support d'enregistrement et dispositif électronique
CN113723164A (zh) 获取边缘差异信息的方法、装置、设备及存储介质
CN112070744A (zh) 一种人脸识别的方法、系统、设备及可读存储介质
CN111259698A (zh) 用于获取图像的方法及装置
CN113486260B (zh) 互动信息的生成方法、装置、计算机设备及存储介质
CN114677620A (zh) 对焦方法、电子设备和计算机可读介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932958

Country of ref document: EP

Kind code of ref document: A1