WO2023178906A1 - Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product - Google Patents


Info

Publication number
WO2023178906A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
facial feature
face
living body
attention
Prior art date
Application number
PCT/CN2022/110261
Other languages
French (fr)
Chinese (zh)
Inventor
王柏润
刘建博
张帅
伊帅
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023178906A1 publication Critical patent/WO2023178906A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present disclosure relates to, but is not limited to, the field of computer vision technology, and in particular to a liveness detection method and device, electronic equipment, storage media, computer programs, and computer program products.
  • Computer vision is a hot topic of current research. It is a synthesis of image processing, artificial intelligence, pattern recognition and other technologies, and has been widely applied across many fields of society.
  • Applications of computer vision are inseparable from face recognition, and a key step in face recognition is liveness detection.
  • Common liveness algorithms can be divided into interactive and silent liveness algorithms according to the form of liveness detection, and into monocular, binocular and three-dimensional (3D) liveness algorithms according to the type of camera module.
  • Current liveness detection algorithms usually take the form of a single model, but in some scenarios the capacity of a single model is insufficient to reach the required liveness detection accuracy.
  • Embodiments of the present disclosure provide a liveness detection method and device, electronic equipment, storage media, computer programs, and computer program products, which help improve the accuracy of binocular liveness detection.
  • Embodiments of the present disclosure provide a liveness detection method, which includes: acquiring an infrared face image and a color face image of a detection object collected by a binocular camera; performing feature extraction on the infrared face image to obtain a first face feature map, and performing feature extraction on the color face image to obtain a second face feature map; obtaining target category attribute information of the detection object according to the second face feature map; obtaining a third face feature map according to the target category attribute information and the second face feature map; and obtaining a liveness detection result of the detection object according to the first face feature map and the third face feature map.
  • Embodiments of the present disclosure acquire the infrared face image and the color face image of the detection object collected by the binocular camera, perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map.
  • The second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and is converted into the third face feature map based on that information, so that feature extraction carries category attribute information; performing liveness detection with the category-aware features (the third face feature map) and the infrared face features (the first face feature map) helps improve the accuracy of binocular liveness detection.
  • Embodiments of the present disclosure provide a living body detection device, which includes an acquisition unit and a processing unit;
  • An acquisition unit configured to acquire infrared face images and color face images of the detection object collected by the binocular camera
  • the processing unit is configured to perform feature extraction on the infrared face image to obtain a first face feature map, and perform feature extraction on the color face image to obtain a second face feature map;
  • the processing unit is also configured to obtain target category attribute information of the detection object based on the second facial feature map;
  • the processing unit is also configured to obtain a third facial feature map based on the target category attribute information and the second facial feature map;
  • the processing unit is also configured to obtain a liveness detection result of the detection object based on the first facial feature map and the third facial feature map.
  • An embodiment of the present disclosure provides an electronic device.
  • The electronic device includes a processor connected to a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the liveness detection method.
  • Embodiments of the present disclosure provide a computer-readable storage medium that stores a computer program, and the computer program causes a computer to perform the liveness detection method.
  • Embodiments of the present disclosure provide a computer program that includes computer-readable code.
  • When the computer-readable code is read and executed by a computer, some or all steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program; the computer program is operable to cause a computer to perform the liveness detection method.
  • Figure 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure
  • Figure 2A is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure
  • Figure 2B is a schematic flowchart of a method for determining an attention matrix provided by an embodiment of the present disclosure
  • Figure 2C is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic network structure diagram of a living body detection model provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of selecting a third branch provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of multiple pixels corresponding to a certain feature provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic flow chart of another living body detection method provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure.
  • the application environment at least includes a binocular camera 101 and an electronic device 102.
  • The binocular camera 101 and the electronic device 102 are connected via a wired or wireless network.
  • the binocular camera 101 includes a visible light camera module 1011 and an infrared camera module 1012.
  • The visible light camera module 1011 and the infrared camera module 1012 are used to synchronously capture images of the detection object when it enters the image collection range, obtaining a color image and an infrared image respectively, which are stored in the face recognition system or sent directly to the electronic device 102.
  • The electronic device 102 receives the color image and the infrared image, or matches them from the system, then performs face detection on them and crops a color face image from the color image and an infrared face image from the infrared image based on the position information of the face detection box.
  • The electronic device 102 then calls a liveness detection model that supports multi-category attribute information to perform liveness detection on the color face image and the infrared face image. Because the liveness detection model extracts features with the branch corresponding to each category of attribute information, the extracted liveness features carry information unique to that category, which improves the accuracy of liveness classification and thereby the accuracy of liveness detection.
  • The electronic device 102 may be an independent physical server or a server cluster, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, big data and artificial intelligence platforms.
  • the living body detection method can be implemented by the processor calling computer-readable instructions stored in the memory.
  • Figure 2A is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure. The method can be implemented based on the application environment shown in Figure 1 and applied to electronic devices. As shown in Figure 2A, the method includes Steps 201 to 205:
  • The electronic device can obtain, in real time, the infrared image and color image of the detection object synchronously collected by the binocular camera, or obtain them from the face recognition system; no limitation is imposed here. For example, when the electronic device acquires an infrared image and a color image, it crops the infrared face image and the color face image from the two images based on the detection box generated by a face detection algorithm.
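  • For illustration, a minimal sketch of this cropping step follows, assuming OpenCV-style image arrays and a hypothetical `detect_face` helper that returns one (x, y, w, h) detection box per image:

```python
import numpy as np

def crop_face(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the face region from an image given a detection box (x, y, w, h)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

# Hypothetical usage with any face detector returning a bounding box:
# ir_face = crop_face(infrared_image, detect_face(infrared_image))
# rgb_face = crop_face(color_image, detect_face(color_image))
```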
  • In some embodiments, obtaining the infrared face image and the color face image of the detection object collected by the binocular camera includes steps A1 to A5:
  • A1: From at least two color images of the detection object stored in the face recognition system, select the one with the highest face quality as the target color image, where the at least two color images are obtained by the visible light camera module of the binocular camera continuously capturing the detection object.
  • The electronic device extracts features from the faces in the at least two color images through a pre-trained face quality detection model, obtaining features containing face size, angle and sharpness information, then performs classification prediction on those features to obtain a face quality detection score for each color image, and selects the one with the highest score as the target color image.
  • A2: Perform face quality detection on the faces in at least two infrared images stored in the face recognition system, obtain the face quality detection score of each of the infrared images, and compute the difference between each score and the face quality detection score of the target color image.
  • Likewise, the electronic device extracts features containing face size, angle and sharpness information through the face quality detection model, and classifies these features to obtain the face quality detection score of each infrared image.
  • A3: Among the at least two infrared images, take the one whose face quality detection score differs least from that of the target color image as the candidate infrared image.
  • A4: Detect facial key points on the target color image and the candidate infrared image respectively, obtaining 106 first key points covering the eye, cheekbone, nose, ear, chin and cheek areas in the target color image, and 106 second key points covering the same areas in the candidate infrared image.
  • A5: Compute the similarity between the 106 first key points and the 106 second key points. If the similarity measure is below a preset threshold, the target color image and the candidate infrared image are determined to be an image pair collected from the detection object by the binocular camera at the same moment; based on the detection boxes used during key point detection, the face regions are then cropped from the target color image and the candidate infrared image respectively, yielding the infrared face image and the color face image of the detection object.
  • When the electronic device needs to obtain images from the face recognition system, it may not know which color image and which infrared image were collected from the detection object at the same moment. For any detection object, it therefore first selects the color image with the highest face quality, then selects, among all infrared images in the face recognition system, the one whose face quality detection score is closest to that of the target color image as the candidate infrared image, and finally matches the 106 facial key points of the two images. If the key point similarity measure is below the preset threshold, the candidate infrared image and the target color image are considered to have been captured at the same moment. This improves the accuracy of image matching in scenarios where the electronic device must obtain images from the face recognition system.
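  • The following sketch illustrates steps A1 to A5 under stated assumptions: `quality_score` and `keypoints` are hypothetical callables standing in for the face quality detection model and the 106-point key point detector, and the key point "similarity" is treated as a mean point-to-point distance (smaller means more similar):

```python
import numpy as np

def pair_images(color_images, infrared_images, quality_score, keypoints,
                similarity_threshold: float):
    """Pair the best color image with the infrared image most likely
    captured at the same moment (steps A1 to A5)."""
    # A1: pick the color image with the highest face quality score.
    color_scores = np.array([quality_score(img) for img in color_images])
    target_color = color_images[int(color_scores.argmax())]
    target_score = color_scores.max()

    # A2/A3: pick the infrared image whose quality score is closest.
    ir_scores = np.array([quality_score(img) for img in infrared_images])
    candidate_ir = infrared_images[int(np.abs(ir_scores - target_score).argmin())]

    # A4/A5: compare the two sets of 106 facial key points; a small mean
    # distance suggests both images show the same capture instant.
    kp_color = keypoints(target_color)  # shape (106, 2)
    kp_ir = keypoints(candidate_ir)     # shape (106, 2)
    if np.linalg.norm(kp_color - kp_ir, axis=1).mean() < similarity_threshold:
        return target_color, candidate_ir
    return None  # no synchronized pair found
```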
  • As shown in Figure 3, the liveness detection model includes a first branch (303), a second branch (304), a category attribute classifier (305), at least two third branches (306) and a liveness detection classifier (307). The first branch (303) is used to extract features from the input infrared face image (301) to obtain the first face feature map, and the second branch (304) is used to extract features from the input color face image (302) to obtain the second face feature map; both feature maps cover the important areas of the face.
  • the semantic information can be one or at least two of material, texture and gloss.
  • Both the first branch and the second branch can use at least two Inception structures in series for feature extraction.
  • The Inception structure applies convolution kernels of different sizes, which means receptive fields of different sizes, achieving a fusion of features at different scales; the first face feature map and the second face feature map therefore carry richer semantic information.
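  • As a hedged illustration of such a structure, a minimal Inception-style block with parallel convolutions of different kernel sizes is sketched below (PyTorch assumed; channel counts are illustrative, not taken from the disclosure):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, i.e. different
    receptive fields; their outputs are concatenated along the channel
    axis to fuse features at different scales."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
```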
  • The liveness detection model further includes a category attribute classifier, at least two third branches and a liveness detection classifier. The second branch, the attribute classifier and the at least two third branches are connected in sequence; the third branches are independent of each other, and each of them corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output serves as the input of the liveness detection classifier.
  • In this way, for detection objects with different category attribute information, inference can be performed with the corresponding third branch within the same liveness detection model. Compared with schemes that adopt a different liveness detection model for each category of attribute information, this liveness detection solution saves the memory overhead of storing at least two models and is more robust.
  • Each third branch in the model can be initialized by migrating the parameters of an existing model, which enables efficient iteration during the training phase, while adding only a category attribute classifier after the second branch has a negligible impact on the inference speed of the entire model.
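  • A minimal sketch of the branch layout described above follows (PyTorch assumed; all names, channel sizes and the batch-size-1 routing are illustrative assumptions, not the disclosed implementation):

```python
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    """Two feature branches, a category attribute classifier, one third
    branch per category, and a liveness classifier over the spliced
    features. `backbone_fn` builds a stack of Inception-style blocks."""
    def __init__(self, backbone_fn, num_categories: int, feat_ch: int):
        super().__init__()
        self.first_branch = backbone_fn()    # infrared face image
        self.second_branch = backbone_fn()   # color face image
        self.attr_classifier = nn.Linear(feat_ch, num_categories)
        self.third_branches = nn.ModuleList(
            [backbone_fn() for _ in range(num_categories)])
        self.liveness_classifier = nn.Linear(2 * feat_ch, 2)

    def forward(self, ir_face: torch.Tensor, color_face: torch.Tensor):
        f1 = self.first_branch(ir_face)        # first face feature map
        f2 = self.second_branch(color_face)    # second face feature map
        attr = self.attr_classifier(f2.mean(dim=(2, 3))).argmax(dim=1)
        # Route through the third branch matching the predicted category
        # attribute (batch size 1 assumed for brevity).
        f3 = self.third_branches[int(attr[0])](f2)  # third face feature map
        fused = torch.cat([f1, f3], dim=1)          # spliced feature map
        return self.liveness_classifier(fused.mean(dim=(2, 3)))
```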
  • In some embodiments, performing feature extraction on the infrared face image to obtain the first face feature map includes: inputting the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first face feature map.
  • Performing feature extraction on the color face image to obtain the second face feature map includes: inputting the color face image into the second branch of the liveness detection model for feature extraction to obtain the second face feature map.
  • In this way, different neural network branches extract features from the infrared face image and the color face image respectively. Since the first branch is trained with supervision from infrared face sample images and the second branch is trained with supervision from color face sample images, inputting each image into its corresponding branch helps extract features with richer semantic information.
  • the second facial feature map is input into an attribute classifier, so that one or at least two types of semantic information are classified and predicted by the attribute classifier to obtain target category attribute information.
  • the target category attribute information can be gender, age group, location identification (such as belonging to the first region), etc.
  • The attribute classifier uses category attribute information as supervision during training; it can therefore predict the target category attribute information of the detection object from the second face feature map, which contains rich semantic information, such as which region the detection object is from or which age group it belongs to.
  • In some embodiments, the features in the second face feature map include one or at least two kinds of semantic information among material, texture and gloss.
  • Obtaining the target category attribute information of the detection object includes: inputting the second face feature map into the attribute classifier, which classifies and predicts the one or more kinds of semantic information to obtain the target category attribute information, where the target category attribute information includes a location identifier. Obtaining the third face feature map according to the target category attribute information and the second face feature map includes: determining, from the at least two third branches, the third branch corresponding to the location identifier, and inputting the second face feature map into that branch for feature extraction to obtain the third face feature map.
  • In this way, the attribute classifier in the liveness detection model classifies the second face feature map to obtain the target category attribute information of the detection object (such as a location identifier); the third branch corresponding to that attribute information is then determined from the at least two third branches and used to extract features from the second face feature map, so that the third face feature map carries features unique to that category of attribute information, which relatively improves the accuracy of liveness detection.
  • After the electronic device obtains the target category attribute information of the detection object, it can determine the corresponding third branch from the at least two third branches. As shown in Figure 4, if the location identifier of the detection object is the first identifier (401), the first identifier branch (402) can be determined from branches such as the first identifier branch (402), the second identifier branch (403) and the third identifier branch (404), and the second face feature map is input into the first identifier branch (402) for feature extraction to obtain the third face feature map (405). Optionally, the at least two third branches can also use at least two Inception structures in series for feature extraction. Each third branch uses its unique category attribute information as supervision during training, so the third face feature map carries features unique to that category of attribute information, which relatively improves the accuracy of liveness detection.
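  • A hedged sketch of this routing step (names are illustrative; `branches` maps each location identifier to its trained third branch):

```python
import numpy as np

def select_third_branch(branches: dict, attr_logits: np.ndarray):
    """Pick the third branch whose category attribute (e.g. the location
    identifier predicted by the attribute classifier) matches."""
    location_id = int(np.argmax(attr_logits))  # e.g. 0 = first identifier
    return branches[location_id]

# third_feature_map = select_third_branch(branches, attr_logits)(second_feature_map)
```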
  • In some embodiments, obtaining the liveness detection result of the detection object includes:
  • B1: Splice the first face feature map and the third face feature map to obtain the fourth face feature map;
  • B2: Obtain the degree of attention of each facial feature in the fourth face feature map to obtain an attention matrix;
  • B3: Multiply the features in the fourth face feature map by the elements at corresponding positions of the attention matrix to obtain the first weighted feature map;
  • B4: Classify the first weighted feature map to obtain the liveness detection result of the detection object.
  • In some embodiments, an attention model may first be used to generate an attention coefficient for each facial feature in the fourth face feature map, and the matrix composed of these attention coefficients is determined as the first attention matrix.
  • The attention model can be any existing attention model. It predicts which targets the human eye would pay more attention to when viewing an image, that is, it computes attention coefficients from the features of the image.
  • The electronic device performs key point detection on the color face image, obtaining M key points of the preset areas of interest together with the coordinate information and category information of the M key points.
  • The preset areas of interest refer to the eye, cheekbone, nose, ear, chin and cheek areas.
  • Here, the M key points may be the 106 key points of step A4; M is an integer greater than 1.
  • The electronic device then computes the second attention matrix based on the fourth face feature map and the M key points, and adds the elements at corresponding positions of the first attention matrix and the second attention matrix to obtain the attention matrix.
  • In some embodiments, determining the second attention matrix based on the fourth face feature map and the M key points may include the following steps 211 to 215:
  • Step 211: For the location of each facial feature in the fourth face feature map, determine the at least two pixels in the color face image corresponding to that location, and obtain the coordinate information of those pixels.
  • Since the fourth face feature map is obtained by splicing the first and third face feature maps, which are produced by convolving and pooling the original images through Inception-based deep learning structures, each feature position corresponds to multiple pixels of the original image.
  • As shown in Figure 5, the feature at a given position is computed from the features of at least two pixels (502), such as the 9 pixels (502) in the black rectangular box.
  • The coordinate information of the at least two pixels (502) in the color face image (501) can thus be obtained; for example, the coordinate information of a certain pixel a is (x4, y1).
  • Step 212: For a key point b with coordinates (xb, yb), the distance between pixel a and key point b is the Euclidean distance d(a, b) = √((x4 − xb)² + (y1 − yb)²). In this way, the distance between each of the at least two pixels and each of the M key points can be computed.
  • Embodiments of the present disclosure preset weights for the M key points.
  • Each key point can correspond to a different weight. For example, the weights of the key points in the eye area decrease with distance from the center of the eyeball, that is, key points in the eye area that are closer to the center of the eyeball receive more attention.
  • Other areas can adopt the same or a similar weighting scheme as the eye area, so the weights of the M key points may be ω1, ω2, ω3, …, ωM.
  • Step 213: For a pixel a among the at least two pixels, if its distance to one of the M key points (for example, key point b) is less than the preset distance threshold, the weight of key point b is assigned to pixel a; if the distance is greater than or equal to the preset distance threshold, a weight of 0 is assigned to pixel a.
  • The same weight can also be set for key points of the same category; for example, the weights of the n key points in the eye area can all be set to ω1, and the weights of the o key points in the nose area can all be set to ω2.
  • In this way, each pixel is assigned M weights, and these M weights serve as the reference weights of that pixel.
  • Step 214: The average of the M reference weights of each pixel is determined as the weight of that pixel.
  • Step 215: Since each facial feature in the fourth face feature map corresponds to at least two pixels in the color face image, the weight of each facial feature can be computed from the pixel weights obtained in step 214: the average of the weights of the at least two pixels can be used as the weight of the corresponding facial feature in the fourth face feature map, or alternatively their mode can be used. The matrix composed of the weights of all facial features is determined as the second attention matrix.
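  • A sketch of steps 211 to 215 follows, with illustrative names and shapes: `pixel_coords[i]` lists the color-image pixels contributing to feature position i (e.g. the 9 pixels of Figure 5), `keypoints` is the (M, 2) array of key point coordinates, and `keypoint_weights` holds the preset weights ω1..ωM:

```python
import numpy as np

def second_attention_matrix(feature_hw, pixel_coords, keypoints,
                            keypoint_weights, dist_threshold):
    """Build the key-point-based attention matrix for a feature map of
    spatial size `feature_hw` (steps 211 to 215)."""
    h, w = feature_hw
    attn = np.zeros(h * w)
    for i, pixels in enumerate(pixel_coords):
        pixel_weights = []
        for p in pixels:
            # Distance from this pixel to every key point.
            d = np.linalg.norm(keypoints - np.asarray(p, dtype=float), axis=1)
            # M reference weights: the key point's weight when the pixel
            # is close enough, 0 otherwise; average them per pixel.
            ref = np.where(d < dist_threshold, keypoint_weights, 0.0)
            pixel_weights.append(ref.mean())
        # Feature weight: average over contributing pixels (the mode
        # could be used instead, as noted above).
        attn[i] = float(np.mean(pixel_weights))
    return attn.reshape(h, w)
```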
  • The features in the fourth face feature map are multiplied by the elements at the corresponding positions of the attention matrix to obtain the first weighted feature map.
  • The features in the first weighted feature map can better express the semantic information of the focus areas of the detection object's face.
  • In this way, the attention model generates the first attention matrix, and the second attention matrix is then constructed for the fourth face feature map through key point detection and weight assignment. Since the attention model may miss a small amount of information about the key areas of interest when generating attention coefficients, assigning preset weights to the features in the fourth face feature map compensates for these possible omissions, so that all focus areas of the face receive attention. The resulting first weighted feature map can therefore fully express the semantic information of the key areas of interest, which in turn helps improve the accuracy of liveness classification.
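  • Putting the above together, a minimal end-to-end sketch of steps B1 to B3 under illustrative assumptions (`attn_model` is a hypothetical callable returning the first attention matrix; arrays are (C, H, W) feature maps and (H, W) attention matrices):

```python
import numpy as np

def first_weighted_feature_map(f1, f3, attn_model, second_attn):
    """B1: splice the first and third face feature maps; B2: add the
    model-generated and key-point-based attention matrices; B3: weight
    the spliced map element-wise (B4, classification, follows)."""
    f4 = np.concatenate([f1, f3], axis=0)   # fourth face feature map
    attn = attn_model(f4) + second_attn     # combined attention matrix
    return f4 * attn                        # broadcasts (H, W) over channels
```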
  • In some embodiments, obtaining the liveness detection result of the detection object based on the first face feature map and the third face feature map includes the following steps 221 to 226:
  • First, an attention model is used to generate the attention coefficient of each facial feature in the first face feature map, and the matrix composed of these attention coefficients is determined as the third attention matrix. Key point detection is performed on the infrared face image, likewise obtaining N key points (such as 106 key points) of the preset areas of interest together with the coordinate information and category information of the N key points.
  • For the location of each facial feature in the first face feature map, the at least two pixels corresponding to that location in the infrared face image are determined and their coordinate information obtained. For each of these pixels, its coordinate information and the coordinate information of the N key points are used to compute the distance between the pixel and each of the N key points; based on these distances and the category information of each key point, a weight is assigned to each pixel, yielding N reference weights per pixel.
  • The average of the reference weights is determined as the weight of each pixel, the average or mode of the weights of the at least two pixels is determined as the weight of each facial feature in the first face feature map, and the matrix composed of the weights of the facial features is determined as the fourth attention matrix.
  • The third attention matrix and the fourth attention matrix are added to obtain the attention matrix E.
  • Similarly, an attention model is first used to generate the attention coefficient of each facial feature in the third face feature map, and the matrix composed of these attention coefficients is determined as the fifth attention matrix. Key point detection is performed on the color face image, likewise obtaining S key points (such as 106 key points) of the preset areas of interest together with the coordinate information and category information of the S key points.
  • For the location of each facial feature in the third face feature map, the at least two pixels corresponding to that location in the color face image are determined and their coordinate information obtained. For each of these pixels, its coordinate information and the coordinate information of the S key points are used to compute the distance between the pixel and each of the S key points; based on these distances and the category information of each key point, a weight is assigned to each pixel, yielding S reference weights per pixel, and the average of the S reference weights is determined as the weight of each pixel.
  • The average or mode of the weights of the at least two pixels is determined as the weight of each facial feature in the third face feature map, the matrix composed of the weights of the facial features is determined as the sixth attention matrix, and the fifth attention matrix and the sixth attention matrix are added to obtain the attention matrix F.
  • In these embodiments, another feature splicing order is used: the attention model, key point detection and weight assignment generate the attention matrix E for the first face feature map, and multiplying the first face feature map by the attention matrix E yields the second weighted feature map, which can fully express the semantic information of the key areas of interest; likewise, the attention model, key point detection and weight assignment generate the attention matrix F for the third face feature map, and multiplying the third face feature map by the attention matrix F yields the third weighted feature map, which can also fully express the semantic information of the key areas of interest.
  • Splicing the second weighted feature map and the third weighted feature map fully fuses the semantic information of the key areas of interest in the color face image with the semantic information of the key areas of interest in the infrared face image, which also helps improve the accuracy of liveness classification.
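  • A hedged sketch of this alternative splicing order (illustrative shapes as before; `attn_e` and `attn_f` are the attention matrices E and F):

```python
import numpy as np

def weighted_splice(f1, f3, attn_e, attn_f):
    """Weight the first and third face feature maps by their own
    attention matrices E and F, then splice the two weighted maps."""
    second_weighted = f1 * attn_e   # second weighted feature map
    third_weighted = f3 * attn_f    # third weighted feature map
    return np.concatenate([second_weighted, third_weighted], axis=0)
```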
  • In summary, the embodiments of the present disclosure acquire the infrared face image and the color face image of the detection object collected by the binocular camera, perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map.
  • The second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and is converted into the third face feature map based on that information, so that feature extraction carries category attribute information; performing liveness detection with the category-aware features and the infrared face features helps improve the accuracy of binocular liveness detection.
  • In addition, the embodiments of the present disclosure can perform inference with the corresponding third branch in the same liveness detection model.
  • This liveness detection solution saves the memory overhead of storing at least two models and is more robust.
  • Each third branch in the model can be initialized by migrating the parameters of an existing model, which enables efficient iteration in the training phase, while adding only a category attribute classifier after the second branch has a negligible impact on the inference speed of the entire model.
  • Figure 6 is a schematic flow chart of another living body detection method provided by an embodiment of the present disclosure. As shown in Figure 6, the method includes steps 601 to 608:
  • 601: Obtain the infrared face image and color face image of the detection object collected by the binocular camera;
  • 602: Perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map;
  • 603: Obtain the target category attribute information of the detection object according to the second face feature map;
  • 604: Obtain the third face feature map according to the target category attribute information and the second face feature map;
  • 605: Splice the first face feature map and the third face feature map to obtain the fourth face feature map;
  • 606: Obtain the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix;
  • 607: Multiply the features in the fourth face feature map by the elements at corresponding positions of the attention matrix to obtain the first weighted feature map;
  • 608: Classify the first weighted feature map to obtain the liveness detection result of the detection object.
  • The implementation of steps 601 to 608 has been described in the embodiments shown in Figures 2A to 5, and can achieve the same or similar beneficial effects.
  • Figure 7 is a schematic structural diagram of a liveness detection device provided by an embodiment of the present disclosure. As shown in Figure 7, the device includes an acquisition unit 701 and a processing unit 702, wherein:
  • the acquisition unit 701 is configured to acquire the infrared face image and color face image of the detection object collected by the binocular camera;
  • the processing unit 702 is configured to perform feature extraction on the infrared face image to obtain a first face feature map, and perform feature extraction on the color face image to obtain a second face feature map;
  • the processing unit 702 is also configured to obtain the target category attribute information of the detection object based on the second facial feature map;
  • the processing unit 702 is also configured to obtain a third facial feature map based on the target category attribute information and the second facial feature map;
  • the processing unit 702 is also configured to obtain the living body detection result of the detection object based on the first facial feature map and the third facial feature map.
  • The liveness detection device shown in Figure 7 acquires the infrared face image and the color face image of the detection object collected by the binocular camera; performs feature extraction on the infrared face image to obtain the first face feature map, and performs feature extraction on the color face image to obtain the second face feature map; obtains the target category attribute information of the detection object according to the second face feature map; obtains the third face feature map according to the target category attribute information and the second face feature map; and obtains the liveness detection result of the detection object according to the first face feature map and the third face feature map.
  • In this way, the second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and is converted into the third face feature map based on that information, so that feature extraction carries category attribute information; performing liveness detection with the category-aware features and the infrared face features helps improve the accuracy of binocular liveness detection.
  • In some embodiments, when obtaining the liveness detection result of the detection object, the processing unit 702 is configured to: splice the first face feature map and the third face feature map to obtain the fourth face feature map; obtain the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix; multiply the features in the fourth face feature map by the elements at corresponding positions of the attention matrix to obtain the first weighted feature map; and classify the first weighted feature map to obtain the liveness detection result of the detection object.
  • When obtaining the degree of attention of each facial feature in the fourth face feature map, the processing unit 702 is configured as follows:
  • an attention model is used to generate the attention coefficient of each facial feature in the fourth face feature map, and the matrix composed of the attention coefficients is determined as the first attention matrix;
  • the M key points include the coordinate information and category information of each key point, and the second attention matrix is then determined based on the fourth face feature map and the M key points; for this, the processing unit 702 is configured as follows:
  • for the location of each facial feature in the fourth face feature map, determine the at least two pixels corresponding to that location in the color face image, and obtain the coordinate information of the at least two pixels;
  • for each pixel among the at least two pixels, use the coordinate information of that pixel and the coordinate information of the M key points to compute the distance between the pixel and each of the M key points;
  • based on those distances and the category information of each key point, assign a weight to each pixel to obtain M reference weights for each pixel;
  • determine the weight of each facial feature, and determine the matrix composed of the weights of the facial features as the second attention matrix.
  • When performing feature extraction on the infrared face image to obtain the first face feature map, the processing unit 702 is configured to input the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first face feature map;
  • when performing feature extraction on the color face image to obtain the second face feature map, the processing unit 702 is configured to:
  • input the color face image into the second branch of the liveness detection model for feature extraction to obtain the second face feature map.
  • The liveness detection model further includes a category attribute classifier, at least two third branches and a liveness detection classifier. The second branch, the attribute classifier and the at least two third branches are connected in sequence; the third branches are independent of each other, and each of them corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output serves as the input of the liveness detection classifier.
  • The features in the second face feature map include one or at least two kinds of semantic information among material, texture and gloss, and the target category attribute information of the detection object is obtained based on the second face feature map.
  • When obtaining the target category attribute information, the processing unit 702 is configured to input the second face feature map into the attribute classifier, which classifies and predicts the one or more kinds of semantic information to obtain the target category attribute information, where the target category attribute information includes a location identifier;
  • when obtaining the third face feature map, the processing unit 702 is configured as follows:
  • a third branch corresponding to the location identifier is determined from the at least two third branches, and the second face feature map is input into the third branch corresponding to the location identifier for feature extraction to obtain the third face feature map.
  • Each unit in the liveness detection device shown in Figure 7 can be separately or entirely combined into one or several other units, or one or more of the units can be further split into at least two functionally smaller units; this can achieve the same operation without affecting the technical effects of the embodiments of the present disclosure.
  • the above-mentioned units are divided based on logical functions.
  • the function of one unit can also be realized by at least two units, or the functions of at least two units can be realized by one unit.
  • the living body detection device may also include other units.
  • these functions may also be implemented with the assistance of other units, and may be implemented by at least two units in cooperation.
  • In other embodiments, a computer program (including program code) capable of executing each step of the corresponding method shown in Figure 2A or Figure 6 can be run on a general-purpose computing device, such as a computer including a central processing unit (CPU), a random access memory (RAM), a read-only memory (ROM) and other processing and storage elements, to construct the liveness detection device shown in Figure 7 and implement the liveness detection method of the embodiments of the present disclosure.
  • The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
  • As shown in Figure 8, the electronic device includes a transceiver 801, a processor 802 and a memory 803, which are connected via a bus 804.
  • the memory 803 is used to store computer programs and data, and can transmit the data stored in the memory 803 to the processor 802.
  • the processor 802 is used to read the computer program in the memory 803 to perform the following operations:
  • acquire the infrared face image and color face image of the detection object collected by the binocular camera; perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map; obtain the target category attribute information of the detection object according to the second face feature map; obtain the third face feature map according to the target category attribute information and the second face feature map; and obtain the liveness detection result of the detection object according to the first face feature map and the third face feature map.
  • In this way, the second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and is converted into the third face feature map based on that information, so that feature extraction carries category attribute information; performing liveness detection with the category-aware features and the infrared face features helps improve the accuracy of binocular liveness detection.
  • The processor 802 obtains the liveness detection result of the detection object based on the first face feature map and the third face feature map by performing the following steps: splicing the first face feature map and the third face feature map to obtain the fourth face feature map; obtaining the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix; multiplying the features in the fourth face feature map by the elements at corresponding positions of the attention matrix to obtain the first weighted feature map; and classifying the first weighted feature map to obtain the liveness detection result of the detection object.
  • In obtaining the degree of attention of each facial feature in the fourth face feature map to obtain the attention matrix, the processor 802 performs the following:
  • an attention model is used to generate the attention coefficient of each facial feature in the fourth face feature map, and the matrix composed of the attention coefficients is determined as the first attention matrix;
  • the M key points include the coordinate information and category information of each key point, and the processor 802 determines the second attention matrix based on the fourth face feature map and the M key points as follows:
  • for the location of each facial feature in the fourth face feature map, determine the at least two pixels corresponding to that location in the color face image, and obtain the coordinate information of the at least two pixels;
  • for each pixel among the at least two pixels, use the coordinate information of that pixel and the coordinate information of the M key points to compute the distance between the pixel and each of the M key points;
  • based on those distances and the category information of each key point, assign a weight to each pixel to obtain M reference weights for each pixel;
  • determine the weight of each facial feature, and determine the matrix composed of the weights of the facial features as the second attention matrix.
  • The processor 802 performs feature extraction on the infrared face image to obtain the first face feature map, including: inputting the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first face feature map;
  • the processor 802 performs feature extraction on the color face image to obtain a second face feature map, including:
  • the color face image is input into the second branch of the live body detection model for feature extraction to obtain the second face feature map.
  • The liveness detection model further includes a category attribute classifier, at least two third branches and a liveness detection classifier. The second branch, the attribute classifier and the at least two third branches are connected in sequence; the third branches are independent of each other, and each of them corresponds to different category attribute information.
  • The output of each third branch is spliced with the output of the first branch, and the spliced output serves as the input of the liveness detection classifier.
  • the features in the second face feature map include one or at least two semantic information of material, texture and gloss.
  • The processor 802 obtains the target category attribute information of the detection object according to the second face feature map, including:
  • inputting the second face feature map into the attribute classifier, which classifies and predicts the one or more kinds of semantic information to obtain the target category attribute information, where the target category attribute information includes a location identifier;
  • and obtains the third face feature map, including:
  • determining a third branch corresponding to the location identifier from the at least two third branches, and inputting the second face feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third face feature map.
  • the electronic device may include but is not limited to a transceiver 801, a processor 802, and a memory 803.
  • The schematic diagram is only an example of an electronic device and does not constitute a limitation on the electronic device; the device may include more or fewer parts than shown, combine certain parts, or use different parts.
  • Since the processor 802 of the electronic device implements the steps of the liveness detection method of the embodiments of the present disclosure when executing the computer program, the embodiments of the liveness detection method are all applicable to the electronic device and can achieve the same or similar beneficial effects.
  • Embodiments of the present disclosure also provide a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement some or all of the steps of any liveness detection method described in the above method embodiments.
  • the computer-readable storage medium may only store the computer program corresponding to the living body detection method.
  • a computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device, and may be a volatile storage medium or a non-volatile storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard drives, magnetic disks, optical disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punched cards or raised structures in grooves with instructions stored thereon, as well as any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through electrical wires.
  • Embodiments of the present disclosure also provide a computer program.
  • The computer program includes computer-readable code.
  • When the computer-readable code is read and executed by a computer, some or all steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure also provide a computer program product.
  • The computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • The computer program is operable to cause the computer to perform some or all of the steps of any liveness detection method described in the above method embodiments.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • at least two units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated.
  • The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across at least two network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software program modules.
  • the integrated unit if implemented in the form of a software program module and sold or used as an independent product, may be stored in a computer-readable memory.
  • the technical solution of the present disclosure is essentially or contributes to the existing technology, or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, It includes several instructions to cause a computer device (which can be a personal computer, a server or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present disclosure.

Abstract

A liveness detection method and apparatus, and an electronic device, a storage medium, a computer program and a computer program product. The method comprises: acquiring an infrared facial image and a color facial image, that are collected by a binocular camera, of an object subjected to detection (201); performing feature extraction on the infrared facial image so as to obtain a first facial feature map, and performing feature extraction on the color facial image so as to obtain a second facial feature map (202); obtaining target category attribute information of said object on the basis of the second facial feature map (203); obtaining a third facial feature map on the basis of the target category attribute information and the second facial feature map (204); and obtaining a liveness detection result of said object on the basis of the first facial feature map and the third facial feature map (205).

Description

活体检测方法及装置、电子设备、存储介质、计算机程序、计算机程序产品Living body detection methods and devices, electronic equipment, storage media, computer programs, computer program products
相关申请的交叉应用Cross-application of related applications
本公开实施例基于申请号为202210283792.2、申请日为2022年03月22日、申请名称为“活体检测方法、装置、电子设备及存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。This disclosed embodiment is based on a Chinese patent application with application number 202210283792.2, application date is March 22, 2022, and the application name is "living body detection method, device, electronic equipment and storage medium", and requires priority of this Chinese patent application The entire content of this Chinese patent application is hereby incorporated by reference into this disclosure.
技术领域Technical field
本公开涉及但不限于计算机视觉技术领域,尤其涉及一种活体检测方法及装置、电子设备、存储介质、计算机程序、计算机程序产品。The present disclosure relates to but is not limited to the field of computer vision technology, and in particular, to a living body detection method and device, electronic equipment, storage media, computer programs, and computer program products.
背景技术Background technique
计算机视觉是当前研究的热点方向,是图像处理、人工智能和模式识别等技术的综合,在社会各领域也取得了广泛的应用。谈到计算机视觉的应用,总是离不开人脸识别,而人脸识别中一个关键的步骤便是活体检测,常见的活体算法按活体检验形式可以分为交互式活体算法和静默式活体算法,按相机模组类型可分为单目活体算法、双目活体算法和三维(3-Dimensional,3D)活体算法。目前的活体检测算法往往以单个模型的形式出现,但在部分场景中,单个模型的容量通常难以达到活体检测的精度。Computer vision is a hot topic in current research. It is a synthesis of image processing, artificial intelligence, pattern recognition and other technologies. It has also been widely used in various fields of society. When it comes to the application of computer vision, face recognition is always inseparable, and a key step in face recognition is liveness detection. Common liveness algorithms can be divided into interactive liveness algorithms and silent liveness algorithms according to the form of liveness detection. , according to the type of camera module, it can be divided into monocular in vivo algorithm, binocular in vivo algorithm and three-dimensional (3-Dimensional, 3D) in vivo algorithm. Current living body detection algorithms often appear in the form of a single model, but in some scenarios, the capacity of a single model is often difficult to achieve the accuracy of live body detection.
发明内容Contents of the invention
本公开实施例提供了一种活体检测方法及装置、电子设备、存储介质、计算机程序、计算机程序产品,有利于提升双目活体检测的精度。Embodiments of the present disclosure provide a living body detection method and device, electronic equipment, storage media, computer programs, and computer program products, which are beneficial to improving the accuracy of binocular living body detection.
本公开实施例提供了一种活体检测方法,该方法包括:Embodiments of the present disclosure provide a living body detection method, which method includes:
获取双目相机采集的检测对象的红外人脸图像和彩色人脸图像;Obtain the infrared face image and color face image of the detection object collected by the binocular camera;
对红外人脸图像进行特征提取,得到第一人脸特征图,以及对彩色人脸图像进行特征提取,得到第二人脸特征图;Perform feature extraction on the infrared face image to obtain the first face feature map, and perform feature extraction on the color face image to obtain the second face feature map;
根据第二人脸特征图,得到检测对象的目标类别属性信息;According to the second facial feature map, obtain the target category attribute information of the detection object;
根据目标类别属性信息和第二人脸特征图,得到第三人脸特征图;According to the target category attribute information and the second face feature map, the third face feature map is obtained;
根据第一人脸特征图和第三人脸特征图,得到检测对象的活体检测结果。According to the first face feature map and the third face feature map, the liveness detection result of the detection object is obtained.
本公开实施例通过获取双目相机采集的检测对象的红外人脸图像和彩色人脸图像;对所述红外人脸图像进行特征提取,得到第一人脸特征图,以及对所述彩色人脸图像进行特征提取,得到第二人脸特征图;根据所述第二人脸特征图,得到所述检测对象的目标类别属性信息;根据所述目标类别属性信息和所述第二人脸特征图,得到第三人脸特 征图;根据所述第一人脸特征图和所述第三人脸特征图,得到所述检测对象的活体检测结果。这样对彩色人脸图像提取出的第二人脸特征图进行分类,得到检测对象的类别属性信息(即目标类别属性信息),基于检测对象的类别属性信息将第二人脸特征图转化为第三人脸特征图,以实现带类别属性信息的特征提取,利用带类别属性信息的特征(即第三人脸特征图)和红外人脸特征(即第一人脸特征图)进行活体检测,有利于提升双目活体检测的精度。The embodiment of the present disclosure obtains the infrared face image and the color face image of the detection object collected by the binocular camera; performs feature extraction on the infrared face image to obtain the first face feature map, and performs feature extraction on the color face image. Perform feature extraction on the image to obtain a second facial feature map; obtain the target category attribute information of the detection object based on the second facial feature map; obtain the target category attribute information of the detection object based on the target category attribute information and the second facial feature map , obtain a third facial feature map; obtain a living body detection result of the detection object based on the first facial feature map and the third facial feature map. In this way, the second face feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (ie, the target category attribute information), and the second face feature map is converted into the third face feature map based on the category attribute information of the detection object. Three face feature maps to achieve feature extraction with category attribute information, and use features with category attribute information (i.e., the third face feature map) and infrared face features (i.e., the first face feature map) for live detection, It is helpful to improve the accuracy of binocular live body detection.
Embodiments of the present disclosure provide a liveness detection apparatus, which includes an acquisition unit and a processing unit, where:
the acquisition unit is configured to acquire an infrared face image and a color face image of a detection object captured by a binocular camera;
the processing unit is configured to perform feature extraction on the infrared face image to obtain a first facial feature map, and to perform feature extraction on the color face image to obtain a second facial feature map;
the processing unit is further configured to obtain target category attribute information of the detection object according to the second facial feature map;
the processing unit is further configured to obtain a third facial feature map according to the target category attribute information and the second facial feature map; and
the processing unit is further configured to obtain a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
Embodiments of the present disclosure provide an electronic device, including a processor connected to a memory, where the memory is configured to store a computer program and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the liveness detection method described above.
Embodiments of the present disclosure provide a computer-readable storage medium storing a computer program, where the computer program causes a computer to perform the liveness detection method described above.
Embodiments of the present disclosure provide a computer program including computer-readable code which, when read and executed by a computer, implements some or all of the steps of the method in any embodiment of the present disclosure.
Embodiments of the present disclosure provide a computer program product including a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform the liveness detection method described above.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the following briefly introduces the drawings required for describing the embodiments or the prior art. Apparently, the drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Figure 1 is a schematic diagram of an application environment provided by an embodiment of the present disclosure;
Figure 2A is a schematic flowchart of a liveness detection method provided by an embodiment of the present disclosure;
Figure 2B is a schematic flowchart of a method for determining an attention matrix provided by an embodiment of the present disclosure;
Figure 2C is a schematic flowchart of a liveness detection method provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of the network structure of a liveness detection model provided by an embodiment of the present disclosure;
Figure 4 is a schematic diagram of selecting a third branch provided by an embodiment of the present disclosure;
Figure 5 is a schematic diagram of determining multiple pixels corresponding to a feature provided by an embodiment of the present disclosure;
Figure 6 is a schematic flowchart of another liveness detection method provided by an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of a liveness detection apparatus provided by an embodiment of the present disclosure;
Figure 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To enable a person skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
The terms "include" and "have" and any variations thereof appearing in the specification, claims, and drawings of the present disclosure are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", and "third" are used to distinguish different objects rather than to describe a specific order.
Reference to an "embodiment" in the present disclosure means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described in the present disclosure may be combined with other embodiments.
Please refer to Figure 1, which is a schematic diagram of an application environment provided by an embodiment of the present disclosure. As shown in Figure 1, the application environment includes at least a binocular camera 101 and an electronic device 102, connected through a wired or wireless network. The binocular camera 101 includes a visible-light camera module 1011 and an infrared camera module 1012, which synchronously capture images of a detection object when the object enters the image acquisition range, obtaining a color image and an infrared image respectively, and either store the color image and the infrared image in a face recognition system or send them directly to the electronic device 102. When the electronic device 102 receives, or matches from the system, the color image and the infrared image, it performs face detection on them and, based on the position information of the face detection boxes, crops a color face image from the color image and an infrared face image from the infrared image. The electronic device 102 then calls a liveness detection model supporting multiple categories of attribute information to perform liveness detection on the color face image and the infrared face image. Because the liveness detection model uses a dedicated branch for each category of attribute information during feature extraction, the extracted liveness features carry category-specific attribute information, which improves the accuracy of liveness classification and hence the accuracy of liveness detection.
For example, the electronic device 102 may be an independent physical server, a server cluster, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms. In some possible implementations, the liveness detection method may be implemented by a processor calling computer-readable instructions stored in a memory.
Please refer to Figure 2A, which is a schematic flowchart of a liveness detection method provided by an embodiment of the present disclosure. The method may be implemented based on the application environment shown in Figure 1 and applied to an electronic device. As shown in Figure 2A, the method includes steps 201 to 205:
201: Acquire an infrared face image and a color face image of a detection object captured by a binocular camera.
In the embodiments of the present disclosure, the electronic device may acquire, in real time, the infrared image and the color image of the detection object synchronously captured by the binocular camera, or may acquire them from a face recognition system; this is not limited here. For example, when the electronic device has acquired the infrared image and the color image, it crops the infrared face image and the color face image from the two images respectively based on detection boxes generated by a face detection algorithm.
For example, acquiring the infrared face image and the color face image of the detection object captured by the binocular camera includes:
A1: Select, from at least two color images of the detection object stored in the face recognition system, the image with the highest face quality as a target color image, where the at least two color images are obtained by the visible-light camera module of the binocular camera continuously capturing the detection object. The electronic device performs feature extraction on the faces in the at least two color images through a pre-trained face quality detection model to obtain features containing face size, angle, and sharpness information, then performs classification prediction on these features to obtain a face quality detection score for each of the at least two color images, and selects the image with the highest score as the target color image.
A2: Perform face quality detection on the faces in at least two infrared images stored in the face recognition system to obtain a face quality detection score for each of the at least two infrared images, and compute the difference between each such score and the face quality detection score of the target color image. The electronic device likewise extracts features containing face size, angle, and sharpness information through the face quality detection model and classifies them to obtain the face quality detection score of each infrared image.
A3: Take, from the at least two infrared images, the image whose face quality detection score differs least from that of the target color image as a candidate infrared image.
A4: Perform facial key point detection on the target color image and the candidate infrared image respectively, obtaining 106 first key points covering the eye, cheekbone, nose, ear, chin, and cheek regions in the target color image, and 106 second key points covering the same regions in the candidate infrared image.
A5: Compute the similarity between the 106 first key points and the 106 second key points. If the similarity is less than a preset threshold, determine that the target color image and the candidate infrared image are an image pair captured by the binocular camera at the same moment of the detection object, and, based on the detection boxes in the target color image and the candidate infrared image obtained during facial key point detection, crop the face region images from the target color image and the candidate infrared image respectively to obtain the infrared face image and the color face image of the detection object.
In this implementation, when the electronic device needs to obtain images from the face recognition system, it may not know which color image and which infrared image were captured of the detection object at the same moment. For any detection object, the image with the highest face quality is first selected from its at least two color images; the infrared image whose face quality detection score is closest to that of the selected color image is then chosen from all infrared images in the face recognition system as the candidate infrared image; and 106 facial key points are selected to perform key point matching between the two images. If the similarity between the key points is less than the preset threshold, the candidate infrared image and the target color image are considered to have been captured of the detection object at the same moment. This helps improve the accuracy of image matching in scenarios where the electronic device needs to obtain images from the face recognition system.
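A minimal sketch of the image-pair matching procedure in steps A1 to A5 might read as follows. The quality model, the key point detector, the use of mean key point distance as the "similarity" measure, and the threshold value are all assumptions for illustration, not details fixed by the disclosure.

```python
import numpy as np

def match_image_pair(color_images, infrared_images, quality_model, keypoint_detector,
                     similarity_threshold=0.5):
    """Pick the color/infrared pair most likely captured at the same moment (steps A1-A5)."""
    # A1: choose the color image with the highest face quality score.
    color_scores = [quality_model.score(img) for img in color_images]
    target_color = color_images[int(np.argmax(color_scores))]
    target_score = max(color_scores)

    # A2/A3: choose the infrared image whose quality score is closest to the target's.
    ir_scores = [quality_model.score(img) for img in infrared_images]
    candidate_ir = infrared_images[int(np.argmin([abs(s - target_score) for s in ir_scores]))]

    # A4: detect the 106 facial key points in both images, each as a (106, 2) array.
    kp_color = keypoint_detector.detect(target_color)
    kp_ir = keypoint_detector.detect(candidate_ir)

    # A5: accept the pair when the key-point "similarity" measure (here, a mean
    # point-to-point distance, an assumption) falls below the preset threshold.
    similarity = float(np.mean(np.linalg.norm(kp_color - kp_ir, axis=1)))
    if similarity < similarity_threshold:
        return target_color, candidate_ir
    return None
```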
202: Perform feature extraction on the infrared face image to obtain a first facial feature map, and perform feature extraction on the color face image to obtain a second facial feature map.
In the embodiments of the present disclosure, a liveness detection model structure is proposed. As shown in Figure 3, the liveness detection model includes a first branch (303), a second branch (304), a category attribute classifier (305), at least two third branches (306), and a liveness detection classifier (307). The first branch (303) is used to perform feature extraction on the input infrared face image (301) to obtain the first facial feature map, and the second branch (304) is used to perform feature extraction on the input color face image (302) to obtain the second facial feature map. The first and second facial feature maps cover semantic information on whether important facial regions (e.g., eyes, cheekbones, nose, ears, chin, cheeks) belong to a living body; for example, the semantic information may be one or at least two of material, texture, and gloss. Optionally, both the first branch and the second branch may perform feature extraction using at least two Inception structures connected in series. The Inception structure uses convolution kernels of different sizes, which correspond to receptive fields of different sizes and hence fuse features at different scales; as a result, the first and second facial feature maps carry richer semantic information.
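As an illustration only, a branch built from serially connected Inception-style blocks could be sketched in PyTorch as below. The channel widths, the number of blocks, and the single-channel infrared input are assumptions; the disclosure only requires at least two Inception structures in series per branch.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, i.e. different receptive fields."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, branch_ch, kernel_size=1))

    def forward(self, x):
        # Concatenating the parallel branches fuses features at different scales.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

# First/second branch: at least two Inception blocks in series (widths are assumptions).
first_branch = nn.Sequential(InceptionBlock(1, 64), InceptionBlock(64, 128))   # infrared input
second_branch = nn.Sequential(InceptionBlock(3, 64), InceptionBlock(64, 128))  # color input
```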
In a possible implementation, the liveness detection model further includes the category attribute classifier, the at least two third branches, and the liveness detection classifier, where the second branch, the attribute classifier, and the at least two third branches are connected in sequence; the at least two third branches are independent of one another, and each of them corresponds to different category attribute information; the output of each third branch is concatenated with the output of the first branch, and the concatenated output serves as the input of the liveness detection classifier. In this way, for detection objects with different categories of attribute information, the corresponding third branch can be used for inference within the same liveness detection model. Compared with a scheme that requires a different liveness detection model for each category of attribute information, this helps save the memory overhead of storing at least two models and is more robust. Each third branch in the model can be migrated using the parameters of an existing model, enabling efficient iteration during training; meanwhile, only one category attribute classifier is added after the second branch, and its impact on the inference speed of the whole model is negligible.
In a possible implementation, performing feature extraction on the infrared face image to obtain the first facial feature map includes: inputting the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first facial feature map; and performing feature extraction on the color face image to obtain the second facial feature map includes: inputting the color face image into the second branch of the liveness detection model for feature extraction to obtain the second facial feature map. In this way, different neural network branches are used to extract features from the infrared face image and the color face image respectively. Since the first branch is trained with the supervision information of infrared face sample images and the second branch is trained with the supervision information of color face sample images, feeding the infrared and color face images into their respective branches for feature extraction helps extract features with richer semantic information.
203: Obtain target category attribute information of the detection object according to the second facial feature map.
In the embodiments of the present disclosure, the second facial feature map is input into the attribute classifier, which performs classification prediction on one or at least two types of semantic information to obtain the target category attribute information. In some embodiments, the target category attribute information may be gender, age group, a region identifier (e.g., belonging to a first region), and so on. The attribute classifier is supervised with category attribute information during training; therefore, based on the second feature map containing rich semantic information, it can predict the target category attribute information of the detection object, such as which country the detection object is from or which age group the detection object belongs to.
In a possible implementation, the features in the second facial feature map include one or at least two types of semantic information among material, texture, and gloss. Obtaining the target category attribute information of the detection object according to the second facial feature map includes: inputting the second facial feature map into the attribute classifier, and performing classification prediction on the one or at least two types of semantic information through the attribute classifier to obtain the target category attribute information, where the target category attribute information includes a region identifier. Obtaining the third facial feature map according to the target category attribute information and the second facial feature map includes: determining, from the at least two third branches, the third branch corresponding to the region identifier, and inputting the second facial feature map into that third branch for feature extraction to obtain the third facial feature map. In this way, the second facial feature map is classified by the attribute classifier in the liveness detection model to obtain the target category attribute information of the detection object (such as the region identifier), the third branch corresponding to the target category attribute information is determined from the at least two third branches, and feature extraction is performed on the second facial feature map using that third branch, so that the third facial feature map carries features specific to that category attribute information, thereby relatively improving the accuracy of liveness detection.
204: Obtain a third facial feature map according to the target category attribute information and the second facial feature map.
In the embodiments of the present disclosure, after obtaining the target category attribute information of the detection object, the electronic device can determine the third branch corresponding to that target category attribute information from the at least two third branches. As shown in Figure 4, if the region identifier of the detection object is a first identifier (401), the first-identifier branch (402) can be determined from at least two third branches such as the first-identifier branch (402), the second-identifier branch (403), and the third-identifier branch (404), and the second facial feature map is input into the first-identifier branch (402) for feature extraction to obtain the third facial feature map (405). Optionally, the at least two third branches may likewise perform feature extraction using at least two Inception structures connected in series. Each third branch is supervised with its specific category attribute information during training; therefore, the third facial feature map carries features specific to that category attribute information, which relatively improves the accuracy of liveness detection.
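For illustration, routing the second facial feature map to the third branch selected by the attribute classifier might be sketched as follows; the classifier interface, the use of the branch index as the region identifier, and the single-image batch are assumptions.

```python
import torch

def extract_third_feature_map(second_feature_map, attribute_classifier, third_branches):
    """Route the second facial feature map through the branch matching the predicted attribute."""
    logits = attribute_classifier(second_feature_map)      # one score per category attribute
    region_id = int(torch.argmax(logits, dim=1).item())    # assumes a single image per batch
    branch = third_branches[region_id]                     # pick the matching third branch
    return branch(second_feature_map)                      # third facial feature map
```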
205: Obtain a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
In the embodiments of the present disclosure, for example, obtaining the liveness detection result of the detection object according to the first facial feature map and the third facial feature map includes:
B1: Concatenate the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
B2: Obtain the attention degree of each facial feature in the fourth facial feature map to obtain an attention matrix;
B3: Multiply the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
B4: Classify the first weighted feature map to obtain the liveness detection result of the detection object.
In some embodiments, in step B2, an attention model may first be used to generate an attention coefficient for each facial feature in the fourth facial feature map, and the matrix formed by the attention coefficients is determined as a first attention matrix. The attention model may be any existing attention model; it should be understood that an attention model can predict which target the human eye pays more attention to when viewing an image, i.e., attention coefficients can be computed from the features of the image. The electronic device performs key point detection on the color face image to obtain M key points of preset regions of interest together with the coordinate information and category information of the M key points. The preset regions of interest are the eye, cheekbone, nose, ear, chin, and cheek regions, and the M key points are the 106 key points in step A4, where M is an integer greater than 1. Based on the fourth facial feature map and the M key points, the electronic device computes a second attention matrix, and adds the elements at corresponding positions of the first attention matrix and the second attention matrix to obtain the attention matrix.
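Steps B1 to B4, with the attention matrix formed as the sum of the model-predicted first attention matrix and the keypoint-derived second attention matrix, could be sketched as follows. Here `attention_model`, `keypoint_attention`, and `liveness_classifier` are placeholder callables, and the attention matrices are assumed to broadcast against the feature map shape.

```python
import torch

def liveness_from_features(first_map, third_map, attention_model, keypoint_attention,
                           liveness_classifier):
    # B1: concatenate along the channel dimension to form the fourth facial feature map.
    fourth_map = torch.cat([first_map, third_map], dim=1)

    # B2: attention = model-predicted coefficients + keypoint-derived weights,
    # added element-wise at corresponding positions.
    first_attention = attention_model(fourth_map)        # first attention matrix
    second_attention = keypoint_attention(fourth_map)    # second attention matrix (steps 211-215)
    attention = first_attention + second_attention

    # B3: element-wise weighting of the fourth facial feature map.
    weighted = fourth_map * attention

    # B4: classify the first weighted feature map into live / spoof.
    return liveness_classifier(weighted)
```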
For example, as shown in Figure 2B, determining the second attention matrix based on the fourth facial feature map and the M key points may include the following steps 211 to 215:
211: For the position of each facial feature in the fourth facial feature map, determine at least two pixels corresponding to that position in the color face image, and obtain the coordinate information of the at least two pixels.
It should be understood that, since the fourth facial feature map is obtained by concatenating the first facial feature map and the third facial feature map, based on the principle of convolving and pooling the original image with the Inception structure, as shown in Figure 5, for the position of any facial feature in the fourth facial feature map (503), at least two corresponding pixels (502) in the color face image (501) can be determined; that is, the feature at that position is computed from the features of these at least two pixels (502), such as the 9 pixels (502) in the black rectangular box in Figure 5. Meanwhile, the coordinate information of the at least two pixels (502) in the color face image (501) can be obtained; for example, the coordinate information of a certain pixel a is (x4, y1).
212: For each of the at least two pixels, compute the distance between the pixel and each of the M key points using the coordinate information of the pixel and the coordinate information of the M key points.
In some embodiments, assuming that the coordinate information of a key point b among the M key points is (x5, y7), the distance between pixel a and key point b is

d(a, b) = √((x4 − x5)² + (y1 − y7)²)

From this, the distance between each of the at least two pixels and each of the M key points can be computed.
213: Based on the distance between each pixel and each of the M key points and the category information of each key point, assign weights to each pixel to obtain M reference weights for the pixel.
In some embodiments, weights are preset for the M key points. For example, each key point may correspond to a different weight; for key points in the eye region, the weights decrease with distance from the center of the eyeball, i.e., more attention is paid to eye-region key points closer to the eyeball center, and other regions may adopt the same or a similar weight-setting scheme with reference to the eye region. The weights of the M key points may then be α1, α2, α3, ..., αM. For a pixel a among the at least two pixels, if its distance to a key point among the M key points (e.g., key point b) is less than a preset distance threshold, the weight of key point b is assigned to pixel a; if its distance to key point b is greater than or equal to the preset distance threshold, pixel a is assigned a weight of 0. Alternatively, the same weight may be set for key points with the same category information; for example, the weights of the n key points in the eye region may all be set to α1, the weights of the o key points in the nose region may all be set to α2, the weights of the q key points in the chin region may all be set to α3, and so on. The M key points then still have M weights, the difference being that key points with the same category information share the same weight. For pixel a and key point b, when the distance between them is less than the preset distance threshold, the weight of key point b is likewise assigned to pixel a; when the distance between them is greater than or equal to the preset distance threshold, pixel a is likewise assigned a weight of 0. Under either weight-assignment scheme, each pixel is assigned M weights, which serve as the reference weights of that pixel.
214: Determine the average of the M reference weights as the weight of the pixel.
215: Based on the weight of each pixel, determine the weight of each facial feature, and determine the matrix formed by the weights of the facial features as the second attention matrix.
In some embodiments, since each facial feature in the fourth facial feature map corresponds to at least two pixels in the color face image, the weight of each facial feature can be computed based on the weight of each pixel obtained in step 214. For example, the average of the weights of the at least two pixels may be taken as the weight of the corresponding facial feature in the fourth facial feature map, or the mode of the weights of the at least two pixels may be taken as that weight. The matrix formed by the weights of the facial features is thereby determined as the second attention matrix.
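A NumPy sketch of steps 211 to 215, under the simplifying assumptions that the feature-to-pixel correspondence is already known and that each key point carries a preset weight, might read:

```python
import numpy as np

def second_attention_matrix(feature_pixels, pixel_coords, keypoints, keypoint_weights,
                            dist_threshold):
    """
    feature_pixels:   dict mapping a feature position (i, j) to the indices of its
                      source pixels in the color face image (step 211)
    pixel_coords:     (P, 2) array of pixel coordinates
    keypoints:        (M, 2) array of key-point coordinates
    keypoint_weights: (M,) preset weights, e.g. decreasing with distance from the eyeball center
    """
    # Step 212: Euclidean distance from every pixel to every key point, shape (P, M).
    dists = np.linalg.norm(pixel_coords[:, None, :] - keypoints[None, :, :], axis=2)

    # Step 213: reference weight = key-point weight if close enough, else 0.
    reference = np.where(dists < dist_threshold, keypoint_weights[None, :], 0.0)

    # Step 214: per-pixel weight = mean of its M reference weights.
    pixel_weights = reference.mean(axis=1)

    # Step 215: feature weight = mean of the weights of its source pixels.
    h = max(i for i, _ in feature_pixels) + 1
    w = max(j for _, j in feature_pixels) + 1
    attention = np.zeros((h, w))
    for (i, j), idxs in feature_pixels.items():
        attention[i, j] = pixel_weights[list(idxs)].mean()
    return attention
```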
In the embodiments of the present disclosure, the features in the fourth facial feature map are multiplied by the elements at corresponding positions in the attention matrix to obtain the first weighted feature map; the features in the first weighted feature map better express the semantic information of the key regions of interest on the face of the detection object.
In this implementation, an attention model is used to generate the first attention matrix, and the second attention matrix is then constructed for the fourth facial feature map based on key point detection and weight assignment. Since the attention model may miss a small amount of information from the key regions of interest when generating attention coefficients, assigning weights to the features in the fourth facial feature map based on the preset weights can compensate for such possible omissions, so that all key regions of interest on the face receive attention. The resulting first weighted feature map can thus fully express the semantic information of the key regions of interest, which in turn helps improve the accuracy of liveness classification.
For example, as shown in Figure 2C, obtaining the liveness detection result of the detection object according to the first facial feature map and the third facial feature map includes the following steps 221 to 226:
221: Obtain the attention degree of each facial feature in the first facial feature map to obtain an attention matrix E.
In some embodiments, with reference to the method for obtaining the attention matrix in step B2, an attention model is first used to generate an attention coefficient for each facial feature in the first facial feature map, and the matrix formed by the attention coefficients is determined as a third attention matrix. Key point detection is performed on the infrared face image, likewise obtaining N key points (e.g., 106 key points) of the preset regions of interest together with their coordinate information and category information. For the position of each facial feature in the first facial feature map, at least two corresponding pixels in the infrared face image are determined and their coordinate information is obtained. For each of the at least two pixels, the distance between the pixel and each of the N key points is computed using the coordinate information of the pixel and of the N key points; based on these distances and the category information of each key point, weights are assigned to the pixel to obtain N reference weights, whose average is determined as the weight of the pixel. The average or mode of the weights of the at least two pixels is determined as the weight of the corresponding facial feature in the first facial feature map, the matrix formed by these weights is determined as a fourth attention matrix, and the third attention matrix and the fourth attention matrix are added to obtain the attention matrix E.
222: Obtain the attention degree of each facial feature in the third facial feature map to obtain an attention matrix F.
In some embodiments, an attention model is first used to generate an attention coefficient for each facial feature in the third facial feature map, and the matrix formed by the attention coefficients is determined as a fifth attention matrix. Key point detection is performed on the color face image, likewise obtaining S key points (e.g., 106 key points) of the preset regions of interest together with their coordinate information and category information. For the position of each facial feature in the third facial feature map, at least two corresponding pixels in the color face image are determined and their coordinate information is obtained. For each of the at least two pixels, the distance between the pixel and each of the S key points is computed using the coordinate information of the pixel and of the S key points; based on these distances and the category information of each key point, weights are assigned to the pixel to obtain S reference weights, whose average is determined as the weight of the pixel. The average or mode of the weights of the at least two pixels is determined as the weight of the corresponding facial feature in the third facial feature map, the matrix formed by these weights is determined as a sixth attention matrix, and the fifth attention matrix and the sixth attention matrix are added to obtain the attention matrix F.
223: Multiply the first facial feature map by the attention matrix E to obtain a second weighted feature map;
224: Multiply the third facial feature map by the attention matrix F to obtain a third weighted feature map;
225: Concatenate the second weighted feature map and the third weighted feature map to obtain a concatenated weighted feature map;
226: Classify the concatenated weighted feature map to obtain the liveness detection result of the detection object.
In this implementation, another feature concatenation scheme is adopted. The attention model, key point detection, and weight assignment are used to generate the attention matrix E for the first facial feature map, and multiplying the first facial feature map by the attention matrix E likewise yields a second weighted feature map that fully expresses the semantic information of the key regions of interest. The attention model, key point detection, and weight assignment are used to generate the attention matrix F for the third facial feature map, and multiplying the third facial feature map by the attention matrix F likewise yields a third weighted feature map that fully expresses the semantic information of the key regions of interest. Concatenating the second weighted feature map and the third weighted feature map fully fuses the semantic information of the key regions of interest in the color face image with that in the infrared face image, which likewise helps improve the accuracy of liveness classification.
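The alternative fusion of steps 221 to 226, which weights each feature map before concatenation, might be sketched as below; `attention_for` stands for the combined attention-model-plus-keypoint procedure described above and is an assumed interface.

```python
import torch

def liveness_weight_then_concat(first_map, third_map, attention_for, liveness_classifier):
    # 221/223: weight the first facial feature map by its attention matrix E.
    E = attention_for(first_map)        # attention model + infrared keypoint weights
    second_weighted = first_map * E

    # 222/224: weight the third facial feature map by its attention matrix F.
    F = attention_for(third_map)        # attention model + color keypoint weights
    third_weighted = third_map * F

    # 225/226: concatenate the weighted maps and classify.
    fused = torch.cat([second_weighted, third_weighted], dim=1)
    return liveness_classifier(fused)
```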
In the embodiments of the present disclosure, an infrared face image and a color face image of a detection object captured by a binocular camera are acquired; feature extraction is performed on the infrared face image to obtain a first facial feature map and on the color face image to obtain a second facial feature map; target category attribute information of the detection object is obtained according to the second facial feature map; a third facial feature map is obtained according to the target category attribute information and the second facial feature map; and a liveness detection result of the detection object is obtained according to the first facial feature map and the third facial feature map. In this way, the second facial feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), the second facial feature map is converted into the third facial feature map based on that category attribute information so that feature extraction carries category attribute information, and liveness detection is performed using the attribute-aware features (the third facial feature map) together with the infrared facial features (the first facial feature map), which helps improve the accuracy of binocular liveness detection. In addition, for detection objects with different categories of attribute information, the embodiments of the present disclosure can use the corresponding third branch for inference within the same liveness detection model. Compared with a scheme that requires a different liveness detection model for each category of attribute information, this helps save the memory overhead of storing at least two models and is more robust; each third branch in the model can be migrated using the parameters of an existing model, enabling efficient iteration during training, while only one category attribute classifier is added after the second branch, whose impact on the inference speed of the whole model is negligible.
Please refer to Figure 6, which is a schematic flowchart of another liveness detection method provided by an embodiment of the present disclosure. As shown in Figure 6, the method includes steps 601 to 608:
601: Acquire an infrared face image and a color face image of a detection object captured by a binocular camera;
602: Perform feature extraction on the infrared face image to obtain a first facial feature map, and perform feature extraction on the color face image to obtain a second facial feature map;
603: Obtain target category attribute information of the detection object according to the second facial feature map;
604: Obtain a third facial feature map according to the target category attribute information and the second facial feature map;
605: Concatenate the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
606: Obtain the attention degree of each facial feature in the fourth facial feature map to obtain an attention matrix;
607: Multiply the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
608: Classify the first weighted feature map to obtain the liveness detection result of the detection object.
The implementations of steps 601 to 608 have been described in the embodiments shown in Figures 2A to 5, and can achieve the same or similar beneficial effects.
Based on the description of the method embodiments shown in Figure 2A or Figure 6, please refer to Figure 7, which is a schematic structural diagram of a liveness detection apparatus provided by an embodiment of the present disclosure. As shown in Figure 7, the apparatus includes an acquisition unit 701 and a processing unit 702, where:
the acquisition unit 701 is configured to acquire an infrared face image and a color face image of a detection object captured by a binocular camera;
the processing unit 702 is configured to perform feature extraction on the infrared face image to obtain a first facial feature map, and to perform feature extraction on the color face image to obtain a second facial feature map;
the processing unit 702 is further configured to obtain target category attribute information of the detection object according to the second facial feature map;
the processing unit 702 is further configured to obtain a third facial feature map according to the target category attribute information and the second facial feature map;
the processing unit 702 is further configured to obtain a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
It can be seen that, in the liveness detection apparatus shown in Figure 7, an infrared face image and a color face image of a detection object captured by a binocular camera are acquired; feature extraction is performed on the infrared face image to obtain a first facial feature map and on the color face image to obtain a second facial feature map; target category attribute information of the detection object is obtained according to the second facial feature map; a third facial feature map is obtained according to the target category attribute information and the second facial feature map; and a liveness detection result of the detection object is obtained according to the first facial feature map and the third facial feature map. In this way, the second facial feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), the second facial feature map is converted into the third facial feature map based on that category attribute information so that feature extraction carries category attribute information, and liveness detection is performed using the attribute-aware features (the third facial feature map) together with the infrared facial features (the first facial feature map), which helps improve the accuracy of binocular liveness detection.
In a possible implementation, in terms of obtaining the liveness detection result of the detection object according to the first facial feature map and the third facial feature map, the processing unit 702 is configured to:
concatenate the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
obtain the attention degree of each facial feature in the fourth facial feature map to obtain an attention matrix;
multiply the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
classify the first weighted feature map to obtain the liveness detection result of the detection object.
In a possible implementation, in terms of obtaining the attention degree of each facial feature in the fourth facial feature map to obtain the attention matrix, the processing unit 702 is configured to:
use an attention model to generate an attention coefficient for each facial feature in the fourth facial feature map, and determine the matrix formed by the attention coefficients as a first attention matrix;
perform key point detection on the color face image to obtain M key points of preset regions of interest, where M is an integer greater than 1;
determine a second attention matrix based on the fourth facial feature map and the M key points;
add the first attention matrix and the second attention matrix to obtain the attention matrix.
In a possible implementation, the M key points further include coordinate information and category information of each key point, and in terms of determining the second attention matrix based on the fourth facial feature map and the M key points, the processing unit 702 is configured to:
for the position of each facial feature in the fourth facial feature map, determine at least two pixels corresponding to that position in the color face image, and obtain the coordinate information of the at least two pixels;
for each of the at least two pixels, compute the distance between the pixel and each of the M key points using the coordinate information of the pixel and the coordinate information of the M key points;
assign weights to each pixel based on the distance between the pixel and each of the M key points and the category information of each key point, to obtain M reference weights for the pixel;
determine the average of the M reference weights as the weight of the pixel;
determine the weight of each facial feature based on the weight of each pixel, and determine the matrix formed by the weights of the facial features as the second attention matrix.
In a possible implementation, in terms of performing feature extraction on the infrared face image to obtain the first facial feature map, the processing unit 702 is configured to:
input the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first facial feature map;
in terms of performing feature extraction on the color face image to obtain the second facial feature map, the processing unit 702 is configured to:
input the color face image into the second branch of the liveness detection model for feature extraction to obtain the second facial feature map.
In a possible implementation, the liveness detection model further includes a category attribute classifier, at least two third branches, and a liveness detection classifier, where the second branch, the attribute classifier, and the at least two third branches are connected in sequence; the at least two third branches are independent of one another, and each of them corresponds to different category attribute information; the output of each third branch is concatenated with the output of the first branch, and the concatenated output serves as the input of the liveness detection classifier.
在一种可能的实施方式中,第二人脸特征图中的特征包括材质、纹理和光泽中的一种或至少两种语义信息,在根据第二人脸特征图,得到检测对象的目标类别属性信息方面,处理单元702配置为:In a possible implementation, the features in the second facial feature map include one or at least two semantic information of material, texture and gloss, and the target category of the detection object is obtained based on the second facial feature map. In terms of attribute information, the processing unit 702 is configured as:
将第二人脸特征图输入属性分类器,以通过属性分类器对一种或至少两种语义信息进行分类预测,得到目标类别属性信息,目标类别属性信息包括所属地标识;Input the second face feature map into the attribute classifier to classify and predict one or at least two types of semantic information through the attribute classifier to obtain target category attribute information, where the target category attribute information includes a location identifier;
在根据目标类别属性信息和第二人脸特征图,得到第三人脸特征图方面,处理单元702配置为:In terms of obtaining the third facial feature map based on the target category attribute information and the second facial feature map, the processing unit 702 is configured as:
从至少两个第三分支中确定出与所属地标识对应的第三分支,将第二人脸特征图输入与所属地标识对应的第三分支进行特征提取,得到第三人脸特征图。A third branch corresponding to the location identification is determined from at least two third branches, and the second face feature map is input into the third branch corresponding to the location identification for feature extraction to obtain a third face feature map.
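As one concrete reading of this branch topology, the PyTorch sketch below wires a second (color) branch, an attribute classifier, and per-category third branches together, concatenating each routed output with the infrared branch before a liveness classifier. All layer shapes, the number of branches, and the hard argmax routing rule are illustrative assumptions, not the claimed model.

```python
import torch
import torch.nn as nn

class MultiBranchLiveness(nn.Module):
    """Sketch of the described topology; all dimensions are assumptions."""

    def __init__(self, num_regions=3, feat_dim=128):
        super().__init__()
        # First branch (infrared) and second branch (color), as described.
        self.branch_ir = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.branch_rgb = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU())
        self.attr_classifier = nn.Linear(feat_dim, num_regions)
        # One independent third branch per category attribute value.
        self.third_branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
                          nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            for _ in range(num_regions)])
        self.liveness_classifier = nn.Linear(2 * feat_dim, 2)  # live vs. spoof

    def forward(self, ir_img, rgb_img):
        f1 = self.branch_ir(ir_img).flatten(1)      # first facial feature map
        f2 = self.branch_rgb(rgb_img)               # second facial feature map
        # Predicted location identifier (inference-style hard routing).
        region = self.attr_classifier(f2.mean(dim=(2, 3))).argmax(dim=1)
        # Route each sample through the third branch matching its identifier.
        f3 = torch.stack([self.third_branches[r](f2[i:i + 1]).flatten(1)[0]
                          for i, r in enumerate(region.tolist())])
        fused = torch.cat([f1, f3], dim=1)          # concatenate with first branch
        return self.liveness_classifier(fused)
```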
According to an embodiment of the present disclosure, the units of the liveness detection apparatus shown in FIG. 7 may be combined, separately or in their entirety, into one or several other units, or one or more of these units may be further split into at least two functionally smaller units. Either way, the same operations can be achieved without affecting the realization of the technical effects of the embodiments of the present disclosure. The above units are divided on the basis of logical functions; in practical applications, the function of one unit may be realized by at least two units, or the functions of at least two units may be realized by a single unit. In other embodiments of the present disclosure, the liveness detection apparatus may also include other units; in practical applications, these functions may be implemented with the assistance of other units, and may be implemented by at least two units in cooperation.
According to another embodiment of the present disclosure, the liveness detection apparatus shown in FIG. 7 may be constructed, and the liveness detection method of the embodiments of the present disclosure may be implemented, by running a computer program (including program code) capable of executing each step of the corresponding method shown in FIG. 2A or FIG. 6 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through that medium, and run therein.
Based on the descriptions of the above method embodiments and apparatus embodiments, embodiments of the present disclosure provide an electronic device. Referring to FIG. 8, the electronic device includes a transceiver 801, a processor 802, and a memory 803, which are connected via a bus 804. The memory 803 is configured to store computer programs and data, and can transmit the stored data to the processor 802. The processor 802 is configured to read the computer program in the memory 803 to perform the following operations:
acquiring an infrared face image and a color face image of a detection object captured by a binocular camera;
performing feature extraction on the infrared face image to obtain a first facial feature map, and performing feature extraction on the color face image to obtain a second facial feature map;
obtaining target category attribute information of the detection object according to the second facial feature map;
obtaining a third facial feature map according to the target category attribute information and the second facial feature map;
obtaining a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
As can be seen, in the electronic device shown in FIG. 8, the infrared face image and the color face image of the detection object captured by the binocular camera are acquired; feature extraction is performed on the infrared face image to obtain the first facial feature map, and feature extraction is performed on the color face image to obtain the second facial feature map; the target category attribute information of the detection object is obtained according to the second facial feature map; the third facial feature map is obtained according to the target category attribute information and the second facial feature map; and the liveness detection result of the detection object is obtained according to the first facial feature map and the third facial feature map. In this way, the second facial feature map extracted from the color face image is classified to obtain the category attribute information of the detection object (i.e., the target category attribute information), and the second facial feature map is converted into the third facial feature map based on that category attribute information, realizing feature extraction that carries category attribute information. Performing liveness detection with the features carrying category attribute information (i.e., the third facial feature map) together with the infrared facial features (i.e., the first facial feature map) is beneficial to improving the accuracy of binocular liveness detection.
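Read as pseudocode, the five operations above reduce to a short inference routine. The sketch below assumes a model object exposing the named components; every attribute name here is hypothetical.

```python
def detect_liveness(model, ir_img, rgb_img):
    """Sketch of the processor's five operations; `model` attributes are assumed names."""
    f1 = model.branch_ir(ir_img)              # first facial feature map (infrared)
    f2 = model.branch_rgb(rgb_img)            # second facial feature map (color)
    attr = model.attr_classifier(f2)          # target category attribute information
    f3 = model.third_branch_for(attr)(f2)     # third facial feature map
    return model.liveness_head(f1, f3)        # liveness detection result
```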
In a possible implementation, the processor 802 obtaining the liveness detection result of the detection object according to the first facial feature map and the third facial feature map includes:
concatenating the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
obtaining the degree of attention of each facial feature in the fourth facial feature map to obtain an attention matrix;
multiplying the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
classifying the first weighted feature map to obtain the liveness detection result of the detection object.
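A minimal PyTorch sketch of these four steps, assuming the attention matrix has already been computed and broadcasts over the channel dimension:

```python
import torch

def classify_with_attention(f1, f3, attention, classifier):
    """Sketch: concatenate, weight by attention, classify.

    f1, f3: (B, C, H, W) feature maps; attention: (B, 1, H, W), assumed
    broadcastable; classifier: any module mapping flat features to logits.
    """
    f4 = torch.cat([f1, f3], dim=1)         # fourth facial feature map
    weighted = f4 * attention               # element-wise product -> first weighted map
    return classifier(weighted.flatten(1))  # liveness detection result
```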
In a possible implementation, the processor 802 obtaining the degree of attention of each facial feature in the fourth facial feature map to obtain the attention matrix includes:
using an attention model to generate an attention coefficient for each facial feature in the fourth facial feature map, and determining the matrix composed of the attention coefficients as a first attention matrix;
performing key point detection on the color face image to obtain M key points of a preset region of interest, M being an integer greater than 1;
determining a second attention matrix based on the fourth facial feature map and the M key points;
adding the first attention matrix and the second attention matrix to obtain the attention matrix.
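The combination of the two attention sources can be sketched as follows; `attention_model`, `keypoint_detector`, and `second_attention_matrix` stand in for the components described above (the last as in the earlier sketch), so their interfaces are assumptions.

```python
def build_attention(f4, rgb_img, attention_model, keypoint_detector,
                    second_attention_matrix):
    """Sketch: learned attention plus key-point attention (assumed interfaces)."""
    a1 = attention_model(f4)                     # first attention matrix
    keypoints = keypoint_detector(rgb_img)       # M key points with coords and category
    a2 = second_attention_matrix(f4, keypoints)  # second attention matrix
    return a1 + a2                               # element-wise sum -> attention matrix
```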
In a possible implementation, the M key points further include coordinate information and category information of each key point, and the processor 802 determining the second attention matrix based on the fourth facial feature map and the M key points includes:
for the location of each facial feature in the fourth facial feature map, determining at least two pixel points corresponding to that location in the color face image, and obtaining the coordinate information of the at least two pixel points;
for each of the at least two pixel points, calculating the distance between that pixel point and each of the M key points, using the coordinate information of the pixel point and the coordinate information of the M key points;
assigning a weight to each pixel point based on the distance between the pixel point and each of the M key points and the category information of each key point, obtaining M reference weights for that pixel point;
determining the average of the M reference weights as the weight of the pixel point;
determining the weight of each facial feature based on the weight of each pixel point, and determining the matrix composed of the weights of the facial features as the second attention matrix.
In a possible implementation, the processor 802 performing feature extraction on the infrared face image to obtain the first facial feature map includes: inputting the infrared face image into the first branch of the liveness detection model for feature extraction to obtain the first facial feature map;
The processor 802 performing feature extraction on the color face image to obtain the second facial feature map includes:
inputting the color face image into the second branch of the liveness detection model for feature extraction to obtain the second facial feature map.
In a possible implementation, the liveness detection model further includes a category attribute classifier, at least two third branches, and a liveness detection classifier, where the second branch, the attribute classifier, and the at least two third branches are connected in sequence; the at least two third branches are independent of one another, and each of them corresponds to different category attribute information; the output of each third branch is concatenated with the output of the first branch, and the concatenated output serves as the input of the liveness detection classifier.
In a possible implementation, the features in the second facial feature map include one or at least two kinds of semantic information among material, texture, and gloss, and the processor 802 obtaining the target category attribute information of the detection object according to the second facial feature map includes:
inputting the second facial feature map into the attribute classifier to classify and predict the one or at least two kinds of semantic information through the attribute classifier, obtaining the target category attribute information, the target category attribute information including a location identifier;
Obtaining the third facial feature map according to the target category attribute information and the second facial feature map includes:
determining, from the at least two third branches, the third branch corresponding to the location identifier, and inputting the second facial feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third facial feature map.
Illustratively, the electronic device may include, but is not limited to, the transceiver 801, the processor 802, and the memory 803. Those skilled in the art can understand that this schematic is merely an example of an electronic device and does not constitute a limitation on the electronic device, which may include more or fewer components than shown, a combination of certain components, or different components.
It should be noted that, since the processor 802 of the electronic device implements the steps of the liveness detection method of the embodiments of the present disclosure when executing the computer program, the embodiments of the liveness detection method all apply to this electronic device and can achieve the same or similar beneficial effects.
Embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement some or all of the steps of any liveness detection method described in the above method embodiments. The computer-readable storage medium may store only the computer program corresponding to the liveness detection method.
A computer-readable storage medium may be a tangible device capable of retaining and storing instructions for use by an instruction execution device, and may be a volatile or non-volatile storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a magnetic disk, an optical disc, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM) or flash memory, a static random-access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
Embodiments of the present disclosure further provide a computer program including computer-readable code. When the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
Embodiments of the present disclosure further provide a computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any liveness detection method described in the above method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are all described as series of action combinations. Those skilled in the art should know, however, that the present disclosure is not limited by the described order of actions, because according to the present disclosure certain steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units, for instance, is only a division by logical function, and other divisions are possible in actual implementation; for example, at least two units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across at least two network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure.
The embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the descriptions of the above embodiments are intended only to help in understanding the methods and core ideas of the present disclosure. Meanwhile, those of ordinary skill in the art may, following the ideas of the present disclosure, make changes to the specific implementations and application scope. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (18)

  1. A liveness detection method, the method comprising:
    acquiring an infrared face image and a color face image of a detection object captured by a binocular camera;
    performing feature extraction on the infrared face image to obtain a first facial feature map, and performing feature extraction on the color face image to obtain a second facial feature map;
    obtaining target category attribute information of the detection object according to the second facial feature map;
    obtaining a third facial feature map according to the target category attribute information and the second facial feature map;
    obtaining a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
  2. The method according to claim 1, wherein obtaining the liveness detection result of the detection object according to the first facial feature map and the third facial feature map comprises:
    concatenating the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
    obtaining a degree of attention of each facial feature in the fourth facial feature map to obtain an attention matrix;
    multiplying the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
    classifying the first weighted feature map to obtain the liveness detection result of the detection object.
  3. The method according to claim 2, wherein obtaining the degree of attention of each facial feature in the fourth facial feature map to obtain the attention matrix comprises:
    using an attention model to generate an attention coefficient for each facial feature in the fourth facial feature map, and determining a matrix composed of the attention coefficients as a first attention matrix;
    performing key point detection on the color face image to obtain M key points of a preset region of interest, M being an integer greater than 1;
    determining a second attention matrix based on the fourth facial feature map and the M key points;
    adding the first attention matrix and the second attention matrix to obtain the attention matrix.
  4. The method according to claim 3, wherein the M key points further include coordinate information and category information of each key point, and determining the second attention matrix based on the fourth facial feature map and the M key points comprises:
    for the location of each facial feature in the fourth facial feature map, determining at least two pixel points corresponding to that location in the color face image, and obtaining the coordinate information of the at least two pixel points;
    for each of the at least two pixel points, calculating the distance between that pixel point and each of the M key points, using the coordinate information of the pixel point and the coordinate information of the M key points;
    assigning a weight to each pixel point based on the distance between the pixel point and each of the M key points and the category information of each key point, obtaining M reference weights for that pixel point;
    determining the average of the M reference weights as the weight of the pixel point;
    determining the weight of each facial feature based on the weight of each pixel point, and determining a matrix composed of the weights of the facial features as the second attention matrix.
  5. The method according to any one of claims 1 to 4, wherein performing feature extraction on the infrared face image to obtain the first facial feature map comprises:
    inputting the infrared face image into a first branch of a liveness detection model for feature extraction to obtain the first facial feature map;
    and performing feature extraction on the color face image to obtain the second facial feature map comprises:
    inputting the color face image into a second branch of the liveness detection model for feature extraction to obtain the second facial feature map.
  6. The method according to claim 5, wherein the liveness detection model further comprises a category attribute classifier, at least two third branches, and a liveness detection classifier, wherein the second branch, the attribute classifier, and the at least two third branches are connected in sequence; the at least two third branches are independent of one another, and each of the at least two third branches corresponds to different category attribute information; an output of each third branch is concatenated with an output of the first branch, and the concatenated output serves as an input of the liveness detection classifier.
  7. The method according to claim 6, wherein the features in the second facial feature map include one or at least two kinds of semantic information among material, texture, and gloss, and obtaining the target category attribute information of the detection object according to the second facial feature map comprises:
    inputting the second facial feature map into the attribute classifier, and classifying and predicting the one or at least two kinds of semantic information through the attribute classifier to obtain the target category attribute information, the target category attribute information including a location identifier;
    and obtaining the third facial feature map according to the target category attribute information and the second facial feature map comprises:
    determining, from the at least two third branches, a third branch corresponding to the location identifier, and inputting the second facial feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third facial feature map.
  8. A liveness detection apparatus, the apparatus comprising an acquisition unit and a processing unit;
    the acquisition unit being configured to acquire an infrared face image and a color face image of a detection object captured by a binocular camera;
    the processing unit being configured to perform feature extraction on the infrared face image to obtain a first facial feature map, and to perform feature extraction on the color face image to obtain a second facial feature map;
    the processing unit being further configured to obtain target category attribute information of the detection object according to the second facial feature map;
    the processing unit being further configured to obtain a third facial feature map according to the target category attribute information and the second facial feature map;
    the processing unit being further configured to obtain a liveness detection result of the detection object according to the first facial feature map and the third facial feature map.
  9. The apparatus according to claim 8, wherein the processing unit is further configured to:
    concatenate the first facial feature map and the third facial feature map to obtain a fourth facial feature map;
    obtain a degree of attention of each facial feature in the fourth facial feature map to obtain an attention matrix;
    multiply the fourth facial feature map by the attention matrix to obtain a first weighted feature map;
    classify the first weighted feature map to obtain the liveness detection result of the detection object.
  10. The apparatus according to claim 9, wherein the processing unit is further configured to:
    use an attention model to generate an attention coefficient for each facial feature in the fourth facial feature map, and determine a matrix composed of the attention coefficients as a first attention matrix;
    perform key point detection on the color face image to obtain M key points of a preset region of interest, M being an integer greater than 1;
    determine a second attention matrix based on the fourth facial feature map and the M key points;
    add the first attention matrix and the second attention matrix to obtain the attention matrix.
  11. The apparatus according to claim 10, wherein the M key points further include coordinate information and category information of each key point, and the processing unit is further configured to:
    for the location of each facial feature in the fourth facial feature map, determine at least two pixel points corresponding to that location in the color face image, and obtain the coordinate information of the at least two pixel points;
    for each of the at least two pixel points, calculate the distance between that pixel point and each of the M key points, using the coordinate information of the pixel point and the coordinate information of the M key points;
    assign a weight to each pixel point based on the distance between the pixel point and each of the M key points and the category information of each key point, obtaining M reference weights for that pixel point;
    determine the average of the M reference weights as the weight of the pixel point;
    determine the weight of each facial feature based on the weight of each pixel point, and determine a matrix composed of the weights of the facial features as the second attention matrix.
  12. The apparatus according to any one of claims 8 to 11, wherein the processing unit is further configured to:
    input the infrared face image into a first branch of a liveness detection model for feature extraction to obtain the first facial feature map;
    wherein performing feature extraction on the color face image to obtain the second facial feature map includes:
    inputting the color face image into a second branch of the liveness detection model for feature extraction to obtain the second facial feature map.
  13. The apparatus according to claim 12, wherein the liveness detection model further comprises a category attribute classifier, at least two third branches, and a liveness detection classifier, wherein the second branch, the attribute classifier, and the at least two third branches are connected in sequence; the at least two third branches are independent of one another, and each of the at least two third branches corresponds to different category attribute information; an output of each third branch is concatenated with an output of the first branch, and the concatenated output serves as an input of the liveness detection classifier.
  14. The apparatus according to claim 13, wherein the features in the second facial feature map include one or at least two kinds of semantic information among material, texture, and gloss, and the processing unit is further configured to:
    input the second facial feature map into the attribute classifier, and classify and predict the one or at least two kinds of semantic information through the attribute classifier to obtain the target category attribute information, the target category attribute information including a location identifier;
    determine, from the at least two third branches, the third branch corresponding to the location identifier, and input the second facial feature map into the third branch corresponding to the location identifier for feature extraction to obtain the third facial feature map.
  15. An electronic device, comprising a processor connected to a memory, the memory being configured to store a computer program, and the processor being configured to execute the computer program stored in the memory to cause the electronic device to perform the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the method according to any one of claims 1 to 7.
  17. A computer program comprising computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes steps for implementing the method according to any one of claims 1 to 7.
  18. A computer program product configured to store computer-readable instructions which, when executed, cause a computer to perform the method according to any one of claims 1 to 7.
PCT/CN2022/110261 2022-03-22 2022-08-04 Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product WO2023178906A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210283792.2 2022-03-22
CN202210283792.2A CN114677730A (en) 2022-03-22 2022-03-22 Living body detection method, living body detection device, electronic apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2023178906A1 (en)

Family

ID=82074801

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110261 WO2023178906A1 (en) 2022-03-22 2022-08-04 Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product

Country Status (2)

Country Link
CN (1) CN114677730A (en)
WO (1) WO2023178906A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677730A (en) * 2022-03-22 2022-06-28 北京市商汤科技开发有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN116363762A (en) * 2022-12-23 2023-06-30 北京百度网讯科技有限公司 Living body detection method, training method and device of deep learning model
CN116453194B (en) * 2023-04-21 2024-04-12 无锡车联天下信息技术有限公司 Face attribute discriminating method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975908A (en) * 2016-04-26 2016-09-28 汉柏科技有限公司 Face recognition method and device thereof
WO2020134858A1 (en) * 2018-12-29 2020-07-02 北京市商汤科技开发有限公司 Facial attribute recognition method and apparatus, electronic device, and storage medium
CN111401134A (en) * 2020-02-19 2020-07-10 北京三快在线科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112818722A (en) * 2019-11-15 2021-05-18 上海大学 Modular dynamically configurable living body face recognition system
CN113449623A (en) * 2021-06-21 2021-09-28 浙江康旭科技有限公司 Light living body detection method based on deep learning
CN114677730A (en) * 2022-03-22 2022-06-28 北京市商汤科技开发有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium

Also Published As

Publication number Publication date
CN114677730A (en) 2022-06-28

Similar Documents

Publication Publication Date Title
WO2023178906A1 (en) Liveness detection method and apparatus, and electronic device, storage medium, computer program and computer program product
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN111062871A (en) Image processing method and device, computer equipment and readable storage medium
EP4137991A1 (en) Pedestrian re-identification method and device
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
WO2020024484A1 (en) Method and device for outputting data
WO2023098128A1 (en) Living body detection method and apparatus, and training method and apparatus for living body detection system
WO2018196718A1 (en) Image disambiguation method and device, storage medium, and electronic device
US20200285859A1 (en) Video summary generation method and apparatus, electronic device, and computer storage medium
CN111062328B (en) Image processing method and device and intelligent robot
CN112188306B (en) Label generation method, device, equipment and storage medium
CN112446322B (en) Eyeball characteristic detection method, device, equipment and computer readable storage medium
CN112036284B (en) Image processing method, device, equipment and storage medium
WO2023173646A1 (en) Expression recognition method and apparatus
WO2021127916A1 (en) Facial emotion recognition method, smart device and computer-readabel storage medium
WO2021134485A1 (en) Method and device for scoring video, storage medium and electronic device
CN113723164A (en) Method, device and equipment for acquiring edge difference information and storage medium
CN111259698A (en) Method and device for acquiring image
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN111274946B (en) Face recognition method, system and equipment
CN114067394A (en) Face living body detection method and device, electronic equipment and storage medium
CN116152938A (en) Method, device and equipment for training identity recognition model and transferring electronic resources
CN115223214A (en) Identification method of synthetic mouth-shaped face, model acquisition method, device and equipment
CN112070744A (en) Face recognition method, system, device and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932958

Country of ref document: EP

Kind code of ref document: A1