WO2023000792A1

WO2023000792A1 - Methods and apparatuses for constructing living body identification model and for living body identification, device and medium

Info

Publication number: WO2023000792A1
Application number: PCT/CN2022/093514
Authority: WO
Inventors: 俞颖超; 周秋生
Original assignee: 京东科技控股股份有限公司
Priority date: 2021-07-22
Filing date: 2022-05-18
Publication date: 2023-01-26
Also published as: CN115690918A

Abstract

The present disclosure relates to methods and apparatuses for constructing a living body identification model and for living body identification, a device and a medium. The method for constructing the living body identification model comprises: acquiring image data obtained by photographing a target subject, the target subject comprising: a living subject and non-living objects carried by multiple types of physical media; making image data of the living subject correspond to a first tag which represents a living body category; on the basis of type differences of the physical media, making image data of the non-living objects correspond to multiple types of second tags which represent non-living-body categories; inputting the image data into a machine learning model for training; and performing multi-class training on the machine learning model on the basis of the first tag and the multiple types of second tags so as to obtain a living body identification model.

Description

Method, device, equipment and medium for constructing living body recognition model and living body recognition

Cross References to Related Applications

This disclosure requires the application for an invention patent with the application number 202110833025.X and the name "method, device, equipment and medium for constructing a living body recognition model and living body recognition" submitted to the State Intellectual Property Office of the People's Republic of China on July 22, 2021. priority, and is hereby incorporated by reference in its entirety.

technical field

The present disclosure relates to the field of computer technology, and in particular to a method, device, device and medium for constructing a living body recognition model and living body recognition.

Background technique

With the development of artificial intelligence technology, human facial features are used as a way to lock and unlock the screen. By setting up the face recognition system, the face can be entered and recognized on the smart device, so that the user can unlock the smart device based on the facial features. However, in the fields of face payment, face security inspection, and video surveillance, in order to improve the security of face recognition, it is necessary for the face recognition system to be able to distinguish between real faces and some forged faces carrying face information, so as to avoid The recognition system suffers malicious attacks and causes a series of losses.

In the process of realizing the concept of the present disclosure, it is found that there are at least the following technical problems in related technologies: In the existing face anti-counterfeiting solutions, the anti-counterfeiting task is often modeled as a binary classification problem. Attacks are regarded as one type. This binary classification modeling method regards various attacks as one type. There are many feature modes that need to be learned in this mixed attack type, which makes the machine learning process of mixed attack types very difficult. Complicated, machine learning results are poor.

public content

In order to solve the above technical problems or at least partly solve the above technical problems, embodiments of the present disclosure provide a method, device, device and medium for constructing a living body recognition model and living body recognition.

In the first aspect, the embodiments of the present disclosure provide a method for constructing a living body recognition model. The above-mentioned method for constructing a living body recognition model includes: acquiring image data obtained by shooting a target object, the above-mentioned target object including: a living body object and non-living body objects carried by multiple types of physical media; The first label; based on the type difference of the physical medium, the image data of the above-mentioned non-living object corresponds to multiple second labels representing the non-living category; the above-mentioned image data is input into the machine learning model for training; and based on the above The first label and multiple types of the above-mentioned second labels are used to perform multi-classification training on the above-mentioned machine learning model to obtain a living body recognition model.

According to an embodiment of the present disclosure, the above-mentioned multi-classification training is performed on the above-mentioned machine learning model based on the above-mentioned first label and multiple types of the above-mentioned second labels to obtain a living body recognition model, including: in each round of training of the above-mentioned machine learning model, For the input current image data, output the respective probability values that the above current image data belong to the living body category and belong to the non-living body category corresponding to each type of physical medium in the above-mentioned multiple types of physical media; A target loss function, the above-mentioned target loss function is used to characterize the degree of deviation between the predicted category of the above-mentioned current image data and the category corresponding to the label of the above-mentioned current image data; and when the degree of convergence of the above-mentioned target loss function meets the set value Stop the training and get the trained living body recognition model.

According to an embodiment of the present disclosure, the above target loss function is a weighted sum of a cross-entropy loss function and a ternary center loss function.

According to an embodiment of the present disclosure, the above-mentioned cross-entropy loss function is used as the main loss function, the above-mentioned ternary center loss function is used as the auxiliary loss function, and the above-mentioned target loss function is the sum of the product of the above-mentioned auxiliary loss function and the weight coefficient and the above-mentioned main loss function , the value of the above weight coefficient is between 0 and 1 and can ensure the convergence of the above target loss function.

According to an embodiment of the present disclosure, the above-mentioned based on the type difference of the physical medium, corresponding the image data of the above-mentioned non-living object to multiple types of second labels representing the category of the non-living body, includes: based on the difference of the attribute type of the physical medium, the above-mentioned physical medium Divided into a plurality of main categories; based on the difference of at least one of the shape and material of the physical medium, the physical medium under each main category is subdivided to obtain a subdivision category; wherein, the above main category and the above subdivision category belong to non-living object category; for each image data of the non-living object, determine the target main category or target sub-category corresponding to the physical medium of the current non-living object; or the second tab of the target segment above.

According to an embodiment of the present disclosure, the above-mentioned main categories include: paper media, screen media, and material media for three-dimensional models; Or more: plain paper, curved paper, cut paper, button hole paper, normal photo, curved photo, cropped photo, button hole photo; according to the type difference of the above screen media, the above screen media are divided into the following Two or more types of subdivided categories: desktop screen, tablet computer screen, mobile phone screen, laptop computer screen; according to the material difference of the material medium for the above three-dimensional model, the above three-dimensional model material medium is divided into the following subcategories Two or more of: plaster models, wooden models, metal models, plastic models.

In a second aspect, embodiments of the present disclosure provide a method for living body identification. The above method for living body recognition includes: acquiring image data to be detected, wherein the image data to be detected contains objects to be identified; inputting the image data to be detected into a living body recognition model to output classification results of the objects to be identified is the type of physical medium corresponding to the living body category or the non-living body category; wherein, the above-mentioned living body recognition model is constructed by the above-mentioned method for constructing a living body recognition model.

In a third aspect, embodiments of the present disclosure provide an apparatus for constructing a living body recognition model. The above-mentioned device for building a living body recognition model includes: a first data acquisition module, a tag association module, an input module and a training module. The above-mentioned first data acquisition module is configured to acquire the image data of the target object obtained by shooting, and the above-mentioned target object includes: living objects and non-living objects carried by various types of physical media. The label association module is configured to correspond the image data of the above-mentioned living objects to the first label representing the category of living objects; Second tab. The above-mentioned input module is configured to input the above-mentioned image data into the machine learning model for training. The above-mentioned training module is configured to perform multi-classification training on the above-mentioned machine learning model based on the above-mentioned first label and multiple types of the above-mentioned second labels, so as to obtain a living body recognition model.

In a fourth aspect, embodiments of the present disclosure provide a device for living body identification. The above-mentioned device for living body identification includes: a second data acquisition module and an identification module. The second data acquisition module is configured to acquire image data to be detected, and the image data to be detected includes an object to be identified. The recognition module is configured to input the image data to be detected into the living body recognition model, so as to output the classification result of the object to be recognized as the living body category or the physical medium type corresponding to the non-living body category. Wherein, the above-mentioned living body recognition model is constructed by the above-mentioned method for constructing a living body recognition model or constructed by the above-mentioned device for constructing a living body recognition model.

In a fifth aspect, embodiments of the present disclosure provide an electronic device. The above-mentioned electronic equipment includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus; the memory is used to store computer programs; the processor is used to execute all programs on the memory. The stored program realizes the above-mentioned method of constructing a living body recognition model or a method of living body recognition.

In a sixth aspect, embodiments of the present disclosure provide a computer-readable storage medium. A computer program is stored on the above-mentioned computer-readable storage medium, and when the above-mentioned computer program is executed by a processor, the above-mentioned method for constructing a living body recognition model or a method for living body recognition is realized.

Some technical solutions provided by embodiments of the present disclosure have some or all of the following advantages:

Based on the type difference of the physical medium, the image data of the non-living object corresponds to the multiple second labels representing the category of non-living objects. The second label is used for multi-category learning. The learning of each attack category only needs to focus on a smaller number of features, the task is simpler, the machine learning is easier and more efficient, and the living body recognition model obtained after training is suitable for living objects and non-living objects. Objects are well differentiated.

Combining the multi-category training process with more than three categories and the target loss function composed of the weighted sum of the cross-entropy loss function and the ternary center loss function makes the overall training process more efficient, fast and has a good convergence effect. Among them, based on the setting of the main loss function L _ce , it is ensured that the corresponding output of the image data sample input to the machine learning model is as close as possible to the corresponding category of the real label; based on the setting of the auxiliary loss function L _tc , in the three categories In the above multi-category training scenario, the characteristic distance between the input image data sample and the center point of the current category in the training process tends to decrease, and at the same time, the characteristic distance between the input image data sample and other category center points decreases. The minimum value tends to increase, which effectively promotes the reduction of the intra-class distance and the simultaneous increase of the inter-class distance, speeds up the convergence speed of training, and improves the effect of aggregation between similar classes and distinction between different classes.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the following will briefly introduce the accompanying drawings that need to be used in the description of the embodiments or related technologies. Obviously, for those of ordinary skill in the art , on the premise of not paying creative labor, other drawings can also be obtained based on these drawings.

FIG. 1 schematically shows the system architecture of the method and device for constructing a living body recognition model applicable to an embodiment of the present disclosure;

FIG. 2 schematically shows a flowchart of a method for constructing a living body recognition model according to an embodiment of the present disclosure;

FIG. 3 schematically shows a detailed implementation flowchart of operation S203 according to an embodiment of the present disclosure;

FIG. 4 schematically shows a detailed implementation flowchart of operation S205 according to an embodiment of the present disclosure;

Fig. 5 schematically shows a schematic diagram of the implementation process of constructing a living body recognition model according to an embodiment of the present disclosure;

Figure 6 schematically shows the visual features of the trained model on the test set using the Cross Entropy Loss function (Cross Entropy Loss) as the target loss function;

Figure 7 schematically shows the visual features of the trained living body recognition model on the test set using the weighted sum of the cross-entropy loss function (Cross Entropy Loss) and the triplet-center loss function (Triplet-Center Loss) as the target loss function ;

FIG. 8 schematically shows a flow chart of a method for living body recognition according to an embodiment of the present disclosure;

Fig. 9 schematically shows a structural block diagram of a device for constructing a living body recognition model according to an embodiment of the present disclosure;

Fig. 10 schematically shows a structural block diagram of a device for living body recognition according to an embodiment of the present disclosure; and

Fig. 11 schematically shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.

detailed description

Embodiments of the present disclosure provide a method, device, device and medium for constructing a living body recognition model and living body recognition. The above-mentioned method for constructing a living body recognition model includes: acquiring image data obtained by shooting a target object, the above-mentioned target object including: a living body object and non-living body objects carried by multiple types of physical media; The first label; based on the type difference of the physical medium, the image data of the above-mentioned non-living object corresponds to multiple second labels representing the non-living category; the above-mentioned image data is input into the machine learning model for training; and based on the above The first label and multiple types of the above-mentioned second labels are used to perform multi-classification training on the above-mentioned machine learning model (corresponding to at least 3 categories, one category of living objects, and at least 2 categories of non-living objects carried by physical media), to obtain a living body recognition model. .

A result of the above-mentioned living body recognition model classifying the above-mentioned image data is: a living body category, or a non-living body category corresponding to one type of physical medium among the above-mentioned multiple types of physical media.

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments It is a part of embodiments of the present disclosure, but not all embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present disclosure.

Fig. 1 schematically shows the system architecture of the method and device for constructing a living body recognition model applicable to the embodiments of the present disclosure.

Referring to FIG. 1 , a system architecture 100 applicable to the method and device for constructing a living body recognition model according to an embodiment of the present disclosure includes:

terminal devices

101 , 102 , 103 , a network 104 and a server 105 . The network 104 is used as a medium for providing communication links between the

terminal devices

101 , 102 , 103 and the server 105 . Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.

Users can use

terminal devices

101 , 102 , 103 to interact with server 105 via network 104 to receive or send messages and the like. The

terminal devices

101, 102, 103 may be installed with an image capture device, a picture/video playing application, and the like. Other communication client applications may also be installed, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, etc. (just examples).

The

terminal devices

101, 102, 103 can be display screens and various electronic devices that support picture/video playback. The electronic devices can further include image capture devices. For example, electronic devices include but are not limited to smart phones, tablet computers, notebook computers, Desktop computers, self-driving cars, surveillance equipment, and more.

The server 105 may be a server that provides various services, such as a background management server that provides service support for data processing of images or videos captured by users using the

terminal devices

101 , 102 , and 103 (just an example). The background management server can analyze and process the received data such as image/video processing requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal device. For example, the data processing can be to perform face recognition processing on the video frames in the images or videos captured by the

terminal devices

101, 102, 103, so as to obtain whether the faces in the above images or video frames are real faces or other types of faces. Fake face.

It should be noted that the method for constructing a living body recognition model provided by the embodiments of the present disclosure may generally be executed by the server 105 or a terminal device with certain computing capabilities. Correspondingly, the apparatus for constructing a living body recognition model provided by the embodiments of the present disclosure may generally be set in the server 105 or the above-mentioned terminal devices with certain computing capabilities. The method for constructing a living body recognition model provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and can communicate with the

terminal devices

101 , 102 , 103 and/or the server 105 . Correspondingly, the apparatus for constructing a living body recognition model provided by the embodiments of the present disclosure may also be set in a server or a server cluster that is different from the server 105 and can communicate with the

terminal devices

101 , 102 , 103 and/or the server 105 .

It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.

The first exemplary embodiment of the present disclosure provides a method of constructing a living body recognition model.

Fig. 2 schematically shows a flowchart of a method for constructing a living body recognition model according to an embodiment of the present disclosure.

Referring to FIG. 2 , the method for constructing a living body recognition model provided by an embodiment of the present disclosure includes the following operations: S201 , S202 , S203 , S204 and S205 . The above operations S201-S205 may be performed by a terminal device equipped with an image capture device, or by a server.

In operation S201, image data of a target object captured by shooting is acquired, and the target object includes: living objects and non-living objects carried by various types of physical media.

In operation S202, the image data of the above-mentioned living body object is corresponded to the first label representing the living body category.

In operation S203, based on the type difference of the physical medium, the image data of the above-mentioned non-living object is corresponded to multiple types of second labels representing the category of non-living objects.

In operation S204, the above image data is input into the machine learning model for training.

In operation S205, multi-classification training is performed on the machine learning model based on the first label and multiple types of the second label, so as to obtain a living body recognition model.

In the above operation S201, the above-mentioned living object is a real object, such as a real human part, such as a human face. The non-living objects carried by the above-mentioned physical medium may be: human faces on photos, human faces on A4 paper, human faces on screens (such as human faces on mobile phone screens), human faces corresponding to statues, etc. Here we take the human face as an example of a living object. In other application scenarios, the living object can be other real animals, such as cats, dogs, birds, etc. The non-living objects carried by physical media are: cats/dogs/ Birds, cats/dogs/birds on A4 paper, cats/dogs/birds on screen, cats/dogs/birds corresponding to statues, etc.

The way to obtain the image data obtained by shooting the target object can be directly shooting the target object by the terminal device to obtain the image data obtained by shooting the target object; Or the image data corresponding to the photo and video frame is obtained from the image and video database captured by the monitoring device).

In the above operation S202, for example, the first label is denoted as 0, and the image data of the living object is corresponding (also referred to as associated) with the label 0, and the label 0 indicates that the real classification of the living object is the living category. It should be understood that the number 0 of the above label is used as an example, and may also be defined as other numbers, as long as the number corresponds to the meaning of the representation.

In the above operation S203, according to the difference of the physical medium carried by the non-living object, the image data of the non-living object may be corresponding to multiple different category tags. Exemplarily, non-living objects are classified according to differences in physical media: ordinary paper, curved paper, cut paper, buttonhole paper, desktop screen, tablet computer screen, mobile phone screen, laptop computer screen, plaster model , wooden model, metal model, plastic model these 12 non-living categories, then the corresponding multi-category second labels can be expressed as: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Corresponding to each of the above types in turn, label 1 indicates that the real classification of non-living objects is non-living objects corresponding to ordinary paper. The real classification of living objects is non-living objects corresponding to bent paper, non-living objects corresponding to cut paper, ..., non-living objects corresponding to metal models, and non-living objects corresponding to plastic models.

In the above operation S204, the machine learning model may be a convolutional neural network, or other types of deep learning networks or other machine learning models.

In the above operation S205, according to the above label examples, multi-classification training may be performed on the machine learning model based on the labels 0, 1, 2, . . . , 11, 12, so as to obtain a living body recognition model. In the multi-classification training here, it corresponds to at least 3 categories, the living object corresponds to one category, and according to the difference in the type of physical medium, the non-living objects carried by the physical medium correspond to at least 2 categories.

In the embodiments of the present disclosure, considering that the attacker can present the legal user's face to the face recognition system based on physical media such as paper, screen, plaster, etc., when the model recognizes attacks based on physical media, for different physical media Media model identification relies on different features. For example, for screen-based attacks, the model mainly relies on the moiré features generated when the screen is presented, while for paper-based attacks, the model mainly relies on the unique fiber texture of paper. Features such as color gamut changes are used for identification. Therefore, based on the above-mentioned operations S201-S205, the image data of non-living objects corresponds to multiple second labels representing non-living categories based on the type difference of the physical medium, and when training the machine learning model, according to the The first label and the multi-category second label of non-living objects are used for multi-classification learning. For the learning of each attack category, only a small number of essential features need to be focused. The task is simpler, and machine learning is easier and more efficient.

Fig. 3 schematically shows a detailed implementation flowchart of operation S203 according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, referring to FIG. 3 , the operation S203 of associating the image data of the above-mentioned non-living object with multiple types of second labels representing the category of non-living objects based on the type difference of the physical medium includes the following sub-operations: S2031, S2032, S2033, and S2034.

In operation S2031, the above-mentioned physical media is divided into a plurality of main categories based on the difference of attribute types of the physical media.

In operation S2032, based on the difference of at least one of the shape and material of the physical medium, the physical medium under each main category is subdivided to obtain subdivided categories. Among them, the above-mentioned main category and the above-mentioned sub-category belong to the non-living category.

In operation S2033, for each image data of the non-living object, determine a target main category or a target sub-category corresponding to the physical medium of the current non-living object.

In operation S2034, the image data of the current non-living object is corresponded to the second label representing the above-mentioned target main category or the above-mentioned target sub-category.

According to an embodiment of the present disclosure, the above-mentioned main categories include: paper media, screen media, and material media for three-dimensional models.

According to the difference in material and shape of the above-mentioned paper-based media, the above-mentioned paper-based media are divided into two or more of the following sub-categories: plain paper, curved paper, cut-out paper, button-hole paper, plain photo, Bend photos, crop photos, buttonhole photos.

According to the differences in types of the above-mentioned screen media, the above-mentioned screen media can be divided into two or more of the following subcategories: desktop screens, tablet computer screens, mobile phone screens, and laptop computer screens.

According to the material difference of the material medium for the three-dimensional model, the above-mentioned material medium for the three-dimensional model is divided into two or more of the following subcategories: plaster model, wooden model, metal model, plastic model.

Fig. 4 schematically shows a detailed implementation flowchart of operation S205 according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, as shown in FIG. 4 , the above-mentioned operation S205 of performing multi-classification training on the above-mentioned machine learning model based on the above-mentioned first label and multiple types of above-mentioned second labels to obtain a living body recognition model includes the following sub-operations: S2051, S2052, and S2053.

In sub-operation S2051, in each round of training of the above-mentioned machine learning model, for the input current image data, output and obtain that the above-mentioned current image data respectively belong to the living body category and belong to the non-living body category corresponding to each type of physical medium in the above-mentioned multiple types of physical media each probability value of .

In sub-operation S2052, a target loss function for the current image data is determined according to the respective probability values, and the target loss function is used to characterize the degree of deviation between the predicted category of the current image data and the category corresponding to the label of the current image data.

In sub-operation S2053, the training is stopped when the convergence degree of the target loss function meets the set value, and a trained living body recognition model is obtained.

Among them, the Triplet-Center Loss function (Triplet-Center Loss) combines the advantages of Triplet Loss (Triplet Loss) and Center Loss (Center Loss). Triplet Loss makes the sample features of the same class as close as possible during the learning process. , the sample features of different categories are as far away as possible to achieve the effect of increasing the separability between categories. The center loss first provides a class center for each category. During the model learning process, the distance between the sample and the corresponding category center is minimized to reduce the intra-class variance and make the intra-class features more compact. The ternary center loss function can be Increasing the distance between classes can reduce the variance within classes.

Fig. 5 schematically shows a schematic diagram of an implementation process of constructing a living body recognition model according to an embodiment of the present disclosure.

Referring to FIG. 5 , the process of building a living body recognition model is illustrated. In this embodiment, the target object includes: a real human face, a human face carried by ordinary paper, a human face carried by curved paper, a human face carried by a tablet computer screen, a human face carried by a mobile phone screen, and a human face carried by a plaster model and the human face carried by the metal model, the captured image data of these target objects are obtained, and the image data 0-6 are respectively used to correspond to the image data of the above-mentioned target objects. A large number of image data samples acquired are divided into training set and test set, and the image data samples in the training set are input into the machine learning model for multi-classification training. During training, the features of each image data can be extracted through the weight-sharing convolutional neural network, correspondingly expressed as features 0 to 6, and the target loss function is determined based on the labels of each input image data sample, where the target loss function is cross The weighted sum of the entropy loss function (Cross Entropy Loss, CE Loss) and the ternary center loss function. After many times of training, if the convergence degree of the target loss function meets the set value, then the corresponding training is the living body recognition model. The living body recognition model can process an image data containing an object to be recognized randomly input in the test set, and the classification result is obtained: the living body category or the corresponding medium type of the non-living body category is: ordinary paper, curved paper, Tablet screens, mobile phone screens, plaster or metal models.

According to the embodiments of the present disclosure, the accuracy of the living body recognition model can be tested based on the test set, and the parameters of the living body recognition model can be adjusted according to the test set, so that the application scenarios of the living body recognition model can be generalized.

In the embodiment of the present disclosure, in the training process of the machine learning model, by using the target loss function composed of the weighted sum of the cross-entropy loss function and the ternary center loss function, the cross-entropy loss function is used as the main loss function, and the ternary The central loss function is used as an auxiliary loss function (corresponding to multiply by the previous weight coefficient α in the subsequent formula (3)). Based on the setting of the main loss function, it is ensured that the predicted category corresponding to the output of the image data sample input to the machine learning model is as close as possible to the category corresponding to the real label; based on the setting of the auxiliary loss function, in multi-category training scenarios with more than three categories It can effectively promote the reduction of intra-class distance and the simultaneous increase of inter-class distance.

In order to verify the effect of the target loss function of the embodiment of the present disclosure, the model obtained by using the weighted sum of the cross-entropy loss function and the ternary center loss function as the target loss function for training is also compared with the model obtained by only using the cross-entropy loss function for training. The model, the test results corresponding to these two loss functions.

Figure 6 schematically shows the visual features of the trained model on the test set using the cross entropy loss function (Cross Entropy Loss) as the target loss function; Figure 7 schematically shows the cross entropy loss function (Cross Entropy Loss) ) and the triplet-center loss function (Triplet-Center Loss) as the target loss function, the visual features of the trained living body recognition model on the test set.

With reference to Fig. 6 and Fig. 7, shown in the part circled by the dotted line frame, the circled part is represented as a real person feature (corresponding to a living body category), and other points in the area outside the circled part represent non-real human face features, Corresponds to the attack features in the face anti-counterfeiting technology (corresponding to the non-living category). By comparing Figure 6 and Figure 7, it can be seen that the liveness recognition model corresponding to Figure 7 has a stronger separability between real-person features and attack features, while the model corresponding to Figure 6 has a weaker separability between real-person features surrounded by attack features, which shows that Adding the triplet-center loss function Triplet-Center Loss on the basis of the cross-entropy loss function Cross Entropy Loss can improve the model's ability to distinguish between attacks and real people, which proves that the living body recognition model obtained by training with the target loss function provided by the embodiment of the present disclosure It has a good degree of discrimination for living objects and non-living objects.

In addition, compared with the two-category training scenario in the prior art, the three-category (including living body type and at least two other non-living body types) model training process proposed by the embodiments of the present disclosure requires only It is necessary to focus on fewer and essential features, which not only achieves the focus of features; but also combines with the target loss function composed of the weighted sum of the cross-entropy loss function and the ternary center loss function, making the overall training process more efficient. Fast and has a good convergence effect.

The following describes the expression of the target loss function in combination with specific examples.

Suppose the training data is

Wherein N represents the total number of samples, ^xi is the input image data sample, y ⁱ is the actual/real label corresponding to ^xi , in one embodiment, the above label y ⁱ ∈ {0,1,2,3,... ,9,10,11,12} as an example; y ⁱ =0 represents the living body type, and other values 1 to 12 represent different attack types, including: ordinary paper, bent paper, cut paper, buttonhole paper, Desktop screens, tablet screens, mobile phone screens, laptop screens, plaster models, wooden models, metal models, and plastic models correspond to non-living types.

In this embodiment, a convolutional neural network (CNN) whose machine learning model includes weight sharing is used as an example. For each image data sample x ⁱ , after image feature extraction by weight-sharing CNN network f, a fixed-dimensional feature f( ^xi ) is output, and f( ^xi ) is abbreviated as f ⁱ .

Assuming that the number of samples participating in each iteration in training is M, and M<N, then the ternary center loss function corresponding to M samples participating in each iteration is:

in,

Indicates the center point of the current category of the real label y ⁱ corresponding to the input image data sample x ⁱ ; f ⁱ is the feature extracted by the CNN network; c _j and j≠y ⁱ ^represent other The center point of the category; m is the preset hyperparameter of the ternary loss;

for f ⁱ and

The Euclidean distance of is used to characterize the feature distance between the input image data sample x ⁱ and the center point of the current category;

The minimum value of the feature distance between the image data sample x ⁱ used to characterize the input and the center points of other categories.

In the above formula (1), the purpose of setting the preset hyperparameter m is to increase the distance between classes, and the specific value can be optimized in advance. Through multiple trainings, the target loss function composed of the ternary center loss function L _tc and the weighted cross-entropy loss function converges to a preset level. In order to achieve the convergence of the ternary center loss function, the parameters of the training model make

The corresponding intra-class distance decreases,

The corresponding inter-class distance increases.

The target loss function is the weighted sum of the cross-entropy loss function L _ce and the ternary center loss function L _tc , and the target loss function is expressed as L, then L satisfies the following expression:

L=L _ce +αL _tc (3),

in,

is the probability (or score) of the image data sample x ⁱ identified as y ⁱ obtained after passing through the CNN network; α is the weight coefficient of the ternary center loss function, and the value of α is: 0<α< 1 and can guarantee the convergence of the target loss function. According to the actual experimental results, on the premise of ensuring the convergence of the target loss function, the value of α can be as large as possible to improve the training speed.

in,

satisfy the following expression:

in,

Indicates the weight of f ⁱ ,

Indicates the bias. The value of j here is the value corresponding to each category of the classification result. Here, the category corresponding to labels 0-12 is used as an example.

The target loss function provided by the embodiments of the present disclosure, which is composed of the weighted sum of the cross-entropy loss function and the ternary center loss function, can well match the multi-classification training process of more than three classifications. By using the cross-entropy loss function as the main loss function and the ternary center loss function as the auxiliary loss function, refer to formula (3). Based on the setting of the main loss function L _ce , it is ensured that the corresponding output of the image data sample input to the machine learning model is as close as possible to the corresponding category of the real label; based on the setting of the auxiliary loss function L _tc , in more than three categories In the multi-category training scenario, the feature distance between the input image data sample and the current category center point during the training process tends to decrease, and at the same time, the minimum value of the feature distance between the input image data sample and other category center points It shows an increasing trend, which effectively promotes the reduction of the intra-class distance and the simultaneous increase of the inter-class distance, speeds up the convergence speed of training, and improves the effect of aggregation between similar classes and distinction between different classes.

In comparison, the target loss function disclosed in this disclosure does not have a good adaptability to the scenario of binary classification, because the intra-class distance of binary classification is larger than that of multi-classification, and various types of attacks act as a large class, resulting in a large intra-class distance, not easy to aggregate, and the convergence speed is very slow.

Specifically, referring to the formula (1) in

In terms of this item, due to the large fluctuations in the center point corresponding to the non-living body type in the binary classification, the data corresponding to the non-living body type is not easy to aggregate in the class during the training process, so that the input image data samples are consistent with The minimum value of the feature distance between other category center points (there is only one category center point in the binary classification scenario, and it is unstable) cannot be increased in a regular manner, resulting in that the distance between classes is not easy to separate, and the convergence speed is very slow. The idea of combining the multi-classification training with more than three classifications and the target loss function in the weighted form of the cross-entropy loss function and the ternary center loss function proposed by the embodiments of the present disclosure is original and has excellent effects.

A second exemplary embodiment of the present disclosure provides a method of living body recognition.

Fig. 8 schematically shows a flow chart of a method for living body recognition according to an embodiment of the present disclosure.

Referring to FIG. 8 , the living body recognition method provided by the embodiment of the present disclosure includes the following operations: S801 and S802.

In operation S801, image data to be detected is acquired, and the image data to be detected includes an object to be recognized.

The image data to be detected can be image data containing objects to be identified in various types of application scenarios, for example, in the scenario of facial recognition punching of a facial recognition attendance machine, or in the scenario of personal smart device security verification. The image data to be detected may be: image data taken by a real user in the surrounding background, or image data taken by an illegal user in the surrounding background by taking a face photo or A4 paper with a human face printed on it.

In operation S802, the above-mentioned image data to be detected is input into the living body recognition model, so as to output the classification result of the above-mentioned object to be recognized as a living body type or a physical medium type corresponding to a non-living body type.

Through the living body recognition model, feature extraction and recognition can be performed on the image to be recognized in the input image data to be detected, and it can be identified whether the classification result of the object to be recognized is the living body category or the physical medium type determined in the non-living body category.

Wherein, the above-mentioned living body recognition model is constructed by the method for constructing a living body recognition model described in the first embodiment.

Since the living body recognition model has a good degree of discrimination between living objects and non-living objects, multi-category learning is performed according to the first label of living objects and the multi-category second labels of non-living objects. For the learning of each attack category, it only needs to focus on With fewer and essential features, the task is simpler, machine learning is easier and more efficient, and it can quickly extract and classify the feature information of the object to be recognized in the image data to be detected, and has a high efficiency and high recognition accuracy.

A third exemplary embodiment of the present disclosure provides an apparatus for constructing a living body recognition model.

Fig. 9 schematically shows a structural block diagram of an apparatus for constructing a living body recognition model according to an embodiment of the present disclosure.

Referring to FIG. 9 , an apparatus 900 for building a living body recognition model provided by an embodiment of the present disclosure includes: a first data acquisition module 901 , a tag association module 902 , an input module 903 and a training module 904 .

The above-mentioned first data acquisition module 901 is configured to acquire the image data of the target object obtained by shooting, and the above-mentioned target object includes: living objects and non-living objects carried by various types of physical media.

The tag association module 902 is configured to associate the image data of the living object with the first tag representing the living category; class second label. The tag association module 902 includes functional modules or sub-modules for implementing the above-mentioned sub-operations S2031-S2034.

The above-mentioned input module 903 is configured to input the above-mentioned image data into the machine learning model for training.

The above-mentioned training module 904 is configured to perform multi-classification training on the above-mentioned machine learning model based on the above-mentioned first label and multiple types of the above-mentioned second labels, so as to obtain a living body recognition model.

The result of the above-mentioned living body recognition model classifying the above-mentioned image data is: a living body category, or a non-living body category corresponding to one type of physical medium among the above-mentioned multiple types of physical media.

The above-mentioned training module 904 includes functional modules or sub-modules for implementing the above-mentioned sub-operations S2051-S2053.

A fourth exemplary embodiment of the present disclosure provides an apparatus for living body identification.

Fig. 10 schematically shows a structural block diagram of a device for living body recognition according to an embodiment of the present disclosure.

Referring to FIG. 10 , an apparatus 1000 for living body identification provided by an embodiment of the present disclosure includes: a second data acquisition module 1001 and an identification module 1002 .

The second data acquisition module 1001 is configured to acquire image data to be detected, and the image data to be detected includes an object to be identified.

The recognition module 1002 is configured to input the image data to be detected into the living body recognition model, so as to output the classification result of the object to be recognized as the living body category or the physical medium type corresponding to the non-living body category.

Wherein, the above-mentioned living body recognition model is constructed by the above-mentioned method for constructing a living body recognition model or constructed by the above-mentioned device for constructing a living body recognition model.

The above-mentioned device 1000 for living body recognition may store a pre-built living body recognition model, or may perform data communication with a device for building a living body recognition model, so as to call the constructed living body recognition model to process the image data to be detected, In order to obtain the classification result of the object to be recognized.

In the third embodiment above, any number of the first data acquisition module 901, the label association module 902, the input module 903 and the training module 904 can be combined in one module, or any one of the modules can be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. At least one of the first data acquisition module 901, the label association module 902, the input module 903 and the training module 904 can be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA) , system-on-chip, system-on-substrate, system-on-package, application-specific integrated circuit (ASIC), or any other reasonable means of integrating or packaging circuits, such as hardware or firmware, or in software, hardware, and firmware Any one of the three implementations or an appropriate combination of any of them. Alternatively, at least one of the first data acquisition module 901, the label association module 902, the input module 903 and the training module 904 may be at least partially implemented as a computer program module, and when the computer program module is executed, corresponding functions may be performed .

In the fourth embodiment above, any multiple of the second data acquisition module 1001 and the identification module 1002 can be implemented in one module, or any one of the modules can be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. At least one of the second data acquisition module 1001 and the identification module 1002 can be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, A system on a package, an application-specific integrated circuit (ASIC), or any other reasonable way of integrating or packaging circuits, such as hardware or firmware, or any of the three implementation methods of software, hardware, and firmware, or It can be realized by any suitable combination of any of them. Alternatively, at least one of the second data acquisition module 1001 and the identification module 1002 may be at least partially implemented as a computer program module, and when the computer program module is executed, corresponding functions may be performed.

A fifth exemplary embodiment of the present disclosure provides an electronic device.

Referring to FIG. 11 , an electronic device 1100 provided by an embodiment of the present disclosure includes a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, wherein the processor 1101, the communication interface 1102, and the memory 1103 complete mutual communication via the communication bus 1104. The memory 1103 is used to store computer programs; the processor 1101 is used to execute the programs stored in the memory to implement the above-mentioned method of constructing a living body recognition model or a living body recognition method.

The sixth exemplary embodiment of the present disclosure also provides a computer-readable storage medium. A computer program is stored on the above-mentioned computer-readable storage medium, and when the above-mentioned computer program is executed by a processor, the method for constructing a living body recognition model or the method for living body recognition as described above is realized.

The computer-readable storage medium may be included in the device/device described in the above embodiments; or it may exist independently without being assembled into the device/device. The above-mentioned computer-readable storage medium carries one or more programs, and when the above-mentioned one or more programs are executed, the method according to the embodiment of the present disclosure is implemented.

According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as may include but not limited to: portable computer disk, hard disk, random access memory (RAM), read-only memory (ROM) , erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

It should be noted that in this article, relative terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply these No such actual relationship or order exists between entities or operations. Furthermore, the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus comprising a set of elements includes not only those elements, but also includes elements not expressly listed. other elements of or also include elements inherent in such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.

The above descriptions are only specific implementation manners of the present disclosure, so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure will not be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

A method for building a living body recognition model, comprising:

Obtaining the image data of the target object after shooting, the target object includes: living objects and non-living objects carried by various types of physical media;

Corresponding the image data of the living subject to the first label representing the living category;

Corresponding the image data of the non-living object to multiple types of second labels representing the category of non-living objects based on the type difference of the physical medium;

inputting the image data into a machine learning model for training; and

Perform multi-classification training on the machine learning model based on the first label and multiple types of the second labels, so as to obtain a living body recognition model.
The method according to claim 1, said performing multi-classification training on said machine learning model based on said first label and multiple types of said second labels, so as to obtain a living body recognition model, comprising:

In each round of training of the machine learning model, for the input current image data, output the current image data belonging to the living body category and the non-living body category corresponding to each type of physical medium in the multiple types of physical media. probability value;

Determine a target loss function for the current image data according to the respective probability values, the target loss function is used to characterize the degree of deviation between the predicted category of the current image data and the category corresponding to the label of the current image data; and

When the convergence degree of the target loss function meets the set value, the training is stopped, and a trained living body recognition model is obtained.
The method according to claim 2, the target loss function is a weighted sum of a cross-entropy loss function and a ternary center loss function.
According to the method according to claim 3, the cross-entropy loss function is used as the main loss function, the ternary center loss function is used as the auxiliary loss function, and the target loss function is the product of the auxiliary loss function and the weight coefficient and the The sum of the main loss functions, the value of the weight coefficient is between 0 and 1 and can ensure the convergence of the target loss function.
According to the method according to any one of claims 1-4, said based on the type difference of the physical medium, corresponding the image data of the non-living object to multiple types of second labels that characterize the non-living category, comprising:

dividing the physical medium into a plurality of main categories based on differences in attribute types of the physical medium;

Based on the difference of at least one of the shape and material of the physical medium, the physical medium under each main category is subdivided to obtain a subdivided category; wherein, both the main category and the subdivided category belong to the non-living category;

For each image data of the non-living object, determine a target main category or a target sub-category corresponding to the physical medium of the current non-living object; and

Corresponding the image data of the current non-living object to the second label representing the target main category or the target subcategory.
The method according to claim 5, wherein,

The main categories include: paper media, screen media, material media for three-dimensional models;

According to the difference in material and shape of the paper medium, the paper medium is divided into two or more of the following sub-categories: plain paper, curved paper, cut paper, buttonhole paper, plain photos, bent photos, cropped photos, buttonhole photos;

According to the type difference of the screen medium, the screen medium is divided into two or more of the following subdivided categories: desktop screen, tablet computer screen, mobile phone screen, notebook computer screen;

According to the material difference of the material medium for the three-dimensional model, the material medium for the three-dimensional model is divided into two or more of the following subcategories: plaster model, wooden model, metal model, plastic model.
A method for living body identification, comprising:

Obtaining image data to be detected, the image data to be detected includes an object to be identified;

Input the image data to be detected into the living body recognition model to output the classification result of the object to be recognized as the living body category or the physical medium type corresponding to the non-living body category; wherein, the living body recognition model is controlled by the right Obtained by the method described in any one of requirements 1-6.
A device for building a living body recognition model, comprising:

The first data acquisition module is configured to acquire the image data of the target object obtained by shooting, and the target object includes: living objects and non-living objects carried by multiple types of physical media;

A label association module configured to associate the image data of the living object with the first label representing the living category; and for corresponding the image data of the non-living object to the first label representing the non-living category based on the type difference of the physical medium multi-category second label;

an input module configured to input the image data into a machine learning model for training; and

The training module is configured to perform multi-classification training on the machine learning model based on the first label and multiple types of the second labels, so as to obtain a living body recognition model.
A device for living body identification, comprising:

The second data acquisition module is configured to acquire image data to be detected, where the image data to be detected includes an object to be identified;

The recognition module is configured to input the image data to be detected into the living body recognition model, so as to output the classification result of the object to be recognized as the living body category or the physical medium type corresponding to the non-living body category; wherein, the The living body recognition model is constructed by the method described in any one of claims 1-6 or constructed by the device described in claim 8.
An electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus;

memory for storing computer programs;

When the processor is used to execute the program stored in the memory, it realizes the method described in any one of claims 1-7.
A computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method according to any one of claims 1-7 is implemented.