CN112464873A - Model training method, face living body recognition method, system, device and medium - Google Patents

Model training method, face living body recognition method, system, device and medium

Info

Publication number
CN112464873A
CN112464873A
Authority
CN
China
Prior art keywords
image
channel
living body
visible light
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011447167.4A
Other languages
Chinese (zh)
Inventor
沈涛
罗超
胡泓
李巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202011447167.4A priority Critical patent/CN112464873A/en
Publication of CN112464873A publication Critical patent/CN112464873A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Abstract

The invention discloses a training method for a face living body recognition model, together with a face living body recognition method, system, device and medium. The training method comprises the following steps: acquiring a visible light image to be trained; performing data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; modifying the number of input channels of the initial convolutional layer of a multi-channel residual convolutional neural network model; and inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training, so as to generate a face living body recognition model. The invention increases the number of input channels, raises the starting point of the neural network model's learning, and improves the accuracy of living body recognition by the neural network model.

Description

Model training method, face living body recognition method, system, device and medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a training method for a face living body recognition model, a face living body recognition method, and a corresponding system, device and medium.
Background
At present, attacks using false faces, such as face photos, face videos and three-dimensional masks, are increasingly diverse, and face living body detection is receiving more and more attention. Living body detection means that a computer judges whether a detected face is a real face or a false face; the subsequent face recognition process proceeds only if the face is judged to be real. Existing face living body detection mainly relies on acquiring face images of users and judging whether an input face is a living body by means of a trained living body detection module.
However, when training a living body detection model, data enhancement is generally used only as a means of expanding the dataset, i.e., the diversity of the data is increased merely by enlarging the number of samples. As a result, the feature extraction performed by the initial convolutional layer on the RGB image or grayscale image is insufficient, and the recognition accuracy of the neural network model is low.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that, when data enhancement is used only to expand the dataset, the feature extraction performed by the initial convolutional layer on RGB or grayscale images is insufficient and the model recognition accuracy is therefore low, and provides a training method for a face living body recognition model, together with a face living body recognition method, system, device and medium.
The invention solves the technical problems through the following technical scheme:
in a first aspect, the present invention provides a training method for a face living body recognition model, where the training method includes the following steps:
acquiring a visible light image to be trained;
performing data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image;
modifying the number of input channels of an initial convolutional layer of the multi-channel residual convolutional neural network model;
inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training, so as to generate a face living body recognition model;
each input channel in the multi-channel residual convolutional neural network model has the same network structure, each input channel corresponds to a feature map channel, and the number of input channels of the initial convolutional layer is the sum of the channel counts of all the feature maps.
Preferably, the step of inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training comprises:
arranging the feature maps in a preset fixed order along the channel direction;
inputting the arranged feature maps into the input channels in a one-to-one correspondence;
performing fusion processing on the input channels after the feature maps are input; wherein the fusion processing comprises splicing.
Preferably, the step of performing data enhancement processing on the visible light image includes:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order comprises image Fourier transform, image wavelet transform, random channel dropout, random channel shuffle, downscaling, image value inversion, motion blur, flipping, grid dropout, random grid shuffle, random resized crop and shift-scale-rotate;
and/or,
Preferably, the range of pixel values in the feature maps is [0, 1];
and/or,
the visible light image is a three-channel RGB image only containing a face part, and the types of the feature mapping images comprise RGB images and gray level images; the multi-channel residual convolutional neural network model is ResNet50, the number of the input channels of the initial convolutional layer after modification is 32, and the loss function used by the multi-channel residual convolutional neural network model is a cross entropy loss function.
In a second aspect, the present invention further provides a training system for a living human face recognition model, where the training system includes:
the first acquisition module is used for acquiring a visible light image to be trained;
the data enhancement processing module is used for carrying out data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image;
the modification module is used for modifying the number of input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model;
the training module is used for inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training, so as to generate a face living body recognition model;
each input channel in the multi-channel residual convolutional neural network model has the same network structure, each input channel corresponds to a feature map channel, and the number of input channels of the initial convolutional layer is the sum of the channel counts of all the feature maps.
Preferably, the training module comprises:
an arrangement unit for arranging the feature maps in a preset fixed order along the channel direction;
an input unit for inputting the arranged feature maps into the input channels in a one-to-one correspondence;
and a fusion unit for performing fusion processing on the input channels after the feature maps are input, wherein the fusion processing comprises splicing.
Preferably, the data enhancement processing module is specifically configured to:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order comprises image Fourier transform, image wavelet transform, random channel dropout, random channel shuffle, downscaling, image value inversion, motion blur, flipping, grid dropout, random grid shuffle, random resized crop and shift-scale-rotate;
and/or,
the range of pixel values in the feature maps is [0, 1];
and/or,
the visible light image is a three-channel RGB image only containing a face part, and the types of the feature mapping images comprise RGB images and gray level images; the multi-channel residual convolutional neural network model is ResNet50, the number of the input channels of the initial convolutional layer after modification is 32, and the loss function used by the multi-channel residual convolutional neural network model is a cross entropy loss function.
In a third aspect, the present invention further provides a method for recognizing a living human face, including the following steps:
acquiring an image to be identified; wherein the image to be recognized is an image including a target face;
preprocessing the image to be recognized to obtain a plurality of different types of target images to be recognized;
inputting each target image to be recognized into the face living body recognition model trained by the training method of the first aspect, so as to judge whether the target face in the target image to be recognized is of an attack type; wherein the preprocessing is the same as the data enhancement processing used in training the face living body recognition model.
In a fourth aspect, the present invention further provides a face living body recognition system, including:
the second acquisition module is used for acquiring an image to be identified; wherein the image to be recognized is an image including a target face;
the preprocessing module is used for preprocessing the image to be recognized to obtain a plurality of different types of target images to be recognized;
a judging module, configured to input each target image to be recognized into the face living body recognition model trained by the training method of the first aspect, so as to judge whether the target face in the target image to be recognized is of an attack type; wherein the preprocessing is the same as the data enhancement processing used in training the face living body recognition model.
The present invention also provides an electronic device comprising a processor, a memory and a computer program stored on the memory and operable on the processor, wherein the computer program, when executed by the processor, implements the training method of the living human face recognition model according to the first aspect, or implements the recognition method of the living human face according to the third aspect.
The present invention also provides a computer-readable storage medium on which a computer program is stored, which, when being executed by a processor, implements the method for training a living human face recognition model according to the first aspect, or performs the steps of the method for recognizing a living human face according to the third aspect.
The positive effects of the invention are as follows: the feature maps obtained after data enhancement are input into the initial convolutional layer of a multi-channel residual convolutional neural network model for training; the living body information is unchanged, and the neural network model simultaneously learns the relationships among the enhanced data. Increasing the number of input channels raises the starting point of the neural network model's learning and improves the accuracy of its living body recognition.
Drawings
Fig. 1 is a flowchart of a training method of a face living body recognition model according to embodiment 1 of the present invention.
Fig. 2 is a schematic structural diagram of data enhancement processing of a training method of a face living body recognition model according to embodiment 1 of the present invention.
Fig. 3 is a schematic block diagram of a training system of a face living body recognition model according to embodiment 2 of the present invention.
Fig. 4 is a schematic view of an application environment of the method for recognizing a living human face according to embodiment 3 of the present invention.
Fig. 5 is a flowchart of a method for recognizing a living human face according to embodiment 3 of the present invention.
Fig. 6 is a schematic block diagram of a face living body recognition system according to embodiment 4 of the present invention.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
In this embodiment, a training method for a living human face recognition model is provided, and with reference to fig. 1, the method includes the following steps:
and step S11, acquiring a visible light image to be trained.
Each training sample may include a plurality of live face images and a plurality of forged face images, a live type label corresponding to each live face image, and an attack type label corresponding to each forged face image.
The living body type face images are collected in real scenes and may include ordinary color images. The forged face images may be flat photos, played-back videos or video frames captured from a video, but they cannot reflect the three-dimensional facial characteristics of a living face. Each visible light image is a three-channel RGB image of size 224 × 224 × 3.
Step S12, performing data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image.
The step of performing data enhancement processing on the visible light image comprises the following steps:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order is image Fourier transform, image wavelet transform, random channel dropout, random channel shuffle, downscaling, image value inversion, motion blur, flipping, grid dropout, random grid shuffle, random resized crop, and shift-scale-rotate.
Note that the range of pixel values in the feature maps is [0, 1].
When the image Fourier transform is applied to the visible light image, the grayscale version of the image is Fourier-transformed, the low-frequency components are then shifted to the center of the spectrum, and finally each pixel is normalized to [0, 1].
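The Fourier-transform feature map described above can be sketched as follows. The log-magnitude step before min-max normalization is an assumption; the text only specifies shifting the low frequencies to the center and normalizing each pixel to [0, 1].

```python
import numpy as np

def fft_feature_map(gray: np.ndarray) -> np.ndarray:
    """Fourier-transform feature map: 2-D FFT, shift the low frequencies
    to the center, then min-max normalize every pixel into [0, 1].

    `gray` is a 2-D grayscale image. The log-magnitude compression is an
    assumption, since the text only requires normalization to [0, 1].
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # low frequencies -> center
    magnitude = np.log1p(np.abs(spectrum))          # compress dynamic range
    lo, hi = magnitude.min(), magnitude.max()
    return (magnitude - lo) / (hi - lo + 1e-12)     # min-max to [0, 1]

# A toy 224x224 grayscale image with values in [0, 1):
img = np.random.default_rng(0).random((224, 224))
fmap = fft_feature_map(img)
assert fmap.shape == (224, 224)
```

The resulting single-channel map is what later occupies one of the 32 input channels.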
When the image wavelet transform is applied, the grayscale version of the image is wavelet-transformed, and the resulting sub-images are normalized and spliced together to generate a feature map. In random channel dropout, randomly selected channels are filled with zeros; the living body information is distributed identically across the channels. In random channel shuffle, the order of the image channels is randomly shuffled, and the living body information in each channel is preserved. In downscaling, the image is first reduced and then enlarged, simulating the quality loss caused by shrinking and re-enlarging an image.
When the image value is inverted, the pixel value of each pixel is subtracted from 255 (or from 1 for normalized images), i.e., the pixel values are inverted. Motion blur uses a kernel matrix of random size, which may also be a preset value and is not specifically limited here. Grid dropout deletes rectangular regions of the image in a grid pattern; losing one region of the overall pixel distribution does not lose the living body information. Random grid shuffle does not change the overall distribution of living body information on the image. Random resized crop randomly crops a part of the image and scales it to a preset size; the distribution characteristics of living body information are the same in every image region. Shift-scale-rotate applies random affine operations (translation, scaling and rotation), which do not change the distribution of living body information in the visible light image.
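The image value inversion described above can be sketched in a few lines of NumPy; the function name is illustrative.

```python
import numpy as np

def invert_image(img: np.ndarray) -> np.ndarray:
    """Invert pixel values: 255 - x for 8-bit images, 1 - x for images
    already normalized to [0, 1] (as the feature maps here are)."""
    if img.dtype == np.uint8:
        return 255 - img
    return 1.0 - img

# A normalized three-channel RGB image:
rgb = np.random.default_rng(1).random((224, 224, 3))
inv = invert_image(rgb)
assert np.allclose(rgb + inv, 1.0)   # inversion keeps values in [0, 1]
```

The inverted map keeps the same shape and value range, so it can be spliced with the other feature maps unchanged.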
Specifically, the visible light images include forged face images, for example a face image obtained by photographing a photograph, and also face images of living bodies, for example an image obtained by photographing a real person. Data enhancement processing is applied to both the forged face images and the living face images to obtain a plurality of feature maps. It can be understood that, in this embodiment, the number of feature maps obtained after data enhancement is generally equal to the number of enhancement operations used, i.e., applying the different enhancement operations to the input image yields a corresponding number of feature maps.
In the field of machine learning, data enhancement is commonly used to expand the number of training samples: the original samples are enhanced to generate a larger set of new samples. In this embodiment, data enhancement is not used to enlarge the set to be trained, but to let the neural network learn the relationships among the feature maps produced by the enhancement processing. After data enhancement, the separable characteristics of the visible light image, i.e., the characteristics that distinguish the living body type from the attack type, do not change.
It should be noted that the pixel values of the feature maps generated by the enhancement processing are normalized to the range [0, 1] to suit the input of the neural network model.
And step S13, modifying the number of input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model.
Deep learning is a machine learning method that performs representation learning on data and can be applied to tasks such as object detection, image recognition and image classification. A multi-channel residual convolutional neural network model is designed, comprising convolutional layers, pooling layers and so on, but no fully connected layers. The original number of input channels of the model is 3, the network structure of each channel is identical, and the various modalities of the visible light image to be trained are input correspondingly. Modifying the number of original channels raises the starting point of the convolutional neural network model's learning.
And step S14, inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training, so as to generate a face living body recognition model. Each input channel in the model has the same network structure, each input channel corresponds to a feature map channel, and the number of input channels of the initial convolutional layer is the sum of the channel counts of all the feature maps.
The visible light image is a three-channel RGB image containing only the face region, and the types of feature maps comprise RGB images and grayscale images; the multi-channel residual convolutional neural network model is ResNet50, the number of input channels of the modified initial convolutional layer is 32, and the loss function used by the multi-channel residual convolutional neural network model is a cross-entropy loss function.
In a specific example, the batch size of the multi-channel residual convolutional neural network model is 128, the visible light images to be trained come from the CelebA-Spoof dataset, the optimizer is SGD with lr = 1e-1, weight decay = 5e-4 and momentum = 0.9, training runs on a V100 GPU, and the number of iterations is 100.
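The configuration above maps directly onto PyTorch's SGD optimizer and cross-entropy loss. The tiny linear model and batch below are placeholders so the sketch is self-contained; the patent uses the 32-channel ResNet50 and batch size 128.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the 32-channel ResNet-50:
model = nn.Linear(32, 2)
# Optimizer and loss with the hyperparameters quoted in the text:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1,
                            weight_decay=5e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One toy training step (batch of 4 here; 128 in the patent):
batch = torch.randn(4, 32)
labels = torch.randint(0, 2, (4,))   # 0 = living body, 1 = attack
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()
assert loss.item() >= 0.0            # cross-entropy is non-negative
```

In a real run this step would repeat over the CelebA-Spoof loader for the stated 100 iterations.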
The input channels of the initial convolutional layer of the convolutional neural network model correspond to the feature maps of each visible light image to be trained, but the parameter values set for each channel can differ, so that the model is better suited to feature extraction from different modalities. Inputting the feature map generated by the Fourier transform lets the model learn from the frequency-domain perspective; inputting the feature map generated by the wavelet transform lets it learn from the transform-domain perspective; inputting the feature map generated by random channel shuffle lets it learn the overall distribution of pixel values; and inputting the feature map generated by image value inversion lets it learn from a reversed perspective.
Specifically, as shown in fig. 2, the feature map generated by the FFT (fast Fourier transform) is a grayscale map with 1 input channel; the feature map generated by the Wavelet transform is a grayscale map with 1 input channel; and the feature maps generated by ChannelDropout (random channel dropout), ChannelShuffle (random channel shuffle), Downscale (downscaling), InvertImg (image value inversion), MotionBlur (motion blur), Flip (flipping), GridDropout (grid dropout), RandomGridShuffle (random grid shuffle), RandomResizedCrop (random resized crop) and ShiftScaleRotate (shift-scale-rotate) are color maps with 3 input channels each.
In this embodiment, step S14 includes:
step S141, arranging the feature maps in a preset fixed order along the channel direction.
And step S142, inputting the arranged feature maps into the input channels in a one-to-one correspondence.
Step S143, performing fusion processing on the input channels after the feature maps are input; wherein the fusion processing comprises splicing.
The feature maps are generated sequentially according to the preset order of the 12 enhancement operations, and each feature map is labeled; for example, they may be labeled feature map 1, feature map 2, feature map 3-A, feature map 3-B, feature map 3-C, feature map 4-A, feature map 4-B, feature map 4-C, and so on. The labeled feature maps are arranged in ascending or descending order, input one-to-one into the input channels of the initial convolutional layer of the neural network model, and then fused across channels; channel fusion generally takes one of two forms, addition or splicing (concatenation). This embodiment fuses by splicing, i.e., after the detection results corresponding to all the feature maps are further fused, it is determined whether the visible light image to be trained is of the living body type or the attack type.
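The channel arithmetic implied by fig. 2 (2 grayscale maps plus 10 color maps) can be checked with a short NumPy sketch of the splicing step; the zero-valued maps are placeholders for the real enhancement outputs.

```python
import numpy as np

# Splice the 12 feature maps along the channel axis in a fixed order:
# the FFT and wavelet maps are single-channel, the remaining ten are
# three-channel, giving the 32 input channels of the modified conv1.
h, w = 224, 224
gray_maps  = [np.zeros((h, w, 1)) for _ in range(2)]    # FFT, wavelet
color_maps = [np.zeros((h, w, 3)) for _ in range(10)]   # the other ten
stacked = np.concatenate(gray_maps + color_maps, axis=2)
assert stacked.shape == (224, 224, 32)                  # 2*1 + 10*3 = 32
```

Because the order is fixed, every training sample presents the same enhancement to the same input channels, which is what lets each channel specialize.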
The feature maps obtained after data enhancement are input into the initial convolutional layer of the multi-channel residual convolutional neural network model for training; the living body information is unchanged, and the neural network model simultaneously learns the relationships among the enhanced data. Increasing the number of input channels raises the starting point of the neural network model's learning and improves the accuracy of its living body recognition.
Example 2
In this embodiment, a training system for a face living body recognition model is provided. As shown in fig. 3, the training system comprises: a first acquiring module 21, a data enhancement processing module 22, a modification module 23 and a training module 24, wherein the training module 24 comprises an arrangement unit 241, an input unit 242 and a fusion unit 243.
The first obtaining module 21 is configured to obtain a visible light image to be trained.
Each training sample acquired by the first acquiring module 21 may include a plurality of living face images and a plurality of forged face images, with a living type label for each living face image and an attack type label for each forged face image; each visible light image is a three-channel RGB image of size 224 × 224 × 3.
The living body type face images are collected in real scenes and may include ordinary color images; the forged face images may be flat photos, played-back videos or video frames captured from a video, but cannot embody the three-dimensional facial characteristics of a living face.
The data enhancement processing module 22 is used for performing data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image.
The data enhancement processing module 22 is specifically configured to:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order is image Fourier transform, image wavelet transform, random channel dropout, random channel shuffle, downscaling, image value inversion, motion blur, flipping, grid dropout, random grid shuffle, random resized crop, and shift-scale-rotate.
Note that the range of pixel values in the feature maps is [0, 1].
When the image Fourier transform is applied to the visible light image, the grayscale version of the image is Fourier-transformed, the low-frequency components are then shifted to the center of the spectrum, and finally each pixel is normalized to [0, 1].
When the image wavelet transform is applied, the grayscale version of the image is wavelet-transformed, and the resulting sub-images are normalized and spliced together to generate a feature map. In random channel dropout, randomly selected channels are filled with zeros; the living body information is distributed identically across the channels. In random channel shuffle, the order of the image channels is randomly shuffled, and the living body information in each channel is preserved. In downscaling, the image is first reduced and then enlarged, simulating the quality loss caused by shrinking and re-enlarging an image.
When the image value inversion is performed on the visible light image, each pixel value is subtracted from 255 (for 8-bit images) or from 1 (for normalized images), that is, the pixel values are inverted. When motion blur is applied to the visible light image, a kernel matrix of random size is used; the size may also be a preset value and is not specifically limited here. When the grid deletion transform is performed on the visible light image, rectangular regions of the image are deleted in a grid pattern; losing one region of the overall pixel distribution does not lose the living body information. When the random grid shuffle is performed on the visible light image, the shuffling operation does not change the overall distribution of the living body information in the image. When random scale cropping is performed on the visible light image, a part of the image is randomly cropped and then scaled to a preset size; the distribution characteristics of the living body information are the same in each region of the image. When the shift-scale-rotate (affine) transform is randomly performed on the visible light image, that is, the translation, scaling, and rotation operations, the distribution of the living body information in the image is not changed.
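The image value inversion and the motion-blur kernel just described can be sketched as follows. A hedged NumPy illustration: the horizontal-line kernel is one common choice for motion blur, and the patent does not fix the kernel's shape or direction:

```python
import numpy as np

def invert(img):
    """Invert pixel values: 255 - x for uint8 images,
    1 - x for float images normalized to [0, 1]."""
    if img.dtype == np.uint8:
        return 255 - img
    return 1.0 - img

def motion_blur_kernel(size):
    """Build a horizontal motion-blur kernel of the given (possibly
    randomly chosen) size; entries sum to 1 to preserve brightness."""
    kernel = np.zeros((size, size))
    kernel[size // 2, :] = 1.0 / size
    return kernel

inv = invert(np.array([[0, 128, 255]], dtype=np.uint8))
k = motion_blur_kernel(5)
```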
Specifically, the visible light images include forged face images, for example, a recaptured photograph of a face photo, and also include genuine living face images, for example, an image captured directly from a living subject. Data enhancement processing is performed on both the forged and the living face images to obtain a plurality of feature maps. It can be understood that the number of feature maps obtained after the data enhancement processing of this embodiment is generally equal to the number of data enhancement operations adopted, that is, applying different enhancement operations to the image to be detected yields a corresponding number of feature maps.
In the field of machine learning, data enhancement is commonly used to expand the number of training samples: enhancement is applied to the original training samples to generate a larger number of new ones. In this embodiment, however, the data enhancement processing is not used to expand the set to be trained, but to let the neural network learn the relationships between the enhanced feature maps. After the data enhancement processing, the separable characteristics of the visible light image do not change, that is, the characteristics that distinguish the living body type from the attack type are preserved.
It should be noted that the pixel values of the feature maps generated after the enhancement processing are normalized to the range [0, 1] to suit the input of the neural network model.
And the modifying module 23 is configured to modify the number of input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model.
Deep learning is a machine learning method that performs representation learning on data and can be applied to tasks such as target detection, image recognition, and image classification. A multi-channel residual convolutional neural network model is designed, comprising convolutional layers, pooling layers, and the like, but no fully connected layers. The original number of input channels of the model is 3, the network structure of each channel is identical, and the images of the various modalities derived from the visible light image to be trained are input correspondingly. Modifying the original channel count raises the starting point of the convolutional neural network model's learning.
The training module 24 is used for inputting all the feature maps into an input channel of an initial convolutional layer of the multi-channel residual convolutional neural network model for training to generate a human face living body recognition model;
each input channel in the multi-channel residual convolutional neural network model has the same network structure, each input channel corresponds to one feature map, and the number of input channels of the initial convolutional layer is the sum of the channel counts of the feature maps.
The visible light image is a three-channel RGB image containing only the face part, and the feature map types include RGB images and grayscale images; the multi-channel residual convolutional neural network model is ResNet50, the number of input channels of the modified initial convolutional layer is 32, and the loss function used by the model is the cross-entropy loss function.
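For reference, the cross-entropy loss named above reduces, in the two-class live-vs-attack case, to the average negative log-probability of the correct class. The following is a NumPy sketch with illustrative numbers, not values from the patent:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over a batch for two-class live-vs-attack
    classification. probs: (N, 2) softmax outputs; labels: (N,)
    with 0 = living body, 1 = attack."""
    picked = probs[np.arange(len(labels)), labels]  # prob of true class
    return float(-np.mean(np.log(picked + eps)))

probs = np.array([[0.9, 0.1],   # confident "living body"
                  [0.2, 0.8]])  # fairly confident "attack"
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)
```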
In a specific example, the batch size of the multi-channel residual convolutional neural network model is 128, the visible light images to be trained come from the CelebA-Spoof data set, the optimizer is SGD with lr = 1e-1, weight decay = 5e-4, and momentum = 0.9, training is performed on a V100 GPU, and the number of iterations is 100.
The number of input channels of the initial convolutional layer of the convolutional neural network model matches the feature maps generated from each visible light image to be trained: each input channel corresponds to one feature map channel, but the parameter values learned by each channel can differ, so the model adapts better to feature extraction from different modalities. Inputting the feature map generated by the Fourier transform of the visible light image lets the model learn from the frequency-domain perspective; inputting the feature map generated by the wavelet transform lets the model learn from the transform-domain perspective; inputting the feature map generated by the random channel shuffle transform lets the model learn the overall distribution of pixel values; inputting the feature map generated by image value inversion lets the model learn from a reversed perspective.
Specifically, the feature map generated by the image Fourier transform is a grayscale image with 1 input channel; the feature map generated by the image wavelet transform is a grayscale image with 1 input channel; the feature maps generated by the random channel loss, random channel shuffle, downscaling, image value inversion, motion blur, flipping, grid deletion, random grid shuffle, random scale cropping, and shift-scale-rotate transforms are all color images with 3 input channels each. The number of input channels of the initial convolutional layer is therefore 32, the sum of the channel counts of all feature maps generated from each visible light image to be trained.
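The channel arithmetic above can be checked directly: two grayscale maps at 1 channel each plus ten color maps at 3 channels each give the 32 input channels of the modified initial convolutional layer.

```python
# 12 feature maps per training image: 2 grayscale (Fourier, wavelet)
# and 10 color (the remaining enhancement operations).
grayscale_maps = 2
color_maps = 10
in_channels = grayscale_maps * 1 + color_maps * 3  # = 32
```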
In this embodiment, the training module 24 further includes:
an arranging unit 241 for arranging the feature maps in a preset fixed order along the channel direction.
The input unit 242 is configured to input the arranged feature maps to the input channels in a one-to-one correspondence manner.
A fusion unit 243, configured to perform fusion processing on the input channel after the feature map is input; wherein the fusion process comprises splicing.
The feature maps are generated sequentially according to the preset order of the 12 enhancement operations, and each feature map is labeled, for example, as feature map 1, feature map 2, feature map 3-A, feature map 3-B, feature map 3-C, feature map 4-A, feature map 4-B, feature map 4-C, and so on. The labeled feature maps are arranged in ascending or descending order of their labels and input one-to-one into the input channels of the initial convolutional layer of the neural network model, after which channel fusion is performed. Channel fusion generally takes one of two forms, addition or concatenation; this application adopts concatenation, so that the detection results corresponding to all feature maps can be further fused to determine whether the visible light image to be trained is of the living body type or the attack type.
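The splicing (concatenation) fusion described above amounts to stacking the arranged feature maps along the channel axis. A NumPy sketch follows; the spatial resolution of 112 x 112 is an assumption for illustration, not stated in the patent:

```python
import numpy as np

h, w = 112, 112  # illustrative face-crop resolution (an assumption)
gray_maps = [np.random.rand(h, w, 1) for _ in range(2)]    # Fourier, wavelet
color_maps = [np.random.rand(h, w, 3) for _ in range(10)]  # 10 color maps
# Concatenate in the preset fixed order along the channel direction,
# yielding the 32-channel input for the initial convolutional layer.
stacked = np.concatenate(gray_maps + color_maps, axis=-1)
```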
In the training system of the face living body recognition model, the feature maps obtained after data enhancement are input into the initial convolutional layer of the multi-channel residual convolutional neural network model for training. The living body information is not changed, the neural network model learns the relationships among the enhanced data simultaneously, and the increased number of input channels raises the learning starting point of the model, enhancing the accuracy of living body recognition.
Example 3
Computer vision technology (CV) generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes biometric identification technologies such as face recognition and fingerprint recognition. It is understood that in this embodiment, performing living body identification based on an image belongs to the biometric identification technologies within computer vision, realizing living body detection.
The method for recognizing living human faces in this embodiment can be applied to the application environment shown in fig. 3, in which the terminal 1 communicates with the living body recognition server 2 through a network. Specifically, the living body recognition server 2 may receive an image to be recognized sent by the terminal 1, process it to generate a target image to be recognized, and perform living body recognition on the target image.
It should be noted that living body recognition may be performed on the whole human target object, that is, the image to be recognized contains the entire person, and the person's biological features are extracted for living body recognition. In this embodiment, the image to be recognized is an image containing only the face of the target object; living body recognition is performed on the face portion to obtain a living body recognition result for it, so as to determine whether the target object is of the living body type or the attack type.
The terminal 1 may be, but is not limited to, various personal computers, notebook computers, smart phones, and tablet computers, and the living body recognition server 2 may be implemented as an independent server or a cluster of servers. For example, in the financial field there is a need for face living body identification: when the terminal 1 is a smart phone, a transfer payment service may proceed only after the user's identity authentication succeeds. After the terminal 1 acquires a plurality of images to be recognized of a target user, the living body recognition server 2 receives the images to be recognized corresponding to a first target user A sent by the terminal 1, judges whether the operation that acquired the images was initiated by the first target user A in person, and feeds the judgment result back to the smart phone of the terminal 1, thereby completing the face living body recognition. Face living body recognition is likewise needed in access control systems; other application scenarios are not described further here.
In one embodiment, as shown in fig. 4, a method for recognizing a living human face is provided, and the embodiment is explained by applying the method to a living body recognition server 2, and it is understood that the method can also be applied to a system comprising a terminal 1 and the living body recognition server 2, and is realized by interaction of the terminal 1 and the living body recognition server 2. In this embodiment, as shown in fig. 5, the method includes the following steps:
step S31, acquiring an image to be recognized, wherein the image to be recognized is an image including a target face.
The living body recognition server 2 may receive, in real time, an image to be recognized acquired by the terminal 1 through a local image acquisition system, or may receive an image derived locally by the terminal 1, that is, a photograph taken in advance or stored. It should be noted, however, that images derived locally by the terminal 1 are generally determined to be of the attack type after living body recognition.
And step S32, preprocessing the image to be recognized to obtain a plurality of different types of target images to be recognized.
Step S33, inputting each target image to be recognized into the face living body recognition model trained by the training method of embodiment 1, and checking to determine whether the target face in the target image to be recognized is an attack type; the preprocessing is the same as the data enhancement processing mode in the training process of the face living body recognition model.
In this embodiment, the preprocessing is performed in the same manner as the data enhancement processing in embodiment 1: if the model was trained with the 12 data enhancement operations of embodiment 1, the same 12 operations are applied to the image to be recognized during preprocessing.
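The requirement that inference-time preprocessing reuse the training-time enhancement list can be sketched as follows. The two lambda transforms here are hypothetical stand-ins (value inversion and horizontal flip) for the 12 operations of embodiment 1:

```python
import numpy as np

def preprocess(image, transforms):
    """Apply the same enhancement list used at training time,
    producing one target image per transform in the fixed order."""
    return [t(image) for t in transforms]

# Hypothetical stand-ins for two of the 12 enhancement operations.
transforms = [lambda x: 1.0 - x,        # image value inversion
              lambda x: x[:, ::-1, :]]  # horizontal flip
img = np.random.rand(112, 112, 3)
targets = preprocess(img, transforms)
```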
In the method for recognizing the living human face, after the image to be recognized is preprocessed, the generated target images to be recognized are input into the neural network model trained with the enlarged input channel for living body detection, which raises the learning starting point of the neural network model and enhances the accuracy of living body recognition.
Example 4
In this embodiment, there is provided a face living body recognition system, as shown in fig. 6, the face living body recognition system including: a second obtaining module 41, a preprocessing module 42 and a judging module 43.
The second obtaining module 41 is configured to obtain an image to be identified; wherein the image to be recognized is an image including a target face.
And the preprocessing module 42 is configured to preprocess the image to be recognized to obtain a plurality of different types of target images to be recognized.
A judging module 43, configured to input each target image to be recognized into the face living body recognition model trained by the training method in embodiment 1 for inspection, so as to judge whether a target face in the target image to be recognized is an attack type; the preprocessing is the same as the data enhancement processing mode in the training process of the face living body recognition model.
In this embodiment, the preprocessing is performed in the same manner as the data enhancement processing in embodiment 1: if the model was trained with the 12 data enhancement operations of embodiment 1, the same 12 operations are applied to the image to be recognized during preprocessing.
In the recognition system for the human face living body, after the image to be recognized is preprocessed, the generated target images to be recognized are input into the neural network model trained with the enlarged input channel for living body detection, which raises the learning starting point of the neural network model and enhances the accuracy of living body recognition.
Example 5
Fig. 7 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor executes the program to implement the training method of the face living body recognition model of embodiment 1 or the recognition method of the living human face of embodiment 3. The electronic device 60 shown in fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
The electronic device 60 may be embodied in the form of a general purpose computing device, which may be a server device, for example. The components of the electronic device 60 may include, but are not limited to: the at least one processor 61, the at least one memory 62, and a bus 63 connecting the various system components (including the memory 62 and the processor 61).
The bus 63 includes a data bus, an address bus, and a control bus.
The memory 62 may include volatile memory, such as random access memory (RAM) 621 and/or cache memory 622, and may further include read-only memory (ROM) 623.
The memory 62 may also include a program/utility 625 having a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 61 executes various functional applications and data processing, such as a training method of the living human face recognition model of embodiment 1 of the present invention or a recognition method of a living human face of embodiment 3, by running a computer program stored in the memory 62.
The electronic device 60 may also communicate with one or more external devices 64 (e.g., a keyboard, a pointing device, etc.). Such communication may occur through an input/output (I/O) interface 65. The electronic device 60 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 66. As shown, the network adapter 66 communicates with the other modules of the electronic device 60 via the bus 63. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 60, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, etc.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module according to embodiments of the invention. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the training method of the living human face recognition model of embodiment 1 or the recognition method of the living human face of embodiment 3.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present invention can also be implemented in the form of a program product comprising program code for causing a terminal device to execute the steps of implementing the training method of the living human face recognition model of embodiment 1 or the recognition method of the living human face of embodiment 3 when the program product is run on the terminal device.
Where program code for carrying out the invention is written in any combination of one or more programming languages, the program code may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (10)

1. A training method of a face living body recognition model is characterized by comprising the following steps:
acquiring a visible light image to be trained;
performing data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image;
modifying the number of input channels of an initial convolutional layer of the multi-channel residual convolutional neural network model;
inputting all the feature mapping maps into an input channel of an initial convolutional layer of the multi-channel residual convolutional neural network model for training to generate a human face living body recognition model;
each input channel in the multi-channel residual convolutional neural network model has the same network structure, each input channel corresponds to a feature map, and the number of input channels of the initial convolutional layer is the sum of the channel counts of the feature maps.
2. The training method of living human face recognition model as claimed in claim 1, wherein the step of inputting all the feature maps into the input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model for training comprises:
arranging the feature maps in a preset fixed order along a channel direction;
inputting the arranged feature mapping maps into the input channels in a one-to-one correspondence manner;
performing fusion processing on the input channel after the feature mapping image is input; wherein the fusion process comprises splicing.
3. The training method of living human face recognition model according to claim 1, wherein the step of performing data enhancement processing on the visible light image comprises:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order comprises image Fourier transform, image wavelet transform, random channel loss transform, random channel shuffle transform, downscaling, image value inversion, motion blur, flipping, grid deletion transform, random grid shuffle, random scale cropping, and shift-scale-rotate transform;
and/or,
the range of pixel values in the feature map is [0, 1];
and/or,
the visible light image is a three-channel RGB image containing only the face part, and the feature map types include RGB images and grayscale images; the multi-channel residual convolutional neural network model is ResNet50, the number of the input channels of the initial convolutional layer after modification is 32, and the loss function used by the multi-channel residual convolutional neural network model is a cross-entropy loss function.
4. A training system of a face living body recognition model is characterized by comprising:
the first acquisition module is used for acquiring a visible light image to be trained;
the data enhancement processing module is used for carrying out data enhancement processing on the visible light image to obtain a plurality of different types of feature maps; wherein the data enhancement processing does not change the living body information of the visible light image;
the modification module is used for modifying the number of input channels of the initial convolutional layer of the multi-channel residual convolutional neural network model;
the training module is used for inputting all the feature maps into an input channel of an initial convolutional layer of the multi-channel residual convolutional neural network model for training to generate a human face living body recognition model;
each input channel in the multi-channel residual convolutional neural network model has the same network structure, each input channel corresponds to a feature map, and the number of input channels of the initial convolutional layer is the sum of the channel counts of the feature maps.
5. The system for training the living human face recognition model as claimed in claim 4, wherein the training module comprises:
an arrangement unit for arranging the feature maps in a preset fixed order along a channel direction;
the input unit is used for inputting the arranged feature mapping maps to the input channels in a one-to-one correspondence manner;
and the fusion unit is used for performing fusion processing on the input channel after the characteristic mapping image is input, wherein the fusion processing comprises splicing.
6. The system for training the living human face recognition model according to claim 4, wherein the data enhancement processing module is specifically configured to:
sequentially performing 12 data enhancement operations on the visible light image in a preset order; the preset order comprises image Fourier transform, image wavelet transform, random channel loss transform, random channel shuffle transform, downscaling, image value inversion, motion blur, flipping, grid deletion transform, random grid shuffle, random scale cropping, and shift-scale-rotate transform;
and/or,
the range of pixel values in the feature map is [0, 1];
and/or,
the visible light image is a three-channel RGB image containing only the face part, and the feature map types include RGB images and grayscale images; the multi-channel residual convolutional neural network model is ResNet50, the number of the input channels of the initial convolutional layer after modification is 32, and the loss function used by the multi-channel residual convolutional neural network model is a cross-entropy loss function.
7. A method for recognizing a living human face is characterized by comprising the following steps:
acquiring an image to be identified; wherein the image to be recognized is an image including a target face;
preprocessing the image to be recognized to obtain a plurality of different types of target images to be recognized;
inputting each target image to be recognized into a human face living body recognition model trained by the training method according to claim 1 for inspection so as to judge whether a target human face in the target image to be recognized is an attack type; and the preprocessing is the same as the data enhancement processing in the training process of the human face living body recognition model.
8. A system for recognizing a living human face, the system comprising:
the second acquisition module is used for acquiring an image to be identified; wherein the image to be recognized is an image including a target face;
the preprocessing module is used for preprocessing the image to be recognized to obtain a plurality of different types of target images to be recognized;
a judging module, configured to input each target image to be recognized into the living human face recognition model trained by the training method according to claim 1, and check the target image to be recognized to judge whether a target human face in the target image to be recognized is an attack type; and the preprocessing is the same as the data enhancement processing in the training process of the human face living body recognition model.
9. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing a training method of a living human face recognition model according to any one of claims 1-3, or performing a recognition method of a living human face according to claim 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, implements the training method of the living human face recognition model according to any one of claims 1-3, or performs the steps of the recognition method of the living human face according to claim 7.
CN202011447167.4A 2020-12-09 2020-12-09 Model training method, face living body recognition method, system, device and medium Pending CN112464873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011447167.4A CN112464873A (en) 2020-12-09 2020-12-09 Model training method, face living body recognition method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011447167.4A CN112464873A (en) 2020-12-09 2020-12-09 Model training method, face living body recognition method, system, device and medium

Publications (1)

Publication Number Publication Date
CN112464873A true CN112464873A (en) 2021-03-09

Family

ID=74801470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011447167.4A Pending CN112464873A (en) 2020-12-09 2020-12-09 Model training method, face living body recognition method, system, device and medium

Country Status (1)

Country Link
CN (1) CN112464873A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115116111A (en) * 2022-06-24 2022-09-27 北京百度网讯科技有限公司 Anti-disturbance human face living body detection model training method and device and electronic equipment
CN115205637A (en) * 2022-09-19 2022-10-18 山东世纪矿山机电有限公司 Intelligent identification method for mine car materials


CN116229528A (en) Living body palm vein detection method, device, equipment and storage medium
CN115083006A (en) Iris recognition model training method, iris recognition method and iris recognition device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination