WO2021218899A1 - Face recognition model training method, face recognition method and device - Google Patents

Face recognition model training method, face recognition method and device

Info

Publication number
WO2021218899A1
WO2021218899A1 (PCT/CN2021/089846; CN2021089846W)
Authority
WO
WIPO (PCT)
Prior art keywords
face
face recognition
initial
recognition model
model
Prior art date
Application number
PCT/CN2021/089846
Other languages
English (en)
French (fr)
Inventor
王子路
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Publication of WO2021218899A1 publication Critical patent/WO2021218899A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Definitions

  • the present disclosure relates to the field of image processing technology, in particular to a face recognition model training method, face recognition method and device.
  • the present disclosure provides a face recognition model training method, face recognition method and device.
  • the present disclosure provides a method for training a face recognition model, including:
  • the first training neural network model and the second training neural network model are used to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.
  • the sample face image corresponds to an initial face tagging frame
  • the initial face recognition model includes: a detection model component
  • the training of the initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of the face in the image includes:
  • the trained initial face recognition model is used as the pre-trained face recognition model.
  • the detection model component includes: a first detection model component, a second detection model component, and a third detection model component;
  • the calling the detection model component to recognize the sample face image to obtain a predicted face frame includes:
  • the first size is larger than the second size
  • the second size is larger than the third size
  • the calculating the first loss value corresponding to the initial face recognition model according to the initial face tagging frame and the predicted face frame includes:
  • a third detection loss value corresponding to the initial face recognition model is calculated.
  • using a trained initial face recognition model as the pre-trained face recognition model includes:
  • the trained initial face recognition model is used as the pre-trained face recognition model.
  • the initial face recognition model further includes: a lightweight network layer;
  • the method further includes:
  • the lightweight network layer is called to recognize the facial features in the sample facial image to obtain the recognized facial features.
  • the invoking the detection model component to recognize the sample face image to obtain a predicted face frame includes:
  • invoking the detection model component to perform detection processing on the recognized face features, and determining the predicted face frame of the recognized face features in the sample face image.
  • the first training neural network model and the second training neural network model, connected in series, are embedded before each of the first detection model component, the second detection model component, and the third detection model component.
  • the initial face recognition model further includes a classification layer, and the sample face image corresponds to an initial classification result.
  • said using the first training neural network model and the second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain the target face recognition model includes:
  • the trained pre-trained face recognition model is used as the target face recognition model.
  • the first training neural network model is an adversarial network model for handling occlusion
  • the second training neural network model is an adversarial network model for handling deformation
  • the initial face recognition model further includes a lightweight network layer
  • the detection model component includes: a first detection model component, a second detection model component, and a third detection model component
  • a feature processing layer is connected between the first detection model component and the lightweight network layer, and the feature processing layer is used to process the face features in the sample face image to obtain a sample face image containing face features of the first size;
  • the second detection model component is directly connected behind the lightweight network layer
  • a dimensionality reduction processing module is connected between the lightweight network layer and the third detection model component, and the dimensionality reduction processing module is used to perform dimensionality reduction processing on the sample face image to obtain a sample face image containing face features of the third size.
  • the dimensionality reduction processing module includes: a first activation function layer, a second activation function layer, and a convolution layer;
  • the first activation function layer and the second activation function layer are connected in parallel between the lightweight network layer and the convolutional layer, and the convolutional layer is connected to the third detection model component.
  • the first loss value includes a loss value corresponding to face classification and a loss value corresponding to predicted face frame coordinates
  • the calculation formula of the first loss value is:
  • L = Σ_k [ (1/N_k^c) Σ_i l_c(p_i, g_i) + (λ/N_k^r) Σ_i I(g_i = 1) · l_r(b_i, t_i) ]
  • l_c is the loss of face classification
  • k is the index of the detection model component
  • p_i is the prediction probability of the i-th predicted face frame
  • g_i is the label value of the correct labeled data of the i-th predicted face frame
  • l_r is the regression loss of the predicted face frame
  • b_i denotes the four predicted correction values
  • t_i is the actual value of the correct labeled data; N_k^c and N_k^r are the anchor counts for the classification and regression terms of the k-th component, λ is a balancing weight, and I(·) is an indicator restricting the regression loss to positive samples.
  • the present disclosure provides a face recognition method, including:
  • the target face recognition model is obtained through training by the above-mentioned training method.
  • the present disclosure provides a face recognition device, including:
  • the present disclosure provides a computer-readable storage medium.
  • the electronic device can execute the face recognition model training method described in any one of the above, or the above face recognition method.
  • Fig. 1 shows a flow chart of the steps of a method for training a face recognition model provided by an embodiment of the present disclosure
  • Figure 2 shows a schematic diagram of an SSH network structure provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a fully connected layer provided by an embodiment of the present disclosure
  • Figure 4 shows a schematic diagram of an ASTN network provided by an embodiment of the present disclosure
  • Figure 5 shows a flow chart of the steps of a face recognition method provided by an embodiment of the present disclosure
  • Fig. 6 schematically shows a block diagram of a face recognition device for executing the method according to the present disclosure.
  • Fig. 7 schematically shows a storage unit for holding or carrying program codes for implementing the method according to the present disclosure.
  • the face recognition model training method may specifically include the following steps:
  • Step 101 Obtain a sample face image.
  • the embodiments of the present disclosure can be applied to a scene that recognizes a distorted face image in an edge area of a commercial fisheye camera.
  • the backbone detection network of the face recognition model of this embodiment may use the SSH algorithm.
  • SSH introduces different detection model components (Detection Modules) into the convolutional layers corresponding to feature maps of different scales to detect faces of different scales.
  • the network structure is a fully convolutional network structure.
  • mobilenet or shufflenet is used as the backbone network instead. Both mobilenet and shufflenet are lightweight networks, which can simplify the network structure, so that the trained model can be applied to terminal products.
  • a lightweight network layer (mobilenet or shufflenet) is used to replace conv1-1~4 and conv5-3 shown in Figure 2; the detection model component M1 is connected after a max-pooling layer added after the lightweight network layer, while the detection model component M2 is directly connected after the convolutional layers of the lightweight network layer
  • the detection model component M1 and the detection model component M2 differ by one max-pooling operation with a stride of 2
  • the max-pooling operation is used to increase the receptive field, so that M1 can detect larger faces than M2.
  • for the detection model component M3, two parallel activation function (Rectified Linear Units, ReLU) layers are added after the lightweight network layer to reduce the number of channels from the original 512 dimensions to 128 dimensions, and the size of the feature map is enlarged through a bilinear interpolation up-sampling operation; the parameters output by the two channels are then summed element-wise, passed through a 3×3 convolutional layer, and finally connected to the detection model component M3, so that through this dimensionality reduction processing M3 can detect faces of smaller sizes.
  • the sample face image refers to the face image used to train the face recognition model.
  • a face image can be randomly obtained from the Internet as a sample face image, specifically, it can be determined according to business requirements, which is not limited in the embodiment of the present disclosure.
  • step 102 is executed.
  • Step 102 Train the initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of the face in the image.
  • the initial face recognition model refers to a face recognition model that has not been trained yet.
  • the pre-trained face recognition model refers to a model that can effectively recognize the position of the face in the image after the initial face recognition model is trained and can achieve the expected effect.
  • the initial face recognition model can be trained according to the acquired sample face image to obtain a pre-trained face recognition model for detecting the position of the face in the image.
  • the specific training process can be described in detail in conjunction with the following specific implementation manners.
  • the foregoing step 102 may include: sub-step 1021, sub-step 1022, and sub-step 1023, where:
  • Sub-step 1021 Invoke the detection model component to recognize the sample face image to obtain a predicted face frame.
  • the sample face image may correspond to an initial face labeling frame.
  • the initial face labeling frame is a frame marked in advance by business personnel according to the position of the face in each sample face image; specifically, the face can be exactly enclosed by four points so as to form a square frame, that is, the initial face labeling frame.
  • the predicted face frame refers to the face frame obtained by recognizing, through the detection model component, the face features in the processed sample face image.
  • the sample face image can be input into the initial face recognition model, and the detection model component in the initial face recognition model can be called to recognize the facial features of the sample face image; a square frame enclosing the facial features, that is, the predicted face frame, is formed according to the recognition result.
  • the initial face recognition model can also include the aforementioned lightweight network layer.
  • the lightweight network layer can be called to recognize the facial features in the sample face image to obtain recognized facial features; the detection model component is then called to detect the position of the recognized facial features in the sample face image, and the predicted face frame of the recognized facial features in the sample face image is determined from that position.
  • the detection model component in this embodiment can be divided into three branches: the first detection model component M1, the second detection model component M2, and the third detection model component M3. Combining the three branches, the detection process is described as follows.
  • the aforementioned step 1021 may include: sub-step 10211, sub-step 10212, and sub-step 10213, wherein:
  • Sub-step 10211 Invoke the first detection model component to recognize the processed sample face image containing the facial features of the first size, to obtain a first predicted face frame;
  • Sub-step 10212 Invoke the second detection model component to recognize the processed sample face image containing the face features of the second size, to obtain a second predicted face frame;
  • Sub-step 10213 Invoke the third detection model component to recognize the processed sample face image containing the face feature of the third size, and obtain a third predicted face frame.
  • the first predicted face frame refers to a predicted face frame obtained by using the first detection model component to recognize the facial features in the sample face image.
  • the second predicted face frame refers to a predicted face frame obtained by using the second detection model component to recognize the face features in the sample face image.
  • the third predicted face frame refers to a predicted face frame obtained by using the third detection model component to recognize the facial features in the sample face image.
  • the first detection model component is the detection model component M1 mentioned in step 101 above.
  • a feature processing layer (max-pooling layer) is added before the detection model component M1, and the facial features in the sample face image are processed through max pooling to obtain a sample face image containing facial features of the first size; this increases the receptive field of the facial features, so that the detection model component M1 can detect facial features of the first size.
  • the first detection model component is used to recognize the processed sample face image containing the facial features of the first size, and the first predicted face frame can be obtained.
  • the second detection model component is the detection model component M2 mentioned in step 101 above. Compared with the detection model component M1, the detection model component M2 is directly connected after the lightweight network layer; therefore, the detection model component M2 can detect face features of the second size, which is smaller than the first size. By recognizing the processed sample face image containing face features of the second size with the second detection model component, the second predicted face frame can be obtained.
  • the third detection model component is the detection model component M3 mentioned in step 101 above.
  • two parallel activation function (Rectified Linear Units, ReLU) layers are added to reduce the number of channels from the original 512 dimensions to 128 dimensions, and the size of the feature map is increased through the bilinear interpolation up-sampling operation; the parameters output by the two channels are then summed element-wise and passed through a 3×3 convolutional layer, with the dimensionality reduction processing yielding a sample face image containing facial features of the third size.
  • this is finally connected to the detection model component M3, so that M3 can detect facial features of the third size, smaller than the second size.
  • the third detection model component is used to recognize the processed sample face image containing the face feature of the third size, and the third predicted face frame can be obtained.
  • three detection model components are used to respectively recognize the face features in the sample face image, so that the simultaneous training of the three detection model components can be realized to meet the detection of face features of different sizes.
  • sub-step 1022 is executed.
  • Sub-step 1022 According to the initial face labeling frame and the predicted face frame, a first loss value corresponding to the initial face recognition model is calculated.
  • the first loss value refers to the loss value corresponding to the calculated initial face recognition model.
  • the first loss value may indicate the degree of deviation between the predicted face frame of the sample face image and the initial face label frame.
  • the initial face tagging frame and the predicted face frame can be combined to calculate the first loss value corresponding to the initial face recognition model.
  • the first loss value may include two parts: the loss values corresponding to the two tasks of face classification and predicted face frame coordinate regression, as shown in the following formula (1):
  • L = Σ_k [ (1/N_k^c) Σ_i l_c(p_i, g_i) + (λ/N_k^r) Σ_i I(g_i = 1) · l_r(b_i, t_i) ]    (1)
  • l_c represents the loss of face classification
  • k represents the index of the detection model component (i.e. the indices corresponding to the three detection model components)
  • p_i represents the prediction probability of the i-th predicted face frame
  • g_i represents the label value of the correct labeled data t (ground truth) of the i-th predicted face frame (1 when the overlap (Intersection over Union, IoU) > 0.5, 0 when IoU < 0.5); a negative sample is defined as a detection frame whose IoU with the ground truth of any face is < 0.3
  • l_r represents the regression loss of the predicted face frame
  • the regression variables are the log-transformed values of the scaling and translation of the detection frame
  • b_i represents the four predicted correction values, and t_i represents the actual ground-truth values; N_k^c and N_k^r are the anchor counts for the classification and regression terms of the k-th component, λ balances the two terms, and I(·) restricts the regression loss to positive samples.
  • the initial face recognition model can include three detection model components.
  • each detection model component will output a loss value.
  • the above sub-step 1022 can Including: sub-step 10221, sub-step 10222 and sub-step 10223, where:
  • Sub-step 10221 calculating a first detection loss value corresponding to the initial face recognition model according to the initial face tagging frame and the first predicted face frame;
  • Sub-step 10222 calculating a second detection loss value corresponding to the initial face recognition model according to the initial face tagging frame and the second predicted face frame;
  • Sub-step 10223 According to the initial face labeling frame and the third predicted face frame, a third detection loss value corresponding to the initial face recognition model is calculated.
  • after the sample face image passes through the three detection model components, three predicted face frames can be obtained respectively, namely the first predicted face frame, the second predicted face frame, and the third predicted face frame.
  • the three loss values corresponding to the initial face recognition model, that is, the first detection loss value, the second detection loss value, and the third detection loss value, can then be calculated respectively.
  • the first detection loss value can be calculated according to the initial face labeling frame and the first predicted face frame
  • the second detection loss value can be calculated according to the initial face labeling frame and the second predicted face frame.
  • the third detection loss value can be calculated according to the initial face labeling frame and the third predicted face frame.
  • the three loss values correspond to the first detection model component, the second detection model component, and the third detection model component, respectively.
  • after calculating the first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame, sub-step 1023 is executed.
  • Sub-step 1023 When the first loss value reaches the first initial value, use the trained initial face recognition model as the pre-trained face recognition model.
  • the first initial value refers to a standard preset by the business personnel for judging the training level of the initial face recognition model.
  • when the first loss value has not reached the first initial value, the training of the initial face recognition model has not reached the desired effect; it can be considered that the deviation between the predicted face frame and the corresponding initial face labeling frame in each sample face image is large. At this time, the number of sample face images can be increased, and the initial face recognition model can continue to be trained.
  • when the first loss value reaches the first initial value, the initial face recognition model can be used as the pre-trained face recognition model, and the pre-trained face recognition model can then be used to detect the position of the face in subsequent face images.
  • the sample face image passes through three detection model components during the training process and outputs three loss values, namely the first detection loss value, the second detection loss value, and the third detection loss value.
  • all three detection loss values need to reach the first initial value before the training process of the initial face recognition model is considered to be over; if even one detection loss value has not reached the first initial value, the training of the initial face recognition model is considered not yet finished, and the initial face recognition model needs to continue to be trained with sample face images.
  • the pre-trained face recognition model obtained by training can form the initial face feature location capability, that is, locate the position of the face feature in the image.
  • Step 103 Use the first training neural network model and the second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.
  • the first training neural network model is an adversarial network model that handles occlusion (ASDN, Adversarial Spatial Dropout Network)
  • the second training neural network model is an adversarial network model that handles deformation (ASTN, Adversarial Spatial Transformer Network).
  • the ASDN network contains two fully connected layers, as shown in Figure 3. It learns the effects of occlusion and lighting shadows on features during the training process. In forward propagation, the two fully connected layers form a dropout mask that drops features and reduces the weights of important features, thereby training a stronger face recognition model.
  • the ASTN network may also include: localization network, grid generator, and sampler (as shown in Figure 4).
  • the ASTN network will cause features to be rotated and distorted, making it more difficult to recognize. During the training process, it helps the detection network to enhance the performance of recognizing distorted human faces, so that the method of this embodiment is more suitable for face recognition by fisheye cameras.
  • a series-connected ASDN and ASTN pair is embedded before each of the first detection model component, the second detection model component, and the third detection model component. These two adversarial network models can be used to train the pre-trained face recognition model so as to learn the influence of occlusion and deformation on the detection results, improving the detection efficiency for distorted faces and the detection accuracy for occluded faces.
  • ASDN and ASTN can be used to perform secondary training on the pre-trained face recognition model; after the training is completed, the final target face recognition model capable of recognizing distorted and occluded faces is obtained.
  • the above step 103 may include: sub-step 1031, sub-step 1032, sub-step 1033, sub-step 1034, sub-step 1035, and sub-step 1036, where:
  • Sub-step 1031 Invoke the first training neural network model to perform occlusion processing on the face features in the sample face image, and generate occluded face features.
  • the first training neural network can perform occlusion processing on facial features.
  • occluded facial features refer to the facial features obtained by occluding the facial features in the sample face image.
  • the first training neural network model can be called to perform occlusion processing on the face features in the sample face image to generate occluded face features.
  • assuming the feature map input to the ASDN has size d×d×c, a sliding window of size (d/3)×(d/3) can be used; for each sliding window, all values of the corresponding channels can be discarded to generate a new feature vector, which is then sent to the subsequent detection network responsible for face classification and predicted face frame regression to compute the loss. Among the losses of all these sliding windows, the highest loss is selected.
  • for N training samples, N training sample pairs {(X_1, M_1), ..., (X_N, M_N)} can be generated.
  • during the first 10k ASDN training iterations, the binary cross-entropy loss function can be used, as shown in the following formula (2):
  • L = −(1/N) Σ_p Σ_{i,j} [ M^p_{i,j} · log A_{i,j}(X^p) + (1 − M^p_{i,j}) · log(1 − A_{i,j}(X^p)) ]
  • L represents the binary cross-entropy loss function, A_{i,j}(X^p) represents the output of the ASDN at position (i, j) for the input feature X^p, and M^p_{i,j} represents the value at position (i, j) of the mask M^p from the training pair (X^p, M^p).
  • Sub-step 1032 Invoke the second training neural network model to perform deformation processing on the occluded face feature to generate a deformed face feature.
  • the second training neural network model connected in series can be called to deform the occluded facial features, thereby generating deformed facial features.
  • Sub-step 1033 Invoke the classification layer to recognize the deformed face feature, and determine the predicted classification result of the deformed face feature.
  • the initial face recognition model may also include a classification layer, and the classification layer may determine the classification result of the facial features.
  • the background feature of the image may be recognized as a face feature, and a classification probability can be generated through the classification layer, and whether the recognized feature is a face feature can be determined by the classification probability.
  • the predicted classification result refers to the classification of deformed facial features predicted by the classification layer.
  • the classification layer can be called to recognize the deformed facial features to determine the predicted classification result of the deformed facial features.
  • the training of the ASTN network can adopt the transfer learning method.
  • the training process is similar to the ASDN network, and in the back propagation process, only the localization net variables are changed.
  • Sub-step 1034 Invoke the detection model component to recognize the deformed face feature, and obtain the predicted face frame.
  • the detection model component connected after the second neural network model can be called to recognize the deformed facial features so as to obtain the predicted face frame corresponding to the deformed facial features.
  • Sub-step 1035 According to the initial classification result, the initial face labeling frame, the predicted classification result, and the predicted face frame, a second loss value of the pre-trained face recognition model is calculated.
  • the second loss value refers to the loss value of the pre-trained face recognition model obtained during the secondary training of the pre-trained face recognition model through the first training neural network model and the second training neural network model.
  • the initial classification result, initial face labeling frame, predicted face frame, and predicted classification result can be combined to calculate the second loss value of the pre-trained face recognition model .
  • the loss of the ASDN and ASTN generation networks is designed as:
  • L_A = 1 − sigmoid(l_c(A(X), C))
  • L_A is the joint loss function of ASDN and ASTN, that is, the second loss value
  • l_c is the face classification loss
  • sigmoid is a neural network threshold function that can map a variable into the range between 0 and 1.
  • sub-step 1036 is executed.
  • Sub-step 1036 When the second loss value reaches the second initial value, use the trained pre-trained face recognition model as the target face recognition model.
  • the second initial value refers to a standard preset by the business personnel for judging the training degree of the pre-trained face recognition model.
  • when the second loss value does not reach the second initial value, it means that the training of the pre-trained face recognition model has not reached the expected effect; at this time, the number of sample face images can be increased, and the pre-trained face recognition model can continue to be trained.
  • when the second loss value reaches the second initial value, it means that the training of the pre-trained face recognition model has achieved the expected effect.
  • the pre-trained face recognition model after training is able to accurately recognize face features containing occlusion and deformation.
  • the trained pre-trained face recognition model can be used as the target face recognition model.
  • the first training neural network model and the second training neural network model can be removed to obtain the target face recognition model.
  • the second training of the face recognition model by the first training neural network model and the second training neural network model can improve the accuracy of the face recognition model for occluded and deformed face recognition.
  • the face recognition model training method obtains a sample face image, and trains an initial face recognition model based on the sample face image to obtain a pre-trained face recognition model for detecting the position of the face in the image.
  • the first training neural network and the second training neural network are then used to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain the target face recognition model.
  • the embodiment of the present disclosure adds adversarial networks (that is, the first training neural network model and the second training neural network model) to the training process of the face recognition model to learn the influence of occlusion and deformation on the detection results, improving the detection efficiency for distorted faces in the edge regions of commercial fisheye cameras and the detection accuracy for occluded faces.
  • the face recognition method may specifically include the following steps:
  • Step 201 Obtain a face image to be recognized.
  • the embodiments of the present disclosure can be applied to a scene of fuzzy face recognition in a face image.
  • the facial image to be recognized refers to an image used for facial feature recognition.
  • the facial image to be recognized may contain fuzzy facial features, such as occluded facial features, distorted facial features, and so on.
  • the face image to be recognized may not include fuzzy facial features, such as occluded facial features, distorted facial features, etc. Specifically, it can be determined according to actual conditions, and this embodiment does not impose restrictions on this.
  • after acquiring the face image to be recognized, step 202 is executed.
  • Step 202 Input the face image to be recognized into the target face recognition model and output the face recognition result.
  • the target face recognition model refers to the face recognition model obtained by training using the face recognition model training method of the foregoing embodiment.
  • the face recognition result refers to the recognized facial features of the face in the image. Understandably, facial features are unique to each person, and different people can be tracked and monitored through their facial features.
  • the face recognition result corresponding to the face image to be recognized can be output through the target face recognition model.
  • since the target face recognition model in this embodiment is trained using the face recognition model training method of the above-mentioned embodiment, it can perform feature recognition not only on face features that contain no occlusion or deformation, but also on face features containing occlusion and/or deformation.
  • the face image to be recognized is acquired and input into the target face recognition model to output the face recognition result. Since the target face recognition model used in this embodiment is trained with the face recognition model training method of the above embodiment, recognition of occluded and deformed face features can be achieved; furthermore, the detection efficiency for distorted faces in the edge regions of commercial fisheye cameras and the detection accuracy for occluded faces can be improved.
  • an embodiment of the present disclosure also provides a face recognition device, including: a processor, a memory, and a computer program stored on the memory and running on the processor, and the processor executes the The above-mentioned face recognition model training method or the above-mentioned face recognition method is implemented in the program.
  • the present disclosure also provides a computer-readable storage medium.
  • the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the aforementioned face recognition model training method or the aforementioned face recognition method.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in One place, or it can be distributed to multiple network units.
  • Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement it without creative work.
  • the various component embodiments of the present disclosure may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the face recognition device according to the embodiments of the present disclosure.
  • DSP digital signal processor
  • the present disclosure can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for realizing the present disclosure may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • FIG. 6 shows a face recognition device that can implement the method according to the present disclosure.
  • the face recognition device traditionally includes a processor 1010 and a computer program product in the form of a memory 1020 or a computer readable medium.
  • the memory 1020 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 1020 has a storage space 1030 for executing program codes 1031 of any method steps in the above methods.
  • the storage space 1030 for program codes may include various program codes 1031 respectively used to implement various steps in the above method. These program codes can be read from or written into one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks.
  • Such a computer program product is usually a portable or fixed storage unit as described with reference to FIG. 7.
  • the storage unit may have a storage section, storage space, etc. arranged similarly to the memory 1020 in the face recognition device of FIG. 6.
  • the program code can be compressed in an appropriate form, for example.
  • the storage unit includes computer-readable codes 1031', that is, codes that can be read by, for example, a processor such as 1010. These codes, when run by a face recognition device, cause the face recognition device to perform the above-described The steps in the method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face recognition model training method, a face recognition method and a device, relating to the field of image processing technology. The face recognition model training method includes: acquiring a sample face image; training an initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image; and using a first training neural network model and a second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.

Description

Face recognition model training method, face recognition method and device
Cross-Reference to Related Applications
The present disclosure claims priority to the Chinese patent application No. 202010366448.0, entitled "Face recognition model training method, face recognition method and device", filed with the Chinese Patent Office on April 30, 2020, the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the field of image processing technology, and in particular to a face recognition model training method, a face recognition method and a device.
Background
With the development of applications such as e-commerce, face recognition has become one of the most promising means of biometric identity verification. This application background requires that automatic face recognition systems have a certain recognition capability for general images, which has driven the development of face detection technology. Subsequently, the emergence of pyramid multi-level detection networks (such as PyramidBox) brought face detection to a very high level of performance.
Summary
The present disclosure provides a face recognition model training method, a face recognition method and a device.
The present disclosure provides a face recognition model training method, including:
acquiring a sample face image;
training an initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image;
using a first training neural network model and a second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.
Optionally, the sample face image corresponds to an initial face labeling frame, and the initial face recognition model includes: a detection model component;
the training an initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image includes:
invoking the detection model component to recognize the sample face image to obtain a predicted face frame;
calculating a first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame;
in a case where the first loss value reaches a first initial value, using the trained initial face recognition model as the pre-trained face recognition model.
Optionally, the detection model component includes: a first detection model component, a second detection model component and a third detection model component;
the invoking the detection model component to recognize the sample face image to obtain a predicted face frame includes:
invoking the first detection model component to recognize a processed sample face image containing face features of a first size to obtain a first predicted face frame;
invoking the second detection model component to recognize a processed sample face image containing face features of a second size to obtain a second predicted face frame;
invoking the third detection model component to recognize a processed sample face image containing face features of a third size to obtain a third predicted face frame;
wherein the first size is larger than the second size, and the second size is larger than the third size.
Optionally, the calculating a first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame includes:
calculating a first detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the first predicted face frame;
calculating a second detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the second predicted face frame;
calculating a third detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the third predicted face frame.
Optionally, the using the trained initial face recognition model as the pre-trained face recognition model in a case where the first loss value reaches a first initial value includes:
in a case where the first detection loss value, the second detection loss value and the third detection loss value all reach the first initial value, using the trained initial face recognition model as the pre-trained face recognition model.
Optionally, the initial face recognition model further includes: a lightweight network layer;
before the invoking the detection model component to recognize the sample face image to obtain a predicted face frame, the method further includes:
invoking the lightweight network layer to recognize face features in the sample face image to obtain recognized face features.
Optionally, the invoking the detection model component to recognize the sample face image to obtain a predicted face frame includes:
invoking the detection model component to perform detection processing on the recognized face features to determine a predicted face frame of the recognized face features in the sample face image.
Optionally, the first training neural network model and the second training neural network model, connected in series, are embedded before each of the first detection model component, the second detection model component and the third detection model component.
Optionally, the initial face recognition model further includes: a classification layer, and the sample face image corresponds to an initial classification result.
Optionally, the using a first training neural network model and a second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model includes:
invoking the first training neural network model to perform occlusion processing on face features in the sample face image to generate occluded face features;
invoking the second training neural network model to perform deformation processing on the occluded face features to generate deformed face features;
invoking the classification layer to recognize the deformed face features to determine a predicted classification result of the deformed face features;
invoking the detection model component to recognize the deformed face features to obtain the predicted face frame;
calculating a second loss value of the pre-trained face recognition model according to the initial classification result, the initial face labeling frame, the predicted classification result and the predicted face frame;
in a case where the second loss value reaches a second initial value, using the trained pre-trained face recognition model as the target face recognition model.
Optionally, the first training neural network model is an adversarial network model for handling occlusion, and the second training neural network model is an adversarial network model for handling deformation.
Optionally, the initial face recognition model further includes a lightweight network layer, and the detection model component includes: a first detection model component, a second detection model component and a third detection model component;
a feature processing layer is connected between the first detection model component and the lightweight network layer, and the feature processing layer is used to process face features in the sample face image to obtain a sample face image containing face features of the first size;
the second detection model component is directly connected after the lightweight network layer;
a dimensionality reduction processing module is connected between the lightweight network layer and the third detection model component, and the dimensionality reduction processing module is used to perform dimensionality reduction processing on the sample face image to obtain a sample face image containing face features of the third size.
Optionally, the dimensionality reduction processing module includes: a first activation function layer, a second activation function layer and a convolution layer;
the first activation function layer and the second activation function layer are connected in parallel between the lightweight network layer and the convolution layer, and the convolution layer is connected to the third detection model component.
Optionally, the first loss value includes a loss value corresponding to face classification and a loss value corresponding to predicted face frame coordinates;
the first loss value is calculated as:
L = Σ_k [ (1/N_k^c) Σ_i l_c(p_i, g_i) + (λ/N_k^r) Σ_i I(g_i = 1) · l_r(b_i, t_i) ]
where l_c is the loss of face classification, k is the index of the detection model component, p_i is the prediction probability of the i-th predicted face frame, g_i is the label value of the correct labeled data of the i-th predicted face frame, l_r is the regression loss of the predicted face frame, b_i denotes the four predicted correction values, t_i is the actual value of the correct labeled data, N_k^c and N_k^r are the anchor counts for the classification and regression terms of the k-th component, λ balances the two terms, and I(·) restricts the regression loss to positive samples.
The present disclosure provides a face recognition method, including:
acquiring a face image to be recognized;
inputting the face image to be recognized into a target face recognition model to output a face recognition result;
wherein the target face recognition model is obtained through training with the above training method.
The present disclosure provides a face recognition device, including:
a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the face recognition model training method described in any one of the above, or the above face recognition method.
The present disclosure provides a computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the face recognition model training method described in any one of the above, or the above face recognition method.
The above description is only an overview of the technical solutions of the present disclosure. In order that the technical means of the present disclosure may be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present disclosure more apparent and understandable, specific embodiments of the present disclosure are set forth below.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 shows a flow chart of the steps of a face recognition model training method provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of an SSH network structure provided by an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a fully connected layer provided by an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of an ASTN network provided by an embodiment of the present disclosure;
Fig. 5 shows a flow chart of the steps of a face recognition method provided by an embodiment of the present disclosure;
Fig. 6 schematically shows a block diagram of a face recognition device for executing the method according to the present disclosure; and
Fig. 7 schematically shows a storage unit for holding or carrying program codes for implementing the method according to the present disclosure.
Detailed Description
In order to make the above objects, features and advantages of the present disclosure more apparent and understandable, the present disclosure is further described in detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of a face recognition model training method provided by an embodiment of the present disclosure is shown. The face recognition model training method may specifically include the following steps:
Step 101: Obtain a sample face image.
The embodiments of the present disclosure can be applied to scenarios of recognizing distorted face images in the edge regions of commercial fisheye cameras.
First, the training process of the face recognition model can be described with reference to this embodiment.
The backbone detection network of the face recognition model of this embodiment may adopt the SSH algorithm. SSH essentially introduces different detection model components (Detection Modules) into the convolutional layers corresponding to feature maps of different scales so as to detect faces of different scales. The network structure is a fully convolutional network structure. In this embodiment, mobilenet or shufflenet is used instead as the backbone network; both mobilenet and shufflenet are lightweight networks, which can simplify the network structure so that the trained model can be applied in terminal products.
Next, the architecture of the SSH network is described below with reference to Fig. 2.
In the SSH network structure shown in Fig. 2, this embodiment replaces conv1-1~4 and conv5-3 shown in Fig. 2 with a lightweight network layer (mobilenet or shufflenet). The detection model component M1 is connected after a max-pooling layer added after the lightweight network layer, while the detection model component M2 is directly connected after the convolutional layers of the lightweight network layer; the detection model component M1 and the detection model component M2 differ by one max-pooling operation with a stride of 2. The max-pooling operation is used to increase the receptive field, so that M1 can detect larger faces than M2.
As for the detection model component M3, two parallel activation function (Rectified Linear Units, ReLU) layers are added after the lightweight network layer to reduce the number of channels from the original 512 dimensions to 128 dimensions, and the size of the feature map is enlarged through a bilinear interpolation up-sampling operation; the parameters output by the two channels are then summed element-wise and passed through a 3×3 convolutional layer, which is finally connected to the detection model component M3. Through this dimensionality reduction processing, M3 can detect faces of smaller sizes.
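As a rough illustration of the branch layout just described, the following PyTorch sketch wires three SSH-style detection modules to a lightweight backbone feature map. The class names, the 512-channel input, the internals of `DetectionModule` and the exact wiring of the two parallel reduction branches are assumptions for illustration; the patent's Fig. 2, not this sketch, defines the actual structure.

```python
import torch.nn as nn
import torch.nn.functional as F

class DetectionModule(nn.Module):
    """Stand-in for an SSH-style detection module: one shared 3x3 conv,
    then per-anchor classification (2 scores) and box regression (4 offsets)."""
    def __init__(self, in_ch, num_anchors=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.cls = nn.Conv2d(in_ch, num_anchors * 2, 1)
        self.reg = nn.Conv2d(in_ch, num_anchors * 4, 1)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return self.cls(x), self.reg(x)

class ThreeBranchHead(nn.Module):
    """M1: max-pooling (stride 2) after the backbone -> larger receptive field, larger faces.
    M2: directly on the backbone feature map.
    M3: two parallel 512->128 reductions with ReLU, bilinear up-sampling,
        element-wise sum and a 3x3 conv -> smaller faces."""
    def __init__(self, backbone_ch=512):
        super().__init__()
        self.pool = nn.MaxPool2d(2, stride=2)            # inserted before M1
        self.m1 = DetectionModule(backbone_ch)
        self.m2 = DetectionModule(backbone_ch)
        self.reduce_a = nn.Sequential(nn.Conv2d(backbone_ch, 128, 1), nn.ReLU())
        self.reduce_b = nn.Sequential(nn.Conv2d(backbone_ch, 128, 1), nn.ReLU())
        self.merge = nn.Conv2d(128, 128, 3, padding=1)   # 3x3 conv before M3
        self.m3 = DetectionModule(128)

    def forward(self, feat):                             # feat: backbone output
        out1 = self.m1(self.pool(feat))
        out2 = self.m2(feat)
        up = lambda t: F.interpolate(t, scale_factor=2, mode='bilinear',
                                     align_corners=False)
        out3 = self.m3(self.merge(up(self.reduce_a(feat)) + up(self.reduce_b(feat))))
        return out1, out2, out3
```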
The sample face image refers to a face image used to train the face recognition model.
In a specific implementation, face images may be randomly obtained from the Internet as sample face images; specifically, this may be determined according to business requirements, which is not limited in the embodiments of the present disclosure.
After the sample face image is obtained, step 102 is executed.
Step 102: Train the initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of the face in an image.
The initial face recognition model refers to a face recognition model that has not yet been trained.
The pre-trained face recognition model refers to a model that, after the initial face recognition model has been trained, achieves the expected effect and can effectively recognize the position of the face in an image.
After the sample face image is obtained, the initial face recognition model can be trained according to the obtained sample face image to obtain a pre-trained face recognition model for detecting the position of the face in an image. The specific training process can be described in detail with reference to the following specific implementations.
In a specific implementation of the present disclosure, the above step 102 may include: sub-step 1021, sub-step 1022 and sub-step 1023, where:
Sub-step 1021: Invoke the detection model component to recognize the sample face image to obtain a predicted face frame.
In this embodiment, the sample face image may correspond to an initial face labeling frame. The initial face labeling frame is a frame marked in advance by business personnel according to the position of the face in each sample face image; specifically, the face can be exactly enclosed by four points so as to form a square frame, that is, the initial face labeling frame.
The predicted face frame refers to a face frame obtained by recognizing, through the detection model component, the face features in the sample face image.
After the sample face image is obtained, the sample face image can be input into the initial face recognition model, and the detection model component in the initial face recognition model can be invoked to recognize the face features of the sample face image; a square frame enclosing the face features, that is, the predicted face frame, is formed according to the recognition result.
Of course, in this embodiment, the initial face recognition model may further include the above-mentioned lightweight network layer. Before the detection model component is invoked for face feature recognition, the lightweight network layer can be invoked to recognize the face features in the sample face image to obtain recognized face features; the detection model component is then invoked to perform detection on the recognized face features so as to detect the position of the recognized face features in the sample face image, and the predicted face frame of the recognized face features in the sample face image is determined from that position.
Understandably, with reference to the description in step 101 above, the detection model component in this embodiment can be divided into three branches: the first detection model component M1, the second detection model component M2 and the third detection model component M3. The detection process is described below with reference to the three branches.
In a specific implementation of the present disclosure, the above sub-step 1021 may include: sub-step 10211, sub-step 10212 and sub-step 10213, where:
Sub-step 10211: Invoke the first detection model component to recognize the processed sample face image containing face features of the first size to obtain a first predicted face frame;
Sub-step 10212: Invoke the second detection model component to recognize the processed sample face image containing face features of the second size to obtain a second predicted face frame;
Sub-step 10213: Invoke the third detection model component to recognize the processed sample face image containing face features of the third size to obtain a third predicted face frame.
In this embodiment, the first predicted face frame refers to a predicted face frame obtained by recognizing the face features in the sample face image with the first detection model component.
The second predicted face frame refers to a predicted face frame obtained by recognizing the face features in the sample face image with the second detection model component.
The third predicted face frame refers to a predicted face frame obtained by recognizing the face features in the sample face image with the third detection model component.
The first detection model component is the detection model component M1 mentioned in step 101 above. A feature processing layer (max-pooling layer) is added before the detection model component M1, and the face features in the sample face image are processed through max pooling to obtain a sample face image containing face features of the first size; this increases the receptive field of the face features, so that the detection model component M1 can detect face features of the first size. By recognizing the processed sample face image containing face features of the first size with the first detection model component, the first predicted face frame can be obtained.
The second detection model component is the detection model component M2 mentioned in step 101 above. Compared with the detection model component M1, the detection model component M2 is directly connected after the lightweight network layer; therefore, the detection model component M2 can detect face features of the second size, which is smaller than the first size. By recognizing the processed sample face image containing face features of the second size with the second detection model component, the second predicted face frame can be obtained.
The third detection model component is the detection model component M3 mentioned in step 101 above. Two parallel activation function (Rectified Linear Units, ReLU) layers are added after the lightweight network layer to reduce the number of channels from the original 512 dimensions to 128 dimensions, and the size of the feature map is enlarged through the bilinear interpolation up-sampling operation; the parameters output by the two channels are then summed element-wise and passed through a 3×3 convolutional layer, with the dimensionality reduction processing yielding a sample face image containing face features of the third size, which is finally connected to the detection model component M3, so that M3 can detect face features of the third size, smaller than the second size. By recognizing the processed sample face image containing face features of the third size with the third detection model component, the third predicted face frame can be obtained.
In this embodiment, the three detection model components respectively recognize the face features in the sample face image, so that simultaneous training of the three detection model components can be achieved to satisfy the detection of face features of different sizes.
After the detection model component is invoked to recognize the sample face image to obtain the predicted face frame, sub-step 1022 is executed.
Sub-step 1022: Calculate a first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame.
The first loss value refers to the calculated loss value corresponding to the initial face recognition model. The first loss value can indicate the degree of deviation between the predicted face frame of the sample face image and the initial face labeling frame.
After the predicted face frame is obtained, the first loss value corresponding to the initial face recognition model can be calculated by combining the initial face labeling frame and the predicted face frame.
In this embodiment, the first loss value may include two parts: the loss values respectively corresponding to the two tasks of face classification and predicted face frame coordinate regression, as shown in the following formula (1):
L = Σ_k [ (1/N_k^c) Σ_i l_c(p_i, g_i) + (λ/N_k^r) Σ_i I(g_i = 1) · l_r(b_i, t_i) ]    (1)
In the above formula (1), l_c denotes the loss of face classification, k denotes the index of the detection model component (i.e., the indices corresponding to the three detection model components), p_i denotes the prediction probability of the i-th predicted face frame, and g_i denotes the label value of the correct labeled data t (ground truth) of the i-th predicted face frame (1 when the overlap (Intersection over Union, IoU) > 0.5, and 0 when IoU < 0.5); a negative sample is defined as a detection frame whose IoU with the ground truth of any face is < 0.3. l_r denotes the regression loss of the predicted face frame, and the regression variables are the log-transformed values of the scaling and translation of the detection frame. b_i denotes the four predicted correction values, and t_i denotes the actual ground-truth values. N_k^c and N_k^r denote the numbers of anchors entering the classification and regression terms of the k-th detection model component, λ balances the two terms, and I(·) is an indicator function restricting the regression loss to positive samples.
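The following sketch shows how a loss of the shape of formula (1) could be computed per detection model component, with cross-entropy standing in for l_c and smooth L1 for l_r; these concrete loss choices, the anchor bookkeeping and all names are assumptions, not the patent's specification.

```python
import torch
import torch.nn.functional as F

def first_loss(branch_outputs, branch_targets, lam=1.0):
    # branch_outputs: per detection model component k, a pair
    #   (cls_logits: (N_k, 2), box_deltas: (N_k, 4)) over its anchors.
    # branch_targets: per component, (labels, box_targets) with labels g_i in
    #   {1: positive, 0: negative, -1: ignored (0.3 <= IoU <= 0.5)}.
    total = 0.0
    for (cls_logits, box_deltas), (labels, box_targets) in zip(branch_outputs,
                                                               branch_targets):
        keep = labels >= 0
        # l_c: the 'mean' reduction plays the role of the 1/N_k^c normalization
        cls_loss = F.cross_entropy(cls_logits[keep], labels[keep])
        pos = labels == 1                        # I(g_i = 1): regress positives only
        if pos.any():
            reg_loss = F.smooth_l1_loss(box_deltas[pos], box_targets[pos])  # l_r
        else:
            reg_loss = box_deltas.sum() * 0.0    # keep the graph, contribute nothing
        total = total + cls_loss + lam * reg_loss
    return total
```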
Understandably, as mentioned in the above process, the initial face recognition model may include three detection model components, and in the actual training process each detection model component outputs a loss value. Specifically, the above sub-step 1022 may include: sub-step 10221, sub-step 10222 and sub-step 10223, where:
Sub-step 10221: Calculate a first detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the first predicted face frame;
Sub-step 10222: Calculate a second detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the second predicted face frame;
Sub-step 10223: Calculate a third detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the third predicted face frame.
In this embodiment, after the sample face image passes through the three detection model components, three predicted face frames can be obtained respectively, namely the first predicted face frame, the second predicted face frame and the third predicted face frame. Then, the three loss values corresponding to the initial face recognition model, namely the first detection loss value, the second detection loss value and the third detection loss value, can be calculated respectively from the initial face labeling frame, the first predicted face frame, the second predicted face frame and the third predicted face frame in combination with the above formula (1).
Specifically, the first detection loss value can be calculated according to the initial face labeling frame and the first predicted face frame, the second detection loss value can be calculated according to the initial face labeling frame and the second predicted face frame, and the third detection loss value can be calculated according to the initial face labeling frame and the third predicted face frame. These three loss values correspond to the first detection model component, the second detection model component and the third detection model component, respectively.
After the first loss value corresponding to the initial face recognition model is calculated according to the initial face labeling frame and the predicted face frame, sub-step 1023 is executed.
Sub-step 1023: In a case where the first loss value reaches the first initial value, use the trained initial face recognition model as the pre-trained face recognition model.
The first initial value refers to a standard preset by business personnel for judging the degree of training of the initial face recognition model.
When the first loss value has not reached the first initial value, it indicates that the training of the initial face recognition model has not yet achieved the desired effect; it can be considered that the deviation between the predicted face frame and the corresponding initial face labeling frame in each sample face image is large. At this time, the number of sample face images can be increased, and the initial face recognition model can continue to be trained.
When the first loss value reaches the first initial value, it indicates that the training of the initial face recognition model has achieved the expected effect; at this time, it can be considered that the deviation between the predicted face frame in each sample face image and the corresponding initial face labeling frame is very small, and that the trained initial face recognition model can accurately detect the positions of face features in face images. Accordingly, the initial face recognition model can be used as the pre-trained face recognition model, which can then perform subsequent detection of face positions in face images.
Understandably, since the sample face image passes through three detection model components during training and three loss values are output, namely the first detection loss value, the second detection loss value and the third detection loss value, in the specific training process all three detection loss values need to reach the first initial value before the training process of the initial face recognition model can be considered finished; if even one detection loss value has not reached the first initial value, the training of the initial face recognition model is considered not yet finished, and the initial face recognition model needs to continue to be trained with sample face images.
In the pre-training process, a set number of training iterations (e.g., 10k cycles) can be configured; the first training neural network model and the second training neural network model are not added during pre-training. This process is intended to allow the pre-trained face recognition model obtained by training to form an initial face feature localization capability, that is, to locate the positions of face features in images.
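A minimal sketch of this pre-training stage might look as follows; `branch_loss` is a hypothetical helper applying formula (1) to one detection model component, and the "first initial value" is modeled as a simple loss threshold. All names are illustrative.

```python
from itertools import cycle

def pretrain(model, loader, optimizer, first_initial_value, max_steps=10_000):
    # No ASDN/ASTN here: pre-training only builds the initial face localization ability.
    for step, (images, gt_boxes, gt_labels) in enumerate(cycle(loader)):
        outputs = model(images)                  # (M1, M2, M3) predictions
        losses = [branch_loss(o, gt_boxes, gt_labels) for o in outputs]
        optimizer.zero_grad()
        sum(losses).backward()
        optimizer.step()
        # training ends only when all three detection losses reach the threshold
        if all(l.item() <= first_initial_value for l in losses) or step >= max_steps:
            break
    return model
```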
Step 103: Use the first training neural network model and the second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.
In this embodiment, the first training neural network model is an adversarial network model that handles occlusion (ASDN, Adversarial Spatial Dropout Network), and the second training neural network model is an adversarial network model that handles deformation (ASTN, Adversarial Spatial Transformer Network). The ASDN network contains two fully connected layers, as shown in Fig. 3; during training it learns the effects of occlusion and lighting shadows on features. In forward propagation, the two fully connected layers form a dropout mask that drops features and reduces the weights of important features, thereby training a stronger face recognition model. The ASTN network may further include: a localization network, a grid generator and a sampler (as shown in Fig. 4). The ASTN network rotates and distorts features so that they become harder to recognize; during training this helps the detection network enhance its performance in recognizing distorted faces, making the method of this embodiment more suitable for face recognition with fisheye cameras.
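A spatial-transformer-style adversary of the kind ascribed to the ASTN could be sketched as below; the localization network's layer sizes are assumptions, while `affine_grid` and `grid_sample` play the roles of the grid generator and the sampler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASTNBlock(nn.Module):
    """Adversarial spatial transformer sketch: a small localization network predicts
    an affine transform that rotates/distorts the feature map to make it harder
    to recognize."""
    def __init__(self, channels):
        super().__init__()
        self.loc = nn.Sequential(                # localization network
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(channels * 16, 32), nn.ReLU(),
            nn.Linear(32, 6),                    # 6 affine parameters
        )
        self.loc[-1].weight.data.zero_()         # start from the identity transform
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, feat):
        theta = self.loc(feat).view(-1, 2, 3)
        grid = F.affine_grid(theta, feat.size(), align_corners=False)  # grid generator
        return F.grid_sample(feat, grid, align_corners=False)          # sampler
```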
In this embodiment, a series-connected ASDN and ASTN pair is embedded before each of the first detection model component, the second detection model component and the third detection model component. Through these two adversarial network models, the pre-trained face recognition model can be trained to learn the influence of occlusion and deformation on the detection results, improving the detection efficiency for distorted faces and the detection accuracy for occluded faces.
After the initial face recognition model is trained according to the sample face image to obtain the pre-trained face recognition model, the ASDN and ASTN can be used to perform secondary training on the pre-trained face recognition model; after the training is completed, the final target face recognition model capable of recognizing distorted and occluded faces is obtained. The specific training process can be described in detail with reference to the following specific implementations.
In a specific implementation of the present disclosure, the above step 103 may include: sub-step 1031, sub-step 1032, sub-step 1033, sub-step 1034, sub-step 1035 and sub-step 1036, where:
Sub-step 1031: Invoke the first training neural network model to perform occlusion processing on the face features in the sample face image to generate occluded face features.
In this embodiment, the first training neural network can perform occlusion processing on face features.
Occluded face features refer to face features obtained by occluding the face features in the sample face image.
During the secondary training, the first training neural network model can be invoked to perform occlusion processing on the face features in the sample face image to generate occluded face features. Specifically, assuming that the feature map input to the ASDN network has size d×d×c, a sliding window of size (d/3)×(d/3) can be used; for each sliding window, all values of the corresponding channels can be dropped to generate a new feature vector, which is then fed into the subsequent detection network responsible for face classification and predicted face frame regression to compute the loss. Among the losses of all these sliding windows, the highest loss is selected. For N training samples (N being a positive integer greater than or equal to 1), N training sample pairs {(X_1, M_1), ..., (X_N, M_N)} can be generated. During the first 10k ASDN training iterations, the binary cross-entropy loss function can be used, as shown in the following formula (2):
L = −(1/N) Σ_p Σ_{i,j} [ M^p_{i,j} · log A_{i,j}(X^p) + (1 − M^p_{i,j}) · log(1 − A_{i,j}(X^p)) ]    (2)
In the above formula (2), L denotes the binary cross-entropy loss function, A_{i,j}(X^p) denotes the output of the ASDN at position (i, j) for the input feature X^p, and M^p_{i,j} denotes the value at position (i, j) of the mask M^p in the training pair (X^p, M^p).
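The sliding-window mask mining described above can be sketched as follows; `loss_fn` is a hypothetical closure that runs the downstream face classification/regression heads on an occluded feature map and returns the detection loss.

```python
import torch

def hardest_occlusion_mask(feat, loss_fn):
    # feat: one (c, d, d) feature map entering the ASDN.
    c, d, _ = feat.shape
    w = d // 3                                   # (d/3) x (d/3) sliding window
    best_loss, best_mask = None, None
    for top in range(d - w + 1):
        for left in range(d - w + 1):
            mask = torch.zeros(d, d, dtype=torch.bool)
            mask[top:top + w, left:left + w] = True
            occluded = feat.clone()
            occluded[:, mask] = 0.0              # drop all channel values in the window
            loss = loss_fn(occluded)
            if best_loss is None or loss > best_loss:
                best_loss, best_mask = loss, mask   # keep the highest-loss window
    return best_mask                             # the M of a training pair (X, M)
```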
Sub-step 1032: Invoke the second training neural network model to perform deformation processing on the occluded face features to generate deformed face features.
After the first training neural network model is used to perform occlusion processing on the faces in the sample face image, the series-connected second training neural network model can be invoked to perform deformation processing on the occluded face features, thereby generating deformed face features.
Sub-step 1033: Invoke the classification layer to recognize the deformed face features and determine a predicted classification result of the deformed face features.
In this embodiment, the initial face recognition model may further include a classification layer, which can determine the classification result of face features. Understandably, during recognition, image background features may be recognized as face features; a classification probability can be generated through the classification layer, and whether a recognized feature is a face feature can be determined based on this classification probability.
The predicted classification result refers to the classification of the deformed face features predicted by the classification layer.
After the second training neural network model is invoked to perform deformation processing on the occluded face features to generate the deformed face features, the classification layer can be invoked to recognize the deformed face features so as to determine the predicted classification result of the deformed face features.
In this embodiment, the training of the ASTN network can adopt a transfer learning approach; its training process is similar to that of the ASDN network, while in the back-propagation process only the variables of the localization network (localization net) are changed.
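Under the ASTNBlock sketch above, this back-propagation rule amounts to building the optimizer over the localization network's parameters alone; the attribute path `model.astn.loc` is an assumption for illustration, not the patent's structure.

```python
import torch

def astn_optimizer(model, lr=1e-3):
    # Only the localization network's variables are updated during ASTN training;
    # the grid generator and sampler (affine_grid / grid_sample) have no learnable
    # parameters, and everything else is simply left out of the optimizer.
    return torch.optim.SGD(model.astn.loc.parameters(), lr=lr)
```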
Sub-step 1034: Invoke the detection model component to recognize the deformed face features to obtain the predicted face frame.
After the second training neural network model is invoked to perform deformation processing on the occluded face features to generate the deformed face features, the detection model component connected after the second neural network model can be invoked to recognize the deformed face features to obtain the predicted face frame corresponding to the deformed face features.
Sub-step 1035: Calculate a second loss value of the pre-trained face recognition model according to the initial classification result, the initial face labeling frame, the predicted classification result and the predicted face frame.
The second loss value refers to the loss value of the pre-trained face recognition model obtained in the process of performing secondary training on the pre-trained face recognition model through the first training neural network model and the second training neural network model.
After the predicted face frame and the predicted classification result in the above steps are obtained, the second loss value of the pre-trained face recognition model can be calculated by combining the initial classification result, the initial face labeling frame, the predicted face frame and the predicted classification result.
During training, when the masks generated by the ASDN and ASTN make recognition very easy for the pre-trained face recognition model, a high loss will be obtained; the loss of the ASDN and ASTN generation networks is designed as:
L_A = 1 − sigmoid(l_c(A(X), C))    (3)
In the above formula (3), L_A is the joint loss function of the ASDN and ASTN, that is, the second loss value; l_c is the face classification loss; and sigmoid is a neural network threshold function that can map a variable into the range between 0 and 1.
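Formula (3) translates directly into code. A minimal sketch, assuming `cls_loss` is the scalar face classification loss l_c computed on the adversarially processed features:

```python
import torch

def adversary_loss(cls_loss: torch.Tensor) -> torch.Tensor:
    # L_A = 1 - sigmoid(l_c): if the occlusion/deformation produced by ASDN/ASTN
    # leaves the face easy to classify (low l_c), sigmoid(l_c) is small and the
    # adversary is penalized with a high loss, pushing it toward harder examples.
    return 1.0 - torch.sigmoid(cls_loss)
```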
After the second loss value of the pre-trained face recognition model is calculated, sub-step 1036 is executed.
Sub-step 1036: In a case where the second loss value reaches the second initial value, use the trained pre-trained face recognition model as the target face recognition model.
The second initial value refers to a standard preset by business personnel for judging the degree of training of the pre-trained face recognition model.
When the second loss value has not reached the second initial value, it indicates that the training of the pre-trained face recognition model has not yet achieved the expected effect; at this time, the number of sample face images can be increased, and the pre-trained face recognition model can continue to be trained.
When the second loss value reaches the second initial value, it indicates that the training of the pre-trained face recognition model has achieved the expected effect; at this time, it can be considered that the trained pre-trained face recognition model can already accurately recognize face features containing occlusion and deformation. Accordingly, the trained pre-trained face recognition model can be used as the target face recognition model.
After the training of the pre-trained face recognition model with the first training neural network model and the second training neural network model is completed, the first training neural network model and the second training neural network model can be removed to obtain the target face recognition model; this secondary training of the face recognition model by the first training neural network model and the second training neural network model can improve the accuracy of the face recognition model in recognizing occluded and deformed faces.
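A sketch of this removal step, assuming the adversarial branches sit inline on the feature path under the illustrative attribute names `asdn` and `astn` (so replacing them with Identity leaves only the backbone and detection components active):

```python
import torch.nn as nn

def export_target_model(model):
    # After secondary training, drop the first and second training neural network
    # models (ASDN/ASTN); the deployed target model is backbone + detection heads.
    model.asdn = nn.Identity()
    model.astn = nn.Identity()
    return model
```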
In the face recognition model training method provided by the embodiments of the present disclosure, a sample face image is obtained, an initial face recognition model is trained according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image, and the first training neural network and the second training neural network are used to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model. By adding adversarial networks (namely the first training neural network model and the second training neural network model) to the training process of the face recognition model, the embodiments of the present disclosure learn the influence of occlusion and deformation on detection results, and improve the detection efficiency for distorted faces in the edge regions of commercial fisheye cameras as well as the detection accuracy for occluded faces.
Referring to Fig. 5, a flow chart of the steps of a face recognition method provided by an embodiment of the present disclosure is shown. The face recognition method may specifically include the following steps:
Step 201: Obtain a face image to be recognized.
The embodiments of the present disclosure can be applied to scenarios of recognizing blurred faces in face images.
The face image to be recognized refers to an image used for face feature recognition; the face image to be recognized may contain blurred face features, such as occluded face features, distorted face features, and so on.
Since images with distorted face features tend to appear in the edge regions of commercial fisheye cameras, the edge-region images captured by a commercial fisheye camera can be used as the images to be recognized.
Of course, the face image to be recognized may also contain no blurred face features such as occluded face features or distorted face features; specifically, this can be determined according to the actual situation, which is not limited in this embodiment.
After the face image to be recognized is obtained, step 202 is executed.
Step 202: Input the face image to be recognized into the target face recognition model to output a face recognition result.
The target face recognition model refers to the face recognition model obtained through training with the face recognition model training method of the above embodiments.
The face recognition result refers to the recognized facial features of the face in the image. Understandably, facial features are unique to each person, and different people can be tracked and monitored through their facial features.
After the face image to be recognized is input into the target face recognition model, the face recognition result corresponding to the face image to be recognized can be output by the target face recognition model.
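A minimal inference sketch, assuming the target model returns face boxes with confidence scores (this output format is an assumption for illustration; the description does not fix it):

```python
import torch

def recognize_faces(model, image, score_threshold=0.5):
    # `image` is a preprocessed CHW float tensor, e.g. a frame from a fisheye camera.
    model.eval()
    with torch.no_grad():
        boxes, scores = model(image.unsqueeze(0))  # add a batch dimension
    keep = scores > score_threshold                # keep confident face frames
    return boxes[keep], scores[keep]
```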
Understandably, since the target face recognition model in this embodiment is obtained through training with the face recognition model training method of the above embodiments, the target face recognition model in this embodiment can recognize not only face features that contain no occlusion or deformation, but also face features that contain occlusion and/or deformation.
In the face recognition method provided by this embodiment, a face image to be recognized is obtained and input into the target face recognition model to output a face recognition result. Since the target face recognition model used in this embodiment is obtained through training with the face recognition model training method of the above embodiments, recognition of occluded and deformed face features can be achieved; furthermore, the detection efficiency for distorted faces in the edge regions of commercial fisheye cameras and the detection accuracy for occluded faces can be improved.
For the foregoing method embodiments, for simplicity of description, they are all expressed as a series of action combinations; however, those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure, certain steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
In addition, an embodiment of the present disclosure further provides a face recognition device, including: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the above face recognition model training method or the above face recognition method.
The present disclosure further provides a computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the above face recognition model training method or the above face recognition method.
The device embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The various component embodiments of the present disclosure may be implemented by hardware, by software modules running on one or more processors, or by combinations thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the face recognition device according to the embodiments of the present disclosure. The present disclosure may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing the present disclosure may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 6 shows a face recognition device that can implement the method according to the present disclosure. The face recognition device conventionally includes a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020. The memory 1020 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, a hard disk or a ROM. The memory 1020 has a storage space 1030 for program code 1031 for executing any of the method steps in the above methods. For example, the storage space 1030 for program code may include individual program codes 1031 respectively used to implement various steps in the above methods. These program codes can be read out from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is usually a portable or fixed storage unit as described with reference to Fig. 7. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 1020 in the face recognition device of Fig. 6. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 1031', that is, code that can be read by a processor such as, for example, 1010; when run by a face recognition device, these codes cause the face recognition device to execute the various steps of the methods described above.
In the specification provided herein, numerous specific details are described. However, it can be understood that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the various embodiments, reference may be made to one another.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity or device that includes said element.
The above provides a detailed introduction to a face recognition model training method, a face recognition method, a face recognition device and a computer-readable storage medium provided by the present disclosure. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the descriptions of the above embodiments are only intended to help understand the method of the present disclosure and its core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope based on the ideas of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (18)

  1. A face recognition model training method, comprising:
    acquiring a sample face image;
    training an initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image;
    using a first training neural network model and a second training neural network model to perform secondary training on the pre-trained face recognition model according to the sample face image to obtain a target face recognition model.
  2. The method according to claim 1, wherein the sample face image corresponds to an initial face labeling frame, and the initial face recognition model comprises: a detection model component;
    the training an initial face recognition model according to the sample face image to obtain a pre-trained face recognition model for detecting the position of a face in an image comprises:
    invoking the detection model component to recognize the sample face image to obtain a predicted face frame;
    calculating a first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame;
    in a case where the first loss value reaches a first initial value, using the trained initial face recognition model as the pre-trained face recognition model.
  3. The method according to claim 2, wherein the detection model component comprises: a first detection model component, a second detection model component and a third detection model component;
    the invoking the detection model component to recognize the sample face image to obtain a predicted face frame comprises:
    invoking the first detection model component to recognize a processed sample face image containing face features of a first size to obtain a first predicted face frame;
    invoking the second detection model component to recognize a processed sample face image containing face features of a second size to obtain a second predicted face frame;
    invoking the third detection model component to recognize a processed sample face image containing face features of a third size to obtain a third predicted face frame;
    wherein the first size is larger than the second size, and the second size is larger than the third size.
  4. The method according to claim 3, wherein the calculating a first loss value corresponding to the initial face recognition model according to the initial face labeling frame and the predicted face frame comprises:
    calculating a first detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the first predicted face frame;
    calculating a second detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the second predicted face frame;
    calculating a third detection loss value corresponding to the initial face recognition model according to the initial face labeling frame and the third predicted face frame.
  5. The method according to claim 4, wherein the using the trained initial face recognition model as the pre-trained face recognition model in a case where the first loss value reaches a first initial value comprises:
    in a case where the first detection loss value, the second detection loss value and the third detection loss value all reach the first initial value, using the trained initial face recognition model as the pre-trained face recognition model.
  6. The method according to any one of claims 2-5, wherein the initial face recognition model further comprises: a lightweight network layer;
    before the invoking the detection model component to recognize the sample face image to obtain a predicted face frame, the method further comprises:
    invoking the lightweight network layer to recognize face features in the sample face image to obtain recognized face features.
  7. The method according to claim 6, wherein the invoking the detection model component to recognize the sample face image to obtain a predicted face frame comprises:
    invoking the detection model component to perform detection processing on the recognized face features to determine a predicted face frame of the recognized face features in the sample face image.
  8. The method according to any one of claims 3-5, wherein the first training neural network model and the second training neural network model, connected in series, are embedded before each of the first detection model component, the second detection model component and the third detection model component.
  9. The method according to claim 8, wherein the initial face recognition model further comprises: a classification layer, and the sample face image corresponds to an initial classification result.
  10. The method according to claim 9, wherein the performing secondary training on the pre-trained face recognition model according to the sample face image by using a first training neural network model and a second training neural network model, to obtain a target face recognition model, comprises:
    invoking the first training neural network model to perform occlusion processing on the face features in the sample face image, to generate occluded face features;
    invoking the second training neural network model to perform deformation processing on the occluded face features, to generate deformed face features;
    invoking the classification layer to recognize the deformed face features, to determine a predicted classification result of the deformed face features;
    invoking the detection model component to recognize the deformed face features, to obtain the predicted face box;
    calculating a second loss value of the pre-trained face recognition model according to the initial classification result, the initial face annotation box, the predicted classification result, and the predicted face box;
    in a case where the second loss value reaches a second initial value, taking the trained pre-trained face recognition model as the target face recognition model.
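    A sketch of one second-training step under claim 10. The occlusion model is assumed to emit a [0, 1] mask over the features; all module names are placeholders, with `classifier` expected to return per-class scores and `detector` box predictions:

    import torch
    from torch import nn

    def second_stage_step(pretrained, occluder, deformer, classifier, detector,
                          images, initial_cls, initial_boxes, optimizer) -> float:
        features = pretrained(images)              # face features of the sample image
        occluded = features * occluder(features)   # occlusion processing (first model)
        deformed = deformer(occluded)              # deformation processing (second model)
        second_loss = (nn.functional.cross_entropy(classifier(deformed), initial_cls)
                       + nn.functional.smooth_l1_loss(detector(deformed), initial_boxes))
        optimizer.zero_grad()
        second_loss.backward()
        optimizer.step()
        return second_loss.item()                  # compared against the second initial value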
  11. The method according to any one of claims 8-10, wherein the first training neural network model is an adversarial network model for handling occlusion, and the second training neural network model is an adversarial network model for handling deformation.
  12. The method according to any one of claims 2-5, wherein the initial face recognition model further comprises a lightweight network layer, and the detection model component comprises a first detection model component, a second detection model component, and a third detection model component;
    a feature processing layer is connected between the first detection model component and the lightweight network layer, and the feature processing layer is configured to process the face features in the sample face image to obtain a sample face image containing face features of a first size;
    the second detection model component is directly connected after the lightweight network layer;
    a dimension-reduction processing module is connected between the lightweight network layer and the third detection model component, and the dimension-reduction processing module is configured to perform dimension-reduction processing on the sample face image to obtain a sample face image containing face features of a third size.
  13. The method according to claim 12, wherein the dimension-reduction processing module comprises a first activation function layer, a second activation function layer, and a convolution layer;
    the first activation function layer and the second activation function layer are connected in parallel between the lightweight network layer and the convolution layer, and the convolution layer is connected to the third detection model component.
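    A sketch of the claims 12-13 dimension-reduction module. The choice of ReLU and Sigmoid for the two parallel activation-function layers, and concatenation as the way their outputs meet the convolution layer, are assumptions not fixed by the claims:

    import torch
    from torch import nn

    class DimensionReduction(nn.Module):
        def __init__(self, c_in: int = 16, c_out: int = 8):
            super().__init__()
            self.act1 = nn.ReLU()      # first activation function layer
            self.act2 = nn.Sigmoid()   # second activation function layer, in parallel
            self.conv = nn.Conv2d(2 * c_in, c_out, kernel_size=1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            branches = torch.cat([self.act1(features), self.act2(features)], dim=1)
            return self.conv(branches)   # feeds the third detection model component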
  14. The method according to any one of claims 2-5, wherein the first loss value comprises a loss value corresponding to face classification and a loss value corresponding to the coordinates of the predicted face box;
    the first loss value is calculated as:
    $L_k = \sum_i l_c(p_i, g_i) + \sum_i g_i \, l_r(b_i, t_i)$
    where l_c is the face classification loss, k is the index of the detection model component, p_i is the predicted probability of the i-th predicted face box, g_i is the label value of the correct annotation data for the i-th predicted face box, l_r is the regression loss of the predicted face box, b_i denotes the four predicted correction values, and t_i is the actual value of the correct annotation data.
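    A numeric sketch of the two terms of the first loss value; binary cross-entropy for l_c and smooth-L1 for l_r are assumptions, since the claim defines only the symbols, and the masking by g_i reflects counting the regression loss only for boxes annotated as faces:

    import torch
    from torch import nn

    def first_loss_value(p, g, b, t):
        l_c = nn.functional.binary_cross_entropy(p, g)            # classification term
        per_box = nn.functional.smooth_l1_loss(b, t, reduction="none")
        l_r = (g.unsqueeze(-1) * per_box).sum() / g.sum().clamp(min=1)
        return l_c + l_r

    p = torch.tensor([0.9, 0.2])   # p_i: predicted probabilities
    g = torch.tensor([1.0, 0.0])   # g_i: label values of the annotation data
    b = torch.zeros(2, 4)          # b_i: the four predicted correction values
    t = torch.ones(2, 4)           # t_i: actual values of the annotation data
    print(first_loss_value(p, g, b, t))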
  15. A face recognition method, comprising:
    acquiring a face image to be recognized;
    inputting the face image to be recognized into a target face recognition model, and outputting a face recognition result;
    wherein the target face recognition model is trained by using the training method according to any one of claims 1 to 14.
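    The claim-15 inference path is a single forward pass, for example:

    import torch

    def recognize_face(target_model: torch.nn.Module, image: torch.Tensor):
        # Preprocessing (alignment, normalization) is application-specific and omitted.
        target_model.eval()
        with torch.no_grad():
            return target_model(image.unsqueeze(0))   # add a batch dimension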
  16. A face recognition apparatus, comprising:
    a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the face recognition model training method according to any one of claims 1 to 14, or the face recognition method according to claim 15.
  17. A computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the face recognition model training method according to any one of claims 1 to 14, or the face recognition method according to claim 15.
  18. A computer program, comprising computer-readable code that, when run on a face recognition apparatus, causes the face recognition apparatus to perform the face recognition model training method according to any one of claims 1 to 14, or the face recognition method according to claim 15.
PCT/CN2021/089846 2020-04-30 2021-04-26 人脸识别模型训练方法、人脸识别方法及装置 WO2021218899A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010366448.0 2020-04-30
CN202010366448.0A CN111582141B (zh) 2020-04-30 2020-04-30 人脸识别模型训练方法、人脸识别方法及装置

Publications (1)

Publication Number Publication Date
WO2021218899A1 true WO2021218899A1 (zh) 2021-11-04

Family

ID=72124629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089846 WO2021218899A1 (zh) 2020-04-30 2021-04-26 人脸识别模型训练方法、人脸识别方法及装置

Country Status (2)

Country Link
CN (1) CN111582141B (zh)
WO (1) WO2021218899A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582141B (zh) * 2020-04-30 2023-05-09 京东方科技集团股份有限公司 人脸识别模型训练方法、人脸识别方法及装置
CN112149582A (zh) * 2020-09-27 2020-12-29 中国科学院空天信息创新研究院 一种高光谱图像材质识别方法及系统
CN112149601A (zh) * 2020-09-30 2020-12-29 北京澎思科技有限公司 兼容遮挡的面部属性识别方法、装置和电子设备
WO2022134067A1 (zh) * 2020-12-25 2022-06-30 深圳市优必选科技股份有限公司 多任务识别模型的训练方法、系统及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753850B (zh) * 2017-11-03 2022-10-25 富士通株式会社 面部识别模型的训练方法和训练设备
CN110222562A (zh) * 2019-04-26 2019-09-10 昆明理工大学 一种基于Fast R-CNN的人脸检测方法
CN110188673B (zh) * 2019-05-29 2021-07-30 京东方科技集团股份有限公司 表情识别方法和装置
CN110837781B (zh) * 2019-10-16 2024-03-15 平安科技(深圳)有限公司 一种人脸识别方法、人脸识别装置及电子设备

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096538A (zh) * 2016-06-08 2016-11-09 中国科学院自动化研究所 基于定序神经网络模型的人脸识别方法及装置
CN106815566A (zh) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 一种基于多任务卷积神经网络的人脸检索方法
US20180268203A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Face recognition system for face recognition in unlabeled videos with domain adversarial learning and knowledge distillation
CN109934115A (zh) * 2019-02-18 2019-06-25 苏州市科远软件技术开发有限公司 人脸识别模型的构建方法、人脸识别方法及电子设备
CN109902621A (zh) * 2019-02-26 2019-06-18 嘉兴学院 一种三维人脸识别方法、装置、计算机设备及存储介质
CN110210457A (zh) * 2019-06-18 2019-09-06 广州杰赛科技股份有限公司 人脸检测方法、装置、设备及计算机可读存储介质
CN110363091A (zh) * 2019-06-18 2019-10-22 广州杰赛科技股份有限公司 侧脸情况下的人脸识别方法、装置、设备及存储介质
CN110458133A (zh) * 2019-08-19 2019-11-15 电子科技大学 基于生成式对抗网络的轻量级人脸检测方法
CN111582141A (zh) * 2020-04-30 2020-08-25 京东方科技集团股份有限公司 人脸识别模型训练方法、人脸识别方法及装置
CN112215154A (zh) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 一种应用于人脸检测系统的基于蒙版的模型评估方法

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114093006A (zh) * 2021-11-26 2022-02-25 北京百度网讯科技有限公司 活体人脸检测模型的训练方法、装置、设备以及存储介质
CN114267328A (zh) * 2021-12-14 2022-04-01 北京达佳互联信息技术有限公司 一种语音合成模型的训练方法、装置、设备以及存储介质
CN114241569A (zh) * 2021-12-21 2022-03-25 中国电信股份有限公司 人脸识别攻击样本的生成方法、模型训练方法及相关设备
CN114241569B (zh) * 2021-12-21 2024-01-02 中国电信股份有限公司 人脸识别攻击样本的生成方法、模型训练方法及相关设备
CN115063803A (zh) * 2022-05-31 2022-09-16 北京开拓鸿业高科技有限公司 图像处理方法、装置、存储介质及电子设备
CN115050129A (zh) * 2022-06-27 2022-09-13 北京睿家科技有限公司 一种智能门禁的数据处理方法及系统
CN115131826A (zh) * 2022-08-23 2022-09-30 浙江大华技术股份有限公司 物品检测识别方法、网络模型的训练方法和装置
CN115131826B (zh) * 2022-08-23 2022-11-11 浙江大华技术股份有限公司 物品检测识别方法、网络模型的训练方法和装置
CN116167922A (zh) * 2023-04-24 2023-05-26 广州趣丸网络科技有限公司 一种抠图方法、装置、存储介质及计算机设备
CN117275075A (zh) * 2023-11-01 2023-12-22 浙江同花顺智能科技有限公司 一种人脸遮挡检测方法、系统、装置和存储介质
CN117275075B (zh) * 2023-11-01 2024-02-13 浙江同花顺智能科技有限公司 一种人脸遮挡检测方法、系统、装置和存储介质
CN118230396A (zh) * 2024-05-22 2024-06-21 苏州元脑智能科技有限公司 人脸识别及其模型训练方法、装置、设备、介质及产品

Also Published As

Publication number Publication date
CN111582141B (zh) 2023-05-09
CN111582141A (zh) 2020-08-25

Similar Documents

Publication Publication Date Title
WO2021218899A1 (zh) 人脸识别模型训练方法、人脸识别方法及装置
WO2022111506A1 (zh) 视频动作识别方法、装置、电子设备和存储介质
CN112597941B (zh) 一种人脸识别方法、装置及电子设备
Chen et al. Research on recognition of fly species based on improved RetinaNet and CBAM
US20200242451A1 (en) Method, system and apparatus for pattern recognition
CN112200057B (zh) 人脸活体检测方法、装置、电子设备及存储介质
CN113066002A (zh) 对抗样本的生成方法、神经网络的训练方法、装置及设备
Zhu et al. Few-shot common-object reasoning using common-centric localization network
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
Yang et al. BANDT: A border-aware network with deformable transformers for visual tracking
CN112528978B (zh) 人脸关键点的检测方法、装置、电子设备及存储介质
CN116912924B (zh) 一种目标图像识别方法和装置
Ma et al. Layn: Lightweight multi-scale attention yolov8 network for small object detection
CN117671800A (zh) 面向遮挡的人体姿态估计方法、装置及电子设备
CN109784154B (zh) 基于深度神经网络的情绪识别方法、装置、设备及介质
CN115471718A (zh) 基于多尺度学习的轻量级显著性目标检测模型的构建和检测方法
CN110826726B (zh) 目标处理方法、目标处理装置、目标处理设备及介质
Liu et al. Sample-Adapt Fusion Network for RGB-D Hand Detection in the Wild
CN110826469A (zh) 一种人物检测方法、装置及计算机可读存储介质
Le et al. Locality and relative distance-aware non-local networks for hand-raising detection in classroom video
Zheng et al. MD-YOLO: Surface Defect Detector for Industrial Complex Environments
Zhang et al. Lightweight network for small target fall detection based on feature fusion and dynamic convolution
Saifullah et al. Voice keyword spotting on edge devices
Li et al. Learning to capture dependencies between global features of different convolution layers
CN116188918B (zh) 图像去噪方法、网络模型的训练方法、装置、介质和设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21797617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21797617

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023)
