WO2020186886A1 - Method and device for generating a face recognition model - Google Patents

Method and device for generating a face recognition model (一种人脸识别模型的生成方法及设备)

Info

Publication number
WO2020186886A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature vector
face
modal
depth feature
residual compensation
Application number
PCT/CN2019/130815
Other languages
English (en)
French (fr)
Inventor
乔宇
邓重英
彭小江
Original Assignee
中国科学院深圳先进技术研究院
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2020186886A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Definitions

  • The invention belongs to the technical field of image processing and particularly relates to a method and device for generating a face recognition model.
  • Multi-modal face recognition has broad application prospects in security monitoring and public security law enforcement. For example, in dark night scenes, ordinary surveillance cameras often image poorly, which limits their usefulness at night; a near-infrared camera images better at night and can compensate for the shortcomings of cameras based on the principle of visible-light imaging. For another example, when pursuing a suspect, the public security department can synthesize a face photo of the suspect from eyewitness descriptions, yet when making and issuing ID cards it collects citizens' face images with ordinary cameras under visible light; that is, the public security department only records facial images in the visible-light modality. Therefore, face recognition based on synthesized face images or on face images captured under various detection lights, i.e., multi-modal face recognition technology, is becoming increasingly important.
  • Existing multi-modal face recognition generally relies on artificially designed features. Such methods are limited by the expressive ability of artificial features: hand-crafted features cannot exhaust all face variations, and when their description is inaccurate, the recognition accuracy of the face recognition technology suffers directly. It can be seen that multi-modal face recognition based on artificially designed features has low accuracy and high labor cost.
  • The embodiments of the present invention provide a method and device for generating a face recognition model, to solve the problem that existing multi-modal face recognition technology, which mainly relies on artificially designed features, yields low recognition accuracy and high labor cost.
  • The first aspect of the embodiments of the present invention provides a method for generating a face recognition model, including:
  • acquiring face images corresponding to a training object in each preset modality, where the face images include a first face image corresponding to a primary modality and a second face image corresponding to at least one secondary modality;
  • extracting a first depth feature vector of the first face image through a preset first convolutional neural network;
  • extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted for the secondary modality;
  • adjusting the residual compensation model based on the first depth feature vectors and second depth feature vectors corresponding to a plurality of training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
  • generating a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  • A second aspect of the embodiments of the present invention provides a device for generating a face recognition model, including:
  • a face image acquisition unit, configured to acquire face images corresponding to the training object in each preset modality, where the face images include a first face image corresponding to the primary modality and a second face image corresponding to at least one secondary modality;
  • a first depth feature vector acquisition unit, configured to extract the first depth feature vector of the first face image through a preset first convolutional neural network;
  • a second depth feature vector acquisition unit, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted for the secondary modality;
  • a residual compensation model adjustment unit, configured to adjust the residual compensation model based on the first depth feature vectors and second depth feature vectors corresponding to the plurality of training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
  • a face recognition model generation unit, configured to generate a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  • A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the first aspect are realized.
  • A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program that implements the steps of the first aspect when executed by a processor.
  • In the embodiments of the present invention, face images of the training object in different modalities are obtained, the second depth feature vector of the secondary modality is extracted through the residual compensation model to be adjusted and the convolutional neural network, and the residual compensation model is feedback-adjusted based on the first and second depth feature vectors until the difference between them is smaller than the preset difference threshold, that is, until the recognition result converges. Because the face images of the primary and secondary modalities belong to the same entity and the depth feature vector represents the characteristics of each key point of the face, a well-adjusted residual compensation module yields only a small deviation between the depth feature vectors of the two modalities. Therefore, when the difference between the two depth feature vectors is less than the preset difference threshold, the residual compensation module can be deemed adjusted, and the face recognition model is generated based on it. The present invention does not rely on the user's artificial feature description of face information and can generate a face recognition model simply from the face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
  • FIG. 1 is an implementation flowchart of a method for generating a face recognition model provided by the first embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of a ten-layer residual network provided by an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of four multi-modal face recognition networks provided by an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a second convolutional neural network with the residual compensation module configured at a convolutional layer, according to an embodiment of the present invention;
  • FIG. 5 is a specific implementation flowchart of step S104 of the method for generating a face recognition model provided by the second embodiment of the present invention;
  • FIG. 6 is a specific implementation flowchart of step S1042 of the method for generating a face recognition model provided by the third embodiment of the present invention;
  • FIG. 7 is a network structure diagram of a face recognition model provided by an embodiment of the present invention;
  • FIG. 8 is a specific implementation flowchart of step S1042 of the method for generating a face recognition model provided by the fourth embodiment of the present invention;
  • FIG. 9 is a specific implementation flowchart of step S101 of the method for generating a face recognition model provided by the fifth embodiment of the present invention;
  • FIG. 10 is a specific implementation flowchart of a method for generating a face recognition model provided by the sixth embodiment of the present invention;
  • FIG. 11 is a specific implementation flowchart of a method for generating a face recognition model provided by the seventh embodiment of the present invention;
  • FIG. 12 is a structural block diagram of a device for generating a face recognition model according to an embodiment of the present invention;
  • FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention.
  • The embodiments of the present invention obtain the face images of the training object in different modalities, extract the second depth feature vector of the secondary modality through the residual compensation model to be adjusted and the convolutional neural network, and feedback-adjust the residual compensation model based on the first and second depth feature vectors until their difference is smaller than the preset difference threshold. The face recognition model generated from the adjusted residual compensation module thereby solves the problem that existing multi-modal face recognition technology, which relies mainly on artificially designed features, yields low recognition accuracy and high labor cost.
  • In the embodiments of the present invention, the execution subject of the process is a terminal device. The terminal device includes, but is not limited to, a server, a computer, a smart phone, a tablet computer, and other devices capable of generating a face recognition model. In particular, the terminal device is a face recognition device that can determine the object attributes of a target object from an input face image. The terminal device has multiple input channels, each of which can be used to recognize face images of a preset modality, thereby realizing multi-modal face recognition.
  • FIG. 1 shows the implementation flowchart of the method for generating a face recognition model provided by the first embodiment of the present invention, detailed as follows:
  • In S101, face images corresponding to each preset modality of a training object are acquired; the face images include a first face image corresponding to the primary modality and a second face image corresponding to at least one secondary modality.
  • The terminal device can extract face images of the training object in different preset modalities from a database; since each face image corresponds to the same entity person, the face images of different modalities can be grouped and recognized as the same set of face images.
  • Face images of different modalities specifically refer to face images produced by different imaging principles, including but not limited to: face images generated under visible light, face images generated under infrared light, face images generated based on the principle of thermal imaging, depth-of-field face images generated based on the principle of distance measurement, face images generated based on the principle of animation synthesis, and hand-drawn face images.
  • The terminal device can select one of the modalities as the primary modality; the other modalities are secondary modalities.
  • The terminal device can use the face image generated based on the principle of visible-light imaging as the face image of the primary modality. Since visible-light face acquisition is widely used, a large number of visible-light face images can be collected easily, and the algorithms for extracting depth feature vectors from visible-light face images are relatively mature, which can greatly improve the accuracy of the face recognition model. Multi-modal face recognition faces two core difficulties: first, multi-modal data is inconvenient to collect, so usable data sets are scarce; second, there are large modal differences between images of different modalities. Around these two problems, the prior art uses feature expression methods that are robust to different modal information.
  • One prior method proposes shared component analysis to learn the shared components between image pairs from different modalities, so that image pairs belonging to different modalities are projected into the same feature space to reduce the influence of modal differences.
  • Another prior method obtains a common subspace of pictures of different modalities through a modality-independent component analysis algorithm, and then obtains a modality-independent dictionary in the common subspace through a self-learning strategy.
  • A further prior method performs multi-modal face recognition based on graphical representation: a Markov network models the compatibility relationship between adjacent image blocks, and a pairwise similarity measurement evaluates the similarity between pictures to compare faces.
  • A deep-learning approach designs an objective function that pulls together the feature vectors of paired pictures of different modalities while pushing apart the pictures of people with different identities, and implicitly increases the number of training samples by constructing triplet training samples to reduce overfitting.
  • In summary, multi-modal face recognition based on artificially designed features is limited by the expressive ability of artificial features and has low recognition accuracy, while existing deep-learning approaches find it difficult to simultaneously reduce the overfitting of convolutional neural networks and reduce modal differences; they are also relatively complicated to design, inconvenient to use, and their recognition performance falls short of practical needs.
  • Therefore, the present invention proposes a simple and effective multi-modal face recognition method based on a residual compensation network.
  • The terminal device can recognize multi-modal face images covering one primary modality and at least one secondary modality. Based on the number of modalities of the face recognition model, the terminal device can configure a corresponding face image input channel for each modality. It can recognize the modal type of a face image and route it to the corresponding input channel: if the detected face image is a first face image of the primary modality, processing jumps to S102; conversely, if it is a second face image of a secondary modality, processing jumps to S103.
  • If the terminal device can recognize face images of two or more secondary modalities, it can configure a corresponding second convolutional neural network and residual compensation model for each secondary modality, so that each residual compensation model matches the image characteristics of its secondary modality, thereby improving recognition accuracy.
  • In S102, the first depth feature vector of the first face image is extracted through a preset first convolutional neural network.
  • In this embodiment, the terminal device needs to perform feature extraction on the first face image, so it imports the first face image into the preset first convolutional neural network and outputs the first depth feature vector of the first face image.
  • The first convolutional neural network may be constructed based on convolutional neural network structures such as VGGNet, GoogLeNet, DenseNet, SENet, Xception, and Light CNN. In this embodiment, the first convolutional neural network is specifically a ten-layer residual network.
  • FIG. 2 shows a schematic structural diagram of the ten-layer residual network provided by this embodiment. The network consists of 10 convolutional layers and a fully connected (FC) layer. The number of convolution channels gradually increases from 32 to 512; except for the first convolutional layer, which has a stride of 2, all other convolutional layers have a stride of 1. The 128-dimensional vector output by the fully connected layer is the feature of the face image in the primary modality. In FIG. 2, "3*3 Conv" indicates the kernel size of a convolutional layer and "2*2 max pool" indicates the kernel size of a pooling layer.
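  • As an illustration only, the following PyTorch sketch shows one way to realize a backbone with the stated properties: 10 convolutional layers with channels growing from 32 to 512, stride 2 only in the first layer, 2*2 max pooling between stages, and a 128-dimensional fully connected output. The exact layer arrangement of FIG. 2 is not reproduced here; the grouping into five two-convolution stages and the global average pooling before the FC layer are assumptions.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One stage: a 3x3 conv that widens the channels, then a 3x3 conv
    wrapped in an identity shortcut (the residual connection)."""
    def __init__(self, in_ch, out_ch, first_stride=1):
        super().__init__()
        self.widen = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=first_stride, padding=1),
            nn.PReLU(out_ch))
        self.res = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.PReLU(out_ch))

    def forward(self, x):
        x = self.widen(x)
        return x + self.res(x)

class TenLayerResNet(nn.Module):
    """Backbone sketch: 5 stages x 2 convs = 10 conv layers, channels
    32 -> 512; only the first conv has stride 2; 2x2 max pooling shrinks
    the feature map between stages; the FC layer emits a 128-d feature."""
    def __init__(self, feat_dim=128):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512]
        layers = []
        for i in range(5):
            layers.append(Stage(chans[i], chans[i + 1],
                                first_stride=2 if i == 0 else 1))
            if i < 4:
                layers.append(nn.MaxPool2d(2))
        layers.append(nn.AdaptiveAvgPool2d(1))
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(512, feat_dim)

    def forward(self, x):                 # x: (B, 3, 128, 128) aligned faces
        return self.fc(self.features(x).flatten(1))

backbone = TenLayerResNet()
feat = backbone(torch.randn(2, 3, 128, 128))   # -> torch.Size([2, 128])
```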
  • In S103, the second depth feature vector of the second face image is extracted through the preset second convolutional neural network and the residual compensation model to be adjusted for the secondary modality.
  • In this embodiment, the terminal device imports the second face image into the second convolutional neural network to extract the face feature value of the secondary-modal face image, and then performs modal residual compensation on the face feature value through the residual compensation model, outputting the second depth feature vector for the secondary modality; the residual compensation model thereby eliminates the modal difference between the primary and secondary modalities. It should be noted that if the face recognition model supports multiple secondary modalities, a corresponding residual compensation network can be configured for each secondary modality based on its modal characteristics.
  • The convolution parameters of the first and second convolutional neural networks are identical, that is, the convolution parameters of the two branches are shared. They are initialized from a convolutional neural network trained on large-scale visible-light face images and are not updated during training, so that all differences between modalities are adjusted through the residual compensation module; this greatly reduces the number of learnable parameters and thus reduces overfitting.
  • For a face image x in a secondary modality, the convolutional neural network f_Θ(·) can be used to extract its depth feature vector f_Θ(x). Since f_Θ(·) is trained on face data of the primary modality, and the face feature distribution of the secondary modality differs considerably from that of the primary modality, the depth feature vector extracted by f_Θ(·) may express the secondary-modal face poorly, which introduces a modal difference. This modal difference can be approximately modeled by a residual compensation module, namely f(x) = f_Θ(x) + R(f_Θ(x)), where R(·) is the residual compensation mapping.
  • In other words, after the second convolutional neural network extracts the feature value of the second face image, the compensation value output by the residual compensation module is superimposed to generate the second depth feature vector, which can then be approximately equal to the first depth feature vector.
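  • A minimal sketch of the secondary-modal branch under the assumptions above: the residual compensation module takes the fully-connected-plus-PReLU form described later, the backbone is the TenLayerResNet sketched earlier, and the shared convolution parameters are frozen so that only the compensation mapping learns.

```python
import torch
import torch.nn as nn

class ResidualCompensation(nn.Module):
    """Models f(x) = f_theta(x) + R(f_theta(x)): a fully connected layer
    plus PReLU produces the compensation value that is superimposed on
    the frozen backbone feature."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mapping = nn.Sequential(nn.Linear(feat_dim, feat_dim),
                                     nn.PReLU())

    def forward(self, feat):
        return feat + self.mapping(feat)   # superimpose the compensation value

backbone = TenLayerResNet()                # from the sketch above (assumed)
for p in backbone.parameters():
    p.requires_grad = False                # shared conv parameters are not updated
rc = ResidualCompensation(feat_dim=128)

nir_face = torch.randn(2, 3, 128, 128)     # e.g. near-infrared second face images
second_feat = rc(backbone(nir_face))       # second depth feature vector
```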
  • In S104, the residual compensation model is adjusted based on the first depth feature vectors and second depth feature vectors corresponding to the multiple training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold.
  • In this embodiment, the terminal device feedback-adjusts the residual compensation network based on the first and second depth feature vectors until the difference between the second depth feature vector output by the residual compensation network and the first depth feature vector is less than the preset difference threshold, that is, until the output converges.
  • In S105, a face recognition model is generated according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  • When the terminal device determines that the difference between the second depth feature vector output by the residual compensation model and the first depth feature vector is less than the preset difference threshold, the output of the residual compensation network has converged. The depth feature vector of a secondary-modal face image can then be converted by the residual compensation module into a depth feature vector consistent with the primary modality. In this way, the face feature vectors of all secondary modalities are unified into the primary modality and can be compared with the various standard face vectors generated in the primary modality, so as to determine the object attributes corresponding to secondary-modal face images.
  • FIG. 3 shows four multi-modal face recognition networks provided by the present invention. FIG. 3a is a convolutional neural network that fine-tunes the fully connected layer; FIG. 3b is a convolutional neural network formed by adding a fully connected layer and a PReLU layer after the original fully connected layer; FIG. 3c is a face recognition network with two modal branches, in which a fully connected layer and a PReLU layer are added to the secondary-modal branch; FIG. 3d is the face recognition network provided by the present invention, with a residual compensation module added to the secondary-modal branch.
  • The newly added fully connected layer in the model of FIG. 3b can even reduce accuracy: although it increases the expressive ability of the model, it overfits more easily on small cross-modal face data sets. The face recognition model provided by the present invention, which adds a residual compensation module, achieves higher accuracy than the above models, showing that the residual compensation module can effectively improve the accuracy of cross-modal face recognition. The residual compensation module keeps the main characteristics of the backbone network essentially unchanged while compensating the difference between modal characteristics through a nonlinear residual mapping, thereby reducing the modal difference.
  • The multi-modal face recognition model based on the residual compensation network achieved the highest recognition accuracy on multi-modal data sets such as CASIA NIR-VIS 2.0, IIIT-D Viewed Sketch, Forensic Sketch, and CUHK NIR-VIS, which shows that it can effectively handle the overfitting problem and reduce modal differences.
  • The residual compensation model can be implemented not only as a fully connected layer plus a nonlinear activation function, but also as a stack of multiple fully connected layers and nonlinear activation functions, as a nonlinear activation function followed by a fully connected layer, as a nonlinear activation function + fully connected layer + nonlinear activation function, or in the form of a convolutional layer plus a nonlinear activation function inserted after a convolutional layer.
  • FIG. 4 is a schematic diagram of a second convolutional neural network with the residual compensation module configured at a convolutional layer, according to an embodiment of the present invention. The convolutional layers of the first convolutional neural network are not updated after initialization, and the residual compensation model can be inserted between two fixed, non-updated convolutional layers of the second convolutional neural network; in this case the residual compensation model no longer takes the form of fully connected layer + PReLU but of convolutional layer + PReLU.
  • In the method for generating a face recognition model provided by this embodiment of the present invention, face images of the training object in different modalities are obtained, the second depth feature vector of the secondary modality is extracted through the residual compensation model to be adjusted and the convolutional neural network, and the residual compensation model is feedback-adjusted based on the first depth feature vector of the primary modality and the second depth feature vector until their difference is smaller than the preset difference threshold, that is, until the recognition result converges. After adjustment, the deviation between the depth feature vectors of the two modalities is small; therefore, when the difference between the two depth feature vectors is less than the preset difference threshold, the residual compensation module can be deemed adjusted, and a face recognition model is generated based on it. The present invention does not rely on the user's artificial feature description of face information and can generate a face recognition model simply from the face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
  • FIG. 5 shows the specific implementation flowchart of step S104 of the method for generating a face recognition model provided by the second embodiment of the present invention. The step S104 provided by this embodiment includes S1041 to S1043, detailed as follows:
  • The adjusting of the residual compensation model based on the first depth feature vectors and second depth feature vectors corresponding to the multiple training objects includes:
  • In S1041, the first depth feature vector and the second depth feature vector are imported into a preset difference degree calculation model, and the deviation value of the residual compensation model to be adjusted is determined.
  • In this embodiment, the terminal device first needs to determine the current deviation value of the residual compensation model to be adjusted; it therefore imports the first and second depth feature vectors into the preset difference degree calculation model to determine the deviation value between the two depth feature vectors. Specifically, the terminal device inputs the depth feature vectors of each training object in the various preset modalities into the difference calculation model in pairs, as face image groups, so that the deviation values of the depth feature vectors of the same training object in different modalities can be determined.
  • In this embodiment, the residual compensation model is specifically composed of a fully connected layer followed by a PReLU layer. The dropout technique can be used when adjusting and learning the residual compensation network based on the deviation values of multiple training objects.
  • In S1042, the first depth feature vector and the second depth feature vector are imported into a preset multi-modal loss function calculation model, and the loss value of the residual compensation model is determined.
  • In this embodiment, in addition to adjusting the residual compensation model by the deviation value between depth feature vectors of different modalities, the terminal device can also perform supervised learning of the residual compensation model based on the loss value calculated over multiple training objects, to avoid overfitting of the residual compensation function and to reduce the differences between modalities. The multi-modal loss calculation model may be a loss model based on Center loss and/or a loss model based on Contrastive loss.
  • In S1043, the residual compensation model is adjusted based on the loss value and the deviation value, so that the residual compensation model meets a convergence condition. The convergence condition is:

    diff( f_Θ(x^V), f_Θ(x^N) + R_φ(f_Θ(x^N)) ) < ε

  • where φ is the learning parameter of the residual compensation function R, diff(·,·) is the vector deviation function, x^V and x^N are the first and second face images, and ε is the preset difference threshold.
  • The residual compensation model uses the second convolutional neural network, trained on large-scale first face images of the primary modality, as the backbone network, and adds the residual compensation model and the multi-modal loss function for the second face images of the secondary modality. The convolution parameters of the backbone network, i.e., of the second convolutional neural network, are not updated; only the parameters of the residual compensation model are learned, under the joint supervision of the multi-modal loss function, which greatly reduces the number of parameters and effectively alleviates the overfitting of the convolutional neural network. The residual compensation model compensates for modal differences, and both it and the multi-modal loss function are optimized to reduce modal differences.
  • In this embodiment, the deviation value of the residual compensation model is determined from the first and second depth feature vectors, the loss value over multiple training objects is calculated through the multi-modal loss function, and the residual compensation network is adjusted and learned from the loss value and the deviation value; this reduces the overfitting of the residual compensation model while reducing the differences caused by different modalities, improving the accuracy of face recognition.
  • FIG. 6 shows the specific implementation flowchart of step S1042 of the method for generating a face recognition model provided by the third embodiment of the present invention. The step S1042 provided by this embodiment includes S601 to S602, detailed as follows:
  • In S601, the first depth feature vectors and second depth feature vectors are imported into a preset first modal difference loss function to calculate the first modal loss amount of the residual compensation model. The first modal difference loss function is:

    L_MD1 = (1/N) * Σ_{i=1..N} ( 1 - cos(v_i, n_i) )

  • where L_MD1 is the modal loss amount, N is the number of training objects, and v_i and n_i are the first and second depth feature vectors of the i-th training object. The cosine similarity function cos(·,·) calculates the cosine similarity between the two depth feature vectors, the loss component of a single training object is computed from that similarity, and the loss components of the N training objects are summed with weights to obtain the first modal loss of the residual compensation function.
  • In S602, the first modal loss amount is imported into a preset face recognition loss function to calculate the loss value of the residual compensation model. The face recognition loss function is:

    L = L_softmax + λ * L_MD1

  • where L is the loss value, L_softmax is the cross-entropy loss function for face classification, and λ is a hyperparameter balancing the cross-entropy loss function and the modal difference loss function.
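  • A sketch of this joint loss, assuming the reconstructed forms of L_MD1 and L above; the function names and the default value of the balancing hyperparameter are illustrative.

```python
import torch
import torch.nn.functional as F

def modal_loss_cosine(feat_v, feat_n):
    """L_MD1: mean (1 - cosine similarity) over the N paired training
    objects; small when the two modal features point in the same direction."""
    return (1.0 - F.cosine_similarity(feat_v, feat_n, dim=1)).mean()

def total_loss(logits, labels, feat_v, feat_n, lam=1.0):
    """L = L_softmax + lambda * L_MD1; lam balances the cross-entropy loss
    and the modal difference loss (its value here is an assumption)."""
    return F.cross_entropy(logits, labels) + lam * modal_loss_cosine(feat_v, feat_n)
```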
  • FIG. 7 shows a network structure diagram of the face recognition model provided by an embodiment of the present invention. The network has two input channels: a first face image channel used to input the primary modality and a second face image channel used to input the secondary modality. The second face image channel is configured with the residual compensation model, which is specifically composed of a fully connected layer and a nonlinear activation function. The face recognition network imports the first and second depth feature vectors into the multi-modal loss function calculation model to compute the first modal loss and the total loss value of the two modalities, which supervise the learning of the residual compensation model.
  • The modal loss function and the cross-entropy loss function jointly supervise the training of the residual compensation network, and the backpropagation algorithm can be used to update the learnable parameters of the residual compensation model. Once trained, the different branches of the residual compensation network can be used to extract the depth feature vectors of face images of the corresponding modalities; at test time, these depth feature vectors are used to calculate the similarity of two face images and thus determine the identity of the person in a face image.
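  • A sketch of one jointly supervised training step, reusing the names from the sketches above; the optimizer settings, identity count, and toy paired-data loader are placeholders rather than values from the patent.

```python
import torch

num_ids = 500                                     # number of training identities (placeholder)
classifier = torch.nn.Linear(128, num_ids)        # softmax head for face classification
optim = torch.optim.SGD(list(rc.parameters()) + list(classifier.parameters()),
                        lr=0.01, momentum=0.9)

# toy stand-in for a loader of paired primary/secondary-modal faces
loader = [(torch.randn(4, 3, 128, 128), torch.randn(4, 3, 128, 128),
           torch.randint(0, num_ids, (4,))) for _ in range(10)]

for vis_img, nir_img, label in loader:
    feat_v = backbone(vis_img)                    # first depth feature vector (frozen branch)
    feat_n = rc(backbone(nir_img))                # second depth feature vector
    logits = classifier(torch.cat([feat_v, feat_n]))
    loss = total_loss(logits, torch.cat([label, label]), feat_v, feat_n, lam=1.0)
    optim.zero_grad()
    loss.backward()                               # backpropagation updates the learnable parameters
    optim.step()
```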
  • FIG. 8 shows the specific implementation flowchart of step S1042 of the method for generating a face recognition model provided by the fourth embodiment of the present invention. The step S1042 provided by this embodiment includes S801 to S802, detailed as follows:
  • In S801, the first depth feature vectors and second depth feature vectors are imported into a preset second modal difference loss function to calculate the second modal loss amount of the residual compensation model. The second modal difference loss function is:

    L_MD2 = (1/N) * Σ_{i=1..N} ||v_i - n_i||_2

  • where L_MD2 is the modal loss amount, N is the number of training objects, and v_i and n_i are the first and second depth feature vectors of the i-th training object. Here the deviation between the first and second depth feature vectors is taken as the Euclidean distance: the Euclidean distance function calculates the distance between the two depth feature vectors, the distance value serves as the loss component of a training object, and the loss components of the N training objects are summed with weights to obtain the second modal loss of the residual compensation function.
  • In S802, the second modal loss amount is imported into a preset face recognition loss function to calculate the loss value of the residual compensation model. The face recognition loss function is:

    L = L_softmax + λ * L_MD2

  • where L is the loss value, L_softmax is the cross-entropy loss function for face classification, and λ is a hyperparameter balancing the cross-entropy loss function and the modal difference loss function.
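  • Under the same assumptions, the Euclidean variant is a drop-in replacement for modal_loss_cosine in the total_loss sketch above.

```python
import torch

def modal_loss_euclidean(feat_v, feat_n):
    """L_MD2: mean Euclidean distance between the paired first and second
    depth feature vectors of the N training objects."""
    return (feat_v - feat_n).norm(p=2, dim=1).mean()
```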
  • As in the third embodiment, the modal loss function and the cross-entropy loss function jointly supervise the training of the residual compensation network, the backpropagation algorithm updates the learnable parameters of the residual compensation model, and the trained branches are used to extract the depth feature vectors of the corresponding modalities and to calculate the similarity of two face images at test time, so as to determine the identity of the person in a face image.
  • FIG. 9 shows the specific implementation flowchart of step S101 of the method for generating a face recognition model provided by the fifth embodiment of the present invention. The step S101 provided by this embodiment includes S1011 to S1015, detailed as follows:
  • The acquiring of the face images corresponding to the training object in each preset modality includes:
  • In S1011, the object image of the training object in each preset modality is acquired, and the facial feature points in the object image are determined through a face detection algorithm.
  • In this embodiment, the terminal device can preprocess the object image of the training object, which improves the accuracy of the subsequent adjustment and learning of the residual compensation model. Based on this, after acquiring the object image of the training object, the terminal device identifies multiple facial feature points of the training object through a face detection algorithm and marks each facial feature point in the object image. The facial feature points may be the various facial organs, such as the eyes, ears, nose, mouth, and eyebrows.
  • In S1012, the face area of the training object is extracted from the object image based on the facial feature points; the face area includes the first face area of the primary modality and the second face area of the secondary modality.
  • In this embodiment, the terminal device determines the location of the training object's face based on the coordinate information of the facial feature points, so that the image of the area where the face lies, i.e., the aforementioned face area, can be extracted from the training image. These operations are performed on the training images of the different modalities, producing the first face area of the primary modality and the second face area of the secondary modality.
  • In S1013, a standardized transformation is performed on the second face area based on the first coordinate information of each facial feature point in the first face area and the area size of the first face area, so that the second coordinate information of each facial feature point in the second face area matches the first coordinate information.
  • In this embodiment, after acquiring the face areas, the terminal device preprocesses them to facilitate the output of the depth feature vectors. Based on this, the terminal device adjusts the size of the second face area according to the area size of the first face area of the primary modality, and applies a similarity transformation or affine transformation to the second face area according to the coordinate information of the facial feature points in the first face area, so that the facial feature points of the different modalities are aligned, that is, feature points of the same type have the same coordinates across modalities, yielding face images of different modalities with uniform size and identical face pose.
  • In a possible implementation, the terminal device is provided with a standard face template configured with a standard template size and standard facial feature points. The terminal device can adjust the first face area and the second face area according to the standard face template, aligning the facial feature points of both areas with those of the template. In addition, the terminal device can expand a monochrome image of the secondary modality to three channels, or perform gray-scale processing on a color image of the primary modality, so as to ensure that the numbers of channels of the primary and secondary modalities are consistent.
  • In S1014, the pixel value of each pixel in the first face area is normalized, and the normalized first face area is recognized as the first face image.
  • In this embodiment, the terminal device obtains the pixel value of each pixel in the first face area and normalizes it. For example, the pixel value can be divided by 255, the maximum pixel value, ensuring that every pixel value in the face area lies between 0 and 1. Alternatively, the terminal device can first subtract 127.5 (one half of the maximum pixel value) from the pixel value and divide the difference by 128, so that the normalized pixel value lies within [-1, 1]; the normalized face area is then recognized as the first face image.
  • In S1015, the pixel value of each pixel in the transformed second face area is normalized in the same way, and the normalized second face area is recognized as the second face image. The normalization operation is the same as in S1014; for details, refer to the description of S1014, which is not repeated here.
  • Preprocessing in this way improves the uniformity of the subsequent depth feature vectors and thus the training accuracy of the residual compensation model, as the sketch below illustrates.
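  • A sketch of this preprocessing with OpenCV, assuming a five-point landmark set and an illustrative alignment template; the patent does not prescribe a particular detector or template coordinates, and the (p - 127.5) / 128 normalization option is used here.

```python
import cv2
import numpy as np

# Illustrative 5-point template (eyes, nose tip, mouth corners) for a
# 128x128 crop; the coordinates are assumptions, not taken from the patent.
TEMPLATE = np.float32([[43, 51], [85, 51], [64, 72], [47, 93], [81, 93]])

def preprocess(img, landmarks):
    """Align a face to the standard template with a similarity transform,
    then normalize pixel values to [-1, 1]. `landmarks` is a (5, 2) array
    produced by any face detection algorithm (detector left open here)."""
    if img.ndim == 2:                                  # monochrome sub-modal image:
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)    # expand to three channels
    M, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), TEMPLATE)
    aligned = cv2.warpAffine(img, M, (128, 128))       # uniform size and pose
    return (aligned.astype(np.float32) - 127.5) / 128.0
```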
  • FIG. 10 shows the specific implementation flowchart of a method for generating a face recognition model provided by the sixth embodiment of the present invention. The method provided by this embodiment, after the face recognition model is generated from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network, further includes S1001 to S1004, detailed as follows:
  • In S1001, a target image of the object to be recognized is acquired, and the modal type of the target image is determined.
  • In this embodiment, after the terminal device generates the face recognition model, it can perform multi-modal face recognition and determine the object attributes corresponding to different face images. The user can send an object image to be recognized to the terminal device, and the terminal device extracts the target image of the object to be recognized from it; the extraction may follow the embodiment of FIG. 9. After acquiring the target image, the terminal device determines its modal type, that is, whether the target image is a face image generated by the primary-modal imaging principle or by a secondary-modal imaging principle. If the target image is a primary-modal face image, the target feature vector of the target object is output through the first convolutional neural network and matched against each standard feature vector in the object library to determine the object attributes of the object to be recognized.
  • In S1002, if the modal type is a secondary modality, the target feature vector of the target image is calculated through the second convolutional neural network and the adjusted residual compensation model.
  • In this embodiment, the target feature vector of the target image is output through the residual compensation model corresponding to the secondary modality and the second convolutional neural network. Because the residual compensation network performs parameter compensation, the target feature vector is approximately equivalent to a target feature vector of the primary modality and can therefore be matched against the standard feature vectors generated in the primary modality.
  • In S1003, the matching degree between the target feature vector and each standard feature vector in the object library is calculated.
  • In this embodiment, the terminal device calculates the matching degree between the target feature vector of the object to be recognized and the standard feature vector of each entered object in the object library. For example, the nearest neighbor algorithm can be used: the distance between the target feature vector and each standard feature vector is calculated, and the reciprocal of the distance value is taken as the matching degree.
  • In S1004, the entered object corresponding to the standard feature vector with the highest matching degree is taken as the matching object of the object to be recognized.
  • In this embodiment, the terminal device takes the entered object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized, thereby recognizing the secondary-modal face image. The standard feature vector of each entered object in the object library is a feature vector generated in the primary modality.
  • Performing face recognition through a multi-modal face recognition model that includes a residual compensation network improves recognition accuracy.
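  • A sketch of this identification step, reusing the branch names from the sketches above; the distance-plus-reciprocal matching degree follows the description, while the gallery arrays and the use of torch.cdist are illustrative.

```python
import torch

def identify(target_img, gallery_feats, gallery_ids, modal="sub"):
    """1:N recognition: extract the target feature vector through the
    branch matching the probe's modal type, compute its distance to every
    standard feature vector (all precomputed with the primary-modal
    branch), and return the entered identity whose reciprocal distance,
    i.e. matching degree, is highest."""
    with torch.no_grad():
        feat = backbone(target_img) if modal == "main" else rc(backbone(target_img))
        dists = torch.cdist(feat, gallery_feats).squeeze(0)  # (G,) distance values
        match_degree = dists.reciprocal()                    # reciprocal of distance
        return gallery_ids[int(match_degree.argmax())]
```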
  • FIG. 11 shows the specific implementation flowchart of a method for generating a face recognition model provided by the seventh embodiment of the present invention. The method provided by this embodiment, after the face recognition model is generated from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network, further includes S1101 to S1104, detailed as follows:
  • In S1101, a first image of a first object and a second image of a second object are acquired; the modal type of the first image is the primary modality, and the modal type of the second image is a secondary modality.
  • In this embodiment, the terminal device can be used to detect whether two objects belong to the same entity user; it therefore obtains the first image of the first object to be matched and the second image of the second object to be matched. There may be multiple second images, and different second images may correspond to different modal types or the same modal type, which is not limited here.
  • In S1102, the first target vector of the first image is extracted through the first convolutional neural network; that is, the terminal device calculates the first depth feature vector of the first object, i.e., the aforementioned first target vector, through the first convolutional neural network.
  • In S1103, the second target vector of the second image is extracted through the second convolutional neural network and the adjusted residual compensation model; that is, the terminal device determines the second depth feature vector of the second image, i.e., the aforementioned second target vector, through the second convolutional neural network and the adjusted residual compensation model.
  • In S1104, the deviation value between the first target vector and the second target vector is calculated; if the deviation value is less than a preset deviation threshold, the first object and the second object are recognized as belonging to the same entity object.
  • In this embodiment, the terminal device calculates the deviation value between the first and second target vectors, for example by means of a cosine distance function or a Euclidean distance function, to measure the degree of difference between the two vectors. If the deviation value is less than the preset deviation threshold, the two objects are recognized as belonging to the same entity; conversely, if the deviation value is greater than or equal to the threshold, the two objects belong to two different entity objects.
  • In this embodiment, images of two modalities can be imported into the face recognition network, the depth feature vectors of the two modalities are calculated, and whether the two face images belong to the same entity object is determined from the deviation value between the two depth feature vectors, realizing classification and recognition of entity objects.
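  • A sketch of this verification step under the same assumptions; the choice of Euclidean distance for the deviation value and the threshold value are illustrative (cosine distance is the other option named above).

```python
import torch

def same_person(img_main, img_sub, threshold=1.0):
    """1:1 verification: extract each image's target vector through its
    own branch and compare their deviation against the preset threshold."""
    with torch.no_grad():
        f1 = backbone(img_main)         # first target vector (primary modality)
        f2 = rc(backbone(img_sub))      # second target vector (secondary modality)
        return (f1 - f2).norm(p=2).item() < threshold
```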
  • FIG. 12 shows a structural block diagram of the device for generating a face recognition model provided by an embodiment of the present invention. The device includes units for executing the steps of the embodiment corresponding to FIG. 1; for convenience of description, only the parts related to this embodiment are shown. The device for generating the face recognition model includes:
  • the face image acquisition unit 121, configured to acquire face images corresponding to the training object in each preset modality, where the face images include a first face image corresponding to the primary modality and a second face image corresponding to at least one secondary modality;
  • the first depth feature vector acquisition unit 122, configured to extract the first depth feature vector of the first face image through a preset first convolutional neural network;
  • the second depth feature vector acquisition unit 123, configured to extract the second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted for the secondary modality;
  • the residual compensation model adjustment unit 124, configured to adjust the residual compensation model based on the first depth feature vectors and second depth feature vectors corresponding to the multiple training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
  • the face recognition model generation unit 125, configured to generate a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  • Optionally, the residual compensation model adjustment unit 124 includes:
  • a compensation deviation value calculation unit, configured to import the first depth feature vector and the second depth feature vector into a preset difference degree calculation model and determine the deviation value of the residual compensation model to be adjusted;
  • a compensation loss value calculation unit, configured to import the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model and determine the loss value of the residual compensation model;
  • a model convergence adjustment unit, configured to adjust the residual compensation model based on the loss value and the deviation value, so that the residual compensation model meets the convergence condition diff( f_Θ(x^V), f_Θ(x^N) + R_φ(f_Θ(x^N)) ) < ε, where φ is the learning parameter of the residual compensation function and diff(·,·) is the vector deviation function.
  • Optionally, the compensation loss value calculation unit includes:
  • a first modal loss calculation unit, configured to import the first depth feature vectors and second depth feature vectors of the plurality of training objects into the preset first modal difference loss function L_MD1 = (1/N) * Σ_{i=1..N} ( 1 - cos(v_i, n_i) ) and calculate the first modal loss amount of the residual compensation model, where L_MD1 is the modal loss amount and N is the number of training objects;
  • a first loss value output unit, configured to import the first modal loss amount into the preset face recognition loss function L = L_softmax + λ * L_MD1 to calculate the loss value of the residual compensation model, where L is the loss value, L_softmax is the cross-entropy loss function for face classification, and λ is a hyperparameter balancing the cross-entropy loss function and the modal difference loss function.
  • Optionally, the compensation loss value calculation unit includes:
  • a second modal loss calculation unit, configured to import the first depth feature vectors and second depth feature vectors of the multiple training objects into the preset second modal difference loss function L_MD2 = (1/N) * Σ_{i=1..N} ||v_i - n_i||_2 and calculate the second modal loss amount of the residual compensation model, where L_MD2 is the modal loss amount and N is the number of training objects;
  • a second loss value output unit, configured to import the second modal loss amount into the preset face recognition loss function L = L_softmax + λ * L_MD2 to calculate the loss value of the residual compensation model, where L is the loss value, L_softmax is the cross-entropy loss function for face classification, and λ is a hyperparameter balancing the cross-entropy loss function and the modal difference loss function.
  • Optionally, the face image acquisition unit 121 includes:
  • a facial feature point recognition unit, configured to obtain the object image of the training object in each preset modality and determine the facial feature points in the object image through a face detection algorithm;
  • a face area extraction unit, configured to extract the face area of the training object from the object image based on the facial feature points, the face area including the first face area of the primary modality and the second face area of the secondary modality;
  • a facial feature point adjustment unit, configured to perform a standardized transformation on the second face area based on the first coordinate information of each facial feature point in the first face area and the area size of the first face area, so that the second coordinate information of each facial feature point in the second face area matches the first coordinate information;
  • a first normalization processing unit, configured to normalize the pixel value of each pixel in the first face area and recognize the normalized first face area as the first face image;
  • a second normalization processing unit, configured to normalize the pixel value of each pixel in the transformed second face area and recognize the normalized second face area as the second face image.
  • Optionally, the device for generating the face recognition model further includes:
  • a modal type recognition unit, used to obtain the target image of the object to be recognized and determine the modal type of the target image;
  • a target feature vector output unit, configured to calculate the target feature vector of the target image through the second convolutional neural network and the adjusted residual compensation model if the modal type is a secondary modality;
  • a face matching degree calculation unit, used to calculate the matching degree between the target feature vector and each standard feature vector in the object library;
  • a face recognition unit, configured to take the entered object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized.
  • Optionally, the device for generating the face recognition model further includes:
  • a multi-object image acquisition unit, for acquiring a first image of a first object and a second image of a second object, the modal type of the first image being the primary modality and the modal type of the second image being a secondary modality;
  • a first target vector calculation unit, configured to extract the first target vector of the first image through the first convolutional neural network;
  • a second target vector calculation unit, configured to extract the second target vector of the second image through the second convolutional neural network and the adjusted residual compensation model;
  • a same-entity-object recognition unit, used to calculate the deviation value between the first target vector and the second target vector and, if the deviation value is less than a preset deviation threshold, recognize that the first object and the second object belong to the same entity object.
  • the face recognition model generation device provided by the embodiment of the present invention likewise does not rely on the user's manual feature description of the face information, and can generate the face recognition model from input face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
  • FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention.
  • the terminal device 13 of this embodiment includes: a processor 130, a memory 131, and a computer program 132 stored in the memory 131 and running on the processor 130, such as a face recognition model Generate the program.
  • the processor 130 executes the computer program 132, the steps in the above embodiments of the method for generating face recognition models are implemented, such as S101 to S105 shown in FIG. 1.
  • when the processor 130 executes the computer program 132, the functions of the units in the foregoing device embodiments are realized, such as the functions of units 121 to 125 shown in FIG. 12.
  • the computer program 132 may be divided into one or more units, and the one or more units are stored in the memory 131 and executed by the processor 130 to complete the present invention.
  • the one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 132 in the terminal device 13.
  • the computer program 132 may be divided into a face image acquisition unit, a first depth feature vector acquisition unit, a second depth feature vector acquisition unit, a residual compensation model adjustment unit, and a face recognition model generation unit, the specific functions of each unit being as described above.
  • the terminal device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 130 and a memory 131.
  • FIG. 13 is only an example of the terminal device 13 and does not constitute a limitation on the terminal device 13; it may include more or fewer components than shown in the figure, a combination of certain components, or different components.
  • the terminal device may also include input and output devices, network access devices, buses, etc.
  • the so-called processor 130 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • the memory 131 may be an internal storage unit of the terminal device 13, such as a hard disk or a memory of the terminal device 13.
  • the memory 131 may also be an external storage device of the terminal device 13, for example, a plug-in hard disk equipped on the terminal device 13, a smart memory card (Smart Media Card, SMC), or a Secure Digital (SD) Card, Flash Card, etc.
  • the memory 131 may also include both an internal storage unit of the terminal device 13 and an external storage device.
  • the memory 131 is used to store the computer program and other programs and data required by the terminal device.
  • the memory 131 can also be used to temporarily store data that has been output or will be output.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.


Abstract

This application, which pertains to the technical field of image processing, provides a method and device for generating a face recognition model, including: acquiring face images of a training object in each preset modality; extracting a first depth feature vector of the first face image through a preset first convolutional neural network; extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality; adjusting the residual compensation model based on the first and second depth feature vectors of a plurality of training objects; and generating a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network. By generating the face recognition model from input face information of training objects, this application improves the accuracy of multi-modal face recognition and reduces labor costs.

Description

A Method and Device for Generating a Face Recognition Model — Technical Field
The present invention belongs to the technical field of image processing, and in particular relates to a method and device for generating a face recognition model.
Background Art
Multi-modal face recognition has broad application prospects in security surveillance and public-security law enforcement. For example, in dark night-time scenes ordinary surveillance cameras often image poorly, which limits their usefulness at night; near-infrared cameras image well at night and can compensate for the shortcomings of cameras based on visible-light imaging. As another example, when pursuing a suspect the public security department can synthesize a face photo of the suspect from eyewitness descriptions, whereas the face images collected when ID cards are issued are taken with ordinary cameras under visible light, i.e., only visible-light face images are on record. How to recognize faces from synthesized face images or from face images captured under various probe lights — that is, multi-modal face recognition — is therefore increasingly important today.
Existing multi-modal face recognition technology generally relies on hand-crafted features. This approach is limited by the expressive power of such features: hand-crafted features cannot enumerate all the distinct characteristics of a face, and inaccurate descriptions directly degrade recognition accuracy. Multi-modal face recognition based on hand-crafted features therefore has low accuracy and high labor cost.
Technical Problem
In view of this, embodiments of the present invention provide a method and device for generating a face recognition model, to solve the problem that existing multi-modal face recognition technology, which mainly performs multi-modal face recognition based on hand-crafted features, suffers from low recognition accuracy and high labor cost.
Technical Solution
A first aspect of the embodiments of the present invention provides a method for generating a face recognition model, comprising:
acquiring face images of a training object in each preset modality, the face images comprising a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality;
extracting a first depth feature vector of the first face image through a preset first convolutional neural network;
extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality;
adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is smaller than a preset difference threshold;
generating a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
A second aspect of the embodiments of the present invention provides a device for generating a face recognition model, comprising:
a face image acquiring unit, configured to acquire face images of a training object in each preset modality, the face images comprising a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality;
a first depth feature vector acquiring unit, configured to extract a first depth feature vector of the first face image through a preset first convolutional neural network;
a second depth feature vector acquiring unit, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality;
a residual compensation model adjusting unit, configured to adjust the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is smaller than a preset difference threshold;
a face recognition model generating unit, configured to generate a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the first aspect.
Beneficial Effects
The method and device for generating a face recognition model provided by the embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention acquire face images of a training object in different modalities, extract the second depth feature vector of the sub-modality through the to-be-adjusted residual compensation model and a convolutional neural network, and then feedback-adjust the residual compensation model based on the first depth feature vector of the main modality and the second depth feature vector, so that the degree of difference between the two vectors falls below a preset difference threshold, i.e., the recognition result converges. Because the face images of the main modality and the sub-modality belong to the same physical person, and the depth feature vectors represent the features of the key points of the face, the deviation between the depth feature vectors of the two modalities is small once the residual compensation module has been adjusted; hence, when the difference between the two depth feature vectors is smaller than the preset difference threshold, it can be determined that the residual compensation module has been adjusted, and the face recognition model is generated based on it. Compared with existing multi-modal face recognition technology, the present invention does not rely on manual feature descriptions of face information by users and can generate the face recognition model from input face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; other drawings can be obtained from them by those of ordinary skill in the art without creative work.
FIG. 1 is a flowchart of the implementation of a method for generating a face recognition model according to the first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a ten-layer residual network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of four multi-modal face recognition networks according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a second convolutional neural network in which the residual compensation module is placed after a convolution layer, according to an embodiment of the present invention;
FIG. 5 is a flowchart of the specific implementation of S104 of a method for generating a face recognition model according to the second embodiment of the present invention;
FIG. 6 is a flowchart of the specific implementation of S1042 of a method for generating a face recognition model according to the third embodiment of the present invention;
FIG. 7 is a network structure diagram of a face recognition model according to an embodiment of the present invention;
FIG. 8 is a flowchart of the specific implementation of S1042 of a method for generating a face recognition model according to the fourth embodiment of the present invention;
FIG. 9 is a flowchart of the specific implementation of S101 of a method for generating a face recognition model according to the fifth embodiment of the present invention;
FIG. 10 is a flowchart of the specific implementation of a method for generating a face recognition model according to the sixth embodiment of the present invention;
FIG. 11 is a flowchart of the specific implementation of a method for generating a face recognition model according to the seventh embodiment of the present invention;
FIG. 12 is a structural block diagram of a device for generating a face recognition model according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention.
Embodiments of the Invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
The embodiments of the present invention acquire face images of a training object in different modalities, extract the second depth feature vector of the sub-modality through the to-be-adjusted residual compensation model and a convolutional neural network, and feedback-adjust the residual compensation model based on the first depth feature vector of the main modality and the second depth feature vector, until the degree of difference between the two vectors is smaller than a preset difference threshold, i.e., the recognition result converges. Because the face images of the main modality and the sub-modality belong to the same physical person, and the depth feature vectors represent the features of the key points of the face, the deviation between the depth feature vectors of the two modalities is small once the residual compensation module has been adjusted; hence, when the difference between the two depth feature vectors is smaller than the preset difference threshold, it can be determined that the residual compensation module has been adjusted, and the face recognition model is generated on that basis. This solves the problem that existing multi-modal face recognition technology, which mainly performs recognition based on hand-crafted features, suffers from low recognition accuracy and high labor cost.
In the embodiments of the present invention, the flow is executed by a terminal device. The terminal device includes but is not limited to servers, computers, smartphones, tablets, and other devices capable of performing the generation of a face recognition model. Preferably, the terminal device is specifically a face recognition apparatus that can determine the object attributes of a target object from an input face image; it has multiple input channels, each of which can be used to recognize face images of a preset modality, thereby achieving multi-modal face recognition. FIG. 1 shows the implementation flowchart of the method for generating a face recognition model provided by the first embodiment of the present invention, detailed as follows:
In S101, face images of a training object in each preset modality are acquired; the face images include a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality.
In this embodiment, the terminal device can extract from a database face images of the training object in different preset modalities; the face images all correspond to the same physical person, so the face images of different modalities can be recognized as the same face image group. As mentioned above, face images of different modalities specifically refer to face images produced by different imaging principles, including but not limited to: face images generated under visible light, face images generated under infrared light, face images generated by thermal imaging, depth face images generated by ranging, face images generated by animation synthesis, and hand-drawn face images. The terminal device can select one of these modalities as the main modality; the remaining modalities are then sub-modalities.
Preferably, in this embodiment, the terminal device can take face images generated under the visible-light principle as the face images of the main modality. Because visible-light face capture is widely used, large numbers of visible-light face images can be collected easily, and algorithms for extracting depth feature vectors from visible-light face images are relatively mature, which can greatly improve the accuracy of the face recognition model. Multi-modal face recognition has two core difficulties: first, multi-modal data collection is inconvenient, so too few datasets are available; second, there are huge modality differences between images of different modalities. Around these two problems, the prior art adopts feature expression methods that are robust to different modality information. For example, one prior technique proposes a common-component analysis method to learn the common components of image pairs from different modalities, projecting image pairs belonging to different modalities into the same feature space to reduce the influence of modality differences. Another obtains a common subspace of images of different modalities through a modality-independent component analysis algorithm and then uses a self-learning strategy to obtain a modality-independent dictionary in that common subspace. Yet another proposes a graph-representation-based multi-modal face recognition method, which uses a Markov network to model the compatibility between neighboring image patches and a pairwise-representation similarity measure to compare faces.
In recent years, thanks to the good performance of deep learning in vision, some research has applied deep learning methods to multi-modal face recognition, which both exploits deep neural networks to extract highly discriminative features and uses them to learn the highly nonlinear relationships between data of different modalities. One prior technique synthesizes images by mixing certain regions of different images in a given dataset, greatly increasing the amount of data and alleviating overfitting of convolutional neural networks. Another uses a pairwise deep learning method to map data of different modalities into the same feature space: it designs an objective function that pulls together the feature vectors of paired images from different modalities while pushing apart images of people with different identities, and implicitly increases the number of training samples by constructing triplet training samples to reduce overfitting. Multi-modal face recognition based on hand-crafted features is limited by the expressive power of such features and has low accuracy, while existing deep-learning techniques struggle to simultaneously alleviate CNN overfitting and reduce modality differences, are relatively complex to design and inconvenient to use, and their recognition performance falls short of practical needs. To solve these technical problems, the present invention proposes a simple and effective multi-modal face recognition method based on a residual compensation network.
In this embodiment, the terminal device can recognize multi-modal face images, comprising one main modality and at least one sub-modality. Based on the number of modalities of the face recognition model, the terminal device can configure a corresponding face image input channel for each modality; after acquiring a face image, it can identify the modality type of that image and determine the corresponding input channel. If the face image is detected to be a first face image of the main modality, the flow jumps to S102; conversely, if it is detected to be a second face image of a sub-modality, the flow jumps to S103. Preferably, if the terminal device can recognize face images of two or more sub-modalities, it can configure a corresponding second convolutional neural network and a corresponding residual compensation model for each sub-modality, so that each residual compensation model matches the image characteristics of its sub-modality, thereby improving recognition accuracy.
In S102, a first depth feature vector of the first face image is extracted through a preset first convolutional neural network.
In this embodiment, the terminal device needs to perform feature extraction on the first face image, so it imports the first face image into the preset first convolutional neural network and outputs the first depth feature vector of the first face image. The first convolutional neural network may be built on convolutional architectures such as VGGNet, GoogLeNet, DenseNet, SENet, Xception, or Light CNN.
Optionally, the first convolutional neural network is specifically a ten-layer residual network. FIG. 2 shows a schematic structural diagram of a ten-layer residual network provided by this embodiment. As shown in FIG. 2, the network consists of 10 convolution layers and one fully connected (FC) layer, with the number of convolution channels gradually increasing from 32 to 512; except for the first convolution layer, whose stride is 2, all convolution layers have stride 1, and the 128-dimensional vector output by the fully connected layer is the feature of the face image in the main modality. Here, "3*3 Conv" denotes the kernel size of a convolution layer, and "2*2 max pool" denotes the kernel size of a pooling layer.
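A minimal PyTorch sketch of such a feature extractor follows. The exact channel schedule, the positions of the 2*2 max-pooling layers, and the omission of the skip connections of FIG. 2 are illustrative assumptions; only the ten 3*3 convolutions, the stride-2 first layer, the 32-to-512 channel growth, and the 128-dimensional FC output follow the description above.

```python
import torch.nn as nn

class TenLayerNetSketch(nn.Module):
    """Sketch of the ten-convolution backbone described above.
    The channel list and pooling positions are assumptions; the
    residual (skip) connections of FIG. 2 are omitted for brevity."""
    def __init__(self, feat_dim=128):
        super().__init__()
        chans = [32, 64, 64, 128, 128, 256, 256, 512, 512, 512]  # assumed schedule
        layers, in_c = [], 3
        for i, out_c in enumerate(chans):
            stride = 2 if i == 0 else 1          # only the first conv has stride 2
            layers.append(nn.Conv2d(in_c, out_c, 3, stride=stride, padding=1))
            layers.append(nn.PReLU(out_c))
            if i in (1, 3, 5, 7):                # assumed 2x2 max-pool positions
                layers.append(nn.MaxPool2d(2))
            in_c = out_c
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)      # collapse spatial dims before the FC
        self.fc = nn.Linear(512, feat_dim)

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)
        return self.fc(h)                        # 128-d depth feature vector
```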
In S103, a second depth feature vector of the second face image is extracted through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality.
In this embodiment, to correct the influence of modality differences on the depth feature vector, after importing the second face image into the second convolutional neural network and extracting the face feature values of the second face image of the sub-modality, the terminal device needs to apply modality residual compensation to those feature values through the residual compensation model and output the second depth feature vector of the sub-modality, eliminating the modality difference between the main modality and the sub-modality through the residual compensation model. Note that if the face recognition model can recognize faces in multiple sub-modalities, a corresponding residual compensation network can be configured for each sub-modality based on its modality characteristics.
Optionally, in this embodiment, the convolution parameters of the first and second convolutional neural networks are identical, i.e., the two networks share convolution parameters. The convolution parameters of both branches are initialized with a convolutional neural network trained on large-scale visible-light face images; they are shared and are not updated during training, so that the differences between modalities are adjusted entirely by the residual compensation module. This greatly reduces the number of learnable parameters and thus reduces overfitting.
In this embodiment, for a face image $x_i^{(1)}$ in the main modality and a face image $x_i^{(2)}$ in the sub-modality belonging to the same person $i$, a convolutional neural network can be used to extract their depth feature vectors $f_\theta(x_i^{(1)})$ and $f_\theta(x_i^{(2)})$. Since $f_\theta(\cdot)$ is trained on face data in the main modality, it can be used to extract the depth feature vector of $x_i^{(1)}$. However, the distribution of face features in the sub-modality differs considerably from that in the main modality, so the depth feature vector extracted from $x_i^{(2)}$ with $f_\theta(\cdot)$ may give a poor representation of the face, introducing a modality difference.
Since the outputs $f_\theta(x_i^{(1)})$ and $f_\theta(x_i^{(2)})$ of the pre-trained convolutional neural network belong to the same physical person $i$, they should both be related to a latent vector $x_i$ that depends only on that person's identity. Assume they are obtained from $x_i$ by different transformations: $f_\theta(x_i^{(m)}) = g_m(x_i)$, $m = 1, 2$, where $g_m(\cdot)$ is a transformation function. Let $g_2^{-1}(\cdot)$ be an approximate inverse of $g_2(\cdot)$, such that $g_2^{-1}\bigl(f_\theta(x_i^{(2)})\bigr) \approx x_i$. Then
$f_\theta(x_i^{(1)}) = g_1(x_i) \approx g_1\bigl(g_2^{-1}(f_\theta(x_i^{(2)}))\bigr) = f_\theta(x_i^{(2)}) + R\bigl(f_\theta(x_i^{(2)})\bigr)$,
where $R(y) = g_1\bigl(g_2^{-1}(y)\bigr) - y$. This shows that the modality difference between $f_\theta(x_i^{(1)})$ and $f_\theta(x_i^{(2)})$ can be approximately modeled as a residual compensation module: the second depth feature vector is generated by adding, to the feature value extracted from the second face image by the second convolutional neural network, the compensation output by the residual compensation module, and the resulting second depth feature vector is approximately equal to the first depth feature vector.
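The relation $v^{(2)} = f_\theta(x^{(2)}) + R\bigl(f_\theta(x^{(2)})\bigr)$ can be sketched as follows. The module name, the 128-dimensional feature size, and the FC + PReLU form (described later for FIG. 7) are illustrative assumptions rather than the patent's exact implementation.

```python
import torch.nn as nn

class ResidualCompensation(nn.Module):
    """R(.): a fully connected layer followed by a PReLU activation,
    matching the FC + PReLU form described in the text. The feature
    dimension 128 follows the backbone sketch above."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.PReLU(dim)

    def forward(self, y):
        return y + self.act(self.fc(y))   # f(x) plus the learned residual

# Usage sketch: backbone parameters stay frozen, only R is learned.
# backbone = TenLayerNetSketch()               # hypothetical pretrained f_theta
# for p in backbone.parameters():
#     p.requires_grad = False
# rc = ResidualCompensation()
# v1 = backbone(x_main)                        # first depth feature vector
# v2 = rc(backbone(x_sub))                     # second depth feature vector
```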
In S104, the residual compensation model is adjusted based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first and second depth feature vectors is smaller than a preset difference threshold.
In this embodiment, because the learnable parameters in the residual compensation network are in an unadjusted state — i.e., the network does not yet match the image characteristics of the sub-modality — the terminal device can feedback-adjust the residual compensation network based on the first and second depth feature vectors of multiple training objects, until the degree of difference between the second depth feature vector output through the residual compensation network and the first depth feature vector is smaller than the preset difference threshold, i.e., the output converges.
In S105, a face recognition model is generated from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
In this embodiment, when the terminal device determines that the degree of difference between the second depth feature vector output by the residual compensation model and the first depth feature vector is smaller than the preset difference threshold, the output of the residual compensation network has converged and its parameters match the image characteristics of the sub-modality. The depth feature vector of a sub-modality face image, after conversion by the residual compensation module, becomes a depth feature vector consistent with the main modality; all sub-modality face feature vectors can thus be unified into the main modality and compared with the standard face vectors generated in the main modality, determining the object attributes corresponding to the sub-modality face image.
FIG. 3 shows four multi-modal face recognition networks provided by the present invention. As shown in FIG. 3, FIG. 3a is a convolutional neural network that fine-tunes the fully connected layer; FIG. 3b adds a new fully connected layer and a PReLU layer after the original fully connected layer; FIG. 3c is a face recognition network with two modality branches, with an extra fully connected layer and PReLU added in the sub-modality branch; FIG. 3d is the face recognition network provided by the present invention, which adds a residual compensation module in the sub-modality. All four structures use the cross-entropy loss function for adjustment learning, and face recognition experiments were conducted on two cross-modal face datasets, CASIA NIR-VIS 2.0 and IIIT-D Viewed Sketch. The specific results are shown in Table 1, from which the following can be concluded:
1) The pre-trained convolutional neural network has difficulty achieving good results on either dataset, showing that a model trained only on visible-light face data cannot handle modality differences effectively.
2) Fine-tuning all layers of the pre-trained network, as in traditional transfer learning, considerably improves performance; however, the model of FIG. 3a that fine-tunes only the fully connected layer achieves higher accuracy than fine-tuning all layers, especially on the IIIT-D dataset, because fine-tuning only the fully connected layer reduces the risk of overfitting.
3) Compared with the model of FIG. 3a, adding a new fully connected layer in the model of FIG. 3b can even reduce accuracy: although the new layer increases the model's expressive power, it overfits more easily on small cross-modal face datasets.
4) The accuracy of FIG. 3c is even lower than that of fine-tuning all layers, because adding a PReLU only in the sub-modality branch, while the main-modality branch has none, makes the feature differences between data of different modalities relatively larger — that is, it introduces a new source of modality difference.
5) The face recognition model provided by the present invention adds a residual compensation module, and its accuracy is higher than that of all the above models, showing that the residual compensation module can indeed effectively improve cross-modal face recognition accuracy. Compared with the model of FIG. 3c, the residual compensation module keeps the main features of the backbone network essentially unchanged, while compensating for the differences between features of different modalities through a nonlinear residual mapping, thereby reducing modality differences.
Table 1. Recognition accuracy of the four network structures on the CASIA NIR-VIS 2.0 and IIIT-D Viewed Sketch datasets.
In addition to the comparison with baseline models, we further compared the performance of the face recognition model provided by the embodiments of the present invention with existing face recognition models; see Tables 2 to 4 for details.
Table 2. Accuracy comparison on the CASIA NIR-VIS 2.0 dataset.
Table 3. Accuracy comparison on the IIIT-D Viewed Sketch dataset.
Table 4. Accuracy comparison on the CUHK NIR-VIS and Forensic Sketch datasets.
As Tables 2-4 show, the multi-modal face recognition model based on the residual compensation network provided by the present invention achieves the highest recognition accuracy on the CASIA NIR-VIS 2.0, IIIT-D Viewed Sketch, Forensic Sketch, and CUHK NIR-VIS multi-modal datasets. This shows that the multi-modal face recognition model based on the residual compensation model can effectively cope with the overfitting problem and reduces modality differences.
Optionally, in this embodiment, the residual compensation model can be implemented not only as a fully connected layer + nonlinear activation function, but also as a stack of several fully connected layers + nonlinear activation functions, as nonlinear activation function + fully connected layer, as nonlinear activation function + fully connected layer + nonlinear activation function, or as convolution layer + nonlinear activation function added after a convolution layer. FIG. 4 is a schematic diagram, provided by an embodiment of the present invention, of a second convolutional neural network in which the residual compensation module is placed after a convolution layer. As shown in FIG. 4, the convolution layers no longer update their parameters after initialization, and the residual compensation model can be inserted between two fixed-parameter convolution layers of the second convolutional neural network; in this case the structure of the residual compensation model is no longer fully connected layer + PReLU, but convolution layer + PReLU.
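A sketch of this convolutional variant, under the same assumptions as the sketches above (channel count and kernel size are illustrative):

```python
import torch.nn as nn

class ConvResidualCompensation(nn.Module):
    """Convolutional variant of R(.): conv + PReLU inserted between two
    frozen convolution layers, as FIG. 4 describes. The 3x3 kernel is an
    assumption; only the conv + PReLU form follows the text."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.PReLU(channels)

    def forward(self, h):
        return h + self.act(self.conv(h))   # residual compensation on feature maps
```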
As can be seen from the above, the method for generating a face recognition model provided by the embodiments of the present invention acquires face images of a training object in different modalities, extracts the second depth feature vector of the sub-modality through the to-be-adjusted residual compensation model and a convolutional neural network, and feedback-adjusts the residual compensation model based on the first depth feature vector of the main modality and the second depth feature vector, until the degree of difference between the two is smaller than the preset difference threshold, i.e., the recognition result converges. Because the face images of the main and sub-modalities belong to the same physical person and the depth feature vectors represent the features of the key points of the face, the deviation between the depth feature vectors of the two modalities is small once the residual compensation module has been adjusted; when the difference between the two vectors is smaller than the preset threshold, the residual compensation module is determined to be adjusted, and the face recognition model is generated based on it. Compared with existing multi-modal face recognition technology, the present invention does not rely on manual feature descriptions of face information by users and can generate the face recognition model from input face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
FIG. 5 shows the specific implementation flowchart of S104 of a method for generating a face recognition model according to the second embodiment of the present invention. Referring to FIG. 5, relative to the embodiment described in FIG. 1, S104 of the method provided in this embodiment comprises S1041-S1043, detailed as follows:
Further, adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects comprises:
In S1041, the first depth feature vector and the second depth feature vector are imported into a preset difference degree calculation model to determine a deviation value of the to-be-adjusted residual compensation model.
In this embodiment, the terminal device first needs to determine the current deviation value of the to-be-adjusted residual compensation model, so it imports the first and second depth feature vectors into the preset difference degree calculation model and determines the deviation value between the two depth feature vectors. In S1041, the terminal device inputs the different depth feature vectors of a training object in multiple preset modalities into the difference degree calculation model in pairs, as face image groups, so as to determine the deviation values between depth feature vectors of the same training object in different modalities.
Optionally, in this embodiment, the residual compensation model is specifically composed of a fully connected layer with an additional PReLU layer, and the residual compensation network can be adjusted and learned with the dropout technique based on the deviation values of multiple training objects.
In S1042, the first depth feature vector and the second depth feature vector are imported into a preset multi-modal loss function calculation model to determine a loss value of the residual compensation model.
In this embodiment, besides adjusting the residual compensation model through the deviation values between depth feature vectors of different modalities, the terminal device can also perform supervised learning of the residual compensation model based on the loss values computed for multiple training objects through the residual compensation model, thereby avoiding overfitting of the residual compensation function and reducing the differences between modalities. Specifically, the multi-modal loss calculation model may be a loss model based on Center loss and/or a loss model based on Contrastive loss.
In S1043, the residual compensation model is adjusted based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition; the convergence condition is:
$\hat{\tau} = \arg\min_{\tau} \; \mathrm{diff}\bigl(v_i^{(1)}, v_i^{(2)}\bigr)$
where $\tau$ is the learnable parameter of the residual compensation function, $v_i^{(1)}$ is the first depth feature vector, $v_i^{(2)}$ is the second depth feature vector, diff(*,*) is the vector deviation function, and $\hat{\tau}$ is the value of $\tau$ at which the vector deviation function attains its minimum or a local minimum.
In this embodiment, the residual compensation model takes the second convolutional neural network, trained on large-scale first face images of the main modality, as the backbone network, and adds the residual compensation model and a multi-modal loss function for the second face images of the sub-modality. The convolution parameters of the backbone, i.e., the second convolutional neural network, are not updated; only the parameters of the residual compensation model are learned under the joint supervision of the multi-modal loss function, which greatly reduces the number of parameters and thus effectively alleviates overfitting of the convolutional neural network. Moreover, both the compensation of modality differences by the residual compensation model and the optimization of the multi-modal loss function reduce modality differences.
In this embodiment, when $\mathrm{diff}\bigl(v_i^{(1)}, v_i^{(2)}\bigr)$ attains its minimum or a local minimum, the residual compensation model has been adjusted, where the first and second depth feature vectors satisfy:
$v_i^{(1)} = f_\theta\bigl(x_i^{(1)}\bigr), \qquad v_i^{(2)} = f_\theta\bigl(x_i^{(2)}\bigr) + R_\tau\bigl(f_\theta(x_i^{(2)})\bigr)$
where $v_i^{(1)}$ is the first depth feature vector and the function diff(*,*) measures the deviation between two depth feature vectors. If the pre-trained second convolutional neural network is also fine-tuned while the RC module is added, $f_\theta$ in the above formulas is replaced by $f_{\theta+\Delta}$, where $\Delta$ is the change in the parameters of the pre-trained second convolutional neural network.
In the embodiments of the present invention, the deviation value of the residual compensation model is determined through the first and second depth feature vectors, and the residual compensation network is adjusted and learned through the loss values of multiple training objects computed by the multi-modal loss function together with the deviation values. This reduces overfitting of the residual compensation model, reduces the differences introduced by different modalities, and improves face recognition accuracy.
FIG. 6 shows the specific implementation flowchart of S1042 of a method for generating a face recognition model according to the third embodiment of the present invention. Referring to FIG. 6, relative to the embodiment described in FIG. 5, S1042 of the method provided in this embodiment comprises S601-S602, detailed as follows:
Further, if the deviation value is a cosine deviation value, importing the first depth feature vector and the second depth feature vector into the preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises:
In S601, the first depth feature vectors and the second depth feature vectors of a plurality of the training objects are imported into a preset first modal difference loss function to calculate a first modal loss of the residual compensation model; the first modal difference loss function is specifically:
$L_{MD1} = \frac{1}{N} \sum_{i=1}^{N} \bigl(1 - \cos(v_i^{(1)}, v_i^{(2)})\bigr)$
where $L_{MD1}$ is the modal loss, $N$ is the number of the training objects, and $\cos(\cdot,\cdot)$ is the cosine similarity function.
In this embodiment, if diff(*,*) is the cosine deviation function, the deviation value computed between the first and second depth feature vectors is a cosine deviation value. When the overall loss of multiple training objects is subsequently computed, the cosine similarity between the two depth feature vectors can be computed through the cosine similarity function, the loss component for a single training object computed from that cosine similarity, and the loss components of the N training objects summed with weights, yielding the first modal loss of the residual compensation function.
In S602, the first modal loss is imported into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
$L = L_{softmax} + \lambda L_{MD1}$
where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
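A hedged sketch of these two losses, assuming batched feature tensors of shape (N, d) and classifier logits; the value of λ is illustrative only, since the text states merely that it is a hyperparameter:

```python
import torch.nn.functional as F

def modal_difference_loss_cosine(v1, v2):
    # L_MD1 = (1/N) * sum_i (1 - cos(v1_i, v2_i)), averaged over the batch
    return (1.0 - F.cosine_similarity(v1, v2, dim=1)).mean()

def face_recognition_loss(logits, labels, v1, v2, lam=0.1):
    # L = L_softmax + lambda * L_MD1; lam = 0.1 is an illustrative value
    return F.cross_entropy(logits, labels) + lam * modal_difference_loss_cosine(v1, v2)
```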
By way of example, FIG. 7 shows the network structure diagram of a face recognition model provided by an embodiment of the present invention. As shown in FIG. 7, the network structure has two input channels: a first face image channel for the main modality and a second face image channel for the sub-modality. The second face image channel is configured with a residual compensation model, which consists of a fully connected layer and a nonlinear activation function. The face recognition network imports the first and second depth feature vectors into the multi-modal loss function calculation model to compute the first modal loss of the two modalities and the total loss value, and performs supervised learning of the residual compensation model.
In the embodiments of the present invention, the residual compensation network is trained under the joint supervision of the modal loss function and the cross-entropy loss function. During training, the back-propagation algorithm can be used to update the learnable parameters in the residual compensation model. After the trained residual compensation model is obtained, the different branches of the residual compensation network can be used to extract the depth feature vectors of face images of the corresponding modalities, and at test time the depth feature vectors can be used to compute the similarity of two face images, thereby determining the identity of the person in a face image.
FIG. 8 shows the specific implementation flowchart of S1042 of a method for generating a face recognition model according to the fourth embodiment of the present invention. Referring to FIG. 8, relative to the embodiment described in FIG. 5, S1042 of the method provided in this embodiment comprises S801-S802, detailed as follows:
Further, if the deviation value is a Euclidean distance deviation value, importing the first depth feature vector and the second depth feature vector into the preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises:
In S801, the first depth feature vectors and the second depth feature vectors of a plurality of the training objects are imported into a preset second modal difference loss function to calculate a second modal loss of the residual compensation model; the second modal difference loss function is specifically:
$L_{MD2} = \frac{1}{N} \sum_{i=1}^{N} \bigl\| v_i^{(1)} - v_i^{(2)} \bigr\|_2$
where $L_{MD2}$ is the modal loss, $N$ is the number of the training objects, and $\|\cdot\|_2$ is the Euclidean distance function.
In this embodiment, if diff(*,*) is the Euclidean distance function, the deviation value computed between the first and second depth feature vectors is a Euclidean distance deviation value. When the overall loss of multiple training objects is subsequently computed, the Euclidean distance between the two depth feature vectors can be computed through the Euclidean distance function, the distance value taken as the loss component of a training object, and the loss components of the N training objects summed with weights, yielding the second modal loss of the residual compensation function.
In S802, the second modal loss is imported into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
$L = L_{softmax} + \lambda L_{MD2}$
where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
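The Euclidean variant differs only in the deviation function; a sketch under the same batching assumption:

```python
def modal_difference_loss_euclidean(v1, v2):
    # L_MD2 = (1/N) * sum_i ||v1_i - v2_i||_2, averaged over the batch
    return (v1 - v2).norm(p=2, dim=1).mean()
```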
In the embodiments of the present invention, the residual compensation network is trained under the joint supervision of the modal loss function and the cross-entropy loss function. During training, the back-propagation algorithm can be used to update the learnable parameters in the residual compensation model. After the trained residual compensation model is obtained, the different branches of the residual compensation network can be used to extract the depth feature vectors of face images of the corresponding modalities, and at test time the depth feature vectors can be used to compute the similarity of two face images, thereby determining the identity of the person in a face image.
FIG. 9 shows the specific implementation flowchart of S101 of a method for generating a face recognition model according to the fifth embodiment of the present invention. Referring to FIG. 9, relative to the embodiments described in FIGS. 1-8, S101 of the method provided in this embodiment comprises S1011-S1015, detailed as follows:
Further, acquiring face images of a training object in each preset modality comprises:
In S1011, object images of the training object in each preset modality are acquired, and face feature points in the object images are determined through a face detection algorithm.
In this embodiment, because the images captured by the terminal device do not contain only the training object's face information, the terminal device can pre-process the object images of the training object to improve recognition accuracy and thereby improve the accuracy of the subsequent adjustment learning of the residual compensation model. On this basis, after acquiring the object images of the training object, the terminal device can identify multiple face feature points of the training object through a face detection algorithm and mark each face feature point in the object image. The face feature points may be facial organs, such as the eyes, ears, nose, mouth, and eyebrows.
In S1012, the face region of the training object is extracted from the object images based on the face feature points; the face region includes a first face region of the main modality and a second face region of the sub-modality.
In this embodiment, after identifying the face feature points of each modality, the terminal device can determine the position of the training object's face based on the coordinate information of the feature points, and extract from the training image the image of the region where the face is located, i.e., the above face region. Performing these operations on the training images of the different modalities generates the first face region of the main modality and the second face region of the sub-modality.
In S1013, a standardized transformation is performed on the second face region based on the first coordinate information of each face feature point in the first face region and the region size of the first face region, so that the second coordinate information of each face feature point in the second face region matches the first coordinate information.
In this embodiment, after acquiring the face regions the terminal device also needs to pre-process them to facilitate outputting the depth feature vectors. On this basis, the terminal device can adjust the size of the second face region according to the region size of the first face region of the main modality, and perform a similarity or affine transformation on the second face region according to the coordinate information of all face feature points in the first face region, so that the face feature points of the different modalities are aligned — that is, feature points of the same type have the same coordinates across modalities — obtaining face images of uniform size and identical face pose for the different modalities.
Optionally, in this embodiment, the terminal device is provided with a standard face template configured with a standard template size and standard face feature points; the terminal device can adjust the first and second face regions according to this template, aligning the face feature points of both regions with those of the standard face template.
Optionally, in this embodiment, if the main modality and the sub-modality contain different numbers of channels — for example, the main modality is a tri-chromatic image, i.e., a color image with three RGB channels, while the sub-modality is a monochrome image — the terminal device can expand the monochrome sub-modality image to three channels, or gray-scale the color main-modality image, thereby ensuring that the main and sub-modalities contain the same number of channels.
In S1014, the pixel value of each pixel in the first face region is normalized, and the normalized first face region is recognized as the first face image.
In this embodiment, the terminal device can obtain the pixel value of each pixel in the first face region and normalize it. The pixel value may be divided by 255, the maximum pixel value, ensuring that each pixel value in the face region lies between 0 and 1. Alternatively, the terminal device may first subtract 127.5 (half of the maximum pixel value) from the pixel value and then divide the difference by 128, so that the normalized pixel values lie within the range [-1, 1], and recognize the normalized face region as the first face image.
In S1015, the pixel value of each pixel in the transformed second face region is normalized, and the normalized second face region is recognized as the second face image.
In this embodiment, the normalization operation is the same as the specific implementation of S1014; for details, see the related description of S1014, which is not repeated here.
In the embodiments of the present invention, face regions are extracted from the training images, and the face regions of different modalities undergo uniform transformation, face feature point alignment, and normalization, which improves the consistency of the subsequent depth feature vectors and the training accuracy of the residual compensation model.
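A sketch of the alignment and normalization steps S1013-S1015 using OpenCV; landmark detection itself (S1011) is not shown, and the function name and arguments are illustrative assumptions:

```python
import cv2
import numpy as np

def align_and_normalize(second_region, second_pts, first_pts, first_size):
    """Similarity-transform the sub-modality face region so its landmarks
    match the main-modality landmark coordinates (S1013), then map pixel
    values into roughly [-1, 1] with (p - 127.5) / 128 as in S1014/S1015.
    `first_size` is an assumed (width, height) tuple of the main-modality region."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(second_pts), np.float32(first_pts))
    aligned = cv2.warpAffine(second_region, M, first_size)
    return (aligned.astype(np.float32) - 127.5) / 128.0
```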
FIG. 10 shows the specific implementation flowchart of a method for generating a face recognition model according to the sixth embodiment of the present invention. Referring to FIG. 10, relative to the embodiments described in FIGS. 1-8, the method provided in this embodiment further comprises, after generating the face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network: S1001-S1004, detailed as follows:
In S1001, a target image of an object to be recognized is acquired, and the modality type of the target image is determined.
In this embodiment, after generating the face recognition model, the terminal device can perform multi-modal face recognition and determine the object attributes corresponding to different face images. The user can send an object image to be recognized to the terminal device, which extracts the target image of the object to be recognized from the object image. The target image can be extracted in the manner of the embodiment of FIG. 9, which is not repeated here.
In this embodiment, after acquiring the target image, the terminal device needs to determine its modality type, i.e., whether the target image is a face image generated under the imaging principle of the main modality or of a sub-modality. If the target image is a face image generated in the main modality, the target feature vector of the target object is output through the first convolutional neural network and matched with the standard feature vectors in the object library, thereby determining the object attributes of the object to be recognized.
In S1002, if the modality type is the sub-modality, the target feature vector of the target image is calculated through the second convolutional neural network and the adjusted residual compensation model.
In this embodiment, if the target image is a face image generated under the imaging principle of a sub-modality, the target feature vector of the target image can be output through the residual compensation model corresponding to that sub-modality and the second convolutional neural network. Because the residual compensation network performs parameter compensation, the target feature vector is approximately equivalent to a target feature vector in the main modality and can therefore be matched with the standard feature vectors generated in the main modality.
In S1003, the matching degree between the target feature vector and each standard feature vector in the object library is calculated.
In this embodiment, the terminal device can compute the matching degree between the target feature vector of the object to be recognized and the standard feature vector of each enrolled object in the object library. Preferably, the distance between the target feature vector and each standard feature vector can be computed with a nearest-neighbor algorithm, and the reciprocal of the distance taken as the matching degree between the two.
In S1004, the enrolled object corresponding to the standard feature vector with the highest matching degree is taken as the matching object of the object to be recognized.
In this embodiment, the terminal device takes the enrolled object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized, achieving the purpose of recognizing sub-modality face images. Note that the standard feature vectors of the enrolled objects in the object library are feature vectors generated in the main modality.
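A sketch of the matching step, assuming the object library is a dictionary from enrolled IDs to main-modality standard feature vectors and using the reciprocal of the Euclidean distance as the matching degree described above:

```python
import numpy as np

def match_identity(target_vec, gallery):
    """S1003-S1004 sketch: score each enrolled standard feature vector by
    the reciprocal of its distance to the target vector; the `gallery`
    dict layout is an assumption for illustration."""
    best_id, best_score = None, 0.0
    for obj_id, std_vec in gallery.items():
        score = 1.0 / (np.linalg.norm(target_vec - std_vec) + 1e-12)
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id, best_score
```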
In the embodiments of the present invention, performing face recognition on face images through a multi-modal face recognition model containing a residual compensation network improves recognition accuracy.
FIG. 11 shows the specific implementation flowchart of a method for generating a face recognition model according to the seventh embodiment of the present invention. Referring to FIG. 11, relative to the embodiments described in FIGS. 1-8, the method provided in this embodiment further comprises, after generating the face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network: S1101-S1104, detailed as follows:
In S1101, a first image of a first object and a second image of a second object are acquired; the modality type of the first image is the main modality type, and the modality type of the second image is a sub-modality type.
In this embodiment, the terminal device can be used to detect whether two objects belong to the same physical user. It can therefore acquire a first image of a first object to be matched and a second image of another, second object to be matched. The second image may of course include several images, and different second images may correspond to different modality types or to the same modality type, which is not limited here.
In S1102, a first target vector of the first image is extracted through the first convolutional neural network.
In this embodiment, the terminal device can compute the first depth feature vector of the first object, i.e., the above first target vector, through the first convolutional neural network.
In S1103, a second target vector of the second image is extracted through the second convolutional neural network and the adjusted residual compensation model.
In this embodiment, the terminal device can determine the second depth feature vector of the second image, i.e., the above second target vector, through the second convolutional neural network and the adjusted residual compensation model.
In S1104, a deviation value between the first target vector and the second target vector is calculated; if the deviation value is smaller than a preset deviation threshold, the first object and the second object are recognized as belonging to the same entity object.
In this embodiment, the terminal device can calculate the deviation value between the first and second target vectors — for example, through a cosine distance function or a Euclidean distance function — to measure the degree of difference between the two vectors, i.e., the above deviation value. If the deviation value is smaller than the preset deviation threshold, the two objects are recognized as the same entity object; conversely, if the deviation value is greater than or equal to the preset deviation threshold, the two objects belong to two different entity objects.
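A sketch of the verification decision in S1104, using cosine distance as the deviation function; the threshold value is illustrative only:

```python
import numpy as np

def same_entity(v1, v2, threshold=0.4):
    """Cosine-distance deviation between the two target vectors; below the
    preset threshold means the same physical person. threshold = 0.4 is an
    assumed value, not one stated in the text."""
    cos = float(np.dot(v1, v2) /
                (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))
    return (1.0 - cos) < threshold
```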
In the embodiments of the present invention, images of two modalities can be imported into the face recognition network, the depth feature vectors of the two modalities computed, and whether the two face images belong to the same entity object determined from the deviation value between the two vectors, achieving the purpose of classifying and recognizing entity objects.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
FIG. 12 shows a structural block diagram of a device for generating a face recognition model provided by an embodiment of the present invention; the units of the device execute the steps of the embodiment corresponding to FIG. 1. For details, refer to FIG. 1 and the related description of its embodiment. For ease of explanation, only the parts relevant to this embodiment are shown.
Referring to FIG. 12, the device for generating a face recognition model comprises:
a face image acquiring unit 121, configured to acquire face images of a training object in each preset modality, the face images comprising a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality;
a first depth feature vector acquiring unit 122, configured to extract a first depth feature vector of the first face image through a preset first convolutional neural network;
a second depth feature vector acquiring unit 123, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality;
a residual compensation model adjusting unit 124, configured to adjust the residual compensation model based on the first and second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first and second depth feature vectors is smaller than a preset difference threshold;
a face recognition model generating unit 125, configured to generate a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
Optionally, the residual compensation model adjusting unit 124 comprises:
a compensation deviation value calculation unit, configured to import the first and second depth feature vectors into a preset difference degree calculation model to determine a deviation value of the to-be-adjusted residual compensation model;
a compensation loss value calculation unit, configured to import the first and second depth feature vectors into a preset multi-modal loss function calculation model to determine a loss value of the residual compensation model;
a model convergence adjusting unit, configured to adjust the residual compensation model based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition; the convergence condition is:
$\hat{\tau} = \arg\min_{\tau} \; \mathrm{diff}\bigl(v_i^{(1)}, v_i^{(2)}\bigr)$
where $\tau$ is the learnable parameter of the residual compensation function, $v_i^{(1)}$ is the first depth feature vector, $v_i^{(2)}$ is the second depth feature vector, diff(*,*) is the vector deviation function, and $\hat{\tau}$ is the value of $\tau$ at which the vector deviation function attains its minimum.
Optionally, if the deviation value is a cosine deviation value, the compensation loss value calculation unit comprises:
a first modal loss calculation unit, configured to import the first and second depth feature vectors of a plurality of the training objects into a preset first modal difference loss function to calculate a first modal loss of the residual compensation model; the first modal difference loss function is specifically:
$L_{MD1} = \frac{1}{N} \sum_{i=1}^{N} \bigl(1 - \cos(v_i^{(1)}, v_i^{(2)})\bigr)$
where $L_{MD1}$ is the modal loss, $N$ is the number of the training objects, and $\cos(\cdot,\cdot)$ is the cosine similarity function;
a first loss value output unit, configured to import the first modal loss into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
$L = L_{softmax} + \lambda L_{MD1}$
where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
Optionally, if the deviation value is a Euclidean distance deviation value, the compensation loss value calculation unit comprises:
a second modal loss calculation unit, configured to import the first and second depth feature vectors of a plurality of the training objects into a preset second modal difference loss function to calculate a second modal loss of the residual compensation model; the second modal difference loss function is specifically:
$L_{MD2} = \frac{1}{N} \sum_{i=1}^{N} \bigl\| v_i^{(1)} - v_i^{(2)} \bigr\|_2$
where $L_{MD2}$ is the modal loss, $N$ is the number of the training objects, and $\|\cdot\|_2$ is the Euclidean distance function;
a second loss value output unit, configured to import the second modal loss into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
$L = L_{softmax} + \lambda L_{MD2}$
where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
Optionally, the face image acquiring unit 121 comprises:
a face feature point recognition unit, configured to acquire object images of the training object in each preset modality and determine the face feature points in the object images through a face detection algorithm;
a face region extraction unit, configured to extract the face region of the training object from the object images based on the face feature points, the face region including a first face region of the main modality and a second face region of the sub-modality;
a face feature point adjusting unit, configured to perform a standardized transformation on the second face region based on the first coordinate information of each face feature point in the first face region and the region size of the first face region, so that the second coordinate information of each face feature point in the second face region matches the first coordinate information;
a first normalization processing unit, configured to normalize the pixel value of each pixel in the first face region and recognize the normalized first face region as the first face image;
a second normalization processing unit, configured to normalize the pixel value of each pixel in the transformed second face region and recognize the normalized second face region as the second face image.
Optionally, the device for generating a face recognition model further comprises:
a modality type recognition unit, configured to acquire a target image of an object to be recognized and determine the modality type of the target image;
a target feature vector output unit, configured to calculate, if the modality type is the sub-modality, the target feature vector of the target image through the second convolutional neural network and the adjusted residual compensation model;
a face matching degree calculation unit, configured to calculate the matching degree between the target feature vector and each standard feature vector in the object library;
a face recognition unit, configured to take the enrolled object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized.
Optionally, the device for generating a face recognition model further comprises:
a multi-object image acquiring unit, configured to acquire a first image of a first object and a second image of a second object, the modality type of the first image being the main modality type and the modality type of the second image being a sub-modality type;
a first target vector calculation unit, configured to extract a first target vector of the first image through the first convolutional neural network;
a second target vector calculation unit, configured to extract a second target vector of the second image through the second convolutional neural network and the adjusted residual compensation model;
a same entity object recognition unit, configured to calculate a deviation value between the first target vector and the second target vector and, if the deviation value is smaller than a preset deviation threshold, recognize that the first object and the second object belong to the same entity object.
Therefore, the device for generating a face recognition model provided by the embodiments of the present invention likewise does not rely on manual feature descriptions of face information by users, and can generate the face recognition model from input face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention. As shown in FIG. 13, the terminal device 13 of this embodiment comprises a processor 130, a memory 131, and a computer program 132 stored in the memory 131 and executable on the processor 130, such as a program for generating a face recognition model. When the processor 130 executes the computer program 132, the steps in the above embodiments of the method for generating a face recognition model are implemented, such as S101 to S105 shown in FIG. 1; alternatively, when the processor 130 executes the computer program 132, the functions of the units in the above device embodiments are realized, such as the functions of units 121 to 125 shown in FIG. 12.
Exemplarily, the computer program 132 may be divided into one or more units, which are stored in the memory 131 and executed by the processor 130 to implement the present invention. The one or more units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 132 in the terminal device 13. For example, the computer program 132 may be divided into a face image acquiring unit, a first depth feature vector acquiring unit, a second depth feature vector acquiring unit, a residual compensation model adjusting unit, and a face recognition model generating unit, with the specific functions of each unit as described above.
The terminal device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 130 and the memory 131. Those skilled in the art will understand that FIG. 13 is only an example of the terminal device 13 and does not constitute a limitation on it; it may include more or fewer components than shown, a combination of certain components, or different components. For example, the terminal device may also include input and output devices, network access devices, buses, and so on.
The so-called processor 130 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 131 may be an internal storage unit of the terminal device 13, such as a hard disk or memory of the terminal device 13. The memory 131 may also be an external storage device of the terminal device 13, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 13. Further, the memory 131 may include both an internal storage unit and an external storage device of the terminal device 13. The memory 131 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been or will be output.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (11)

  1. A method for generating a face recognition model, characterized by comprising:
    acquiring face images of a training object in each preset modality, the face images comprising a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality;
    extracting a first depth feature vector of the first face image through a preset first convolutional neural network;
    extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality;
    adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is smaller than a preset difference threshold;
    generating a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  2. The generation method according to claim 1, characterized in that adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects comprises:
    importing the first depth feature vector and the second depth feature vector into a preset difference degree calculation model to determine a deviation value of the to-be-adjusted residual compensation model;
    importing the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine a loss value of the residual compensation model;
    adjusting the residual compensation model based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition; the convergence condition is:
    $\hat{\tau} = \arg\min_{\tau} \; \mathrm{diff}\bigl(v_i^{(1)}, v_i^{(2)}\bigr)$
    where $\tau$ is the learnable parameter of the residual compensation function, $v_i^{(1)}$ is the first depth feature vector, $v_i^{(2)}$ is the second depth feature vector, diff(*,*) is the vector deviation function, and $\hat{\tau}$ is the value of $\tau$ at which the vector deviation function attains its minimum or a local minimum.
  3. The generation method according to claim 2, characterized in that, if the deviation value is a cosine deviation value, importing the first depth feature vector and the second depth feature vector into the preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises:
    importing the first depth feature vectors and the second depth feature vectors of a plurality of the training objects into a preset first modal difference loss function to calculate a first modal loss of the residual compensation model; the first modal difference loss function is specifically:
    $L_{MD1} = \frac{1}{N} \sum_{i=1}^{N} \bigl(1 - \cos(v_i^{(1)}, v_i^{(2)})\bigr)$
    where $L_{MD1}$ is the modal loss, $N$ is the number of the training objects, and $\cos(\cdot,\cdot)$ is the cosine similarity function;
    importing the first modal loss into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
    $L = L_{softmax} + \lambda L_{MD1}$
    where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
  4. The generation method according to claim 2, characterized in that, if the deviation value is a Euclidean distance deviation value, importing the first depth feature vector and the second depth feature vector into the preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises:
    importing the first depth feature vectors and the second depth feature vectors of a plurality of the training objects into a preset second modal difference loss function to calculate a second modal loss of the residual compensation model; the second modal difference loss function is specifically:
    $L_{MD2} = \frac{1}{N} \sum_{i=1}^{N} \bigl\| v_i^{(1)} - v_i^{(2)} \bigr\|_2$
    where $L_{MD2}$ is the modal loss, $N$ is the number of the training objects, and $\|\cdot\|_2$ is the Euclidean distance function;
    importing the second modal loss into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
    $L = L_{softmax} + \lambda L_{MD2}$
    where $L$ is the loss value, $L_{softmax}$ is the cross-entropy loss function for face classification, and $\lambda$ is a hyperparameter weighting the cross-entropy loss function and the modal difference loss function.
  5. The generation method according to any one of claims 1-4, characterized in that acquiring face images of a training object in each preset modality comprises:
    acquiring object images of the training object in each preset modality, and determining face feature points in the object images through a face detection algorithm;
    extracting the face region of the training object from the object images based on the face feature points, the face region including a first face region of the main modality and a second face region of the sub-modality;
    performing a standardized transformation on the second face region based on the first coordinate information of each face feature point in the first face region and the region size of the first face region, so that the second coordinate information of each face feature point in the second face region matches the first coordinate information;
    normalizing the pixel value of each pixel in the first face region, and recognizing the normalized first face region as the first face image;
    normalizing the pixel value of each pixel in the transformed second face region, and recognizing the normalized second face region as the second face image.
  6. The generation method according to any one of claims 1-4, characterized by further comprising, after generating the face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network:
    acquiring a target image of an object to be recognized, and determining the modality type of the target image;
    if the modality type is the sub-modality, calculating a target feature vector of the target image through the second convolutional neural network and the adjusted residual compensation model;
    calculating the matching degree between the target feature vector and each standard feature vector in an object library;
    taking the enrolled object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized.
  7. The generation method according to any one of claims 1-4, characterized by further comprising, after generating the face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network:
    acquiring a first image of a first object and a second image of a second object, the modality type of the first image being the main modality type and the modality type of the second image being a sub-modality type;
    extracting a first target vector of the first image through the first convolutional neural network;
    extracting a second target vector of the second image through the second convolutional neural network and the adjusted residual compensation model;
    calculating a deviation value between the first target vector and the second target vector, and, if the deviation value is smaller than a preset deviation threshold, recognizing that the first object and the second object belong to the same entity object.
  8. A device for generating a face recognition model, characterized by comprising:
    a face image acquiring unit, configured to acquire face images of a training object in each preset modality, the face images comprising a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality;
    a first depth feature vector acquiring unit, configured to extract a first depth feature vector of the first face image through a preset first convolutional neural network;
    a second depth feature vector acquiring unit, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a to-be-adjusted residual compensation model for the sub-modality;
    a residual compensation model adjusting unit, configured to adjust the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is smaller than a preset difference threshold;
    a face recognition model generating unit, configured to generate a face recognition model from the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
  9. The generation device according to claim 8, characterized in that the residual compensation model adjusting unit comprises:
    a compensation deviation value calculation unit, configured to import the first and second depth feature vectors into a preset difference degree calculation model to determine a deviation value of the to-be-adjusted residual compensation model;
    a compensation loss value calculation unit, configured to import the first and second depth feature vectors into a preset multi-modal loss function calculation model to determine a loss value of the residual compensation model;
    a model convergence adjusting unit, configured to adjust the residual compensation model based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition; the convergence condition is:
    $\hat{\tau} = \arg\min_{\tau} \; \mathrm{diff}\bigl(v_i^{(1)}, v_i^{(2)}\bigr)$
    where $\tau$ is the learnable parameter of the residual compensation function, $v_i^{(1)}$ is the first depth feature vector, $v_i^{(2)}$ is the second depth feature vector, diff(*,*) is the vector deviation function, and $\hat{\tau}$ is the value of $\tau$ at which the vector deviation function attains its minimum or a local minimum.
  10. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the computer program.
  11. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2019/130815 2019-03-18 2019-12-31 一种人脸识别模型的生成方法及设备 WO2020186886A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910202253.XA CN110046551B (zh) 2019-03-18 2019-03-18 一种人脸识别模型的生成方法及设备
CN201910202253.X 2019-03-18

Publications (1)

Publication Number Publication Date
WO2020186886A1 true WO2020186886A1 (zh) 2020-09-24

Family

ID=67274935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130815 WO2020186886A1 (zh) 2019-03-18 2019-12-31 一种人脸识别模型的生成方法及设备

Country Status (2)

Country Link
CN (1) CN110046551B (zh)
WO (1) WO2020186886A1 (zh)


Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046551B (zh) * 2019-03-18 2021-04-20 中国科学院深圳先进技术研究院 一种人脸识别模型的生成方法及设备
CN110633698A (zh) * 2019-09-30 2019-12-31 上海依图网络科技有限公司 基于循环生成对抗网络的红外图片识别方法、设备及介质
CN110895809B (zh) * 2019-10-18 2022-07-15 中国科学技术大学 准确提取髋关节影像中关键点的方法
CN110738654B (zh) * 2019-10-18 2022-07-15 中国科学技术大学 髋关节影像中的关键点提取及骨龄预测方法
CN110781856B (zh) * 2019-11-04 2023-12-19 浙江大华技术股份有限公司 异质人脸识别模型训练方法、人脸识别方法及相关装置
CN111027382B (zh) * 2019-11-06 2023-06-23 华中师范大学 一种基于注意力机制的轻量级人脸检测的方法及模型
CN110991281B (zh) * 2019-11-21 2022-11-04 电子科技大学 一种动态人脸识别方法
CN111046759A (zh) * 2019-11-28 2020-04-21 深圳市华尊科技股份有限公司 人脸识别方法及相关装置
CN111080626B (zh) * 2019-12-19 2024-06-18 联想(北京)有限公司 一种检测方法和电子设备
CN111160350B (zh) * 2019-12-23 2023-05-16 Oppo广东移动通信有限公司 人像分割方法、模型训练方法、装置、介质及电子设备
CN111104987B (zh) * 2019-12-25 2023-08-01 盛景智能科技(嘉兴)有限公司 人脸识别方法、装置及电子设备
CN111368644B (zh) * 2020-02-14 2024-01-05 深圳市商汤科技有限公司 图像处理方法、装置、电子设备及存储介质
CN111461959B (zh) * 2020-02-17 2023-04-25 浙江大学 人脸情绪合成方法及装置
CN111488972B (zh) * 2020-04-09 2023-08-08 北京百度网讯科技有限公司 数据迁移方法、装置、电子设备和存储介质
CN111539287B (zh) * 2020-04-16 2023-04-07 北京百度网讯科技有限公司 训练人脸图像生成模型的方法和装置
CN111523663B (zh) * 2020-04-22 2023-06-23 北京百度网讯科技有限公司 一种目标神经网络模型训练方法、装置以及电子设备
CN111506761B (zh) * 2020-04-22 2021-05-14 上海极链网络科技有限公司 一种相似图片查询方法、装置、系统及存储介质
CN112084946B (zh) * 2020-05-09 2022-08-05 支付宝(杭州)信息技术有限公司 一种人脸识别方法、装置及电子设备
CN111753753A (zh) * 2020-06-28 2020-10-09 北京市商汤科技开发有限公司 图像识别方法及装置、电子设备和存储介质
CN111862030B (zh) * 2020-07-15 2024-02-09 北京百度网讯科技有限公司 一种人脸合成图检测方法、装置、电子设备及存储介质
CN111860364A (zh) * 2020-07-24 2020-10-30 携程计算机技术(上海)有限公司 人脸识别模型的训练方法、装置、电子设备和存储介质
CN114092848A (zh) * 2020-07-31 2022-02-25 阿里巴巴集团控股有限公司 对象确定和机器模型的处理方法、装置、设备和存储介质
CN112439201B (zh) * 2020-12-07 2022-05-27 中国科学院深圳先进技术研究院 一种基于次模最大化的动态场景生成方法、终端以及存储介质
CN112949855B (zh) * 2021-02-26 2023-08-25 平安科技(深圳)有限公司 人脸识别模型训练方法、识别方法、装置、设备及介质
CN113191940A (zh) * 2021-05-12 2021-07-30 广州虎牙科技有限公司 图像处理方法、装置、设备及介质
CN113205058A (zh) * 2021-05-18 2021-08-03 中国科学院计算技术研究所厦门数据智能研究院 一种防止非活体攻击的人脸识别方法
CN113240115B (zh) * 2021-06-08 2023-06-06 深圳数联天下智能科技有限公司 一种生成人脸变化图像模型的训练方法及相关装置
CN113449623B (zh) * 2021-06-21 2022-06-28 浙江康旭科技有限公司 一种基于深度学习的轻型活体检测方法
CN113449848A (zh) * 2021-06-28 2021-09-28 中国工商银行股份有限公司 卷积神经网络的训练方法、人脸识别方法及装置
CN113705506B (zh) * 2021-09-02 2024-02-13 中国联合网络通信集团有限公司 核酸检测方法、装置、设备和计算机可读存储介质
CN113989908A (zh) * 2021-11-29 2022-01-28 北京百度网讯科技有限公司 鉴别人脸图像的方法、装置、电子设备及存储介质
CN115797560B (zh) * 2022-11-28 2023-07-25 广州市碳码科技有限责任公司 一种基于近红外光谱成像的头部模型构建方法及系统
CN116343301B (zh) * 2023-03-27 2024-03-08 滨州市沾化区退役军人服务中心 基于人脸识别的人员信息智能校验系统


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778441A (zh) * 2015-01-07 2015-07-15 深圳市唯特视科技有限公司 融合灰度信息和深度信息的多模态人脸识别装置及方法
CN106909905B (zh) * 2017-03-02 2020-02-14 中科视拓(北京)科技有限公司 一种基于深度学习的多模态人脸识别方法
CN107463919A (zh) * 2017-08-18 2017-12-12 深圳市唯特视科技有限公司 一种基于深度3d卷积神经网络进行面部表情识别的方法
US11182597B2 (en) * 2018-01-19 2021-11-23 Board Of Regents, The University Of Texas Systems Systems and methods for evaluating individual, group, and crowd emotion engagement and attention
CN108509843B (zh) * 2018-02-06 2022-01-28 重庆邮电大学 一种基于加权的Huber约束稀疏编码的人脸识别方法
CN109472240B (zh) * 2018-11-12 2020-02-28 北京影谱科技股份有限公司 人脸识别多模型自适应特征融合增强方法和装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137396A1 (en) * 2014-08-29 2018-05-17 Google Llc Processing images using deep neural networks
CN107871105A (zh) * 2016-09-26 2018-04-03 北京眼神科技有限公司 一种人脸认证方法和装置
WO2019009449A1 (ko) * 2017-07-06 2019-01-10 삼성전자 주식회사 영상을 부호화/복호화 하는 방법 및 그 장치
CN108573243A (zh) * 2018-04-27 2018-09-25 上海敏识网络科技有限公司 一种基于深度卷积神经网络的低质量人脸的比对方法
CN108985236A (zh) * 2018-07-20 2018-12-11 南京开为网络科技有限公司 一种基于深度化可分离卷积模型的人脸识别方法
CN109117817A (zh) * 2018-08-28 2019-01-01 摩佰尔(天津)大数据科技有限公司 人脸识别的方法及装置
CN110046551A (zh) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 一种人脸识别模型的生成方法及设备

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016523A (zh) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 跨模态人脸识别的方法、装置、设备和存储介质
CN112101552A (zh) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 用于训练模型的方法、装置、设备以及存储介质
CN112016523B (zh) * 2020-09-25 2023-08-29 北京百度网讯科技有限公司 跨模态人脸识别的方法、装置、设备和存储介质
CN112085540A (zh) * 2020-09-27 2020-12-15 湖北科技学院 基于人工智能技术的广告智能推送系统及方法
CN112215136B (zh) * 2020-10-10 2023-09-05 北京奇艺世纪科技有限公司 一种目标人物识别方法、装置、电子设备及存储介质
CN112215136A (zh) * 2020-10-10 2021-01-12 北京奇艺世纪科技有限公司 一种目标人物识别方法、装置、电子设备及存储介质
CN112232236A (zh) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 行人流量的监测方法、系统、计算机设备和存储介质
CN112232236B (zh) * 2020-10-20 2024-02-06 城云科技(中国)有限公司 行人流量的监测方法、系统、计算机设备和存储介质
CN112149634A (zh) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 图像生成器的训练方法、装置、设备以及存储介质
CN112149634B (zh) * 2020-10-23 2024-05-24 北京神州数码云科信息技术有限公司 图像生成器的训练方法、装置、设备以及存储介质
CN112183491A (zh) * 2020-11-04 2021-01-05 北京百度网讯科技有限公司 表情识别模型及训练方法、识别方法、装置和计算设备
CN112633203A (zh) * 2020-12-29 2021-04-09 上海商汤智能科技有限公司 关键点检测方法及装置、电子设备和存储介质
CN113487013A (zh) * 2021-06-29 2021-10-08 杭州中葳数字科技有限公司 一种基于注意力机制的排序分组卷积方法
CN113487013B (zh) * 2021-06-29 2024-05-07 杭州中葳数字科技有限公司 一种基于注意力机制的排序分组卷积方法
CN113674161A (zh) * 2021-07-01 2021-11-19 清华大学 一种基于深度学习的人脸残缺扫描补全方法、装置
CN113505740B (zh) * 2021-07-27 2023-10-10 北京工商大学 基于迁移学习和卷积神经网络的面部识别方法
CN113903053A (zh) * 2021-09-26 2022-01-07 厦门大学 基于统一中间模态的跨模态行人重识别方法
CN113903053B (zh) * 2021-09-26 2024-06-07 厦门大学 基于统一中间模态的跨模态行人重识别方法
CN114140350A (zh) * 2021-11-24 2022-03-04 四川大学锦江学院 一种应用于无人机中的量子图像修复方法及装置
CN114359034B (zh) * 2021-12-24 2023-08-08 北京航空航天大学 一种基于手绘的人脸图片生成方法及系统
CN114359034A (zh) * 2021-12-24 2022-04-15 北京航空航天大学 一种基于手绘的人脸图片生成方法及系统
CN114756425A (zh) * 2022-03-08 2022-07-15 深圳集智数字科技有限公司 智能监控方法、装置、电子设备及计算机可读存储介质
CN114863542B (zh) * 2022-07-06 2022-09-30 武汉微派网络科技有限公司 基于多模态的未成年人识别方法及系统
CN114863542A (zh) * 2022-07-06 2022-08-05 武汉微派网络科技有限公司 基于多模态的未成年人识别方法及系统
CN118230396A (zh) * 2024-05-22 2024-06-21 苏州元脑智能科技有限公司 人脸识别及其模型训练方法、装置、设备、介质及产品

Also Published As

Publication number Publication date
CN110046551A (zh) 2019-07-23
CN110046551B (zh) 2021-04-20

Similar Documents

Publication Publication Date Title
WO2020186886A1 (zh) 一种人脸识别模型的生成方法及设备
Ullah et al. A Real‐Time Framework for Human Face Detection and Recognition in CCTV Images
WO2021218060A1 (zh) 基于深度学习的人脸识别方法及装置
WO2020228525A1 (zh) 地点识别及其模型训练的方法和装置以及电子设备
Tao et al. Manifold ranking-based matrix factorization for saliency detection
WO2021135509A1 (zh) 图像处理方法、装置、电子设备及存储介质
WO2021143101A1 (zh) 人脸识别方法和人脸识别装置
WO2021159769A1 (zh) 图像检索方法、装置、存储介质及设备
JP6411510B2 (ja) 無制約の媒体内の顔を識別するシステムおよび方法
Gao et al. 3-D object retrieval and recognition with hypergraph analysis
Wang et al. Background-driven salient object detection
CN112232184B (zh) 一种基于深度学习和空间转换网络的多角度人脸识别方法
CN111091075A (zh) 人脸识别方法、装置、电子设备及存储介质
CN112052831A (zh) 人脸检测的方法、装置和计算机存储介质
CN110569724B (zh) 一种基于残差沙漏网络的人脸对齐方法
CN113298158B (zh) 数据检测方法、装置、设备及存储介质
CN111091129B (zh) 一种基于多重颜色特征流形排序的图像显著区域提取方法
Wang et al. Robust head pose estimation via supervised manifold learning
Diaz-Chito et al. Continuous head pose estimation using manifold subspace embedding and multivariate regression
CN109948420A (zh) 人脸比对方法、装置及终端设备
Wan et al. Palmprint recognition system for mobile device based on circle loss
Deng et al. Self-feedback image retrieval algorithm based on annular color moments
Yuan et al. Explore double-opponency and skin color for saliency detection
Guo et al. Automatic face recognition of target images based on deep learning algorithms
Ye et al. Fast single sample face recognition based on sparse representation classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919783

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19919783

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022)
