WO2020186886A1 - Method and device for generating a face recognition model - Google Patents
Method and device for generating a face recognition model
- Publication number
- WO2020186886A1 (PCT application PCT/CN2019/130815; priority CN2019130815W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature vector
- face
- modal
- depth feature
- residual compensation
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Definitions
- The invention belongs to the technical field of image processing, and particularly relates to a method and device for generating a face recognition model.
- Multi-modal face recognition has broad application prospects in security monitoring and public security law enforcement. For example, in dark scenes at night, ordinary surveillance cameras often struggle to image well, which limits their usefulness at night; since near-infrared cameras image better at night, they can make up for this shortcoming of surveillance cameras based on the principle of visible-light imaging. For another example, when pursuing a suspect, the public security department can generate a synthetic face photo of the suspect through image synthesis based on eyewitness descriptions. When issuing ID cards, however, the public security department uses ordinary cameras to collect citizens' face images under visible-light conditions; that is, it only records face images under visible light. Therefore, face recognition based on synthesized face images or on face images collected under various detection lights, that is, multi-modal face recognition technology, is becoming more and more important.
- The existing multi-modal face recognition technology generally relies on artificially designed features. Such methods are limited by the expressive ability of the artificial features: artificial features cannot exhaust all face descriptions, and when the description is inaccurate it directly affects recognition accuracy. It can be seen that the accuracy of multi-modal face recognition based on artificially designed features is low, and the labor cost is high.
- The embodiments of the present invention provide a method and device for generating a face recognition model, to solve the problem that the existing multi-modal face recognition technology, which mainly performs multi-modal face recognition based on artificially designed features, suffers from low face recognition accuracy and high labor cost.
- The first aspect of the embodiments of the present invention provides a method for generating a face recognition model, including:
- acquiring face images corresponding to a training object in each preset modality; the face images include a first face image corresponding to a primary modality and a second face image corresponding to at least one secondary modality;
- extracting a first depth feature vector of the first face image through a preset first convolutional neural network;
- extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted regarding the secondary modality;
- adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
- generating a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- a second aspect of the embodiments of the present invention provides a device for generating a face recognition model, including:
- a face image acquisition unit, used to acquire face images corresponding to the training object in each preset modality; the face images include a first face image corresponding to the primary modality and a second face image corresponding to at least one secondary modality;
- a first depth feature vector acquiring unit, configured to extract the first depth feature vector of the first face image through a preset first convolutional neural network;
- a second depth feature vector acquiring unit, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted regarding the secondary modality;
- a residual compensation model adjustment unit, configured to adjust the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to the plurality of training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
- a face recognition model generation unit, configured to generate a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the first aspect are realized.
- a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium that stores a computer program that implements the steps of the first aspect when the computer program is executed by a processor.
- The embodiment of the present invention obtains face images of the training object in different modalities, extracts the second depth feature vector of the secondary modality through the residual compensation model to be adjusted and the convolutional neural network, and then, based on the first depth feature vector and the second depth feature vector, feeds back and adjusts the residual compensation model so that the difference between the first depth feature vector and the second depth feature vector is smaller than the preset difference threshold, that is, the recognition result converges.
- Since the face images of the main modality and the sub-modality belong to the same entity, and the depth feature vector represents the characteristics of each key point of the face, after the residual compensation module is adjusted the deviation between the depth feature vectors of the two modalities is small. Therefore, when the difference between the two depth feature vectors is less than the preset difference threshold, it can be determined that the residual compensation module has been adjusted, and the face recognition model is generated based on the residual compensation module.
- The present invention does not rely on the user's artificial feature description of the face information, and can generate a face recognition model simply by inputting the face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
- FIG. 1 is an implementation flowchart of a method for generating a face recognition model provided by the first embodiment of the present invention;
- FIG. 2 is a schematic structural diagram of a ten-layer residual network provided by an embodiment of the present invention;
- FIG. 3 is a schematic structural diagram of four multi-modal face recognition networks provided by an embodiment of the present invention;
- FIG. 4 is a schematic structural diagram of a second convolutional neural network after a residual compensation module is configured in a convolutional layer according to an embodiment of the present invention;
- FIG. 5 is a specific implementation flowchart of S104 of a method for generating a face recognition model provided by the second embodiment of the present invention;
- FIG. 6 is a specific implementation flowchart of S1042 of a method for generating a face recognition model provided by the third embodiment of the present invention;
- FIG. 7 is a network structure diagram of a face recognition model provided by an embodiment of the present invention;
- FIG. 8 is a specific implementation flowchart of S1042 of a method for generating a face recognition model provided by the fourth embodiment of the present invention;
- FIG. 9 is a specific implementation flowchart of S101 of a method for generating a face recognition model provided by the fifth embodiment of the present invention;
- FIG. 10 is a specific implementation flowchart of a method for generating a face recognition model provided by the sixth embodiment of the present invention;
- FIG. 11 is a specific implementation flowchart of a method for generating a face recognition model provided by the seventh embodiment of the present invention;
- FIG. 12 is a structural block diagram of a device for generating a face recognition model according to an embodiment of the present invention;
- FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention.
- The embodiment of the present invention obtains face images of the training object in different modalities, extracts the second depth feature vector of the secondary modality through the residual compensation model to be adjusted and the convolutional neural network, and then, based on the first depth feature vector and the second depth feature vector, feeds back and adjusts the residual compensation model so that the difference between the first depth feature vector and the second depth feature vector is smaller than the preset difference threshold, that is, the recognition result converges.
- Since the face images of the main modality and the sub-modality belong to the same entity, and the depth feature vector represents the characteristics of each key point of the face, after the residual compensation module is adjusted the deviation between the depth feature vectors of the two modalities is small; when the difference between the two depth feature vectors is less than the preset difference threshold, it can be determined that the residual compensation module has been adjusted, and the face recognition model is generated based on the residual compensation module. This solves the problem that the existing multi-modal face recognition technology, which mainly performs multi-modal face recognition based on artificially designed features, suffers from low accuracy and high labor cost.
- In the embodiments of the present invention, the execution subject of the process is a terminal device.
- The terminal device includes, but is not limited to: a server, a computer, a smart phone, a tablet computer, and other devices capable of performing the operation of generating a face recognition model.
- The terminal device is specifically a face recognition device that can determine the object attributes of a target object from an input face image.
- The terminal device has multiple input channels, and each input channel can be used to recognize face images in a preset modality, so as to realize multi-modal face recognition.
- Fig. 1 shows an implementation flow chart of the method for generating a face recognition model provided by the first embodiment of the present invention, which is detailed as follows:
- Face images corresponding to a training object in each preset modality are acquired; the face images include a first face image corresponding to the main modality and a second face image corresponding to at least one sub-modality.
- The terminal device can extract face images of the training object in different preset modalities from a database; since each face image corresponds to the same entity person, the face images of different modalities can be recognized as the same group of face images.
- face images of different modalities specifically refer to face images output by different imaging principles.
- Face images of different modalities include, but are not limited to: face images generated under visible light, face images generated under infrared light, face images generated based on the principle of thermal imaging, depth-of-field face images generated based on the principle of distance measurement, face images generated based on the principle of animation synthesis, and hand-drawn face images.
- the terminal device can select one of the modes as the main mode, and the other modes except the main mode are sub-modes.
- The terminal device can use the face image generated based on the principle of visible light as the face image corresponding to the main modality. Since the acquisition of face images under visible light is widely used, a large number of such images can be easily collected, and the algorithms for extracting depth feature vectors from face images under visible light are relatively mature, which can greatly improve the accuracy of the face recognition model. Multi-modal face recognition has two core difficulties: first, multi-modal data is inconvenient to collect, so usable data sets are scarce; second, there are large modal differences between images of different modalities. Around these two problems, the prior art uses feature expression methods that are robust to different modal information.
- One prior art proposes a shared component analysis method to learn the shared components between image pairs from different modalities, so that image pairs belonging to different modalities are projected into the same feature space, reducing the influence of modal differences.
- Another prior art proposes to obtain a common subspace of different modal pictures through a modal independent component analysis algorithm, and then obtain a modal independent dictionary in the common subspace through a self-learning strategy.
- a multi-modal face recognition method based on graphic representation is proposed. This method uses Markov network to model the compatibility relationship between adjacent image blocks, and uses the similarity measurement method expressed in pairs to measure the similarity between pictures and realize the comparison of faces.
- Yet another technology designs an objective function to pull together the feature vectors of paired pictures of different modalities while pushing apart the pictures of people with different identities, and implicitly increases the number of training samples by constructing triplet training samples, thereby reducing overfitting.
- In summary, the multi-modal face recognition technology based on artificially designed features is limited by the expressive ability of the artificial features, and its recognition accuracy is low; the existing technologies based on deep learning struggle to simultaneously reduce the overfitting of convolutional neural networks and reduce modal differences, are relatively complicated to design and inconvenient to use, and their recognition effect hardly meets the needs of practical applications.
- the present invention proposes a simple and effective multi-modal face recognition method based on a residual compensation network.
- the terminal device can realize the recognition of the multi-modal face image, which includes one main mode and at least one sub-mode.
- the terminal device can configure corresponding face image input channels for different modalities based on the number of modalities of the face recognition model.
- The terminal device can recognize the modal type of a face image and determine the corresponding input channel according to the modal type. If the detected face image is the first face image based on the main modality, the process jumps to S102; conversely, if the detected face image is the second face image based on a secondary modality, the process jumps to S103.
- When the terminal device can recognize face images of two or more sub-modalities, it can configure a corresponding second convolutional neural network and a corresponding residual compensation model for each sub-modality, so that each residual compensation model matches the image characteristics of its sub-modality, thereby improving the accuracy of recognition.
- the first depth feature vector of the first face image is extracted through a preset first convolutional neural network.
- The terminal device needs to perform feature extraction on the first face image, so it imports the first face image into the preset first convolutional neural network and outputs the first depth feature vector of the first face image.
- The first convolutional neural network may be a convolutional neural network constructed based on structures such as VGGNet, GoogLeNet, DenseNet, SENet, Xception, and LightCNN.
- the first convolutional neural network is specifically a ten-layer residual network.
- Fig. 2 shows a schematic structural diagram of a ten-layer residual network provided by this embodiment.
- The ten-layer residual network consists of 10 convolutional layers and a fully connected (FC) layer.
- The number of convolution channels increases gradually from 32 to 512; except for the first convolutional layer, whose stride is 2, all other convolutional layers have a stride of 1, and the 128-dimensional vector output by the fully connected layer is the feature of the face image in the main modality.
- “3*3 Conv” indicates the kernel size of a convolutional layer, and “2*2 max pool” indicates the kernel size of a pooling layer.
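For concreteness, a minimal PyTorch sketch of such a backbone follows. Only the constraints stated above are taken from the text (10 convolutional layers with 3*3 kernels, channels growing from 32 to 512, stride 2 in the first layer only, 2*2 max pooling, and a 128-dimensional fully connected output); the grouping of layers into residual pairs, the PReLU activations, and the assumed 128x128 input size are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class ResPair(nn.Module):
    """Two 3x3 stride-1 conv layers with an identity skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.act = nn.PReLU(ch)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class TenLayerResNet(nn.Module):
    """10 conv layers + 1 FC layer; channels 32 -> 512; stride 2 only in conv 1."""
    def __init__(self, in_ch=3, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.PReLU(32),  # conv 1
            nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.PReLU(64),     # conv 2
            nn.MaxPool2d(2),
            ResPair(64),                                                  # convs 3-4
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.PReLU(128),   # conv 5
            nn.MaxPool2d(2),
            ResPair(128),                                                 # convs 6-7
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.PReLU(256),  # conv 8
            nn.MaxPool2d(2),
            nn.Conv2d(256, 512, 3, stride=1, padding=1), nn.PReLU(512),  # conv 9
            nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.PReLU(512),  # conv 10
            nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(512 * 4 * 4, feat_dim)  # assumes a 128x128 input crop

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, 1))  # 128-d depth feature vector
```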
- the second depth feature vector of the second face image is extracted through the preset second convolutional neural network and the residual compensation model to be adjusted regarding the sub-modality.
- The terminal device imports the second face image into the second convolutional neural network to extract the face feature value of the sub-modal second face image, and then performs modal residual compensation on the face feature value through the residual compensation model and outputs the second depth feature vector of the secondary modality; the residual compensation model eliminates the modal difference between the main modality and the secondary modality. It should be noted that if the face recognition model can perform face recognition on multiple sub-modalities, a corresponding residual compensation network can be configured for each sub-modality based on its modal characteristics.
- The convolution parameters of the first convolutional neural network and the second convolutional neural network are the same, that is, the convolution parameters of the two branches are shared. The convolution parameters of the two branches are initialized with a convolutional neural network trained on large-scale visible-light face images, are shared, and are not updated during training, so that the differences between modalities are all adjusted through the residual compensation module; the number of learnable parameters can thus be greatly reduced, thereby reducing overfitting.
- For a face image x_s in a sub-modality, a convolutional neural network can be used to extract its depth feature vector f_φ(x_s). Since f_φ(·) is trained on face data in the main modality, it can be used to extract the depth feature vector; however, the face feature distribution of the sub-modality differs considerably from that of the main modality, so the depth feature vector extracted by f_φ(·) may give a poor facial feature expression, which introduces a modal difference.
- The modal difference can be approximately modeled by a residual compensation module R(·), namely f(x_s) = f_φ(x_s) + R(f_φ(x_s)).
- That is, the second depth feature vector is generated by extracting the feature value of the second face image with the second convolutional neural network and superimposing the compensation value output by the residual compensation module, and the second depth feature vector can then be approximately equal to the first depth feature vector.
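The compensation step above translates directly into code. The following is a minimal sketch, assuming the FC + PReLU form of the residual compensation model named later in the text and a frozen 128-dimensional backbone; the class and variable names are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class ResidualCompensation(nn.Module):
    """R(.): a learnable residual mapping; output = feat + R(feat)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mapping = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.PReLU(feat_dim))

    def forward(self, feat):
        return feat + self.mapping(feat)  # superimpose the compensation value

class SubModalBranch(nn.Module):
    """Second CNN (shared, frozen backbone) + residual compensation model."""
    def __init__(self, backbone, feat_dim=128):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False       # shared conv parameters are not updated
        self.rcm = ResidualCompensation(feat_dim)

    def forward(self, x_s):
        with torch.no_grad():
            feat = self.backbone(x_s)     # f_phi(x_s)
        return self.rcm(feat)             # second depth feature vector ~ f_phi(x_p)
```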
- The residual compensation model is adjusted based on the first depth feature vectors and the second depth feature vectors corresponding to the multiple training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold.
- After acquiring the first depth feature vectors and second depth feature vectors of a large number of training objects, the terminal device can, based on the first depth feature vector and the second depth feature vector, feed back and adjust the residual compensation network so that the difference between the second depth feature vector output by the residual compensation network and the first depth feature vector is less than the preset difference threshold, that is, the output result converges.
- a face recognition model is generated according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- When the terminal device determines that the difference between the second depth feature vector output by the residual compensation model and the first depth feature vector is less than the preset difference threshold, it indicates that the output result of the residual compensation network has converged.
- At this point the depth feature vector corresponding to a face image of a sub-modality can be converted by the residual compensation module into a depth feature vector consistent with the main modality. The face feature vectors of all sub-modalities can thus be unified to the main modality, so that they can be compared with the standard face vectors generated based on the main modality to determine the object attributes corresponding to the sub-modal face images.
- Figure 3 shows four multi-modal face recognition networks provided by the present invention.
- FIG. 3a is a convolutional neural network that fine-tunes a fully connected layer;
- FIG. 3b is a convolutional neural network formed by adding a fully connected layer and a PReLU layer after the original fully connected layer;
- FIG. 3c is a face recognition network with two modal branches, in which a fully connected layer and a PReLU layer are added to the sub-modal branch;
- FIG. 3d is the face recognition network provided by the present invention, with a residual compensation module added in the sub-modal branch.
- The model in FIG. 3b adds a new fully connected layer yet even reduces the accuracy.
- the reason for this phenomenon is that although the newly-added fully connected layer increases the expressive ability of the model, it is easier to overfit on small data sets of cross-modal faces.
- the face recognition model provided by the present invention adds a residual compensation module, and its accuracy rate is also higher than that of the above models, which shows that the residual compensation module can effectively improve the accuracy of cross-modal face recognition.
- the residual compensation module can keep the main characteristics of the backbone network basically unchanged, and at the same time compensate the difference between different modal characteristics through a nonlinear residual mapping, thereby reducing the modal difference.
- The multi-modal face recognition model based on the residual compensation network achieved the highest recognition accuracy on multi-modal data sets such as CASIA NIR-VIS 2.0, IIIT-D Viewed Sketch, Forensic Sketch, and CUHK NIR-VIS. This shows that the multi-modal face recognition model based on the residual compensation model can effectively deal with the over-fitting problem and reduce the modal difference.
- The residual compensation model can be implemented not only as a fully connected layer + a nonlinear activation function, but also as a stack of multiple fully connected layers + nonlinear activation functions, a nonlinear activation function + a fully connected layer, a nonlinear activation function + a fully connected layer + a nonlinear activation function, or it can be added after a convolutional layer in the form of a convolutional layer + a nonlinear activation function.
- FIG. 4 is a schematic diagram of a second convolutional neural network after the residual compensation module is configured in a convolutional layer according to an embodiment of the present invention.
- The convolutional layers of the first convolutional neural network do not update their parameters after initialization, and the residual compensation model can be added between two fixed-parameter convolutional layers of the second convolutional neural network. In this case, the structure of the residual compensation model is no longer in the form of fully connected layer + PReLU, but in the form of convolutional layer + PReLU.
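A minimal sketch of this convolutional variant follows, under the same assumptions as the earlier sketches; `ch` must match the channel count of the feature map at the insertion point between the frozen convolutional layers.

```python
import torch.nn as nn

class ConvResidualCompensation(nn.Module):
    """Convolutional layer + PReLU residual mapping, inserted between two
    frozen convolutional layers of the second network."""
    def __init__(self, ch):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.PReLU(ch)
        )

    def forward(self, fmap):
        return fmap + self.mapping(fmap)  # compensate the feature map in place
```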
- The method for generating a face recognition model provided by the embodiment of the present invention obtains face images of the training object in different modalities, extracts the second depth feature vector of the sub-modality through the residual compensation model to be adjusted and the convolutional neural network, and then, based on the first depth feature vector of the main modality and the second depth feature vector, performs feedback adjustment on the residual compensation model so that the difference between the first depth feature vector and the second depth feature vector is smaller than the preset difference threshold, that is, the recognition result converges.
- After the residual compensation module is adjusted, the deviation between the depth feature vectors of the two modalities is small. Therefore, when the difference between the two depth feature vectors is less than the preset difference threshold, it can be determined that the residual compensation module has been adjusted, and a face recognition model is generated based on the residual compensation module.
- The present invention does not rely on the user's artificial feature description of the face information, and can generate a face recognition model simply by inputting the face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
- FIG. 5 shows a specific implementation flow chart of a method S104 for generating a face recognition model provided by the second embodiment of the present invention.
- the method S104 for generating a face recognition model provided by this embodiment includes: S1041 to S1043, which are detailed as follows:
- the adjusting the residual compensation model based on the first depth feature vector and the second depth feature vector corresponding to the multiple training objects includes:
- the first depth feature vector and the second depth feature vector are imported into a preset difference degree calculation model, and the deviation value of the residual compensation model to be adjusted is determined.
- The terminal device first needs to determine the current deviation value of the residual compensation model to be adjusted; therefore, the first depth feature vector and the second depth feature vector can be imported into the preset difference degree calculation model to determine the deviation value between the two depth feature vectors.
- The terminal device inputs the depth feature vectors of the training object in the multiple preset modalities into the difference degree calculation model in pairs, in the form of face image groups, so that the deviation values of the depth feature vectors of the same training object in different modalities can be determined.
- The residual compensation model is specifically composed of a fully connected layer and a PReLU layer appended after the original fully connected layer.
- The dropout technique can be used when adjusting and learning the residual compensation network based on the deviation values of the multiple training objects.
- the first depth feature vector and the second depth feature vector are imported into a preset multi-mode loss function calculation model, and the loss value of the residual compensation model is determined.
- In addition to adjusting the residual compensation model by the deviation value between the depth feature vectors of different modalities, the terminal device can also perform supervised learning on the residual compensation model based on the loss value calculated over multiple training objects, so as to avoid overfitting of the residual compensation function and reduce the differences between modalities.
- the multi-mode loss calculation model may be a loss model based on Center loss and/or a loss model based on Contrastive loss.
- the residual compensation model is adjusted based on the loss value and the deviation value, so that the residual compensation model meets a convergence condition; the convergence condition is:
- ⁇ is the learning parameter of the residual compensation function
- diff(*,*) is the vector deviation function
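The formula for the convergence condition did not survive extraction. Based on the surrounding definitions (α is the learnable parameter of the residual compensation function, written R_α below, and diff(·,·) is the vector deviation function), a plausible reconstruction, offered as an assumption rather than the patent's exact expression, is:

```latex
\min_{\alpha}\ \sum_{i=1}^{N} \operatorname{diff}\!\Bigl( f_{\varphi}\bigl(x_p^{\,i}\bigr),\;
f_{\varphi}\bigl(x_s^{\,i}\bigr) + R_{\alpha}\!\bigl(f_{\varphi}\bigl(x_s^{\,i}\bigr)\bigr) \Bigr)
\;<\; \varepsilon ,
```

where x_p^i and x_s^i are the main-modal and sub-modal face images of the i-th training object, and ε is the preset difference threshold.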
- The residual compensation approach uses the second convolutional neural network, trained on large-scale first face images of the main modality, as the backbone network, and adds a residual compensation model and a multi-modal loss function for the second face images of the secondary modality.
- The convolution parameters of the backbone network, that is, of the second convolutional neural network, are not updated; only the parameters of the residual compensation model are learned, under the joint supervision of the multi-modal loss function, thus greatly reducing the number of parameters, which can effectively alleviate the over-fitting problem of the convolutional neural network.
- The residual compensation model compensates for modal differences, and optimizing the multi-modal loss function likewise reduces modal differences.
- In this embodiment, the deviation value of the residual compensation model is determined from the first depth feature vector and the second depth feature vector, the loss value over multiple training objects is calculated through the multi-modal loss function, and the residual compensation network is adjusted and learned based on the loss value and the deviation value, which reduces overfitting of the residual compensation model, reduces the differences caused by different modalities, and improves the accuracy of face recognition.
- Fig. 6 shows a specific implementation flowchart of a method S1042 for generating a face recognition model provided by the third embodiment of the present invention.
- the method S1042 for generating a face recognition model provided by this embodiment includes: S601 to S602, which are detailed as follows:
- The importing of the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine the loss value of the residual compensation model includes:
- L MD1 is the amount of modal loss
- N is the number of training objects
- A cosine similarity function calculates the cosine similarity between the two depth feature vectors; the loss component of a single training object is computed based on the cosine similarity, and the loss components of the N training objects are summed with weights to calculate the first modal loss of the residual compensation function.
- L is the loss value
- L softmax is a cross-entropy loss function for face classification
- ⁇ is a hyperparameter based on the cross-entropy loss function and the modal difference loss function.
- FIG. 7 shows a network structure diagram of a face recognition model provided by an embodiment of the present invention.
- The network structure of the face recognition model has two input channels: a first face image channel used to input the first face image of the primary modality, and a second face image channel used to input the second face image of the secondary modality.
- The second face image channel is configured with a residual compensation model, which is specifically composed of a fully connected layer and a nonlinear activation function.
- The face recognition network imports the first depth feature vector and the second depth feature vector into the multi-modal loss function calculation model to calculate the first modal loss and the total loss value of the two modalities, and supervises the learning of the residual compensation model.
- The modal loss function and the cross-entropy loss function jointly supervise the training of the residual compensation network.
- The backpropagation algorithm can be used to update the learnable parameters in the residual compensation model until training is complete.
- The different branches of the trained residual compensation network can then be used to extract the depth feature vectors of face images of the corresponding modalities, and at test time the depth feature vectors can be used to calculate the similarity of two face images, so as to determine the identity of the person in a face image.
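Putting the pieces together, the following is a hedged sketch of one training step under this joint supervision, reusing the illustrative `SubModalBranch` from the earlier sketch; the classifier head, the optimizer setup, and the λ value are assumptions, and the modal loss uses the cosine form of L_MD1 above.

```python
import torch
import torch.nn.functional as F

def train_step(backbone, branch, classifier, optimizer, x_p, x_s, labels, lam=0.5):
    """One step of joint supervision: L = L_softmax + lam * L_MD1."""
    with torch.no_grad():
        f_p = backbone(x_p)                     # first depth feature vector (frozen)
    f_s = branch(x_s)                           # compensated second depth feature vector
    # cross-entropy face classification loss over both modalities
    logits = classifier(torch.cat([f_p, f_s], dim=0))
    l_softmax = F.cross_entropy(logits, torch.cat([labels, labels], dim=0))
    # modal difference loss (cosine form of L_MD1)
    l_md1 = (1.0 - F.cosine_similarity(f_p, f_s)).mean()
    loss = l_softmax + lam * l_md1
    optimizer.zero_grad()
    loss.backward()                             # only RCM/classifier params get gradients
    optimizer.step()
    return loss.item()
```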
- Fig. 8 shows a specific implementation flow chart of a method S1042 for generating a face recognition model provided by the fourth embodiment of the present invention.
- the method S1042 for generating a face recognition model provided by this embodiment includes: S801 to S802, which are detailed as follows:
- The importing of the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine the loss value of the residual compensation model includes:
- L MD2 is the modal loss
- N is the number of training objects
- In this embodiment, the deviation value between the first depth feature vector and the second depth feature vector is calculated as a Euclidean distance deviation value. A Euclidean distance function calculates the Euclidean distance between the two depth feature vectors; the Euclidean distance value is used as the loss component of a training object, and the loss components of the N training objects are summed with weights to calculate the second modal loss of the residual compensation function.
- L is the loss value
- L softmax is a cross-entropy loss function for face classification
- ⁇ is a hyperparameter based on the cross-entropy loss function and the modal difference loss function.
- The modal loss function and the cross-entropy loss function jointly supervise the training of the residual compensation network.
- The backpropagation algorithm can be used to update the learnable parameters in the residual compensation model until training is complete.
- The different branches of the trained residual compensation network can then be used to extract the depth feature vectors of face images of the corresponding modalities, and at test time the depth feature vectors can be used to calculate the similarity of two face images, so as to determine the identity of the person in a face image.
- FIG. 9 shows a specific implementation flow chart of a method S101 for generating a face recognition model provided by the fifth embodiment of the present invention.
- the method S101 for generating a face recognition model provided by this embodiment includes: S1011 to S1015, which are detailed as follows:
- the acquiring the face images corresponding to the training object in each preset mode includes:
- the object image of the training object in each of the preset modalities is acquired, and the facial feature points in the object image are determined through a face detection algorithm.
- The terminal device can preprocess the object image of the training object, which can improve the accuracy of the subsequent adjustment and learning of the residual compensation model. Based on this, after acquiring the object image of the training object, the terminal device can identify multiple facial feature points of the training object through a face detection algorithm, and mark each facial feature point in the object image.
- the facial feature points can be various facial organs, such as eyes, ears, nose, mouth, eyebrows, etc.
- The face area of the training object is extracted from the object image based on the facial feature points; the face area includes the first face area of the main modality and the second face area of the sub-modality.
- The terminal device can determine the location of the face of the training object based on the coordinate information of the facial feature points, so that the image of the area where the face is located, that is, the aforementioned face area, can be extracted from the training image.
- The above operations are performed on training images of different modalities, so that the first face area of the main modality and the second face area of the sub-modality are generated.
- A standardized transformation is performed on the second face area based on the first coordinate information of each facial feature point in the first face area and the area size of the first face area, so that the second coordinate information of each facial feature point in the second face area matches the first coordinate information.
- After acquiring the face areas, the terminal device needs to preprocess the different face areas to facilitate the output of the depth feature vectors. Based on this, the terminal device can adjust the size of the second face area according to the area size of the first face area in the main modality, and, according to the coordinate information of all facial feature points in the first face area, apply a similarity transformation or affine transformation to the second face area, so that the facial feature points of the different modalities are aligned. That is, the coordinate information of the same type of facial feature point becomes the same across modalities, yielding face images of different modalities with uniform size and identical face pose.
- The terminal device may be provided with a standard face template, which is configured with a standard template size and standard facial feature points. The terminal device can adjust the first face area and the second face area according to the standard face template, aligning the facial feature points of the first face area and those of the second face area with the facial feature points of the standard face template.
- The terminal device can expand a monochrome image of the sub-modality to three channels, or perform gray-scale processing on a color image of the main modality, so as to ensure that the numbers of channels of the main-modal and sub-modal images are consistent.
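As an illustration of the alignment step, the sketch below warps a face region onto a standard template with a similarity transform via OpenCV; the five-point landmark layout and the 128x128 output size are assumptions.

```python
import cv2
import numpy as np

def align_to_template(image, landmarks, template_points, out_size=(128, 128)):
    """Align detected facial feature points to the standard face template."""
    src = np.float32(landmarks)        # detected feature points, e.g. 5 (x, y) pairs
    dst = np.float32(template_points)  # standard template feature points
    M, _ = cv2.estimateAffinePartial2D(src, dst)  # similarity transform (4 DoF)
    return cv2.warpAffine(image, M, out_size)
```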
- the pixel value of each pixel in the first face area is normalized, and the normalized first face area is recognized as the first face image.
- the terminal device can obtain the pixel value of each pixel in the first face area, and perform normalization processing on it based on the pixel value.
- the pixel value can be divided by 255, that is, the maximum value of the pixel value, thereby ensuring that each pixel value in the face area is a value between 0-1.
- the terminal device can also first subtract 127.5 from the pixel value of the pixel, which is one-half of the maximum pixel value, and divide the difference by 128, so that the normalized pixel value will be in [-1,1] Within the range of, and recognize the normalized face area as the first face image.
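The two normalization options just described amount to the following one-liners (a minimal NumPy sketch, assuming an 8-bit face crop):

```python
import numpy as np

def normalize_0_1(face):
    """Divide by 255 so every pixel value lies in [0, 1]."""
    return face.astype(np.float32) / 255.0

def normalize_pm1(face):
    """Subtract 127.5 (half the maximum) and divide by 128 -> values in [-1, 1]."""
    return (face.astype(np.float32) - 127.5) / 128.0
```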
- The pixel value of each pixel in the transformed second face area is normalized, and the normalized second face area is recognized as the second face image. The normalization operation is the same as in S1014; for its specific description, refer to the relevant description of S1014, which is not repeated here.
- Normalization improves the uniformity of the subsequent depth feature vectors and thus the training accuracy of the residual compensation model.
- FIG. 10 shows a specific implementation flowchart of a method for generating a face recognition model provided by the sixth embodiment of the present invention.
- The method for generating a face recognition model provided by this embodiment further includes, after the face recognition model is generated according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network: S1001 to S1004, which are detailed as follows:
- a target image of the object to be recognized is acquired, and the modal type of the target image is determined.
- After the terminal device generates the face recognition model, it can realize multi-modal face recognition and determine the object attributes corresponding to different face images.
- the user can send the object image to be recognized to the terminal device, and the terminal device extracts the target image about the object to be recognized from the object image.
- the method of extracting the target image may adopt the method of the embodiment provided in FIG.
- After acquiring the target image, the terminal device needs to determine the modal type of the target image, that is, whether the target image is a face image generated based on the principle of primary-modal imaging or a face image generated based on the principle of secondary-modal imaging. If the target image is a face image generated based on the main modality, the target feature vector of the target object is output through the first convolutional neural network, and the target feature vector is matched with each standard feature vector in the object library, so as to determine the object attributes of the object to be identified.
- the target feature vector of the target image is calculated by the second convolutional neural network and the adjusted residual compensation model.
- If the modal type is a sub-modality, the target feature vector of the target image can be output through the residual compensation model corresponding to the sub-modality and the second convolutional neural network. Because parameter compensation is performed by the residual compensation network, the target feature vector can be approximately equivalent to a target feature vector based on the main modality, so it can be matched against the standard feature vectors generated based on the main modality.
- the terminal device can calculate the matching degree between the target feature vector of the object to be identified and the standard feature vector of each registered object in the object library.
- The distance value between the target feature vector and each standard feature vector can be calculated by a nearest neighbor algorithm, and the reciprocal of the distance value is used as the matching degree between the two.
- the entered object corresponding to the standard feature vector with the highest matching degree is used as the matching object of the object to be identified.
- the terminal device uses the entered object corresponding to the standard feature vector with the highest degree of matching as the matching object of the object to be recognized, thereby achieving the purpose of recognizing the sub-modal face image.
- the standard feature vector of each entered object in the object library is based on the feature vector generated in the main mode.
- the recognition accuracy can be improved by performing face recognition on a face image through a multi-modal face recognition model including a residual compensation network.
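A minimal sketch of this identification step follows; the gallery layout (a dict mapping entered objects to main-modal standard feature vectors) and the epsilon guard against division by zero are illustrative assumptions.

```python
import numpy as np

def identify(target_vec, gallery):
    """Return the entered object whose standard feature vector matches best."""
    best_id, best_match = None, -1.0
    for obj_id, std_vec in gallery.items():
        dist = np.linalg.norm(target_vec - std_vec)  # distance to standard vector
        match = 1.0 / (dist + 1e-12)                 # reciprocal as matching degree
        if match > best_match:
            best_id, best_match = obj_id, match
    return best_id, best_match
```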
- FIG. 11 shows a specific implementation flowchart of a method for generating a face recognition model provided by a seventh embodiment of the present invention.
- the method for generating a face recognition model provided by this embodiment is based on the adjusted residual compensation model and the first convolutional neural After the network and the second convolutional neural network generate the face recognition model, it also includes: S1101 to S1104, which are detailed as follows:
- A first image of a first object and a second image of a second object are acquired; the modal type of the first image is the main modal type, and the modal type of the second image is the sub-modal type.
- The terminal device can be used to detect whether two objects belong to the same entity user; therefore, the terminal device acquires the first image of the first object to be matched and the second image of the second object to be matched.
- There may be multiple second images, and different second images may correspond to different modal types or the same modal type, which is not limited here.
- the first target vector of the first image is extracted through the first convolutional neural network.
- the terminal device may calculate the first depth feature vector of the first object through the first convolutional neural network, that is, the aforementioned first target vector.
- the second target vector of the second image is extracted through the second convolutional neural network and the adjusted residual compensation model.
- the terminal device can determine the second depth feature vector of the second image, that is, the aforementioned second target vector through the second convolutional neural network and the adjusted residual compensation model.
- The deviation value between the first target vector and the second target vector is calculated, and if the deviation value is less than a preset deviation threshold, it is recognized that the first object and the second object belong to the same entity object.
- The terminal device can calculate the deviation value between the first target vector and the second target vector, for example by means of a cosine distance function or a Euclidean distance function, to quantify the degree of difference between the two vectors. If the deviation value is less than the preset deviation threshold, the two objects are identified as belonging to the same entity; conversely, if the deviation value is greater than or equal to the preset deviation threshold, the two objects belong to two different entity objects.
- the images of the two modalities can be imported into the face recognition network, the depth feature vectors corresponding to the two modalities are calculated, and the two faces are determined based on the deviation value between the two depth feature vectors. Whether the image belongs to the same entity object realizes the purpose of classification and recognition of the entity object.
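A minimal sketch of this verification step, using the cosine-distance option mentioned above; the threshold value is an assumption to be tuned on validation data.

```python
import numpy as np

def same_entity(vec1, vec2, threshold=0.35):
    """Cosine-distance deviation between the two target vectors vs. a threshold."""
    cos = float(np.dot(vec1, vec2) /
                (np.linalg.norm(vec1) * np.linalg.norm(vec2) + 1e-12))
    deviation = 1.0 - cos
    return deviation < threshold  # True: the two objects are the same entity
```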
- Figure 12 shows a structural block diagram of a face recognition model generation device provided by an embodiment of the present invention.
- the face recognition model generation device includes units for executing steps in the embodiment corresponding to Figure 1 .
- only the parts related to this embodiment are shown.
- the device for generating the face recognition model includes:
- The face image acquiring unit 121 is configured to acquire face images corresponding to the training object in each preset modality; the face images include a first face image corresponding to the primary modality and a second face image corresponding to at least one secondary modality;
- the first depth feature vector acquiring unit 122 is configured to extract the first depth feature vector of the first face image through a preset first convolutional neural network
- the second depth feature vector acquiring unit 123 is configured to extract the second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model to be adjusted regarding the sub-modality ;
- The residual compensation model adjustment unit 124 is configured to adjust the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to the multiple training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold;
- the face recognition model generating unit 125 is configured to generate a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- the residual compensation model adjustment unit 124 includes:
- a compensation deviation value calculation unit configured to import the first depth feature vector and the second depth feature vector into a preset difference degree calculation model, and determine the deviation value of the residual compensation model to be adjusted;
- a compensation loss value calculation unit configured to import the first depth feature vector and the second depth feature vector into a preset multi-mode loss function calculation model, and determine the loss value of the residual compensation model
- the model convergence adjustment unit is configured to adjust the residual compensation model based on the loss value and the deviation value, so that the residual compensation model meets a convergence condition; the convergence condition is:
- ⁇ is the learning parameter of the residual compensation function
- diff(*,*) is the vector deviation function
- the compensation loss value calculation unit includes:
- a first modal loss calculation unit, configured to import the first depth feature vectors and the second depth feature vectors of the plurality of training objects into a preset first modal difference loss function, and calculate the first modal loss amount of the residual compensation model; the first modal difference loss function is specifically:
- L MD1 is the amount of modal loss
- N is the number of training objects
- a first loss value output unit, configured to import the first modal loss amount into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
- L is the loss value
- L softmax is a cross-entropy loss function for face classification
- ⁇ is a hyperparameter based on the cross-entropy loss function and the modal difference loss function.
- the compensation loss value calculation unit includes:
- a second modal loss calculation unit, configured to import the first depth feature vectors and the second depth feature vectors of the plurality of training objects into a preset second modal difference loss function, and calculate the second modal loss amount of the residual compensation model; the second modal difference loss function is specifically:
- L MD2 is the modal loss
- N is the number of training objects
- a second loss value output unit, configured to import the second modal loss amount into a preset face recognition loss function to calculate the loss value of the residual compensation model; the face recognition loss function is specifically:
- L is the loss value
- L softmax is a cross-entropy loss function for face classification
- ⁇ is a hyperparameter based on the cross-entropy loss function and the modal difference loss function.
- the face image acquiring unit 121 includes:
- the face feature point recognition unit is configured to obtain the object image of the training object in each of the preset modalities, and determine the face feature point in the object image through a face detection algorithm;
- a face area extraction unit, configured to extract the face area of the training object from the object image based on the facial feature points; the face area includes the first face area of the main modality and the second face area of the sub-modality;
- a facial feature point adjustment unit, configured to perform a standardized transformation on the second face area based on the first coordinate information of each facial feature point in the first face area and the area size of the first face area, so that the second coordinate information of each facial feature point in the second face area matches the first coordinate information;
- a first normalization processing unit, configured to perform normalization processing on the pixel value of each pixel in the first face area, and recognize the normalized first face area as the first face image;
- a second normalization processing unit, configured to perform normalization processing on the pixel value of each pixel in the transformed second face area, and recognize the normalized second face area as the second face image.
- the device for generating the face recognition model further includes:
- the modal type recognition unit is used to obtain the target image of the object to be recognized and determine the modal type of the target image
- the target feature vector output unit is configured to calculate the target feature vector of the target image through the second convolutional neural network and the adjusted residual compensation model if the mode type is the sub-mode ;
- the face matching degree calculation unit is used to calculate the matching degree between the target feature vector and each standard feature vector in the object library
- the face recognition unit is configured to use the entered object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized.
- the device for generating the face recognition model further includes:
- a multi-object image acquisition unit for acquiring a first image of a first object and a second image of a second object; the modal type of the first image is the main modal type; the modal type of the second image is Sub-modal type;
- a first target vector calculation unit configured to extract a first target vector of the first image through the first convolutional neural network
- a second target vector calculation unit configured to extract a second target vector of the second image through the second convolutional neural network and the adjusted residual compensation model
- a same entity object recognition unit, used to calculate the deviation value between the first target vector and the second target vector, and, if the deviation value is less than a preset deviation threshold, to recognize that the first object and the second object belong to the same entity object.
- The face recognition model generation device provided by the embodiment of the present invention likewise does not rely on the user's artificial feature description of the face information, and can generate the face recognition model simply by inputting the face information of training objects, thereby improving the accuracy of multi-modal face recognition and reducing labor costs.
- FIG. 13 is a schematic diagram of a terminal device according to another embodiment of the present invention.
- the terminal device 13 of this embodiment includes: a processor 130, a memory 131, and a computer program 132 stored in the memory 131 and running on the processor 130, such as a face recognition model Generate the program.
- the processor 130 executes the computer program 132, the steps in the above embodiments of the method for generating face recognition models are implemented, such as S101 to S105 shown in FIG. 1.
- alternatively, when the processor 130 executes the computer program 132, the functions of the units in the foregoing device embodiments are realized, for example the functions of modules 1121 to 1125 shown in FIG. 11.
- the computer program 132 may be divided into one or more units, and the one or more units are stored in the memory 131 and executed by the processor 130 to implement the present invention.
- the one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 132 in the terminal device 13.
- the computer program 132 may be divided into a face image acquisition unit, a first depth feature vector acquisition unit, a second depth feature vector acquisition unit, a residual compensation model adjustment unit, and a face recognition model generation unit; the specific functions of each unit are as described above.
- the terminal device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
- the terminal device may include, but is not limited to, a processor 130 and a memory 131.
- FIG. 13 is only an example of the terminal device 13 and does not constitute a limitation on the terminal device 13, which may include more or fewer components than shown in the figure, combine certain components, or have different components.
- the terminal device may also include input and output devices, network access devices, buses, etc.
- the so-called processor 130 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the memory 131 may be an internal storage unit of the terminal device 13, such as a hard disk or a memory of the terminal device 13.
- the memory 131 may also be an external storage device of the terminal device 13, for example, a plug-in hard disk equipped on the terminal device 13, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card, etc.
- the memory 131 may also include both an internal storage unit of the terminal device 13 and an external storage device.
- the memory 131 is used to store the computer program and other programs and data required by the terminal device.
- the memory 131 can also be used to temporarily store data that has been output or will be output.
- the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (11)
- A method for generating a face recognition model, characterized by comprising: acquiring face images of a training object in each preset modality, the face images including a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality; extracting a first depth feature vector of the first face image through a preset first convolutional neural network; extracting a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model, to be adjusted, for the sub-modality; adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold; and generating a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- The generation method according to claim 1, characterized in that the adjusting the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects comprises: importing the first depth feature vector and the second depth feature vector into a preset difference degree calculation model to determine a deviation value of the residual compensation model to be adjusted; importing the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine a loss value of the residual compensation model; and adjusting the residual compensation model based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition, the convergence condition being:
- The generation method according to claim 2, characterized in that, if the deviation value is a cosine deviation value, the importing the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises: importing the first depth feature vectors and the second depth feature vectors of a plurality of the training objects into a preset first modal difference loss function to calculate a first modal loss amount of the residual compensation model, the first modal difference loss function being specifically: importing the first modal loss amount into a preset face recognition loss function to calculate the loss value of the residual compensation model, the face recognition loss function being specifically: L = L_softmax + λ·L_MD1, where L is the loss value, L_softmax is the cross-entropy loss function used for face classification, and λ is a hyperparameter weighting the cross-entropy loss function against the modal difference loss function.
- The generation method according to claim 2, characterized in that, if the deviation value is a Euclidean distance deviation value, the importing the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine the loss value of the residual compensation model comprises: importing the first depth feature vectors and the second depth feature vectors of a plurality of the training objects into a preset second modal difference loss function to calculate a second modal loss amount of the residual compensation model, the second modal difference loss function being specifically: importing the second modal loss amount into a preset face recognition loss function to calculate the loss value of the residual compensation model, the face recognition loss function being specifically: L = L_softmax + λ·L_MD2, where L is the loss value, L_softmax is the cross-entropy loss function used for face classification, and λ is a hyperparameter weighting the cross-entropy loss function against the modal difference loss function.
- The generation method according to any one of claims 1-4, characterized in that the acquiring face images of the training object in each preset modality comprises: acquiring object images of the training object in each preset modality, and determining face feature points in the object images through a face detection algorithm; extracting a face region of the training object from the object image based on the face feature points, the face region including a first face region of the main modality and a second face region of the sub-modality; performing a standardized transformation on the second face region based on the first coordinate information of each face feature point in the first face region and the size of the first face region, so that the second coordinate information of each face feature point in the second face region matches the first coordinate information; normalizing the pixel value of each pixel in the first face region and identifying the normalized first face region as the first face image; and normalizing the pixel value of each pixel in the transformed second face region and identifying the normalized second face region as the second face image.
- The generation method according to any one of claims 1-4, characterized in that, after the generating a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network, the method further comprises: acquiring a target image of an object to be recognized, and determining the modal type of the target image; if the modal type is the sub-modality, calculating a target feature vector of the target image through the second convolutional neural network and the adjusted residual compensation model; calculating the matching degree between the target feature vector and each standard feature vector in an object library; and taking the entered object corresponding to the standard feature vector with the highest matching degree as the matching object of the object to be recognized.
- The generation method according to any one of claims 1-4, characterized in that, after the generating a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network, the method further comprises: acquiring a first image of a first object and a second image of a second object, the modal type of the first image being the main modal type and the modal type of the second image being the sub-modal type; extracting a first target vector of the first image through the first convolutional neural network; extracting a second target vector of the second image through the second convolutional neural network and the adjusted residual compensation model; and calculating a deviation value between the first target vector and the second target vector, and if the deviation value is less than a preset deviation threshold, identifying the first object and the second object as belonging to the same entity object.
- A device for generating a face recognition model, characterized by comprising: a face image acquisition unit, configured to acquire face images of a training object in each preset modality, the face images including a first face image corresponding to a main modality and a second face image corresponding to at least one sub-modality; a first depth feature vector acquisition unit, configured to extract a first depth feature vector of the first face image through a preset first convolutional neural network; a second depth feature vector acquisition unit, configured to extract a second depth feature vector of the second face image through a preset second convolutional neural network and a residual compensation model, to be adjusted, for the sub-modality; a residual compensation model adjustment unit, configured to adjust the residual compensation model based on the first depth feature vectors and the second depth feature vectors corresponding to a plurality of the training objects, so that the degree of difference between the first depth feature vector and the second depth feature vector is less than a preset difference threshold; and a face recognition model generation unit, configured to generate a face recognition model according to the adjusted residual compensation model, the first convolutional neural network, and the second convolutional neural network.
- The generation device according to claim 8, characterized in that the residual compensation model adjustment unit comprises: a compensation deviation value calculation unit, configured to import the first depth feature vector and the second depth feature vector into a preset difference degree calculation model to determine the deviation value of the residual compensation model to be adjusted; a compensation loss value calculation unit, configured to import the first depth feature vector and the second depth feature vector into a preset multi-modal loss function calculation model to determine the loss value of the residual compensation model; and a model convergence adjustment unit, configured to adjust the residual compensation model based on the loss value and the deviation value so that the residual compensation model satisfies a convergence condition, the convergence condition being:
- A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
- A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
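- The face recognition loss function in claims 3 and 4 combines a softmax cross-entropy term with a weighted modal difference term. As a hedged illustration only: the exact modal difference losses L_MD1 and L_MD2 are given as formula images that do not survive in this text, so the PyTorch sketch below substitutes a plausible cosine-based modal difference term; the function name multimodal_loss and the default λ = 0.1 are assumptions of the sketch, not values from the patent.

```python
import torch
import torch.nn.functional as F

def multimodal_loss(first_feats, second_feats, logits, labels, lam=0.1):
    """Sketch of L = L_softmax + λ·L_MD from claims 3 and 4.

    first_feats / second_feats: paired depth feature vectors of the same
    training objects from the main and sub modality, shape (N, D);
    logits / labels feed the cross-entropy face-classification term.
    """
    l_softmax = F.cross_entropy(logits, labels)
    # Stand-in modal difference term: mean cosine deviation between the
    # paired first and second depth feature vectors (claim-3 flavour).
    cos = F.cosine_similarity(first_feats, second_feats, dim=1)
    l_md = (1.0 - cos).mean()
    return l_softmax + lam * l_md
```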
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910202253.XA CN110046551B (zh) | 2019-03-18 | 2019-03-18 | 一种人脸识别模型的生成方法及设备 |
CN201910202253.X | 2019-03-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020186886A1 true WO2020186886A1 (zh) | 2020-09-24 |
Family
ID=67274935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130815 WO2020186886A1 (zh) | 2019-03-18 | 2019-12-31 | 一种人脸识别模型的生成方法及设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110046551B (zh) |
WO (1) | WO2020186886A1 (zh) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046551B (zh) * | 2019-03-18 | 2021-04-20 | 中国科学院深圳先进技术研究院 | 一种人脸识别模型的生成方法及设备 |
CN110633698A (zh) * | 2019-09-30 | 2019-12-31 | 上海依图网络科技有限公司 | 基于循环生成对抗网络的红外图片识别方法、设备及介质 |
CN110895809B (zh) * | 2019-10-18 | 2022-07-15 | 中国科学技术大学 | 准确提取髋关节影像中关键点的方法 |
CN110738654B (zh) * | 2019-10-18 | 2022-07-15 | 中国科学技术大学 | 髋关节影像中的关键点提取及骨龄预测方法 |
CN110781856B (zh) * | 2019-11-04 | 2023-12-19 | 浙江大华技术股份有限公司 | 异质人脸识别模型训练方法、人脸识别方法及相关装置 |
CN111027382B (zh) * | 2019-11-06 | 2023-06-23 | 华中师范大学 | 一种基于注意力机制的轻量级人脸检测的方法及模型 |
CN110991281B (zh) * | 2019-11-21 | 2022-11-04 | 电子科技大学 | 一种动态人脸识别方法 |
CN111046759A (zh) * | 2019-11-28 | 2020-04-21 | 深圳市华尊科技股份有限公司 | 人脸识别方法及相关装置 |
CN111080626B (zh) * | 2019-12-19 | 2024-06-18 | 联想(北京)有限公司 | 一种检测方法和电子设备 |
CN111160350B (zh) * | 2019-12-23 | 2023-05-16 | Oppo广东移动通信有限公司 | 人像分割方法、模型训练方法、装置、介质及电子设备 |
CN111104987B (zh) * | 2019-12-25 | 2023-08-01 | 盛景智能科技(嘉兴)有限公司 | 人脸识别方法、装置及电子设备 |
CN111368644B (zh) * | 2020-02-14 | 2024-01-05 | 深圳市商汤科技有限公司 | 图像处理方法、装置、电子设备及存储介质 |
CN111461959B (zh) * | 2020-02-17 | 2023-04-25 | 浙江大学 | 人脸情绪合成方法及装置 |
CN111488972B (zh) * | 2020-04-09 | 2023-08-08 | 北京百度网讯科技有限公司 | 数据迁移方法、装置、电子设备和存储介质 |
CN111539287B (zh) * | 2020-04-16 | 2023-04-07 | 北京百度网讯科技有限公司 | 训练人脸图像生成模型的方法和装置 |
CN111523663B (zh) * | 2020-04-22 | 2023-06-23 | 北京百度网讯科技有限公司 | 一种目标神经网络模型训练方法、装置以及电子设备 |
CN111506761B (zh) * | 2020-04-22 | 2021-05-14 | 上海极链网络科技有限公司 | 一种相似图片查询方法、装置、系统及存储介质 |
CN112084946B (zh) * | 2020-05-09 | 2022-08-05 | 支付宝(杭州)信息技术有限公司 | 一种人脸识别方法、装置及电子设备 |
CN111753753A (zh) * | 2020-06-28 | 2020-10-09 | 北京市商汤科技开发有限公司 | 图像识别方法及装置、电子设备和存储介质 |
CN111862030B (zh) * | 2020-07-15 | 2024-02-09 | 北京百度网讯科技有限公司 | 一种人脸合成图检测方法、装置、电子设备及存储介质 |
CN111860364A (zh) * | 2020-07-24 | 2020-10-30 | 携程计算机技术(上海)有限公司 | 人脸识别模型的训练方法、装置、电子设备和存储介质 |
CN114092848A (zh) * | 2020-07-31 | 2022-02-25 | 阿里巴巴集团控股有限公司 | 对象确定和机器模型的处理方法、装置、设备和存储介质 |
CN112439201B (zh) * | 2020-12-07 | 2022-05-27 | 中国科学院深圳先进技术研究院 | 一种基于次模最大化的动态场景生成方法、终端以及存储介质 |
CN112949855B (zh) * | 2021-02-26 | 2023-08-25 | 平安科技(深圳)有限公司 | 人脸识别模型训练方法、识别方法、装置、设备及介质 |
CN113191940A (zh) * | 2021-05-12 | 2021-07-30 | 广州虎牙科技有限公司 | 图像处理方法、装置、设备及介质 |
CN113205058A (zh) * | 2021-05-18 | 2021-08-03 | 中国科学院计算技术研究所厦门数据智能研究院 | 一种防止非活体攻击的人脸识别方法 |
CN113240115B (zh) * | 2021-06-08 | 2023-06-06 | 深圳数联天下智能科技有限公司 | 一种生成人脸变化图像模型的训练方法及相关装置 |
CN113449623B (zh) * | 2021-06-21 | 2022-06-28 | 浙江康旭科技有限公司 | 一种基于深度学习的轻型活体检测方法 |
CN113449848A (zh) * | 2021-06-28 | 2021-09-28 | 中国工商银行股份有限公司 | 卷积神经网络的训练方法、人脸识别方法及装置 |
CN113705506B (zh) * | 2021-09-02 | 2024-02-13 | 中国联合网络通信集团有限公司 | 核酸检测方法、装置、设备和计算机可读存储介质 |
CN113989908A (zh) * | 2021-11-29 | 2022-01-28 | 北京百度网讯科技有限公司 | 鉴别人脸图像的方法、装置、电子设备及存储介质 |
CN115797560B (zh) * | 2022-11-28 | 2023-07-25 | 广州市碳码科技有限责任公司 | 一种基于近红外光谱成像的头部模型构建方法及系统 |
CN116343301B (zh) * | 2023-03-27 | 2024-03-08 | 滨州市沾化区退役军人服务中心 | 基于人脸识别的人员信息智能校验系统 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778441A (zh) * | 2015-01-07 | 2015-07-15 | 深圳市唯特视科技有限公司 | 融合灰度信息和深度信息的多模态人脸识别装置及方法 |
CN106909905B (zh) * | 2017-03-02 | 2020-02-14 | 中科视拓(北京)科技有限公司 | 一种基于深度学习的多模态人脸识别方法 |
CN107463919A (zh) * | 2017-08-18 | 2017-12-12 | 深圳市唯特视科技有限公司 | 一种基于深度3d卷积神经网络进行面部表情识别的方法 |
US11182597B2 (en) * | 2018-01-19 | 2021-11-23 | Board Of Regents, The University Of Texas Systems | Systems and methods for evaluating individual, group, and crowd emotion engagement and attention |
CN108509843B (zh) * | 2018-02-06 | 2022-01-28 | 重庆邮电大学 | 一种基于加权的Huber约束稀疏编码的人脸识别方法 |
CN109472240B (zh) * | 2018-11-12 | 2020-02-28 | 北京影谱科技股份有限公司 | 人脸识别多模型自适应特征融合增强方法和装置 |
2019
- 2019-03-18 CN CN201910202253.XA patent/CN110046551B/zh active Active
- 2019-12-31 WO PCT/CN2019/130815 patent/WO2020186886A1/zh active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180137396A1 (en) * | 2014-08-29 | 2018-05-17 | Google Llc | Processing images using deep neural networks |
CN107871105A (zh) * | 2016-09-26 | 2018-04-03 | 北京眼神科技有限公司 | 一种人脸认证方法和装置 |
WO2019009449A1 (ko) * | 2017-07-06 | 2019-01-10 | 삼성전자 주식회사 | 영상을 부호화/복호화 하는 방법 및 그 장치 |
CN108573243A (zh) * | 2018-04-27 | 2018-09-25 | 上海敏识网络科技有限公司 | 一种基于深度卷积神经网络的低质量人脸的比对方法 |
CN108985236A (zh) * | 2018-07-20 | 2018-12-11 | 南京开为网络科技有限公司 | 一种基于深度化可分离卷积模型的人脸识别方法 |
CN109117817A (zh) * | 2018-08-28 | 2019-01-01 | 摩佰尔(天津)大数据科技有限公司 | 人脸识别的方法及装置 |
CN110046551A (zh) * | 2019-03-18 | 2019-07-23 | 中国科学院深圳先进技术研究院 | 一种人脸识别模型的生成方法及设备 |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016523A (zh) * | 2020-09-25 | 2020-12-01 | 北京百度网讯科技有限公司 | 跨模态人脸识别的方法、装置、设备和存储介质 |
CN112101552A (zh) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | 用于训练模型的方法、装置、设备以及存储介质 |
CN112016523B (zh) * | 2020-09-25 | 2023-08-29 | 北京百度网讯科技有限公司 | 跨模态人脸识别的方法、装置、设备和存储介质 |
CN112085540A (zh) * | 2020-09-27 | 2020-12-15 | 湖北科技学院 | 基于人工智能技术的广告智能推送系统及方法 |
CN112215136B (zh) * | 2020-10-10 | 2023-09-05 | 北京奇艺世纪科技有限公司 | 一种目标人物识别方法、装置、电子设备及存储介质 |
CN112215136A (zh) * | 2020-10-10 | 2021-01-12 | 北京奇艺世纪科技有限公司 | 一种目标人物识别方法、装置、电子设备及存储介质 |
CN112232236A (zh) * | 2020-10-20 | 2021-01-15 | 城云科技(中国)有限公司 | 行人流量的监测方法、系统、计算机设备和存储介质 |
CN112232236B (zh) * | 2020-10-20 | 2024-02-06 | 城云科技(中国)有限公司 | 行人流量的监测方法、系统、计算机设备和存储介质 |
CN112149634A (zh) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | 图像生成器的训练方法、装置、设备以及存储介质 |
CN112149634B (zh) * | 2020-10-23 | 2024-05-24 | 北京神州数码云科信息技术有限公司 | 图像生成器的训练方法、装置、设备以及存储介质 |
CN112183491A (zh) * | 2020-11-04 | 2021-01-05 | 北京百度网讯科技有限公司 | 表情识别模型及训练方法、识别方法、装置和计算设备 |
CN112633203A (zh) * | 2020-12-29 | 2021-04-09 | 上海商汤智能科技有限公司 | 关键点检测方法及装置、电子设备和存储介质 |
CN113487013A (zh) * | 2021-06-29 | 2021-10-08 | 杭州中葳数字科技有限公司 | 一种基于注意力机制的排序分组卷积方法 |
CN113487013B (zh) * | 2021-06-29 | 2024-05-07 | 杭州中葳数字科技有限公司 | 一种基于注意力机制的排序分组卷积方法 |
CN113674161A (zh) * | 2021-07-01 | 2021-11-19 | 清华大学 | 一种基于深度学习的人脸残缺扫描补全方法、装置 |
CN113505740B (zh) * | 2021-07-27 | 2023-10-10 | 北京工商大学 | 基于迁移学习和卷积神经网络的面部识别方法 |
CN113903053A (zh) * | 2021-09-26 | 2022-01-07 | 厦门大学 | 基于统一中间模态的跨模态行人重识别方法 |
CN113903053B (zh) * | 2021-09-26 | 2024-06-07 | 厦门大学 | 基于统一中间模态的跨模态行人重识别方法 |
CN114140350A (zh) * | 2021-11-24 | 2022-03-04 | 四川大学锦江学院 | 一种应用于无人机中的量子图像修复方法及装置 |
CN114359034B (zh) * | 2021-12-24 | 2023-08-08 | 北京航空航天大学 | 一种基于手绘的人脸图片生成方法及系统 |
CN114359034A (zh) * | 2021-12-24 | 2022-04-15 | 北京航空航天大学 | 一种基于手绘的人脸图片生成方法及系统 |
CN114756425A (zh) * | 2022-03-08 | 2022-07-15 | 深圳集智数字科技有限公司 | 智能监控方法、装置、电子设备及计算机可读存储介质 |
CN114863542B (zh) * | 2022-07-06 | 2022-09-30 | 武汉微派网络科技有限公司 | 基于多模态的未成年人识别方法及系统 |
CN114863542A (zh) * | 2022-07-06 | 2022-08-05 | 武汉微派网络科技有限公司 | 基于多模态的未成年人识别方法及系统 |
CN118230396A (zh) * | 2024-05-22 | 2024-06-21 | 苏州元脑智能科技有限公司 | 人脸识别及其模型训练方法、装置、设备、介质及产品 |
Also Published As
Publication number | Publication date |
---|---|
CN110046551A (zh) | 2019-07-23 |
CN110046551B (zh) | 2021-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020186886A1 (zh) | 一种人脸识别模型的生成方法及设备 | |
Ullah et al. | A Real‐Time Framework for Human Face Detection and Recognition in CCTV Images | |
WO2021218060A1 (zh) | 基于深度学习的人脸识别方法及装置 | |
WO2020228525A1 (zh) | 地点识别及其模型训练的方法和装置以及电子设备 | |
Tao et al. | Manifold ranking-based matrix factorization for saliency detection | |
WO2021135509A1 (zh) | 图像处理方法、装置、电子设备及存储介质 | |
WO2021143101A1 (zh) | 人脸识别方法和人脸识别装置 | |
WO2021159769A1 (zh) | 图像检索方法、装置、存储介质及设备 | |
JP6411510B2 (ja) | 無制約の媒体内の顔を識別するシステムおよび方法 | |
Gao et al. | 3-D object retrieval and recognition with hypergraph analysis | |
Wang et al. | Background-driven salient object detection | |
CN112232184B (zh) | 一种基于深度学习和空间转换网络的多角度人脸识别方法 | |
CN111091075A (zh) | 人脸识别方法、装置、电子设备及存储介质 | |
CN112052831A (zh) | 人脸检测的方法、装置和计算机存储介质 | |
CN110569724B (zh) | 一种基于残差沙漏网络的人脸对齐方法 | |
CN113298158B (zh) | 数据检测方法、装置、设备及存储介质 | |
CN111091129B (zh) | 一种基于多重颜色特征流形排序的图像显著区域提取方法 | |
Wang et al. | Robust head pose estimation via supervised manifold learning | |
Diaz-Chito et al. | Continuous head pose estimation using manifold subspace embedding and multivariate regression | |
CN109948420A (zh) | 人脸比对方法、装置及终端设备 | |
Wan et al. | Palmprint recognition system for mobile device based on circle loss | |
Deng et al. | Self-feedback image retrieval algorithm based on annular color moments | |
Yuan et al. | Explore double-opponency and skin color for saliency detection | |
Guo et al. | Automatic face recognition of target images based on deep learning algorithms | |
Ye et al. | Fast single sample face recognition based on sparse representation classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19919783; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 19919783; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022) |