CN106503669B

CN106503669B - Training and recognition method and system based on multitask deep learning network

Info

Publication number: CN106503669B
Application number: CN201610952920.2A
Authority: CN
Inventors: 周曦; 焦宾
Original assignee: Chongqing Zhongke Yuncong Technology Co Ltd
Current assignee: Chongqing Zhongke Yuncong Technology Co., Ltd.
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2019-12-10
Anticipated expiration: 2036-11-02
Also published as: CN106503669A

Abstract

The invention provides a training method, an identification method and a system based on a multitask deep learning network, wherein the training method comprises the following steps: acquiring a face area of a face image in a training set; carrying out key point detection on the face area to obtain key feature point positions; affine transformation is carried out on the face image according to the key feature position to obtain an aligned face image; inputting the aligned face images into a multi-task deep learning network for training to obtain a multi-task deep learning network model; the identification method comprises the following steps: carrying out affine transformation on the face image to be recognized according to the key feature position of the face image to be recognized to obtain an aligned face image; inputting the aligned face images into a trained multi-task deep learning network model for feature extraction to obtain feature information; and respectively matching the characteristic information of the facial image to be recognized with the characteristic information corresponding to each facial image in the registered set to obtain a recognition result. Therefore, the efficiency of multi-task deep learning network training and recognition can be improved.

Description

Training and recognition method and system based on multitask deep learning network

Technical Field

The invention relates to the technical field of face recognition, in particular to a training method, a recognition method and a system based on a multi-task deep learning network.

Background

Face recognition is a biometric identification technology that verifies identity using physiological or behavioral characteristics that are inherent to a person and uniquely identify that person. As human-computer interaction technology is applied ever more widely, face recognition has become very important in that field. Face recognition is one of the main research topics in pattern recognition and machine learning, and a large number of face recognition algorithms have been proposed.

At present, in a face recognition mode and various attribute recognition modes thereof, a deep learning network is usually trained independently according to different tasks to obtain respective deep learning network models, and then the deep learning network models obtained through training are independently recognized. However, the existing single-task deep learning network has low training and recognition efficiency, thereby causing the overall performance of the network to be reduced.

Disclosure of Invention

In view of the above shortcomings of the prior art, an object of the present invention is to provide a training method, a recognition method and a system based on a multitask deep learning network, which can improve the efficiency and recognition rate of multitask deep learning network training and recognition.

To achieve the above and other related objects, an embodiment of the present invention provides a training method based on a multitask deep learning network, including:

acquiring a face area of a face image in a training set;

carrying out key point detection on the face area to obtain key feature point positions of the face area;

Affine transformation is carried out on the face image according to the key feature position to obtain an aligned face image;

Inputting the aligned face images into a multitask deep learning neural network for training to obtain a multitask deep learning neural network model;

Wherein the multitask deep learning neural network adopts the GoogleNet structure and includes a face recognition task, an age recognition task and a gender recognition task. The three tasks share the convolution layers and the first full link layer of the multitask deep learning neural network; the first full link layer is connected to a full link layer specific to each task and to the loss function of each task. The loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function.

preferably, the inputting the aligned face images into a multitask deep learning neural network for training to obtain a multitask deep learning neural network model includes:

inputting the aligned face image into a first layer convolution of a multitask deep learning neural network to complete convolution operation;

Inputting the obtained operation result to a second layer of convolution of the multi-task deep learning neural network to complete convolution operation until the obtained operation result is input to an Nth layer of convolution of the multi-task deep learning neural network to complete convolution operation, and then linking two full link layers to obtain a final training result;

and determining the multitask deep learning neural network model according to the training result.

Preferably, the loss functions of each task of the face recognition task, the age recognition task and the gender recognition task are added according to a weight proportion to obtain a total loss function of the multitask deep learning neural network.

An embodiment of the invention also provides an identification method based on the multitask deep learning network, which comprises the following steps:

Acquiring a face area of a face image to be recognized;

carrying out affine transformation on the face image to be recognized according to the key feature position to obtain an aligned face image;

inputting the aligned face images into a trained multitask deep learning neural network model for feature extraction to obtain feature information of the face images to be recognized;

respectively matching the characteristic information of the facial image to be recognized with the characteristic information corresponding to each facial image in the registered set to obtain a recognition result;

preferably, the matching the feature information of the facial image to be recognized with the feature information corresponding to each facial image in the registered set respectively to obtain the recognition result includes:

Determining similarity values of the facial images to be recognized and each facial image in the registered set respectively by calculating Euclidean distances between the characteristic information of the facial images to be recognized and the characteristic information corresponding to each facial image in the registered set respectively;

and determining a recognition result according to the similarity value between the facial image to be recognized and each facial image in the registered set and a preset similarity threshold value.

According to the above method, an embodiment of the invention provides a training system based on a multitask deep learning network, comprising: a face region acquisition module, a key point detection module, a face alignment module and a training module; wherein,

the face region acquisition module is used for acquiring a face region of a face image in a training set;

the key point detection module is used for detecting key points of the face area to obtain key feature point positions of the face area;

the face alignment module is used for carrying out affine transformation on the face image according to the key feature position to obtain an aligned face image;

The training module is used for inputting the aligned face images into the multitask deep learning neural network for training to obtain a multitask deep learning neural network model;

preferably, the training module is specifically configured to:

and obtaining the multitask deep learning neural network model according to the training result.

According to the above method, an embodiment of the invention provides an identification system based on a multitask deep learning network, comprising: a face region acquisition module, a key point detection module, a face alignment module, a feature extraction module and a matching identification module; wherein,

the face region acquisition module is used for acquiring a face region of a face image to be recognized;

the face alignment module is used for carrying out affine transformation on the face image to be recognized according to the key feature position to obtain an aligned face image;

The feature extraction module is used for inputting the aligned face images into the trained multitask deep learning neural network model for feature extraction to obtain feature information of the face images to be recognized;

The matching identification module is used for matching the characteristic information of the facial image to be identified with the characteristic information corresponding to each facial image in the registered set respectively to obtain an identification result;

preferably, the matching identification module is specifically configured to:

The invention provides a training method, an identification method and a system based on a multitask deep learning network. The training method comprises the following steps: acquiring a face area of a face image in a training set; carrying out key point detection on the face area to obtain key feature point positions of the face area; carrying out affine transformation on the face image according to the key feature point positions to obtain an aligned face image; and inputting the aligned face images into a multitask deep learning network for training to obtain a multitask deep learning network model. The identification method comprises the following steps: acquiring a face area of a face image to be recognized; carrying out key point detection on the face area to obtain key feature point positions of the face area; carrying out affine transformation on the face image to be recognized according to the key feature point positions to obtain an aligned face image; inputting the aligned face images into the trained multitask deep learning network model for feature extraction to obtain feature information of the face image to be recognized; and respectively matching the feature information of the face image to be recognized with the feature information corresponding to each face image in the registered set to obtain a recognition result. The multitask deep learning network adopts a GoogleNet structure and comprises a face recognition task, an age recognition task and a gender recognition task; the three tasks share the convolution layer and the first full link layer of the multitask deep learning network; the first full link layer is connected with a full link layer specific to each task and a loss function of each task; the loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function. Therefore, in the embodiment of the invention, the face recognition task, the gender recognition task and the age recognition task are arranged in one multitask deep learning network, and the correlation among the three tasks is established within that network. The network is trained to learn features common to the tasks, yielding a multitask deep learning network model, and the trained model is then used for feature extraction and recognition. As a result, the efficiency and recognition rate of the overall multitask deep learning network training and recognition can be improved, and the recognition rate of each single task can also be improved.

Drawings

FIG. 1 is a schematic flow chart of a training method based on a multitask deep learning network according to the present invention;

FIG. 2 is a flowchart illustrating a recognition method based on a multitask deep learning network according to the present invention;

FIG. 3 is a schematic diagram illustrating the structure of the training system based on the multitask deep learning network according to the present invention;

FIG. 4 is a schematic diagram showing the structure of the recognition system based on the multitask deep learning network according to the present invention.

Detailed Description

In the embodiment of the invention, a face area of a face image in a training set is first obtained; key point detection is carried out on the face area to obtain key feature point positions of the face area; affine transformation is carried out on the face image according to the key feature point positions to obtain an aligned face image; the aligned face images are input into a multitask deep learning network for training to obtain a multitask deep learning network model; and feature extraction and recognition are then performed on the face image to be recognized according to the trained multitask deep learning network model.

The invention is described in further detail below with reference to the figures and the embodiments.

An embodiment of the invention provides a training method based on a multitask deep learning network, which comprises the following steps:

Step S100: acquiring the face area of the face image in the training set.

In this step, images containing human faces are first collected, and the face regions and key feature points in the face images are calibrated according to a preset rule to generate a training set. Specifically, for images containing a face that the user has acquired through various channels, the face region and key feature points are calibrated according to the preset rules of the training set, and the position and scale information of the calibrated face region and the coordinate information of the key feature points are uploaded to a PC (personal computer) and stored by a server in a corresponding document.

In this step, a face detection algorithm may be used to obtain the face region of the face image in the training set. The face detection algorithm may be an AdaBoost algorithm or a deep learning face detection algorithm; obtaining the face region with either algorithm belongs to the prior art and is not described again here. The face detection algorithm is not particularly limited herein.

In this step, the face images in the training set are non-compressed images in any one of the following formats: bmp, jpg, tiff, gif, pcx, tga, exif, fpx, svg, psd, cdr, pcd, dxf, ufo, eps, ai, and raw.

Step S101: carrying out key point detection on the face area to obtain the key feature point positions of the face area.

In this step, the key point detection on the face region uses an existing key point detection algorithm, which is not described again here.

Step S102: carrying out affine transformation on the face image according to the key feature point positions to obtain an aligned face image.

In this step, obtaining an aligned face image by affine transformation of the face image according to the key feature point positions belongs to the prior art and is not described again here.
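The alignment in step S102 can be illustrated with a minimal sketch. The patent does not say which key points or which target layout are used, so the example below assumes two eye centers and a hypothetical 112x112 template, and it uses a similarity transform (rotation, uniform scale and translation), which is a special case of an affine transformation. Only the point mapping is computed; resampling the pixels would be left to an image library.

```python
def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Return complex (a, b) such that z -> a*z + b maps src_a to dst_a and src_b to dst_b.

    Treating 2-D points as complex numbers, multiplying by `a` rotates and
    scales, and adding `b` translates. This is an assumed simplification of
    the affine transformation named in the text.
    """
    sa, sb = complex(*src_a), complex(*src_b)
    da, db = complex(*dst_a), complex(*dst_b)
    a = (db - da) / (sb - sa)
    b = da - a * sa
    return a, b


def apply_transform(a, b, point):
    """Apply z -> a*z + b to an (x, y) point and return the mapped (x, y)."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical template eye positions in a 112x112 aligned crop (not from the patent).
TEMPLATE_LEFT_EYE = (38.0, 52.0)
TEMPLATE_RIGHT_EYE = (74.0, 52.0)

# Hypothetical detected key points in the original image.
left_eye, right_eye = (120.0, 90.0), (180.0, 110.0)

a, b = similarity_from_two_points(
    left_eye, right_eye, TEMPLATE_LEFT_EYE, TEMPLATE_RIGHT_EYE
)
aligned_left = apply_transform(a, b, left_eye)
aligned_right = apply_transform(a, b, right_eye)
```

With more than two key points, a full affine transformation would typically be fitted by least squares instead of solved exactly.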

Step S103: inputting the aligned face images into a multitask deep learning network for training to obtain a multitask deep learning network model.

In this step, the multitask deep learning network adopts the GoogleNet structure and includes a face recognition task, an age recognition task and a gender recognition task. The three tasks share the convolution layers and the first full link layer of the multitask deep learning network; the first full link layer is connected to a full link layer specific to each task and to the loss function of each task. The loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function.

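Specifically, the triplet loss function of the face recognition task is as follows (the formula and its symbols are not legible in this text, so it is written here in the standard triplet form, with f(·) denoting the feature output of the network):

L = Σ_i max(0, ‖f(x_i^a) - f(x_i^p)‖² - ‖f(x_i^a) - f(x_i^n)‖² + α), summed over the triplets (x_i^a, x_i^p, x_i^n) in Γ

Wherein Γ (gamma) is the collection of triplets of face images in the training set, x_i^a is a face image matrix, x_i^p is another face image matrix belonging to the same class as that face image, x_i^n is a face image matrix that does not belong to the same class as that face image, α is the margin between the similarity value of matrices x_i^a and x_i^p and the similarity value of matrices x_i^a and x_i^n in a triplet, and i is a positive integer.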
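As an illustration of the definitions above, the following is a minimal sketch of a triplet loss. It assumes the hinge form commonly used for triplet training (squared Euclidean distances between feature vectors plus a margin alpha) and is not taken from the patent itself. The feature vectors and the margin value are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def triplet_loss(triplets, alpha):
    """Sum of hinge terms over (anchor, positive, negative) feature triplets.

    A triplet contributes zero once the negative is farther from the anchor
    than the positive by at least the margin `alpha`.
    """
    total = 0.0
    for anchor, positive, negative in triplets:
        gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
        total += max(0.0, gap + alpha)
    return total


# Hypothetical 2-D features: the first triplet is already well separated,
# the second violates the margin and therefore contributes to the loss.
triplets = [
    ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]),
    ([0.0, 0.0], [1.0, 1.0], [1.0, 0.0]),
]
loss = triplet_loss(triplets, alpha=0.2)
```

Here only the second triplet contributes, because its same-class sample is farther from the anchor than its different-class sample.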

Specifically, the softmax loss function of the gender identification task is as follows:

L = -(1-g)·log(1-p₀) - g·log(p₁)

Wherein g is set to 0 if the gender is female and to 1 if the gender is male; p₀ is the probability, calculated by the multitask deep learning network, that the gender is female, and p₁ is the probability, calculated by the multitask deep learning network, that the gender is male.
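A minimal sketch of a two-class loss of this kind follows. Note one assumption: the sketch uses the conventional cross-entropy form, -(1-g)·log(p₀) - g·log(p₁), in which a female sample is scored by the female probability p₀. The printed formula uses log(1-p₀) in the first term, which would equal log(p₁) whenever p₀ + p₁ = 1; the sketch does not try to reproduce that. Function and parameter names are hypothetical.

```python
import math


def gender_loss(g, p_female, p_male):
    """Two-class cross-entropy: g = 0 for female, g = 1 for male.

    Assumes the conventional form, where the term for the true class uses
    the probability predicted for that class.
    """
    return -(1 - g) * math.log(p_female) - g * math.log(p_male)


# Hypothetical network outputs for a male sample (g = 1).
confident_correct = gender_loss(1, p_female=0.2, p_male=0.8)
confident_wrong = gender_loss(1, p_female=0.8, p_male=0.2)
```

The loss is smaller when the probability assigned to the true class is larger, which is the behavior training relies on.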

specifically, the softmax loss function of the age identification task is as follows:

L = -g₀·log(p₀) - g₁·log(p₁) - … - g_n·log(p_n)

Wherein p_n is the probability of each age calculated by the multitask deep learning network, g_n is the weight coefficient of each age, and n is a positive integer.
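The age loss above can be sketched as follows. The example assumes that the probabilities p_n come from a softmax over per-age scores and that the weight coefficients g_n form a one-hot vector marking the true age class. Both are common choices, but the patent does not spell them out. The scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def age_loss(weights, probs):
    """L = -sum_n g_n * log(p_n) over the age classes."""
    return -sum(g * math.log(p) for g, p in zip(weights, probs))


# Hypothetical scores for three age classes; the true class is index 1.
probs = softmax([1.0, 2.0, 0.5])
loss = age_loss([0.0, 1.0, 0.0], probs)
```

With one-hot weights the sum reduces to the negative log-probability of the true age class.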

Specifically, the loss function of each task is assigned a weight proportion, and the loss functions of the face recognition task, the age recognition task and the gender recognition task are added according to these weight proportions to obtain the total loss function of the multitask deep learning network, as follows:

L_all = Σ_n λ_n·L_n

Wherein L_all is the total loss function of the multitask deep learning network, L_n is the loss function of the nth task in the multitask deep learning network, and λ_n is the weight proportion coefficient of the nth task in the total loss.

Here, the weight proportion of the loss function of each of the face recognition task, the age recognition task and the gender recognition task is set according to actual conditions and requirements and is not particularly limited.
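A minimal sketch of the weighted sum of task losses follows. The task names, loss values and weights are hypothetical placeholders, since the text leaves the weight proportions to be chosen per application.

```python
def total_loss(task_losses, task_weights):
    """Weighted sum of per-task losses: each loss is scaled by its task weight."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


# Hypothetical per-task loss values and weight proportions.
task_losses = {"face": 1.2, "age": 0.8, "gender": 0.3}
task_weights = {"face": 1.0, "age": 0.5, "gender": 0.5}

combined = total_loss(task_losses, task_weights)
```

Raising a task's weight makes its loss dominate the shared layers' updates, so the weights trade off accuracy between tasks.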

In this step, the multitask deep learning network needs to be trained, and the multitask deep learning network is specifically trained in the following manner:

inputting the aligned face images into a first layer of convolution of a multitask deep learning network to complete convolution operation;

inputting the obtained operation result to the second layer of convolution of the multi-task deep learning network to complete convolution operation until the obtained operation result is input to the Nth layer of convolution of the multi-task deep learning network to complete convolution operation, and then linking two full link layers to obtain a final training result;

and determining the multi-task deep learning network model according to the training result.

It should be noted that the convolution process of the multitask deep learning network belongs to the prior art and is not described again here.
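The shared-trunk layout described in this step can be sketched structurally as follows. This is only an illustration under stated assumptions: the convolution stack is replaced by a placeholder feature vector, the layers are small dense layers with random weights, and all dimensions and names (`MultiTaskHeads`, `feature_dim`, and so on) are hypothetical rather than taken from the patent.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def make_layer(rng, n_in, n_out):
    """Create (weights, bias) with small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskHeads:
    """Shared first full link layer followed by one task-specific layer per task."""

    def __init__(self, feature_dim=8, shared_dim=6, id_dim=4, n_ages=5, seed=0):
        rng = random.Random(seed)
        self.shared_fc = make_layer(rng, feature_dim, shared_dim)
        self.face_fc = make_layer(rng, shared_dim, id_dim)
        self.age_fc = make_layer(rng, shared_dim, n_ages)
        self.gender_fc = make_layer(rng, shared_dim, 2)

    def forward(self, conv_features):
        # `conv_features` stands in for the output of the shared convolution layers.
        shared = relu(dense(conv_features, *self.shared_fc))
        return {
            "shared": shared,                            # output of the first full link layer
            "face": dense(shared, *self.face_fc),        # would feed the triplet loss
            "age": dense(shared, *self.age_fc),          # would feed a softmax loss
            "gender": dense(shared, *self.gender_fc),    # would feed a softmax loss
        }


model = MultiTaskHeads()
outputs = model.forward([0.5] * 8)
```

Because every task reads the same `shared` vector, gradients from all three losses would update the shared layers during training, which is where the tasks influence one another.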

An embodiment of the invention provides an identification method based on a multitask deep learning network, which comprises the following steps:

Step S200: acquiring a face area of the face image to be recognized.

in this step, the same face detection algorithm as that in step S100 is used to obtain the face region of the face image to be recognized, and the repeated parts are not described again.

Step S201: carrying out key point detection on the face area to obtain the key feature point positions of the face area.

in this step, the same key point detection algorithm as that in step S101 is used to perform key point detection on the face region of the face image to be recognized, so as to obtain the key feature point position of the face region.

Step S202: carrying out affine transformation on the face image to be recognized according to the key feature point positions to obtain an aligned face image.

in this step, as in step S102, how to obtain an aligned face image by affine transformation of the face image according to the key feature position belongs to the prior art, and repeated parts are not described again.

Step S203: inputting the aligned face images into the trained multitask deep learning network model for feature extraction to obtain feature information of the face image to be recognized.

In this step, the multitask deep learning network model is the network model obtained by training through steps S100 to S103. The multitask deep learning network adopts the GoogleNet structure and includes a face recognition task, an age recognition task and a gender recognition task. The three tasks share the convolution layers and the first full link layer of the multitask deep learning network; the first full link layer is connected to a full link layer specific to each task and to the loss function of each task. The loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function.

In this step, the aligned face images are input into the trained multitask deep learning network model for feature extraction, and the output of the first full link layer of the multitask deep learning network model is used as the feature information of the face image to be recognized.

Step S204: respectively matching the feature information of the face image to be recognized with the feature information corresponding to each face image in the registered set to obtain a recognition result.

Specifically, firstly, determining similarity values between the facial image to be recognized and each facial image in the registered set by calculating Euclidean distances between the characteristic information of the facial image to be recognized and the characteristic information corresponding to each facial image in the registered set respectively;

And then determining a recognition result according to the similarity value between the facial image to be recognized and each facial image in the registered set and a preset similarity threshold value.

Here, how to calculate the euclidean distance between the feature information of the face image to be recognized and the feature information corresponding to each face image in the registration set belongs to the prior art, and repeated parts are not described again.

Here, the similarity threshold may be preset according to actual situations and requirements, and is not specifically limited herein.

how to determine the recognition result according to the similarity value between the facial image to be recognized and each facial image in the registered set and a preset similarity threshold value is explained in detail below:

If any similarity value is greater than or equal to the preset similarity threshold, the matching succeeds, and the recognition result is output as a recognition success together with the number of the corresponding category.

If all the similarity values are smaller than the preset similarity threshold, the matching fails and the recognition result is output as a recognition failure.
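The matching rule in step S204 can be sketched as follows. The patent does not state how a Euclidean distance is turned into a similarity value, so the mapping 1 / (1 + distance) used here is an assumption, as is the choice to report the best-scoring registered entry when at least one exceeds the threshold. The feature vectors, labels and threshold are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(u, v):
    """Assumed mapping from distance to similarity: 1.0 for identical features."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


def recognize(probe, registered, threshold):
    """Compare a probe feature with every registered feature and apply the threshold.

    Returns ("success", label) for the best match at or above the threshold,
    otherwise ("failure", None).
    """
    best_label, best_score = None, -1.0
    for label, feature in registered.items():
        score = similarity(probe, feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return ("success", best_label)
    return ("failure", None)


# Hypothetical registered set and probes.
registered = {"person_1": [0.0, 0.0], "person_2": [1.0, 1.0]}
match = recognize([0.9, 1.0], registered, threshold=0.8)
no_match = recognize([5.0, 5.0], registered, threshold=0.8)
```

A stricter threshold lowers false acceptances at the cost of more recognition failures, which is why the text leaves it to be set per application.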

In order to implement the above methods, the embodiment of the invention also provides a training system based on the fusion of key feature points of the multitask deep learning network and an identification system based on the fusion of key feature points of the multitask deep learning network.

An embodiment of the present invention provides a training system based on a multitask deep learning network, as shown in FIG. 3, the system includes: a face region acquisition module 300, a key point detection module 301, a face alignment module 302 and a training module 303; wherein,

the face region acquiring module 300 is configured to acquire a face region of a face image in a training set;

The key point detection module 301 is configured to perform key point detection on the face region to obtain key feature point positions of the face region;

the face alignment module 302 is configured to perform affine transformation on the face image according to the key feature position to obtain an aligned face image;

The training module 303 is configured to input the aligned face images into a multitask deep learning network for training to obtain a multitask deep learning network model;

the structure of the multitask deep learning network adopts a GoogleNet structure, the multitask deep learning network comprises a face recognition task, an age recognition task and a gender recognition task, the face recognition task, the age recognition task and the gender recognition task share a convolution layer and a first full link layer of the multitask deep learning network, the first full link layer is connected with a full link layer specific to each task and a loss function of each task, the loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function.

In a specific implementation, the training module 303 is specifically configured to:

And obtaining the multi-task deep learning network model according to the training result.

In specific implementation, the loss functions of each task of the face recognition task, the age recognition task and the gender recognition task are added according to a weight proportion to obtain a total loss function of the multi-task deep learning network.

the above division manner of the functional modules is only one preferred implementation manner given in the embodiment of the present invention, and the division manner of the functional modules does not limit the present invention. For convenience of description, the parts of the system described above are separately described as functionally divided into various modules or units. Of course, the functionality of the various modules or units may be implemented in the same one or more pieces of software or hardware in practicing the invention.

An embodiment of the invention provides an identification system based on a multitask deep learning network, as shown in FIG. 4, the system comprises: a face region acquisition module 400, a key point detection module 401, a face alignment module 402, a feature extraction module 403 and a matching identification module 404; wherein,

The face region acquiring module 400 is configured to acquire a face region of a face image to be recognized;

The key point detection module 401 is configured to perform key point detection on the face region to obtain key feature point positions of the face region;

the face alignment module 402 is configured to perform affine transformation on a face image to be recognized according to the key feature position to obtain an aligned face image;

the feature extraction module 403 is configured to input the aligned face images into a trained multitask deep learning network model for feature extraction, so as to obtain feature information of the face images to be recognized;

The matching identification module 404 is configured to match the feature information of the facial image to be identified with the feature information corresponding to each facial image in the registered set, so as to obtain an identification result;

In a specific implementation, the matching identification module 404 is specifically configured to:

In summary, in the training process, the embodiment of the present invention first obtains the face area of the face image in the training set; carrying out key point detection on the face area to obtain key feature point positions of the face area; affine transformation is carried out on the face image according to the key feature position to obtain an aligned face image; inputting the aligned face images into a multi-task deep learning network for training to obtain a multi-task deep learning network model; in the identification process, acquiring a face area of a face image to be identified; carrying out key point detection on the face area to obtain key feature point positions of the face area; carrying out affine transformation on the face image to be recognized according to the key feature position to obtain an aligned face image; inputting the aligned face images into a trained multi-task deep learning network model for feature extraction to obtain feature information of the face images to be recognized; respectively matching the characteristic information of the facial image to be recognized with the characteristic information corresponding to each facial image in the registered set to obtain a recognition result; the structure of the multitask deep learning network adopts a GoogleNet structure, the multitask deep learning network comprises a face recognition task, an age recognition task and a gender recognition task, the face recognition task, the age recognition task and the gender recognition task share a convolution layer and a first full link layer of the multitask deep learning network, the first full link layer is connected with a full link layer specific to each task and a loss function of each task, the loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function. 
Therefore, the embodiment of the invention obtains a multitask deep learning network model through the multitask deep learning network training, the model can extract the fusion characteristic information including face recognition, gender recognition and age recognition, and the multitask recognition is carried out according to the characteristic information, so that the multitask recognition of the face and the attribute characteristics is realized, the efficiency and the recognition rate of the whole multitask deep learning network training and recognition can be improved, and the recognition rate of a single task can be improved.

The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present invention shall be covered by the claims of the present invention.

Claims

1. A training method based on a multitask deep learning network, characterized by comprising the following steps:

acquiring a face area of a face image in a training set;

inputting the aligned face images into a multi-task deep learning network for training to obtain a multi-task deep learning network model;

the multitask deep learning network adopts a GoogleNet structure, the multitask deep learning network comprises a face recognition task, an age recognition task and a gender recognition task, the face recognition task, the age recognition task and the gender recognition task share a convolution layer and a first full link layer of the multitask deep learning network, the first full link layer is connected with the full link layer specific to each task and a loss function of each task, the loss function of the face recognition task is a triplet function, the loss function of the age recognition task is a softmax function, and the loss function of the gender recognition task is a softmax function; and adding the loss functions of each task of the face recognition task, the age recognition task and the gender recognition task according to a weight proportion to obtain a total loss function of the multi-task deep learning network.

2. The method of claim 1, wherein the inputting the aligned face images into a multitask deep learning network for training to obtain a multitask deep learning network model comprises:

3. A recognition method based on a multitask deep learning network, characterized by comprising the following steps:

Acquiring a face area of a face image to be recognized;

Inputting the aligned face images into a trained multi-task deep learning network model for feature extraction to obtain feature information of the face images to be recognized;

4. The method according to claim 3, wherein the matching the feature information of the facial image to be recognized with the feature information corresponding to each facial image in the registered set respectively to obtain the recognition result comprises:

5. A training system based on a multitask deep learning network, the system comprising: a face region acquisition module, a key point detection module, a face alignment module and a training module; wherein,

The training module is used for inputting the aligned face images into a multi-task deep learning network for training to obtain a multi-task deep learning network model;

6. The system of claim 5, wherein the training module is specifically configured to:

7. A recognition system based on a multitask deep learning network, the system comprising: a face region acquisition module, a key point detection module, a face alignment module, a feature extraction module and a matching identification module; wherein,

The feature extraction module is used for inputting the aligned face images into the trained multi-task deep learning network model for feature extraction to obtain feature information of the face images to be recognized;

8. The system of claim 7, wherein the match identification module is specifically configured to: