CN113158902B - Knowledge distillation-based method for automatically training recognition model - Google Patents

Knowledge distillation-based method for automatically training recognition model

Info

Publication number
CN113158902B
CN113158902B CN202110439569.8A
Authority
CN
China
Prior art keywords
model
data
calculation amount
training
small calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110439569.8A
Other languages
Chinese (zh)
Other versions
CN113158902A (en)
Inventor
朱鑫懿
魏文应
张世雄
龙仕强
陈智敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202110439569.8A priority Critical patent/CN113158902B/en
Publication of CN113158902A publication Critical patent/CN113158902A/en
Application granted granted Critical
Publication of CN113158902B publication Critical patent/CN113158902B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A method for automatically training a recognition model based on knowledge distillation, comprising: s1: the terminal devices upload the collected face image data to a server; s2: the server extracts features from the collected face images using a large-computation model and generates corresponding soft-label data; s3: the soft-label data and manually annotated hard-label data are mixed, and a small-computation model is trained on the mixed data; and s4: after the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device. This knowledge distillation-based method for improving a face recognition model enables automatic data collection, data labeling and model training, improving efficiency and saving labor cost.

Description

Knowledge distillation-based method for automatically training recognition model
Technical Field
The invention relates to the technical field of image recognition, in particular to a knowledge distillation-based method for automatically training a recognition model.
Background
With the development of artificial intelligence and computing hardware, deep-learning-based image recognition is widely applied in many fields. Face recognition, for example, is one of the most successful and mature applications of computer vision, used in scenarios such as face unlocking on mobile phones, face-based attendance in companies, face-verification gates in civil aviation, and face-scan payment in shopping malls. Driven by deep learning and the massive data of the big-data era, face recognition now surpasses traditional face recognition algorithms.
To obtain higher recognition accuracy, a large-computation model is usually chosen; its inference is correspondingly slow, real-time operation on terminal devices is hard to achieve, and meeting the real-time requirement on the server side is costly. On terminal devices, a small-computation model is therefore generally used to satisfy the real-time requirement, but its accuracy is lower than that of a large-computation model. Compared with other image recognition tasks, face recognition needs far more data, on the order of tens of millions of samples and in some cases more than one hundred million. In conventional training, the labels of the face data are hard labels: the massive data must be cleaned and annotated manually at high cost, so obtaining hard labels consumes a large amount of labor and time. Moreover, once the amount of face data grows beyond a certain point, annotators themselves struggle to distinguish different faces and are more likely to label incorrectly, making high-quality annotated data even harder to obtain.
Disclosure of Invention
The invention provides a method for automatically training a recognition model based on knowledge distillation, and in particular a knowledge distillation-based method for improving a face recognition model.
The technical scheme of the invention is as follows:
a method for automatically training a recognition model based on knowledge distillation, comprising the steps of: s1: the terminal equipment uploads the collected face image data to a server; s2: the server side extracts the collected face features by using a model with large calculation amount and generates corresponding soft label data; s3: mixing soft tag data and manually marked hard tag data, and training a model with small calculated amount by using the mixed data; and s4: after the model with small calculation amount is trained, selecting an optimal model with small calculation amount and updating the optimal model to each terminal device.
Preferably, in the above method, in step s1, the terminal device collects face images, stores them on its local storage device, and uploads the data to the server over the network at night, when the device is not in use.
Preferably, in the above method, in step s2, after the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data.
Preferably, in the above method, in step s3, the small-computation model is trained on the server using a combination of a knowledge distillation method and a general method: the knowledge distillation method trains the small-computation model with the soft labels of the images extracted by the large-computation model, and the general method trains the small-computation model with the high-quality annotated data.
Preferably, in the above method, in step s3, for the i-th sample in the soft-label data, the face feature S_i is extracted by the small-computation model and the loss value is computed from the cosine similarity, as shown in expression (1):
where cos<S_i, T_i> is the cosine similarity between the feature S_i and the soft label T_i, and M is the number of samples; for the hard-label data the general method is used, i.e. a Softmax-based loss, as shown in expression (2):
where w_j is the mean of the features of the j-th class learned by the last layer of the small-computation model.
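Expressions (1) and (2) are rendered as images in the original publication and are not reproduced in this text. A plausible reconstruction, consistent with the definitions above but stated here as an assumption rather than as the patent's exact formulas, is:

L_soft = \frac{1}{M} \sum_{i=1}^{M} \left( 1 - \cos\langle S_i, T_i \rangle \right)    (1)

L_hard = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{w_{y_i}^{\top} S_i}}{\sum_{j=1}^{n} e^{w_j^{\top} S_i}}    (2)

where S_i is the feature extracted by the small-computation model for the i-th sample; the two losses would typically be combined as a (possibly weighted) sum.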
Preferably, in the above method, after the training of the small-computation model is completed in step s4, the small-computation model is tested on the server, and the optimal small-computation model is selected and pushed to each terminal.
The technical scheme of the invention has the following beneficial effects:
The method uses the large-computation model to improve the accuracy of the small-computation model, so that the small-computation model on the terminal device runs in real time with accuracy close to that of the large-computation model; and the data labeling in the method requires no manual annotation, which reduces labor cost and labeling difficulty.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow diagram of the method for automatically training a recognition model based on knowledge distillation according to the present invention.
FIG. 2 is a flow chart of the steps of the knowledge distillation-based method for automatically training a recognition model according to the present invention.
Detailed Description
The invention relates to a method for automatically training a recognition model based on knowledge distillation. Here, distillation means using a large-computation model (the "large model" for short) to assist the training of a small-computation model (the "small model" for short) so as to improve the recognition performance of the small model; concretely, the large model generates soft labels that are used to train the small model (the counterpart of a soft label, a manually annotated label, is conventionally called a hard label, while a label generated by a model is a soft label). Starting from the performance gap between the complex large-computation model and the small-computation model, the method uses the large model to raise the face recognition accuracy of the small model, and at the same time enables automatic data collection, data labeling and model training, reducing the labor and material cost of data cleaning.
The principle of the invention is as follows: 1) A knowledge distillation method is adopted: the large-computation model extracts face features from the data, and the small-computation model learns to reproduce those features, so as to approach the performance of the large model. 2) The face features extracted by the large model serve as soft labels; unlike ordinary hard labels, soft labels require no additional manual annotation, saving a large amount of labor cost. 3) The terminal devices collect face image data and upload it to the server in batches at scheduled times; the server extracts face features to generate soft labels and performs training, and the trained small-computation model is distributed to each terminal, so that collection and training are fully automated.
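As an illustration of the two label types contrasted above, the following minimal Python sketch shows the data a training sample might carry under this scheme; the field names and the 512-dimensional feature size are assumptions for illustration only (the patent mentions 128 to 1024 dimensions) and are not part of the patented method.

from dataclasses import dataclass
import numpy as np

@dataclass
class HardLabeledSample:
    image_path: str
    person_id: int                # hard label y_i, assigned by a human annotator

@dataclass
class SoftLabeledSample:
    image_path: str
    teacher_feature: np.ndarray   # soft label T_i: feature vector produced by the large model

# Hypothetical examples: a hard label is a single class index,
# while a soft label is a high-dimensional feature vector (here 512-d).
hard = HardLabeledSample("img_0001.jpg", person_id=42)
soft = SoftLabeledSample("img_0002.jpg", teacher_feature=np.random.randn(512).astype(np.float32))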
The method for automatically training a recognition model based on knowledge distillation provided by the invention comprises four parts: the terminals upload face images to the server to provide training data; the server uses the large-computation model to generate soft labels corresponding to the data, which assist the training of the small-computation model; the small-computation model is trained using the collected face images with their soft labels together with the annotated face images with their hard labels; and the trained small-computation model is pushed to each terminal. From data collection to knowledge distillation-based model training and updating, the method comprises the following steps, as shown in FIG. 1 and FIG. 2:
s1: The terminal devices upload the collected face image data to the server. Specifically, each terminal device collects face images, stores them on its local storage, and uploads the data to the server over the network at a specific time, for example at night when nobody is using the device, to improve transmission efficiency. Terminal devices are generally low-cost hardware on which only a small-computation model can meet the real-time requirement.
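A minimal sketch of the terminal-side collection and nightly upload described in step s1, assuming a hypothetical HTTP endpoint on the server, a hypothetical local image directory, and a 2-4 am upload window; the patent does not prescribe a transport protocol or schedule, so this is illustrative only.

import glob, os, time
from datetime import datetime
import requests  # assumed to be available on the terminal device

SERVER_URL = "http://training-server.local/upload"   # hypothetical server endpoint
LOCAL_DIR = "/data/captured_faces"                    # hypothetical local storage path

def upload_pending_images():
    # Push every locally stored capture to the server, deleting each file on success.
    for path in glob.glob(os.path.join(LOCAL_DIR, "*.jpg")):
        with open(path, "rb") as f:
            resp = requests.post(SERVER_URL, files={"image": f}, timeout=30)
        if resp.status_code == 200:
            os.remove(path)

while True:
    # Upload only in the quiet hours so daytime recognition on the device is unaffected.
    if 2 <= datetime.now().hour < 4:
        upload_pending_images()
    time.sleep(600)  # re-check every 10 minutes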
s2: The server extracts features from the collected face images using the large-computation model and generates the corresponding soft-label data (i.e., the "large model generates soft labels" step shown in FIG. 1). Specifically, once the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of those images, using a large-computation model trained on high-quality annotated data. The soft label of the i-th sample is denoted T_i, where T_i is typically high-dimensional (128 to 1024 dimensions). In addition, annotated high-quality face data is prepared; the hard label of the i-th sample in this data is denoted y_i, with range [0, n], where n is the number of distinct identities in the dataset.
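A sketch of the server-side soft-label generation in step s2 using PyTorch, assuming a pretrained large model `teacher` that maps a batch of face crops to feature vectors; the L2 normalization and the feature size are illustrative assumptions rather than requirements of the patent.

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_soft_labels(teacher, images):
    # images: float tensor of shape (batch, 3, H, W), already face-cropped and normalized.
    teacher.eval()
    features = teacher(images)              # (batch, d), e.g. d = 512
    return F.normalize(features, dim=1)     # store these vectors as the soft labels T_i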
s3: the soft tag data and the artificially labeled hard tag data are mixed, and a computationally small model (i.e., the small model training shown in fig. 1) is trained using the mixed data. Specifically, the model with small calculation amount is trained on the server side by using a combination of a knowledge distillation method and a general method, the knowledge distillation method is used for training the model with small calculation amount through soft labels of images extracted from the model with large calculation amount, and the general method is used for training the model with small calculation amount through high-quality labeling data. For the ith sample in the soft label data, extracting the face characteristics of the sample by using a model with small calculation amount, and calculating the loss value of the sample by using cosine similarity, wherein the expression is as shown in (1):
where < > is the cosine similarity of the feature sums and M is the number of samples. For hard tag data, the general method, namely Softmax-based loss value, is used, and the expression is shown in (2):
wherein w is j The mean value of the features of the j-th class learned for the last layer of the model with small calculation amount. Training is carried out by mixing the acquired data and the high-quality labeling data, and the accuracy and generalization performance of the model with small calculation amount are improved.
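A PyTorch sketch of the mixed training step in s3, assuming the reconstructed forms of expressions (1) and (2) given earlier; `student` is the small-computation model, `class_means` plays the role of the w_j vectors, and the equal weighting of the two losses is an assumption, not something the patent specifies.

import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats):
    # Expression (1): average of (1 - cosine similarity) between student features and soft labels.
    return (1.0 - F.cosine_similarity(student_feats, teacher_feats, dim=1)).mean()

def hard_label_loss(student_feats, class_means, labels):
    # Expression (2): Softmax cross-entropy over logits w_j^T S_i, with w_j the class means.
    logits = student_feats @ class_means.t()        # (batch, n_classes)
    return F.cross_entropy(logits, labels)

def training_step(student, class_means, optimizer, soft_batch, hard_batch):
    # class_means: learnable (n_classes, d) tensor, e.g. an nn.Parameter included in the optimizer.
    soft_imgs, teacher_feats = soft_batch           # soft labels T_i from the large model
    hard_imgs, labels = hard_batch                  # hard labels y_i from human annotation
    loss = distillation_loss(student(soft_imgs), teacher_feats) \
         + hard_label_loss(student(hard_imgs), class_means, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()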
s4: After the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device. Specifically, once training is complete, the candidate small-computation models are tested on the server, and the best one is selected and deployed to each terminal.
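A sketch of the server-side selection in step s4, assuming candidate checkpoints are scored by a hypothetical evaluate_tar_at_far(checkpoint, far) helper that runs the checkpoint on a held-out verification set and returns TAR at the given FAR (the score-level computation is sketched after the TAR/FAR discussion below), and that terminals pull the published model file from a shared directory; the paths and helper are assumptions for illustration.

import shutil
from pathlib import Path

def select_and_publish(checkpoints, evaluate_tar_at_far, publish_dir="/srv/models"):
    # Score each trained small-model checkpoint and publish the best one for the terminals.
    best_path, best_score = None, -1.0
    for ckpt in checkpoints:                       # e.g. ["epoch_10.pt", "epoch_20.pt", ...]
        score = evaluate_tar_at_far(ckpt, far=1e-6)
        if score > best_score:
            best_path, best_score = ckpt, score
    shutil.copy(best_path, Path(publish_dir) / "small_model_latest.pt")
    return best_path, best_score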
The method is evaluated with the TAR-FAR protocol. TAR (True Acceptance Rate) is the proportion of faces that should be accepted and actually are accepted, i.e. the pass rate; the higher it is, the better the model performs. FAR (False Acceptance Rate) is the proportion of faces that should be rejected but are accepted, i.e. the false acceptance rate; the lower it is, the better the model performs. The pass rate at a given false acceptance rate is used as the metric; for example, "TAR@FAR=1e-6" denotes the pass rate at a FAR of 1e-6 (one false acceptance per million). The test results of the inventive method on the same test data are shown in Table 1 below:
table 1: test results of the inventive method on test data
As can be seen from Table 1, the small-computation model trained with the distillation method provided by the invention achieves a clear accuracy improvement over the small-computation model trained with the general method, and comes close to the large-computation model.
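A sketch of how the TAR@FAR metric described above can be computed from verification scores, assuming arrays of cosine similarities for genuine (same-identity) and impostor (different-identity) pairs; this evaluation code is illustrative and not taken from the patent.

import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far=1e-6):
    # Pick the acceptance threshold so that roughly `far` of the impostor pairs are accepted,
    # then report the fraction of genuine pairs whose similarity clears that threshold.
    impostor_sorted = np.sort(impostor_scores)[::-1]
    k = max(int(far * len(impostor_sorted)), 1)
    threshold = impostor_sorted[k - 1]
    return float(np.mean(genuine_scores >= threshold))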
The above description is of the best mode of carrying out the conception and the working principle of the present invention. The above examples should not be construed as limiting the scope of the claims, but other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (3)

1. A method for automatically training an identification model based on knowledge distillation, comprising the steps of:
s1: the terminal devices upload the collected face image data to a server;
s2: the server extracts features from the collected face images using a large-computation model and generates corresponding soft-label data,
specifically, after the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data;
the soft label of the i-th sample is denoted T_i, where T_i is high-dimensional data; annotated high-quality face data is prepared, in which the hard label of the i-th sample is denoted y_i, with range [0, n], where n is the number of distinct identities in the dataset;
s3: the soft-label data is mixed with manually annotated hard-label data, and a small-computation model is trained on the mixed data,
wherein the small-computation model is trained on the server using a combination of a knowledge distillation method and a general method, the knowledge distillation method training the small-computation model with the soft labels of the images extracted by the large-computation model, and the general method training the small-computation model with the high-quality annotated data,
for the i-th sample in the soft-label data, the face feature S_i is extracted by the small-computation model and the loss value is computed from the cosine similarity, as shown in expression (1):
where cos<S_i, T_i> is the cosine similarity between the feature S_i and the soft label T_i, and M is the number of samples; for the hard-label data the general method is used, i.e. a Softmax-based loss, as shown in expression (2):
where w_j is the mean of the features of the j-th class learned by the last layer of the small-computation model; and
s4: after the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device.
2. The method according to claim 1, wherein in step s1, the terminal device collects face images, stores them on a local storage device, and uploads the data to the server over the network at night, when the device is not in use.
3. The method according to claim 1, wherein in step s4, after the training of the small-computation model is completed, the small-computation model is tested on the server, and the optimal small-computation model is selected and pushed to each terminal.
CN202110439569.8A 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model Active CN113158902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110439569.8A CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110439569.8A CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Publications (2)

Publication Number Publication Date
CN113158902A CN113158902A (en) 2021-07-23
CN113158902B (en) 2023-08-11

Family

ID=76869735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110439569.8A Active CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Country Status (1)

Country Link
CN (1) CN113158902B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190521A (en) * 2018-08-17 2019-01-11 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model of knowledge based purification and application
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN110223281A (en) * 2019-06-06 2019-09-10 东北大学 A kind of Lung neoplasm image classification method when in data set containing uncertain data
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN112183670A (en) * 2020-11-05 2021-01-05 南开大学 Knowledge distillation-based few-sample false news detection method
CN112329617A (en) * 2020-11-04 2021-02-05 中国科学院自动化研究所 New scene face recognition model construction method and system based on single source domain sample
CN112686046A (en) * 2021-01-06 2021-04-20 上海明略人工智能(集团)有限公司 Model training method, device, equipment and computer readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11410029B2 (en) * 2018-01-02 2022-08-09 International Business Machines Corporation Soft label generation for knowledge distillation

Also Published As

Publication number Publication date
CN113158902A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant