CN113158902B - Knowledge distillation-based method for automatically training recognition model - Google Patents
- Publication number
- CN113158902B (application CN202110439569.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- data
- calculation amount
- training
- small calculation
- Prior art date
- Legal status (an assumption, not a legal conclusion; no legal analysis has been performed): Active
Classifications
- G06V 40/161: human faces; detection, localisation, normalisation
- G06V 40/168: human faces; feature extraction, face representation
- G06F 18/214: pattern recognition; generating training patterns, bootstrap methods (e.g. bagging or boosting)
- G06F 18/22: pattern recognition; matching criteria (e.g. proximity measures)
- G06N 3/045: neural networks; combinations of networks
- G06N 3/047: neural networks; probabilistic or stochastic networks
- G06N 3/08: neural networks; learning methods
- Y02D 10/00: energy efficient computing (e.g. low power processors, power management or thermal management)
Abstract
A method for automatically training a recognition model based on knowledge distillation, comprising: s1: terminal devices upload collected face image data to a server; s2: the server extracts features from the collected faces using a large-computation model and generates corresponding soft-label data; s3: the soft-label data is mixed with manually annotated hard-label data, and a small-computation model is trained on the mixed data; and s4: after the small-computation model is trained, the best small-computation model is selected and pushed to each terminal device. This knowledge-distillation-based method for improving a face recognition model automates data collection, data annotation and model training, improving efficiency and saving labor cost.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a knowledge distillation-based method for automatically training a recognition model.
Background
With the development of artificial intelligence and computing hardware, image recognition based on deep learning is widely applied across many fields. Face recognition, one of the most successful and mature applications of computer vision, is used in scenarios such as face unlocking on mobile phones, face-based attendance in companies, face-verified gate access, and face-based payment in shopping malls. Deep learning and the massive data of the big-data era have enabled face recognition to surpass traditional face recognition algorithms.
To obtain better recognition accuracy, a large-computation model is usually chosen; its inference is correspondingly slow, making real-time operation hard to achieve on terminal devices and expensive on the server side. On terminal devices, a small-computation model is typically used to meet the real-time requirement, but its accuracy is lower than that of the large-computation model. Compared with other image recognition tasks, face recognition needs far more data, on the order of tens of millions of samples and sometimes over a hundred million. Conventional model training uses hard labels: the massive face data must be cleaned and annotated at high labor cost, so obtaining hard labels consumes large amounts of labor and time. Once the data volume grows past a certain point, even annotators struggle to distinguish different faces, labeling errors become more likely, and obtaining high-quality annotated data becomes harder still.
Disclosure of Invention
The invention provides a knowledge-distillation-based method for automatically training a recognition model, that is, a method for improving a face recognition model through knowledge distillation.
The technical scheme of the invention is as follows:
a method for automatically training a recognition model based on knowledge distillation, comprising the steps of: s1: the terminal equipment uploads the collected face image data to a server; s2: the server side extracts the collected face features by using a model with large calculation amount and generates corresponding soft label data; s3: mixing soft tag data and manually marked hard tag data, and training a model with small calculated amount by using the mixed data; and s4: after the model with small calculation amount is trained, selecting an optimal model with small calculation amount and updating the optimal model to each terminal device.
Preferably, in the above method, in step s1, the terminal device collects face images, stores them on its local storage device, and uploads the data to the server over the network at night, when no one is using the device.
Preferably, in the above method, in step s2, once the number of received images reaches a certain threshold, the server extracts face features as soft-label data for the face images, using a large-computation model trained on high-quality annotated data.
Preferably, in the above method, in step s3, the small-computation model is trained on the server with a combination of the knowledge distillation method and the general method: the knowledge distillation method trains the small-computation model on soft labels of images extracted by the large-computation model, and the general method trains it on high-quality annotated data.
Preferably, in the above method, in step s3, for the i-th sample in the soft-label data, the small-computation model extracts the face feature S_i, and the loss is computed from the cosine similarity with the soft label T_i, as shown in (1):

$$L_1 = \frac{1}{M}\sum_{i=1}^{M}\left(1 - \cos\langle S_i, T_i\rangle\right) \tag{1}$$

where cos⟨S_i, T_i⟩ is the cosine similarity of features S_i and T_i, and M is the number of samples. For hard-label data, the general method uses a Softmax-based loss, as shown in (2):

$$L_2 = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{w_{y_i}^{\top} S_i}}{\sum_{j=1}^{n} e^{w_j^{\top} S_i}} \tag{2}$$

where w_j is the mean feature of the j-th class learned by the last layer of the small-computation model.
Preferably, in the above method, after the small-computation model has been trained in step s4, it is tested on the server, and the best small-computation model is selected and pushed to each terminal.
The technical scheme of the invention has the following beneficial effects:
The method uses the large-computation model to improve the accuracy of the small-computation model, so that the small-computation model on the terminal device runs in real time with accuracy close to that of the large-computation model. The data labeling in the method requires no manual annotation, reducing labor cost and labeling difficulty.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow diagram of a method of automatically training an identification model based on knowledge distillation in accordance with the present invention.
FIG. 2 is a flow chart of the steps of the knowledge-distillation-based method of automatically training a recognition model according to the present invention.
Detailed Description
The invention relates to a knowledge-distillation-based method for automatically training a recognition model. Here, distillation means using a model with large calculation amount (hereafter "large model") to assist in training a model with small calculation amount (hereafter "small model"), thereby improving the small model's recognition performance; concretely, the large model generates soft labels on which the small model is trained (the corresponding hard labels are the manually annotated labels, while labels generated by a model are called soft labels). Starting from the performance gap between the complex large model and the small model, the method improves face recognition accuracy by combining the two, while enabling automatic data collection, data labeling and model training, and reducing the labor and material cost of data cleaning.
The principles of the invention are as follows:
1) A knowledge distillation method is adopted: the large-computation model extracts face features from the data, and the small-computation model learns to reproduce those features, so as to approach the large model's performance.
2) The face features extracted by the large-computation model serve as soft labels; compared with ordinary hard labels, soft labels need no additional manual annotation, saving a large amount of labor cost.
3) The terminal devices collect face image data and upload it to the server in batches at scheduled times; the server extracts face features to generate soft labels and trains; the trained small-computation model is then distributed to each terminal, achieving automatic collection and training.
The method for automatically training the recognition model based on knowledge distillation provided by the invention comprises four parts: the terminals upload face images to the server to provide training data; on the server, the large-computation model generates soft labels for the data, which assist the training of the small-computation model; the small-computation model is trained with the collected face images and their soft labels together with the annotated face images and their hard labels; and the trained small-computation model is pushed to each terminal. From data collection through knowledge-distillation-based model training to model updating, the method comprises the following steps, as shown in figs. 1 and 2:
s1: The terminal devices upload the collected face image data to the server. Specifically, each terminal device collects face images, stores them on its local storage device, and uploads the data to the server over the network at a specific time, for example at night when no one is present, to improve transmission efficiency. Terminal devices are generally low-cost hardware, on which only a small-computation model can meet the real-time requirement.
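Step s1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the window hours, the `*.jpg` naming, and the `upload_fn` callable (which transfers one file and returns True on success) are all assumptions introduced here.

```python
import datetime
import pathlib

def maybe_upload(local_dir, upload_fn, start_hour=1, end_hour=5, now=None):
    """Upload locally stored face images only inside a low-traffic
    night window, deleting each file once the server confirms it.
    `upload_fn` is a hypothetical transfer callable (not from the patent)."""
    now = now or datetime.datetime.now()
    if not (start_hour <= now.hour < end_hour):
        return 0  # outside the window: keep images on local storage
    uploaded = 0
    for path in sorted(pathlib.Path(local_dir).glob("*.jpg")):
        if upload_fn(path):
            path.unlink()  # free local storage after a confirmed upload
            uploaded += 1
    return uploaded
```

In practice the terminal would batch files and retry failed transfers; the sketch only shows the time-window gating the patent describes.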
s2: The server extracts features from the collected faces using the large-computation model and generates the corresponding soft-label data (the large model in fig. 1 generates the soft labels). Specifically, once the number of received images reaches a certain threshold, the server extracts face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data. The soft label of the i-th sample is denoted T_i, where T_i is typically high-dimensional data (128 to 1024 dimensions). Annotated high-quality face data is also prepared; the hard label of the i-th sample in this data is denoted y_i, with range [0, n], where n is the number of different people in the dataset.
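Soft-label generation in step s2 can be sketched as below. The `large_model.predict` interface returning face embeddings is an assumption for illustration; the patent only specifies that the large model extracts 128- to 1024-dimensional features.

```python
import numpy as np

def generate_soft_labels(large_model, images, batch_size=64):
    """Run the large (teacher) model over collected face images and
    return the extracted feature vectors as soft labels T_i."""
    soft_labels = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        feats = large_model.predict(np.asarray(batch))
        # L2-normalize so cosine similarity later reduces to a dot product.
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        soft_labels.append(feats)
    return np.concatenate(soft_labels, axis=0)
```

Normalizing at generation time is a design choice here, not mandated by the patent; it keeps the distillation loss purely a dot product over unit vectors.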
s3: the soft tag data and the artificially labeled hard tag data are mixed, and a computationally small model (i.e., the small model training shown in fig. 1) is trained using the mixed data. Specifically, the model with small calculation amount is trained on the server side by using a combination of a knowledge distillation method and a general method, the knowledge distillation method is used for training the model with small calculation amount through soft labels of images extracted from the model with large calculation amount, and the general method is used for training the model with small calculation amount through high-quality labeling data. For the ith sample in the soft label data, extracting the face characteristics of the sample by using a model with small calculation amount, and calculating the loss value of the sample by using cosine similarity, wherein the expression is as shown in (1):
where < > is the cosine similarity of the feature sums and M is the number of samples. For hard tag data, the general method, namely Softmax-based loss value, is used, and the expression is shown in (2):
wherein w is j The mean value of the features of the j-th class learned for the last layer of the model with small calculation amount. Training is carried out by mixing the acquired data and the high-quality labeling data, and the accuracy and generalization performance of the model with small calculation amount are improved.
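The two step-s3 losses can be sketched in NumPy as follows. The array shapes and the idea of simply summing the two losses over a mixed batch are assumptions; the patent states only that the losses are combined during mixed training.

```python
import numpy as np

def distillation_loss(student_feats, soft_labels):
    """Loss (1): mean (1 - cosine similarity) between the small model's
    features S_i and the large model's soft labels T_i."""
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = soft_labels / np.linalg.norm(soft_labels, axis=1, keepdims=True)
    cos = np.sum(s * t, axis=1)  # row-wise cosine similarity
    return float(np.mean(1.0 - cos))

def softmax_loss(student_feats, hard_labels, class_weights):
    """Loss (2): Softmax cross-entropy, where each row w_j of
    class_weights plays the role of the j-th class's mean feature."""
    logits = student_feats @ class_weights.T           # shape (M, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(hard_labels)), hard_labels]))
```

A mixed training step would then minimize, e.g., `distillation_loss(...) + softmax_loss(...)`; any relative weighting of the two terms is a tuning choice not specified in the source.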
s4: after the model with small calculation amount is trained, selecting an optimal model with small calculation amount and updating the optimal model to each terminal device. Specifically, after the model with small calculation amount is trained, the model with small calculation amount is tested on the server, and the optimal model with small calculation amount is selected and updated on each terminal.
The method adopts the TAR-FAR evaluation protocol. TAR (True Acceptance Rate) is the fraction of genuine faces that actually pass; the higher, the better the model performs. FAR (False Acceptance Rate) is the fraction of impostor faces that pass, i.e., the false-accept rate; the lower, the better. The pass rate at a given false-accept rate is used as the metric; for example, "TAR@FAR=1e-6" denotes the pass rate when the false-accept rate is 1e-6 (one in a million). The test results of the inventive method on the same test data are shown in table 1 below:
table 1: test results of the inventive method on test data
As table 1 shows, the small-computation model trained with the proposed distillation method achieves markedly higher accuracy than the small-computation model trained with the general method alone, and approaches the accuracy of the large-computation model.
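The TAR@FAR metric used above can be sketched as follows. This threshold-selection scheme (pick the similarity threshold from sorted impostor scores, then measure the genuine pass rate) is one common way to compute it, assumed here rather than taken from the patent.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far_target=1e-6):
    """Return (TAR, threshold): the genuine pass rate at the threshold
    where at most `far_target` of impostor pairs would pass."""
    impostor = np.sort(np.asarray(impostor_scores))[::-1]  # descending
    # Highest threshold admitting at most far_target of impostors.
    k = max(int(np.floor(far_target * len(impostor))), 1)
    threshold = impostor[k - 1]
    tar = np.mean(np.asarray(genuine_scores) > threshold)
    return float(tar), float(threshold)
```

Meaningful estimates at FAR=1e-6 require millions of impostor pairs; with fewer pairs the threshold is dominated by the single highest impostor score.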
The above describes the best mode of carrying out the conception and working principle of the present invention. The above examples should not be construed as limiting the scope of the claims; other embodiments and combinations of implementations according to the inventive concept fall within the scope of the invention.
Claims (3)
1. A method for automatically training an identification model based on knowledge distillation, comprising the steps of:
s1: the terminal equipment uploads the collected face image data to a server;
s2: the server extracts features from the collected faces using a large-computation model and generates corresponding soft-label data,
specifically, once the number of received images reaches a certain threshold, the server extracts face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data;
the soft label of the i-th sample is denoted T_i, where T_i is high-dimensional data; annotated high-quality face data is prepared, in which the hard label of the i-th sample is denoted y_i, with range [0, n], where n is the number of different people in the dataset;
s3: the soft-label data is mixed with manually annotated hard-label data, and a small-computation model is trained on the mixed data,
wherein the small-computation model is trained on the server with a combination of the knowledge distillation method and the general method, the knowledge distillation method training the small-computation model on soft labels of images extracted by the large-computation model, and the general method training it on high-quality annotated data,
and wherein, for the i-th sample in the soft-label data, the small-computation model extracts the face feature S_i, and the loss is computed from the cosine similarity, as shown in (1):

$$L_1 = \frac{1}{M}\sum_{i=1}^{M}\left(1 - \cos\langle S_i, T_i\rangle\right) \tag{1}$$

where cos⟨S_i, T_i⟩ is the cosine similarity of features S_i and T_i, and M is the number of samples; for hard-label data, the general method uses a Softmax-based loss, as shown in (2):

$$L_2 = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{w_{y_i}^{\top} S_i}}{\sum_{j=1}^{n} e^{w_j^{\top} S_i}} \tag{2}$$

where w_j is the mean feature of the j-th class learned by the last layer of the small-computation model; and
s4: after the small-computation model is trained, the best small-computation model is selected and pushed to each terminal device.
2. The method according to claim 1, wherein in step s1, the terminal device collects face images, stores them on a local storage device, and uploads the data to the server over a network at night, when no one is using the device.
3. The method according to claim 1, wherein in step s4, after the small-computation model has been trained, it is tested on the server side, and the best small-computation model is selected and updated onto each terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110439569.8A CN113158902B (en) | 2021-04-23 | 2021-04-23 | Knowledge distillation-based method for automatically training recognition model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113158902A CN113158902A (en) | 2021-07-23 |
CN113158902B true CN113158902B (en) | 2023-08-11 |
Family
ID=76869735
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110439569.8A Active CN113158902B (en) | 2021-04-23 | 2021-04-23 | Knowledge distillation-based method for automatically training recognition model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113158902B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190521A (en) * | 2018-08-17 | 2019-01-11 | 北京亮亮视野科技有限公司 | A kind of construction method of the human face recognition model of knowledge based purification and application |
CN109214360A (en) * | 2018-10-15 | 2019-01-15 | 北京亮亮视野科技有限公司 | A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application |
CN110223281A (en) * | 2019-06-06 | 2019-09-10 | 东北大学 | A kind of Lung neoplasm image classification method when in data set containing uncertain data |
CN110674880A (en) * | 2019-09-27 | 2020-01-10 | 北京迈格威科技有限公司 | Network training method, device, medium and electronic equipment for knowledge distillation |
CN112183670A (en) * | 2020-11-05 | 2021-01-05 | 南开大学 | Knowledge distillation-based few-sample false news detection method |
CN112329617A (en) * | 2020-11-04 | 2021-02-05 | 中国科学院自动化研究所 | New scene face recognition model construction method and system based on single source domain sample |
CN112686046A (en) * | 2021-01-06 | 2021-04-20 | 上海明略人工智能(集团)有限公司 | Model training method, device, equipment and computer readable medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11410029B2 (en) * | 2018-01-02 | 2022-08-09 | International Business Machines Corporation | Soft label generation for knowledge distillation |
- 2021-04-23: application CN202110439569.8A filed in China (later granted as CN113158902B, status active)
Also Published As
Publication number | Publication date |
---|---|
CN113158902A (en) | 2021-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105022835B (en) | A kind of intelligent perception big data public safety recognition methods and system | |
CN110457677B (en) | Entity relationship identification method and device, storage medium and computer equipment | |
CN110675421B (en) | Depth image collaborative segmentation method based on few labeling frames | |
CN108427713A (en) | A kind of video summarization method and system for homemade video | |
CN104636755A (en) | Face beauty evaluation method based on deep learning | |
CN109635726B (en) | Landslide identification method based on combination of symmetric deep network and multi-scale pooling | |
CN106649663A (en) | Video copy detection method based on compact video representation | |
CN112749663B (en) | Agricultural fruit maturity detection system based on Internet of things and CCNN model | |
CN111507413A (en) | City management case image recognition method based on dictionary learning | |
CN106250925B (en) | A kind of zero Sample video classification method based on improved canonical correlation analysis | |
CN113705383A (en) | Cross-age face recognition method and system based on ternary constraint | |
CN111291705B (en) | Pedestrian re-identification method crossing multiple target domains | |
CN111461162B (en) | Zero-sample target detection model and establishing method thereof | |
CN115546553A (en) | Zero sample classification method based on dynamic feature extraction and attribute correction | |
CN113657473A (en) | Web service classification method based on transfer learning | |
CN109242039A (en) | It is a kind of based on candidates estimation Unlabeled data utilize method | |
CN113158902B (en) | Knowledge distillation-based method for automatically training recognition model | |
CN111582195B (en) | Construction method of Chinese lip language monosyllabic recognition classifier | |
CN109829887B (en) | Image quality evaluation method based on deep neural network | |
CN116597438A (en) | Improved fruit identification method and system based on Yolov5 | |
CN110377790A (en) | A kind of video automatic marking method based on multi-modal privately owned feature | |
CN112580569B (en) | Vehicle re-identification method and device based on multidimensional features | |
CN115293144A (en) | Method and device for recognizing white characters based on zero sample learning | |
CN113051962B | Pedestrian re-identification method based on twin Margin-Softmax network combined with attention mechanism | |
CN112990892A (en) | Video information acquisition method and image processing system for teaching evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||