CN113158902B - Knowledge distillation-based method for automatically training recognition model - Google Patents

Knowledge distillation-based method for automatically training recognition model

Info

Publication number
CN113158902B
CN113158902B CN202110439569.8A
Authority
CN
China
Prior art keywords
model
data
calculation amount
training
small calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110439569.8A
Other languages
Chinese (zh)
Other versions
CN113158902A (en)
Inventor
朱鑫懿
魏文应
张世雄
龙仕强
陈智敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202110439569.8A priority Critical patent/CN113158902B/en
Publication of CN113158902A publication Critical patent/CN113158902A/en
Application granted granted Critical
Publication of CN113158902B publication Critical patent/CN113158902B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A method for automatically training a recognition model based on knowledge distillation, comprising: s1: the terminal devices upload the collected face image data to a server; s2: the server extracts features from the collected face images using a large-computation model and generates corresponding soft-label data; s3: the soft-label data and manually annotated hard-label data are mixed, and a small-computation model is trained on the mixed data; and s4: after the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device. This knowledge distillation-based method for improving a face recognition model enables automatic data collection, data labeling and model training, improving efficiency and saving labor cost.

Description

Knowledge distillation-based method for automatically training recognition model
Technical Field
The invention relates to the technical field of image recognition, in particular to a knowledge distillation-based method for automatically training a recognition model.
Background
With the development of artificial intelligence and computing hardware, deep-learning-based image recognition is widely applied in many fields. Face recognition, for example, is one of the most successful and mature applications of computer vision, used in scenarios such as face unlocking on mobile phones, face-based attendance in companies, face-verification gates in civil aviation, and face-scan payment in shopping malls. Driven by deep learning and the massive data of the big-data era, face recognition now surpasses traditional face recognition algorithms.
To obtain higher recognition accuracy, a large-computation model is usually chosen; its inference is correspondingly slow, real-time operation on terminal devices is hard to achieve, and meeting the real-time requirement on the server side is costly. On terminal devices, a small-computation model is therefore generally used to satisfy the real-time requirement, but its accuracy is lower than that of a large-computation model. Compared with other image recognition tasks, face recognition needs far more data, on the order of tens of millions of samples and in some cases more than one hundred million. In conventional training, the labels of the face data are hard labels: the massive data must be cleaned and annotated manually at high cost, so obtaining hard labels consumes a large amount of labor and time. Moreover, once the amount of face data grows beyond a certain point, annotators themselves struggle to distinguish different faces and are more likely to label incorrectly, making high-quality annotated data even harder to obtain.
Disclosure of Invention
The invention provides a method for automatically training a recognition model based on knowledge distillation, and in particular a knowledge distillation-based method for improving a face recognition model.
The technical scheme of the invention is as follows:
a method for automatically training a recognition model based on knowledge distillation, comprising the steps of: s1: the terminal equipment uploads the collected face image data to a server; s2: the server side extracts the collected face features by using a model with large calculation amount and generates corresponding soft label data; s3: mixing soft tag data and manually marked hard tag data, and training a model with small calculated amount by using the mixed data; and s4: after the model with small calculation amount is trained, selecting an optimal model with small calculation amount and updating the optimal model to each terminal device.
Preferably, in the above method, in step s1, the terminal device collects face images, stores them on its local storage device, and uploads the data to the server over the network at night, when the device is not in use.
Preferably, in the above method, in step s2, after the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data.
Preferably, in the above method, in step s3, the small-computation model is trained on the server using a combination of a knowledge distillation method and a general method: the knowledge distillation method trains the small-computation model with the soft labels of the images extracted by the large-computation model, and the general method trains the small-computation model with the high-quality annotated data.
Preferably, in the above method, in step s3, for the i-th sample in the soft-label data, the face feature S_i is extracted by the small-computation model and the loss value is computed from the cosine similarity, as shown in expression (1):
where cos<S_i, T_i> is the cosine similarity between the feature S_i and the soft label T_i, and M is the number of samples; for the hard-label data the general method is used, i.e. a Softmax-based loss, as shown in expression (2):
where w_j is the mean of the features of the j-th class learned by the last layer of the small-computation model.
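Expressions (1) and (2) are rendered as images in the original publication and are not reproduced in this text. A plausible reconstruction, consistent with the definitions above but stated here as an assumption rather than as the patent's exact formulas, is:

L_soft = \frac{1}{M} \sum_{i=1}^{M} \left( 1 - \cos\langle S_i, T_i \rangle \right)    (1)

L_hard = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{w_{y_i}^{\top} S_i}}{\sum_{j=1}^{n} e^{w_j^{\top} S_i}}    (2)

where S_i is the feature extracted by the small-computation model for the i-th sample; the two losses would typically be combined as a (possibly weighted) sum.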
Preferably, in the above method, after the training of the small-computation model is completed in step s4, the small-computation model is tested on the server, and the optimal small-computation model is selected and pushed to each terminal.
The technical scheme of the invention has the following beneficial effects:
The method uses the large-computation model to improve the accuracy of the small-computation model, so that the small-computation model on the terminal device runs in real time with accuracy close to that of the large-computation model; and the data labeling in the method requires no manual annotation, which reduces labor cost and labeling difficulty.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow diagram of the method for automatically training a recognition model based on knowledge distillation according to the present invention.
FIG. 2 is a flow chart of the steps of the knowledge distillation-based method for automatically training a recognition model according to the present invention.
Detailed Description
The invention relates to a method for automatically training a recognition model based on knowledge distillation. Here, distillation means using a large-computation model (the "large model" for short) to assist the training of a small-computation model (the "small model" for short) so as to improve the recognition performance of the small model; concretely, the large model generates soft labels that are used to train the small model (the counterpart of a soft label, a manually annotated label, is conventionally called a hard label, while a label generated by a model is a soft label). Starting from the performance gap between the complex large-computation model and the small-computation model, the method uses the large model to raise the face recognition accuracy of the small model, and at the same time enables automatic data collection, data labeling and model training, reducing the labor and material cost of data cleaning.
The principle of the invention is as follows: 1) A knowledge distillation method is adopted: the large-computation model extracts face features from the data, and the small-computation model learns to reproduce those features, so as to approach the performance of the large model. 2) The face features extracted by the large model serve as soft labels; unlike ordinary hard labels, soft labels require no additional manual annotation, saving a large amount of labor cost. 3) The terminal devices collect face image data and upload it to the server in batches at scheduled times; the server extracts face features to generate soft labels and performs training, and the trained small-computation model is distributed to each terminal, so that collection and training are fully automated.
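As an illustration of the two label types contrasted above, the following minimal Python sketch shows the data a training sample might carry under this scheme; the field names and the 512-dimensional feature size are assumptions for illustration only (the patent mentions 128 to 1024 dimensions) and are not part of the patented method.

from dataclasses import dataclass
import numpy as np

@dataclass
class HardLabeledSample:
    image_path: str
    person_id: int                # hard label y_i, assigned by a human annotator

@dataclass
class SoftLabeledSample:
    image_path: str
    teacher_feature: np.ndarray   # soft label T_i: feature vector produced by the large model

# Hypothetical examples: a hard label is a single class index,
# while a soft label is a high-dimensional feature vector (here 512-d).
hard = HardLabeledSample("img_0001.jpg", person_id=42)
soft = SoftLabeledSample("img_0002.jpg", teacher_feature=np.random.randn(512).astype(np.float32))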
The method for automatically training a recognition model based on knowledge distillation provided by the invention comprises four parts: the terminals upload face images to the server to provide training data; the server uses the large-computation model to generate soft labels corresponding to the data, which assist the training of the small-computation model; the small-computation model is trained using the collected face images with their soft labels together with the annotated face images with their hard labels; and the trained small-computation model is pushed to each terminal. From data collection to knowledge distillation-based model training and updating, the method comprises the following steps, as shown in FIG. 1 and FIG. 2:
s1: The terminal devices upload the collected face image data to the server. Specifically, each terminal device collects face images, stores them on its local storage, and uploads the data to the server over the network at a specific time, for example at night when nobody is using the device, to improve transmission efficiency. Terminal devices are generally low-cost hardware on which only a small-computation model can meet the real-time requirement.
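A minimal sketch of the terminal-side collection and nightly upload described in step s1, assuming a hypothetical HTTP endpoint on the server, a hypothetical local image directory, and a 2-4 am upload window; the patent does not prescribe a transport protocol or schedule, so this is illustrative only.

import glob, os, time
from datetime import datetime
import requests  # assumed to be available on the terminal device

SERVER_URL = "http://training-server.local/upload"   # hypothetical server endpoint
LOCAL_DIR = "/data/captured_faces"                    # hypothetical local storage path

def upload_pending_images():
    # Push every locally stored capture to the server, deleting each file on success.
    for path in glob.glob(os.path.join(LOCAL_DIR, "*.jpg")):
        with open(path, "rb") as f:
            resp = requests.post(SERVER_URL, files={"image": f}, timeout=30)
        if resp.status_code == 200:
            os.remove(path)

while True:
    # Upload only in the quiet hours so daytime recognition on the device is unaffected.
    if 2 <= datetime.now().hour < 4:
        upload_pending_images()
    time.sleep(600)  # re-check every 10 minutes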
s2: The server extracts features from the collected face images using the large-computation model and generates the corresponding soft-label data (i.e., the "large model generates soft labels" step shown in FIG. 1). Specifically, once the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of those images, using a large-computation model trained on high-quality annotated data. The soft label of the i-th sample is denoted T_i, where T_i is typically high-dimensional (128 to 1024 dimensions). In addition, annotated high-quality face data is prepared; the hard label of the i-th sample in this data is denoted y_i, with range [0, n], where n is the number of distinct identities in the dataset.
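A sketch of the server-side soft-label generation in step s2 using PyTorch, assuming a pretrained large model `teacher` that maps a batch of face crops to feature vectors; the L2 normalization and the feature size are illustrative assumptions rather than requirements of the patent.

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_soft_labels(teacher, images):
    # images: float tensor of shape (batch, 3, H, W), already face-cropped and normalized.
    teacher.eval()
    features = teacher(images)              # (batch, d), e.g. d = 512
    return F.normalize(features, dim=1)     # store these vectors as the soft labels T_i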
s3: the soft tag data and the artificially labeled hard tag data are mixed, and a computationally small model (i.e., the small model training shown in fig. 1) is trained using the mixed data. Specifically, the model with small calculation amount is trained on the server side by using a combination of a knowledge distillation method and a general method, the knowledge distillation method is used for training the model with small calculation amount through soft labels of images extracted from the model with large calculation amount, and the general method is used for training the model with small calculation amount through high-quality labeling data. For the ith sample in the soft label data, extracting the face characteristics of the sample by using a model with small calculation amount, and calculating the loss value of the sample by using cosine similarity, wherein the expression is as shown in (1):
where < > is the cosine similarity of the feature sums and M is the number of samples. For hard tag data, the general method, namely Softmax-based loss value, is used, and the expression is shown in (2):
wherein w is j The mean value of the features of the j-th class learned for the last layer of the model with small calculation amount. Training is carried out by mixing the acquired data and the high-quality labeling data, and the accuracy and generalization performance of the model with small calculation amount are improved.
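A PyTorch sketch of the mixed training step in s3, assuming the reconstructed forms of expressions (1) and (2) given earlier; `student` is the small-computation model, `class_means` plays the role of the w_j vectors, and the equal weighting of the two losses is an assumption, not something the patent specifies.

import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats):
    # Expression (1): average of (1 - cosine similarity) between student features and soft labels.
    return (1.0 - F.cosine_similarity(student_feats, teacher_feats, dim=1)).mean()

def hard_label_loss(student_feats, class_means, labels):
    # Expression (2): Softmax cross-entropy over logits w_j^T S_i, with w_j the class means.
    logits = student_feats @ class_means.t()        # (batch, n_classes)
    return F.cross_entropy(logits, labels)

def training_step(student, class_means, optimizer, soft_batch, hard_batch):
    # class_means: learnable (n_classes, d) tensor, e.g. an nn.Parameter included in the optimizer.
    soft_imgs, teacher_feats = soft_batch           # soft labels T_i from the large model
    hard_imgs, labels = hard_batch                  # hard labels y_i from human annotation
    loss = distillation_loss(student(soft_imgs), teacher_feats) \
         + hard_label_loss(student(hard_imgs), class_means, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()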
s4: After the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device. Specifically, once training is complete, the candidate small-computation models are tested on the server, and the best one is selected and deployed to each terminal.
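A sketch of the server-side selection in step s4, assuming candidate checkpoints are scored by a hypothetical evaluate_tar_at_far(checkpoint, far) helper that runs the checkpoint on a held-out verification set and returns TAR at the given FAR (the score-level computation is sketched after the TAR/FAR discussion below), and that terminals pull the published model file from a shared directory; the paths and helper are assumptions for illustration.

import shutil
from pathlib import Path

def select_and_publish(checkpoints, evaluate_tar_at_far, publish_dir="/srv/models"):
    # Score each trained small-model checkpoint and publish the best one for the terminals.
    best_path, best_score = None, -1.0
    for ckpt in checkpoints:                       # e.g. ["epoch_10.pt", "epoch_20.pt", ...]
        score = evaluate_tar_at_far(ckpt, far=1e-6)
        if score > best_score:
            best_path, best_score = ckpt, score
    shutil.copy(best_path, Path(publish_dir) / "small_model_latest.pt")
    return best_path, best_score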
The method is evaluated with the TAR-FAR protocol. TAR (True Acceptance Rate) is the proportion of faces that should be accepted and actually are accepted, i.e. the pass rate; the higher it is, the better the model performs. FAR (False Acceptance Rate) is the proportion of faces that should be rejected but are accepted, i.e. the false acceptance rate; the lower it is, the better the model performs. The pass rate at a given false acceptance rate is used as the metric; for example, "TAR@FAR=1e-6" denotes the pass rate at a FAR of 1e-6 (one false acceptance per million). The test results of the inventive method on the same test data are shown in Table 1 below:
table 1: test results of the inventive method on test data
As can be seen from Table 1, the small-computation model trained with the distillation method provided by the invention achieves a clear accuracy improvement over the small-computation model trained with the general method, and comes close to the large-computation model.
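A sketch of how the TAR@FAR metric described above can be computed from verification scores, assuming arrays of cosine similarities for genuine (same-identity) and impostor (different-identity) pairs; this evaluation code is illustrative and not taken from the patent.

import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far=1e-6):
    # Pick the acceptance threshold so that roughly `far` of the impostor pairs are accepted,
    # then report the fraction of genuine pairs whose similarity clears that threshold.
    impostor_sorted = np.sort(impostor_scores)[::-1]
    k = max(int(far * len(impostor_sorted)), 1)
    threshold = impostor_sorted[k - 1]
    return float(np.mean(genuine_scores >= threshold))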
The above description is of the best mode of carrying out the conception and the working principle of the present invention. The above examples should not be construed as limiting the scope of the claims, but other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (3)

1. A method for automatically training an identification model based on knowledge distillation, comprising the steps of:
s1: the terminal devices upload the collected face image data to a server;
s2: the server extracts features from the collected face images using a large-computation model and generates corresponding soft-label data,
specifically, after the number of received images reaches a preset amount, the server extracts the face features as the soft-label data of the face images, using a large-computation model trained on high-quality annotated data;
the soft label of the i-th sample is denoted T_i, where T_i is high-dimensional data; annotated high-quality face data is prepared, in which the hard label of the i-th sample is denoted y_i, with range [0, n], where n is the number of distinct identities in the dataset;
s3: the soft-label data is mixed with manually annotated hard-label data, and a small-computation model is trained on the mixed data,
wherein the small-computation model is trained on the server using a combination of a knowledge distillation method and a general method, the knowledge distillation method training the small-computation model with the soft labels of the images extracted by the large-computation model, and the general method training the small-computation model with the high-quality annotated data,
for the i-th sample in the soft-label data, the face feature S_i is extracted by the small-computation model and the loss value is computed from the cosine similarity, as shown in expression (1):
where cos<S_i, T_i> is the cosine similarity between the feature S_i and the soft label T_i, and M is the number of samples; for the hard-label data the general method is used, i.e. a Softmax-based loss, as shown in expression (2):
where w_j is the mean of the features of the j-th class learned by the last layer of the small-computation model; and
s4: after the small-computation model is trained, the optimal small-computation model is selected and pushed to each terminal device.
2. The method according to claim 1, wherein in step s1, the terminal device collects face images, stores them on a local storage device, and uploads the data to the server over the network at night, when the device is not in use.
3. The method according to claim 1, wherein in step s4, after the training of the small-computation model is completed, the small-computation model is tested on the server, and the optimal small-computation model is selected and pushed to each terminal.
CN202110439569.8A 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model Active CN113158902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110439569.8A CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110439569.8A CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Publications (2)

Publication Number Publication Date
CN113158902A CN113158902A (en) 2021-07-23
CN113158902B (en) 2023-08-11

Family

ID=76869735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110439569.8A Active CN113158902B (en) 2021-04-23 2021-04-23 Knowledge distillation-based method for automatically training recognition model

Country Status (1)

Country Link
CN (1) CN113158902B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190521A (en) * 2018-08-17 2019-01-11 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model of knowledge based purification and application
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN110223281A (en) * 2019-06-06 2019-09-10 东北大学 A kind of Lung neoplasm image classification method when in data set containing uncertain data
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN112183670A (en) * 2020-11-05 2021-01-05 南开大学 Knowledge distillation-based few-sample false news detection method
CN112329617A (en) * 2020-11-04 2021-02-05 中国科学院自动化研究所 New scene face recognition model construction method and system based on single source domain sample
CN112686046A (en) * 2021-01-06 2021-04-20 上海明略人工智能(集团)有限公司 Model training method, device, equipment and computer readable medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11410029B2 (en) * 2018-01-02 2022-08-09 International Business Machines Corporation Soft label generation for knowledge distillation

Also Published As

Publication number Publication date
CN113158902A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant