CN114021720A - Label screening method and device - Google Patents

Label screening method and device

Info

Publication number
CN114021720A
CN114021720A
Authority
CN
China
Prior art keywords
label
training
model
teacher
comparison value
Prior art date
Legal status
Pending
Application number
CN202111164295.2A
Other languages
Chinese (zh)
Inventor
李远辉
王奇刚
舒红乔
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202111164295.2A
Publication of CN114021720A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning


Abstract

The embodiments of the application disclose a label screening method and apparatus. Guidance labels corresponding to a plurality of teacher models are obtained, where the guidance labels are produced by the teacher models predicting training samples; the guidance labels are screened against real labels to determine target labels that satisfy a screening condition; and distillation guidance is performed on a student model according to the target labels to obtain an updated student model. By applying the method provided by the embodiments of the application, a student model of higher accuracy can be trained in a shorter time.

Description

Label screening method and device
Technical Field
The application relates to the technical field of neural networks, in particular to a label screening method and device.
Background
When a model is deployed for application, a student model can be guided by a teacher model so that the student model reaches higher accuracy under the teacher model's guidance. For some complex scenes, however, a teacher model with a complex structure can reach high accuracy through its own training, yet in such a scene guiding the student model with that teacher not only fails to give the student model higher accuracy, but the excessive complexity of the task scene causes the student model's accuracy to drop below what it achieves in a single scene.
Disclosure of Invention
The present application provides a label screening method and a label screening apparatus, to at least solve the above technical problems in the prior art.
One aspect of the present application provides a label screening method, the method including: obtaining guidance labels corresponding to a plurality of teacher models, where the guidance labels are obtained by the teacher models predicting training samples; screening the guidance labels against real labels and determining target labels that satisfy a screening condition; and performing distillation guidance on a student model according to the target labels to obtain an updated student model.
In an embodiment, obtaining the guidance labels corresponding to the plurality of teacher models includes: obtaining a training sample, where the training sample is a single-scene training sample annotated with a real label of at least one dimension; obtaining a plurality of teacher models for the single scene, where different teacher models have different strengths with respect to the training sample; and predicting the training sample with the plurality of teacher models to obtain a guidance label of the at least one dimension.
In an embodiment, obtaining the guidance labels corresponding to the plurality of teacher models includes: obtaining training samples, where the training samples are multi-scene training samples annotated with real labels of multiple dimensions; obtaining a teacher model for each of the plurality of scenes, where each teacher model predicts real labels of at least one dimension in one scene; and predicting the training samples with the plurality of teacher models to obtain guidance labels of multiple dimensions.
In an embodiment, screening the guidance labels against the real labels and determining the target labels that satisfy the screening condition includes: comparing a guidance label with the real label to obtain a first comparison value; and adjusting the guidance label according to the first comparison value to determine a target label.
In an embodiment, adjusting the guidance label according to the first comparison value to determine the target label includes: screening the first comparison values corresponding to the same training sample to determine the smallest first comparison value; and determining the guidance label corresponding to the smallest first comparison value as the target label.
In an embodiment, adjusting the guidance label according to the first comparison value to determine the target label includes: determining advantage weights based on the first comparison values corresponding to the same training sample; and weighting the guidance labels with the advantage weights to obtain the target label.
In an embodiment, performing distillation guidance on the student model according to the target label to obtain the updated student model includes: training the student model on the training sample corresponding to the target label to obtain a training label; determining guidance data according to the target label and the training label; and performing distillation guidance on the student model according to the guidance data to obtain the updated student model.
In an embodiment, determining the guidance data according to the target label and the training label includes: comparing the target label with the training label to obtain a second comparison value; comparing the target label with the real label to obtain a third comparison value; and combining the second comparison value and the third comparison value to obtain the guidance data.
In an embodiment, determining the guidance data according to the target label and the training label includes: extracting data from an intermediate layer of the teacher model corresponding to the target label to obtain first intermediate data; extracting data from an intermediate layer of the student model corresponding to the training label to obtain second intermediate data; combining the first intermediate data and the second intermediate data to determine a fourth comparison value; and combining the fourth comparison value with the second comparison value to obtain a fifth comparison value, where the fifth comparison value is used to determine the guidance data.
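The fourth and fifth comparison values above resemble feature-based distillation. As a hedged sketch (the mean-squared-error comparison, the `beta` mixing factor, and all variable names are illustrative assumptions, not formulas prescribed by the application):

```python
import numpy as np

def fourth_comparison_value(teacher_feat, student_feat):
    """Compare intermediate-layer activations of teacher and student
    (the first and second intermediate data) via mean squared error."""
    return float(np.mean((teacher_feat - student_feat) ** 2))

def fifth_comparison_value(fourth, second, beta=0.5):
    """Mix the intermediate-layer term with the second comparison value
    (target label vs. training label) into one guidance quantity."""
    return beta * fourth + (1.0 - beta) * second

teacher_feat = np.array([0.2, 0.7, 0.1])  # first intermediate data (toy)
student_feat = np.array([0.1, 0.6, 0.3])  # second intermediate data (toy)
fourth = fourth_comparison_value(teacher_feat, student_feat)
fifth = fifth_comparison_value(fourth, second=0.4)
```

A small `fifth` value would then indicate that the student tracks the teacher both at the intermediate layer and at the output.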
Another aspect of the present application provides a label screening apparatus, the apparatus including: an obtaining module configured to obtain guidance labels corresponding to a plurality of teacher models, the guidance labels being obtained by the teacher models predicting training samples; a screening module configured to screen the guidance labels against real labels and determine target labels that satisfy a screening condition; and a guiding module configured to perform distillation guidance on a student model according to the target labels to obtain an updated student model.
In an embodiment, the obtaining module includes: an obtaining submodule configured to obtain a training sample, the training sample being a single-scene training sample annotated with a real label of at least one dimension; the obtaining submodule is further configured to obtain a plurality of teacher models for the single scene, where different teacher models have different strengths with respect to the training sample; and a first training submodule configured to predict the training sample with the plurality of teacher models to obtain a guidance label of at least one dimension.
In an embodiment, the obtaining submodule is further configured to obtain training samples, where the training samples are multi-scene training samples annotated with real labels of multiple dimensions; the obtaining submodule is further configured to obtain a teacher model for each of the plurality of scenes, each teacher model being configured to predict real labels of at least one dimension in one scene; and the first training submodule is further configured to predict the training samples with the plurality of teacher models to obtain guidance labels of multiple dimensions.
In an embodiment, the screening module includes: a comparison submodule configured to compare a guidance label with the real label to obtain a first comparison value; and an adjusting submodule configured to adjust the guidance label according to the first comparison value to determine the target label.
In an embodiment, the adjusting submodule is configured to: screen the first comparison values corresponding to the same training sample to determine the smallest first comparison value; and determine the guidance label corresponding to the smallest first comparison value as the target label.
In an embodiment, the adjusting submodule is configured to: determine advantage weights based on the first comparison values corresponding to the same training sample; and weight the guidance labels with the advantage weights to obtain the target label.
In an embodiment, the guiding module includes: a second training submodule configured to train the student model on the training sample corresponding to the target label to obtain a training label; a determining submodule configured to determine guidance data according to the target label and the training label; and a distillation submodule configured to perform distillation guidance on the student model according to the guidance data to obtain the updated student model.
In an embodiment, the determining submodule is configured to: compare the target label with the training label to obtain a second comparison value; compare the target label with the real label to obtain a third comparison value; and combine the second comparison value and the third comparison value to obtain the guidance data.
In an embodiment, the determining submodule is configured to: extract data from an intermediate layer of the teacher model corresponding to the target label to obtain first intermediate data; extract data from an intermediate layer of the student model corresponding to the training label to obtain second intermediate data; combine the first intermediate data and the second intermediate data to determine a fourth comparison value; and combine the fourth comparison value with the second comparison value to obtain a fifth comparison value, where the fifth comparison value is used to determine the guidance data.
The label screening method provided by the application screens the guidance labels predicted by a plurality of teacher models under a screening condition to select target labels suitable for distillation guidance of a student model, and performs distillation training on the student model with the selected target labels, so that the resulting student model learns the strengths of at least one teacher model and a structurally simple student model can achieve higher accuracy in a single scene or in multiple scenes.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the accompanying figures, in which the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart illustrating an implementation of a label screening method according to a first embodiment of the present application;
Fig. 2 is a schematic flow chart illustrating an implementation of a label screening method according to a second embodiment of the present application;
Fig. 3 is a schematic flow chart illustrating an implementation of a label screening method according to a third embodiment of the present application;
Fig. 4 is a schematic flow chart illustrating an implementation of a label screening method according to a fourth embodiment of the present application;
Fig. 5 is a schematic flow chart illustrating an implementation of a label screening method according to a fifth embodiment of the present application;
Fig. 6 is a single-scene distillation architecture diagram of a label screening method according to a sixth embodiment of the present application;
Fig. 7 is a multi-scene distillation architecture diagram of a label screening method according to a seventh embodiment of the present application;
Fig. 8 is a single-scene distillation flow chart of a label screening method according to a seventh embodiment of the present application;
Fig. 9 is a schematic diagram of the implementation modules of a label screening apparatus according to a first embodiment of the present application.
Detailed Description
To make the objects, features and advantages of the present application clearer and easier to understand, the technical solutions in the embodiments of the application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the application.
Fig. 1 is a schematic flow chart illustrating an implementation of a label screening method according to a first embodiment of the present application.
Referring to fig. 1, in one aspect the present application provides a label screening method, the method including: operation 101, obtaining guidance labels corresponding to a plurality of teacher models, where the guidance labels are obtained by the teacher models predicting training samples; operation 102, screening the guidance labels against real labels and determining target labels that satisfy a screening condition; and operation 103, performing distillation guidance on the student model according to the target labels to obtain an updated student model.
According to the label screening method, the guidance labels predicted by a plurality of teacher models are screened under a specific screening condition to select target labels suitable for distillation guidance of the student model, and the student model is distillation-trained with the selected target labels, so that the student model learns the strengths of each teacher model and can become comparable to, or even stronger than, all of the teacher models. The target labels obtained by this screening can be used for distillation guidance of the student model, so that a structurally simple student model achieves high recognition accuracy in a single scene or in multiple scenes within a short time; a simple, high-accuracy learning model can thus be obtained quickly and deployed to resource-constrained edge devices.
In operation 101, the teacher models are trained models whose parameters differ from one another: they may have different architectures, different preset parameters, different types, or be trained on different teacher training samples; it suffices that the trained teacher models differ. Further, a teacher model may be chosen to have a more complex structure and more parameters than the student model, or to have an accuracy advantage on the training samples corresponding to the student model. The teacher training samples used to train the teacher models may be the same as or different from the student training samples used to train the student model.
A guidance label is a trained teacher model's prediction for an input training sample; it should be noted that "training sample" here refers to a student training sample used to train the student model. Specifically, each teacher model predicts each sample in the student training set, yielding a guidance label for every sample. For example, trained teacher models T1, T2, T3 … with different model parameters are obtained, along with a student training set D containing training samples D1, D2, D3 …. Teacher model T1 predicts D1, D2, D3 … to produce guidance labels T1D1, T1D2, T1D3 …; likewise, teacher model T2 produces guidance labels T2D1, T2D2, T2D3 …, teacher model T3 produces guidance labels T3D1, T3D2, T3D3 …, and so on.
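A minimal sketch of this enumeration (the toy teacher functions, the softmax soft labels, and the names T1/D1 are illustrative stand-ins, not part of the application):

```python
import numpy as np

def softmax(z):
    """Turn raw teacher scores into a probability-style soft label."""
    e = np.exp(z - z.max())
    return e / e.sum()

def collect_guidance_labels(teachers, samples):
    """For every teacher Tk and sample Dn, record the guidance label
    TkDn, mirroring the T1D1, T1D2, ... enumeration above."""
    return {
        (t_name, d_name): softmax(predict(sample))
        for t_name, predict in teachers.items()
        for d_name, sample in samples.items()
    }

# Toy stand-ins for three trained teachers with different parameters.
teachers = {
    "T1": lambda x: np.array([2.0 * x.sum(), 1.0, 0.5]),
    "T2": lambda x: np.array([1.0, 2.0 * x.sum(), 0.5]),
    "T3": lambda x: np.array([0.5, 1.0, 2.0 * x.sum()]),
}
samples = {"D1": np.array([0.2, 0.1]), "D2": np.array([0.4, 0.3])}

guidance = collect_guidance_labels(teachers, samples)
# guidance[("T1", "D1")] is the guidance label T1D1, and so on.
```

Each entry is a per-sample soft label, so the later screening step can compare every teacher's label for the same sample.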
In operation 102, a real label is the label of a student training sample used to train the student model; it may be obtained by manually annotating the sample or by prediction with a model other than any of the teacher models. The screening condition may select the most accurate guidance labels, select guidance labels that embody a teacher model's guidance strengths, or apply any other condition that helps distillation guide the student model to higher accuracy. For example, the screening condition may be set to the guidance label closest to the real label, or to guidance labels whose proximity to the real label is within a certain threshold; a guidance label satisfying the condition is determined to be a target label. Depending on how the condition is set, the screening may yield one target label or several. Continuing the implementation scenario, the real labels of training samples D1, D2, D3 … are d1, d2, d3 …. T1D1, T2D1 and T3D1 are screened against d1; T1D2, T2D2 and T3D2 against d2; and T1D3, T2D3 and T3D3 against d3, yielding the target labels that satisfy the condition. If, say, the condition is proximity to the real label within a certain threshold, the screening might yield guidance label T1D1 for sample D1, guidance labels T1D2 and T2D2 for sample D2, guidance label T1D3 for sample D3, and so on.
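The threshold-based screening condition described above can be sketched as follows (the L1 distance used as the first comparison value and the threshold of 0.5 are assumptions for illustration):

```python
import numpy as np

def screen_guidance_labels(guidance, real_labels, threshold=0.5):
    """Keep, per training sample, every guidance label whose distance to
    the real label is within `threshold` (the first comparison value
    playing the role of 'proximity' in the screening condition)."""
    targets = {}
    for (t_name, d_name), soft in guidance.items():
        first_comparison = np.abs(soft - real_labels[d_name]).sum()
        if first_comparison <= threshold:
            targets.setdefault(d_name, []).append((t_name, soft))
    return targets

real = {"D1": np.array([1.0, 0.0])}
guidance = {
    ("T1", "D1"): np.array([0.9, 0.1]),  # close to the real label d1
    ("T2", "D1"): np.array([0.4, 0.6]),  # too far from d1
}
targets = screen_guidance_labels(guidance, real)
# Only T1's guidance label survives as the target label for D1.
```

Tightening or loosening `threshold` controls whether one or several target labels survive per sample, matching the remark that the screening may yield one target label or more.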
In operation 103, the method may perform distillation guidance on the student model directly with the target labels, so that the student model learns the strengths of the teacher models behind those labels; it may also combine the target labels with other training-related data for distillation guidance, obtaining a student model that learns the main strengths of each teacher model. Continuing the implementation scenario: with student training samples D1, D2, D3 … and screened target labels T1D1, T1D2, T2D2, T1D3 …, distillation guidance with these target labels and samples passes on the part each teacher model is best at for each sample, so the student model can be updated effectively in a short time and, despite its smaller structure and fewer parameters, still reflect the strengths of several teacher models and reach higher prediction accuracy.
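One common way to realize such distillation guidance, sketched here under the usual knowledge-distillation formulation rather than as the application's exact loss (the cross-entropy terms and the `alpha` mixing factor are assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, target_label, real_index, alpha=0.7):
    """Blend a soft term (cross-entropy against the screened target
    label) with a hard term (cross-entropy against the real label)."""
    p = softmax(student_logits)
    soft = -np.sum(target_label * np.log(p + 1e-12))
    hard = -np.log(p[real_index] + 1e-12)
    return alpha * soft + (1.0 - alpha) * hard

# A student whose prediction agrees with the target label is penalised
# less than one whose prediction contradicts it.
aligned = distillation_loss(np.array([2.0, 0.5, 0.1]),
                            np.array([0.8, 0.15, 0.05]), real_index=0)
opposed = distillation_loss(np.array([0.1, 0.5, 2.0]),
                            np.array([0.8, 0.15, 0.05]), real_index=0)
```

Minimising such a loss over the student training samples is what drives the "updated student model" toward the screened teachers' behaviour.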
It can be seen that the method reduces the student model's training time in complex scenes. If the student model were trained alone on the samples of a complex scene, far more training time would be needed to approach high accuracy; under the teacher models' distillation guidance, the student model's learning cost falls, and a small amount of training achieves the accuracy that would otherwise require long training. Moreover, the method makes full use of each teacher model's different strengths on the training samples, i.e., it analyses how much knowledge and accuracy each teacher model contributes on different samples, so that during training the student model's accuracy improves on every sample; through the teacher models' guidance labels, information about each teacher's strengths and weaknesses is incorporated, accelerating the robust growth of the student model's accuracy on complex training samples during distillation. With these operations, high-accuracy student models for a single scene or for multiple scenes can be obtained at the same time, speeding up the deployment progress and quality of small models such as the student model.
Fig. 2 is a schematic flow chart illustrating an implementation of a label screening method according to a second embodiment of the present application.
Referring to FIG. 2, in one embodiment, operation 101 of obtaining guidance labels corresponding to a plurality of teacher models includes: operation 1011, obtaining a training sample, where the training sample is a single-scene training sample annotated with a real label of at least one dimension; operation 1012, obtaining a plurality of teacher models for the single scene, where different teacher models have different strengths with respect to the training sample; and operation 1013, predicting the training sample with the plurality of teacher models to obtain a guidance label of at least one dimension.
Specifically, student training samples are chosen according to the student model's training requirements, and the chosen samples determine the guidance labels that the teacher models are asked to predict, so that the most advantageous prediction guidance for those samples is obtained.
In operation 1011, the training requirement of the student model is a student model adapted to a single scene. A single scene means predicting the same type of information to be predicted and obtaining prediction results of the same dimensionality. As for dimensions, the same dimension means the same type of label, such as a face recognition result label or a gender recognition result label; the prediction result of the same dimensionality may contain one or more dimensions. For example, when the model is an image recognition model, the information to be predicted may be a picture containing a face: a one-dimensional prediction result may be a face recognition result, while a multi-dimensional result may include a face recognition result, a gender recognition result and other dimensional content. It should also be noted that a label in this method may cover a specific region of the picture; for instance, a multi-dimensional result may include a face recognition result, a gender recognition result, and a face annotation region. In a specific implementation scenario, the training samples include road pictures captured by road cameras, each manually annotated with real labels of four dimensions: road type, number of cars, car color, and car model.
In operation 1012, a plurality of teacher models corresponding to the single-scene training samples are obtained. The teacher models may be trained on samples partly or entirely different from the single-scene training samples; in particular, they may be trained on samples of the same type but with entirely different content. The samples used to train a teacher model may carry all or part of the label dimensions of the real labels, so that the teacher model can predict all or part of the guidance labels for the single-scene training samples. Specifically, the dimensions of a teacher model's training labels are determined by that model's strengths. In the implementation scenario above, if a teacher model to be trained excels at predicting road type, the samples and labels used to train it may be road-type samples and road-type labels, so that the trained model predicts road type accurately; likewise, if a teacher model excels at predicting the two dimensions of car number and car color, its training samples and labels may cover those two dimensions, so that the trained model predicts the number and color of cars accurately. A model's strengths can be determined in advance from prior knowledge.
In operation 1013, after the teacher models are trained, each trained model is made to predict the student training samples used to train the student model, so as to obtain guidance labels for every dimension the student model must predict. In the implementation scenario above, each trained teacher model predicts all four dimensions of road type, number of cars, car color and car model, yielding guidance labels for those four dimensions.
Fig. 3 is a schematic flow chart illustrating an implementation of a label screening method according to a third embodiment of the present application.
Referring to FIG. 3, in one embodiment, operation 101 of obtaining guidance labels corresponding to a plurality of teacher models includes: operation 1014, obtaining training samples, where the training samples are multi-scene training samples annotated with real labels of multiple dimensions; operation 1015, obtaining a teacher model for each of the plurality of scenes, where each teacher model predicts real labels of at least one dimension in one scene; and operation 1016, predicting the training samples with the plurality of teacher models to obtain guidance labels of multiple dimensions.
In operation 1014, the training requirement of the student model is a student model adapted to a plurality of scenes. Depending on the requirement, the method may obtain guidance labels by performing operations 1011-1013, by performing operations 1014-1016, or by combining the two. To be clear, operations 1011-1013 and 1014-1016 characterise two ways of obtaining guidance labels; the numbering merely distinguishes the steps for ease of description and implies no required order between the two.
A plurality of scenes means predicting different types of information to be predicted and obtaining prediction results of the same or different dimensions. For example, when the student model is an image recognition model, the information to be predicted may include pictures containing only a vehicle, only a face, a crowd of people, a pet, and so on. Multiple dimensions means that several types of label exist in the student training set, such as face recognition result labels, vehicle color detection labels and pet type detection labels. Note that "multiple dimensions" describes the labels of all training samples taken together; the real label of any one sample may cover one or more dimensions. In an example implementation scenario, a training set containing a plurality of samples includes vehicle pictures, face pictures, and pet pictures: each pet picture is annotated with one or more pet-related labels, each vehicle picture with one or more vehicle-related labels, and each face picture with one or more face-related labels.
In operation 1015, one or more teacher models with higher prediction accuracy in each scene are selected according to the features of the information to be predicted and the label features of the scene, and the guidance labels are obtained from them. As before, each teacher model is a trained teacher model. The teacher models with higher prediction accuracy in a scene may be selected based on prior knowledge or after trial predictions. Further, when training samples of the same scene correspond to real labels with multiple dimensions, a teacher model that excels in each dimension may be selected to predict the guidance label for that dimension. Continuing the implementation scenario above, if the pet picture covers a pet color dimension, a pet type dimension, and a pet size dimension, then a teacher model good at predicting pet color, one good at predicting pet type, and one good at predicting pet size may be selected to obtain the guidance labels.
In operation 1016, as in operation 1013, each teacher model predicts the student training samples used for training the student model, covering every dimension the student model needs to predict, so that each teacher model yields a guidance label for each dimension of each training sample.
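As a rough sketch of operations 1014-1016, the following assumes each trained teacher model can be treated as a callable returning a label vector; the callables, shapes, and function names are illustrative assumptions, not taken from the patent:

```python
def collect_guidance_labels(teachers, samples):
    """For each training sample, gather one guidance label per teacher model.

    `teachers` is a list of callables standing in for trained teacher models
    (hypothetical; the patent does not fix an interface), each mapping a
    sample to a predicted label vector.
    """
    return [[list(teacher(x)) for teacher in teachers] for x in samples]

# Toy teachers that each "excel" at a different output dimension.
t1 = lambda x: [0.9 * x, 0.1 * x]
t2 = lambda x: [0.2 * x, 0.8 * x]
labels = collect_guidance_labels([t1, t2], [1.0, 2.0])
# labels[sample][teacher] is that teacher's guidance label for the sample
```

In a real system the teachers would be trained networks and the samples images, but the bookkeeping — one guidance label per (sample, teacher, dimension) — is the same.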
Fig. 4 is a schematic flow chart illustrating an implementation of a tag screening method according to a fourth embodiment of the present application.
In one embodiment, the operation 102, screening the guidance tags according to the real tags, and determining the target tags meeting the screening condition, includes: operation 1021, comparing the guide tag and the real tag to obtain a first comparison value; at operation 1022, the guidance tag is adjusted according to the first comparison value to determine the target tag.
In a specific implementation scenario, the method associates the screening condition with a first comparison value between the guidance label and the real label: the larger the first comparison value, the larger the difference between the guidance label and the real label. The method can therefore judge the difference between the guidance label and the real label from the first comparison value; the smaller the difference, the more accurate the prediction, and the greater the advantage of the teacher model on that training sample. On this basis, the method can determine the target label through the first comparison value. Further, the target label may be the same as or different from a guidance label, depending on the actual adjustment.
In one embodiment, operation 1022, adjusting the guidance tag according to the first comparison value to determine the target tag includes: firstly, screening first comparison values corresponding to the same training sample, and determining the first comparison value with the minimum value; then, the guide tag corresponding to the first comparison value having the smallest value is determined as the target tag.
In one implementation scenario, the target tag is the same as one of the instructional tags. Specifically, first comparison values corresponding to each teacher model and the same training sample are determined, and the first comparison values are compared to determine a first comparison value with a minimum value, where the first comparison value with the minimum value is a prediction result with the minimum difference between the training sample and the real label, that is, a most advantageous guidance label. The guide label can be directly determined as a target label for distillation guidance of the student model in training corresponding to the training sample.
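The minimum-first-comparison-value selection described above can be sketched as follows; the squared-error distance used as the first comparison value is an assumption, since the patent leaves the exact metric open:

```python
def first_comparison(guidance_label, real_label):
    """First comparison value: distance between a guidance label and the
    real label (squared error here; the metric is an assumption)."""
    return sum((g - r) ** 2 for g, r in zip(guidance_label, real_label))

def select_target_label(guidance_labels, real_label):
    """Keep the guidance label whose first comparison value is smallest."""
    values = [first_comparison(g, real_label) for g in guidance_labels]
    best = values.index(min(values))
    return best, guidance_labels[best]

best, target = select_target_label([[0.9, 0.1], [0.4, 0.6]], [1.0, 0.0])
# best == 0: the first teacher's guidance label becomes the target label
```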
In one embodiment, operation 1022, adjusting the guidance label according to the first comparison value to determine the target label, includes: first, determining advantage weights based on the first comparison values corresponding to the same training sample; and then weighting the guidance labels according to the advantage weights to obtain the target label.
In another implementation scenario, the target label is different from any single guidance label. Specifically, the first comparison values corresponding to each teacher model and the same training sample are determined first, and the advantage weight of each guidance label is derived from the numerical value of its first comparison value: the smaller the first comparison value, the better the teacher model performs on the training sample, the greater its advantage, and the larger its advantage weight. In this way the advantage weight of every guidance label can be determined. The guidance labels are then integrated, for example by weighted multiplication or addition, to determine the target label. It should be added that the advantage weights can also serve to amplify the target label; by amplifying the target label, the gap to the real label during knowledge distillation is enlarged, which helps the student model approach the teacher model's prediction scheme more closely, further shortening the student model's training time and improving its prediction accuracy.
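The advantage-weighted integration can be sketched as below; the inverse-distance weighting and the optional `amplify` factor are illustrative assumptions consistent with, but not prescribed by, the description above:

```python
def weighted_target_label(guidance_labels, real_label, amplify=1.0):
    """Blend guidance labels with advantage weights that grow as the first
    comparison value shrinks; `amplify` optionally scales the result, as in
    the amplification note above."""
    dists = [sum((g - r) ** 2 for g, r in zip(gl, real_label))
             for gl in guidance_labels]
    weights = [1.0 / (d + 1e-8) for d in dists]   # advantage weights
    total = sum(weights)
    weights = [w / total for w in weights]        # normalise
    dims = len(real_label)
    return [amplify * sum(w * gl[i] for w, gl in zip(weights, guidance_labels))
            for i in range(dims)]

target = weighted_target_label([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
# target sits almost exactly on the first guidance label, whose advantage dominates
```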
Fig. 5 is a schematic flow chart illustrating an implementation of a tag screening method according to a fifth embodiment of the present application.
Referring to fig. 5, in an embodiment, operation 103, distilling guidance for the student model according to the target label, and obtaining an updated student model, includes: operation 1031, training the student model through the training samples corresponding to the target labels to obtain training labels; an operation 1032 of determining guidance data from the target label and the training labels; and operation 1033, performing distillation guidance on the student model according to the guidance data to obtain an updated student model.
Specifically, the method guides the distillation of the student model through the target label as follows. First, forward training is performed on the student model with one training sample from the training sample set to obtain a training label. Then, the guidance labels of the teacher models corresponding to that training sample are determined and screened to determine the corresponding target label. Next, the target label and the training label are integrated with reference to the real label to obtain guidance data for back propagation, and the student model is updated by back propagation guided by this data, yielding the updated student model. These operations are repeated for each training sample to update the student model iteratively until it meets a preset model prediction accuracy, producing a student model that meets the requirements. It should be understood that the manner of integrating the target label and the training label according to the real label may vary.
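The per-sample loop just described can be sketched as follows. Every argument is a hypothetical callable (the patent does not fix interfaces): `student_step(x)` performs the forward training and returns the training label, each teacher maps a sample to its guidance label, and `guide_fn` turns (target, training, real) labels into the guidance data used for back propagation:

```python
def distill_epoch(student_step, teachers, samples, real_labels, guide_fn):
    """One pass over the training set, mirroring the loop in the text."""
    guidance_values = []
    for x, real in zip(samples, real_labels):
        training_label = student_step(x)                 # forward training
        guidance = [teacher(x) for teacher in teachers]  # guidance labels
        # Screening: keep the guidance label closest to the real label.
        target = min(guidance, key=lambda g: abs(g - real))
        guidance_values.append(guide_fn(target, training_label, real))
    return guidance_values  # fed back to update the student model

vals = distill_epoch(
    student_step=lambda x: 0.5 * x,
    teachers=[lambda x: 0.9 * x, lambda x: 0.4 * x],
    samples=[1.0, 2.0],
    real_labels=[1.0, 2.0],
    guide_fn=lambda t, s, r: (s - t) ** 2 + (t - r) ** 2,
)
```

Scalar labels keep the sketch short; real labels would be vectors and the update an actual gradient step.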
In one possible embodiment, operation 1032 determines guidance data based on the target label and the training labels, comprising: firstly, comparing a target label with a training label to obtain a second comparison value; then, comparing the target label with the real label to obtain a third comparison value; and then, integrating the second comparison value and the third comparison value to obtain guidance data.
In a specific implementation scenario, the target label and the training label are integrated as follows. The target label is compared with the training label to obtain their difference value, i.e., a second comparison value, which represents the gap between the student model's training label and the target label. The target label is then compared with the real label to obtain their difference value, i.e., a third comparison value, which represents the gap between the target label and the real label. Finally, the second and third comparison values are integrated to obtain the guidance data. Because the guidance data contains information from both the real label and the target label, the student model's back propagation through this data lets it draw on both the dominant teacher model and the ground truth, giving it a more advantageous learning objective after comprehensive consideration. This helps screen out the information that matters for improving the student model's prediction accuracy and, in turn, provides guidance that leads the student model to a better training result.
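A minimal sketch of this integration is given below. The mean-squared distances and the mixing weight `alpha` are assumptions; the patent only requires that the second and third comparison values be integrated:

```python
def guidance_data(training_label, target_label, real_label, alpha=0.5):
    """Integrate the second comparison value (training vs. target) with the
    third comparison value (target vs. real) into guidance data."""
    dims = len(target_label)
    second = sum((s - t) ** 2
                 for s, t in zip(training_label, target_label)) / dims
    third = sum((t - r) ** 2
                for t, r in zip(target_label, real_label)) / dims
    return alpha * second + (1 - alpha) * third

val = guidance_data([0.5], [0.9], [1.0])
```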
In one possible embodiment, operation 1032 determines guidance data based on the target label and the training labels, comprising: firstly, data extraction is carried out on a middle layer of a teacher model corresponding to a target label to obtain first middle data; then, extracting data of the middle layer of the student model corresponding to the training label to obtain second middle data; then, integrating the first intermediate data and the second intermediate data to determine a fourth comparison value; then, integrating the fourth comparison value and the second comparison value to obtain a fifth comparison value; wherein the fifth comparison value is used to determine the instructional data.
In another specific implementation scenario, the method can also extract intermediate-layer data from both the teacher model and the student model, further correlating their model parameters, so that the student model obtains more information related to the dominant teacher model and a more comprehensive learning objective.
Specifically, the method extracts the intermediate-layer data of the dominant teacher model. It should be added that when the target label is the guidance label with the smallest first comparison value, the dominant teacher model is the teacher model corresponding to that target label. When the target label is obtained by weighted integration of a plurality of guidance labels through the advantage weights, the intermediate-layer data of the corresponding plurality of teacher models can be extracted and weighted accordingly to obtain the first intermediate data.
The method can likewise extract the intermediate-layer data of the student model. It should be added that the student model and the various teacher models differ in architecture and in number of layers, so the intermediate layer to be extracted from each model can be preset so that the extracted layers are comparable; for example, matrix data of the same shape is extracted for comparison, yielding the second intermediate data.
The first intermediate data and the second intermediate data are integrated by addition or other means to determine a fourth comparison value. The fourth comparison value characterizes the gap between the student model and the dominant teacher model, so that the student model can obtain this information during training. In one implementation scenario, the method may determine the guidance data from the fourth comparison value.
In yet another implementation scenario, the method may comprehensively consider target tags, training tags, and intermediate layer data information. That is, the method may add or otherwise integrate the second comparison value corresponding to the target tag and the fourth comparison value corresponding to the intermediate layer data in the above embodiment to obtain the fifth comparison value. Thereafter, the guidance data is determined based on the fifth comparison value.
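The fourth and fifth comparison values can be sketched as below, assuming both intermediate layers have already been projected to vectors of the same length (projection method and distance metric are assumptions):

```python
def fifth_comparison(teacher_mid, student_mid, second_value):
    """Fourth comparison value from the intermediate layers, integrated by
    addition with the second comparison value to give the fifth, which in
    turn determines the guidance data."""
    dims = len(teacher_mid)
    fourth = sum((t - s) ** 2
                 for t, s in zip(teacher_mid, student_mid)) / dims
    return fourth + second_value  # fifth comparison value

val5 = fifth_comparison([1.0, 2.5], [1.0, 2.0], 0.1)
```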
It should be added that, depending on the actual situation, the method can also determine the guidance data by integrating the third comparison value with the fourth comparison value, or by integrating the second, third, and fourth comparison values together. The determination of the guidance data is otherwise the same as in the foregoing embodiments and is not detailed further.
To facilitate a further understanding of the above embodiments, several specific implementation scenarios are provided below.
Fig. 6 is a single scene distillation architecture diagram of a label screening method according to a sixth embodiment of the present application.
Referring to fig. 6, in one implementation scenario, a student model that is good at predicting a single scenario needs to be trained.
First, a training sample corresponding to the single scene, i.e., single scene data (D1) is obtained for distillation training;
then, a plurality of trained teacher models, i.e., teacher model T1, teacher model T2, and teacher model T3, which excel in the single scene, are selected. The teacher model T1, the teacher model T2, and the teacher model T3 are different models.
Then, the single scene data (D1) is predicted by the trained teacher model T1, teacher model T2, and teacher model T3, and a guidance tag corresponding to the single scene data (D1) is obtained.
Then, the guidance labels and the real label are processed by a guidance function to determine the corresponding guidance data, and the student model S is guided through distillation training with the single-scene data (D1) and the guidance data, yielding a student model that is good at predicting the single scene.
Wherein the guideline function may be a combination of one or more of the following functions.
The first guidance function guides through the guidance labels output by the teacher models and the real label: the comparison value between the loss function of the real label and the loss function of each guidance label is obtained, and the comparison value with the smallest value is determined as the guidance data.
The second guidance function guides through the guidance labels output by the teacher models and the real label: the loss values corresponding to all guidance labels are integrated, and the comparison value between the loss function of the real label and the integrated loss value is determined as the guidance data.
The third guidance function guides through the intermediate-layer data of the teacher models and of the student model: the comparison value of the corresponding intermediate-layer data is obtained and integrated with the guidance data from the first guidance function to obtain the final guidance data.
The fourth guidance function guides through the intermediate-layer data of the teacher models and of the student model: the comparison value of the corresponding intermediate-layer data is obtained and integrated with the guidance data from the second guidance function to obtain the final guidance data.
It is to be understood that the guidance function may be selected and adjusted according to the actual scenario. In addition, which intermediate layers are used, and how many, can be chosen according to the specific scene and model. When all teacher models are considered together, they can be ranked by the quality of the guidance labels they output and then combined in, for example, a 70%/20%/10% ratio.
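The rank-then-combine step can be sketched as follows; the ratios come from the text above, while the distance metric used for ranking is an assumption:

```python
def rank_and_combine(guidance_labels, real_label, ratios=(0.7, 0.2, 0.1)):
    """Rank teacher models by how close their guidance labels sit to the
    real label, then combine the best ones with the given ratios."""
    def dist(gl):
        return sum((g - r) ** 2 for g, r in zip(gl, real_label))
    ranked = sorted(guidance_labels, key=dist)[: len(ratios)]
    weights = list(ratios[: len(ranked)])
    total = sum(weights)
    dims = len(real_label)
    return [sum(w / total * gl[i] for w, gl in zip(weights, ranked))
            for i in range(dims)]

combined = rank_and_combine(
    [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]], [1.0, 0.0]
)
# combined leans heavily toward the best-ranked teacher's guidance label
```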
The guidance calculation over the teacher models' intermediate layers and final output layers fully considers each teacher model's contribution while also taking the real-label information into account, making the guidance function more comprehensive. The importance of each teacher model to the accuracy, across multiple scenes or within the same scene, can thus be well differentiated, so that the information important to accuracy is screened out and guidance is given that leads the student model to the best effect.
Fig. 7 is a diagram illustrating a plurality of scene distillation architectures of a tag screening method according to a seventh embodiment of the present application.
Fig. 8 is a single scenario distillation flow chart of a tag screening method according to a seventh embodiment of the present application.
Referring to fig. 7 and 8, in one implementation scenario, a student model S that is good at predicting multiple scenarios is trained. The student model S is a modified YOLOv4_tiny model.
Wherein the plurality of scenes comprises:
an application scene D1, which is a face detection scene, for detecting face-related tags;
an application scenario D2, which is a pedestrian detection scenario, for detecting pedestrian-related tags;
and applying a scene D3, namely a head and shoulder detection scene, for detecting the head and shoulder related labels.
Determining a trained teacher model T1 which is good at the scene D1 as a YOLOv3 model;
determining a trained teacher model T2 which is good at the scene D2 as a YOLOv4 model;
the trained teacher model T3, which was determined to be good at scene D3, is the YOLOv5 model.
In the distillation process, training samples corresponding to a scene D1, a scene D2 and a scene D3 are obtained, each sample is marked with a real label L1, a real label L2 and a real label L3, the training samples of the three scenes are mixed together without distinction, and a multi-scene training sample D1/D2/D3 is obtained.
And training the student model through the multi-scene training sample D1/D2/D3 to obtain an output result Os of the student model S.
The output results Ot1, Ot2, and Ot3, obtained by predicting the batch of input data D1 with teacher models T1, T2, and T3, are each compared with the real label L1 through the guidance function; the output result corresponding to the comparison value with the smallest value is determined and converted into guidance data for the student model on that training sample. For example, because the output result Ot1 of teacher T1 is closest to the real label L1 for data D1, the student model S is distillation-trained with teacher model T1 when data D1 is being trained. Similarly, when scene data D2 is encountered, teacher model T2, which is more accurate for that scene, is used to distillation-train the student model S; when scene data D3 is encountered, teacher model T3 is used. Thus, by the end of training, the student model has learned the knowledge each of the three teachers excels at, improving its recognition accuracy on the multi-task scenario.
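The per-sample teacher routing above can be sketched as follows; the teacher names match the figure, while the output vectors and distance metric are illustrative assumptions:

```python
def route_teacher(outputs, real_label):
    """Pick the teacher whose output lies closest to the real label for the
    current sample, mirroring how T1 is chosen for D1 data above. `outputs`
    maps teacher names to predicted label vectors."""
    def gap(name):
        return sum((o - r) ** 2 for o, r in zip(outputs[name], real_label))
    return min(outputs, key=gap)

chosen = route_teacher(
    {"T1": [0.95, 0.05], "T2": [0.6, 0.4], "T3": [0.1, 0.9]},
    real_label=[1.0, 0.0],
)
# chosen == "T1": the face-detection teacher distills this face sample
```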
Fig. 9 is a schematic diagram of an implementation module of a tag screening apparatus according to a first embodiment of the present application.
Referring to fig. 9, another aspect of the present application provides a label screening apparatus, including: an obtaining module 901, configured to obtain guidance labels corresponding to multiple teacher models, where the guidance labels are obtained by predicting training samples through the teacher models; a screening module 902, configured to screen the guidance tag according to the real tag, and determine a target tag that meets a screening condition; and the guiding module 903 is used for guiding distillation of the student model according to the target label to obtain an updated student model.
In one implementation, obtaining module 901 includes: the obtaining sub-module 9011 is used for obtaining a training sample, wherein the training sample is a single scene training sample, and the single scene training sample is marked with a real label with at least one dimension; the obtaining sub-module 9011 is further configured to obtain multiple teacher models for a single scene, where different teacher models have different advantages for training samples; the first training submodule 9012 is configured to predict a training sample through multiple teacher models, and obtain a guidance label of at least one dimension.
In an implementation manner, the obtaining sub-module 9011 is further configured to obtain training samples, where the training samples are multiple scene training samples, and the multiple scene training samples are labeled with real labels with multiple dimensions; an obtaining sub-module 9011, further configured to obtain a teacher model for each of the plurality of scenes; each teacher model is used for predicting real labels of at least one dimension in a scene; the first training submodule 9012 is further configured to train the training sample through a plurality of teacher models, and obtain guidance labels of multiple dimensions.
In one embodiment, the filtering module 902 includes: a comparison sub-module 9021, configured to compare the guidance tag with the real tag to obtain a first comparison value; and the adjusting submodule 9022 is configured to adjust the guidance tag according to the first comparison value to determine the target tag.
In an embodiment, the adjusting sub-module 9022 includes: screening first comparison values corresponding to the same training sample, and determining the first comparison value with the minimum value; and determining the guide label corresponding to the first comparison value with the minimum value as the target label.
In an embodiment, the adjusting sub-module 9022 includes: determining advantage weights based on the first comparison values corresponding to the same training sample; and weighting the guidance labels according to the advantage weights to obtain the target label.
In one embodiment, the instruction module 903 comprises: the second training submodule 9031 is configured to train the student model through the training sample corresponding to the target label to obtain a training label; a determining submodule 9032, configured to determine guidance data according to the target label and the training label; and the distillation submodule 9033 is used for carrying out distillation guidance on the student model according to the guidance data to obtain an updated student model.
In one embodiment, determining submodule 9032 includes: comparing the target label with the training label to obtain a second comparison value; comparing the target label with the real label to obtain a third comparison value; and integrating the second comparison value and the third comparison value to obtain guidance data.
In one embodiment, determining submodule 9032 includes: performing data extraction on the middle layer of the teacher model corresponding to the target label to obtain first middle data; extracting data of the middle layer of the student model corresponding to the training label to obtain second middle data; integrating the first intermediate data and the second intermediate data to determine a fourth comparison value; integrating the fourth comparison value and the second comparison value to obtain a fifth comparison value; wherein the fifth comparison value is used to determine the instructional data.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of tag screening, the method comprising:
obtaining guide labels corresponding to a plurality of teacher models, wherein the guide labels are obtained by predicting training samples through the teacher models;
screening the guide tags according to the real tags, and determining target tags meeting screening conditions;
and carrying out distillation guidance on the student model according to the target label to obtain an updated student model.
2. The method of claim 1, the obtaining instructional tags corresponding to a plurality of teacher models, comprising:
obtaining a training sample, wherein the training sample is a single scene training sample, and the single scene training sample is marked with a real label with at least one dimension;
obtaining a plurality of teacher models for a single scene, wherein different teacher models have different advantages for the training sample;
and predicting the training samples through the plurality of teacher models to obtain the guide label of the at least one dimension.
3. The method of claim 1, the obtaining instructional tags corresponding to a plurality of teacher models, comprising:
obtaining training samples, wherein the training samples are a plurality of scene training samples, and the scene training samples are marked with real labels with a plurality of dimensions;
obtaining a teacher model for each of a plurality of scenes; each teacher model is used for predicting real labels of at least one dimension in a scene;
and predicting the training samples through the teacher models to obtain the guidance labels of the multiple dimensions.
4. The method of claim 1, wherein the screening the guide tag according to the real tag to determine the target tag meeting the screening condition comprises:
comparing the guide label with the real label to obtain a first comparison value;
and adjusting the guide tag according to the first comparison value to determine the target tag.
5. The method of claim 4, said adjusting the guide tag according to the first comparison to determine the target tag, comprising:
screening first comparison values corresponding to the same training sample, and determining the first comparison value with the minimum value;
and determining the guide label corresponding to the first comparison value with the minimum numerical value as a target label.
6. The method of claim 4, said adjusting the guide tag according to the first comparison to determine the target tag, comprising:
determining an advantage weight based on a first comparison value corresponding to the same training sample;
and weighting the guide label according to the advantage weight value to obtain a target label.
7. The method of claim 1, wherein the distilling guidance of the student model according to the target label, obtaining an updated student model, comprises:
training the student model through a training sample corresponding to the target label to obtain a training label;
determining guidance data according to the target label and the training label;
and carrying out distillation guidance on the student model according to the guidance data to obtain an updated student model.
8. The method of claim 7, the determining guidance data from the target label and the training label, comprising:
comparing the target label with the training label to obtain a second comparison value;
comparing the target label with the real label to obtain a third comparison value;
and integrating the second comparison value and the third comparison value to obtain guidance data.
9. The method of claim 7, the determining guidance data from the target label and the training label, comprising:
performing data extraction on the middle layer of the teacher model corresponding to the target label to obtain first middle data;
extracting data of the middle layer of the student model corresponding to the training label to obtain second middle data;
integrating the first intermediate data and the second intermediate data to determine a fourth comparison value;
integrating the fourth comparison value and the second comparison value to obtain a fifth comparison value; wherein the fifth comparison value is used to determine the instructional data.
10. A label screening apparatus, the apparatus comprising:
the obtaining module is used for obtaining guide labels corresponding to a plurality of teacher models, and the guide labels are obtained by predicting the training samples through the teacher models;
the screening module is used for screening the guide tag according to the real tag and determining a target tag meeting screening conditions;
and the guiding module is used for carrying out distillation guidance on the student model according to the target label to obtain an updated student model.
Similar Documents

Publication Publication Date Title
JP7075366B2 (en) Methods, devices, equipment and media for classifying driving scene data
CN112016476B (en) Method and system for predicting visual saliency of complex traffic guided by target detection
CN105894025A (en) Natural image aesthetic feeling quality assessment method based on multitask deep learning
CN111291812B (en) Method and device for acquiring attribute category, storage medium and electronic device
US20200384989A1 (en) Method for the improved detection of objects by a driver assistance system
CN112990298A (en) Key point detection model training method, key point detection method and device
DE102020128978A1 (en) TRAINING DEEP NEURAL NETWORKS WITH SYNTHETIC IMAGES
EP3966743A1 (en) Monitoring of an ai module of a vehicle driving function
del Egio et al. Self-driving a car in simulation through a CNN
CN114021720A (en) Label screening method and device
CN113989772A (en) Traffic light detection method and device, vehicle and readable storage medium
DE112018005749T5 (en) Lane marking determination device for automated driving
Pang et al. An advanced deep framework for recognition of distracted driving behaviors
CN114972725B (en) Model training method, readable medium and electronic device
DE102018005865A1 (en) Method for testing an assistance system for a vehicle
CN113989774A (en) Traffic light detection method and device, vehicle and readable storage medium
DE102019128223A1 (en) Methods, devices and computer programs
EP4273753A1 (en) Training neural networks with a lesser requirement for labelled training data
CN113590980B (en) Method for training neural network, method and device for predicting track
US20230032413A1 (en) Image classifier with lesser requirement for labelled training data
US20230195977A1 (en) Method and system for classifying scenarios of a virtual test, and training method
CN117456480B (en) Light vehicle re-identification method based on multi-source information fusion
Potla Traffic Sign Detection and Recognition for Autonomous Vehicles Using Transfer Learning
Baumgartner et al. Aleatory Function Validation-AI-based Method for the Validation of Driver Assistance Systems
Talmi et al. Basics for Autonomous Driving-Smart Testing and Validation Solutions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination