Data weighted learning method based on model collaboration
Technical Field
The invention relates to the technical field of image instance segmentation, in particular to a data weighted learning method based on model collaboration.
Background
In recent years, artificial intelligence technology has developed rapidly and, combined with computer vision, has penetrated many aspects of study, work and daily life, making it closely tied to the progress of science and technology. Image classification, object detection and image segmentation are the three core problems of computer vision, and image instance segmentation is an important research branch within the field of image segmentation. The instance segmentation task locates every potential target in an image, represents its position with a detection target frame, and then marks each target region pixel by pixel in the manner of semantic segmentation.
With the introduction of fully convolutional networks, image instance segmentation algorithms based on deep learning began to attract attention. Such methods have the outstanding advantage of automatic, multi-layer feature extraction: a large number of labeled (annotated) image samples are fed into the learning network, the connection weights between neurons are adjusted according to the detected error, and the network is repeatedly optimized to achieve end-to-end classification learning, after which unlabeled image samples can be predicted. Compared with traditional algorithms, these methods greatly improve the accuracy of image instance segmentation.
However, deep learning requires a large number of accurately labeled training image samples to fit a model, and such finely annotated image datasets are mainly produced by manual labeling, which is costly and inefficient; training image datasets are therefore difficult to acquire in real scenarios. During the collection and labeling of an image dataset, low-quality image samples and inconsistent labeling quality commonly occur, and the image samples also differ in difficulty during model training, so identifying and handling low-quality image samples is a significant challenge.
MentorNet ("Learning data-driven curriculum for very deep neural networks on corrupted labels", ICML 2018) pre-trains an additional auxiliary network and then uses it to select clean instances to guide the training of the target network. Decoupling ("Decoupling 'when to update' from 'how to update'", NeurIPS 2017) trains two networks simultaneously: for each batch of samples, both networks produce predictions, and the back-propagation gradient update is carried out only on the samples where the two predictions disagree; however, as training proceeds the two networks gradually converge and the scheme functionally degenerates into a self-training single target model. To address these problems, Co-teaching ("Co-teaching: Robust training of deep neural networks with extremely noisy labels", NeurIPS 2018) trains two deep neural networks at the same time: in each mini-batch, each network feeds all the data forward and, according to the resulting losses, selects the small-loss data as the label data that is as clean as possible; the peer network then back-propagates on the data selected for it to update its weight parameters. Co-teaching+ ("How does disagreement help generalization against label corruption?") first keeps the samples on which the two networks disagree; within these disagreement data, each network selects its own small-loss data to teach the other network, which back-propagates on that data and updates its own parameters.
These methods focus on detecting low-quality image samples and re-labeling the image dataset after discarding them; however, they do not exploit the potential value of the low-quality image samples, so the effort spent acquiring them is wasted.
Disclosure of Invention
The invention aims to: aiming at the problems in the background art, the invention provides a data weighted learning method based on model collaboration. Considering that existing data validity evaluation mainly focuses on detecting low-quality image samples, which are discarded before the data are re-labeled, without exploiting the value of those image samples, the invention designs a new general framework for data validity evaluation: the quality of the image samples is evaluated, and an image sample re-weighting scheme is adopted so that the target model tends to learn high-quality image samples while avoiding discarding too many low-quality image samples, thereby using models to assist model training.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme:
a data weighted learning method based on model collaboration comprises the following steps:
step S1, a general image dataset, a target image dataset and a verification set are given; an auxiliary model set consisting of a plurality of auxiliary models that differ in both decision boundary and learning ability is given, where the number of auxiliary models is a natural number greater than 0 and each element of the set denotes one auxiliary model; a target model is given; the auxiliary models in the auxiliary model set are pre-trained on the general image dataset;
step S2, an image sample is randomly selected from the target image dataset; each auxiliary model in the auxiliary model set generates a prediction result for the image sample, yielding a prediction result set in which each element is the prediction result generated by one auxiliary model for the image sample; the auxiliary weight of the image sample under each auxiliary model is then calculated from the prediction result generated by that auxiliary model and the real label corresponding to the image sample, yielding an auxiliary weight set; the auxiliary weights are averaged to obtain the final auxiliary weight of the image sample;
finally, the final auxiliary weights of the other image samples in the target image dataset are calculated, all image samples in the target image dataset are sorted by the size of their final auxiliary weights, and any image sample whose final auxiliary weight is smaller than a set threshold t is discarded from the target image dataset, yielding a cleaned target image dataset;
Step S3, an image sample is randomly selected from the cleaned target image dataset; the target model generates a prediction result for this image sample; the target weight and the sample loss of the image sample are calculated from the prediction result generated for the image sample and the real label corresponding to the image sample; the target model is then used to calculate the target weights and sample losses of all other image samples in the cleaned target image dataset;
step S4, for each image sample in the cleaned target image dataset, the final auxiliary weight and the target weight obtained for that image sample are fused to obtain its sample weight;
step S5, for each image sample in the cleaned target image dataset, the sample loss obtained for that image sample is re-weighted by its sample weight to obtain a weighted sample loss, and the parameters of the target model are finally updated by gradient computation to complete one round of training;
step S6, when the cleaned target image dataset has been fully trained on and all training rounds have been completed, the training of the target model is finished; each training round comprises the training process of steps S3-S5;
step S7, the verification set is input into the trained target model, and the segmentation result of each image instance to be segmented is output.
Preferably, the implementation process of obtaining the final auxiliary weight in step S2 is as follows:
step S2.1, an image sample is randomly selected from the target image dataset; based on the prediction results of the auxiliary models for the image sample, the set of predicted target bounding boxes, the set of predicted target masks and the set of predicted target edges are obtained, where the k-th element of each set is, respectively, the target bounding box, the target mask and the target edge predicted by the k-th auxiliary model for the image sample; for each auxiliary model, the confidence scores of its predicted target bounding box, target mask and target edge are calculated, and the average of these three confidence scores is taken as that model's average confidence score for the image sample; the average confidence scores obtained from all auxiliary models form a set; this set of average confidence scores is combined with the set of evaluation index scores to serve as the difficulty score set of the image sample, each element of which is the difficulty score of the image sample under one auxiliary model;
the evaluation index score of the k-th auxiliary model is computed against the target bounding box truth value, the target mask truth value and the target edge truth value of the image sample: one index measures the degree of overlap between the target bounding box predicted by the k-th auxiliary model and the target bounding box truth value, a second is an average indicator measuring the degree of overlap between the target mask predicted by the k-th auxiliary model and the target mask truth value, and a third measures the matching degree between the target edge predicted by the k-th auxiliary model and the target edge truth value;
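To make step S2.1 concrete, the following Python sketch shows one possible way to compute the difficulty score of an image sample under a single auxiliary model. It is a minimal illustration only: the IoU, mask-overlap and edge-matching functions and the plain averaging used to combine them with the average confidence score are assumptions, not the exact formulas of the invention.

```python
# Hedged sketch of step S2.1: difficulty score under one auxiliary model.
import numpy as np

def box_iou(box_pred, box_gt):
    """Overlap between a predicted box and its truth value, boxes as (x1, y1, x2, y2)."""
    x1 = max(box_pred[0], box_gt[0]); y1 = max(box_pred[1], box_gt[1])
    x2 = min(box_pred[2], box_gt[2]); y2 = min(box_pred[3], box_gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_g = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_p + area_g - inter + 1e-8)

def mask_overlap(mask_pred, mask_gt):
    """Average overlap indicator between a binary predicted mask and the mask truth value."""
    inter = np.logical_and(mask_pred, mask_gt).sum()
    union = np.logical_or(mask_pred, mask_gt).sum()
    return inter / (union + 1e-8)

def edge_match(edge_pred, edge_gt):
    """Matching degree between predicted and true edge maps (F1 over edge pixels, assumed)."""
    tp = np.logical_and(edge_pred, edge_gt).sum()
    precision = tp / (edge_pred.sum() + 1e-8)
    recall = tp / (edge_gt.sum() + 1e-8)
    return 2 * precision * recall / (precision + recall + 1e-8)

def difficulty_score(confidences, box_pred, mask_pred, edge_pred, box_gt, mask_gt, edge_gt):
    avg_conf = float(np.mean(confidences))       # average confidence score of box/mask/edge
    indices = [box_iou(box_pred, box_gt),        # evaluation index scores
               mask_overlap(mask_pred, mask_gt),
               edge_match(edge_pred, edge_gt)]
    return float(np.mean([avg_conf] + indices))  # assumed fusion: plain mean
```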
step S2.2, based on the prediction result set and the real label corresponding to the image sample at each pixel coordinate, the label quality score set of the image sample is calculated, each element of which is the label quality score of the image sample under one auxiliary model;
the calculation uses the width and height of the image sample: for each pixel, the confidence that the k-th auxiliary model assigns to the class of the real label at that pixel coordinate is computed, and the number of pixels of the image sample at which this confidence is greater than a set threshold is counted; this count, together with the image width and height and the remaining parameters, determines the label quality score of the image sample under the k-th auxiliary model;
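A minimal sketch of how the label quality score of step S2.2 might be realized is given below; normalizing the count of high-confidence pixels by the image width and height, and the helper name label_quality_score, are assumptions made for illustration rather than the patented formula.

```python
# Hedged sketch of step S2.2: label quality score under one auxiliary model.
import numpy as np

def label_quality_score(prob_map, true_labels, delta=0.5):
    """
    prob_map:    (H, W, C) per-pixel class probabilities predicted by an auxiliary model.
    true_labels: (H, W) integer map of the real label class at each pixel coordinate.
    delta:       the set confidence threshold.
    """
    h, w, _ = prob_map.shape
    # confidence assigned to the true label class at every pixel
    conf_true = np.take_along_axis(prob_map, true_labels[..., None], axis=2).squeeze(-1)
    n_confident = int((conf_true > delta).sum())   # number of high-confidence pixels
    return n_confident / float(h * w)              # assumed normalization by width*height
```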
S2.3, based on the no-reference quality evaluation index UNIQUE, the image quality score set of the image sample is obtained, each element of which is an image quality score of the image sample;
step S2.4, the difficulty score set, the label quality score set and the image quality score set of the image sample are fused to obtain the auxiliary weight set of the image sample;
in the fusion, three hyper-parameters balance the respective contributions of the difficulty score set, the label quality score set and the image quality score set to the auxiliary weights; the auxiliary weights are averaged to obtain the final auxiliary weight of the image sample, after which the final auxiliary weights of the other image samples in the target image dataset are calculated; a threshold t is set, and if the final auxiliary weight of an image sample is smaller than the threshold t, the quality of that image sample is considered low and it is discarded from the target image dataset, yielding the cleaned target image dataset.
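The fusion and cleaning of step S2.4 could be sketched as follows; the linear fusion of the three scores with hyper-parameters alpha, beta and gamma and the helper names are assumptions, and only the thresholding behaviour (discarding samples whose final auxiliary weight falls below t) follows the text directly.

```python
# Hedged sketch of step S2.4: auxiliary weights, final auxiliary weight, dataset cleaning.
import numpy as np

def auxiliary_weight(d, l, q, alpha=1.0, beta=1.0, gamma=1.0):
    """Fuse difficulty, label quality and image quality scores (assumed linear fusion)."""
    return alpha * d + beta * l + gamma * q

def clean_dataset(samples, score_sets, t=0.5, alpha=1.0, beta=1.0, gamma=1.0):
    """
    samples:    list of image samples from the target image dataset.
    score_sets: per-sample list [(d_k, l_k, q_k) for each auxiliary model k].
    Returns (cleaned_samples, final_weights), keeping only samples with weight >= t.
    """
    cleaned, final_weights = [], []
    for sample, scores in zip(samples, score_sets):
        weights = [auxiliary_weight(d, l, q, alpha, beta, gamma) for d, l, q in scores]
        w_final = float(np.mean(weights))   # final auxiliary weight = mean over auxiliary models
        if w_final >= t:                    # below threshold t -> low quality, discard
            cleaned.append(sample)
            final_weights.append(w_final)
    return cleaned, final_weights
```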
Preferably, the implementation process of step S3 is as follows:
step S3.1, calculating the target weight: for a randomly selected image sample of the cleaned target image dataset, based on the prediction result of the target model, the confidence score of the predicted target bounding box, the confidence score of the predicted target mask and the confidence score of the predicted target edge are obtained, and their average confidence score is calculated; this average confidence score is combined with the evaluation index score to serve as the difficulty score of the image sample; the label quality score of the image sample is then calculated from the prediction result of the target model and the real label corresponding to the image sample, and the image quality score of the image sample is obtained from the no-reference quality evaluation index UNIQUE; the difficulty score, the label quality score and the image quality score of the image sample are fused to obtain the target weight of the image sample;
in the fusion, three hyper-parameters balance the respective contributions of the difficulty score, the label quality score and the image quality score to the target weight;
step S3.2, calculating the sample loss of the image sample, which combines three terms:
the classification loss of the target frame uses a categorical cross-entropy loss to measure the difference between the predicted target frame class and the real target frame class, computed from the true values of the target frame classes and the probability values of the classes predicted for the target frames; the regression loss of the target frame uses a mean absolute error loss to measure the difference between the predicted target frame and the real target frame, computed from the true values and the predicted values of the target frame position coordinates and normalized by the number of target frames; the segmentation loss of the target mask uses a pixel-level cross-entropy loss between the true values and the predicted values of the target mask, computed over all pixels of the image sample (given its width and height) and over the number of categories, with an additional parameter c.
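For illustration, the three loss terms of step S3.2 can be sketched in PyTorch as below; the plain summation of the three terms and the omission of the balancing parameters mentioned in the text are simplifying assumptions.

```python
# Hedged sketch of step S3.2: classification, regression and mask losses of the target model.
import torch
import torch.nn.functional as F

def sample_loss(cls_logits, cls_targets, box_preds, box_targets, mask_logits, mask_targets):
    """
    cls_logits:   (M, C)    predicted class scores for M target frames.
    cls_targets:  (M,)      true class indices of the target frames.
    box_preds:    (M, 4)    predicted target frame position coordinates.
    box_targets:  (M, 4)    true target frame position coordinates.
    mask_logits:  (C, H, W) per-pixel class scores for the target mask.
    mask_targets: (H, W)    true class index of each pixel.
    """
    loss_cls = F.cross_entropy(cls_logits, cls_targets)    # categorical cross-entropy (class loss)
    loss_reg = F.l1_loss(box_preds, box_targets)            # mean absolute error (regression loss)
    loss_mask = F.cross_entropy(mask_logits.unsqueeze(0),   # pixel-level cross-entropy (mask loss)
                                mask_targets.unsqueeze(0))
    return loss_cls + loss_reg + loss_mask                   # assumed unweighted sum
```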
Preferably, the sample weight obtained in step S4 is computed by fusing the target weight and the final auxiliary weight of the image sample, where a parameter adjusts the relative contributions of the target weight and the final auxiliary weight.
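One possible form of the fusion in step S4 is a linear interpolation between the two weights, sketched below; the interpolation form and the parameter name lam are assumptions, since the text only states that a parameter adjusts the two contributions.

```python
# Hedged sketch of step S4: fusing target weight and final auxiliary weight.
def sample_weight(target_weight, final_aux_weight, lam=0.5):
    """lam in [0, 1] adjusts the contributions of the two weights (assumed interpolation)."""
    return lam * target_weight + (1.0 - lam) * final_aux_weight
```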
Preferably, in step S5, the weighted sample loss used to update the target model is obtained by weighting the sample loss of each image sample by its sample weight.
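A minimal sketch of the weighted update of step S5, assuming the weighted sample loss is simply the sample loss multiplied by the sample weight and that an ordinary PyTorch optimizer performs the gradient update:

```python
# Hedged sketch of step S5: re-weight the sample loss and update the target model.
import torch

def weighted_update(optimizer, loss, weight):
    weighted_loss = weight * loss    # re-weight the sample loss with the sample weight
    optimizer.zero_grad()
    weighted_loss.backward()         # gradient computation
    optimizer.step()                 # update the parameters of the target model
    return float(weighted_loss.detach())
```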
the beneficial effects of the invention include the following aspects:
(1) According to the invention, models rather than manual work are used to evaluate data validity, and the evaluation process is automated, which greatly reduces the labor cost of data screening and improves efficiency. Cleaning data collected in scenarios such as crowdsourcing yields higher-quality data while reducing the cost of data collection;
(2) The invention calculates the loss weight of each image sample by evaluating the quality of the labeled samples jointly with the auxiliary models and the target model; under the weighted loss, the target model tends to learn high-quality image samples, improving performance on image datasets that contain a large number of low-quality image samples;
(3) The method is not only suitable for the field of image instance segmentation, but is also expected to achieve positive effects in other vision fields such as image semantic segmentation, image classification and object detection.
Drawings
FIG. 1 is a flow chart of a data weighted learning method based on model collaboration provided by the invention;
FIG. 2 is a simplified flow chart of the method for deriving a cleaned dataset based on auxiliary weights;
FIG. 3 is a schematic diagram of the data weighting framework based on model collaboration provided by the invention.
Description of the embodiments
The invention will be further described with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a data weighted learning method based on model collaboration, the specific principle is shown in figure 1, and the method comprises the following steps:
step S1, a general image dataset, a target image dataset and a verification set are given; an auxiliary model set consisting of a plurality of auxiliary models that differ in both decision boundary and learning ability is given, where the number of auxiliary models is a natural number greater than 0 and each element of the set denotes one auxiliary model; a target model is given; the auxiliary models in the auxiliary model set are pre-trained on the general image dataset;
step S2, an image sample is randomly selected from the target image dataset; each auxiliary model in the auxiliary model set generates a prediction result for the image sample, yielding a prediction result set in which each element is the prediction result generated by one auxiliary model for the image sample; the auxiliary weight of the image sample under each auxiliary model is then calculated from the prediction result generated by that auxiliary model and the real label corresponding to the image sample, yielding an auxiliary weight set; the auxiliary weights are averaged to obtain the final auxiliary weight of the image sample;
finally, the final auxiliary weights of the other image samples in the target image dataset are calculated, all image samples in the target image dataset are sorted by the size of their final auxiliary weights, and any image sample whose final auxiliary weight is smaller than the set threshold t is discarded from the target image dataset, yielding a cleaned target image dataset.
Specifically, referring to fig. 2, the method includes the following steps:
step S2.1, an image sample is randomly selected from the target image dataset; based on the prediction results of the auxiliary models for the image sample, the set of predicted target bounding boxes, the set of predicted target masks and the set of predicted target edges are obtained, where the k-th element of each set is, respectively, the target bounding box, the target mask and the target edge predicted by the k-th auxiliary model for the image sample; for each auxiliary model, the confidence scores of its predicted target bounding box, target mask and target edge are calculated, and the average of these three confidence scores is taken as that model's average confidence score for the image sample; the average confidence scores obtained from all auxiliary models form a set; this set of average confidence scores is combined with the set of evaluation index scores to serve as the difficulty score set of the image sample, each element of which is the difficulty score of the image sample under one auxiliary model;
the evaluation index score of the k-th auxiliary model is computed against the target bounding box truth value, the target mask truth value and the target edge truth value of the image sample: one index measures the degree of overlap between the target bounding box predicted by the k-th auxiliary model and the target bounding box truth value, a second is an average indicator measuring the degree of overlap between the target mask predicted by the k-th auxiliary model and the target mask truth value, and a third measures the matching degree between the target edge predicted by the k-th auxiliary model and the target edge truth value;
step S2.2, based on the prediction result set and the real label corresponding to the image sample at each pixel coordinate, the label quality score set of the image sample is calculated, each element of which is the label quality score of the image sample under one auxiliary model;
the calculation uses the width and height of the image sample: for each pixel, the confidence that the k-th auxiliary model assigns to the class of the real label at that pixel coordinate is computed, and the number of pixels of the image sample at which this confidence is greater than the set threshold is counted; this count, together with the image width and height and the remaining parameters, determines the label quality score of the image sample under the k-th auxiliary model;
S2.3, based on the no-reference quality evaluation index UNIQUE, the image quality score set of the image sample is obtained, each element of which is an image quality score of the image sample;
step S2.4, the difficulty score set, the label quality score set and the image quality score set of the image sample are fused to obtain the auxiliary weight set of the image sample;
in the fusion, three hyper-parameters balance the respective contributions of the difficulty score set, the label quality score set and the image quality score set to the auxiliary weights; the auxiliary weights are averaged to obtain the final auxiliary weight of the image sample, after which the final auxiliary weights of the other image samples in the target image dataset are calculated; a threshold t is set, and the low-quality image samples are screened out of the target image dataset and discarded as follows: if the final auxiliary weight of an image sample is smaller than the threshold t, the quality of that image sample is considered low, and it is discarded from the target image dataset, yielding the cleaned target image dataset.
Step S3, an image sample is randomly selected from the cleaned target image dataset; the target model generates a prediction result for this image sample; the target weight and the sample loss of the image sample are calculated from the prediction result generated for the image sample and the real label corresponding to the image sample; the target model is then used to calculate the target weights and sample losses of all other image samples in the cleaned target image dataset; specifically, please refer to fig. 3:
step S3.1, calculating the target weight: for a randomly selected image sample of the cleaned target image dataset, based on the prediction result of the target model, the confidence scores of the predicted target bounding box, the predicted target mask and the predicted target edge are obtained, and the average of these three confidence scores is calculated; this average confidence score is combined with the evaluation index score to serve as the difficulty score of the image sample; the label quality score of the image sample is then calculated from the prediction result of the target model and the real label corresponding to the image sample, and the image quality score of the image sample is obtained from the no-reference quality evaluation index UNIQUE; the difficulty score, the label quality score and the image quality score of the image sample are fused to obtain the target weight of the image sample;
in the fusion, three hyper-parameters balance the respective contributions of the difficulty score, the label quality score and the image quality score to the target weight;
step S3.2, calculating the sample loss of the image sample, which combines three terms:
the classification loss of the target frame uses a categorical cross-entropy loss to measure the difference between the predicted target frame class and the real target frame class, computed from the true values of the target frame classes and the probability values of the classes predicted for the target frames; the regression loss of the target frame uses a mean absolute error loss to measure the difference between the predicted target frame and the real target frame, computed from the true values and the predicted values of the target frame position coordinates and normalized by the number of target frames; the segmentation loss of the target mask uses a pixel-level cross-entropy loss between the true values and the predicted values of the target mask, computed over all pixels of the image sample (given its width and height) and over the number of categories, with an additional parameter c.
Step S4, for each image sample in the cleaned target image dataset, the final auxiliary weight and the target weight obtained for that image sample are fused to obtain its sample weight, where a parameter adjusts the relative contributions of the target weight and the final auxiliary weight.
Step S5, for each image sample in the cleaned target image dataset, the sample loss obtained for that image sample is re-weighted by its sample weight to obtain a weighted sample loss, and the parameters of the target model are finally updated by gradient computation to complete one round of training.
s6, when the training set training is completed and all training rounds are completed, the target modelAfter training, each training round comprises a training process of steps S2-S5;
step S7, the verification set is input into the trained target model, and the segmentation result of each image instance to be segmented is output.
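The overall flow of steps S2-S7 can be summarized in the following Python sketch; it reuses the helpers sketched above (clean_dataset, sample_weight, weighted_update) and introduces the hypothetical helpers compute_aux_scores, compute_target_weight and compute_sample_loss, none of which comes from the patent itself.

```python
# Hedged end-to-end sketch of steps S2-S7; the three compute_* helpers are hypothetical.
def train_with_model_collaboration(target_model, optimizer, aux_models, target_data,
                                   val_data, epochs, t=0.5, lam=0.5):
    # Step S2: score every sample with the pre-trained auxiliary models and clean the dataset.
    score_sets = [compute_aux_scores(aux_models, sample) for sample in target_data]
    cleaned, final_weights = clean_dataset(target_data, score_sets, t=t)

    # Steps S3-S6: weighted training rounds on the cleaned target image dataset.
    for _ in range(epochs):
        for sample, w_aux in zip(cleaned, final_weights):
            w_target = compute_target_weight(target_model, sample)  # step S3: target weight
            loss = compute_sample_loss(target_model, sample)        # step S3: sample loss
            w = sample_weight(w_target, w_aux, lam=lam)              # step S4: weight fusion
            weighted_update(optimizer, loss, w)                      # step S5: weighted update

    # Step S7: run the trained target model on the verification set.
    return [target_model(x) for x in val_data]
```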
The data weighted learning method based on model collaboration was implemented in a concrete instance segmentation task. Two models, Mask R-CNN and PointRend, were chosen as auxiliary models, the YOLACT model was used as the target model, and experiments were performed on the COCO dataset; the results are shown in Table 1.
Table 1: experimental results of training the yolact example segmentation model under the CoCo dataset:
In Table 1, the first row shows the results obtained by training the YOLACT model directly on the COCO dataset. The second row shows the results obtained when only the difficulty score of the image samples is used as the sample score for the auxiliary models and the target model, the final sample weights are used to re-weight the loss of the target model, and the model is updated accordingly. The third row shows the results obtained when only the label quality score of the image samples is used as the sample score for the auxiliary models and the target model, again re-weighting the loss of the target model with the final sample weights before updating the model. The last row shows the results obtained when the difficulty score, the label quality score and the image quality score of the image samples are combined in a certain proportion as the sample score for the auxiliary models and the target model, and the final sample weights are used to re-weight the loss of the target model before updating the model.
It can be seen from the table that each of the sample score indices improves the segmentation performance of the model to some extent, which further verifies the effectiveness of the method.
The foregoing is only a preferred embodiment of the invention, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.