CN113379037B - Partial multi-mark learning method based on complementary mark cooperative training - Google Patents

Partial multi-mark learning method based on complementary mark cooperative training

Info

Publication number
CN113379037B
CN113379037B (application CN202110717550.5A)
Authority
CN
China
Prior art keywords
mark
neural network
candidate
loss
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110717550.5A
Other languages
Chinese (zh)
Other versions
CN113379037A (en)
Inventor
张珍茹 (Zhang Zhenru)
张敏灵 (Zhang Min-Ling)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110717550.5A priority Critical patent/CN113379037B/en
Publication of CN113379037A publication Critical patent/CN113379037A/en
Application granted granted Critical
Publication of CN113379037B publication Critical patent/CN113379037B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a partial multi-label learning method based on co-training with complementary labels, which addresses the problem that noisy labels exist in the training data of multi-label scenarios and improves the performance of the classification model. The method co-trains two neural networks: one network learns only from the candidate label set, while the other learns only from the non-candidate label set, i.e., the complementary labels. Specifically, within each batch of data, each network selects the samples with small loss to guide the parameter update of the other network, and finally the outputs of the two networks are combined by weight to assign confidences to the labels of the batch. By learning from the candidate labels and the non-candidate labels separately and considering the two networks jointly, the influence of noise in the candidate labels on model performance is reduced, yielding a robust classification model in the partial multi-label learning setting.

Description

Partial multi-mark learning method based on complementary mark cooperative training
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a partial multi-mark learning method based on complementary mark cooperative training.
Background
Partial multi-label learning is a novel weakly supervised learning framework in which each training sample is associated with a set of candidate labels, several of which are true labels while the remainder are false (noisy) labels; it differs from partial label learning in the number of true labels, since more than one candidate label may be true.
Current approaches to partial multi-label learning typically estimate the confidence that each candidate label is a true label. For example, the PML-fp and PML-lc methods initialize the label confidences, add them as weights to a ranking loss, and obtain the label relevance ranking by minimizing that loss with alternating optimization; however, if a candidate label's confidence is estimated incorrectly, the error propagates through the alternating optimization and degrades the model. The fPML method uses low-rank matrix approximation to exploit implicit dependencies between labels and features, identifying noisy labels and training a multi-label classifier. The PML-LRS method applies low-rank and sparse decomposition to split the candidate label matrix into a ground-truth label matrix and an irrelevant label matrix, thereby reducing the influence of noisy labels. Another family of methods adopts a two-stage strategy that decomposes the task into label-confidence estimation and predictive-model induction: the first stage estimates the confidence of each candidate label through iterative label propagation and selects high-confidence credible labels, and the second stage trains a multi-label model on those credible labels, so the label selection of the first stage strongly affects the subsequent model.
These methods rely on traditional machine learning and cannot be scaled to large datasets, which greatly limits their extension to new domains and iterative development; moreover, existing methods depend on the label confidences of a single model, so any estimation error accumulates over the iterations and degrades the performance of subsequent models; in addition, label noise in multi-label learning generally leads to reduced model performance.
Disclosure of Invention
In order to solve the above problems, the invention discloses a partial multi-label learning method based on co-training with complementary labels. Two neural networks are trained collaboratively so that the model gradually updates the confidences of the candidate labels during iteration, paying more attention to the true labels, reducing the influence of noisy labels, and improving the performance of the learned model.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a partial multi-mark learning method based on complementary mark cooperative training comprises the following specific steps:
(1) The partial multi-mark training set is used as input;
(2) The neural network f calculates candidate marker loss, and selects a plurality of samples with the minimum loss as knowledge to be provided to the neural network g;
(3) The neural network g calculates the compensation mark loss, and selects a plurality of samples with the minimum loss as knowledge to be provided to the neural network f;
(4) Calculating the loss between the output of the sample selected by the neural network g in the neural network f and the target mark to update the parameters of the neural network f;
(5) Calculating the loss between the output of the sample selected by the neural network f in the neural network g and the target mark to update the parameters of the neural network g;
(6) Combining the outputs of the neural network f and the neural network g as a label confidence;
(7) Iteratively optimizing a neural network;
(8) Performing multi-mark prediction on the test data according to a threshold value;
(9) Multi-mark index evaluation;
(10) Submitting the result to manual sampling review;
(11) And (5) sending a training set, and iterating the process.
Further, step (1) prepares the partial multi-label training data, specifically as follows:
Training data are obtained from any multi-label application scenario, such as images, audio, or text: D = {(x_i, S_i) | 1 ≤ i ≤ m}. Let X = R^d denote the d-dimensional feature space and Y = {y_1, y_2, ..., y_q} denote the label space containing q labels; x_i ∈ X is a d-dimensional feature vector, S_i ⊆ Y is the candidate label set of x_i, and the complementary label set of x_i is its complement Y \ S_i. The true label set Ỹ_i is hidden in the candidate label set, i.e., Ỹ_i ⊆ S_i. The model f: X → 2^Y will be learned from D.
Further, step (2) computes the candidate-label loss.
Since the true labels are not known, the candidate labels are used directly to guide the learning of neural network f, with a confidence-weighted cross-entropy loss in which s_k denotes the confidence of candidate label y_k and p_k denotes the probability, computed with a softmax activation, that the model assigns the sample to class y_k. For candidate labels, network f prefers the output probability p_k to be as high as possible. After computing the loss for a batch of samples, the samples are sorted by loss from small to large and the subset with small loss, denoted D̂_f, is selected; the selected proportion is controlled by a schedule R(T), where T is the current epoch of the neural network, T_max is the preset maximum number of epochs, and η is the learning rate. Because noisy labels exist among the candidate labels, network f gradually overfits them as the number of epochs grows; at the same time, neural networks exhibit a memorization effect and learn simple, clean patterns first. The selected proportion R(T) therefore decreases as the epoch increases, i.e., the knowledge learned early by the model is more trustworthy.
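A plausible form of this loss and of the selection schedule, consistent with the symbols defined above (the exact expression of R(T) in particular is an assumption rather than the literal formula of the patent), is:

```latex
% Confidence-weighted cross-entropy over candidate labels (plausible reconstruction)
\ell_f(x_i) = -\sum_{k=1}^{q} s_k \log p_k ,
\qquad p_k = \mathrm{softmax}_k\bigl(f(x_i)\bigr)

% Assumed small-loss selection schedule, decreasing with the epoch T
R(T) = 1 - \eta \cdot \min\!\left(\frac{T}{T_{\max}},\, 1\right)
```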
Further, step (3) computes the complementary-label loss.
Neural network g learns only from the complementary labels. In its loss, s̄_k indicates whether y_k is a complementary label, taking the value 1 if it is and 0 otherwise. For complementary labels, network g prefers the output probability p_k to be as close to 0 as possible. Similarly, after computing the loss for a batch of samples, network g sorts them from small to large and selects the proportion R(T) with the smallest loss, denoted D̂_g.
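A plausible form of this loss, consistent with the description above (the output probability on complementary labels should be close to 0), is the following; it is an assumption rather than the literal formula of the patent:

```latex
% Plausible complementary-label loss of network g
\ell_g(x_i) = -\sum_{k=1}^{q} \bar{s}_k \log\bigl(1 - p_k\bigr),
\qquad
\bar{s}_k = \begin{cases} 1, & y_k \notin S_i \\ 0, & y_k \in S_i \end{cases}
```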
Further, steps (4) and (5) update the neural network parameters.
The samples D̂_f selected by neural network f help neural network g update its parameters, and likewise the samples D̂_g selected by neural network g help neural network f update its parameters; that is, each network computes the loss within its own network on the samples that the peer network considers reliable and uses it to adjust its weights.
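A plausible form of the resulting cross-update objectives, under the assumption that each network averages its own loss over the small-loss subset chosen by its peer, is:

```latex
% Plausible cross-update losses of the two networks
\mathcal{L}_f = \frac{1}{|\hat{D}_g|}\sum_{x_i \in \hat{D}_g} \ell_f(x_i),
\qquad
\mathcal{L}_g = \frac{1}{|\hat{D}_f|}\sum_{x_i \in \hat{D}_f} \ell_g(x_i)
```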
further, step (6) updates the label confidence
Sample candidate marker setThe initial confidence level for each marker in the pool is 1 and will vary with model training. Specifically, the output results of the neural networks f, g on each batch of samples are combined by weight and normalized to be the new label confidence. For each sample x in a batch i The label confidence update formula is as follows:
where α is a balance parameter controlling the proportion of information inherited from the neural networks f, g, respectively. In addition, the method only updates the confidence coefficient of the candidate mark, and the supplementary mark explicitly indicates that the sample does not belong to the sample, and the confidence coefficient is always 0.
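A plausible form of this update, assuming the weighted outputs are restricted to the candidate labels and renormalized over them, is:

```latex
% Plausible confidence update for candidate labels of sample x_i
s_k \leftarrow \frac{\bigl(\alpha\, p^{f}_k + (1-\alpha)\, p^{g}_k\bigr)\,\mathbb{1}[y_k \in S_i]}
{\sum_{j:\, y_j \in S_i} \bigl(\alpha\, p^{f}_j + (1-\alpha)\, p^{g}_j\bigr)}
```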
Further, step (8) performs multi-label prediction.
After model training is completed, multi-label prediction is performed on a test instance x_i* by thresholding the combined outputs of the two networks.
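A plausible form of the prediction rule, where the threshold τ is an assumed parameter, is:

```latex
% Plausible thresholded prediction for a test instance x_i^{*}
\hat{Y}_i = \bigl\{\, y_k \;\big|\; \alpha\, p^{f}_k(x_i^{*}) + (1-\alpha)\, p^{g}_k(x_i^{*}) \ge \tau \,\bigr\}
```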
The beneficial effects of the invention are as follows:
the partial multi-mark learning method based on the complementary mark collaborative training aims at the problems that the multi-mark description is carried out in training data under a machine learning scene and the mark space contains noise, and improves the performance of a learning model. The method uses two neural networks simultaneously for collaborative training, wherein one network only learns from a candidate mark set, and the other network only learns from a non-candidate mark set, namely a complementary mark, and the two networks mutually learn and update parameters in each batch. In addition, the method considers the output of the two models, so that the models can gradually update the confidence coefficient of the candidate mark in iteration, thereby giving higher attention to the real mark and reducing the influence of the noise mark.
Drawings
FIG. 1 is a system framework diagram of the present invention.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention.
As shown in the figure, the partial multi-label learning method based on co-training with complementary labels uses two neural networks that guide each other's learning. One neural network learns from the candidate label sets in the training data as its guiding information, but because noisy labels exist in the candidate sets, the model tends to overfit the noise and its performance degrades. The method therefore employs another neural network that learns from the non-candidate label sets, i.e., the complementary labels; the information provided by the complementary labels is unambiguous and can guide the learning of the other model. The label confidences are then updated, where a larger confidence indicates a higher likelihood that the label is a true label. The specific steps are as follows:
(1) Take the partial multi-label training set as input. Training data are obtained from any multi-label application scenario, such as images, audio, or text: D = {(x_i, S_i) | 1 ≤ i ≤ m}. Let X = R^d denote the d-dimensional feature space and Y = {y_1, y_2, ..., y_q} denote the label space containing q labels; x_i ∈ X is a d-dimensional feature vector, S_i ⊆ Y is the candidate label set of x_i, and the complementary label set of x_i is its complement Y \ S_i. The true label set Ỹ_i is hidden in the candidate label set, i.e., Ỹ_i ⊆ S_i. The model f: X → 2^Y will be learned from D.
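As a minimal illustration (not taken from the patent), a batch of such partial multi-label data can be represented with a feature matrix, a binary candidate-label mask whose complement gives the complementary labels, and per-label confidences initialized to 1 on the candidates; the dimensions below are hypothetical:

```python
import numpy as np

m, d, q = 3, 4, 5                                  # samples, feature dim, number of labels
X = np.random.randn(m, d).astype(np.float32)       # feature vectors x_i
Y_cand = np.array([[1, 1, 0, 1, 0],                # candidate label sets S_i (1 = candidate)
                   [0, 1, 1, 0, 0],
                   [1, 0, 0, 1, 1]], dtype=np.float32)
Y_comp = 1.0 - Y_cand                               # complementary (non-candidate) label sets
conf = Y_cand.copy()                                # initial confidences: 1 on candidates, 0 elsewhere

print(X.shape, Y_cand.sum(axis=1), Y_comp.sum(axis=1))
```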
(2) Neural network f computes the candidate-label loss and selects the samples with the smallest loss as knowledge to provide to neural network g.
Since the true labels are not known, the candidate labels are used directly to guide the learning of neural network f, with a confidence-weighted cross-entropy loss in which s_k denotes the confidence of candidate label y_k and p_k denotes the probability, computed with a softmax activation, that the model assigns the sample to class y_k. For candidate labels, network f prefers the output probability p_k to be as high as possible. After computing the loss for a batch of samples, the samples are sorted by loss from small to large and the subset with small loss, denoted D̂_f, is selected; the selected proportion is controlled by the schedule R(T), where T is the current epoch of the neural network, T_max is the preset maximum number of epochs, and η is the learning rate. Because noisy labels exist among the candidate labels, network f gradually overfits them as the number of epochs grows; at the same time, neural networks exhibit a memorization effect and learn simple, clean patterns first. The selected proportion R(T) therefore decreases as the epoch increases, i.e., the knowledge learned early by the model is more trustworthy.
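A hedged PyTorch sketch of this step is given below: a confidence-weighted cross-entropy over softmax outputs and a small-loss selection ratio that decays with the epoch. The linear decay and the role of `eta` are assumptions, since the exact schedule is not reproduced in this text:

```python
import torch
import torch.nn.functional as F

def candidate_loss(logits, conf):
    """Per-sample loss  -sum_k s_k * log p_k  with p = softmax(logits)."""
    log_p = F.log_softmax(logits, dim=1)
    return -(conf * log_p).sum(dim=1)                  # shape: (batch,)

def select_ratio(T, T_max, eta=0.5):
    """Assumed form of R(T): starts at 1 and decays linearly to 1 - eta."""
    return 1.0 - eta * min(T / T_max, 1.0)

# small-loss selection within a batch (toy tensors)
logits = torch.randn(8, 5)                             # outputs of network f
conf = torch.rand(8, 5)                                # candidate-label confidences s_k
losses = candidate_loss(logits, conf)
k = max(1, int(select_ratio(T=3, T_max=10) * len(losses)))
small_loss_idx = torch.argsort(losses)[:k]             # samples passed on to network g
```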
(3) Neural network g computes the complementary-label loss and selects the samples with the smallest loss as knowledge to provide to neural network f.
Neural network g learns only from the complementary labels. In its loss, s̄_k indicates whether y_k is a complementary label, taking the value 1 if it is and 0 otherwise. For complementary labels, network g prefers the output probability p_k to be as close to 0 as possible. Similarly, after computing the loss for a batch of samples, network g sorts them from small to large and selects the proportion R(T) with the smallest loss, denoted D̂_g.
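A hedged sketch of this complementary-label loss is shown below; the form -Σ_k s̄_k·log(1 − p_k) is an assumption consistent with the statement that p_k should be close to 0 on complementary labels, not the patent's literal formula:

```python
import torch
import torch.nn.functional as F

def complementary_loss(logits, comp_mask):
    """Per-sample loss penalising probability mass on complementary labels."""
    p = F.softmax(logits, dim=1)
    return -(comp_mask * torch.log(1.0 - p + 1e-12)).sum(dim=1)   # shape: (batch,)
```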
(4) Compute the loss between the outputs of neural network f on the samples selected by neural network g and the target labels, and update the parameters of neural network f.
(5) Compute the loss between the outputs of neural network g on the samples selected by neural network f and the target labels, and update the parameters of neural network g.
The samples D̂_f selected by neural network f help neural network g update its parameters, and likewise the samples D̂_g selected by neural network g help neural network f update its parameters; that is, each network computes the loss within its own network on the samples that the peer network considers reliable and uses it to adjust its weights.
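A hedged sketch of one co-training step follows, reusing the candidate_loss, complementary_loss and select_ratio helpers sketched above; the way the small-loss indices are exchanged between the two optimizers is an illustrative assumption about how the update can be organised:

```python
import torch

def cotrain_step(f, g, opt_f, opt_g, x, conf, comp_mask, T, T_max):
    # per-sample losses of each network on the whole batch
    loss_f = candidate_loss(f(x), conf)
    loss_g = complementary_loss(g(x), comp_mask)

    # each network selects its small-loss samples as "knowledge" for the peer
    k = max(1, int(select_ratio(T, T_max) * x.size(0)))
    idx_f = torch.argsort(loss_f)[:k]          # trusted by f -> used to train g
    idx_g = torch.argsort(loss_g)[:k]          # trusted by g -> used to train f

    # update f on the samples selected by g
    opt_f.zero_grad()
    candidate_loss(f(x[idx_g]), conf[idx_g]).mean().backward()
    opt_f.step()

    # update g on the samples selected by f
    opt_g.zero_grad()
    complementary_loss(g(x[idx_f]), comp_mask[idx_f]).mean().backward()
    opt_g.step()
```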
(6) Combine the outputs of neural network f and neural network g as the label confidences.
Each label in the candidate label set of a sample has an initial confidence of 1, which changes as the model is trained. Specifically, the outputs of neural networks f and g on each batch of samples are combined by weight and normalized to give the new label confidences. For each sample x_i in a batch, the confidences are updated with a balance parameter α that controls the proportion of information inherited from networks f and g respectively. In addition, the method only updates the confidences of candidate labels; a complementary label explicitly indicates that the label does not belong to the sample, so its confidence remains 0.
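A hedged sketch of the confidence update: combine the two softmax outputs with weight α, zero out non-candidate positions, and renormalise over the candidates. The exact combination rule is an assumption consistent with the description:

```python
import torch
import torch.nn.functional as F

def update_confidence(f_logits, g_logits, cand_mask, alpha=0.5):
    p_f = F.softmax(f_logits, dim=1)
    p_g = F.softmax(g_logits, dim=1)
    mixed = (alpha * p_f + (1.0 - alpha) * p_g) * cand_mask   # complementary labels stay 0
    return mixed / mixed.sum(dim=1, keepdim=True).clamp_min(1e-12)
```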
(7) Iteratively optimize the neural networks.
(8) Perform multi-label prediction on the test data according to a threshold.
After model training is completed, multi-label prediction is performed on a test instance x_i* by thresholding the combined outputs of the two networks.
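A hedged sketch of the prediction step, where the combination weight and the threshold value are assumed parameters rather than values fixed by the patent:

```python
import torch
import torch.nn.functional as F

def predict(f, g, x_test, alpha=0.5, threshold=0.5):
    with torch.no_grad():
        p = alpha * F.softmax(f(x_test), dim=1) + (1 - alpha) * F.softmax(g(x_test), dim=1)
    return (p >= threshold).int()          # 1 = label predicted relevant for the test instance
```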
(9) Evaluate with multi-label metrics.
(10) Submit the results for manual sampling review.
(11) Feed back into the training set and iterate the above process.
The invention provides a partial multi-label learning scheme that can learn directly from partial multi-label data, improving the performance of the classification model and reducing the cost of manually screening and cleaning the data.
In addition, the loss computation for the candidate labels and the complementary labels is based mainly on the cross-entropy loss function, but other loss formulations are possible. For the collaborative-training scheme, a divergence-based variant can also be considered: if the two models do not disagree on the prediction for a sample, the sample is given a higher weight; if they disagree strongly, the sample's weight is reduced so that it is learned later, or a third model is introduced to act as a "referee". Such modifications are intended to fall within the scope of the present invention.
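A hedged sketch of the agreement-based weighting just mentioned; the specific rule mapping disagreement to a weight is an assumption:

```python
import torch
import torch.nn.functional as F

def agreement_weights(f_logits, g_logits):
    """Larger weight for samples on which the two networks agree."""
    p_f = F.softmax(f_logits, dim=1)
    p_g = F.softmax(g_logits, dim=1)
    divergence = (p_f - p_g).abs().sum(dim=1)      # per-sample L1 disagreement
    return 1.0 / (1.0 + divergence)                 # weight in (0, 1], 1 when identical
```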

Claims (1)

1. A partial multi-label learning method based on co-training with complementary labels, characterized in that the method comprises the following specific steps:
(1) A partial multi-label training set is used as input, the training set being training data of images, audio, or text;
specifically:
training data are obtained from any partial multi-label application scenario: D = {(x_i, S_i) | 1 ≤ i ≤ m}; let X = R^d denote the d-dimensional feature space and Y = {y_1, y_2, ..., y_q} denote the label space containing q labels; x_i ∈ X is a d-dimensional feature vector, S_i ⊆ Y is the candidate label set of x_i, and the complementary label set of x_i is Y \ S_i; the true label set Ỹ_i is hidden in the candidate label set, i.e., Ỹ_i ⊆ S_i; the model f: X → 2^Y is learned from D;
(2) Neural network f computes the candidate-label loss and selects the samples with the smallest loss as knowledge to provide to neural network g;
specifically:
since the true labels are not known, the candidate labels are used directly to guide the learning of neural network f, with a confidence-weighted cross-entropy loss in which s_k denotes the confidence of candidate label y_k and p_k denotes the probability, computed with a softmax activation, that the model assigns the sample to class y_k; for candidate labels, network f prefers the output probability p_k to be as high as possible; after computing the loss for a batch of samples, the samples are sorted by loss from small to large and the subset with small loss, denoted D̂_f, is selected; the selected proportion is controlled by the schedule R(T), where T is the current epoch of the neural network, T_max is the preset maximum number of epochs, and η is the learning rate; because noisy labels exist among the candidate labels, network f gradually overfits them as the number of epochs grows; at the same time, neural networks exhibit a memorization effect and learn simple, clean patterns first; the selected proportion R(T) therefore decreases as the epoch increases, i.e., the knowledge learned early by the model is more trustworthy;
(3) Neural network g computes the complementary-label loss and selects the samples with the smallest loss as knowledge to provide to neural network f; the complementary labels are the set of non-candidate labels;
specifically:
neural network g learns only from the complementary labels; in its loss, s̄_k indicates whether y_k is a complementary label, taking the value 1 if it is and 0 otherwise; for complementary labels, network g prefers the output probability p_k to be as close to 0 as possible; similarly, after computing the loss for a batch of samples, network g sorts them from small to large and selects the proportion R(T) with the smallest loss, denoted D̂_g;
(4) Compute the loss between the outputs of neural network f on the samples selected by neural network g and the target labels, and update the parameters of neural network f;
the samples D̂_f selected by neural network f help neural network g update its parameters; the loss is computed within the own network on the samples that the peer network considers reliable and is used to adjust the parameter weights;
(5) Compute the loss between the outputs of neural network g on the samples selected by neural network f and the target labels, and update the parameters of neural network g;
the samples D̂_g selected by neural network g likewise help neural network f update its parameters; the loss is computed within the own network on the samples that the peer network considers reliable and is used to adjust the parameter weights;
(6) Combine the outputs of neural network f and neural network g as the label confidences;
specifically:
each label in the candidate label set of a sample has an initial confidence of 1, which changes as the model is trained; specifically, the outputs of neural networks f and g on each batch of samples are combined by weight and normalized to give the new label confidences; for each sample x_i in a batch, the confidences are updated with a balance parameter α that controls the proportion of information inherited from networks f and g respectively; in addition, the method only updates the confidences of candidate labels, and a complementary label explicitly indicates that the label does not belong to the sample, so its confidence remains 0;
(7) Iteratively optimize the neural networks;
(8) Perform multi-label prediction on the test data according to a threshold;
specifically:
after model training is completed, multi-label prediction is performed on a test instance x_i* by thresholding the combined outputs of the two networks;
(9) Evaluate with multi-label metrics;
(10) Submit the results for manual sampling review;
(11) Feed back into the training set and iterate the above process.
CN202110717550.5A 2021-06-28 2021-06-28 Partial multi-mark learning method based on complementary mark cooperative training Active CN113379037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110717550.5A CN113379037B (en) 2021-06-28 2021-06-28 Partial multi-mark learning method based on complementary mark cooperative training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110717550.5A CN113379037B (en) 2021-06-28 2021-06-28 Partial multi-mark learning method based on complementary mark cooperative training

Publications (2)

Publication Number Publication Date
CN113379037A CN113379037A (en) 2021-09-10
CN113379037B (en) 2023-11-10

Family

ID=77579543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110717550.5A Active CN113379037B (en) 2021-06-28 2021-06-28 Partial multi-mark learning method based on complementary mark cooperative training

Country Status (1)

Country Link
CN (1) CN113379037B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331065B (en) * 2022-10-13 2023-03-24 南京航空航天大学 Robust noise multi-label image learning method based on decoder iterative screening

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN111581468A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method based on noise tolerance
CN111582506A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method based on global and local label relation
CN111581466A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method for characteristic information with noise
CN112465016A (en) * 2020-11-25 2021-03-09 上海海事大学 Partial multi-mark learning method based on optimal distance between two adjacent marks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11663280B2 (en) * 2019-10-15 2023-05-30 Home Depot Product Authority, Llc Search engine using joint learning for multi-label classification

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580496A (en) * 2019-07-11 2019-12-17 南京邮电大学 Deep migration learning system and method based on entropy minimization
CN111581468A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method based on noise tolerance
CN111582506A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method based on global and local label relation
CN111581466A (en) * 2020-05-15 2020-08-25 北京交通大学 Multi-label learning method for characteristic information with noise
CN112465016A (en) * 2020-11-25 2021-03-09 上海海事大学 Partial multi-mark learning method based on optimal distance between two adjacent marks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Addressing Imbalance in Multi-Label Classification Using Weighted Cross Entropy Loss Function; Mohammad Reza Rezaei-Dastjerdehei et al.; 2020 27th National and 5th International Iranian Conference on Biomedical Engineering (ICBME); pp. 333-338 *
A multi-label classification algorithm using association rule mining; Liu Junyu et al.; Journal of Software; pp. 2865-2878 *

Also Published As

Publication number Publication date
CN113379037A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113255822B (en) Double knowledge distillation method for image retrieval
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN112001422B (en) Image mark estimation method based on deep Bayesian learning
US20230134531A1 (en) Method and system for rapid retrieval of target images based on artificial intelligence
CN113269239B (en) Relation network node classification method based on multichannel convolutional neural network
CN111325264A (en) Multi-label data classification method based on entropy
CN109214407A (en) Event detection model, calculates equipment and storage medium at method, apparatus
CN108596204B (en) Improved SCDAE-based semi-supervised modulation mode classification model method
CN113379037B (en) Partial multi-mark learning method based on complementary mark cooperative training
CN113392967A (en) Training method of domain confrontation neural network
CN111797935B (en) Semi-supervised depth network picture classification method based on group intelligence
CN117350330A (en) Semi-supervised entity alignment method based on hybrid teaching
CN114972959B (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
CN112348108A (en) Sample labeling method based on crowdsourcing mode
CN107247996A (en) A kind of Active Learning Method applied to different distributed data environment
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN116031879A (en) Hybrid intelligent feature selection method suitable for transient voltage stability evaluation of power system
CN111783788B (en) Multi-label classification method facing label noise
CN114997175A (en) Emotion analysis method based on field confrontation training
CN114595695A (en) Self-training model construction method for few-sample intention recognition system
CN114170461A (en) Teacher-student framework image classification method containing noise labels based on feature space reorganization
CN114220086A (en) Cost-efficient scene character detection method and system
CN114154582A (en) Deep reinforcement learning method based on environment dynamic decomposition model
CN112270334A (en) Few-sample image classification method and system based on abnormal point exposure
CN112161621B (en) Model-free auxiliary navigation adaptive area selection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant