CN113269226B - Picture selection labeling method based on local and global information - Google Patents

Picture selection labeling method based on local and global information

Info

Publication number
CN113269226B
Authority
CN
China
Prior art keywords
picture
model
objects
information
budget
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110399472.9A
Other languages
Chinese (zh)
Other versions
CN113269226A (en)
Inventor
王魏
李文韬
陈攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110399472.9A
Publication of CN113269226A
Application granted
Publication of CN113269226B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention discloses a picture selection and labeling method based on local and global information, which lets the learning model automatically select part of the pictures to be labeled, so that a model that is as good as possible can be learned from as few labeled pictures as possible. To reduce the demand for picture labels, the method uses the feature extraction capability of a deep model to construct a feature representation space for the picture samples, and measures each sample's effect on the model update from its local information in this space. Meanwhile, the picture data space is divided into different regions based on the global information of the feature representation space, and the labeling budget is dynamically allocated according to the model's performance on these regions, so that the picture label information is used efficiently and the demand for picture labels is reduced.

Description

Picture selection and annotation method based on local and global information
Technical Field
The invention relates to a picture selection and labeling method based on local and global information, which uses the local and global information of a feature representation space to efficiently select the objects to be labeled in a picture database, so that a better picture classification model can be trained at a lower labeling cost. The invention belongs to the technical field of computer artificial-intelligence data analysis.
Background
With the continuous development of the internet, a large amount of picture data needs to be processed, such as face pictures in face recognition, road pictures in autonomous driving, and commodity pictures on e-commerce platforms. Because picture data have a complex structure, picture classification tasks are usually handled with deep models, but training a deep model requires a large number of labeled pictures, and labeling them consumes considerable manpower and material resources. To reduce the labeling cost and improve the utilization of labeled pictures, one solution is to let the model automatically select important pictures to be labeled and collect their labels for updating the model; this is the basic idea of selective labeling. Current selection methods mainly consider the uncertainty and the representativeness of the data. The lower the model's confidence in its prediction on a sample, the higher that sample's uncertainty; the norm of the sample's gradient can also be used to estimate uncertainty. Since uncertainty-based approaches consider only the uncertainty of individual samples, the model easily picks a batch of data that is highly uncertain but redundant. Considering the representativeness of the data alleviates this problem to some extent: a typical representativeness-based approach clusters the features of the data and selects the center point of each cluster as the representative of that cluster, so the distribution of the whole dataset can be described with only a small amount of data. However, because no information about the model guides this selection, the selected data do not necessarily help update the model.
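For concreteness, the two baseline criteria described above can be sketched as follows. This is an illustrative sketch only, not the invention's method; the function names and the specific choices of predictive entropy for uncertainty and nearest-to-center samples for representativeness are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def uncertainty_scores(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy of each softmax row; higher means less confident."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=1)

def representative_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Cluster the pool and return the index of the sample nearest each center."""
    km = KMeans(n_clusters=k, n_init=10).fit(feats)
    picks = []
    for j in range(k):
        members = np.where(km.labels_ == j)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[j], axis=1)
        picks.append(members[np.argmin(dists)])
    return np.asarray(picks)
```

The uncertainty scorer looks at each sample in isolation, which is exactly why a top-scoring batch can be redundant, while the representativeness picker ignores the model entirely; the invention combines the two views.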
Disclosure of Invention
The invention aims to address the problems and deficiencies in the prior art by providing a picture selection and labeling method based on local and global information. The method uses the local information of the picture feature representation space, combined with the model's predictions, to measure the information content of each picture, which avoids selecting similar or redundant pictures to a certain extent. Meanwhile, using the global information of the feature representation space, the picture data are divided into several clusters and the labeling budget is dynamically allocated according to the model's performance on the different clusters, further improving the utilization of picture labels and reducing the labeling cost. With the same number of labeled pictures, a model trained by this method performs better than one trained by a generic selection labeling method.
The technical scheme is as follows: a picture selection labeling method based on local and global information comprises the following contents:
First, the user creates a picture object library. A portion of the picture objects is then randomly selected from the library and their labels are obtained to form an initial training set. The user sets the structure of the deep model, the number of picture objects selected in each round, and the total number of iteration rounds.
Next, the deep learning model is trained on the training set and used to convert the picture objects in the library into feature representations, i.e., to extract the features of the pictures in the library. The output of the penultimate layer of the deep model is typically used as the feature representation of the corresponding picture object; the space composed of these feature representations is called the feature representation space.
Then, in the feature representation space, the information content of each object is estimated with the local information calculation method, and the labeling budget is allocated with the global information budget allocation method. Based on this budget, a batch of picture objects with high information content is selected and their labels are collected. The labeled and unlabeled picture object sets are updated; meanwhile, the deep model is retrained on the labeled picture object set, and the feature representations are re-extracted with the new model. These steps are iterated for the specified number of rounds, and the model of the last round is the final deep model.
Finally, in the prediction stage, the user inputs the picture object to be tested into the trained deep model, and the deep model returns the prediction result to the user.
Advantageous effects: compared with the prior art, the method combines the local and global information of the feature representation space. Considering the local information of each picture object avoids selecting redundant pictures, while allocating the budget on demand through the global information of the feature representation space improves the utilization of picture labels and reduces the labeling cost.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a local information computation method in the present invention;
FIG. 3 is a flowchart of a global information budget allocation method according to the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and are not intended to limit the scope of the invention; various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure, and these fall within the scope of the appended claims.
As shown in fig. 1, the method for selecting and labeling a picture based on local and global information includes the following steps:
Step 100: establish a picture object library as the data set, randomly select a small number of objects from the library, and obtain their labels to form an initial training set. The number of categories of data in the picture object library is denoted as C; L denotes the set of labeled picture objects, and U denotes the set of unlabeled picture objects;
Step 101: the user selects the deep model to be used, denoted f(·; Θ), where Θ = (W, θ) are the parameters of the model, W being the fully connected layer parameters and θ the other parameters of the model; the user also selects the number of samples B selected in each round and the total number of iteration rounds T;
Step 102: train the deep model with the labeled picture objects L, with the current round number t = 1;
Step 103: input the unlabeled picture objects into the deep model and extract each picture object's feature representation r_θ(x) and softmax-layer output f(x; Θ) from the model;
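As a minimal illustration of this step, the sketch below extracts penultimate-layer features and softmax outputs with PyTorch. The split of the network into a `backbone` module (everything up to the penultimate layer) and an `fc` module (the final fully connected layer), like all names here, is an assumption of the sketch rather than the patent's specification.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_features_and_probs(backbone, fc, loader, device="cpu"):
    """Return r_theta(x) (penultimate-layer output) and f(x; Theta)
    (softmax output) for every unlabeled picture object in `loader`."""
    backbone.eval(); fc.eval()
    feats, probs = [], []
    for x in loader:
        x = x.to(device)
        r = backbone(x).flatten(1)      # feature representation r_theta(x)
        p = F.softmax(fc(r), dim=1)     # softmax prediction f(x; Theta)
        feats.append(r.cpu())
        probs.append(p.cpu())
    return torch.cat(feats), torch.cat(probs)
```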
Step 104: estimate the amount of information each object provides to the model with the local information calculation method. As shown in FIG. 2, the specific steps are as follows:
Step 1041: the user selects the range ε of the local neighborhood;
Step 1042: for an unlabeled picture object x, the softmax layer outputs f(x; Θ) = (p_1, ..., p_C), where p_j is the predicted probability of class j, and the label predicted by the model f(x; Θ) is ŷ = argmax_j p_j. To increase robustness, probability smoothing is performed to obtain g(x; Θ) = (g(x; Θ)_1, ..., g(x; Θ)_C);
Step 1043: for unlabeled picture objects x and x' in U, compute the information amount I(x, x') based on the smoothed probabilities;
Step 1044: denote the neighborhood of picture object x as N(x) = {x' ∈ U : ||r_θ(x) − r_θ(x')|| ≤ ε}, where r_θ(x) is the feature representation of picture object x; the information content of picture object x is then obtained by aggregating I(x, x') over the neighbors x' ∈ N(x);
Step 1045: compute the information content I(x) for all unlabeled picture objects x ∈ U and output the results.
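The smoothing and information formulas appear only as equation images in the source, so the sketch below substitutes plausible stand-ins: uniform-mixture smoothing for step 1042, and a cross-entropy pairwise term averaged over the ε-neighborhood for steps 1043 and 1044. Treat it as one possible instantiation of the local information calculation, not the patent's exact formulas.

```python
import numpy as np

def smooth(probs: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Step 1042 (assumed form): mix each softmax vector with uniform."""
    C = probs.shape[1]
    return (1.0 - beta) * probs + beta / C

def local_information(feats: np.ndarray, probs: np.ndarray, eps: float) -> np.ndarray:
    """Steps 1043-1045: score each object by averaging a pairwise
    information term over its epsilon-neighborhood in feature space."""
    g = smooth(probs)
    pair_info = -g @ np.log(g).T                # I(x, x') as cross-entropy (assumed)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    scores = np.empty(len(feats))
    for i in range(len(feats)):
        nbrs = np.where(dists[i] <= eps)[0]     # N(x): epsilon-ball, includes x itself
        scores[i] = pair_info[i, nbrs].mean()
    return scores
```

Averaging over the neighborhood is what discourages redundancy: two nearby unlabeled objects share most of their neighbors, so picking one already accounts for much of the other's score.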
Step 105: with the global information budget allocation method, cluster the unlabeled data into C clusters in the feature representation space and allocate budgets (B_1, ..., B_C) across the clusters, where B_j is the labeling budget assigned to the j-th cluster. As shown in FIG. 3, the specific steps are as follows:
Step 1051: the user selects the temperature parameter τ of the Gibbs distribution;
Step 1052: cluster the feature representations of the unlabeled picture objects into C clusters with the kmeans++ method; the picture objects in the j-th cluster form the set U_j;
Step 1053: estimate the performance of the model on the different clusters, denoting the model's performance on the j-th cluster as γ_j;
Step 1054: from the γ_j, construct a Gibbs distribution over budgets α = (α_1, ..., α_C), where Σ_j α_j = 1 and τ is the temperature parameter used to adjust the smoothness of the Gibbs distribution;
Step 1055: sample B times from the Gibbs distribution α to obtain the budget (B_1, ..., B_C) allocated to each cluster and output it, where Σ_j B_j = B is the total labeling budget;
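A sketch of steps 1051 to 1055 follows. Because the source gives these formulas only as equation images, two choices below are assumptions: the Gibbs weights take the usual Boltzmann form α_j ∝ exp(−γ_j/τ), so clusters where the model performs worse receive more budget, and mean top-class probability stands in for the per-cluster performance estimator γ_j.

```python
import numpy as np
from sklearn.cluster import KMeans

def allocate_budget(feats, probs, B, C, tau, rng=None):
    """Steps 1051-1055: kmeans++ clustering, per-cluster performance,
    Gibbs weights, and B multinomial draws -> (B_1, ..., B_C)."""
    rng = rng or np.random.default_rng()
    labels = KMeans(n_clusters=C, init="k-means++", n_init=10).fit_predict(feats)
    # gamma_j: assumed performance proxy = mean top-class probability in cluster j
    gamma = np.array([probs[labels == j].max(axis=1).mean() for j in range(C)])
    weights = np.exp(-gamma / tau)        # lower performance -> larger weight
    alpha = weights / weights.sum()       # Gibbs distribution, sums to 1
    budgets = rng.multinomial(B, alpha)   # sample B times; sum_j B_j == B
    return labels, budgets
```

A large τ flattens α toward a uniform split of the budget, while a small τ concentrates the budget on the worst-performing clusters, which matches the stated role of τ as a smoothing control.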
Step 106: in each cluster j ∈ [C], according to the corresponding budget B_j, select the B_j picture objects with the highest information content, obtain their labels, and add them to the labeled object set L; update L and U, and retrain the deep model;
Step 107: if t < T, set t = t + 1 and go to step 103;
Step 108: take the model obtained in the T-th round of training as the final model; for an object to be tested, output the label predicted by the model.
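Tying the pieces together, the loop of steps 102 to 108 might look like the sketch below, reusing the helpers sketched earlier. The `pool`, `oracle`, and `train_model` objects and all attribute names are hypothetical scaffolding, not the patent's API.

```python
import numpy as np

def select_and_label(pool, oracle, train_model, B, T, C, eps, tau):
    """One possible driver for the whole procedure (steps 102-108)."""
    labeled, unlabeled = pool.initial_split()          # step 100
    model = train_model(labeled)                       # step 102
    for t in range(1, T + 1):                          # step 107: iterate T rounds
        feats, probs = extract_features_and_probs(     # step 103
            model.backbone, model.fc, unlabeled.loader())
        feats, probs = feats.numpy(), probs.numpy()
        scores = local_information(feats, probs, eps)  # step 104
        labels, budgets = allocate_budget(feats, probs, B, C, tau)  # step 105
        for j, B_j in enumerate(budgets):              # step 106: top-B_j per cluster
            members = np.where(labels == j)[0]
            top = members[np.argsort(scores[members])[::-1][:B_j]]
            labeled.add(unlabeled.take(top), oracle.annotate(top))
        model = train_model(labeled)                   # retrain on the enlarged set
    return model                                       # step 108: final model
```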

Claims (1)

1. A picture selection labeling method based on local and global information is characterized by comprising the following contents:
firstly, establishing a picture object library; then randomly selecting a part of the picture objects from the picture object library, obtaining their labels, and forming an initial training set; setting the structure of a deep model, the number of picture objects selected in each round, and the total number of iteration rounds;
secondly, training the deep learning model on the training set; converting the picture objects in the picture object library into feature representations with the deep model, namely extracting the features of the pictures in the picture object library; the space composed of these feature representations is called the feature representation space;
then, in the feature representation space, estimating the information content of each object according to a local information calculation method, and allocating a labeling budget according to a global information budget allocation method; based on the budget, selecting a batch of picture objects with high information content and collecting their labels; updating the labeled picture object set and the unlabeled picture object set; meanwhile, retraining the deep model with the labeled picture object set and re-extracting the feature representations of the picture objects with the new model; iterating these steps for the specified number of rounds; the model of the last round being the final deep model;
finally, in the prediction stage, the user inputs the picture object to be tested into the trained deep model, and the deep model returns the prediction result to the user;
recording the number of categories of data in the picture object library as C; L represents the set of labeled picture objects, and U represents the set of unlabeled picture objects; the selected deep model is denoted as f(·; Θ), where Θ = (W, θ) are the parameters of the model, W being the fully connected layer parameters and θ the other parameters of the model; the user selects the number of samples B selected in each round and the total number of iteration rounds T; the deep model is trained with the labeled picture objects L, with the current round number t = 1; the unlabeled picture objects are input into the deep model, and each picture object's feature representation r_θ(x) and softmax-layer output f(x; Θ) are extracted from the model;
the object information content is calculated using probability smoothing and local information, with the following specific steps:
step 1041, selecting the range ε of the local neighborhood;
step 1042, for an unlabeled picture object x, the softmax layer outputs f(x; Θ) = (p_1, ..., p_C), where p_j is the predicted probability of class j, and the label predicted by the model f(x; Θ) is ŷ = argmax_j p_j; probability smoothing is performed to obtain g(x; Θ) = (g(x; Θ)_1, ..., g(x; Θ)_C);
step 1043, for unlabeled picture objects x and x' in U, computing the information amount I(x, x') based on the smoothed probabilities;
step 1044, recording the neighborhood of picture object x as N(x) = {x' ∈ U : ||r_θ(x) − r_θ(x')|| ≤ ε}, where r_θ(x) is the feature representation of picture object x, the information content of picture object x being obtained by aggregating I(x, x') over the neighbors x' ∈ N(x);
step 1045, computing the information content I(x) for all unlabeled picture objects x ∈ U and outputting the results;
the unlabeled data are clustered into C clusters in the feature representation space according to the global information budget allocation method, and budgets (B_1, ..., B_C) are allocated across the clusters, where B_j is the labeling budget assigned to the j-th cluster; the specific steps are as follows:
step 1051, selecting the temperature parameter τ of the Gibbs distribution by the user;
step 1052, clustering the feature representations of the unlabeled picture objects into C clusters with the kmeans++ method, the picture objects in the j-th cluster forming the set U_j;
step 1053, estimating the performance of the model on the different clusters, the model's performance on the j-th cluster being denoted γ_j;
step 1054, constructing from γ_j a Gibbs distribution over budgets α = (α_1, ..., α_C), where Σ_j α_j = 1 and τ is the temperature parameter used to adjust the smoothness of the Gibbs distribution;
step 1055, sampling B times from the Gibbs distribution α to obtain the budget (B_1, ..., B_C) allocated to each cluster and outputting it, where Σ_j B_j = B is the total labeling budget.
CN202110399472.9A (priority date 2021-04-14, filing date 2021-04-14) Picture selection labeling method based on local and global information. Active. Granted as CN113269226B (en).

Priority Applications (1)

Application Number: CN202110399472.9A · Priority/Filing Date: 2021-04-14 · Title: Picture selection labeling method based on local and global information (granted as CN113269226B (en))

Applications Claiming Priority (1)

Application Number: CN202110399472.9A · Priority/Filing Date: 2021-04-14 · Title: Picture selection labeling method based on local and global information (granted as CN113269226B (en))

Publications (2)

Publication Number Publication Date
CN113269226A CN113269226A (en) 2021-08-17
CN113269226B (en) 2022-09-23

Family

ID=77229077

Family Applications (1)

Application Number: CN202110399472.9A · Status: Active · Title: Picture selection labeling method based on local and global information (granted as CN113269226B (en))

Country Status (1)

Country Link
CN (1) CN113269226B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452087B2 (en) * 2009-09-30 2013-05-28 Microsoft Corporation Image selection techniques
CN106934055B (en) * 2017-03-20 2020-05-19 南京大学 Semi-supervised webpage automatic classification method based on insufficient modal information
US11003892B2 (en) * 2018-11-09 2021-05-11 Sap Se Landmark-free face attribute prediction
CN111177384B (en) * 2019-12-25 2023-01-20 南京理工大学 Multi-mark Chinese emotion marking method based on global and local mark correlation
CN112434736A (en) * 2020-11-24 2021-03-02 成都潜在人工智能科技有限公司 Deep active learning text classification method based on pre-training model

Also Published As

Publication number Publication date
CN113269226A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN111191732B (en) Target detection method based on full-automatic learning
CN108985334B (en) General object detection system and method for improving active learning based on self-supervision process
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
CN112347970B (en) Remote sensing image ground object identification method based on graph convolution neural network
CN109902761B (en) Fishing situation prediction method based on marine environment factor fusion and deep learning
CN113223042B (en) Intelligent acquisition method and equipment for remote sensing image deep learning sample
CN110738132B (en) Target detection quality blind evaluation method with discriminant perception capability
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN116403058B (en) Remote sensing cross-scene multispectral laser radar point cloud classification method
CN115292532B (en) Remote sensing image domain adaptive retrieval method based on pseudo tag consistency learning
CN115471739A (en) Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning
CN114863091A (en) Target detection training method based on pseudo label
CN111239137B (en) Grain quality detection method based on transfer learning and adaptive deep convolution neural network
CN110245723A (en) A kind of safe and reliable image classification semi-supervised learning method and device
JP2009259109A (en) Device, program and method for labeling, and recording medium recording labeling program
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN117572457A (en) Cross-scene multispectral point cloud classification method based on pseudo tag learning
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN113869418A (en) Small sample ship target identification method based on global attention relationship network
CN113034511A (en) Rural building identification algorithm based on high-resolution remote sensing image and deep learning
CN113269226B (en) Picture selection labeling method based on local and global information
Sun et al. Automatic building age prediction from street view images
CN111783788B (en) Multi-label classification method facing label noise
CN116012840B (en) Three-dimensional point cloud semantic segmentation labeling method based on active learning and semi-supervision

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant