CN116385466B - Method and system for segmenting targets in images based on bounding box weak annotation

Method and system for segmenting targets in images based on bounding box weak annotation

Info

Publication number
CN116385466B
CN116385466B (application CN202310494738.7A)
Authority
CN
China
Prior art keywords
pseudo
segmentation
pixel
labels
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310494738.7A
Other languages
Chinese (zh)
Other versions
CN116385466A (en)
Inventor
黄小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN202310494738.7A priority Critical patent/CN116385466B/en
Publication of CN116385466A publication Critical patent/CN116385466A/en
Application granted granted Critical
Publication of CN116385466B publication Critical patent/CN116385466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for segmenting targets in images based on bounding box weak annotation, relating to the technical field of image segmentation and comprising the following steps: acquiring an image dataset and annotating it with bounding boxes; generating pixel-level pseudo labels with confidence on the basis of the bounding box annotations and training a target segmentation model with the pixel-level pseudo labels; iteratively optimizing the pixel-level pseudo labels through cross-validation and training the target segmentation model with the optimized labels; and outputting the optimal model. Compared with pixel-level manual annotation, the method requires no time-consuming and labor-intensive pixel-level labels, uses only simple bounding box annotations, and thereby saves manpower and material resources. The segmentation performance of the model is improved by learning the segmentation-map and bounding box tasks simultaneously and by fusing the segmentation maps of the global and local scales. Through a cross-validation pseudo-label noise detection method, erroneous labels among the pseudo labels are detected, their influence on model training is reduced, and the accuracy of target segmentation in images is improved.

Description

Method and system for segmenting targets in images based on bounding box weak annotation
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a method and a system for segmenting targets in an image based on bounding box weak annotation.
Background
At present, target segmentation in images is widely used in image content understanding and scene analysis tasks and has broad application value. In recent years, deep convolutional neural networks have greatly improved the performance of target segmentation. Among existing methods, the best-performing ones are fully supervised deep learning methods, but they require large amounts of manually annotated pixel-level data, so the annotation cost is prohibitive and the methods are hard to generalize to other application scenarios.
Therefore, providing a target segmentation method based on bounding box weak annotation, which requires no time-consuming pixel-level annotation data, needs only the bounding boxes of the targets in each image, and finally learns a target segmentation model from those bounding boxes to perform target segmentation, is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a method and a system for segmenting targets in an image based on bounding box weak annotation. The method requires no time-consuming pixel-level annotation data; it only requires the bounding boxes of the targets in each image to be annotated, then generates multi-level pseudo labels, optimizes the pseudo labels through a neural network, and finally learns a target segmentation model. To achieve the above purpose, the invention adopts the following technical scheme:
A method for segmenting targets in an image based on bounding box weak annotation comprises the following steps:
acquiring an image dataset and annotating it with bounding boxes;
generating pixel-level pseudo labels with confidence based on the bounding box annotations;
training a target segmentation model with the pixel-level pseudo labels;
iteratively optimizing the pixel-level pseudo labels through cross-validation, and training the target segmentation model with the optimized pseudo labels to obtain an optimal model;
inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
Optionally, annotating the image dataset with bounding boxes comprises: taking the minimum enclosing rectangle of each target in the image dataset as its bounding box and, when the bounding boxes of several targets overlap, merging them into one bounding box; the targets lie inside the bounding box, and the background region lies outside it.
Optionally, the bounding box annotations are processed to obtain the specific positions of the targets within the bounding boxes.
Optionally, the specific steps of processing the bounding box annotations are as follows:
representing the input image, together with its bounding box annotation, in different color spaces;
segmenting each color-space representation with the GrabCut algorithm and generating a pseudo label from the segmentation results;
calculating a first segmentation confidence from the consistency of the per-pixel segmentation results in the pseudo label;
calculating a second segmentation confidence from the area ratio between the minimum enclosing rectangle of each segmentation result and the bounding box annotation;
coupling the first segmentation confidence and the second segmentation confidence to obtain the confidence of the pseudo label.
Optionally, training the target segmentation model with the pixel-level pseudo labels comprises:
acquiring the images, the bounding box annotations, and the pixel-level pseudo labels;
constructing a deep neural network model;
training the target segmentation deep neural network model under the supervision of the pixel-level pseudo labels;
constructing a loss function from the loss between the annotated and predicted bounding boxes, the loss between the global segmentation result and the pseudo label, the loss between the local segmentation result and the pseudo label, and the confidence of the pseudo label, to guide the learning of the neural network.
Optionally, the cross-validation-based iterative optimization of the pixel-level pseudo labels comprises:
randomly dividing the dataset T and the pseudo labels S into two halves T1, S1 and T2, S2, and learning two network models D(T1, S1) and D(T2, S2);
running the learned network model D(T2, S2) on the dataset T1 to obtain the inference result W1, computing the degree of difference between the pseudo label S1 and the inference result W1, regarding labels whose difference exceeds a threshold as noise and marking them as noise in S1, and likewise detecting noise in the pseudo label S2 and marking it in S2;
updating the pseudo labels S1 and S2, merging the two halves of the data, and repeating the above steps iteratively until no noise is detected.
Optionally, the method comprises: replacing the pixel-level pseudo labels with the optimized pixel-level pseudo labels, training the target segmentation deep neural network model under the supervision of the optimized pseudo labels, iterating for N rounds until the model converges, and outputting the optimal model.
Optionally, a system for segmenting targets in an image based on bounding box weak annotation comprises:
an acquisition module for acquiring an image dataset;
an annotation module for annotating the image dataset with bounding boxes and generating pixel-level pseudo labels with confidence on the basis of the bounding box annotations;
a segmentation model construction module for training a target segmentation model with the pixel-level pseudo labels;
a training module for iteratively optimizing the pixel-level pseudo labels through cross-validation and training the target segmentation model with the optimized pseudo labels to obtain an optimal model;
an output module for inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
Compared with the prior art, the method and system for segmenting targets in an image based on bounding box weak annotation have the following beneficial effects:
Compared with pixel-level manual annotation, the method requires no time-consuming and labor-intensive pixel-level labels and uses only simple bounding box annotations, so it is easier to extend to new application scenarios and saves a large amount of manpower and material resources. Pixel-level pseudo labels are generated from segmentation results obtained in different ways, and the consistency of those results yields pseudo labels with confidence that guide the learning of the neural network. Segmentation results at the global and local scales are predicted by learning the target segmentation task and the bounding box prediction task simultaneously, which improves the segmentation performance of the model. Through the cross-validation pseudo-label noise detection method, erroneous labels among the pseudo labels are detected, their influence on model training is reduced, and the accuracy of target segmentation in images is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. It is apparent that the drawings described below are only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of the method for segmenting targets in an image based on bounding box weak annotation.
Fig. 2 is a schematic diagram of bounding box labeling provided in the present invention.
Fig. 3 is a schematic diagram of pseudo label generation with confidence provided by the present invention.
Fig. 4 is a flowchart of cross-validation-based noise detection provided by the present invention.
Fig. 5 is a schematic diagram of a target segmentation model structure provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the protection scope of the invention.
The embodiment of the invention discloses a method for segmenting targets in an image based on bounding box weak annotation, comprising the following steps:
acquiring an image dataset and annotating it with bounding boxes;
generating pixel-level pseudo labels with confidence based on the bounding box annotations;
training a target segmentation model with the pixel-level pseudo labels;
iteratively optimizing the pixel-level pseudo labels through cross-validation, and training the target segmentation model with the optimized pseudo labels to obtain an optimal model;
inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
Further, as shown in fig. 2, the image dataset is annotated with bounding boxes. Pixel-level annotation has the limitation that its cost is too high and it is hard to generalize to other application scenarios, so simple bounding box annotation is adopted instead: the minimum enclosing rectangle of each target in the picture is used as its bounding box, and if the bounding boxes of several targets overlap, they are merged into one bounding box. The red rectangles in fig. 2 show examples of the bounding box annotation used by the method.
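As an illustration of this merge rule, the following minimal sketch groups axis-aligned boxes that overlap and replaces each group with its minimum enclosing rectangle; the (x1, y1, x2, y2) box format and all function names are assumptions made for the example, not part of the patent.

```python
# Minimal sketch of the bounding box merge rule described above.
# Boxes are assumed to be axis-aligned tuples (x1, y1, x2, y2);
# the names here are illustrative, not taken from the patent.

def boxes_overlap(a, b):
    """True if two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_overlapping_boxes(boxes):
    """Merge any pair of overlapping boxes into their minimum enclosing
    rectangle, repeating until no overlaps remain."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes

# Two overlapping targets collapse into one box; the third stays separate.
print(merge_overlapping_boxes([(10, 10, 50, 50), (40, 40, 90, 90), (100, 10, 120, 30)]))
# -> [(10, 10, 90, 90), (100, 10, 120, 30)]
```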
Further, the bounding box annotations are processed to obtain the specific positions of the targets within the bounding boxes.
Further, the specific steps of processing the bounding box annotations are as follows:
representing the input image, together with its bounding box annotation, in different color spaces;
segmenting each color-space representation with the GrabCut algorithm and generating a pseudo label from the segmentation results;
calculating a first segmentation confidence from the consistency of the per-pixel segmentation results in the pseudo label;
calculating a second segmentation confidence from the area ratio between the minimum enclosing rectangle of each segmentation result and the bounding box annotation;
coupling the first segmentation confidence and the second segmentation confidence to obtain the confidence of the pseudo label.
According to the definition of the bounding box annotation, the region outside a bounding box must be background; a target exists inside the box, but its specific pixel-level position is not given. The method therefore first generates a pseudo label, on the basis of the bounding box annotation, from the segmentation results obtained with the GrabCut algorithm.
For the input image I and the bounding box annotation B, the input image is represented in the LAB and RGB color spaces respectively; applying the GrabCut algorithm in each color space yields two segmentation results S_LAB and S_RGB, from which a pseudo label S is generated as shown in formula (1). For each pixel i, the pseudo label is 1 if both segmentation results are 1, 0 if both segmentation results are 0, and 0.5 otherwise, i.e. when the two segmentation results disagree:

S_i = 1 if S_LAB,i = S_RGB,i = 1; S_i = 0 if S_LAB,i = S_RGB,i = 0; S_i = 0.5 otherwise (1)
In the pseudo label S, pixels with value 1 or 0 are those on which the two segmentation results agree, and pixels with value 0.5 are those on which they disagree. The proportion of pixels valued 1 or 0 in the pseudo label therefore measures the consistency of the two segmentation results, and more consistent results are more reliable. This proportion is taken as a confidence in the segmentation result:

C_LAB-RGB = (number of pixels in S with value 1 or 0) / (total number of pixels in S) (2)
The confidence of each individual segmentation result can be computed as well. For the two segmentation results S_LAB and S_RGB, the minimum enclosing rectangle of each result is computed together with its area ratio to the bounding box annotation B, giving the ratios C_LAB and C_RGB:

C_LAB = area(R_LAB) / area(B), C_RGB = area(R_RGB) / area(B) (3)

where R_LAB and R_RGB are the minimum enclosing rectangles of S_LAB and S_RGB. The closer a ratio is to 1, the higher the confidence; the farther from 1, the lower the confidence.
Combining these results gives the confidence of the pseudo label S:
C = C_LAB-RGB × C_LAB × C_RGB (4)
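As a concrete sketch of formulas (1)-(4), the example below runs OpenCV's GrabCut in the BGR and LAB color spaces, fuses the two masks into a {0, 0.5, 1} pseudo label, and computes the coupled confidence; the function names and the (x, y, w, h) box format are assumptions made for the example.

```python
# Sketch of the pseudo-label generation described above: GrabCut in two
# color spaces, fusion into a {0, 0.5, 1} pseudo label (formula (1)),
# and the confidences of formulas (2)-(4). Names are illustrative.
import cv2
import numpy as np

def grabcut_mask(img, rect):
    """Binary foreground mask from GrabCut initialized with a box (x, y, w, h)."""
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)

def pseudo_label_with_confidence(img_bgr, rect):
    _, _, w, h = rect
    s_rgb = grabcut_mask(img_bgr, rect)
    s_lab = grabcut_mask(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB), rect)
    # Formula (1): 1 where both results agree on foreground, 0 where both
    # agree on background, 0.5 where the two results disagree.
    s = np.full(s_rgb.shape, 0.5, np.float32)
    s[(s_rgb == 1) & (s_lab == 1)] = 1.0
    s[(s_rgb == 0) & (s_lab == 0)] = 0.0
    # Formula (2): fraction of pixels on which the two results agree.
    c_lab_rgb = float(np.mean(s != 0.5))
    # Formula (3): area ratio between each result's minimum enclosing
    # rectangle and the annotated bounding box.
    def enclosing_area_ratio(mask):
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return 0.0
        rect_area = (xs.max() - xs.min() + 1) * (ys.max() - ys.min() + 1)
        return rect_area / float(w * h)
    c_lab, c_rgb = enclosing_area_ratio(s_lab), enclosing_area_ratio(s_rgb)
    # Formula (4): coupled pseudo-label confidence.
    return s, c_lab_rgb * c_lab * c_rgb
```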
Fig. 3 is a schematic diagram of pseudo label generation with confidence; it shows, in order, the image with its bounding box, the LAB segmentation result, the RGB segmentation result, and the pseudo label. The red rectangle is the bounding box annotation; the LAB and RGB segmentation results are the GrabCut results in the LAB and RGB color spaces respectively, and the yellow boxes are the minimum enclosing rectangles of the segmentation results. The pseudo label and its confidence are generated from the two segmentation results; white, gray, and black in the pseudo label represent 1, 0.5, and 0 respectively.
Further, training the target segmentation model with the pixel-level pseudo labels comprises:
training a deep neural network model for target segmentation under the supervision of the pixel-level pseudo labels; the network structure is shown in fig. 5. The input data comprise the image I, the bounding box annotation B, and the pixel-level pseudo label S.
(1) The input image I first passes through an encoder to obtain global features; bounding box regression is performed on the global features to predict a bounding box B*, and global segmentation yields the global segmentation result S_G*.
(2) The predicted bounding box B* gives the rough position of the target; the image features inside B* are extracted as local features, and segmentation based on the local features yields the local segmentation result S_L*.
(3) The loss function for model training comprises three parts: the loss L(B, B*) between the annotated bounding box B and the predicted bounding box B*, the loss L(S, S_G*) between the global segmentation result S_G* and the pseudo label S, and the loss L(S, S_L*) between the local segmentation result S_L* and the pseudo label S. The pseudo-label confidence C is also factored into the loss function: the higher the confidence, the larger its weight in the loss. The total loss function is:
Loss = (L(B, B*) + L(S, S_G*) + L(S, S_L*)) × C (5)
Because the invention produces the segmentation results and the predicted bounding box at the same time, the model is a multi-task collaborative learning model, and learning the two tasks simultaneously further improves the feature learning capacity of the neural network. The segmentation results are generated from the global features and the local features respectively, so the model is also a multi-scale segmentation model.
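A minimal PyTorch-style sketch of the confidence-weighted total loss of formula (5) follows. The patent fixes only the combination, so the choice of smooth-L1 for the box term, binary cross-entropy for the segmentation terms, and the masking of the ambiguous 0.5 pixels are assumptions made for the example.

```python
# Sketch of formula (5): confidence-weighted sum of a box regression
# loss and two segmentation losses. Smooth-L1 / BCE and the masking of
# ambiguous (0.5) pseudo-label pixels are illustrative choices.
import torch
import torch.nn.functional as F

def total_loss(box_pred, box_gt, seg_global, seg_local, pseudo_label, confidence):
    """
    box_pred, box_gt: (N, 4) predicted / annotated boxes.
    seg_global, seg_local: (N, 1, H, W) predicted probability maps.
    pseudo_label: (N, 1, H, W) with values in {0, 0.5, 1}.
    confidence: (N,) pseudo-label confidence C per image.
    """
    l_box = F.smooth_l1_loss(box_pred, box_gt, reduction="none").mean(dim=1)
    valid = (pseudo_label != 0.5).float()  # ignore ambiguous pixels

    def seg_loss(pred):
        pred = pred.clamp(1e-6, 1 - 1e-6)
        bce = F.binary_cross_entropy(pred, pseudo_label, reduction="none")
        return (bce * valid).flatten(1).sum(1) / valid.flatten(1).sum(1).clamp(min=1)

    # Larger confidence C gives the image a larger weight in the loss.
    return ((l_box + seg_loss(seg_global) + seg_loss(seg_local)) * confidence).mean()
```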
Further, the cross-validation-based iterative optimization of the pixel-level pseudo labels comprises:
Although the confidence of a pixel-level pseudo label reflects the probability that it is correct, and low-confidence pseudo labels are unreliable, even high-confidence pseudo labels may still contain labeling errors, i.e. label noise. The invention therefore adopts a cross-validation-based label noise elimination mechanism, whose general flow is shown in fig. 4. First, the dataset T and the pseudo labels S are randomly divided into two halves T1, S1 and T2, S2, and two network models D(T1, S1) and D(T2, S2) are learned. The learned model D(T2, S2) is then run on the dataset T1 to obtain the inference result W1. Since the model producing W1 was trained on the dataset T2 and the pseudo labels S2, the inference result W1 and the pseudo label S1 are relatively independent. The degree of difference between the pseudo label S1 and the inference result W1 is computed; where it exceeds a given threshold, the label is regarded as noise and marked as such in S1 (its confidence is set to 0). Noise in the pseudo label S2 is detected and marked in S2 (confidence set to 0) in the same way. After updating the pseudo labels S1 and S2, the two halves of the data are merged, and the above steps are repeated iteratively until no pseudo-label noise is detected.
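The loop below sketches this noise elimination mechanism under stated assumptions: a train(...) routine that returns a model with a predict method, a 1 − IoU disagreement score, and a fixed threshold; none of these names or values come from the patent.

```python
# Sketch of the cross-validation pseudo-label noise detection loop.
# `train` and `model.predict` stand in for the segmentation training and
# inference of the patent; the 1 - IoU disagreement score and the
# threshold value are illustrative assumptions.
import random

def disagreement(pseudo, inferred):
    """1 - IoU between a pseudo label and an inferred mask (2D binary arrays)."""
    inter = sum(p == m == 1 for row_p, row_m in zip(pseudo, inferred)
                for p, m in zip(row_p, row_m))
    union = sum(p == 1 or m == 1 for row_p, row_m in zip(pseudo, inferred)
                for p, m in zip(row_p, row_m))
    return 1.0 - inter / union if union else 0.0

def denoise_pseudo_labels(images, labels, confidences, train, threshold=0.5):
    data = list(zip(images, labels, confidences))
    while True:
        random.shuffle(data)
        half1, half2 = data[:len(data) // 2], data[len(data) // 2:]
        model1 = train(half1)   # D(T1, S1)
        model2 = train(half2)   # D(T2, S2)
        noisy = 0
        # Each model judges the OTHER half, so the inference result and
        # the pseudo label being compared are relatively independent.
        for half, model in ((half1, model2), (half2, model1)):
            for i, (img, lab, conf) in enumerate(half):
                if conf > 0 and disagreement(lab, model.predict(img)) > threshold:
                    half[i] = (img, lab, 0.0)   # mark as noise: confidence 0
                    noisy += 1
        data = half1 + half2
        if noisy == 0:
            return data
```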
Further, starting from the initial pixel-level pseudo labels with confidence, the pseudo labels are continuously and iteratively optimized through the cross-validation-based pseudo-label noise detection. Once reliable pseudo labels are obtained, all image data and the pseudo labels with confidence are fed to the multi-task multi-scale target segmentation model, which is retrained to obtain the final target segmentation model, i.e. the optimal model.
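Putting the stages together, a top-level driver could look as follows; it merely strings together the hypothetical helpers sketched above (pseudo_label_with_confidence, denoise_pseudo_labels) and is not prescribed by the patent.

```python
# End-to-end driver tying together the hypothetical helpers sketched
# above; the patent prescribes the sequence of stages, not this API.
def run_pipeline(images, boxes, train, n_rounds=3):
    # Stage 1: pseudo labels with confidence from the bounding boxes.
    labels, confs = zip(*(pseudo_label_with_confidence(img, box)
                          for img, box in zip(images, boxes)))
    # Stage 2: cross-validation noise detection on the pseudo labels.
    data = denoise_pseudo_labels(list(images), list(labels), list(confs), train)
    # Stage 3: retrain the multi-task multi-scale model on all data for
    # N rounds until convergence; the last model is kept as the optimum.
    model = None
    for _ in range(n_rounds):
        model = train(data)
    return model
```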
Further, a system for segmenting a target in an image based on weak annotation of a bounding box comprises:
the acquisition module is used for: for acquiring an image dataset;
and the marking module is used for: the method comprises the steps of performing boundary box labeling on the image data set, and generating pixel-level pseudo labels with confidence on the basis of the boundary box labeling;
The segmentation model construction module: training a target segmentation model according to the pixel-level pseudo-labels;
Training module: optimizing pixel-level pseudo labels based on cross verification iteration, and training a target segmentation model through the optimized pixel-level pseudo labels to obtain an optimal model;
And an output module: and inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for the identical and similar parts, the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed therein, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A method for segmenting targets in an image based on bounding box weak annotation, characterized by comprising the following steps:
acquiring an image dataset and annotating it with bounding boxes;
generating pixel-level pseudo labels with confidence based on the bounding box annotations;
processing the bounding box annotations to obtain the specific positions of the targets within the bounding boxes;
wherein the specific steps of processing the bounding box annotations are as follows:
representing the input image, together with its bounding box annotation, in different color spaces;
segmenting each color-space representation with the GrabCut algorithm and generating a pseudo label from the segmentation results;
calculating a first segmentation confidence from the consistency of the per-pixel segmentation results in the pseudo label;
calculating a second segmentation confidence from the area ratio between the minimum enclosing rectangle of each segmentation result and the bounding box annotation;
coupling the first segmentation confidence and the second segmentation confidence to obtain the confidence of the pseudo label;
training a target segmentation model with the pixel-level pseudo labels;
iteratively optimizing the pixel-level pseudo labels through cross-validation, and training the target segmentation model with the optimized pseudo labels to obtain an optimal model;
wherein the cross-validation-based iterative optimization of the pixel-level pseudo labels comprises:
randomly dividing the dataset T and the pseudo labels S into two halves T1, S1 and T2, S2, and learning two network models D(T1, S1) and D(T2, S2);
running the learned network model D(T2, S2) on the dataset T1 to obtain the inference result W1, computing the degree of difference between the pseudo label S1 and the inference result W1, regarding labels whose difference exceeds a threshold as noise and marking them as noise in S1, and likewise detecting noise in the pseudo label S2 and marking it in S2;
updating the pseudo labels S1 and S2, merging the two halves of the data, and repeating the above steps iteratively until no noise is detected;
and inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
2. The method for segmenting targets in an image based on bounding box weak annotation according to claim 1, wherein annotating the image dataset with bounding boxes comprises: taking the minimum enclosing rectangle of each target in the image dataset as its bounding box and, when the bounding boxes of several targets overlap, merging them into one bounding box, the targets lying inside the bounding box and the background region lying outside it.
3. The method for segmenting targets in an image based on bounding box weak annotation according to claim 1, wherein training the target segmentation model with the pixel-level pseudo labels comprises:
acquiring the images, the bounding box annotations, and the pixel-level pseudo labels;
constructing a deep neural network model;
training the target segmentation deep neural network model under the supervision of the pixel-level pseudo labels;
constructing a loss function from the loss between the annotated and predicted bounding boxes, the loss between the global segmentation result and the pseudo label, the loss between the local segmentation result and the pseudo label, and the confidence of the pseudo label, to guide the learning of the neural network.
4. The method for segmenting targets in an image based on bounding box weak annotation according to claim 3, comprising: replacing the pixel-level pseudo labels with the optimized pixel-level pseudo labels, training the target segmentation deep neural network model under the supervision of the optimized pseudo labels, iterating for N rounds until the model converges, and outputting the optimal model.
5. A system for segmenting targets in an image based on bounding box weak annotation, characterized by comprising:
an acquisition module for acquiring an image dataset;
an annotation module for annotating the image dataset with bounding boxes and generating pixel-level pseudo labels with confidence on the basis of the bounding box annotations;
the bounding box annotations being processed to obtain the specific positions of the targets within the bounding boxes;
wherein the specific steps of processing the bounding box annotations are as follows:
representing the input image, together with its bounding box annotation, in different color spaces;
segmenting each color-space representation with the GrabCut algorithm and generating a pseudo label from the segmentation results;
calculating a first segmentation confidence from the consistency of the per-pixel segmentation results in the pseudo label;
calculating a second segmentation confidence from the area ratio between the minimum enclosing rectangle of each segmentation result and the bounding box annotation;
coupling the first segmentation confidence and the second segmentation confidence to obtain the confidence of the pseudo label;
a segmentation model construction module for training a target segmentation model with the pixel-level pseudo labels;
a training module for iteratively optimizing the pixel-level pseudo labels through cross-validation and training the target segmentation model with the optimized pseudo labels to obtain an optimal model;
wherein the cross-validation-based iterative optimization of the pixel-level pseudo labels comprises:
randomly dividing the dataset T and the pseudo labels S into two halves T1, S1 and T2, S2, and learning two network models D(T1, S1) and D(T2, S2);
running the learned network model D(T2, S2) on the dataset T1 to obtain the inference result W1, computing the degree of difference between the pseudo label S1 and the inference result W1, regarding labels whose difference exceeds a threshold as noise and marking them as noise in S1, and likewise detecting noise in the pseudo label S2 and marking it in S2;
updating the pseudo labels S1 and S2, merging the two halves of the data, and repeating the above steps iteratively until no noise is detected;
an output module for inputting the image to be segmented into the optimal model for target segmentation to obtain a segmentation result.
CN202310494738.7A 2023-05-05 2023-05-05 Method and system for segmenting targets in images based on bounding box weak annotation Active CN116385466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310494738.7A CN116385466B (en) 2023-05-05 2023-05-05 Method and system for segmenting targets in images based on bounding box weak annotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310494738.7A CN116385466B (en) 2023-05-05 2023-05-05 Method and system for segmenting targets in images based on bounding box weak annotation

Publications (2)

Publication Number Publication Date
CN116385466A CN116385466A (en) 2023-07-04
CN116385466B (en) 2024-06-21

Family

ID=86971138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310494738.7A Active CN116385466B (en) 2023-05-05 2023-05-05 Method and system for segmenting targets in images based on bounding box weak annotation

Country Status (1)

Country Link
CN (1) CN116385466B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078698B (en) * 2023-08-22 2024-03-05 山东第一医科大学第二附属医院 Peripheral blood vessel image auxiliary segmentation method and system based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169389A (en) * 2021-10-22 2022-03-11 福建亿榕信息技术有限公司 Class-expanded target detection model training method and storage device
CN115761574A (en) * 2022-10-27 2023-03-07 之江实验室 Weak surveillance video target segmentation method and device based on frame labeling

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110796046B (en) * 2019-10-17 2023-10-10 武汉科技大学 Intelligent steel slag detection method and system based on convolutional neural network
KR102508067B1 (en) * 2021-02-08 2023-03-08 연세대학교 산학협력단 Apparatus and Method for Generating Learning Data for Semantic Image Segmentation Based On Weak Supervised Learning
CN112926681B (en) * 2021-03-29 2022-11-29 复旦大学 Target detection method and device based on deep convolutional neural network
CN113673338B (en) * 2021-07-16 2023-09-26 华南理工大学 Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169389A (en) * 2021-10-22 2022-03-11 福建亿榕信息技术有限公司 Class-expanded target detection model training method and storage device
CN115761574A (en) * 2022-10-27 2023-03-07 之江实验室 Weak surveillance video target segmentation method and device based on frame labeling

Also Published As

Publication number Publication date
CN116385466A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
US11429818B2 (en) Method, system and device for multi-label object detection based on an object detection network
TWI742382B (en) Neural network system for vehicle parts recognition executed by computer, method for vehicle part recognition through neural network system, device and computing equipment for vehicle part recognition
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
WO2020000390A1 (en) Systems and methods for depth estimation via affinity learned with convolutional spatial propagation networks
CN114332135B (en) Semi-supervised medical image segmentation method and device based on dual-model interactive learning
US10262214B1 (en) Learning method, learning device for detecting lane by using CNN and testing method, testing device using the same
CN108830171B (en) Intelligent logistics warehouse guide line visual detection method based on deep learning
Li et al. A robust instance segmentation framework for underground sewer defect detection
CN111611643A (en) Family type vectorization data obtaining method and device, electronic equipment and storage medium
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
US20220180624A1 (en) Method and device for automatic identification of labels of an image
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113313166B (en) Ship target automatic labeling method based on feature consistency learning
CN111738295B (en) Image segmentation method and storage medium
CN116385466B (en) Method and system for dividing targets in image based on boundary box weak annotation
CN115410059B (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN116071389A Bounding box weakly supervised image segmentation method based on foreground-background matching
CN113657225B (en) Target detection method
CN116957051A (en) Remote sensing image weak supervision target detection method for optimizing feature extraction
CN116758421A (en) Remote sensing image directed target detection method based on weak supervised learning
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant