CN110111340A

CN110111340A - The Weakly supervised example dividing method cut based on multichannel

Info

Publication number: CN110111340A
Application number: CN201910347532.5A
Authority: CN
Inventors: 程明明; 刘云; 吴宇寰
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-08-09
Anticipated expiration: 2039-04-28
Also published as: CN110111340B

Abstract

A kind of Weakly supervised example dividing method cut based on multichannel.This method trains the convolutional neural networks for example segmentation using only the labeled data of image level.Specifically, a training set only with image level mark is given, the unrelated object of several classifications is calculated to every image with quasi- physical property sampling algorithm and recommends region；Then recommend region as input using image and corresponding object, using the image category of mark as learning objective, calculate the class probability distribution and semantic feature that each object recommends region by more case-based learning frames.Object in entire data set is recommended into region as node and establishes a large-scale graph model, regards the graph model as a multichannel and cuts problem, segmentation result recommends region to assign a category label as a result on each object；Or any convolutional neural networks for example segmentation are trained as training set.Experiment shows that this method is substantially better than existing Weakly supervised example dividing method.

Description

The Weakly supervised example dividing method cut based on multichannel

Technical field

The invention belongs to technical field of computer vision, are related specifically to a kind of Weakly supervised example segmentation cut based on multichannel Method.

Background technique

Example segmentation is dedicated to respectively splitting each object in an image and identifies the classification of object.It is based on Business and academic immense value, example segmentation are a vital tasks in computer vision.Nearest example cutting techniques Progress mostlys come from some basic models based on convolutional neural networks, such as Ross Girshick in ICCV meeting in 2015 Faster R-CNN that Fast R-CNN, Shaoqing Ren for iing is proposed in view et al. are proposed in NIPS meeting in 2015, The Mask R-CNN that Kaiming He et al. is proposed in ICCV meeting in 2017.But these deep learning models seriously according to Rely in a large amount of training data, these training datas are all to have the object example of pixel scale to mark.It is marked from pixel scale One image is very time-consuming, therefore collecting so more data is a very expensive thing.

In order to reduce the demand to pixel scale labeled data, some research work are believed object marker frame as supervision Breath, Lai Xunlian example parted pattern." the Simple does that Anna Khoreva et al. is delivered in CVPR meeting in 2017 With modified version in it:Weakly supervised instance and semantic segmentation " paper GrabCut algorithm estimates the object segmentation in object marker frame, and the segmentation of these examples is then corrected with MCG algorithm.Qizhu Paper " the Weakly-and semi-supervised panoptic that Li et al. people delivers in ECCV meeting in 2018 The method that segmentation " extends Anna Khoreva et al., they correct the example estimated with the method for iteration Segmentation.Specifically, they first obtain initial example with the method similar with Anna Khoreva and divide to train network, Training net network again is removed using the prediction result after the completion of network training as new partitioning estimation again, such iteration several times, obtains Final result.

However, a large amount of object frame is marked still to take time and effort very much, its using object marker frame as supervision message is needed His task, such as object detection, all have begun the strategy for seeking Weakly supervised study.Therefore, Yanzhao Zhou et al. exists " the Weakly Supervised Instance Segmentation using Class Peak delivered on CVPR for 2018 Supervision message is further loosened to the mark of image level in Response " paper, that is, only uses the figure with class label As training example parted pattern as training data.They propose a new concept, " class peak value of response ", i.e., with institute When the picture training image disaggregated model of offer, by certain processing so that convolutional neural networks are on each object have it is larger Peak value of response, can be obtained by the approximate location of object in this way, in conjunction with quasi- physical property sampling calculate object recommend region, just The result of available example segmentation.

Summary of the invention

The training number that object of the present invention is to solve to need a large amount of pixel scale marks present in existing example cutting techniques According to the technical issues of, provide it is a kind of cut based on multichannel Weakly supervised example segmentation method.This method is only needed provide with class The picture not marked, so that it may learn an example parted pattern out.

To achieve the goals above, the present invention devises a kind of more case-based learning frames first, the frame with image and Corresponding quasi- physical property sampled result is as input, and using image category as learning objective, trained model can be an input Image calculate probability distribution and semantic feature that each object recommends region.Based on these probability distribution and semantic feature, I Construct a multichannel and cut problem, and then recommend region to assign a correct class label for each object.

The present invention provides the Weakly supervised example dividing methods cut based on multichannel, comprise the following steps:

A. the data set comprising training set and test set is given, every image has the mark of image level in training set Label, it may include target category object that this method, which is every image generation in data set with the algorithm that general quasi- physical property samples, The object of body recommends region, and it includes the object of target category that these objects, which recommend region to be possible to, it is also possible to not include and (carry on the back Scape)；Also, it is not no class label that these objects, which recommend region, simply indicates that these regions may include target category Object.

B. the more case-based learning frames for recommending region based on object are devised, more case-based learning frames are with image With corresponding object recommendation region as input, using the label classification of image as learning objective, more case-based learning frames of design The loss function of frame can recommend regional learning to calculate class probability distribution and semantic information for each object.

One convolutional Neural net as shown in Figure 2 of described more case-based learning Frame Designs for recommending region based on object Network model enables the model to be that each object recommends one probability distribution of regional prediction, can be with according to this probability distribution The category label of image is used to recommend the supervision target in region as each object.The more examples for recommending region based on object The loss function of learning framework is made of three parts, i.e., in attention loss function, more case-based learning loss functions and cluster Heart loss function, wherein the first two loss function is mainly used to learn classification information, and cluster centre loss function is to learn The semantic feature in object recommendation region.

C. the class probability distribution and semantic information for recommending region with object calculated in step b, by entire data set In object recommend region as node to establish a large-scale graph model, by the graph model regard as one it is more on a large scale Problem is cut on road, and segmentation result recommends region to assign a category label on each object.

Specifically, the node that each object recommendation region is regarded as to figure, sees mapping for each target category Vertex, the distance on the side of a node a to vertex are exactly the class probability predicted, the distance on the side between two nodes It is the included angle cosine value between their semantic feature vector, the distance between two vertex are infinitely great.The target that multichannel is cut It is that entire figure is divided into several subsets, one and only one vertex in each subset, each node is also belonged to and only belonged to A subset.Solve this large-scale multichannel problem of cutting be it is unpractical, still, the large-scale multichannel problem of cutting can be with The maximum number of edges connected by limiting each node, so that large-scale multichannel is cut PROBLEM DECOMPOSITION as several small rule The multichannel of mould cuts problem.Problem solving is cut to each small-scale multichannel, the intersection that they solve is exactly the solution of big figure.Multichannel cut by Representing object recommends each node point in region in a subset, and classification corresponding to the vertex which includes is exactly this The classification in object recommendation region.

D. it will mark the object for being that region is recommended to delete in step c, remaining object recommends region and corresponding class Segmentation result Biao Ji not can be used as；Remaining object can also be used to recommend region as training data to train any be used for The convolutional neural networks of example segmentation, network after training can be used for carrying out example segmentation to image.

The advantages of the present invention

The present invention can calculate the probability distribution that an object recommends region by case-based learning frame more than one simultaneously And semantic feature, a multichannel is finally established with them cuts problem.Doing so can be with binding object example, image and entire number Extra object is filtered according to the information on collection and recommends region, is retained correct object and is recommended region and assign class label.This It is more more robust than the attention model based on image classification network and accurate.

Detailed description of the invention

Fig. 1 is overall flow figure of the invention.

Fig. 2 is the convolutional neural networks in more case-based learning frames for being proposed.

Fig. 3 is the comparison of experimental result and correlation technique of the invention.

Fig. 4 is several groups of example results of the invention.The first row and fourth line are original input picture, the second row and the 5th Row is correctly to divide, and the third line and the 6th row are that method of the present invention exports the segmentation masking-out quilt as a result, the result It has signed in original image in order to observing.

Specific embodiment

With reference to the accompanying drawing, specific embodiments of the present invention will be described in further detail.Following embodiment is for saying The bright present invention, but be not intended to limit the scope of the invention.

Based on the Weakly supervised example dividing method that multichannel is cut, the concrete operations of this method are as follows:

A. present networks model is the convolutional neural networks model for recommending the multi-instance learning of pool area with object, wherein Characteristic extraction part can be " the Very Deep Convolutional Networks for that Karen Simonyan is delivered The VGG16 framework mentioned in Large-Scale Image Recognition " article, is also possible to what Kaiming He was delivered The ResNet framework mentioned in " Deep residual learning for image recognition " article or other Basic network architectures.For ResNet-50 network, as shown in Fig. 2, we are (complete to the last one module of ResNet Before the average pond of office) area-of-interest pond module is added.Interest pool area module is that input is sampled by quasi- physical property The multiple objects arrived recommend regional frame, then cut out object from characteristic pattern again and recommend the region of regional frame same position size special Sign, then maximum pondization sampling is carried out to the provincial characteristics, each module can obtain 7x7 identical with input feature vector figure port number Characteristic pattern.So object recommendation regional frame is input in the module by we after ResNet extracts characteristic pattern, for Each 7x7 characteristic pattern for recommending frame that can obtain (2048 channels) identical as input feature vector figure port number.It is complete using one layer The average pond of office, each feature vector for recommending frame that can obtain corresponding 2048 dimensions.Feature vector is inputted into one Layer 21 neuron full articulamentum and after softmax layer, it is each recommend frame will correspond to one 21 tie up probability characteristics Vector remembers that i-th of 2048 dimensional feature vectors are f_i, i-th 21 dimension probability characteristics vectors are p_i。

B. for f obtained in a._iAnd p_iWith more case-based learning frames in Fig. 1, it is proposed that several loss function conducts Team surveillance.First is the loss of network attention, we are delivered using Bolei Zhou et al. in CVPR meeting in 2016 The CAM method proposed in " Learning Deep Features forDiscriminative Localization " calculates Classification pays attention to trying hard to, if the attention of k-th of class for being normalized to [0,1] of i-th image try hard to forScheme for i-th The attention classification of j-th of recommendation frame as in, j-th of recommendation frame is denoted asIf:

Recommend the attention score of the classification k of frame j for the i-th picture,It can be calculated by following formula:

Attention loss functionIt can be calculated by following formula:

Wherein, | S_i| it is the sum for recommending frame,WithJ-th of recommendation frame is pre- in respectively i-th image Surveying is classificationWith the probability of k', K is the sum of target category.After providing attention loss function, it is proposed that more examples Learning loss function, we first use the characteristic pattern of all recommendation frames interception of the log-sum-exp function for i-th image, Estimate each class probability vector under all recommendation framesMaximum value, ifFor the general of i-th image kth ' a classification Rate estimated value,It can be calculated by following formula:

Wherein r is a parameter of log-sum-exp function, here r=5, so that the function is estimation input vector Maximum value.In the probabilistic estimated value for estimating i-th image kth ' a classificationAfterwards, more case-based learning loss functionsIt can It is calculated by following formula:

Wherein Y_i′It is positive example classification,It is negative example classification, the two classes are mutual exclusions.More case-based learning losses are introduced After function, below come introduce it is proposed that third loss function: the cluster centre loss function based on more case-based learnings.It is poly- Class center loss functionIt is calculated by following two formula:

WhereinIndicate classification corresponding to the maximum probability of i-th j-th of image recommendation frame,For the same recommendation Corresponding 2048 dimensional feature vector of frame,It is classificationStatistical nature vector, ‖ ‖₂Indicate the 2- norm of vector, | S_i| it is Recommend the sum of frame.Statistical nature vectorCan occur slowly change with trained:

WhereinLast time iterative calculation obtainsIt is calculated for current iterationθ is to update The parameter of speed, we use θ=0.01.Introduced it is proposed that three loss functions after, we use these loss The fusion of function is as final loss function:

Wherein α, β, γ are respectively the weight of three loss functions, here, we use α=0.5, β=0.5, γ= 0.1.More than, we input picture recommendation frame obtained by the quasi- physical property method of sampling corresponding with its into more case-based learning frames Frame, and use L⁽ⁱ⁾It exercises supervision training as loss function.

C. with after the completion of the method training in step b, by picture and frame is recommended to be input in frame, available every figure Feature vector and class probability vector corresponding to each recommendation frame of piece.Using them, it is undirected that we can establish a knowledge Figure.If G=(V, E) is a non-directed graph, V represents node collection, E representative edge collection.Recommend frameWith target category setIt will be used as node, so:

Wherein, S_iIndicate the set of the bounding box in i-th image.IfFor terminal (terminals), side uv ∈ E's Capacity e (u, v)=e (v, u) are as follows:

Using above formula, we have established a non-directed graph.It should be noted that we when building figure in addition to terminal node, I Only retain each node the sides of three maximum capacities.

As above a non-directed graph has been had built up, multichannel has been carried out to the non-directed graph below and is cut.Solve an optimization Problem:

Wherein, Δ_k:‖·‖₁It is the 1- norm of vector.Solving the optimization problem When, we are 3, i.e. maximum three sides of weight by limiting the Maximum edge numbers that each node is connected, thus will solve The subgraph that non-directed graph converts for multiple connected domains, these subgraphs G_t=(V_t,E_t) be disconnected with each other, and:

We independently solve optimization problem above under each subgraph.For each subgraph, a multichannel can be obtained Cut D_t, and:

∪_tD_t=D,

The multichannel that wherein D is G is cut.

D. the multichannel in step c cuts each node point that will represent object recommendation region in a subset, the subset packet Classification corresponding to the vertex contained is exactly the classification that this object recommends region.Region is recommended to delete on the object labeled as background It removes, remaining object recommends region and corresponding category label to can be used as segmentation result；It can also be recommended with remaining object Any convolutional neural networks for example segmentation are trained in region as training set, and network after training can be to image Carry out example segmentation.

Fig. 3 illustrates our method compared with other methods.mAP_0.5 ^rAnd mAP_0.75 ^rRespectively indicating threshold value is 0.5 He The mean accuracy average by class when 0.75, ABO indicate average best level of coverage.CAM method is that Bolei Zhou et al. exists " the Learning Deep Features for Discriminative Localization " that CVPR meeting in 2016 is delivered The method of middle proposition, SPN are that Zhu Yi et al. in ICCV meeting in 2017 delivers " Soft proposal networks for The method proposed in weakly supervised object localization ", MELM was Fang Wan et al. in 2018 " the Min-Entropy Latent Model for Weakly Supervised Object that CVPR meeting is delivered The method proposed in Detection ", PRM are that Yanzhao Zhou et al. in CVPR meeting in 2018 delivers " Weakly The method that Supervised Instance Segmentation using Class Peak Response " is proposed.LIID is It is proposed that method.Rect. it represents and pays attention to trying hard to using box covering, Ellipse is represented to be tried hard to using attention, and MCG is represented " the Multiscale Combinatorial delivered using PabloArbelaez et al. in CVPR meeting in 2014 The method mentioned in Grouping " removes covering attention figure.It can be found that our method will be than these sides in all indexs Method is good.

Fig. 4 is 10 groups of example figures of the example segmentation result obtained using our method.The first row and fourth line are former The input picture of beginning, the second row and fifth line are correctly to divide, and the third line and the 6th row are method output of the present invention As a result, the segmentation masking-out of the result has been signed in original image in order to observing.

The uppermost picture of every group of example figure is original image, and intermediate picture is the reference result of mankind's mark, most lower Face is the result that our method generates.

Claims

1. a kind of Weakly supervised example dividing method cut based on multichannel, which is characterized in that this method comprises the following steps:

A. the data set comprising training set and test set is given, every image has the label of image level in training set, Algorithm with general quasi- physical property sampling be every image in data set generate may include target category object object Recommend region；

B. the more case-based learning frames for recommending region based on object are devised, more case-based learning frames are with image and right The object answered recommend region as input, using the label classification of image as learning objective, design more case-based learning loss functions Regional learning is recommended to calculate class probability distribution and semantic information for each object；

C. the class probability distribution and semantic information for recommending region with object calculated in step b, will be in entire data set Object recommends region as node and establishes a large-scale graph model, regards the graph model as a large-scale multichannel and cuts Problem, segmentation result recommend region to assign a category label on each object；

D. it will mark the object for being that region is recommended to delete in step c, remaining object recommends region and corresponding classification mark It is denoted as segmentation result；Remaining object can also be used to recommend region as training data to train any example that is used for divide Convolutional neural networks, network after training is used to carry out example segmentation to image.

2. the method for the Weakly supervised example segmentation according to claim 1 cut based on multichannel, it is characterised in that: the base Recommend one convolutional neural networks model of more case-based learning Frame Designs in region in object, enabling the model is each object Body recommends one probability distribution of regional prediction, in order to use the category label of image as the supervision mesh in each object recommendation region Mark.

3. the method for the Weakly supervised example segmentation according to claim 1 cut based on multichannel, it is characterised in that: the base The loss function of more case-based learning frames in region is recommended to be made of three parts in object, i.e. attention loss function, mostly real Example study loss function and cluster centre loss function, wherein the first two loss function is mainly used to learn classification information, cluster Center loss function is to learn the semantic feature that object recommends region.

4. the method for the Weakly supervised example segmentation according to claim 3 cut based on multichannel, it is characterised in that: the note Meaning power loss functionIt is calculated by following formula:

Wherein, | S_i| it is the sum for recommending frame,WithJ-th of recommendation frame is predicted to be class in respectively i-th image NotWith the probability of k', K is the sum of target category；

More case-based learning loss functionsIt is calculated by following formula:

Wherein Y_i′It is positive example classification,It is negative example classification, the two classes are mutual exclusions,For i-th image belong to kth ' it is a The probabilistic estimated value of classification；

Cluster centre loss functionIt is calculated by following two formula:

WhereinIndicate classification corresponding to the maximum probability of i-th j-th of image recommendation frame,It is corresponding for the same recommendation frame 2048 dimensional feature vectors,It is classificationStatistical nature vector, ‖ ‖₂Indicate the 2- norm of vector, | S_i| to recommend frame Sum.

5. the method for the Weakly supervised example segmentation according to claim 4 cut based on multichannel, it is characterised in that: described mostly real The loss function of example learning framework is finally by attention loss function, more case-based learning loss functions and cluster centre loss function It is indicated after fusion are as follows:

Wherein α, β, γ are respectively the weight of three loss functions.

6. the method for the Weakly supervised example segmentation according to claim 1 cut based on multichannel, it is characterised in that: the big rule The multichannel of mould cuts the maximum number of edges that problem is connected by limiting each node, so that large-scale multichannel is cut PROBLEM DECOMPOSITION Problem is cut as several small-scale multichannels.