CN107506792A

CN107506792A - Semi-supervised salient object detection method

Info

Publication number: CN107506792A
Application number: CN201710702593.XA
Authority: CN
Inventors: 崔静; 谭凯; 王嘉欣
Original assignee: Chengdu Jisheng Intelligent Engineering Co Ltd; Shanghai Hefu Artificial Intelligence Technology (group) Co Ltd
Current assignee: Shanghai Hefu Artificial Intelligence Technology Group Co ltd
Priority date: 2017-08-16
Filing date: 2017-08-16
Publication date: 2017-12-22
Anticipated expiration: 2037-08-16
Also published as: CN107506792B

Abstract

The invention discloses a semi-supervised salient object detection method based on an improved RPN network model from Faster R-CNN. The method includes: S1: performing an initial segmentation of the object bounding-box region in each picture and using the result as the initial ground truth of the object; S2: configuring the network; S3: training the network; S4: performing salient object prediction on the training pictures; S5: performing superpixel segmentation and superpixel-level smoothing on the training pictures, followed by binarization; S6: using the binarized object foreground map obtained in step S5 as the ground truth; S7: repeating steps S3 to S6. The invention proposes a joint network comprising a saliency prediction module and an object detection module, in which the output of the saliency prediction module is fused with an intermediate feature layer of the classification module and the two are optimized jointly. This network structure makes efficient use of the contour information of the object and helps the network detect salient objects more accurately.

Description

Semi-supervised salient object detection method

Technical field

The present invention relates to convolutional neural networks for salient object detection, and more particularly to a semi-supervised salient object detection method.

Background technology

Humans are able to perceive objects of interest in a picture. In computer vision, such an object of interest is called a salient object. Having a computer imitate this ability and detect salient objects in a picture is called salient object detection. Salient object detection has become a popular field because it can assist other image processing tasks and has a wide range of applications, including object detection, object segmentation, scene understanding, and image classification and retrieval.

Computer vision offers many salient object detection methods. They fall into two classes: unsupervised methods and supervised methods. Traditional unsupervised methods are usually based on image features such as color, brightness and texture, and detect salient objects using local contrast, global contrast or background information at the image border. Their design typically relies on assumptions about how salient objects are distributed in a picture, so they adapt poorly to the diversity of salient objects. For example, methods based on local contrast do not reliably detect objects that differ little from the picture background, and methods based on image-border background information do not reliably detect salient objects located at the image border. In addition, for images with complex scenes, unsupervised methods may fail to detect salient objects accurately, and the results are not satisfactory.

In contrast to unsupervised methods, supervised methods such as SVM, random trees and DNN train a model on annotated pictures and use the trained model for salient object detection. These methods have shown better performance. In particular, because deep neural networks (DNN) perform well on large-scale data, salient object detection methods based on them have further improved detection performance. However, existing methods based on deep neural networks are trained in a fully supervised manner, and their performance depends largely on the scale of the training data. Too few training pictures easily cause the network to overfit, leading to poor detection performance, while achieving good detection results requires tens of thousands or even millions of training samples. At the same time, every training picture needs manual pixel-level annotation, a process that is tedious, time-consuming and laborious. Collecting training data of an adequate size is therefore a task that consumes a great deal of manpower and time.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and provide a semi-supervised salient object detection method that solves the problem of the heavy workload imposed by the fully supervised training mode used in the prior art.

The object of the invention is achieved through the following technical solution: a semi-supervised salient object detection method based on an improved RPN network model from Faster R-CNN, wherein the RPN network model comprises an object detection module and a saliency prediction module, and the inputs of the object detection module and the saliency prediction module share one shared convolutional layer;

The saliency prediction module comprises a first convolutional layer, a sigmoid layer and three transposed convolutional layers. The output of the shared convolutional layer passes through the first convolutional layer to produce a feature map; the feature map passes through the sigmoid layer to give an initial saliency prediction map s; and the initial saliency prediction map s is upsampled by the three transposed convolutional layers to output a saliency prediction map sal. In the training stage, the saliency prediction module further comprises a Euclidean loss layer located after the transposed convolutional layers;

The object detection module comprises a second convolutional layer, a ReLU layer and a fully connected layer. The output of the shared convolutional layer is connected in sequence to the second convolutional layer and the ReLU layer. The feature maps F = {f₁, f₂, ..., f_n} output by the ReLU layer and the initial saliency prediction map s are combined by the function func to give the saliency feature maps FS = {fs₁, fs₂, ..., fs_n}, where func denotes element-wise multiplication of two matrices. The fully connected layer extracts prediction-box features from the saliency feature maps and predicts the position and class of each prediction box;

The method comprises the following steps:

S1: Using the object bounding-box information, perform an initial segmentation of the object bounding-box region in each picture and save the segmentation result as the initial ground truth, i.e. the reference standard, of the object;

S2: Set the number of detection classes of the object detection module and the initial learning rate of the network;

S3: Train the network, using the stochastic gradient descent algorithm to optimize the network loss L and update the network, where the learning rate is updated once every N iterations;

S4: For the network trained in step S3, use the saliency prediction module together with the preceding shared convolutional layer as the test network and perform salient object prediction on the training pictures;

S5: Perform superpixel segmentation on the training pictures, use the superpixel segmentation result to smooth the saliency prediction map sal obtained in step S4 at the superpixel level, and binarize the smoothed map to obtain the binarized object foreground map;

S6: Use the binarized object foreground map obtained in step S5 as the ground truth, replacing the foreground object region obtained by the segmentation method in step S1, and set the learning rate to its value at the end of network training in step S3;

S7: Repeat steps S3 to S6 until network training reaches a satisfactory result.
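
The control flow of steps S1 to S7 can be sketched as the loop below. This is a minimal illustration, not the patented implementation: the function name `iterate_pseudo_labels`, its callables `train`, `predict`, `refine` and `satisfied`, and the `max_rounds` cap are all hypothetical names introduced here, and the toy values in the usage lines exist only to exercise the loop.

```python
def iterate_pseudo_labels(initial_gt, train, predict, refine, satisfied, max_rounds=10):
    """Alternate between training and regenerating the ground truth.

    All callables are hypothetical placeholders:
      train(gt)      -> trains the joint network against the current gt (S3)
      predict()      -> saliency maps for the training pictures (S4)
      refine(sal)    -> superpixel smoothing + binarization (S5)
      satisfied(sal) -> True when training is considered good enough (S7)
    """
    gt = initial_gt              # S1: masks from the initial box segmentation
    rounds = 0
    for _ in range(max_rounds):  # S7: repeat S3 to S6
        train(gt)                # S3
        sal = predict()          # S4
        gt = refine(sal)         # S5, and S6: the refined map replaces gt
        rounds += 1
        if satisfied(sal):
            break
    return gt, rounds


# Toy usage: the "network" is just a log of the labels it was trained on.
log = []
final_gt, rounds = iterate_pseudo_labels(
    initial_gt="box-seg",
    train=log.append,
    predict=lambda: 0.25 * len(log),
    refine=lambda sal: sal > 0.3,
    satisfied=lambda sal: sal >= 0.75,
)
```

The point of the sketch is only that the ground truth used in round k+1 is produced by the network trained in round k, so no pixel-level manual annotation enters the loop.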

Further, the method also includes:

S8: Test the trained model: obtain the test network in the manner of step S4, input a test image into the test network, which outputs a salient object prediction map, and smooth the saliency prediction map at the superpixel level in the manner of step S5.

Further, the binarization in step S5 is as follows: pixels above a threshold σ are set to 1 and pixels below the threshold σ are set to 0; in the resulting binarized object foreground map, pixels in the object region are 1 and pixels in the background region are 0.

Further, the network loss L in step S3 is given by:

$$L = L_{cls}(p_i, p_i^*) + \alpha L_{loc}(t_i, t_i^*) + \beta L(sal, gt)$$

In the formula, the first two terms are the two losses of the RPN network in Faster R-CNN: L_cls(p_i, p_i*) is the prediction-box confidence loss, where p_i is the confidence score that the i-th prediction box contains an object and p_i* is the true class; L_loc(t_i, t_i*) is the position loss between the prediction box and the ground-truth box, where t_i is the position coordinates of the i-th prediction box and t_i* is the position coordinates of the ground-truth box; α is a weight coefficient. The third term is the saliency prediction loss, given by the Euclidean loss layer; β is a weight coefficient. Here:

$$L(sal, gt) = \frac{1}{2N} \sum_{i=1}^{N} (sal_i - gt_i)^2$$

In the formula, gt is a binary map representing the ground truth of the image object; in the initial stage, gt is the binary map obtained by segmenting the object bounding-box region of the image; N is the number of pixels in gt.
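
As a small worked example of the saliency prediction loss, the sketch below evaluates L(sal, gt) = 1/(2N) Σ (sal_i − gt_i)² with NumPy. The function name `euclidean_saliency_loss` and the 2x2 toy arrays are assumptions made for illustration only.

```python
import numpy as np


def euclidean_saliency_loss(sal, gt):
    """Return 1/(2N) * sum_i (sal_i - gt_i)^2 over all N pixels of gt."""
    sal = np.asarray(sal, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if sal.shape != gt.shape:
        raise ValueError("sal and gt must have the same shape")
    n = gt.size  # N: number of pixels in gt
    return float(np.sum((sal - gt) ** 2) / (2.0 * n))


# Toy example: one foreground pixel is predicted at 0.5 instead of 1.
sal = np.array([[1.0, 0.5], [0.0, 0.0]])
gt = np.array([[1, 1], [0, 0]])
loss = euclidean_saliency_loss(sal, gt)  # (0.5 ** 2) / (2 * 4) = 0.03125
```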

The beneficial effects of the invention are as follows. The invention proposes a semi-supervised salient object detection method that combines saliency map prediction and object detection in a single network and optimizes them jointly. Object bounding-box information (the bounding-box coordinates, i.e. the coordinates of the upper-left and lower-right corners of the rectangle in the picture) is used to obtain an initial segmentation result that serves as an inexact annotation for computing the loss of the saliency prediction map. At the same time, the structural and contour information of the object exploited during object detection is put to use by feeding the saliency prediction map into the detection module, which yields a more accurate saliency prediction map. In addition, training the network requires only the bounding-box position information, which avoids pixel-level annotation of the database pictures, saves annotation time and reduces the manual annotation workload.

Brief description of the drawings

Fig. 1 is a schematic diagram of the training stage of the invention;

Fig. 2 is a schematic diagram of the test stage of the invention;

Fig. 3 is a flow chart of the method of the invention.

Embodiment

The technical solution of the invention is described in further detail below with reference to the accompanying drawings. A semi-supervised salient object detection method is based on an improved RPN network model from Faster R-CNN. As shown in Fig. 1, the RPN network model comprises an object detection module and a saliency prediction module, whose inputs share one shared convolutional layer; the shared convolutional layer consists of multiple convolutional layers.

The saliency prediction module comprises a first convolutional layer, a sigmoid layer and three transposed convolutional layers. The first convolutional layer has a kernel size of 1*1, a stride of 1, a pad of [0,0,0,0] and a single convolution kernel. Its output is one feature map whose size matches the output of the preceding layer (i.e. the last convolutional layer of the shared convolutional layers). This feature map then passes through a sigmoid layer to give the initial saliency prediction map s, whose pixel values lie in [0,1]. To obtain a saliency prediction map of the same size as the input picture im, the invention adds three transposed convolutional layers after the sigmoid layer to upsample the saliency prediction map. The upsampling factor of the first two transposed convolutional layers is 2, and that of the third is 4. After the three transposed convolutional layers, the saliency prediction module outputs the saliency prediction map sal, which has the same size as the input picture im and pixel values in [0,1]. In the training stage, the saliency prediction module adds a Euclidean loss layer after the transposed convolutional layers. The loss function of this layer is given by formula (1):

$$L(sal, gt) = \frac{1}{2N} \sum_{i=1}^{N} (sal_i - gt_i)^2 \qquad (1)$$

where gt is a binary map representing the ground truth of the image object. In the initial stage, gt is the binary map obtained by segmenting the object bounding-box region of the image, and N is the number of pixels in gt. It is assumed here that the object bounding box fits the salient object tightly. Under this premise, the purpose of the above loss is to make the foreground in the prediction map sal as consistent as possible with the object bounding box, while also requiring foreground pixel values to be as close to 1 as possible, which improves prediction accuracy.
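
The data flow through the saliency prediction module can be sketched in NumPy as follows. This is a shape-level illustration under stated assumptions, not the trained network: the name `saliency_head` is hypothetical, the weights are random, and nearest-neighbour upsampling (`np.kron`) is used as a stand-in for the three learned transposed convolutional layers. What it does show is that a 1*1 convolution with a single kernel reduces C channels to one map, that the sigmoid keeps values in [0,1], and that factors 2, 2 and 4 give a total upsampling of 16.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def saliency_head(shared_feat, w, b, factors=(2, 2, 4)):
    """shared_feat: (C, H, W) output of the shared convolutional layers.
    w: (C,) weights of the single 1x1 kernel; b: scalar bias (both hypothetical)."""
    # 1x1 convolution, stride 1, no padding: a per-pixel weighted sum over channels.
    score = np.tensordot(w, shared_feat, axes=([0], [0])) + b  # shape (H, W)
    s = sigmoid(score)  # initial saliency prediction map s
    sal = s
    for f in factors:
        # Stand-in for a transposed convolution with upsampling factor f:
        # each pixel is replicated into an f x f block.
        sal = np.kron(sal, np.ones((f, f)))
    return s, sal


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 3, 4))  # C=8 channels, 3x4 spatial size
w = rng.standard_normal(8)
s, sal = saliency_head(feat, w, 0.0)   # s is 3x4, sal is 48x64
```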

The object detection module adds the second convolutional layer and the ReLU layer between the convolutional layer and the fully connected layer of the RPN network. The second convolutional layer has a kernel size of 1*1, a stride of 1 and a pad of [0,0,0,0], and its number of convolution kernels equals that of the preceding layer. This keeps the feature maps output by this layer consistent in size and number with the output of the preceding layer, so the subsequent fully connected layer needs no modification. The feature maps F = {f₁, f₂, ..., f_n} output by the ReLU layer and the initial saliency prediction map s are combined as follows:

$$fs_i = \mathrm{func}(f_i, s) \qquad (2)$$

where the function func denotes element-wise multiplication of two matrices: the i-th feature map f_i and s pass through func to give the corresponding saliency feature map fs_i. The fully connected layer extracts prediction-box features from the saliency feature maps FS = {fs₁, fs₂, ..., fs_n} and predicts the position and class of each prediction box. The feature maps F are a mid-level semantic feature representation; each map represents some characteristic of the object, such as a part of the object or a specific texture. In the above operation the initial saliency prediction map s can be regarded as a mask: multiplying s element-wise with the feature maps F filters out background information outside the object, so the object can be detected more accurately. Conversely, object detection exploits the structural information of the object itself, which in turn helps the saliency prediction module predict the saliency map.
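
The masking operation of formula (2) amounts to a broadcast multiplication. In the sketch below, the function name `saliency_weighted_features` and the toy values are assumptions for illustration; F is stored as an array of shape (n, H, W) and s as (H, W).

```python
import numpy as np


def saliency_weighted_features(F, s):
    """Return FS with fs_i = f_i * s (element-wise) for every feature map f_i."""
    F = np.asarray(F, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    if F.shape[1:] != s.shape:
        raise ValueError("each feature map must have the same size as s")
    return F * s[None, :, :]  # broadcast s over the n feature maps


# Two 2x2 feature maps; s suppresses the right-hand column (background).
F = np.ones((2, 2, 2))
F[1] *= 3.0
s = np.array([[1.0, 0.0], [0.5, 0.0]])
FS = saliency_weighted_features(F, s)
```

Because the number and size of the maps are unchanged, the downstream fully connected layer sees inputs of the same shape as before, which is consistent with the statement that it needs no modification.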

In the training stage, the loss function of the proposed network is given by formula (3):

$$L = L_{cls}(p_i, p_i^*) + \alpha L_{loc}(t_i, t_i^*) + \beta L(sal, gt) \qquad (3)$$

The loss function L comprises three terms. The third term is the saliency prediction loss given by formula (1); β is a weight coefficient and is set to 1. The first two terms, L_cls(p_i, p_i*) and L_loc(t_i, t_i*), are the two losses of the RPN network in Faster R-CNN. p_i is the confidence score that the i-th prediction box contains an object, and p_i* is the true class: p_i* = 1 if the box contains an object and 0 otherwise. L_cls(p_i, p_i*) is the prediction-box confidence loss. t_i is the position coordinates of the i-th prediction box, t_i* is the position coordinates of the ground-truth box, and L_loc(t_i, t_i*) is the position loss between the prediction box and the ground-truth box. α is a weight coefficient and, as in the RPN network, is set to 1.
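
How the three terms of formula (3) combine can be sketched as below. The text does not spell out L_cls and L_loc beyond calling them the two RPN losses, so this sketch assumes the forms commonly used for the RPN in Faster R-CNN: a two-class log loss for L_cls and a smooth L1 loss on positive boxes for L_loc. The averaging over boxes, the function names and the toy numbers are further assumptions made here; the saliency term is passed in as a precomputed scalar.

```python
import numpy as np


def log_loss(p, p_star, eps=1e-12):
    """Two-class log loss per box (assumed form of L_cls)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))


def smooth_l1(x):
    """Smooth L1 per coordinate (assumed form of L_loc)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)


def joint_loss(p, p_star, t, t_star, l_sal, alpha=1.0, beta=1.0):
    """L = L_cls + alpha * L_loc + beta * L(sal, gt), with alpha = beta = 1 by default."""
    p, p_star = np.asarray(p, float), np.asarray(p_star, float)
    t, t_star = np.asarray(t, float), np.asarray(t_star, float)
    l_cls = log_loss(p, p_star).mean()  # averaged over boxes (assumption)
    pos = p_star > 0                    # only boxes containing an object
    n_pos = max(int(pos.sum()), 1)
    l_loc = (smooth_l1(t - t_star).sum(axis=1) * pos).sum() / n_pos
    return float(l_cls + alpha * l_loc + beta * l_sal)


# Two boxes: the first contains an object (p* = 1), the second is background.
p = [0.5, 0.5]
p_star = [1, 0]
t = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
t_star = [[0.5, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
L = joint_loss(p, p_star, t, t_star, l_sal=0.03125)
```

With these toy inputs the background box contributes to the confidence term only, so a poor box regression on background regions does not affect L.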

As shown in Fig. 3, the method comprises the following steps:

S1: Using the object bounding-box information, perform an initial segmentation of the object bounding-box region in each picture and save the segmentation result as the initial ground truth, i.e. the reference standard, of the object. In this embodiment, the Graphcut method is used for the initial segmentation of the object bounding-box region; other embodiments are not limited to Graphcut, and any segmentation method may be used.

S2: Set the number of detection classes in the object detection module to 2 (object 1, background 0). Set the initial learning rate of the network to 0.001.

S3: Train the network, using the stochastic gradient descent algorithm to optimize the network loss L and update the network, where the learning rate is multiplied by 0.1 every 2*10⁴ iterations;

S4: For the network trained in step S3, use the saliency prediction module together with the preceding shared convolutional layer as the test network and perform salient object prediction on the training pictures;

S5: Perform superpixel segmentation on the training pictures. In this embodiment, the SLIC method is used, although the superpixel segmentation method is not limited to SLIC. The superpixel segmentation result is then used to smooth the saliency prediction map sal obtained in step S4 at the superpixel level, and the smoothed map is binarized: pixels above the threshold σ are set to 1 and pixels below the threshold σ are set to 0. In this embodiment, σ is set to 0.3. The result is the binarized object foreground map, in which pixels in the object region are 1 and pixels in the background region are 0.

S6: Use the binarized object foreground map obtained in step S5 as the ground truth, replacing the foreground object region obtained by the segmentation method in step S1, and set the learning rate to its value at the end of network training in step S3;

S7: Repeat steps S3 to S6 until network training reaches a satisfactory result. In this embodiment, training can stop when the mean pixel difference between the current saliency prediction map and the previous prediction result is less than 10^-3.
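
The numerical parts of steps S3, S5 and S7 can be sketched with NumPy as follows. All function names are hypothetical. The sketch assumes that the superpixel labels are already available as an integer map with ids 0..K-1 (for example from a SLIC implementation), that superpixel-level smoothing means replacing each pixel by the mean saliency of its superpixel, that pixels exactly equal to σ are treated as background, and that "mean pixel difference" means the mean absolute difference; none of these details is fixed by the text.

```python
import numpy as np


def learning_rate(iteration, base_lr=0.001, step=20000, gamma=0.1):
    """S2/S3: start at 0.001 and multiply by 0.1 every 2*10^4 iterations."""
    return base_lr * gamma ** (iteration // step)


def superpixel_smooth(sal, labels):
    """S5: replace every pixel by the mean saliency of its superpixel."""
    sal = np.asarray(sal, dtype=np.float64)
    labels = np.asarray(labels)
    flat = labels.ravel()
    sums = np.bincount(flat, weights=sal.ravel())
    counts = np.bincount(flat)
    means = sums / np.maximum(counts, 1)  # guard against unused label ids
    return means[labels]


def binarize(sal, sigma=0.3):
    """S5: pixels above sigma become 1 (object), the rest 0 (background)."""
    return (np.asarray(sal) > sigma).astype(np.uint8)


def converged(sal_now, sal_prev, tol=1e-3):
    """S7: stop when the mean absolute pixel difference drops below tol."""
    diff = np.asarray(sal_now, float) - np.asarray(sal_prev, float)
    return float(np.mean(np.abs(diff))) < tol


# Toy 2x2 picture with two superpixels (top row and bottom row).
sal = np.array([[0.9, 0.5], [0.2, 0.0]])
labels = np.array([[0, 0], [1, 1]])
smooth = superpixel_smooth(sal, labels)  # rows become 0.7 and 0.1
fg = binarize(smooth)                    # top row is foreground
```

If the iteration counter passed to `learning_rate` is kept running across rounds, each new round starts from the learning rate reached at the end of the previous one, which is one way to read the requirement in step S6.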

More preferably, in this embodiment, as shown in Fig. 2, the method further includes:

S8: Test the trained model: obtain the test network in the manner of step S4, input a test image into the test network, which outputs a salient object prediction map, and smooth the saliency prediction map at the superpixel level in the manner of step S5.

Claims

1. A semi-supervised salient object detection method based on an improved RPN network model from Faster R-CNN, characterized in that: the RPN network model comprises an object detection module and a saliency prediction module, and the inputs of the object detection module and the saliency prediction module share one shared convolutional layer;

The saliency prediction module comprises a first convolutional layer, a sigmoid layer and three transposed convolutional layers; the output of the shared convolutional layer passes through the first convolutional layer to produce a feature map; the feature map passes through the sigmoid layer to give an initial saliency prediction map s; the initial saliency prediction map s is upsampled by the three transposed convolutional layers to output a saliency prediction map sal; in the training stage, the saliency prediction module further comprises a Euclidean loss layer located after the transposed convolutional layers;

The object detection module comprises a second convolutional layer, a ReLU layer and a fully connected layer; the output of the shared convolutional layer is connected in sequence to the second convolutional layer and the ReLU layer; the feature maps F = {f₁, f₂, ..., f_n} output by the ReLU layer and the initial saliency prediction map s are combined by the function func to give the saliency feature maps FS = {fs₁, fs₂, ..., fs_n}, where func denotes element-wise multiplication of two matrices; the fully connected layer extracts prediction-box features from the saliency feature maps and predicts the position and class of each prediction box;

The method comprises the following steps:

S1: Using the object bounding-box information, perform an initial segmentation of the object bounding-box region in each picture and save the segmentation result as the initial ground truth, i.e. the reference standard, of the object;

S2: Set the number of detection classes of the object detection module and the initial learning rate of the network;

S3: Train the network, using the stochastic gradient descent algorithm to optimize the network loss L and update the network, where the learning rate is updated once every N iterations;

S4: For the network trained in step S3, use the saliency prediction module together with the preceding shared convolutional layer as the test network and perform salient object prediction on the training pictures;

S5: Perform superpixel segmentation on the training pictures, use the superpixel segmentation result to smooth the saliency prediction map sal obtained in step S4 at the superpixel level, and binarize the smoothed map to obtain the binarized object foreground map;

S6: Use the binarized object foreground map obtained in step S5 as the ground truth, replacing the foreground object region obtained by the segmentation method in step S1, and set the learning rate to its value at the end of network training in step S3;

S7: Repeat steps S3 to S6 until network training reaches a satisfactory result.

2. The semi-supervised salient object detection method according to claim 1, characterized in that the method further includes:

S8: Test the trained model: obtain the test network in the manner of step S4, input a test image into the test network, which outputs a salient object prediction map, and smooth the saliency prediction map at the superpixel level in the manner of step S5.

3. The semi-supervised salient object detection method according to claim 1, characterized in that the binarization in step S5 is as follows: pixels above a threshold σ are set to 1 and pixels below the threshold σ are set to 0; in the resulting binarized object foreground map, pixels in the object region are 1 and pixels in the background region are 0.

4. The semi-supervised salient object detection method according to claim 1, characterized in that the network loss L in step S3 is given by:

$$L = L_{cls}(p_i, p_i^*) + \alpha L_{loc}(t_i, t_i^*) + \beta L(sal, gt)$$

In the formula, the first two terms are the two losses of the RPN network in Faster R-CNN: L_cls(p_i, p_i*) is the prediction-box confidence loss, where p_i is the confidence score that the i-th prediction box contains an object and p_i* is the true class; L_loc(t_i, t_i*) is the position loss between the prediction box and the ground-truth box, where t_i is the position coordinates of the i-th prediction box and t_i* is the position coordinates of the ground-truth box; α is a weight coefficient; the third term is the saliency prediction loss, given by the Euclidean loss layer; β is a weight coefficient; wherein:

$$L(sal, gt) = \frac{1}{2N} \sum_{i=1}^{N} (sal_i - gt_i)^2$$

In the formula, gt is a binary map representing the ground truth of the image object; in the initial stage, gt is the binary map obtained by segmenting the object bounding-box region of the image; N is the number of pixels in gt.