CN104217225B - Visual target detection and annotation method
Abstract
The invention discloses a visual target detection and annotation method, comprising: an image input step, in which an image to be detected is input; a candidate region extraction step, in which candidate windows are extracted from the image to be detected using the selective search algorithm and taken as candidate regions; a feature description extraction step, in which a large-scale convolutional neural network trained in advance computes a feature description of each candidate region and outputs it; a visual target prediction step, in which, based on the feature descriptions of the candidate regions, an object detection model trained in advance predicts on the candidate regions and estimates the regions in which the visual target is present; and a position annotation step, in which the position of the visual target is annotated according to the estimation result. Experiments show that, compared with mainstream weakly supervised visual target detection and annotation methods, the present invention has stronger positive-sample mining ability and broader application prospects, and is suitable for visual target detection and automatic annotation tasks on large-scale datasets.
Description
Technical field
The present invention relates to the technical field of object detection in computer vision, and in particular to a visual target detection and annotation method based on weakly supervised learning.
Background technology
Detecting objects in images and automatically annotating their locations is a basic problem in computer vision, and one of the key problems the field continues to study. Object detection answers, for a given test image, the question of what is present and where. Object detection is widely applied in many other vision research problems, such as object recognition, pedestrian detection, face detection, foreground detection in surveillance scenes, motion tracking, and action recognition and analysis.
General object detection requires a database in which object bounding rectangles have already been annotated, so that fully supervised object detection models such as histograms of oriented gradients (HOG) and deformable part models (DPM) can be trained. The rapid development of digital media technology has brought explosive growth of image and video data, and the spread of the internet makes it easy to obtain such data in massive quantities. Faced with image data of this scale, the severe problem confronting current object detection algorithms is that the great majority of the data carries no usable object location annotation, and annotating the positions of objects in massive image data is a task of very high labor intensity and cost.
By comparison, assigning category labels to whole images is much easier, and pre-filtering with methods such as unsupervised clustering makes it possible to build a fairly large categorized database in a short time. Therefore, using image databases that carry only category annotations to learn object categories and localize objects automatically, that is, to realize visual target detection and annotation through weakly supervised learning, has important theoretical value and practical significance.
In traditional weakly supervised learning algorithms, candidate regions are usually selected with densely sampled candidate-window algorithms: the number of windows is very large, and both recall and overlap are unsatisfactory. Meanwhile, candidate windows are usually described with a bag-of-words model; the feature transformation of the bag-of-words model has few levels, so the resulting features can be regarded as mid-level representations and lack the high-level information that would let a model mine object appearance models from images automatically.
Current mainstream methods for weakly supervised object detection and annotation include multi-instance learning, topic models, conditional random fields, and so on. Many traditional multi-instance learning algorithms depend heavily on kernel learning or distance-metric learning frameworks and employ optimization algorithms of very high complexity such as heuristic algorithms, quadratic programming, and integer programming, so they are difficult to apply efficiently on large-scale datasets.
Therefore, how to improve and optimize weakly supervised learning algorithms so as to efficiently realize object detection and automatic position annotation for massive images is a major unsolved problem in the prior art.
The content of the invention
In view of this, the main object of the present invention is to provide a visual target detection and annotation method for the weakly supervised setting, which can automatically localize targets of interest in an image collection given only image category labels, and can also annotate object positions in images automatically.
In order to achieve the above object, the present invention provides the following technical scheme:
A visual target detection and annotation method, characterized by comprising:
an image input step, in which an image to be detected is input;
a candidate region extraction step, in which candidate windows are extracted from the image to be detected using the selective search algorithm and taken as candidate regions;
a feature description extraction step, in which a large-scale convolutional neural network trained in advance computes and outputs a feature description of each candidate region;
a visual target prediction step, in which, based on the feature descriptions of the candidate regions, an object detection model trained in advance predicts on the candidate regions and estimates the regions in which the visual target is present;
a position annotation step, in which the position of the visual target is annotated according to the estimation result.
Preferably, the selective search algorithm in the candidate region extraction step further comprises:
converting the color space of the image to be detected into predetermined color spaces, segmenting the image with a graph-based over-segmentation algorithm, and repeatedly merging the two most similar regions to obtain a hierarchical segmentation of the image; after the segmentation regions from the multiple color spaces and multiple levels are merged and duplicates are removed, the candidate region set of the image is obtained.
Preferably, the predetermined color spaces include: HSV, RGI, I, and Lab.
Preferably, the convolutional neural network trained in advance is a convolutional neural network trained on the ImageNet 2013 object classification database.
Preferably, an object detection model training step is further included, specifically comprising:
inputting training set images carrying image category labels;
extracting candidate windows from the training set images as candidate regions using the selective search algorithm;
computing and outputting a feature description of each candidate region with a large-scale convolutional neural network trained in advance;
based on the feature descriptions of the candidate regions, training the object appearance model with a multi-instance linear support vector machine.
Preferably, training the object detection model with the multi-instance linear support vector machine comprises:
training the object detection model with the unconstrained large-margin multi-instance learning algorithm MILinear, whose objective function is:

min_w f(w) = (1/2) w^T w + C Σ_{i=1..N} max(0, 1 - y^i w^T x^i_{j*(B^i,w)})^2

where an image I^i is described by a bag B^i containing n^i d-dimensional examples, the j-th example being denoted x^i_j; if at least one of the examples contained in a bag is a positive sample, the label y^i of the bag is +1, and if all examples are negative samples, the label y^i of the bag is -1; the training set is B = {(B^i, y^i) | i = 1, 2, ..., N}, |B| = N is the number of training samples, w is the classifier coefficient vector, C is the regularization constant controlling the penalty on misclassification, and j*(B^i, w) = argmax_j w^T x^i_j is the index of the example with the highest prediction score in bag B^i.
Preferably, the MILinear algorithm is solved with a trust-region Newton method, comprising:
determining that the optimization objective of MILinear is an unconstrained differentiable objective function whose first derivative is

g(w) = w + 2C Σ_{i ∈ I(w)} (w^T x^i_{j*} - y^i) x^i_{j*}

where I(w) = {i | y^i w^T x^i_{j*} < 1} is the set of examples whose margin is less than 1;
computing the generalized Hessian by the formula

H(w) = I + 2C Σ_{i ∈ I(w)} x^i_{j*} (x^i_{j*})^T

where I is the identity matrix;
optimizing the objective function in an iterative manner by computing, at each iteration, the solution of the trust-region subproblem

min_s g(w^k)^T s + (1/2) s^T H(w^k) s, subject to ||s|| ≤ Δ_k

where s^k is the update step, Δ_k is the trust region, and g(w^k) and H(w^k) are respectively the first derivative and second derivative of the MILinear objective function.
After the update step s^k has been obtained, w^k is updated only if the actual decrease of the objective function is sufficiently large; otherwise w^k is kept unchanged, according to the formula

w^{k+1} = w^k + s^k if (f(w^k) - f(w^k + s^k)) / ( -g(w^k)^T s^k - (1/2)(s^k)^T H(w^k) s^k ) ≥ η_0, and w^{k+1} = w^k otherwise,

where η_0 is a predefined positive number controlling the minimum acceptable actual function decrease. Preferably, the method further comprises running a bag decomposition algorithm with the trained object detection model, gradually reducing the ambiguity of the positive bags in an iterative manner, including:
using the object detection model trained by MILinear to obtain, on the training set images, prediction probabilities for all candidate windows; decomposing each positive bag into one positive bag and one negative bag according to these prediction probabilities; and training a new MILinear object detection model on the dataset obtained after decomposition, this decomposition process possibly being iterated several times.
The visual target detection and annotation method provided by the invention has several obvious advantages:
1) Using selective search, the candidate windows where a target is most likely to appear are obtained from a large number of over-segmentation results. Windows obtained in this way preserve object boundaries well and have a very high overlap with real objects, while keeping a high recall with only several hundred to several thousand candidate windows.
2) Feature representations are extracted from the candidate windows with a convolutional neural network trained in advance on a very large image classification dataset, yielding rich feature representations that contain strong high-level semantic information and enabling the model to mine object appearance models from images automatically.
3) A new multi-instance linear support vector machine model is employed, together with an optimization algorithm based on the trust-region Newton method, so that weakly supervised detection models can be learned efficiently on large-scale datasets.
4) A new bag decomposition algorithm is employed: by decomposing each positive-sample bag into one positive-sample bag and one negative-sample bag, the ambiguity within the positive-sample bags is substantially reduced, which can effectively improve the performance of the weakly supervised detection model.
Brief description of the drawings
Fig. 1 is a flow chart of model training and testing for the visual target detection and annotation method based on weakly supervised learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of MILinear and of MILinear with bag decomposition according to an embodiment of the present invention;
Fig. 3 is a schematic diagram comparing optimization with the trust-region Newton method against other optimization methods according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the relation between the prediction score of the trained object detection model and sample overlap according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the classification performance improvement of some object classes during iterations of the bag decomposition algorithm according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of detection results of the trained object detection model on the Pascal VOC 2007 database according to an embodiment of the present invention.
Embodiment
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the attached drawings.
The main points of the idea of the present invention are: 1) selective search is used, based on a large number of over-segmentation results, so that higher target recall and overlap can be obtained with fewer candidate windows; 2) the present invention extracts feature representations from the candidate windows with a convolutional neural network trained in advance on a very large image classification dataset, obtaining rich feature representations that contain strong high-level semantic information; 3) a new multi-instance linear support vector machine model is employed, optimized with an algorithm based on the trust-region Newton method, so that weakly supervised detection models can be learned efficiently on large-scale datasets; 4) the present invention employs a new bag decomposition algorithm which decomposes each positive-sample bag into one positive-sample bag and one negative-sample bag, substantially reducing the ambiguity within the positive-sample bags and effectively improving the performance of the weakly supervised detection model.
As shown in Fig. 1, the upper half of Fig. 1 is the model training flow chart of the visual target detection and annotation method based on weakly supervised learning according to an embodiment of the present invention. First, an image is input; second, candidate windows are extracted from the input image with the selective search algorithm, yielding the extracted candidate regions; then the candidate regions, that is, the candidate window samples, are fed in order into the convolutional neural network to obtain the feature description of each candidate region, that is, the region representation; finally, based on the feature descriptions, the weakly supervised learning algorithm proposed by the present invention learns the object appearance model automatically, that is, mines positive samples. The lower half of Fig. 1 sets out the test process of the method. For a test image, candidate windows are extracted in the same way as in the training process, feature descriptions of the window regions are then computed with the deep convolutional neural network, and finally the previously trained object appearance model classifies the window regions, realizing the target detection or annotation task. The method comprises the following steps:
S1, candidate region extraction: candidate windows are extracted from the training set images with the selective search algorithm and taken as candidate regions.
Given only image category labels, it is only known that an image contains objects of certain categories, such as "car" or "person", but the positions of the "car" and the "person" are unknown; the bounding rectangles of the objects must be determined by the algorithm. If all possible rectangular boxes were extracted from an image, their number would be enormous and processing them all would be impractical. A candidate region extraction algorithm therefore extracts a limited number of likely object rectangles such that they contain as many of the objects to be localized as possible. Three indicators matter most here: first, the number of candidate windows, since the fewer there are, the more efficient the algorithm; second, the recall, namely the ratio of the number of real objects covered by candidate windows to the number of all objects; third, the overlap between the candidate windows and the bounding rectangles of the real objects. For candidate window algorithms based on dense sampling, the number of windows is very large and both recall and overlap are unsatisfactory.
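As a concrete illustration of the recall and overlap indicators described above (the patent does not define them formulaically; intersection-over-union is the standard overlap measure, and the function names here are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def recall(gt_boxes, candidates, thresh=0.5):
    """Fraction of ground-truth objects covered by at least one candidate."""
    hit = sum(1 for g in gt_boxes
              if any(iou(g, c) >= thresh for c in candidates))
    return hit / float(len(gt_boxes))
```

A good candidate window algorithm keeps `recall` high while keeping the number of candidates small.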
The selective search algorithm used by the present invention is a candidate window extraction algorithm based on over-segmentation. It over-segments the image with different parameters to obtain different image blocks and then merges the blocks following the idea of hierarchical organization, thereby finding the bounding rectangles most likely to contain objects. The concrete steps are as follows: first, the original image is converted from the RGB color space to other color spaces, including HSV, RGI, I, Lab, etc.; then each image is segmented with the graph-based over-segmentation algorithm, and the two most similar regions are repeatedly merged following the idea of hierarchical organization, yielding a hierarchical segmentation of the image. After the segmentation regions from the multiple color spaces and multiple levels are combined and duplicates are removed, the candidate region set of the image is obtained.
The selective search algorithm runs efficiently and achieves very high recall and overlap with only several hundred to several thousand candidate windows.
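The hierarchical merging loop described above can be sketched as follows; the initial region list and the pairwise similarity function are hypothetical placeholders standing in for the graph-based over-segmentation output and the region similarity used by selective search:

```python
def hierarchical_merge(regions, similarity):
    """Greedy hierarchical grouping: repeatedly merge the two most similar
    regions and keep every region ever formed as a candidate.

    regions: list of hashable region identifiers from the over-segmentation
    similarity: function (r1, r2) -> float, larger means more similar
    Returns all regions and merged groups produced along the way.
    """
    candidates = list(regions)
    active = list(regions)
    while len(active) > 1:
        # find the most similar pair among the currently active regions
        a, b = max(((x, y) for i, x in enumerate(active)
                    for y in active[i + 1:]),
                   key=lambda p: similarity(p[0], p[1]))
        merged = (a, b)          # the new region groups its two children
        active.remove(a)
        active.remove(b)
        active.append(merged)
        candidates.append(merged)
    return candidates
```

Every intermediate group corresponds to one candidate window (its bounding rectangle), which is how a few hundred windows can still cover objects at many scales.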
S2, a feature description is computed for each candidate region with a large-scale convolutional neural network trained in advance, and the feature description is output.
After candidate regions that may contain the object of interest have been obtained, deciding with computer vision and pattern recognition algorithms whether a candidate window is a certain object requires first computing a feature description of the candidate region and then making a classification judgment with a classifier. In the fields of image classification and recognition, common image description methods include low-level feature descriptions such as SIFT, LBP and HOG, mid-level feature descriptions such as the bag-of-words model, and the hierarchical feature representations popular in recent years such as convolutional neural networks and deep belief networks. Weakly supervised object detection and annotation solves a recognition problem at the object level: by eliminating the weakly supervised ambiguity, it answers the semantic-level question of what object is where. Such high-level semantic problems cannot be handled well by low-level and mid-level feature descriptions; they require very abstract high-level feature representations. Convolutional neural networks have achieved a series of important breakthroughs in the field of object recognition. Their hierarchical feature representation realizes the layer-by-layer abstraction of features from low level to high level: the front feature layers are typically edge and corner detectors, and as the number of layers increases, the later features gradually come to describe object parts and whole objects. By extracting the features of the later layers of a convolutional neural network, descriptions and representations of the image at a higher level (such as the object level) can be obtained. Another important characteristic of convolutional neural networks is their very large model capacity: the more layers and the more neurons, the higher the model complexity and the larger the amount of information that can be encoded and stored.
Based on this, the present invention trains a large-scale convolutional neural network on the very large image dataset ImageNet 2013, so that a large amount of general object information is stored in the network. Preferably, the convolutional neural network is trained on the large-scale general object classification database ImageNet 2013; the training data comprises about 1.2 million images in 1000 classes, and the network used comprises 5 convolutional layers and 2 fully connected layers, with a max-pooling layer connected after the 1st, 2nd and 5th convolutional layers; the whole network contains about 650,000 neurons. Just as the large amount of knowledge humans already possess helps them recognize objects, this convolutional neural network, which contains a large amount of general visual prior information, can be used efficiently for the general description of objects.
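The layer-by-layer abstraction described above can be illustrated with a minimal conv-relu-pool stack in NumPy; this is a toy sketch of the mechanism only, not the patent's actual ImageNet-trained network:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation) of image x with kernel k."""
    h, w = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 max pooling, dropping any odd trailing row/column."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def describe(window, kernels):
    """Stack conv -> relu -> maxpool stages and flatten the last feature map."""
    x = window
    for k in kernels:
        x = maxpool2(relu(conv2d(x, k)))
    return x.ravel()
```

With an edge-like kernel, the first stage responds to intensity transitions; stacking more stages with learned kernels is what lets deeper layers respond to parts and whole objects.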
S3, on the basis of only the given image category labels, the object detection model is trained on the candidate region feature representations with the multi-instance linear support vector machine MI-SVM.
The present invention obtains candidate window sets from the images with the selective search algorithm and describes these candidate windows with a large-scale convolutional neural network trained in advance; what is done next is to learn the object detection model automatically on these candidate window feature descriptions. With the trained object detection model, the candidate regions can be predicted and the regions most likely to contain objects can be found.
Weakly supervised object detection and annotation can usually be modeled as a multi-instance learning problem. An image I^i is described by a bag B^i containing n^i d-dimensional examples, the j-th example being denoted x^i_j. If at least one example in a bag is a positive sample, the label y^i of the bag is +1; if all examples are negative samples, the label y^i of the bag is -1. To avoid handling the bias explicitly below, the present invention appends an extra 1 at the end of each example's feature vector. Note the constrained large-margin formulation

min_{w,ξ} (1/2) w^T w + C Σ_{i=1..N} ξ_i, subject to y^i w^T x^i_{j*(B^i,w)} ≥ 1 - ξ_i, ξ_i ≥ 0

where the training set is B = {(B^i, y^i) | i = 1, 2, ..., N}, |B| = N is the number of training samples, w is the classifier coefficient vector, C is the regularization constant controlling the penalty on misclassification, and ξ_i is a slack variable.
Under the multi-instance learning framework, what the image-level annotation brings is ambiguity in the positive bags: it is known that each positive bag contains at least one positive sample, but not which one. The MI-SVM algorithm solves this problem by predicting each bag only through the example with the maximum prediction score w^T x, as shown in Fig. 2(a). The hyperplane of MI-SVM is determined by the highest-scoring example of each bag, and its optimization formulation is a mixed integer programming problem that can only be solved by heuristic algorithms, which is very slow.
S3.1 The MILinear algorithm
Unlike the small datasets handled by traditional multi-instance learning problems, the present invention mainly considers big-data problems comprising more than 5000 bags, each containing several hundred to several thousand high-dimensional examples. For a better and efficient solution of weakly supervised problems at big-data scale, the present invention proposes a new unconstrained large-margin multi-instance linear support vector machine algorithm, called MILinear. Its formulation is shown below:

min_w f(w) = (1/2) w^T w + C Σ_{i=1..N} max(0, 1 - y^i w^T x^i_{j*(B^i,w)})^2    (2)

where x^i_j is the feature vector of the j-th example in the i-th bag, y^i is the category label of the i-th bag, the second term employs a squared hinge loss function, max(a, b) takes the maximum of a and b, and j*(B^i, w) = argmax_j w^T x^i_j is the index of the example with the highest prediction score in bag B^i.
Gradient-based optimization methods are widely used on large-scale optimization problems, so the present invention uses the differentiable squared hinge loss function. As shown in Fig. 2(a), MI-SVM and MILinear solve this large-scale multi-instance learning problem by selecting the example with the maximum score.
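Under the stated assumptions (dense example matrices per bag; the patent gives no implementation details), the MILinear objective and its gradient can be sketched in NumPy as:

```python
import numpy as np

def milinear_objective_grad(w, bags, labels, C):
    """Objective value and gradient of the MILinear squared-hinge loss.

    w: (d,) classifier coefficient vector
    bags: list of (n_i, d) arrays of example features
    labels: array of +1/-1 bag labels
    """
    f = 0.5 * w.dot(w)
    g = w.copy()
    for B, y in zip(bags, labels):
        scores = B.dot(w)
        j = int(np.argmax(scores))      # j*(B^i, w): top-scoring example
        margin = 1.0 - y * scores[j]
        if margin > 0:                  # this bag violates the margin
            f += C * margin ** 2
            g += 2.0 * C * (scores[j] - y) * B[j]
    return f, g
```

Only the top-scoring example of each bag contributes to the loss, which is exactly the max-example selection shown in Fig. 2(a).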
S3.2 The bag decomposition algorithm
In the MILinear experiments, it was found by the invention that within a positive bag, the positive samples are generally concentrated in the top 30% by score. Having noticed this, the present invention proposes a new bag decomposition algorithm that effectively reduces the ambiguity of a positive bag by decomposing it into one positive bag and one negative bag. Preferably, the model trained by MILinear yields prediction probabilities for all candidate windows on the training images, and each positive bag is decomposed into one positive bag and one negative bag according to these prediction probabilities: specifically, the 30% with the highest probabilities form a new positive bag, and the remaining samples become a new negative bag. Next, a new MILinear model is trained on the dataset obtained after decomposition, as shown in Fig. 2(b). The bag decomposition algorithm reduces the ambiguity of the samples in the positive bags and thereby improves the classification performance of the model. This decomposition process may be iterated several times, until the model performance no longer improves.
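A minimal sketch of the positive-bag decomposition heuristic (the 30% threshold follows the patent's preferred embodiment; the function name is illustrative):

```python
import numpy as np

def decompose_positive_bag(bag, scores, top_frac=0.3):
    """Split one positive bag into a new positive bag (the top-scoring
    fraction of examples) and a new negative bag (the rest).

    bag: (n, d) array of example features
    scores: (n,) prediction scores/probabilities from the current model
    """
    n = len(scores)
    k = max(1, int(round(top_frac * n)))    # keep at least one example
    order = np.argsort(scores)[::-1]        # highest score first
    pos_bag = bag[order[:k]]
    neg_bag = bag[order[k:]]
    return pos_bag, neg_bag
```

Retraining on the decomposed bags and repeating the split implements the iterative reduction of positive-bag ambiguity described above.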
S3.3 Gradient-based optimization methods
The definition of the MILinear algorithm was given above; the following discusses how to learn the model efficiently on large-scale datasets. The optimization objective of MILinear is in unconstrained differentiable form, with first derivative

g(w) = w + 2C Σ_{i ∈ I(w)} (w^T x^i_{j*} - y^i) x^i_{j*}

where I(w) = {i | y^i w^T x^i_{j*} < 1} is the set of examples whose margin is less than 1.
Once the analytical expression of the gradient of the objective function has been obtained, there are many methods for optimizing the objective function, including stochastic gradient descent (SGD), L-BFGS, the nonlinear conjugate gradient method (CG), and so on. The stochastic gradient descent method processes the dataset item by item and updates the model iteratively. L-BFGS is a quasi-Newton optimization method which avoids storing the whole Hessian matrix through a low-rank approximate solution of the Hessian matrix. In general, each step of stochastic gradient descent has a relatively low cost but many iterations are needed, while each step of a second-order optimization method such as L-BFGS takes longer but the global convergence speed is faster.
For more efficient learning of the object appearance model, the present invention proposes a multi-instance linear support vector machine optimization algorithm based on the trust-region Newton method that is more efficient than L-BFGS. The trust-region Newton method is a very efficient solver for large-scale unconstrained problems and has been applied to general large-scale logistic regression and support vector machine training. To apply the trust-region Newton method to solving the MILinear problem, the generalized Hessian is computed with the formula

H(w) = I + 2C Σ_{i ∈ I(w)} x^i_{j*} (x^i_{j*})^T

where I is the identity matrix.
The trust-region Newton method optimizes the objective function in an iterative manner, attempting at each optimization step to solve the following subproblem containing a trust region:

min_s g(w^k)^T s + (1/2) s^T H(w^k) s, subject to ||s|| ≤ Δ_k

where s^k is the update step, Δ_k is the trust region, and g(w^k) and H(w^k) are respectively the first derivative and second derivative of the MILinear objective function (formula 2).
This subproblem can be solved efficiently with a conjugate gradient method that takes the trust region into account.
After the update step s^k has been obtained, w^k is updated only if the actual decrease of the objective function is sufficiently large; otherwise w^k is kept unchanged:

w^{k+1} = w^k + s^k if (f(w^k) - f(w^k + s^k)) / ( -g(w^k)^T s^k - (1/2)(s^k)^T H(w^k) s^k ) ≥ η_0, and w^{k+1} = w^k otherwise,

where η_0 is a predefined positive number controlling the minimum acceptable actual function decrease: if the actual decrease exceeds this value, the update direction is accepted. In an embodiment of the present invention it is preferably set to 1e-4.
Strictly speaking, the objective function of MILinear is non-convex because of the introduced max function, and it is not twice differentiable either. Although a globally optimal solution cannot be guaranteed, in practical situations the algorithm can effectively learn object appearance models from large-scale datasets.
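A simplified trust-region Newton loop can be sketched as follows; for clarity it solves the subproblem with a dense Newton solve clipped to the trust region, whereas a large-scale implementation of the kind the patent refers to would use Hessian-vector products and a trust-region conjugate gradient solver:

```python
import numpy as np

def trust_region_newton(f_g, hess, w0, eta0=1e-4, delta=1.0, iters=50):
    """Simplified trust-region Newton loop for an unconstrained objective.

    f_g:  function w -> (f(w), g(w))
    hess: function w -> H(w), the (generalized) Hessian, dense here
    """
    w = w0.copy()
    f, g = f_g(w)
    for _ in range(iters):
        H = hess(w)
        s = np.linalg.solve(H, -g)             # full Newton step
        if np.linalg.norm(s) > delta:          # clip step to the trust region
            s *= delta / np.linalg.norm(s)
        pred = -(g.dot(s) + 0.5 * s.dot(H).dot(s))   # model-predicted decrease
        f_new, g_new = f_g(w + s)
        rho = (f - f_new) / pred if pred > 0 else -1.0
        if rho >= eta0:                        # sufficient actual decrease: accept
            w, f, g = w + s, f_new, g_new
            delta *= 2.0 if rho > 0.75 else 1.0
        else:                                  # reject the step, shrink the region
            delta *= 0.25
        if np.linalg.norm(g) < 1e-8:
            break
    return w
```

The acceptance test with `eta0` mirrors the update rule above: a step is taken only when the actual decrease of the objective is a sufficiently large fraction of the predicted decrease.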
S4, candidate regions are extracted from the test image and feature descriptions are computed in the same way, and the previously trained object detection model localizes the objects of interest. In the test phase, a certain number of candidate regions are first obtained with the selective search algorithm, and feature descriptions are then computed with the same convolutional neural network as in the training stage. Afterwards, the previously trained object appearance model classifies the window features, so as to judge whether each candidate window is an object of interest and to conclude what object is where. This completes the automatic detection and annotation of objects of interest using only image label information.
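The test-phase procedure can be sketched as a small pipeline; all the callables here are hypothetical placeholders for selective search, the CNN descriptor, and the trained appearance model:

```python
def detect(image, extract_windows, describe, score, threshold=0.0):
    """Test-phase pipeline: windows -> features -> scores -> detections.

    extract_windows: image -> list of candidate windows (selective search)
    describe:        (image, window) -> feature description (CNN)
    score:           feature -> classifier score (trained appearance model)
    """
    windows = extract_windows(image)
    feats = [describe(image, win) for win in windows]
    scored = zip((score(f) for f in feats), windows)
    return [win for s, win in scored if s > threshold]
```

Windows whose score exceeds the threshold are reported as the detected and annotated object positions.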
Fig. 3 compares optimization with the trust-region Newton method against other optimization methods according to an embodiment of the present invention; Fig. 4 illustrates the relation between the prediction score of the trained object detection model and sample overlap according to an embodiment of the present invention; Fig. 5 illustrates the classification performance improvement of some object classes during iterations of the bag decomposition algorithm according to an embodiment of the present invention; Fig. 6 illustrates detection results of the trained object detection model on the Pascal VOC 2007 database according to an embodiment of the present invention.
In short, the present invention proposes a new visual object detection and annotation method based on weakly supervised learning. It extracts candidate windows with the selective search algorithm, uses a deep convolutional neural network pre-trained on massive data as the candidate window feature model and as a general prior, and mines positive samples with an algorithm based on multi-instance linear support vector machines. The model is optimized with the trust region Newton method, and a novel bag decomposition algorithm progressively reduces the ambiguity of the positive bags; the method thus realizes visual object detection and automatic annotation in a weakly supervised setting. Experiments show that, compared with mainstream weakly supervised detection and annotation methods, the invention has stronger positive sample mining ability and broader application prospects, and is suitable for detection and automatic annotation tasks on large-scale datasets.
The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. A visual object detection and annotation method, characterized by comprising:
an image input step of inputting an image to be detected;
a candidate region extraction step of extracting candidate windows from the image to be detected as candidate regions using a selective search algorithm;
a feature description extraction step of describing the candidate regions with a pre-trained large-scale convolutional neural network and outputting the feature descriptions of the candidate regions;
a visual object prediction step of predicting, based on the feature descriptions of the candidate regions and using a pre-trained object detection model, the regions in which the visual object is estimated to exist;
a position annotation step of annotating the position of the visual object according to the estimation result;
wherein the selective search algorithm in the candidate region extraction step further comprises:
converting the color space of the image to be detected into predetermined color spaces, segmenting the image with a graph-based over-segmentation algorithm, and repeatedly merging the two most similar regions to obtain a hierarchical segmentation of the image; after merging and de-duplicating the segmented regions over multiple color spaces and multiple levels, the candidate region set of the image is obtained;
the method further comprises:
an object detection model training step, specifically comprising:
inputting training set images with image category labels;
extracting candidate windows from the training set images as candidate regions using the selective search algorithm;
describing the candidate regions with the pre-trained large-scale convolutional neural network and outputting their feature descriptions;
training the object detection model with multi-instance linear support vector machines based on the feature descriptions of the candidate regions.
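The hierarchical region merging in claim 1 can be sketched as a greedy grouping loop: repeatedly merge the most similar pair of regions and keep every region ever formed as a candidate. This is a toy sketch, not the patent's implementation; regions are modeled as pixel-index sets and the similarity function is a stand-in:

```python
import itertools

def hierarchical_merge(regions, similarity):
    """Greedy hierarchical grouping as in selective search: repeatedly merge
    the most similar pair of regions, recording every region ever created.

    regions: list of frozensets of pixel indices (initial over-segmentation).
    similarity: function of two regions returning a similarity score.
    Returns all regions produced at any level of the hierarchy.
    """
    candidates = list(regions)
    current = list(regions)
    while len(current) > 1:
        # Find the most similar pair among the current regions.
        (i, j) = max(itertools.combinations(range(len(current)), 2),
                     key=lambda p: similarity(current[p[0]], current[p[1]]))
        merged = current[i] | current[j]
        current = [r for k, r in enumerate(current) if k not in (i, j)]
        current.append(merged)
        candidates.append(merged)
    return candidates

# Toy over-segmentation; the similarity here simply prefers small merges.
regions = [frozenset({1}), frozenset({2}), frozenset({3, 4, 5})]
cands = hierarchical_merge(regions, lambda a, b: -len(a | b))
```

In the full algorithm this loop runs per color space, and the candidate sets are unioned and de-duplicated as the claim describes.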
2. The method according to claim 1, characterized in that the predetermined color spaces include: HSV, RGI, I, Lab.
3. The method according to claim 1, characterized in that the pre-trained convolutional neural network is a convolutional neural network trained on the object classification database ImageNet 2013.
4. The method according to claim 1, characterized in that training the object detection model with multi-instance linear support vector machines comprises:
training the object detection model with MILinear, an unconstrained large-margin multi-instance learning algorithm whose objective function is
$$\min_{w}\ \frac{1}{2}\|w\|^{2}+\frac{C}{|B|}\sum_{i=1}^{|B|}\left(\max\left(0,\,1-y^{i}w^{T}B^{i}_{I_{i}}\right)\right)^{2},$$
wherein an image I^i is described by a bag B^i containing N^i d-dimensional examples, the j-th of which is denoted B^i_j; if at least one example in a bag is a positive sample, the label y^i of the bag is +1, and if all examples are negative samples, the label y^i of the bag is -1; the training set is B = {(B^i, y^i) | i = 1, 2, ..., N}, |B| = N is the number of training samples, w is the classifier coefficient vector, w^T is the transpose of w, C is the regularization term controlling the penalty on misclassification, and I_i is the index of the example with the highest prediction score in bag B^i.
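The objective of claim 4 can be evaluated directly from its definition. The sketch below is an illustration under the claim's notation (each bag an (n_i, d) array, I_i the index of the max-scoring instance), not the patent's code; the toy bags and weights are invented:

```python
import numpy as np

def milinear_objective(w, bags, labels, C):
    """Evaluate the MILinear objective
    0.5*||w||^2 + (C/|B|) * sum_i max(0, 1 - y^i * w^T B^i_{I_i})^2,
    where I_i indexes the highest-scoring instance of bag i."""
    total = 0.0
    for B, y in zip(bags, labels):
        best = np.max(B @ w)                 # score of instance I_i
        total += max(0.0, 1.0 - y * best) ** 2
    return 0.5 * (w @ w) + C / len(bags) * total

# Toy data: one positive bag of two instances, one negative bag of one.
w = np.array([1.0, 0.0])
bags = [np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([[-1.0, 0.0]])]
labels = [1, -1]
val = milinear_objective(w, bags, labels, C=1.0)
```

Both toy bags are classified with margin at least 1, so only the regularizer 0.5*||w||^2 contributes.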
5. The method according to claim 4, characterized in that the MILinear algorithm is solved with the trust region Newton method, comprising:
determining that the optimization objective of MILinear is an unconstrained differentiable objective function whose first derivative is
$$g(w)=w+2\frac{C}{|B|}\sum_{i\in I_{B}}\left(w^{T}B^{i}_{I_{i}}B^{iT}_{I_{i}}-y^{i}B^{iT}_{I_{i}}\right),$$
wherein I_B is the set of examples whose margin is less than 1;
the generalized Hessian is computed as
$$H(w)=I+2\frac{C}{|B|}\sum_{i\in I_{B}}B^{i}_{I_{i}}B^{iT}_{I_{i}},$$
where I is the identity matrix;
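A sketch of the gradient and generalized Hessian under the same notation. The Hessian form used here follows the standard squared-hinge (L2-loss SVM) derivation restricted to the active set I_B, and the toy input is invented for illustration:

```python
import numpy as np

def grad_and_hess(w, bags, labels, C):
    """Gradient g(w) and generalized Hessian H(w) of the MILinear objective,
    using the max-scoring instance of each bag and the active set I_B
    (bags whose margin 1 - y^i * w^T x is positive)."""
    g = w.copy()
    H = np.eye(len(w))
    scale = 2.0 * C / len(bags)
    for B, y in zip(bags, labels):
        x = B[np.argmax(B @ w)]              # instance I_i of this bag
        if 1.0 - y * (w @ x) > 0.0:          # bag is in the active set I_B
            g += scale * ((w @ x) * x - y * x)
            H += scale * np.outer(x, x)
    return g, H

# Toy check: one positive bag with a single instance [1, 0], w = 0.
g, H = grad_and_hess(np.zeros(2), [np.array([[1.0, 0.0]])], [1], C=1.0)
```

At w = 0 the bag is active, giving g = -2x and H = I + 2*x*x^T, consistent with the formulas above.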
The objective function is optimized iteratively by solving at each step
$$s^{k}=\min_{s}\,q_{k}(s)=\min_{s}\ \nabla f(w^{k})^{T}s+\frac{1}{2}s^{T}\nabla^{2}f(w^{k})s=\min_{s}\ g(w^{k})^{T}s+\frac{1}{2}s^{T}H(w^{k})s,\quad \text{s.t. }\|s\|\le\Delta_{k},$$
wherein k is the iteration number, s is the update step, s^k is the update step of the k-th iteration, w^k is the weight vector at the k-th iteration, Δ_k is the trust region radius, and ∇f(w^k) and ∇²f(w^k) are respectively the first and second derivatives of the MILinear objective function;
after solving for the update step s^k, w^k is updated if the actual decrease of the objective function is sufficiently large, and otherwise kept unchanged, according to:
$$w^{k+1}=\begin{cases}w^{k}+s^{k} & \text{if }\ \dfrac{f(w^{k}+s^{k})-f(w^{k})}{q_{k}(s^{k})}>\eta_{0},\\[4pt] w^{k} & \text{otherwise,}\end{cases}$$
wherein η_0 is a predefined positive number controlling the minimum acceptable actual function decrease.
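One trust-region Newton update with the acceptance test of claim 5 can be sketched as follows. For simplicity this sketch solves the subproblem with a Newton step clipped to the trust-region radius (an assumption; trust region Newton solvers typically use conjugate gradients), and the quadratic test function is invented:

```python
import numpy as np

def trust_region_step(w, f, g, H, delta, eta0=1e-4):
    """One trust-region Newton update: take the Newton step clipped to the
    radius delta, then accept it only if the ratio of actual to predicted
    decrease exceeds eta0."""
    s = np.linalg.solve(H, -g)               # unconstrained Newton direction
    norm = np.linalg.norm(s)
    if norm > delta:
        s *= delta / norm                    # enforce ||s|| <= delta
    pred = g @ s + 0.5 * s @ (H @ s)         # q_k(s), predicted change (< 0)
    actual = f(w + s) - f(w)
    if pred < 0 and actual / pred > eta0:    # sufficient actual decrease
        return w + s, True
    return w, False

# Toy quadratic f(w) = 0.5*||w||^2; the Newton step jumps to the minimum.
f = lambda v: 0.5 * (v @ v)
w_new, accepted = trust_region_step(np.array([2.0, 0.0]), f,
                                    np.array([2.0, 0.0]), np.eye(2),
                                    delta=10.0)
```

For the quadratic case the actual and predicted decreases coincide, so the step is accepted and w moves to the optimum.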
6. The method according to claim 5, characterized by further comprising running a bag decomposition algorithm with the trained object detection model, gradually reducing the ambiguity of the positive bags in an iterative manner, specifically comprising:
obtaining, with the object detection model trained by MILinear, prediction probabilities for all candidate windows on the training set images; decomposing each positive bag into a positive bag and a negative bag according to these prediction probabilities; and training a new MILinear object detection model on the dataset obtained after decomposition; this decomposition process is iterated several times.
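The decomposition step of claim 6 can be sketched as a score-based split of each positive bag. The exact splitting rule (how many top-scoring windows stay positive) is an assumption here; the toy bag and weights are invented:

```python
import numpy as np

def decompose_positive_bag(bag, w, keep_top=1):
    """Split a positive bag by prediction score: the keep_top highest-scoring
    instances form a new, less ambiguous positive bag; the rest form a
    negative bag. Iterating train -> decompose reduces positive-bag ambiguity."""
    scores = bag @ w
    order = np.argsort(-scores)              # indices by descending score
    pos = bag[order[:keep_top]]
    neg = bag[order[keep_top:]]
    return pos, neg

# Toy positive bag of three candidate windows, scored by a linear model.
bag = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 0.5])
pos, neg = decompose_positive_bag(bag, w)
```

Retraining MILinear on the decomposed bags and repeating the split implements the iterative scheme of the claim.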
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201410442817.4A CN104217225B (en)  20140902  20140902  A kind of sensation target detection and mask method 
Applications Claiming Priority (1)
Application Number  Priority Date  Filing Date  Title 

CN201410442817.4A CN104217225B (en)  20140902  20140902  A kind of sensation target detection and mask method 
Publications (2)
Publication Number  Publication Date 

CN104217225A CN104217225A (en)  20141217 
CN104217225B true CN104217225B (en)  20180424 
Family
ID=52098687
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201410442817.4A Active CN104217225B (en)  20140902  20140902  A kind of sensation target detection and mask method 
Country Status (1)
Country  Link 

CN (1)  CN104217225B (en) 
Families Citing this family (55)
Publication number  Priority date  Publication date  Assignee  Title 

CN104572965A (en) *  20141231  20150429  南京理工大学  Searchbyimage system based on convolutional neural network 
CN104573669B (en) *  20150127  20180904  中国科学院自动化研究所  Image object detection method 
CN105989174B (en) *  20150305  20191101  欧姆龙株式会社  Regionofinterest extraction element and regionofinterest extracting method 
CN104700118A (en) *  20150318  20150610  中国科学院自动化研究所  Pulmonary nodule benignity and malignancy predicting method based on convolutional neural networks 
CN106156777B (en) *  20150423  20200602  华中科技大学  Text picture detection method and device 
CN106327516B (en) *  20150629  20181218  北京雷动云合智能技术有限公司  A kind of learningoriented visual pursuit method based on display model 
CN105069774B (en) *  20150630  20171110  长安大学  The Target Segmentation method of optimization is cut based on multiinstance learning and figure 
CN105138983B (en) *  20150821  20190628  燕山大学  The pedestrian detection method divided based on weighting block model and selective search 
WO2017059576A1 (en) *  20151009  20170413  Beijing Sensetime Technology Development Co., Ltd  Apparatus and method for pedestrian detection 
CN106611177A (en) *  20151027  20170503  北京航天长峰科技工业集团有限公司  Big databased image classification method 
WO2017096570A1 (en) *  20151210  20170615  Intel Corporation  Visual recognition using deep learning attributes 
CN105678322A (en) *  20151231  20160615  百度在线网络技术（北京）有限公司  Sample labeling method and apparatus 
CN108475331A (en) *  20160217  20180831  英特尔公司  Use the candidate region for the imageregion for including interested object of multiple layers of the characteristic spectrum from convolutional neural networks model 
CN105868269A (en) *  20160308  20160817  中国石油大学(华东)  Precise image searching method based on region convolutional neural network 
CN105893963B (en) *  20160331  20190308  南京邮电大学  A kind of method of the best frame easy to identify of single pedestrian target in screening video 
CN105956563B (en) *  20160506  20190416  西安工程大学  The method for carrying out face mark in news image based on multiinstance learning 
CN106127204B (en) *  20160630  20190809  华南理工大学  A kind of multidirection meter reading Region detection algorithms of full convolutional neural networks 
CN106203450A (en) *  20160711  20161207  国家新闻出版广电总局广播科学研究院  Based on degree of depth learning framework, image is carried out the object detection method of feature extraction 
CN106227836B (en) *  20160726  20200714  上海交通大学  Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters 
CN106326985A (en) *  20160818  20170111  北京旷视科技有限公司  Neural network training method, neural network training device, data processing method and data processing device 
CN106326893A (en) *  20160825  20170111  安徽水滴科技有限责任公司  Vehicle color recognition method based on area discrimination 
US10140508B2 (en) *  20160826  20181127  Huawei Technologies Co. Ltd.  Method and apparatus for annotating a video stream comprising a sequence of frames 
CN106384345B (en) *  20160831  20190402  上海交通大学  A kind of image detection and flow statistical method based on RCNN 
CN107918624A (en) *  20161011  20180417  富士通株式会社  Image retrieving apparatus and method, electronic equipment 
CN106504233B (en) *  20161018  20190409  国网山东省电力公司电力科学研究院  Unmanned plane inspection image electric power widget recognition methods and system based on Faster RCNN 
CN106529485A (en) *  20161116  20170322  北京旷视科技有限公司  Method and apparatus for obtaining training data 
WO2018107371A1 (en)  20161213  20180621  上海联影医疗科技有限公司  Image searching system and method 
CN108229514A (en) *  20161229  20180629  北京市商汤科技开发有限公司  Object detecting method, device and electronic equipment 
WO2018120038A1 (en) *  20161230  20180705  深圳前海达闼云端智能科技有限公司  Method and device for target detection 
CN108303748A (en) *  20170112  20180720  同方威视技术股份有限公司  The method for checking equipment and detecting the gun in luggage and articles 
CN108303747A (en) *  20170112  20180720  清华大学  The method for checking equipment and detecting gun 
CN106815604B (en) *  20170116  20190927  大连理工大学  Method for viewing points detecting based on fusion of multilayer information 
CN106909887A (en) *  20170119  20170630  南京邮电大学盐城大数据研究院有限公司  A kind of action identification method based on CNN and SVM 
CN106934344B (en) *  20170123  20200131  西北大学  quick pedestrian detection method based on neural network 
CN106934346B (en) *  20170124  20190315  北京大学  A kind of method of target detection performance optimization 
CN106845430A (en) *  20170206  20170613  东华大学  Pedestrian detection and tracking based on acceleration region convolutional neural networks 
CN107038448B (en) *  20170301  20200228  中科视语(北京)科技有限公司  Target detection model construction method 
CN106991400A (en) *  20170405  20170728  北京中燕信息技术有限公司  A kind of fire hazard smoke detecting method and device 
CN107203781B (en) *  20170522  20200728  浙江大学  Endtoend weak supervision target detection method based on significance guidance 
CN107330449A (en) *  20170613  20171107  瑞达昇科技(大连)有限公司  A kind of BDR sign detection method and device 
CN107609483B (en) *  20170815  20200616  中国科学院自动化研究所  Dangerous target detection method and device for driving assistance system 
CN107562050B (en) *  20170829  20210316  广东工业大学  Method and system for robot to recognize environment 
CN107945153A (en) *  20171107  20180420  广东广业开元科技有限公司  A kind of road surface crack detection method based on deep learning 
CN108319633A (en) *  20171117  20180724  腾讯科技（深圳）有限公司  A kind of image processing method, device and server, system, storage medium 
CN108062574B (en) *  20171231  20200616  厦门大学  Weak supervision target detection method based on specific category space constraint 
CN110147796A (en) *  20180212  20190820  杭州海康威视数字技术股份有限公司  Image matching method and device 
CN108596223A (en) *  20180411  20180928  珠海博明视觉科技有限公司  A method of automatically generating object data set 
CN109063559B (en) *  20180628  20210511  东南大学  Pedestrian detection method based on improved region regression 
CN109492686A (en) *  20181101  20190319  郑州云海信息技术有限公司  A kind of picture mask method and system 
CN109492702B (en) *  20181121  20200922  中国科学院自动化研究所  Pedestrian reidentification method, system and device based on ranking measurement function 
CN109857878A (en) *  20181227  20190607  深兰科技（上海）有限公司  Article mask method and device, electronic equipment and storage medium 
CN109740571A (en) *  20190122  20190510  南京旷云科技有限公司  The method of Image Acquisition, the method, apparatus of image procossing and electronic equipment 
CN111488400B (en) *  20190428  20210330  北京京东尚科信息技术有限公司  Data classification method, device and computer readable storage medium 
CN110288629A (en) *  20190624  20190927  湖北亿咖通科技有限公司  Target detection automatic marking method and device based on moving Object Detection 
CN110929729B (en) *  20200218  20200804  北京海天瑞声科技股份有限公司  Image annotation method, image annotation device and computer storage medium 
Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

US8355569B2 (en) *  20060810  20130115  Nec Corporation  Object region extracting device 
CN103870834A (en) *  20140403  20140618  张琰  Method for searching for sliding window based on layered segmentation 
CN103984959A (en) *  20140526  20140813  中国科学院自动化研究所  Datadriven and taskdriven image classification method 
Family Cites Families (1)
Publication number  Priority date  Publication date  Assignee  Title 

EP2207138B1 (en) *  20071030  20161228  PASCO Corporation  House movement determining method, house movement determining program, house movement determining image generating method, and house movement determining image 

2014
 20140902 CN CN201410442817.4A patent/CN104217225B/en active Active
Patent Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

US8355569B2 (en) *  20060810  20130115  Nec Corporation  Object region extracting device 
CN103870834A (en) *  20140403  20140618  张琰  Method for searching for sliding window based on layered segmentation 
CN103984959A (en) *  20140526  20140813  中国科学院自动化研究所  Datadriven and taskdriven image classification method 
Also Published As
Publication number  Publication date 

CN104217225A (en)  20141217 
Similar Documents
Publication  Publication Date  Title 

Huang et al.  Urban landuse mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery  
Kümmerer et al.  DeepGaze II: Reading fixations from deep features trained on object recognition  
Caicedo et al.  Evaluation of deep learning strategies for nucleus segmentation in fluorescence images  
Komura et al.  Machine learning methods for histopathological image analysis  
RomeraParedes et al.  Recurrent instance segmentation  
Zhang et al.  A Linear Dirichlet Mixture Model for decomposing scenes: Application to analyzing urban functional zonings  
Tian et al.  Detecting text in natural image with connectionist text proposal network  
Zhang et al.  Hierarchical semantic cognition for urban functional zones with VHR satellite images and POI data  
Namin et al.  Deep phenotyping: deep learning for temporal phenotype/genotype classification  
Pan et al.  Accurate segmentation of nuclei in pathological images via sparse reconstruction and deep convolutional networks  
Vetrivel et al.  Disaster damage detection through synergistic use of deep learning and 3D point cloud features derived from very high resolution oblique aerial images, and multiplekernellearning  
Nasiri et al.  A whale optimization algorithm (WOA) approach for clustering  
EP3364341A1 (en)  Analyzing digital holographic microscopy data for hematology applications  
Nichols et al.  Machine learning: applications of artificial intelligence to imaging and diagnosis  
US10303979B2 (en)  System and method for classifying and segmenting microscopy images with deep multiple instance learning  
Boom et al.  A research tool for longterm and continuous analysis of fish assemblage in coralreefs using underwater camera footage  
Sudderth et al.  Learning hierarchical models of scenes, objects, and parts  
Raut et al.  Image segmentation–a stateofart survey for prediction  
CN104573669B (en)  Image object detection method  
Zhang et al.  Hybrid region merging method for segmentation of highresolution remote sensing images  
Angermueller et al.  Deep learning for computational biology  
Shi et al.  Cloud detection of remote sensing images by deep learning  
US10417524B2 (en)  Deep active learning method for civil infrastructure defect detection  
Kumar et al.  Automatic cluster evolution using gravitational search algorithm and its application on image segmentation  
Weinmann et al.  A classificationsegmentation framework for the detection of individual trees in dense MMS point cloud data acquired in urban areas 
Legal Events
Date  Code  Title  Description 

C06  Publication  
PB01  Publication  
C10  Entry into substantive examination  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant  