CN104217225B - Visual target detection and annotation method - Google Patents

Visual target detection and annotation method

Info

Publication number
CN104217225B
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410442817.4A
Other languages
Chinese (zh)
Other versions
CN104217225A (en)
Inventor
黄凯奇
任伟强
王冲
张俊格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Original Assignee
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS
Priority to CN201410442817.4A
Publication of CN104217225A
Application granted
Publication of CN104217225B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a visual target detection and annotation method, including: an image input step of inputting an image to be detected; a candidate region extraction step of extracting candidate windows from the image to be detected as candidate regions using a selective search algorithm; a feature description extraction step of describing the features of the candidate regions with a large-scale convolutional neural network trained in advance and outputting the feature descriptions of the candidate regions; a visual target prediction step of predicting, based on the feature descriptions of the candidate regions and using an object detection model trained in advance, which candidate regions are estimated to contain the visual target; and a position annotation step of annotating the position of the visual target according to the estimation result. Experiments show that, compared with mainstream weakly supervised visual target detection and annotation methods, the present invention has stronger positive sample mining ability and broader application prospects, and is suitable for visual target detection and automatic annotation tasks on large-scale datasets.

Description

Visual target detection and annotation method
Technical field
The present invention relates to the technical field of object detection in computer vision, and more particularly to a visual target detection and annotation method based on weakly supervised learning.
Background art
Detecting objects in images and automatically annotating their locations is a fundamental problem in computer vision and will remain one of the key problems studied in the field. Object detection answers, for a given test image, the question of what is where. It has wide applications in many other vision problems, such as object recognition, pedestrian detection, face detection, foreground detection in surveillance scenes, motion tracking, and action recognition and analysis.
General object detection requires a database annotated with object bounding rectangles so that fully supervised object detection models such as Histograms of Oriented Gradients (HOG) and the Deformable Part Model (DPM) can be trained. The rapid development of digital media technology has caused explosive growth of image and video data, and the spread of the Internet makes it easy to obtain such data in massive quantities. Faced with data of this scale, the serious problem confronting current object detection algorithms is that most of the data carry no usable object location annotations, and annotating massive image data with object positions is an extremely labor-intensive and costly task.
By comparison, assigning category labels to whole images is much easier, and pre-filtering with methods such as unsupervised clustering makes it possible to build fairly large classification databases in a short time. Therefore, automatically learning object categories and locating objects from an image database that carries only classification labels, i.e., realizing visual target detection and annotation through weakly supervised learning, has important theoretical value and practical significance.
In traditional weakly supervised learning algorithms, candidate regions are usually selected with densely sampled candidate windows; the number of windows is enormous, and both recall and overlap with real objects are unsatisfactory. Meanwhile, candidate windows are usually described with bag-of-words models, whose feature transformations are shallow; the resulting features can be regarded as mid-level representations and lack the higher-level information that would allow a model to mine object appearance models from images automatically.
Current mainstream methods for weakly supervised object detection and annotation include multi-instance learning, topic models, conditional random fields, and so on. Many traditional multi-instance learning algorithms depend heavily on kernel learning or distance-metric learning frameworks and use optimization algorithms of very high complexity such as heuristics, quadratic programming, and integer programming, so they are difficult to apply efficiently on large-scale datasets.
Therefore, how to improve and optimize weakly supervised learning algorithms so as to realize object detection and automatic location annotation efficiently on massive image collections is a major problem in the prior art that urgently needs to be solved.
Summary of the invention
In view of this, the main object of the present invention is to provide a visual target detection and annotation method for weakly supervised scenarios, which can automatically locate targets of interest in an image collection given only image-level class labels, and can also automatically annotate object locations in images.
In order to achieve the above object, the present invention provides the following technical solution:
A visual target detection and annotation method, characterized by comprising:
an image input step of inputting an image to be detected;
a candidate region extraction step of extracting candidate windows from the image to be detected as candidate regions using a selective search algorithm;
a feature description extraction step of describing the features of the candidate regions with a large-scale convolutional neural network trained in advance and outputting the feature descriptions of the candidate regions;
a visual target prediction step of predicting, based on the feature descriptions of the candidate regions and using an object detection model trained in advance, which candidate regions are estimated to contain the visual target;
a position annotation step of annotating the position of the visual target according to the estimation result.
Preferably, the selective search algorithm in the candidate region extraction step further comprises:
converting the color space of the image to be detected into predetermined color spaces, segmenting the image with a graph-based over-segmentation algorithm, and repeatedly merging the two most similar regions to obtain a hierarchical segmentation of the image; after the segmentation region sets from multiple color spaces and multiple levels are merged and de-duplicated, the candidate region set of the image is obtained.
Preferably, the predetermined color spaces include HSV, RGI, I, and Lab.
Preferably, the convolutional neural network trained in advance is a convolutional neural network trained on the object classification database ImageNet 2013.
Preferably, the method further includes an object detection model training step, which specifically includes:
inputting training set images with image category labels;
extracting candidate windows from the training set images as candidate regions using the selective search algorithm;
describing the features of the candidate regions with the large-scale convolutional neural network trained in advance and outputting the feature descriptions of the candidate regions;
training an object appearance model with a multiple-instance linear support vector machine based on the feature descriptions of the candidate regions.
Preferably, training the object detection model with the multiple-instance linear support vector machine includes:
training the object detection model with MILinear, an unconstrained large-margin multi-instance learning algorithm whose objective function is

$$\min_{w}\ \frac{1}{2}\|w\|^{2}+\frac{C}{|B|}\sum_{i=1}^{|B|}\Big(\max\big(0,\ 1-y_{i}\,w^{T}B_{i}^{I_{i}}\big)\Big)^{2},$$

where an image I_i is described by a bag B_i containing n_i d-dimensional examples, the j-th example being denoted B_i^j; if a bag contains at least one positive example its label y_i is +1, and if all of its examples are negative its label y_i is -1; the training set is B = {(B_i, y_i) | i = 1, 2, ..., N}, |B| = N is the number of training bags, w is the classifier coefficient vector, C is the regularization weight that controls the penalty on misclassification, and I_i = argmax_j w^T B_i^j is the index of the example with the highest prediction score in bag B_i.
Preferably, the MILinear algorithm is solved with a trust-region Newton method, including:
determining that the optimization objective of MILinear is an unconstrained, differentiable objective function whose first derivative is

$$g(w)=w+\frac{2C}{|B|}\sum_{i\in I_{B}}\Big(w^{T}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T}-y_{i}\,(B_{i}^{I_{i}})^{T}\Big),$$

where I_B = {i : y_i w^T B_i^{I_i} < 1} is the set of bags whose margin is less than 1;
computing the generalized Hessian as

$$H(w)=I+\frac{2C}{|B|}\sum_{i\in I_{B}}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T},$$

where I is the identity matrix;
optimizing the objective function iteratively by computing

$$s^{k}=\arg\min_{s}\ q_{k}(s)=\arg\min_{s}\ g(w^{k})^{T}s+\frac{1}{2}\,s^{T}H(w^{k})\,s,\quad\text{s.t.}\ \|s\|\le\Delta_{k},$$

where s^k is the update step, Δ_k is the trust region, and g(w^k) and H(w^k) are the first and second derivatives of the MILinear objective function, respectively.
After the update step s^k is obtained, w^k is updated only if the actual decrease of the objective function is sufficiently large; otherwise w^k is kept unchanged:

$$w^{k+1}=\begin{cases}w^{k}+s^{k}, & \text{if}\ \dfrac{f(w^{k}+s^{k})-f(w^{k})}{q_{k}(s^{k})}>\eta_{0},\\ w^{k}, & \text{otherwise},\end{cases}$$

where η0 is a predefined positive number that controls the minimum acceptable actual decrease of the function. Preferably, the method further includes running a bag decomposition algorithm with the trained object detection model, gradually reducing the ambiguity of positive bags in an iterative manner, including:
obtaining, with the object detection model trained by MILinear, prediction probabilities for all candidate windows on the training set images; decomposing each positive bag into a positive bag and a negative bag according to these prediction probabilities; and training a new MILinear object detection model on the dataset obtained after the decomposition, this decomposition process possibly being iterated several times.
The visual target detection and annotation method provided by the present invention has several clear advantages:
1) Selective search is used to obtain, from a large number of over-segmentation results, the candidate windows where targets are most likely to appear. Windows obtained in this way preserve object boundaries well and overlap strongly with real objects, while maintaining high recall with only several hundred to a few thousand candidate windows.
2) A convolutional neural network trained in advance on a very large image classification dataset is used to extract feature representations from the candidate windows, yielding rich features with strong high-level semantic information and allowing the model to mine object appearance models from images automatically.
3) A new multiple-instance linear support vector machine model is adopted, together with an optimization algorithm based on the trust-region Newton method, so that weakly supervised detection models can be learned efficiently on large-scale datasets.
4) A new bag decomposition algorithm is adopted: by decomposing each positive sample bag into a positive sample bag and a negative sample bag, the ambiguity within positive bags is greatly reduced, which effectively improves the performance of the weakly supervised detection model.
Brief description of the drawings
Fig. 1 is a flow chart of model training and testing of the visual target detection and annotation method based on weakly supervised learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of MILinear and of MILinear with bag decomposition according to an embodiment of the present invention;
Fig. 3 is a schematic comparison of optimization with the trust-region Newton method against other optimization methods according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the relation between the prediction scores of the trained object detection model and sample overlap according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the classification performance improvement of some object classes during iterations of the bag decomposition algorithm according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of detection results of the trained object detection model on the Pascal VOC 2007 database according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The main points of the present invention are: 1) selective search is used so that, based on a large number of over-segmentation results, high target recall and overlap can be obtained with relatively few candidate windows; 2) a convolutional neural network trained in advance on a very large image classification dataset is used to extract feature representations from the candidate windows, yielding rich features with strong high-level semantic information; 3) a new multiple-instance linear support vector machine model is adopted and optimized with an algorithm based on the trust-region Newton method, so that weakly supervised detection models can be learned efficiently on large-scale datasets; 4) a new bag decomposition algorithm is adopted, which decomposes each positive sample bag into a positive sample bag and a negative sample bag, greatly reducing the ambiguity within positive bags and effectively improving the performance of the weakly supervised detection model.
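How these four components fit together at training time can be outlined with the following sketch, in which the individual stages are passed in as callables; the function and parameter names (train_weakly_supervised_detector, propose, describe, train_milinear, decompose, n_rounds) are illustrative assumptions, not identifiers from the patent.

```python
import numpy as np

def train_weakly_supervised_detector(images, image_labels, propose, describe,
                                     train_milinear, decompose, n_rounds=3):
    """End-to-end training flow: selective search -> CNN features -> MILinear
    with iterative bag decomposition.

    propose(img)            -> list of candidate boxes (selective search)
    describe(img, boxes)    -> (n_i, d) feature array (pretrained CNN)
    train_milinear(bags, y) -> weight vector w
    decompose(bags, y, w)   -> new (bags, y) with every positive bag split
    image_labels            : image-level labels in {+1, -1}
    """
    # one bag of candidate-window features per training image
    bags = [describe(img, propose(img)) for img in images]
    y = np.asarray(image_labels)

    w = train_milinear(bags, y)            # initial MILinear model
    for _ in range(n_rounds):              # bag decomposition iterations
        bags, y = decompose(bags, y, w)
        w = train_milinear(bags, y)
    return w
```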
As shown in Fig. 1, the upper half of Fig. 1 is the model training flow chart of the visual target detection and annotation method based on weakly supervised learning according to an embodiment of the present invention. First, an image is input. Second, candidate windows are extracted from the input image with the selective search algorithm to obtain the candidate regions. Then the candidate regions, i.e., the candidate window samples, are fed into the convolutional neural network to obtain the feature description of each candidate region, i.e., the region representation. Finally, based on these feature descriptions, the weakly supervised learning algorithm proposed by the present invention learns the object appearance model automatically, i.e., mines positive samples. The lower half of Fig. 1 shows the test procedure of the method: for a test image, candidate windows are extracted in the same way as during training, the window regions are described with the deep convolutional neural network, and the previously trained object appearance model classifies the window regions to accomplish the target detection or annotation task. The method comprises the following steps:
S1: candidate region extraction. Candidate windows are extracted from the training set images with the selective search algorithm and used as candidate regions.
When only image-level class labels are given, one only knows which object categories an image contains, for example "car" or "person", but not where the "car" and the "person" are; the bounding rectangles of the objects have to be determined by the algorithm. If all possible rectangles were extracted from an image, their number would be enormous and processing them would be impractical. The candidate region extraction algorithm therefore extracts a limited number of likely object rectangles so that as many of the objects to be located as possible are covered. Three indicators matter most here: first, the number of candidate windows, since the fewer the windows, the more efficient the algorithm; second, the recall, namely the proportion of real objects covered by candidate windows; and third, the overlap between candidate windows and the true object bounding rectangles. For candidate window algorithms based on dense sampling, the number of windows is enormous, and both recall and overlap are unsatisfactory.
The selective search algorithm used by the present invention is a candidate window extraction algorithm based on over-segmentation. It over-segments the image with different parameters to obtain different image blocks and then merges the blocks following the idea of hierarchical organization, so as to find the bounding rectangles most likely to contain objects. The specific steps are as follows: first, the original image is converted from the RGB color space to other color spaces, including HSV, RGI, I, Lab, etc.; then each image is segmented with a graph-based over-segmentation algorithm, and the two most similar regions are repeatedly merged following the idea of hierarchical organization to obtain a hierarchical segmentation of the image. After the segmentation region sets from the multiple color spaces and multiple levels are combined and de-duplicated, the candidate region set of the image is obtained.
The selective search algorithm runs efficiently and achieves very high recall and overlap with only several hundred to a few thousand candidate windows.
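The hierarchical merging step can be illustrated with a minimal numpy sketch. It assumes an over-segmentation label map has already been computed (for example with a graph-based algorithm), works on a single grayscale channel, and uses a plain histogram-intersection similarity, so it is a simplification of the multi-color-space, multi-cue procedure described above; the function name candidate_boxes, the merge budget and the bin count are illustrative choices rather than part of the patented method.

```python
import numpy as np

def candidate_boxes(image_gray, labels, n_merges=50, n_bins=16):
    """Illustrative hierarchical region merging on top of a given over-segmentation.

    image_gray : 2-D float array with values in [0, 1]
    labels     : 2-D int array of the same shape, one region id per pixel
                 (e.g. the output of a graph-based over-segmentation)
    Returns a de-duplicated, sorted list of (x0, y0, x1, y1) boxes collected at
    every level of the merging hierarchy.
    """
    labels = labels.copy()                               # do not mutate the caller's map

    def region_stats(mask):
        ys, xs = np.nonzero(mask)
        hist, _ = np.histogram(image_gray[mask], bins=n_bins, range=(0.0, 1.0))
        hist = hist / max(hist.sum(), 1)
        return hist, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    regions = {int(r): region_stats(labels == r) for r in np.unique(labels)}
    boxes = {box for _, box in regions.values()}

    def similarity(a, b):                                # histogram intersection
        return float(np.minimum(regions[a][0], regions[b][0]).sum())

    # adjacency: pairs of region ids that touch horizontally or vertically
    pairs = list(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel())) + \
            list(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
    adj = {tuple(sorted((int(a), int(b)))) for a, b in pairs if a != b}

    for _ in range(n_merges):
        if not adj:
            break
        a, b = max(adj, key=lambda p: similarity(*p))    # most similar adjacent pair
        labels[labels == b] = a                          # merge b into a
        regions[a] = region_stats(labels == a)
        del regions[b]
        adj = {tuple(sorted((a if r == b else r, a if s == b else s)))
               for r, s in adj if {r, s} != {a, b}}
        adj = {p for p in adj if p[0] != p[1]}
        boxes.add(regions[a][1])                         # record the merged region's box
    return sorted(boxes)
```

Collecting the bounding box of every intermediate region is what yields candidate windows at several scales from a single merging pass.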
S2: each candidate region is described with the large-scale convolutional neural network trained in advance, and its feature description is output.
After obtaining the candidate regions that may contain objects of interest, deciding by means of computer vision and pattern recognition algorithms whether a candidate window contains a certain object first requires a feature description of that candidate region, after which a classifier makes the category decision. In image classification and recognition, common image description methods include low-level features such as SIFT, LBP and HOG, mid-level features such as bag-of-words models, and the hierarchical feature representations that have become popular in recent years, such as convolutional neural networks and deep belief networks. Weakly supervised object detection and annotation solves a recognition problem at the object level: by removing the ambiguity of weak supervision, it answers the semantic question of what object is where. Such high-level semantic problems cannot be handled well by low-level or mid-level feature descriptions; a highly abstract, high-level feature representation is required. Convolutional neural networks have achieved a series of important breakthroughs in object recognition. Their hierarchical feature representation realizes a layer-by-layer abstraction from low-level to high-level features: the early layers are typically edge and corner detectors, and as the depth increases the later features gradually describe object parts and whole objects. Extracting features from the later layers of a convolutional neural network therefore yields descriptions and representations of the image at a higher level (for example the object level). Another important property of convolutional neural networks is their large model capacity: the more layers and neurons, the higher the model complexity and the more information the network can encode and store.
Based on this, the present invention trains a large-scale convolutional neural network on ImageNet 2013, a very large image dataset, so that a large amount of generic object information is stored in the network. Preferably, the convolutional neural network is trained on the large-scale general object classification database ImageNet 2013, whose training data comprise about 1.2 million images of 1000 classes; the network used contains 5 convolutional layers and 2 fully connected layers, with a max pooling layer after the 1st, 2nd and 5th convolutional layers, and about 650,000 neurons in total. Just as the extensive knowledge humans already possess helps them recognize objects, this convolutional neural network, which contains a large amount of generic visual prior information, can be used efficiently to produce general descriptions of objects.
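A minimal sketch of extracting one descriptor per candidate window with a pretrained network is given below. It assumes PyTorch and torchvision and uses torchvision's AlexNet, which also has 5 convolutional and 2 fully connected feature layers, as a stand-in for the network trained on ImageNet 2013 described above; the weights, the 224x224 input size and the normalization constants belong to the torchvision model, not to the patent.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# AlexNet-style network pretrained on ImageNet, used as a stand-in for the
# 5-conv / 2-FC network described above.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),                      # warp each candidate window to the input size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics used by torchvision
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def window_features(image: Image.Image, boxes):
    """Return one descriptor per candidate window (x0, y0, x1, y1) of an RGB image."""
    crops = torch.stack([preprocess(image.crop(box)) for box in boxes])
    x = net.avgpool(net.features(crops)).flatten(1)   # convolutional part
    x = net.classifier[:-1](x)                        # stop before the 1000-way output layer
    return x                                          # (N, 4096) feature matrix
```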
S3: with only image-level class labels given, an object detection model is trained on the candidate region feature representations with a multiple-instance linear support vector machine (MI-SVM).
The present invention obtains the candidate window set from the images with the selective search algorithm and describes these candidate windows with a large-scale convolutional neural network trained in advance. The next step is to learn the object detection model automatically on these candidate window feature descriptions; with the trained object detection model, the candidate regions can be predicted and the regions most likely to contain objects can be found.
Weakly supervised object detection and annotation can usually be modeled as a multi-instance learning problem. An image I_i is described by a bag B_i containing n_i d-dimensional examples, the j-th example being denoted B_i^j. If a bag contains at least one positive example, its label y_i is +1; if all of its examples are negative, its label y_i is -1. To avoid handling the bias explicitly in what follows, the present invention appends an extra 1 at the end of each example's feature vector. The MI-SVM formulation is written as

$$\min_{w,\,\xi}\ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{N}\xi_{i}\quad\text{s.t.}\ \ y_{i}\max_{j}\big(w^{T}B_{i}^{j}\big)\ge 1-\xi_{i},\ \ \xi_{i}\ge 0,\ \ i=1,\dots,N,$$

where the training set is B = {(B_i, y_i) | i = 1, 2, ..., N}, |B| = N is the number of training bags, w is the classifier coefficient vector, C is the regularization weight that controls the penalty on misclassification, and ξ_i is a slack variable.
Under the multi-instance learning framework, image-level labels introduce ambiguity into the positive bags: a positive bag is known to contain at least one positive sample, but it is not known which one. The MI-SVM algorithm handles this by predicting a bag from the single example with the maximum prediction score w^T B_i^j, as shown in Fig. 2(a). The hyperplane of MI-SVM is determined by the highest-scoring example of each bag; its optimization formulation is a mixed integer programming problem that can only be solved by heuristic algorithms and is very slow.
S3.1 The MILinear algorithm
Unlike the small datasets handled by traditional multi-instance learning, the present invention mainly considers big-data problems with more than 5000 bags, each containing several hundred to a thousand high-dimensional examples. To solve weakly supervised problems efficiently at this scale, the present invention proposes a new unconstrained large-margin multiple-instance linear support vector machine algorithm, called MILinear. Its formulation is

$$\min_{w}\ \frac{1}{2}\|w\|^{2}+\frac{C}{|B|}\sum_{i=1}^{|B|}\Big(\max\big(0,\ 1-y_{i}\,w^{T}B_{i}^{I_{i}}\big)\Big)^{2},$$

where B_i^j is the feature vector of the j-th example in the i-th bag, y_i is the class label of the i-th bag, and I_i = argmax_j w^T B_i^j is the index of the example with the highest prediction score in bag B_i. The second term uses a squared hinge loss, and max(a, b) takes the maximum of a and b. Since gradient-based optimization methods are widely used on large-scale optimization problems, the present invention uses this differentiable hinge loss. As shown in Fig. 2(a), both MI-SVM and MILinear solve the large-scale multi-instance learning problem by selecting the example with the maximum score.
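The objective and its gradient can be written down directly. The following numpy sketch assumes each bag is an (n_i, d) array of candidate-window descriptors with a constant 1 already appended to every row and bag labels in {+1, -1}, and it treats the selected index I_i as fixed when differentiating, matching the gradient given in S3.3 below; it is an illustration of the formula, not an optimized implementation.

```python
import numpy as np

def milinear_objective(w, bags, y, C):
    """MILinear squared-hinge objective f(w) and gradient g(w).

    bags : list of (n_i, d) arrays, one row per example (bias 1 already appended)
    y    : array of bag labels in {+1, -1}
    """
    N = len(bags)
    f = 0.5 * float(w.dot(w))
    g = w.copy()
    for B_i, y_i in zip(bags, y):
        scores = B_i.dot(w)
        j = int(np.argmax(scores))            # I_i: highest-scoring example in the bag
        margin = y_i * scores[j]
        if margin < 1:                        # only bags with margin below 1 contribute
            f += (C / N) * (1.0 - margin) ** 2
            g += (2.0 * C / N) * (scores[j] - y_i) * B_i[j]
    return f, g
```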
S3.2 The bag decomposition algorithm
In the MILinear experiments, the present invention finds that within a positive bag the positive samples are generally concentrated in the top 30% by score. Based on this observation, the present invention proposes a new bag decomposition algorithm that effectively reduces the ambiguity of positive bags by decomposing each positive bag into a positive bag and a negative bag. Preferably, the model trained by MILinear is used to obtain prediction probabilities for all candidate windows on the training images, and each positive bag is decomposed according to these probabilities: the 30% of examples with the highest probability form the new positive bag, and the remaining samples form a new negative bag. A new MILinear model is then trained on the dataset obtained after decomposition, as shown in Fig. 2(b). The bag decomposition algorithm reduces the ambiguity of the samples in positive bags and thereby improves the classification performance of the model. This decomposition process may be iterated several times, until the model performance no longer improves.
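One round of this decomposition can be sketched as follows; the 30% threshold follows the text, while the function name decompose_positive_bags and the use of raw scores w^T x in place of calibrated prediction probabilities are simplifying assumptions.

```python
import numpy as np

def decompose_positive_bags(bags, y, w, keep_ratio=0.30):
    """Split every positive bag into a confident positive bag and a negative bag.

    bags : list of (n_i, d) arrays; y : bag labels in {+1, -1};
    w    : current MILinear model used to score the examples.
    """
    new_bags, new_y = [], []
    for B_i, y_i in zip(bags, y):
        if y_i < 0:                                     # negative bags pass through unchanged
            new_bags.append(B_i)
            new_y.append(-1)
            continue
        order = np.argsort(B_i.dot(w))[::-1]            # examples, highest score first
        k = max(1, int(round(keep_ratio * len(order))))
        new_bags.append(B_i[order[:k]])                 # top 30% -> new positive bag
        new_y.append(+1)
        if k < len(order):
            new_bags.append(B_i[order[k:]])             # the rest -> a new negative bag
            new_y.append(-1)
    return new_bags, np.asarray(new_y)
```

Retraining MILinear on the decomposed bags and repeating the split implements the iterative scheme described above.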
S3.3 Gradient-based optimization
The definition of the MILinear algorithm was given above; the following discusses how to learn the model efficiently on large-scale datasets. The optimization objective of MILinear is unconstrained and differentiable, and its first derivative is

$$g(w)=w+\frac{2C}{|B|}\sum_{i\in I_{B}}\Big(w^{T}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T}-y_{i}\,(B_{i}^{I_{i}})^{T}\Big),$$

where I_B = {i : y_i w^T B_i^{I_i} < 1} is the set of bags whose margin is less than 1.
Once the analytic expression of the gradient of the objective is available, many methods can be used to optimize it, including stochastic gradient descent (SGD), L-BFGS and nonlinear conjugate gradient (CG). Stochastic gradient descent processes the dataset sample by sample and updates the model iteratively. L-BFGS is a quasi-Newton optimization method that avoids storing the full Hessian matrix by maintaining an approximate low-rank representation of it. In general, each step of stochastic gradient descent is cheap but many iterations are needed, whereas second-order methods such as L-BFGS take longer per step but converge faster globally.
To learn object appearance models more efficiently, the present invention proposes a multiple-instance linear support vector machine optimization algorithm based on the trust-region Newton method, which is more efficient than L-BFGS. The trust-region Newton method is a very efficient solver for large-scale unconstrained problems and has been applied to large-scale logistic regression and support vector machine training. To apply the trust-region Newton method to the MILinear problem, the generalized Hessian is computed as

$$H(w)=I+\frac{2C}{|B|}\sum_{i\in I_{B}}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T},$$

where I is the identity matrix.
The trust-region Newton method optimizes the objective function iteratively; at each iteration it attempts to solve the following trust-region subproblem:

$$s^{k}=\arg\min_{s}\ q_{k}(s)=\arg\min_{s}\ g(w^{k})^{T}s+\frac{1}{2}\,s^{T}H(w^{k})\,s,\quad\text{s.t.}\ \|s\|\le\Delta_{k},$$

where s^k is the update step, Δ_k is the trust region, and g(w^k) and H(w^k) are the first and second derivatives of the MILinear objective function, respectively.
This subproblem can be solved efficiently with a conjugate gradient method that respects the trust-region constraint.
After the update step s^k is obtained, w^k is updated only if the actual decrease of the objective function is sufficiently large; otherwise w^k is kept unchanged:

$$w^{k+1}=\begin{cases}w^{k}+s^{k}, & \text{if}\ \dfrac{f(w^{k}+s^{k})-f(w^{k})}{q_{k}(s^{k})}>\eta_{0},\\ w^{k}, & \text{otherwise},\end{cases}$$

where η0 is a predefined positive number that controls the minimum acceptable actual decrease of the function; the update direction is accepted only when the actual decrease exceeds this value. In an embodiment of the present invention it is preferably set to 1e-4.
Strictly speaking, the objective function of MILinear is non-convex because of the max function, and it is not twice differentiable either. Although a globally optimal solution cannot be guaranteed, in practice the algorithm can effectively learn object appearance models from large-scale datasets.
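The optimization loop can be sketched as follows, reusing milinear_objective from the earlier sketch. It forms the generalized Hessian explicitly and solves the subproblem with a Newton step clipped to the trust region instead of the trust-region conjugate-gradient solver mentioned above, and the radius-update constants are common textbook choices rather than values from the patent.

```python
import numpy as np

def milinear_hessian(w, bags, y, C):
    """Generalized Hessian H(w) = I + (2C/|B|) * sum of outer products over active bags."""
    H = np.eye(len(w))
    N = len(bags)
    for B_i, y_i in zip(bags, y):
        x = B_i[int(np.argmax(B_i.dot(w)))]
        if y_i * w.dot(x) < 1:                         # bag is in I_B (margin below 1)
            H += (2.0 * C / N) * np.outer(x, x)
    return H

def trust_region_newton(f_g, hess, w0, delta=1.0, eta0=1e-4, max_iter=50, tol=1e-6):
    """f_g(w) -> (f, g); hess(w) -> explicit d x d generalized Hessian."""
    w = w0.copy()
    f, g = f_g(w)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        H = hess(w)
        s = np.linalg.solve(H, -g)                     # Newton direction
        if np.linalg.norm(s) > delta:                  # keep the step inside the trust region
            s *= delta / np.linalg.norm(s)
        pred = g.dot(s) + 0.5 * s.dot(H.dot(s))        # predicted decrease q_k(s), negative
        f_new, g_new = f_g(w + s)
        rho = (f_new - f) / pred                       # actual decrease / predicted decrease
        if rho > eta0:                                 # accept the step only if it helps enough
            w, f, g = w + s, f_new, g_new
        # shrink or enlarge the trust region depending on how well the model predicted
        delta = 0.5 * delta if rho < 0.25 else (2.0 * delta if rho > 0.75 else delta)
    return w
```

A call such as trust_region_newton(lambda w: milinear_objective(w, bags, y, C), lambda w: milinear_hessian(w, bags, y, C), np.zeros(d)), with d the feature dimension, would then return a learned appearance model w.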
S4: candidate regions are extracted from the test image and described in the same way, and the previously trained object detection model locates the objects of interest. At test time, a certain number of candidate regions are first obtained with the selective search algorithm and then described with the same convolutional neural network as in the training stage. The previously trained object appearance model then classifies the window features, deciding whether each candidate window is an object of interest and thereby concluding which objects are where. In this way, automatic detection and annotation of objects of interest is achieved using only image-level label information.
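Test-time use of the trained appearance models can be sketched as follows; window_features refers to the CNN sketch above, classifiers is a hypothetical dictionary of per-class weight vectors produced by the training procedure, and keeping only the single highest-scoring window per class above a threshold is an illustrative simplification of the annotation step.

```python
import numpy as np

def detect(image, boxes, window_features, classifiers, score_thresh=0.0):
    """Score every candidate window with each per-class model and keep the best one."""
    feats = np.asarray(window_features(image, boxes))        # (N, d) window descriptors
    feats = np.hstack([feats, np.ones((len(feats), 1))])     # append the bias term used in training
    detections = {}
    for cls, w in classifiers.items():
        scores = feats.dot(w)
        best = int(np.argmax(scores))
        if scores[best] > score_thresh:
            detections[cls] = (boxes[best], float(scores[best]))   # annotated box + confidence
    return detections
```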
Fig. 3 is a schematic comparison of optimization with the trust-region Newton method against other optimization methods according to an embodiment of the present invention; Fig. 4 is a schematic diagram of the relation between the prediction scores of the trained object detection model and sample overlap; Fig. 5 is a schematic diagram of the classification performance improvement of some object classes during iterations of the bag decomposition algorithm; Fig. 6 is a schematic diagram of detection results of the trained object detection model on the Pascal VOC 2007 database.
In short, the present invention proposes a new visual target detection and annotation method based on weakly supervised learning. Candidate windows are extracted with the selective search algorithm; a deep convolutional neural network pre-trained on massive data serves as the candidate window feature representation model and as generic prior knowledge; and positive samples are mined with an algorithm based on a multiple-instance linear support vector machine. The model is optimized with the trust-region Newton method, and a novel bag decomposition algorithm progressively reduces the ambiguity of positive bags, so that visual target detection and automatic annotation are realized in weakly supervised scenarios. Experiments show that, compared with mainstream weakly supervised visual target detection and annotation methods, the invention has stronger positive sample mining ability and broader application prospects, and is suitable for visual target detection and automatic annotation tasks on large-scale datasets.
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A visual target detection and annotation method, characterized by comprising:
an image input step of inputting an image to be detected;
a candidate region extraction step of extracting candidate windows from the image to be detected as candidate regions using a selective search algorithm;
a feature description extraction step of describing the features of the candidate regions with a large-scale convolutional neural network trained in advance and outputting the feature descriptions of the candidate regions;
a visual target prediction step of predicting, based on the feature descriptions of the candidate regions and using an object detection model trained in advance, which candidate regions are estimated to contain the visual target;
a position annotation step of annotating the position of the visual target according to the estimation result;
wherein the selective search algorithm in the candidate region extraction step further comprises:
converting the color space of the image to be detected into predetermined color spaces, segmenting the image with a graph-based over-segmentation algorithm, and repeatedly merging the two most similar regions to obtain a hierarchical segmentation of the image; after the segmentation region sets from multiple color spaces and multiple levels are merged and de-duplicated, the candidate region set of the image is obtained;
the method further comprising:
an object detection model training step, which specifically includes:
inputting training set images with image category labels;
extracting candidate windows from the training set images as candidate regions using the selective search algorithm;
describing the features of the candidate regions with the large-scale convolutional neural network trained in advance and outputting the feature descriptions of the candidate regions;
training an object detection model with a multiple-instance linear support vector machine based on the feature descriptions of the candidate regions.
2. The method according to claim 1, characterized in that the predetermined color spaces include HSV, RGI, I and Lab.
3. The method according to claim 1, characterized in that the convolutional neural network trained in advance is a convolutional neural network trained on the object classification database ImageNet 2013.
4. The method according to claim 1, characterized in that training the object detection model with the multiple-instance linear support vector machine includes:
training the object detection model with MILinear, an unconstrained large-margin multi-instance learning algorithm whose objective function is

$$\min_{w}\ \frac{1}{2}\|w\|^{2}+\frac{C}{|B|}\sum_{i=1}^{|B|}\Big(\max\big(0,\ 1-y_{i}\,w^{T}B_{i}^{I_{i}}\big)\Big)^{2},$$

where an image I_i is described by a bag B_i containing n_i d-dimensional examples, the j-th example being denoted B_i^j; if a bag contains at least one positive example its label y_i is +1, and if all of its examples are negative its label y_i is -1; the training set is B = {(B_i, y_i) | i = 1, 2, ..., N}, |B| = N is the number of training bags, w is the classifier coefficient vector, w^T is the transpose of w, C is the regularization weight that controls the penalty on misclassification, and I_i = argmax_j w^T B_i^j is the index of the example with the highest prediction score in bag B_i.
5. The method according to claim 4, characterized in that the MILinear algorithm is solved with a trust-region Newton method, including:
determining that the optimization objective of MILinear is an unconstrained, differentiable objective function whose first derivative is

$$g(w)=w+\frac{2C}{|B|}\sum_{i\in I_{B}}\Big(w^{T}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T}-y_{i}\,(B_{i}^{I_{i}})^{T}\Big),$$

where I_B = {i : y_i w^T B_i^{I_i} < 1} is the set of bags whose margin is less than 1;
computing the generalized Hessian as

$$H(w)=I+\frac{2C}{|B|}\sum_{i\in I_{B}}B_{i}^{I_{i}}\,(B_{i}^{I_{i}})^{T},$$

where I is the identity matrix;
optimizing the objective function iteratively by computing

$$s^{k}=\arg\min_{s}\ q_{k}(s)=\arg\min_{s}\ \nabla f(w^{k})^{T}s+\frac{1}{2}\,s^{T}\nabla^{2}f(w^{k})\,s=\arg\min_{s}\ g(w^{k})^{T}s+\frac{1}{2}\,s^{T}H(w^{k})\,s,\quad\text{s.t.}\ \|s\|\le\Delta_{k},$$

where k is the iteration number, s is an update step, s^k is the update step of the k-th iteration, w^k is the weight vector at the k-th iteration, Δ_k is the trust region, and g(w^k) = ∇f(w^k) and H(w^k) = ∇²f(w^k) are the first and second derivatives of the MILinear objective function, respectively;
after the update step s^k is obtained, w^k is updated only if the actual decrease of the objective function is sufficiently large, and otherwise w^k is kept unchanged, according to the formula

$$w^{k+1}=\begin{cases}w^{k}+s^{k}, & \text{if}\ \dfrac{f(w^{k}+s^{k})-f(w^{k})}{q_{k}(s^{k})}>\eta_{0},\\ w^{k}, & \text{otherwise},\end{cases}$$

where η0 is a predefined positive number that controls the minimum acceptable actual decrease of the function.
6. The method according to claim 5, characterized by further comprising running a bag decomposition algorithm with the trained object detection model, gradually reducing the ambiguity of positive bags in an iterative manner, which specifically includes:
obtaining, with the object detection model trained by MILinear, prediction probabilities for all candidate windows on the training set images; decomposing each positive bag into a positive bag and a negative bag according to these prediction probabilities; and training a new MILinear object detection model on the dataset obtained after the decomposition, the decomposition process being iterated several times.
CN201410442817.4A 2014-09-02 2014-09-02 Visual target detection and annotation method Active CN104217225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410442817.4A CN104217225B (en) 2014-09-02 2014-09-02 Visual target detection and annotation method


Publications (2)

Publication Number Publication Date
CN104217225A CN104217225A (en) 2014-12-17
CN104217225B true CN104217225B (en) 2018-04-24

Family

ID=52098687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410442817.4A Active CN104217225B (en) 2014-09-02 2014-09-02 Visual target detection and annotation method

Country Status (1)

Country Link
CN (1) CN104217225B (en)



Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2207138B1 (en) * 2007-10-30 2016-12-28 PASCO Corporation House movement determining method, house movement determining program, house movement determining image generating method, and house movement determining image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8355569B2 (en) * 2006-08-10 2013-01-15 Nec Corporation Object region extracting device
CN103870834A (en) * 2014-04-03 2014-06-18 张琰 Method for searching for sliding window based on layered segmentation
CN103984959A (en) * 2014-05-26 2014-08-13 中国科学院自动化研究所 Data-driven and task-driven image classification method

Also Published As

Publication number Publication date
CN104217225A (en) 2014-12-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant