CN107833213A - A weakly supervised object detection method based on pseudo ground-truth adaptation - Google Patents
- Publication number
- CN107833213A (application CN201711066445.XA)
- Authority
- CN
- China
- Prior art keywords
- candidate region
- weakly supervised
- pseudo
- bounding box
- true value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The present invention relates to a weakly supervised object detection method based on pseudo ground-truth adaptation. It is proposed to overcome two shortcomings of existing fully supervised object detectors: they depend on large annotated databases, and object localization is inaccurate when a picture contains multiple objects that occlude one another. The method comprises: inputting pictures into a weakly supervised object detector, applying non-maximum suppression to the detector output, and selecting from the result the highest-scoring bounding box for each object class; training a region proposal network on the position information of the selected bounding boxes, retaining the candidate regions whose overlap with the highest-scoring bounding box exceeds a certain value, averaging the pixel coordinates of the candidate regions belonging to the same object, and determining a unique bounding box for each object from the result; and feeding the bounding-box information as pseudo ground truth to a fully supervised object detector. The invention is applicable to object detection, and especially to general object detection in real scenes.
Description
Technical field
The present invention relates to the field of machine vision, and in particular to a weakly supervised object detection method based on pseudo ground-truth adaptation.
Background art
Object detection is a very important research topic in machine vision and the foundation for higher-level tasks such as image segmentation, object tracking, and behavior and action analysis and recognition. In addition, with the development of mobile Internet technology, the number of images and videos is growing explosively, and a technique that can quickly and accurately recognize and localize objects in images and videos is urgently needed to support the subsequent intelligent classification of images and videos and the extraction of key information. Object detection is now widely applied in modern society, for example face detection and pedestrian detection in security, and traffic sign recognition, vehicle detection and tracking, autonomous navigation and driving, and robot path planning in intelligent transportation.
Because object detection has important theoretical value and urgent practical demand, the related techniques are continually being developed and renewed. They can be roughly divided into two classes: traditional methods based on sliding windows and modern methods based on deep learning.
Given a picture to be examined, the traditional method traverses the whole image with a sliding window. Because a target may appear at any position in the image, and its size and aspect ratio are unknown, windows of different scales and aspect ratios must be slid over the image many times. This exhaustive approach will always find the positions where objects appear (called candidate regions), but it has obvious drawbacks: if there are too few window sizes and aspect ratios, or the stride is too large, not all objects can be detected; if there are many window scales and aspect ratios and the stride is small, there are too many redundant windows and the computation takes too long to meet real-time requirements in practice. After the sliding window has selected the candidate regions, the traditional method extracts hand-crafted features (called shallow features) from them. Common choices include the scale-invariant feature transform (SIFT), Haar-like features, the histogram of oriented gradients (HOG), and local binary patterns (LBP). To improve recognition and localization accuracy, the features produced by several of these extractors are usually fused to form the candidate-region feature. Finally, a classifier is designed to identify the object class in each candidate region; common classifiers include the support vector machine (SVM) and adaptive boosting (AdaBoost). The flow chart of object detection with the traditional method is shown in Fig. 1. With the traditional framework of "sliding window + hand-crafted features + shallow classifier", the excessive redundant windows and the weak representational power of shallow features mean that neither computation speed nor detection accuracy meets practical needs.
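The trade-off between window count and coverage described above can be made concrete with a short Python sketch. The function name `sliding_windows`, the image size, and the scale, aspect-ratio, and stride values are illustrative assumptions, not values taken from the patent.

```python
from itertools import product

def sliding_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Yield (x1, y1, x2, y2) windows that lie fully inside the image.

    scale is the square root of the window area; ratio is width / height.
    """
    for scale, ratio in product(scales, aspect_ratios):
        w = int(round(scale * ratio ** 0.5))
        h = int(round(scale / ratio ** 0.5))
        if w > img_w or h > img_h:
            continue  # this window shape does not fit in the image
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, x + w, y + h)

def count_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Number of windows a classifier would have to evaluate."""
    return sum(1 for _ in sliding_windows(img_w, img_h, scales, aspect_ratios, stride))
```

For a 640x480 image, a single 128-pixel square window at stride 64 gives 54 windows, which can easily miss small or elongated objects. Adding more scales and aspect ratios and shrinking the stride multiplies the count, and every extra window must be featurized and classified.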
After 2012, deep learning achieved a breakthrough in image classification (deciding which class the object in an image belongs to), mainly owing to the appearance of a large database (ImageNet) and the stronger representational power of features extracted by convolutional neural networks (CNNs). For example, in the VGG-16 model the 4096-dimensional output of a fully connected layer is used to represent the image; such deep features carry stronger semantic information. Deep feature extraction was subsequently applied to object detection (which covers both what the object is and where it is). Detection accuracy improved to some extent, but detection was still slow, even slower than traditional methods, because the feature dimension is larger and the network is deeper. At that stage deep learning had only addressed the weak representational power of hand-crafted shallow features and replaced the shallow classifier with a convolutional neural network; the multi-scale problem was still handled with sliding windows, so the problem of massive redundant windows remained. Region proposal methods give a good solution to the problems caused by sliding windows. They use information such as edges, texture, and color to find in advance the positions in an image (or video frame) where objects are likely to appear, typically a few hundred to a few thousand of them (set according to the actual situation). Such methods keep a high recall with far fewer candidate regions, which greatly reduces computation time and raises detection speed. Commonly used proposal methods include Selective Search, Edge Boxes, and the Region Proposal Network (RPN). The flow chart of deep-learning object detection based on candidate regions is shown in Fig. 2. The deep learning framework of "region proposals + convolutional neural network (CNN)" balances the conflict between detection time and detection accuracy, and achieves higher accuracy within a shorter detection time.
However, whether with traditional sliding-window techniques or modern deep-learning techniques, current research is carried out on fixed databases (PASCAL VOC, Microsoft COCO, etc.; see Table 1), and every picture in the data set must be annotated with which objects it contains and where each object appears. Deep-learning methods in addition depend on large amounts of training data (tens of thousands to hundreds of thousands of pictures), and building an annotated database of this size is a huge, time-consuming, and labor-intensive undertaking. Moreover, these annotated databases have the following shortcomings. First, the object classes in a database are limited; the object classes in a real application scene may not match those of the database, or may far exceed them. Second, manually marking object positions in a picture is somewhat subjective, especially when the picture contains multiple objects that occlude one another, which introduces a certain annotation bias. Such bias is likely to make the model converge to a local optimum during training, and the end result is inaccurate object localization.
Summary of the invention
The purpose of the invention is to overcome the following shortcomings: existing fully supervised object detectors depend on databases with large amounts of annotation; annotation errors make object localization inaccurate when a picture contains multiple objects that occlude one another; and the objects to be detected in a practical application may not match the object classes of the database or may far exceed them. To this end, a weakly supervised object detection method based on pseudo ground-truth adaptation is proposed, comprising:
Step 1), construct training samples;
Step 2), input the pictures in the training samples into a weakly supervised object detector based on multiple-instance learning (MIL);
Step 3), apply non-maximum suppression to the output of the weakly supervised object detector, and select from the result the highest-scoring bounding box for each object class;
Step 4), train a region proposal network (RPN) on the position information of the bounding boxes selected in step 3), use the region proposal network to generate multiple candidate regions, and retain all candidate regions whose overlap ratio with the ground truth exceeds a certain threshold; the object of each class corresponds to multiple candidate regions;
Step 5), average the pixel coordinates of the candidate regions of the same object, and determine a unique bounding box for each object from the result;
Step 6), input the bounding-box information obtained in step 5) as pseudo ground truth into a fully supervised object detector to obtain the detection result.
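Step 3) can be illustrated with a minimal NumPy sketch of non-maximum suppression followed by selection of the top-scoring box per class. The function names `iou_one_to_many`, `nms`, and `top_box_per_class`, the (x1, y1, x2, y2) box layout, and the suppression threshold of 0.5 are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # drop every remaining box that overlaps the kept box too much
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

def top_box_per_class(boxes, scores, labels, iou_thresh=0.5):
    """After per-class NMS, return {class_id: highest-scoring box}."""
    result = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        keep = nms(boxes[idx], scores[idx], iou_thresh)
        result[int(c)] = boxes[idx[keep[0]]]
    return result
```

Because greedy NMS always keeps the highest-scoring box first, the box returned for each class is the one that seeds the pseudo ground truth in step 4).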
The beneficial effects of the invention are: 1. the invention frees deep-learning-based object detection from the limitations of scarce training data and biased manual annotation, promoting the application of deep-learning-based object detection in real scenes; 2. accurate detection results can be achieved even when a picture contains multiple objects that occlude one another; 3. the mAP in the experimental results of the invention is 52.4%, clearly higher than the 41.6% and 45.8% of the prior art, and the CorLoc in the experimental results is 70.3%, clearly higher than the 61.4% and 65.0% of the prior art.
Brief description of the drawings
Fig. 1 is the flow chart of object detection based on the traditional method;
Fig. 2 is the flow chart of deep-learning object detection based on candidate regions;
Fig. 3 shows example detection results of the weakly supervised object detector; Fig. 3(a) to Fig. 3(e) show the detection results for different images;
Fig. 4 shows example detection scores of the weakly supervised object detector;
Fig. 5 is a schematic diagram of the conventional method and the pseudo ground-truth adaptation method, where Fig. 5(a) shows the conventional method of the prior art and Fig. 5(b) shows the method of the invention;
Fig. 6 is the flow chart of the weakly supervised object detection method based on pseudo ground-truth adaptation;
Fig. 7 is a schematic diagram of the weakly supervised detector based on multiple-instance learning;
Fig. 8 shows the experimental results; Fig. 8(a) to Fig. 8(o) are the experimental results for different images.
Embodiment
Embodiment one: The weakly supervised object detection method based on pseudo ground-truth adaptation of this embodiment comprises:
Step 1), construct training samples;
Step 2), input the pictures in the training samples into a weakly supervised object detector based on multiple-instance learning (MIL);
Step 3), apply non-maximum suppression to the output of the weakly supervised object detector, and select from the result the highest-scoring bounding box for each object class;
Step 4), train a region proposal network on the position information of the bounding boxes selected in step 3), use the region proposal network to generate multiple candidate regions, and retain all candidate regions whose overlap ratio with the ground truth exceeds a certain threshold; the object of each class corresponds to multiple candidate regions;
Step 5), average the pixel coordinates of the candidate regions of the same object, and determine a unique bounding box for each object from the result. The averaging can proceed as follows: for all candidate regions of an object, compute the average of the top-left, bottom-left, top-right, and bottom-right corner coordinates respectively, and determine a unique bounding box from these four averages.
Step 6), input the bounding-box information obtained in step 5) as pseudo ground truth into a fully supervised object detector to obtain the detection result. The pseudo ground truth is not real, manually labeled ground truth; the method of this embodiment finds an approximation of the ground truth and uses it in place of the ground truth.
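The corner averaging of step 5) can be sketched in a few lines of NumPy. The function name `average_box` and the (x1, y1, x2, y2) box layout are assumptions for illustration. For axis-aligned boxes, averaging the four corners separately gives the same result as averaging x1, y1, x2, and y2 directly, and the sketch builds the corners explicitly to mirror the wording of the step.

```python
import numpy as np

def average_box(proposals):
    """Merge the candidate regions of one object into a single box.

    proposals: (N, 4) array-like of (x1, y1, x2, y2) boxes for the same object.
    """
    p = np.asarray(proposals, dtype=float)
    # Corners of every proposal, shape (N, 4 corners, 2 coordinates).
    corners = np.stack([
        p[:, [0, 1]],  # top-left
        p[:, [0, 3]],  # bottom-left
        p[:, [2, 1]],  # top-right
        p[:, [2, 3]],  # bottom-right
    ], axis=1)
    mean_corners = corners.mean(axis=0)  # one averaged point per corner
    x1, y1 = mean_corners[0]             # averaged top-left
    x2, y2 = mean_corners[3]             # averaged bottom-right
    return np.array([x1, y1, x2, y2])
```

The averaged top-left and bottom-right points fully determine the unique bounding box; the other two averaged corners are consistent with them by construction.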
Steps 3) to 5) can be described with reference to Fig. 5(b). The first picture from the left in Fig. 5(b) is the original input picture. The second picture shows the highest-scoring bounding box of each object class obtained by step 3). The third picture shows the multiple candidate regions obtained after the highest-scoring bounding boxes are fed to the region proposal network. Averaging the pixel coordinates of the candidate regions corresponding to each object yields the unique bounding box of each object shown in the fourth picture.
Specifically, the invention takes images (video frames) in real scenes as its research object, and the classes of objects to be detected can be determined by the user according to the practical problem at hand. Owing to the development of Internet technology and the spread of picture and video capture devices, pictures and videos on YouTube are reported to be growing at 58 pictures per second and 3.6 videos per second. A user only needs to crawl pictures from a search engine using the target classes as keywords to build a database that matches the practical problem. This solves the problem that existing fixed databases contain few object classes and that those classes do not match the classes that actually need to be detected. At the same time, because no position annotation is needed, no large investment of manpower and material resources is required to annotate the database, and the bias introduced by subjective manual annotation is avoided.
Once the training database has been built, an existing weakly supervised object detection technique can be used to train a weakly supervised object detector. "Weakly supervised" means that each training sample has corresponding supervision information, but that information is simple or incomplete. In the invention, for example, every picture has object class information (which objects the picture contains) but no object position information (where the objects are). Existing weakly supervised object detection techniques all treat object detection under weak supervision as a multiple-instance learning (MIL) problem. This approach has two shortcomings: first, the model is sensitive to initialization; second, the problem is non-convex, so the model converges to a local optimum. The visible consequence is that the detector finds only the most discriminative part of an object rather than the whole object. For example, when detecting pedestrians it finds only the face rather than the whole body, and when detecting animals it localizes only the head rather than the whole body, as shown in Fig. 3.
The invention studies the weakly supervised object detector and finds that in most cases the detector does detect the whole object; the detection box (bounding box) containing the whole object simply has a lower score, while the box around the most discriminative part of the object has a higher score, as shown in Fig. 4. In addition, because there is no position annotation during training, the detector has no regression capability. As a result, some detection results contain only the most discriminative part of the object, or contain the whole object together with too much background. These results are the root cause of detection failure (a lower recognition rate). To address the low recognition rate of the weakly supervised detector, the invention proposes a framework that moves from weakly supervised to fully supervised learning: the output of the weakly supervised detector is taken as the ground truth of the object position, and this pseudo ground truth is used to train a fully supervised object detector, because fully supervised learning has strong regression capability. As for choosing the ground truth, the simplest feasible approach is to take the highest-scoring bounding box in the weakly supervised detector's output. This approach has two problems: first, only one bounding box can be found for each object class in a picture, even if the picture contains several objects of that class; second, the selected pseudo ground truth contains only the most discriminative part of the object rather than the whole object, as shown in Fig. 5(a). In view of these problems, the invention proposes a "weakly supervised object detection method based on pseudo ground-truth adaptation". Specifically, the output of the weakly supervised detector after non-maximum suppression (NMS) is first taken as the ground truth of the object position, and a region proposal network (RPN) is trained with this pseudo ground truth. The trained network then generates candidate regions (proposals), and those whose overlap ratio (IoU) with the pseudo ground truth exceeds a certain threshold are retained. Finally, the pixel coordinates of these candidate regions are averaged to further refine the pseudo ground truth. The flow chart is shown in Fig. 6. After this processing, the pseudo ground truth (bounding box) of every object is found and is more accurate, as shown in Fig. 5(b). These more accurate bounding boxes are used as ground truth to train a fully supervised object detector; its strong regression capability (it can adjust an object's bounding box according to the ground truth) solves the low recognition rate of the weakly supervised detector.
The "weakly supervised object detection method based on pseudo ground-truth adaptation" of the invention uses fully supervised learning to solve the weakly supervised object detection problem and obtains a higher object detection rate without requiring position annotation. It solves the mismatch between the object classes in annotated databases and the object classes encountered in practice, and avoids the time and labor needed to annotate a database. It helps move deep-learning-based object detection from the laboratory to practical application and promotes the development of weakly supervised object detection.
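The IoU-based retention of proposals described above can be sketched as follows. The function names `box_iou` and `retain_proposals` and the example coordinates are illustrative assumptions; the default threshold of 0.3 is the value the description gives later for the specific embodiment. The proposals are taken as given here, whereas in the method they come from the region proposal network trained on the pseudo ground truth.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def retain_proposals(pseudo_gt, proposals, iou_thresh=0.3):
    """Keep the proposals whose IoU with the pseudo ground truth exceeds the threshold."""
    kept = [p for p in proposals if box_iou(pseudo_gt, p) > iou_thresh]
    return np.asarray(kept, dtype=float).reshape(-1, 4)

# Hypothetical example: the pseudo ground truth covers only a face-sized region.
pseudo_gt = [40, 40, 60, 60]
proposals = [
    [30, 30, 70, 100],     # much larger box, IoU about 0.14: dropped
    [35, 35, 65, 65],      # IoU about 0.44: kept
    [40, 40, 70, 70],      # IoU about 0.44: kept
    [200, 200, 240, 240],  # no overlap: dropped
]
kept = retain_proposals(pseudo_gt, proposals)
```

The retained proposals are somewhat larger than the seed box, which is why averaging their coordinates tends to grow the pseudo ground truth beyond the most discriminative part of the object.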
Embodiment two: This embodiment differs from Embodiment one in that step 1) specifically includes:
Step 1.1), receive a keyword entered by the user; the keyword denotes an object class;
Step 1.2), search a search engine with the keyword, select a preset number of search results, and use the keyword as the annotation of those results.
That is, the invention only needs the simple object class information of a picture; the model can be trained without complex object position information. This class information can be obtained in many ways. For example, pictures can be searched in a search engine using keywords ("pedestrian", "vehicle", etc.), and the top few thousand results can be downloaded and used as training samples without manual annotation.
It should be understood that, when using the method of the invention, the training set can be built by the user and need not be an existing picture database. The training set is built as follows: the user enters a keyword denoting an object into an image search engine and then crawls a certain number of pictures from the search results. These pictures usually contain the object denoted by the keyword, so the annotation is effectively produced automatically during searching and crawling, and manual annotation is no longer needed. This neatly solves the problem that existing databases struggle to keep up with constantly changing new objects and new pictures. Other existing object detection methods depend on large databases with label information and cannot be trained and tested on a user-built database that contains only simple picture information.
The other steps and parameters are the same as in Embodiment one.
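A minimal Python sketch of the keyword-based labelling in steps 1.1) and 1.2) follows. The patent names no particular search engine or API, so the search call is passed in as a hypothetical callable `search_fn(keyword, limit)` that returns a list of image URLs; `build_weak_dataset` and `fake_search` are likewise illustrative names, and the stand-in search function exists only to make the sketch runnable.

```python
from typing import Callable, Dict, Iterable, List, Set

def build_weak_dataset(keywords: Iterable[str],
                       search_fn: Callable[[str, int], List[str]],
                       per_keyword: int) -> Dict[str, Set[str]]:
    """Map each retrieved image URL to the set of keywords that retrieved it.

    The keyword itself serves as the image-level class label, so no manual
    annotation and no bounding boxes are involved.
    """
    dataset: Dict[str, Set[str]] = {}
    for keyword in keywords:
        for url in search_fn(keyword, per_keyword)[:per_keyword]:
            dataset.setdefault(url, set()).add(keyword)
    return dataset

def fake_search(keyword: str, limit: int) -> List[str]:
    """Stand-in for a real image search (hypothetical, for demonstration only)."""
    shared = ["http://example.com/street.jpg"]
    own = [f"http://example.com/{keyword}_{i}.jpg" for i in range(limit)]
    return (shared + own)[:limit]

weak_labels = build_weak_dataset(["pedestrian", "vehicle"], fake_search, per_keyword=3)
```

Keeping a set of labels per image allows a picture returned for several keywords to carry several image-level classes, which matches the multi-object pictures the method is meant to handle.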
Embodiment three: This embodiment differs from Embodiment one or two in that, in step 1), the training sample set can be any one of the PASCAL VOC 2007/2012, MS COCO, WIDER FACE, and FDDB databases, or a database built by the method of Embodiment two. The English names above are the names of databases.
The other steps and parameters are the same as in Embodiment one or two.
Embodiment four: This embodiment differs from one of Embodiments one to three in that, in step 1), the size of the pictures in the training samples satisfies:
The shortest side of a picture is randomly one of the five scales {480, 576, 688, 864, 1200}; the longest side of a picture is less than or equal to 2000.
The other steps and parameters are the same as in one of Embodiments one to three.
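The size constraint of this embodiment can be sketched as a small resize-target computation. The five scales and the limit of 2000 come from the text; the function name `training_size` and the rule used when both constraints cannot hold at once (shrink so that the longest side equals 2000) are assumptions, since the patent does not say how the conflict is resolved.

```python
import random

SCALES = (480, 576, 688, 864, 1200)  # candidate shortest-side lengths (from the text)
MAX_LONG_SIDE = 2000                 # upper bound on the longest side (from the text)

def training_size(width, height, rng=random):
    """Return the (width, height) to resize a picture to, preserving aspect ratio."""
    scale = rng.choice(SCALES) / min(width, height)
    if max(width, height) * scale > MAX_LONG_SIDE:
        # Assumed tie-break: cap the longest side, so the shortest side
        # falls below the sampled scale for very elongated pictures.
        scale = MAX_LONG_SIDE / max(width, height)
    return round(width * scale), round(height * scale)
```

For example, a 500x375 picture is never capped, because even the largest scale gives a longest side of 1600, whereas a 1000x200 panorama is always capped and becomes 2000x400.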
Embodiment five: This embodiment differs from one of Embodiments one to four in that step 2) specifically includes:
Step 2.1) extract a preset number of candidate regions from each training picture using the selective search algorithm;
Step 2.2) input the candidate regions into a VGG16 network model trained on the ImageNet data set to obtain shallow features representing detail information and deep features representing semantic information; then obtain the feature of each candidate region by RoI pooling, convert the candidate-region feature from a two-dimensional matrix representation into a one-dimensional vector representation, and obtain the fully connected feature of each candidate region;
Step 2.3) input the fully connected features into the weakly supervised object detector based on multiple-instance learning; the detector has a classification branch that scores the object class in a candidate region and a detection branch that scores the position information of the candidate region; the scores of the classification branch and the detection branch are then multiplied to obtain the score of the candidate region;
Step 2.4) input the score of each candidate region as supervision information into three cascaded optimization networks, perform backpropagation on the optimization networks, and obtain the optimized result.
The other steps and parameters are the same as in one of Embodiments one to four.
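The two-branch scoring of step 2.3) can be sketched in NumPy. The patent states only that the classification and detection scores are multiplied; the softmax normalizations (over classes for the classification branch, over candidate regions for the detection branch) and the summation into an image-level score are assumptions borrowed from common two-stream weakly supervised detectors. The names `softmax` and `mil_scores` and the random logits are illustrative.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mil_scores(cls_logits, det_logits):
    """Combine the two branches.

    cls_logits, det_logits: (num_regions, num_classes) outputs of the
    classification and detection branches for one picture.
    """
    cls = softmax(cls_logits, axis=1)   # per region: which class it shows (assumed)
    det = softmax(det_logits, axis=0)   # per class: which region localizes it (assumed)
    region_scores = cls * det           # element-wise product, as in step 2.3)
    image_scores = region_scores.sum(axis=0)  # assumed image-level score per class
    return region_scores, image_scores

rng = np.random.default_rng(0)
region_scores, image_scores = mil_scores(rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))
```

Under these assumptions each image-level score lies between 0 and 1, so it can be compared against the image-level class label, which is the only supervision available in this setting.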
Embodiment six: This embodiment differs from one of Embodiments one to five in that, in step 6), the fully supervised object detector is any one of Fast R-CNN, Faster R-CNN, YOLO, and SSD. The English names above are the names of object detectors.
The other steps and parameters are the same as in one of Embodiments one to five.
Embodiment seven:
This embodiment provides a specific implementation process:
As shown in Fig. 6, training samples are first prepared according to the user's actual needs, and a weakly supervised object detector is then trained by multiple-instance learning (MIL). Next, pseudo ground-truth adaptation is applied to the output of the weakly supervised object detector to obtain the position information (pseudo ground truth) of each object in the training samples. Finally, this position information is used as ground truth to train a fully supervised object detector, which gives a more accurate detection result. Each part is described in detail below:
First, prepare the training samples. Depending on actual needs, training samples can be obtained from a search engine by keyword. For general object detection, an existing object detection database such as PASCAL VOC or MS COCO can also be used; for detection of specific objects such as faces, databases such as WIDER FACE and FDDB can be chosen. In the invention, without loss of generality, the trainval part of the PASCAL VOC 2007 database is used as the training samples and the test part as the test data. Note that the invention uses only the class information of the training samples, not the object position information. In the training stage, to further enlarge the training set, strengthen the generality of the trained model, and increase its robustness, all samples are flipped horizontally and the flipped images are added to the training data set. In addition, to adapt to the multi-scale variation of objects in real scenes, the invention keeps the aspect ratio of each picture and randomly selects one of the five scales {480, 576, 688, 864, 1200} as the shortest side of the training sample; considering GPU memory, the longest side of a training sample is set to be no greater than 2000.
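The horizontal-flip augmentation described above is simple in this setting because the labels are image-level classes, so there are no box coordinates to mirror. A minimal NumPy sketch, in which the function name `augment_with_flips` and the height x width (x channels) array layout are assumptions:

```python
import numpy as np

def augment_with_flips(images, labels):
    """Return the original samples followed by their left-right mirrored copies.

    images: list of arrays shaped (H, W) or (H, W, C).
    labels: list of image-level class labels, one entry per image.
    """
    flipped = [np.ascontiguousarray(img[:, ::-1]) for img in images]
    # A mirrored picture contains the same object classes, so labels are reused.
    return list(images) + flipped, list(labels) + list(labels)
```

This doubles the training set at no annotation cost; a fully supervised pipeline would additionally have to mirror the x-coordinates of every box.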
Train the weakly supervised detector (WSD). The invention implements the weakly supervised object detector with multiple-instance learning. Because no position information is available as supervision, the weakly supervised detector converges to a local optimum, which lowers the recognition rate of object detection. To improve the recognition rate, the invention embeds several optimization networks in the training model in parallel with the multiple-instance detection network, as shown in Fig. 7. For an input sample, about 2000 candidate regions (proposals) are first extracted with selective search; features are then extracted with an existing VGG16 network model trained on ImageNet; finally, the feature of each candidate region is obtained by RoI pooling, from which the fully connected feature of each candidate region is obtained. In the multiple-instance learning network, the input is the fully connected feature of each candidate region. The two parallel branches, classification and detection, respectively judge the class of each candidate region and score the position information of each candidate region; finally, the scores of the classification branch and the detection branch are multiplied to obtain the score of the candidate region. In the optimization networks, the score of each candidate region from the multiple-instance learning network is used as supervision information and the network is trained by backpropagation, which further improves the recognition rate. Considering the trade-off between training time and recognition rate (the recognition rate grows nonlinearly with the number of optimization networks, whereas the training time grows linearly), the invention sets the number of optimization networks to 3.
Pseudo ground-truth adaptive method (Pseudo Ground-truth Adaptive, PGA). No positional information is used while training
the weakly supervised detector, so its recognition rate is limited. This shows up in two ways: the detector finds only a
part of the object rather than the whole object (for example, a person's face rather than the person's body), or the
detected region contains too much background. These results are the root cause of the low recognition rate. To further
improve the recognition rate, the present invention introduces a fully supervised method into weakly supervised object
detection. Fully supervised learning, however, needs the positional information of objects as supervision information to
train the network. The most straightforward approach is to take, for each object category, the highest-scoring candidate
region in the output of the weakly supervised detector as the ground truth of the positional information, and to train a
fully supervised object detector with this pseudo ground truth, so that the regression capability of fully supervised
learning further improves the object detection rate. This approach has two shortcomings. First, for each training sample
only one bounding box can be found per object category, even if the sample contains several objects of the same category.
Second, the bounding box found is not accurate enough; in general it covers only the most distinctive part of the object,
as shown in Fig. 5(a). To address these problems, the present invention proposes a pseudo ground-truth adaptive method
whose procedure has three parts. First, non-maximum suppression (NMS) is applied to the output of the weakly supervised
detector, and the highest-scoring bounding box among the bounding boxes of each sample is chosen as the positional
information (pseudo ground truth) of the object; at this point the positional information typically covers only the most
distinctive part of the object (such as a person's head) rather than the whole object, as shown in the second picture of
Fig. 5(b). Second, the result after NMS is used as positional information to train a region proposal network (RPN); the
trained network then generates a number of candidate regions, and the present invention retains all candidate regions
whose overlap ratio (IoU) with the pseudo ground truth of the first step exceeds a certain threshold (set to 0.3 in the
present invention), as shown in the third picture of Fig. 5(b). Third, after the second step each object in the training
sample has several corresponding candidate regions, and these candidate regions are closer to the outline of the whole
object; the present invention averages the pixel coordinates of all candidate regions of each object and uses this average
as the final pseudo ground truth, as shown in the fourth picture of Fig. 5(b). After these three steps, each object in the
training sample corresponds to exactly one bounding box, and this bounding box is more accurate than the one chosen by the
highest-score method.
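The three steps of the pseudo ground-truth adaptive method can be sketched for a single object category in a single picture as follows. This is only an illustration under stated assumptions: boxes are given as [x1, y1, x2, y2] pixel coordinates with positive area, the candidate regions `rpn_boxes` are taken as already produced by a trained region proposal network (training the network itself is not shown), the NMS threshold `nms_thresh` is a hypothetical value not given in the description, and all function names are invented for this example. Only the IoU threshold of 0.3 comes from the description.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh):
    """Greedy non-maximum suppression; returns kept indices, best first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return np.array(keep)

def pseudo_ground_truth(wsd_boxes, wsd_scores, rpn_boxes,
                        nms_thresh=0.3, iou_thresh=0.3):
    # Step 1: NMS on the weakly supervised detector output, then take the
    # highest-scoring surviving box as the initial pseudo ground truth.
    # nms_thresh is an assumed value; the description does not state it.
    keep = nms(wsd_boxes, wsd_scores, nms_thresh)
    seed = wsd_boxes[keep[0]]
    # Step 2: retain candidate regions whose IoU with the seed exceeds 0.3.
    close = rpn_boxes[iou(seed, rpn_boxes) > iou_thresh]
    # Step 3: average the pixel coordinates of the retained regions.
    # Assumption: fall back to the seed box if no region is retained.
    if len(close) == 0:
        return seed
    return close.mean(axis=0)
```

In this sketch a seed box that covers only the head of a person is replaced by the mean of the overlapping candidate regions, which tend to extend further over the body.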
Training the fully supervised detector (fully-supervised detector, FSD). After the pseudo ground-truth search, each object
in the training samples has accurate positional information. Using this positional information as ground truth, a fully
supervised object detector can be trained. The fully supervised object detector is not the focus of the present
invention; it can be any existing object detector, such as Fast-RCNN, Faster-RCNN, YOLO or SSD. The present invention
selects Fast-RCNN as the fully supervised object detector. Training runs for 70000 iterations in total: the learning rate
is 0.01 for the first 40000 iterations and 0.001 for the remaining 30000 iterations.
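The training schedule stated above amounts to a single step decay. A minimal sketch follows, assuming iterations are counted from 0; the function name `learning_rate` and its parameter names are hypothetical and do not refer to any particular training framework.

```python
def learning_rate(iteration, total=70000, step=40000,
                  base_lr=0.01, low_lr=0.001):
    """Learning rate for a zero-based iteration index (assumed convention)."""
    if not 0 <= iteration < total:
        raise ValueError("iteration outside the training schedule")
    # First 40000 iterations use 0.01, the remaining 30000 use 0.001.
    return base_lr if iteration < step else low_lr
```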
The object detection network trained through the above steps can detect objects without any positional annotation. It can
be applied to object detection in real scenes according to actual demand, is not limited by the object categories of
existing object detection databases, and requires no manpower or material resources to annotate each training sample.
Experiments show that the "weakly supervised object detection method based on the pseudo ground-truth adaptive method" of
the present invention localizes accurately while remaining efficient. Table 2 gives the comparison data of the
experiments, where mAP is the mean average precision (mean Average Precision), the metric used to evaluate the test
samples, and CorLoc is the correct localization rate (Correct Location), the metric used to evaluate the localization of
the training samples during training. The comparison data show that the framework proposed by the present invention,
"weakly supervised object detector + pseudo ground-truth adaptive method + fully supervised detector", improves greatly
on the detection results of the weakly supervised detector alone, and that the "pseudo ground-truth adaptive method" of
the present invention also improves greatly on the "highest-score method". Figure 8 shows experimental results: the green
detection boxes are the results of the "weakly supervised object detection method based on the pseudo ground-truth
adaptive method" of the present invention, and the red detection boxes are the results of "weakly supervised object
detector + highest-score method + fully supervised detector". As the figure shows, the method of the present invention is
clearly better than the other method.
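To make the CorLoc metric concrete, the following sketch counts a picture as correctly localized when the predicted box overlaps any annotated box of the same category with an IoU of at least 0.5. The 0.5 threshold follows common practice on PASCAL VOC and is an assumption here, since the description does not state it; the function names are likewise hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def corloc(pred_boxes, gt_boxes_per_image, iou_thresh=0.5):
    """Fraction of pictures whose predicted box hits an annotated box.

    pred_boxes: one predicted box per picture (for one category).
    gt_boxes_per_image: list of annotated boxes for each picture.
    iou_thresh: assumed value, not taken from the description.
    """
    hits = 0
    for pred, gts in zip(pred_boxes, gt_boxes_per_image):
        if any(box_iou(pred, gt) >= iou_thresh for gt in gts):
            hits += 1
    return hits / len(pred_boxes)
```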
Table 1: Summary of commonly used object detection databases
Table 2: Comparison data of the experimental results
The present invention may also have various other embodiments. Without departing from the spirit and essence of the
present invention, those skilled in the art can make various corresponding changes and modifications according to the
present invention, but all such changes and modifications shall fall within the protection scope of the appended claims.
Claims (6)
- 1. A weakly supervised object detection method based on a pseudo ground-truth adaptive method, characterized by comprising: Step 1), constructing training samples; Step 2), inputting the pictures in the training samples into a weakly supervised object detector based on multiple instance learning; Step 3), applying non-maximum suppression to the output of the weakly supervised object detector, and choosing in the resulting picture the highest-scoring bounding box of each kind of object; Step 4), training a region proposal network according to the positional information of the bounding boxes chosen in step 3), generating multiple candidate regions with the region proposal network, and retaining all candidate regions whose overlap ratio with the highest-scoring bounding box of step 3) exceeds a certain threshold, the object of each category corresponding to multiple candidate regions; Step 5), averaging the pixel coordinates of the candidate regions of the same object, and determining the unique bounding box of each object from the result; Step 6), inputting the information of the bounding boxes obtained in step 5) into a fully supervised object detector to obtain the detection result.
- 2. The weakly supervised object detection method based on a pseudo ground-truth adaptive method according to claim 1, characterized in that step 1) specifically comprises: Step 1.1), receiving a keyword input by a user, the keyword representing the category of an object; Step 1.2), searching in a search engine with the keyword, choosing a predetermined number of search results, and using the keyword as the annotation information of the search results.
- 3. The weakly supervised object detection method based on a pseudo ground-truth adaptive method according to claim 1, characterized in that, in step 1), the training sample set is any one of the PASCAL VOC 2007/2012, MS COCO, WIDER FACE and FDDB databases.
- 4. The weakly supervised object detection method based on a pseudo ground-truth adaptive method according to claim 1, characterized in that, in step 1), the size of the pictures in the training samples satisfies: the shortest side of the picture is a random one of the five scales {480, 576, 688, 864, 1200}; the longest side of the picture is less than or equal to 2000.
- 5. The weakly supervised object detection method based on a pseudo ground-truth adaptive method according to claim 1, characterized in that step 2) specifically comprises: Step 2.1), extracting a predetermined number of candidate regions from the pictures of the training samples with the selective search algorithm; Step 2.2), inputting the candidate regions into a VGG16 network model trained on the ImageNet dataset to obtain shallow features representing detail information and deep features representing semantic information, then obtaining the feature of each candidate region by RoI pooling, and converting the candidate region features from a two-dimensional matrix representation into a one-dimensional vector representation to obtain the fully connected feature of each candidate region; Step 2.3), inputting the fully connected features into the weakly supervised object detector based on multiple instance learning, the weakly supervised object detector having a classification branch for scoring the object category in a candidate region and a detection branch for scoring the positional information of the candidate region, then multiplying the scores of the classification branch and the detection branch to obtain the score of the candidate region; Step 2.4), inputting the score of each candidate region as supervision information into 3 cascaded optimization networks, and performing back-propagation on the optimization networks to obtain the optimized result.
- 6. The weakly supervised object detection method based on a pseudo ground-truth adaptive method according to claim 1, characterized in that, in step 6), the fully supervised object detector is any one of Fast-RCNN, Faster-RCNN, YOLO and SSD.
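The picture-size condition of claim 4 can be illustrated with a short sketch that picks one of the five scales at random for the shortest side and keeps the longest side at or below 2000. How the two conditions are reconciled when they conflict is not specified; capping the longest side at 2000 (which makes the shortest side smaller than the chosen scale) is an assumption, as are the names `SCALES`, `MAX_LONG_SIDE` and `target_size`.

```python
import random

SCALES = (480, 576, 688, 864, 1200)  # candidate lengths for the shortest side
MAX_LONG_SIDE = 2000                 # upper bound for the longest side

def target_size(width, height, rng=random):
    """Return the resized (width, height), preserving the aspect ratio."""
    short = rng.choice(SCALES)
    factor = short / min(width, height)
    # Assumption: if the longest side would exceed the bound, cap it instead.
    if max(width, height) * factor > MAX_LONG_SIDE:
        factor = MAX_LONG_SIDE / max(width, height)
    return round(width * factor), round(height * factor)
```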
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711066445.XA CN107833213B (en) | 2017-11-02 | 2017-11-02 | Weak supervision object detection method based on false-true value self-adaptive method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711066445.XA CN107833213B (en) | 2017-11-02 | 2017-11-02 | Weak supervision object detection method based on false-true value self-adaptive method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107833213A true CN107833213A (en) | 2018-03-23 |
CN107833213B CN107833213B (en) | 2020-09-22 |
Family
ID=61650549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711066445.XA Active CN107833213B (en) | 2017-11-02 | 2017-11-02 | Weak supervision object detection method based on false-true value self-adaptive method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107833213B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326938A (en) * | 2016-09-12 | 2017-01-11 | 西安电子科技大学 | SAR image target discrimination method based on weakly supervised learning |
CN106682697A (en) * | 2016-12-29 | 2017-05-17 | 华中科技大学 | End-to-end object detection method based on convolutional neural network |
CN107169421A (en) * | 2017-04-20 | 2017-09-15 | 华南理工大学 | A kind of car steering scene objects detection method based on depth convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
H. BILEN: "Weakly supervised deep detection networks", 《COMPUTER VISION & PATTERN RECOGNITION》 * |
HAKAN BILEN: "Weakly Supervised Object Detection with Convex Clustering", 《COMPUTER VISION & PATTERN RECOGNITION》 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110310301A (en) * | 2018-03-27 | 2019-10-08 | 华为技术有限公司 | A kind of method and device detecting target image |
CN110310301B (en) * | 2018-03-27 | 2021-07-16 | 华为技术有限公司 | Method and device for detecting target object |
CN109033939A (en) * | 2018-06-04 | 2018-12-18 | 上海理工大学 | Improved YOLOv2 object detecting method under a kind of cluttered environment |
US11676182B2 (en) | 2018-06-29 | 2023-06-13 | Insurance Services Office, Inc. | Computer vision systems and methods for automatically detecting, classifying, and pricing objects captured in images or videos |
WO2020006554A1 (en) * | 2018-06-29 | 2020-01-02 | Geomni, Inc. | Computer vision systems and methods for automatically detecting, classifing, and pricing objects captured in images or videos |
US11783384B2 (en) | 2018-06-29 | 2023-10-10 | Insurance Services Office, Inc. | Computer vision systems and methods for automatically detecting, classifying, and pricing objects captured in images or videos |
CN109389040B (en) * | 2018-09-07 | 2022-05-10 | 广东珺桦能源科技有限公司 | Inspection method and device for safety dressing of personnel in operation field |
CN109389040A (en) * | 2018-09-07 | 2019-02-26 | 广东中粤电力科技有限公司 | A kind of inspection method and device of the dressing of operation field personnel safety |
US11281912B2 (en) | 2019-01-09 | 2022-03-22 | International Business Machines Corporation | Attribute classifiers for image classification |
US11068718B2 (en) | 2019-01-09 | 2021-07-20 | International Business Machines Corporation | Attribute classifiers for image classification |
CN110032935A (en) * | 2019-03-08 | 2019-07-19 | 北京联合大学 | A kind of traffic signals label detection recognition methods based on deep learning cascade network |
CN110111340A (en) * | 2019-04-28 | 2019-08-09 | 南开大学 | The Weakly supervised example dividing method cut based on multichannel |
CN110111340B (en) * | 2019-04-28 | 2021-05-14 | 南开大学 | Weak supervision example segmentation method based on multi-path segmentation |
CN110135480A (en) * | 2019-04-30 | 2019-08-16 | 南开大学 | A kind of network data learning method for eliminating deviation based on unsupervised object detection |
CN110516551A (en) * | 2019-07-29 | 2019-11-29 | 上海交通大学烟台信息技术研究院 | A kind of line walking positional shift identifying system, method and the unmanned plane of view-based access control model |
CN110516551B (en) * | 2019-07-29 | 2023-04-07 | 上海交通大学烟台信息技术研究院 | Vision-based line patrol position deviation identification system and method and unmanned aerial vehicle |
CN111444939B (en) * | 2020-02-19 | 2022-06-28 | 山东大学 | Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field |
CN111444939A (en) * | 2020-02-19 | 2020-07-24 | 山东大学 | Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field |
CN111738454B (en) * | 2020-08-28 | 2020-11-27 | 腾讯科技(深圳)有限公司 | Target detection method, device, storage medium and equipment |
CN111738454A (en) * | 2020-08-28 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Target detection method, device, storage medium and equipment |
CN114415254A (en) * | 2022-01-21 | 2022-04-29 | 哈尔滨工业大学 | Multi-case weak supervision mars surface morphology detection method based on online learning |
CN114415254B (en) * | 2022-01-21 | 2023-02-07 | 哈尔滨工业大学 | Multi-case weak supervision mars surface morphology detection method based on online learning |
CN114638322A (en) * | 2022-05-20 | 2022-06-17 | 南京大学 | Full-automatic target detection system and method based on given description in open scene |
CN114638322B (en) * | 2022-05-20 | 2022-09-13 | 南京大学 | Full-automatic target detection system and method based on given description in open scene |
Also Published As
Publication number | Publication date |
---|---|
CN107833213B (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107833213A (en) | A kind of Weakly supervised object detecting method based on pseudo- true value adaptive method | |
CN108428229B (en) | Lung texture recognition method based on appearance and geometric features extracted by deep neural network | |
Li et al. | Localizing and quantifying damage in social media images | |
Ping et al. | A deep learning approach for street pothole detection | |
CN107730553A (en) | A kind of Weakly supervised object detecting method based on pseudo- true value search method | |
CN111444821A (en) | Automatic identification method for urban road signs | |
US10262214B1 (en) | Learning method, learning device for detecting lane by using CNN and testing method, testing device using the same | |
CN107341517A (en) | The multiple dimensioned wisp detection method of Fusion Features between a kind of level based on deep learning | |
CN107346420A (en) | Text detection localization method under a kind of natural scene based on deep learning | |
CN112016605B (en) | Target detection method based on corner alignment and boundary matching of bounding box | |
CN107945153A (en) | A kind of road surface crack detection method based on deep learning | |
CN105184265A (en) | Self-learning-based handwritten form numeric character string rapid recognition method | |
CN105574527A (en) | Quick object detection method based on local feature learning | |
CN112560675B (en) | Bird visual target detection method combining YOLO and rotation-fusion strategy | |
CN109284779A (en) | Object detecting method based on the full convolutional network of depth | |
CN110147841A (en) | The fine grit classification method for being detected and being divided based on Weakly supervised and unsupervised component | |
CN113674216A (en) | Subway tunnel disease detection method based on deep learning | |
CN111488911A (en) | Image entity extraction method based on Mask R-CNN and GAN | |
CN106845458A (en) | A kind of rapid transit label detection method of the learning machine that transfinited based on core | |
Luo et al. | Boundary-aware and semiautomatic segmentation of 3-D object in point clouds | |
CN105404682B (en) | A kind of book retrieval method based on digital image content | |
CN106548195A (en) | A kind of object detection method based on modified model HOG ULBP feature operators | |
CN112766170B (en) | Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image | |
CN112347927B (en) | High-resolution image building extraction method based on convolutional neural network probability decision fusion | |
CN109741351A (en) | A kind of classification responsive type edge detection method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||