CN109614990A - An object detection device - Google Patents
An object detection device
- Publication number
- CN109614990A (application CN201811385266.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- candidate frame
- target candidate
- information
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an object detection device comprising a memory, a memory controller, a processor, and a peripheral interface. The peripheral interface includes an image acquisition unit, a human-computer interaction unit, and a display unit; the memory includes a target detection module. The processor obtains image information acquired by the image acquisition unit, executes the target detection module with that image information as input, outputs the target information of the image, and finally displays the target information on the display unit. The target detection module includes: a sample acquisition unit; a unit for building a convolutional network based on target candidate boxes; a unit for determining the label information of the target candidate boxes and the mapping information of the target sample data; a unit for determining the regression information of the target candidate boxes; a unit for training the convolutional network based on target candidate boxes; and a target prediction unit. The invention is applicable to application scenarios that require target detection.
Description
Technical field
The present invention relates to the field of target detection, and in particular to a device for target detection.
Background art
With the progress of science and technology, the need for target detection keeps growing. Beautification applications must first detect the target before retouching it; face-swapping applications likewise detect the target before swapping; attendance systems must detect the target before recognizing it. In all of these uses, the accuracy of target detection has a vital influence on the result.
With the rise of convolutional neural networks, target detection has made significant progress and accuracy has risen steadily. However, because the computation of a convolutional network is enormous, a high-performance GPU (such as a TITAN) is needed for real-time detection. The high cost has always been the critical bottleneck restricting mass production.
This document presents an object detection device that identifies whether a target candidate box is a target through a classification network and predicts the offset of the target candidate box relative to the real target through a regression network. The classification network and the regression network share a feature layer, which reduces the computation of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together, realizing end-to-end training. By sharing the feature layer, the network reduces both the amount of computation and the number of model parameters, creating the conditions for real-time detection.
Summary of the invention
The embodiments of the present invention provide an object detection device that can detect targets in real time.
The technical solution adopted by the embodiments of the present invention is as follows:
An object detection device includes a memory 111, a memory controller 112, a processor 113, and a peripheral interface 114.
The peripheral interface includes an image acquisition unit 115, a human-computer interaction unit 116, and a display unit 117.
The memory includes a target detection module 200.
The processor obtains image information acquired by the image acquisition unit, executes the target detection module with the image information as input, outputs the target information of the image information, and finally displays the target information on the display unit.
The target detection module includes:
a sample acquisition unit for obtaining annotated target sample data;
a unit for building a convolutional network based on target candidate boxes, the convolutional network including a feature extraction network and a target classification-and-regression network;
a unit for determining the label information of the target candidate boxes and the mapping information of the target sample data, which, according to the feature extraction network and the target sample data, determines the label information of the target candidate boxes and the mapping information of the target annotation boxes of the target sample data onto the last feature layer of the feature extraction network;
a unit for determining the regression information of the target candidate boxes, which determines the regression information of each target candidate box according to its label information and the mapping information of the target annotation boxes;
a unit for training the convolutional network based on target candidate boxes, which uses the label information and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and trains the convolutional network; training ends when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes;
a target prediction unit, which receives an image to be detected and performs target prediction by outputting target regions and target scores through the convolutional network.
Further, the feature extraction network is VGG-16 with its last three fully connected layers removed.
Further, the unit for determining the label information of the target candidate boxes and the mapping information of the target sample data includes:
a module for obtaining the feature map to be mapped, which takes the last layer of the feature extraction network and denotes it as the feature map to be mapped;
a module for generating target candidate boxes, which, at each pixel location of the feature map to be mapped, generates target candidate boxes according to the target sizes S and the target aspect ratios R;
a module for determining the label information of the target candidate boxes: if a target candidate box intersects a target annotation box of the target sample data and the ratio of their intersection to their union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of their intersection to their union is less than a preset threshold T2, the target candidate box is marked as a negative sample;
a module for determining the mapping information of the target sample data, which computes the scaling ratio of the feature extraction network, maps the target annotation boxes of the target sample data onto the feature map to be mapped, and obtains the mapping information of the target annotation boxes.
Further, the number of target candidate boxes generated per location is the product of the number of target sizes S and the number of target aspect ratios R.
Further, the module for determining the label information of the target candidate boxes also includes:
a module for marking the target annotation boxes of the target sample data: if a target candidate box intersects a target annotation box of the annotated target sample data and the ratio of their intersection to their union is greater than the preset threshold T1, the target candidate box is labeled as a positive sample and the target annotation box is marked as matched;
a module for collecting the target annotation boxes not matched with any target candidate box;
a re-matching module for the target annotation boxes that were not successfully matched: for each unmatched target annotation box, compute the ratio of intersection to union between that annotation box and all target candidate boxes, sort the ratios, and label the target candidate box corresponding to the maximum ratio as a positive sample.
Further, the unit for determining the regression information of the target candidate boxes includes:
the label information of each target candidate box is either positive sample or negative sample;
if the label information of a target candidate box is positive sample, the target annotation box whose intersection-over-union with that candidate box is maximal is obtained, the offset of the candidate box position relative to that annotation box is computed, and the offset is used as the regression information of the target candidate box.
Further, the unit for training the convolutional network based on target candidate boxes includes:
the label information of each target candidate box is either positive sample or negative sample;
target candidate boxes whose label information is positive sample and negative sample are randomly selected to train the convolutional network, where the number of selected positive-sample candidate boxes equals the number of selected negative-sample candidate boxes.
Further, the unit for training the convolutional network based on target candidate boxes computes the objective cost function of the convolutional network, which includes a classification cost and a regression cost.
Further, when computing the objective cost function of the convolutional network:
the classification cost is normalized by dividing by the number of target candidate boxes participating in the computation;
the regression cost is normalized by dividing by four times the number of positive-sample target candidate boxes.
Further, the target prediction unit includes:
a module for obtaining a first target set: obtain the target regions of the convolutional network, denoted first target regions, and the target scores of the convolutional network, denoted first target scores; establish the mapping set of the first target regions and the first target scores, denoted the first target set;
a module for finding the target corresponding to the highest target score: sort the first target set by first target score and obtain the highest target score in the current first target set;
a module for removing targets whose intersection-over-union with the top-scoring target exceeds a preset threshold: compute the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set; when the ratio is greater than a preset threshold T3, delete the corresponding target region from the first target set;
a module for building a second target set: remove the highest-scoring target from the first target set and save it to the second target set;
a module for computing the detected target set: merge the first target set and the second target set to obtain the target regions of the image to be detected.
Compared with the prior art, the invention proposes a device for target detection. The device uses an attention-like mechanism to focus on target candidate boxes, identifies whether a target candidate box is a target through a classification network, and predicts the offset of the target candidate box relative to the real target through a regression network. The classification network and the regression network share a feature layer to reduce the computation of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together. The proposed device can detect accurately and in real time on a low-performance graphics card: the frame rate reaches 45 fps with an accuracy of 98%, which meets the demand for target detection in industry.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments or in the description of the prior art are briefly introduced below. The accompanying drawings in the following description are obviously only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an object detection device.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To make the advantages of the technical solution of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment one
An object detection device includes a memory 111, a memory controller 112, a processor 113, and a peripheral interface 114.
The peripheral interface includes an image acquisition unit 115, a human-computer interaction unit 116, and a display unit 117.
The memory includes a target detection module 200.
The processor obtains image information acquired by the image acquisition unit, executes the target detection module with the image information as input, outputs the target information of the image information, and finally displays the target information on the display unit.
The target detection module includes:
a sample acquisition unit for obtaining annotated target sample data.
The target sample data include, but are not limited to, public target detection datasets such as the BioID Face Database (FaceDB) and Labeled Faces in the Wild (LFW).
The annotation of a target sample includes, but is not limited to, a rectangular box around the target region. For example, the tuple I(x, y, w, h) expresses a target region in image I, where x and y are the coordinates of the upper-left corner of the target region, and w and h are its width and height, respectively.
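As an illustration only (the record type below is hypothetical, not part of the disclosure), such an annotation can be expressed in code:

```python
from dataclasses import dataclass

@dataclass
class TargetAnnotation:
    """Rectangular target annotation I(x, y, w, h).

    x, y: pixel coordinates of the upper-left corner of the target region.
    w, h: width and height of the target region in pixels.
    """
    x: float
    y: float
    w: float
    h: float

# Example: a 60x60-pixel target whose upper-left corner is at (120, 80).
sample = TargetAnnotation(x=120, y=80, w=60, h=60)
```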
In addition, the target sample data can also include sample data from the practical application scenario. For example, for target detection on a mobile-phone client, images can be shot with a phone at different angles, different distances, and under different environments, and the target regions then annotated. Although this approach is more complicated and the labor cost is relatively high, such data can improve the accuracy of the algorithm in that specific field in a targeted way. Of course, such data are not necessary: without sample data from the practical application scenario, the algorithm model can still be applied to the mobile-phone client, only with somewhat lower accuracy.
The unit for building the convolutional network based on target candidate boxes establishes a convolutional network that includes a feature extraction network and a target classification-and-regression network.
The convolutional network based on target candidate boxes behaves slightly differently in the training stage and in the test stage. In the test stage, its input is an image of arbitrary size and its output is the position coordinates and score of each detected target box in that image. In the training stage, the input is a sample image of arbitrary size and the output is a loss value, which reflects the deviation between the target positions predicted by the convolutional network and the target positions of the true sample image.
The convolutional network based on target candidate boxes includes a feature extraction network and a target classification-and-regression network.
The feature extraction network extracts the feature information of the image to be detected; it can be a single network or a combination of several networks. One such feature extraction network is the convolutional part that remains after removing the last three fully connected layers of VGG-16: it contains five stages, each stage consisting of two to three convolutional layers and one pooling layer, with the convolutional layers within a stage sharing the same hyperparameters. Here, this remaining convolutional network is used for feature extraction. It extracts the features of the image to be detected; the features are then fed into the target classification network, which decides whether each target candidate box position is a target, and simultaneously into the regression network, which predicts, at each target candidate box position, the offset of the target position relative to the target candidate box.
The goal of the feature extraction network is to extract the features of the target in the image to be detected. The features of a target can be relatively complex, because targets involve deformation, appearance changes caused by occlusion, and changes caused by make-up, accessories, and clothing. Considering that the feature learning network needs to learn such complicated feature information, the feature extraction network can use inception modules, for example the feature extraction layers of GoogLeNet.
The feature extraction network can be trained with transfer learning: first, other classification images are used to train the basic low-level features and learn the low-level model parameters, and then the collected annotated target sample images are used to train the high-level semantic information. VGG16 and GoogLeNet have publicly available model parameters from the ILSVRC competition; when equipment does not permit training from scratch, the publicly available model parameters can be used directly as the initial parameters of the feature extraction network, which is then fine-tuned on this basis to obtain the complete parameters of the feature extraction network.
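A minimal sketch of this initialization, assuming PyTorch/torchvision (the disclosure names neither), might be:

```python
import torchvision

# Load VGG-16 with ImageNet-pretrained weights as the low-level initialization.
vgg16 = torchvision.models.vgg16(weights="IMAGENET1K_V1")

# Keep only the convolutional part: this drops the classifier, i.e. the last
# three fully connected layers, leaving the five conv/pool stages.
feature_extractor = vgg16.features

# These parameters are then fine-tuned on the annotated target samples.
for p in feature_extractor.parameters():
    p.requires_grad = True
```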
The classification-and-regression network detects the target positions in the image to be detected. It is located after the feature extraction network. The last layer of the feature extraction network is the feature map to be mapped; each pixel location on this feature map is called an anchor. The classification-and-regression network traverses each anchor and, for every target candidate box on that anchor, predicts the score of it being a target and the offset of the real target relative to the anchor.
The structure of the classification-and-regression network is as follows: a feature vector is compressed into a D-dimensional column vector, which is fed into the target classification network to produce a 2*k-dimensional column vector indicating, for each of the k target candidate boxes at an anchor of the feature map to be mapped, whether it is a target, where k is the number of target candidate boxes per anchor. The same column vector is fed into the target regression network to produce a 4*k-dimensional column vector giving the offsets of the target relative to the k target candidate boxes of the anchor. That is, the target positions predicted and regressed by the target classification-and-regression network are relative to the anchors rather than predicted on the whole image.
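A sketch of such a classification-and-regression head, assuming PyTorch (the layer names and the value of D are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ClsRegHead(nn.Module):
    """Per-anchor classification and regression head.

    For the k candidate boxes at every anchor, one 1x1 convolution produces
    2*k classification scores (target / not target) and another produces
    4*k regression offsets.
    """
    def __init__(self, in_channels: int = 512, d: int = 256, k: int = 9):
        super().__init__()
        self.compress = nn.Conv2d(in_channels, d, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(d, 2 * k, kernel_size=1)  # 2*k scores per anchor
        self.reg = nn.Conv2d(d, 4 * k, kernel_size=1)  # 4*k offsets per anchor

    def forward(self, feat: torch.Tensor):
        h = torch.relu(self.compress(feat))
        return self.cls(h), self.reg(h)

scores, offsets = ClsRegHead()(torch.randn(1, 512, 38, 50))
```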
The unit for determining the label information of the target candidate boxes and the mapping information of the target sample data determines, according to the feature extraction network and the target sample data, the label information of the target candidate boxes and the mapping information of the target annotation boxes of the target sample data onto the last feature layer of the feature extraction network.
The main purpose of this step is to construct the label part of the ground-truth data for training the convolutional network based on target candidate boxes. The label information is used to train the parameters of the target classification network. The detailed steps are explained in Embodiment 2.
The unit for determining the regression information of the target candidate boxes determines the regression information of each target candidate box according to its label information and the mapping information of the target annotation boxes.
The main purpose of this step is to construct the regression part of the ground-truth data for training the convolutional network based on target candidate boxes, i.e. the offsets of the real target positions relative to the target candidate boxes of the anchors on the feature map to be mapped. The regression information of the convolutional network based on target candidate boxes is only computed where targets exist; that is, only the regression targets at the anchor positions of the real targets need to be computed.
The regression information of the convolutional network based on target candidate boxes is computed as follows: if the label information of a target candidate box is positive sample, the ratio of intersection to union between each real target box and the target candidate boxes is computed. For each real target box, the target candidate box with the maximal intersection-over-union is selected, and the regression information of that candidate box is computed. The regression information is the offset of the real target box relative to the target candidate box (the standard anchor-based parameterization, reconstructed here from the variable definitions that follow):

$$t_x^* = \frac{x^* - x_a}{w_a},\qquad t_y^* = \frac{y^* - y_a}{h_a},\qquad t_w^* = \log\frac{w^*}{w_a},\qquad t_h^* = \log\frac{h^*}{h_a}$$

where x_a, y_a, w_a, h_a are, respectively, the pixel coordinates of the upper-left corner of the anchor box and its width and height, and x*, y*, w*, h* are, respectively, the pixel coordinates of the upper-left corner of the real target box and its width and height. These are the regression targets of the convolutional network based on target candidate boxes.
The present invention does not regress directly against the original input image but uses anchor-based regression, for two reasons. First, when the original image is cropped or otherwise transformed, regression parameters based on the original image must be recomputed; that is, they are not translation invariant. Second, anchor-based regression effectively confines the parameters to roughly [0.0, 1.0], and parameters in this range converge easily.
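A sketch of the offset computation under the parameterization above (plain Python, an illustration rather than the literal implementation of the disclosure):

```python
import math

def regression_targets(anchor, gt):
    """Offsets of a real target box relative to an anchor box.

    Both boxes are (x, y, w, h) tuples with (x, y) the upper-left corner.
    Returns (tx, ty, tw, th) as defined above.
    """
    xa, ya, wa, ha = anchor
    xs, ys, ws, hs = gt
    tx = (xs - xa) / wa
    ty = (ys - ya) / ha
    tw = math.log(ws / wa)
    th = math.log(hs / ha)
    return tx, ty, tw, th

# Example: real target slightly shifted and larger than the anchor.
print(regression_targets((100, 100, 64, 64), (108, 96, 80, 72)))
```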
The unit for training the convolutional network based on target candidate boxes uses the label information and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and trains the convolutional network; training ends when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes.
The convolutional network based on target candidate boxes learns its model parameters from the given training targets; what the model parameters learn is the distribution of the given training targets. If the training targets contain ambiguous quantities, i.e. erroneous information, the convolutional network will be very difficult to train and quite possibly will not converge. Ensuring correct training targets is therefore extremely important. The training targets here include the label information of the target candidate boxes and their regression information; the label information is explained in detail in Embodiment 2, and the regression information was analyzed above.
The cost function for training the convolutional network based on target candidate boxes includes two parts, a classification cost and a regression cost (the formula below is the standard two-term form, reconstructed from the variable definitions that follow):

$$L_{total} = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \gamma\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*)$$

where L_total, L_cls, and L_reg are, respectively, the total cost function based on target candidate boxes, the classification cost function, and the regression cost function; i is the index of the sample currently fed into training; p_i is the predicted probability that the i-th anchor is a target; p_i* is the ground-truth probability that the i-th anchor is a target (p_i* = 1 when the i-th anchor is a target and p_i* = 0 when it is not); t_i is the predicted offset of the target relative to the i-th anchor and t_i* is the regression target; N_cls is the number of anchors participating in the classification computation; N_reg is the number of anchors participating in the regression computation; and γ is a balancing weight between the classification and regression costs.
It is easy to see that the cost of the convolutional network based on target candidate boxes includes a classification cost and a regression cost, normalized respectively by the number of anchors participating in the classification computation and the number participating in the regression computation. Because the classification cost and the regression cost have different ranges, the deviation is corrected with γ.
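A sketch of this two-term cost, assuming PyTorch and assuming cross-entropy and smooth-L1 as the concrete L_cls and L_reg (the disclosure does not fix them):

```python
import torch
import torch.nn.functional as F

def total_cost(cls_logits, labels, reg_pred, reg_targets, gamma=1.0):
    """Normalized classification cost plus gamma-weighted, normalized
    regression cost; regression is computed on positive anchors only.

    cls_logits:  (N, 2) scores for the sampled anchors.
    labels:      (N,) long tensor, 1 for positive anchors, 0 for negative.
    reg_pred, reg_targets: (N, 4) offsets; only rows with labels == 1 count.
    """
    cls_cost = F.cross_entropy(cls_logits, labels)  # averaged over N_cls
    pos = labels == 1
    n_reg = pos.sum().clamp(min=1)
    reg_cost = F.smooth_l1_loss(reg_pred[pos], reg_targets[pos],
                                reduction="sum") / n_reg
    return cls_cost + gamma * reg_cost
```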
In the training process, one iteration is carried out on one image, and multiple positive and negative samples are extracted from that image to train the model. Clearly, the number of negative samples far exceeds the number of positive samples, producing a strong sample imbalance. The present invention solves this as follows: let N be the number of anchors to be trained in one iteration; among all anchors, N/2 positive samples and N/2 negative samples are taken for training; if the number of positive samples is less than N/2, all positive samples are used and enough negative samples are added so that the total number of positive and negative samples is N.
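A sketch of this balanced sampling (numpy; N = 256 is an illustrative value, not specified by the disclosure):

```python
import numpy as np

def sample_anchors(labels, n=256, seed=0):
    """Pick up to n/2 positive anchor indices and top up with negatives.

    labels: array with 1 for positive anchors, 0 for negative anchors,
            -1 for ignored anchors (IoU between T2 and T1).
    Assumes enough negatives exist to reach n samples in total.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_pos = min(len(pos), n // 2)
    pos_idx = rng.choice(pos, size=n_pos, replace=False)
    neg_idx = rng.choice(neg, size=n - n_pos, replace=False)
    return np.concatenate([pos_idx, neg_idx])
```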
The target prediction unit receives an image to be detected and performs target prediction by outputting target regions and target scores through the convolutional network.
After the convolutional network based on target candidate boxes has been trained, the target detection model is obtained. With this model and the network structure, positions can be predicted directly. However, the predicted target positions are then very numerous and have large mutual intersection-over-union, so they need to be post-processed. The post-processing used by the present invention is analyzed in detail in Embodiment 3.
Compared with the prior art, this embodiment focuses on target candidate boxes with an attention-like mechanism, identifies whether a target candidate box is a target through a classification network, and predicts the offset of the target candidate box relative to the real target through a regression network. The classification network and the regression network share a feature layer to reduce the computation of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together. The device unifies classification and regression into one problem and shares the convolutional feature layer, which reduces the computation of the algorithm and the size of the model, so that the algorithm of the present invention can detect target information in real time.
Embodiment two
This embodiment provides, in an object detection device, the unit for determining the label information of the target candidate boxes and the mapping information of the target sample data. The unit includes:
a module for obtaining the feature map to be mapped, which takes the last layer of the feature extraction network and denotes it as the feature map to be mapped.
The feature extraction network extracts the feature information of the input image, which includes low-level information and high-level semantic information. The low-level information includes edge information, color information, texture information, and so on; the high-level semantic information includes, for example, nose information, mouth information, eye information, hat information, and glasses information. High-level semantic information reflects the abstract content of the image and is closer to the classification and regression information. The present invention takes the last layer of the feature extraction network as the feature map to be mapped; this feature information is high-level semantic information. On this basis, the label information of the target candidate boxes is obtained.
The module for generating target candidate boxes generates, at each pixel location of the feature map to be mapped, target candidate boxes according to the target sizes S and the target aspect ratios R.
On the feature map to be mapped, anchor boxes are generated according to the target sizes S and the target aspect ratios R. The feature map to be mapped carries the semantic information of the image to be detected; centered on each pixel location of the map, a series of anchor boxes is generated according to the target sizes S and the target aspect ratios R, each anchor box corresponding to one pair of target size S and target aspect ratio R. The present invention can use combinations of multiple sizes S and multiple aspect ratios R. The targets of this embodiment commonly have aspect ratios between 1:1 and 1.5:1. Since the size of a target depends on the shooting distance, a target can occupy a very small region, for example 60*60 pixels, or a very large region, for example 1280*960 pixels, so the range of target sizes should be set fairly wide. The anchor boxes generated according to the target sizes S and the target aspect ratios R are denoted the target candidate boxes of this document.
The number of target candidate boxes generated from the target sizes S and the target aspect ratios R is the number of combinations of S and R: for each target size S and each target aspect ratio R, a (size, aspect ratio) pair is formed, one anchor box is generated for each pair, and finally all anchor boxes are gathered to produce the target candidate boxes.
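A sketch of this candidate-box generation (numpy; the stride and the example values of S and R are illustrative assumptions):

```python
import numpy as np

def generate_candidate_boxes(feat_h, feat_w, stride=16,
                             sizes=(64, 128, 256), ratios=(1.0, 1.5)):
    """One candidate box per (size, ratio) pair at every feature-map pixel.

    Returns an array of (x, y, w, h) boxes in input-image coordinates,
    len(sizes) * len(ratios) boxes per anchor location.
    """
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = j * stride, i * stride  # anchor center in the image
            for s in sizes:
                for r in ratios:  # ratio = height / width
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    boxes.append((cx - w / 2, cy - h / 2, w, h))
    return np.array(boxes)

print(generate_candidate_boxes(2, 2).shape)  # (2*2*3*2, 4) = (24, 4)
```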
The module for determining the label information of the target candidate boxes: if a target candidate box intersects a target annotation box of the target sample data and the ratio of their intersection to their union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of their intersection to their union is less than a preset threshold T2, the target candidate box is marked as a negative sample.
The label information of the target candidate boxes is used to train the convolutional network based on target candidate boxes, which learns the distribution of this label information and obtains the corresponding parameters. For each target candidate box, its intersection and union with each real target box are computed. A threshold T1 is set: if the ratio of intersection to union is greater than T1, the target candidate box is labeled as a positive sample. A threshold T2 is set: if the ratio of intersection to union is less than T2, the target candidate box is labeled as a negative sample. Here, T1 is greater than T2.
If T1 is set large, the positive samples will be very accurate but few. In that case, to avoid the data skewing toward negative samples, T2 can be reduced. If the number of collected samples is very large, this is acceptable; if it is small, doing so leaves too little training data and raises the risk of model over-fitting. If T1 is set small, impurities are mixed into the samples, i.e. the samples are not clean, which slows model convergence or prevents it altogether. Therefore the values of T1 and T2 should be weighed carefully.
Samples falling between T1 and T2 can simply be discarded. Such a sample contains a small part target and a large part background; this part of the samples is not clean and would make the model hard to train, i.e. hard to converge, so discarding it directly is suggested.
There is also another situation: the intersection-over-union of a real target annotation box with every target candidate box is smaller than T1, in which case no target candidate box is matched to this real annotation box. For this situation the present invention proceeds as follows (see the sketch after this list):
first, the annotation boxes of the annotated target sample data are marked: if a target candidate box intersects an annotation box of the annotated target sample data and the ratio of their intersection to their union is greater than T1, the target candidate box is labeled as a positive sample and the annotation box is marked as matched;
then the annotation boxes not matched with any target candidate box are collected;
finally, a re-matching operation is carried out for the annotation boxes not matched with any target candidate box: for each unmatched annotation box, the ratio of intersection to union between that annotation box and all target candidate boxes is computed and sorted, the maximum ratio is taken, and the target candidate box corresponding to that maximum ratio is labeled as a positive sample.
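A sketch of this guaranteed matching, reusing the iou helper from the previous sketch:

```python
import numpy as np

def force_match_unmatched(candidates, gt_boxes, labels, t1=0.7):
    """Ensure every annotation box gets at least one positive candidate.

    For each annotation box whose best IoU against all candidates is <= t1
    (so it matched nothing), label its best-IoU candidate as positive.
    """
    for gt in gt_boxes:
        ious = np.array([iou(cand, gt) for cand in candidates])
        if ious.max() <= t1:  # this annotation box went unmatched
            labels[int(ious.argmax())] = 1
    return labels
```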
The module for determining the mapping information of the target sample data determines the mapping information of the target annotation boxes: the scaling ratio of the feature extraction network is computed, and the target annotation boxes of the target sample data are mapped onto the feature map to be mapped, yielding the mapping information of the target annotation boxes.
The mapping information of the target annotation boxes is used to obtain the regression information of the target boxes. Since the label information of the target candidate boxes is obtained on the feature map to be mapped, the regression information of the target candidate boxes should also be obtained from the feature map to be mapped, because the classification and the regression of target candidate boxes are parallel and symmetric. The annotation information of the collected annotated target sample images is based on the resolution of the original sample images, so it has to be mapped into the resolution of the feature map to be mapped: the scaling ratio of the feature extraction network is computed, the annotation boxes of the annotated target sample data are mapped onto the feature map to be mapped, and the mapping information of the target annotation boxes is obtained.
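A sketch of the coordinate mapping; a stride of 16 would correspond to four 2x poolings before the last feature layer of a VGG-16-style network and is an assumption here:

```python
def map_to_feature_map(box, stride=16):
    """Map an (x, y, w, h) annotation box from image coordinates onto the
    feature map to be mapped, dividing by the network's scaling ratio."""
    x, y, w, h = box
    return (x / stride, y / stride, w / stride, h / stride)

print(map_to_feature_map((120, 80, 64, 96)))  # (7.5, 5.0, 4.0, 6.0)
```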
Compared with the prior art, this embodiment constructs candidate regions based on an anchor mechanism. The target sizes S and target aspect ratios R are first set empirically; labels are then assigned to each target candidate box based on its intersection-over-union with the target boxes, and, to avoid wasting data, at least one target candidate box is matched to every target box. Constructing target boxes this way reduces the difficulty for the object detection device based on target candidate boxes and accelerates convergence. Furthermore, since the prediction samples provided by this embodiment are anchor-based, the convolutional network based on target candidate boxes can be trained on an anchor basis, which improves the accuracy of the algorithm.
Embodiment three
This embodiment provides the target prediction unit in an object detection device. The unit includes:
a module for obtaining a first target set: obtain the target regions of the convolutional network, denoted first target regions, and the target scores of the convolutional network, denoted first target scores; establish the mapping set of the first target regions and the first target scores, denoted the first target set.
The first target regions and first target scores are the results output by the convolutional network based on target candidate boxes. These target regions overlap heavily; outputting them all would produce much redundant information, so the output results need to be processed. To facilitate processing, the first target regions and first target scores can be organized into a mapping set keyed by the first target score.
The module for finding the target corresponding to the highest target score sorts the first target set by first target score and obtains the highest target score in the current first target set.
Within the first target set, in order to obtain the target region with the top score, the first target set is sorted by the target-score key.
The module for removing targets whose intersection-over-union with the top-scoring target exceeds a preset threshold computes the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set; when the ratio is greater than a preset threshold T3, the corresponding target region is deleted from the first target set.
That is, the ratio of intersection to union between the top-scoring target region and each remaining target region in the first target set is computed; a threshold T3 is set, and when this ratio exceeds T3, the corresponding target region and target score are deleted from the first target set.
The module for building the second target set removes the highest-scoring target from the first target set and saves it to the second target set: the highest target score and its corresponding target region are added to the second target set, and the pair is simultaneously deleted from the first target set.
The module for computing the detected target set merges the first target set and the second target set into the target regions of the image to be detected.
The first target set and the second target set obtained at this point are the target regions of the image to be detected, and the whole process of target detection is complete.
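This post-processing is the classical greedy non-maximum suppression. A sketch, reusing the iou helper from Embodiment 2 (T3 = 0.5 is an illustrative value):

```python
import numpy as np

def non_maximum_suppression(boxes, scores, t3=0.5):
    """Greedy NMS over (x, y, w, h) boxes.

    Repeatedly moves the top-scoring region to the kept set and drops every
    remaining region whose IoU with it exceeds t3.
    """
    order = list(np.argsort(scores)[::-1])  # indices, highest score first
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= t3]
    return kept  # indices of the detected target regions
```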
Compared with the prior art, this embodiment proposes a post-processing device for target detection. The target boxes extracted by the convolutional network based on target candidate boxes have very large overlap rates and contain a large amount of redundant information. For this phenomenon, this embodiment uses a post-processing device with a threshold T3: the intersection-over-union between the target region corresponding to the top target score and each remaining target region is computed, regions exceeding T3 are deleted, and the procedure iterates until the intersection-over-union of any two target regions is below T3. This processing reduces the redundancy of the detection results while conforming better to human viewing habits.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An object detection device, comprising a memory, a memory controller, a processor, and a peripheral interface;
the peripheral interface comprises an image acquisition unit, a human-computer interaction unit, and a display unit;
the memory comprises a target detection module;
the processor obtains image information acquired by the image acquisition unit, executes the target detection module with the image information as input, outputs the target information of the image information, and finally displays the target information on the display unit;
wherein the target detection module comprises:
a sample acquisition unit for obtaining annotated target sample data;
a unit for building a convolutional network based on target candidate boxes, the convolutional network comprising a feature extraction network and a target classification-and-regression network;
a unit for determining the label information of the target candidate boxes and the mapping information of the target sample data, which, according to the feature extraction network and the target sample data, determines the label information of the target candidate boxes and the mapping information of the target annotation boxes of the target sample data onto the last feature layer of the feature extraction network;
a unit for determining the regression information of the target candidate boxes, which determines the regression information of each target candidate box according to its label information and the mapping information of the target annotation boxes;
a unit for training the convolutional network based on target candidate boxes, which uses the label information and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and trains the convolutional network, training ending when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes;
a target prediction unit, which receives an image to be detected and performs target prediction by outputting target regions and target scores through the convolutional network.
2. The device according to claim 1, wherein the feature extraction network is the part of VGG-16 remaining after its last three fully connected layers are removed.
3. The device according to claim 1, wherein the unit for determining the label information of the target candidate boxes and the mapping information of the target sample data comprises:
a module for obtaining the feature map to be mapped, which takes the last layer of the feature extraction network and denotes it as the feature map to be mapped;
a module for generating target candidate boxes, which, at each pixel location of the feature map to be mapped, generates target candidate boxes according to the target sizes S and the target aspect ratios R;
a module for determining the label information of the target candidate boxes: if a target candidate box intersects a target annotation box of the target sample data and the ratio of their intersection to their union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of their intersection to their union is less than a preset threshold T2, the target candidate box is marked as a negative sample;
a module for determining the mapping information of the target sample data, which computes the scaling ratio of the feature extraction network, maps the target annotation boxes of the target sample data onto the feature map to be mapped, and obtains the mapping information of the target annotation boxes.
4. The device according to claim 3, wherein the number of generated target candidate boxes is the product of the number of target sizes S and the number of target aspect ratios R.
5. The device according to claim 3, wherein the module for determining the label information of the target candidate boxes further comprises:
a module for marking the target annotation boxes of the target sample data: if a target candidate box intersects a target annotation box of the annotated target sample data and the ratio of their intersection to their union is greater than the preset threshold T1, the target candidate box is labeled as a positive sample and the target annotation box is marked as matched;
a module for collecting the target annotation boxes not matched with any target candidate box;
a re-matching module for the target annotation boxes not successfully matched: for each unmatched target annotation box, compute the ratio of intersection to union between that annotation box and all target candidate boxes, sort the ratios, and label the target candidate box corresponding to the maximum ratio as a positive sample.
6. The device according to claim 1, wherein the unit for determining the regression information of the target candidate boxes comprises:
the label information of each target candidate box is either positive sample or negative sample;
if the label information of a target candidate box is positive sample, the target annotation box with the maximal intersection-over-union with that candidate box is obtained, the offset of the candidate box position relative to that annotation box is computed, and the offset is used as the regression information of the target candidate box.
7. The device according to claim 1, wherein the unit for training the convolutional network based on target candidate boxes comprises:
the label information of each target candidate box is either positive sample or negative sample;
target candidate boxes whose label information is positive sample and negative sample are randomly selected to train the convolutional network, wherein the number of selected positive-sample target candidate boxes equals the number of selected negative-sample target candidate boxes.
8. The device according to claim 7, wherein the unit for training the convolutional network based on target candidate boxes computes the objective cost function of the convolutional network, the objective cost function comprising a classification cost and a regression cost.
9. The device according to claim 8, wherein computing the objective cost function of the convolutional network comprises:
normalizing the classification cost by dividing by the number of target candidate boxes participating in the computation; and
normalizing the regression cost by dividing by four times the number of positive-sample target candidate boxes.
10. The device according to claim 1, wherein the target prediction unit comprises:
a module for obtaining a first target set: obtain the target regions of the convolutional network, denoted first target regions, and the target scores of the convolutional network, denoted first target scores; establish the mapping set of the first target regions and the first target scores, denoted the first target set;
a module for finding the target corresponding to the highest target score: sort the first target set by first target score and obtain the highest target score in the current first target set;
a module for removing targets whose intersection-over-union with the top-scoring target exceeds a preset threshold: compute the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set, and when the ratio is greater than a preset threshold T3, delete the corresponding target region from the first target set;
a module for building a second target set: remove the highest-scoring target from the first target set and save it to the second target set;
a module for computing the detected target set: merge the first target set and the second target set into the target regions of the image to be detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811385266.7A CN109614990A (en) | 2018-11-20 | 2018-11-20 | A kind of object detecting device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811385266.7A CN109614990A (en) | 2018-11-20 | 2018-11-20 | A kind of object detecting device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109614990A true CN109614990A (en) | 2019-04-12 |
Family
ID=66004223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811385266.7A Pending CN109614990A (en) | 2018-11-20 | 2018-11-20 | A kind of object detecting device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109614990A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250812A (en) * | 2016-07-15 | 2016-12-21 | 汤平 | A kind of model recognizing method based on quick R CNN deep neural network |
CN106709936A (en) * | 2016-12-14 | 2017-05-24 | 北京工业大学 | Single target tracking method based on convolution neural network |
CN106683091A (en) * | 2017-01-06 | 2017-05-17 | 北京理工大学 | Target classification and attitude detection method based on depth convolution neural network |
CN107480730A (en) * | 2017-09-05 | 2017-12-15 | 广州供电局有限公司 | Power equipment identification model construction method and system, the recognition methods of power equipment |
CN108537215A (en) * | 2018-03-23 | 2018-09-14 | 清华大学 | A kind of flame detecting method based on image object detection |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110930408A (en) * | 2019-10-15 | 2020-03-27 | 浙江大学 | Semantic image compression method based on knowledge reorganization |
CN110930408B (en) * | 2019-10-15 | 2021-06-18 | 浙江大学 | Semantic image compression method based on knowledge reorganization |
CN111178126A (en) * | 2019-11-20 | 2020-05-19 | 北京迈格威科技有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN111444828A (en) * | 2020-03-25 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Model training method, target detection method, device and storage medium |
CN112884055A (en) * | 2021-03-03 | 2021-06-01 | 歌尔股份有限公司 | Target labeling method and target labeling device |
CN112884055B (en) * | 2021-03-03 | 2023-02-03 | 歌尔股份有限公司 | Target labeling method and target labeling device |
CN113128575A (en) * | 2021-04-01 | 2021-07-16 | 西安电子科技大学广州研究院 | Target detection sample balancing method based on soft label |
CN114219936A (en) * | 2021-10-28 | 2022-03-22 | 中国科学院自动化研究所 | Object detection method, electronic device, storage medium, and computer program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109558902A (en) | A kind of fast target detection method | |
CN109614990A (en) | A kind of object detecting device | |
Xie et al. | Scut-fbp: A benchmark dataset for facial beauty perception | |
WO2020215985A1 (en) | Medical image segmentation method and device, electronic device and storage medium | |
CN104346370B (en) | Picture search, the method and device for obtaining image text information | |
WO2021042547A1 (en) | Behavior identification method, device and computer-readable storage medium | |
CN109902546A (en) | Face identification method, device and computer-readable medium | |
CN107808143A (en) | Dynamic gesture identification method based on computer vision | |
CN108304820A (en) | A kind of method for detecting human face, device and terminal device | |
CN108399386A (en) | Information extracting method in pie chart and device | |
CN107871102A (en) | A kind of method for detecting human face and device | |
CN103988232B (en) | Motion manifold is used to improve images match | |
CN110503074A (en) | Information labeling method, apparatus, equipment and the storage medium of video frame | |
CN108197532A (en) | The method, apparatus and computer installation of recognition of face | |
CN109978918A (en) | A kind of trajectory track method, apparatus and storage medium | |
CN106557173B (en) | Dynamic gesture identification method and device | |
CN106874826A (en) | Face key point-tracking method and device | |
CN108052884A (en) | A kind of gesture identification method based on improvement residual error neutral net | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
CN105160317A (en) | Pedestrian gender identification method based on regional blocks | |
CN110245545A (en) | A kind of character recognition method and device | |
CN106971130A (en) | A kind of gesture identification method using face as reference | |
CN109670517A (en) | Object detection method, device, electronic equipment and target detection model | |
CN106203284B (en) | Method for detecting human face based on convolutional neural networks and condition random field | |
CN107808376A (en) | A kind of detection method of raising one's hand based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190412 |