CN109614990A - Object detection device - Google Patents

Object detection device

Info

Publication number
CN109614990A
Authority
CN
China
Prior art keywords
target
candidate box
target candidate
information
module
Prior art date
Legal status
Pending
Application number
CN201811385266.7A
Other languages
Chinese (zh)
Inventor
高体红 (Gao Tihong)
Current Assignee
Chengdu Tongjia Youbo Technology Co Ltd
Original Assignee
Chengdu Tongjia Youbo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Tongjia Youbo Technology Co Ltd
Priority to CN201811385266.7A
Publication of CN109614990A (2019-04-12)
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object detection device. The object detection device includes a memory, a memory controller, a processor, and a peripheral interface; the peripheral interface includes an image acquisition unit, a human-computer interaction unit, and a display unit; the memory includes a target detection module. The processor obtains the image information acquired by the image acquisition unit, then executes the target detection module with the image information as input, and outputs the target information of the image information; finally, the target information is displayed on the display unit. The target detection module includes: a sample acquisition unit; a unit for establishing a convolutional network based on target candidate boxes; a unit for determining the label information of target candidate boxes and the mapping information of the target sample data; a unit for determining the regression information of target candidate boxes; a unit for training the convolutional network based on target candidate boxes; and a target prediction unit. The present invention is applicable to application scenarios in which targets are detected.

Description

Object detection device
Technical field
The present invention relates to the field of target detection technology, and in particular to a target detection device.
Background art
With the progress of science and technology, the need for target detection keeps growing. Beauty-filter applications need to detect the target first and then beautify it; face-swapping applications likewise detect the target first and then perform the swap; attendance applications also need to detect the target before recognizing it. In all of the above applications, the accuracy of target detection has a vital influence on the result.
With the rise of convolutional neural networks, target detection has made significant progress and its accuracy has soared. However, because the computational load of convolutional networks is huge, a high-performance GPU (such as a TITAN) is needed for real-time detection. This high cost has always been the critical bottleneck restricting mass production.
The present application proposes an object detection device. A classification network identifies whether a target candidate box is a target, and a regression network predicts the offset of the target candidate box relative to the real target. The classification network and the regression network share a feature layer to reduce the computational load of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together, realizing end-to-end training. By sharing the feature layer, this network reduces both the computational load and the number of model parameters, creating the conditions for real-time detection.
Summary of the invention
The embodiments of the present invention provide an object detection device capable of detecting targets in real time.
The technical solution adopted by the embodiments of the present invention is as follows:
An object detection device includes a memory 111, a memory controller 112, a processor 113, and a peripheral interface 114;
the peripheral interface includes an image acquisition unit 115, a human-computer interaction unit 116, and a display unit 117;
the memory includes a target detection module 200;
the processor obtains the image information acquired by the image acquisition unit, then executes the target detection module with the image information as input, and outputs the target information of the image information; finally, the target information is displayed on the display unit.
Wherein, the target detection module includes:
a sample acquisition unit, configured to obtain annotated target sample data;
a unit for establishing a convolutional network based on target candidate boxes, configured to establish a convolutional network based on target candidate boxes, the convolutional network including a feature extraction network and a target classification-regression network;
a unit for determining the label information of target candidate boxes and the mapping information of the target sample data, configured to determine, according to the feature extraction network and the target sample data, the label information of the target candidate boxes and the mapping information, on the last feature layer of the feature extraction network, of the target annotation boxes of the target sample data;
a unit for determining the regression information of the target candidate boxes, configured to determine the regression information of the target candidate boxes according to the label information of the target candidate boxes and the mapping information of the target annotation boxes;
a unit for training the convolutional network based on target candidate boxes, configured to take the label information of the target candidate boxes and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and to train the convolutional network, training ending when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes;
a target prediction unit, configured to receive an image to be detected and to perform target prediction from the target regions and target scores output by the convolutional network.
Further, the feature extraction network is the part of VGG-16 that remains after its last three fully connected layers are removed.
Further, the unit for determining the label information of target candidate boxes and the mapping information of the target sample data includes:
a module for obtaining the feature map to be mapped, configured to obtain the last-layer information of the feature extraction network and denote it the feature map to be mapped;
a module for generating target candidate boxes, configured to generate, on the feature map to be mapped and for each pixel location, target candidate boxes according to a target size S and a target aspect ratio R;
a module for determining the label information of target candidate boxes, configured to determine the label information of the target candidate boxes: if a target candidate box and a target annotation box of the target sample data have an intersection and the ratio of the intersection to the union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of the intersection to the union is less than a preset threshold T2, the target candidate box is marked as a negative sample;
a module for determining the mapping information of the target sample data, configured to determine the mapping information of the target annotation boxes: the scaling ratio of the feature extraction network is computed, and the target annotation boxes of the target sample data are mapped onto the feature map to be mapped, obtaining the mapping information of the target annotation boxes.
Further, the number of generated target candidate boxes is the product of the number of target sizes S and the number of target aspect ratios R.
Further, the module for determining the label information of target candidate boxes also includes:
a module for marking the target annotation boxes of the target sample data: if a target candidate box and a target annotation box of the annotated target sample data have an intersection and the ratio of the intersection to the union is greater than the preset threshold T1, the target candidate box is labeled a positive sample, and the target annotation box is marked as matched with a target candidate box;
a module for counting the target annotation boxes not matched with any target candidate box;
a re-matching module for the target annotation boxes not successfully matched with any target candidate box: for each target annotation box not successfully matched with a target candidate box, the ratio of intersection to union between this target annotation box and all target candidate boxes is computed and sorted; the target candidate box corresponding to the maximum ratio is labeled a positive sample.
Further, the unit for determining the regression information of target candidate boxes includes:
the label information of the target candidate boxes includes positive samples and negative samples;
if the label information of a target candidate box is positive, the target annotation box with the largest ratio of intersection to union with the target candidate box is obtained; the offset of the target candidate box position relative to the target annotation box is computed, and the offset is taken as the regression information of the target candidate box.
Further, the unit for training the convolutional network based on target candidate boxes includes:
the label information of the target candidate boxes includes positive samples and negative samples;
target candidate boxes whose label information is positive and target candidate boxes whose label information is negative are randomly selected to train the convolutional network, the number of target candidate boxes labeled positive being equal to the number of target candidate boxes labeled negative.
Further, the unit for training the convolutional network based on target candidate boxes includes:
computing the objective cost function of the convolutional network, the objective cost function including a classification cost and a regression cost.
Further, computing the objective cost function of the convolutional network includes:
the classification cost is normalized by dividing by the number of target candidate boxes participating in the computation;
the regression cost is normalized by dividing by four times the number of positive-sample target candidate boxes.
Further, the target prediction unit includes:
a module for obtaining a first target set: the target regions of the convolutional network are obtained and denoted first target regions, and the target scores of the convolutional network are obtained and denoted first target scores; a mapping set of the first target regions and the first target scores is established and denoted the first target set;
a module for finding the target corresponding to the highest target score: the first target set is sorted according to the first target scores, and the highest target score in the current first target set is obtained;
a module for finding targets whose intersection-over-union with the top-score target exceeds a preset threshold: the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set is computed; when the ratio of intersection to union is greater than a preset threshold T3, the target region corresponding to that ratio is deleted from the first target set;
a module for computing a second target set: the highest-score target is removed from the first target set and saved into the second target set;
a module for computing the detected target set: the first target set and the second target set are merged to give the target regions of the image to be detected.
Compared with the prior art, the present invention proposes a target detection device. The device uses an attention mechanism to focus on target candidate boxes; a classification network identifies whether a target candidate box is a target, and a regression network predicts the offset of the target candidate box relative to the real target. The classification network and the regression network share a feature layer to reduce the computational load of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together. The device proposed by the present invention can detect targets accurately in real time on a low-performance graphics card, reaching a frame rate of 45 fps with an accuracy of 98%, which meets the demand for target detection in industry.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an object detection device.
Specific embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
To make the advantages of the technical solution of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1
An object detection device includes a memory 111, a memory controller 112, a processor 113, and a peripheral interface 114;
the peripheral interface includes an image acquisition unit 115, a human-computer interaction unit 116, and a display unit 117;
the memory includes a target detection module 200;
the processor obtains the image information acquired by the image acquisition unit, then executes the target detection module with the image information as input, and outputs the target information of the image information; finally, the target information is displayed on the display unit.
Wherein, the target detection module includes:
a sample acquisition unit, configured to obtain annotated target sample data.
The target sample data includes, but is not limited to, target detection datasets such as the BioID Face Database (FaceDB) and Labeled Faces in the Wild (LFW).
The annotation of a target sample includes, but is not limited to, marking the target region with a rectangular box. For example, the position (x, y, w, h) in image I denotes a target region, where x and y are the coordinates of the upper-left corner of the target region, and w and h are its width and height, respectively.
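For illustration, a minimal Python sketch of one such annotation record follows; the field names and file path are hypothetical and not prescribed by this disclosure:

    # One annotated target sample: an image plus a list of (x, y, w, h)
    # rectangles, where (x, y) is the upper-left corner of a target region
    # and w, h are its width and height in pixels.
    sample = {
        "image": "images/000001.jpg",      # hypothetical path
        "boxes": [(120, 86, 64, 64),       # one target region
                  (310, 140, 80, 96)],     # another target region
    }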
In addition, the target sample data may also include sample data from the practical application scenario. For example, for target detection on a mobile phone client, images shot with a phone at different angles, at different distances, and in different environments can be collected and their target regions annotated. Although this approach is more complicated and the labor cost relatively high, such data can improve the accuracy of the algorithm in that field in a targeted way. Of course, such data is not necessary: without sample data from the practical application scenario, the algorithm model can still be applied to the mobile phone client, only with lower accuracy.
A unit for establishing a convolutional network based on target candidate boxes, configured to establish a convolutional network based on target candidate boxes, the convolutional network including a feature extraction network and a target classification-regression network.
The convolutional network based on target candidate boxes behaves slightly differently in the training stage and in the test stage. In the test stage, its input is a test image of arbitrary size, and its output is the position coordinates and target scores of the detection boxes in the test image. In the training stage, the input is a sample image of arbitrary size, and the output is a loss value that reflects the deviation between the target positions predicted by the convolutional network based on target candidate boxes and the target positions in the true sample image.
The convolutional network based on target candidate boxes includes a feature extraction network and a target classification-regression network.
The feature extraction network is used to extract the feature information of the test image; it can be based on a single network or a combination of several networks. Such a feature extraction network can be, for example, the convolutional network remaining after the last three fully connected layers of VGG-16 are removed: it contains 5 stages, each stage includes 2-3 convolutional layers and one pooling layer, and the hyperparameters of the convolutional layers within a stage are the same. Here, the network remaining after cutting off the last three fully connected layers may be used as the feature extraction network. This network extracts the features of the test image; the features are then fed both into the target classification network, which decides whether each target candidate box position is a target, and into the regression network, which predicts, at each target candidate box position, the offset of the target position relative to the target candidate box.
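For illustration, a minimal Python sketch of such a truncated feature extraction network, assuming PyTorch/torchvision is available (the disclosure does not prescribe a framework):

    import torch
    import torchvision.models as models

    # VGG-16 with its three fully connected layers removed: vgg.features
    # keeps exactly the 5 stages of 2-3 convolutions plus pooling each.
    vgg = models.vgg16(weights=None)   # pre-trained weights may be loaded instead
    feature_extractor = vgg.features

    x = torch.randn(1, 3, 480, 640)    # a test image of arbitrary size
    feat = feature_extractor(x)        # the feature map to be mapped
    print(feat.shape)                  # torch.Size([1, 512, 15, 20]); total stride 32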
The aim of the feature extraction network is to extract the features of targets in the test image. Target features are relatively complex, because targets involve deformation caused by occlusion, feature changes caused by make-up, and appearance changes caused by accessories and clothing, among others. Considering that the feature learning network has to learn such complex feature information, the feature extraction network can adopt inception modules; using the feature extraction layers of GoogLeNet may be considered.
The feature extraction network can be trained by means of transfer learning: other classification images are first used to train the basic low-level features, so as to learn the low-level model parameters, and the collected annotated target sample images are then used to train the high-level semantic information. VGG16 and GoogLeNet both have model parameters publicly disclosed from the ILSVRC competition; where the equipment does not allow otherwise, these publicly available model parameters can be used directly as the initial parameters of the feature extraction network, and training then continues on that basis to obtain the complete parameters of the feature extraction network.
The classification-regression network is used to detect the target positions in the test image and is located after the feature extraction network. The last layer of the feature extraction network is denoted the feature map to be mapped; each pixel location on the feature map to be mapped is called an anchor. The classification-regression network traverses every anchor and, for each target candidate box on the anchor, predicts the score that it is a target and the offset of the real target relative to the anchor.
The network structure of the classification-regression network is as follows: a feature layer is compressed into a D-dimensional column vector; this column vector is fed into the target classification network, which produces a 2*k-dimensional column vector indicating whether each of the k target candidate boxes on each anchor of the feature map to be mapped is a target, where k is the number of target candidate boxes placed on each anchor of the feature map to be mapped; the same column vector is fed into the target regression network, which produces a 4*k-dimensional column vector representing the target offsets relative to the k target candidate boxes on each anchor of the feature map to be mapped. That is, the target positions predicted and regressed by the target classification-regression network are relative to the anchors, rather than predicted over the whole image.
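For illustration, a minimal PyTorch sketch of such a classification-regression head, assuming D = 512 and k = 6 (e.g., 3 target sizes by 2 aspect ratios; both values are assumptions): the per-anchor D-dimensional column vector is realized as a 3x3 convolution, and two sibling 1x1 convolutions produce the 2*k classification outputs and 4*k regression outputs at every anchor:

    import torch
    import torch.nn as nn

    class ClsRegHead(nn.Module):
        """Classification-regression head shared across every anchor."""
        def __init__(self, in_channels=512, d=512, k=6):
            super().__init__()
            # Compress the feature layer into a D-dimensional vector per anchor.
            self.compress = nn.Conv2d(in_channels, d, kernel_size=3, padding=1)
            # 2*k outputs: target / non-target score for each of the k boxes.
            self.cls = nn.Conv2d(d, 2 * k, kernel_size=1)
            # 4*k outputs: (tx, ty, tw, th) offsets for each of the k boxes.
            self.reg = nn.Conv2d(d, 4 * k, kernel_size=1)

        def forward(self, feat):
            h = torch.relu(self.compress(feat))
            return self.cls(h), self.reg(h)

    scores, offsets = ClsRegHead()(torch.randn(1, 512, 15, 20))
    print(scores.shape, offsets.shape)   # (1, 12, 15, 20) and (1, 24, 15, 20)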
A unit for determining the label information of target candidate boxes and the mapping information of the target sample data, configured to determine, according to the feature extraction network and the target sample data, the label information of the target candidate boxes and the mapping information, on the last feature layer of the feature extraction network, of the target annotation boxes of the target sample data.
The main purpose of this step is to construct the label information of the ground-truth data used to train the convolutional network based on target candidate boxes. This label information is used to train the parameters of the target classification network. The detailed steps are explained in Embodiment 2.
A unit for determining the regression information of target candidate boxes, configured to determine the regression information of the target candidate boxes according to the label information of the target candidate boxes and the mapping information of the target annotation boxes.
The main purpose of this step is to construct the regression information of the ground-truth data used to train the convolutional network based on target candidate boxes, i.e., the offsets of the real target positions relative to the target candidate boxes at the anchors on the feature map to be mapped. The regression information of the convolutional network based on target candidate boxes is computed only where a target is present; that is, only the regression targets at the anchor positions where real targets lie need to be computed.
The regression information of the convolutional network based on target candidate boxes is computed as follows: if the label information of a target candidate box is positive, the ratio of intersection to union between each real target box and the target candidate boxes is computed; for each real target box, the target candidate box with the largest intersection-over-union is selected, and the regression information of that target candidate box is computed. The regression information is the offset of the real target box relative to the target candidate box, that is:

tx = (x* - xa) / wa,  ty = (y* - ya) / ha,  tw = log(w* / wa),  th = log(h* / ha)

where xa, ya, wa, ha are respectively the pixel coordinates of the upper-left corner of the anchor box and its width and height; x*, y*, w*, h* are respectively the pixel coordinates of the upper-left corner of the real target box and its width and height; and (tx, ty, tw, th) are the regression parameters that the convolutional network based on target candidate boxes is intended to fit.
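For illustration, a minimal numpy sketch of this offset computation, using the (x, y, w, h) upper-left-corner convention defined above:

    import numpy as np

    def regression_targets(anchor, gt):
        """Offset of a real target box relative to a target candidate box."""
        xa, ya, wa, ha = anchor            # candidate box (anchor)
        xs, ys, ws, hs = gt                # real target box
        tx = (xs - xa) / wa
        ty = (ys - ya) / ha
        tw = np.log(ws / wa)
        th = np.log(hs / ha)
        return np.array([tx, ty, tw, th])

    print(regression_targets((100, 80, 64, 64), (110, 76, 72, 60)))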
The present invention does not regress directly against the original input image but uses anchor-based regression, for two reasons. First, when the original image is cropped or otherwise transformed, regression parameters based on the original image have to be recomputed; that is, they are not translation invariant. Second, anchor-based regression effectively confines the parameters to the interval [0.0, 1.0], and parameters in this interval converge easily.
A unit for training the convolutional network based on target candidate boxes, configured to take the label information of the target candidate boxes and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and to train the convolutional network; when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes, training ends.
The convolutional network based on target candidate boxes learns its model parameters from the given training targets. What the model parameters learn is the distribution of the given training targets. If the training targets contain fuzzy quantities, i.e., the given training targets contain erroneous information, the convolutional network will be very difficult to train and quite likely will not converge. Ensuring the quality of the training targets is therefore extremely important. The targets to be fitted here include the label information of the target candidate boxes and the regression information of the target candidate boxes; the label information is explained in detail in Embodiment 2, and the regression information is analyzed above.
The cost function for training the convolutional network based on target candidate boxes consists of two parts, a classification cost and a regression cost. Its formula is as follows:

L_total = (1 / N_cls) * Σ_i L_cls(p_i, p_i*) + γ * (1 / N_reg) * Σ_i p_i* * L_reg(t_i, t_i*)

where L_total, L_cls, and L_reg are respectively the total cost function, the classification cost function, and the regression cost function of the convolutional network based on target candidate boxes; i is the index of the sample currently fed into training; p_i is the predicted probability that the i-th anchor is a target; p_i* is the ground-truth probability that the i-th anchor is a target (p_i* = 1 when the i-th anchor is a target, and p_i* = 0 when it is not); t_i is the predicted offset of the target relative to the i-th anchor; t_i* is the regression target; N_cls is the number of anchors participating in the classification computation; N_reg is the number of anchors participating in the regression computation; and γ is a balancing weight between the classification and regression costs.
It is easy to see that the cost of the convolutional network based on target candidate boxes includes a classification cost and a regression cost, normalized respectively by the number of anchors participating in the classification operation and the number of anchors participating in the regression computation. Since the classification cost and the regression cost have different ranges, γ is used to correct the cost imbalance.
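For illustration, a minimal PyTorch sketch of this total cost; the disclosure does not fix the concrete forms of L_cls and L_reg, so cross-entropy and smooth L1 are assumed here:

    import torch
    import torch.nn.functional as F

    def total_cost(p, p_star, t, t_star, gamma=1.0):
        """(1/N_cls)*sum L_cls + gamma*(1/N_reg)*sum p_i* L_reg, assumed forms."""
        # Classification cost, normalized over the N_cls sampled anchors.
        l_cls = F.cross_entropy(p, p_star)
        # Regression cost, computed only at positive anchors (p_i* = 1).
        pos = p_star == 1
        n_reg = pos.sum().clamp(min=1)
        l_reg = F.smooth_l1_loss(t[pos], t_star[pos], reduction="sum") / n_reg
        return l_cls + gamma * l_reg

    # p: (N, 2) logits, p_star: (N,) 0/1 labels, t and t_star: (N, 4) offsets
    loss = total_cost(torch.randn(8, 2), torch.randint(0, 2, (8,)),
                      torch.randn(8, 4), torch.randn(8, 4))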
During training, one iteration is performed per image, with multiple positive and negative samples extracted from the image to train the model. Clearly, the number of negative samples far exceeds the number of positive samples, producing a severe sample imbalance. The present invention solves this as follows: let N be the number of anchors to be trained in one iteration; among all anchors, take N/2 positive samples and N/2 negative samples for training; if the number of positive samples is less than N/2, use all the positive samples plus enough negative samples to keep the total number of positive and negative samples at N.
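For illustration, a minimal numpy sketch of this balanced sampling, assuming N = 256 (the disclosure leaves the value of N to be set):

    import numpy as np

    def sample_minibatch(labels, n=256, seed=None):
        """Pick N/2 positive and N/2 negative anchors for one iteration.

        labels: per-anchor array, 1 = positive, 0 = negative, -1 = discarded.
        If positives number fewer than N/2, all of them are used and the
        shortfall is filled with extra negatives so the total stays N.
        """
        rng = np.random.default_rng(seed)
        pos = np.flatnonzero(labels == 1)
        neg = np.flatnonzero(labels == 0)
        n_pos = min(len(pos), n // 2)
        chosen_pos = rng.choice(pos, size=n_pos, replace=False)
        chosen_neg = rng.choice(neg, size=min(len(neg), n - n_pos), replace=False)
        return np.concatenate([chosen_pos, chosen_neg])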
A target prediction unit, configured to receive an image to be detected and to perform target prediction from the target regions and target scores output by the convolutional network.
After the convolutional network based on target candidate boxes has been trained, the target detection model is obtained. According to this model and the network structure, positions can be predicted directly. However, the predicted target positions at this point are numerous, and many of them have a large ratio of intersection to union, so they need to be post-processed. The post-processing used by the present invention is analyzed in detail in Embodiment 3.
Compared with the prior art, this embodiment uses an attention mechanism to focus on target candidate boxes; a classification network identifies whether a target candidate box is a target, and a regression network predicts the offset of the target candidate box relative to the real target. The classification network and the regression network share a feature layer to reduce the computational load of the algorithm, and the convolutional neural network is trained with the classification error and the regression error together. By unifying classification and regression into one problem and sharing the convolutional feature layer, the device reduces the computational load of the algorithm and the size of the model, so that the algorithm of the present invention can detect target information in real time.
Embodiment 2
This embodiment provides, in an object detection device, the unit for determining the label information of target candidate boxes and the mapping information of the target sample data. The unit includes:
a module for obtaining the feature map to be mapped, configured to obtain the last-layer information of the feature extraction network and denote it the feature map to be mapped.
The feature extraction network extracts the feature information of the input image; the feature information includes low-level information and high-level semantic information. The low-level information includes edge information, color information, texture information, and the like of the image; the high-level semantic information includes, for example, nose information, mouth information, eye information, hat information, and glasses information. High-level semantic information reflects more of the abstract information of the image and is closer to the classification and regression information. The present invention takes the last-layer information of the feature extraction network as the feature map to be mapped; this feature information is high-level semantic information. On this basis, the label information of the target candidate boxes is obtained.
a module for generating target candidate boxes, configured to generate, on the feature map to be mapped and for each pixel location, target candidate boxes according to a target size S and a target aspect ratio R.
On the feature map to be mapped, anchors are generated according to the target size S and the target aspect ratio R. The feature map to be mapped is the semantic information of the test image; centered on each pixel location of the map, a series of anchors is generated according to the target size S and the target aspect ratio R, each anchor corresponding to a pair of target size S and target aspect ratio R. The present invention can use combinations of multiple sizes S and multiple aspect ratios R. The detection targets of this embodiment typically have aspect ratios between 1:1 and 1.5:1. Since the size of a target depends on the shooting distance, a target may occupy a very small region, e.g., 60*60 pixels, or a very large region, e.g., 1280*960 pixels, so the target size can be set over a large range. The anchors generated according to the pairs of target size S and target aspect ratio R are denoted the target candidate boxes of this text.
The number of target candidate boxes generated from the target size S and the target aspect ratio R is the number of their combinations: for each target size S and each target aspect ratio R, a (size, ratio) pair is formed, and an anchor is generated for each pair; finally, all anchors are gathered to produce the target candidate boxes.
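For illustration, a minimal numpy sketch of this candidate-box generation; the sizes, aspect ratios, and stride are assumed values for a stride-32 feature map to be mapped:

    import numpy as np

    def generate_candidate_boxes(feat_h, feat_w, stride=32,
                                 sizes=(64, 128, 256), ratios=(1.0, 1.5)):
        """One (x, y, w, h) box per (size S, aspect ratio R) pair, centered
        on every pixel location of the feature map to be mapped."""
        boxes = []
        for i in range(feat_h):
            for j in range(feat_w):
                cx, cy = j * stride, i * stride    # anchor center in image pixels
                for s in sizes:
                    for r in ratios:               # k = len(sizes) * len(ratios)
                        w, h = s, s * r
                        boxes.append((cx - w / 2, cy - h / 2, w, h))
        return np.array(boxes)

    boxes = generate_candidate_boxes(15, 20)
    print(boxes.shape)                             # (15 * 20 * 6, 4) = (1800, 4)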
a module for determining the label information of target candidate boxes, configured to determine the label information of the target candidate boxes: if a target candidate box and a target annotation box of the target sample data have an intersection and the ratio of the intersection to the union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of the intersection to the union is less than a preset threshold T2, the target candidate box is marked as a negative sample.
The label information of the target candidate boxes is used to train the convolutional network based on target candidate boxes: the network learns the distribution of the label information of the target candidate boxes and obtains the corresponding parameters of that distribution. For each target candidate box, its intersection and union with each real target box are computed; a threshold T1 is set, and if the ratio of intersection to union is greater than T1, the target candidate box is labeled a positive sample; a threshold T2 is set, and if the ratio of intersection to union is less than T2, the target candidate box is labeled a negative sample. Here, the threshold T1 is greater than the threshold T2.
If the threshold T1 is set large, the positive samples will be very accurate but few. In that case, to avoid the data leaning toward negative samples, the threshold T2 can be lowered. If the number of collected samples is very large, this can be considered; if the number of collected samples is small, doing so will leave too little training data and raise the risk of model overfitting. If the threshold T1 is set too small, impurities will be mixed into the samples, i.e., the samples will not be clean, which slows model convergence or prevents it altogether. So the values of the thresholds T1 and T2 should be chosen as circumstances require.
Samples between the thresholds T1 and T2 can simply be discarded. Such a sample contains a small part target and a large part impurity; this part of the samples is not clean and would make the model difficult to train, i.e., difficult to converge, so it is suggested to discard them directly.
There is another situation: the intersection-over-union of a real target sample box with every target candidate box is less than the threshold T1, in which case the real sample box is not matched with any target candidate box. For this situation, the present invention proceeds as follows:
First, the annotation boxes of the annotated target sample data are marked: if a target candidate box and an annotation box of the annotated target sample data have an intersection and the ratio of the intersection to the union is greater than the threshold T1, the target candidate box is labeled a positive sample, and the annotation box is marked as matched with a target candidate box;
then, the annotation boxes not matched with any target candidate box are counted;
finally, a re-matching operation is performed for the annotation boxes not matched with any target candidate box: for each annotation box not matched with a target candidate box, the ratio of intersection to union between this annotation box and all target candidate boxes is computed and sorted, and the maximum ratio is taken; the target candidate box corresponding to the maximum ratio is labeled a positive sample.
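For illustration, a minimal numpy sketch combining the T1/T2 labeling with the re-matching operation above; T1 = 0.7 and T2 = 0.3 are assumed values, the disclosure requiring only T1 > T2:

    import numpy as np

    def iou(a, b):
        """Ratio of intersection to union of two (x, y, w, h) boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        return inter / (a[2] * a[3] + b[2] * b[3] - inter)

    def label_candidate_boxes(boxes, annotations, t1=0.7, t2=0.3):
        """Label each box 1 (positive), 0 (negative) or -1 (discarded)."""
        ious = np.array([[iou(b, g) for g in annotations] for b in boxes])
        labels = np.full(len(boxes), -1)
        best = ious.max(axis=1)
        labels[best < t2] = 0                  # negative samples
        labels[best > t1] = 1                  # positive samples
        # Re-matching: every annotation box left unmatched by the T1 test is
        # assigned the candidate box with the largest IoU, marked positive.
        for g in range(len(annotations)):
            if ious[:, g].max() <= t1:
                labels[ious[:, g].argmax()] = 1
        return labels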
a module for determining the mapping information of the target sample data, configured to determine the mapping information of the target annotation boxes: the scaling ratio of the feature extraction network is computed, and the target annotation boxes of the target sample data are mapped onto the feature map to be mapped, obtaining the mapping information of the target annotation boxes.
The mapping information of the target annotation boxes is used to obtain the regression information of the target boxes. The label information of the target candidate boxes is obtained on the feature map to be mapped, so the regression information of the target candidate boxes should also be obtained from the feature map to be mapped, because the classification and regression of target candidate boxes are parallel and symmetric. The annotation information of the collected annotated target sample images is based on the resolution of the original sample images, so this annotation information needs to be mapped into the resolution of the feature map to be mapped. The scaling ratio of the feature extraction network is computed, and the annotation boxes of the annotated target sample data are mapped onto the feature map to be mapped, obtaining the mapping information of the target annotation boxes.
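For illustration, a minimal Python sketch of this mapping, assuming the scaling ratio equals the total stride of the feature extraction network (32 for the truncated VGG-16 sketched in Embodiment 1):

    def map_to_feature(box, scale=32):
        """Map an (x, y, w, h) annotation from the original image resolution
        onto the resolution of the feature map to be mapped."""
        x, y, w, h = box
        return (x / scale, y / scale, w / scale, h / scale)

    print(map_to_feature((120, 86, 64, 64)))   # (3.75, 2.6875, 2.0, 2.0)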
Compared with the prior art, this embodiment constructs candidate regions based on an anchor mechanism. The target size S and target aspect ratio R are first set empirically; labels are then assigned to each target candidate box based on the intersection-over-union of target candidate boxes and target boxes, and, to avoid wasting data, at least one target candidate box is matched for each target box. This way of constructing target boxes reduces the difficulty for the object detection device based on target candidate boxes and accelerates convergence. Moreover, since the prediction samples provided by this embodiment are anchor-based, the object detection device based on target candidate boxes can be trained on an anchor basis, which improves the accuracy of the algorithm.
Embodiment 3
This embodiment provides the target prediction unit in an object detection device. The unit includes:
a module for obtaining a first target set: the target regions of the convolutional network are obtained and denoted first target regions, and the target scores of the convolutional network are obtained and denoted first target scores; a mapping set of the first target regions and the first target scores is established and denoted the first target set.
The first target regions and first target scores are the results output by the convolutional network based on target candidate boxes. These target regions overlap one another heavily, and outputting them all would produce a great deal of redundant information, so the output results need to be processed. For convenience of processing, the first target regions and first target scores can be mapped into a set, with the first target score as the key.
a module for finding the target corresponding to the highest target score: the first target set is sorted according to the first target scores, and the highest target score in the current first target set is obtained.
In the first target set, in order to obtain the target region with the top score, the first target set needs to be sorted by the target-score key.
a module for finding targets whose intersection-over-union with the top-score target exceeds a preset threshold: the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set is computed; when the ratio of intersection to union is greater than a preset threshold T3, the target region corresponding to that ratio is deleted from the first target set.
The ratio of intersection to union between the target region with the highest target score and each remaining target region in the first target set is computed; a threshold T3 is set, and when this ratio is greater than T3, the corresponding target region and target score are deleted from the first target set.
a module for computing a second target set: the highest-score target is removed from the first target set and saved into the second target set.
The highest target score and its corresponding target region in the first target set are added to the second target set, and that (target score, target region) pair is deleted from the first target set.
a module for computing the detected target set: the first target set and the second target set are merged to give the target regions of the image to be detected.
The first target set and the second target set obtained at this point are the target regions of the test image; the whole process of target detection is now complete.
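For illustration, a minimal numpy sketch of the whole post-processing of this embodiment, with T3 = 0.5 as an assumed value (the iou helper is the same as in the Embodiment 2 sketch):

    import numpy as np

    def iou(a, b):
        """Ratio of intersection to union of two (x, y, w, h) regions."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        return inter / (a[2] * a[3] + b[2] * b[3] - inter)

    def postprocess(regions, scores, t3=0.5):
        """Repeatedly move the top-score region into the second target set
        and delete first-set regions whose IoU with it exceeds T3."""
        order = list(np.argsort(scores)[::-1])   # first target set, sorted
        keep = []                                # second target set
        while order:
            top = order.pop(0)
            keep.append(top)
            order = [i for i in order if iou(regions[top], regions[i]) <= t3]
        return keep                              # detected target regions

    regions = np.array([(10, 10, 50, 50), (12, 12, 50, 50), (200, 80, 40, 40)])
    print(postprocess(regions, np.array([0.9, 0.8, 0.7])))   # [0, 2]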
Compared with the prior art, this embodiment proposes a post-processing scheme for target detection. The target boxes extracted by the convolutional network based on target candidate boxes have a very high overlap rate and contain a large amount of redundant information. For this phenomenon, this embodiment uses a post-processing scheme: a threshold T3 is set; the intersection-over-union between the target region corresponding to the top target score and each remaining target region is computed, and the regions for which it is greater than the threshold T3 are deleted; this iterates until the intersection-over-union of any two target regions is below the threshold T3. This processing reduces the redundancy of the detection results while better matching human habits of understanding.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can readily occur to a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An object detection device, comprising a memory, a memory controller, a processor, and a peripheral interface;
the peripheral interface comprises an image acquisition unit, a human-computer interaction unit, and a display unit;
the memory comprises a target detection module;
the processor obtains image information acquired by the image acquisition unit, then executes the target detection module with the image information as input, and outputs the target information of the image information; finally, the target information is displayed on the display unit;
wherein the target detection module comprises:
a sample acquisition unit, configured to obtain annotated target sample data;
a unit for establishing a convolutional network based on target candidate boxes, configured to establish a convolutional network based on target candidate boxes, the convolutional network comprising a feature extraction network and a target classification-regression network;
a unit for determining the label information of target candidate boxes and the mapping information of the target sample data, configured to determine, according to the feature extraction network and the target sample data, the label information of the target candidate boxes and the mapping information, on the last feature layer of the feature extraction network, of the target annotation boxes of the target sample data;
a unit for determining the regression information of the target candidate boxes, configured to determine the regression information of the target candidate boxes according to the label information of the target candidate boxes and the mapping information of the target annotation boxes;
a unit for training the convolutional network based on target candidate boxes, configured to take the label information of the target candidate boxes and the regression information of the target candidate boxes as the ground-truth data of the target candidate boxes and to train the convolutional network, training ending when the convolutional network fits the distribution of the ground-truth data of the target candidate boxes;
a target prediction unit, configured to receive an image to be detected and to perform target prediction from the target regions and target scores output by the convolutional network.
2. The device according to claim 1, wherein the feature extraction network is the part of VGG-16 that remains after its last three fully connected layers are removed.
3. The device according to claim 1, wherein the unit for determining the label information of target candidate boxes and the mapping information of the target sample data comprises:
a module for obtaining the feature map to be mapped, configured to obtain the last-layer information of the feature extraction network and denote it the feature map to be mapped;
a module for generating target candidate boxes, configured to generate, on the feature map to be mapped and for each pixel location, target candidate boxes according to a target size S and a target aspect ratio R;
a module for determining the label information of target candidate boxes, configured to determine the label information of the target candidate boxes: if a target candidate box and a target annotation box of the target sample data have an intersection and the ratio of the intersection to the union is greater than a preset threshold T1, the target candidate box is marked as a positive sample; if the ratio of the intersection to the union is less than a preset threshold T2, the target candidate box is marked as a negative sample;
a module for determining the mapping information of the target sample data, configured to determine the mapping information of the target annotation boxes: the scaling ratio of the feature extraction network is computed, and the target annotation boxes of the target sample data are mapped onto the feature map to be mapped, obtaining the mapping information of the target annotation boxes.
4. The device according to claim 3, wherein the number of generated target candidate boxes is the product of the number of target sizes S and the number of target aspect ratios R.
5. The device according to claim 3, wherein the module for determining the label information of target candidate boxes further comprises:
a module for marking the target annotation boxes of the target sample data: if a target candidate box and a target annotation box of the annotated target sample data have an intersection and the ratio of the intersection to the union is greater than the preset threshold T1, the target candidate box is labeled a positive sample, and the target annotation box is marked as matched with a target candidate box;
a module for counting the target annotation boxes not matched with any target candidate box;
a re-matching module for the target annotation boxes not successfully matched with any target candidate box: for each target annotation box not successfully matched with a target candidate box, the ratio of intersection to union between this target annotation box and all target candidate boxes is computed and sorted; the target candidate box corresponding to the maximum ratio is labeled a positive sample.
6. The device according to claim 1, wherein the unit for determining the regression information of target candidate boxes comprises:
the label information of the target candidate boxes comprises positive samples and negative samples;
if the label information of a target candidate box is positive, the target annotation box with the largest ratio of intersection to union with the target candidate box is obtained; the offset of the target candidate box position relative to the target annotation box is computed, and the offset is taken as the regression information of the target candidate box.
7. The device according to claim 1, wherein the unit for training the convolutional network based on target candidate boxes comprises:
the label information of the target candidate boxes comprises positive samples and negative samples;
target candidate boxes whose label information is positive and target candidate boxes whose label information is negative are randomly selected to train the convolutional network, wherein the number of target candidate boxes labeled positive is equal to the number of target candidate boxes labeled negative.
8. The device according to claim 7, wherein the unit for training the convolutional network based on target candidate boxes comprises:
computing the objective cost function of the convolutional network, the objective cost function comprising a classification cost and a regression cost.
9. The device according to claim 8, wherein computing the objective cost function of the convolutional network comprises:
the classification cost is normalized by dividing by the number of target candidate boxes participating in the computation;
the regression cost is normalized by dividing by four times the number of positive-sample target candidate boxes.
10. The device according to claim 1, wherein the target prediction unit comprises:
a module for obtaining a first target set: the target regions of the convolutional network are obtained and denoted first target regions, and the target scores of the convolutional network are obtained and denoted first target scores; a mapping set of the first target regions and the first target scores is established and denoted the first target set;
a module for finding the target corresponding to the highest target score: the first target set is sorted according to the first target scores, and the highest target score in the current first target set is obtained;
a module for finding targets whose intersection-over-union with the top-score target exceeds a preset threshold: the ratio of intersection to union between the target region corresponding to the highest target score and each remaining target region in the first target set is computed; when the ratio of intersection to union is greater than a preset threshold T3, the target region corresponding to that ratio is deleted from the first target set;
a module for computing a second target set: the highest-score target is removed from the first target set and saved into the second target set;
a module for computing the detected target set: the first target set and the second target set are merged to give the target regions of the image to be detected.
CN201811385266.7A 2018-11-20 2018-11-20 Object detection device Pending CN109614990A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811385266.7A 2018-11-20 2018-11-20 Object detection device (CN109614990A)

Publications (1)

Publication Number Publication Date
CN109614990A 2019-04-12

Family

ID=66004223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811385266.7A Pending CN109614990A Object detection device

Country Status (1)

Country Link
CN (1) CN109614990A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106709936A (en) * 2016-12-14 2017-05-24 北京工业大学 Single target tracking method based on convolution neural network
CN106683091A (en) * 2017-01-06 2017-05-17 北京理工大学 Target classification and attitude detection method based on depth convolution neural network
CN107480730A (en) * 2017-09-05 2017-12-15 广州供电局有限公司 Power equipment identification model construction method and system, the recognition methods of power equipment
CN108537215A (en) * 2018-03-23 2018-09-14 清华大学 A kind of flame detecting method based on image object detection

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930408A (en) * 2019-10-15 2020-03-27 浙江大学 Semantic image compression method based on knowledge reorganization
CN110930408B (en) * 2019-10-15 2021-06-18 浙江大学 Semantic image compression method based on knowledge reorganization
CN111178126A (en) * 2019-11-20 2020-05-19 北京迈格威科技有限公司 Target detection method, target detection device, computer equipment and storage medium
CN111444828A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN112884055A (en) * 2021-03-03 2021-06-01 歌尔股份有限公司 Target labeling method and target labeling device
CN112884055B (en) * 2021-03-03 2023-02-03 歌尔股份有限公司 Target labeling method and target labeling device
CN113128575A (en) * 2021-04-01 2021-07-16 西安电子科技大学广州研究院 Target detection sample balancing method based on soft label
CN114219936A (en) * 2021-10-28 2022-03-22 中国科学院自动化研究所 Object detection method, electronic device, storage medium, and computer program product


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190412