CN107316058A - Method for improving target detection performance by improving target classification and localization accuracy - Google Patents

Method for improving target detection performance by improving target classification and localization accuracy

Info

Publication number
CN107316058A
CN107316058A
Authority
CN
China
Prior art keywords
image
candidate frame
target detection
frame
convolutional layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710450327.2A
Other languages
Chinese (zh)
Inventor
娄英欣
周芸
付光涛
姜竹青
门爱东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Beijing University of Posts and Telecommunications
Academy of Broadcasting Science of SAPPRFT
Original Assignee
National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television, Beijing University of Posts and Telecommunications filed Critical National News Publishes Broadcast Research Institute Of General Bureau Of Radio Film And Television
Priority to CN201710450327.2A priority Critical patent/CN107316058A/en
Publication of CN107316058A publication Critical patent/CN107316058A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/285Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for improving target detection performance by improving target classification and localization accuracy. Its technical features are as follows: image features are extracted with a convolutional neural network architecture, and the outputs of the convolutional layers up to layer M are selected for feature fusion, forming a multi-feature map; a grid is laid over convolutional layer M, and a fixed number of target candidate boxes of fixed sizes is predicted in each grid cell; the candidate boxes are mapped onto the feature maps and cropped, and the cropped results are concatenated into a multi-feature representation; after this result passes through fully connected layers, the image features are classified with a Softmax classifier, and online iterative regression localization is performed with an overlap-area loss function to obtain the final target detection result. The present invention is reasonably designed: features are extracted by a convolutional neural network and fused across multiple layers, the image features are then classified with a Softmax classifier, and localization uses the overlap-area loss function, yielding good object detection results.

Description

Method for improving target detection performance by improving target classification and localization accuracy
Technical field
The present invention belongs to the technical field of target detection, and in particular relates to a method for improving target detection performance by improving target classification and localization accuracy.
Background technology
More than 80% of the information humans perceive about the material world comes from vision. Images convey information to people in different forms; as an important carrier of information that reflects objective reality, an image is intuitive, rich in content and easy to exchange, and is an important component of multimedia, so various applications based on image processing technology have emerged, among which image recognition and detection are the most typical. The purpose of computer vision research is to use computers to perceive, recognize and understand the objective world. Object detection is one of the most common problems in computer vision; it has received extensive attention in the computer vision research community and has broad application prospects. When a machine "opens its eyes" and looks at the world, it must determine which targets are present in its field of view, what each of them is, and where it is located. Vision-based target detection is a cross-disciplinary research problem involving image processing, computer vision, pattern recognition and many other fields. The purpose of target detection is to pick targets out of backgrounds of varying complexity and to mark them with bounding boxes, so that subsequent tasks such as tracking and recognition can be completed. Target detection is therefore the basic task underlying high-level understanding and applications, and its performance directly affects higher-level tasks such as target tracking, action recognition and behavior understanding. Especially in complex scenes, when multiple targets must be processed in real time, automatic extraction and recognition of targets becomes particularly important. Object detection and recognition is thus the foundation of image analysis and understanding, and in-depth research on detection and recognition algorithms is of great significance to both academia and industry. For machines, however, complex backgrounds and the dynamic changes of the targets themselves increase the difficulty of recognition, the huge number of parameters and the high-dimensional matrix operations consume a large amount of processing time, and problems such as detection accuracy and real-time performance still need to be improved.
The main task of target detection is to automatically detect target objects in image sequences, including judging their category and locating their position. A popular current detection pipeline first generates 1K-2K candidate boxes on an image, then extracts features for each candidate box with a CNN (convolutional neural network), feeds the features into a per-class SVM or Softmax classifier to judge whether the target belongs to that class, and finally corrects the position of the candidate box with a regressor to localize the target precisely. Traditional detection algorithms use features such as SIFT, HOG and LBP to find feature points in the image that are invariant under translation, affine transformation and rotation, and match images on that basis to realize detection. However, the quality of the extracted features directly affects classification accuracy; because of the morphological diversity of targets, the diversity of illumination changes and the diversity of backgrounds, designing a robust hand-crafted feature is not easy, so the adaptability of traditional features is limited. Feature extraction based on CNNs, by contrast, has good robustness: a convolutional neural network is a multi-layer perceptron specially designed to recognize two-dimensional shapes, and its structure is highly invariant to translation, scaling, tilting and other common deformations. The CNN model used for feature extraction is obtained by pre-training: it is pre-trained on the full dataset of the ILSVRC 2012 visual recognition challenge, and the pre-trained model is then fine-tuned on the PASCAL VOC 2007 training set, so that image features can be extracted through the CNN.
The widespread use of deep learning in target detection dates from the AlexNet convolutional neural network architecture proposed by Alex Krizhevsky et al., which achieved outstanding results in the ILSVRC 2012 competition; since then, convolutional neural networks have been widely used in all kinds of image-related fields. AlexNet, from Geoffrey Hinton's group, is an 8-layer CNN architecture with 5 convolutional layers and 3 fully connected layers; it halved the error rate of the best algorithm at the time, demonstrated the effectiveness of CNNs as complex models, and showed that GPUs make it possible to obtain training results in an acceptable amount of time. In 2014, Christian Szegedy proposed the GoogLeNet architecture, which won first place in the ILSVRC 2014 classification task; unlike AlexNet, GoogLeNet is deeper (more layers) and wider (more kernels or neurons per layer). In the same year, the VGG-Net architecture proposed by Andrew Zisserman won first place in the ILSVRC 2014 localization task; unlike AlexNet, VGG-Net uses more layers, typically 16-19. In 2015, the ResNet architecture proposed by Kaiming He won first place in both the ILSVRC 2015 classification and localization tasks, using a 152-layer deep convolutional neural network. The success of Professor Hinton's work has attracted the attention of many scholars at home and abroad; meanwhile, industry has joined deep learning research, with Baidu, Google and Facebook setting up deep learning laboratories one after another and applying deep learning to image recognition and classification. Although researchers have proposed many target detection algorithms based on deep convolutional neural networks, and these algorithms have achieved good results, many aspects still need improvement, such as complex image backgrounds, fixed network input sizes, too many candidate boxes, slow training, high memory consumption, inaccurate detection of small objects, complicated procedures and imprecise localization.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to propose a reasonably designed, high-precision and highly stable method for improving target detection performance by improving target classification and localization accuracy.
The present invention solves its technical problem by adopting the following technical scheme:
A method for improving target detection performance by improving target classification and localization accuracy comprises the following steps:
Step 1: extract image features with a convolutional neural network architecture, and select the outputs of the convolutional layers up to layer M for feature fusion, forming a multi-feature map;
Step 2: lay a grid over convolutional layer M, and predict a fixed number of target candidate boxes of fixed sizes in each grid cell;
Step 3: map the candidate boxes onto the feature maps and crop them, then concatenate the cropped results into a multi-feature representation;
Step 4: after the above result passes through fully connected layers, classify the image features with a Softmax classifier, and perform online iterative regression localization with an overlap-area loss function to obtain the final target detection result.
The specific method of step 1 comprises the following steps:
(1) First, an image annotated with the ground-truth bounding boxes of its objects is input into the convolutional neural network architecture, and Caffe is used to extract the image features output by the different layers of the network;
(2) Max pooling is applied to the image features output by the earlier convolutional layers, and a deconvolution operation is applied to the image features output by convolutional layer M, so that all outputs have the same size as the output features of the middle convolutional layer;
(3) Finally, the features output by all selected convolutional layers are fused to obtain the multi-feature map of the image (a sketch of this fusion follows this list).
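The fusion in step 1 can be illustrated with a minimal sketch. It assumes PyTorch-style tensors rather than the Caffe implementation referred to above, and the layer names, channel counts and pooling/deconvolution strides are illustrative choices for a VGG-16-like backbone, not values fixed by the invention.

```python
# Minimal sketch of the multi-layer feature fusion in step 1 (PyTorch-style;
# the patent uses Caffe + VGG-16, so names and channel counts here are illustrative).
import torch
import torch.nn as nn

class MultiLayerFusion(nn.Module):
    def __init__(self, ch5=512):
        super().__init__()
        # Earlier layers are max-pooled down to the spatial size of the middle layer (conv3).
        self.pool1 = nn.MaxPool2d(kernel_size=4, stride=4)   # conv1 -> conv3 size
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)   # conv2 -> conv3 size
        # Convolutional layer M (conv5) is upsampled by deconvolution to the same size.
        self.deconv5 = nn.ConvTranspose2d(ch5, ch5, kernel_size=4, stride=4)

    def forward(self, f1, f2, f3, f5):
        u1 = self.pool1(f1)
        u2 = self.pool2(f2)
        u5 = self.deconv5(f5)
        # All four feature maps now share conv3's spatial size; fuse by channel concatenation.
        return torch.cat([u1, u2, f3, u5], dim=1)
```

For example, with a 512x512 input to such a backbone, the conv1, conv2 and conv5 outputs would all be pooled or upsampled to the conv3 resolution (1/4 of the input) before concatenation.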
The implementation method of step 2 comprises the following steps:
(1) A 6*6 grid is laid over the feature map output by convolutional layer M;
(2) 4 candidate boxes that may contain objects are predicted at the center of each grid cell; these 4 candidate boxes have fixed sizes and aspect ratios, the aspect ratios being 1:1, 1:2 and 2:1, and for the 1:1 aspect ratio only, 2 candidate-box sizes, 0.6 and 0.9, are set;
(3) During network training, the ground-truth bounding boxes of the objects are matched with the candidate boxes: candidate boxes whose IoU (intersection area over union area) with a ground-truth box is greater than or equal to 0.7 are retained, and candidate boxes extending beyond the image boundary are deleted;
(4) 100 candidate boxes are finally generated on the feature map of convolutional layer M (a sketch of the generation and matching follows this list).
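A minimal sketch of the candidate-box generation and matching in step 2 follows. Only the 6*6 grid, the four boxes per cell, the three aspect ratios, the IoU >= 0.7 rule and the boundary check come from the text; the pixel sizes and the way a single size is assigned to the 1:2 and 2:1 boxes are illustrative assumptions (the embodiment below uses 32*32 and 64*64 pixels for the 1:1 boxes).

```python
# Minimal sketch of candidate-box generation and matching; sizes are illustrative.
import numpy as np

def generate_candidates(img_w, img_h, grid=6, sizes=(32, 64)):
    """4 boxes per grid cell: aspect ratio 1:1 at two sizes, plus 1:2 and 2:1."""
    boxes = []
    for gy in range(grid):
        for gx in range(grid):
            cx = (gx + 0.5) * img_w / grid
            cy = (gy + 0.5) * img_h / grid
            shapes = [(sizes[0], sizes[0]),      # 1:1, small
                      (sizes[1], sizes[1]),      # 1:1, large
                      (sizes[1], sizes[1] * 2),  # 1:2 (assumed single size)
                      (sizes[1] * 2, sizes[1])]  # 2:1 (assumed single size)
            for w, h in shapes:
                boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

def iou(box, gt):
    """Intersection area over union area of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((box[2] - box[0]) * (box[3] - box[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_candidates(boxes, gts, img_w, img_h, thr=0.7):
    """Keep boxes inside the image whose IoU with some ground-truth box is >= 0.7."""
    inside = (boxes[:, 0] >= 0) & (boxes[:, 1] >= 0) & \
             (boxes[:, 2] <= img_w) & (boxes[:, 3] <= img_h)
    keep = [b for b, ok in zip(boxes, inside)
            if ok and any(iou(b, gt) >= thr for gt in gts)]
    return np.array(keep)
```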
The implementation method of step 3 comprises the following steps:
(1) The 100 candidate boxes generated on the feature map of convolutional layer M are mapped, according to their positions, onto the corresponding multi-layer feature maps and cropped accordingly;
(2) A 1*1 convolution is applied to the cropped feature-map block, and 3*3 and 5*5 convolutions are then applied to the result respectively;
(3) To obtain global context information, the multi-layer feature map is passed through a max pooling layer and then through a 1*1 convolutional layer and an activation layer;
(4) The outputs of the 1*1 convolution, the 3*3 convolution, the 5*5 convolution and the global context branch are concatenated in order, forming the multi-feature connection of the candidate box (see the sketch after this list).
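The multi-branch connection in step 3 can be sketched as follows (PyTorch-style). The channel counts, paddings and the exact placement of the activation are illustrative assumptions; only the 1*1 / 3*3 / 5*5 branches and the pooled context branch follow the description above.

```python
# Minimal sketch of the multi-branch "multi-feature connection" in step 3.
import torch
import torch.nn as nn

class MultiFeatureConnection(nn.Module):
    def __init__(self, in_ch, mid_ch=128):
        super().__init__()
        # 1x1 convolution first, reducing channels before the larger kernels.
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branch3 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(mid_ch, mid_ch, kernel_size=5, padding=2)
        # Context branch: max pooling followed by a 1x1 convolution and activation.
        self.context = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, crop):
        r = self.reduce(crop)          # 1x1 branch
        b3 = self.branch3(r)           # 3x3 branch
        b5 = self.branch5(r)           # 5x5 branch
        ctx = self.context(crop)       # pooled context branch
        # Concatenate the four outputs along the channel dimension.
        return torch.cat([r, b3, b5, ctx], dim=1)
```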
The concrete implementation of step 4 comprises the following steps:
(1) After passing through the fully connected layers, the image features are classified with a Softmax classifier; on the target detection dataset, each object class has its own corresponding precision;
(2) Regression localization is performed on the candidate boxes with the overlap-area loss function so that each candidate box moves closer to the ground-truth bounding box of its object; the loss function is based on the intersection area of the candidate box and the ground-truth box divided by their union area;
(3) The candidates are ranked according to the Softmax loss and the overlap-area loss values, positive and negative samples are selected online at a ratio of 3:1, and the updated sample library is fed back onto the multi-layer feature map to continue the iterative regression localization (a sketch of this selection follows this list);
(4) After N iterations, the candidate boxes are closer to the ground-truth bounding boxes of the objects, and once the model is well trained it can be tested on real objects.
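A minimal sketch of the online hard-sample selection in step 4 is given below. It assumes per-candidate loss values are already available as arrays and that the ratio means positives : negatives = 3 : 1 as stated above; how the two losses are combined into one ranking score is an illustrative assumption.

```python
# Minimal sketch of online hard-sample selection at a 3:1 positive/negative ratio.
import numpy as np

def select_hard_samples(softmax_loss, overlap_loss, is_positive, ratio=3):
    """Rank candidates by combined loss and keep positives and negatives at ratio:1."""
    total = np.asarray(softmax_loss) + np.asarray(overlap_loss)   # combined ranking score
    order = np.argsort(-total)                                    # hardest samples first
    pos = [i for i in order if is_positive[i]]
    neg = [i for i in order if not is_positive[i]]
    n_neg = min(len(neg), max(1, len(pos) // ratio))              # positives : negatives = 3 : 1
    return np.array(pos + neg[:n_neg])                            # indices fed back for the next iteration
```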
The advantages and positive effects of the present invention are:
1. The present invention inputs the image into a VGG-16 convolutional neural network to obtain more image information and extract image features, and then fuses multiple layers of the image features to form a multi-feature map. To obtain object candidate boxes quickly, target candidate boxes are generated with fixed aspect ratios and sizes on the feature map of convolutional layer 5 and mapped onto the multi-feature map for cropping. To obtain more information about each candidate box, the cropped results are concatenated into a multi-feature connection and fed into the fully connected layers. To achieve accurate classification and localization, Softmax classification and iterative regression localization with the Overlap loss function are performed, realizing the complete classification and localization of target detection and obtaining detection results better than mainstream detection frameworks such as Faster R-CNN.
2. The present invention is reasonably designed. It performs multi-feature extraction with a deep learning framework to obtain a multi-layer feature representation of the image, thereby achieving more accurate classification, and it adopts a new Overlap (overlap-area) loss function for localization, which detects the position of the target object in the input image more accurately and achieves good results on target detection datasets.
Brief description of the drawings
Fig. 1 is the overall block diagram of the present invention;
Fig. 2 shows the candidate boxes of fixed aspect ratios and sizes generated by the present invention on the feature map of convolutional layer 5;
Fig. 3 shows the Overlap loss function proposed by the present invention between a candidate box and the ground-truth bounding box during localization;
Fig. 4 shows the influence of different numbers of training iterations on the target detection precision of the present invention;
Fig. 5 is the table of target detection precision based on PASCAL VOC.
Detailed description of the embodiments
Embodiments of the present invention are further described below with reference to the accompanying drawings.
A method for improving target detection performance by improving target classification and localization accuracy is shown in Fig. 1. First, to obtain more image information, an image annotated with the ground-truth bounding boxes (Ground Truth) of its objects is input into a VGG-16 convolutional neural network to extract image features, and multiple layers of the image features are fused to form a multi-feature map. Then, to obtain object candidate boxes quickly, target candidate boxes are generated with fixed aspect ratios and sizes on the feature map of convolutional layer 5 and mapped onto the multi-feature map for cropping. Next, to obtain more information about each candidate box, the cropped results are concatenated into a multi-feature connection and fed into the fully connected layers. Finally, to achieve accurate classification and localization, Softmax classification and iterative regression localization with the Overlap loss function are performed, realizing the complete, precision-improved classification and localization of target detection. A specific example is described below:
S1: extract image features based on the VGG-16 convolutional neural network architecture, and fuse the features of convolutional layers 1, 2, 3 and 5 to form a multi-feature map;
S2: lay a grid over convolutional layer 5, and predict a fixed number of target candidate boxes of fixed sizes in each grid cell;
S3: map the candidate boxes onto the feature maps and crop them, then concatenate the cropped results into a multi-feature connection;
S4: after the above result passes through the fully connected layers, classify the image features with the Softmax classifier, and perform online iterative regression localization with the Overlap (overlap-area) loss function to obtain the final target detection result.
In the present embodiment, step S1 further comprises:
S1.1: first, the image annotated with the ground-truth bounding boxes of its objects is input into the VGG-16 convolutional neural network architecture, and Caffe is used to extract the image features output by the different layers of the network;
S1.2: max pooling is applied to the image features output by convolutional layers 1 and 2, and a deconvolution operation is applied to the image features output by convolutional layer 5, so that all outputs have the same size as the output features of convolutional layer 3;
S1.3: finally, the features output by convolutional layers 1, 2, 3 and 5 are fused to obtain the multi-feature map of the image.
Fig. 2 shows the grid laid over the feature map of convolutional layer 5 and the 4 candidate boxes of fixed aspect ratios and sizes generated in each grid cell. Step S2 further comprises:
S2.1: a 6*6 grid is laid over the feature map output by convolutional layer 5;
S2.2: 4 candidate boxes that may contain objects are predicted at the center of each grid cell; these 4 candidate boxes have fixed sizes and aspect ratios, the aspect ratios being 1:1, 1:2 and 2:1, and for the 1:1 aspect ratio only, 2 candidate-box sizes are set, namely 32*32 pixels and 64*64 pixels;
S2.3: during network training, the ground-truth bounding boxes of the objects are matched with the candidate boxes: candidate boxes whose IoU (intersection area over union area) with a ground-truth box is greater than or equal to 0.7 are retained, and candidate boxes extending beyond the image boundary are deleted;
S2.4: about 100 candidate boxes are finally generated on the feature map of convolutional layer 5.
In the present embodiment, step S3 further comprises:
S3.1: the 100 candidate boxes generated on the feature map of convolutional layer 5 are mapped, according to their positions, onto the corresponding multi-layer feature maps and cropped accordingly;
S3.2: a 1*1 convolution is applied to the cropped feature-map block, in order to retain the receptive field of the preceding layer and reduce computation, and 3*3 and 5*5 convolutions are then applied to the result respectively;
S3.3: to obtain global context information, the multi-layer feature map is passed through a max pooling layer and then through a 1*1 convolutional layer and an activation layer, which halves the amount of computation;
S3.4: the outputs of the 1*1 convolution, the 3*3 convolution, the 5*5 convolution and the global context branch are concatenated in order, forming the multi-feature connection of the candidate box.
In the present embodiment, step S4 further comprises:
S4.1: after the result of the convolutional layers and the multi-feature connection passes through 3 fully connected layers, the image features are classified with the Softmax classifier; based on the PASCAL VOC dataset, the classification results cover 20 object classes, each object class having its own corresponding precision;
S4.2: regression localization is performed on the candidate boxes with the Overlap (overlap-area) loss function so that each candidate box moves closer to the ground-truth bounding box of its object; the loss function is based on the intersection area of the candidate box and the ground-truth box divided by their union area, and the closer this value is to 1, the closer the two boxes are;
S4.3: the candidates are ranked according to the Softmax loss and the Overlap loss values, positive and negative samples are selected online at a ratio of 3:1, and the updated sample library is fed back onto the multi-layer feature map to continue the iterative regression localization;
S4.4: after N iterations, the candidate boxes are closer to the ground-truth bounding boxes of the objects, and once the model is well trained it can be tested on real objects.
Fig. 3 shows the Overlap loss function between a candidate box and the ground-truth bounding box proposed by the present invention for localization. Step S4.2 further comprises:
S4.2.1: the input image contains the ground-truth bounding box (Ground Truth) of an object; the coordinates of its upper-left and lower-right corners form a 4-dimensional vector, denoted here x* = (x1*, y1*, x2*, y2*). The object candidate box predicted by the algorithm of the invention has upper-left corner (x1, y1) and lower-right corner (x2, y2), and its coordinates form the 4-dimensional vector x = (x1, y1, x2, y2);
S4.2.2: the traditional coordinate loss is a one-dimensional loss function that sums the losses between the individual coordinate points to compute the overall position-offset loss; however, the traditional method treats each coordinate separately and cannot predict the offset loss between the ground-truth box and the candidate box as a whole. The traditional one-dimensional coordinate loss has the general form

L_coord(x, x*) = Σ_i [ ℓ(x_i, x_i*) + ℓ(y_i, y_i*) ], i = 1, 2,

where ℓ is a per-coordinate loss term (for example, the squared difference);
S4.2.3: in order to predict the offset loss between the ground-truth box and the candidate box as a whole, the 4-dimensional coordinates are regressed jointly, and the Overlap loss function computes the area-offset loss between the ground-truth box and the candidate box. Here I denotes the intersection area of the two boxes and U denotes their union area; the position deviation between the two boxes is evaluated by dividing the intersection area by the union area, and the closer this value is to 1, the better the coordinate regression. With the corners of the intersection box written as x1' = max(x1, x1*), y1' = max(y1, y1*), x2' = min(x2, x2*), y2' = min(y2, y2*), the quantities of the two-dimensional Overlap loss function are

I = (x2' − x1') × (y2' − y1')
U = (x2 − x1)(y2 − y1) + (x2* − x1*)(y2* − y1*) − I
Overlap = I / U
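The Overlap term above can be implemented as a differentiable function for the regression step, as in the following minimal sketch (PyTorch-style). The I and U definitions follow the formulas above; how the ratio enters the training objective is an illustrative assumption here (1 − I/U is used), since the text only states that values closer to 1 are better.

```python
# Minimal sketch of the two-dimensional Overlap term as a differentiable function.
import torch

def overlap_loss(pred, gt, eps=1e-6):
    """pred, gt: tensors of shape (N, 4) holding (x1, y1, x2, y2)."""
    x1 = torch.max(pred[:, 0], gt[:, 0])
    y1 = torch.max(pred[:, 1], gt[:, 1])
    x2 = torch.min(pred[:, 2], gt[:, 2])
    y2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)        # I
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter                                 # U
    overlap = inter / (union + eps)                                 # I / U, closer to 1 is better
    return (1.0 - overlap).mean()                                   # assumed training objective
```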
Fig. 4 shows the influence of different numbers of training iterations on target detection precision. Step S4.4 further comprises:
S4.4.1: during training, the candidates are ranked according to the Softmax loss and the Overlap loss values, hard positive and negative samples are selected at a ratio of 3:1, and these samples are fed back onto the multi-layer feature map to be cropped and connected into multi-features again; by mining hard samples, the robustness and detection precision of the proposed system are improved;
S4.4.2: as can be seen from Fig. 4, the classification precision of target detection improves greatly after several rounds of iterative training: after 1 iteration the precision is about 42%, and it rises rapidly as the iterations continue, but after 4 iterations the further improvement is small. Balancing accuracy and speed, the number of training iterations of the system is therefore set to 4, which improves both the classification precision and the regression precision of target detection overall.
The method of the present invention is tested below to illustrate its experimental effect.
Test environment: MATLAB 2014b; Caffe; Ubuntu 14.04 system; NVIDIA GTX 1070p GPU.
Test sequences: the selected test sequences and the corresponding standard ground-truth bounding boxes (Ground Truth) of the detection objects all come from the PASCAL VOC target detection dataset (M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2007). The dataset covers 20 classes: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train); indoor objects (bottle, chair, dining table, potted plant, sofa, TV). The chosen targets are the most common everyday objects, which better demonstrates the practicality of the algorithm. The dataset contains 9,963 images in total, with 24,640 annotated target objects.
Test metrics: the present invention uses two evaluation metrics, precision mAP (mean average precision) and speed fps (frames per second). mAP measures the accuracy of the detection results: the detections are compared with the ideal detection results and a weighted average is computed over all object classes in the database; computing this value for different algorithms shows that the algorithm of the invention obtains better results in the field of object detection. fps measures the speed of detection: it evaluates detection speed by how many frames can be processed per second during testing; computing this value for different algorithms shows the superiority of the algorithm of the invention in the field of object detection. A sketch of the per-class AP computation is given below.
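For illustration, a per-class average precision in the spirit of the PASCAL VOC metric can be computed from ranked detections as in the minimal sketch below; mAP is then the mean of the per-class AP values over the 20 classes. The 11-point interpolation follows the VOC 2007 convention, while the matching of detections to ground truth (which determines the true-positive flags) is assumed to be done beforehand.

```python
# Minimal sketch of a per-class average precision (AP) computation (VOC 2007 style).
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """scores: confidence per detection; is_true_positive: 1/0 flag per detection."""
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_true_positive)[order])
    fp = np.cumsum(1 - np.asarray(is_true_positive)[order])
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    # 11-point interpolation as in VOC 2007.
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        p = precision[recall >= t].max() if np.any(recall >= t) else 0.0
        ap += p / 11.0
    return ap
```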
The test results are shown in Fig. 5, which gives the average detection precision over all image classes in the PASCAL VOC dataset. It can be seen that the algorithm of the invention clearly improves mAP compared with other target detection algorithms, where "Invention 4" denotes the model trained with 4 rounds of loop iteration and "Invention 6" the model trained with 6 rounds. The best current target detector, Faster R-CNN, reaches an mAP of 73.2%, while Invention 6 reaches 74.2%, a detection precision 1.0% higher than Faster R-CNN. Moreover, for small objects such as bottles, aircraft and plants, the algorithm of the invention obtains higher detection precision than the other algorithms; for the small-object class "potted plant", for example, it reaches 50.4% mAP, 11.6% mAP higher than Faster R-CNN. These results show that the detection results produced by the algorithm of the invention have higher precision and that the algorithm better solves the problem of small target detection.
Table 1: Target detection speed based on PASCAL VOC
Table 1 gives the results for detection speed over all image classes in the PASCAL VOC dataset. It can be seen that the algorithm of the invention clearly improves fps compared with other target detection algorithms, where "compressed" means that the fully connected layers are compressed with truncated SVD (singular value decomposition); a sketch of this compression is given below. The fastest existing detector, Faster R-CNN, runs at 7 fps; the compressed Invention 4 runs at 12 fps, 2 fps faster than the uncompressed network, and the compressed Invention 6 also runs at 12 fps, again 2 fps faster than the uncompressed network. The speed of the algorithm of the invention is 22 times that of Fast R-CNN, which is close to real-time detection. These results show that the detection produced by the present invention is faster, and that in the two indicators of speed and precision the invention reaches the best detection results, illustrating that the algorithm of the invention is at the frontier of the field.
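The SVD compression of the fully connected layers mentioned for Table 1 can be sketched as follows: a weight matrix is factored into two thinner layers with a truncated singular value decomposition, which reduces the number of multiply-adds. The rank and layer sizes in the example are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch of truncated-SVD compression of a fully connected layer.
import numpy as np

def compress_fc(W, b, k):
    """Factor an FC layer y = W x + b (W: out x in) into two thinner layers of rank k."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = Vt[:k, :] * S[:k, None] ** 0.5          # first layer weights: k x in
    W2 = U[:, :k] * S[:k] ** 0.5                 # second layer weights: out x k
    return W1, W2, b                              # y is approximated by W2 @ (W1 @ x) + b

# Example: a 4096x4096 layer compressed to rank 256 keeps roughly 1/8 of the multiply-adds.
```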
It should be emphasized that the embodiments of the present invention are illustrative rather than restrictive; therefore, the present invention includes, and is not limited to, the embodiments described in the detailed description, and any other embodiments derived by those skilled in the art from the technical scheme of the present invention likewise fall within the scope of protection of the present invention.

Claims (5)

1. A method for improving target detection performance by improving target classification and localization accuracy, characterized by comprising the following steps:
Step 1: extract image features with a convolutional neural network architecture, and select the outputs of the convolutional layers up to layer M for feature fusion, forming a multi-feature map;
Step 2: lay a grid over convolutional layer M, and predict a fixed number of target candidate boxes of fixed sizes in each grid cell;
Step 3: map the candidate boxes onto the feature maps and crop them, then concatenate the cropped results into a multi-feature connection;
Step 4: after the above result passes through fully connected layers, classify the image features with a Softmax classifier, and perform online iterative regression localization with an overlap-area loss function to obtain the final target detection result.
2. The method for improving target detection performance by improving target classification and localization accuracy according to claim 1, characterized in that the specific method of step 1 comprises the following steps:
(1) First, an image annotated with the ground-truth bounding boxes of its objects is input into the convolutional neural network architecture, and Caffe is used to extract the image features output by the different layers of the network;
(2) Max pooling is applied to the image features output by the earlier convolutional layers, and a deconvolution operation is applied to the image features output by convolutional layer M, so that all outputs have the same size as the output features of the middle convolutional layer;
(3) Finally, the features output by all selected convolutional layers are fused to obtain the multi-feature map of the image.
3. The method for improving target detection performance by improving target classification and localization accuracy according to claim 1, characterized in that the implementation method of step 2 comprises the following steps:
(1) A 6*6 grid is laid over the feature map output by convolutional layer M;
(2) 4 candidate boxes that may contain objects are predicted at the center of each grid cell; these 4 candidate boxes have fixed sizes and aspect ratios, the aspect ratios being 1:1, 1:2 and 2:1, and for the 1:1 aspect ratio only, 2 candidate-box sizes, 0.6 and 0.9, are set;
(3) During network training, the ground-truth bounding boxes of the objects are matched with the candidate boxes: candidate boxes whose IoU (intersection area over union area) with a ground-truth box is greater than or equal to 0.7 are retained, and candidate boxes extending beyond the image boundary are deleted;
(4) 100 candidate boxes are finally generated on the feature map of convolutional layer M.
4. The method for improving target detection performance by improving target classification and localization accuracy according to claim 1, characterized in that the implementation method of step 3 comprises the following steps:
(1) The 100 candidate boxes generated on the feature map of convolutional layer M are mapped, according to their positions, onto the corresponding multi-layer feature maps and cropped accordingly;
(2) A 1*1 convolution is applied to the cropped feature-map block, and 3*3 and 5*5 convolutions are then applied to the result respectively;
(3) To obtain global context information, the multi-layer feature map is passed through a max pooling layer and then through a 1*1 convolutional layer and an activation layer;
(4) The outputs of the 1*1 convolution, the 3*3 convolution, the 5*5 convolution and the global context branch are concatenated in order, forming the multi-feature connection of the candidate box.
5. the degree of accuracy according to claim 1 by improving target classification and positioning improves target detection performance, it is special Levy and be:The concrete methods of realizing of the step 4 comprises the following steps:
By full articulamentum after, characteristics of image is classified by Softmax sorting algorithms, the data based on target detection Collection, has oneself corresponding precision per type objects;
(2) recurrence positioning is carried out to candidate frame by overlapping area loss function so that true bag of the candidate frame closer to object Peripheral frame, the loss function is candidate frame and the common factor area divided by union area of true encirclement frame;
(3) it is ranked up according to Softmax losses and overlapping area penalty values, positive sample and the ratio of negative sample is filtered out online For 3:1, renewal Sample Storehouse, which is input on multilayer feature figure, proceeds iterative regression positioning;
(4) after iteration n times, candidate frame closer to object true encirclement frame, model training well after can carry out actual object Test.
CN201710450327.2A 2017-06-15 2017-06-15 Improve the method for target detection performance by improving target classification and positional accuracy Pending CN107316058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710450327.2A CN107316058A (en) 2017-06-15 2017-06-15 Improve the method for target detection performance by improving target classification and positional accuracy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710450327.2A CN107316058A (en) 2017-06-15 2017-06-15 Improve the method for target detection performance by improving target classification and positional accuracy

Publications (1)

Publication Number Publication Date
CN107316058A true CN107316058A (en) 2017-11-03

Family

ID=60181717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710450327.2A Pending CN107316058A (en) 2017-06-15 2017-06-15 Improve the method for target detection performance by improving target classification and positional accuracy

Country Status (1)

Country Link
CN (1) CN107316058A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9501724B1 (en) * 2015-06-09 2016-11-22 Adobe Systems Incorporated Font recognition and font similarity learning using a deep neural network
US20160364633A1 (en) * 2015-06-09 2016-12-15 Adobe Systems Incorporated Font recognition and font similarity learning using a deep neural network
CN104881662A (en) * 2015-06-26 2015-09-02 北京畅景立达软件技术有限公司 Single-image pedestrian detection method
US20170083792A1 (en) * 2015-09-22 2017-03-23 Xerox Corporation Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN105488758A (en) * 2015-11-30 2016-04-13 河北工业大学 Image scaling method based on content awareness
CN106127204A (en) * 2016-06-30 2016-11-16 华南理工大学 A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106650725A (en) * 2016-11-29 2017-05-10 华南理工大学 Full convolutional neural network-based candidate text box generation and text detection method
CN106650699A (en) * 2016-12-30 2017-05-10 中国科学院深圳先进技术研究院 CNN-based face detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任少卿: "基于特征共享的高效物体检测", 《中国博士学位论文全文数据库(信息科技辑)》 *

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171112A (en) * 2017-12-01 2018-06-15 西安电子科技大学 Vehicle identification and tracking based on convolutional neural networks
CN108171112B (en) * 2017-12-01 2021-06-01 西安电子科技大学 Vehicle identification and tracking method based on convolutional neural network
CN108229341B (en) * 2017-12-15 2021-08-06 北京市商汤科技开发有限公司 Classification method and device, electronic equipment and computer storage medium
CN108229341A (en) * 2017-12-15 2018-06-29 北京市商汤科技开发有限公司 Sorting technique and device, electronic equipment, computer storage media, program
CN108229477A (en) * 2018-01-25 2018-06-29 深圳市商汤科技有限公司 For visual correlation recognition methods, device, equipment and the storage medium of image
CN108229477B (en) * 2018-01-25 2020-10-09 深圳市商汤科技有限公司 Visual relevance identification method, device, equipment and storage medium for image
CN108205687B (en) * 2018-02-01 2022-04-01 通号通信信息集团有限公司 Attention mechanism-based positioning loss calculation method and system in target detection system
CN108205687A (en) * 2018-02-01 2018-06-26 通号通信信息集团有限公司 Based on focus mechanism positioning loss calculation method and system in object detection system
WO2019179269A1 (en) * 2018-03-21 2019-09-26 广州极飞科技有限公司 Method and apparatus for acquiring boundary of area to be operated, and operation route planning method
CN108830131A (en) * 2018-04-10 2018-11-16 中科院微电子研究所昆山分所 Traffic target detection and distance measuring method based on deep learning
CN108830131B (en) * 2018-04-10 2021-05-04 昆山微电子技术研究院 Deep learning-based traffic target detection and ranging method
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN108830280B (en) * 2018-05-14 2021-10-26 华南理工大学 Small target detection method based on regional nomination
CN108830280A (en) * 2018-05-14 2018-11-16 华南理工大学 A kind of small target detecting method based on region nomination
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning
CN110555354A (en) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Feature screening method and apparatus, target detection method and apparatus, electronic apparatus, and storage medium
CN108805210B (en) * 2018-06-14 2022-03-04 深圳深知未来智能有限公司 Bullet hole identification method based on deep learning
CN108805210A (en) * 2018-06-14 2018-11-13 深圳深知未来智能有限公司 A kind of shell hole recognition methods based on deep learning
CN110610184A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient object of image
CN110610184B (en) * 2018-06-15 2023-05-12 阿里巴巴集团控股有限公司 Method, device and equipment for detecting salient targets of images
CN108830224A (en) * 2018-06-19 2018-11-16 武汉大学 A kind of high-resolution remote sensing image Ship Target Detection method based on deep learning
CN108830224B (en) * 2018-06-19 2021-04-02 武汉大学 High-resolution remote sensing image ship target detection method based on deep learning
CN109086779A (en) * 2018-07-28 2018-12-25 天津大学 A kind of attention target identification method based on convolutional neural networks
CN109086779B (en) * 2018-07-28 2021-11-09 天津大学 Attention target identification method based on convolutional neural network
CN110874641A (en) * 2018-08-29 2020-03-10 松下电器(美国)知识产权公司 Information processing method and information processing system
US10902314B2 (en) 2018-09-19 2021-01-26 Industrial Technology Research Institute Neural network-based classification method and classification device thereof
CN110956060A (en) * 2018-09-27 2020-04-03 北京市商汤科技开发有限公司 Motion recognition method, driving motion analysis method, device and electronic equipment
CN109493370A (en) * 2018-10-12 2019-03-19 西南交通大学 A kind of method for tracking target based on spatial offset study
CN109493370B (en) * 2018-10-12 2021-07-02 西南交通大学 Target tracking method based on space offset learning
CN109447066B (en) * 2018-10-18 2021-08-20 中国人民武装警察部队海警学院 Rapid and accurate single-stage target detection method and device
CN109447066A (en) * 2018-10-18 2019-03-08 中国人民武装警察部队海警学院 A kind of quick accurately single phase object detection method and device
CN109492685A (en) * 2018-10-31 2019-03-19 中国矿业大学 A kind of target object visible detection method for symmetrical feature
CN111126421A (en) * 2018-10-31 2020-05-08 浙江宇视科技有限公司 Target detection method, device and readable storage medium
CN109492685B (en) * 2018-10-31 2022-05-24 煤炭科学研究总院 Target object visual detection method for symmetric characteristics
CN109583483B (en) * 2018-11-13 2020-12-11 中国科学院计算技术研究所 Target detection method and system based on convolutional neural network
CN109508672A (en) * 2018-11-13 2019-03-22 云南大学 A kind of real-time video object detection method
CN109583483A (en) * 2018-11-13 2019-04-05 中国科学院计算技术研究所 A kind of object detection method and system based on convolutional neural networks
CN109697464A (en) * 2018-12-17 2019-04-30 环球智达科技(北京)有限公司 Method and system based on the identification of the precision target of object detection and signature search
CN111325075A (en) * 2018-12-17 2020-06-23 北京华航无线电测量研究所 Video sequence target detection method
CN111325075B (en) * 2018-12-17 2023-11-07 北京华航无线电测量研究所 Video sequence target detection method
CN109685008A (en) * 2018-12-25 2019-04-26 云南大学 A kind of real-time video object detection method
CN109784349A (en) * 2018-12-25 2019-05-21 东软集团股份有限公司 Image object detection model method for building up, device, storage medium and program product
CN109711326A (en) * 2018-12-25 2019-05-03 云南大学 A kind of video object detection method based on shallow-layer residual error network
CN109816012A (en) * 2019-01-22 2019-05-28 南京邮电大学 A kind of multiscale target detection method of integrating context information
CN109816012B (en) * 2019-01-22 2022-07-12 南京邮电大学 Multi-scale target detection method fusing context information
CN109918951A (en) * 2019-03-12 2019-06-21 中国科学院信息工程研究所 A kind of artificial intelligence process device side channel system of defense based on interlayer fusion
CN110245675B (en) * 2019-04-03 2023-02-10 复旦大学 Dangerous object detection method based on millimeter wave image human body context information
CN110245675A (en) * 2019-04-03 2019-09-17 复旦大学 A kind of dangerous objects detection method based on millimeter-wave image human body contextual information
CN110059667A (en) * 2019-04-28 2019-07-26 上海应用技术大学 Pedestrian counting method
CN110110722A (en) * 2019-04-30 2019-08-09 广州华工邦元信息技术有限公司 A kind of region detection modification method based on deep learning model recognition result
CN110222641A (en) * 2019-06-06 2019-09-10 北京百度网讯科技有限公司 The method and apparatus of image for identification
CN110348384B (en) * 2019-07-12 2022-06-17 沈阳理工大学 Small target vehicle attribute identification method based on feature fusion
CN110348384A (en) * 2019-07-12 2019-10-18 沈阳理工大学 A kind of Small object vehicle attribute recognition methods based on Fusion Features
CN110909604A (en) * 2019-10-23 2020-03-24 深圳市华讯方舟太赫兹科技有限公司 Security image detection method, terminal device and computer storage medium
CN110909604B (en) * 2019-10-23 2024-04-19 深圳市重投华讯太赫兹科技有限公司 Security check image detection method, terminal equipment and computer storage medium
CN111160353A (en) * 2019-12-27 2020-05-15 广州亚信技术有限公司 License plate recognition method, device and equipment
CN111968087A (en) * 2020-08-13 2020-11-20 中国农业科学院农业信息研究所 Plant disease area detection method
CN111968087B (en) * 2020-08-13 2023-11-07 中国农业科学院农业信息研究所 Plant disease area detection method

Similar Documents

Publication Publication Date Title
CN107316058A (en) Improve the method for target detection performance by improving target classification and positional accuracy
CN107563381B (en) Multi-feature fusion target detection method based on full convolution network
CN109034210A (en) Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
Wang et al. Autonomous garbage detection for intelligent urban management
Ouyang et al. DeepID-Net: Object detection with deformable part based convolutional neural networks
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
Zhang et al. Pedestrian detection method based on Faster R-CNN
CN109800628A (en) A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance
Wan et al. Ceramic tile surface defect detection based on deep learning
CN107818302A (en) Non-rigid multiple dimensioned object detecting method based on convolutional neural networks
CN107451602A (en) A kind of fruits and vegetables detection method based on deep learning
CN106446930A (en) Deep convolutional neural network-based robot working scene identification method
CN109932730A (en) Laser radar object detection method based on multiple dimensioned monopole three dimensional detection network
CN110321891A (en) A kind of big infusion medical fluid foreign matter object detection method of combined depth neural network and clustering algorithm
CN107808376A (en) A kind of detection method of raising one's hand based on deep learning
CN105243139A (en) Deep learning based three-dimensional model retrieval method and retrieval device thereof
CN106127161A (en) Fast target detection method based on cascade multilayer detector
CN110569926B (en) Point cloud classification method based on local edge feature enhancement
CN105787488A (en) Image feature extraction method and device realizing transmission from whole to local
Xu et al. Occlusion problem-oriented adversarial faster-RCNN scheme
Yang et al. Road crack detection using deep neural network with receptive field block
Ouadiay et al. Simultaneous object detection and localization using convolutional neural networks
Zhang et al. A precise apple leaf diseases detection using BCTNet under unconstrained environments
Wang et al. A review of object detection based on convolutional neural networks and deep learning
Zhang et al. Multiple Objects Detection based on Improved Faster R-CNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171103

RJ01 Rejection of invention patent application after publication