CN107818302A

CN107818302A - Non-rigid multi-scale object detection method based on convolutional neural networks

Info

Publication number: CN107818302A
Application number: CN201710989778.3A
Authority: CN
Inventors: 饶江浩; 徐智勇; 张建林
Original assignee: Institute of Optics and Electronics of CAS
Current assignee: Institute of Optics and Electronics of CAS
Priority date: 2017-10-20
Filing date: 2017-10-20
Publication date: 2018-03-20

Abstract

The invention discloses a non-rigid multi-scale object detection method based on convolutional neural networks. A deep network is first designed and trained for target detection on the representative infrared pedestrian data sets CVC-09 and CVC-14 (which contain small targets of about 20x30 pixels as well as large pedestrian targets at closer range) together with self-annotated small-target pedestrian data; different detection models are then designed for the hardware platform and evaluated. The designed network divides the image into a 7x7 grid and uses 6 candidate boxes for each grid cell to complete region proposal, and detects pedestrian targets through classification and position regression. Using a neural network in the design increases the feature extraction ability of the detection model. The grid-division approach makes detection real-time, while data sets with different target scales, multi-scale training and an increased number of candidate boxes ultimately strengthen the detection of small targets.

Description

Non-rigid multi-scale object detection method based on convolutional neural networks

Technical field

The present invention relates to the technical field of real-time object detection, and in particular to a method for detecting non-rigid multi-scale objects based on convolutional neural networks.

Background art

As shown in Fig. 3, a neuron is a mathematical model that imitates the structure of a biological nerve cell: different inputs (such as different pixels) are summed with weights (as in the convolution operation of a convolutional neural network), and the sum is transformed at the node by a nonlinear function (the activation function) to give the output of that neuron node. As shown in Fig. 4, many neuron nodes form a neural layer, and many neural layers build up a neural network. Because the input passes through multiple layers of nonlinear transformation (a nesting of nonlinear functions), a neural network shows a strong ability to fit (learn) the training data and therefore good predictive ability.
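The neuron, layer and network described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sigmoid activation and the function names are assumptions, not details given in the patent.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a nonlinear activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid, chosen only for illustration

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs, layers):
    """A network nests layers: each layer's output feeds the next one."""
    for weight_rows, biases in layers:
        inputs = layer(inputs, weight_rows, biases)
    return inputs
```

Each call to `layer` applies one nonlinear transformation, so `network` is the nesting of nonlinear functions that the text refers to.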

A neural network model has two stages, training and testing. In training, training pictures are input and the resulting output is compared with the standard output (the annotation); the error is expressed by a loss function, which is reduced by optimizing the parameters. Testing evaluates the performance of the trained model, and the generalization ability of the model can be judged from the test results.

In detection tasks, the network model has three main components, one of which is region proposal (selecting regions that may contain a target, commonly by sliding window or selective search). In traditional methods, however, the sliding window is in effect an exhaustive search with much redundancy in the process, and selective search takes considerable time. End-to-end detection (which integrates localization and recognition) instead uses grid division: each region determines by itself whether it contains a target, and the results are then combined to give the final result.

Summary of the invention

The technical problem to be solved by the invention is as follows: in real-time target detection on mobile and small devices, non-rigid multi-scale targets vary in form and angle across scenes, so traditional detection methods have low generality and robustness. The scheme of using a neural network and searching over grid-divided regions addresses the generality and robustness problems to a certain extent while also ensuring real-time detection.

The technical solution adopted by the present invention to solve the above technical problem is: a method for detecting non-rigid multi-scale objects (such as infrared pedestrian images) based on convolutional neural networks, in which a neural network detection system with strong feature extraction and learning ability is constructed so that abstract features of non-rigid objects at different scales are learned from the data set; the program is then ported to a development platform, and images are input through a hardware device for real-time target detection.

Further, the method has a suitable and accurately annotated data set (supervised learning). Because neural network training data sets lack training data for smaller infrared targets, and because neural networks currently have a bottleneck in detecting smaller targets, non-standard data sets were found and screened online, scripts were written to convert their annotations into data usable by the network and, on this basis, the data were expanded by shooting and annotating new images, forming a trainable data set.

Further, the method has a model with strong feature extraction and learning ability. Non-rigid targets (such as people) differ considerably across angles and postures and appear at different scales, so the performance of conventional methods is limited in varied scenes. The method therefore adopts a classification model with strong feature extraction ability, a deep network, in which dense connections between the different network layers make full use of the features of every layer.

Further, real-time detection can be realized on a hardware platform. The features of non-rigid targets are extracted, and the regression layer and classification layer of the neural network then perform target identification and position estimation on the candidate boxes. This end-to-end process completes identification and localization of the target at the same time. Dividing the image into grid regions and applying the above operations to each region simplifies the search for possible targets and completes the target/non-target decision in a single pass, which ensures real-time detection. Finally, non-maximum suppression removes redundant detection outputs.

The advantages of the present invention over the prior art are:

(1) A feature of the present invention is that the end-to-end network structure ensures real-time detection. Compared with general models, it uses more average pooling to reduce the amount of data to be computed and, through dense connection, improves network performance without increasing network depth. The architecture of the neural network is also concise and highly portable, so that the trained network can be applied directly to target detection in imaging systems, putting the detection technique into practice and giving small devices intelligent detection.

(2) The present invention applies augmentation to the training data and also uses the positive samples in the database (in which the target fills the whole picture) as training data for target detection, so that the features of the target itself are learned in a more targeted way.

Brief description of the drawings

Fig. 1 is a flow chart of the network training process of the method of the invention;

Fig. 2 is a flow chart of the network detection process of the method of the invention;

Fig. 3 is the mathematical model of a neuron node in a neural network;

Fig. 4 is the basic model of a neural network;

Fig. 5 is the dense connection model;

Fig. 6 is a schematic diagram of the grid division of the input image;

Fig. 7 illustrates the network, after feature extraction, classifying and identifying the grid regions where the target is located;

Fig. 8 shows the regression of the region where the target is located using candidate boxes;

Fig. 9 is the final output after non-maximum suppression removes redundant detection boxes;

Fig. 10 is a chart of the change in the loss function during training.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

As shown in Fig. 1, the non-rigid multi-scale object detection method based on convolutional neural networks of the present invention comprises the following steps:

Step 1: input images with annotations.

Fig. 1 shows the whole training process. Annotated training data (images with their annotations) are input into the neural network, and the output is obtained by forward propagation. The output is compared with the annotation to obtain the error (measured by the loss function; the goal of training is to reduce the loss function), and the parameters of the neural network are updated by back-propagating the error gradient. This cycle is repeated so that the loss function decreases and converges.
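The forward pass, loss, gradient and parameter update cycle described above can be illustrated on a deliberately tiny model. The sketch below fits a line `y = w * x + b` with a mean squared error loss; the model, loss, learning rate and function name are assumptions chosen for brevity and are not the detection network or loss of the patent.

```python
def train(samples, lr=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    history = []  # loss value at each iteration
    n = len(samples)
    for _ in range(epochs):
        # Forward propagation: compute outputs and compare with the annotations.
        preds = [w * x + b for x, _ in samples]
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, samples)) / n
        history.append(loss)
        # Backward step: gradient of the loss with respect to each parameter.
        grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, samples)) / n
        grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, samples)) / n
        # Parameter update in the direction that reduces the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history
```

Plotting `history` for such a run gives a decreasing curve of the same kind as the loss chart of Fig. 10.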

During training, in addition to the traditional methods of translation, multi-scale training and adjusting image contrast, other data augmentation methods are used to increase the image data available for training, such as:

(1) Targets in the annotated data set are cropped out and become new targets whose annotation box is the size of the whole image.

(2) The cropped target images are randomly combined and placed on new background images to form new training samples.
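The two augmentation methods above can be sketched as follows. Images are represented as nested lists of pixel values so that the example needs no external library; the function names and the `(x0, y0, x1, y1)` end-exclusive box convention are assumptions made for illustration.

```python
import random

def crop(image, box):
    """Cut a target out of an image; box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def full_image_label(patch):
    """Augmentation (1): the crop is a sample whose box covers the whole image."""
    return (0, 0, len(patch[0]), len(patch))

def paste(background, patch, rng):
    """Augmentation (2): place a crop at a random position on a new background.

    Returns the new image and the annotation box of the pasted target.
    The patch is assumed to be no larger than the background.
    """
    ph, pw = len(patch), len(patch[0])
    bh, bw = len(background), len(background[0])
    x0 = rng.randint(0, bw - pw)
    y0 = rng.randint(0, bh - ph)
    out = [row[:] for row in background]  # copy so the background is not modified
    for dy in range(ph):
        out[y0 + dy][x0:x0 + pw] = patch[dy]
    return out, (x0, y0, x0 + pw, y0 + ph)
```

Calling `paste` several times on the same background, each time with a different crop, gives the random combination of targets described in (2).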

Step 2: divide the image into different regions with a 7 × 7 grid.

Besides a 7 × 7 grid division of the input image, the division size can be chosen according to the specific task. For example, the above division is used for the smaller targets in infrared pedestrian detection. If large underwater fish are to be detected, the grid division can instead be applied to a later feature layer of the neural network (because, in the network, the earlier convolutional layers extract mostly edge and contour features, whereas the later ones extract more abstract semantic features), and the information of that feature layer is fed into the detection part at the end of the network.

The division scale and the feature layer can be chosen flexibly according to the specific task, on the premise that the feature parameters are few and the computational load of the network is increased as little as possible.
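As a small illustration of the grid division, the sketch below maps an annotated box to a grid cell and counts the candidate boxes produced by a 7 × 7 grid with 6 boxes per cell. Assigning a target to the cell that contains its centre is an assumption borrowed from common grid-based detectors; the patent does not state the assignment rule, and the function names and the 448 × 448 image size used in the usage note are hypothetical.

```python
def grid_cell(box, image_w, image_h, s=7):
    """Return (row, col) of the s x s grid cell containing the box centre.

    box = (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    col = min(int(cx * s / image_w), s - 1)  # clamp centres on the right edge
    row = min(int(cy * s / image_h), s - 1)  # clamp centres on the bottom edge
    return row, col

def candidate_count(s=7, boxes_per_cell=6):
    """Total number of candidate boxes evaluated in one forward pass."""
    return s * s * boxes_per_cell
```

For a 448 × 448 image, a 20 × 30 pixel pedestrian in the top-left corner falls in cell `(0, 0)`, and the whole image yields `7 * 7 * 6 = 294` candidate boxes.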

Step 3: extract features from the whole image.

Feature extraction is performed by the convolutional layers of the neural network. The convolution kernel parameters of these layers are first initialized; during training, the parameters are adjusted continually to reduce the loss function, finally giving the trained network model (whose parameters, changed by training, enable the convolution operations to extract image features, so that forward propagation then detects the targets). The purpose of data augmentation is to increase the number of training samples so that the network learns more abstract and more general features of the target during training.

Step 4: detect each region with candidate boxes.

The role of the candidate boxes is to provide a prior for the detection target. In a specific detection task, several selected boxes of fixed size are used to localize within each local region, and the results are merged and de-duplicated in subsequent steps to obtain the final detection result. The sizes of the candidate boxes are obtained by k-means clustering of the annotation information in the training data. The number of candidate boxes can be set as required: more boxes reduce speed but improve precision.
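Choosing candidate-box sizes by k-means can be sketched as below. The patent only states that k-means is applied to the annotation information; using `1 - IoU` as the clustering criterion (boxes are assigned to the centroid with the highest IoU), the deterministic initialization and the function names are assumptions for illustration.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (w, h) and aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_boxes(sizes, k, iters=50):
    """Cluster annotated (w, h) sizes into k candidate-box sizes."""
    centroids = list(sizes[:k])  # deterministic start, for illustration only
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in sizes:
            # Assign each annotated size to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(s, centroids[i]))
            groups[best].append(s)
        new = []
        for i, g in enumerate(groups):
            if g:
                new.append((sum(w for w, _ in g) / len(g),
                            sum(h for _, h in g) / len(g)))
            else:
                new.append(centroids[i])  # keep a centroid that attracted no box
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return sorted(centroids)
```

With `k = 6` this would give the six candidate-box sizes used per grid cell in the pedestrian example; a smaller `k` trades precision for speed, as noted above.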

Step 5: classification and regression of the candidate boxes.

Classification and regression are performed by softmax (the classification layer) and the bbox regressor (the bounding-box regression layer), both of which are commonly used in neural network models.
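The two operations can be sketched as follows. The softmax is the standard definition; the box parameterisation in `apply_regression` (centre shifts scaled by box size, log-scale width and height changes) is one common choice and is an assumption here, since the patent does not specify how the regression outputs are encoded.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_regression(anchor, offsets):
    """Refine a candidate box (cx, cy, w, h) with predicted (dx, dy, dw, dh).

    Assumed parameterisation: the centre moves by a fraction of the box size
    and the width and height are scaled exponentially.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))
```

Zero offsets leave the candidate box unchanged, so the candidate box acts as the prior and the regression layer only has to learn corrections to it.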

Step 6: non-maximum suppression.

Non-maximum suppression keeps, according to a set threshold, the boxes and classes whose confidence and class probability are above the threshold and removes the other boxes and classes, giving the final position and class information.
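A minimal sketch of this step is given below. It first discards boxes under a score threshold, as described above, and then applies the usual greedy overlap test; the IoU-based overlap criterion, the threshold values and the function names are assumptions, as the patent only mentions a set threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, score_thresh=0.5, iou_thresh=0.5):
    """detections: list of (box, score). Keep confident, non-redundant boxes."""
    candidates = sorted((d for d in detections if d[1] >= score_thresh),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

Applied per class, this leaves one box per detected target, which corresponds to the final output illustrated in Fig. 9.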

Step 7: identify and localize the target, and obtain the detection error from the labels.

The detection process simply uses the neural network model with trained parameters to obtain the final output, as shown in Fig. 2. In the training process, the output is compared with the annotation, the error is back-propagated layer by layer, and the parameters are updated by gradient descent so that the model output approaches the annotation (reducing the loss function value), completing the training, as shown in Fig. 1.

The present invention detects small non-rigid objects (such as pedestrians) by a method based on convolutional neural network feature extraction: a convolutional neural network detection model is built, and an image database is imported to train and optimize the model parameters so as to achieve good detection results. A deep network is first designed and trained for target detection on the representative infrared pedestrian data sets CVC-09 and CVC-14 (which contain small targets of about 20x30 pixels as well as large pedestrian targets at closer range) together with self-annotated small-target pedestrian data; different detection models are then designed for the hardware platform and evaluated. The designed network divides the image into a 7x7 grid and uses multiple candidate boxes for each grid cell to complete region proposal, and detects pedestrian targets through classification and position regression. Using a neural network in the design increases the feature extraction ability of the detection model. The grid-division approach makes detection real-time, while data sets with different target scales, multi-scale training and an increased number of candidate boxes ultimately strengthen the detection of small targets.

Concatenating the feature maps of different layers of the neural network strengthens the learning ability of the network. Using the densely connected network model also ensures that the network has fewer parameters despite having more layers, which makes training easier, simplifies computation, and favors use on small platforms with limited computing power.

Dense connection between network layers is combined with the end-to-end technique, and multiple candidate boxes (for example, 6 in pedestrian detection) are chosen by k-means clustering. The detector is connected at the end of the network or at the output of a transition layer, and one or more detectors are used as required. Dense connection is shown in Fig. 5.

In Fig. 5, every layer takes the outputs of all preceding layers as its input. A traditional network with L layers has L connections in total, whereas a densely connected network has L(L+1)/2. In the figure, layer H4 can use the original input X0 directly and also uses the information that the preceding layers obtained by processing X0, which maximizes the flow of information. During back-propagation, the gradient information of X0 contains the derivative of the loss function taken directly with respect to X0, which helps gradient propagation. Because of dense connection and weight sharing, the network used for detecting non-rigid objects (such as pedestrians) is deep enough (more than 300 layers) while having few enough parameters (a storage size of only about 60 MB).
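The connection counts quoted above can be checked with a short sketch. The function names are illustrative; index 0 stands for the original input X0 and index i for the output of layer Hi.

```python
def traditional_connections(num_layers):
    """Each layer receives only the output of the layer before it."""
    return num_layers

def dense_connections(num_layers):
    """Each layer receives the input and the outputs of all earlier layers."""
    return num_layers * (num_layers + 1) // 2

def dense_inputs(num_layers):
    """For each layer, the indices of the feature maps it receives (0 = X0)."""
    return {layer: list(range(layer)) for layer in range(1, num_layers + 1)}
```

Summing the lengths of the lists returned by `dense_inputs(L)` gives `1 + 2 + ... + L`, which is the L(L+1)/2 stated in the text; for example, layer 4 receives feature maps 0 to 3, that is, X0 and the outputs of the three layers before it.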

Combining the grid-division proposal technique with the data set gives good detection results. Depending on the specific task, detectors can be connected to different transition layers of the densely connected network so that detection results are combined across multiple scales. Alternatively, a detector connected at a single output (for infrared pedestrian detection, at the end of the network) can also give good results. The flow is shown in Figs. 6-9: the image is divided into a grid; after feature extraction, the image in each region is classified and localized with candidate boxes; finally, the results are merged and redundancy is removed to obtain the position and class information of the detected targets.

The neural network framework used by the present invention is darknet. The hardware development platform uses a GTX1080 to accelerate network training, and the hardware application platform is an Nvidia Jetson TX1.

First, the CUDA and cuDNN acceleration libraries and darknet are installed and configured. On this basis, Python scripts are written to convert the annotation data of the CVC series, and labelImg is used to annotate additional training samples.
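A conversion script of the kind mentioned above might contain a helper like the following. The output is the label line format used by darknet for YOLO training (class index followed by the box centre, width and height, all normalised by the image size); the input format, pixel corner coordinates, is an assumption, since the patent does not describe the original annotation layout, and the function name is hypothetical.

```python
def to_darknet_line(class_id, box, image_w, image_h):
    """Convert one annotation to a darknet label line.

    box = (x_min, y_min, x_max, y_max) in pixels (assumed source format).
    Returns "class cx cy w h" with coordinates normalised to [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / image_w
    cy = (y_min + y_max) / 2.0 / image_h
    w = (x_max - x_min) / float(image_w)
    h = (y_max - y_min) / float(image_h)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, cx, cy, w, h)
```

One such line is written per annotated target, typically into a text file that shares its base name with the image.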

The network structure is modified and the network's candidate boxes are evaluated; in the network structure configuration, Route layers are used to lead out to the detector. The training data are augmented by translation, rotation and similar means. A suitable batch size and number of data imports are selected to control data throughput during training, and a suitable learning rate, weight decay rate and so on are selected to ensure convergence of the optimization and to avoid converging to a local optimum. Iterative training on the GPU reduces the loss function; part of the process is shown in Fig. 10.

Finally, the most suitable trained weights are evaluated, selected and ported to the application platform.

Claims

1. A non-rigid multi-scale object detection method based on convolutional neural networks, characterized in that: a neural network detection system with strong feature extraction and learning ability is constructed so that abstract features of non-rigid objects at different scales are learned from the data set; the program is then ported to a development platform, and images are input through a hardware device for real-time target detection.

2. The non-rigid multi-scale object detection method based on convolutional neural networks according to claim 1, characterized in that: the method has a suitable and accurately annotated data set; because neural network training data sets lack training data for smaller infrared targets, and because neural networks currently have a bottleneck in detecting smaller targets, non-standard data sets are found and screened online, scripts are written to convert their annotations into data usable by the network and, on this basis, the data are expanded by shooting and annotating new images, forming a trainable data set.

3. The non-rigid multi-scale object detection method based on convolutional neural networks according to claim 1, characterized in that: the method has a model with strong feature extraction and learning ability; because non-rigid targets differ considerably across angles and postures and appear at different scales, the performance of conventional methods is limited in varied scenes; the method adopts a classification model with strong feature extraction ability, a deep network, in which dense connections between the different network layers make full use of the features of every layer.

4. The non-rigid multi-scale object detection method based on convolutional neural networks according to claim 1, characterized in that: real-time detection can be realized on a hardware platform; the features of non-rigid targets are extracted, and the regression layer and classification layer of the neural network then perform target identification and position estimation on the candidate boxes; this end-to-end process completes identification and localization of the target at the same time; dividing the image into grid regions and applying the above operations to each region simplifies the search for possible targets and completes the target/non-target decision in a single pass, which ensures real-time detection; finally, non-maximum suppression removes redundant detection outputs.