CN104573731B

CN104573731B - Fast target detection method based on convolutional neural networks

Info

Publication number: CN104573731B
Application number: CN201510061852.6A
Authority: CN
Inventors: 王菡子; 郭冠军; 严严
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2015-02-06
Filing date: 2015-02-06
Publication date: 2018-03-23
Anticipated expiration: 2035-02-06
Also published as: CN104573731A

Abstract

Fast target detection method based on convolutional neural networks, is related to computer vision technique.Convolutional neural networks parameter is trained first with training set, then solving the problems, such as that maximum pondization is lost feature and generated using the mode of expander graphs differentiates complete characteristics figure；The full connection weight of convolutional neural networks is regarded as linear classifier, general magnificent error of the linear classifier on complete characteristics are differentiated is estimated using possible approximate learning framework；Linear classifier number needed for being estimated according to general magnificent error and desired extensive error threshold, finally complete target detection with mode of the linear classifier based on smoothing windows on complete characteristics figure is differentiated.Significantly improve detection efficiency and target detection precision.

Description

Fast target detection method based on convolutional neural networks

Technical field

The present invention relates to computer vision technique, is specifically related to a kind of fast target detection based on convolutional neural networks Method.

Background technology

One important sources in the human perception world are exactly to pass through image information, and research shows, the mankind obtain external information In about 80%~90% information come from human eye acquisition image information.The mankind perceive energy to extraneous image information Power is very high, can be with quickly positioning target and analysis target.Computer will have powerful visually-perceptible and understandability, just should Possesses the powerful Target detection and identification ability of the similar mankind.Target detection is one of visually-perceptible and object understanding critically important Premise work, the efficiency and precision of Target Acquisition decide the speed and effect of visually-perceptible.Once computer possesses similar The powerful target detection box perception of the mankind, it is possible to preferably substitute manual work in every profession and trade, greatly save and be produced into This.It also provides powerful support to daily intelligent Service industry in addition.Therefore, to the target detection skill in computer vision Art is furtherd investigate, and is improved constantly the accuracy rate of detection, is had important practical significance.

The development trend for solving the two problems in academic circles at present is from heuristic is used to using machine learning Method.Feature used also turns to the feature of task based access control extracted in self-adaptive from manual feature.The model of Target detection and identification Also begin to occur the functional shift for detecting and recognizing multi-target detection and identification from single specific objective while carry out.Most typically Example be exactly deep learning mould appearance, solve conventional target detection and identification model just for limited task target The problem of detection and identification mission are effective.For example in 2001, the obverse face detection framework that Viola-Jone is proposed was based on Harr features are quite effective to Face datection, but are not very good for side face face and pedestrian detection effect.Until 2005 Year, Dalal et al. proposes HOG features and using SVM to HOG (Histogram of corresponding to each smoothing windows Gradient) after the strategy that feature is classified, vertical pedestrian's Detection results just have the breakthrough of a matter.However, HOG this Manual feature, do not made for the Detection results of the targets such as the pedestrian of image classification and identification and any attitude, animal, plant People is satisfied with.Then deformation model (Deformable Part Models:DPM the target detection for solving have deformation of) arising at the historic moment is asked Topic.Although deformation model tries to solve the problems, such as because deformation causes target detection less than but deformations needed in its model Be difficult preferably to capture in part reality, the feature that reason is become reconciled with regard to the good model of neither one carrys out identification component, therefore it Effect is not very good on multi-class targets detection data set (PASCAL VOC, ImageNet).A nearest breakthrough sex work It is the appearance of deep learning model.In the image classification and target detection data set ImageNet of maximum, based on deep learning The raising for the Target detection and identification precision that one of model convolutional neural networks (CNN) are done is even more to exceed conventional one times of full accuracy As many as.Nearest 2 years ImageNet data sets classification and the outstanding algorithm almost all of detection performance use convolutional neural networks, They different network structures is different.ImageNet data sets epigraph classification at present and target detection highest precision difference For 95% and 55%.

Although very high precision is improved on Target detection and identification based on the method for convolutional neural networks, due to Convolutional neural networks network is complicated and computationally intensive, and it is not very high to apply the efficiency in target detection, and current many methods are all It is object detection program to be accelerated based on GPU.A target image is given, target detection is done using smoothing windows strategy, Even if being accelerated using GPU, its algorithm complex is still very big, extremely inefficient.In order to solve convolutional neural networks in target detection On efficiency, the solution of academic circles at present main flow can be divided into three classes.The first kind is the method cut based on figure, first right Given image carries out image segmentation, and some potential target areas are obtained by splitting block.Then with convolutional neural networks to this A little target areas carry out feature extraction and classification, finally give the position of target.The shortcomings that this method is exactly to become dependent upon figure As the quality of segmentation.Second class is to extract feature to original image by convolutional neural networks, then with smooth on characteristic pattern Window strategy does the recurrence of target location and the classification of target.This method is extracting feature using convolutional neural networks to big figure When, some can be lost to classifying and returning useful feature information, therefore the performance of last model be unable to reach it is optimal.3rd Class method is then to find part with the advantage of convolutional neural networks classification, and then builds deformation model, using deformation model Thought detects to target.But what the target detection in this classification and deformation model convolutional neural networks was performed separately Way so that the Detection results of general frame are not very outstanding, and the efficiency of this model is nor very high in addition.

The content of the invention

It is an object of the invention to propose a kind of fast target detection method based on convolutional neural networks.

The present invention comprises the following steps：

A training sample set (x) is prepared_i, y_i), i=1 ... ..., N, N is number of training, and N is natural number, x_iRepresent training The image of fixed size corresponding to sample, comprising target and target, the image full of picture is positive sample, and other images are negative samples This；y_iRepresent sample class vector：

B) m batches are divided by all training samples, the convolutional neural networks that m-2 lot samples are originally put into design are calculated with backpropagation Method is trained, and 2 lot sample sheets are used to test, and the convolutional neural networks include convolutional layer, maximum pond (max-pooling) layer drawn game Portion's contrast normalization (local contrast normalization) three composition of layer, these three compositions play the part of nonlinear function Function, it is therefore an objective to the original image in manifold is mapped to theorem in Euclid space, the activation primitive of convolution uses non-linear school Positive unit (Rectifier linear unit), can make it that the aspect ratio after convolution is sparse, design comprising these three into After the network structure model divided, model is trained with back-propagation algorithm training, obtains parameter W；

C) extraction step B) in the parameter W trained, solved by the way of expander graphs in convolutional neural networks maximum The problem of information, is lost in pond (max-pooling), gives test image, conventional maximum pond (max-pooling) is (namely Using skew A be used as starting point mode) operation after can only obtain expander graphs (A), other be advantageous to classify information can lose, Therefore for each being offset in 2 × 2 convolution kernel, a corresponding characteristic pattern, referred to as expander graphs are all preserved；For each layer Max-pooling cores size is K, and whole convolutional neural networks have p layers, each maximum pond (max-pooling) after extension Down-sampling layer feature map number extend to 2^KTimes, whole network expands to o=(2^K)^pTimes, then parameter W is used for arbitrarily large Small image to be detected, the extension feature figure before full connection is obtained, it is referred to as to differentiate complete characteristics (Discriminative Complete Features)；Given input picture x, the wave filter group K trained and skew b, the output of convolutional layer can be with Write as the form of (formula one)：

(formula one)

Wherein M_jThe index of input feature vector figure selected by expression；L represents the index of current layer；I and j represent respectively input and Output characteristic index of the picture；O represents segment index；F represent activation primitive, used here as linearity correction unit function f (x)=max (x, 0)；* convolution operation is represented；

Maximum pond (max-pooling) layer can be write as the form of (formula two)：

(formula two)

Wherein, m and n represents the pixel index of current layer respectively；S represents the size of down-sampling core；Before p and q is represented respectively One layer of pixel index, and meet p=s* (m-1)+k+1, q=s* (n-1)+k+1, wherein 0 ＜ k ＜ s；Branch is used to select Corresponding columns and rows；

In order that the feature that must be obtained has the property of Shandong nation, local contrast can be normalized layer and introduce segment processing In link, the layer can be write as the form of (formula three)：

(formula three)

Wherein, the number for the neighbouring competition characteristic pattern that r expressions are formulated, the feature map number of N expressions current layer altogether, k, α, β is some hyper parameters, can specify suitable floating-point values in the training process, after CNN model trainings are good, three more than Individual formula, it can obtain and differentiate that complete characteristics provide support to be follow-up based on smoothing windows scanning；

D the full connection of convolutional neural networks) is regarded as a linear classifier, is directly based upon and differentiates that complete characteristics figure is examined Survey, estimate that linear classifier is differentiating using possible approximate study (probably approximately correct) framework Extensive error R [ω] on complete characteristics, then according to built-up pattern (ensemble model) and desired extensive error H Training pattern number q required for [ω] is calculated；

(formula four)

E) each training pattern will be obtained in q model and performs feedforward operation in image to be detected respectively, each mould Type obtains the differentiation complete characteristics figure before o groups connect entirely, and then every group of differentiation complete characteristics figure is contracted with arest neighbors interpolation algorithm Put, obtain n*o groups and differentiate complete characteristics figure, q model amounts to n*o*q groups and differentiate complete characteristics figure, then complete in every group of differentiation Directly intensive smoothing windows classification work is performed on standby characteristic pattern with linear classifier and obtains n*o*q group response diagrams, wherein linear point Class device and differentiate complete graph dot product operations can change into convolution operation, due to use step B) in non-linear correction unit obtain To sparse features figure, convolution can accelerate sort operation using sparse Fourier transform；

F n*q group response diagrams greatly) are worth to using non-to every o groups response diagram in n*o*q, then in n*q response diagrams Every n response diagram perform non-maxima suppression obtain q groups have true yardstick response diagram, q groups response diagram perform and fortune Calculate and obtain a final response diagram with true yardstick, calculate the barycenter in each UNICOM region in final response diagram：

G) barycenter and true yardstick are mapped in former mapping to be checked, drawn pair according to each centroid position and scale-value The rectangle frame answered, complete target detection.

In step A) in, the preparation training sample set may include following sub-step：

A1 the image block to the frame that sets the goal) is extracted from training image, then zooms to fixed size size as positive sample, Each gives the image block for the frame that sets the goal to obtain N as a sample₀Individual image block, i.e. N₀Individual sample X_i, i=1 ... ... N；

A2) for negative sample image acquisition, typically around positive sample image block extract with its without very greatly it is overlapping , the image block of arbitrary size zoom to fixed dimension as negative sample X_i, the extraction of negative sample will have as far as possible to be represented Property, most of scene images are covered, extract N altogether₁Negative sample, i=1 ... ... N；

A3) N=N₀+N₁。

In step B) in, it is described to may include following sub-step using the training of m this progress of lot sample convolutional neural networks：

B1) N number of sample needed when in batches it is random break up order, be then divided into m lot sample sheets, purpose in batches is The Grad needed for neural metwork training is calculated with small quantities of sample, order is broken up and is advantageous to obtain more reasonably gradient direction；

B2) well-designed convolutional neural networks include convolutional layer, maximum pond (max-pooling) layer and local contrast Normalize (local contrast normalization) three composition of layer, the first two composition is required composition, the 3rd into Point be that non-linear correction unit is selected according to any optional composition, the activation primitive of convolution, by be combined these three into Divide the network structure that can be obtained for different target detection；

B3) filter number and feature map number, wave filter size, down-sampling core required in network structure is set Hyper parameter required for size, every layer of learning rate and local contrast normalization；

B4) during convolutional neural networks are trained, using momentum and at random training skills is thrown away；

B5 deconditioning) is determined when according to checking curve；

B6) the extracting parameter W from the model trained.

In step C) in, it is described to solve maximum pond (max- in convolutional neural networks by the way of expander graphs Pooling the problem of) losing information may include following sub-step：

C1) W extracted in step B is used for during the feedforward operation of image to be detected, met for each characteristic pattern It is new as one for each skew in the size K of down-sampling core when losing the situation of discriminant information to max-pooling Beginning offset, for the full down-sampling core that then tiles to whole image to be detected, each beginning offset obtains an expander graphs, Current down-sampling layer can generate 2^KIndividual expander graphs, the expander graphs that all characteristic patterns are obtained using same skew are referred to as one section (fragment) 2, can be generated after down-sampling^KThe characteristic pattern of section, if p layer down-sampling layers in whole network, o=can be obtained (2^K)^pThe characteristic pattern of section；

C2) the follow-up convolution operation of down-sampling to every section, it is necessary to use identical convolution operation；

C3) local contrast normalization layer needs to use identical local contrast normalization operation to every section；

C4) during treating test image and going to feedforward operation, full articulamentum preceding layer obtain (2^K)^pSection Characteristic pattern is referred to as differentiating complete characteristics figure.

In step D) in, described be directly based upon differentiates that complete characteristics figure detects (classification) and may include following sub-step：

D1) by (2 in step C^K)^pThe characteristic pattern of section all using arest neighbors interpolation algorithm n times, obtains n* (2^K)^pSection is sentenced Other complete characteristics figure；

D2) differentiate that doing convolution with it with linear classifier on complete characteristics figure obtains n* (2 at every section^K)^pResponse diagram；

D3) n* (2^K)^pResponse diagram all zoom to same size and record pantograph ratio, with non-maxima suppression algorithm Obtain the pantograph ratio of final response diagram and corresponding peak response figure.

The present invention trains convolutional neural networks parameter first with training set, is then solved most using the mode of expander graphs Great Chiization (max-pooling), which is lost the problem of feature and generated, differentiates complete characteristics figure；The full connection of convolutional neural networks Weight regards linear classifier as, estimates general China of the linear classifier on complete characteristics are differentiated using possible approximate learning framework Error；Linear classifier number needed for being estimated according to general magnificent error and desired extensive error threshold is finally complete in differentiation On characteristic pattern target detection is completed with mode of the linear classifier based on smoothing windows.

The present invention solves the problems, such as the Character losing in Max-pooling layers by the way of expander graphs, and expander graphs Move to other layers；All expander graphs before full connection are referred to as differentiating complete characteristics figure, and the weight connected entirely is regarded as One linear classifier, detection efficiency can be significantly improved by directly doing detection on complete characteristics figure is differentiated；Using possible approximate Learn (probably approximately correct) framework to estimate that linear classifier is general on complete characteristics are differentiated Change error R [ω], then the instruction according to required for calculating built-up pattern (ensemble model) and desired extensive error H [ω] Practice model number q, final prediction error is reduced according to the prediction of q model, and then improve target detection precision.

Brief description of the drawings

Fig. 1 is the detection framework schematic diagram of the embodiment of the present invention.

Fig. 2 is the schematic diagram of the expander graphs of the embodiment of the present invention.

Fig. 3 is the detection example figure of the embodiment of the present invention.

Fig. 4 is the testing result figure of the embodiment of the present invention.Wherein left frame is the inventive method testing result, and left frame is The spacious testing result for regarding the method that scientific and technological (Megvii) Co., Ltd proposes in Beijing.

Fig. 5 is the of the invention and ROC curve figure of contrast of other several object detection methods on FDDB data sets.

Wherein imaginary curve is (entitled:The method of the present invention) it is method of the invention；

Method 1 corresponds to method (B.Yang, J.Yan, the Z.Lei and of B.Yang et al. propositions S.Z.Li.Aggregate channel features for multi-view face detection..International Joint Conference on Biometrics,2014)；

Method 2 corresponds to method (H.Li, Z.Lin, J.Brandt, the X.Shen and of H.Li et al. propositions G.Hua.Efficient Boosted Exemplar-based Face Detection.CVPR 2014)；

Method 3 corresponds to method (J.Yan, Z.Lei, L.Wen the and S.Z.Li.The of J.Yan et al. propositions Fastest Deformable Part Model for Object Detection.CVPR 2014)；

Method 4 corresponds to the spacious method for regarding scientific and technological (Megvii) Co., Ltd and proposing in Beijing；

Method 5 corresponds to method (M.Mathias, R.Benenson, the M.Pedersoli of M.Mathias et al. propositions and L.Van Gool.Face detection without bells and whistles.ECCV 2014)；

Method 6 corresponds to method (X.Shen, Z.Lin, the J.Brandt and of X.Shen et al. propositions Y.Wu.Detecting and Aligning Faces by Image Retrieval.CVPR 2013)；

Method 7 corresponds to method (the J.Li and Y.Zhang.Learning SURF cascade of J.Li et al. propositions for fast and accurate object detection.CVPR 2013.)；

Method 8 corresponds to method (J.Li, T.Wang and the Y.Zhang.Face Detection of J.Li et al. propositions using SURF Cascade.ICCV 2011BeFIT workshop.)；

Method 9 corresponds to method (the P.Viola and M.Jones.Robust real-time of Viola et al. propositions object detection.In IJCV,2001)；

Method 10 correspond to A.Giusti et al. proposition method (A.Giusti, D.C.Ciresan, J.Masci, L.M.Gambardella,and J.Schmidhuber.Fast image scanning with deep max-pooling convolutional neural networks.In ICIP,2013)。

Embodiment

The method of the present invention is elaborated with reference to the accompanying drawings and examples, the present embodiment is with the technology of the present invention side Implemented under premised on case, give embodiment and specific operation process, but protection scope of the present invention be not limited to it is following Embodiment.

Referring to Fig. 1, the embodiment of the present invention comprises the following steps：

A. training sample set (x is prepared_i, y_i), i=1 ... ..., N, N are number of training, and N is natural number.x_iRepresent training The image of fixed size corresponding to sample, comprising target and target, the image full of picture is positive sample, and other images are negative samples This.y_iRepresent sample class vector：

B. divide m batches by all training samples, originally put m-2 lot samples into well-designed convolutional neural networks reversely biography Algorithm for Training is broadcast, 2 lot sample sheets are used to test.Well-designed convolutional neural networks include convolutional layer, maximum pond (max- Pooling) layer and local contrast normalization (local contrast normalization) three composition of layer.These three compositions The function of performer's nonlinear function, it is therefore an objective to which the original image in manifold is mapped to theorem in Euclid space.The activation of convolution Function uses non-linear correction unit (Rectifier linear unit), can make it that the aspect ratio after convolution is sparse.If After counting the network structure model comprising these three compositions, model is trained with back-propagation algorithm training, obtains parameter W。

C. the parameter W trained in extraction step B, solves max- in convolutional neural networks by the way of expander graphs Pooling loses the problem of information.Given test image, conventional maximum pond (max-pooling) is (namely using skew A Mode as starting point) operation after can only obtain expander graphs (A), other be advantageous to classification information can lose.Therefore it is directed to Each offset in 2*2 convolution kernel, all preserve a corresponding characteristic pattern, referred to as expander graphs, as shown in Fig. 2 (A), (B), (C), (D) is different expander graphs corresponding to different skews.

It is K for each layer of maximum pond (max-pooling) core size, whole convolutional neural networks have p layers, extend it Each max-pooling down-sampling layer feature map number extends to 2 afterwards^KTimes, whole network expands to o=(2^K)^pTimes, then Parameter W is used for image to be detected of arbitrary size, obtains the extension feature figure before full connection, it is referred to as to differentiate complete characteristics (Discriminative Complete Features), as shown in figure 3, (a) and (c) is referred to as differentiating complete characteristics in Fig. 3, its In (c) (a) obtained by arest neighbors difference arithmetic.

Given input picture x, the wave filter group K trained and skew b, the output of convolutional layer can be write as (formula one) Form：

(formula one)

Wherein M_jThe index of input feature vector figure selected by expression；L represents the index of current layer；I and j represent respectively input and Output characteristic index of the picture；O represents segment index；F represent activation primitive, used here as linearity correction unit function f (x)=max (x, 0)；* convolution operation is represented.

Maximum pond (max-pooling) layer can be write as the form of (formula two)：

(formula two)

Wherein, m and n represents the pixel index of current layer respectively；S represents the size of down-sampling core；Before p and q is represented respectively One layer of pixel index, and meet p=s* (m-1)+k+1, q=s* (n-1)+k+1, wherein 0 ＜ k ＜ s；Branch is used to select Corresponding columns and rows.

(formula three)

Wherein, the number for the neighbouring competition characteristic pattern that r expressions are formulated, the feature map number of N expressions current layer altogether, k, α, β is some hyper parameters, can specify suitable floating-point values in the training process.After CNN model trainings are good, 3 more than Individual formula, it can obtain and differentiate that complete characteristics provide support to be follow-up based on smoothing windows scanning.

D. the full connection of convolutional neural networks is regarded as a linear classifier, be directly based upon differentiate complete characteristics figure on and It is not that former detection figure does detection and can greatly improve detection speed.Using possible approximate study (probably Approximately correct) framework come estimate linear classifier differentiate complete characteristics on extensive error R [ω], so The training pattern number q according to required for calculating built-up pattern (ensemble model) and desired extensive error H [ω] afterwards.

(formula four)

E. each training pattern will be obtained in q model and performs feedforward operation in image to be detected respectively, each mould Type obtains the differentiation complete characteristics figure before o groups connect entirely, and then every group of differentiation complete characteristics figure is contracted with arest neighbors interpolation algorithm Put, obtain n*o groups and differentiate complete characteristics figure, q model amounts to n*o*q groups and differentiate complete characteristics figure.Then it is complete in every group of differentiation Directly intensive smoothing windows classification work, which is performed, on standby characteristic pattern with linear classifier obtains n*o*q group response diagrams.Wherein linear point The dot product operations of class device and differentiation complete graph can change into convolution operation, due to being obtained using the non-linear correction unit in step B To sparse features figure, convolution can accelerate sort operation using sparse Fourier transform.As shown in figure 3, (b) in Fig. 3 and (d) it is respectively that linear classifier obtains in differentiation complete characteristics (a) and (c) convolution, is obtained wherein (e) is (d) scaling.

F. n*q group response diagrams greatly are worth to using non-to every o groups response diagram in n*o*q, then in n*q response diagrams Every n response diagram perform non-maxima suppression obtain q groups have true yardstick response diagram, q groups response diagram perform and fortune Calculate and obtain a final response diagram with true yardstick, calculate the barycenter in each UNICOM region in final response diagram.

G. barycenter and true yardstick are mapped in former mapping to be checked, drawn pair according to each centroid position and scale-value The rectangle frame answered, target detection is completed, Fig. 4 is testing result.

The present invention and other based on convolutional neural networks the time required to target detection on comparative result be shown in Table 1.

Table 1

Method	CPU time (s)	GPU time (s)	Total time (s)
				Method 11	2.3	25.08	28.1
Method 12	43.2	0	43.2
				Method 13	2.3	0.25	2.55
The method of the present invention	1.3	0	1.3

In table 1, method 11 be Fabian et al. propose method (Fabian Nasse, Christian Thurau, and Gernot A.Fink,“Face detection using gpu-based convolutional neural networks,”in CAIP,2009,pp.83–90)；

Method 12 be A.Giusti et al. propose method (A.Giusti, D.C.Ciresan, J.Masci, L.M.Gambardella,and J.Schmidhuber.Fast image scanning with deep max-pooling convolutional neural networks.In ICIP,2013)；

Method 13 is method (K.He, X.Zhang, S.Ren, the and J.Sun.Spatial that K.He et al. is proposed pyramid pooling in deep convolutional networks for visual recognition.In ECCV,2014)。

The present invention directly does and classified that (multiple dimensioned differentiation complete characteristics figure is by most on multiple dimensioned differentiation complete characteristics figure Neighbour's interpolation algorithm scales to obtain), the weight vectors that linear classifier is changed into change into nuclear matrix, and linear classification can pass through core Matrix and multiple dimensioned differentiation complete characteristics picture scroll product are completed.Due to differentiating that complete characteristics are sparse, convolution speed can pass through Sparse Fourier transform obtains.Further, since directly detected on complete characteristics figure is differentiated, rather than in artwork, target inspection Degree of testing the speed greatly improves.

Claims

1. the fast target detection method based on convolutional neural networks, it is characterised in that comprise the following steps：

A training sample set (x) is prepared_i, y_i), i=1 ... ..., N, N is number of training, and N is natural number, x_iRepresent training sample The image of corresponding fixed size, comprising target and target, the image full of picture is positive sample, and other images are negative samples；y_i Represent sample class vector：

B) m batches are divided by all training samples, the convolutional neural networks that m-2 lot samples are originally put into design are instructed with back-propagation algorithm Practice, 2 lot sample sheets are used to test, and the convolutional neural networks include convolutional layer, maximum pond layer and local contrast normalization layer three Individual composition, these three compositions play the part of the function of nonlinear function, it is therefore an objective to the original image in manifold are mapped to European Space, the activation primitive of convolution uses non-linear correction unit, after designing the network structure model comprising these three compositions, uses Back-propagation algorithm training is trained to model, obtains parameter W；

C) extraction step B) in the parameter W trained, solve maximum pond in convolutional neural networks by the way of expander graphs The problem of losing information, test image is given, for each being offset in 2 × 2 convolution kernel, all preserve a corresponding feature Figure, referred to as expander graphs；It is K for each layer of max-pooling cores size, whole convolutional neural networks have p layers, every after extension The down-sampling layer feature map number in individual maximum pond extends to 2^KTimes, whole network expands to o=(2^K)^pTimes, then parameter W For image to be detected of arbitrary size, the extension feature figure before full connection is obtained, it is referred to as to differentiate complete characteristics；Given input Image x, the wave filter group K trained and skew b, the output of convolutional layer are write as the form of formula one：

Wherein M_jThe index of input feature vector figure selected by expression；L represents the index of current layer；I and j represents to input and export respectively Feature index of the picture；O represents segment index；F represents activation primitive, used here as linearity correction unit function f (x)=max (x, 0)；* Represent convolution operation；

Maximum pond layer is write as the form of formula two：

Wherein, m and n represents the pixel index of current layer respectively；S represents the size of down-sampling core；P and q represent preceding layer respectively Pixel index, and meet p=s* (m-1)+k+1, q=s* (n-1)+k+1, wherein 0 ＜ k ＜ s；Branch is used to select correspondingly Columns and rows；

In order that the feature that must be obtained has the property of Shandong nation, local contrast normalization layer is introduced into the link of segment processing, This layer is write as the form of formula three：

Wherein, r represents the number for the neighbouring competition characteristic pattern formulated, and N represents the feature map number of current layer altogether, k, α, and β is Some hyper parameters, suitable floating-point values are specified in the training process, after CNN model trainings are good, according to three above formula, Obtain and differentiate that complete characteristics provide support to be follow-up based on smoothing windows scanning；

D the full connection of convolutional neural networks) is regarded as a linear classifier, is directly based upon and differentiates that complete characteristics figure detects, Extensive error R [ω] of the linear classifier on complete characteristics are differentiated is estimated using possible approximate learning framework, then basis Training pattern number q required for built-up pattern and desired extensive error H [ω] calculate；

E) each training pattern will be obtained in q model and performs feedforward operation in image to be detected respectively, each model obtains The differentiation complete characteristics figure before the full connection of o groups is taken, then every group of differentiation complete characteristics figure is scaled with arest neighbors interpolation algorithm, obtained Differentiate complete characteristics figure to n*o groups, q model amounts to n*o*q groups and differentiate complete characteristics figure, then in every group of differentiation complete characteristics Directly perform the classification work of intensive smoothing windows with linear classifier on figure and obtain n*o*q group response diagrams, wherein linear classifier with Differentiate that the dot product operations of complete graph change into convolution operation；

F n*q group response diagrams greatly) are worth to using non-to every o groups response diagram in n*o*q, then to every in n*q response diagrams N response diagram performs non-maxima suppression and obtains response diagram of the q groups with true yardstick, and q groups response diagram is performed and obtained with computing The final response diagram that there is true yardstick to one, calculate the barycenter in each UNICOM region in final response diagram：

G) barycenter and true yardstick are mapped in former mapping to be checked, according to corresponding to being drawn each centroid position and scale-value Rectangle frame, complete target detection.

2. the fast target detection method based on convolutional neural networks as claimed in claim 1, it is characterised in that in step A) in, The preparation training sample set includes following sub-step：

A1 the image block to the frame that sets the goal) is extracted from training image, then zooms to fixed size size as positive sample, it is each The individual image block to the frame that sets the goal obtains N as a sample₀Individual image block, i.e. N₀Individual sample X_i, i=1 ... ... N₀；

A2) for negative sample image acquisition, around positive sample image block extract with its without very greatly it is overlapping, arbitrarily large Small image block zooms to fixed dimension as negative sample X_i’；The extraction of negative sample is near possible representative, to cover Most of scene images；N is extracted altogether₁Negative sample, i=1 ... ... N₁；

A3) N=N₀+N₁。

3. the fast target detection method based on convolutional neural networks as claimed in claim 1, it is characterised in that in step B) in, The utilizationThe training of this progress of lot sample convolutional neural networks includes following sub-step：

B1) N number of sample needed when in batches it is random break up order, be then divided intoLot sample sheet；

B2) convolutional neural networks include convolutional layer, maximum pond layer and local contrast normalization three composition of layer, the first two composition It is required composition, the 3rd composition is to select non-linear correction unit according to any optional composition, the activation primitive of convolution, is led to Cross and be combined the network structure that these three compositions obtain being directed to different target detection；

B3) set network structure in required for filter number and feature map number, wave filter size, down-sampling core it is big Hyper parameter required for small, every layer of learning rate and local contrast normalization；

B5 deconditioning) is determined when according to checking curve；

B6) the extracting parameter W from the model trained.

4. the fast target detection method based on convolutional neural networks as claimed in claim 1, it is characterised in that in step C) in, It is described to solve the problems, such as that maximum pondization loss information includes following sub-step in convolutional neural networks by the way of expander graphs：

C1) step B) in extraction W be used for image to be detected feedforward operation during, run into for each characteristic pattern When max-pooling loses the situation of discriminant information, new as one for each skew in the size K of down-sampling core Beginning offset, for the full down-sampling core that then tiles to whole image to be detected, each beginning offset obtains an expander graphs, when Front lower sample level can generate 2^KIndividual expander graphs, all characteristic patterns are referred to as one section using the same obtained expander graphs of offseting, under adopt 2 can be generated after sample^KThe characteristic pattern of section, if there is z layer down-sampling layers in whole network, obtains o=(2^K)^zThe characteristic pattern of section；

C4) during treating test image and going to feedforward operation, full articulamentum preceding layer obtain (2^K)^zThe feature of section Figure is referred to as differentiating complete characteristics figure.

5. the fast target detection method based on convolutional neural networks as claimed in claim 1, it is characterised in that in step D) in, Described be directly based upon differentiates that complete characteristics figure does detection and includes following sub-step：

D1) by step C) in (2^K)^zThe characteristic pattern of section all using arest neighbors interpolation algorithm n times, obtains n* (2^K)^zThe differentiation of section is complete Standby characteristic pattern；

D2) differentiate that doing convolution with it with linear classifier on complete characteristics figure obtains n* (2 at every section^K)^zResponse diagram；

D3) n* (2^K)^zResponse diagram all zoom to same size and record pantograph ratio, obtained with non-maxima suppression algorithm The pantograph ratio of final response diagram and corresponding peak response figure.