CN104573731B - Fast target detection method based on convolutional neural networks - Google Patents


Info

Publication number
CN104573731B
CN104573731B (application CN201510061852.6A)
Authority
CN
China
Prior art keywords: neural networks, convolutional neural, layer, sample, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510061852.6A
Other languages: Chinese (zh)
Other versions: CN104573731A (en)
Inventor
王菡子 (Hanzi Wang)
郭冠军 (Guanjun Guo)
严严 (Yan Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201510061852.6A
Publication of CN104573731A
Application granted
Publication of CN104573731B


Abstract

A fast target detection method based on convolutional neural networks, relating to computer vision technology. The convolutional neural network parameters are first trained on a training set; the loss of features caused by max-pooling is then resolved by means of expansion maps, generating discriminative complete feature maps. The fully connected weights of the convolutional neural network are treated as a linear classifier, and the generalization error of the linear classifier on the discriminative complete features is estimated within the probably approximately correct (PAC) learning framework. The number of linear classifiers required is estimated from the generalization error and the desired generalization-error threshold, and finally target detection is completed on the discriminative complete feature maps by applying the linear classifiers in a sliding-window manner. Detection efficiency and target detection precision are significantly improved.

Description

Fast target detection method based on convolutional neural networks
Technical field
The present invention relates to computer vision technology, and specifically to a fast target detection method based on convolutional neural networks.
Background technology
An important source of human perception of the world is image information. Research shows that about 80%–90% of the information humans obtain from the outside world comes through images acquired by the eyes. Humans perceive external image information with great facility, quickly locating and analyzing targets. For a computer to possess powerful visual perception and understanding, it must have strong target detection and recognition abilities similar to those of humans. Target detection is an important prerequisite of visual perception and object understanding: the efficiency and precision of target acquisition determine the speed and quality of visual perception. Once computers possess human-like target detection and perception abilities, they can better substitute for manual labor in many industries, greatly reducing production costs, and can also strongly support everyday intelligent services. In-depth research on target detection techniques in computer vision, continually improving detection accuracy, therefore has important practical significance.
The current trend in academia for addressing these two problems is a shift from heuristic methods to machine learning methods, and from hand-crafted features to features adaptively extracted for the task. Models for target detection and recognition are also shifting from detecting and recognizing a single specific target toward joint multi-target detection and recognition. The most typical example is the appearance of deep learning models, which address the problem that conventional detection and recognition models are effective only for a limited set of target detection and recognition tasks. For instance, the frontal face detection framework proposed by Viola and Jones in 2001, based on Haar features, is quite effective for frontal faces but performs poorly on profile faces and pedestrians. Not until 2005, when Dalal et al. proposed HOG (Histogram of Oriented Gradients) features and the strategy of classifying the HOG feature of each sliding window with an SVM, did upright pedestrian detection achieve a qualitative breakthrough. However, HOG, as a hand-crafted feature, gives unsatisfactory results for image classification and for detecting targets of arbitrary pose such as pedestrians, animals, and plants. Deformable Part Models (DPM) then arose to address detection under deformation. Although DPM attempts to solve detection failures caused by deformation, the deformable parts its model requires are difficult to capture well in practice, because there is neither a good model nor good features to identify the parts; its performance on multi-class detection datasets (PASCAL VOC, ImageNet) is therefore not outstanding. A recent breakthrough is the appearance of deep learning models. On ImageNet, the largest image classification and target detection dataset, convolutional neural networks (CNNs), one family of deep learning models, improved detection and recognition precision by as much as a factor of two over the previous best accuracy. Nearly all of the top-performing classification and detection algorithms on the ImageNet dataset in the past two years use convolutional neural networks, differing only in network structure. The current highest precisions on ImageNet are 95% for image classification and 55% for target detection.
Although methods based on convolutional neural networks achieve very high precision in target detection and recognition, the networks are complex and computationally heavy, so their efficiency when applied to target detection is not high; many current methods rely on GPUs to accelerate the detection program. Given a target image, detection with a sliding-window strategy remains extremely inefficient even with GPU acceleration, because the algorithmic complexity is still very large. To address the efficiency of convolutional neural networks in target detection, the current mainstream solutions in academia fall into three classes. The first class is based on image segmentation: the given image is first segmented, candidate target regions are obtained from the segments, features are then extracted and classified for these regions with a convolutional neural network, and the target positions are finally obtained. The drawback of this approach is its dependence on segmentation quality. The second class extracts features from the original image with a convolutional neural network and then performs target location regression and classification with a sliding-window strategy on the feature map. When extracting features from a large image with a convolutional neural network, this approach loses some feature information useful for classification and regression, so the final model cannot reach optimal performance. The third class exploits the classification strength of convolutional neural networks to find parts and then builds a deformable model, detecting targets with the deformable-model idea. But because the detection and the deformable-model convolutional neural network are executed separately, the detection performance of the overall framework is not outstanding, and the efficiency of this model is also not high.
The content of the invention
The object of the present invention is to propose a fast target detection method based on convolutional neural networks.
The present invention comprises the following steps:
A) Prepare a training sample set (x_i, y_i), i = 1, …, N, where N is the number of training samples and N is a natural number. x_i denotes the fixed-size image corresponding to a training sample; images that contain the target and in which the target fills the picture are positive samples, and other images are negative samples. y_i denotes the sample class vector:
B) Divide all training samples into m batches; feed m−2 batches into the designed convolutional neural network for training with the backpropagation algorithm, with 2 batches used for testing. The convolutional neural network comprises three components: convolutional layers, max-pooling layers, and local contrast normalization layers. These three components act as a nonlinear function whose purpose is to map the original images, lying on a manifold, into a Euclidean space. The activation function of the convolution is the rectified linear unit (ReLU), which makes the features after convolution sparse. After the network structure model containing these three components is designed, the model is trained with the backpropagation algorithm to obtain the parameters W;
C) Extract the parameters W trained in step B). The loss of information in max-pooling within the convolutional neural network is resolved by means of expansion maps. Given a test image, a conventional max-pooling operation (one that uses a single offset A as its starting point) yields only expansion map (A); the information at the other offsets, which is also useful for classification, is lost. Therefore, for each offset of the 2 × 2 pooling kernel, a corresponding feature map, called an expansion map, is preserved. With a max-pooling kernel size of K at each layer and p layers in the whole convolutional neural network, the number of feature maps at each max-pooling downsampling layer expands 2^K-fold after extension, and the whole network expands o = (2^K)^p-fold. The parameters W are then applied to an image to be detected of arbitrary size to obtain the expanded feature maps before the fully connected layer, called Discriminative Complete Features. Given an input image x, the trained filter bank K, and the bias b, the output of a convolutional layer can be written in the form of (formula one):
$$x_{j}^{l,o} = f\Big(\sum_{i \in M_j} x_{i}^{l-1,o} * k_{ij}^{l} + b_{j}^{l}\Big) \qquad \text{(formula one)}$$
where M_j denotes the set of selected input feature maps; l denotes the index of the current layer; i and j denote the input and output feature-map indices respectively; o denotes the fragment index; f denotes the activation function, here the rectified linear unit f(x) = max(x, 0); and * denotes the convolution operation;
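As an illustrative sketch of formula one, a single-fragment convolutional layer can be written in NumPy; the function names and the single-channel "valid" cross-correlation convention for the * operation are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def relu(x):
    """Rectified linear unit f(x) = max(x, 0), the activation in formula one."""
    return np.maximum(x, 0.0)

def conv2d_valid(x, k):
    """'Valid' cross-correlation of one feature map x with one kernel k
    (the usual CNN realization of the * operation)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(x[m:m + kh, n:n + kw] * k)
    return out

def conv_layer(inputs, kernels, bias):
    """One output map per formula one: f(sum_i x_i * k_ij + b_j),
    summing over the selected input maps before the nonlinearity."""
    acc = sum(conv2d_valid(x, k) for x, k in zip(inputs, kernels))
    return relu(acc + bias)
```

The ReLU both clips negative responses and, as the text notes, makes the resulting feature maps sparse.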
The max-pooling layer can be written in the form of (formula two):
$$x_{m,n}^{l,o} = \max_{0 \le k,\, k' < s} x_{p,q}^{l-1,o} \qquad \text{(formula two)}$$
where m and n denote the pixel indices of the current layer; s denotes the size of the downsampling kernel; p and q denote the pixel indices of the previous layer and satisfy p = s·(m−1)+k+1 and q = s·(n−1)+k′+1 with 0 ≤ k, k′ < s; the two branches select the corresponding rows and columns;
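The max-pooling of formula two, with the non-overlapping s × s kernel that the index relations above suggest, can be sketched as:

```python
import numpy as np

def max_pool(x, s):
    """Non-overlapping s x s max-pooling (formula two): each output pixel
    (m, n) takes the maximum over the s x s block of the previous layer
    that it covers."""
    H, W = x.shape
    out = np.zeros((H // s, W // s))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = x[m * s:(m + 1) * s, n * s:(n + 1) * s].max()
    return out
```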
To make the obtained features robust, a local contrast normalization layer can be introduced into the fragment-processing chain; this layer can be written in the form of (formula three):
$$\hat{x}_{m,n}^{i} = x_{m,n}^{i} \Big/ \Big(k + \alpha \sum_{j=\max(1,\, i-r/2)}^{\min(N,\, i+r/2)} \big(x_{m,n}^{j}\big)^{2}\Big)^{\beta} \qquad \text{(formula three)}$$
where r denotes the specified number of neighboring competing feature maps, N denotes the total number of feature maps in the current layer, and k, α, β are hyperparameters for which suitable floating-point values can be specified during training. Once the CNN model is trained, the three formulas above yield the discriminative complete features that support the subsequent sliding-window scanning;
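Formula three's variables (r competing neighbor maps, N maps in the layer, hyperparameters k, α, β) match the familiar cross-map local response normalization; a minimal sketch under that assumption (the exact neighborhood convention is assumed):

```python
import numpy as np

def local_contrast_norm(maps, r=2, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-map local contrast normalization (formula three): each map i is
    divided by (k + alpha * sum of squares over the r nearest maps)**beta,
    so neighboring feature maps compete with each other."""
    N = maps.shape[0]
    out = np.empty_like(maps)
    for i in range(N):
        lo, hi = max(0, i - r // 2), min(N, i + r // 2 + 1)
        denom = (k + alpha * np.sum(maps[lo:hi] ** 2, axis=0)) ** beta
        out[i] = maps[i] / denom
    return out
```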
D) The fully connected layer of the convolutional neural network is treated as a linear classifier, and detection is based directly on the discriminative complete feature maps. The probably approximately correct (PAC) learning framework is used to estimate the generalization error R[ω] of the linear classifier on the discriminative complete features; the number q of training models required is then calculated from the ensemble model and the desired generalization error H[ω];
(formula four)
E) Each of the q trained models performs a feed-forward pass on the image to be detected; each model yields the o groups of discriminative complete feature maps before the fully connected layer. Every group is then scaled with the nearest-neighbor interpolation algorithm, giving n·o groups of discriminative complete feature maps per model and n·o·q groups in total over the q models. Dense sliding-window classification is then performed with the linear classifier directly on each group of discriminative complete feature maps, yielding n·o·q groups of response maps. The dot-product operations of the linear classifier with the discriminative complete maps can be converted into convolution operations; because the rectified linear unit in step B) yields sparse feature maps, the convolution can be accelerated with the sparse Fourier transform;
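The claim that the sliding-window dot products can be converted into a convolution can be checked directly: scoring every window with the linear classifier weights equals cross-correlating the feature map with those weights reshaped as a kernel. A single-map, square-window sketch (the window size and shapes are illustrative assumptions):

```python
import numpy as np

def sliding_window_scores(feat, w, win):
    """Dense sliding-window scores: dot product of the flattened win x win
    patch at every position with the linear classifier weights w."""
    H, W = feat.shape
    out = np.zeros((H - win + 1, W - win + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = feat[m:m + win, n:n + win].ravel() @ w
    return out

def scores_as_convolution(feat, w, win):
    """The same response map obtained by cross-correlating feat with the
    classifier weights reshaped into a win x win kernel."""
    kernel = w.reshape(win, win)
    H, W = feat.shape
    out = np.zeros((H - win + 1, W - win + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(feat[m:m + win, n:n + win] * kernel)
    return out
```

Because the two formulations agree exactly, any fast convolution routine (including FFT-based ones) can replace the explicit window loop.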
F) Non-maximum selection over every o response maps among the n·o·q groups yields n·q groups of response maps; non-maximum suppression over every n response maps among the n·q groups then yields q groups of response maps with the true scale. An AND operation over the q groups of response maps produces one final response map with the true scale, and the centroid of each connected region in the final response map is computed:
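A minimal sketch of the greedy non-maximum suppression used in this step, assuming the common box form (x1, y1, x2, y2) and an intersection-over-union criterion (the threshold value is illustrative):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order, keeping a box only if it overlaps no kept box by more than
    thresh; returns the indices of the kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```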
G) The centroids and the true scale are mapped back into the original image to be detected, and a corresponding rectangular box is drawn from each centroid position and scale value, completing target detection.
In step A), preparing the training sample set may include the following sub-steps:
A1) Extract the image block of each given target box from the training images, then scale it to the fixed size as a positive sample; each given target box yields one sample, giving N0 image blocks in total, i.e. N0 samples x_i;
A2) For negative-sample acquisition, image blocks of arbitrary size that have little overlap with the positive-sample blocks are typically extracted around them and scaled to the fixed size as negative samples x_i. Negative-sample extraction should be as representative as possible, covering most scene images; N1 negative samples are extracted in total;
A3) N=N0+N1
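The sample-preparation sub-steps above can be sketched as follows; the nearest-neighbor resize and the overlap measure are illustrative choices, not the patent's exact procedure:

```python
import numpy as np

def crop_and_resize(img, box, size):
    """Crop box = (x1, y1, x2, y2) from img and resize it to size x size
    with nearest-neighbor sampling (positive samples are resized target
    crops; negatives use the same routine on background blocks)."""
    x1, y1, x2, y2 = box
    patch = img[y1:y2, x1:x2]
    ys = (np.arange(size) * patch.shape[0] / size).astype(int)
    xs = (np.arange(size) * patch.shape[1] / size).astype(int)
    return patch[np.ix_(ys, xs)]

def overlap_ratio(a, b):
    """Fraction of box a covered by box b; candidate negative blocks
    should keep this small against every positive box."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return ix * iy / float((a[2] - a[0]) * (a[3] - a[1]))
```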
In step B), training the convolutional neural network with the m batches of samples may include the following sub-steps:
B1) Randomly shuffle the order of the N samples before batching, then divide them into m batches. The purpose of batching is to compute the gradients needed for neural network training from small batches of samples, and shuffling the order helps obtain more reasonable gradient directions;
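A minimal sketch of sub-step B1), shuffling the N samples and splitting them into m batches (the split sizes when N is not divisible by m are an assumption):

```python
import numpy as np

def make_batches(X, y, m, seed=0):
    """Shuffle the N samples, then split them into m roughly equal batches;
    each small batch supplies one gradient estimate for backpropagation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return [(X[part], y[part]) for part in np.array_split(idx, m)]
```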
B2) The designed convolutional neural network comprises three components: convolutional layers, max-pooling layers, and local contrast normalization layers. The first two components are required and the third is optional as needed; the activation function of the convolution is the rectified linear unit. Combining these three components yields network structures suited to different detection targets;
B3) Set the number of filters and feature maps required in the network structure, the filter sizes, the downsampling kernel sizes, the learning rate of each layer, and the hyperparameters required for local contrast normalization;
B4) During training of the convolutional neural network, use momentum and random dropout training techniques;
B5) Stop training at the appropriate moment according to the validation curve;
B6) Extract the parameters W from the trained model.
In step C), resolving the loss of information in max-pooling within the convolutional neural network by means of expansion maps may include the following sub-steps:
C1) When the parameters W extracted in step B) are used in the feed-forward pass on the image to be detected, each feature map loses discriminative information at max-pooling. Each offset within the downsampling kernel of size K is taken as a new starting offset, and the downsampling kernel is tiled over the whole image to be detected; each starting offset yields one expansion map, so the current downsampling layer generates 2^K expansion maps. The expansion maps obtained from all feature maps at the same offset are called a fragment, so downsampling generates 2^K fragments of feature maps; with p downsampling layers in the whole network, o = (2^K)^p fragments of feature maps are obtained;
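Sub-step C1) can be sketched as offset-complete ("fragment") max-pooling: every starting offset of the K × K pooling grid is kept as its own pooled map, giving K·K fragments per layer — for the patent's 2 × 2 kernel this count (4) coincides with the 2^K stated above. The truncation of ragged borders is an illustrative assumption:

```python
import numpy as np

def fragment_max_pool(x, K):
    """Offset-complete max-pooling: one pooled 'expansion map' per starting
    offset (dy, dx) of the K x K pooling grid, so no offset's information
    is discarded (unlike conventional single-offset max-pooling)."""
    frags = {}
    for dy in range(K):
        for dx in range(K):
            sub = x[dy:, dx:]
            H, W = (sub.shape[0] // K) * K, (sub.shape[1] // K) * K
            sub = sub[:H, :W]
            frags[(dy, dx)] = sub.reshape(H // K, K, W // K, K).max(axis=(1, 3))
    return frags
```

Each fragment is then processed by the subsequent convolution and normalization layers with identical operations, as sub-steps C2) and C3) require.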
C2) The convolution operations that follow the downsampling must apply the same convolution operation to every fragment;
C3) The local contrast normalization layer must apply the same local contrast normalization operation to every fragment;
C4) During the feed-forward pass on the image to be detected, the (2^K)^p fragments of feature maps obtained at the layer before the fully connected layer are called the discriminative complete feature maps.
In step D), performing detection (classification) directly on the discriminative complete feature maps may include the following sub-steps:
D1) Scale all (2^K)^p fragments of feature maps from step C) with the nearest-neighbor interpolation algorithm n times, obtaining n·(2^K)^p fragments of discriminative complete feature maps;
D2) Convolve each fragment of the discriminative complete feature maps with the linear classifier, obtaining n·(2^K)^p response maps;
D3) Scale all n·(2^K)^p response maps to the same size and record the scaling ratios, then obtain the final response map and the scaling ratio of the corresponding maximum-response map with the non-maximum suppression algorithm.
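The nearest-neighbor interpolation used for the multi-scale scaling above can be sketched as follows (the index-rounding convention is an assumption):

```python
import numpy as np

def nn_resize(x, out_h, out_w):
    """Nearest-neighbor interpolation: each output pixel copies the input
    pixel whose coordinates are nearest after scaling, so no new values
    are synthesized when up- or down-scaling a feature or response map."""
    H, W = x.shape
    ys = np.minimum((np.arange(out_h) * H / out_h).astype(int), H - 1)
    xs = np.minimum((np.arange(out_w) * W / out_w).astype(int), W - 1)
    return x[np.ix_(ys, xs)]
```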
The present invention first trains the convolutional neural network parameters on a training set, then resolves the loss of features caused by max-pooling by means of expansion maps and generates discriminative complete feature maps. The fully connected weights of the convolutional neural network are treated as a linear classifier, and the generalization error of the linear classifier on the discriminative complete features is estimated within the probably approximately correct (PAC) learning framework. The number of linear classifiers required is estimated from the generalization error and the desired generalization-error threshold, and finally target detection is completed on the discriminative complete feature maps by applying the linear classifier in a sliding-window manner.
The present invention resolves the feature loss in max-pooling layers by means of expansion maps and propagates the expansion maps to the other layers. All expansion maps before the fully connected layer are called the discriminative complete feature maps; the fully connected weights are treated as a linear classifier, and performing detection directly on the discriminative complete feature maps significantly improves detection efficiency. The probably approximately correct (PAC) learning framework is used to estimate the generalization error R[ω] of the linear classifier on the discriminative complete features; the number q of training models required is then calculated from the ensemble model and the desired generalization error H[ω]. The final prediction error is reduced by combining the predictions of the q models, which in turn improves target detection precision.
Brief description of the drawings
Fig. 1 is a schematic diagram of the detection framework of the embodiment of the present invention.
Fig. 2 is a schematic diagram of the expansion maps of the embodiment of the present invention.
Fig. 3 is an example detection figure of the embodiment of the present invention.
Fig. 4 shows detection results of the embodiment of the present invention, where the left box is the detection result of the method of the present invention and the right box is the detection result of the method proposed by Megvii Technology Co., Ltd. (Beijing).
Fig. 5 shows ROC curves comparing the present invention with several other object detection methods on the FDDB dataset.
The dashed curve (labeled: the method of the present invention) is the method of the present invention;
Method 1 corresponds to the method proposed by B. Yang et al. (B. Yang, J. Yan, Z. Lei and S. Z. Li. Aggregate channel features for multi-view face detection. International Joint Conference on Biometrics, 2014);
Method 2 corresponds to the method proposed by H. Li et al. (H. Li, Z. Lin, J. Brandt, X. Shen and G. Hua. Efficient Boosted Exemplar-based Face Detection. CVPR 2014);
Method 3 corresponds to the method proposed by J. Yan et al. (J. Yan, Z. Lei, L. Wen and S. Z. Li. The Fastest Deformable Part Model for Object Detection. CVPR 2014);
Method 4 corresponds to the method proposed by Megvii Technology Co., Ltd. (Beijing);
Method 5 corresponds to the method proposed by M. Mathias et al. (M. Mathias, R. Benenson, M. Pedersoli and L. Van Gool. Face detection without bells and whistles. ECCV 2014);
Method 6 corresponds to the method proposed by X. Shen et al. (X. Shen, Z. Lin, J. Brandt and Y. Wu. Detecting and Aligning Faces by Image Retrieval. CVPR 2013);
Method 7 corresponds to the method proposed by J. Li et al. (J. Li and Y. Zhang. Learning SURF cascade for fast and accurate object detection. CVPR 2013);
Method 8 corresponds to the method proposed by J. Li et al. (J. Li, T. Wang and Y. Zhang. Face Detection using SURF Cascade. ICCV 2011 BeFIT workshop);
Method 9 corresponds to the method proposed by Viola et al. (P. Viola and M. Jones. Robust real-time object detection. IJCV, 2001);
Method 10 corresponds to the method proposed by A. Giusti et al. (A. Giusti, D. C. Ciresan, J. Masci, L. M. Gambardella and J. Schmidhuber. Fast image scanning with deep max-pooling convolutional neural networks. ICIP, 2013).
Embodiment
The method of the present invention is elaborated below with reference to the accompanying drawings and an embodiment. The embodiment is implemented on the premise of the technical solution of the present invention; an embodiment and a specific operating process are given, but the protection scope of the present invention is not limited to the following embodiment.
Referring to Fig. 1, the embodiment of the present invention comprises the following steps:
A. Prepare a training sample set (x_i, y_i), i = 1, …, N, where N is the number of training samples and N is a natural number. x_i denotes the fixed-size image corresponding to a training sample; images that contain the target and in which the target fills the picture are positive samples, and other images are negative samples. y_i denotes the sample class vector:
B. Divide all training samples into m batches; feed m−2 batches into the carefully designed convolutional neural network for backpropagation training, with 2 batches used for testing. The network comprises three components: convolutional layers, max-pooling layers, and local contrast normalization layers. These three components act as a nonlinear function whose purpose is to map the original images, lying on a manifold, into a Euclidean space. The activation function of the convolution is the rectified linear unit (ReLU), which makes the features after convolution sparse. After the network structure model containing these three components is designed, the model is trained with the backpropagation algorithm to obtain the parameters W.
C. Extract the parameters W trained in step B, and resolve the loss of information in max-pooling within the convolutional neural network by means of expansion maps. Given a test image, a conventional max-pooling operation (one that uses a single offset A as its starting point) yields only expansion map (A); the information at the other offsets, which is also useful for classification, is lost. Therefore, for each offset of the 2 × 2 pooling kernel, a corresponding feature map, called an expansion map, is preserved; as shown in Fig. 2, (A), (B), (C), (D) are the different expansion maps corresponding to different offsets.
With a max-pooling kernel size of K at each layer and p layers in the whole convolutional neural network, the number of feature maps at each max-pooling downsampling layer expands 2^K-fold after extension, and the whole network expands o = (2^K)^p-fold. The parameters W are then applied to an image to be detected of arbitrary size to obtain the expanded feature maps before the fully connected layer, called Discriminative Complete Features. As shown in Fig. 3, (a) and (c) are discriminative complete features, where (c) is obtained from (a) by the nearest-neighbor interpolation algorithm.
Given an input image x, the trained filter bank K, and the bias b, the output of a convolutional layer can be written in the form of (formula one):
$$x_{j}^{l,o} = f\Big(\sum_{i \in M_j} x_{i}^{l-1,o} * k_{ij}^{l} + b_{j}^{l}\Big) \qquad \text{(formula one)}$$
where M_j denotes the set of selected input feature maps; l denotes the index of the current layer; i and j denote the input and output feature-map indices respectively; o denotes the fragment index; f denotes the activation function, here the rectified linear unit f(x) = max(x, 0); and * denotes the convolution operation.
The max-pooling layer can be written in the form of (formula two):
$$x_{m,n}^{l,o} = \max_{0 \le k,\, k' < s} x_{p,q}^{l-1,o} \qquad \text{(formula two)}$$
where m and n denote the pixel indices of the current layer; s denotes the size of the downsampling kernel; p and q denote the pixel indices of the previous layer and satisfy p = s·(m−1)+k+1 and q = s·(n−1)+k′+1 with 0 ≤ k, k′ < s; the two branches select the corresponding rows and columns.
To make the obtained features robust, a local contrast normalization layer can be introduced into the fragment-processing chain; this layer can be written in the form of (formula three):
$$\hat{x}_{m,n}^{i} = x_{m,n}^{i} \Big/ \Big(k + \alpha \sum_{j=\max(1,\, i-r/2)}^{\min(N,\, i+r/2)} \big(x_{m,n}^{j}\big)^{2}\Big)^{\beta} \qquad \text{(formula three)}$$
where r denotes the specified number of neighboring competing feature maps, N denotes the total number of feature maps in the current layer, and k, α, β are hyperparameters for which suitable floating-point values can be specified during training. After the CNN model is trained, the three formulas above yield the discriminative complete features that support the subsequent sliding-window scanning.
D. The fully connected layer of the convolutional neural network is treated as a linear classifier; performing detection directly on the discriminative complete feature maps rather than on the original detection image greatly improves detection speed. The probably approximately correct (PAC) learning framework is used to estimate the generalization error R[ω] of the linear classifier on the discriminative complete features, and the number q of training models required is then calculated from the ensemble model and the desired generalization error H[ω].
(formula four)
E. Each of the q trained models performs a feed-forward pass on the image to be detected; each model yields the o groups of discriminative complete feature maps before the fully connected layer. Every group is then scaled with the nearest-neighbor interpolation algorithm, giving n·o groups of discriminative complete feature maps per model and n·o·q groups in total over the q models. Dense sliding-window classification is then performed with the linear classifier directly on each group of discriminative complete feature maps, yielding n·o·q groups of response maps. The dot-product operations of the linear classifier with the discriminative complete maps can be converted into convolution operations; because the rectified linear unit in step B yields sparse feature maps, the convolution can be accelerated with the sparse Fourier transform. As shown in Fig. 3, (b) and (d) are obtained by convolving the linear classifier with the discriminative complete features (a) and (c) respectively, and (e) is obtained by scaling (d).
F. Non-maximum selection over every o response maps among the n·o·q groups yields n·q groups of response maps; non-maximum suppression over every n response maps among the n·q groups then yields q groups of response maps with the true scale. An AND operation over the q groups of response maps produces one final response map with the true scale, and the centroid of each connected region in the final response map is computed.
G. The centroids and the true scale are mapped back into the original image to be detected, and a corresponding rectangular box is drawn from each centroid position and scale value, completing target detection; Fig. 4 shows the detection results.
Comparative results on the time required for target detection between the present invention and other convolutional-neural-network-based methods are shown in Table 1.
Table 1
Method CPU time (s) GPU time (s) Total time (s)
Method 11 2.3 25.08 28.1
Method 12 43.2 0 43.2
Method 13 2.3 0.25 2.55
The method of the present invention 1.3 0 1.3
In Table 1, method 11 is the method proposed by Fabian et al. (Fabian Nasse, Christian Thurau and Gernot A. Fink. Face detection using GPU-based convolutional neural networks. CAIP, 2009, pp. 83–90);
Method 12 is the method proposed by A. Giusti et al. (A. Giusti, D. C. Ciresan, J. Masci, L. M. Gambardella and J. Schmidhuber. Fast image scanning with deep max-pooling convolutional neural networks. ICIP, 2013);
Method 13 is the method proposed by K. He et al. (K. He, X. Zhang, S. Ren and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014).
The present invention performs classification directly on the multi-scale discriminative complete feature maps (obtained by scaling with the nearest-neighbor interpolation algorithm). The weight vector of the linear classifier is converted into a kernel matrix, so linear classification can be completed by convolving the kernel matrix with the multi-scale discriminative complete feature maps. Because the discriminative complete features are sparse, the convolution can be accelerated with the sparse Fourier transform. Moreover, because detection is performed directly on the discriminative complete feature maps rather than on the original image, target detection speed is greatly improved.
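The convolution-in-the-Fourier-domain speedup can be illustrated with an ordinary FFT (the sparse Fourier transform mentioned above is a further optimization exploiting the ReLU-induced sparsity; this sketch only shows the spectral equivalence):

```python
import numpy as np

def fft_cross_correlate(feat, kernel):
    """'Valid' cross-correlation computed in the Fourier domain: pad the
    kernel to the feature map size, multiply the feature spectrum by the
    conjugate kernel spectrum (conjugation turns circular convolution into
    circular correlation), invert, and crop the valid region, where the
    circular wrap-around never touches the kernel support."""
    H, W = feat.shape
    kh, kw = kernel.shape
    F = np.fft.rfft2(feat, s=(H, W))
    G = np.fft.rfft2(kernel, s=(H, W))
    full = np.fft.irfft2(F * np.conj(G), s=(H, W))
    return full[:H - kh + 1, :W - kw + 1]
```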

Claims (5)

1. A fast target detection method based on convolutional neural networks, characterized in that it comprises the following steps:
A) A training sample set (xi, yi), i=1, ..., N is prepared, where N is the number of training samples and a natural number; xi denotes the fixed-size image corresponding to a training sample; an image that contains the target and is filled by the target is a positive sample, and all other images are negative samples; yi denotes the sample class vector:
B) All training samples are divided into m batches; m−2 batches are put into the designed convolutional neural network, which is trained with the back-propagation algorithm, and 2 batches are used for testing. The convolutional neural network comprises three components: convolutional layers, max-pooling layers and local contrast normalization layers. These three components act as nonlinear functions whose purpose is to map the original images on the manifold into Euclidean space; the activation function of the convolution is the rectified linear unit. After the network structure model containing these three components is designed, the model is trained with the back-propagation algorithm to obtain the parameters W;
C) The parameters W trained in step B) are extracted, and the information lost by max pooling in the convolutional neural network is recovered by means of expanded maps. Given a test image, a corresponding feature map, called an expanded map, is preserved for each offset within the 2 × 2 pooling kernel. With a max-pooling kernel of size K at each layer and p such layers in the whole convolutional neural network, the number of feature maps of each max-pooling down-sampling layer expands to K² times after expansion, and the whole network expands to o = (K²)^p times. The parameters W are then applied to an image to be detected of arbitrary size to obtain the expanded feature maps before the fully connected layer, called the discriminative complete features. Given an input image x, the trained filter bank K and bias b, the output of a convolutional layer is written in the form of formula one:
wherein Mj denotes the index set of the selected input feature maps; l denotes the index of the current layer; i and j denote the indices of the input and output feature maps respectively; o denotes the fragment index; f denotes the activation function, here the rectified linear unit f(x) = max(x, 0); and * denotes the convolution operation;
The max-pooling layer is written in the form of formula two:
wherein m and n denote the pixel indices of the current layer respectively; s denotes the size of the down-sampling kernel; p and q denote the pixel indices of the previous layer respectively and satisfy p = s·(m−1)+k+1 and q = s·(n−1)+k+1, where 0 < k < s; the offset k is used to select the corresponding columns and rows;
In order that the obtained features are robust, a local contrast normalization layer is introduced into the fragment-processing pipeline; this layer is written in the form of formula three:
wherein r denotes the specified number of neighbouring competing feature maps, N denotes the total number of feature maps of the current layer, and k, α and β are hyperparameters that are assigned suitable floating-point values during training. After the CNN model is trained, the discriminative complete features obtained according to the above three formulas provide support for the subsequent sliding-window-based scanning;
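The three formulas referenced above were rendered as images in the original filing and did not survive extraction. As a hedged reconstruction from the surrounding symbol definitions (these are the standard forms such layers take; the exact filed notation may differ):

```latex
% Formula one: convolutional layer (fragment o, layer l)
x_j^{l,o} = f\Big(\sum_{i \in M_j} x_i^{l-1,o} * K_{ij}^{l} + b_j^{l}\Big)

% Formula two: max-pooling layer
x_{m,n}^{l} = \max_{0 < k < s} x_{p,q}^{l-1}, \quad p = s(m-1)+k+1,\ q = s(n-1)+k+1

% Formula three: local contrast normalization across r neighbouring feature maps
x_{m,n}^{l,j} = x_{m,n}^{l-1,j} \Big/ \Big(k + \alpha \sum_{i=\max(0,\,j-r/2)}^{\min(N-1,\,j+r/2)} \big(x_{m,n}^{l-1,i}\big)^{2}\Big)^{\beta}
```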
D) The fully connected layer of the convolutional neural network is regarded as a linear classifier, and detection is based directly on the discriminative complete feature maps. The generalization error R[ω] of the linear classifier on the discriminative complete features is estimated using the probably approximately correct (PAC) learning framework, and the number q of training models required for the combined model is then calculated from the generalization error and the desired generalization error threshold H[ω];
E) Each of the q trained models performs a feed-forward pass on the image to be detected, and each model obtains o groups of discriminative complete feature maps before the fully connected layer. Every group of discriminative complete feature maps is then scaled with the nearest-neighbor interpolation algorithm, yielding n*o groups of discriminative complete feature maps per model and n*o*q groups in total over the q models. Dense sliding-window classification is then performed directly on every group of discriminative complete feature maps with the linear classifier, producing n*o*q groups of response maps, wherein the dot-product operation between the linear classifier and the discriminative complete maps is converted into a convolution operation;
F) Non-maximum suppression is applied to every o response maps among the n*o*q response maps to obtain n*q response maps; non-maximum suppression is then performed on every n response maps among the n*q response maps to obtain q response maps with true scale; an AND operation is performed on the q response maps to obtain one final response map with true scale, and the centroid of each connected region in the final response map is computed;
G) The centroids and true scales are mapped back onto the original image to be detected, and a corresponding rectangular box is drawn from each centroid position and scale value, completing the target detection.
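The expansion of step C) — keeping one pooled map per starting offset of the pooling kernel, so that max pooling discards no responses — can be illustrated with a toy numpy sketch (the kernel size and input are illustrative, not the claimed network):

```python
import numpy as np

def pool_with_offset(x, K, oy, ox):
    """Max-pool `x` with a KxK kernel whose tiling starts at offset (oy, ox)."""
    x = x[oy:, ox:]
    h, w = (x.shape[0] // K) * K, (x.shape[1] // K) * K
    return x[:h, :w].reshape(h // K, K, w // K, K).max(axis=(1, 3))

K = 2
x = np.arange(36, dtype=float).reshape(6, 6)
# One expanded map per starting offset: K*K = 4 maps for a 2x2 kernel.
expanded = {(oy, ox): pool_with_offset(x, K, oy, ox)
            for oy in range(K) for ox in range(K)}
print(len(expanded))            # 4
print(expanded[(0, 0)][0, 0])   # 7.0 -> max of the window [[0, 1], [6, 7]]
```

Stacking the K² offset maps preserves every pooled response that a single fixed tiling would discard, which is what makes dense detection on the pooled features possible.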
2. The fast target detection method based on convolutional neural networks as claimed in claim 1, characterized in that in step A), preparing the training sample set comprises the following sub-steps:
A1) Image blocks of the given target boxes are extracted from the training images and scaled to the fixed size as positive samples; each given target box yields one sample, giving N0 image blocks, i.e. N0 samples Xi, i=1, ..., N0;
A2) For the negative sample images, image blocks of arbitrary size that do not overlap heavily with the positive sample blocks are extracted around them and scaled to the fixed size as negative samples Xi′; the extraction of negative samples is made as representative as possible so as to cover most scene images; N1 negative samples are extracted in total, i=1, ..., N1;
A3) N=N0+N1
3. The fast target detection method based on convolutional neural networks as claimed in claim 1, characterized in that in step B), training the convolutional neural network in batches comprises the following sub-steps:
B1) The order of the N samples is randomly shuffled when batching, and the samples are then divided into batches;
B2) The convolutional neural network comprises three components: convolutional layers, max-pooling layers and local contrast normalization layers; the first two components are required and the third is optional; the activation function of the convolution is the rectified linear unit; network structures for different target detection tasks are obtained by combining these three components;
B3) The number of filters and feature maps required in the network structure, the filter sizes, the down-sampling kernel sizes, the learning rate of every layer and the hyperparameters required for local contrast normalization are set;
B4) During the training of the convolutional neural network, momentum and the dropout training technique are used;
B5) Training is stopped according to the validation curve;
B6) The parameters W are extracted from the trained model.
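Sub-step B1) — shuffling the sample order and splitting into batches — can be sketched as follows; the sample count and batch size are illustrative assumptions:

```python
import numpy as np

def make_batches(n_samples, batch_size, seed=0):
    """Randomly shuffle the sample indices, then split them into batches (sub-step B1)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

batches = make_batches(n_samples=1000, batch_size=128)
print(len(batches))   # 8 (seven full batches plus a 104-sample remainder)
```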
4. The fast target detection method based on convolutional neural networks as claimed in claim 1, characterized in that in step C), recovering the information lost by max pooling in the convolutional neural network by means of expanded maps comprises the following sub-steps:
C1) When the parameters W extracted in step B) are used in the feed-forward pass over the image to be detected and max pooling would lose discriminative information for a feature map, each offset within the down-sampling kernel of size K is taken as a new starting offset; the down-sampling kernel is then tiled over the whole image to be detected, and each starting offset yields one expanded map, so the current down-sampling layer generates K² expanded maps. The expanded maps obtained from all feature maps with the same offset are called one fragment, so K² fragments of feature maps are generated after down-sampling; if there are z down-sampling layers in the whole network, o = (K²)^z fragments of feature maps are obtained;
C2) The convolution operations following the down-sampling must apply the identical convolution operation to every fragment;
C3) The local contrast normalization layer must apply the identical local contrast normalization operation to every fragment;
C4) During the feed-forward pass over the test image, the (K²)^z fragments of feature maps obtained at the layer before the fully connected layer are called the discriminative complete feature maps.
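Applying the offset pooling of sub-step C1) layer by layer multiplies the fragment count by K² at each of the z down-sampling layers, giving o = (K²)^z. A toy sketch (the input size and layer count are illustrative):

```python
import numpy as np

def expand_layer(fragments, K):
    """Offset max pooling applied to every fragment; each yields K*K new fragments."""
    out = []
    for x in fragments:
        for oy in range(K):
            for ox in range(K):
                t = x[oy:, ox:]
                h, w = (t.shape[0] // K) * K, (t.shape[1] // K) * K
                out.append(t[:h, :w].reshape(h // K, K, w // K, K).max(axis=(1, 3)))
    return out

K, z = 2, 2
frags = [np.random.default_rng(0).standard_normal((16, 16))]
for _ in range(z):
    frags = expand_layer(frags, K)
print(len(frags))   # (K*K)**z = 16 fragments before the fully connected layer
```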
5. The fast target detection method based on convolutional neural networks as claimed in claim 1, characterized in that in step D), detecting directly on the discriminative complete feature maps comprises the following sub-steps:
D1) The (K²)^z fragments of feature maps from step C) are all scaled n times with the nearest-neighbor interpolation algorithm, yielding n*(K²)^z fragments of discriminative complete feature maps;
D2) The linear classifier is convolved with every fragment of the discriminative complete feature maps, yielding n*(K²)^z response maps;
D3) The n*(K²)^z response maps are all scaled to the same size and the scaling ratios are recorded; the final response map and the scaling ratio of the corresponding maximum response map are obtained with the non-maximum suppression algorithm.
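The nearest-neighbor scaling of sub-step D1) can be sketched with a minimal numpy implementation; the map and the target size are illustrative:

```python
import numpy as np

def nearest_neighbor_resize(x, out_h, out_w):
    """Scale a 2-D map with nearest-neighbor interpolation (sub-step D1)."""
    h, w = x.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return x[np.ix_(rows, cols)]

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(nearest_neighbor_resize(x, 4, 4))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```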
CN201510061852.6A 2015-02-06 2015-02-06 Fast target detection method based on convolutional neural networks Expired - Fee Related CN104573731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510061852.6A CN104573731B (en) 2015-02-06 2015-02-06 Fast target detection method based on convolutional neural networks


Publications (2)

Publication Number Publication Date
CN104573731A CN104573731A (en) 2015-04-29
CN104573731B true CN104573731B (en) 2018-03-23

Family

ID=53089751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510061852.6A Expired - Fee Related CN104573731B (en) 2015-02-06 2015-02-06 Fast target detection method based on convolutional neural networks

Country Status (1)

Country Link
CN (1) CN104573731B (en)

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9998130B2 (en) * 2016-07-06 2018-06-12 Hrl Laboratories, Llc Method to perform convolutions between arbitrary vectors using clusters of weakly coupled oscillators
CN104992223B (en) * 2015-06-12 2018-02-16 安徽大学 Intensive Population size estimation method based on deep learning
US10614339B2 (en) 2015-07-29 2020-04-07 Nokia Technologies Oy Object detection with neural network
US10970617B2 (en) 2015-08-21 2021-04-06 Institute Of Automation Chinese Academy Of Sciences Deep convolutional neural network acceleration and compression method based on parameter quantification
CN105160310A (en) * 2015-08-25 2015-12-16 西安电子科技大学 3D (three-dimensional) convolutional neural network based human body behavior recognition method
US10332028B2 (en) * 2015-08-25 2019-06-25 Qualcomm Incorporated Method for improving performance of a trained machine learning model
CN105205453B (en) * 2015-08-28 2019-01-08 中国科学院自动化研究所 Human eye detection and localization method based on depth self-encoding encoder
CN105120130B (en) * 2015-09-17 2018-06-29 京东方科技集团股份有限公司 A kind of image raising frequency system, its training method and image raising frequency method
CN105184271A (en) * 2015-09-18 2015-12-23 苏州派瑞雷尔智能科技有限公司 Automatic vehicle detection method based on deep learning
US10614354B2 (en) * 2015-10-07 2020-04-07 Altera Corporation Method and apparatus for implementing layers on a convolutional neural network accelerator
CN105335716B (en) * 2015-10-29 2019-03-26 北京工业大学 A kind of pedestrian detection method extracting union feature based on improvement UDN
US9965705B2 (en) * 2015-11-03 2018-05-08 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
CN105279556B (en) * 2015-11-05 2017-11-07 国家卫星海洋应用中心 A kind of Enteromorpha detection method and device
CN105426919B (en) * 2015-11-23 2017-11-14 河海大学 The image classification method of non-supervisory feature learning is instructed based on conspicuousness
CN105468335B (en) * 2015-11-24 2017-04-12 中国科学院计算技术研究所 Pipeline-level operation device, data processing method and network-on-chip chip
CN106778604B (en) * 2015-12-15 2020-04-14 西安电子科技大学 Pedestrian re-identification method based on matching convolutional neural network
US10360477B2 (en) * 2016-01-11 2019-07-23 Kla-Tencor Corp. Accelerating semiconductor-related computations using learning based models
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on convolution neural network
CN105740892A (en) * 2016-01-27 2016-07-06 北京工业大学 High-accuracy human body multi-position identification method based on convolutional neural network
CN108475331B (en) * 2016-02-17 2022-04-05 英特尔公司 Method, apparatus, system and computer readable medium for object detection
CN105821538B (en) * 2016-04-20 2018-07-17 广州视源电子科技股份有限公司 The detection method and system of spun yarn fracture
CN111860772B (en) * 2016-04-29 2024-01-16 中科寒武纪科技股份有限公司 Device and method for executing artificial neural network mapping operation
CN106019359A (en) * 2016-05-17 2016-10-12 浪潮集团有限公司 Earthquake prediction system based on neural network
WO2017200597A1 (en) * 2016-05-20 2017-11-23 Google Llc Progressive neural networks
CN109154990B (en) * 2016-06-03 2023-10-03 英特尔公司 Finding convolutional layers in convolutional neural networks
CN106021990B (en) * 2016-06-07 2019-06-25 广州麦仑信息科技有限公司 A method of biological gene is subjected to classification and Urine scent with specific character
CN106096655B (en) * 2016-06-14 2019-08-27 厦门大学 A kind of remote sensing image airplane detection method based on convolutional neural networks
CN106203496B (en) * 2016-07-01 2019-07-12 河海大学 Hydrographic curve extracting method based on machine learning
CN106504233B (en) * 2016-10-18 2019-04-09 国网山东省电力公司电力科学研究院 Unmanned plane inspection image electric power widget recognition methods and system based on Faster R-CNN
CN106778835B (en) * 2016-11-29 2020-03-24 武汉大学 Remote sensing image airport target identification method fusing scene information and depth features
CN106780512B (en) * 2016-11-30 2020-01-17 厦门美图之家科技有限公司 Method, application and computing device for segmenting image
CN106845528A (en) * 2016-12-30 2017-06-13 湖北工业大学 A kind of image classification algorithms based on K means Yu deep learning
CN107038448B (en) * 2017-03-01 2020-02-28 中科视语(北京)科技有限公司 Target detection model construction method
CN108229675B (en) * 2017-03-17 2021-01-01 北京市商汤科技开发有限公司 Neural network training method, object detection method, device and electronic equipment
CN108629354B (en) * 2017-03-17 2020-08-04 杭州海康威视数字技术股份有限公司 Target detection method and device
CN107124609A (en) * 2017-04-27 2017-09-01 京东方科技集团股份有限公司 A kind of processing system of video image, its processing method and display device
CN107220652B (en) * 2017-05-31 2020-05-01 北京京东尚科信息技术有限公司 Method and device for processing pictures
CN107527355B (en) * 2017-07-20 2020-08-11 中国科学院自动化研究所 Visual tracking method and device based on convolutional neural network regression model
CN109325385A (en) * 2017-07-31 2019-02-12 株式会社理光 Target detection and region segmentation method, device and computer readable storage medium
CN107563303B (en) * 2017-08-09 2020-06-09 中国科学院大学 Robust ship target detection method based on deep learning
CN107292886B (en) * 2017-08-11 2019-12-31 厦门市美亚柏科信息股份有限公司 Target object intrusion detection method and device based on grid division and neural network
CN107506774A (en) * 2017-10-09 2017-12-22 深圳市唯特视科技有限公司 A kind of segmentation layered perception neural networks method based on local attention mask
CN107766643B (en) * 2017-10-16 2021-08-03 华为技术有限公司 Data processing method and related device
CN107944354B (en) * 2017-11-10 2021-09-17 南京航空航天大学 Vehicle detection method based on deep learning
WO2019099899A1 (en) * 2017-11-17 2019-05-23 Facebook, Inc. Analyzing spatially-sparse data based on submanifold sparse convolutional neural networks
CN108171796A (en) * 2017-12-25 2018-06-15 燕山大学 A kind of inspection machine human visual system and control method based on three-dimensional point cloud
CN108280453B (en) * 2018-01-08 2020-06-16 西安电子科技大学 Low-power-consumption rapid image target detection method based on deep learning
CN110390344B (en) * 2018-04-19 2021-10-26 华为技术有限公司 Alternative frame updating method and device
CN108830280B (en) * 2018-05-14 2021-10-26 华南理工大学 Small target detection method based on regional nomination
CN108830300A (en) * 2018-05-28 2018-11-16 深圳市唯特视科技有限公司 A kind of object transmission method based on mixing supervisory detection
CN108875819B (en) * 2018-06-08 2020-10-27 浙江大学 Object and component joint detection method based on long-term and short-term memory network
CN109189965A (en) * 2018-07-19 2019-01-11 中国科学院信息工程研究所 Pictograph search method and system
CN109466725B (en) * 2018-10-11 2021-05-18 重庆邮电大学 Intelligent water surface floater fishing system based on neural network and image recognition
CN109376787B (en) * 2018-10-31 2021-02-26 聚时科技(上海)有限公司 Manifold learning network and computer vision image set classification method based on manifold learning network
CN109753903B (en) * 2019-02-27 2020-09-15 北航(四川)西部国际创新港科技有限公司 Unmanned aerial vehicle detection method based on deep learning
CN110135312B (en) * 2019-05-06 2022-05-03 电子科技大学 Rapid small target detection method based on hierarchical LCM
CN110390394B (en) * 2019-07-19 2021-11-05 深圳市商汤科技有限公司 Batch normalization data processing method and device, electronic equipment and storage medium
CN110674829B (en) * 2019-09-26 2023-06-02 哈尔滨工程大学 Three-dimensional target detection method based on graph convolution attention network
CN111612051B (en) * 2020-04-30 2023-06-20 杭州电子科技大学 Weak supervision target detection method based on graph convolution neural network
CN112862195B (en) * 2021-02-19 2023-06-20 金陵科技学院 SFT-ALS-based time series vermicelli fluctuation prediction method
WO2023220892A1 (en) * 2022-05-16 2023-11-23 Intel Corporation Expanded neural network training layers for convolution
CN114657513B (en) * 2022-05-23 2022-09-20 河南银金达新材料股份有限公司 Preparation method of antibacterial regenerated polyester film

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810503A (en) * 2013-12-26 2014-05-21 西北工业大学 Depth study based method for detecting salient regions in natural image
CN104063719A (en) * 2014-06-27 2014-09-24 深圳市赛为智能股份有限公司 Method and device for pedestrian detection based on depth convolutional network
CN104281853A (en) * 2014-09-02 2015-01-14 电子科技大学 Behavior identification method based on 3D convolution neural network
CN104680508A (en) * 2013-11-29 2015-06-03 华为技术有限公司 Convolutional neural network and target object detection method based on convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7274832B2 (en) * 2003-11-13 2007-09-25 Eastman Kodak Company In-plane rotation invariant object detection in digitized images


Also Published As

Publication number Publication date
CN104573731A (en) 2015-04-29

Similar Documents

Publication Publication Date Title
CN104573731B (en) Fast target detection method based on convolutional neural networks
CN109543606B (en) Human face recognition method with attention mechanism
CN107844795B (en) Convolutional neural networks feature extracting method based on principal component analysis
CN109034210A (en) Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
CN104392463B (en) Image salient region detection method based on joint sparse multi-scale fusion
CN111860171B (en) Method and system for detecting irregular-shaped target in large-scale remote sensing image
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN110807422A (en) Natural scene text detection method based on deep learning
CN109446922B (en) Real-time robust face detection method
CN109741318A (en) The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN107944459A (en) A kind of RGB D object identification methods
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN106778768A (en) Image scene classification method based on multi-feature fusion
CN110781962B (en) Target detection method based on lightweight convolutional neural network
CN108520203A (en) Multiple target feature extracting method based on fusion adaptive more external surrounding frames and cross pond feature
CN109948457B (en) Real-time target recognition method based on convolutional neural network and CUDA acceleration
CN105046278B (en) The optimization method of Adaboost detection algorithm based on Haar feature
CN112163498A (en) Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof
CN112528845A (en) Physical circuit diagram identification method based on deep learning and application thereof
CN106529441A (en) Fuzzy boundary fragmentation-based depth motion map human body action recognition method
CN110969101A (en) Face detection and tracking method based on HOG and feature descriptor
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN112102379B (en) Unmanned aerial vehicle multispectral image registration method
CN111160372B (en) Large target identification method based on high-speed convolutional neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180323