CN109583483A - Object detection method and system based on convolutional neural networks - Google Patents

Object detection method and system based on convolutional neural networks

Info

Publication number
CN109583483A
Authority
CN
China
Prior art keywords
feature
anchor box
feature map
box
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811347546.9A
Other languages
Chinese (zh)
Other versions
CN109583483B (en)
Inventor
唐乾坤
胡瑜
金贝贝
曾鸣
曾一鸣
刘世策
叶靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201811347546.9A priority Critical patent/CN109583483B/en
Publication of CN109583483A publication Critical patent/CN109583483A/en
Application granted granted Critical
Publication of CN109583483B publication Critical patent/CN109583483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an object detection method and system based on convolutional neural networks, comprising: extracting convolutional feature maps of an image to be detected using convolution kernels of multiple scales; adjusting the feature vector at each spatial position of the convolutional feature maps using fully connected layers to obtain first feature maps, concatenating them to obtain a concatenated feature map, and adjusting the feature information of each channel of the concatenated feature map using fully connected layers to obtain a second feature map; setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being relative to the coordinate system of the image to be detected; projecting each anchor box onto the second feature map, extracting the features inside the projected anchor boxes using a region feature extraction operation, and selecting the anchor boxes that contain objects as candidate boxes; and using a target recognition network to classify the objects in the candidate boxes and to regress the precise positions and sizes of the candidate boxes.

Description

Object detection method and system based on convolutional neural networks
Technical field
The present invention relates to the technical field of computer vision, and in particular to an enhanced object candidate region generation method and device, and an object detection method.
Background technique
Object detection is one of the fundamental problems of computer vision, and with the development of deep convolutional neural networks its performance has also improved greatly. Currently, the most common object detection methods based on convolutional neural networks follow a two-stage detection pipeline: a proposal generation network is first used to generate candidate boxes (proposals), and a recognition network then classifies the proposals and refines them to obtain the final bounding boxes.
In the two-stage detection pipeline, generating the candidate boxes is the most important step. Two classes of candidate box generation methods currently exist: one uses traditional hand-crafted features, and the other generates candidate boxes based on deep learning. The former obtains candidate boxes using superpixels, the edge information of objects, and the like; the latter uses a fully convolutional network, by means of anchor boxes (anchor box: a set of rectangular boxes with predetermined positions, scales, and aspect ratios; the same below), to simultaneously predict the positions of candidate boxes and judge whether each candidate box contains an object.
Although the quality of the candidate boxes obtained by deep-learning-based generation techniques is better than the results obtained by methods based on traditional hand-crafted features, the candidate boxes they generate still contain a large amount of background rather than real objects. As a result, the bounding boxes finally obtained by the second-stage recognition model cannot reach higher localization accuracy, and the improvement in detection performance is limited. The main limitations of the deep-learning-based generation methods are that a convolution kernel of a single scale is used to extract features for objects of different scales, and that anchor boxes of different scales at the same position of the feature map use identical features, so that the final result is sub-optimal.
Summary of the invention
In view of the limitation that deep-learning-based candidate box generation techniques use a convolution kernel of a single scale to extract features and let anchor boxes of different scales share the same features, which prevents object detection from achieving higher localization accuracy, the present invention proposes an object detection method based on convolutional neural networks, comprising:
step 1: extracting convolutional feature maps of an image to be detected using convolution kernels of multiple scales;
step 2: adjusting the feature vector at each spatial position of each convolutional feature map using fully connected layers to obtain first feature maps;
step 3: concatenating the first feature maps to obtain a concatenated feature map, and adjusting the feature information of each channel of the concatenated feature map using fully connected layers to obtain a second feature map;
step 4: setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being relative to the coordinate system of the image to be detected;
step 5: projecting each anchor box onto the second feature map, extracting the features inside each projected anchor box using a region feature extraction operation, obtaining from those features the probability that the anchor box contains an object, and selecting candidate boxes from all anchor boxes according to the probability values;
step 6: classifying the objects in the candidate boxes using a target recognition network and regressing the precise positions and sizes of the candidate boxes, determining the bounding box of each object from the precise position and size, and outputting the classification results and the bounding boxes as the detection results.
In the object detection method based on convolutional neural networks, step 1 specifically extracts features in parallel using convolution operations with k different convolution kernels.
In the object detection method based on convolutional neural networks, step 2 specifically adjusts the feature vector at each spatial position of the convolutional feature map by the following formulas:

ω_ij = F(d_ij), o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of the convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes element-wise multiplication.
In the object detection method based on convolutional neural networks, adjusting the feature information of each channel of the concatenated feature map in step 3 specifically comprises:
obtaining the feature descriptor a of the channels using global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a), U′ = e ⊙ U

where ⊙ denotes channel-wise multiplication and U′ is the second feature map.
In the object detection method based on convolutional neural networks, step 5 comprises: sorting the anchor boxes according to the probability values, filtering out duplicate anchor boxes using non-maximum suppression, and then selecting the N candidate boxes with the highest probabilities, N being a preset positive integer.
The invention also discloses an object detection system based on convolutional neural networks, comprising:
an extraction module, which extracts convolutional feature maps of an image to be detected using convolution kernels of multiple scales;
a first adjustment module, which adjusts the feature vector at each spatial position of each convolutional feature map using fully connected layers to obtain first feature maps;
a second adjustment module, which concatenates the first feature maps to obtain a concatenated feature map and adjusts the feature information of each channel of the concatenated feature map using fully connected layers to obtain a second feature map;
an anchor box setting module, which sets anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being relative to the coordinate system of the image to be detected;
a candidate box selection module, which projects each anchor box onto the second feature map, extracts the features inside each projected anchor box using a region feature extraction operation, obtains from those features the probability that the anchor box contains an object, and selects candidate boxes from all anchor boxes according to the probability values;
a target detection module, which classifies the objects in the candidate boxes using a target recognition network and regresses the precise positions and sizes of the candidate boxes, determines the bounding box of each object from the precise position and size, and outputs the classification results and the bounding boxes as the detection results.
In the object detection system based on convolutional neural networks, the extraction module specifically extracts features in parallel using convolution operations with k different convolution kernels.
In the object detection system based on convolutional neural networks, the first adjustment module specifically adjusts the feature vector at each spatial position of the convolutional feature map by the following formulas:

ω_ij = F(d_ij), o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of the convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes element-wise multiplication.
In the object detection system based on convolutional neural networks, the second adjustment module adjusting the feature information of each channel of the concatenated feature map specifically comprises:
obtaining the feature descriptor a of the channels using global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a), U′ = e ⊙ U

where ⊙ denotes channel-wise multiplication and U′ is the second feature map.
In the object detection system based on convolutional neural networks, the candidate box selection module sorts the anchor boxes according to the probability values, filters out duplicate anchor boxes using non-maximum suppression, and then selects the N candidate boxes with the highest probabilities, N being a preset positive integer.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The candidate box generation method and device provided by the present invention do not depend on a specific backbone network. Any existing neural network, after its final fully connected layers are removed, can serve as the backbone network for the provided method and device, which can easily be connected directly to the last convolutional layer of the backbone network;
2. The candidate boxes (proposals) generated by the provided method and device are of higher quality, i.e. the proposals rarely contain background information and can locate objects accurately;
3. The provided method and device have a faster processing speed than the prior art;
4. In a two-stage detection pipeline, higher detection accuracy can be obtained by using the candidate boxes (proposals) generated by the provided method and device.
Detailed description of the invention
Fig. 1 is a flowchart of an enhanced candidate box generation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of extracting features using convolution kernels of multiple scales according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of learning an adjustment coefficient for each spatial position of the feature map obtained with the convolution kernel of each scale according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of learning an adjustment coefficient for each channel of the feature map to adaptively adjust the feature information of each channel according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of extracting the features inside the anchor boxes at each spatial position according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an enhanced object candidate region generation device according to an embodiment of the present invention;
Fig. 7 is a flowchart of an object detection method based on the enhanced candidate box generation method provided by the present invention, according to an embodiment of the present invention.
Specific embodiment
In order to make the objectives, technical solutions, design methods, and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
Embodiment 1
Fig. 1 shows a candidate box generation method provided by the present invention, the steps of which are as follows:
S11: extract features using convolution kernels of multiple scales.
In a preferred embodiment, as shown in Fig. 2, convolution kernels of k = 3 scales (1 × 1, 3 × 3, and 5 × 5) are used in a multi-scale layer to extract features. In a concrete implementation, in order to reduce the parameter count and increase the nonlinear expressiveness, a shared convolutional layer with a 1 × 1 kernel is added before the 3 × 3 and 5 × 5 convolutions for dimensionality reduction, and the convolutional layer with the 5 × 5 kernel is further decomposed into two cascaded convolutional layers with 3 × 3 kernels. Fig. 2 also gives the output channel number of each convolution operation in a preferred embodiment; a minimal sketch of this layer is given below.
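For illustration only, the following PyTorch sketch shows one way such a multi-scale convolution layer could be implemented. PyTorch itself, the class name ConvPyramid, and the channel counts in_ch, mid_ch, and out_ch are assumptions, not specified by the patent; the actual output channel numbers are those given in Fig. 2.

```python
# Illustrative sketch of S11, assuming PyTorch; channel counts are placeholders.
import torch.nn as nn

class ConvPyramid(nn.Module):
    def __init__(self, in_ch=512, mid_ch=128, out_ch=128):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)              # 1x1 kernel
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)               # shared 1x1 reduction
        self.branch3 = nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1)  # 3x3 kernel
        self.branch5 = nn.Sequential(                                       # 5x5 as two cascaded 3x3
            nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        r = self.reduce(x)
        # k = 3 feature maps extracted in parallel, one per kernel scale
        return self.branch1(x), self.branch3(r), self.branch5(r)
```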
S12: for each spatial position of the feature map obtained by convolution with the kernel of each scale, learn an adjustment coefficient used to adaptively enhance the useful feature information and simultaneously suppress the useless feature information at each spatial position.
A specific implementation of this step is as follows. As shown in Fig. 3, for a feature map M of height H, width W, and C channels obtained by a convolution operation with a kernel of a given scale, the feature vector at a spatial position (i, j) is d_ij (height × width × channels = 1 × 1 × C), i.e. the vector formed by taking the values of all channels at position (i, j) of the H × W feature map. A nonlinear function F consisting of three cascaded fully connected layers is applied to d_ij to obtain the adjustment coefficient of that spatial position:

ω_ij = F(d_ij),

where F denotes the nonlinear function formed by the three fully connected layers. The adjusted feature vector at that position is then

o_ij = ω_ij ⊙ d_ij,

where ⊙ denotes element-wise multiplication and o_ij is the feature vector at spatial position (i, j) of the adjusted feature map M_out.
In an actual implementation, in order to reduce the number of model parameters and the complexity of the network, the parameters of the three fully connected layers are shared across all spatial positions, so the fully connected layers can be replaced by convolutional layers with 1 × 1 kernels, as sketched below.
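A minimal sketch of this spatial adjustment, assuming PyTorch, is given below. The hidden width and the final sigmoid are illustrative assumptions, since the patent only specifies three cascaded fully connected layers whose parameters are shared across positions (hence the 1 × 1 convolutions).

```python
# Illustrative sketch of S12, assuming PyTorch; hidden width and sigmoid are assumptions.
import torch.nn as nn

class SpatialAdjust(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.F = nn.Sequential(  # F: three cascaded FC layers written as 1x1 convolutions
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid(),
        )

    def forward(self, m):   # m: (B, C, H, W) feature map of one kernel scale
        w = self.F(m)        # w_ij = F(d_ij), computed at every position at once
        return w * m         # o_ij = w_ij ⊙ d_ij
```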
S13: concatenate the adjusted convolutional features of every scale together.
S14: for the concatenated feature map, learn an adjustment coefficient for each channel according to the feature distribution of that channel, in order to adaptively adjust the feature information of each channel. It should be noted that the feature information here differs from a feature vector: the feature information refers to the features expressed by each channel, which are adjusted with the learned coefficient, whereas a feature vector denotes a single column of data; what S14 adjusts is the H × W data of each channel.
A specific implementation of this step is as follows. As shown in Fig. 4, let the input feature map be U. First, global average pooling is used to obtain the feature descriptor of the channels:

a = global_pooling(U),

where global_pooling denotes global average pooling. Then three cascaded fully connected layers are used as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a),

where F denotes the nonlinear function formed by the three fully connected layers. The feature map after each channel is adjusted is therefore

U′ = e ⊙ U,

where ⊙ denotes channel-wise multiplication. A sketch of this channel adjustment is given below.
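A minimal sketch of the channel adjustment, assuming PyTorch, follows; as above, the hidden width and the final sigmoid are illustrative assumptions beyond the three cascaded fully connected layers the patent specifies.

```python
# Illustrative sketch of S14, assuming PyTorch; hidden width and sigmoid are assumptions.
import torch.nn as nn
import torch.nn.functional as F

class ChannelAdjust(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.fc = nn.Sequential(  # three cascaded fully connected layers
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid(),
        )

    def forward(self, u):                            # u: (B, C, H, W) concatenated map U
        a = F.adaptive_avg_pool2d(u, 1).flatten(1)   # a = global_pooling(U), shape (B, C)
        e = self.fc(a)                               # e = F(a), one coefficient per channel
        return u * e[:, :, None, None]               # U' = e ⊙ U (broadcast over H, W)
```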
S15: on each spatial position of the feature map whose channel feature information has been adjusted, set anchor boxes of different scales and aspect ratios, the coordinates and sizes of the anchor boxes being relative to the coordinate system of the original input image. Each position of the feature map serves as the center coordinate of its anchor boxes, and the corresponding coordinate on the input image is obtained by multiplying the position by the downsampling stride. This is done mainly because the annotated boxes exist on the original image, so projecting the anchor boxes onto the input image makes it convenient to compute the training targets; a sketch of this anchor generation is given below.
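The following sketch, assuming PyTorch, illustrates how anchor boxes could be generated in image coordinates from feature map positions. The stride of 16 is a typical backbone downsampling step and is an assumption; the scales and aspect ratios follow the 9-anchor example of S16.

```python
# Illustrative sketch of S15, assuming PyTorch; stride 16 is a placeholder.
import torch

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    ys = (torch.arange(feat_h, dtype=torch.float32) + 0.5) * stride  # centers in image coords
    xs = (torch.arange(feat_w, dtype=torch.float32) + 0.5) * stride
    cy, cx = torch.meshgrid(ys, xs, indexing="ij")
    boxes = []
    for s in scales:
        for r in ratios:                      # r = width / height, box area = s * s
            w, h = s * r ** 0.5, s / r ** 0.5
            boxes.append(torch.stack(
                [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1))
    return torch.stack(boxes, dim=2).reshape(-1, 4)  # (feat_h * feat_w * 9, 4), x1y1x2y2
```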
S16: project each anchor box onto the feature map whose channel feature information has been adjusted, and use a region feature extraction operation to extract the features inside each projected anchor box.
The concrete operation of this step is illustrated in Fig. 5. In a preferred embodiment, anchor boxes of 3 scales and 3 aspect ratios are set in advance at each spatial position, and the anchor boxes are then projected onto the output feature map U′ of the channel adjustment module. A region feature extraction method φ is used to extract the features contained in each projected anchor box; RoI pooling (candidate region pooling) is a simple and effective region feature extraction method that can be used here. In this embodiment, in order to reduce the number of parameters, the anchor boxes are grouped by aspect ratio, and boxes of the same aspect ratio extract features of the same size. For example, using 3 scales of 128², 256², and 512² pixels and aspect ratios of 1:2, 1:1, and 2:1, i.e. 9 kinds of anchor boxes in total, feature information of sizes 5 × 11, 7 × 7, and 11 × 5 can be extracted for the anchor boxes at each spatial position. After this feature information is obtained, two fully connected layers can be used for further processing, and the processed feature maps are then concatenated into a single feature map. In an actual implementation, the parameters of the fully connected layers for anchor boxes of the same aspect ratio can be shared across all spatial positions, so the fully connected layers can be converted into convolutional layer operations. A sketch of the grouped region feature extraction follows.
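A sketch of the grouped region feature extraction, assuming PyTorch and torchvision's roi_pool, is given below. The function name extract_anchor_features and the single-image batch are assumptions; the pooled (height, width) sizes follow the 5 × 11 / 7 × 7 / 11 × 5 example above.

```python
# Illustrative sketch of S16, assuming PyTorch/torchvision and a batch of one image.
import torch
from torchvision.ops import roi_pool

def extract_anchor_features(feat, anchors_by_ratio, stride=16):
    """feat: (1, C, H, W) adjusted feature map U'.
    anchors_by_ratio: dict mapping ratio name -> (N, 4) boxes in image coordinates."""
    out_sizes = {"1:2": (5, 11), "1:1": (7, 7), "2:1": (11, 5)}
    feats = {}
    for ratio, boxes in anchors_by_ratio.items():
        # torchvision's roi_pool expects rows of (batch_idx, x1, y1, x2, y2)
        rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)
        feats[ratio] = roi_pool(feat, rois, output_size=out_sizes[ratio],
                                spatial_scale=1.0 / stride)  # project boxes to the map
    return feats  # anchors of the same aspect ratio share a pooled size (and parameters)
```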
S17: after the features inside each anchor box are extracted, connect two parallel network layers that are used, respectively, to regress the position of the candidate box and to judge whether the candidate box contains an object.
Among the candidate boxes generated by the proposed candidate box generation method, the boxes can be sorted according to the output probability of containing an object; after duplicate candidate boxes are filtered out with non-maximum suppression, the N candidate boxes (proposals) with the highest probabilities are selected, as sketched below.
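As an illustration, the selection step could look as follows with torchvision's nms. The IoU threshold of 0.7 and N = 300 are assumptions, since the patent only requires N to be a preset positive integer.

```python
# Illustrative sketch of the proposal selection, assuming PyTorch/torchvision.
from torchvision.ops import nms

def select_proposals(boxes, obj_probs, n=300, iou_thresh=0.7):
    # boxes: (A, 4) regressed candidate boxes; obj_probs: (A,) object probabilities
    keep = nms(boxes, obj_probs, iou_thresh)  # indices, sorted by decreasing score
    keep = keep[:n]                           # N highest-probability proposals
    return boxes[keep], obj_probs[keep]
```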
Embodiment 2
An embodiment of the present invention also provides an enhanced object candidate region generation device. As shown in Fig. 6, the device comprises a convolution pyramid module 21, a spatial adjustment module 22, a spatially adjusted feature concatenation module 23, a channel adjustment module 24, a feature adaptation module 25, and a classification and regression module 26.
The convolution pyramid module 21 extracts features using convolution kernels of multiple scales. The spatial adjustment module 22 learns, for each spatial position of the feature map obtained by convolution with the kernel of each scale, an adjustment coefficient used to adaptively enhance the useful feature information and simultaneously suppress the useless feature information at each spatial position. The spatially adjusted feature concatenation module 23 concatenates the adjusted convolutional features obtained with the kernels of every scale. The channel adjustment module 24 learns, for the concatenated feature map, an adjustment coefficient for each channel according to the feature distribution of that channel, in order to adaptively adjust the feature information of each channel. The feature adaptation module 25 sets anchor boxes of different scales for each spatial position, projects each anchor box onto the feature map whose channel feature information has been adjusted, and extracts the features inside each projected anchor box using a region feature extraction operation. The classification and regression module 26 connects, after the features inside each anchor box are extracted, two parallel network layers used respectively to regress the position of the candidate box and to judge whether the candidate box contains an object.
In the object candidate region generation device provided by this embodiment of the present invention, the working process of each module shares the same technical features as the candidate box generation method above, so the same functions can likewise be achieved and are not described again here.
Embodiment 3
An embodiment of the present invention provides an object detection method based on the candidate boxes generated in Embodiment 1, comprising the following steps:
S31: obtain an image to be detected;
S32: input the image into an object detection network, where the object detection network comprises the enhanced candidate box generation network described in Embodiment 1 and a target recognition network;
S321: the candidate box generation network generates candidate boxes (proposals) that may contain objects;
S322: the target recognition network classifies the proposals that may contain objects to obtain the specific category of the object in each proposal;
S323: the target recognition network performs regression on the proposals that may contain objects to obtain the estimated bounding box size of the object in each proposal. A high-level sketch of this pipeline is given below.
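For orientation only, the following high-level sketch shows how the two-stage pipeline of this embodiment fits together; backbone, proposal_net, and recognition_net are placeholder callables standing for the backbone network, the Embodiment 1 candidate box generation network, and the target recognition network, and are assumptions rather than names used by the patent.

```python
# High-level sketch of the two-stage pipeline of Embodiment 3, assuming PyTorch.
import torch

@torch.no_grad()
def detect(image, backbone, proposal_net, recognition_net):
    feat = backbone(image)                             # shared convolutional features
    proposals, scores = proposal_net(feat)             # S321: boxes that may contain objects
    labels, bboxes = recognition_net(feat, proposals)  # S322/S323: classify and regress
    return labels, bboxes, scores
```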
A system embodiment corresponds to the above method embodiments, and the two can be implemented in cooperation with each other. The related technical details mentioned in the above embodiments remain valid for the system embodiment, whose modules are those of the object detection system described in the Summary of the invention; to reduce repetition they are not described again here. Correspondingly, the related technical details mentioned for the system embodiment also apply to the above embodiments.

Claims (10)

1. An object detection method based on convolutional neural networks, characterized by comprising:
step 1: extracting convolutional feature maps of an image to be detected using convolution kernels of multiple scales;
step 2: adjusting the feature vector at each spatial position of each convolutional feature map using fully connected layers to obtain first feature maps;
step 3: concatenating the first feature maps to obtain a concatenated feature map, and adjusting the feature information of each channel of the concatenated feature map using fully connected layers to obtain a second feature map;
step 4: setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map;
step 5: projecting each anchor box onto the second feature map, extracting the features inside each projected anchor box using a region feature extraction operation, obtaining from those features the probability that the anchor box contains an object, and selecting candidate boxes from all anchor boxes according to the probability values;
step 6: classifying the objects in the candidate boxes using a target recognition network and regressing the precise positions and sizes of the candidate boxes, determining the bounding box of each object from the precise position and size, and outputting the classification results and the bounding boxes as the detection results.
2. The object detection method based on convolutional neural networks according to claim 1, characterized in that step 1 specifically extracts features in parallel using convolution operations with k different convolution kernels.
3. The object detection method based on convolutional neural networks according to claim 1, characterized in that step 2 specifically adjusts the feature vector at each spatial position of the convolutional feature map by the following formulas:

ω_ij = F(d_ij), o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of the convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes element-wise multiplication.
4. The object detection method based on convolutional neural networks according to claim 1 or 3, characterized in that adjusting the feature information of each channel of the concatenated feature map in step 3 specifically comprises:
obtaining the feature descriptor a of the channels using global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a), U′ = e ⊙ U

where ⊙ denotes channel-wise multiplication and U′ is the second feature map.
5. The object detection method based on convolutional neural networks according to claim 4, characterized in that step 5 comprises: sorting the anchor boxes according to the probability values, filtering out duplicate anchor boxes using non-maximum suppression, and then selecting the N candidate boxes with the highest probabilities, N being a preset positive integer.
6. An object detection system based on convolutional neural networks, characterized by comprising:
an extraction module, which extracts convolutional feature maps of an image to be detected using convolution kernels of multiple scales;
a first adjustment module, which adjusts the feature vector at each spatial position of each convolutional feature map using fully connected layers to obtain first feature maps;
a second adjustment module, which concatenates the first feature maps to obtain a concatenated feature map and adjusts the feature information of each channel of the concatenated feature map using fully connected layers to obtain a second feature map;
an anchor box setting module, which sets anchor boxes of different scales and aspect ratios at each spatial position of the second feature map;
a candidate box selection module, which projects each anchor box onto the second feature map, extracts the features inside each projected anchor box using a region feature extraction operation, obtains from those features the probability that the anchor box contains an object, and selects candidate boxes from all anchor boxes according to the probability values;
a target detection module, which classifies the objects in the candidate boxes using a target recognition network and regresses the precise positions and sizes of the candidate boxes, determines the bounding box of each object from the precise position and size, and outputs the classification results and the bounding boxes as the detection results.
7. The object detection system based on convolutional neural networks according to claim 6, characterized in that the extraction module specifically extracts features in parallel using convolution operations with k different convolution kernels.
8. The object detection system based on convolutional neural networks according to claim 6, characterized in that the first adjustment module adjusts the feature vector at each spatial position of the convolutional feature map by the following formulas:

ω_ij = F(d_ij), o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of the convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes element-wise multiplication.
9. The object detection system based on convolutional neural networks according to claim 6 or 8, characterized in that the second adjustment module adjusting the feature information of each channel of the concatenated feature map specifically comprises:
obtaining the feature descriptor a of the channels using global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a), U′ = e ⊙ U

where ⊙ denotes channel-wise multiplication and U′ is the second feature map.
10. The object detection system based on convolutional neural networks according to claim 9, characterized in that the candidate box selection module sorts the anchor boxes according to the probability values, filters out duplicate anchor boxes using non-maximum suppression, and then selects the N candidate boxes with the highest probabilities, N being a preset positive integer.
CN201811347546.9A 2018-11-13 2018-11-13 Target detection method and system based on convolutional neural network Active CN109583483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811347546.9A CN109583483B (en) 2018-11-13 2018-11-13 Target detection method and system based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811347546.9A CN109583483B (en) 2018-11-13 2018-11-13 Target detection method and system based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN109583483A (en) 2019-04-05
CN109583483B (en) 2020-12-11

Family

ID=65922216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811347546.9A Active CN109583483B (en) 2018-11-13 2018-11-13 Target detection method and system based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109583483B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646243B1 (en) * 2016-09-12 2017-05-09 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
US20180075338A1 (en) * 2016-09-12 2018-03-15 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
CN107316058A (en) * 2017-06-15 2017-11-03 国家新闻出版广电总局广播科学研究院 Improve the method for target detection performance by improving target classification and positional accuracy
CN107680678A (en) * 2017-10-18 2018-02-09 北京航空航天大学 Based on multiple dimensioned convolutional neural networks Thyroid ultrasound image tubercle auto-check system
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108647585A (en) * 2018-04-20 2018-10-12 浙江工商大学 A kind of traffic mark symbol detection method based on multiple dimensioned cycle attention network
CN108765387A (en) * 2018-05-17 2018-11-06 杭州电子科技大学 Based on Faster RCNN mammary gland DBT image lump automatic testing methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shaoqing Ren et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv:1506.01497v3 [cs.CV] *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832328A (en) * 2019-04-15 2020-10-27 北京京东尚科信息技术有限公司 Bar code detection method, bar code detection device, electronic equipment and medium
CN111832328B (en) * 2019-04-15 2024-07-16 北京京东乾石科技有限公司 Bar code detection method, device, electronic equipment and medium
CN110070122A (en) * 2019-04-15 2019-07-30 沈阳理工大学 Convolutional neural network blurred-image classification method based on image enhancement
CN110070122B (en) * 2019-04-15 2022-05-06 沈阳理工大学 Convolutional neural network fuzzy image classification method based on image enhancement
CN110276345A (en) * 2019-06-05 2019-09-24 北京字节跳动网络技术有限公司 Convolutional neural networks model training method, device and computer readable storage medium
CN110276345B (en) * 2019-06-05 2021-09-17 北京字节跳动网络技术有限公司 Convolutional neural network model training method and device and computer readable storage medium
CN110427940A (en) * 2019-08-05 2019-11-08 山东浪潮人工智能研究院有限公司 Method of generating pre-selection boxes for an object detection model
CN111723632A (en) * 2019-11-08 2020-09-29 珠海达伽马科技有限公司 Ship tracking method and system based on twin network
CN111723632B (en) * 2019-11-08 2023-09-15 珠海达伽马科技有限公司 Ship tracking method and system based on twin network
CN111382695A (en) * 2020-03-06 2020-07-07 北京百度网讯科技有限公司 Method and apparatus for detecting boundary points of object
CN111401215A (en) * 2020-03-12 2020-07-10 杭州涂鸦信息技术有限公司 Method and system for detecting multi-class targets
CN111401215B (en) * 2020-03-12 2023-10-31 杭州涂鸦信息技术有限公司 Multi-class target detection method and system
CN111461145A (en) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 Method for detecting target based on convolutional neural network
CN111563441B (en) * 2020-04-29 2023-03-24 上海富瀚微电子股份有限公司 Anchor point generation matching method for target detection
CN111563441A (en) * 2020-04-29 2020-08-21 上海富瀚微电子股份有限公司 Anchor point generation matching method for target detection
CN111709377A (en) * 2020-06-18 2020-09-25 苏州科达科技股份有限公司 Feature extraction method, target re-identification method and device and electronic equipment
CN111951268A (en) * 2020-08-11 2020-11-17 长沙大端信息科技有限公司 Parallel segmentation method and device for brain ultrasonic images
CN111951268B (en) * 2020-08-11 2024-06-07 深圳蓝湘智影科技有限公司 Brain ultrasound image parallel segmentation method and device
CN111931877B (en) * 2020-10-12 2021-01-05 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium
CN111931877A (en) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium
CN112926595A (en) * 2021-02-04 2021-06-08 深圳市豪恩汽车电子装备股份有限公司 Training device for deep learning neural network model, target detection system and method
CN112926595B (en) * 2021-02-04 2022-12-02 深圳市豪恩汽车电子装备股份有限公司 Training device of deep learning neural network model, target detection system and method
CN113780355A (en) * 2021-08-12 2021-12-10 上海理工大学 Deep convolutional neural network learning method for deep sea submersible propeller fault identification
CN113780355B (en) * 2021-08-12 2024-02-09 上海理工大学 Deep convolution neural network learning method for fault identification of deep sea submersible propeller

Also Published As

Publication number Publication date
CN109583483B (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN109583483A (en) A kind of object detection method and system based on convolutional neural networks
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN109859190B (en) Target area detection method based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN114202672A (en) Small target detection method based on attention mechanism
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN108960404B (en) Image-based crowd counting method and device
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN111652273B (en) Deep learning-based RGB-D image classification method
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN111768415A (en) Image instance segmentation method without quantization pooling
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN111461213A (en) Training method of target detection model and target rapid detection method
CN107944403A (en) Pedestrian's attribute detection method and device in a kind of image
CN107025444A (en) Piecemeal collaboration represents that embedded nuclear sparse expression blocks face identification method and device
CN113496480A (en) Method for detecting weld image defects
CN110751195A (en) Fine-grained image classification method based on improved YOLOv3
CN115631344A (en) Target detection method based on feature adaptive aggregation
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN116645592A (en) Crack detection method based on image processing and storage medium
CN117333845A (en) Real-time detection method for small target traffic sign based on improved YOLOv5s
CN113486879B (en) Image area suggestion frame detection method, device, equipment and storage medium
CN114926498A (en) Rapid target tracking method based on space-time constraint and learnable feature matching
CN111104539A (en) Fine-grained vehicle image retrieval method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant