CN109583483A - Object detection method and system based on convolutional neural networks - Google Patents
- Publication number
- CN109583483A (application number CN201811347546.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- anchor
- feature map
- box
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to an object detection method and system based on convolutional neural networks, comprising: extracting convolutional feature maps of an input image with convolution kernels of multiple scales; adjusting the feature vector at each spatial position of the convolutional feature maps with fully connected layers to obtain first feature maps; concatenating them into a concatenated feature map and adjusting the information in each of its channels with fully connected layers to obtain a second feature map; setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, where the coordinates and sizes of the anchor boxes are given in the coordinate system of the input image; projecting each anchor box onto the second feature map, extracting the features inside each projected anchor box with a region feature extraction operation, and selecting the anchor boxes that contain objects as target candidate boxes; and using a target recognition network to classify the objects in the target candidate boxes and regress the candidate boxes' precise locations and sizes.
Description
Technical field
The present invention relates to the technical field of computer vision, and in particular to an enhanced object candidate region generation method and apparatus, and an object detection method.
Background art

Object detection is one of the fundamental problems in computer vision, and with the development of deep convolutional neural networks its performance has improved greatly. The most common object detection methods based on convolutional neural networks follow a two-stage pipeline: a proposal generation network first produces candidate boxes (proposals), and a recognition network then classifies the proposals and refines them into the final bounding boxes.

In the two-stage pipeline, generating the candidate boxes is the most important step. Two classes of candidate box generation methods currently exist: one uses traditional hand-crafted features, the other generates candidate boxes with deep learning. The former obtains candidate boxes from superpixels, object edge information, and the like; the latter uses a fully convolutional network with anchor boxes (anchor boxes: a set of rectangles with predefined positions, scales, and aspect ratios, likewise hereinafter) to simultaneously predict the positions of candidate boxes and judge whether each candidate box contains an object.

Although deep-learning-based candidate box generation yields better candidate boxes than methods based on traditional hand-crafted features, the generated candidate boxes still contain a large amount of background rather than real objects. As a result, the bounding boxes finally produced by the second-stage recognition model cannot reach higher localization accuracy, which limits the improvement of detection performance. The main limitation of deep-learning-based generation is that a single-scale convolution kernel is used to extract features for objects of different scales, while anchor boxes of different scales at the same feature-map position share the same features, so the final result is sub-optimal.
Summary of the invention
Aiming at the limitation that deep-learning-based candidate box generation techniques extract features with a single-scale convolution kernel and share the same features among anchor boxes of different scales, which prevents object detection from reaching higher localization accuracy, the present invention proposes an object detection method based on convolutional neural networks, comprising:

Step 1: extracting convolutional feature maps of the input image with convolution kernels of multiple scales;

Step 2: adjusting the feature vector at each spatial position of the convolutional feature maps with fully connected layers to obtain first feature maps;

Step 3: concatenating the first feature maps into a concatenated feature map, and adjusting the information in each channel of the concatenated feature map with fully connected layers to obtain a second feature map;

Step 4: setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being given in the coordinate system of the input image;

Step 5: projecting each anchor box onto the second feature map, extracting the features inside each projected anchor box with a region feature extraction operation, obtaining from those features the probability that the anchor box contains an object, and selecting target candidate boxes from all anchor boxes according to the probability values;

Step 6: using a target recognition network to classify the objects in the target candidate boxes and regress the candidate boxes' precise locations and sizes, determining the bounding box of each object from the precise location and size, and outputting the classification results and the bounding boxes as the detection results.
In the above object detection method, step 1 specifically extracts features in parallel with convolution operations using k different convolution kernels.
In the above object detection method, step 2 adjusts the feature vector at each spatial position of the convolutional feature maps specifically by:

ω_ij = F(d_ij)

o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of a convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes the element-wise product.
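The per-position adjustment of step 2 can be sketched in NumPy. The ReLU and sigmoid activations and the square weight shapes are illustrative assumptions; the patent only specifies three cascaded fully connected layers whose parameters are shared across spatial positions:

```python
import numpy as np

def spatial_adjust(feat, W1, W2, W3):
    """Adjust the C-dim feature vector d_ij at every position (i, j):
    w_ij = F(d_ij) via three FC layers (weights shared across positions),
    then o_ij = w_ij * d_ij element-wise."""
    H, W, C = feat.shape
    d = feat.reshape(-1, C)                  # one row per spatial position
    h = np.maximum(d @ W1, 0.0)              # FC 1 + ReLU (assumed)
    h = np.maximum(h @ W2, 0.0)              # FC 2 + ReLU (assumed)
    w = 1.0 / (1.0 + np.exp(-(h @ W3)))      # FC 3 + sigmoid gate (assumed)
    return (w * d).reshape(H, W, C)
```

Because the gate lies in (0, 1), each adjusted component never exceeds the original in magnitude, which matches the "enhance useful, suppress useless" reading of the coefficients.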
In the above object detection method, the adjustment of each channel of the concatenated feature map in step 3 specifically includes:

obtaining the feature descriptor a of the channels with global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;

using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a)

U′ = e ⊙ U

where ⊙ denotes the element-wise product and U′ is the second feature map.
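Under the same assumptions about activations, the channel adjustment of step 3 is a squeeze-and-excitation-style gate: one coefficient per channel, broadcast over all spatial positions. A minimal NumPy sketch:

```python
import numpy as np

def channel_adjust(U, W1, W2, W3):
    """a = global average pool of U over (H, W); e = F(a) via three FC
    layers; U' = e * U, broadcasting one coefficient per channel."""
    a = U.mean(axis=(0, 1))                  # a = global_pooling(U), shape (C,)
    h = np.maximum(a @ W1, 0.0)              # FC 1 + ReLU (assumed)
    h = np.maximum(h @ W2, 0.0)              # FC 2 + ReLU (assumed)
    e = 1.0 / (1.0 + np.exp(-(h @ W3)))      # FC 3 + sigmoid (assumed)
    return e * U                             # U' = e ⊙ U
```

Note the contrast with step 2: there each spatial position gets its own C-dim coefficient vector, here each channel gets a single scalar applied to all H × W values.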
In the above object detection method, step 5 includes: sorting the anchor boxes according to the probability values, filtering out duplicate anchor boxes with non-maximum suppression, and selecting the N target candidate boxes with the highest probabilities, N being a preset positive integer.
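Step 5's selection (sort by objectness, suppress duplicates, keep the top N) can be sketched as follows; the IoU threshold of 0.7 is an illustrative choice, not a value fixed by the patent:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_proposals(boxes, scores, n, iou_thresh=0.7):
    """Sort anchors by objectness score, greedily apply non-maximum
    suppression, and return the indices of the top-n surviving boxes."""
    order = np.argsort(scores)[::-1]         # highest probability first
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(int(i))
        if len(keep) == n:
            break
    return keep
```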
The invention also discloses an object detection system based on convolutional neural networks, comprising:

an extraction module, which extracts convolutional feature maps of the input image with convolution kernels of multiple scales;

a first adjustment module, which adjusts the feature vector at each spatial position of the convolutional feature maps with fully connected layers to obtain first feature maps;

a second adjustment module, which concatenates the first feature maps into a concatenated feature map and adjusts the information in each channel of the concatenated feature map with fully connected layers to obtain a second feature map;

an anchor setting module, which sets anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being given in the coordinate system of the input image;

a candidate box selection module, which projects each anchor box onto the second feature map, extracts the features inside each projected anchor box with a region feature extraction operation, obtains from those features the probability that the anchor box contains an object, and selects target candidate boxes from all anchor boxes according to the probability values;

an object detection module, which uses a target recognition network to classify the objects in the target candidate boxes and regress the candidate boxes' precise locations and sizes, determines the bounding box of each object from the precise location and size, and outputs the classification results and the bounding boxes as the detection results.

In the above object detection system, the extraction module specifically extracts features in parallel with convolution operations using k different convolution kernels.

In the above object detection system, the first adjustment module adjusts the feature vector at each spatial position of the convolutional feature maps specifically by:

ω_ij = F(d_ij)

o_ij = ω_ij ⊙ d_ij

where d_ij is the feature vector at spatial position (i, j) of a convolutional feature map, the nonlinear function F consists of three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes the element-wise product.

In the above object detection system, the adjustment of each channel of the concatenated feature map by the second adjustment module specifically includes:

obtaining the feature descriptor a of the channels with global average pooling:

a = global_pooling(U)

where U is the concatenated feature map and global_pooling denotes global average pooling;

using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a)

U′ = e ⊙ U

where ⊙ denotes the element-wise product and U′ is the second feature map.

In the above object detection system, the candidate box selection module sorts the anchor boxes according to the probability values, filters out duplicate anchor boxes with non-maximum suppression, and selects the N target candidate boxes with the highest probabilities, N being a preset positive integer.
Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. The target candidate generation method and apparatus provided by the present invention do not depend on a specific backbone network. Any existing neural network with its last fully connected layers removed can serve as the backbone; the method and apparatus can be directly connected to the last convolutional layer of the backbone;

2. The target candidate boxes (proposals) generated by the provided method and apparatus have higher quality, i.e., the proposals rarely contain background information and can locate objects accurately;

3. The provided method and apparatus have a faster processing speed than the prior art;

4. In a two-stage detection pipeline, using the generated target candidate boxes (proposals) yields higher detection accuracy.
Brief description of the drawings

Fig. 1 is a flowchart of an enhanced target candidate generation method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of extracting features with convolution kernels of multiple scales according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of learning an adjustment coefficient for each spatial position of the feature map produced by the convolution kernel of each scale according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of learning an adjustment coefficient for each channel of the feature map, used to adaptively adjust the information of each channel, according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of extracting the features inside the anchor boxes at each spatial position according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of an enhanced object candidate region generating apparatus according to an embodiment of the present invention;

Fig. 7 is a flowchart of an object detection method based on the enhanced target candidate generation method provided by the present invention, according to an embodiment of the present invention.
Detailed description of the embodiments
In order to make the purpose, technical solutions, design approach, and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein only explain the present invention and are not intended to limit it.
Embodiment 1
Fig. 1 shows a target candidate box generation method provided by the present invention, whose steps are:

S11: extracting features with convolution kernels of multiple scales.

In a preferred embodiment, as shown in Fig. 2, convolution kernels of k = 3 scales, 1 × 1, 3 × 3, and 5 × 5, are used to extract features in a multi-scale layer. In a concrete implementation, to reduce the number of parameters and increase nonlinear expressiveness, a shared 1 × 1 convolutional layer is added before the 3 × 3 and 5 × 5 convolutions for dimensionality reduction, and the 5 × 5 convolutional layer is further split into two cascaded 3 × 3 convolutional layers. Fig. 2 also gives the number of output channels of each convolution operation in a preferred embodiment.
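The multi-scale branch layout of S11 can be sketched for a single-channel map. The identity-style kernels in the test are purely illustrative; the point is that two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as one 5 × 5 kernel with fewer parameters (18 vs. 25 weights per channel):

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded 2-D convolution of a single-channel map x with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def conv_pyramid(x, k1, k3a, k3b):
    """Parallel multi-scale branches (S11 sketch): 1x1, 3x3, and a 5x5
    receptive field realized as two cascaded 3x3 convolutions."""
    b1 = conv2d(x, k1)                 # 1x1 branch
    b3 = conv2d(x, k3a)                # 3x3 branch
    b5 = conv2d(conv2d(x, k3a), k3b)   # two 3x3 = 5x5 receptive field
    return b1, b3, b5
```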
S12: for the feature map produced by the convolution kernel of each scale, learning an adjustment coefficient for each spatial position, used to adaptively enhance the useful feature information at each spatial position while suppressing the useless information.

A specific implementation of this step is, as shown in Fig. 3: for a feature map M of height H, width W, and channel number C obtained by convolving the input with a kernel of a given scale, the feature vector at spatial position (i, j) is d_ij (height × width × channels = 1 × 1 × C), i.e., the values of all channels at position (i, j) of the H × W feature map taken together. The nonlinear function F, composed of three cascaded fully connected layers, is applied to d_ij to obtain the adjustment coefficient ω_ij of that spatial position:

ω_ij = F(d_ij)

where F denotes the nonlinear function constituted by the three fully connected layers.

The adjusted feature vector at that position is then

o_ij = ω_ij ⊙ d_ij

where ⊙ denotes the element-wise product and o_ij is the feature vector at spatial position (i, j) of the adjusted feature map M_out.

In an actual implementation, to reduce the number of parameters and the complexity of the network, the parameters of the three fully connected layers are shared across spatial positions; the fully connected layers can then be replaced by convolutional layers with 1 × 1 kernels.
S13: concatenating the adjusted convolutional features of all scales;

S14: for the concatenated feature map, learning an adjustment coefficient for each channel according to the feature distribution of that channel, used to adaptively adjust the information of each channel. Note that channel information differs from a feature vector: the learned adjustment coefficient here scales the features expressed by a channel, i.e., the H × W values of that channel, whereas a feature vector is a single column of data.

A specific implementation of this step, as shown in Fig. 4: for the input feature map U, first obtain the feature descriptor of the channels with global average pooling:

a = global_pooling(U)

where global_pooling denotes global average pooling.

Then use three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:

e = F(a)

where F denotes the nonlinear function constituted by the three fully connected layers.

The feature map after adjusting each channel is therefore:

U′ = e ⊙ U

where ⊙ denotes the element-wise product.
S15: setting anchor boxes of different scales and aspect ratios at each spatial position of the feature map whose channel information has been adjusted. The coordinates and sizes of the anchor boxes are given in the coordinate system of the original input image: each position of the feature map serves as the center of anchor boxes, and its coordinate on the input image is obtained by multiplying the position by the down-sampling stride. This is done mainly because the annotated boxes exist on the original image, so projecting the anchor boxes onto the input image makes it convenient to compute the training targets.
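The mapping from feature-map cells to image-coordinate anchor boxes in S15 can be sketched as below. Taking the cell center as index × stride and the ratio convention r = height / width are illustrative assumptions; the patent only states that the center coordinate is obtained by multiplying by the down-sampling stride:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Anchor boxes of every scale/ratio centred at each feature-map cell,
    expressed in input-image coordinates (x1, y1, x2, y2).
    Centre of cell (i, j) = (j * stride, i * stride)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cy, cx = i * stride, j * stride
            for s in scales:
                for r in ratios:               # r = height / width (assumed)
                    h = s * np.sqrt(r)         # preserves area s^2 per scale
                    w = s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)
```

With 3 scales and 3 ratios this yields the 9 anchors per position used in the preferred embodiment.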
S16: projecting each anchor box onto the feature map whose channel information has been adjusted, and extracting the features inside each projected anchor box with a region feature extraction operation.

Fig. 5 illustrates the concrete operation of this step. In a preferred embodiment, anchor boxes of 3 scales and 3 aspect ratios are set in advance at each spatial position, and the anchor boxes are then projected onto the output feature map U′ of the channel adjustment module. A region feature extraction method φ extracts the features contained in each projected anchor box; a simple and effective choice is RoI pooling (candidate region pooling). In this embodiment, to reduce the number of parameters, the anchor boxes are grouped by aspect ratio, and boxes of the same aspect ratio extract features of the same size. For example, with 3 scales of 128², 256², and 512² pixels and aspect ratios of 1:2, 1:1, and 2:1, nine kinds of anchor boxes in total, features of sizes 5 × 11, 7 × 7, and 11 × 5 can be extracted for the anchor boxes at each spatial position. After these features are obtained, two fully connected layers can process them further, and the processed feature maps are then concatenated into one feature map. In an actual implementation, the parameters of the fully connected layers for positions with the same aspect ratio can be shared, so the fully connected layers can be converted into convolution operations.
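RoI pooling, named in S16 as the simple and effective choice for φ, divides each projected box into a fixed grid and max-pools each cell. A single-channel sketch follows; the bin-partition arithmetic is one common convention, not the patent's specification:

```python
import numpy as np

def roi_pool(feat, box, out_h, out_w):
    """Max-pool the region of `feat` covered by `box` (x1, y1, x2, y2, in
    feature-map coordinates) into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    region = feat[y1:y2, x1:x2]
    H, W = region.shape
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # integer bin edges; each bin covers at least one cell
            r0, r1 = i * H // out_h, max((i + 1) * H // out_h, i * H // out_h + 1)
            c0, c1 = j * W // out_w, max((j + 1) * W // out_w, j * W // out_w + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out
```

Grouping anchors by aspect ratio then simply means calling this with (out_h, out_w) = (5, 11), (7, 7), or (11, 5) depending on the group.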
S17: after extracting the features inside each anchor box, connecting two parallel network layers, used respectively to regress the position of the candidate box and to judge whether the candidate box contains an object.

Among the target candidate boxes generated by the proposed target candidate box generation method, the boxes can be sorted by the output probability of containing an object; after duplicate target candidate boxes are filtered out with non-maximum suppression, the N target candidate boxes (proposals) with the highest probabilities are selected.
Embodiment 2
The embodiment of the present invention also provides an enhanced object candidate region generating apparatus. As shown in Fig. 6, the apparatus comprises a convolution pyramid module 21, a spatial adjustment module 22, a spatial-adjustment feature concatenation module 23, a channel adjustment module 24, a feature adaptation module 25, and a classification and regression module 26.

The convolution pyramid module 21 extracts features with convolution kernels of multiple scales. The spatial adjustment module 22 learns, for the feature map produced by the convolution kernel of each scale, an adjustment coefficient for each spatial position, used to adaptively enhance the useful feature information at each spatial position while suppressing the useless information. The spatial-adjustment feature concatenation module 23 concatenates the adjusted convolutional features produced by the kernels of all scales. The channel adjustment module 24 learns, for the concatenated feature map, an adjustment coefficient for each channel according to that channel's feature distribution, used to adaptively adjust the information of each channel. The feature adaptation module 25 sets anchor boxes of different scales for each spatial position, projects each anchor box onto the feature map whose channel information has been adjusted, and extracts the features inside each projected anchor box with a region feature extraction operation. The classification and regression module 26 connects, after the extracted features of each anchor box, two parallel network layers used respectively to regress the position of the candidate box and to judge whether the candidate box contains an object.

In the object candidate region generating apparatus provided by this embodiment of the present invention, the working process of each module shares the technical features of the aforementioned target candidate region generation method and can therefore likewise realize the foregoing functions; details are not repeated here.
Embodiment 3
The embodiment of the present invention provides an object detection method based on the target candidate boxes generated in Embodiment 1, comprising the following steps:

S31: obtaining the image to be detected;

S32: inputting the image into an object detection network, where the object detection network comprises the enhanced target candidate box generation network described in Embodiment 1 and a target recognition network;

S321: the target candidate box generation network generates target candidate boxes (proposals) that may contain objects;

S322: the target recognition network classifies the proposals that may contain objects to obtain the specific category of the object in each proposal;

S323: the target recognition network performs regression on the proposals that may contain objects to obtain the estimated bounding box size of the object in each proposal.
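The S31–S323 flow can be summarized as a thin orchestration sketch; `backbone`, `proposal_net`, and `recognition_net` are assumed callables standing in for the candidate box generation network of Embodiment 1 and the target recognition network, not an API defined by the patent:

```python
def detect(image, backbone, proposal_net, recognition_net, n_proposals=300):
    """Two-stage detection (sketch): generate proposals that may contain
    objects (S321), then classify each proposal (S322) and regress its
    bounding box (S323)."""
    feats = backbone(image)                        # shared convolutional features
    proposals = proposal_net(feats)[:n_proposals]  # S321: candidate boxes
    results = []
    for box in proposals:
        label, refined_box = recognition_net(feats, box)  # S322 + S323
        results.append((label, refined_box))
    return results
```

Note that both stages consume the same backbone features, which is what lets the candidate generation network attach directly to the backbone's last convolutional layer, as stated in the advantages above.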
A system embodiment corresponding to the above method embodiment is also disclosed; the two embodiments can be implemented in cooperation with each other. The relevant technical details mentioned in the above embodiments remain valid for the system embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details of the system embodiment also apply to the above embodiments.
The invention further discloses an object detection system based on a convolutional neural network, comprising:
an extraction module, which extracts convolution feature maps of an image to be detected using convolution kernels of multiple scales;
a first adjustment module, which adjusts the feature vector at each spatial position of the convolution feature map using fully connected layers to obtain a first feature map;
a second adjustment module, which concatenates the first feature maps to obtain a concatenated feature map, and adjusts the feature information of each channel of the concatenated feature map using the fully connected layers to obtain a second feature map;
an anchor box setting module, which sets anchor boxes of different scales and aspect ratios at each spatial position of the second feature map, the coordinates and sizes of the anchor boxes being given in the coordinate system of the image to be detected;
a candidate box selection module, which projects each anchor box onto the second feature map, extracts the features inside each projected anchor box using a region feature extraction operation, obtains from those features the probability that the anchor box contains an object, and selects target candidate boxes from all anchor boxes according to the probability values;
a target detection module, which uses a target recognition network to classify the objects in the target candidate boxes and to regress the precise positions and sizes of the candidate boxes, determines the bounding box of each object from the precise position and size, and outputs the classification results and the bounding boxes as the object detection results.
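To illustrate the anchor box setting described above, the following is a minimal plain-Python sketch that generates anchor boxes of different scales and aspect ratios at every spatial position of a feature map, with coordinates mapped back to the coordinate system of the input image. The concrete scale and ratio values and the stride are illustrative assumptions; the text does not fix them.

```python
import math

def generate_anchors(feat_h, feat_w, stride, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate (cx, cy, w, h) anchor boxes for every spatial position of a
    feat_h x feat_w feature map.  `stride` maps feature-map cells back to
    image coordinates; the scales/ratios are illustrative placeholders."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # centre of cell (i, j) expressed in image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s * math.sqrt(r)   # width grows with the aspect ratio
                    h = s / math.sqrt(r)   # height shrinks correspondingly
                    anchors.append((cx, cy, w, h))
    return anchors

boxes = generate_anchors(2, 3, stride=16)
print(len(boxes))  # 2*3 positions x 3 scales x 3 ratios = 54
```

Every anchor's area stays close to scale squared while the aspect ratio varies, which is the usual convention for anchor boxes of "different scales and aspect ratios".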
In the object detection system based on a convolutional neural network, the extraction module specifically extracts features in parallel using convolution operations with k different convolution kernels.
In the object detection system based on a convolutional neural network, the first adjustment module adjusts the feature vector at each spatial position of the convolution feature map by the following formulas:
ω_ij = F(d_ij)
o_ij = ω_ij ⊙ d_ij
where d_ij is the feature vector at spatial position (i, j) of the convolution feature map, the nonlinear function F is composed of the three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes the element-wise (dot) product.
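The per-position adjustment above can be sketched in plain Python. The three cascaded fully connected layers F are stood in for by a toy sigmoid nonlinearity, since their weights are learned and not specified in the text; everything else follows ω_ij = F(d_ij), o_ij = ω_ij ⊙ d_ij directly.

```python
import math

def F(vec):
    """Toy stand-in for the three cascaded fully connected layers: any nonlinear
    map from a feature vector to an adjustment vector of the same length
    (here, a per-component sigmoid)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]

def adjust_spatial(feature_map):
    """feature_map[i][j] is the feature vector d_ij; returns the first feature
    map whose entries are o_ij = w_ij (element-wise *) d_ij with w_ij = F(d_ij)."""
    out = []
    for row in feature_map:
        out_row = []
        for d in row:
            w = F(d)                                           # w_ij = F(d_ij)
            out_row.append([wi * di for wi, di in zip(w, d)])  # o_ij = w_ij ⊙ d_ij
        out.append(out_row)
    return out

fmap = [[[0.0, 2.0]]]          # 1x1 spatial grid with a 2-dimensional feature vector
first = adjust_spatial(fmap)
print(first[0][0][0])  # sigmoid(0) * 0.0 = 0.0
```

The adjustment is a gating operation: each component of d_ij is rescaled by a data-dependent coefficient, so positions whose features F judges uninformative are suppressed.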
In the object detection system based on a convolutional neural network, the second adjustment module adjusts the feature information of each channel of the concatenated feature map specifically as follows:
the feature descriptor a of each channel is obtained using global average pooling:
a = global_pooling(U)
where U is the concatenated feature map and global_pooling denotes global average pooling;
three cascaded fully connected layers are used as the nonlinear function F to obtain the adjustment coefficient e of each channel:
e = F(a), U′ = e ⊙ U
where ⊙ denotes the element-wise (dot) product and U′ is the second feature map.
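A minimal sketch of this channel-wise adjustment follows, again with a toy stand-in for the three cascaded fully connected layers F (their weights are learned, not specified here). The pooling and rescaling steps mirror a = global_pooling(U), e = F(a), U′ = e ⊙ U.

```python
def global_pooling(U):
    """Global average pooling: one scalar descriptor per channel.
    U[c] is an HxW channel stored as a list of rows."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in U]

def F(a):
    """Toy stand-in for three cascaded fully connected layers: maps the channel
    descriptors to per-channel adjustment coefficients in (0, 1)."""
    return [x / (1.0 + abs(x)) for x in a]

def adjust_channels(U):
    a = global_pooling(U)          # a = global_pooling(U)
    e = F(a)                       # e = F(a)
    # U' = e ⊙ U: scale every value of channel c by its coefficient e[c]
    return [[[e[c] * v for v in row] for row in U[c]] for c in range(len(U))]

U = [[[1.0, 1.0], [1.0, 1.0]],    # channel 0: constant 1 -> a[0] = 1, e[0] = 0.5
     [[2.0, 2.0], [2.0, 2.0]]]    # channel 1: constant 2 -> a[1] = 2, e[1] = 2/3
U2 = adjust_channels(U)
print(U2[0][0][0])  # 0.5 * 1.0 = 0.5
```

The effect is a channel-attention reweighting: each channel of the concatenated feature map is scaled as a whole, so channels carrying useful multi-scale information can be emphasized relative to the others.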
In the object detection system based on a convolutional neural network, the candidate box selection module: sorts the anchor boxes according to the probability values, filters out duplicate anchor boxes using non-maximum suppression, and then selects the N target candidate boxes with the highest probabilities, where N is a preset positive integer.
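The selection step above can be sketched as follows in plain Python. The IoU threshold of 0.5 used to decide when two anchor boxes are "duplicates" is an illustrative choice not fixed by the text.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def select_candidates(boxes, probs, N, iou_thresh=0.5):
    """Sort anchor boxes by objectness probability, suppress duplicates with
    non-maximum suppression, then keep the N highest-probability survivors."""
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    keep = []
    for i in order:
        # keep box i only if it does not heavily overlap an already-kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == N:
            break
    return [boxes[i] for i in keep]

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
probs = [0.9, 0.8, 0.7]
cands = select_candidates(boxes, probs, N=2)
print(cands)  # the second box overlaps the first heavily and is suppressed
```

Because suppression happens inside the ranked sweep, the N surviving boxes are both high-probability and mutually non-overlapping, which is exactly what the downstream target recognition network expects as candidates.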
Claims (10)
1. An object detection method based on a convolutional neural network, characterized by comprising:
step 1, extracting convolution feature maps of an image to be detected using convolution kernels of multiple scales;
step 2, adjusting the feature vector at each spatial position of the convolution feature map using fully connected layers to obtain a first feature map;
step 3, concatenating the first feature maps to obtain a concatenated feature map, and adjusting the feature information of each channel of the concatenated feature map using the fully connected layers to obtain a second feature map;
step 4, setting anchor boxes of different scales and aspect ratios at each spatial position of the second feature map;
step 5, projecting each anchor box onto the second feature map, extracting the features inside each projected anchor box using a region feature extraction operation, obtaining from those features the probability that the anchor box contains an object, and selecting target candidate boxes from all anchor boxes according to the probability values;
step 6, using a target recognition network to classify the objects in the target candidate boxes and to regress the precise positions and sizes of the candidate boxes, determining the bounding box of each object from the precise position and size, and outputting the classification results and the bounding boxes as the object detection results.
2. The object detection method based on a convolutional neural network according to claim 1, characterized in that step 1 specifically extracts features in parallel using convolution operations with k different convolution kernels.
3. The object detection method based on a convolutional neural network according to claim 1, characterized in that step 2 adjusts the feature vector at each spatial position of the convolution feature map by the following formulas:
ω_ij = F(d_ij)
o_ij = ω_ij ⊙ d_ij
where d_ij is the feature vector at spatial position (i, j) of the convolution feature map, the nonlinear function F is composed of the three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes the element-wise (dot) product.
4. The object detection method based on a convolutional neural network according to claim 1 or 3, characterized in that the adjustment in step 3 of the feature information of each channel of the concatenated feature map specifically comprises:
obtaining the feature descriptor a of each channel using global average pooling:
a = global_pooling(U)
where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:
e = F(a), U′ = e ⊙ U
where ⊙ denotes the element-wise (dot) product and U′ is the second feature map.
5. The object detection method based on a convolutional neural network according to claim 4, characterized in that step 5 comprises: sorting the anchor boxes according to the probability values, filtering out duplicate anchor boxes using non-maximum suppression, and then selecting the N target candidate boxes with the highest probabilities, where N is a preset positive integer.
6. An object detection system based on a convolutional neural network, characterized by comprising:
an extraction module, which extracts convolution feature maps of an image to be detected using convolution kernels of multiple scales;
a first adjustment module, which adjusts the feature vector at each spatial position of the convolution feature map using fully connected layers to obtain a first feature map;
a second adjustment module, which concatenates the first feature maps to obtain a concatenated feature map, and adjusts the feature information of each channel of the concatenated feature map using the fully connected layers to obtain a second feature map;
an anchor box setting module, which sets anchor boxes of different scales and aspect ratios at each spatial position of the second feature map;
a candidate box selection module, which projects each anchor box onto the second feature map, extracts the features inside each projected anchor box using a region feature extraction operation, obtains from those features the probability that the anchor box contains an object, and selects target candidate boxes from all anchor boxes according to the probability values;
a target detection module, which uses a target recognition network to classify the objects in the target candidate boxes and to regress the precise positions and sizes of the candidate boxes, determines the bounding box of each object from the precise position and size, and outputs the classification results and the bounding boxes as the object detection results.
7. The object detection system based on a convolutional neural network according to claim 6, characterized in that the extraction module specifically extracts features in parallel using convolution operations with k different convolution kernels.
8. The object detection system based on a convolutional neural network according to claim 6, characterized in that the first adjustment module adjusts the feature vector at each spatial position of the convolution feature map by the following formulas:
ω_ij = F(d_ij)
o_ij = ω_ij ⊙ d_ij
where d_ij is the feature vector at spatial position (i, j) of the convolution feature map, the nonlinear function F is composed of the three cascaded fully connected layers, ω_ij is the first adjustment coefficient, o_ij is the feature vector at spatial position (i, j) of the first feature map, and ⊙ denotes the element-wise (dot) product.
9. The object detection system based on a convolutional neural network according to claim 6 or 8, characterized in that the adjustment by the second adjustment module of the feature information of each channel of the concatenated feature map specifically comprises:
obtaining the feature descriptor a of each channel using global average pooling:
a = global_pooling(U)
where U is the concatenated feature map and global_pooling denotes global average pooling;
using three cascaded fully connected layers as the nonlinear function F to obtain the adjustment coefficient e of each channel:
e = F(a), U′ = e ⊙ U
where ⊙ denotes the element-wise (dot) product and U′ is the second feature map.
10. The object detection system based on a convolutional neural network according to claim 9, characterized in that the candidate box selection module comprises: sorting the anchor boxes according to the probability values, filtering out duplicate anchor boxes using non-maximum suppression, and then selecting the N target candidate boxes with the highest probabilities, where N is a preset positive integer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811347546.9A CN109583483B (en) | 2018-11-13 | 2018-11-13 | Target detection method and system based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811347546.9A CN109583483B (en) | 2018-11-13 | 2018-11-13 | Target detection method and system based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109583483A true CN109583483A (en) | 2019-04-05 |
CN109583483B CN109583483B (en) | 2020-12-11 |
Family
ID=65922216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811347546.9A Active CN109583483B (en) | 2018-11-13 | 2018-11-13 | Target detection method and system based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109583483B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070122A (en) * | 2019-04-15 | 2019-07-30 | 沈阳理工大学 | A kind of convolutional neural networks blurred picture classification method based on image enhancement |
CN110276345A (en) * | 2019-06-05 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Convolutional neural networks model training method, device and computer readable storage medium |
CN110427940A (en) * | 2019-08-05 | 2019-11-08 | 山东浪潮人工智能研究院有限公司 | A method of pre-selection frame is generated for object detection model |
CN111382695A (en) * | 2020-03-06 | 2020-07-07 | 北京百度网讯科技有限公司 | Method and apparatus for detecting boundary points of object |
CN111401215A (en) * | 2020-03-12 | 2020-07-10 | 杭州涂鸦信息技术有限公司 | Method and system for detecting multi-class targets |
CN111461145A (en) * | 2020-03-31 | 2020-07-28 | 中国科学院计算技术研究所 | Method for detecting target based on convolutional neural network |
CN111563441A (en) * | 2020-04-29 | 2020-08-21 | 上海富瀚微电子股份有限公司 | Anchor point generation matching method for target detection |
CN111709377A (en) * | 2020-06-18 | 2020-09-25 | 苏州科达科技股份有限公司 | Feature extraction method, target re-identification method and device and electronic equipment |
CN111723632A (en) * | 2019-11-08 | 2020-09-29 | 珠海达伽马科技有限公司 | Ship tracking method and system based on twin network |
CN111832328A (en) * | 2019-04-15 | 2020-10-27 | 北京京东尚科信息技术有限公司 | Bar code detection method, bar code detection device, electronic equipment and medium |
CN111931877A (en) * | 2020-10-12 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Target detection method, device, equipment and storage medium |
CN111951268A (en) * | 2020-08-11 | 2020-11-17 | 长沙大端信息科技有限公司 | Parallel segmentation method and device for brain ultrasonic images |
CN112926595A (en) * | 2021-02-04 | 2021-06-08 | 深圳市豪恩汽车电子装备股份有限公司 | Training device for deep learning neural network model, target detection system and method |
CN113780355A (en) * | 2021-08-12 | 2021-12-10 | 上海理工大学 | Deep convolutional neural network learning method for deep sea submersible propeller fault identification |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646243B1 (en) * | 2016-09-12 | 2017-05-09 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array |
CN107316058A (en) * | 2017-06-15 | 2017-11-03 | 国家新闻出版广电总局广播科学研究院 | Improve the method for target detection performance by improving target classification and positional accuracy |
CN107680678A (en) * | 2017-10-18 | 2018-02-09 | 北京航空航天大学 | Based on multiple dimensioned convolutional neural networks Thyroid ultrasound image tubercle auto-check system |
CN108564097A (en) * | 2017-12-05 | 2018-09-21 | 华南理工大学 | A kind of multiscale target detection method based on depth convolutional neural networks |
CN108647585A (en) * | 2018-04-20 | 2018-10-12 | 浙江工商大学 | A kind of traffic mark symbol detection method based on multiple dimensioned cycle attention network |
CN108765387A (en) * | 2018-05-17 | 2018-11-06 | 杭州电子科技大学 | Based on Faster RCNN mammary gland DBT image lump automatic testing methods |
- 2018-11-13: CN CN201811347546.9A patent/CN109583483B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646243B1 (en) * | 2016-09-12 | 2017-05-09 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array |
US20180075338A1 (en) * | 2016-09-12 | 2018-03-15 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array |
CN107316058A (en) * | 2017-06-15 | 2017-11-03 | 国家新闻出版广电总局广播科学研究院 | Improve the method for target detection performance by improving target classification and positional accuracy |
CN107680678A (en) * | 2017-10-18 | 2018-02-09 | 北京航空航天大学 | Based on multiple dimensioned convolutional neural networks Thyroid ultrasound image tubercle auto-check system |
CN108564097A (en) * | 2017-12-05 | 2018-09-21 | 华南理工大学 | A kind of multiscale target detection method based on depth convolutional neural networks |
CN108647585A (en) * | 2018-04-20 | 2018-10-12 | 浙江工商大学 | A kind of traffic mark symbol detection method based on multiple dimensioned cycle attention network |
CN108765387A (en) * | 2018-05-17 | 2018-11-06 | 杭州电子科技大学 | Based on Faster RCNN mammary gland DBT image lump automatic testing methods |
Non-Patent Citations (1)
Title |
---|
SHAOQING REN ET AL: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《ARXIV:1506.01497V3 [CS.CV]》 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832328A (en) * | 2019-04-15 | 2020-10-27 | 北京京东尚科信息技术有限公司 | Bar code detection method, bar code detection device, electronic equipment and medium |
CN111832328B (en) * | 2019-04-15 | 2024-07-16 | 北京京东乾石科技有限公司 | Bar code detection method, device, electronic equipment and medium |
CN110070122A (en) * | 2019-04-15 | 2019-07-30 | 沈阳理工大学 | A kind of convolutional neural networks blurred picture classification method based on image enhancement |
CN110070122B (en) * | 2019-04-15 | 2022-05-06 | 沈阳理工大学 | Convolutional neural network fuzzy image classification method based on image enhancement |
CN110276345A (en) * | 2019-06-05 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Convolutional neural networks model training method, device and computer readable storage medium |
CN110276345B (en) * | 2019-06-05 | 2021-09-17 | 北京字节跳动网络技术有限公司 | Convolutional neural network model training method and device and computer readable storage medium |
CN110427940A (en) * | 2019-08-05 | 2019-11-08 | 山东浪潮人工智能研究院有限公司 | A method of pre-selection frame is generated for object detection model |
CN111723632A (en) * | 2019-11-08 | 2020-09-29 | 珠海达伽马科技有限公司 | Ship tracking method and system based on twin network |
CN111723632B (en) * | 2019-11-08 | 2023-09-15 | 珠海达伽马科技有限公司 | Ship tracking method and system based on twin network |
CN111382695A (en) * | 2020-03-06 | 2020-07-07 | 北京百度网讯科技有限公司 | Method and apparatus for detecting boundary points of object |
CN111401215A (en) * | 2020-03-12 | 2020-07-10 | 杭州涂鸦信息技术有限公司 | Method and system for detecting multi-class targets |
CN111401215B (en) * | 2020-03-12 | 2023-10-31 | 杭州涂鸦信息技术有限公司 | Multi-class target detection method and system |
CN111461145A (en) * | 2020-03-31 | 2020-07-28 | 中国科学院计算技术研究所 | Method for detecting target based on convolutional neural network |
CN111563441B (en) * | 2020-04-29 | 2023-03-24 | 上海富瀚微电子股份有限公司 | Anchor point generation matching method for target detection |
CN111563441A (en) * | 2020-04-29 | 2020-08-21 | 上海富瀚微电子股份有限公司 | Anchor point generation matching method for target detection |
CN111709377A (en) * | 2020-06-18 | 2020-09-25 | 苏州科达科技股份有限公司 | Feature extraction method, target re-identification method and device and electronic equipment |
CN111951268A (en) * | 2020-08-11 | 2020-11-17 | 长沙大端信息科技有限公司 | Parallel segmentation method and device for brain ultrasonic images |
CN111951268B (en) * | 2020-08-11 | 2024-06-07 | 深圳蓝湘智影科技有限公司 | Brain ultrasound image parallel segmentation method and device |
CN111931877B (en) * | 2020-10-12 | 2021-01-05 | 腾讯科技(深圳)有限公司 | Target detection method, device, equipment and storage medium |
CN111931877A (en) * | 2020-10-12 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Target detection method, device, equipment and storage medium |
CN112926595A (en) * | 2021-02-04 | 2021-06-08 | 深圳市豪恩汽车电子装备股份有限公司 | Training device for deep learning neural network model, target detection system and method |
CN112926595B (en) * | 2021-02-04 | 2022-12-02 | 深圳市豪恩汽车电子装备股份有限公司 | Training device of deep learning neural network model, target detection system and method |
CN113780355A (en) * | 2021-08-12 | 2021-12-10 | 上海理工大学 | Deep convolutional neural network learning method for deep sea submersible propeller fault identification |
CN113780355B (en) * | 2021-08-12 | 2024-02-09 | 上海理工大学 | Deep convolution neural network learning method for fault identification of deep sea submersible propeller |
Also Published As
Publication number | Publication date |
---|---|
CN109583483B (en) | 2020-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109583483A (en) | A kind of object detection method and system based on convolutional neural networks | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN109859190B (en) | Target area detection method based on deep learning | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN111160249A (en) | Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion | |
CN108960404B (en) | Image-based crowd counting method and device | |
CN109492596B (en) | Pedestrian detection method and system based on K-means clustering and regional recommendation network | |
CN111652273B (en) | Deep learning-based RGB-D image classification method | |
CN112861970B (en) | Fine-grained image classification method based on feature fusion | |
CN114758288A (en) | Power distribution network engineering safety control detection method and device | |
CN111768415A (en) | Image instance segmentation method without quantization pooling | |
CN111126459A (en) | Method and device for identifying fine granularity of vehicle | |
CN111461213A (en) | Training method of target detection model and target rapid detection method | |
CN107944403A (en) | Pedestrian's attribute detection method and device in a kind of image | |
CN107025444A (en) | Piecemeal collaboration represents that embedded nuclear sparse expression blocks face identification method and device | |
CN113496480A (en) | Method for detecting weld image defects | |
CN110751195A (en) | Fine-grained image classification method based on improved YOLOv3 | |
CN115631344A (en) | Target detection method based on feature adaptive aggregation | |
CN114332921A (en) | Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network | |
CN116645592A (en) | Crack detection method based on image processing and storage medium | |
CN117333845A (en) | Real-time detection method for small target traffic sign based on improved YOLOv5s | |
CN113486879B (en) | Image area suggestion frame detection method, device, equipment and storage medium | |
CN114926498A (en) | Rapid target tracking method based on space-time constraint and learnable feature matching | |
CN111104539A (en) | Fine-grained vehicle image retrieval method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||