CN109214505A - Fully convolutional object detection method using a densely connected convolutional neural network - Google Patents
Fully convolutional object detection method using a densely connected convolutional neural network
- Publication number
- CN109214505A CN109214505A CN201810998184.3A CN201810998184A CN109214505A CN 109214505 A CN109214505 A CN 109214505A CN 201810998184 A CN201810998184 A CN 201810998184A CN 109214505 A CN109214505 A CN 109214505A
- Authority
- CN
- China
- Prior art keywords
- feature
- layer
- convolutional neural
- network
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 50
- 238000001514 detection method Methods 0.000 title claims abstract description 22
- 238000013507 mapping Methods 0.000 claims abstract description 74
- 238000000034 method Methods 0.000 claims abstract description 10
- 238000006243 chemical reaction Methods 0.000 claims description 18
- 238000000605 extraction Methods 0.000 claims description 13
- 238000012545 processing Methods 0.000 claims description 9
- 238000012549 training Methods 0.000 claims description 9
- 230000004913 activation Effects 0.000 claims description 8
- 239000000284 extract Substances 0.000 claims description 7
- 238000010276 construction Methods 0.000 claims description 6
- 239000000203 mixture Substances 0.000 claims description 6
- 230000008569 process Effects 0.000 claims description 6
- 238000005070 sampling Methods 0.000 claims description 6
- 230000000007 visual effect Effects 0.000 claims description 6
- 230000000694 effects Effects 0.000 claims description 3
- 230000001537 neural effect Effects 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 230000000717 retained effect Effects 0.000 claims description 3
- 238000013473 artificial intelligence Methods 0.000 abstract description 2
- 230000006870 function Effects 0.000 description 20
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
Abstract
The present invention relates to the field of artificial intelligence, and more particularly to a fully convolutional object detection method using a densely connected convolutional neural network. To overcome the inability of existing methods to detect multi-scale targets accurately, the present invention provides a fully convolutional object detection method using a densely connected convolutional neural network, characterized in that multi-scale feature maps can be used effectively for object detection, so that the convolutional neural network achieves high accuracy in detecting targets of different scales within the same image.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a fully convolutional object detection method using a densely connected convolutional neural network.
Background technique
Convolutional neural network feature detection is invariant: after an object is translated or rotated, the network can still recognize it as the same object. However, for targets that occupy a small area in the image, information is lost while the convolutional neural network extracts features, so such targets cannot be detected accurately. With the advance of recent research it has been found that using "multi-scale" feature representations can effectively improve the accuracy of detecting targets of different scales. Attempts have been made to detect multi-scale targets with an image pyramid: an image is first scaled to multiple scales, and the images of the different scales are then input into the convolutional neural network. However, this method requires a very large amount of computation and memory and is therefore not feasible.
Summary of the invention
To overcome the inability of existing methods to detect multi-scale targets accurately, the present invention provides a fully convolutional object detection method using a densely connected convolutional neural network.
To achieve the above objective of the invention, the technical solution adopted is as follows:
A fully convolutional object detection method using a densely connected convolutional neural network, specifically comprising the following steps:
Step S1: construct the feature extraction network Densenet. The feature extraction network is composed of multiple dense connection blocks and conversion layers; the dense connection blocks allow the network to recognize the more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each dense connection block, which carry different semantics and different resolutions, are retained.
Step S2: construct the feature pyramid FPN. The layer features retained in step S1 are input into the FPN and stacked according to feature scale, forming a bottom-up, low-semantic feature pyramid of increasing scale. Starting from the lowest level, the features of every layer undergo a convolution operation through the "parallel path" to obtain higher semantics; at the same time, the convolved features are upsampled to the scale of the features one layer up and merged with them. The merged features continue to be passed upward until the pyramid top; this step is repeated until the complete feature pyramid is constructed.
Step S3: construct the fully convolutional predictor FCP network. The fully convolutional predictor FCP is a predictor that simultaneously outputs object bounding-box information and class probabilities, and it predicts separately on the feature maps of every scale in the feature pyramid. The predictor passes the input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction result. This is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each grid cell; each bounding box contains 5 values: the centre-coordinate offsets (t_x, t_y) of the bounding box, the width and height offsets (t_w, t_h) of the bounding box, and the confidence t_0 of the predicted bounding box; in addition, for each grid cell the probabilities of the C target categories are predicted.
Step S4: train the overall network. Target images are acquired and input into the network; the parameters of every layer are initialized in the Xavier manner, the loss gradient is computed with a stochastic gradient descent algorithm on the loss function composed of the bounding-box coordinate regression and the object classification, and the parameters of all layers of the whole network are fine-tuned with the back-propagation algorithm.
Preferably, the specific steps of step S1 are as follows:
Step S101: an existing trained densely connected convolutional neural network model is adjusted to obtain a preliminary feature extraction network model;
Step S102: the densely connected convolutional neural network is divided into multiple dense connection blocks; different dense connection blocks are connected through conversion layers;
Step S103: a dense connection block contains multiple convolutional neural network layers, and the input of each such layer is the superposition of the outputs of all preceding convolutional layers in the same dense connection block. Let the input of layer l of the convolutional network in a dense connection block be x_l and its output be y_l; then x_l = (x_1 + y_1 + … + y_(l-1)), y_l = H(x_l), where H(·) is defined as the activation function;
Step S104: H(·) is the activation function that follows each convolutional layer; here it is a composite operation: the input x_l first passes through a BN operation, then a ReLU function, and finally through the processing of a convolutional layer, which together form the output of the whole activation function;
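The superposed inputs of step S103 and the composite activation H(·) of step S104 can be sketched as follows. The BN, ReLU and convolution stages below are simplified stand-ins (an illustrative assumption, not the trained layers), kept only to show the wiring x_l = (x_1 + y_1 + … + y_(l-1)), y_l = H(x_l):

```python
import numpy as np

def H(x, weight):
    """Composite activation of step S104: BN -> ReLU -> convolution.
    All three stages are toy stand-ins for the learned layers."""
    x = (x - x.mean()) / (x.std() + 1e-5)  # stand-in for batch normalization
    x = np.maximum(x, 0.0)                 # ReLU
    return x * weight                      # stand-in for the convolutional layer

def dense_block(x1, weights):
    """Step S103: the input of layer l is the superposition of the outputs of
    all preceding layers: x_l = x_1 + y_1 + ... + y_(l-1); y_l = H(x_l)."""
    x, ys = x1, []
    for w in weights:
        y = H(x, w)
        ys.append(y)
        x = x1 + sum(ys)  # superpose every earlier output onto the block input
    return ys[-1]         # output of the last layer of the block
```

Each layer thus sees every earlier output of its block, which is the dense connectivity the patent relies on for discriminative features.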
Step S105: since the space sizes of different dense connection blocks differ, they are connected to each other through a conversion layer. The conversion layer takes the output of the preceding dense connection block as input, first applies a BN operation, then a convolutional neural network layer, and finally a pooling layer, adjusting the space size of the feature map to match the input of the next dense connection block; here the pooling layer reduces the space size of the feature map to 1/n of the original;
Step S106: dense connection blocks and conversion layers alternate repeatedly, so that the space size of the feature map shrinks after every dense connection block while the number of channels of the feature map increases; here the feature map output by the last convolutional layer of each dense connection block is denoted C_m;
Step S107: the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network are deleted, and the feature map output by the last convolutional layer of the last dense connection block is taken as the output of the feature extraction network.
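A minimal sketch of the conversion layer of step S105, with the BN stage simplified and the channel-adjusting convolution omitted (both assumptions for illustration), so that only the 1/n spatial reduction of the pooling stage is shown:

```python
import numpy as np

def conversion_layer(fmap, n=2):
    """Step S105: BN -> convolution -> pooling; the pooling stage shrinks
    the space size of a 2-D feature map to 1/n of the original."""
    fmap = (fmap - fmap.mean()) / (fmap.std() + 1e-5)  # stand-in for BN
    # (the channel-adjusting convolution is omitted in this sketch)
    h, w = fmap.shape
    fmap = fmap[:h - h % n, :w - w % n]                # crop to a multiple of n
    return fmap.reshape(h // n, n, w // n, n).mean(axis=(1, 3))  # n x n avg pool
```

For example, an 8x8 map with n = 2 comes out as a 4x4 map, matching the 1/n reduction the next dense connection block expects.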
Preferably, the specific steps of step S2 are as follows:
Step S201: the FPN is composed of a "bottom-up feature pyramid" and a "parallel path". The FPN first obtains from the feature extraction network the visual features of its layers, which have different semantics and different scales, and then generates the low-semantic feature pyramid by stacking them in the "bottom-up" manner;
Step S202: the feature map output in step S107 is taken as the first input of the FPN. A convolutional layer adjusts the number of channels of this input feature map to a constant d, and the channel-adjusted feature map becomes the lowest-layer feature map of the feature pyramid, denoted D_m; the feature maps of the other pyramid layers are denoted accordingly;
Step S203: the "bottom-up path" in the FPN upsamples the feature map of the pyramid layer one level below. The upsampling factor is n, the reciprocal of the 1/n reduction factor of the pooling layers in the feature extraction network, so the resulting feature map has the same space size as the feature map output by the corresponding dense connection block in step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each dense connection block in step S1 as input and adjusts the number of channels of its output feature map to d with a convolutional layer;
Step S205: steps S203 and S204 produce two feature maps that are identical in space size and channel number. The two feature maps are added element-wise, and a convolutional layer is then applied to reduce the aliasing effect introduced by upsampling, giving the feature map of the pyramid layer one level up. Denoting the operations applied to the input in steps S203 and S204 as f(·) and g(·) respectively, D_m = g(C_m) and D_k = conv(f(D_(k+1)) + g(C_k)) for 0 < k < m, where conv denotes the convolution operation of step S205;
Step S206: steps S203, S204 and S205 are repeated, building the entire feature pyramid layer by layer upward from the lowest pyramid layer.
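The recurrence D_m = g(C_m), D_k = conv(f(D_(k+1)) + g(C_k)) of step S205 can be sketched as follows, with n = 2 and with f, g and the smoothing convolution replaced by toy stand-ins (assumptions for illustration; in the patent these are learned layers):

```python
import numpy as np

def f(x, n=2):
    """Step S203: upsample by the factor n (nearest neighbour)."""
    return x.repeat(n, axis=0).repeat(n, axis=1)

def g(x):
    """Step S204: channel-adjusting convolution, stood in for by a fixed scale."""
    return 0.5 * x

def conv(x):
    """Step S205: anti-aliasing convolution, stood in for by the identity."""
    return x

def build_pyramid(C):
    """C[0..m]: dense-block outputs, C[0] the largest in space size.
    Returns D with D[m] = g(C[m]) and D[k] = conv(f(D[k+1]) + g(C[k]))."""
    m = len(C) - 1
    D = [None] * (m + 1)
    D[m] = g(C[m])                      # lowest pyramid layer (step S202)
    for k in range(m - 1, -1, -1):      # build upward, layer by layer
        D[k] = conv(f(D[k + 1]) + g(C[k]))
    return D
```

Each pyramid layer D[k] keeps the space size of its backbone map C[k] while mixing in the semantics carried up from D[k+1].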
Preferably, the specific steps of step S3 are as follows:
Step S301: step S2 has produced a feature pyramid whose feature scale increases layer by layer from bottom to top while the number of channels of every layer remains unchanged, the ratio of the space sizes of the feature maps of adjacent layers being the factor n. A predictor that simultaneously outputs object bounding-box information and class probabilities is built; the predictor acts on the features of every layer of the feature pyramid, enabling the network to use feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and class probabilities. It takes the feature map of a given pyramid layer as input and, after the processing of two fully connected layers, outputs a vector of size S*S*(B*5+C) as the prediction result. This is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each grid cell; each bounding box contains 5 values: the centre-coordinate offsets (t_x, t_y) of the bounding box, the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted bounding box; in addition, for each grid cell the probabilities of the C target categories are predicted;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
w = p_w · e^(t_w)
h = p_h · e^(t_h)
σ(t_0) = Pr(object) · IOU(b, object)
where x, y are the actual coordinates of the bounding-box centre in the image and w, h are the width and height of the bounding box; (c_x, c_y) is the top-left coordinate of the grid cell, and p_w, p_h are the width and height of the input image.
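The output layout of step S302 and the decoding of step S303 can be written out as follows. The per-cell ordering (B boxes first, then C class probabilities) is an assumption the source does not fix, and the exponential width/height decoding w = p_w·e^(t_w), h = p_h·e^(t_h) follows the usual YOLO-style convention where the original formula was lost:

```python
import math
import numpy as np

def split_prediction(vec, S, B, C):
    """Step S302: split the S*S*(B*5+C) prediction vector into, per grid
    cell, B boxes of (t_x, t_y, t_w, t_h, t_0) and C class probabilities."""
    grid = np.asarray(vec).reshape(S, S, B * 5 + C)
    boxes = grid[..., :B * 5].reshape(S, S, B, 5)
    classes = grid[..., B * 5:]
    return boxes, classes

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(cx, cy, pw, ph, t):
    """Step S303: map raw offsets t = (t_x, t_y, t_w, t_h, t_0) of one box
    to image coordinates; (c_x, c_y) is the top-left corner of the grid
    cell, p_w and p_h the reference width and height."""
    tx, ty, tw, th, t0 = t
    x = cx + sigmoid(tx)        # x = c_x + sigma(t_x)
    y = cy + sigmoid(ty)        # y = c_y + sigma(t_y)
    w = pw * math.exp(tw)       # assumed YOLO-style width decoding
    h = ph * math.exp(th)       # assumed YOLO-style height decoding
    conf = sigmoid(t0)          # sigma(t_0) = Pr(object) * IOU(b, object)
    return x, y, w, h, conf
```

With all offsets zero, a box anchored at cell (3, 4) decodes to its cell centre with the reference width and height unchanged.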
Preferably, the specific steps of step S4 are as follows:
Step S401, image acquisition: images containing targets of all kinds in daily life are collected as training images, and every image is processed to obtain the bounding-box and class information of the targets it contains;
Step S402: a cost function is established for each predicted quantity for training. For the centre coordinates of the bounding boxes the cost function is
$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]$;
for the width and height of the bounding boxes the cost function is
$\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right]$;
for the confidence and class prediction the cost function is
$\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}(p_i(c)-\hat{p}_i(c))^2$,
where $\lambda_{coord}$ and $\lambda_{noobj}$ balance the cost function between the bounding-box cost and the probability cost, $\mathbb{1}_{i}^{obj}$ indicates that a target appears in the i-th grid cell, and $\mathbb{1}_{ij}^{obj}$ indicates that the j-th bounding box in the i-th grid cell is responsible for predicting that target; the final cost function is the sum of the terms above;
Step S403: the labelled data collected in step S401 are input into the network; the parameters of every layer are initialized in the Xavier manner, the loss gradient is computed with a stochastic gradient descent algorithm on the loss function composed of the bounding-box coordinate regression and the object classification, and the parameters of all layers of the whole network are fine-tuned with the back-propagation algorithm, thereby training the network.
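The Xavier initialization used in steps S4/S403 can be sketched as follows (the uniform Glorot form; the fan-in/fan-out sizes below are illustrative, not taken from the patent):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Xavier initialization: draw weights uniformly from
    [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

This keeps the variance of activations roughly constant across layers before the SGD fine-tuning begins.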
Preferably, in step S1, feature extraction is performed with the network structure in which dense connection blocks alternate with conversion layers, which extracts the more discriminative feature maps in the image.
Preferably, the FPN network composed of the bottom-up feature pyramid of the densely connected convolution and the "parallel path" can efficiently use the high-semantic small-scale feature maps and the low-semantic large-scale feature maps to construct a feature pyramid with high semantics, large scale and accurate location information.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention provides a fully convolutional object detection method using a densely connected convolutional neural network, characterized in that multi-scale feature maps can be used effectively for object detection, so that the convolutional neural network achieves high accuracy in detecting targets of different scales within the same image.
Detailed description of the invention
Fig. 1 is flow chart of the invention.
Specific embodiment
The attached figures are only for illustrative purposes and shall not be understood as limiting the patent;
Below in conjunction with drawings and examples, the present invention is further elaborated.
Embodiment 1
As shown in Figure 1, the present invention provides a fully convolutional object detection method using a densely connected convolutional neural network; the method comprises steps S1 to S4, together with their preferred sub-steps, exactly as described in the summary above.
Obviously, the above embodiment of the present invention is merely an example given to illustrate the present invention clearly and is not a restriction on the embodiments of the present invention. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description. There is no need, and no possibility, to exhaust all embodiments here. Any modifications, equivalent replacements and improvements made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.
Claims (7)
1. A fully convolutional object detection method using a densely connected convolutional neural network, characterized by comprising the following steps:
Step S1: construct the feature extraction network DenseNet. The feature extraction network consists of multiple dense blocks and transition layers; the dense blocks enable the network to capture more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each dense block, which carry different semantics and different resolutions, are retained;
Step S2: construct the feature pyramid network FPN. The features retained in step S1 are fed into the FPN and stacked by feature scale, forming a bottom-up, low-semantic feature pyramid whose scale increases level by level. Starting from the lowest level, the features of every level pass through a "lateral path" that applies a convolution to obtain higher-level semantics; at the same time, the convolved features are upsampled to the scale of the level above and merged with that level's features, and the merged features continue to propagate upward. This step is repeated until the pyramid top is reached and the complete feature pyramid is built;
Step S3: construct the fully convolutional predictor (FCP) network. The FCP is a predictor that simultaneously outputs object bounding-box information and class probabilities, and it produces a prediction for the feature maps of every scale in the feature pyramid. The predictor passes the input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction result. This amounts to dividing the original image into an S*S grid and predicting B bounding boxes for each grid cell; each bounding box carries 5 values: the centre-coordinate offsets (t_x, t_y) of the box, the width and height offsets (t_w, t_h) of the box, and the confidence t_0 of the predicted box; in addition, each grid cell predicts probabilities for C target classes;
Step S4: train the overall network. Acquire target-image data and feed it into the network. The parameters of every layer are initialised in the Xavier manner; using stochastic gradient descent on the loss function composed of the bounding-box coordinate regression loss and the object-classification loss, the loss gradient is computed and the parameters of all layers in the whole network are fine-tuned by back-propagation.
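The size of the prediction vector in step S3, S*S*(B*5+C), can be checked with a few lines of arithmetic. The values S=7, B=2 and C=20 below are illustrative assumptions only; the claim does not fix them:

```python
def prediction_vector_size(S, B, C):
    # S*S grid cells; each predicts B boxes of 5 values (tx, ty, tw, th, t0)
    # plus C class probabilities per cell.
    return S * S * (B * 5 + C)

# Assumed example values: a 7x7 grid with 2 boxes per cell and 20 classes
# yields a 1470-element prediction vector.
print(prediction_vector_size(7, 2, 20))  # -> 1470
```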
2. The fully convolutional object detection method of a densely connected convolutional neural network according to claim 1, characterized in that step S1 comprises the following steps:
Step S101: adapt an existing pre-trained densely connected convolutional neural network model to obtain a preliminary feature extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple dense blocks, and different dense blocks are connected through transition layers;
Step S103: a dense block contains multiple convolutional neural network layers, and the input of each convolutional layer is the stacked outputs of all preceding convolutional layers within the same dense block. If the input of the l-th convolutional layer in a dense block is x_l and its output is y_l, then x_l = (x_1 + y_1 + … + y_{l-1}) and y_l = H(x_l), where H(·) is defined as the activation function;
Step S104: H(·) is the activation function attached to each convolutional layer; here it is a composite operation, meaning that the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result serves as the output of the whole activation function;
Step S105: since different dense blocks have different spatial sizes, they are connected through a transition layer. A transition layer takes the output of the preceding dense block as input, first applies a BN operation, then a convolutional neural network layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next dense block. Here the pooling layer shrinks the spatial size of the feature map to 1/n of the original;
Step S106: dense blocks and transition layers alternate repeatedly, so that the spatial size of the feature map decreases after every dense block while its channel count increases. Let C_m denote the feature map output by the last convolutional layer of each dense block;
Step S107: remove the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network, and take the feature map output by the last convolutional layer of the last dense block as the output of the feature extraction network.
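The channel bookkeeping of steps S103–S106 can be sketched as follows. The growth rate, layer count and compression factor are assumptions for illustration, and H(·) (BN → ReLU → Conv) is reduced to its effect on channel counts only:

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block, plus the
    block's output channel count. Layer l consumes the stacked outputs of
    all earlier layers (x_l in step S103); each H() emits growth_rate
    new channels, so the channel count grows linearly within the block."""
    per_layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        per_layer_inputs.append(channels)
        channels += growth_rate
    return per_layer_inputs, channels

def transition_channels(channels, compression=0.5):
    # Transition layer (step S105): BN -> conv (here assumed to compress
    # channels by 0.5, a common but not claim-mandated choice) -> pooling
    # that shrinks spatial size to 1/n.
    return int(channels * compression)
```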
3. The fully convolutional object detection method of a densely connected convolutional neural network according to claim 2, characterized in that step S2 comprises the following steps:
Step S201: the FPN consists of a "bottom-up feature pyramid" and a "lateral path". The FPN first obtains from the feature extraction network the visual features of each layer, which differ in semantics and scale, and then stacks them "bottom-up" to generate a feature pyramid of low-semantic features;
Step S202: take the feature map output in step S107 as the first input of the FPN. A convolutional layer adjusts the channel count of the input feature map to a constant d, and the channel-adjusted feature map serves as the lowest-level feature map of the feature pyramid. Let D_m denote the feature map of each pyramid level;
Step S203: the "bottom-up path" of the FPN mainly upsamples the feature map of the pyramid level one below. The upsampling factor is the reciprocal of the reduction factor n of the pooling layers in the feature extraction network, so the resulting feature map has the same spatial size as the corresponding dense-block output from step S1;
Step S204: the "lateral path" of the FPN takes as input the feature map output by each dense block in step S1, and a convolutional layer adjusts the channel count of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps identical in spatial size and channel count. The two feature maps are added element-wise and then passed through a convolutional layer to reduce the aliasing effect of the upsampling process, producing the feature map of the next pyramid level. Denoting the operations applied to the input in steps S203 and S204 by f(·) and g(·) respectively, D_m = g(C_m) and D_k = ∫(f(D_{k+1}) + g(C_k)) for 0 < k < m, where ∫ denotes the convolution operation of step S205;
Step S206: repeat steps S203, S204 and S205 so as to build the entire feature pyramid level by level, starting from the lowest pyramid level.
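The merge of steps S203–S205 can be sketched on toy single-channel feature maps (nested lists). Nearest-neighbour upsampling stands in for f(·), the lateral channel adjustment g(·) is assumed to have been applied already, and the anti-aliasing convolution ∫ is omitted:

```python
def upsample_nearest(fmap, n):
    """Enlarge a 2-D map by factor n (step S203: upsampling by the
    reciprocal of the pooling reduction factor)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(n)]  # repeat each column n times
        out.extend(list(wide) for _ in range(n))   # repeat each row n times
    return out

def fpn_merge(d_upper, c_lateral, n=2):
    """Element-wise addition of the upsampled coarser map and the lateral
    map (step S205); after upsampling both maps match in spatial size."""
    up = upsample_nearest(d_upper, n)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, c_lateral)]
```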
4. The fully convolutional object detection method of a densely connected convolutional neural network according to claim 2, characterized in that step S3 comprises the following steps:
Step S301: step S2 yields a feature pyramid whose characteristic property is that the feature scale increases level by level from bottom to top while the channel count of every level stays constant, and the spatial-size ratio between adjacent levels is n. Build a predictor that simultaneously outputs object bounding-box information and class probabilities; the predictor acts on the features of every level of the feature pyramid, enabling the network to exploit feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and class probabilities. Taking the feature map of a given pyramid level as input, after processing by two fully connected layers it outputs a vector of size S*S*(B*5+C) as the prediction result. This amounts to dividing the original image into an S*S grid and predicting B bounding boxes for each grid cell; each bounding box carries 5 values: the centre-coordinate offsets (t_x, t_y) of the box, the width and height offsets (t_w, t_h) of the box, and the confidence t_0 of the predicted box; in addition, each grid cell predicts probabilities for C target classes;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x, y are the actual coordinates of the bounding-box centre in the image, w, h are respectively the width and height of the bounding box, (c_x, c_y) is the top-left coordinate of the grid cell, and p_w, p_h are respectively the width and height of the input image.
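The decoding in step S303 can be written out directly. The claim gives x = c_x + σ(t_x) and y = c_y + σ(t_y); the width/height rule w = p_w·e^{t_w}, h = p_h·e^{t_h} used below is the usual YOLO-style companion formula and is an assumption here, since those two formulas are not reproduced in this text:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map predictor offsets (tx, ty, tw, th) to box centre and size.
    (cx, cy): top-left corner of the grid cell; pw, ph: reference width
    and height for the exponential decoding."""
    x = cx + sigmoid(tx)   # x = c_x + sigma(t_x), as in the claim
    y = cy + sigmoid(ty)   # y = c_y + sigma(t_y), as in the claim
    w = pw * math.exp(tw)  # assumed YOLO-style width decoding
    h = ph * math.exp(th)  # assumed YOLO-style height decoding
    return x, y, w, h

# Zero offsets place the centre at the middle of cell (3, 4) and keep the
# reference size unchanged.
print(decode_box(0, 0, 0, 0, 3, 4, 10, 20))  # -> (3.5, 4.5, 10.0, 20.0)
```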
5. The fully convolutional object detection method of a densely connected convolutional neural network according to claim 1, characterized in that step S4 comprises the following steps:
Step S401, image acquisition: collect images containing targets of all kinds from daily life as training images; every image is processed to obtain the bounding boxes and class labels of the targets it contains;
Step S402: establish a cost function for each predicted quantity for training. For the centre coordinates of the bounding box, the formula [not reproduced in this text] is used as the cost function; for the width and height of the bounding box, the formula [not reproduced in this text] is used as the cost function; for the class prediction, the formula [not reproduced in this text] is used, where λ_coord and λ_noobj balance the cost function between the bounding-box cost and the probability cost, one indicator term [not reproduced in this text] denotes that the target appears in the i-th grid cell, and another [not reproduced in this text] denotes that the j-th bounding box in the i-th grid cell is responsible for predicting the target; the overall cost function finally obtained is as follows: [formula not reproduced in this text];
Step S403: feed the labelled data collected in step S401 into the network. The parameters of every layer are initialised in the Xavier manner; using stochastic gradient descent on the loss function composed of the bounding-box coordinate regression loss and the object-classification loss, the loss gradient is computed and the parameters of all layers in the whole network are fine-tuned by back-propagation, thereby achieving the purpose of training the network.
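The claim's cost-function formulas themselves are not reproduced in this text, so the sketch below only shows how the weighting factors λ_coord and λ_noobj of step S402 typically combine the per-term costs; the individual cost terms are placeholders, and the default weights 5.0 and 0.5 follow the original YOLO paper rather than anything fixed by the claim:

```python
def total_cost(coord_cost, obj_conf_cost, noobj_conf_cost, class_cost,
               lambda_coord=5.0, lambda_noobj=0.5):
    """Weighted sum of the per-term costs: lambda_coord up-weights the
    bounding-box coordinate error, while lambda_noobj down-weights the
    confidence error of grid cells that contain no object, balancing the
    cost between boxes and probabilities as described in step S402."""
    return (lambda_coord * coord_cost
            + obj_conf_cost
            + lambda_noobj * noobj_conf_cost
            + class_cost)
```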
6. The fully convolutional object detection method of a densely connected convolutional neural network according to claim 1, characterized in that in step S1, feature extraction is performed by a network structure in which dense blocks alternate with transition layers, which can extract feature maps with better discriminability from the image.
7. The FPN network according to claim 1, composed of the "feature pyramid under dense connection" and the "lateral path", which can efficiently exploit both high-semantic low-scale and low-semantic high-scale feature maps to construct a feature pyramid with high-level semantic features, large scale and rich location information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998184.3A CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109214505A true CN109214505A (en) | 2019-01-15 |
CN109214505B CN109214505B (en) | 2022-07-01 |
Family
ID=64985668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810998184.3A Active CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109214505B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815886A (en) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3 |
CN109871823A (en) * | 2019-03-11 | 2019-06-11 | 中国电子科技集团公司第五十四研究所 | A kind of satellite image Ship Detection of combination rotating frame and contextual information |
CN110009622A (en) * | 2019-04-04 | 2019-07-12 | 武汉精立电子技术有限公司 | A kind of display panel open defect detection network and its defect inspection method |
CN110060274A (en) * | 2019-04-12 | 2019-07-26 | 北京影谱科技股份有限公司 | The visual target tracking method and device of neural network based on the dense connection of depth |
CN110322509A (en) * | 2019-06-26 | 2019-10-11 | 重庆邮电大学 | Object localization method, system and computer equipment based on level Class Activation figure |
CN110555371A (en) * | 2019-07-19 | 2019-12-10 | 华瑞新智科技(北京)有限公司 | Wild animal information acquisition method and device based on unmanned aerial vehicle |
CN110689081A (en) * | 2019-09-30 | 2020-01-14 | 中国科学院大学 | Weak supervision target classification and positioning method based on bifurcation learning |
CN112016535A (en) * | 2020-10-26 | 2020-12-01 | 成都合能创越软件有限公司 | Vehicle-mounted garbage traceability method and system based on edge calculation and block chain |
CN112560778A (en) * | 2020-12-25 | 2021-03-26 | 万里云医疗信息科技(北京)有限公司 | DR image body part identification method, device, equipment and readable storage medium |
CN112581470A (en) * | 2020-09-15 | 2021-03-30 | 佛山中纺联检验技术服务有限公司 | Small target object detection method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250812A (en) * | 2016-07-15 | 2016-12-21 | 汤平 | A kind of model recognizing method based on quick R CNN deep neural network |
CN106844442A (en) * | 2016-12-16 | 2017-06-13 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Multi-modal Recognition with Recurrent Neural Network Image Description Methods based on FCN feature extractions |
CN107437096A (en) * | 2017-07-28 | 2017-12-05 | 北京大学 | Image classification method based on the efficient depth residual error network model of parameter |
US20170372201A1 (en) * | 2016-06-22 | 2017-12-28 | Massachusetts Institute Of Technology | Secure Training of Multi-Party Deep Neural Network |
CN108182388A (en) * | 2017-12-14 | 2018-06-19 | 哈尔滨工业大学(威海) | A kind of motion target tracking method based on image |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||