CN109214505A - Full convolution target detection method of densely connected convolutional neural network - Google Patents

Full convolution target detection method of densely connected convolutional neural network

Info

Publication number
CN109214505A
CN109214505A
Authority
CN
China
Prior art keywords
feature
layer
convolutional neural
network
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810998184.3A
Other languages
Chinese (zh)
Other versions
CN109214505B (en)
Inventor
胡海峰
黄福强
王伟轩
张运鸿
孙永丞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201810998184.3A
Publication of CN109214505A
Application granted
Publication of CN109214505B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of artificial intelligence, and more particularly to a full convolution target detection method of a densely connected convolutional neural network. To overcome the inability of existing methods to detect multi-scale targets accurately, the present invention provides a full convolution target detection method of a densely connected convolutional neural network, characterized in that multi-scale feature maps are used effectively for target detection, so that the convolutional neural network attains high accuracy when detecting targets of different scales within the same image.

Description

Full convolution target detection method of densely connected convolutional neural network
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a full convolution target detection method of a densely connected convolutional neural network.
Background technique
Convolutional neural networks detect features with a degree of invariance: after an object is translated or rotated, a convolutional neural network can still recognize it as the same object. For targets that occupy only a small area of the image, however, information is lost while the convolutional features are extracted, so such targets cannot be detected accurately. Recent research has found that using "multi-scale" feature representations effectively improves detection accuracy for targets of different scales. Attempts have been made to detect multi-scale targets with an image pyramid: an image is first rescaled to several scales, and the images at the different scales are then fed into the convolutional neural network. This approach, however, requires a very large amount of computation and memory and is therefore impractical.
Summary of the invention
To overcome the inability of existing methods to detect multi-scale targets accurately, the present invention provides a full convolution target detection method of a densely connected convolutional neural network.
To achieve the above object of the invention, the technical solution adopted is as follows:
A full convolution target detection method of a densely connected convolutional neural network, specifically comprising the following steps:
Step S1: construct the feature-extraction network DenseNet. The feature-extraction network is composed of multiple dense blocks and transition layers; the dense blocks allow the network to recognize the more discriminative visual features in an image. After the input image passes through the feature-extraction network, the features output by each dense block, which carry different semantics and different resolutions, are retained.
Step S2: construct the feature pyramid FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a low-semantic feature pyramid whose scale increases from bottom to top. Starting from the lowest level, every layer's feature passes through a "parallel path" (a lateral connection) convolution to obtain higher semantics; at the same time, the convolved feature is upsampled to the scale of the layer above and merged with that layer's feature. The merged feature keeps propagating upward toward the pyramid top, and this step is repeated until the complete feature pyramid is built.
Step S3: construct the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs target bounding-box information and class probabilities simultaneously, and it makes a separate prediction for the feature map of every scale in the feature pyramid. The predictor passes the input feature map through a convolutional neural network and outputs a tensor of size S*S*(B*5+C) as the prediction result, which is equivalent to dividing the original image into an S*S grid and predicting B bounding boxes for each grid cell. Each bounding box carries 5 values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, each grid cell predicts the probabilities of C target categories.
Step S4: train the overall network. Target images are acquired and fed into the network; the parameters of every layer are initialized the Xavier way; and using the loss function composed of bounding-box coordinate regression and object classification, the stochastic gradient descent algorithm computes the loss gradient while the backpropagation algorithm fine-tunes the parameters of all layers in the whole network.
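Before the preferred sub-steps are detailed, the following is a minimal PyTorch sketch of the starting point of step S1: reusing an existing trained densely connected network and keeping the per-dense-block feature maps instead of its classification output. The use of torchvision's densenet121 and its internal module names (denseblock1 to denseblock4) is an illustrative assumption; the patent only requires some pretrained densely connected convolutional neural network.

```python
import torch
import torchvision

def build_backbone():
    # Step S101 (sketch): start from an existing trained densely connected
    # model; torchvision's densenet121 is assumed here for illustration.
    model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
    # Step S107 (sketch): keep only the convolutional trunk, i.e. drop the
    # global average-pooling layer and the fully connected classifier.
    return model.features

def extract_dense_block_features(features, x):
    """Run the trunk and retain each dense block's output (the C_k of the
    text), which carry different semantics and different resolutions."""
    retained = []
    for name, layer in features.named_children():
        x = layer(x)
        if name.startswith("denseblock"):
            retained.append(x)
    return retained

# usage sketch:
# feats = extract_dense_block_features(build_backbone(), torch.randn(1, 3, 416, 416))
```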
Preferably, the specific steps of step S1 are as follows:
Step S101: an existing trained densely connected convolutional neural network model is adapted to obtain a preliminary feature-extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple dense blocks, and the different dense blocks are connected to one another by transition layers;
Step S103: a dense block contains multiple convolutional neural network layers, and the input of each layer is the superposition (channel-wise concatenation) of the outputs of all preceding layers in the same dense block; let the input of the l-th convolutional layer in a dense block be x_l and its output y_l, then x_l = (x_1 + y_1 + … + y_(l-1)) and y_l = H(x_l), where H(·) is defined as the activation function;
Step S104: H(·) is the activation function appended to every convolutional layer; here it is a composite operation in which the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: since different dense blocks have different spatial sizes, they are connected by a transition layer; a transition layer takes the output of the preceding dense block as input, first applies a BN operation, then a convolutional neural network layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next dense block; here the pooling layer is set to shrink the spatial size of the feature map to 1/n of the original;
Step S106: dense blocks and transition layers alternate repeatedly, so that the spatial size of the feature map shrinks after every dense block while its channel number grows; here the feature map output by the last convolutional layer of each dense block is denoted C_m;
Step S107: the global average-pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network are removed, and the feature map output by the last convolutional layer of the last dense block is taken as the output of the feature-extraction network. (A code sketch of steps S103 to S105 follows this list.)
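A minimal PyTorch sketch of steps S103 to S105 follows. The growth rate, the 3×3 kernel and the pooling factor n = 2 are illustrative assumptions, and the '+' in x_l = (x_1 + y_1 + … + y_(l-1)) is realized as channel-wise concatenation, which matches the channel growth described in step S106.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One H(.) unit from step S104: BN -> ReLU -> 3x3 convolution."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3,
                              padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Step S103: the input of layer l is the superposition (channel-wise
    concatenation) of the block input and all earlier layer outputs."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            y = layer(torch.cat(features, dim=1))  # x_l = [x_1, y_1, ..., y_(l-1)]
            features.append(y)
        return torch.cat(features, dim=1)

class TransitionLayer(nn.Module):
    """Step S105: BN -> 1x1 convolution -> pooling that shrinks the spatial
    size to 1/n of the original (n = 2 assumed here)."""
    def __init__(self, in_channels, out_channels, n=2):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=n, stride=n)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))
```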
Preferably, the specific steps of step S2 are as follows:
Step S201: the FPN is composed of a "bottom-up feature pyramid" and "parallel paths"; the FPN first obtains from the feature-extraction network the visual features of its layers, which carry different semantics and different scales, and then stacks them bottom-up to generate a pyramid of lower-semantic features;
Step S202: the feature map output in step S107 is taken as the first input of the FPN; the input feature map passes through a convolutional layer that adjusts its channel number to a constant d, and the channel-adjusted feature map becomes the lowest-level feature map of the feature pyramid; here the feature maps of the pyramid layers are denoted D_k, the lowest level being D_m;
Step S203: the "bottom-up path" in the FPN mainly upsamples the feature map of the pyramid layer below; the upsampling factor is n, the reciprocal of the 1/n reduction factor of the pooling layers in the feature-extraction network, so that the upsampled feature map has the same spatial size as the output of the corresponding dense block from step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each dense block in step S1 as input, and a convolutional layer then adjusts the channel number of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps that are identical in spatial size and channel number; the two feature maps are added element-wise and then passed through a convolutional layer that reduces the aliasing effect of the upsampling, which produces the feature map of the next pyramid layer; denoting the operations applied to the input in steps S203 and S204 by f(·) and g(·) respectively, D_m = g(C_m) and D_k = φ(f(D_(k+1)) + g(C_k)) for 0 < k < m, where φ(·) denotes the convolution operation of this step;
Step S206: steps S203, S204 and S205 are repeated so that the entire feature pyramid is built layer by layer, upward from the lowest pyramid level. (A code sketch of the merge rule follows this list.)
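The merge rule of steps S203 to S205, D_k = φ(f(D_(k+1)) + g(C_k)), might be sketched in PyTorch as below. The channel constant d = 256, nearest-neighbour upsampling for f(·), and a 3×3 kernel for the anti-aliasing convolution φ(·) are assumptions not fixed by the text.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    """One round of steps S203-S205: upsample the pyramid level below,
    project the dense-block output through a lateral 1x1 convolution,
    add element-wise, and smooth to reduce upsampling aliasing."""
    def __init__(self, c_channels, d=256):
        super().__init__()
        self.lateral = nn.Conv2d(c_channels, d, kernel_size=1)   # g(.) of step S204
        self.smooth = nn.Conv2d(d, d, kernel_size=3, padding=1)  # φ(.) of step S205

    def forward(self, d_below, c_k):
        # f(.) of step S203: upsample D_(k+1) to the spatial size of C_k
        up = F.interpolate(d_below, size=c_k.shape[-2:], mode="nearest")
        return self.smooth(up + self.lateral(c_k))  # D_k
```

Applying this module once per dense-block output, starting from the lowest level D_m = g(C_m), reproduces the loop of step S206.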
Preferably, the specific steps of step S3 are as follows:
Step S301: step S2 yields a feature pyramid whose feature scale increases layer by layer from bottom to top while the channel number of every layer stays the same, the ratio of the spatial sizes of adjacent layers' feature maps being the factor n; a predictor that simultaneously outputs target bounding-box information and class probabilities is built, and it acts on every layer of the feature pyramid, enabling the network to exploit feature maps of different scales;
Step S302: construction of the predictor that outputs target bounding-box information and class probabilities: taking a feature map of one pyramid layer as input, after processing by two fully connected layers it outputs a vector of size S*S*(B*5+C) as the prediction result, which is equivalent to dividing the original image into an S*S grid and predicting B bounding boxes per grid cell, each bounding box carrying 5 values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box, plus the probabilities of C target categories predicted per grid cell;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x, y are the actual coordinates of the bounding-box center in the image; w, h are the width and height of the bounding box; (c_x, c_y) is the top-left coordinate of the grid cell; and p_w, p_h are the width and height of the input image, respectively. (A code sketch of the prediction head and the decoding follows.)
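To make steps S302 and S303 concrete, here is a hedged PyTorch sketch of one prediction head and of the coordinate decoding. The convolutional head (rather than the two fully connected layers mentioned in S302, chosen here in the spirit of the "fully convolutional predictor" naming), the values d = 256, B = 3 and C = 20, and the exponential width/height decoding borrowed from the standard YOLOv2 formulation (the corresponding formulas did not survive in the source text) are all assumptions.

```python
import torch
import torch.nn as nn

class FCPHead(nn.Module):
    """Step S302 sketch: map one pyramid feature map (d channels) to an
    S x S x (B*5 + C) prediction tensor."""
    def __init__(self, d=256, B=3, C=20):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(d, d, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(d, B * 5 + C, kernel_size=1),
        )

    def forward(self, feature):
        return self.head(feature)  # shape: (batch, B*5+C, S, S)

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Step S303 decoding. The x, y and confidence formulas follow the text;
    the exponential w, h decoding is an assumption borrowed from YOLOv2."""
    x = c_x + torch.sigmoid(t_x)
    y = c_y + torch.sigmoid(t_y)
    w = p_w * torch.exp(t_w)  # assumption: not recoverable from the source
    h = p_h * torch.exp(t_h)
    return x, y, w, h
```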
Preferably, the specific steps of step S4 are as follows:
Step S401, image acquisition: images containing targets of all kinds are collected from daily life as training images, and every image is processed to obtain the bounding-box and class information of the targets it contains;
Step S402: a cost function is established for each predicted quantity for training. For the bounding-box center coordinates, a formula is used as the cost function; for the bounding-box width and height, a formula is used as the cost function; and for the class prediction, a formula is used as the cost function, where λ_coord and λ_noobj balance the cost function between the bounding-box and probability costs, 1_i^obj indicates that a target appears in grid cell i, and 1_ij^obj indicates that the j-th bounding box of grid cell i is responsible for predicting the target, finally yielding the following overall cost function:
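The four formulas referenced in step S402 were images in the source and did not survive extraction. Given the symbols that do survive (λ_coord, λ_noobj and the indicators 1_i^obj, 1_ij^obj), the overall cost function is very likely the standard YOLO sum-squared-error loss; the LaTeX block below is a reconstruction under that assumption, not a verbatim recovery. In it, \hat{C}_i denotes the predicted box confidence (the patent's t_0 after the sigmoid), which overloads the C used earlier for the number of classes.

```latex
\mathcal{L} =
  \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}
    \left[ (x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 \right]
+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}
    \left[ \left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2 \right]
+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left( C_i-\hat{C}_i \right)^2
+ \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left( C_i-\hat{C}_i \right)^2
+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left( p_i(c)-\hat{p}_i(c) \right)^2
```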
Step S403: the labeled data collected in step S401 are fed into the network; the parameters of every layer are initialized the Xavier way; and using the loss function composed of bounding-box coordinate regression and object classification, the stochastic gradient descent algorithm computes the loss gradient while the backpropagation algorithm fine-tunes the parameters of all layers in the whole network, thereby training the network.
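Step S403 might look as follows in PyTorch. The data loader, the combined loss function and the learning-rate and momentum values are placeholders; the patent fixes only Xavier initialization, stochastic gradient descent on the combined regression-plus-classification loss, and backpropagation fine-tuning of all layers.

```python
import torch
import torch.nn as nn

def init_xavier(module):
    """Initialize every layer's parameters the Xavier way (step S403)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def train(network, data_loader, loss_fn, epochs=10, lr=1e-3):
    """Stochastic gradient descent on the combined bounding-box regression
    and classification loss; backpropagation fine-tunes all layers."""
    network.apply(init_xavier)
    optimizer = torch.optim.SGD(network.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(network(images), targets)
            loss.backward()   # compute the loss gradient by backpropagation
            optimizer.step()  # fine-tune the parameters of every layer
```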
Preferably, in step S1, feature extraction is performed with a network structure in which dense blocks alternate with transition layers, which extracts the more discriminative feature maps in an image.
Preferably, the FPN network composed of the bottom-up feature pyramid and the "parallel paths" under the densely connected convolution can efficiently use both the high-semantic low-scale feature maps and the large-scale low-semantic feature maps, constructing a feature pyramid that combines high semantics, large scale and precise location information.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention provides a full convolution target detection method of a densely connected convolutional neural network, characterized in that multi-scale feature maps are used effectively for target detection, so that the convolutional neural network attains high accuracy when detecting targets of different scales within the same image.
Detailed description of the invention
Fig. 1 is the flowchart of the present invention.
Specific embodiment
The accompanying drawings are for illustration only and shall not be construed as limiting this patent;
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, the present invention provides a full convolution target detection method of a densely connected convolutional neural network that proceeds through steps S1 to S4, together with the preferred sub-steps S101 to S403, exactly as set out in the Summary of the invention above.
Obviously, the above embodiment is merely an example given to illustrate the present invention clearly and is not a limitation on the embodiments of the present invention. On the basis of the above description, those of ordinary skill in the art may make other variations or changes in different forms; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

1. A full convolution target detection method of a densely connected convolutional neural network, characterized by comprising the following steps:
Step S1: construct the feature-extraction network DenseNet. The feature-extraction network is composed of multiple dense blocks and transition layers; the dense blocks allow the network to recognize the more discriminative visual features in an image. After the input image passes through the feature-extraction network, the features output by each dense block, which carry different semantics and different resolutions, are retained;
Step S2: construct the feature pyramid FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a low-semantic feature pyramid whose scale increases from bottom to top. Starting from the lowest level, every layer's feature passes through a "parallel path" convolution to obtain higher semantics; at the same time, the convolved feature is upsampled to the scale of the layer above and merged with that layer's feature. The merged feature keeps propagating upward toward the pyramid top, and this step is repeated until the complete feature pyramid is built;
Step S3: construct the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs target bounding-box information and class probabilities simultaneously and makes a separate prediction for the feature map of every scale in the feature pyramid. The predictor passes the input feature map through a convolutional neural network and outputs a tensor of size S*S*(B*5+C) as the prediction result, which is equivalent to dividing the original image into an S*S grid and predicting B bounding boxes per grid cell. Each bounding box carries 5 values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, each grid cell predicts the probabilities of C target categories;
Step S4: train the overall network. Target images are acquired and fed into the network; the parameters of every layer are initialized the Xavier way; and using the loss function composed of bounding-box coordinate regression and object classification, the stochastic gradient descent algorithm computes the loss gradient while the backpropagation algorithm fine-tunes the parameters of all layers in the whole network.
2. The full convolution target detection method of a densely connected convolutional neural network according to claim 1, characterized in that the specific steps of step S1 are as follows:
Step S101: an existing trained densely connected convolutional neural network model is adapted to obtain a preliminary feature-extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple dense blocks, and the different dense blocks are connected to one another by transition layers;
Step S103: a dense block contains multiple convolutional neural network layers, and the input of each layer is the superposition (channel-wise concatenation) of the outputs of all preceding layers in the same dense block; let the input of the l-th convolutional layer in a dense block be x_l and its output y_l, then x_l = (x_1 + y_1 + … + y_(l-1)) and y_l = H(x_l), where H(·) is defined as the activation function;
Step S104: H(·) is the activation function appended to every convolutional layer; here it is a composite operation in which the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: since different dense blocks have different spatial sizes, they are connected by a transition layer; a transition layer takes the output of the preceding dense block as input, first applies a BN operation, then a convolutional neural network layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next dense block; here the pooling layer is set to shrink the spatial size of the feature map to 1/n of the original;
Step S106: dense blocks and transition layers alternate repeatedly, so that the spatial size of the feature map shrinks after every dense block while its channel number grows; here the feature map output by the last convolutional layer of each dense block is denoted C_m;
Step S107: the global average-pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network are removed, and the feature map output by the last convolutional layer of the last dense block is taken as the output of the feature-extraction network.
3. The full convolution target detection method of a densely connected convolutional neural network according to claim 2, characterized in that the specific steps of step S2 are as follows:
Step S201: the FPN is composed of a "bottom-up feature pyramid" and "parallel paths"; the FPN first obtains from the feature-extraction network the visual features of its layers, which carry different semantics and different scales, and then stacks them bottom-up to generate a pyramid of lower-semantic features;
Step S202: the feature map output in step S107 is taken as the first input of the FPN; the input feature map passes through a convolutional layer that adjusts its channel number to a constant d, and the channel-adjusted feature map becomes the lowest-level feature map of the feature pyramid; here the feature maps of the pyramid layers are denoted D_k, the lowest level being D_m;
Step S203: the "bottom-up path" in the FPN mainly upsamples the feature map of the pyramid layer below; the upsampling factor is n, the reciprocal of the 1/n reduction factor of the pooling layers in the feature-extraction network, so that the upsampled feature map has the same spatial size as the output of the corresponding dense block from step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each dense block in step S1 as input, and a convolutional layer then adjusts the channel number of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps that are identical in spatial size and channel number; the two feature maps are added element-wise and then passed through a convolutional layer that reduces the aliasing effect of the upsampling, which produces the feature map of the next pyramid layer; denoting the operations applied to the input in steps S203 and S204 by f(·) and g(·) respectively, D_m = g(C_m) and D_k = φ(f(D_(k+1)) + g(C_k)) for 0 < k < m, where φ(·) denotes the convolution operation of this step;
Step S206: steps S203, S204 and S205 are repeated so that the entire feature pyramid is built layer by layer, upward from the lowest pyramid level.
4. The full convolution target detection method of a densely connected convolutional neural network according to claim 2, characterized in that the specific steps of step S3 are as follows:
Step S301: step S2 yields a feature pyramid whose feature scale increases layer by layer from bottom to top while the channel number of every layer stays the same, the ratio of the spatial sizes of adjacent layers' feature maps being the factor n; a predictor that simultaneously outputs target bounding-box information and class probabilities is built, and it acts on every layer of the feature pyramid, enabling the network to exploit feature maps of different scales;
Step S302: construction of the predictor that outputs target bounding-box information and class probabilities: taking a feature map of one pyramid layer as input, after processing by two fully connected layers it outputs a vector of size S*S*(B*5+C) as the prediction result, which is equivalent to dividing the original image into an S*S grid and predicting B bounding boxes per grid cell, each bounding box carrying 5 values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box, plus the probabilities of C target categories predicted per grid cell;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x, y are the actual coordinates of the bounding-box center in the image; w, h are the width and height of the bounding box; (c_x, c_y) is the top-left coordinate of the grid cell; and p_w, p_h are the width and height of the input image, respectively.
5. The full convolution target detection method of a densely connected convolutional neural network according to claim 1, characterized in that the specific steps of step S4 are as follows:
Step S401, image acquisition: images containing targets of all kinds are collected from daily life as training images, and every image is processed to obtain the bounding-box and class information of the targets it contains;
Step S402: a cost function is established for each predicted quantity for training; for the bounding-box center coordinates a formula is used as the cost function, for the bounding-box width and height a formula is used as the cost function, and for the class prediction a formula is used as the cost function, where λ_coord and λ_noobj balance the cost function between the bounding-box and probability costs, 1_i^obj indicates that a target appears in grid cell i, and 1_ij^obj indicates that the j-th bounding box of grid cell i is responsible for predicting the target, finally yielding the following overall cost function:
Step S403: the labeled data collected in step S401 are fed into the network; the parameters of every layer are initialized the Xavier way; and using the loss function composed of bounding-box coordinate regression and object classification, the stochastic gradient descent algorithm computes the loss gradient while the backpropagation algorithm fine-tunes the parameters of all layers in the whole network, thereby training the network.
6. The full convolution target detection method of a densely connected convolutional neural network according to claim 1, characterized in that in step S1, feature extraction is performed with a network structure in which dense blocks alternate with transition layers, which extracts the more discriminative feature maps in an image.
7. The full convolution target detection method of a densely connected convolutional neural network according to claim 1, characterized in that the FPN network composed of the bottom-up feature pyramid and the "parallel paths" under the densely connected convolution can efficiently use both the high-semantic low-scale and the large-scale low-semantic feature maps, constructing a feature pyramid with high semantics, large scale and precise location information.
CN201810998184.3A 2018-08-29 2018-08-29 Full convolution target detection method of densely connected convolution neural network Active CN109214505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810998184.3A CN109214505B (en) 2018-08-29 2018-08-29 Full convolution target detection method of densely connected convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810998184.3A CN109214505B (en) 2018-08-29 2018-08-29 Full convolution target detection method of densely connected convolution neural network

Publications (2)

Publication Number Publication Date
CN109214505A true CN109214505A (en) 2019-01-15
CN109214505B CN109214505B (en) 2022-07-01

Family

ID=64985668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810998184.3A Active CN109214505B (en) 2018-08-29 2018-08-29 Full convolution target detection method of densely connected convolution neural network

Country Status (1)

Country Link
CN (1) CN109214505B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170372201A1 (en) * 2016-06-22 2017-12-28 Massachusetts Institute Of Technology Secure Training of Multi-Party Deep Neural Network
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 Vehicle model recognition method based on the fast R-CNN deep neural network
CN106844442A (en) * 2016-12-16 2017-06-13 广东顺德中山大学卡内基梅隆大学国际联合研究院 Multi-modal recurrent neural network image description method based on FCN feature extraction
CN107437096A (en) * 2017-07-28 2017-12-05 北京大学 Image classification method based on a parameter-efficient deep residual network model
CN108182388A (en) * 2017-12-14 2018-06-19 哈尔滨工业大学(威海) Image-based moving target tracking method

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815886A (en) * 2019-01-21 2019-05-28 南京邮电大学 Pedestrian and vehicle detection method and system based on improved YOLOv3
CN109871823A (en) * 2019-03-11 2019-06-11 中国电子科技集团公司第五十四研究所 Satellite image ship detection method combining rotating frame and context information
CN109871823B (en) * 2019-03-11 2021-08-31 中国电子科技集团公司第五十四研究所 Satellite image ship detection method combining rotating frame and context information
CN110009622A (en) * 2019-04-04 2019-07-12 武汉精立电子技术有限公司 Display panel appearance defect detection network and defect detection method thereof
CN110060274A (en) * 2019-04-12 2019-07-26 北京影谱科技股份有限公司 Visual target tracking method and device based on a deep densely connected neural network
CN110322509A (en) * 2019-06-26 2019-10-11 重庆邮电大学 Target positioning method, system and computer equipment based on hierarchical class activation graph
CN110322509B (en) * 2019-06-26 2021-11-12 重庆邮电大学 Target positioning method, system and computer equipment based on hierarchical class activation graph
CN110555371A (en) * 2019-07-19 2019-12-10 华瑞新智科技(北京)有限公司 Wild animal information acquisition method and device based on unmanned aerial vehicle
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weakly supervised target classification and positioning method based on bifurcation learning
CN112581470A (en) * 2020-09-15 2021-03-30 佛山中纺联检验技术服务有限公司 Small target object detection method
CN112016535A (en) * 2020-10-26 2020-12-01 成都合能创越软件有限公司 Vehicle-mounted garbage traceability method and system based on edge calculation and block chain
CN112560778A (en) * 2020-12-25 2021-03-26 万里云医疗信息科技(北京)有限公司 DR image body part identification method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN109214505B (en) 2022-07-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant