CN111950709B - SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection - Google Patents

SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Info

Publication number
CN111950709B
Authority
CN
China
Prior art keywords
module, bram, intermediate result, data flow, input
Prior art date
Legal status
Active
Application number
CN202010808453.2A
Other languages
Chinese (zh)
Other versions
CN111950709A (en)
Inventor
杜培栋
王一鸣
徐磊
王琴
蒋剑飞
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010808453.2A
Publication of CN111950709A
Application granted
Publication of CN111950709B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection, wherein the method comprises the following steps: step S1: calling and configuring each submodule to realize the whole convolution-pooling part; step S2: acquiring object type information, object confidence information and object position information; step S3: determining the data flow direction and acquiring data flow direction determination result information; step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection. The application is reasonable in structure and convenient to use, and overcomes the defects of the prior art.

Description

SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection
Technical Field
The application relates to the technical field of unmanned aerial vehicle image target detection, and in particular to a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection.
Background
Neural network technology is an important branch of artificial intelligence. A large number of neurons (basic units of storage, operation, input and output) are interconnected to form a hierarchical structure similar to that of the human brain; this hierarchical structure is a neural network and is generally composed of an input layer, an output layer and several hidden layers. Under supervised training, a large amount of labelled data is used to train the neural network, and the weights, biases and other information of the nodes in the network are continuously adjusted through the two stages of forward propagation and backward propagation of the residual, so that the network finally outputs the correct result. Neural networks have high accuracy and strong learning ability, and have wide and important applications in fields such as image and speech recognition and pattern recognition. There are many kinds of neural networks, including BP neural networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Convolutional neural networks play an important role in the field of image recognition thanks to characteristics such as weight sharing and regional feature extraction. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the best image recognition results have been achieved by algorithms based on convolutional neural networks. The SqueezeNet neural network introduces a special basic module, the Fire module (shown in Fig. 1). The Fire module largely uses 1×1 convolution kernels instead of 3×3 convolution kernels, which greatly reduces the number of parameters of the model. A Fire module comprises a squeeze (compression) convolution layer containing only 1×1 convolution kernels, and an expand (extension) layer containing both 1×1 and 3×3 convolution kernels. There are three adjustable dimensions (hyper-parameters) in the Fire module: s1×1, e1×1 and e3×3, where s1×1 is the number of convolution kernels in the squeeze layer, e1×1 is the number of 1×1 convolution kernels in the expand layer, and e3×3 is the number of 3×3 convolution kernels in the expand layer. When Fire modules are used, s1×1 is set to be smaller than (e1×1+e3×3), which limits the number of input channels of the 3×3 convolution kernels and thus reduces the model parameters.
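As an illustration of the Fire module just described, the following is a minimal PyTorch-style sketch. It is included for readability only: the patent targets a hardware implementation, and the class name, argument names and example channel counts here are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)        # s1x1 kernels, all 1x1
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)             # e1x1 kernels of size 1x1
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)  # e3x3 kernels of size 3x3
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))                                    # s1x1 < e1x1 + e3x3 limits 3x3 inputs
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example with the Fire2 hyper-parameters mentioned later (s1x1=16, e1x1=64, e3x3=64)
fire2 = Fire(96, 16, 64, 64)
out = fire2(torch.randn(1, 96, 55, 55))   # -> shape (1, 128, 55, 55)
```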
Patent document CN111291634A discloses an unmanned aerial vehicle image target detection method based on a convolutional restricted Boltzmann machine: first, image blocks are obtained from an aerial image to construct an initial training sample data set; the initial training sample data set is then expanded, and the initial and expanded training sample data sets are combined into a total training sample data set; a convolutional restricted Boltzmann model is constructed, convolution kernels of two different sizes are used to extract feature vectors from the total training sample data set, the probability distributions of the visible layer and the hidden layer are calculated from the feature vectors, and the parameters of the restricted Boltzmann model are calculated to complete its training; finally, the aerial image is input into the trained convolutional restricted Boltzmann model to obtain the target detection and classification result. There is still room for improvement in performance.
Disclosure of Invention
In view of the defects in the prior art, the application aims to provide a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection.
The application provides a SqueezeNet network folding construction method for unmanned aerial vehicle image target detection, which is characterized by comprising the following steps: step S1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control; step S2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information; step S3: determining the data flow direction and acquiring data flow direction determination result information; step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection.
Preferably, the step S3 includes: step S3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware; step S3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
Preferably, the step S4 includes: step S4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the 224×224×3 (height×width×depth) input image passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM; step S4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM.
Preferably, the step S4 includes: step S4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM; step S4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM; step S4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM.
Preferably, the step S4 includes: step S4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM; step S4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
The application further provides a SqueezeNet network folding construction system for unmanned aerial vehicle image target detection, which is characterized by comprising: module M1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control; module M2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information; module M3: determining the data flow direction and acquiring data flow direction determination result information; module M4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; module M5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection.
Preferably, the module M3 comprises: module M3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware; module M3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
Preferably, the module M4 comprises: module M4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the network input image is 224×224×3 (height×width×depth) in size and passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM; module M4.2: inputting the weights of HalfFire2, reading the first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM.
Preferably, the module M4 comprises: module M4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM; module M4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and controlling the data flow to be Input->HalfFire->Output; store the fourth intermediate result in BRAM; module M4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM.
Preferably, the module M4 comprises: module M4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM; module M4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
Compared with the prior art, the application has the following beneficial effects:
1. The application is based on the SqueezeNet neural network architecture, in which SqueezeNet starts with an independent convolution layer (conv1), followed by 8 Fire modules (Fire2-9), and ends with one convolution layer (conv10). From the beginning to the end of the network, the number of filters per Fire module is gradually increased. SqueezeNet performs maximum pooling with a stride of 2 after the conv1, Fire4, Fire8 and conv10 layers. These relatively late pooling layers allow many layers in the network to have larger feature maps, which may lead to higher classification accuracy.
2. In the present application, the network input image size is 224×224×3 (height×width×depth); the conv1 convolution kernel is 7×7×1 with a ReLU activation function, followed by maximum pooling. The first convolution layer comprises 96 convolution kernels and generates 96 feature maps of 111×111 pixels, which are downsampled to 55×55 pixel feature maps. The data then passes through the Fire2 and Fire3 modules, whose s1×1, e1×1 and e3×3 are 16, 64 and 64 respectively, giving an output size of 55×55×128; then through the Fire4 module, whose s1×1, e1×1 and e3×3 are 32, 128 and 128 respectively, so that the output size becomes 55×55×256, at which point maximum pooling is performed and the feature maps are downsampled to 27×27 pixels. The output size becomes 27×27×512 after the Fire5 module, whose s1×1, e1×1 and e3×3 are 32, 128 and 128 respectively, and the data then passes through the Fire6 and Fire7 modules, whose s1×1, e1×1 and e3×3 are 48, 192 and 192 respectively, at which point maximum pooling is performed and the feature maps are downsampled to 13×13 pixels. Finally, the data passes through the convolution layer conv10 and mean pooling is performed. So that the output activations of the 1×1 and 3×3 sized filters have the same height and width, a 1-pixel zero-padding border is added to the input data of the 3×3 filters in the expand module (see the short sketch after this list). ReLU is used for the activation of the squeeze (compression) layer and the expand (extension) layer.
3. The application has reasonable structure and convenient use, and overcomes the defects of the prior art.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of a Fire module structure in an embodiment of the application.
Fig. 2 is a schematic diagram of a SqueezeNet network structure in an embodiment of the application.
Fig. 3 is a schematic diagram of a HalfSqueezeNet network in an embodiment of the present application.
Fig. 4 is a schematic diagram of a folded structure of a HalfSqueezeNet network in an embodiment of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
For a given convolutional neural network, a convolution-pooling layer is constructed on hardware, and hardware cost can be saved by iteratively calling and configuring each submodule under control to realize the whole convolution-pooling part.
The whole network uses 7 HalfFire modules, and uses three convolution layers, ConvClass, ConvObj and ConvBox, to calculate object type information, object confidence information and object position information respectively. A folded network architecture is adopted, and 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers are realized in hardware. The HalfFire module is instantiated as the accelerator configuration by updating its weights and determining the data flow direction.
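The patent gives the HalfFire hyper-parameters (s1×1 = 32, e1×1 = 0, e3×3 = 96 for every instance) but does not spell out the module's internal wiring; a plausible reading, sketched below in the same PyTorch-style notation used above, is a Fire module whose 1×1 expand branch is dropped. This is an interpretation for illustration only, not the accelerator's actual RTL:

```python
import torch
import torch.nn as nn

class HalfFire(nn.Module):
    """Assumed HalfFire: a Fire module with e1x1 = 0, i.e. only the 3x3 expand branch remains."""
    def __init__(self, in_channels, s1x1=32, e3x3=96):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)          # 1x1 squeeze layer
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)    # 3x3 expand layer only
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.expand3x3(self.relu(self.squeeze(x))))

# A single hardware HalfFire instance is reused for all 7 stages; each stage simply
# corresponds to a new set of weights loaded into the same structure.
half_fire = HalfFire(in_channels=32)
y = half_fire(torch.randn(1, 32, 56, 56))   # -> shape (1, 96, 56, 56)
```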
1. Division of phases
According to the number of convolution operations that the hardware structure can perform simultaneously (shown in Fig. 4) and the image recognition neural network structure (shown in Fig. 3), the convolution-pooling part is divided into 7 stages; the weights of the HalfFire module are updated, the HalfFire module is instantiated as the accelerator configuration, and the corresponding operations are then completed in each stage in turn, as sketched after the stage list below.
2. Operation of each stage
(1) Input the weights of HalfFire1 and read the picture to be detected from BRAM; the network input image is 224×224×3 (height×width×depth) in size. The picture passes through a Conv2D layer with a convolution kernel size of 3×3×32, and maximum pooling is performed. It then passes through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling. Control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store intermediate result 1 in BRAM;
(2) Input the weights of HalfFire2 and read the cached intermediate result 1 from BRAM; it passes through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling. Control the data flow to be Input->HalfFire->Maxpool->Output, and store intermediate result 2 in BRAM;
(3) Input the weights of HalfFire3 and read the cached intermediate result 2 from BRAM; it passes through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 3 in BRAM;
(4) Input the weights of HalfFire4 and read the cached intermediate result 3 from BRAM; it passes through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 4 in BRAM;
(5) Input the weights of HalfFire5 and read the cached intermediate result 4 from BRAM; it passes through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 5 in BRAM;
(6) Input the weights of HalfFire6 and read the cached intermediate result 5 from BRAM; it passes through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 6 in BRAM;
(7) Input the weights of HalfFire7 and read the cached intermediate result 6 from BRAM; it passes through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation. Control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
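The seven stages above amount to a small host-side schedule: reconfigure the single HalfFire instance with new weights, set the data flow, run, and buffer the intermediate result. The sketch below models this schedule in Python; the driver calls (load_weights, configure_dataflow, run_stage, read_sdma_register) and the mock accelerator are hypothetical names invented here for illustration, not an API defined by the patent:

```python
# Folding schedule for the 7 stages described above (hypothetical host-side driver).
STAGES = [
    ("HalfFire1", "Input->Conv2D->Maxpool->HalfFire->Maxpool->Output"),
    ("HalfFire2", "Input->HalfFire->Maxpool->Output"),
    ("HalfFire3", "Input->HalfFire->Output"),
    ("HalfFire4", "Input->HalfFire->Output"),
    ("HalfFire5", "Input->HalfFire->Output"),
    ("HalfFire6", "Input->HalfFire->Output"),
    ("HalfFire7", "Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output"),
]

class MockAccelerator:
    """Stand-in for the FPGA driver; a real system would talk to the accelerator instead."""
    def load_weights(self, weights): pass              # update the single HalfFire instance
    def configure_dataflow(self, dataflow): pass       # fix the data flow direction for this stage
    def run_stage(self, data): return data             # run one pass; the result goes back to BRAM
    def read_sdma_register(self): return "detections"  # stage 7 leaves its result in the SDMA register

def detect(image, weights, accel):
    """Process one picture with the folded accelerator, one stage (weight set) at a time."""
    bram = image                                       # the picture to be detected starts in BRAM
    for module, dataflow in STAGES:
        accel.load_weights(weights.get(module))        # instantiate HalfFire for this stage
        accel.configure_dataflow(dataflow)
        bram = accel.run_stage(bram)                   # intermediate result is buffered in BRAM
    return accel.read_sdma_register()

result = detect(image="picture", weights={}, accel=MockAccelerator())
```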
For unmanned aerial vehicle image target recognition, the application designs a hardware accelerator, based on a convolutional neural network, whose parallel operation is realized on an FPGA. According to the characteristics of the neural network structure, the hardware implements the HalfFire module, and the whole network uses only 7 HalfFire modules, which reduces the number of parameters and the amount of computation to a certain extent. Furthermore, for target detection, three convolution layers, ConvClass, ConvObj and ConvBox, are added at the end of the network to calculate object type information, object confidence information and object position information respectively. A folded network architecture is adopted, and 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers are realized in hardware. Before each calculation, the weights of the HalfFire module are updated, the HalfFire module is instantiated by the accelerator configuration, the data flow direction is determined, and the data is then input for calculation. After the image to be detected has been processed by the accelerator 7 times, the target detection result is obtained.
In the description of the present application, it should be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (4)

1. A SqueezeNet network folding construction method for unmanned aerial vehicle image target detection, characterized by comprising the following steps: step S1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control;
step S2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information;
step S3: determining a data flow direction and acquiring data flow direction determination result information;
step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn;
step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection;
the step S4 includes:
step S4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the picture passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM;
step S4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM;
the step S4 includes:
step S4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM;
step S4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM;
step S4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM;
the step S4 includes:
step S4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM;
step S4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
2. The SqueezeNet network folding construction method for unmanned aerial vehicle image target detection according to claim 1, wherein the step S3 comprises:
step S3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware;
step S3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
3. A SqueezeNet network folding construction system for unmanned aerial vehicle image target detection, characterized by comprising: module M1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control;
module M2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information;
module M3: determining a data flow direction and acquiring data flow direction determination result information;
module M4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn;
module M5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection;
the module M4 includes:
module M4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the picture passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM;
module M4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM;
the module M4 includes:
module M4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM;
module M4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM;
module M4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM;
the module M4 includes:
module M4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM;
module M4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
4. The SqueezeNet network folding construction system for unmanned aerial vehicle image target detection according to claim 3, wherein the module M3 comprises:
module M3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware;
module M3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
CN202010808453.2A 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection Active CN111950709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010808453.2A CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010808453.2A CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Publications (2)

Publication Number Publication Date
CN111950709A CN111950709A (en) 2020-11-17
CN111950709B true CN111950709B (en) 2023-11-03

Family

ID=73332808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010808453.2A Active CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Country Status (1)

Country Link
CN (1) CN111950709B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002807A (en) * 2018-07-27 2018-12-14 重庆大学 A kind of Driving Scene vehicle checking method based on SSD neural network
CN109815799A (en) * 2018-12-18 2019-05-28 南京理工大学 A kind of vehicle detecting algorithm of quickly taking photo by plane based on SSD
CN111079923A (en) * 2019-11-08 2020-04-28 中国科学院上海高等研究院 Spark convolution neural network system suitable for edge computing platform and circuit thereof

Also Published As

Publication number Publication date
CN111950709A (en) 2020-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant