CN111950709B - SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection - Google Patents

SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Info

Publication number
CN111950709B
Authority
CN
China
Prior art keywords
module, bram, intermediate result, data flow, input
Prior art date
Legal status
Active
Application number
CN202010808453.2A
Other languages
Chinese (zh)
Other versions
CN111950709A (en)
Inventor
杜培栋
王一鸣
徐磊
王琴
蒋剑飞
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010808453.2A
Publication of CN111950709A
Application granted
Publication of CN111950709B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection, wherein the method comprises the following steps: step S1: calling and configuring each submodule to realize the whole convolution-pooling part; step S2: acquiring object type information, object confidence information and object position information; step S3: determining the data flow direction and acquiring data flow direction determination result information; step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection. The application is reasonable in structure and convenient to use, and overcomes the defects of the prior art.

Description

SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection
Technical Field
The application relates to the technical field of unmanned aerial vehicle image target detection, and in particular to a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection.
Background
Neural network technology is an important branch of artificial intelligence. A large number of neurons (basic units of storage, operation, input and output) are interconnected to form a hierarchical structure similar to that of the human brain; this hierarchical structure is a neural network and is generally composed of an input layer, an output layer and several hidden layers. Under supervised training, a large amount of labelled data is used to train the neural network, and the weights, biases and other information of the nodes in the network are continuously adjusted through the two stages of forward propagation and backward propagation of the residual, so that the network finally outputs the correct result. Neural networks have high accuracy and strong learning ability, and have wide and important applications in fields such as image and speech recognition and pattern recognition. There are many kinds of neural networks, including BP neural networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Convolutional neural networks play an important role in the field of image recognition thanks to characteristics such as weight sharing and regional feature extraction. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the best image recognition results have been achieved by algorithms based on convolutional neural networks. The SqueezeNet neural network introduces a special basic module, the Fire module (shown in Fig. 1). The Fire module largely uses 1×1 convolution kernels instead of 3×3 convolution kernels, which greatly reduces the number of parameters of the model. A Fire module comprises a squeeze (compression) convolution layer containing only 1×1 convolution kernels, and an expand (extension) layer containing both 1×1 and 3×3 convolution kernels. There are three adjustable dimensions (hyper-parameters) in the Fire module: s1×1, e1×1 and e3×3, where s1×1 is the number of convolution kernels in the squeeze layer, e1×1 is the number of 1×1 convolution kernels in the expand layer, and e3×3 is the number of 3×3 convolution kernels in the expand layer. When Fire modules are used, s1×1 is set to be smaller than (e1×1+e3×3), which limits the number of input channels of the 3×3 convolution kernels and thus reduces the model parameters.
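As an illustration of the Fire module just described, the following is a minimal PyTorch-style sketch. It is included for readability only: the patent targets a hardware implementation, and the class name, argument names and example channel counts here are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)        # s1x1 kernels, all 1x1
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)             # e1x1 kernels of size 1x1
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)  # e3x3 kernels of size 3x3
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))                                    # s1x1 < e1x1 + e3x3 limits 3x3 inputs
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example with the Fire2 hyper-parameters mentioned later (s1x1=16, e1x1=64, e3x3=64)
fire2 = Fire(96, 16, 64, 64)
out = fire2(torch.randn(1, 96, 55, 55))   # -> shape (1, 128, 55, 55)
```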
Patent document CN111291634A discloses an unmanned aerial vehicle image target detection method based on a convolutional restricted Boltzmann machine: first, image blocks are obtained from an aerial image to construct an initial training sample data set; the initial training sample data set is then expanded, and the initial and expanded training sample data sets are combined into a total training sample data set; a convolutional restricted Boltzmann model is constructed, convolution kernels of two different sizes are used to extract feature vectors from the total training sample data set, the probability distributions of the visible layer and the hidden layer are calculated from the feature vectors, and the parameters of the restricted Boltzmann model are calculated to complete its training; finally, the aerial image is input into the trained convolutional restricted Boltzmann model to obtain the target detection and classification result. There is still room for improvement in performance.
Disclosure of Invention
In view of the defects in the prior art, the application aims to provide a SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection.
The application provides a SqueezeNet network folding construction method for unmanned aerial vehicle image target detection, which is characterized by comprising the following steps: step S1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control; step S2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information; step S3: determining the data flow direction and acquiring data flow direction determination result information; step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection.
Preferably, the step S3 includes: step S3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware; step S3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
Preferably, the step S4 includes: step S4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the 224×224×3 (height×width×depth) input image passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM; step S4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM.
Preferably, the step S4 includes: step S4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM; step S4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM; step S4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM.
Preferably, the step S4 includes: step S4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM; step S4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
The application further provides a SqueezeNet network folding construction system for unmanned aerial vehicle image target detection, which is characterized by comprising: module M1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control; module M2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information; module M3: determining the data flow direction and acquiring data flow direction determination result information; module M4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn; module M5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection.
Preferably, the module M3 comprises: module M3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware; module M3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
Preferably, the module M4 comprises: module M4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the network input image is 224×224×3 (height×width×depth) in size and passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM; module M4.2: inputting the weights of HalfFire2, reading the first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM.
Preferably, the module M4 comprises: module M4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM; module M4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and controlling the data flow to be Input->HalfFire->Output; store the fourth intermediate result in BRAM; module M4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM.
Preferably, the module M4 comprises: module M4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM; module M4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
Compared with the prior art, the application has the following beneficial effects:
1. The application is based on the SqueezeNet neural network architecture, in which SqueezeNet starts with an independent convolution layer (conv1), followed by 8 Fire modules (Fire2-9), and ends with one convolution layer (conv10). From the beginning to the end of the network, the number of filters per Fire module is gradually increased. SqueezeNet performs maximum pooling with a stride of 2 after the conv1, Fire4, Fire8 and conv10 layers. These relatively late pooling layers allow many layers in the network to have larger feature maps, which may lead to higher classification accuracy.
2. In the present application, the network input image size is 224×224×3 (height×width×depth); the conv1 convolution kernel is 7×7×1 with a ReLU activation function, followed by maximum pooling. The first convolution layer comprises 96 convolution kernels and generates 96 feature maps of 111×111 pixels, which are downsampled to 55×55 pixel feature maps. The data then passes through the Fire2 and Fire3 modules, whose s1×1, e1×1 and e3×3 are 16, 64 and 64 respectively, giving an output size of 55×55×128; then through the Fire4 module, whose s1×1, e1×1 and e3×3 are 32, 128 and 128 respectively, so that the output size becomes 55×55×256, at which point maximum pooling is performed and the feature maps are downsampled to 27×27 pixels. The output size becomes 27×27×512 after the Fire5 module, whose s1×1, e1×1 and e3×3 are 32, 128 and 128 respectively, and the data then passes through the Fire6 and Fire7 modules, whose s1×1, e1×1 and e3×3 are 48, 192 and 192 respectively, at which point maximum pooling is performed and the feature maps are downsampled to 13×13 pixels. Finally, the data passes through the convolution layer conv10 and mean pooling is performed. So that the output activations of the 1×1 and 3×3 sized filters have the same height and width, a 1-pixel zero-padding border is added to the input data of the 3×3 filters in the expand module (see the short sketch after this list). ReLU is used for the activation of the squeeze (compression) layer and the expand (extension) layer.
3. The application has reasonable structure and convenient use, and overcomes the defects of the prior art.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
fig. 1 is a schematic diagram of a Fire module structure in an embodiment of the application.
Fig. 2 is a schematic diagram of a SqueezeNet network structure in an embodiment of the application.
Fig. 3 is a schematic diagram of a HalfSqueezeNet network in an embodiment of the present application.
Fig. 4 is a schematic diagram of a folded structure of a HalfSqueezeNet network in an embodiment of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
For a given convolutional neural network, a convolution-pooling layer is constructed on hardware, and hardware cost can be saved by iteratively calling and configuring each submodule under control to realize the whole convolution-pooling part.
The whole network uses 7 HalfFire modules, and uses three convolution layers, ConvClass, ConvObj and ConvBox, to calculate object type information, object confidence information and object position information respectively. A folded network architecture is adopted, and 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers are realized in hardware. The HalfFire module is instantiated as the accelerator configuration by updating its weights and determining the data flow direction.
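The patent gives the HalfFire hyper-parameters (s1×1 = 32, e1×1 = 0, e3×3 = 96 for every instance) but does not spell out the module's internal wiring; a plausible reading, sketched below in the same PyTorch-style notation used above, is a Fire module whose 1×1 expand branch is dropped. This is an interpretation for illustration only, not the accelerator's actual RTL:

```python
import torch
import torch.nn as nn

class HalfFire(nn.Module):
    """Assumed HalfFire: a Fire module with e1x1 = 0, i.e. only the 3x3 expand branch remains."""
    def __init__(self, in_channels, s1x1=32, e3x3=96):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)          # 1x1 squeeze layer
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)    # 3x3 expand layer only
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.expand3x3(self.relu(self.squeeze(x))))

# A single hardware HalfFire instance is reused for all 7 stages; each stage simply
# corresponds to a new set of weights loaded into the same structure.
half_fire = HalfFire(in_channels=32)
y = half_fire(torch.randn(1, 32, 56, 56))   # -> shape (1, 96, 56, 56)
```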
1. Division of phases
According to the number of convolution operations that the hardware structure can perform simultaneously (shown in Fig. 4) and the image recognition neural network structure (shown in Fig. 3), the convolution-pooling part is divided into 7 stages; the weights of the HalfFire module are updated, the HalfFire module is instantiated as the accelerator configuration, and the corresponding operations are then completed in each stage in turn, as sketched after the stage list below.
2. Operation of each stage
(1) Input the weights of HalfFire1 and read the picture to be detected from BRAM; the network input image is 224×224×3 (height×width×depth) in size. The picture passes through a Conv2D layer with a convolution kernel size of 3×3×32, and maximum pooling is performed. It then passes through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling. Control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store intermediate result 1 in BRAM;
(2) Input the weights of HalfFire2 and read the cached intermediate result 1 from BRAM; it passes through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling. Control the data flow to be Input->HalfFire->Maxpool->Output, and store intermediate result 2 in BRAM;
(3) Input the weights of HalfFire3 and read the cached intermediate result 2 from BRAM; it passes through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 3 in BRAM;
(4) Input the weights of HalfFire4 and read the cached intermediate result 3 from BRAM; it passes through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 4 in BRAM;
(5) Input the weights of HalfFire5 and read the cached intermediate result 4 from BRAM; it passes through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 5 in BRAM;
(6) Input the weights of HalfFire6 and read the cached intermediate result 5 from BRAM; it passes through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively. Control the data flow to be Input->HalfFire->Output, and store intermediate result 6 in BRAM;
(7) Input the weights of HalfFire7 and read the cached intermediate result 6 from BRAM; it passes through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation. Control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
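The seven stages above amount to a small host-side schedule: reconfigure the single HalfFire instance with new weights, set the data flow, run, and buffer the intermediate result. The sketch below models this schedule in Python; the driver calls (load_weights, configure_dataflow, run_stage, read_sdma_register) and the mock accelerator are hypothetical names invented here for illustration, not an API defined by the patent:

```python
# Folding schedule for the 7 stages described above (hypothetical host-side driver).
STAGES = [
    ("HalfFire1", "Input->Conv2D->Maxpool->HalfFire->Maxpool->Output"),
    ("HalfFire2", "Input->HalfFire->Maxpool->Output"),
    ("HalfFire3", "Input->HalfFire->Output"),
    ("HalfFire4", "Input->HalfFire->Output"),
    ("HalfFire5", "Input->HalfFire->Output"),
    ("HalfFire6", "Input->HalfFire->Output"),
    ("HalfFire7", "Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output"),
]

class MockAccelerator:
    """Stand-in for the FPGA driver; a real system would talk to the accelerator instead."""
    def load_weights(self, weights): pass              # update the single HalfFire instance
    def configure_dataflow(self, dataflow): pass       # fix the data flow direction for this stage
    def run_stage(self, data): return data             # run one pass; the result goes back to BRAM
    def read_sdma_register(self): return "detections"  # stage 7 leaves its result in the SDMA register

def detect(image, weights, accel):
    """Process one picture with the folded accelerator, one stage (weight set) at a time."""
    bram = image                                       # the picture to be detected starts in BRAM
    for module, dataflow in STAGES:
        accel.load_weights(weights.get(module))        # instantiate HalfFire for this stage
        accel.configure_dataflow(dataflow)
        bram = accel.run_stage(bram)                   # intermediate result is buffered in BRAM
    return accel.read_sdma_register()

result = detect(image="picture", weights={}, accel=MockAccelerator())
```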
For unmanned aerial vehicle image target recognition, the application designs a hardware accelerator, based on a convolutional neural network, whose parallel operation is realized on an FPGA. According to the characteristics of the neural network structure, the hardware implements the HalfFire module, and the whole network uses only 7 HalfFire modules, which reduces the number of parameters and the amount of computation to a certain extent. Furthermore, for target detection, three convolution layers, ConvClass, ConvObj and ConvBox, are added at the end of the network to calculate object type information, object confidence information and object position information respectively. A folded network architecture is adopted, and 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers are realized in hardware. Before each calculation, the weights of the HalfFire module are updated, the HalfFire module is instantiated by the accelerator configuration, the data flow direction is determined, and the data is then input for calculation. After the image to be detected has been processed by the accelerator 7 times, the target detection result is obtained.
In the description of the present application, it should be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (4)

1. A SqueezeNet network folding construction method for unmanned aerial vehicle image target detection, characterized by comprising the following steps: step S1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control;
step S2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information;
step S3: determining a data flow direction and acquiring data flow direction determination result information;
step S4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn;
step S5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection;
the step S4 includes:
step S4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the picture passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM;
step S4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM;
the step S4 includes:
step S4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM;
step S4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM;
step S4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM;
the step S4 includes:
step S4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM;
step S4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
2. The SqueezeNet network folding construction method for unmanned aerial vehicle image target detection according to claim 1, wherein the step S3 comprises:
step S3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware;
step S3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
3. A SqueezeNet network folding construction system for unmanned aerial vehicle image target detection, characterized by comprising: module M1: for a given convolutional neural network, constructing a convolution-pooling layer on hardware, and realizing the whole convolution-pooling part by iteratively calling and configuring each submodule under control;
module M2: using 7 HalfFire modules to obtain object type information, object confidence information and object position information;
module M3: determining a data flow direction and acquiring data flow direction determination result information;
module M4: dividing the convolution-pooling part into 7 stages according to the number of convolution operations that the hardware structure can perform simultaneously and the structure of the image recognition neural network, updating the weights of the HalfFire module, and, after the HalfFire module is instantiated as the accelerator configuration, completing the corresponding operations in each stage in turn;
module M5: acquiring SqueezeNet network folding construction result information for unmanned aerial vehicle image target detection;
the module M4 includes:
module M4.1: inputting the weights of HalfFire1 and reading the picture to be detected from BRAM; the picture passes through a Conv2D layer with a convolution kernel size of 3×3×32 followed by maximum pooling, then through a HalfFire1 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, followed by maximum pooling; control the data flow to be Input->Conv2D->Maxpool->HalfFire->Maxpool->Output, and store the first intermediate result in BRAM;
module M4.2: inputting the weights of HalfFire2, reading the buffered first intermediate result from BRAM, passing it through a HalfFire2 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, and then performing maximum pooling; control the data flow to be Input->HalfFire->Maxpool->Output, and store the second intermediate result in BRAM;
the module M4 includes:
module M4.3: inputting the weights of HalfFire3, reading the buffered second intermediate result from BRAM, and passing it through a HalfFire3 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the third intermediate result in BRAM;
module M4.4: inputting the weights of HalfFire4, reading the buffered third intermediate result from BRAM, and passing it through a HalfFire4 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fourth intermediate result in BRAM;
module M4.5: inputting the weights of HalfFire5, reading the buffered fourth intermediate result from BRAM, and passing it through a HalfFire5 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the fifth intermediate result in BRAM;
the module M4 includes:
module M4.6: inputting the weights of HalfFire6, reading the buffered fifth intermediate result from BRAM, and passing it through a HalfFire6 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively; control the data flow to be Input->HalfFire->Output, and store the sixth intermediate result in BRAM;
module M4.7: inputting the weights of HalfFire7, reading the buffered sixth intermediate result from BRAM, passing it through a HalfFire7 module whose s1×1, e1×1 and e3×3 are 32, 0 and 96 respectively, then through the 1×1 convolution layer ConvClass, after which it is combined with the original input and fed into the convolution layers ConvObj and ConvBox respectively; finally, the maximum value is selected in the ConvObj layer activation, and on that basis the maximum value is selected from the ConvBox layer activation; control the data flow to be Input->HalfFire->ConvClass->ConvBox, ConvObj->SelectMax->Output, store the detection result in the SDMA register, and wait for the processor to read it.
4. The SqueezeNet network folding construction system for unmanned aerial vehicle image target detection according to claim 3, wherein the module M3 comprises:
module M3.1: adopting a folded network architecture, and realizing 1 parameter-configurable HalfFire module, 4 convolution layers and 3 pooling layers in hardware;
module M3.2: instantiating the HalfFire module as the accelerator configuration by updating its weights and determining the data flow direction.
CN202010808453.2A 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection Active CN111950709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010808453.2A CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010808453.2A CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Publications (2)

Publication Number Publication Date
CN111950709A CN111950709A (en) 2020-11-17
CN111950709B true CN111950709B (en) 2023-11-03

Family

ID=73332808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010808453.2A Active CN111950709B (en) 2020-08-12 2020-08-12 SqueezeNet network folding construction method and system for unmanned aerial vehicle image target detection

Country Status (1)

Country Link
CN (1) CN111950709B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002807A (en) * 2018-07-27 2018-12-14 重庆大学 A kind of Driving Scene vehicle checking method based on SSD neural network
CN109815799A (en) * 2018-12-18 2019-05-28 南京理工大学 A kind of vehicle detecting algorithm of quickly taking photo by plane based on SSD
CN111079923A (en) * 2019-11-08 2020-04-28 中国科学院上海高等研究院 Spark convolution neural network system suitable for edge computing platform and circuit thereof

Also Published As

Publication number Publication date
CN111950709A (en) 2020-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant