CN115272412B - Edge-computing-based low-small-slow target detection method and tracking system


Info

Publication number
CN115272412B
CN115272412B
Authority
CN
China
Prior art keywords
layer
module
low
target
dwb
Prior art date
Legal status
Active
Application number
CN202210920753.9A
Other languages
Chinese (zh)
Other versions
CN115272412A (en)
Inventor
刘益安
肖枭
胡绍刚
冉欢欢
Current Assignee
Chongqing Institute Of Microelectronics Industry Technology University Of Electronic Science And Technology
Original Assignee
Chongqing Institute Of Microelectronics Industry Technology University Of Electronic Science And Technology
Priority date
Filing date
Publication date
Application filed by Chongqing Institute Of Microelectronics Industry Technology University Of Electronic Science And Technology
Priority to CN202210920753.9A (filed 2022-08-02)
Publication of CN115272412A (2022-11-01)
Application granted; publication of CN115272412B (2023-09-26)
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Abstract

The invention belongs to the field of low-small-slow target detection, and in particular relates to a low-small-slow target detection method and tracking system based on edge computing. The method comprises: collecting image sets of different targets under different exposure levels, and processing the image sets to obtain a low-small-slow target data set; constructing a lightweight detection network based on YOLOv5, training it on a public data set, and testing it on the low-small-slow target data set; and using the trained and tested lightweight detection network to detect low-small-slow targets in real time. The invention reduces the labor and time cost of manually collecting and processing data, and helps acquire more comprehensive, higher-quality data.

Description

Edge-computing-based low-small-slow target detection method and tracking system
Technical Field
The invention belongs to the field of low-small-slow target detection, and particularly relates to a low-small-slow target detection method and tracking system based on edge computing.
Background
Among the many kinds of detection targets, low-small-slow aerial targets are small in size, simple to operate, fly at low altitude, and are heavily occluded by ground objects, so air-defense forces and radar equipment cannot cover them; this sharply increases the pressure on air-defense systems and seriously threatens the security of important activities and key areas. Detection is a necessary premise of interception and strike, so early warning and identification of low-small-slow targets such as unmanned aerial vehicles is an important guarantee for seizing the initiative in future informatized air combat. Moreover, since manual filming affects the real-time performance and stability of the shooting process, deploying a low-small-slow detection intelligent system on edge devices to automatically detect low-small-slow targets in the air is of great significance.
In recent years, most mainstream target detection algorithms use deep convolutional neural networks and various large-scale data sets for training and detection. Common detection models are divided into one-stage and two-stage models, represented by YOLO, SSD, R-CNN, Fast R-CNN, Mask R-CNN, etc. Training a two-stage detection model is a multi-stage process that is slow and difficult to optimize, because each individual stage must be trained separately. Because edge devices have limited storage and computing power, two-stage detection algorithms are difficult to apply to real-time detection in real-world scenarios. One-stage detection does not first generate candidate regions and then perform the detection task; instead, it regresses the bounding-box coordinates of the target object while classifying it, and can predict the result directly with only a small number of candidate boxes. Although this reduces prediction accuracy somewhat, it greatly improves detection speed, and after lightweighting such models can be smoothly deployed on mobile terminals and embedded devices. Compressing and accelerating neural networks, and placing these well-performing lightweight models into related intelligent small systems, has significant academic and engineering value.
The prior art has the following disadvantages: manually collecting data is costly and difficult; transmitting the required data over long distances and manually identifying low-small-slow target types and threat levels takes considerable time and introduces large latency; and the computational load of existing target detection algorithms is too large to sustain stable, long-term, high-frame-rate real-time detection.
Disclosure of Invention
In order to solve the above problems, the invention provides a low-small-slow target detection method and tracking system based on edge computing.
In a first aspect, the invention provides a low-small-slow target detection method based on edge computing, comprising:
S1, collecting image sets of different targets under different exposure levels, and processing the image sets to obtain a low-small-slow target data set;
labeling the low-small-slow targets in the image sets with a labeling tool, and applying data enhancement and blurring to the low-small-slow targets to obtain the low-small-slow data set;
S2, constructing a lightweight detection network based on YOLOv5, training it on a public data set, and testing and tuning the trained lightweight detection network on the low-small-slow target data set;
S3, using the lightweight detection network, after training and testing are completed, to detect low-small-slow targets in real time.
Further, the detection network is modified from the YOLOv5 network structure and comprises 1 CBL module, 3 SFB1 modules, 3 SFB2 modules, 1 CBS module, 2 upsampling modules, 2 concat modules, 6 DWB modules, 3 conv modules and 2 add modules.
Further, the CBL module comprises a 3×3 convolution layer, a BN (Batch Normalization) layer and a LeakyReLU activation function layer in sequential cascade;
the CBR module comprises a 1×1 convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the DWB module comprises a 3×3 depthwise separable convolution layer, a BN layer, a 1×1 convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the CBS module comprises a 3×3 convolution layer, a BN layer and a SiLU activation function layer in sequential cascade;
the SFB1 module comprises a first branch and a second branch: the first branch contains a CBR module and a DWB module, the second branch contains a DWB module, the outputs of the two branches are spliced through a concat layer, and the spliced result passes through a Channel Shuffle layer;
the SFB2 module comprises a slice layer, a CBR module and a DWB module in sequential cascade, where the output of the DWB module and the output of the slice layer are spliced through a concat layer, and the spliced result passes through a Channel Shuffle layer.
Further, to lighten the detection network, the DWB module in the first branch of the SFB1 module retains only the 3×3 depthwise separable convolution layer and the BN layer, and the DWB module in the SFB2 module likewise retains only the 3×3 depthwise separable convolution layer and the BN layer.
Further, in the training process, L1 regularization is added to constrain the coefficients of the BN layers: the learnable parameter γ in each BN layer is directly selected as the scale factor for sparsity clipping, and it multiplies the normalized activation of its BN channel. The L1 regularization constraint is expressed as:

L = Σ_{(x,y)} l(f(x, W), y) + λ · Σ_{γ∈Γ} g(γ)

The first term on the right-hand side is the loss function of ordinary CNN training and the second term is the regularization constraint; (x, y) denotes a training input and its target, W the trainable weights, Γ the set of BN scale factors, and g(·) the sparsity-inducing penalty, chosen as g(s) = |s| to realize L1 regularization of the parameters. λ is the regularization coefficient, which can be adjusted per data set to sparsify the parameters.
After sparse regularization of the parameters, the constraint term pulls γ toward 0 as γ is updated. Denote the sparsified γ value by δ_i and the output of the BN layer by Z_i; following the BN training procedure, input and output are related by the linear transformation:

Z_i = δ_i · x_i′ + β_i

wherein x_i′ is the normalized input and β_i is the bias of the BN layer.
In a second aspect, based on the first aspect, the invention provides a low-small-slow target tracking system based on edge computing, comprising a camera, a pan-tilt head, a Jetson Xavier NX edge computing platform and a DSP+FPGA embedded computing platform, wherein the pan-tilt head is mounted at the bottom of the camera, the camera is connected to the Jetson Xavier NX edge computing platform through a USB interface, the Jetson Xavier NX edge computing platform and the DSP+FPGA embedded computing platform transmit data over a network cable, and a lightweight detection network, tested and parameter-tuned on a PC, is deployed in the Jetson Xavier NX edge computing platform, wherein:
the camera is used for acquiring images and transmitting them to the Jetson Xavier NX edge computing platform for detection;
the Jetson Xavier NX edge computing platform is used for detecting low-small-slow targets in the images acquired by the camera, outputting the anchor-box position and image template of the low-small-slow target, and controlling the pan-tilt head to rotate and track the low-small-slow target;
the DSP+FPGA embedded computing platform comprises a DSP module and an FPGA module:
a tracking module is deployed in the DSP module, responsible for running the tracking algorithm and for communication with the Jetson Xavier NX;
the FPGA module is used to deploy the pan-tilt head and to realize the interconnection of the devices.
The invention has the beneficial effects that:
compared with the original YOLOv5 target detection network, the lightweight YOLOv5 network provided by the invention can realize accurate detection of small target objects under a complex background on the detection task of the small and slow targets, and reduces the parameter amount and the calculation amount. Wherein a lightweight backbone network is designed; the reasoning speed is improved by fusing the convolution layer and the BN layer; and constraint BN coefficient is carried out by using an L1 regularization mode, and channel clipping is carried out, so that the parameter quantity and the energy consumption are reduced.
The method detects low-small slow targets with more ground object shielding and complex background in a low-altitude scene, and reduces the manpower and time cost for manually collecting and processing data by a deep learning method. And the data enhancement mode is utilized to acquire more comprehensive and higher-quality data. The network is deployed on the edge computing platform with small occupied space, strong portability and strong portability, so that the detection network can be quickly docked with the embedded computing platform, real-time detection of a target is realized, and finally the industry landing is completed.
Drawings
FIG. 1 is a block diagram of the edge-computing-based low-small-slow target tracking system of the present invention;
FIG. 2 is a flow chart of the use of the low-small-slow target tracking system according to an embodiment of the invention;
FIG. 3 is a diagram of the YOLOv5-based lightweight detection network of the present invention;
FIG. 4 is a block diagram of the SFB1 module and the SFB2 module according to an embodiment of the present invention;
FIG. 5 is a diagram of the edge-computing-based low-small-slow target detection method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The edge-computing-based low-small-slow target detection method, as shown in FIG. 5, comprises the following steps:
S1, collecting image sets of different targets under different exposure levels, and processing the image sets to obtain a low-small-slow target data set;
specifically, images of different targets under different exposure levels are acquired by a camera, and a labeling tool is used to label the low-small-slow targets in the images. To enhance generalization, the mosaic and mixup combined data augmentations introduced with YOLOv4 are borrowed to enrich the data dimension, and image-blur data of different degrees is added for the different low-small-slow targets, improving detection accuracy on blurred data; a minimal augmentation sketch is given below.
S2, constructing a lightweight detection network based on YOLOv5, training it on a public data set, and testing it on the low-small-slow target data set. The collected images and their annotations in the low-small-slow target data set are counted and screened: extremely small or extremely large labeling boxes are filtered out, and the labels are checked for errors and corrected.
S3, using the lightweight detection network, after training and testing are completed, to detect low-small-slow targets in real time.
Specifically, the lightweight detection network is optimized from the YOLOv5 network structure. The Focus module in YOLOv5 is replaced, for downsampling, by a CBL module containing an ordinary 3×3 convolution, reducing the computation and parameter count to 1/4 of the original; and, drawing on the main DWB block of the ShuffleNet structure and on the five stages of YOLOv5's CSP-Darknet53 backbone, a new backbone network is constructed for extracting image features.
In an embodiment, further modification and optimization of the backbone network in the lightweight detection network includes:
1) analyzing the structure of the ShuffleNet network and constructing, based on it, the backbone structures SFB1 and SFB2 of the lightweight detection network;
2) constructing the two structures SFB1 and SFB2 following ShuffleNet v2. The first block of each stage performs downsampling and channel expansion, so the input is directly copied into two parts, passed through a DWB module and a CBR+DWB module respectively, and then concatenated, doubling the channels; this forms the SFB1 structure. An ordinary block splits the input channels into two, extracts features from one part with CBR+DWB convolution, and concatenates the result directly with the other part, forming the SFB2 structure. Whereas ShuffleNet v2 uses two 1×1 convolutions, the invention only performs small-target detection and does not need channel expansion and reduction, so one 1×1 convolution is cut from the DWB module, the remaining one fusing the inter-channel information of the depthwise convolution; this lightens the ShuffleNet-style model, as shown in FIG. 4;
3) the maximum pooling layer is removed and the Spatial Pyramid Pooling (SPP) module of YOLOv5 is deleted and replaced by a CBS module, which uses a SiLU activation function instead of ReLU for a better effect on deep models. The CBS module condenses the overall features and thereby extracts the main features, so it is functionally similar to the SPP module but has fewer parameters.
Specifically, the Mish activation function is computationally expensive but effective, so the activation layers in the lightweight detection network use the Mish function during training; after training they can be replaced by the cheaper LeakyReLU and ReLU activation functions, as in the sketch below.
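A minimal sketch of that post-training swap, assuming PyTorch; the traversal helper is illustrative, not the patent's code:

```python
import torch.nn as nn

def swap_activations(model: nn.Module) -> nn.Module:
    # Recursively replace Mish activations with the cheaper LeakyReLU.
    for name, child in model.named_children():
        if isinstance(child, nn.Mish):
            setattr(model, name, nn.LeakyReLU(0.1, inplace=True))
        else:
            swap_activations(child)
    return model
```

Since Mish and LeakyReLU are not numerically identical, a short fine-tuning pass after the swap may be needed to recover accuracy.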
Specifically, the YOLOv5-based lightweight detection network, as shown in FIG. 3, comprises 1 CBL module, 3 SFB1 modules, 3 SFB2 modules, 1 CBS module, 2 upsampling modules, 2 concat modules, 6 DWB modules, 3 conv modules and 2 add modules;
the CBL module comprises a 3×3 convolution layer, a BN layer and a LeakyReLU activation function layer in sequential cascade; it performs convolution, batch normalization and activation on the input data, adding nonlinearity to the network and accelerating its convergence;
the upsampling module mainly uses nearest-neighbor interpolation to expand feature maps to the corresponding size so that they can be spliced with other feature maps through a concat layer; it serves to exploit features of different scales in the image;
the concat module splices channels, using the semantic information of feature maps of different scales to achieve better performance;
the add module sums feature maps with the channel count unchanged; it is equivalent to a special concat module with less computation. Therefore, in the detection network the concat module is used to splice feature maps of different scales, while the add module is used to reduce computation when the feature maps of the two input channels are semantically similar;
the CBR module comprises a 1×1 convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the DWB module comprises a 3×3 depthwise separable convolution layer, a BN layer, a 1×1 pointwise convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the CBS module comprises a 3×3 convolution layer, a BN layer and a SiLU activation function layer in sequential cascade;
the SFB1 module comprises a first branch and a second branch: the first branch contains a CBR module and a DWB module, the second branch contains a DWB module, the outputs of the two branches are spliced through a concat layer, and the spliced result passes through a Channel Shuffle layer;
the SFB2 module comprises a slice layer, a CBR module and a DWB module in sequential cascade, where the output of the DWB module and the output of the slice layer are spliced through a concat layer and the spliced result passes through a Channel Shuffle layer;
to lighten the detection network, the DWB module in the first branch of the SFB1 module retains only the 3×3 depthwise separable convolution layer and the BN layer, and the DWB module in the SFB2 module likewise retains only the 3×3 depthwise separable convolution layer and the BN layer.
The SFB1 and SFB2 modules are the basic units of the lightweight detection network's backbone and perform feature extraction on the input image. When the stride is 1, the SFB2 module is used; its shortcut branch is not convolved, so width and height are unchanged, and it mainly deepens the network. When the stride is 2, the SFB1 module is used; its shortcut branch is convolved, so width and height change, and it mainly compresses the width and height of the feature layer, i.e. performs downsampling. A structural sketch in code follows.
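The following PyTorch sketch illustrates the DWB, CBR, SFB1 and SFB2 structures as described above; the exact channel widths and the placement of the full (pointwise-retaining) DWB in SFB1's second branch are reasonable readings of the text, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels across groups so information mixes between branches.
    b, c, h, w = x.size()
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).contiguous().view(b, c, h, w))

def dwb(ch_in: int, ch_out: int, stride: int = 1, pointwise: bool = True) -> nn.Sequential:
    # DWB: 3x3 depthwise conv + BN, optionally followed by 1x1 conv + BN + ReLU.
    # The lightweight variant (pointwise=False) keeps only depthwise conv + BN,
    # so there ch_out must equal ch_in.
    layers = [nn.Conv2d(ch_in, ch_in, 3, stride, 1, groups=ch_in, bias=False),
              nn.BatchNorm2d(ch_in)]
    if pointwise:
        layers += [nn.Conv2d(ch_in, ch_out, 1, bias=False),
                   nn.BatchNorm2d(ch_out), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def cbr(ch_in: int, ch_out: int) -> nn.Sequential:
    # CBR: 1x1 conv + BN + ReLU.
    return nn.Sequential(nn.Conv2d(ch_in, ch_out, 1, bias=False),
                         nn.BatchNorm2d(ch_out), nn.ReLU(inplace=True))

class SFB1(nn.Module):
    # Stride-2 unit: both branches see the full input; concat doubles the channels.
    def __init__(self, ch_in: int, ch_out: int):
        super().__init__()
        half = ch_out // 2
        self.branch1 = nn.Sequential(cbr(ch_in, half),
                                     dwb(half, half, stride=2, pointwise=False))
        self.branch2 = dwb(ch_in, half, stride=2, pointwise=True)
    def forward(self, x):
        return channel_shuffle(torch.cat((self.branch1(x), self.branch2(x)), dim=1))

class SFB2(nn.Module):
    # Stride-1 unit: slice the channels, transform one half, concat back.
    def __init__(self, ch: int):
        super().__init__()
        half = ch // 2
        self.branch = nn.Sequential(cbr(half, half), dwb(half, half, pointwise=False))
    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)  # the "slice" layer
        return channel_shuffle(torch.cat((x1, self.branch(x2)), dim=1))
```

For example, SFB1(24, 48) halves the spatial resolution and doubles the channels, while SFB2(48) preserves both, matching the stride-2 and stride-1 roles described above.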
In one embodiment, to improve inference efficiency, the BN coefficients are constrained during training by adding L1 regularization so that channels can be clipped, thereby reducing the parameter count.
For the BN layer: the mean and variance of the elements in a mini-batch are estimated; the neuron input has this statistical mean subtracted and is divided by the standard deviation, then multiplied by a learnable coefficient γ and shifted by a bias β to give the final BN output. The specific procedure during training is:
1. Input: a mini-batch X = {x_1, …, x_m} of m samples
2. Compute the mini-batch mean: μ_B = (1/m) · Σ_{i=1..m} x_i
3. Compute the mini-batch variance: σ_B² = (1/m) · Σ_{i=1..m} (x_i − μ_B)²
4. Normalize the input: x_i′ = (x_i − μ_B) / √(σ_B² + ε)
5. Apply the linear transformation:
y_i = γ_i · x_i′ + β_i
At inference time, the sliding-window statistics E(x) (running mean) and Var(x) (running variance) are used in place of the mini-batch statistics. ε is a small constant that prevents the denominator from being 0. γ and β are learnable parameters: γ is a scale factor and β is the bias of the BN layer; during training they are learned by gradient descent, like the parameters of the convolution kernels. A numerical check of these steps is given after this list.
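A quick numerical check of steps 2-5, assuming PyTorch; the shapes and seed are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4, 16, 16)                # mini-batch of m = 8 samples, 4 channels
bn = nn.BatchNorm2d(4)
bn.train()

mu = x.mean(dim=(0, 2, 3))                              # step 2: per-channel mean
var = x.var(dim=(0, 2, 3), unbiased=False)              # step 3: per-channel variance
x_norm = (x - mu[None, :, None, None]) / torch.sqrt(    # step 4: normalize
    var[None, :, None, None] + bn.eps)
y = (bn.weight[None, :, None, None] * x_norm            # step 5: gamma * x' + beta
     + bn.bias[None, :, None, None])

assert torch.allclose(y, bn(x), atol=1e-5)              # matches the library BN
```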
In the training process, this embodiment constrains the BN-layer coefficients by adding L1 regularization: the γ in each BN layer is directly selected as the scale factor for sparsity clipping; it multiplies the normalized activation of its BN channel and is trained jointly with the network weights under sparse regularization. The L1 regularization constraint is:

L = Σ_{(x,y)} l(f(x, W), y) + λ · Σ_{γ∈Γ} g(γ)

The first term above is the loss function of ordinary training and the second term is the constraint; (x, y) denotes a training input and its target, W the trainable weights, and g(·) the sparsity-inducing penalty on the scale factors, chosen as g(s) = |s| to realize L1 regularization of the parameters. λ is the regularization coefficient, which can be adjusted per data set to sparsify the parameters. After sparse regularization of the parameters, the constraint term pulls γ toward 0 as γ is updated. Denote the sparsified γ value by δ_i and the BN-layer output by Z_i; following the BN training procedure, input and output are related by the linear transformation:

Z_i = δ_i · x_i′ + β_i

When a coefficient δ_i is close to 0, the corresponding activation Z_i is correspondingly small. Therefore, the convolution kernels of the adjacent convolution layers corresponding to such activations are clipped layer by layer, and the resulting model is fine-tuned to recover the accuracy lost by clipping, realizing channel pruning at the BN layer. A sketch of the penalty and of channel selection follows.
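A minimal sketch of the L1 penalty on the BN scale factors and of selecting near-zero channels, assuming PyTorch; λ and the threshold are illustrative values to be tuned per data set:

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    # The sparsity term lambda * sum(|gamma|) over all BN scale factors.
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def channel_keep_masks(model: nn.Module, threshold: float = 1e-2) -> dict:
    # Channels whose sparsified gamma stays near zero become clipping candidates.
    return {name: m.weight.detach().abs() > threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

# During training: total_loss = task_loss + bn_l1_penalty(model)
# After training: clip the conv kernels indicated by the masks, then fine-tune.
```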
After training, the parameters δ and β and the mean and variance over the whole training set are constants, so the BN-layer parameters can be folded into the convolution layer, merging two steps into one and improving the model's forward inference speed. For the convolution layer: let ω be an element of the convolution kernel weight, x an element of the input feature map, b the offset and y_conv the output; the computation over ω and x is:

y_conv = ω · x + b

Substituting the convolution formula into the BN formula and applying the affine transformation:

y_bn = γ · (ω · x + b − μ) / √(σ² + ε) + β

where μ and σ² are the stored mean and variance. Updating the weight ω and the offset b to ω′ and b′:

ω′ = γ · ω / √(σ² + ε)
b′ = γ · (b − μ) / √(σ² + ε) + β

the output then becomes y_bn = ω′ · x + b′, which realizes the fusion of the convolution layer and the BN layer; during inference only the computation of y_bn is performed, improving the model's forward inference speed. A fusion sketch follows.
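A minimal sketch of this convolution and BN fusion, assuming PyTorch; the helper name is illustrative (recent PyTorch versions ship a similar utility in torch.nn.utils.fusion):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold BN into the preceding conv:
    #   w' = gamma * w / std,  b' = gamma * (b - mean) / std + beta
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * (bn.weight / std)[:, None, None, None])
    b = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(bn.weight * (b - bn.running_mean) / std + bn.bias)
    return fused

# Sanity check in eval mode, since fusion uses the running statistics:
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```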
The invention uses an SVD-decomposed convolution structure to replace part of the standard convolutions in the neck network, i.e. DWB modules replace CBL modules, to reduce the high computational cost that YOLOv5 incurs in actual deployment and detection.
In one embodiment, a 3×3 convolution and a 1×1 convolution are used as the final output module of the YOLOv5 network. The detected feature maps at three different pixel scales are each fed into a YOLO Head for decoding: global features are extracted through the 3×3 convolution layer, the 1×1 convolution layer plays the fully connected role, and the prediction bounding box, confidence value and category are finally computed. After the YOLO Head, the loss function of the detection model is minimized by iterative computation; when the training iterations are complete, the model with the highest detection precision is selected as the final detection model. The bounding-box loss is computed with CIoU, the objectness and class losses with cross entropy, and the model parameters are updated by backpropagation. A sketch of one per-scale output unit follows.
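A minimal sketch of one per-scale output unit, assuming PyTorch; the BN and SiLU placed between the two convolutions and the single-class setting are illustrative assumptions:

```python
import torch.nn as nn

def yolo_head(ch_in: int, num_anchors: int = 3, num_classes: int = 1) -> nn.Sequential:
    # The 3x3 conv extracts global features; the 1x1 conv plays the fully
    # connected role, predicting (x, y, w, h, objectness, classes) per anchor.
    return nn.Sequential(
        nn.Conv2d(ch_in, ch_in, 3, padding=1, bias=False),
        nn.BatchNorm2d(ch_in),
        nn.SiLU(inplace=True),
        nn.Conv2d(ch_in, num_anchors * (5 + num_classes), 1))
```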
A low-small-slow target tracking system based on edge computing, as shown in FIG. 1, comprises a camera, a pan-tilt head, a Jetson Xavier NX edge computing platform and a DSP+FPGA embedded computing platform. The pan-tilt head is mounted at the bottom of the camera, the camera is connected to the Jetson Xavier NX edge computing platform through a USB interface, and the Jetson Xavier NX edge computing platform and the DSP+FPGA embedded computing platform exchange data over a network cable. The Jetson Xavier NX edge computing platform carries the lightweight detection network, i.e. the improved YOLOv5-based network proposed in the above low-small-slow target detection method, which has been tested and parameter-tuned on a PC, wherein:
the camera is used for acquiring images and transmitting them to the Jetson Xavier NX edge computing platform for detection;
the Jetson Xavier NX edge computing platform is used for detecting low-small-slow targets in the images acquired by the camera, outputting the anchor-box position and image template of the low-small-slow target, and controlling the pan-tilt head to rotate and track the low-small-slow target;
the DSP+FPGA embedded computing platform comprises a DSP module and an FPGA module:
a tracking module is deployed in the DSP module, responsible for running the tracking algorithm and for communication with the Jetson Xavier NX;
the FPGA module is used to deploy the pan-tilt head and to realize the interconnection of the devices.
In one embodiment, the low-small-slow target tracking and detection system uses an MV-CA013-21UM camera; its target tracking flow, as shown in FIG. 2, comprises:
S1, acquiring an image with the MV-CA013-21UM camera and passing it to the detection network on the Jetson Xavier NX edge computing platform; judging whether automatic detection is enabled, and if so proceeding to step S3, otherwise to step S2;
S2, the coordinates of the low-small-slow target in the image being known, entering the target coordinates manually through an instruction, then executing step S5;
S3, performing target detection on the image and judging whether a low-small-slow target is detected; if so, outputting the anchor-box position and image template of the low-small-slow target and executing step S5, otherwise executing step S4;
S4, controlling the pan-tilt head, through the Jetson Xavier NX edge computing platform, to rotate to the next observation angle, and returning to step S3;
S5, transmitting the data (the coordinates of the low-small-slow target) to the DSP+FPGA embedded computing platform, where the FPGA module decodes the data and passes it to the DSP module for storage;
S6, the tracking module in the DSP module taking the center point of the low-small-slow target as the tracking point and, through the Jetson Xavier NX, controlling and adjusting the turntable so that the center point of the camera coincides with the center point of the low-small-slow target, thereby tracking the low-small-slow target;
specifically, to improve the robustness and tracking capability of the whole system, throughout the tracking process the tracking module continually verifies the validity of the tracking point against the image template supplied by the target detection network; if the tracking point is valid, the tracking point and the image template are updated; if tracking fails, the search range is narrowed with the CamShift algorithm, and when the target is lost it can be found again by controlling the turntable;
and S7, the DSP+FPGA embedded computing platform packaging the output result and the data and sending them to a display, realizing real-time detection of the target. A hypothetical sketch of the host-side loop follows.
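The following is a hypothetical host-side sketch of steps S1-S6; the device objects and their methods (grab_frame, detect, rotate_to_next_angle, send, center_on, read_manual_coordinates) are illustrative placeholders, not the patent's interfaces:

```python
def tracking_loop(camera, detector, dsp_link, gimbal, auto_detect=True):
    # One pass of the S1-S6 flow: detect (or read) a target, hand it to the
    # DSP+FPGA platform, and keep the camera centered on it.
    while True:
        frame = camera.grab_frame()                  # S1: acquire an image
        if auto_detect:
            target = detector.detect(frame)          # S3: run the lightweight network
            if target is None:
                gimbal.rotate_to_next_angle()        # S4: scan the next view
                continue
        else:
            target = read_manual_coordinates()       # S2: operator-entered box
        dsp_link.send(target.box, target.template)   # S5: hand off to DSP+FPGA
        gimbal.center_on(target.center)              # S6: keep target at image center
```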
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "configured," "connected," "secured," "rotated," and the like are to be construed broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary; internal to two elements, or an interaction between two elements. Unless explicitly defined otherwise, the specific meaning of the above terms in this application will be understood by those of ordinary skill in the art according to the specific circumstances.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (3)

1. A method for detecting a low-small-slow target based on edge computing, comprising:
S1, collecting image sets of different targets under different exposure levels, and processing the image sets to obtain a low-small-slow target data set;
labeling the low-small-slow targets in the image sets with a labeling tool, and applying data enhancement and blurring to the low-small-slow targets to obtain the low-small-slow data set;
S2, constructing a lightweight detection network based on YOLOv5, training it on a public data set, and testing and tuning the trained lightweight detection network on the low-small-slow target data set;
wherein the lightweight detection network is optimized from the YOLOv5 network structure and comprises 1 CBL module, 3 SFB1 modules, 3 SFB2 modules, 1 CBS module, 2 upsampling modules, 2 concat modules, 6 DWB modules, 3 conv modules and 2 add modules;
the CBL module comprises a 3×3 convolution layer, a BN layer and a LeakyReLU activation function layer in sequential cascade;
the CBR module comprises a 1×1 convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the DWB module comprises a 3×3 depthwise separable convolution layer, a BN layer, a 1×1 convolution layer, a BN layer and a ReLU activation function layer in sequential cascade;
the CBS module comprises a 3×3 convolution layer, a BN layer and a SiLU activation function layer in sequential cascade;
the SFB1 module comprises a first branch and a second branch, the first branch containing a CBR module and a DWB module, the second branch containing a DWB module, the outputs of the two branches being spliced through a concat layer and the spliced result passing through a Channel Shuffle layer;
the SFB2 module comprises a slice layer, a CBR module and a DWB module in sequential cascade, the output of the DWB module and the output of the slice layer being spliced through a concat layer and the spliced result passing through a Channel Shuffle layer;
to lighten the detection network, the DWB module in the first branch of the SFB1 module retains only the 3×3 depthwise separable convolution layer and the BN layer, and the DWB module in the SFB2 module retains only the 3×3 depthwise separable convolution layer and the BN layer;
the head network of the YOLOv5 network is improved: a convolution unit comprising a 3×3 convolution and a 1×1 convolution is arranged in front of each detection head;
S3, using the lightweight detection network, after training and testing are completed, to detect low-small-slow targets in real time.
2. The edge-computing-based low-small-slow target detection method according to claim 1, wherein in the training process L1 regularization is added to constrain the coefficients of the BN layers, the learnable parameter γ in each BN layer being selected as the scale factor for sparsity clipping, the L1 regularization constraint being expressed as:

L = Σ_{(x,y)} l(f(x, W), y) + λ · Σ_{γ∈Γ} g(γ)

wherein (x, y) represents a training input and its target, W represents the trainable weights, g(·) is the sparsity-inducing penalty on the scale factors, and λ is the regularization coefficient;
after sparse regularization of the γ parameters, the BN layer applies the transformation:

Z_i = δ_i · x_i′ + β_i

wherein Z_i is the output of the BN layer, δ_i is the sparsified γ value, x_i′ is the normalized input, and β_i is the bias of the BN layer.
3. A low-small-slow target tracking system based on edge computing, realized by the method according to claim 1, comprising a camera, a pan-tilt head, a Jetson Xavier NX edge computing platform and a DSP+FPGA embedded computing platform, wherein the pan-tilt head is mounted at the bottom of the camera, the camera is connected to the Jetson Xavier NX edge computing platform through a USB interface, the Jetson Xavier NX edge computing platform and the DSP+FPGA embedded computing platform transmit data over a network cable, and a lightweight detection network, tested and parameter-tuned on a PC, is deployed in the Jetson Xavier NX edge computing platform, wherein:
the camera is used for acquiring images and transmitting them to the Jetson Xavier NX edge computing platform for detection;
the Jetson Xavier NX edge computing platform is used for detecting low-small-slow targets in the images acquired by the camera, outputting the anchor-box position and image template of the low-small-slow target, and controlling the pan-tilt head to rotate and track the low-small-slow target;
the DSP+FPGA embedded computing platform comprises a DSP module and an FPGA module:
a tracking module is deployed in the DSP module, responsible for running the tracking algorithm and for communication with the Jetson Xavier NX;
the FPGA module is used to deploy the pan-tilt head and to realize the interconnection of the devices.
CN202210920753.9A 2022-08-02 2022-08-02 Edge calculation-based low-small slow target detection method and tracking system Active CN115272412B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210920753.9A | 2022-08-02 | 2022-08-02 | Edge calculation-based low-small slow target detection method and tracking system


Publications (2)

Publication Number | Publication Date
CN115272412A (en) | 2022-11-01
CN115272412B (en) | 2023-09-26

Family

ID=83747126

Family Applications (1)

Application Number | Title | Status
CN202210920753.9A (CN115272412B) | Edge calculation-based low-small slow target detection method and tracking system | Active

Country Status (1)

Country Link
CN (1) CN115272412B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576488B (en) * 2024-01-17 2024-04-05 海豚乐智科技(成都)有限责任公司 Infrared dim target detection method based on target image reconstruction

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580512A (en) * 2020-12-21 2021-03-30 南京邮电大学 Lightweight unmanned aerial vehicle target detection method based on channel cutting
CN113743230A (en) * 2021-08-09 2021-12-03 东北大学 Airplane detection, tracking and identification system based on edge calculation
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles
CN114627269A (en) * 2022-03-10 2022-06-14 东华大学 Virtual reality security protection monitoring platform based on degree of depth learning target detection
CN114708231A (en) * 2022-04-11 2022-07-05 常州大学 Sugarcane aphid target detection method based on light-weight YOLO v5




Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant