CN116385810B - YOLOv7-based small target detection method and system - Google Patents


Info

Publication number
CN116385810B
CN116385810B CN202310656474.0A
Authority
CN
China
Prior art keywords
module
small target
cbs
elan
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310656474.0A
Other languages
Chinese (zh)
Other versions
CN116385810A (en)
Inventor
杨文姬
马欣欣
安航
胡文超
杨振姬
易文龙
杨红云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Agricultural University
Original Assignee
Jiangxi Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Agricultural University filed Critical Jiangxi Agricultural University
Priority to CN202310656474.0A priority Critical patent/CN116385810B/en
Publication of CN116385810A publication Critical patent/CN116385810A/en
Application granted granted Critical
Publication of CN116385810B publication Critical patent/CN116385810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V10/764 Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition using neural networks
    • Y02T10/40 Engine management systems


Abstract

The invention belongs to the technical field of deep-learning target detection and specifically relates to a YOLOv7-based small target detection method and system. The method performs small target detection through a YOLOv7-based small target detection model in which a small target enhancement module is added to the backbone network. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused. The small target enhancement module extracts detail features and enriches the semantic information of small targets, improving small target detection, and the CIoU loss function is replaced with a normalized Gaussian Wasserstein distance loss.

Description

YOLOv7-based small target detection method and system
Technical Field
The invention belongs to the technical field of deep-learning target detection and specifically relates to a YOLOv7-based small target detection method and system.
Background
Document one (Qi Linglong, Gao Jian. Small object detection based on improved YOLOv7 [J]. Computer Engineering, 2023, 49(01): 41-48.) proposes an improved YOLOv7 object detection model: the MPConv module is improved to reduce the feature loss caused by the network's feature handling; an ACmix attention module increases the network's sensitivity to small-scale targets; finally, the loss function is replaced with SIoU to improve the robustness of the network. The improved algorithm effectively improves small target detection precision.
Document two (Li Jiaxin, Hou Jin, Cheng Boying, et al. Remote sensing small target detection network based on improved YOLOv5 [J/OL]. Computer Engineering: 1-11 [2023-03-21]) proposes a remote sensing small target detection algorithm, YOLOv5-RS. First, a parallel mixed attention module is constructed to reduce the interference of complex backgrounds and negative samples in the image, and the attention module's generation of the weight feature map is optimized by replacing the fully connected layer with a convolution and removing the pooling layer. Further, to acquire and transmit richer and more discriminative small target features, the downsampling factor is adjusted, shallow features rich in small target information are added during model training, and a feature extraction module combining convolution with multi-head self-attention is designed; by jointly characterizing local and global information, it breaks the limitation of ordinary convolution and obtains a larger receptive field. Finally, an EIoU loss function optimizes the regression between prediction boxes and detection boxes, enhancing small target localization. Experiments on remote sensing small target datasets demonstrate the superiority of the proposed algorithm.
Existing target detection networks perform poorly on small targets: the backbone network's small target detection capability is insufficient, so targets are falsely detected or missed. The reasons can be summarized as follows:
1. Object size: small objects occupy a relatively small area of the image, possibly only a tiny fraction of the whole, which makes accurately extracting their features difficult and degrades detection.
2. Object density: small objects are often dense, with overlap or occlusion between them, which further complicates detection.
3. Background noise: small objects often sit in complex backgrounds, which increases the difficulty of detection. Moreover, because the object area is small, noise and background from surrounding pixels interfere with feature extraction and classification, degrading detection.
4. Model selection: some object detection models (such as Faster R-CNN) handle small objects poorly, as they may focus only on larger objects, and small-object features are often difficult to capture.
5. Insufficient data: because small objects are relatively few, most datasets contain only a small amount of small-object data, making it hard for the model to fully learn small-object features, which affects detection.
Disclosure of Invention
To address the insufficient small target detection capability of existing target detection networks, the invention provides a YOLOv7-based small target detection method and system that improve small target detection precision.
The technical scheme of the invention is as follows. In the YOLOv7-based small target detection method, image data are processed and then input into a YOLOv7-based small target detection model for small target detection; the model comprises a backbone network, a neck network, and a prediction network. The image is sent to the backbone network for feature extraction; the neck network then fuses the extracted features, and the fused features are sent to the large-scale, middle-scale, small-scale, and micro-scale detection layers of the prediction network for detection, obtaining the detection result. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where the space-to-depth operation follows the convolution of the first CBS module and a second CBS module then adjusts the channel number; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused.
The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module. The input features pass through two branches: one branch consists, in order, of a C2f module and the Shuffle Attention module; the other consists, in order, of a CBS module formed by a 1x1 convolution and a C2f module. Finally, the features extracted by the two branches are fused.
Further preferably, the space-to-depth operation means: first, an input feature map of size S × S × C1 (S denotes the length and width, C1 the number of channels) is divided into four sub-feature maps of size S/2 × S/2 × C1; these are then concatenated along the channel dimension to obtain a feature map of size S/2 × S/2 × 4C1; finally, this feature is passed through a 1×1 convolution to obtain a feature map of size S/2 × S/2 × C2, where C2 denotes the number of output channels of the 1×1 convolution.
Further preferably, the neck network consists of an SPPCSPC module, CBS modules, up-sampling, ELAN-W modules, and MP modules; the MP module consists of a max-pooling layer and a CBS module and is used to downsample the feature map.
Further preferably, the loss function of the YOLOv7-based small target detection model consists of coordinate loss, target confidence loss, and classification loss; the coordinate loss uses the normalized Gaussian Wasserstein distance loss.
Further preferably, during detection the YOLOv7-based small target detection model models bounding boxes as two-dimensional Gaussian distributions, and the second-order Wasserstein distance between bounding boxes is:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\ cy_A,\ \tfrac{w_A}{2},\ \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\ cy_B,\ \tfrac{w_B}{2},\ \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $W_2^2(\mathcal{N}_A,\mathcal{N}_B)$ denotes the second-order Wasserstein distance between bounding box A and bounding box B; $cx_A$, $cy_A$, $w_A$, and $h_A$ are the center abscissa, center ordinate, width, and height of bounding box A; $cx_B$, $cy_B$, $w_B$, and $h_B$ are those of bounding box B; and $T$ denotes the transpose.
The second-order Wasserstein distance is normalized in exponential form to obtain the so-called normalized Gaussian Wasserstein distance:

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and C is a normalization constant.
The invention provides a YOLOv7-based small target detection system comprising an image data acquisition module, an image data preprocessing module, and a small target detection module. The image data acquisition module acquires the target image to be detected; the image data preprocessing module then preprocesses the target image, the preprocessing including image enhancement. The small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion, and a prediction network with large-scale, middle-scale, small-scale, and micro-scale detection layers. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where, after the convolution of the first CBS module, the space-to-depth operation is applied and a second CBS module then adjusts the channel number; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused. The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module; the input features pass through two branches, one consisting, in order, of a C2f module and the Shuffle Attention module, the other consisting, in order, of a CBS module formed by a 1x1 convolution and a C2f module; finally, the features extracted by the two branches are fused.
The invention also provides a non-volatile computer storage medium storing computer-executable instructions that, when executed, perform the YOLOv7-based small target detection method described above.
The present invention also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the YOLOv 7-based small object detection method described above.
The present invention provides an electronic device including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions for execution by the at least one processor to cause the at least one processor to perform a YOLOv 7-based small object detection method.
The invention has the beneficial effects that:
1. since the original YOLOv7 network has only three detection scales and is not aimed at a small target, the YOLOv7 network is generally not ideal in detecting a small target. Based on the above factors, we add a detection layer special for detecting small objects-a micro-scale detection layer based on the original YOLOv7 network, and expand the detection layer into four.
2. The backbone network performs feature extraction, but small target features are difficult to extract and capture, so the final detection effect is not ideal. A small target enhancement module is therefore proposed, designed, and placed in the backbone network to extract detail features and enrich the semantic information of small targets, improving small target detection.
3. Loss function improvement: the loss function of the original YOLOv7 network is CIoU; however, CIoU is very sensitive to the position deviation of tiny objects, which degrades the detection performance of the network model. The invention therefore replaces the CIoU loss function with a normalized Gaussian Wasserstein distance loss suited to small targets.
The method can be applied in many fields, for example offshore search-and-rescue tasks, for detecting people, ships, and other objects in the water. Objects occupy a small area in the pictures sent back during offshore rescue, so an ordinary target detection network has difficulty identifying targets accurately, which can lead to casualties and other losses. The invention improves the precision of small target detection.
Drawings
FIG. 1 is a diagram of a YOLOv7 object detection model architecture.
Fig. 2 is a schematic diagram of a backbone network structure according to the present invention.
Fig. 3 is a schematic diagram of a small target enhancement module structure.
FIG. 4 is a schematic diagram of an enhanced feature extraction capability module.
Fig. 5 is a schematic diagram of a space-to-depth operation.
Detailed Description
The invention is further elucidated in detail below with reference to examples and figures.
The YOLOv7 target detection algorithm consists of four parts: the input, the backbone network (Backbone), the neck network (Neck), and the prediction network (Prediction). The backbone network mainly performs feature extraction, the neck network performs feature fusion, and feature maps of different sizes obtained from the neck network are input into the prediction network to predict targets at three different scales.
This embodiment improves the YOLOv7 target detection algorithm: a scale dedicated to small targets is added to the prediction network, a small target enhancement module is added to the backbone network to improve small target feature extraction, and the original CIoU loss function is replaced with the normalized Gaussian Wasserstein distance loss.
Referring to fig. 1, in the YOLOv7-based small target detection method of this embodiment, image data are processed and then small target detection is performed by a YOLOv7-based small target detection model comprising a backbone network, a neck network, and a prediction network. The image is sent to the backbone network for feature extraction; the neck network then fuses the extracted features, and the fused features are sent to the large-scale, middle-scale, small-scale, and micro-scale detection layers of the prediction network for detection, obtaining the detection result. The original small-scale detection layer in YOLOv7 is poorly suited to these small targets. We therefore add a micro-scale detection layer that operates on a feature map downsampled by a factor of 4 from the input image. The micro-scale detection layer generates feature maps by extracting lower-level spatial features and fusing them with deep semantic features. The new micro-scale detection layer forms a wider and more detailed detection network structure suitable for detecting smaller targets.
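As an illustrative sketch (not taken from the patent text), the spatial grid sizes of the four detection layers follow from their downsampling strides: the added micro-scale layer uses stride 4, and the original three YOLOv7 heads use strides 8, 16, and 32. The 640×640 input size below is an assumption for illustration only.

```python
# Hypothetical helper: grid size of each detection layer for a square input.
# Strides (4, 8, 16, 32) correspond to micro-, small-, middle-, and
# large-scale detection layers as described above.
def detection_grid_sizes(input_size, strides=(4, 8, 16, 32)):
    """Return the spatial size of each detection layer's feature map."""
    return [input_size // s for s in strides]

print(detection_grid_sizes(640))  # [160, 80, 40, 20]
```

The micro-scale layer's 160×160 grid (versus 80×80 for the finest original head) is what gives the network its denser coverage of tiny objects.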
As shown in fig. 2, the backbone network of the present embodiment adds a small target enhancement module at the end of the backbone network of YOLOv7 in the prior art, so as to enhance the extraction capability of the backbone network to small targets. The backbone network of this embodiment includes, in order, a 1 st to 4 th CBS module, a 1 st ELAN module, a 1 st MP module, a 2 nd ELAN module, a 2 nd MP module, a 3 rd ELAN module, a 3 rd MP module, a 4 th ELAN module, and a small target enhancement module. The main network plays a role in feature extraction, and features obtained through the main network are input into the neck network.
As shown in fig. 3, the small target enhancement module includes two branches. The first branch consists of two CBS modules and a Space-to-Depth operation: after the convolution of the first CBS module, the Space-to-Depth operation is applied, and a second CBS module then adjusts the channel number. The second branch is the enhanced feature extraction module designed in this embodiment, which improves the model's ability to extract small target features. Finally, the features extracted by the two branches are fused, improving small target feature extraction.
The enhanced feature extraction module is shown in fig. 4 and consists of two C2f modules, a CBS module, and a Shuffle Attention module. The input features pass through two branches. One branch consists, in order, of a C2f module and the Shuffle Attention module; the design intent is that the high-performing C2f module first extracts more detail features of small target objects, and the following Shuffle Attention module then attends to small target features across both space and channels, so the features output by this branch contain richer detail information. The other branch consists, in order, of a CBS module formed by a 1x1 convolution and a C2f module. Finally, the features extracted by the two branches are fused. This module enhances the model's extraction of small target features and suppresses the interference of complex backgrounds and negative samples in the image.
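Shuffle Attention modules commonly interleave their channel groups with a "channel shuffle" so information mixes across groups. The patent does not spell out this step, so the following is a hedged pure-Python sketch of the shuffle alone (attention branches omitted): channels are viewed as a (groups, channels_per_group) grid and the two axes are transposed.

```python
# Hypothetical sketch of the channel-shuffle step used in shuffle-style
# attention: view a flat channel list as (groups, per_group), transpose,
# and flatten again, interleaving the groups.
def channel_shuffle(channels, groups):
    """Reorder channels so that consecutive outputs come from different groups."""
    n = len(channels)
    assert n % groups == 0, "channel count must divide evenly into groups"
    per = n // groups
    # Element i of group g lands at position i*groups + g.
    return [channels[g * per + i] for i in range(per) for g in range(groups)]

print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```

With two groups, the first output channel comes from group 0, the second from group 1, and so on, so later layers see features from every group.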
The space-to-depth operation is shown in fig. 5. First, an input feature map of size S × S × C1 is divided into four sub-feature maps of size S/2 × S/2 × C1, where S denotes the length and width and C1 the number of channels; these are then concatenated along the channel dimension to obtain a feature map of size S/2 × S/2 × 4C1. Finally, this feature is passed through a 1×1 convolution to obtain a feature map of size S/2 × S/2 × C2, where C2 denotes the number of output channels of the 1×1 convolution. The space-to-depth operation replaces strided convolution, eliminating the loss of fine-grained information caused by using a convolution stride.
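The rearrangement step can be sketched in plain Python, with nested lists standing in for a C × S × S tensor (a real model would use a tensor library, and the trailing 1×1 convolution is omitted here):

```python
# Minimal sketch of space-to-depth: each of the four spatial offsets
# (0,0), (0,1), (1,0), (1,1) contributes one S/2 x S/2 slice per input
# channel, so the channel count quadruples while each spatial side halves.
def space_to_depth(feat):
    """Rearrange a C x S x S feature map into a 4C x S/2 x S/2 one."""
    s = len(feat[0])
    assert s % 2 == 0, "spatial size must be even"
    out = []
    for dy in (0, 1):
        for dx in (0, 1):
            for ch in feat:
                out.append([row[dx::2] for row in ch[dy::2]])
    return out

# 3 channels of a 4x4 map -> 12 channels of a 2x2 map.
x = [[[float(i * 4 + j) for j in range(4)] for i in range(4)] for _ in range(3)]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # 12 2 2
```

Because every input pixel survives into some output channel, no fine-grained information is discarded, unlike a stride-2 convolution.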
The input end uses the mosaic augmentation technique: four randomly selected input images are cropped and combined into one picture, with the length and width of the crop position varied randomly; the purpose is to expand the data volume and avoid model overfitting.
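A hedged sketch of that idea, with plain 2D lists of pixel values standing in for images and all sizes chosen for illustration: four images are pasted into one canvas around a randomly chosen split point, so the crop widths and heights differ on every call.

```python
import random

# Hypothetical mosaic sketch: paste four size x size images into the four
# quadrants defined by a random split point (cx, cy).
def mosaic(imgs, size, rng=random):
    """Combine four size x size images into one size x size mosaic."""
    assert len(imgs) == 4
    cx = rng.randint(size // 4, 3 * size // 4)  # split kept away from borders
    cy = rng.randint(size // 4, 3 * size // 4)
    canvas = [[0] * size for _ in range(size)]
    regions = [  # (image index, row range, column range)
        (0, range(0, cy), range(0, cx)),        # top-left
        (1, range(0, cy), range(cx, size)),     # top-right
        (2, range(cy, size), range(0, cx)),     # bottom-left
        (3, range(cy, size), range(cx, size)),  # bottom-right
    ]
    for idx, ys, xs in regions:
        for y in ys:
            for x in xs:
                canvas[y][x] = imgs[idx][y][x]
    return canvas

imgs = [[[k] * 8 for _ in range(8)] for k in range(4)]
m = mosaic(imgs, 8)
print(sorted({v for row in m for v in row}))  # [0, 1, 2, 3]
```

A real pipeline would also remap each image's bounding-box labels into the mosaic's coordinates; that bookkeeping is omitted here.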
The neck network is a feature fusion network that fuses high-level and low-level features using an FPN+PAN combination. It consists of an SPPCSPC module, CBS modules, up-sampling, ELAN-W modules, and MP modules; the SPPCSPC module is used to increase the receptive field of the network.
A CBS module consists of a convolution (Conv), a BN layer, and a SiLU activation function. In the YOLOv7 network, CBS modules serve different purposes depending on the convolution kernel and stride: a 1×1 convolution changes the channel number; a 3×3 convolution with stride = 1 performs feature extraction on the input image; and a 3×3 convolution with stride = 2 performs downsampling.
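The activation in CBS is the standard SiLU, silu(x) = x · sigmoid(x); a plain-Python sketch of the function alone (the convolution and batch-norm parts are omitted):

```python
import math

def silu(x):
    """SiLU (a.k.a. swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

print(silu(0.0))  # 0.0
```

Unlike ReLU, SiLU is smooth and lets small negative values pass through attenuated rather than being clipped to zero.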
The ELAN module is an efficient network architecture that, by controlling the shortest and longest gradient paths, lets the network learn more features and be more robust. ELAN has two branches. The first branch passes through a 1x1 convolution to change the channel number. The second branch is more complex: it first passes through a 1x1 convolution module to change the channel number, then performs feature extraction through four 3x3 convolution modules. Finally, the four features are concatenated to obtain the final feature extraction result.
ELAN-W module: slightly different from the ELAN module in the number of outputs selected from the second branch. The ELAN module selects three outputs for the final concatenation, whereas the ELAN-W module selects five.
The MP module is composed of a maximum pooling layer and a CBS module and is used for downsampling the feature map.
The loss function of the YOLOv7 network consists of coordinate loss, target confidence loss, and classification loss, where the coordinate loss uses the CIoU loss function. CIoU is very sensitive to the position deviation of tiny objects, which degrades the detection performance of the network model. To improve small target detection, the YOLOv7-based small target detection model of this embodiment replaces the CIoU loss function with the normalized Gaussian Wasserstein distance loss. Specifically, the bounding box is first modeled as a two-dimensional Gaussian distribution; a new metric, the normalized Gaussian Wasserstein distance (NWD), is then proposed, with the similarity between bounding boxes computed from their corresponding Gaussian distributions.
To better describe the weights of different pixels in a bounding box, the bounding box is modeled as a two-dimensional Gaussian distribution in which the center pixel has the highest weight and pixel importance decreases from the center to the boundary. For a horizontal bounding box $R = (cx, cy, w, h)$, where $(cx, cy)$ are the center coordinates and $w$ and $h$ the width and height, the equation of its inscribed ellipse is:

$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

where $x$ and $y$ are the abscissa and ordinate of the two-dimensional Gaussian, $(\mu_x, \mu_y)$ is the center of the ellipse, and $\sigma_x$ and $\sigma_y$ are the semi-axis lengths along the horizontal and vertical axes. Thus $\mu_x = cx$, $\mu_y = cy$, $\sigma_x = w/2$, $\sigma_y = h/2$.
From this, the probability density function of the two-dimensional Gaussian distribution follows:

$$f(X \mid \mu, \Sigma)=\frac{\exp\!\left(-\tfrac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right)}{2\pi|\Sigma|^{1/2}}$$

where $X$ denotes the coordinates $(x, y)$, $\mu$ the mean vector, and $\Sigma$ the covariance matrix of the two-dimensional Gaussian distribution, and $T$ denotes the transpose. When

$$(X-\mu)^{T}\Sigma^{-1}(X-\mu)=1,$$

the aforementioned ellipse is a density contour of the two-dimensional Gaussian distribution. Thus a horizontal bounding box $R=(cx, cy, w, h)$ can be modeled as a two-dimensional Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ with:

$$\mu=\begin{bmatrix} cx \\ cy \end{bmatrix},\qquad \Sigma=\begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix}$$
furthermore, the bounding box a and the bounding box B may be converted into a distribution distance between two gaussian distributions according to the similarity between the bounding boxes. The distribution distance was calculated using the wasperstein distance. For two-dimensional gaussian distributions and />The second order wasperstein distance between them is:
in the formula ,representing a two-dimensional gaussian distribution a,/>Representing a two-dimensional gaussian distribution B, N representing a two-dimensional gaussian distribution function,mean vector representing two-dimensional gaussian distribution a, +.>Mean vector representing two-dimensional gaussian distribution B, +.>Covariance matrix representing two-dimensional gaussian distribution a,/->Representing the covariance matrix of the two-dimensional gaussian distribution B. />Representing the second order Wasserstein distance between bounding box A and bounding box B, +.>Indicating the Frobenius norm.
Furthermore, for bounding box $A=(cx_A, cy_A, w_A, h_A)$ and bounding box $B=(cx_B, cy_B, w_B, h_B)$, the second-order Wasserstein distance simplifies to:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\ cy_A,\ \tfrac{w_A}{2},\ \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\ cy_B,\ \tfrac{w_B}{2},\ \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $cx_A$, $cy_A$, $w_A$, and $h_A$ are the center abscissa, center ordinate, width, and height of bounding box A, and likewise for bounding box B. $W_2^2$ is a distance metric and cannot be used directly as a similarity metric (i.e., a value between 0 and 1, as IoU is). We therefore normalize it using its exponential form and obtain the normalized Gaussian Wasserstein distance (NWD):

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and C is a normalization constant, which may be set to the average second-order Wasserstein distance or to a given value.
Compared with IoU, NWD has the advantages of scale invariance when detecting small objects, smoothness with respect to position deviation, and the ability to measure similarity between bounding boxes that do not overlap or that contain one another. It is therefore integrated into the architecture of the small target detection network model constructed here.
The YOLOv7-based small target detection system comprises an image data acquisition module, an image data preprocessing module, and a small target detection module. The image data acquisition module acquires the target image to be detected; the image data preprocessing module then preprocesses it, the preprocessing including image enhancement. The small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion, and a prediction network with large-scale, middle-scale, small-scale, and micro-scale detection layers. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where, after the convolution of the first CBS module, the space-to-depth operation is applied and a second CBS module then adjusts the channel number; the second branch is the enhanced feature extraction module; the features extracted by the two branches are fused. The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module; the input features pass through two branches, one consisting, in order, of a C2f module and the Shuffle Attention module, the other consisting, in order, of a CBS module formed by a 1x1 convolution and a C2f module; finally, the features extracted by the two branches are fused.
In another embodiment, a non-volatile computer storage medium is provided, the computer storage medium storing computer-executable instructions which, when executed, perform the YOLOv7-based small target detection method of any of the embodiments described above.
The present embodiment also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the YOLOv7-based small target detection method of the above embodiments.
The present embodiment provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the YOLOv7-based small target detection method.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not exhaustive and do not limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. A YOLOv7-based small target detection method, characterized in that, after image data are processed, they are input into a YOLOv7-based small target detection model for small target detection, the model comprising a backbone network, a neck network and a prediction network; the image is fed into the backbone network for feature extraction, the neck network then fuses the extracted features, and the fused features are sent to a large-scale detection layer, a medium-scale detection layer, a small-scale detection layer and a micro-scale detection layer of the prediction network for detection to obtain a detection result; the backbone network sequentially comprises 1st to 4th CBS modules, a 1st ELAN module, a 1st MP module, a 2nd ELAN module, a 2nd MP module, a 3rd ELAN module, a 3rd MP module, a 4th ELAN module and a small target enhancement module; the small target enhancement module comprises two branches, wherein the first branch consists of two CBS modules and a space-to-depth operation: the input is convolved by one CBS module, passed through the space-to-depth operation, and then convolved by the other CBS module to adjust the channel number; the second branch is an enhanced feature extraction capability module, and the features extracted by the two branches are fused;
the enhanced feature extraction capability module consists of two C2f modules, a CBS module and a Shuffle Attention module; the input features pass through two branches, one consisting sequentially of a C2f module and the Shuffle Attention module, the other consisting sequentially of a CBS module formed by a 1×1 convolution and a C2f module; the features extracted by the two branches are finally fused;
the space-to-depth operation is specifically: first, an input feature of size S×S×C1 is divided into four sub-features of size S/2×S/2×C1, where S denotes the length and width and C1 the number of channels; these sub-features are then concatenated along the channel dimension to obtain a feature of size S/2×S/2×4C1; finally, this feature is processed by a 1×1 convolution to obtain a feature of size S/2×S/2×C2, where C2 denotes the number of channels of the 1×1 convolution.
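The space-to-depth step of the claim can be sketched in a few lines of numpy. This is a minimal sketch of the operation as described (split into four sub-features, concatenate along channels, then a 1×1 convolution modeled as a per-pixel linear map); the surrounding CBS convolutions of the patent's branch are omitted.

```python
import numpy as np

def space_to_depth(x):
    """Split an SxSxC1 feature into four S/2 x S/2 x C1 sub-features
    (one per position in each 2x2 block) and concatenate them along the
    channel axis, giving S/2 x S/2 x 4*C1. Assumes S is even."""
    tl = x[0::2, 0::2, :]  # top-left pixel of each 2x2 block
    tr = x[0::2, 1::2, :]  # top-right
    bl = x[1::2, 0::2, :]  # bottom-left
    br = x[1::2, 1::2, :]  # bottom-right
    return np.concatenate([tl, tr, bl, br], axis=-1)

def conv1x1(x, weights):
    """A 1x1 convolution is a per-pixel linear map over channels:
    weights has shape (C_in, C2), output is S/2 x S/2 x C2."""
    return x @ weights

# Demo: 8x8x3 input -> 4x4x12 after space-to-depth -> 4x4x16 after 1x1 conv.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 3))
spd = space_to_depth(feat)                          # (4, 4, 12)
out = conv1x1(spd, rng.standard_normal((12, 16)))   # (4, 4, 16)
print(spd.shape, out.shape)
```

Note that space-to-depth only rearranges pixels, so unlike strided convolution or pooling it discards no fine-grained information, which is why it helps with small targets.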
2. The YOLOv7-based small target detection method of claim 1, wherein the neck network is composed of SPPCSPC modules, CBS modules, upsampling, ELAN-W modules and MP modules; the MP module consists of a max-pooling layer and a CBS module and is used for downsampling the feature map; the ELAN-W module differs from the ELAN module only in the number of branch outputs selected for the final fusion: the ELAN module selects three outputs, while the ELAN-W module selects five.
3. The YOLOv7-based small target detection method of claim 1, wherein the loss function of the YOLOv7-based small target detection model consists of a coordinate loss, a target confidence loss and a classification loss; the coordinate loss uses the normalized Gaussian Wasserstein distance loss.
4. The YOLOv7-based small target detection method of claim 1, wherein, in the detection process of the YOLOv7-based small target detection model, bounding boxes are modeled as two-dimensional Gaussian distributions, and the second-order Wasserstein distance between bounding boxes is:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\; cy_A,\; \tfrac{w_A}{2},\; \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\; cy_B,\; \tfrac{w_B}{2},\; \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $W_2^2(\mathcal{N}_A,\mathcal{N}_B)$ denotes the second-order Wasserstein distance between bounding box A and bounding box B; $cx_A$ and $cy_A$ are the center abscissa and ordinate of bounding box A, and $w_A$ and $h_A$ are its width and height; $cx_B$, $cy_B$, $w_B$ and $h_B$ are the corresponding quantities for bounding box B; $T$ denotes the transpose;

the second-order Wasserstein distance is normalized to obtain the so-called normalized Gaussian Wasserstein distance:

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and $C$ is a normalization constant.
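The two quantities of claim 4 can be computed directly from `(cx, cy, w, h)` box parameters. The sketch below also contrasts NWD with plain IoU on two disjoint small boxes, illustrating the advantage stated in the description. The value `c=12.8` is an assumed placeholder; the patent only says C is a normalization constant, without fixing its value.

```python
import numpy as np

def wasserstein2(box_a, box_b):
    """Second-order Wasserstein distance between two boxes modeled as
    2-D Gaussians. Boxes are (cx, cy, w, h); per claim 4 this reduces
    to the L2 distance between the vectors (cx, cy, w/2, h/2)."""
    va = np.array([box_a[0], box_a[1], box_a[2] / 2.0, box_a[3] / 2.0])
    vb = np.array([box_b[0], box_b[1], box_b[2] / 2.0, box_b[3] / 2.0])
    return float(np.linalg.norm(va - vb))

def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance, in (0, 1].
    c is a dataset-dependent normalization constant (assumed value)."""
    return float(np.exp(-wasserstein2(box_a, box_b) / c))

def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

# Two small 4x4 boxes 6 px apart: IoU is exactly 0 (no gradient signal),
# while NWD still yields a graded, nonzero similarity.
a = (10.0, 10.0, 4.0, 4.0)
b = (16.0, 10.0, 4.0, 4.0)
print(iou(a, b), nwd(a, b))
```

Because NWD decays smoothly with the distance between the Gaussian parameters instead of dropping to zero at the first non-overlap, it gives a usable training signal for tiny boxes whose IoU is zero almost everywhere.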
5. A YOLOv7-based small target detection system, characterized by comprising an image data acquisition module, an image data preprocessing module and a small target detection module, wherein the image data acquisition module acquires a target image to be detected, and the image data preprocessing module then performs preprocessing, the preprocessing including image enhancement; the small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion and a prediction network, the prediction network comprising a large-scale detection layer, a medium-scale detection layer, a small-scale detection layer and a micro-scale detection layer; the backbone network sequentially comprises 1st to 4th CBS modules, a 1st ELAN module, a 1st MP module, a 2nd ELAN module, a 2nd MP module, a 3rd ELAN module, a 3rd MP module, a 4th ELAN module and a small target enhancement module; the small target enhancement module comprises two branches, wherein the first branch consists of two CBS modules and a space-to-depth operation: the input is convolved by one CBS module, passed through the space-to-depth operation, and then convolved by the other CBS module to adjust the channel number; the second branch is an enhanced feature extraction capability module, and the features extracted by the two branches are fused; the enhanced feature extraction capability module consists of two C2f modules, a CBS module and a Shuffle Attention module; the input features pass through two branches, one consisting sequentially of a C2f module and the Shuffle Attention module, the other consisting sequentially of a CBS module formed by a 1×1 convolution and a C2f module; the features extracted by the two branches are finally fused;
the space-to-depth operation is specifically: first, an input feature of size S×S×C1 is divided into four sub-features of size S/2×S/2×C1, where S denotes the length and width and C1 the number of channels; these sub-features are then concatenated along the channel dimension to obtain a feature of size S/2×S/2×4C1; finally, this feature is processed by a 1×1 convolution to obtain a feature of size S/2×S/2×C2, where C2 denotes the number of channels of the 1×1 convolution.
6. A non-transitory computer storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed, perform the YOLOv7-based small target detection method of claim 1.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the YOLOv7-based small target detection method of claim 1.
CN202310656474.0A 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system Active CN116385810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310656474.0A CN116385810B (en) 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system


Publications (2)

Publication Number Publication Date
CN116385810A CN116385810A (en) 2023-07-04
CN116385810B (en) 2023-08-15

Family

ID=86977295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310656474.0A Active CN116385810B (en) 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system

Country Status (1)

Country Link
CN (1) CN116385810B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218329B (en) * 2023-11-09 2024-01-26 四川泓宝润业工程技术有限公司 Wellhead valve detection method and device, storage medium and electronic equipment
CN117611998A (en) * 2023-11-22 2024-02-27 盐城工学院 Optical remote sensing image target detection method based on improved YOLOv7

Citations (7)

Publication number Priority date Publication date Assignee Title
CN113792785A (en) * 2021-09-14 2021-12-14 上海理工大学 Rapid identification method for ship attachment based on WGAN-GP and YOLO
CN114565900A (en) * 2022-01-18 2022-05-31 广州软件应用技术研究院 Target detection method based on improved YOLOv5 and binocular stereo vision
CN114565035A (en) * 2022-02-24 2022-05-31 集美大学 Tongue picture analysis method, terminal equipment and storage medium
WO2022110158A1 (en) * 2020-11-30 2022-06-02 Intel Corporation Online learning method and system for action recongition
CN115512387A (en) * 2022-08-15 2022-12-23 艾迪恩(山东)科技有限公司 Construction site safety helmet wearing detection method based on improved YOLOV5 model
CN115965827A (en) * 2023-01-17 2023-04-14 淮阴工学院 Lightweight small target detection method and device integrating multi-scale features
CN116206185A (en) * 2023-02-27 2023-06-02 山东浪潮科学研究院有限公司 Lightweight small target detection method based on improved YOLOv7


Non-Patent Citations (1)

Title
Rotated object detection in X-ray images based on improved YOLOv7; Cheng Lang et al.; Journal of Graphics (《图学学报》); Vol. 2023, No. 02; pp. 324-334 *

Also Published As

Publication number Publication date
CN116385810A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN116385810B (en) Yolov 7-based small target detection method and system
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
WO2019100723A1 (en) Method and device for training multi-label classification model
KR101896357B1 (en) Method, device and program for detecting an object
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN110738207A (en) character detection method for fusing character area edge information in character image
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN107766864B (en) Method and device for extracting features and method and device for object recognition
Nguyen et al. Satellite image classification using convolutional learning
CN113159023A (en) Scene text recognition method based on explicit supervision mechanism
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN110826609A (en) Double-flow feature fusion image identification method based on reinforcement learning
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
CN110008900A (en) A kind of visible remote sensing image candidate target extracting method by region to target
JP7118622B2 (en) OBJECT DETECTION DEVICE, OBJECT DETECTION METHOD AND PROGRAM
CN114973222A (en) Scene text recognition method based on explicit supervision mechanism
CN116310684A (en) Method for detecting three-dimensional target based on multi-mode feature fusion of Transformer
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN114519853A (en) Three-dimensional target detection method and system based on multi-mode fusion
Fan et al. A novel sonar target detection and classification algorithm
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN115115601A (en) Remote sensing ship target detection method based on deformation attention pyramid
CN113743521B (en) Target detection method based on multi-scale context awareness
Zhang et al. YoloXT: A object detection algorithm for marine benthos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant