CN116385810B - YOLOv7-based small target detection method and system - Google Patents


Info

Publication number
CN116385810B
CN116385810B CN202310656474.0A
Authority
CN
China
Prior art keywords
module
small target
cbs
elan
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310656474.0A
Other languages
Chinese (zh)
Other versions
CN116385810A (en)
Inventor
杨文姬
马欣欣
安航
胡文超
杨振姬
易文龙
杨红云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Agricultural University
Original Assignee
Jiangxi Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Agricultural University filed Critical Jiangxi Agricultural University
Priority to CN202310656474.0A priority Critical patent/CN116385810B/en
Publication of CN116385810A publication Critical patent/CN116385810A/en
Application granted granted Critical
Publication of CN116385810B publication Critical patent/CN116385810B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V10/764 Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition using neural networks
    • Y02T10/40 Engine management systems


Abstract

The invention belongs to the technical field of deep-learning target detection and specifically relates to a YOLOv7-based small target detection method and system. The method performs small target detection through a YOLOv7-based small target detection model in which a small target enhancement module is added to the backbone network. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused. The small target enhancement module extracts detail features and enriches the semantic information of small targets, improving small target detection, and the CIoU loss function is replaced with a normalized Gaussian Wasserstein distance loss.

Description

YOLOv7-based small target detection method and system
Technical Field
The invention belongs to the technical field of deep-learning target detection and specifically relates to a YOLOv7-based small target detection method and system.
Background
Document one (Qi Linglong, Gao Jian. Small object detection based on improved YOLOv7 [J]. Computer Engineering, 2023, 49(01): 41-48.) proposes an improved YOLOv7 object detection model: the MPConv module is improved to reduce the feature loss caused by the network's feature handling; an ACmix attention module increases the network's sensitivity to small-scale targets; finally, the loss function is replaced with SIoU to improve the robustness of the network. The improved algorithm effectively improves small target detection precision.
Document two (Li Jiaxin, Hou Jin, Cheng Boying, et al. Remote sensing small target detection network based on improved YOLOv5 [J/OL]. Computer Engineering: 1-11 [2023-03-21]) proposes a remote sensing small target detection algorithm, YOLOv5-RS. First, a parallel mixed attention module is constructed to reduce the interference of complex backgrounds and negative samples in the image, and the attention module's generation of the weight feature map is optimized by replacing the fully connected layer with a convolution and removing the pooling layer. Further, to acquire and transmit richer and more discriminative small target features, the downsampling factor is adjusted, shallow features rich in small target information are added during model training, and a feature extraction module combining convolution with multi-head self-attention is designed; by jointly characterizing local and global information, it breaks the limitation of ordinary convolution and obtains a larger receptive field. Finally, an EIoU loss function optimizes the regression between prediction boxes and detection boxes, enhancing small target localization. Experiments on remote sensing small target datasets demonstrate the superiority of the proposed algorithm.
Existing target detection networks perform poorly on small targets: the backbone network's small target detection capability is insufficient, so targets are falsely detected or missed. The reasons can be summarized as follows:
1. Object size: small objects occupy a relatively small area of the image, possibly only a tiny fraction of the whole, which makes accurately extracting their features difficult and degrades detection.
2. Object density: small objects are often dense, with overlap or occlusion between them, which further complicates detection.
3. Background noise: small objects often sit in complex backgrounds, which increases the difficulty of detection. Moreover, because the object area is small, noise and background from surrounding pixels interfere with feature extraction and classification, degrading detection.
4. Model selection: some object detection models (such as Faster R-CNN) handle small objects poorly, as they may focus only on larger objects, and small-object features are often difficult to capture.
5. Insufficient data: because small objects are relatively few, most datasets contain only a small amount of small-object data, making it hard for the model to fully learn small-object features, which affects detection.
Disclosure of Invention
To address the insufficient small target detection capability of existing target detection networks, the invention provides a YOLOv7-based small target detection method and system that improve small target detection precision.
The technical scheme of the invention is as follows. In the YOLOv7-based small target detection method, image data are processed and then input into a YOLOv7-based small target detection model for small target detection; the model comprises a backbone network, a neck network, and a prediction network. The image is sent to the backbone network for feature extraction; the neck network then fuses the extracted features, and the fused features are sent to the large-scale, middle-scale, small-scale, and micro-scale detection layers of the prediction network for detection, obtaining the detection result. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where the space-to-depth operation follows the convolution of the first CBS module and a second CBS module then adjusts the channel number; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused.
The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module. The input features pass through two branches: one branch consists, in order, of a C2f module and the Shuffle Attention module; the other consists, in order, of a CBS module formed by a 1x1 convolution and a C2f module. Finally, the features extracted by the two branches are fused.
Further preferably, the space-to-depth operation means: first, an input feature map of size S × S × C1 (S denotes the length and width, C1 the number of channels) is divided into four sub-feature maps of size S/2 × S/2 × C1; these are then concatenated along the channel dimension to obtain a feature map of size S/2 × S/2 × 4C1; finally, this feature is passed through a 1×1 convolution to obtain a feature map of size S/2 × S/2 × C2, where C2 denotes the number of output channels of the 1×1 convolution.
Further preferably, the neck network consists of an SPPCSPC module, CBS modules, up-sampling, ELAN-W modules, and MP modules; the MP module consists of a max-pooling layer and a CBS module and is used to downsample the feature map.
Further preferably, the loss function of the YOLOv7-based small target detection model consists of coordinate loss, target confidence loss, and classification loss; the coordinate loss uses the normalized Gaussian Wasserstein distance loss.
Further preferably, during detection the YOLOv7-based small target detection model models bounding boxes as two-dimensional Gaussian distributions, and the second-order Wasserstein distance between bounding boxes is:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\ cy_A,\ \tfrac{w_A}{2},\ \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\ cy_B,\ \tfrac{w_B}{2},\ \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $W_2^2(\mathcal{N}_A,\mathcal{N}_B)$ denotes the second-order Wasserstein distance between bounding box A and bounding box B; $cx_A$, $cy_A$, $w_A$, and $h_A$ are the center abscissa, center ordinate, width, and height of bounding box A; $cx_B$, $cy_B$, $w_B$, and $h_B$ are those of bounding box B; and $T$ denotes the transpose.
The second-order Wasserstein distance is normalized in exponential form to obtain the so-called normalized Gaussian Wasserstein distance:

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and C is a normalization constant.
The invention provides a YOLOv7-based small target detection system comprising an image data acquisition module, an image data preprocessing module, and a small target detection module. The image data acquisition module acquires the target image to be detected; the image data preprocessing module then preprocesses the target image, the preprocessing including image enhancement. The small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion, and a prediction network with large-scale, middle-scale, small-scale, and micro-scale detection layers. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where, after the convolution of the first CBS module, the space-to-depth operation is applied and a second CBS module then adjusts the channel number; the second branch is an enhanced feature extraction module; the features extracted by the two branches are fused. The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module; the input features pass through two branches, one consisting, in order, of a C2f module and the Shuffle Attention module, the other consisting, in order, of a CBS module formed by a 1x1 convolution and a C2f module; finally, the features extracted by the two branches are fused.
The invention also provides a non-volatile computer storage medium storing computer-executable instructions that, when executed, perform the YOLOv7-based small target detection method described above.
The present invention also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the YOLOv 7-based small object detection method described above.
The present invention provides an electronic device including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions for execution by the at least one processor to cause the at least one processor to perform a YOLOv 7-based small object detection method.
The invention has the beneficial effects that:
1. since the original YOLOv7 network has only three detection scales and is not aimed at a small target, the YOLOv7 network is generally not ideal in detecting a small target. Based on the above factors, we add a detection layer special for detecting small objects-a micro-scale detection layer based on the original YOLOv7 network, and expand the detection layer into four.
2. The backbone network performs feature extraction, but small target features are difficult to extract and capture, so the final detection effect is not ideal. A small target enhancement module is therefore proposed, designed, and placed in the backbone network to extract detail features and enrich the semantic information of small targets, improving small target detection.
3. Loss function improvement: the loss function of the original YOLOv7 network is CIoU; however, CIoU is very sensitive to the position deviation of tiny objects, which degrades the detection performance of the network model. The invention therefore replaces the CIoU loss function with a normalized Gaussian Wasserstein distance loss suited to small targets.
The method can be applied in many fields, for example offshore search-and-rescue tasks, for detecting people, ships, and other objects in the water. Objects occupy a small area in the pictures sent back during offshore rescue, so an ordinary target detection network has difficulty identifying targets accurately, which can lead to casualties and other losses. The invention improves the precision of small target detection.
Drawings
FIG. 1 is a diagram of a YOLOv7 object detection model architecture.
Fig. 2 is a schematic diagram of a backbone network structure according to the present invention.
Fig. 3 is a schematic diagram of a small target enhancement module structure.
FIG. 4 is a schematic diagram of an enhanced feature extraction capability module.
Fig. 5 is a schematic diagram of a space-to-depth operation.
Detailed Description
The invention is further elucidated in detail below with reference to examples and figures.
The YOLOv7 target detection algorithm consists of four parts: the input, the backbone network (Backbone), the neck network (Neck), and the prediction network (Prediction). The backbone network mainly performs feature extraction, the neck network performs feature fusion, and feature maps of different sizes obtained from the neck network are input into the prediction network to predict targets at three different scales.
This embodiment improves the YOLOv7 target detection algorithm: a scale dedicated to small targets is added to the prediction network, a small target enhancement module is added to the backbone network to improve small target feature extraction, and the original CIoU loss function is replaced with the normalized Gaussian Wasserstein distance loss.
Referring to fig. 1, in the YOLOv7-based small target detection method of this embodiment, image data are processed and then small target detection is performed by a YOLOv7-based small target detection model comprising a backbone network, a neck network, and a prediction network. The image is sent to the backbone network for feature extraction; the neck network then fuses the extracted features, and the fused features are sent to the large-scale, middle-scale, small-scale, and micro-scale detection layers of the prediction network for detection, obtaining the detection result. The original small-scale detection layer in YOLOv7 is poorly suited to these small targets. We therefore add a micro-scale detection layer that operates on a feature map downsampled by a factor of 4 from the input image. The micro-scale detection layer generates feature maps by extracting lower-level spatial features and fusing them with deep semantic features. The new micro-scale detection layer forms a wider and more detailed detection network structure suitable for detecting smaller targets.
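As an illustrative sketch (not taken from the patent text), the spatial grid sizes of the four detection layers follow from their downsampling strides: the added micro-scale layer uses stride 4, and the original three YOLOv7 heads use strides 8, 16, and 32. The 640×640 input size below is an assumption for illustration only.

```python
# Hypothetical helper: grid size of each detection layer for a square input.
# Strides (4, 8, 16, 32) correspond to micro-, small-, middle-, and
# large-scale detection layers as described above.
def detection_grid_sizes(input_size, strides=(4, 8, 16, 32)):
    """Return the spatial size of each detection layer's feature map."""
    return [input_size // s for s in strides]

print(detection_grid_sizes(640))  # [160, 80, 40, 20]
```

The micro-scale layer's 160×160 grid (versus 80×80 for the finest original head) is what gives the network its denser coverage of tiny objects.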
As shown in fig. 2, the backbone network of the present embodiment adds a small target enhancement module at the end of the backbone network of YOLOv7 in the prior art, so as to enhance the extraction capability of the backbone network to small targets. The backbone network of this embodiment includes, in order, a 1 st to 4 th CBS module, a 1 st ELAN module, a 1 st MP module, a 2 nd ELAN module, a 2 nd MP module, a 3 rd ELAN module, a 3 rd MP module, a 4 th ELAN module, and a small target enhancement module. The main network plays a role in feature extraction, and features obtained through the main network are input into the neck network.
As shown in fig. 3, the small target enhancement module includes two branches. The first branch consists of two CBS modules and a Space-to-Depth operation: after the convolution of the first CBS module, the Space-to-Depth operation is applied, and a second CBS module then adjusts the channel number. The second branch is the enhanced feature extraction module designed in this embodiment, which improves the model's ability to extract small target features. Finally, the features extracted by the two branches are fused, improving small target feature extraction.
The enhanced feature extraction module is shown in fig. 4 and consists of two C2f modules, a CBS module, and a Shuffle Attention module. The input features pass through two branches. One branch consists, in order, of a C2f module and the Shuffle Attention module; the design intent is that the high-performing C2f module first extracts more detail features of small target objects, and the following Shuffle Attention module then attends to small target features across both space and channels, so the features output by this branch contain richer detail information. The other branch consists, in order, of a CBS module formed by a 1x1 convolution and a C2f module. Finally, the features extracted by the two branches are fused. This module enhances the model's extraction of small target features and suppresses the interference of complex backgrounds and negative samples in the image.
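Shuffle Attention modules commonly interleave their channel groups with a "channel shuffle" so information mixes across groups. The patent does not spell out this step, so the following is a hedged pure-Python sketch of the shuffle alone (attention branches omitted): channels are viewed as a (groups, channels_per_group) grid and the two axes are transposed.

```python
# Hypothetical sketch of the channel-shuffle step used in shuffle-style
# attention: view a flat channel list as (groups, per_group), transpose,
# and flatten again, interleaving the groups.
def channel_shuffle(channels, groups):
    """Reorder channels so that consecutive outputs come from different groups."""
    n = len(channels)
    assert n % groups == 0, "channel count must divide evenly into groups"
    per = n // groups
    # Element i of group g lands at position i*groups + g.
    return [channels[g * per + i] for i in range(per) for g in range(groups)]

print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```

With two groups, the first output channel comes from group 0, the second from group 1, and so on, so later layers see features from every group.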
The space-to-depth operation is shown in fig. 5. First, an input feature map of size S × S × C1 is divided into four sub-feature maps of size S/2 × S/2 × C1, where S denotes the length and width and C1 the number of channels; these are then concatenated along the channel dimension to obtain a feature map of size S/2 × S/2 × 4C1. Finally, this feature is passed through a 1×1 convolution to obtain a feature map of size S/2 × S/2 × C2, where C2 denotes the number of output channels of the 1×1 convolution. The space-to-depth operation replaces strided convolution, eliminating the loss of fine-grained information caused by using a convolution stride.
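The rearrangement step can be sketched in plain Python, with nested lists standing in for a C × S × S tensor (a real model would use a tensor library, and the trailing 1×1 convolution is omitted here):

```python
# Minimal sketch of space-to-depth: each of the four spatial offsets
# (0,0), (0,1), (1,0), (1,1) contributes one S/2 x S/2 slice per input
# channel, so the channel count quadruples while each spatial side halves.
def space_to_depth(feat):
    """Rearrange a C x S x S feature map into a 4C x S/2 x S/2 one."""
    s = len(feat[0])
    assert s % 2 == 0, "spatial size must be even"
    out = []
    for dy in (0, 1):
        for dx in (0, 1):
            for ch in feat:
                out.append([row[dx::2] for row in ch[dy::2]])
    return out

# 3 channels of a 4x4 map -> 12 channels of a 2x2 map.
x = [[[float(i * 4 + j) for j in range(4)] for i in range(4)] for _ in range(3)]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # 12 2 2
```

Because every input pixel survives into some output channel, no fine-grained information is discarded, unlike a stride-2 convolution.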
The input end uses the mosaic augmentation technique: four randomly selected input images are cropped and combined into one picture, with the length and width of the crop position varied randomly; the purpose is to expand the data volume and avoid model overfitting.
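A hedged sketch of that idea, with plain 2D lists of pixel values standing in for images and all sizes chosen for illustration: four images are pasted into one canvas around a randomly chosen split point, so the crop widths and heights differ on every call.

```python
import random

# Hypothetical mosaic sketch: paste four size x size images into the four
# quadrants defined by a random split point (cx, cy).
def mosaic(imgs, size, rng=random):
    """Combine four size x size images into one size x size mosaic."""
    assert len(imgs) == 4
    cx = rng.randint(size // 4, 3 * size // 4)  # split kept away from borders
    cy = rng.randint(size // 4, 3 * size // 4)
    canvas = [[0] * size for _ in range(size)]
    regions = [  # (image index, row range, column range)
        (0, range(0, cy), range(0, cx)),        # top-left
        (1, range(0, cy), range(cx, size)),     # top-right
        (2, range(cy, size), range(0, cx)),     # bottom-left
        (3, range(cy, size), range(cx, size)),  # bottom-right
    ]
    for idx, ys, xs in regions:
        for y in ys:
            for x in xs:
                canvas[y][x] = imgs[idx][y][x]
    return canvas

imgs = [[[k] * 8 for _ in range(8)] for k in range(4)]
m = mosaic(imgs, 8)
print(sorted({v for row in m for v in row}))  # [0, 1, 2, 3]
```

A real pipeline would also remap each image's bounding-box labels into the mosaic's coordinates; that bookkeeping is omitted here.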
The neck network is a feature fusion network that fuses high-level and low-level features using an FPN+PAN combination. It consists of an SPPCSPC module, CBS modules, up-sampling, ELAN-W modules, and MP modules; the SPPCSPC module is used to increase the receptive field of the network.
A CBS module consists of a convolution (Conv), a BN layer, and a SiLU activation function. In the YOLOv7 network, CBS modules serve different purposes depending on the convolution kernel and stride: a 1×1 convolution changes the channel number; a 3×3 convolution with stride = 1 performs feature extraction on the input image; and a 3×3 convolution with stride = 2 performs downsampling.
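The activation in CBS is the standard SiLU, silu(x) = x · sigmoid(x); a plain-Python sketch of the function alone (the convolution and batch-norm parts are omitted):

```python
import math

def silu(x):
    """SiLU (a.k.a. swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

print(silu(0.0))  # 0.0
```

Unlike ReLU, SiLU is smooth and lets small negative values pass through attenuated rather than being clipped to zero.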
The ELAN module is an efficient network architecture that, by controlling the shortest and longest gradient paths, lets the network learn more features and be more robust. ELAN has two branches. The first branch passes through a 1x1 convolution to change the channel number. The second branch is more complex: it first passes through a 1x1 convolution module to change the channel number, then performs feature extraction through four 3x3 convolution modules. Finally, the four features are concatenated to obtain the final feature extraction result.
ELAN-W module: slightly different from the ELAN module in the number of outputs selected from the second branch. The ELAN module selects three outputs for the final concatenation, whereas the ELAN-W module selects five.
The MP module is composed of a maximum pooling layer and a CBS module and is used for downsampling the feature map.
The loss function of the YOLOv7 network consists of coordinate loss, target confidence loss, and classification loss, where the coordinate loss uses the CIoU loss function. CIoU is very sensitive to the position deviation of tiny objects, which degrades the detection performance of the network model. To improve small target detection, the YOLOv7-based small target detection model of this embodiment replaces the CIoU loss function with the normalized Gaussian Wasserstein distance loss. Specifically, the bounding box is first modeled as a two-dimensional Gaussian distribution; a new metric, the normalized Gaussian Wasserstein distance (NWD), is then proposed, with the similarity between bounding boxes computed from their corresponding Gaussian distributions.
To better describe the weights of different pixels in a bounding box, the bounding box is modeled as a two-dimensional Gaussian distribution in which the center pixel has the highest weight and pixel importance decreases from the center to the boundary. For a horizontal bounding box $R = (cx, cy, w, h)$, where $(cx, cy)$ are the center coordinates and $w$ and $h$ the width and height, the equation of its inscribed ellipse is:

$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

where $x$ and $y$ are the abscissa and ordinate of the two-dimensional Gaussian, $(\mu_x, \mu_y)$ is the center of the ellipse, and $\sigma_x$ and $\sigma_y$ are the semi-axis lengths along the horizontal and vertical axes. Thus $\mu_x = cx$, $\mu_y = cy$, $\sigma_x = w/2$, $\sigma_y = h/2$.
From this, the probability density function of the two-dimensional Gaussian distribution follows:

$$f(X \mid \mu, \Sigma)=\frac{\exp\!\left(-\tfrac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right)}{2\pi|\Sigma|^{1/2}}$$

where $X$ denotes the coordinates $(x, y)$, $\mu$ the mean vector, and $\Sigma$ the covariance matrix of the two-dimensional Gaussian distribution, and $T$ denotes the transpose. When

$$(X-\mu)^{T}\Sigma^{-1}(X-\mu)=1,$$

the aforementioned ellipse is a density contour of the two-dimensional Gaussian distribution. Thus a horizontal bounding box $R=(cx, cy, w, h)$ can be modeled as a two-dimensional Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ with:

$$\mu=\begin{bmatrix} cx \\ cy \end{bmatrix},\qquad \Sigma=\begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix}$$
furthermore, the bounding box a and the bounding box B may be converted into a distribution distance between two gaussian distributions according to the similarity between the bounding boxes. The distribution distance was calculated using the wasperstein distance. For two-dimensional gaussian distributions and />The second order wasperstein distance between them is:
in the formula ,representing a two-dimensional gaussian distribution a,/>Representing a two-dimensional gaussian distribution B, N representing a two-dimensional gaussian distribution function,mean vector representing two-dimensional gaussian distribution a, +.>Mean vector representing two-dimensional gaussian distribution B, +.>Covariance matrix representing two-dimensional gaussian distribution a,/->Representing the covariance matrix of the two-dimensional gaussian distribution B. />Representing the second order Wasserstein distance between bounding box A and bounding box B, +.>Indicating the Frobenius norm.
Furthermore, for bounding box $A=(cx_A, cy_A, w_A, h_A)$ and bounding box $B=(cx_B, cy_B, w_B, h_B)$, the second-order Wasserstein distance simplifies to:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\ cy_A,\ \tfrac{w_A}{2},\ \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\ cy_B,\ \tfrac{w_B}{2},\ \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $cx_A$, $cy_A$, $w_A$, and $h_A$ are the center abscissa, center ordinate, width, and height of bounding box A, and likewise for bounding box B. $W_2^2$ is a distance metric and cannot be used directly as a similarity metric (i.e., a value between 0 and 1, as IoU is). We therefore normalize it using its exponential form and obtain the normalized Gaussian Wasserstein distance (NWD):

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and C is a normalization constant, which may be set to the average second-order Wasserstein distance or to a given value.
Compared with IoU, NWD has the advantages of scale invariance when detecting small objects, smoothness with respect to position deviation, and the ability to measure similarity between bounding boxes that do not overlap or that contain one another. It is therefore integrated into the architecture of the small target detection network model constructed here.
The YOLOv7-based small target detection system comprises an image data acquisition module, an image data preprocessing module, and a small target detection module. The image data acquisition module acquires the target image to be detected; the image data preprocessing module then preprocesses it, the preprocessing including image enhancement. The small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion, and a prediction network with large-scale, middle-scale, small-scale, and micro-scale detection layers. The backbone network comprises, in order, the 1st to 4th CBS modules, the 1st ELAN module, the 1st MP module, the 2nd ELAN module, the 2nd MP module, the 3rd ELAN module, the 3rd MP module, the 4th ELAN module, and the small target enhancement module. The small target enhancement module comprises two branches: the first branch consists of two CBS modules and a space-to-depth operation, where, after the convolution of the first CBS module, the space-to-depth operation is applied and a second CBS module then adjusts the channel number; the second branch is the enhanced feature extraction module; the features extracted by the two branches are fused. The enhanced feature extraction module consists of two C2f modules, a CBS module, and a Shuffle Attention module; the input features pass through two branches, one consisting, in order, of a C2f module and the Shuffle Attention module, the other consisting, in order, of a CBS module formed by a 1x1 convolution and a C2f module; finally, the features extracted by the two branches are fused.
In another embodiment, a non-volatile computer storage medium is provided, the computer storage medium storing computer-executable instructions which, when executed, perform the YOLOv7-based small target detection method of any of the embodiments described above.
The present embodiment also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the YOLOv7-based small target detection method of the above embodiments.
The present embodiment provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the YOLOv7-based small target detection method.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not exhaustive and do not limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. A YOLOv7-based small target detection method, characterized in that, after image data are processed, they are input into a YOLOv7-based small target detection model for small target detection, the model comprising a backbone network, a neck network and a prediction network; the image is fed into the backbone network for feature extraction, the neck network then fuses the extracted features, and the fused features are sent to a large-scale detection layer, a medium-scale detection layer, a small-scale detection layer and a micro-scale detection layer of the prediction network for detection to obtain a detection result; the backbone network sequentially comprises 1st to 4th CBS modules, a 1st ELAN module, a 1st MP module, a 2nd ELAN module, a 2nd MP module, a 3rd ELAN module, a 3rd MP module, a 4th ELAN module and a small target enhancement module; the small target enhancement module comprises two branches, wherein the first branch consists of two CBS modules and a space-to-depth operation: the input is convolved by one CBS module, passed through the space-to-depth operation, and then convolved by the other CBS module to adjust the channel number; the second branch is an enhanced feature extraction capability module, and the features extracted by the two branches are fused;
the enhanced feature extraction capability module consists of two C2f modules, a CBS module and a Shuffle Attention module; the input features pass through two branches, one consisting sequentially of a C2f module and the Shuffle Attention module, the other consisting sequentially of a CBS module formed by a 1×1 convolution and a C2f module; the features extracted by the two branches are finally fused;
the space-to-depth operation is specifically: first, an input feature of size S×S×C1 is divided into four sub-features of size S/2×S/2×C1, where S denotes the length and width and C1 the number of channels; these sub-features are then concatenated along the channel dimension to obtain a feature of size S/2×S/2×4C1; finally, this feature is processed by a 1×1 convolution to obtain a feature of size S/2×S/2×C2, where C2 denotes the number of channels of the 1×1 convolution.
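The space-to-depth step of the claim can be sketched in a few lines of numpy. This is a minimal sketch of the operation as described (split into four sub-features, concatenate along channels, then a 1×1 convolution modeled as a per-pixel linear map); the surrounding CBS convolutions of the patent's branch are omitted.

```python
import numpy as np

def space_to_depth(x):
    """Split an SxSxC1 feature into four S/2 x S/2 x C1 sub-features
    (one per position in each 2x2 block) and concatenate them along the
    channel axis, giving S/2 x S/2 x 4*C1. Assumes S is even."""
    tl = x[0::2, 0::2, :]  # top-left pixel of each 2x2 block
    tr = x[0::2, 1::2, :]  # top-right
    bl = x[1::2, 0::2, :]  # bottom-left
    br = x[1::2, 1::2, :]  # bottom-right
    return np.concatenate([tl, tr, bl, br], axis=-1)

def conv1x1(x, weights):
    """A 1x1 convolution is a per-pixel linear map over channels:
    weights has shape (C_in, C2), output is S/2 x S/2 x C2."""
    return x @ weights

# Demo: 8x8x3 input -> 4x4x12 after space-to-depth -> 4x4x16 after 1x1 conv.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 3))
spd = space_to_depth(feat)                          # (4, 4, 12)
out = conv1x1(spd, rng.standard_normal((12, 16)))   # (4, 4, 16)
print(spd.shape, out.shape)
```

Note that space-to-depth only rearranges pixels, so unlike strided convolution or pooling it discards no fine-grained information, which is why it helps with small targets.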
2. The YOLOv7-based small target detection method of claim 1, wherein the neck network is composed of SPPCSPC modules, CBS modules, upsampling, ELAN-W modules and MP modules; the MP module consists of a max-pooling layer and a CBS module and is used for downsampling the feature map; the ELAN-W module differs from the ELAN module only in the number of branch outputs selected for the final fusion: the ELAN module selects three outputs, while the ELAN-W module selects five.
3. The YOLOv7-based small target detection method of claim 1, wherein the loss function of the YOLOv7-based small target detection model consists of a coordinate loss, a target confidence loss and a classification loss; the coordinate loss uses the normalized Gaussian Wasserstein distance loss.
4. The YOLOv7-based small target detection method of claim 1, wherein, in the detection process of the YOLOv7-based small target detection model, bounding boxes are modeled as two-dimensional Gaussian distributions, and the second-order Wasserstein distance between bounding boxes is:

$$W_2^2(\mathcal{N}_A,\mathcal{N}_B)=\left\|\left[cx_A,\; cy_A,\; \tfrac{w_A}{2},\; \tfrac{h_A}{2}\right]^{T}-\left[cx_B,\; cy_B,\; \tfrac{w_B}{2},\; \tfrac{h_B}{2}\right]^{T}\right\|_2^2$$

where $W_2^2(\mathcal{N}_A,\mathcal{N}_B)$ denotes the second-order Wasserstein distance between bounding box A and bounding box B; $cx_A$ and $cy_A$ are the center abscissa and ordinate of bounding box A, and $w_A$ and $h_A$ are its width and height; $cx_B$, $cy_B$, $w_B$ and $h_B$ are the corresponding quantities for bounding box B; $T$ denotes the transpose;

the second-order Wasserstein distance is normalized to obtain the so-called normalized Gaussian Wasserstein distance:

$$NWD(\mathcal{N}_A,\mathcal{N}_B)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_A,\mathcal{N}_B)}}{C}\right)$$

where $NWD(\mathcal{N}_A,\mathcal{N}_B)$ denotes the normalized Gaussian Wasserstein distance between bounding box A and bounding box B, and $C$ is a normalization constant.
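The two quantities of claim 4 can be computed directly from `(cx, cy, w, h)` box parameters. The sketch below also contrasts NWD with plain IoU on two disjoint small boxes, illustrating the advantage stated in the description. The value `c=12.8` is an assumed placeholder; the patent only says C is a normalization constant, without fixing its value.

```python
import numpy as np

def wasserstein2(box_a, box_b):
    """Second-order Wasserstein distance between two boxes modeled as
    2-D Gaussians. Boxes are (cx, cy, w, h); per claim 4 this reduces
    to the L2 distance between the vectors (cx, cy, w/2, h/2)."""
    va = np.array([box_a[0], box_a[1], box_a[2] / 2.0, box_a[3] / 2.0])
    vb = np.array([box_b[0], box_b[1], box_b[2] / 2.0, box_b[3] / 2.0])
    return float(np.linalg.norm(va - vb))

def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance, in (0, 1].
    c is a dataset-dependent normalization constant (assumed value)."""
    return float(np.exp(-wasserstein2(box_a, box_b) / c))

def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

# Two small 4x4 boxes 6 px apart: IoU is exactly 0 (no gradient signal),
# while NWD still yields a graded, nonzero similarity.
a = (10.0, 10.0, 4.0, 4.0)
b = (16.0, 10.0, 4.0, 4.0)
print(iou(a, b), nwd(a, b))
```

Because NWD decays smoothly with the distance between the Gaussian parameters instead of dropping to zero at the first non-overlap, it gives a usable training signal for tiny boxes whose IoU is zero almost everywhere.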
5. A YOLOv7-based small target detection system, characterized by comprising an image data acquisition module, an image data preprocessing module and a small target detection module, wherein the image data acquisition module acquires a target image to be detected, and the image data preprocessing module then performs preprocessing, the preprocessing including image enhancement; the small target detection module contains a YOLOv7-based small target detection model comprising a backbone network for feature extraction, a neck network for feature fusion and a prediction network, the prediction network comprising a large-scale detection layer, a medium-scale detection layer, a small-scale detection layer and a micro-scale detection layer; the backbone network sequentially comprises 1st to 4th CBS modules, a 1st ELAN module, a 1st MP module, a 2nd ELAN module, a 2nd MP module, a 3rd ELAN module, a 3rd MP module, a 4th ELAN module and a small target enhancement module; the small target enhancement module comprises two branches, wherein the first branch consists of two CBS modules and a space-to-depth operation: the input is convolved by one CBS module, passed through the space-to-depth operation, and then convolved by the other CBS module to adjust the channel number; the second branch is an enhanced feature extraction capability module, and the features extracted by the two branches are fused; the enhanced feature extraction capability module consists of two C2f modules, a CBS module and a Shuffle Attention module; the input features pass through two branches, one consisting sequentially of a C2f module and the Shuffle Attention module, the other consisting sequentially of a CBS module formed by a 1×1 convolution and a C2f module; the features extracted by the two branches are finally fused;
the space-to-depth operation is specifically: first, an input feature of size S×S×C1 is divided into four sub-features of size S/2×S/2×C1, where S denotes the length and width and C1 the number of channels; these sub-features are then concatenated along the channel dimension to obtain a feature of size S/2×S/2×4C1; finally, this feature is processed by a 1×1 convolution to obtain a feature of size S/2×S/2×C2, where C2 denotes the number of channels of the 1×1 convolution.
6. A non-transitory computer storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed, perform the YOLOv7-based small target detection method of claim 1.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the YOLOv7-based small target detection method of claim 1.
CN202310656474.0A 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system Active CN116385810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310656474.0A CN116385810B (en) 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system


Publications (2)

Publication Number Publication Date
CN116385810A CN116385810A (en) 2023-07-04
CN116385810B (en) 2023-08-15

Family

ID=86977295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310656474.0A Active CN116385810B (en) 2023-06-05 2023-06-05 Yolov 7-based small target detection method and system

Country Status (1)

Country Link
CN (1) CN116385810B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218329B (en) * 2023-11-09 2024-01-26 四川泓宝润业工程技术有限公司 Wellhead valve detection method and device, storage medium and electronic equipment
CN117611998A (en) * 2023-11-22 2024-02-27 盐城工学院 Optical remote sensing image target detection method based on improved YOLOv7

Citations (7)

Publication number Priority date Publication date Assignee Title
CN113792785A (en) * 2021-09-14 2021-12-14 上海理工大学 Rapid identification method for ship attachment based on WGAN-GP and YOLO
CN114565900A (en) * 2022-01-18 2022-05-31 广州软件应用技术研究院 Target detection method based on improved YOLOv5 and binocular stereo vision
CN114565035A (en) * 2022-02-24 2022-05-31 集美大学 Tongue picture analysis method, terminal equipment and storage medium
WO2022110158A1 (en) * 2020-11-30 2022-06-02 Intel Corporation Online learning method and system for action recongition
CN115512387A (en) * 2022-08-15 2022-12-23 艾迪恩(山东)科技有限公司 Construction site safety helmet wearing detection method based on improved YOLOV5 model
CN115965827A (en) * 2023-01-17 2023-04-14 淮阴工学院 Lightweight small target detection method and device integrating multi-scale features
CN116206185A (en) * 2023-02-27 2023-06-02 山东浪潮科学研究院有限公司 Lightweight small target detection method based on improved YOLOv7


Non-Patent Citations (1)

Title
Rotated object detection in X-ray images based on improved YOLOv7; Cheng Lang et al.; Journal of Graphics (《图学学报》); Vol. 2023, No. 02; pp. 324-334 *

Also Published As

Publication number Publication date
CN116385810A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN116385810B (en) Yolov 7-based small target detection method and system
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
WO2019100723A1 (en) Method and device for training multi-label classification model
KR101896357B1 (en) Method, device and program for detecting an object
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN110738207A (en) character detection method for fusing character area edge information in character image
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN107766864B (en) Method and device for extracting features and method and device for object recognition
Nguyen et al. Satellite image classification using convolutional learning
CN113159023A (en) Scene text recognition method based on explicit supervision mechanism
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN110826609A (en) Double-flow feature fusion image identification method based on reinforcement learning
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
CN110008900A (en) A kind of visible remote sensing image candidate target extracting method by region to target
JP7118622B2 (en) OBJECT DETECTION DEVICE, OBJECT DETECTION METHOD AND PROGRAM
CN114973222A (en) Scene text recognition method based on explicit supervision mechanism
CN116310684A (en) Method for detecting three-dimensional target based on multi-mode feature fusion of Transformer
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN114519853A (en) Three-dimensional target detection method and system based on multi-mode fusion
Fan et al. A novel sonar target detection and classification algorithm
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN115115601A (en) Remote sensing ship target detection method based on deformation attention pyramid
CN113743521B (en) Target detection method based on multi-scale context awareness
Zhang et al. YoloXT: A object detection algorithm for marine benthos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant