CN115063691B - Feature enhancement-based small target detection method in complex scene - Google Patents
Feature enhancement-based small target detection method in complex scene
- Publication number
- CN115063691B (application CN202210780211.6A)
- Authority
- CN
- China
- Prior art keywords
- feature
- network
- prediction
- targets
- scale
- Prior art date
- Legal status: Active
Classifications
- G06V20/13 — Scenes; Terrestrial scenes; Satellite images
- G06V10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons; coarse-fine / multi-scale approaches; context analysis
- G06V10/776 — Validation; Performance evaluation
- G06V10/806 — Fusion of extracted features (sensor, preprocessing, feature extraction or classification level)
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V2201/07 — Target detection
Abstract
The invention belongs to the field of computer vision and target detection, and particularly relates to a feature-enhancement-based method for detecting small targets in complex scenes. The technical scheme of the invention is as follows: first, a Cutout-DA data enhancement method is proposed, which generates new occlusion data and expands them into the VisDrone2021 dataset; then a multi-scale fused feature-enhancement path aggregation network, MSFE-PANet, is designed, which obtains richer and finer semantic and spatial information features through an integrated attention mechanism, feature fusion, and a network prediction-scale strategy aimed at small targets; a prediction-box rejection loss function, RB_Loss, is designed; and finally the model is trained. The invention enhances the mutual fusion of the strong localization information of shallow feature maps and the strong semantic information of deep feature maps, helps the network find regions of interest in complex scenes, and improves sensitivity to small targets. The RB_Loss rejection loss function and the new network prediction scale address the problems of overlap, missed detection of occluded small targets, and false detection against complex backgrounds.
Description
Technical Field
The invention belongs to the field of computer vision and target detection, and particularly relates to a feature-enhancement-based method for detecting small targets in complex scenes.
Background
In recent years, the rapid development of deep learning has driven remarkable breakthroughs in computer vision and made it an unprecedented research hotspot. The main task of computer vision is to parse images, including image classification, detection and segmentation. Target detection, one of the core research directions in computer vision, uses suitable algorithms to locate targets accurately and identify their classes. Small target detection is both an important application and a difficulty of target detection, and plays an important role in fields such as autonomous driving, intelligent medical care, defect detection and aerial image analysis. Detecting small, distant objects in high-resolution street-scene photographs is a necessary condition for the safe deployment of autonomous vehicles; in medical imaging, finding masses and tumors only a few pixels in size is important for accurate and early diagnosis; automated industrial inspection can also benefit from small target detection by locating small defects visible on material surfaces. In conclusion, small target detection has wide application value and important research significance.
Although target detection algorithms have made major breakthroughs, results on small targets remain far from ideal because of the significant performance gap between detecting small and large targets. Existing small target detection methods cannot be applied well to real complex scenes, mainly for the following reasons. 1. Visual features are not obvious: small targets carry little usable information; at low image resolution a small target may occupy only a few pixels, and accurately detecting it under such weak visual features is a great challenge. 2. Feature extraction: in target detection the quality of the extracted features directly affects the final detection performance, and the features of small targets are harder to extract than those of large-scale targets. Most computer vision architectures use pooling layers, and part of the small-target features are lost after pooling; extracting effective small-target features in deep neural networks remains an open problem. 3. Background interference: small target detection in complex environments suffers from factors such as illumination, complex geographic elements, occlusion and aggregation, making small targets difficult to distinguish from the background or from similar targets; effectively suppressing complex background interference is also a current challenge.
Disclosure of Invention
Aiming at the problems in the prior art that small targets cannot be accurately detected, that their features are difficult to extract, and that detection does not transfer well to real complex scenes, the invention provides a feature-enhancement-based method for detecting small targets in complex scenes.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a feature-enhancement-based method for detecting small targets in complex scenes comprises the following steps:
Step 1, data preparation: the dataset is derived from an aerial image;
step 2, data enhancement: partial data images are randomly selected from the dataset, and the targets in these images (both partially visible and fully visible) are randomly occluded at 0.2, 0.4, 0.6 and 0.8 of the target size, generating new occlusion data that are expanded into the VisDrone2021 dataset;
step 3, designing a multi-scale fused characteristic enhanced path aggregation network MSFE-PANet;
step 3.1: improving network prediction scale
Remove the prediction head YOLO head3 in YOLOv4 that is aimed at detecting large targets, while retaining the corresponding 13 x 13 feature map; at the same time, add to the prediction network a prediction head YOLO head0, generated from the shallow high-resolution 104 x 104 feature map, aimed at detecting small-scale targets, producing a new network prediction-scale structure.
Step 3.2: feature layer fusion
On the new network prediction-scale structure, upsample the feature map extracted by each feature network layer by the corresponding factor, and add and fuse each with the first-layer feature map to obtain new feature maps;
step 3.3: an attention module;
step 4: the prediction block rejection Loss function rb_loss is designed.
Step 5: and training a model.
Step 3.3 above: integrating attention mechanisms in PANet.
Step 3.3.1: a CBAM attention module is added, as shown in equation (1):
F' = M_c(F) ⊗ F,  F'' = M_s(F') ⊗ F'  (1)
where F is the input feature map, M_c and M_s are the channel and spatial attention maps, and ⊗ denotes element-wise multiplication.
The calculation formula of the channel attention is (2):
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))  (2)
wherein σ is the Sigmoid activation function, and the MLP weights W_0 and W_1 are shared between the two pooled branches.
The calculation formula of the spatial attention is (3):
M_s(F) = σ(f^{7x7}([AvgPool(F); MaxPool(F)]))  (3)
wherein σ is the Sigmoid activation function and f^{7x7} denotes a convolution with a 7 x 7 filter.
Step 3.3.2: improving the channel attention module of the CBAM;
step 3.3.3: introducing an SE-attention module;
step 3.3.4: improving the SPP module;
step 3.3.5: the SE-attention module is optimized.
Step 3.3.2 above, the calculation formula is defined as (4):
M_c(F) = σ(Conv_1x1(AvgPool(F)) + Conv_1x1(MaxPool(F)))  (4)
wherein the 1*1 convolutions replace the fully connected layers of the original channel attention.
step 3.3.3, giving an input X, the number of channels is C 1 Through F tr Is subjected to a series of convolution and pooling operations to obtain a channel number C 2 Is characterized by U; f (F) sq For feature compression operation, feature compression is carried out along the space dimension, and each two-dimensional feature channel is changed into one pixel; followed by F ex Excitation operations, then weighted by multiplication onto previous features
Calculation formula (5):
in (a): u (U) C Representing the C-th channel in the feature map; z is Z C Is the output of the compression operation. The sigma is a Sigmoid activation function; w (W) 1 ,W 2 All are all fully connected operation; delta is the ReLU activation function. Calculation formula (7) S C Is the C-th weight in step S.
S=F ex (Z,W)=σ(g(Z,W))=σ(W 2 δ(W 1 Z)) (6)
F scale =(U C ,S C )=S C ·U C (7)
The step 3.3.4 specifically comprises: the pooling layers with kernel sizes 1, 5, 9 and 13 in the SPP are changed into a 1*1 convolution and 3*3 hole (dilated) convolutions; the improved SPP module does not change the size of the feature map, and the output feature map size is calculated by formula (8):
O = floor((n + 2p - d(k - 1) - 1) / s) + 1  (8)
wherein n is the input size, k the kernel size, s the stride, p the padding and d the dilation rate.
the step 3.3.5 specifically comprises the following steps: an improved SPP module is added in the SE-attention to obtain an SSE-attention module.
The step 4 specifically comprises: the overlap degree IOU between the prior prediction boxes of two overlapping targets is taken as the loss value; the back-propagation network is optimized along the gradient direction, separating the overlapping prior prediction boxes of the two targets. The loss is defined by formula (9), which expresses the matching between the prior prediction boxes of different targets.
Compared with the prior art, the invention has the beneficial effects that:
1. Compared with the Mosaic and CutMix data enhancement methods of the baseline network YOLOv4, the Cutout-DA data enhancement strategy designed here improves mAP on the VisDrone2021 dataset by 3.57 points, which fully demonstrates the effectiveness of the Cutout-DA strategy for small-target detection. In the YOLOv4 prediction network, the output prior prediction boxes must be filtered by NMS, so mutually occluding and overlapping targets interfere with target-box matching and cause many missed and false detections. The RB_Loss proposed by the invention further reduces the mutual influence of occluded targets through the IOU, improving mAP by 2.8 points.
2. The multi-scale fused feature-enhancement path aggregation network MSFE-PANet obtains richer and finer semantic and spatial information features through the small-target prediction-scale strategy and multi-scale feature fusion, improving mAP by a further 9.47 points on top of the Cutout-DA and RB_Loss strategies and greatly improving small-target detection accuracy. Adding the LW-CBAM and SSE-Attention mechanisms further extracts attention regions and helps the network concentrate on useful small objects, improving mAP by 6.63 points and alleviating missed and false detection of overlapping and occluded small targets against complex backgrounds.
3. The invention accurately detects small targets, extracts their features readily, and handles a variety of real complex scenes; it has a wide application range and strong adaptability.
Drawings
FIG. 1 is a diagram of a multi-scale converged feature enhanced path aggregation network MSFE-PANet structure in the present invention;
FIG. 2 is a detailed structure diagram of MSFE-PANet in the present invention;
FIG. 3 is a predicted scale improvement architecture in accordance with the present invention;
FIG. 4 is a channel attention structure of a CBAM according to the present invention;
FIG. 5 is a spatial attention structure of a CBAM according to the present invention;
FIG. 6 is a comparison of results from different modules in the present invention;
FIG. 7 illustrates various embedding patterns of the attention module according to the present invention;
FIG. 8 is a detailed result image of MSFE-PANet in an embodiment of the present invention;
FIG. 9 is a visual result image of MSFE-PANet in an embodiment of the present invention;
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
The multi-scale fused feature-enhancement path aggregation network MSFE-PANet enhances the mutual fusion of the strong localization information of shallow feature maps and the strong semantic information of deep feature maps, helps the network find regions of interest in complex scenes, and improves sensitivity to small targets. The RB_Loss rejection loss function and the new network prediction scale are designed to address overlap, missed detection of occluded small targets, and false detection against complex backgrounds.
Referring to fig. 1, the method for detecting the small target in the complex scene based on the feature enhancement provided by the invention comprises the following steps:
step 1: data preparation. The method comprises the following steps: a large aerial dataset visclone 2021 was used, the image size of which was approximately 2000 x 1500, containing a variety of scenes from country to city, and containing various climate changes, light and shade changes, and shooting angle changes, etc., while including 10 categories of pedestrians, automobiles, bicycles, and tricycles, 6471 images in the dataset were used for training, 548 images were used for verification, and 1610 images were used for testing.
Step 2: data enhancement. Specifically: partial data images are randomly selected from the dataset, and the targets in these images (both partially visible and fully visible) are randomly occluded at 0.2, 0.4, 0.6 and 0.8 of the target size. The new occlusion data generated in this way are expanded into the dataset, strengthening the robustness of the model to occluded targets and improving the accuracy of its judgments on them.
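The occlusion step above can be sketched as follows. This is a minimal illustrative reading of Cutout-DA, assuming targets are given as (x1, y1, x2, y2) boxes and that one rectangular patch per target, with sides a randomly chosen fraction of the box size, is filled with a constant value; the function name and fill value are hypothetical, not from the patent.

```python
import random

def cutout_da(image, boxes, ratios=(0.2, 0.4, 0.6, 0.8), fill=0):
    """Randomly occlude a patch inside each target box (illustrative).

    `image` is an H x W grid (list of lists) of pixel values; `boxes`
    are (x1, y1, x2, y2) target boxes. The patch side is the box side
    times a ratio drawn from `ratios`, as in the step above.
    """
    for (x1, y1, x2, y2) in boxes:
        r = random.choice(ratios)
        pw = max(1, int((x2 - x1) * r))   # patch width, at least one pixel
        ph = max(1, int((y2 - y1) * r))   # patch height
        # random top-left corner of the patch, kept inside the box
        px = random.randint(x1, max(x1, x2 - pw))
        py = random.randint(y1, max(y1, y2 - ph))
        for y in range(py, min(py + ph, y2)):
            for x in range(px, min(px + pw, x2)):
                image[y][x] = fill
    return image
```

In practice this would run on image arrays and per-instance annotations; the grid-of-lists form is only to keep the sketch dependency-free.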
Step 3: algorithm design. Specifically: the multi-scale fused feature-enhancement path aggregation network MSFE-PANet is designed.
Step 3.1: the network prediction scale is improved, see fig. 3. The prediction head YOLO head3 in YOLOv4 aimed at detecting large targets is removed, while the corresponding 13 x 13 feature map is retained; at the same time, a prediction head YOLO head0, generated from the shallow high-resolution 104 x 104 feature map and aimed at detecting small-scale targets, is added to the prediction network, yielding a new network prediction-scale structure.
Step 3.2: referring to fig. 3, feature layer fusion. Specifically: on the new network prediction-scale structure, the feature map extracted by each feature network layer is upsampled by the corresponding factor and added and fused with the first-layer feature map, giving new feature maps that make the prediction network finer and improve small-target detection precision;
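The upsample-and-add fusion of step 3.2 can be sketched with nearest-neighbour upsampling; the function names and the (C, H, W) NumPy layout are assumptions for illustration, not the patent's implementation (which operates inside the network with learned layers).

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_with_first_layer(first_layer, deeper_feats):
    """Upsample each deeper map to the first layer's resolution and add.

    `first_layer` is the highest-resolution (C, H, W) map; each map in
    `deeper_feats` has the same channel count and a spatial size that
    divides (H, W), mirroring the corresponding-factor upsampling above.
    """
    fused = first_layer.copy()
    _, H, _ = first_layer.shape
    for feat in deeper_feats:
        factor = H // feat.shape[1]       # upsampling factor for this layer
        fused = fused + upsample_nearest(feat, factor)
    return fused
```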
step 3.3: the attention module is introduced. The method comprises the following steps: attention mechanisms are integrated in PANet.
Step 3.3.1: a CBAM attention module is added. CBAM can be integrated into most CNN frameworks and trained end to end. Given an intermediate feature map as input, CBAM sequentially infers attention maps along the two independent dimensions of channel and space, as shown in equation (1):
F' = M_c(F) ⊗ F,  F'' = M_s(F') ⊗ F'  (1)
Referring to fig. 4, the channel attention of the CBAM module is calculated as (2):
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))  (2)
wherein σ is the Sigmoid activation function, and the MLP weights W_0 and W_1 are shared between the two pooled branches.
Referring to fig. 5, the spatial attention module generates a spatial attention map from the spatial relationships of the features. The calculation formula of the spatial attention is (3):
M_s(F) = σ(f^{7x7}([AvgPool(F); MaxPool(F)]))  (3)
wherein σ is the Sigmoid activation function and f^{7x7} denotes a convolution with a 7 x 7 filter.
Step 3.3.2: the channel attention module of CBAM is improved. The invention uses 1*1 convolutions instead of the fully connected layers in the channel attention module, resulting in a lighter-weight convolutional attention module, LW-CBAM. The calculation formula is defined as (4):
M_c(F) = σ(Conv_1x1(AvgPool(F)) + Conv_1x1(MaxPool(F)))  (4)
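A minimal numerical sketch of the channel attention described above: on globally pooled vectors a 1*1 convolution acts as a plain matrix multiplication, which is how the LW-CBAM bottleneck is modelled here. The weight shapes (C/r x C and C x C/r) follow the usual bottleneck convention and are an assumption, as are the function names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lw_channel_attention(feat, w0, w1):
    """Channel attention with a shared two-layer bottleneck (sketch).

    `feat` is (C, H, W); `w0` (C/r, C) and `w1` (C, C/r) stand in for
    the 1*1 convolutions of LW-CBAM, applied to both pooled branches
    with shared weights as in the shared-MLP formulation.
    """
    avg = feat.mean(axis=(1, 2))                      # AvgPool over H, W -> (C,)
    mx = feat.max(axis=(1, 2))                        # MaxPool over H, W -> (C,)
    bottleneck = lambda v: w1 @ np.maximum(w0 @ v, 0.0)  # shared weights, ReLU
    scale = sigmoid(bottleneck(avg) + bottleneck(mx))    # one weight per channel
    return feat * scale[:, None, None]                # rescale each channel
```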
step 3.3.3: a SE-attention module is introduced. The method comprises the following steps: first, an input X is given, and the number of channels is C 1 Through F tr Is subjected to a series of convolution and pooling operations to obtain a channel number C 2 Is characterized by U; f (F) sq For the feature compression operation, feature compression is carried out along the space dimension, each two-dimensional feature channel is changed into a pixel, the pixel has a global receptive field, and the output dimension is matched with the input feature channel number; followed by F ex Excitation operation, based on correlation among characteristic channels, each characteristic channel generates a weight to represent importance degree of each characteristic channel, and then the importance degree is weighted to the previous characteristic through multiplication to finish the calibration of the important characteristic. Calculation formula (5): u (U) C Representing the C-th channel in the feature map; z is Z C Is the output of the compression operation. The sigma is a Sigmoid activation function; w (W) 1 ,W 2 All are all fully connected operation; delta is the ReLU activation function. Calculation formula (7) S C Is the C-th weight in step S.
S = F_ex(Z, W) = σ(g(Z, W)) = σ(W_2 δ(W_1 Z))  (6)
F_scale(U_C, S_C) = S_C · U_C  (7)
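The squeeze-excitation-rescale chain of formulas (6) and (7) can be traced numerically as follows; the (C, H, W) layout and the bottleneck weight shapes are assumptions, and g(Z, W) is realised as the ReLU bottleneck W_2 δ(W_1 Z) stated in formula (6).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(u, w1, w2):
    """Squeeze-and-Excitation recalibration (sketch).

    `u` is a (C, H, W) feature map. The squeeze averages each channel
    to one value Z_C; the excitation sigma(W2 ReLU(W1 Z)) produces the
    per-channel weights S_C; the rescale multiplies S_C back onto U_C.
    Weight shapes (C/r, C) and (C, C/r) are assumed.
    """
    z = u.mean(axis=(1, 2))                    # squeeze: global average pool
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation, formula (6)
    return u * s[:, None, None]                # rescale S_C * U_C, formula (7)
```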
Step 3.3.4: referring to fig. 2, the SPP module is improved. Specifically: the pooling layers with kernel sizes 1 x 1, 5 x 5, 9 x 9 and 13 x 13 in the SPP are changed into a 1*1 convolution and 3*3 hole (dilated) convolutions, so that the improved SPP module does not change the size of the feature map. The output feature map size is calculated by formula (8):
O = floor((n + 2p - d(k - 1) - 1) / s) + 1  (8)
wherein n is the input size, k the kernel size, s the stride, p the padding and d the dilation rate.
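The output-size formula above can be checked numerically. The sketch below shows that a 1*1 convolution, and a 3*3 hole convolution whose padding equals its dilation rate, both preserve a 13 x 13 feature map, consistent with the statement that the improved SPP module does not change the feature-map size. The function name is illustrative.

```python
def conv_out_size(n, k, s=1, p=0, d=1):
    """Output size of a convolution along one spatial dimension:
    floor((n + 2p - d*(k - 1) - 1) / s) + 1."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

# 1*1 convolution: no padding needed, size preserved.
# 3*3 hole convolution: choosing padding p = dilation d preserves size,
# since n + 2d - d*(3-1) - 1 + 1 = n.
```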
step 3.3.5: referring to fig. 2, an optimized SE-attention module is integrated, and an improved SPP module is added in the SE-attention to enhance the expression capability of feature information of a feature map input into the SE-attention, thereby achieving a better classification effect.
Referring to fig. 7, the invention embeds the LW-CBAM and SSE-Attention modules in two different regions of the network, the neck and the detection head, according to the new network prediction-scale structure, to enhance important channel and spatial features. Four embedding modes are verified experimentally, yielding the optimal MSFE-PANet network model and improving small-target detection performance.
Referring to fig. 8, after the optimal attention embedding, the detailed effect of the method of the invention on detecting overlapping, aggregated and occluded small targets against complex backgrounds is shown.
Step 4: in the model training scheme, a prediction-box rejection loss function RB_Loss is designed.
Specifically: the overlap degree IOU between the prior prediction boxes of two overlapping targets is taken as the value of the loss. The larger the overlap, the larger the loss; in the training phase, the back-propagation network is optimized along the gradient direction, separating the overlapping prior prediction boxes of the two targets. Combining this rejection loss function with the YOLOv4 model suits small-target detection in complex application scenes and effectively alleviates the mutual occlusion and overlap of targets in images. The loss is defined by formula (9), which expresses the matching between the prior prediction boxes of different targets.
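The rejection idea can be sketched as a plain IOU sum over pairs of prior prediction boxes. The exact form of the loss formula is not reproduced in the text, so the pairwise sum below is an illustrative assumption, as are the function names; a real implementation would compute it on tensors inside the training loop.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def rb_loss(prior_boxes):
    """Illustrative rejection loss: sum of pairwise IOUs between the
    prior prediction boxes of different targets. The larger the mutual
    overlap, the larger the loss, so minimising it along the gradient
    direction pushes overlapping boxes apart."""
    total = 0.0
    for i in range(len(prior_boxes)):
        for j in range(i + 1, len(prior_boxes)):
            total += iou(prior_boxes[i], prior_boxes[j])
    return total
```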
step 5: the training model, the network was trained on the visclone 2021 dataset with 200epochs, the experiment set the input picture Size to 416 x 416, the first 100epoch set Batch Size to 4, and the second 100epoch set Batch Size to 8.
Referring to fig. 6, the method of the invention was validated on the VisDrone2021 dataset and compared with the detection performance of the baseline network YOLOv4. By gradually adding the corresponding modules, such as the Cutout-DA data enhancement method, the attention modules and RB_Loss, the effectiveness of the method for detecting small targets in complex scenes is verified.
Referring to fig. 9, compared with other methods, the method of the invention accurately detects small targets that would otherwise be missed or falsely detected, and adapts to small-target detection tasks in complex scenes.
Claims (1)
1. A method for detecting small targets in complex scenes based on feature enhancement, comprising the following steps:
Step 1, data preparation: the dataset is derived from an aerial image;
step 2, data enhancement: partial data images are randomly selected from the dataset, and the targets in these images (both partially visible and fully visible) are randomly occluded at 0.2, 0.4, 0.6 and 0.8 of the target size, generating new occlusion data that are expanded into the VisDrone2021 dataset;
step 3, designing a multi-scale fused characteristic enhanced path aggregation network MSFE-PANet;
step 3.1: improving network prediction scale
Remove the prediction head YOLO head3 in YOLOv4 that is aimed at detecting large targets, while retaining the corresponding 13 x 13 feature map; at the same time, add to the prediction network a prediction head YOLO head0, generated from the shallow high-resolution 104 x 104 feature map, aimed at detecting small-scale targets, producing a new network prediction-scale structure;
step 3.2: feature layer fusion
On the new network prediction-scale structure, upsample the feature map extracted by each feature network layer by the corresponding factor, and add and fuse each with the first-layer feature map to obtain new feature maps;
step 3.3: an attention module;
step 4: designing a prediction frame rejection Loss function RB_Loss;
step 5: training a model;
the step 3.3 specifically comprises
Step 3.3.1: adding a CBAM attention module, as shown in formula (1):
F' = M_c(F) ⊗ F,  F'' = M_s(F') ⊗ F'  (1)
the calculation formula of the channel attention is (2):
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg)) + W_1(W_0(F_max)))  (2)
wherein σ is a Sigmoid activation function, and the MLP weights W_0 and W_1 are shared;
the calculation formula of the spatial attention is (3):
M_s(F) = σ(f^{7x7}([AvgPool(F); MaxPool(F)]))  (3)
wherein σ is a Sigmoid activation function and f^{7x7} denotes a filter of size 7 x 7;
step 3.3.2: improving the channel attention module of the CBAM;
step 3.3.3: introducing an SE-attention module;
step 3.3.4: improving the SPP module;
step 3.3.5: optimizing the SE-attention module;
said step 3.3.2, the calculation formula is defined as (4)
Step 3.3.3, given an input X with C_1 channels, a feature map U with C_2 channels is obtained through the series of convolution and pooling operations F_tr; F_sq is the feature compression operation, which compresses features along the spatial dimensions so that each two-dimensional feature channel becomes a single value; this is followed by the excitation operation F_ex, whose output weights are then multiplied back onto the previous features;
the compression is given by calculation formula (5):
Z_C = F_sq(U_C) = (1/(H x W)) Σ_i Σ_j U_C(i, j)  (5)
wherein U_C represents the C-th channel in the feature map and Z_C is the output of the compression operation; in formula (6), σ is a Sigmoid activation function, W_1 and W_2 are fully connected operations, and δ is a ReLU activation function; in formula (7), S_C is the C-th weight in S;
S = F_ex(Z, W) = σ(g(Z, W)) = σ(W_2 δ(W_1 Z))  (6)
F_scale(U_C, S_C) = S_C · U_C  (7);
the step 3.3.4 is specifically that
The pooling layers of the kernels with sizes 1, 5, 9 and 13 in the SPP are changed into a 1*1 convolution and 3*3 cavity (dilated) convolutions; the improved SPP module does not change the size of the feature map, and the output feature map size is calculated by formula (8):
O = floor((n + 2p - d(k - 1) - 1) / s) + 1  (8)
wherein n is the input size, k the kernel size, s the stride, p the padding and d the dilation rate;
The step 3.3.5 is specifically
Adding an improved SPP module into the SE-attention to obtain an SSE-attention module;
the step 4 is specifically that
Taking the degree of overlap IOU between the prior prediction boxes of two overlapping targets as the loss value, the back-propagation network is optimized along the gradient direction, separating the overlapping prior prediction boxes of the two targets; the loss is defined by formula (9), which expresses the matching between the prior prediction boxes of different targets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210780211.6A CN115063691B (en) | 2022-07-04 | 2022-07-04 | Feature enhancement-based small target detection method in complex scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115063691A CN115063691A (en) | 2022-09-16 |
CN115063691B (en) | 2024-04-12
Family
ID=83204087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210780211.6A Active CN115063691B (en) | 2022-07-04 | 2022-07-04 | Feature enhancement-based small target detection method in complex scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063691B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN114565900A (en) * | 2022-01-18 | 2022-05-31 | 广州软件应用技术研究院 | Target detection method based on improved YOLOv5 and binocular stereo vision |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11256960B2 (en) * | 2020-04-15 | 2022-02-22 | Adobe Inc. | Panoptic segmentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bolte et al. | Towards corner case detection for autonomous driving | |
CN107527352B (en) | Remote sensing ship target contour segmentation and detection method based on deep learning FCN network | |
CN105488517B (en) | A kind of vehicle brand type identifier method based on deep learning | |
CN111768388B (en) | Product surface defect detection method and system based on positive sample reference | |
CN111915530B (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
CN109522855B (en) | Low-resolution pedestrian detection method and system combining ResNet and SENet and storage medium | |
CN110222604B (en) | Target identification method and device based on shared convolutional neural network | |
CN111460914A (en) | Pedestrian re-identification method based on global and local fine-grained features | |
CN112215074A (en) | Real-time target identification and detection tracking system and method based on unmanned aerial vehicle vision | |
Cai et al. | MHA-Net: Multipath Hybrid Attention Network for building footprint extraction from high-resolution remote sensing imagery | |
CN112801027A (en) | Vehicle target detection method based on event camera | |
CN114972989A (en) | Single remote sensing image height information estimation method based on deep learning algorithm | |
CN116229452B (en) | Point cloud three-dimensional target detection method based on improved multi-scale feature fusion | |
CN117037004A (en) | Unmanned aerial vehicle image detection method based on multi-scale feature fusion and context enhancement | |
Wang et al. | Prohibited items detection in baggage security based on improved YOLOv5 | |
Zhang et al. | SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection | |
CN115063691B (en) | Feature enhancement-based small target detection method in complex scene | |
Tang et al. | HIC-YOLOv5: Improved YOLOv5 For Small Object Detection | |
Zhang et al. | Drone video object detection using convolutional neural networks with time domain motion features | |
CN115797684A (en) | Infrared small target detection method and system based on context information | |
Xie et al. | Pedestrian detection and location algorithm based on deep learning | |
CN116912670A (en) | Deep sea fish identification method based on improved YOLO model | |
Li et al. | ECA-YOLOv5: Multi scale infrared salient target detection algorithm based on anchor free network | |
CN112668495B (en) | Full-time space convolution module-based violent video detection algorithm | |
CN117765378B (en) | Method and device for detecting forbidden articles in complex environment with multi-scale feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |