CN112508014A - Improved YOLOv3 target detection method based on attention mechanism - Google Patents
- Publication number
- CN112508014A (application CN202011396416.1A)
- Authority
- CN
- China
- Prior art keywords
- attention mechanism
- feature
- channel
- target detection
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses an improved YOLOv3 target detection method based on an attention mechanism. An attention module, SKNet, is introduced into the backbone network Darknet-53 so that the convolution kernel size is adaptively adjusted according to the input and the network focuses on the region of interest; a spatial pyramid pooling module, SPP, is introduced at the top of the feature extraction network to enlarge the network's receptive field; and a channel attention module, SENet, is introduced into the feature fusion network to assign weights to channels and fully extract their effective feature information. Experiments show that, compared with the original YOLOv3 model, the method can effectively detect small targets, accelerate the convergence of training, and improve detection precision while the detection speed is not greatly affected.
Description
Technical Field
The invention relates to an improved YOLOv3 target detection method based on an attention mechanism, and belongs to the technical field of target detection in image processing.
Background
Target detection is a foundation of image understanding and computer vision, and underlies more complex, higher-level visual tasks such as segmentation, scene understanding, target tracking, image description, event detection and activity recognition. It is widely applied across artificial intelligence and information technology, including security, human-computer interaction, autonomous driving, robot vision, consumer electronics, content-based image retrieval, intelligent video surveillance and augmented reality.
Currently, deep-learning-based target detection algorithms can be roughly divided into two categories:
1. Two-stage algorithms: candidate regions are generated first and then classified with a CNN (the R-CNN series);
2. One-stage algorithms: the network is applied directly to the input image and outputs the classes and corresponding locations in one pass (the YOLO series).
Although the R-CNN series achieves higher accuracy, even Faster R-CNN reaches a detection speed of only about 5 FPS, whereas the YOLO series greatly improves detection speed while maintaining accuracy, so detection can run in real-time scenarios. The detection idea of YOLO differs from that of the R-CNN series: it treats target detection as a regression task. The YOLO network predicts target positions and probabilities directly from the complete image in a single pass, in an end-to-end structure.
YOLOv3 is one of the most widely applied target detection methods at present. It improves on YOLO so that the network performs better in small-target detection and detection precision without greatly affecting detection speed, still meeting real-time requirements. However, YOLOv3 still has the following problems: target localization accuracy is not high; training converges slowly; and the error rate on small targets is high.
Disclosure of Invention
The invention aims to provide an improved YOLOv3 target detection method based on an attention mechanism, which can effectively detect small targets, accelerate training convergence and improve detection precision without greatly affecting detection speed.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses an improved YOLOv3 target detection method based on an attention mechanism, which comprises the following steps:
s1: preprocessing an original image and resizing and normalizing it to 416 × 416 × 3 to obtain training samples;
s2: modifying the network structure of Darknet-53 by introducing an attention mechanism with adaptive convolution kernel size into each residual-layer Basic Block module;
s3: introducing a spatial pyramid pooling module SPP at the top of Darknet-53 to increase the receptive field of the feature extraction network;
s4: extracting image features with the improved Darknet-53 network, and routing feature maps of three scales from different depths of the network to the feature fusion branches;
s5: introducing a channel attention mechanism into the three feature fusion branches, assigning weights to the channels, and fully extracting their effective feature information;
s6: and finally, respectively predicting on the three branches to obtain a multi-scale target detection result.
As a further technical solution of the present invention, in step S1 the preprocessing comprises random rotation, horizontal flipping and normalization.
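These preprocessing steps can be sketched minimally in NumPy. This is an illustration under stated assumptions, not the patent's implementation: only the horizontal flip and standardization are shown (resizing to 416 × 416 and random rotation are assumed to be handled by an image library), and per-image channel statistics stand in for what would typically be dataset-wide mean and standard deviation.

```python
import numpy as np

def preprocess(img, rng):
    """Random horizontal flip (p = 0.5) followed by standardization.

    Per-image channel statistics are used only to keep this sketch
    self-contained; in practice dataset-wide mean/std would be typical.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                     # flip along the width axis
    img = img.astype(np.float32) / 255.0          # scale to [0, 1]
    mean = img.mean(axis=(0, 1), keepdims=True)   # per-channel mean
    std = img.std(axis=(0, 1), keepdims=True) + 1e-6
    return (img - mean) / std                     # zero mean, unit variance

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)
out = preprocess(sample, rng)
print(out.shape)   # (416, 416, 3)
```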
As a further technical solution of the present invention, in step S2, the method for introducing the attention mechanism of the adaptive convolution kernel size into the residual layer Basic Block module includes:
s21: inserting a Selective Kernel Networks (SKNet) module with adaptive convolution kernel size after the first 1 × 1 convolution layer;
s22: modifying the kernel size of the original second convolution layer from 3 × 3 to 1 × 1. The original 3 × 3 convolution is thus replaced by SKNet, with a 1 × 1 convolution inserted after it, so the modified residual structure resembles a bottleneck block.
As a further technical solution of the present invention, in the step S3, the method for introducing the spatial pyramid pooling module SPP at the top of the Darknet-53 includes:
s31: 4 branches are led out from the output of the last basic convolution module of Darknet-53;
s32: the first, second and third branches pass through maximum pooling layers a1, a2 and a3 respectively, where a1 has kernel size 5 and stride 1, a2 has kernel size 9 and stride 1, and a3 has kernel size 13 and stride 1; the last branch retains the original output features;
s33: splicing the outputs of the 4 branches on the channel dimension to obtain a new feature map;
s34: finally, passing the newly obtained feature map through a convolution layer to restore the channel count of the original features, keeping the input and output feature-map dimensions of the SPP module equal.
The SPP module is designed to be plug-and-play, so keeping the dimensions unchanged is important: it ensures the SPP can be inserted anywhere in the network without error.
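Steps S31–S34 can be sketched in NumPy as follows. This is a minimal illustration: the channel count (32) and the random matrix standing in for the learned 1 × 1 convolution of S34 are assumptions, not the patent's parameters. Because the 5/9/13 pools use stride 1 with "same" padding, all four branches keep the input's spatial size and can be concatenated along channels.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on a (C, H, W) array."""
    c, h, w = x.shape
    p = k // 2
    padded = np.full((c, h + 2 * p, w + 2 * p), -np.inf)
    padded[:, p:p + h, p:p + w] = x
    out = np.empty((c, h, w))
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp(x, w_1x1):
    """S31-S33: identity branch plus 5/9/13 stride-1 max pools, concatenated
    along channels; S34: a 1x1 convolution (a plain matrix multiply over the
    channel axis) restores the original channel count."""
    cat = np.concatenate([x] + [max_pool_same(x, k) for k in (5, 9, 13)], axis=0)
    c4, h, w = cat.shape
    return (w_1x1 @ cat.reshape(c4, h * w)).reshape(-1, h, w)

rng = np.random.default_rng(0)
c = 32                                   # illustrative channel count
x = rng.standard_normal((c, 13, 13))     # top-of-backbone feature map
w_1x1 = rng.standard_normal((c, 4 * c))  # stand-in for the learned 1x1 conv
y = spp(x, w_1x1)
print(y.shape)   # (32, 13, 13): same dimensions as the input, as required
```

The shape check at the end is exactly the plug-and-play property the text describes: input and output dimensions of the SPP module are equal.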
As a further technical solution of the present invention, in the step S5, the method for introducing the attention mechanism into the feature fusion branch includes:
The two branches with 8× and 16× down-sampling are selected; after the output of the up-sampling layer (Upsample) and the feature maps output by the two branches are concatenated and fused along the channel dimension, a channel attention module, Squeeze-and-Excitation Networks, is inserted. The 8× and 16× down-sampling branches correspond to feature maps of different sizes, and fusing them makes full use of multi-scale information, improving detection precision for targets at different scales. Because the multi-scale features are obtained by direct concatenation along the channel dimension, some channels may carry redundant information; assigning weights to the channels fully extracts their effective information and reduces redundancy.
Compared with the prior art, the invention has the following beneficial effects. An attention mechanism with adaptive convolution kernel size and a spatial pyramid pooling module (SPP) are introduced into the feature extraction network, so the receptive field can be adjusted adaptively to the size of the detected target; the network focuses better on the region of interest, target localization precision improves, and the detection error rate on small targets falls. A channel attention mechanism is introduced into the feature fusion branches, focusing on meaningful channel feature information in the input image and reducing the weight of redundant information. In addition, the invention accelerates model convergence during training and improves detection precision without greatly affecting detection speed. Experimental results show that, with only a slight increase in model parameters, precision on the VOC and COCO data sets improves significantly.
Drawings
FIG. 1 is a flow chart of the improved YOLOv3 target detection method based on attention mechanism of the present invention;
FIG. 2 is a diagram showing a comparison of the structure of residual modules of the present invention, wherein (A) is the original residual module and (B) is the residual module after the attention mechanism is introduced;
FIG. 3 is a diagram of a Selective Kernel Networks network architecture for use with the present invention;
FIG. 4 is a block SPP of the present invention incorporating spatial pyramid pooling;
FIG. 5 is a diagram of the feature fusion branch network structure after the attention mechanism is introduced;
FIG. 6 is a channel attention module for use with the present invention.
Detailed Description
The technical solution of the present invention will be further described with reference to the following detailed description and accompanying drawings.
Example 1: the specific embodiment discloses an improved YOLOv3 target detection method based on an attention mechanism, as shown in fig. 1 to 6, comprising the following steps:
s1: preprocessing an original image and resizing and normalizing it to 416 × 416 × 3 to obtain training samples, where the preprocessing comprises random rotation (-30° to 30°), horizontal flipping (with 50% probability) and standardization;
s2: the feature extraction network Darknet-53 consists of a large number of Basic Block residual modules and uses strided convolutions for down-sampling; the network structure of Darknet-53 is modified by introducing an attention mechanism with adaptive convolution kernel size into each residual-layer Basic Block module, so that the network can automatically adjust the size of its receptive field to the size of the detection target and focus better on the region of interest;
s3: introducing a spatial pyramid pooling module SPP at the top of Darknet-53 to increase the receptive field of the feature extraction network;
s4: extracting image features with the improved Darknet-53 network, and routing the feature maps at three different down-sampling scales (32×, 16× and 8×) to the feature fusion branches for detecting targets of different sizes respectively, so that the method performs well on targets at different scales;
s5: introducing a channel attention mechanism into the three feature fusion branches;
s6: finally, predicting on the three branches respectively, i.e. predicting the target positions and the class confidences, to obtain the multi-scale target detection results.
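For orientation, the three prediction scales implied by these steps on a 416 × 416 input can be computed directly. The figure of 3 anchor boxes per grid cell is the standard YOLOv3 configuration, assumed here for illustration rather than stated in the text.

```python
input_size = 416
strides = (32, 16, 8)                        # down-sampling factors of the three branches
grids = [input_size // s for s in strides]   # grid cells per side at each scale
print(grids)                                 # [13, 26, 52]

# Total predicted boxes across scales, assuming the standard 3 anchors per cell:
total_boxes = sum(3 * g * g for g in grids)
print(total_boxes)                           # 10647
```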
In step S2, the method for introducing the attention mechanism into the residual layer Basic Block module is as follows:
s21: inserting a Selective Kernel Networks (SKNet) module with adaptive convolution kernel size after the first 1 × 1 convolution layer, thereby introducing the attention mechanism; the structure of SKNet is shown in FIG. 3: the input first passes through convolution layers with different kernel sizes, the two outputs are added point-wise and globally average-pooled, the result passes through a fully connected layer, the obtained channel information is split by Softmax into two sub-vectors A and B, the sub-vectors are multiplied with the respective convolution outputs of the first step, and finally the two feature vectors are added point-wise to obtain the final output;
s22: modifying the kernel size of the original second convolution layer from 3 × 3 to 1 × 1, so that the modified residual layer resembles a BottleNeck module.
As shown in FIG. 2, FIG. 2(A) is the original residual module Basic Block, and FIG. 2(B) is the residual module after SKNet is introduced, which is similar to a BottleNeck module.
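The selection mechanism of SKNet described above (point-wise addition of the two branch outputs, global average pooling, a fully connected layer, a Softmax split into sub-vectors A and B, and weighted recombination) can be sketched in NumPy. The branch outputs u1 and u2 stand in for the two multi-kernel convolutions, and every weight matrix is an illustrative stand-in for a learned parameter, not the patent's implementation.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sk_select(u1, u2, w_fc, w_a, w_b):
    """Fuse two (C, H, W) branch outputs with channel-wise selection.

    w_fc: (C, d) reduction FC; w_a, w_b: (d, C) per-branch FCs.
    """
    s = (u1 + u2).mean(axis=(1, 2))          # point-wise add + global avg pool -> (C,)
    z = np.maximum(s @ w_fc, 0.0)            # fully connected layer + ReLU -> (d,)
    logits = np.stack([z @ w_a, z @ w_b])    # (2, C): one score vector per branch
    a, b = softmax(logits, axis=0)           # sub-vectors A and B; A + B = 1 per channel
    return u1 * a[:, None, None] + u2 * b[:, None, None]

rng = np.random.default_rng(0)
C, H, W, d = 8, 4, 4, 4
u1, u2 = rng.standard_normal((2, C, H, W))   # stand-ins for the two conv outputs
out = sk_select(u1, u2, rng.standard_normal((C, d)),
                rng.standard_normal((d, C)), rng.standard_normal((d, C)))
print(out.shape)   # (8, 4, 4)
```

Because the Softmax makes A and B sum to one per channel, the output is a channel-wise convex combination of the two branches, which is how the module "selects" between kernel sizes.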
In step S3, the method for introducing the spatial pyramid pooling module SPP at the top of the Darknet-53 is as follows:
s31: 4 branches are led out from the output of the last basic convolution module of Darknet-53;
s32: the first, second and third branches pass through maximum pooling layers a1, a2 and a3 respectively, where a1 has kernel size 5 and stride 1, a2 has kernel size 9 and stride 1, and a3 has kernel size 13 and stride 1; the last branch retains the original output features;
s33: splicing the outputs of the 4 branches on the channel dimension to obtain a new feature map;
s34: finally, the newly obtained feature graph passes through a convolution layer to obtain the channel number of the original feature;
the SPP network structure is shown in fig. 4.
In step S5, the network structure after the attention mechanism is introduced into the feature fusion branches is shown in FIG. 5: after the Upsample output and the feature map of the corresponding scale are concatenated (concat) and fused along the channel dimension, a channel attention module, Squeeze-and-Excitation Networks, is inserted to adjust the weight distribution over the channels, making the channel information after feature fusion more effective.
FIG. 6 shows the structure of the SENet block, which does not change the dimensions of the input feature map. First, global average pooling is applied to the feature map, reducing it to 1 × 1 × C and yielding the channel information; it then passes through a fully connected layer and a ReLU activation function; next, the channel weights are obtained through another fully connected layer and a Sigmoid function; finally, the weights assigned to the channels are multiplied with the input feature map to obtain the output features after channel attention.
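The SENet steps just described can be sketched in NumPy. This is a hedged illustration: the reduction ratio r = 4 and the random weight matrices are stand-ins for the two learned fully connected layers.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C, C//r) and w2: (C//r, C) stand in for the learned FC layers.
    The output has exactly the same dimensions as the input.
    """
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    z = np.maximum(s @ w1, 0.0)              # fully connected layer + ReLU
    w = 1.0 / (1.0 + np.exp(-(z @ w2)))      # FC + Sigmoid -> channel weights in (0, 1)
    return x * w[:, None, None]              # excite: rescale each channel

rng = np.random.default_rng(1)
C, r = 16, 4
x = rng.standard_normal((C, 8, 8))
y = se_block(x, rng.standard_normal((C, C // r)), rng.standard_normal((C // r, C)))
print(y.shape)   # (16, 8, 8): dimensions unchanged, as the text states
```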
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto; any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention falls within the scope of the present invention, which shall therefore be subject to the protection scope of the claims.
Claims (5)
1. An improved YOLOv3 target detection method based on an attention mechanism, characterized by comprising the following steps:
s1: preprocessing an original image and resizing and normalizing it to 416 × 416 × 3 to obtain training samples;
s2: modifying the network structure of Darknet-53 by introducing an attention mechanism with adaptive convolution kernel size into each residual-layer Basic Block module;
s3: introducing a spatial pyramid pooling module SPP at the top of Darknet-53 to increase the receptive field of the feature extraction network;
s4: extracting image features with the improved Darknet-53 network, and routing feature maps of three scales from different depths of the network to the feature fusion branches;
s5: introducing a channel attention mechanism into the three feature fusion branches, assigning weights to the channels, and fully extracting their effective feature information;
s6: and finally, respectively predicting on the three branches to obtain a multi-scale target detection result.
2. The improved YOLOv3 target detection method based on an attention mechanism as claimed in claim 1, wherein in step S1 the preprocessing comprises random rotation, horizontal flipping and normalization.
3. The improved YOLOv3 target detection method based on attention mechanism as claimed in claim 1, wherein in step S2, the method of introducing the attention mechanism of adaptive convolution kernel size in the residual layer Basic Block module is:
s21: inserting a Selective Kernel Networks (SKNet) module with adaptive convolution kernel size after the first 1 × 1 convolution layer;
s22: modifying the kernel size of the original second convolution layer from 3 × 3 to 1 × 1.
4. The improved YOLOv3 target detection method based on attention mechanism as claimed in claim 1, wherein in step S3, the method of introducing the spatial pyramid pooling module SPP at the top of the Darknet-53 is:
s31: 4 branches are led out from the output of the last basic convolution module of Darknet-53;
s32: the first, second and third branches pass through maximum pooling layers a1, a2 and a3 respectively, where a1 has kernel size 5 and stride 1, a2 has kernel size 9 and stride 1, and a3 has kernel size 13 and stride 1; the last branch retains the original output features;
s33: splicing the outputs of the 4 branches on the channel dimension to obtain a new feature map;
s34: and finally, passing the newly obtained feature diagram through a convolution layer to obtain the channel number of the original feature.
5. The improved YOLOv3 target detection method based on attention mechanism as claimed in claim 1, wherein in step S5, the method for introducing the channel attention mechanism in the feature fusion branch is:
the two branches with 8× and 16× down-sampling are selected; after the output of the up-sampling layer (Upsample) and the feature maps output by the two branches are concatenated and fused along the channel dimension, a channel attention module, Squeeze-and-Excitation Networks (SENet), is inserted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011396416.1A CN112508014A (en) | 2020-12-04 | 2020-12-04 | Improved YOLOv3 target detection method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112508014A true CN112508014A (en) | 2021-03-16 |
Family
ID=74969561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011396416.1A Pending CN112508014A (en) | 2020-12-04 | 2020-12-04 | Improved YOLOv3 target detection method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112508014A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990325A (en) * | 2021-03-24 | 2021-06-18 | 南通大学 | Light network construction method for embedded real-time visual target detection |
CN113111828A (en) * | 2021-04-23 | 2021-07-13 | 中国科学院宁波材料技术与工程研究所 | Three-dimensional defect detection method and system for bearing |
CN113223044A (en) * | 2021-04-21 | 2021-08-06 | 西北工业大学 | Infrared video target detection method combining feature aggregation and attention mechanism |
CN113378672A (en) * | 2021-05-31 | 2021-09-10 | 扬州大学 | Multi-target detection method for defects of power transmission line based on improved YOLOv3 |
CN113393438A (en) * | 2021-06-15 | 2021-09-14 | 哈尔滨理工大学 | Resin lens defect detection method based on convolutional neural network |
CN113837275A (en) * | 2021-09-24 | 2021-12-24 | 南京邮电大学 | Improved YOLOv3 target detection method based on expanded coordinate attention |
CN113902735A (en) * | 2021-09-13 | 2022-01-07 | 云南春芯科技有限公司 | Crop disease identification method and device, electronic equipment and storage medium |
CN114724022A (en) * | 2022-03-04 | 2022-07-08 | 大连海洋大学 | Culture fish school detection method, system and medium fusing SKNet and YOLOv5 |
CN114724022B (en) * | 2022-03-04 | 2024-05-10 | 大连海洋大学 | Method, system and medium for detecting farmed fish shoal by fusing SKNet and YOLOv5 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200057935A1 (en) * | 2017-03-23 | 2020-02-20 | Peking University Shenzhen Graduate School | Video action detection method based on convolutional neural network |
CN111079584A (en) * | 2019-12-03 | 2020-04-28 | 东华大学 | Rapid vehicle detection method based on improved YOLOv3 |
CN111814621A (en) * | 2020-06-29 | 2020-10-23 | 中国科学院合肥物质科学研究院 | Multi-scale vehicle and pedestrian detection method and device based on attention mechanism |
- 2020-12-04 CN CN202011396416.1A patent/CN112508014A/en active Pending
Non-Patent Citations (1)
Title |
---|
ASHERGAGA: "[Paper Interpretation] SKNet: adaptively adjusting the receptive field size", pages 1 - 8, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/80513438> *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990325A (en) * | 2021-03-24 | 2021-06-18 | 南通大学 | Light network construction method for embedded real-time visual target detection |
CN113223044A (en) * | 2021-04-21 | 2021-08-06 | 西北工业大学 | Infrared video target detection method combining feature aggregation and attention mechanism |
CN113111828A (en) * | 2021-04-23 | 2021-07-13 | 中国科学院宁波材料技术与工程研究所 | Three-dimensional defect detection method and system for bearing |
CN113378672A (en) * | 2021-05-31 | 2021-09-10 | 扬州大学 | Multi-target detection method for defects of power transmission line based on improved YOLOv3 |
CN113393438A (en) * | 2021-06-15 | 2021-09-14 | 哈尔滨理工大学 | Resin lens defect detection method based on convolutional neural network |
CN113393438B (en) * | 2021-06-15 | 2022-09-16 | 哈尔滨理工大学 | Resin lens defect detection method based on convolutional neural network |
CN113902735A (en) * | 2021-09-13 | 2022-01-07 | 云南春芯科技有限公司 | Crop disease identification method and device, electronic equipment and storage medium |
CN113837275A (en) * | 2021-09-24 | 2021-12-24 | 南京邮电大学 | Improved YOLOv3 target detection method based on expanded coordinate attention |
CN113837275B (en) * | 2021-09-24 | 2023-10-17 | 南京邮电大学 | Improved YOLOv3 target detection method based on expanded coordinate attention |
CN114724022A (en) * | 2022-03-04 | 2022-07-08 | 大连海洋大学 | Culture fish school detection method, system and medium fusing SKNet and YOLOv5 |
CN114724022B (en) * | 2022-03-04 | 2024-05-10 | 大连海洋大学 | Method, system and medium for detecting farmed fish shoal by fusing SKNet and YOLOv5 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112508014A (en) | Improved YOLOv3 target detection method based on attention mechanism | |
CN109344725B (en) | Multi-pedestrian online tracking method based on space-time attention mechanism | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN111079646A (en) | Method and system for positioning weak surveillance video time sequence action based on deep learning | |
CN113673510B (en) | Target detection method combining feature point and anchor frame joint prediction and regression | |
CN108520203B (en) | Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature | |
CN112434599B (en) | Pedestrian re-identification method based on random occlusion recovery of noise channel | |
CN112084911B (en) | Human face feature point positioning method and system based on global attention | |
CN112434723B (en) | Day/night image classification and object detection method based on attention network | |
CN111723660A (en) | Detection method for long ground target detection network | |
CN111259837A (en) | Pedestrian re-identification method and system based on part attention | |
CN113298817A (en) | High-accuracy semantic segmentation method for remote sensing image | |
CN113420827A (en) | Semantic segmentation network training and image semantic segmentation method, device and equipment | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
Kadim et al. | Deep-learning based single object tracker for night surveillance. | |
CN116740516A (en) | Target detection method and system based on multi-scale fusion feature extraction | |
CN117079095A (en) | Deep learning-based high-altitude parabolic detection method, system, medium and equipment | |
Rao et al. | Roads detection of aerial image with FCN-CRF model | |
CN114120076B (en) | Cross-view video gait recognition method based on gait motion estimation | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method | |
CN114782360A (en) | Real-time tomato posture detection method based on DCT-YOLOv5 model | |
CN114140524A (en) | Closed loop detection system and method for multi-scale feature fusion | |
Xia et al. | Multi-RPN Fusion-Based Sparse PCA-CNN Approach to Object Detection and Recognition for Robot-Aided Visual System | |
Han | Comparison on object detection algorithms: A taxonomy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||