CN112115977A - Target detection algorithm based on scale invariance and feature fusion - Google Patents

Target detection algorithm based on scale invariance and feature fusion

Info

Publication number
CN112115977A
CN112115977A
Authority
CN
China
Prior art keywords
feature
candidate
frame
fusion
feature maps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010856245.XA
Other languages
Chinese (zh)
Other versions
CN112115977B (en)
Inventor
Zhou Xuanhong
Li Ji
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202010856245.XA priority Critical patent/CN112115977B/en
Publication of CN112115977A publication Critical patent/CN112115977A/en
Application granted granted Critical
Publication of CN112115977B publication Critical patent/CN112115977B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

A target detection algorithm based on scale invariance and feature fusion adopts the following steps. Step one: an image to be detected is input into DetNet59 for feature extraction to obtain a plurality of feature maps. Step two: selective feature fusion is applied to the obtained feature maps to obtain a plurality of new feature maps with the same number of channels. Step three: candidate boxes are generated from the plurality of feature maps, and multiple rounds of selection, classification, and regression are performed on the candidate boxes.

Description

Target detection algorithm based on scale invariance and feature fusion
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection algorithm based on scale invariance and feature fusion.
Background
With the continuous development of deep learning technology, more and more target detection methods have become available. An image may contain a large number of targets, and classifying and detecting every one of them is difficult, especially for small targets; the detection of small targets is therefore a key topic in the field of target detection.
Target detection is a complex and important task with significant applications in military, medical, and everyday domains. Existing target detection techniques fall mainly into two types: traditional methods based on hand-crafted features, such as Haar features, the Adaboost algorithm, the SVM algorithm, and the DPM algorithm; and methods based on deep learning. Under deep learning, target detection is mainly divided into two tasks: one is bounding-box prediction, which marks the top, bottom, left, and right extent of each object; the other is class prediction, which predicts to which object each pixel belongs. Depending on how these steps are organized, target detection is further divided into two-stage detection and single-stage detection. Representative two-stage detectors are mainly the RCNN series, which first generate object candidate regions (region proposals) and then refine them. Representative single-stage detectors are mainly the YOLO and SSD series, which predict box positions directly through the network. In general, two-stage detection achieves higher precision than single-stage detection, while single-stage detection, though less precise, is faster while still maintaining acceptable accuracy. Both approaches, however, suffer from a scale problem: they rely on large down-sampling factors to obtain large receptive fields and richer semantic information, which benefits the recognition of large objects, but down-sampling inevitably loses spatial resolution, and the larger the down-sampling factor, the smaller the resolution and the harder small objects become to recognize. A common remedy for the scale variation caused by down-sampling is multi-scale feature fusion. FPN first used this approach, fusing high-level features into low-level features in a top-down manner so that the low-level features gain more semantic information. PANet then improved upon FPN by adding a bottom-up path, progressively down-sampling the low-level features to the resolution of the high-level features and fusing them, so that the high-level features also carry the spatial information of the low-level features. The drawback of this approach is that different layers are sensitive to different scales: even though the high-level features are fused with the spatial information of the low-level features, the low-level semantic information is carried along with it, which disturbs the trained high-level features and weakens their ability to classify and predict large objects.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a target detection algorithm based on scale invariance and feature fusion, which addresses the scale variation problem in existing target detection methods and improves the detection of both small and large targets. The specific technical scheme is as follows:
a target detection algorithm based on scale invariance and feature fusion adopts the following steps:
step one: inputting an image to be detected into DetNet59 for feature extraction to obtain a plurality of feature maps;
step two: applying selective feature fusion to the obtained feature maps to obtain a plurality of new feature maps with the same number of channels;
step three: generating candidate boxes from the plurality of feature maps and performing multiple rounds of selection, classification, and regression on the candidate boxes.
As an optimization: the DetNet59 is an improved DetNet59. The improved DetNet59 shares the same first through fifth stages as the original DetNet59, generating feature maps 1-5. From the fifth stage onward, the 5th feature map is split into three branches that generate feature maps 6-8: the 6th feature map keeps the same resolution as the 5th but uses dilated convolution to obtain a different receptive field, while the 7th and 8th feature maps reduce the resolution to increase semantic information and then use dilated convolution to further enlarge their receptive fields.
As an optimization: the second step, selecting the feature fusion mode, is specifically:
step 2.1: converting the 2nd to 8th feature maps into feature maps with 256 channels by convolution, whereby the 6th to 8th feature maps generate P6-P8;
step 2.2: up-sampling the 7th and 8th feature maps and fusing them, together with the 6th feature map, into the 5th feature map, and convolving the fusion result to generate P5;
step 2.3: up-sampling P5, fusing it into the 4th feature map, and convolving the fusion result to generate P4;
step 2.4: repeating step 2.3 until the 2nd feature map has been fused, thereby generating P3 and P2.
As an optimization: the third step is specifically:
step 3.1: generating a large number of anchors for the P2, P3, P4, P5, P6, P7, and P8 layers;
step 3.2: for the three layers P6, P7, and P8, screening the anchors and ground truths generated by each layer according to the function l_i ≤ √(wh) ≤ u_i, where l_i denotes the minimum scale value, u_i the maximum scale value, and w and h the width and height of the box; P6 retains only small anchors, P7 only medium anchors, and P8 only large anchors; then applying non-maximum suppression (NMS) with an IoU threshold of 0.5 to the anchors to generate the first-part candidate boxes, and classifying and box-regressing the first-part candidate boxes; the IoU value is the intersection of two prediction boxes divided by their union; NMS compares all boxes pairwise, and if the overlap of two boxes exceeds the IoU threshold, the box with the highest score is kept and the other is deleted, yielding the first-part candidate boxes; P6 regresses loss only against small ground truths, P7 only against medium ground truths, and P8 only against large ground truths (an illustrative IoU/NMS sketch is given after these steps);
step 3.3: after the regressed first-part candidate boxes are obtained, applying NMS with a threshold of 0.6 to generate the second-part candidate boxes, and then classifying and box-regressing the second-part candidate boxes;
step 3.4: after the regressed second-part candidate boxes are obtained, applying NMS with a threshold of 0.7 to generate the final candidate boxes, and then classifying and box-regressing the final candidate boxes.
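For illustration only, the IoU computation and greedy NMS procedure described in steps 3.2-3.4 can be sketched as follows (a minimal NumPy sketch; the box format [x1, y1, x2, y2] and the function names are illustrative assumptions, not part of the invention):

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; IoU = intersection area / union area.
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold):
    # Greedy NMS: keep the highest-scoring box, delete boxes that overlap it
    # by more than the threshold, and repeat on the remaining boxes.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = np.array([j for j in rest if iou(boxes[best], boxes[j]) <= iou_threshold])
    return keep
```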
As an optimization: classifying the candidate boxes comprises:
mapping the features corresponding to the candidate boxes to the (0, 1) interval using the softmax function, the features corresponding to n categories, where n is an integer greater than 1, and taking the category with the highest probability as the predicted category;
$$S_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
where S_i denotes the predicted probability of the class, z_i denotes the prediction score for the class, and Σ_j e^{z_j} denotes the sum of the exponentiated scores over all classes.
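For illustration, the softmax mapping above can be computed as follows (a minimal NumPy sketch; the three score values are made up for the example):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

scores = np.array([2.0, 0.5, -1.0])   # hypothetical prediction scores for n = 3 classes
probs = softmax(scores)               # each value lies in (0, 1) and the values sum to 1
predicted_class = int(np.argmax(probs))
```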
As an optimization: the regression of the final candidate boxes comprises:
using the DIoU loss function for regression, which takes into account the scale, the overlap ratio, and the distance between the candidate box and the target;
$$L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}$$
where IoU denotes the intersection-over-union of the target box and the candidate box, b denotes the center point of the candidate box, b^{gt} denotes the center point of the target box, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing region that contains both the candidate box and the target box.
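A minimal sketch of the DIoU loss defined above, assuming axis-aligned boxes given as [x1, y1, x2, y2] lists (the helper name and the small epsilon terms are illustrative additions):

```python
def diou_loss(box, box_gt):
    # IoU term
    x1 = max(box[0], box_gt[0]); y1 = max(box[1], box_gt[1])
    x2 = min(box[2], box_gt[2]); y2 = min(box[3], box_gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    iou = inter / (area + area_gt - inter + 1e-9)
    # Squared Euclidean distance between the two box centers (rho^2)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_gt, cy_gt = (box_gt[0] + box_gt[2]) / 2, (box_gt[1] + box_gt[3]) / 2
    rho2 = (cx - cx_gt) ** 2 + (cy - cy_gt) ** 2
    # Squared diagonal of the smallest region enclosing both boxes (c^2)
    ex1 = min(box[0], box_gt[0]); ey1 = min(box[1], box_gt[1])
    ex2 = max(box[2], box_gt[2]); ey2 = max(box[3], box_gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return 1.0 - iou + rho2 / c2
```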
The invention has the following beneficial effects: the image is input into a deep neural network for feature extraction to obtain feature maps with scale invariance; the feature maps are screened and used to generate candidate boxes; non-maximum suppression with an IoU threshold of 0.5 is applied to the candidate boxes to select the first-part candidate boxes, which are classified and regressed to obtain new candidate boxes; non-maximum suppression with an IoU threshold of 0.6 is then applied to obtain the second-part candidate boxes, which are again classified and regressed to obtain new candidate boxes; finally, non-maximum suppression with an IoU threshold of 0.7 is applied to obtain the final candidate boxes.
Drawings
FIG. 1 is a flow chart of a target detection algorithm based on scale invariance and feature fusion in accordance with the present invention;
FIG. 2 is a diagram of the multi-branch DetNet network architecture in the present invention;
FIG. 3 is a diagram of the selective fusion architecture in the present invention;
FIG. 4 is a diagram of the predicted object sizes for each branch in the present invention;
FIG. 5 is a diagram of the multiple classification and regression of candidate boxes in the present invention;
FIG. 6 is a diagram of a network architecture according to the present invention;
Detailed Description
The following detailed description of the preferred embodiments of the invention, taken in conjunction with the accompanying drawings, is provided so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be more clearly defined.
The hardware used by the invention comprises one PC and one NVIDIA 1080Ti graphics card;
As shown in fig. 1, a target detection algorithm based on scale invariance and feature fusion comprises the following steps:
s1, inputting the image to be detected into an improved detnet59 for feature extraction to obtain a plurality of feature maps;
s2, carrying out a mode of selecting and fusing the characteristics on the obtained characteristic graphs to obtain a plurality of new characteristic graphs with the same channel;
and S3, generating a candidate frame by using the plurality of feature maps, and performing multiple selection classification and regression on the candidate frame.
Improved DetNet59 network architecture:
Referring to fig. 2, the improved DetNet59 network structure applies a convolution with stride 2 to the input picture at each stage, producing four feature maps C1, C2, C3, and C4 of different sizes. Thirty-six dilated convolutions are then used, and a feature map is taken after every nine dilated convolution layers, giving C5, C6, C7, and C8. C5 uses dilated convolution with a dilation rate of 2; C6 applies dilated convolution with a dilation rate of 2 on top of C5, obtaining a receptive field different from C5. C7 is also built on C5: C5 is first convolved with a stride of 2 so that the image size is reduced, and the reduced map is then convolved with a dilation rate of 2, giving a receptive field different from C6. C8 is likewise based on C5: C5 is first convolved with a stride of 2 to reduce the image size, and the reduced map is then convolved with a dilation rate of 2 to obtain a receptive field different from that of C6 or C7.
Selective fusion:
Referring to fig. 3, using the feature maps {C2, C3, C4, C5, C6, C7, C8} extracted in the first step, a 1 × 1 convolution with 256 output channels is applied to every feature map to generate {C2_reduced, C3_reduced, C4_reduced, C5_reduced, P6, P7, P8}. {P7, P8} are up-sampled by bilinear interpolation to become {P7_upsampled, P8_upsampled}; C5_reduced and {P7_upsampled, P8_upsampled, C6_reduced} are fused by element-wise addition to generate P5_clustered, and P5_clustered is convolved with a 3 × 3 convolution of 256 channels to obtain P5. P5 is up-sampled by bilinear interpolation to become P5_upsampled; C4_reduced and P5_upsampled are fused by element-wise addition to generate P4_clustered, and P4_clustered is convolved with a 3 × 3 convolution of 256 channels to obtain P4.
In the same way, P4 is up-sampled by bilinear interpolation to become P4_upsampled; C3_reduced and P4_upsampled are fused by element-wise addition to generate P3_clustered, and P3_clustered is convolved with a 3 × 3 convolution of 256 channels to obtain P3. P3 is up-sampled by bilinear interpolation to become P3_upsampled; C2_reduced and P3_upsampled are fused by element-wise addition to generate P2_clustered, and P2_clustered is convolved with a 3 × 3 convolution of 256 channels to obtain P2.
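The selective fusion described above can be sketched in PyTorch roughly as follows (1 × 1 lateral convolutions, bilinear up-sampling, element-wise addition, then a 3 × 3 smoothing convolution; the module name, variable names, and the channel counts in the usage line are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveFusion(nn.Module):
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions bring every input map to 256 channels.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions smooth each fused map (used for P5..P2).
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in range(4))

    def forward(self, c2, c3, c4, c5, c6, c7, c8):
        r2, r3, r4, r5, p6, p7, p8 = [lat(x) for lat, x in
                                      zip(self.lateral, (c2, c3, c4, c5, c6, c7, c8))]
        up = lambda x, ref: F.interpolate(x, size=ref.shape[-2:], mode="bilinear",
                                          align_corners=False)
        # P5: fuse C5_reduced with the reduced C6 and the up-sampled P7, P8.
        p5 = self.smooth[0](r5 + p6 + up(p7, r5) + up(p8, r5))
        # Top-down path: P4, P3, P2 repeat the add-then-smooth pattern.
        p4 = self.smooth[1](r4 + up(p5, r4))
        p3 = self.smooth[2](r3 + up(p4, r3))
        p2 = self.smooth[3](r2 + up(p3, r2))
        return p2, p3, p4, p5, p6, p7, p8

# Hypothetical usage with assumed channel counts for the seven input maps:
fusion = SelectiveFusion(in_channels=[256, 512, 1024, 256, 256, 256, 256])
```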
prediction of anchor:
referring to FIG. 4, { P6, P7, P8} is fed into the RPN network, for which anchors and ground nodes are generated according to
Figure BDA0002646494990000061
The function is screened, and P6 is only retained in li,uiAt [0,90 ]]The range anchors, P7 remain only at li,uiAt [30,160 ]]The range anchors, P8 remain only at li,uiAt [90, ∞ ]]Anchors within the range. The anchors with corresponding sizes are predicted respectively. And { P2, P3, P4, P5} predicts the anchors of all scales
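The scale screening rule l_i ≤ √(wh) ≤ u_i can be illustrated as follows (a sketch assuming boxes as (x1, y1, x2, y2) tuples; the range values follow the description above, while the function name is an assumption):

```python
import math

# Valid scale range [l_i, u_i] for each high-level branch, as described above.
SCALE_RANGES = {"P6": (0, 90), "P7": (30, 160), "P8": (90, float("inf"))}

def keep_for_branch(box, branch):
    # A box is kept by a branch only if sqrt(w * h) falls inside its range.
    w = box[2] - box[0]
    h = box[3] - box[1]
    scale = math.sqrt(max(w, 0.0) * max(h, 0.0))
    lo, hi = SCALE_RANGES[branch]
    return lo <= scale <= hi

# Example: a 64 x 64 anchor (scale 64) is kept by P6 and P7 but not by P8.
print([b for b in SCALE_RANGES if keep_for_branch((0, 0, 64, 64), b)])
```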
Multiple classification and regression of candidate boxes:
Referring to fig. 5, non-maximum suppression (NMS) with an IoU threshold of 0.5 is used to generate the first-part candidate boxes, which are then classified and box-regressed. After the regressed candidate boxes are obtained, NMS with an IoU threshold of 0.6 is used to generate the second-part candidate boxes, which are classified and box-regressed. After those regressed candidate boxes are obtained, NMS with an IoU threshold of 0.7 is used to generate the final candidate boxes, which are classified and box-regressed. All classifications use the softmax function, and all regressions use the DIoU loss function.
FIG. 6 is a block diagram of the overall network used in this patent.
Training the target detection network:
A pre-trained image model is loaded, the parameters of the feature-extraction part of the network are frozen, only the remaining layers of the network are trained, and the next stage of training is carried out once the best result is reached.
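For illustration, freezing the feature-extraction parameters while training the rest of the network might look like the following PyTorch sketch (the stand-in `Detector` module, layer sizes, and optimizer settings are assumptions, not the patent's configuration):

```python
import torch
import torch.nn as nn

class Detector(nn.Module):
    """Stand-in detector: a 'backbone' feature extractor plus a trainable head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, 256, 3, padding=1)

    def forward(self, x):
        return self.head(self.backbone(x))

detector = Detector()
# In practice the backbone weights would come from a pre-trained image model, e.g.
# detector.backbone.load_state_dict(torch.load("pretrained_backbone.pth")).

for p in detector.backbone.parameters():
    p.requires_grad = False        # freeze the feature-extraction part

trainable = [p for p in detector.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```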

Claims (6)

1. A target detection algorithm based on scale invariance and feature fusion, characterized by comprising the following steps:
step one: inputting an image to be detected into DetNet59 for feature extraction to obtain a plurality of feature maps;
step two: applying selective feature fusion to the obtained feature maps to obtain a plurality of new feature maps with the same number of channels;
step three: generating candidate boxes from the plurality of feature maps and performing multiple rounds of selection, classification, and regression on the candidate boxes.
2. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein: the DetNet59 is an improved DetNet59; the improved DetNet59 shares the same first through fifth stages as the original DetNet59, generating feature maps 1-5; from the fifth stage onward, the 5th feature map is split into three branches that generate feature maps 6-8, where the 6th feature map keeps the same resolution as the 5th but uses dilated convolution to obtain a different receptive field, while the 7th and 8th feature maps reduce the resolution to increase semantic information and then use dilated convolution to further enlarge their receptive fields.
3. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein the second step, selecting the feature fusion mode, is specifically:
step 2.1: converting the 2nd to 8th feature maps into feature maps with 256 channels by convolution, whereby the 6th to 8th feature maps generate P6-P8;
step 2.2: up-sampling the 7th and 8th feature maps and fusing them, together with the 6th feature map, into the 5th feature map, and convolving the fusion result to generate P5;
step 2.3: up-sampling P5, fusing it into the 4th feature map, and convolving the fusion result to generate P4;
step 2.4: repeating step 2.3 until the 2nd feature map has been fused, thereby generating P3 and P2.
4. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein the third step is specifically:
step 3.1: generating a large number of anchors for the P2, P3, P4, P5, P6, P7, and P8 layers;
step 3.2: for the three layers P6, P7, and P8, screening the anchors and ground truths generated by each layer according to the function l_i ≤ √(wh) ≤ u_i, where l_i denotes the minimum scale value, u_i the maximum scale value, and w and h the width and height of the box; P6 retains only small anchors, P7 only medium anchors, and P8 only large anchors; then applying non-maximum suppression (NMS) with an IoU threshold of 0.5 to the anchors to generate the first-part candidate boxes, and classifying and box-regressing the first-part candidate boxes; the IoU value is the intersection of two prediction boxes divided by their union; NMS compares all boxes pairwise, and if the overlap of two boxes exceeds the IoU threshold, the box with the highest score is kept and the other is deleted, yielding the first-part candidate boxes; P6 regresses loss only against small ground truths, P7 only against medium ground truths, and P8 only against large ground truths;
step 3.3: after the regressed first-part candidate boxes are obtained, applying NMS with a threshold of 0.6 to generate the second-part candidate boxes, and then classifying and box-regressing the second-part candidate boxes;
step 3.4: after the regressed second-part candidate boxes are obtained, applying NMS with a threshold of 0.7 to generate the final candidate boxes, and then classifying and box-regressing the final candidate boxes.
5. The target detection algorithm based on scale invariance and feature fusion of claim 4, wherein classifying the candidate boxes comprises:
mapping the features corresponding to the candidate boxes to the (0, 1) interval using the softmax function, the features corresponding to n categories, where n is an integer greater than 1, and taking the category with the highest probability as the predicted category;
$$S_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
where S_i denotes the predicted probability of the class, z_i denotes the prediction score for the class, and Σ_j e^{z_j} denotes the sum of the exponentiated scores over all classes.
6. The target detection algorithm based on scale invariance and feature fusion of claim 4, wherein the regression of the final candidate boxes comprises:
using the DIoU loss function for regression, which takes into account the scale, the overlap ratio, and the distance between the candidate box and the target;
$$L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}$$
where IoU denotes the intersection-over-union of the target box and the candidate box, b denotes the center point of the candidate box, b^{gt} denotes the center point of the target box, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing region that contains both the candidate box and the target box.
CN202010856245.XA 2020-08-24 2020-08-24 Target detection algorithm based on scale invariance and feature fusion Active CN112115977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010856245.XA CN112115977B (en) 2020-08-24 2020-08-24 Target detection algorithm based on scale invariance and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010856245.XA CN112115977B (en) 2020-08-24 2020-08-24 Target detection algorithm based on scale invariance and feature fusion

Publications (2)

Publication Number Publication Date
CN112115977A (en) 2020-12-22
CN112115977B (en) 2024-04-02

Family

ID=73805356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010856245.XA Active CN112115977B (en) 2020-08-24 2020-08-24 Target detection algorithm based on scale invariance and feature fusion

Country Status (1)

Country Link
CN (1) CN112115977B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345472A (en) * 2018-09-11 2019-02-15 重庆大学 A kind of infrared moving small target detection method of complex scene
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109871806A (en) * 2019-02-21 2019-06-11 山东大学 Landform recognition methods and system based on depth residual texture network
CN110689044A (en) * 2019-08-22 2020-01-14 湖南四灵电子科技有限公司 Target detection method and system combining relationship between targets
CN110929578A (en) * 2019-10-25 2020-03-27 南京航空航天大学 Anti-blocking pedestrian detection method based on attention mechanism
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111241905A (en) * 2019-11-21 2020-06-05 南京工程学院 Power transmission line nest detection method based on improved SSD algorithm
CN111292305A (en) * 2020-01-22 2020-06-16 重庆大学 Improved YOLO-V3 metal processing surface defect detection method
CN111310756A (en) * 2020-01-20 2020-06-19 陕西师范大学 Damaged corn particle detection and classification method based on deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109345472A (en) * 2018-09-11 2019-02-15 重庆大学 A kind of infrared moving small target detection method of complex scene
CN109871806A (en) * 2019-02-21 2019-06-11 山东大学 Landform recognition methods and system based on depth residual texture network
CN110689044A (en) * 2019-08-22 2020-01-14 湖南四灵电子科技有限公司 Target detection method and system combining relationship between targets
CN110929578A (en) * 2019-10-25 2020-03-27 南京航空航天大学 Anti-blocking pedestrian detection method based on attention mechanism
CN111241905A (en) * 2019-11-21 2020-06-05 南京工程学院 Power transmission line nest detection method based on improved SSD algorithm
CN111027547A (en) * 2019-12-06 2020-04-17 南京大学 Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111310756A (en) * 2020-01-20 2020-06-19 陕西师范大学 Damaged corn particle detection and classification method based on deep learning
CN111292305A (en) * 2020-01-22 2020-06-16 重庆大学 Improved YOLO-V3 metal processing surface defect detection method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MIAOHUI ZHANG et al.: "Adaptive Anchor Networks for Multi-Scale Object Detection in Remote Sensing Images", IEEE Access, vol. 8, pages 57552-57565, XP011781249, DOI: 10.1109/ACCESS.2020.2982658 *
ZEMING LI et al.: "DetNet: A Backbone network for Object Detection", arXiv:1804.06215v2, pages 1-17 *
DING YAO: "Aerial target detection and recognition based on a fusion mechanism", China Master's Theses Full-text Database (Information Science and Technology), no. 07, pages 138-572 *
LIU HUI et al.: "Infrared target tracking algorithm based on multi-feature fusion and ROI prediction", Acta Photonica Sinica, vol. 48, no. 07, pages 108-123 *
LI JI et al.: "Object detection algorithm based on scale invariance and feature fusion", Journal of Nanjing University (Natural Science), vol. 57, no. 02, pages 237-244 *

Also Published As

Publication number Publication date
CN112115977B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN107341517B (en) Multi-scale small object detection method based on deep learning inter-level feature fusion
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN112507777A (en) Optical remote sensing image ship detection and segmentation method based on deep learning
US20060165258A1 (en) Tracking objects in videos with adaptive classifiers
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
Yang et al. Real-time pedestrian and vehicle detection for autonomous driving
CN111368769A (en) Ship multi-target detection method based on improved anchor point frame generation model
CN111274981B (en) Target detection network construction method and device and target detection method
CN111461145B (en) Method for detecting target based on convolutional neural network
CN111460980A (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
CN112232371A (en) American license plate recognition method based on YOLOv3 and text recognition
CN111340039A (en) Target detection method based on feature selection
CN110084284A (en) Target detection and secondary classification algorithm and device based on region convolutional neural networks
CN113313706A (en) Power equipment defect image detection method based on detection reference point offset analysis
CN111462090B (en) Multi-scale image target detection method
CN113297959A (en) Target tracking method and system based on corner attention twin network
CN111368845A (en) Feature dictionary construction and image segmentation method based on deep learning
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
CN111931572B (en) Target detection method for remote sensing image
CN114037839A (en) Small target identification method, system, electronic equipment and medium
CN113963272A (en) Unmanned aerial vehicle image target detection method based on improved yolov3
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
CN112115977A (en) Target detection algorithm based on scale invariance and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant