CN112115977A - Target detection algorithm based on scale invariance and feature fusion - Google Patents
- Publication number
- CN112115977A (application CN202010856245.XA)
- Authority
- CN
- China
- Prior art keywords
- feature
- candidate
- frame
- fusion
- feature maps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
A target detection algorithm based on scale invariance and feature fusion comprises the following steps. Step one: input the image to be detected into DetNet59 for feature extraction to obtain several feature maps. Step two: apply selective feature fusion to the obtained feature maps to obtain several new feature maps with the same number of channels. Step three: generate candidate boxes from the feature maps and perform multiple rounds of selection, classification, and regression on the candidate boxes.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection algorithm based on scale invariance and feature fusion.
Background
With the continuous development of deep-learning technology, more and more target detection methods have emerged. An image may contain a large number of targets, and classifying and detecting each one is difficult, especially for small targets; small-target detection is therefore a key topic in the field of target detection.
Target detection is a complex and important task with major applications in military, medical, and everyday settings. Existing target detection techniques fall into two main types: first, traditional methods based on hand-crafted features, such as Haar features, the AdaBoost algorithm, the SVM algorithm, and the DPM algorithm; second, methods based on deep-learning technology. Under deep learning, target detection mainly consists of two tasks: one is box prediction, marking the top, bottom, left, and right extent of each object; the other is class prediction, determining the category of each object. According to their pipelines, detectors are divided into two-stage and single-stage detection. Representative two-stage work is the R-CNN series, which generates object candidate regions (region proposals) and then refines them. Representative single-stage work is the YOLO and SSD series, which predict box positions directly with the network. In general, two-stage detection is more accurate than single-stage detection, while single-stage detection, though less accurate, is faster while still maintaining reasonable accuracy. However, both approaches suffer from a scale problem: they rely on large downsampling factors to obtain large receptive fields and rich semantic information, which benefits large-object recognition, but downsampling inevitably loses spatial resolution, and the larger the downsampling factor, the smaller the resolution and the harder small objects are to recognize.
To solve the scale-variation problem caused by downsampling, a common method is multi-scale feature fusion. FPN first used this method, fusing high-level features into low-level features through a top-down pathway so that the low-level features gain more semantic information. PANet then improved on FPN by adding a bottom-up pathway that progressively downsamples the low-level features to the resolution of the high-level features and fuses them, so that the high-level features also carry the spatial information of the low-level features. The drawback of this approach is that different layers are sensitive to different scales: even though the high-level features gain the spatial information of the low-level features, they also absorb low-level semantic information, which disturbs the trained high-level features and weakens their ability to classify and predict large objects.
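As a hedged illustration of the top-down idea described above (a minimal sketch of the generic FPN fusion step, not the patent's exact implementation), the core operation — upsample the semantically strong coarse map to the resolution of the spatially precise fine map, then add element-wise — can be written in a few lines of NumPy:

```python
import numpy as np

def upsample2x(feat):
    # Nearest-neighbor 2x upsampling along both spatial axes.
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def top_down_fuse(fine, coarse):
    # FPN-style fusion: bring the coarse (high-level, semantic) map up to
    # the resolution of the fine (low-level, spatial) map, then add.
    return fine + upsample2x(coarse)

fine = np.zeros((8, 8))    # low-level map: high resolution, weak semantics
coarse = np.ones((4, 4))   # high-level map: low resolution, strong semantics
fused = top_down_fuse(fine, coarse)
```

A real FPN additionally applies a 1×1 lateral convolution before the addition and a 3×3 convolution after it; those are omitted here for brevity.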
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a target detection algorithm based on scale invariance and feature fusion that alleviates the scale-variation problem of existing target detection methods and improves the detection of both small and large targets. The specific technical scheme is as follows:
A target detection algorithm based on scale invariance and feature fusion adopts the following steps:
Step one: input the image to be detected into DetNet59 for feature extraction to obtain several feature maps;
Step two: apply selective feature fusion to the obtained feature maps to obtain several new feature maps with the same number of channels;
Step three: generate candidate boxes from the feature maps and perform multiple rounds of selection, classification, and regression on the candidate boxes.
As an optimization: the DetNet59 is an improved DetNet59. The improved DetNet59 shares steps one to five with the original DetNet59, generating feature maps 1-5. From step five onward, the 5th feature map is split into three branches that generate feature maps 6-8: the 6th feature map keeps the same resolution as the 5th while obtaining a different receptive field through dilated convolution; the 7th and 8th feature maps reduce the resolution to increase semantic information, and their receptive fields are then enlarged with dilated convolution.
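Why dilated convolution enlarges the receptive field without reducing resolution follows from the standard effective-kernel-size formula; the small calculation below is an assumption of that general formula, not a value taken from the patent:

```python
def effective_kernel(k, d):
    # Effective kernel size of a k x k convolution with dilation rate d:
    # the k taps span d*(k-1)+1 input positions.
    return d * (k - 1) + 1

# A 3x3 convolution with dilation 2 covers the same span as a 5x5 kernel,
# with no extra parameters and no loss of spatial resolution.
span_plain = effective_kernel(3, 1)    # ordinary 3x3 convolution
span_dilated = effective_kernel(3, 2)  # dilated 3x3 convolution
```

Stacking such layers, as the branches above do, keeps growing the receptive field while the feature-map size stays fixed.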
As an optimization: the second step, selective feature fusion, is specifically:
Step 2.1: convert the 2nd to 8th feature maps into 256-channel feature maps by convolution, where the converted 6th to 8th maps become P6-P8;
Step 2.2: upsample the 7th and 8th feature maps and fuse them, together with the 6th feature map, into feature map 5; after fusion, convolve the fusion result to generate P5;
Step 2.3: upsample P5 and fuse it into feature map 4; after fusion, convolve the fusion result to generate P4;
Step 2.4: repeat step 2.3 down to feature map 2, generating P3 and P2.
As an optimization: the third step is specifically:
Step 3.1: generate a large number of anchors for the P2, P3, P4, P5, P6, P7, and P8 layers;
Step 3.2: for the three layers P6, P7, and P8, screen the anchors and ground truths they generate by the condition l_i ≤ √(wh) ≤ u_i, where l_i is the minimum scale, u_i is the maximum scale, and w and h are the width and height of the box; P6 retains only small anchors, P7 only medium anchors, and P8 only large anchors. Then apply non-maximum suppression (NMS) with an IoU threshold of 0.5 to the anchors to generate the first set of candidate boxes, and classify and box-regress them. The IoU of two boxes is the area of their intersection divided by the area of their union. NMS compares all boxes pairwise: if the IoU of two boxes exceeds the set threshold, the box with the higher score is kept and the other is deleted. This yields the first set of candidate boxes. P6 regresses its loss only against small ground truths, P7 only against medium ground truths, and P8 only against large ground truths;
Step 3.3: after obtaining the regressed first set of candidate boxes, apply NMS with a threshold of 0.6 to generate the second set of candidate boxes, and classify and box-regress them;
Step 3.4: after obtaining the regressed second set of candidate boxes, apply NMS with a threshold of 0.7 to generate the final candidate boxes, and classify and box-regress them.
As an optimization: classifying the candidate boxes comprises:
mapping the features corresponding to the candidate boxes onto the (0, 1) interval with a softmax function over n categories, where n is an integer greater than 1, and taking the category with the highest probability as the predicted category:
S_i = e^{z_i} / Σ_j e^{z_j},
where S_i is the probability of the class, z_i is the prediction score of the class, and Σ_j e^{z_j} is the sum of the exponentiated scores of all classes.
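The softmax mapping can be sketched as below; subtracting the maximum score before exponentiating is a standard numerical-stability trick, not part of the patent text:

```python
import math

def softmax(scores):
    # Map raw class scores onto (0, 1) so the outputs sum to 1.
    m = max(scores)                              # numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))  # index of the highest-probability class
```

The shift by `m` leaves the result unchanged because it multiplies numerator and denominator by the same factor e^{-m}.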
As an optimization: the regression of the final candidate boxes comprises:
the regression uses the DIoU loss function, which accounts for the scale, overlap, and center distance between the candidate box and the target:
L_DIoU = 1 − IoU + ρ²(b, b^gt) / c²,
where IoU is the intersection-over-union of the target box and the candidate box, b is the center point of the candidate box, b^gt is the center point of the target box, ρ is the Euclidean distance between the two center points, and c is the diagonal length of the smallest enclosing region that contains both the candidate box and the target box.
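Under the formula above — the standard DIoU loss, assumed here to match the patent's intent — the computation can be sketched as:

```python
def diou_loss(p, g):
    # p = predicted box, g = ground-truth box, both as (x1, y1, x2, y2).
    # IoU term.
    ix1, iy1 = max(p[0], g[0]), max(p[1], g[1])
    ix2, iy2 = min(p[2], g[2]), min(p[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    iou = inter / union
    # Squared distance between box centers (rho^2 term).
    rho2 = (((p[0] + p[2]) - (g[0] + g[2])) / 2) ** 2 \
         + (((p[1] + p[3]) - (g[1] + g[3])) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2 term).
    cw = max(p[2], g[2]) - min(p[0], g[0])
    ch = max(p[3], g[3]) - min(p[1], g[1])
    c2 = cw ** 2 + ch ** 2
    return 1.0 - iou + rho2 / c2

loss_same = diou_loss((0, 0, 2, 2), (0, 0, 2, 2))  # identical boxes -> 0
```

Unlike a plain IoU loss, the ρ²/c² penalty stays informative even when the boxes do not overlap, which is why DIoU converges faster in regression.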
The beneficial effects of the invention are: the image is fed into a deep neural network for feature extraction to obtain feature maps with scale invariance; after screening, candidate boxes are generated from the feature maps; non-maximum suppression with an IoU threshold of 0.5 selects the first set of candidate boxes, which are classified and regressed into new candidate boxes; NMS with a threshold of 0.6 then selects the second set of candidate boxes, which are again classified and regressed; finally, NMS with a threshold of 0.7 yields the final candidate boxes.
Drawings
FIG. 1 is a flow chart of a target detection algorithm based on scale invariance and feature fusion in accordance with the present invention;
FIG. 2 is a diagram of a multi-drop detnet network architecture in accordance with the present invention;
FIG. 3 is a diagram of an alternative fusion architecture in accordance with the present invention;
FIG. 4 is a graph of predicted object size for each branch in the present invention;
FIG. 5 is a diagram of multiple classification and regression of candidate frames in accordance with the present invention;
FIG. 6 is a diagram of a network architecture according to the present invention;
Detailed Description
The preferred embodiments of the invention are described in detail below with reference to the accompanying drawings, so that its advantages and features can be more easily understood by those skilled in the art and the scope of protection of the invention is clearly defined.
The hardware used by the invention comprises one PC and one NVIDIA 1080 Ti graphics card;
as shown in fig. 1: a target detection algorithm based on scale invariance and feature fusion comprises the following steps:
s1, inputting the image to be detected into an improved detnet59 for feature extraction to obtain a plurality of feature maps;
s2, carrying out a mode of selecting and fusing the characteristics on the obtained characteristic graphs to obtain a plurality of new characteristic graphs with the same channel;
and S3, generating a candidate frame by using the plurality of feature maps, and performing multiple selection classification and regression on the candidate frame.
Improved detnet59 network architecture:
referring to fig. 2, the improved detnet59 network structure used, using a convolution operation with step size 2 each time on the input pictures, produces 4 layers of feature maps C2, C2, C3, C4 of different sizes. 36 expansion convolutions are used, and a characteristic diagram is taken after every 9 expansion convolution layers and is divided into C5, C6, C7 and C8. And C5 uses the dilation convolution with dilation rate of 2, and C6 uses the dilation convolution with dilation rate of 2 on the basis of C5, so that a receptive field different from C5 is obtained. C7 is a method in which C5 is first convolved at a step size of 2 so that the image size is reduced in addition to C5, and the reduced image is then convolved with a dilation rate of 2 so that a receptive field different from C6 is obtained. C8 is also based on C5, C5 is first convolved with a step size of 2 to reduce the image size, and the reduced image is then convolved with a dilation rate of 2 to obtain a receptive field different from that of C6 or C7.
Selective fusion:
Referring to fig. 3, from the feature maps {C2, C3, C4, C5, C6, C7, C8} extracted in the first step, 1×1 convolutions with 256 output channels generate {C2_reduced, C3_reduced, C4_reduced, C5_reduced, P6, P7, P8}. {P7, P8} are bilinearly interpolated into {P7_upsampled, P8_upsampled}; C5_reduced is fused with {P7_upsampled, P8_upsampled, C6_reduced} by element-wise addition to generate P5_fused, and P5_fused is convolved with a 3×3, 256-channel convolution to obtain P5. P5 is bilinearly interpolated into P5_upsampled; C4_reduced and P5_upsampled are fused by addition to generate P4_fused, and P4_fused is convolved with a 3×3, 256-channel convolution to obtain P4.
In the same way, P4 is bilinearly interpolated into P4_upsampled; C3_reduced and P4_upsampled are fused by addition to generate P3_fused, which is convolved with a 3×3, 256-channel convolution to obtain P3. P3 is bilinearly interpolated into P3_upsampled; C2_reduced and P3_upsampled are fused by addition to generate P2_fused, which is convolved with a 3×3, 256-channel convolution to obtain P2.
prediction of anchor:
referring to FIG. 4, { P6, P7, P8} is fed into the RPN network, for which anchors and ground nodes are generated according toThe function is screened, and P6 is only retained in li,uiAt [0,90 ]]The range anchors, P7 remain only at li,uiAt [30,160 ]]The range anchors, P8 remain only at li,uiAt [90, ∞ ]]Anchors within the range. The anchors with corresponding sizes are predicted respectively. And { P2, P3, P4, P5} predicts the anchors of all scales
Multiple classification and regression of candidate frames:
referring to fig. 5, NMS non-maximum with threshold IoU of 0.5 is used to suppress the generation of candidate boxes for the first portion, and then the candidate boxes for the first portion are classified and bounding box regressed. And after the regressed candidate frames are obtained, inhibiting and generating the candidate frames of the second part by using the NMS non-maximum value with the threshold value of IoU being 0.6, and classifying and frame regressing the candidate frames of the second part. And after the regressed candidate frames are obtained, inhibiting and generating the candidate frames of the final part by using the NMS non-maximum value with the threshold value of IoU being 0.7, and classifying and performing border regression on the candidate frames of the final part. All classifications use the softmax function and all regressions are DIoU loss functions.
FIG. 6 is a block diagram of the overall network used in this patent.
Training the target detection network:
A pre-trained image model is loaded and the parameters of the feature-extraction part of the network are frozen; only the remaining part of the network is trained. Once the best result is reached, the next stage of training is carried out.
Claims (6)
1. A target detection algorithm based on scale invariance and feature fusion, characterized by comprising the following steps:
Step one: input the image to be detected into DetNet59 for feature extraction to obtain several feature maps;
Step two: apply selective feature fusion to the obtained feature maps to obtain several new feature maps with the same number of channels;
Step three: generate candidate boxes from the feature maps and perform multiple rounds of selection, classification, and regression on the candidate boxes.
2. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein: the DetNet59 is an improved DetNet59 that shares steps one to five with the original DetNet59, generating feature maps 1-5; from step five onward, the 5th feature map is split into three branches that generate feature maps 6-8, where the 6th feature map keeps the same resolution as the 5th while obtaining a different receptive field through dilated convolution, and the 7th and 8th feature maps reduce the resolution to increase semantic information and then have their receptive fields enlarged by dilated convolution.
3. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein the second step, selective feature fusion, is specifically:
Step 2.1: convert the 2nd to 8th feature maps into 256-channel feature maps by convolution, where the converted 6th to 8th maps become P6-P8;
Step 2.2: upsample the 7th and 8th feature maps and fuse them, together with the 6th feature map, into feature map 5; after fusion, convolve the fusion result to generate P5;
Step 2.3: upsample P5 and fuse it into feature map 4; after fusion, convolve the fusion result to generate P4;
Step 2.4: repeat step 2.3 down to feature map 2, generating P3 and P2.
4. The target detection algorithm based on scale invariance and feature fusion of claim 1, wherein the third step is specifically:
Step 3.1: generate a large number of anchors for the P2, P3, P4, P5, P6, P7, and P8 layers;
Step 3.2: for the three layers P6, P7, and P8, screen the anchors and ground truths they generate by the condition l_i ≤ √(wh) ≤ u_i, where l_i is the minimum scale, u_i is the maximum scale, and w and h are the width and height of the box; P6 retains only small anchors, P7 only medium anchors, and P8 only large anchors. Then apply non-maximum suppression (NMS) with an IoU threshold of 0.5 to the anchors to generate the first set of candidate boxes, and classify and box-regress them. The IoU of two boxes is the area of their intersection divided by the area of their union. NMS compares all boxes pairwise: if the IoU of two boxes exceeds the set threshold, the box with the higher score is kept and the other is deleted. This yields the first set of candidate boxes. P6 regresses its loss only against small ground truths, P7 only against medium ground truths, and P8 only against large ground truths;
Step 3.3: after obtaining the regressed first set of candidate boxes, apply NMS with a threshold of 0.6 to generate the second set of candidate boxes, and classify and box-regress them;
Step 3.4: after obtaining the regressed second set of candidate boxes, apply NMS with a threshold of 0.7 to generate the final candidate boxes, and classify and box-regress them.
5. The target detection algorithm based on scale invariance and feature fusion of claim 4, wherein classifying the candidate boxes comprises:
mapping the features corresponding to the candidate boxes onto the (0, 1) interval with a softmax function over n categories, where n is an integer greater than 1, and taking the category with the highest probability as the predicted category:
S_i = e^{z_i} / Σ_j e^{z_j},
where S_i is the probability of the class, z_i is the prediction score of the class, and Σ_j e^{z_j} is the sum of the exponentiated scores of all classes.
6. The target detection algorithm based on scale invariance and feature fusion of claim 4, wherein the regression of the final candidate boxes comprises:
the regression uses the DIoU loss function, which accounts for the scale, overlap, and center distance between the candidate box and the target:
L_DIoU = 1 − IoU + ρ²(b, b^gt) / c²,
where IoU is the intersection-over-union of the target box and the candidate box, b is the center point of the candidate box, b^gt is the center point of the target box, ρ is the Euclidean distance between the two center points, and c is the diagonal length of the smallest enclosing region containing both the candidate box and the target box.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010856245.XA CN112115977B (en) | 2020-08-24 | 2020-08-24 | Target detection algorithm based on scale invariance and feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010856245.XA CN112115977B (en) | 2020-08-24 | 2020-08-24 | Target detection algorithm based on scale invariance and feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112115977A true CN112115977A (en) | 2020-12-22 |
CN112115977B CN112115977B (en) | 2024-04-02 |
Family
ID=73805356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010856245.XA Active CN112115977B (en) | 2020-08-24 | 2020-08-24 | Target detection algorithm based on scale invariance and feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112115977B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109345472A (en) * | 2018-09-11 | 2019-02-15 | 重庆大学 | A kind of infrared moving small target detection method of complex scene |
US20190057507A1 (en) * | 2017-08-18 | 2019-02-21 | Samsung Electronics Co., Ltd. | System and method for semantic segmentation of images |
CN109871806A (en) * | 2019-02-21 | 2019-06-11 | 山东大学 | Landform recognition methods and system based on depth residual texture network |
CN110689044A (en) * | 2019-08-22 | 2020-01-14 | 湖南四灵电子科技有限公司 | Target detection method and system combining relationship between targets |
CN110929578A (en) * | 2019-10-25 | 2020-03-27 | 南京航空航天大学 | Anti-blocking pedestrian detection method based on attention mechanism |
CN111027547A (en) * | 2019-12-06 | 2020-04-17 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image |
CN111241905A (en) * | 2019-11-21 | 2020-06-05 | 南京工程学院 | Power transmission line nest detection method based on improved SSD algorithm |
CN111292305A (en) * | 2020-01-22 | 2020-06-16 | 重庆大学 | Improved YOLO-V3 metal processing surface defect detection method |
CN111310756A (en) * | 2020-01-20 | 2020-06-19 | 陕西师范大学 | Damaged corn particle detection and classification method based on deep learning |
- 2020-08-24: CN application CN202010856245.XA, patent CN112115977B, status Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190057507A1 (en) * | 2017-08-18 | 2019-02-21 | Samsung Electronics Co., Ltd. | System and method for semantic segmentation of images |
CN109345472A (en) * | 2018-09-11 | 2019-02-15 | 重庆大学 | A kind of infrared moving small target detection method of complex scene |
CN109871806A (en) * | 2019-02-21 | 2019-06-11 | 山东大学 | Landform recognition methods and system based on depth residual texture network |
CN110689044A (en) * | 2019-08-22 | 2020-01-14 | 湖南四灵电子科技有限公司 | Target detection method and system combining relationship between targets |
CN110929578A (en) * | 2019-10-25 | 2020-03-27 | 南京航空航天大学 | Anti-blocking pedestrian detection method based on attention mechanism |
CN111241905A (en) * | 2019-11-21 | 2020-06-05 | 南京工程学院 | Power transmission line nest detection method based on improved SSD algorithm |
CN111027547A (en) * | 2019-12-06 | 2020-04-17 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image |
CN111310756A (en) * | 2020-01-20 | 2020-06-19 | 陕西师范大学 | Damaged corn particle detection and classification method based on deep learning |
CN111292305A (en) * | 2020-01-22 | 2020-06-16 | 重庆大学 | Improved YOLO-V3 metal processing surface defect detection method |
Non-Patent Citations (5)
Title |
---|
MIAOHUI ZHANG等: "Adaptive Anchor Networks for Multi-Scale Object Detection in Remote Sensing Images", 《IEEE ACCESS》, vol. 8, pages 57552 - 57565, XP011781249, DOI: 10.1109/ACCESS.2020.2982658 * |
ZEMING LI等: "DetNet: A Backbone network for Object Detection", 《ARXIV:1804.06215V2》, pages 1 - 17 * |
DING YAO: "Aerial Target Detection and Recognition Based on a Fusion Mechanism", China Masters' Theses Full-text Database (Information Science and Technology), no. 07, pages 138 - 572 *
LIU HUI et al.: "Infrared Target Tracking Algorithm Based on Multi-Feature Fusion and ROI Prediction", Acta Photonica Sinica, vol. 48, no. 07, pages 108 - 123 *
LI JI et al.: "Target Detection Algorithm Based on Scale Invariance and Feature Fusion", Journal of Nanjing University (Natural Science), vol. 57, no. 02, pages 237 - 244 *
Also Published As
Publication number | Publication date |
---|---|
CN112115977B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107341517B (en) | Multi-scale small object detection method based on deep learning inter-level feature fusion | |
CN111027493B (en) | Pedestrian detection method based on deep learning multi-network soft fusion | |
CN112507777A (en) | Optical remote sensing image ship detection and segmentation method based on deep learning | |
US20060165258A1 (en) | Tracking objects in videos with adaptive classifiers | |
CN111738055B (en) | Multi-category text detection system and bill form detection method based on same | |
CN113139543B (en) | Training method of target object detection model, target object detection method and equipment | |
Yang et al. | Real-time pedestrian and vehicle detection for autonomous driving | |
CN111368769A (en) | Ship multi-target detection method based on improved anchor point frame generation model | |
CN111274981B (en) | Target detection network construction method and device and target detection method | |
CN111461145B (en) | Method for detecting target based on convolutional neural network | |
CN111460980A (en) | Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion | |
CN113255837A (en) | Improved CenterNet network-based target detection method in industrial environment | |
CN112232371A (en) | American license plate recognition method based on YOLOv3 and text recognition | |
CN111340039A (en) | Target detection method based on feature selection | |
CN110084284A (en) | Target detection and secondary classification algorithm and device based on region convolutional neural networks | |
CN113313706A (en) | Power equipment defect image detection method based on detection reference point offset analysis | |
CN111462090B (en) | Multi-scale image target detection method | |
CN113297959A (en) | Target tracking method and system based on corner attention twin network | |
CN111368845A (en) | Feature dictionary construction and image segmentation method based on deep learning | |
CN114332921A (en) | Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network | |
CN111931572B (en) | Target detection method for remote sensing image | |
CN114037839A (en) | Small target identification method, system, electronic equipment and medium | |
CN113963272A (en) | Unmanned aerial vehicle image target detection method based on improved yolov3 | |
CN116758340A (en) | Small target detection method based on super-resolution feature pyramid and attention mechanism | |
CN112115977A (en) | Target detection algorithm based on scale invariance and feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||