CN110782420A - Small target feature representation enhancement method based on deep learning - Google Patents
Small target feature representation enhancement method based on deep learning
- Publication number
- CN110782420A CN110782420A CN201910886472.4A CN201910886472A CN110782420A CN 110782420 A CN110782420 A CN 110782420A CN 201910886472 A CN201910886472 A CN 201910886472A CN 110782420 A CN110782420 A CN 110782420A
- Authority
- CN
- China
- Prior art keywords
- feature map
- characteristic diagram
- feature
- size
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a small target feature representation enhancement method based on deep learning, comprising the following steps: step 1, pre-train a Faster R-CNN neural network model on a very large-scale database containing more than 14 million images covering 20,000 categories; step 2, read the input image data; step 3, generate feature maps with a convolutional neural network and build a feature map spatial pyramid; step 4, obtain feature map weights from an attention mechanism module; step 5, fuse the feature maps from different layers according to the obtained weights; step 6, detect and localize targets on the feature map; and step 7, repeat steps 3 to 6 for the specified task and continue training the neural network model until the network reaches an optimum. The method strengthens the influence of salient features and effectively combines deep semantic features with shallow high-resolution convolutional neural network features, thereby improving overall target detection accuracy.
Description
Technical Field
The invention relates to a target detection method, and in particular to a small target feature representation enhancement method based on deep learning, belonging to the technical field of computer vision and image processing.
Background
Object detection, one of the fundamental problems of computer vision, underpins many other computer vision tasks such as instance segmentation, image captioning, and object tracking. From an application standpoint, object detection divides into two research topics, "generic object detection" and "specific object detection": the former explores methods for detecting different types of objects under a unified framework so as to simulate human vision and cognition, while the latter covers detection in specific application scenarios, such as pedestrian detection, face detection, and text detection. In recent years, the rapid development of deep learning technology has reinvigorated object detection and produced significant breakthroughs, pushing it to an unprecedented research hotspot. Object detection is now widely applied in fields such as autonomous driving, robot vision, and video surveillance.
Disclosure of Invention
The invention provides a small target feature representation enhancement method based on deep learning, mainly intended to resolve the conflict between detail information and abstract semantic information inherent in single-layer convolutional features.
To achieve the above technical objectives, the present invention adopts the following technical solutions:
A small target feature representation enhancement method based on deep learning is realized by the following steps:
Step (1): pre-train a Faster R-CNN neural network model on a very large-scale database containing more than 14 million images covering 20,000 categories;
Step (2): read the input image data;
Step (3): generate feature maps with a convolutional neural network and build a feature map spatial pyramid;
Step (4): obtain feature map weights from the attention mechanism module;
Step (5): fuse the feature maps from different layers according to the obtained weights;
Step (6): detect and localize targets on the feature map;
Step (7): repeat steps (3) to (6) for the specific task, continuing to train the neural network model until the network reaches an optimum.
Further, the deep convolutional neural network framework adopted in step (1) is Faster R-CNN, wherein:
Faster R-CNN accepts input images of arbitrary size; images are rescaled to a normalized size before entering the network, for example so that the short edge is at most 600 pixels and the long edge at most 1000 pixels. We may assume M × N = 1000 × 600 (smaller images are zero-padded at the edges, i.e. the image acquires black borders).
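The normalization just described can be sketched as follows. This is a minimal, dependency-light illustration: the function name `normalize_scale`, the downscale-only policy, and the nearest-neighbour resampling are assumptions for the sketch, not details taken from the patent.

```python
import numpy as np

def normalize_scale(img, short_max=600, long_max=1000):
    """Rescale so the short edge is at most `short_max` and the long edge
    at most `long_max`, then zero-pad to a fixed canvas (black borders)."""
    h, w = img.shape[:2]
    scale = min(short_max / min(h, w), long_max / max(h, w))
    scale = min(scale, 1.0)  # downscale only; small images keep their size
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index sampling (keeps the sketch numpy-only).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Orient the canvas so the short edge gets short_max and the long edge long_max.
    out_h, out_w = (short_max, long_max) if h <= w else (long_max, short_max)
    padded = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    padded[:new_h, :new_w] = resized  # zero padding = black border
    return padded
```

A 300 × 400 image, for instance, is left at its original size and placed on a 600 × 1000 zero-filled canvas, matching the "complement with 0" behaviour described above.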
Further, the Faster R-CNN network framework includes:
13 convolution (conv) layers: kernel_size = 3, pad = 1, stride = 1;
using the convolution output-size formula:
output = (input − kernel_size + 2 × pad) / stride + 1
where kernel_size indicates a 3 × 3 convolution kernel, pad indicates 1 pixel of edge padding, and stride indicates the kernel moves 1 pixel at a time. The formula shows that the conv layers do not change the image size, that is, the output image has the same size as the input image;
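The size bookkeeping above can be checked with two small helper functions (the names `conv_output_size` and `pool_output_size` are illustrative, not from the patent):

```python
def conv_output_size(n, kernel_size=3, pad=1, stride=1):
    """Standard convolution size formula: (n - k + 2p) // s + 1."""
    return (n - kernel_size + 2 * pad) // stride + 1

def pool_output_size(n, kernel_size=2, stride=2):
    """Pooling uses the same formula without padding."""
    return (n - kernel_size) // stride + 1
```

With kernel_size = 3, pad = 1, stride = 1 the convolution leaves the size unchanged, and each 2 × 2 / stride-2 pooling halves it, so four poolings reduce the input by a factor of 16 (with integer flooring).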
Further, Faster R-CNN also includes 13 activation (ReLU) layers: the activation function does not change the image size;
Further, Faster R-CNN also includes 4 pooling layers: kernel_size = 2, stride = 2; each pooling layer halves the spatial size of its input;
Further, after feature extraction the image size becomes (M/16) × (N/16), i.e. approximately 60 × 40 (1000/16 ≈ 60, 600/16 ≈ 40); the feature map is 60 × 40 × 512, meaning its spatial size is 60 × 40 with 512 channels;
Further, the construction in step (3) of a feature map spatial pyramid from the feature maps generated by the convolutional neural network is realized as follows:
The feature activations output by the last convolutional layer of each stage are used. The final outputs of conv2, conv3, conv4 and conv5 are denoted {C2, C3, C4, C5}; they have strides of {4, 8, 16, 32} relative to the input image. Conv1 is not incorporated into the pyramid because of its large memory footprint. The lower-resolution feature map is then upsampled by a factor of 2 so that the laterally connected bottom-up and top-down feature maps have the same size. Before fusion, an attention mechanism module automatically learns the weights of the feature maps at different scales, and fusion is then carried out. This process iterates until the final feature maps are generated. The final set of feature maps is called {P2, P3, P4, P5}, corresponding to {C2, C3, C4, C5} respectively.
Further, the step (4) obtains the feature graph weight from the attention mechanism module, and is specifically realized as follows:
the characteristic diagram A is subjected to matrix multiplication with the transposed AT of the characteristic diagram, because the characteristic diagram has channel dimensions, each pixel and every other pixel are equivalently subjected to point multiplication, the point multiplication geometrical meaning of the vectors is to calculate the similarity of two vectors, and the more similar the two vectors are, the larger the point multiplication is. And multiplying the characteristic diagram transpose matrix and the characteristic diagram matrix, and then normalizing by softmax to obtain the attention weight wi. The attention weight wi is multiplied by the transpose of the feature graph through a matrix, the correlation information is redistributed to the original feature graph, and the fusion mode is expressed by a formula as follows:
wi=softmax(matmul(Ai,AiT))
wherein matmul represents the matrix product and the softmax function represents the value that maps the matrix product to (0, 1);
Further, step (5) fuses feature maps from different layers according to the obtained attention weight w_i; the fusion formula is:
E_i = w_i · A_i + A_i
where E_i denotes the i-th new feature map, w_i the attention weight of the i-th layer, and A_i the i-th original feature map;
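Steps (4) and (5) together can be sketched in a few lines. Flattening A_i into a (H·W) × C pixel matrix, so that the product measures pixel-to-pixel similarity, is an assumed reading of the description; the patent does not spell out the exact shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(a):
    """w_i = softmax(matmul(A_i, A_i^T)), then E_i = w_i * A_i + A_i.
    `a` is a (C, H, W) feature map; one row of `flat` per pixel."""
    c, h, w = a.shape
    flat = a.reshape(c, h * w).T        # (H*W, C): pixel vectors
    wi = softmax(flat @ flat.T)         # (H*W, H*W) pixel-similarity weights
    e = wi @ flat + flat                # redistribute correlated context, residual add
    return e.T.reshape(c, h, w), wi
```

Each row of `wi` sums to 1, so the first term reweights every pixel by its similar pixels while the residual `+ A_i` preserves the original features.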
Further, the detection and localization on the feature map in step (6) is realized as follows: for the features extracted from the fused feature map, a classifier determines whether they belong to a specific class, and a localizer then refines the position of each candidate box assigned to that class.
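The classify-then-refine logic of step (6) can be illustrated with hypothetical linear heads. The weight matrices `cls_w` and `box_w` and the (dx, dy, dw, dh) delta parameterization follow the common R-CNN convention and are assumptions for the sketch, not details from the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_box(box, deltas):
    """Refine a candidate box (x1, y1, x2, y2) with regression deltas
    (dx, dy, dw, dh), the standard R-CNN box parameterization."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    cx, cy = cx + deltas[0] * w, cy + deltas[1] * h
    w, h = w * np.exp(deltas[2]), h * np.exp(deltas[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

def detect(features, cls_w, box_w, boxes, score_thresh=0.5):
    """Classifier scores each pooled feature vector; the localizer then
    refines the box of the best non-background class (class 0 = background)."""
    scores = softmax(features @ cls_w)          # (N, num_classes)
    detections = []
    for f, s, b in zip(features, scores, boxes):
        cls = int(s[1:].argmax()) + 1           # best foreground class
        if s[cls] >= score_thresh:
            detections.append((cls, float(s[cls]), decode_box(b, f @ box_w)))
    return detections
```

With zero deltas the candidate box is returned unchanged, which makes the refinement step easy to sanity-check.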
The invention has the following advantages:
The method uses deep learning to detect image content, automatically learning the characteristics of the target classes without manual intervention, and exhibits good robustness and adaptability for detecting small targets during recognition, classification and localization.
The method strengthens the influence of salient features and effectively combines deep semantic features with shallow high-resolution convolutional neural network features, thereby improving overall target detection accuracy.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention;
FIG. 2 is a diagram of a convolutional neural network architecture used in the present invention;
FIG. 3 is a diagram of a weight assignment method used by the present invention;
FIG. 4 is an image to be detected;
FIG. 5 is the same image after detection using the present invention.
Detailed Description
The accompanying drawings disclose, without limitation, a flow chart of a preferred embodiment of the invention; the technical solution of the invention is described in detail below with reference to the drawings.
As shown in FIG. 1, FIG. 2 and FIG. 3, a small target feature representation enhancement method based on deep learning is implemented by the following steps:
Step (1): pre-train a Faster R-CNN neural network model on a very large-scale database containing more than 14 million images covering 20,000 categories;
Step (2): read the input image data;
Step (3): generate feature maps with a convolutional neural network and build a feature map spatial pyramid;
Step (4): obtain feature map weights from the attention mechanism module;
Step (5): fuse the feature maps from different layers according to the obtained weights;
Step (6): detect and localize targets on the feature map;
Step (7): repeat steps (3) to (6) for the specific task, continuing to train the neural network model until the network reaches an optimum.
Further, the deep convolutional neural network framework adopted in step (1) is Faster R-CNN, wherein:
Faster R-CNN accepts input images of arbitrary size; images are rescaled to a normalized size before entering the network, for example so that the short edge is at most 600 pixels and the long edge at most 1000 pixels. We may assume M × N = 1000 × 600 (smaller images are zero-padded at the edges, i.e. the image acquires black borders).
Further, the Faster R-CNN network framework includes:
13 convolution (conv) layers: kernel_size = 3, pad = 1, stride = 1;
using the convolution output-size formula:
output = (input − kernel_size + 2 × pad) / stride + 1
where kernel_size indicates a 3 × 3 convolution kernel, pad indicates 1 pixel of edge padding, and stride indicates the kernel moves 1 pixel at a time. The formula shows that the conv layers do not change the image size, that is, the output image has the same size as the input image;
Further, Faster R-CNN also includes 13 activation (ReLU) layers: the activation function does not change the image size;
Further, Faster R-CNN also includes 4 pooling layers: kernel_size = 2, stride = 2; each pooling layer halves the spatial size of its input;
Further, after feature extraction the image size becomes (M/16) × (N/16), i.e. approximately 60 × 40 (1000/16 ≈ 60, 600/16 ≈ 40); the feature map is 60 × 40 × 512, meaning its spatial size is 60 × 40 with 512 channels;
Further, the construction in step (3) of a feature map spatial pyramid from the feature maps generated by the convolutional neural network is realized as follows:
The feature activations output by the last convolutional layer of each stage are used. The final outputs of conv2, conv3, conv4 and conv5 are denoted {C2, C3, C4, C5}; they have strides of {4, 8, 16, 32} relative to the input image. Conv1 is not incorporated into the pyramid because of its large memory footprint. The lower-resolution feature map is then upsampled by a factor of 2 so that the laterally connected bottom-up and top-down feature maps have the same size. Before fusion, an attention mechanism module automatically learns the weights of the feature maps at different scales, and fusion is then carried out. This process iterates until the final feature maps are generated. The final set of feature maps is called {P2, P3, P4, P5}, corresponding to {C2, C3, C4, C5} respectively.
Further, the step (4) obtains the feature graph weight from the attention mechanism module, and is specifically realized as follows:
the characteristic diagram A is subjected to matrix multiplication with the transposed AT of the characteristic diagram, because the characteristic diagram has channel dimensions, each pixel and every other pixel are equivalently subjected to point multiplication, the point multiplication geometrical meaning of the vectors is to calculate the similarity of two vectors, and the more similar the two vectors are, the larger the point multiplication is. And multiplying the characteristic diagram transpose matrix and the characteristic diagram matrix, and then normalizing by softmax to obtain the attention weight wi. The attention weight wi is multiplied by the transpose of the feature graph through a matrix, the correlation information is redistributed to the original feature graph, and the fusion mode is expressed by a formula as follows:
wi=softmax(matmul(Ai,AiT))
wherein matmul represents the matrix product and the softmax function represents the value that maps the matrix product to (0, 1);
Further, step (5) fuses feature maps from different layers according to the obtained attention weight w_i; the fusion formula is:
E_i = w_i · A_i + A_i
where E_i denotes the i-th new feature map, w_i the attention weight of the i-th layer, and A_i the i-th original feature map;
Further, the detection and localization on the feature map in step (6) is realized as follows: for the features extracted from the fused feature map, a classifier determines whether they belong to a specific class, and a localizer then refines the position of each candidate box assigned to that class.
Comparison of FIG. 4 and FIG. 5 readily shows that the method markedly improves detection of small targets in the image.
Claims (7)
1. A small target feature representation enhancement method based on deep learning, characterized by the following implementation steps:
Step (1): pre-train a Faster R-CNN neural network model on a very large-scale database containing more than 14 million images covering 20,000 categories;
Step (2): read the input image data;
Step (3): generate feature maps with a convolutional neural network and build a feature map spatial pyramid;
Step (4): obtain feature map weights from the attention mechanism module;
Step (5): fuse the feature maps from different layers according to the obtained weights;
Step (6): detect and localize targets on the feature map;
Step (7): repeat steps (3) to (6) for the specified task, continuing to train the neural network model until the network reaches an optimum.
2. The small target feature representation enhancement method based on deep learning of claim 1, characterized in that the deep convolutional neural network framework adopted in step (1) is Faster R-CNN, wherein:
Faster R-CNN accepts input images of arbitrary size; images are rescaled to a normalized size before entering the network, with the short edge at most N and the long edge at most M; smaller images are zero-padded at the edges, i.e. the image acquires black borders;
the Faster R-CNN network framework includes:
13 convolution (conv) layers: kernel_size = 3, pad = 1, stride = 1;
using the convolution output-size formula:
output = (input − kernel_size + 2 × pad) / stride + 1
where kernel_size indicates a 3 × 3 convolution kernel, pad indicates 1 pixel of edge padding, and stride indicates the kernel moves 1 pixel at a time; the formula shows that the conv layers do not change the image size;
Faster R-CNN also includes 13 activation (ReLU) layers;
Faster R-CNN also includes 4 pooling layers: kernel_size = 2, stride = 2; each pooling layer halves the spatial size of its input.
3. The method according to claim 2, characterized in that after step (2) reads the input image data and feature extraction is performed, the image size becomes (M/16) × (N/16), and the feature map is (M/16) × (N/16) × 512, meaning its spatial size is (M/16) × (N/16) with 512 channels.
4. The small target feature representation enhancement method based on deep learning of claim 3, characterized in that step (3) builds the feature map spatial pyramid from the feature maps generated by the convolutional neural network as follows:
the feature activations output by the last convolutional layer of each stage are used; the final outputs of conv2, conv3, conv4 and conv5 are denoted {C2, C3, C4, C5}, with strides of {4, 8, 16, 32} relative to the input image; the lower-resolution feature map is then upsampled by a factor of 2 so that the laterally connected bottom-up and top-down feature maps have the same size; before fusion, an attention mechanism module automatically learns the weights of the feature maps at different scales, and fusion is carried out until the final feature maps are generated; the final set of feature maps is called {P2, P3, P4, P5}, corresponding to {C2, C3, C4, C5} respectively.
5. The small target feature representation enhancement method based on deep learning of claim 4, characterized in that step (4) obtains the feature map weights from the attention mechanism module as follows:
the feature map A is matrix-multiplied with its transpose A^T; because the feature map has a channel dimension, this is equivalent to taking the dot product of every pixel with every other pixel; the product of the feature map matrix and its transpose is normalized with softmax to obtain the attention weight w_i; the attention weight w_i is matrix-multiplied with the feature map, redistributing correlation information back onto the original feature map, expressed by the formula:
w_i = softmax(matmul(A_i, A_i^T))
where matmul denotes the matrix product and the softmax function maps the values of the matrix product into (0, 1).
6. The small target feature representation enhancement method based on deep learning of claim 5, characterized in that step (5) fuses feature maps from different layers according to the obtained attention weight w_i, with the fusion formula:
E_i = w_i · A_i + A_i
where E_i denotes the i-th new feature map, w_i the attention weight of the i-th layer, and A_i the i-th original feature map.
7. The small target feature representation enhancement method based on deep learning of claim 6, characterized in that the detection and localization on the feature map in step (6) is realized as follows: for the features extracted from the fused feature map, a classifier determines whether they belong to a specific class, and a localizer then refines the position of each candidate box assigned to that class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910886472.4A CN110782420A (en) | 2019-09-19 | 2019-09-19 | Small target feature representation enhancement method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910886472.4A CN110782420A (en) | 2019-09-19 | 2019-09-19 | Small target feature representation enhancement method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110782420A true CN110782420A (en) | 2020-02-11 |
Family
ID=69383587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910886472.4A Pending CN110782420A (en) | 2019-09-19 | 2019-09-19 | Small target feature representation enhancement method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110782420A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507183A (en) * | 2020-03-11 | 2020-08-07 | 杭州电子科技大学 | Crowd counting method based on multi-scale density map fusion cavity convolution |
CN111507359A (en) * | 2020-03-09 | 2020-08-07 | 杭州电子科技大学 | Self-adaptive weighting fusion method of image feature pyramid |
CN111539458A (en) * | 2020-04-02 | 2020-08-14 | 咪咕文化科技有限公司 | Feature map processing method and device, electronic equipment and storage medium |
CN111563414A (en) * | 2020-04-08 | 2020-08-21 | 西北工业大学 | SAR image ship target detection method based on non-local feature enhancement |
CN111709291A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Takeaway personnel identity identification method based on fusion information |
CN111709294A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Express delivery personnel identity identification method based on multi-feature information |
CN111723841A (en) * | 2020-05-09 | 2020-09-29 | 北京捷通华声科技股份有限公司 | Text detection method and device, electronic equipment and storage medium |
CN112131935A (en) * | 2020-08-13 | 2020-12-25 | 浙江大华技术股份有限公司 | Motor vehicle carriage manned identification method and device and computer equipment |
CN112131925A (en) * | 2020-07-22 | 2020-12-25 | 浙江元亨通信技术股份有限公司 | Construction method of multi-channel characteristic space pyramid |
CN112396115A (en) * | 2020-11-23 | 2021-02-23 | 平安科技(深圳)有限公司 | Target detection method and device based on attention mechanism and computer equipment |
CN113327253A (en) * | 2021-05-24 | 2021-08-31 | 北京市遥感信息研究所 | Weak and small target detection method based on satellite-borne infrared remote sensing image |
CN113570003A (en) * | 2021-09-23 | 2021-10-29 | 深圳新视智科技术有限公司 | Feature fusion defect detection method and device based on attention mechanism |
CN113591593A (en) * | 2021-07-06 | 2021-11-02 | 厦门路桥信息股份有限公司 | Method, equipment and medium for detecting target under abnormal weather based on causal intervention |
US11436447B2 (en) | 2020-06-29 | 2022-09-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Target detection |
US11521603B2 (en) | 2020-06-30 | 2022-12-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Automatically generating conference minutes |
CN115482395A (en) * | 2022-09-30 | 2022-12-16 | 北京百度网讯科技有限公司 | Model training method, image classification method, device, electronic equipment and medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242032A (en) * | 2018-09-21 | 2019-01-18 | 桂林电子科技大学 | A kind of object detection method based on deep learning |
CN109658387A (en) * | 2018-11-27 | 2019-04-19 | 北京交通大学 | The detection method of the pantograph carbon slide defect of power train |
CN109816037A (en) * | 2019-01-31 | 2019-05-28 | 北京字节跳动网络技术有限公司 | The method and apparatus for extracting the characteristic pattern of image |
CN109858451A (en) * | 2019-02-14 | 2019-06-07 | 清华大学深圳研究生院 | A kind of non-cooperation hand detection method |
CN109902399A (en) * | 2019-03-01 | 2019-06-18 | 哈尔滨理工大学 | Rolling bearing fault recognition methods under a kind of variable working condition based on ATT-CNN |
CN109948658A (en) * | 2019-02-25 | 2019-06-28 | 浙江工业大学 | The confrontation attack defense method of Feature Oriented figure attention mechanism and application |
CN110084210A (en) * | 2019-04-30 | 2019-08-02 | 电子科技大学 | The multiple dimensioned Ship Detection of SAR image based on attention pyramid network |
CN110163836A (en) * | 2018-11-14 | 2019-08-23 | 宁波大学 | Based on deep learning for the excavator detection method under the inspection of high-altitude |
CN110245665A (en) * | 2019-05-13 | 2019-09-17 | 天津大学 | Image, semantic dividing method based on attention mechanism |
- 2019
- 2019-09-19: Application CN201910886472.4A filed in China; publication CN110782420A (en), status Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242032A (en) * | 2018-09-21 | 2019-01-18 | 桂林电子科技大学 | A kind of object detection method based on deep learning |
CN110163836A (en) * | 2018-11-14 | 2019-08-23 | 宁波大学 | Based on deep learning for the excavator detection method under the inspection of high-altitude |
CN109658387A (en) * | 2018-11-27 | 2019-04-19 | 北京交通大学 | The detection method of the pantograph carbon slide defect of power train |
CN109816037A (en) * | 2019-01-31 | 2019-05-28 | 北京字节跳动网络技术有限公司 | The method and apparatus for extracting the characteristic pattern of image |
CN109858451A (en) * | 2019-02-14 | 2019-06-07 | 清华大学深圳研究生院 | A kind of non-cooperation hand detection method |
CN109948658A (en) * | 2019-02-25 | 2019-06-28 | 浙江工业大学 | The confrontation attack defense method of Feature Oriented figure attention mechanism and application |
CN109902399A (en) * | 2019-03-01 | 2019-06-18 | 哈尔滨理工大学 | Rolling bearing fault recognition methods under a kind of variable working condition based on ATT-CNN |
CN110084210A (en) * | 2019-04-30 | 2019-08-02 | 电子科技大学 | The multiple dimensioned Ship Detection of SAR image based on attention pyramid network |
CN110245665A (en) * | 2019-05-13 | 2019-09-17 | 天津大学 | Image, semantic dividing method based on attention mechanism |
Non-Patent Citations (4)
Title |
---|
SHAOQING REN 等: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《ARXIV》 * |
TSUNG-YI LIN 等: "Feature Pyramid Networks for Object Detection", 《ARXIV》 * |
董镭刚: "特征金字塔网络在图像检测中的应用", 《科学技术创新》 * |
陈飞 等: "基于多尺度特征融合的Faster R-CNN道路目标检测", 《中国计量大学学报》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507359A (en) * | 2020-03-09 | 2020-08-07 | 杭州电子科技大学 | Self-adaptive weighting fusion method of image feature pyramid |
CN111507183A (en) * | 2020-03-11 | 2020-08-07 | 杭州电子科技大学 | Crowd counting method based on multi-scale density map fusion cavity convolution |
CN111539458A (en) * | 2020-04-02 | 2020-08-14 | 咪咕文化科技有限公司 | Feature map processing method and device, electronic equipment and storage medium |
CN111539458B (en) * | 2020-04-02 | 2024-02-27 | 咪咕文化科技有限公司 | Feature map processing method and device, electronic equipment and storage medium |
CN111563414A (en) * | 2020-04-08 | 2020-08-21 | 西北工业大学 | SAR image ship target detection method based on non-local feature enhancement |
CN111563414B (en) * | 2020-04-08 | 2022-03-01 | 西北工业大学 | SAR image ship target detection method based on non-local feature enhancement |
CN111723841A (en) * | 2020-05-09 | 2020-09-29 | 北京捷通华声科技股份有限公司 | Text detection method and device, electronic equipment and storage medium |
CN111709294A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Express delivery personnel identity recognition method based on multi-feature information |
CN111709291A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Takeaway personnel identity recognition method based on fusion information |
CN111709294B (en) * | 2020-05-18 | 2023-07-14 | 杭州电子科技大学 | Express delivery personnel identity recognition method based on multi-feature information |
CN111709291B (en) * | 2020-05-18 | 2023-05-26 | 杭州电子科技大学 | Takeaway personnel identity recognition method based on fusion information |
US11436447B2 (en) | 2020-06-29 | 2022-09-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Target detection |
US11521603B2 (en) | 2020-06-30 | 2022-12-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Automatically generating conference minutes |
CN112131925B (en) * | 2020-07-22 | 2024-06-07 | 随锐科技集团股份有限公司 | Construction method of multichannel feature space pyramid |
CN112131925A (en) * | 2020-07-22 | 2020-12-25 | 浙江元亨通信技术股份有限公司 | Construction method of multi-channel characteristic space pyramid |
CN112131935A (en) * | 2020-08-13 | 2020-12-25 | 浙江大华技术股份有限公司 | Method and device for identifying persons in motor vehicle compartments, and computer equipment |
WO2021208726A1 (en) * | 2020-11-23 | 2021-10-21 | 平安科技(深圳)有限公司 | Target detection method and apparatus based on attention mechanism, and computer device |
CN112396115A (en) * | 2020-11-23 | 2021-02-23 | 平安科技(深圳)有限公司 | Target detection method and device based on attention mechanism and computer equipment |
CN112396115B (en) * | 2020-11-23 | 2023-12-22 | 平安科技(深圳)有限公司 | Attention mechanism-based target detection method and device and computer equipment |
CN113327253B (en) * | 2021-05-24 | 2024-05-24 | 北京市遥感信息研究所 | Weak and small target detection method based on satellite-borne infrared remote sensing image |
CN113327253A (en) * | 2021-05-24 | 2021-08-31 | 北京市遥感信息研究所 | Weak and small target detection method based on satellite-borne infrared remote sensing image |
CN113591593A (en) * | 2021-07-06 | 2021-11-02 | 厦门路桥信息股份有限公司 | Method, equipment and medium for detecting targets in abnormal weather based on causal intervention |
CN113591593B (en) * | 2021-07-06 | 2023-08-15 | 厦门路桥信息股份有限公司 | Method, equipment and medium for detecting target in abnormal weather based on causal intervention |
CN113570003A (en) * | 2021-09-23 | 2021-10-29 | 深圳新视智科技术有限公司 | Feature fusion defect detection method and device based on attention mechanism |
CN113570003B (en) * | 2021-09-23 | 2022-01-07 | 深圳新视智科技术有限公司 | Feature fusion defect detection method and device based on attention mechanism |
CN115482395B (en) * | 2022-09-30 | 2024-02-20 | 北京百度网讯科技有限公司 | Model training method, image classification device, electronic equipment and medium |
CN115482395A (en) * | 2022-09-30 | 2022-12-16 | 北京百度网讯科技有限公司 | Model training method, image classification method, device, electronic equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
CN111210443B (en) | Deformable convolution mixing task cascading semantic segmentation method based on embedding balance | |
Gao et al. | Reading scene text with fully convolutional sequence modeling | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
US20180114071A1 (en) | Method for analysing media content | |
CN108734210B (en) | Object detection method based on cross-modal multi-scale feature fusion | |
JP2017062781A (en) | Similarity-based detection of prominent objects using deep cnn pooling layers as features | |
CN111027576B (en) | Cooperative significance detection method based on cooperative significance generation type countermeasure network | |
CN111353544B (en) | Target detection method based on improved Mixed Pooling-YOLOv3 | |
CN110781744A (en) | Small-scale pedestrian detection method based on multi-level feature fusion | |
CN110781980B (en) | Training method of target detection model, target detection method and device | |
CN112434618B (en) | Video target detection method, storage medium and device based on sparse foreground priori | |
CN113487610B (en) | Herpes image recognition method and device, computer equipment and storage medium | |
CN116758130A (en) | Monocular depth prediction method based on multipath feature extraction and multi-scale feature fusion | |
CN111507359A (en) | Self-adaptive weighting fusion method of image feature pyramid | |
CN112037239B (en) | Text guidance image segmentation method based on multi-level explicit relation selection | |
Fan et al. | A novel sonar target detection and classification algorithm | |
CN112149526A (en) | Lane line detection method and system based on long-distance information fusion | |
Zhao et al. | BiTNet: a lightweight object detection network for real-time classroom behavior recognition with transformer and bi-directional pyramid network | |
US20230072445A1 (en) | Self-supervised video representation learning by exploring spatiotemporal continuity | |
CN109284752A (en) | Rapid vehicle detection method | |
Cai et al. | Vehicle detection based on visual saliency and deep sparse convolution hierarchical model | |
TWI809957B (en) | Object detection method and electronic apparatus | |
CN113688864B (en) | Human-object interaction relation classification method based on split attention | |
CN114387489A (en) | Power equipment identification method and device and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200211 |