CN112560732B - Feature extraction method of multi-scale feature extraction network - Google Patents
Feature extraction method of multi-scale feature extraction network
- Publication number
- CN112560732B (application CN202011530198.6A)
- Authority
- CN
- China
- Prior art keywords
- scale
- layer
- map
- feature
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a feature extraction method of a multi-scale feature extraction network. The network comprises a dimension-reduction convolution layer, a scale feature extraction layer, a merging layer and a feature fusion layer connected in sequence, and the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch. Features of different scales can be extracted and then fused by the feature fusion layer, so the network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever multi-scale feature extraction is needed, improving target detection accuracy. The method performs feature dimension reduction, target detection and core multi-scale feature extraction on an input feature map to be extracted and rapidly obtains the scale feature extraction map, with the advantages of high target detection accuracy and a small amount of computation.
Description
Technical Field
The present invention relates to a neural network for feature extraction, and more particularly, to a multi-scale feature extraction network and a feature extraction method thereof.
Background
Multi-scale target detection has long been a research hotspot and a difficulty in the field of computer vision. To improve its accuracy, network structures such as FPN, PA-Net, NAS-FPN and BiFPN have been proposed one after another. However, because these structures are relatively complex, the improvement in detection accuracy comes with too large a computational load, which delays inference and makes industrial application and popularization of multi-scale target detection difficult.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a feature extraction method of a multi-scale feature extraction network that improves target detection accuracy.
The technical solution adopted to solve the technical problem is as follows:
the multi-scale feature extraction network comprises a dimension-reduction convolution layer, a scale feature extraction layer, a merging layer and a feature fusion layer connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch.
The large-target detection branch comprises a downsampling feature layer, a first hole (dilated) convolution layer and an upsampling recovery layer connected in sequence; the original-feature detection branch comprises a second hole convolution layer; the small-target detection branch comprises an upsampling feature layer, a third hole convolution layer and a downsampling recovery layer connected in sequence.
The first hole convolution layer comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second and third hole convolution layers have the same structure as the first hole convolution layer.
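A 3×3 convolution with dilation rate d spans an effective window of 3 + 2(d−1) pixels, so rates 1, 2 and 3 give 3×3, 5×5 and 7×7 receptive fields from the same number of weights. A minimal sketch, assuming a PyTorch implementation and an illustrative channel count that is not specified in the patent:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)                     # toy feature map
for d in (1, 2, 3):                                # the three dilation ("hole") rates
    conv = nn.Conv2d(16, 16, kernel_size=3, dilation=d, padding=d)  # padding=d keeps H and W unchanged
    print(d, conv(x).shape)                        # every output stays 1 x 16 x 64 x 64
    # effective receptive field: 3 + 2*(d-1)  ->  3, 5, 7
```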
The convolution kernels of the dimension-reduction convolution layer and the feature fusion layer are 1×1.
The feature extraction method of the multi-scale feature extraction network comprises the following steps:
(1) The feature map to be extracted is input into the dimension-reduction convolution layer for 1×1 convolution, which performs feature fusion and dimension reduction on it to form an original feature map;
(2) The downsampling feature layer downsamples the original feature map to form a downsampled feature map; the upsampling feature layer upsamples the original feature map to form an upsampled feature map;
(3) The first, second and third hole convolution layers apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively, generating a downsampled first scale map, a downsampled second scale map, a downsampled third scale map, an original first scale map, an original second scale map, an original third scale map, an upsampled first scale map, an upsampled second scale map and an upsampled third scale map;
(4) The upsampling recovery layer upsamples the downsampled first, second and third scale maps respectively, and the downsampling recovery layer downsamples the upsampled first, second and third scale maps respectively;
(5) The merging layer merges the three scale maps obtained by upsampling the downsampled first, second and third scale maps, the original first, second and third scale maps, and the three scale maps obtained by downsampling the upsampled first, second and third scale maps; the feature fusion layer then performs 1×1 convolution fusion to form the scale feature extraction map. A worked shape trace of this flow is sketched after these steps.
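To make the data flow concrete, the following hypothetical shape trace is offered; the channel count of 144 and the 64×64 spatial size are assumptions chosen so that every 1/3 split is exact, and are not taken from the patent:

```python
# Hypothetical input: feature map to be extracted, 144 x 64 x 64
# (1) 1x1 dimension-reduction conv   -> original feature map      48 x 64 x 64   (depth cut to 1/3)
# (2) downsample (MAXPOOL, stride 2) -> downsampled feature map   48 x 32 x 32
#     upsample  (UP-CONV, stride 2)  -> upsampled feature map     48 x 128 x 128
# (3) three 3x3 hole convs (rates 1, 2, 3), each on 1/3 of its branch's channels
#     -> nine scale maps: three per branch, 16 channels each
# (4) upsample the three downsampled scale maps back to 64 x 64;
#     downsample the three upsampled scale maps back to 64 x 64
# (5) merge the three 48-channel branch outputs -> 144 x 64 x 64,
#     then 1x1 fusion conv -> scale feature extraction map 144 x 64 x 64 (same depth as the input)
```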
The beneficial effects of the invention are as follows: features of different scales can be extracted and then fused by the feature fusion layer, so that the multi-scale feature extraction network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever multi-scale feature extraction is needed, improving target detection accuracy. The feature extraction method performs feature dimension reduction, target detection and core multi-scale feature extraction on the input feature map to be extracted and rapidly obtains the scale feature extraction map, with the advantages of high target detection accuracy and a small amount of computation.
Drawings
The invention will be further described with reference to the drawings and examples.
FIG. 1 is a schematic diagram of a network architecture of the present invention;
FIG. 2 is a flow chart of the feature extraction method of the present invention.
Description of the embodiments
Referring to FIG. 1, the multi-scale feature extraction network comprises a dimension-reduction convolution layer 1, a scale feature extraction layer, a merging layer 2 and a feature fusion layer 3 connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch. Features of different scales can be extracted and then fused by the feature fusion layer 3, so the network has multi-scale feature extraction capability with low computational complexity and can be applied directly wherever a neural network needs multi-scale feature extraction, improving target detection accuracy.
The large-target detection branch comprises a downsampling feature layer 4, a first hole convolution layer 5 and an upsampling recovery layer 6 connected in sequence; the original-feature detection branch comprises a second hole convolution layer 7; the small-target detection branch comprises an upsampling feature layer 8, a third hole convolution layer 9 and a downsampling recovery layer 10 connected in sequence. In this embodiment, the upsampling mode is UP-CONV and the downsampling mode is MAXPOOL.
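A minimal sketch of the two sampling operations named in this embodiment, assuming PyTorch and reading UP-CONV as a stride-2 transposed convolution (one common interpretation; the patent does not fix the exact operator):

```python
import torch.nn as nn

channels = 48                                               # illustrative branch width
downsample = nn.MaxPool2d(kernel_size=2, stride=2)          # MAXPOOL: halves height and width
upsample = nn.ConvTranspose2d(channels, channels,           # UP-CONV: doubles height and width
                              kernel_size=2, stride=2)
```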
The first hole convolution layer 5 comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second hole convolution layer 7 and the third hole convolution layer 9 have the same structure as the first hole convolution layer 5, i.e. three 3×3 hole convolutions with dilation rates of 1, 2 and 3. Each convolution kernel acts on only 1/3 of the channels of its hole convolution layer, so the large-target detection branch, the original-feature detection branch and the small-target detection branch all have multi-scale feature extraction capability.
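One way to read "each convolution kernel acts on only 1/3 of the channels" is a split-and-convolve layer; the sketch below follows that assumption in PyTorch, and the class name HoleConvLayer is hypothetical rather than from the patent:

```python
import torch
import torch.nn as nn

class HoleConvLayer(nn.Module):
    """Three 3x3 hole (dilated) convs with rates 1, 2, 3, each applied to one third of the channels."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 3 == 0, "channel count is assumed divisible by 3"
        third = channels // 3
        self.branches = nn.ModuleList([
            nn.Conv2d(third, third, kernel_size=3, dilation=d, padding=d)
            for d in (1, 2, 3)
        ])

    def forward(self, x):
        # split the channels into thirds and convolve each third with a different dilation rate
        parts = torch.chunk(x, 3, dim=1)
        return [conv(p) for conv, p in zip(self.branches, parts)]   # the three scale maps
```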
The convolution kernels of the dimension-reduction convolution layer 1 and the feature fusion layer 3 are 1×1, giving them feature dimension-reduction and feature fusion capability while saving computation time.
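As a rough illustration of why 1×1 kernels are cheap (the channel counts 144 and 48 are assumptions, not values from the patent), a 1×1 reduction needs roughly one ninth of the weights of a 3×3 convolution of the same width:

```python
from torch import nn

reduce_1x1 = nn.Conv2d(144, 48, kernel_size=1)   # 144*48*1 + 48 bias = 6,960 parameters
same_3x3 = nn.Conv2d(144, 48, kernel_size=3)     # 144*48*9 + 48 bias = 62,256 parameters
print(sum(p.numel() for p in reduce_1x1.parameters()))   # 6960
print(sum(p.numel() for p in same_3x3.parameters()))     # 62256
```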
Referring to FIG. 1 and FIG. 2, the feature extraction method of the multi-scale feature extraction network comprises the following steps:
(1) The feature map to be extracted is input into the dimension-reduction convolution layer 1 for 1×1 convolution, which performs feature fusion and dimension reduction on the input feature map to form the original feature map. This saves computation time, and the feature depth of the original feature map is reduced to 1/3 of the depth of the feature map to be extracted.
(2) The downsampling feature layer 4 downsamples the original feature map to form the downsampled feature map; its depth is the same as that of the original feature map, while its width and height are halved. The purpose of downsampling is to detect large targets and to reduce the amount of computation.
The upsampling feature layer 8 upsamples the original feature map to form the upsampled feature map; its depth is the same as that of the original feature map, while its width and height are doubled. The purpose of upsampling is to detect small targets.
(3) The first hole convolution layer 5 to the third hole convolution layer 9 apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively. The dilation rates of the three 3×3 hole convolutions are 1 (i.e. a standard 3×3 convolution), 2 and 3, and each convolution kernel acts on only 1/3 of the channels of its layer. This generates a downsampled first scale map 11, a downsampled second scale map 12, a downsampled third scale map 13, an original first scale map 14, an original second scale map 15, an original third scale map 16, an upsampled first scale map 17, an upsampled second scale map 18 and an upsampled third scale map 19, so that the downsampled, original and upsampled feature maps receive convolutions with different receptive fields and multi-scale features are extracted.
(4) The upsampling recovery layer 6 upsamples the downsampled first scale map 11, the downsampled second scale map 12 and the downsampled third scale map 13 respectively, and the downsampling recovery layer 10 downsamples the upsampled first scale map 17, the upsampled second scale map 18 and the upsampled third scale map 19 respectively, so that the widths and heights of these six maps are kept consistent with those of the original first scale map 14, the original second scale map 15 and the original third scale map 16, which facilitates the merging and feature fusion of the fifth step.
(5) The merging layer 2 merges the three scale maps obtained by upsampling the downsampled first scale map 11, the downsampled second scale map 12 and the downsampled third scale map 13, the original first scale map 14, the original second scale map 15 and the original third scale map 16, and the three scale maps obtained by downsampling the upsampled first scale map 17, the upsampled second scale map 18 and the upsampled third scale map 19. The feature fusion layer 3 then performs 1×1 convolution fusion to form the scale feature extraction map, completing the multi-scale feature extraction while keeping the feature depth the same as that of the feature map to be extracted. A consolidated code sketch of this embodiment follows.
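Putting the pieces together, the following is a minimal end-to-end sketch of the embodiment described above, assuming PyTorch and reusing the hypothetical HoleConvLayer from the earlier sketch; all class and variable names, and the choice of ConvTranspose2d for UP-CONV, are illustrative assumptions rather than the patent's reference implementation:

```python
import torch
import torch.nn as nn

class MultiScaleFeatureExtraction(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        assert in_channels % 9 == 0, "assumed divisible by 9 so every 1/3 split is exact"
        mid = in_channels // 3                                        # depth after dimension reduction
        self.reduce = nn.Conv2d(in_channels, mid, kernel_size=1)      # dimension-reduction conv layer (1)
        self.down = nn.MaxPool2d(2, 2)                                # downsampling feature layer (4)
        self.up = nn.ConvTranspose2d(mid, mid, 2, stride=2)           # upsampling feature layer (8)
        self.hole_down = HoleConvLayer(mid)                           # first hole convolution layer (5)
        self.hole_orig = HoleConvLayer(mid)                           # second hole convolution layer (7)
        self.hole_up = HoleConvLayer(mid)                             # third hole convolution layer (9)
        self.restore_up = nn.ConvTranspose2d(mid // 3, mid // 3, 2, stride=2)  # upsampling recovery layer (6)
        self.restore_down = nn.MaxPool2d(2, 2)                        # downsampling recovery layer (10)
        self.fuse = nn.Conv2d(3 * mid, in_channels, kernel_size=1)    # feature fusion layer (3)

    def forward(self, x):
        orig = self.reduce(x)                                         # step 1: original feature map
        large = self.hole_down(self.down(orig))                       # steps 2-3: large-target branch scale maps
        same = self.hole_orig(orig)                                   # original-branch scale maps
        small = self.hole_up(self.up(orig))                           # small-target branch scale maps
        large = [self.restore_up(m) for m in large]                   # step 4: restore width and height
        small = [self.restore_down(m) for m in small]
        merged = torch.cat(large + same + small, dim=1)               # step 5: merging layer (2)
        return self.fuse(merged)                                      # scale feature extraction map

# hypothetical usage: depth and spatial size are preserved end to end
net = MultiScaleFeatureExtraction(144)
out = net(torch.randn(1, 144, 64, 64))                               # -> torch.Size([1, 144, 64, 64])
```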
The above embodiments do not limit the protection scope of the invention; equivalent modifications and variations made by those skilled in the art without departing from the overall inventive concept still fall within the protection scope of the invention.
Claims (1)
1. A feature extraction method of a multi-scale feature extraction network, characterized in that the multi-scale feature extraction network comprises a dimension-reduction convolution layer (1), a scale feature extraction layer, a merging layer (2) and a feature fusion layer (3) connected in sequence, wherein the scale feature extraction layer comprises a large-target detection branch, an original-feature detection branch and a small-target detection branch;
the large-target detection branch comprises a downsampling feature layer (4), a first hole convolution layer (5) and an upsampling recovery layer (6) connected in sequence; the original-feature detection branch comprises a second hole convolution layer (7); the small-target detection branch comprises an upsampling feature layer (8), a third hole convolution layer (9) and a downsampling recovery layer (10) connected in sequence; the first hole convolution layer (5) comprises three hole convolutions with 3×3 kernels and dilation rates of 1, 2 and 3 respectively; the second hole convolution layer (7) and the third hole convolution layer (9) have the same structure as the first hole convolution layer (5);
the convolution kernels of the dimension-reduction convolution layer (1) and the feature fusion layer (3) are 1×1;
the feature extraction method of the multi-scale feature extraction network comprises the following steps:
in the first step, the feature map to be extracted is input into the dimension-reduction convolution layer for 1×1 convolution, which performs feature fusion and dimension reduction on it to form an original feature map;
in the second step, the downsampling feature layer downsamples the original feature map to form a downsampled feature map, and the upsampling feature layer upsamples the original feature map to form an upsampled feature map;
in the third step, the first, second and third hole convolution layers apply three 3×3 hole convolutions to the downsampled feature map, the original feature map and the upsampled feature map, respectively, generating a downsampled first scale map, a downsampled second scale map, a downsampled third scale map, an original first scale map, an original second scale map, an original third scale map, an upsampled first scale map, an upsampled second scale map and an upsampled third scale map;
in the fourth step, the upsampling recovery layer upsamples the downsampled first, second and third scale maps respectively, and the downsampling recovery layer downsamples the upsampled first, second and third scale maps respectively;
and in the fifth step, the merging layer merges the three scale maps obtained by upsampling the downsampled first, second and third scale maps, the original first, second and third scale maps, and the three scale maps obtained by downsampling the upsampled first, second and third scale maps, after which the feature fusion layer performs 1×1 convolution fusion to form a scale feature extraction map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530198.6A CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530198.6A CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112560732A CN112560732A (en) | 2021-03-26 |
CN112560732B true CN112560732B (en) | 2023-07-04 |
Family
ID=75031388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011530198.6A Active CN112560732B (en) | 2020-12-22 | 2020-12-22 | Feature extraction method of multi-scale feature extraction network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560732B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115187820A (en) * | 2021-04-06 | 2022-10-14 | 中国科学院深圳先进技术研究院 | Light-weight target detection method, device, equipment and storage medium |
CN113313668B (en) * | 2021-04-19 | 2022-09-27 | 石家庄铁道大学 | Subway tunnel surface disease feature extraction method |
CN113378786B (en) * | 2021-07-05 | 2023-09-19 | 广东省机场集团物流有限公司 | Ultra-light target detection network and method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
CN111462127A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Real-time semantic segmentation method and system for automatic driving |
CN111860693A (en) * | 2020-07-31 | 2020-10-30 | 元神科技(杭州)有限公司 | Lightweight visual target detection method and system |
CN111898539A (en) * | 2020-07-30 | 2020-11-06 | 国汽(北京)智能网联汽车研究院有限公司 | Multi-target detection method, device, system, equipment and readable storage medium |
CN111899191A (en) * | 2020-07-21 | 2020-11-06 | 武汉工程大学 | Text image restoration method and device and storage medium |
CN111967524A (en) * | 2020-08-20 | 2020-11-20 | 中国石油大学(华东) | Multi-scale fusion feature enhancement algorithm based on Gaussian filter feedback and cavity convolution |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11651206B2 (en) * | 2018-06-27 | 2023-05-16 | International Business Machines Corporation | Multiscale feature representations for object recognition and detection |
-
2020
- 2020-12-22 CN CN202011530198.6A patent/CN112560732B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
CN111462127A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Real-time semantic segmentation method and system for automatic driving |
CN111899191A (en) * | 2020-07-21 | 2020-11-06 | 武汉工程大学 | Text image restoration method and device and storage medium |
CN111898539A (en) * | 2020-07-30 | 2020-11-06 | 国汽(北京)智能网联汽车研究院有限公司 | Multi-target detection method, device, system, equipment and readable storage medium |
CN111860693A (en) * | 2020-07-31 | 2020-10-30 | 元神科技(杭州)有限公司 | Lightweight visual target detection method and system |
CN111967524A (en) * | 2020-08-20 | 2020-11-20 | 中国石油大学(华东) | Multi-scale fusion feature enhancement algorithm based on Gaussian filter feedback and cavity convolution |
Non-Patent Citations (4)
Title |
---|
AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network; Jin Chen et al.; EURASIP Journal on Wireless Communications and Networking; 1-7 *
Multi-target detection method for remote sensing images based on FD-SSD; Zhu Minchao et al.; Computer Applications and Software; Vol. 36, No. 1; 232-238 *
Indoor crowd detection network based on multi-level features and hybrid attention mechanism; Shen Wenxiang et al.; Journal of Computer Applications; Vol. 39, No. 12; 3496-3502 *
Research on environment perception and control methods for driverless vehicles based on deep learning; Li Jianming; China Master's Theses Full-text Database, Engineering Science and Technology II; No. 01; C035-469 *
Also Published As
Publication number | Publication date |
---|---|
CN112560732A (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112560732B (en) | Feature extraction method of multi-scale feature extraction network | |
CN108475415B (en) | Method and system for image processing | |
CN110909642A (en) | Remote sensing image target detection method based on multi-scale semantic feature fusion | |
CN110047058B (en) | Image fusion method based on residual pyramid | |
CN108717569A (en) | Expansion full convolution neural network and construction method thereof | |
CN112990219B (en) | Method and device for image semantic segmentation | |
CN110674704A (en) | Crowd density estimation method and device based on multi-scale expansion convolutional network | |
CN111932480A (en) | Deblurred video recovery method and device, terminal equipment and storage medium | |
CN115187820A (en) | Light-weight target detection method, device, equipment and storage medium | |
CN115953303A (en) | Multi-scale image compressed sensing reconstruction method and system combining channel attention | |
CN111612825A (en) | Image sequence motion occlusion detection method based on optical flow and multi-scale context | |
Yang et al. | Image super-resolution reconstruction based on improved Dirac residual network | |
Deng et al. | Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution | |
CN110599495B (en) | Image segmentation method based on semantic information mining | |
CN115565034A (en) | Infrared small target detection method based on double-current enhanced network | |
CN106033594B (en) | Spatial information restoration methods based on the obtained feature of convolutional neural networks and device | |
CN113313162A (en) | Method and system for detecting multi-scale feature fusion target | |
CN114820423A (en) | Automatic cutout method based on saliency target detection and matching system thereof | |
CN111582353B (en) | Image feature detection method, system, device and medium | |
CN111428809B (en) | Crowd counting method based on spatial information fusion and convolutional neural network | |
CN115775214B (en) | Point cloud completion method and system based on multi-stage fractal combination | |
CN111402140A (en) | Single image super-resolution reconstruction system and method | |
CN117952883A (en) | Backlight image enhancement method based on bilateral grid and significance guidance | |
CN116681978A (en) | Attention mechanism and multi-scale feature fusion-based saliency target detection method | |
CN116740376A (en) | Pyramid integration and attention enhancement-based target detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||