CN113837199B - Image feature extraction method based on cross-layer residual double-path pyramid network - Google Patents
Image feature extraction method based on cross-layer residual double-path pyramid network
- Publication number: CN113837199B (application CN202111002973.5A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F18/253: Fusion techniques of extracted features (pattern recognition, analysing)
- G06N3/045: Combinations of networks (neural network architectures, interconnection topology)
- G06N3/08: Learning methods (neural networks)
- Y02T10/40: Engine management systems (general climate-change mitigation tagging)
Abstract
The invention discloses an image feature extraction method based on a cross-layer residual double-path pyramid network. An original RGB color image is input into the residual network ResNet50 for preliminary feature extraction, from which a bottom-up feature pyramid network (DTFPN) is obtained; a cross-layer residual network is then realized on the basis of ResNet50. After processing by the feature pyramid network FPN, the output feature maps P1''', P2''', P3''', P4''', P5''' are obtained. The invention further alleviates the network degradation problem of ResNet50 and mixes features of different levels to extract deeper features, significantly enhancing the feature extraction capability of ResNet50. It also remedies the defect that high-level features in a Feature Pyramid Network (FPN) lack low-level detail texture information, achieving efficient fusion of feature map information across levels.
Description
Technical Field
The invention relates to the fields of computer vision, artificial intelligence, pattern recognition and the like, and in particular to an image feature extraction method based on a cross-layer residual double-path pyramid network.
Background
With the development of artificial intelligence, convolutional neural networks have become the dominant method for extracting image features. Well-known feature extraction networks include LeNet-5, AlexNet, the Visual Geometry Group network (VGG), GoogLeNet and the residual network (ResNet).
LeNet-5, which appeared in 1994, was one of the earliest convolutional neural networks and drove the development of deep learning. It consists of two convolutional layers, two pooling layers and two fully connected layers; the convolutions use 5x5 kernels with a stride of 1, and downsampling uses max pooling.
AlexNet won the 2012 ImageNet competition; it can be seen as a deeper and wider version of LeNet, containing 630 million connections, 60 million parameters and 650,000 neurons, with 5 convolutional layers (3 of which are followed by max pooling layers) and finally 3 fully connected layers. AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a significant margin, reducing the top-5 error rate from the previous 25.8% to 16.4%. The main technical points of AlexNet are: (1) using the rectified linear unit (ReLU) as the activation function of the convolutional neural network (CNN), which mitigates the gradient-vanishing problem of the sigmoid function in deep networks; (2) using Dropout during training to randomly ignore a portion of the neurons and avoid model overfitting; (3) using overlapping max pooling in the CNN, with a stride smaller than the pooling kernel so that adjacent outputs overlap, improving feature richness. CNNs had previously commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. (4) Using data augmentation to reduce overfitting and improve model generalization.
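Technical point (3) above rests on simple output-size arithmetic: with no padding, a pooling layer of kernel k and stride s maps an input of size n to floor((n - k)/s) + 1, and the windows overlap exactly when s < k. A minimal sketch (the input sizes below are illustrative assumptions, not values from the patent):

```python
def pool_output_size(n: int, kernel: int, stride: int) -> int:
    """Spatial size after a pooling layer with no padding: floor((n - kernel) / stride) + 1."""
    return (n - kernel) // stride + 1

# Non-overlapping pooling: stride equals kernel, windows tile the input.
assert pool_output_size(224, 2, 2) == 112

# Overlapping pooling in the AlexNet style: 3x3 kernel with stride 2,
# so adjacent windows share one row/column of inputs.
assert pool_output_size(55, 3, 2) == 27
```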
The Visual Geometry Group network (VGG) was the first network to use small 3x3 convolution kernels in every convolutional layer and to combine them into convolution sequences; it is characterized by a large number of sequential convolutions and heavy computation. A key contribution of VGG is showing that a stack of successive 3x3 convolutions can emulate the effect of a larger receptive field. The VGG models indicate that depth is beneficial for classification accuracy, and another important idea is that convolution can replace full connection. The full model reaches about 140 million parameters; its main refinement is replacing the first fully connected layer with convolution, reducing the parameter count without loss of precision.
GoogLeNet first appeared in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, winning first place by a clear margin. The version used in that competition is generally called Inception V1; its biggest feature is obtaining very good classification performance while keeping the computation and parameter counts under control, reaching a top-5 error rate of 6.67%, barely half that of AlexNet. Inception V1 is 22 layers deep, deeper than the 8 layers of AlexNet or the 19 layers of VGG. Yet it requires only about 1.5 billion floating-point operations and 5 million parameters, only 1/12 of AlexNet's parameter count (60 million), while achieving accuracy far superior to AlexNet, making it a very excellent and practical model. Versions V2, V3 and V4 were derived successively from Inception V1.
The residual network ResNet was proposed in 2015 and won first place in the ImageNet classification task. Because it is simple and practical, many subsequent methods have been built on ResNet50 or ResNet101. ResNet introduces the residual structure and, combined with Batch Normalization, effectively solves the problems of gradient vanishing or explosion and network degradation in deep networks, so the performance of very deep ResNets greatly exceeds that of earlier feature extraction networks, with excellent results in image detection, image classification, image segmentation and other fields.
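The Batch Normalization step mentioned above can be reduced to its numeric core: normalize each batch to zero mean and unit variance before the learned scale and shift. A minimal plain-Python sketch (the learned affine parameters gamma and beta are omitted for brevity):

```python
def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars to zero mean / unit variance,
    the core of Batch Normalization before the learned scale and shift."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return [(x - m) / (var + eps) ** 0.5 for x in xs]

ys = batch_norm([1.0, 2.0, 3.0, 4.0])
assert abs(sum(ys)) < 1e-6                                   # zero mean
assert abs(sum(y * y for y in ys) / len(ys) - 1.0) < 1e-3    # ~unit variance
```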
The Feature Pyramid Network (FPN) constructs a feature pyramid that can be trained end to end: high-level features extracted by the backbone network are upsampled and fused with low-level features, enriching the semantic information of the low-level features. For small objects, the FPN increases the resolution of the feature maps, i.e., it operates on larger feature maps to obtain more information about small objects.
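At shape level, the top-down FPN fusion described above amounts to: project each backbone stage to a common channel width with a 1x1 lateral convolution, upsample the next-higher level by 2x, and add element-wise. A schematic shape-propagation sketch (the stage sizes below are hypothetical, chosen for a 512x512 input, not values from the patent):

```python
def fpn_top_down(shapes, out_channels=256):
    """Given backbone stage shapes (channels, h, w) ordered low->high level,
    return the FPN output shapes: each lateral 1x1 conv maps to out_channels,
    and the higher level is upsampled 2x before element-wise addition,
    so adjacent spatial sizes must differ by exactly a factor of 2."""
    outs = []
    prev_hw = None
    for c, h, w in reversed(shapes):  # start from the highest (smallest) level
        if prev_hw is not None:
            assert (prev_hw[0] * 2, prev_hw[1] * 2) == (h, w)  # 2x upsample aligns
        outs.append((out_channels, h, w))
        prev_hw = (h, w)
    return list(reversed(outs))

# Hypothetical ResNet-style stage shapes for a 512x512 image.
stages = [(256, 128, 128), (512, 64, 64), (1024, 32, 32), (2048, 16, 16)]
assert fpn_top_down(stages) == [
    (256, 128, 128), (256, 64, 64), (256, 32, 32), (256, 16, 16)]
```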
Inspired by ResNet, the feature maps output by residual network modules can themselves be connected by residual links to form a cross-layer residual network module (spanning multiple residual layers of the original residual module). That is, assuming the input of a residual module is x and the desired output is H(x), if we pass the input x directly to the output as an initial result, the target the module needs to learn becomes F(x) = H(x) - x. This changes the module's learning target, and learning F(x) is much easier than learning H(x). Re-optimizing the ResNet structure into a cross-layer residual network can therefore further reduce the degradation problem of ResNet and extract deeper features by mixing features of different levels. In a Feature Pyramid Network (FPN), high-level features are fused into low-level features; although this greatly enriches the low-level features, the high-level features themselves are not improved. The high-level features also need to be supplemented with low-level texture information, so feature fusion in the FPN is insufficient and network performance is limited.
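The residual learning target above can be illustrated with a toy numeric sketch (the values are illustrative assumptions): the skip connection means the layers only have to produce the small correction F(x), not reproduce the whole mapping H(x).

```python
def residual_target(h: float, x: float) -> float:
    """If the block's desired mapping is H(x) and the skip passes x through,
    the layers only need to learn the residual F(x) = H(x) - x."""
    return h - x

# Toy scalar example: desired output H(x) = 5.1 for input x = 5.0.
# The residual branch only has to produce 0.1 -- a small correction.
x, hx = 5.0, 5.1
fx = residual_target(hx, x)
assert abs(fx - 0.1) < 1e-9
assert abs((x + fx) - hx) < 1e-9  # skip connection: output = x + F(x)
```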
Summary of the invention:
In order to overcome the defects of the background art, the invention provides an image feature extraction method based on a cross-layer residual double-path pyramid network, which achieves two goals at a feature extraction speed comparable to the original network: (1) further alleviating the network degradation problem of the residual network ResNet50 and extracting deeper features by mixing features of different levels, significantly enhancing ResNet50's feature extraction capability; (2) remedying the defect that high-level features in the Feature Pyramid Network (FPN) lack low-level detail texture information, and achieving efficient fusion of feature map information across levels.
In order to solve the technical problems, the invention adopts the following technical scheme:
an image feature extraction method based on a cross-layer residual double-path pyramid network comprises the following steps:
Step S1, the original RGB color image is input into the residual network ResNet50 for preliminary feature extraction; the conv1 convolutional network module 1 of ResNet50 outputs the feature map P0, the conv2_x residual network module 2 outputs the feature maps P1, P1' (P1 = P1'), the conv3_x residual network module 3 outputs the feature map P2, the conv4_x residual network module 4 outputs the feature map P3, and the conv5_x residual network module 5 outputs the feature map P4;
step S2, down-sampling the feature map P1 'and then fusing with the feature map P2 to obtain a feature map P2', down-sampling the feature map P2 'and then fusing with the feature map P3 to obtain a feature map P3', down-sampling the feature map P3 'and then fusing with the feature map P4 to obtain a feature map P4' to obtain a feature pyramid network DTFPN from bottom to top;
step S3, the feature maps P1', P2' and the intermediate network thereof form a cross-layer residual network module (a plurality of residual layers in the residual network module crossing the residual network ResNet 50); feature maps P2', P3' and intermediate networks thereof form a cross-layer residual error network module; the feature maps P3', P4' and the intermediate network thereof form a cross-layer residual error network module; realizing a cross-layer residual network based on a residual network ResNet 50;
Step S4, inputting the feature maps P1', P2', P3', P4' into the feature pyramid network FPN, and establishing the cross-layer residual double-path pyramid network with the FPN; the feature maps P1', P2', P3', P4' are processed by the FPN to obtain the output feature maps P1''', P2''', P3''', P4''', P5'''.
Preferably, in step S1, the width and height of the feature map Pi+1 (i = 0, 1, 2, 3) are 1/2 of those of the feature map Pi, and the number of channels of the feature map Pi+1 is 2 times that of the feature map Pi.
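This size relation can be checked with a short sketch: each successive feature map halves the height and width and doubles the channel count. The starting shape below is a hypothetical example, not a value from the patent:

```python
def stage_shapes(p0, stages=4):
    """Propagate (channels, h, w) through successive stages where each
    later map halves height/width and doubles channels: P(i+1) has 1/2
    the spatial size and 2x the channels of Pi."""
    c, h, w = p0
    shapes = [(c, h, w)]
    for _ in range(stages):
        c, h, w = c * 2, h // 2, w // 2
        shapes.append((c, h, w))
    return shapes

# Hypothetical P0 of 64 channels at 256x256:
assert stage_shapes((64, 256, 256)) == [
    (64, 256, 256), (128, 128, 128), (256, 64, 64),
    (512, 32, 32), (1024, 16, 16)]
```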
Preferably, step 2 includes:
S2.1, applying to the feature map P1' a downsampling operation with a 1x1 convolution kernel and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P2 to obtain the feature map P2';
S2.2, applying to the feature map P2' a downsampling operation with a 1x1 convolution kernel and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; then inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P3 to obtain the feature map P3';
S2.3, applying to the feature map P3' a downsampling operation with a 1x1 convolution kernel and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; then inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P4 to obtain the feature map P4'.
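At shape level, each of steps S2.1-S2.3 is the same operation: a 1x1, stride-2 downsampling that halves height/width and doubles channels, a ReLU, and then an element-wise addition that requires exact shape agreement. A schematic sketch with hypothetical ResNet50-style shapes (the sizes are assumptions for illustration):

```python
def dtfpn_fuse(p_prime, p_next):
    """One DTFPN fusion step at shape level: downsample Pi' with a 1x1 conv
    of stride 2 (halving h/w, doubling channels), pass through ReLU, then
    add element-wise to P(i+1); the addition requires matching shapes."""
    c, h, w = p_prime
    down = (c * 2, h // 2, w // 2)  # 1x1 conv, stride 2, doubled channels
    assert down == p_next, "downsampled Pi' must match P(i+1) for element-wise add"
    return p_next                   # P(i+1)' keeps P(i+1)'s shape

p1 = (256, 128, 128)
p2 = dtfpn_fuse(p1, (512, 64, 64))    # P2'
p3 = dtfpn_fuse(p2, (1024, 32, 32))   # P3'
p4 = dtfpn_fuse(p3, (2048, 16, 16))   # P4'
assert p4 == (2048, 16, 16)
```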
Preferably, in step S3, the feature maps P1, P2', P3' are passed through the conv3_x residual network module 3, the conv4_x residual network module 4 and the conv5_x residual network module 5 of the residual network ResNet50 in step S1 to obtain the feature maps P2, P3 and P4, respectively.
Preferably, the cross-layer residual network module in step S3 spans a plurality of residual layers of a residual network module of the residual network ResNet50.
The invention has the beneficial effects that:
(1) The invention builds a bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN) on the feature maps output by each residual network module of the residual network ResNet50, supplementing high-level feature map information with low-level texture detail. This effectively remedies the defect that high-level features in the Feature Pyramid Network (FPN) lack low-level detail texture information and achieves efficient fusion of feature map information across levels.
(2) On the basis of the bottom-up feature pyramid network (DTFPN) of point (1), a cross-layer residual network (Cross-layer ResNet50) is built on ResNet50, further reducing ResNet50's network degradation problem, extracting deeper features by mixing features of different levels, and remarkably enhancing ResNet50's feature extraction capability.
Compared with a Faster R-CNN (Faster Region-based Convolutional Neural Network) based on the ResNet50-FPN feature extraction network, a Faster R-CNN based on the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) proposed by the invention improves the average target detection accuracy AP(0.5-0.95) on the KITTI dataset by 3.8%, while network inference speed remains almost unchanged.
Drawings
Fig. 1 is a general network frame diagram of a technical solution according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing a detail of a bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN) according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a Cross-layer residual structure of a Cross-layer residual network (Cross-layer res net 50) according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
The invention provides an image feature extraction method based on a cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN), which comprises a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a Feature Pyramid Network (FPN), where the Cross-layer ResNet50 contains a completely new bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN). The realization of the invention is divided into the following steps. S1: the feature maps output by the original image through the convolutional network module 1 (conv1), residual network module 2 (conv2_x), residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of the ResNet50 backbone are defined as P0, P1 and P1' (P1 = P1'), P2, P3 and P4 respectively. S2: the inputs of this step are the feature maps P1', P2, P3 and P4 output by step S1; P1' is downsampled and fused with P2 to obtain P2', P2' is downsampled and fused with P3 to obtain P3', and P3' is downsampled and fused with P4 to obtain P4', thereby constructing the bottom-up feature pyramid network (DTFPN); this step outputs the feature maps P1', P2', P3' and P4'. S3: the inputs of this step are the feature map P1 output by step S1 and the feature maps P2', P3' output by step S2.
The feature maps P1, P2' and P3' are passed through the residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of ResNet50 to obtain the feature maps P2, P3 and P4 respectively, so that P1', P2' and their intermediate network form a cross-layer residual network module (spanning multiple residual layers of a ResNet50 residual network module), P2', P3' and their intermediate network form a cross-layer residual network module, and P3', P4' and their intermediate network form a cross-layer residual network module, thereby realizing the cross-layer residual network (Cross-layer ResNet50). S4: the inputs of this step are the feature maps P1', P2', P3', P4' output by step S2. They are input into the Feature Pyramid Network (FPN), with which the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) is established; after processing by the FPN, the output feature maps P1''', P2''', P3''', P4''', P5''' are obtained. By designing a new bottom-up feature pyramid network and a cross-layer residual structure, the invention forms a new feature extraction network that, at a feature extraction speed comparable to the original network, further reduces ResNet50's network degradation problem, extracts deeper features by mixing features of different levels, remedies the defect that high-level features in the FPN lack low-level detail information, and achieves efficient fusion of feature map information across levels.
The invention is applied to tasks such as image target detection and semantic segmentation and the like and is excellent in performance.
Fig. 1 is the overall network framework of the technical solution of this embodiment. A cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) is constructed, comprising a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a Feature Pyramid Network (FPN), where the Cross-layer ResNet50 contains a completely new bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN). The detailed steps for building the whole feature extraction network are as follows:
As shown in attached Table 1, the feature extraction part of ResNet50 consists of the convolutional network module 1 (conv1), residual network module 2 (conv2_x), residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x), where conv2_x and each subsequent residual network module consist of multiple residual layer structures. The original RGB color image is input into ResNet50 for preliminary feature extraction, and conv1, conv2_x, conv3_x, conv4_x, conv5_x are defined to output the feature maps P0, P1 (P1 = P1'), P2, P3, P4 respectively. The width and height of the feature map Pi+1 (i = 0, 1, 2, 3) are 1/2 of those of the feature map Pi, and the number of channels of Pi+1 is 2 times that of Pi.
Attached Table 1: network architecture of the ResNet50 feature extraction part in this example
S2: the inputs of this step are the feature maps P1', P2, P3 and P4 output by step S1. P1' is downsampled and fused with P2 to obtain P2', P2' is downsampled and fused with P3 to obtain P3', and P3' is downsampled and fused with P4 to obtain P4', thereby constructing the bottom-up feature pyramid network (DTFPN). The output of this step is the feature maps P1', P2', P3', P4'. The details of downsampling and fusion are shown in Fig. 2 and further described below:
S2.1: the inputs of this step are the feature maps P1', P2 output by step S1. A downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to P1', reducing its width and height by 1/2 and doubling its channel count, so that the downsampled P1' has the same size as P2. The downsampled feature map is then passed through a rectified linear unit (ReLU) to adjust the distribution of the feature map data, and the result is added to P2 to obtain P2'. This step outputs the feature map P2'.
S2.2: the inputs of this step are the feature map P3 output by step S1 and the feature map P2' output by step S2.1. The same 1x1, stride-2 downsampling is applied to P2', halving its width and height and doubling its channels so that it matches P3; after the ReLU, the result is added to P3 to obtain P3'. This step outputs the feature map P3'.
S2.3: the inputs of this step are the feature map P4 output by step S1 and the feature map P3' output by step S2.2. The same 1x1, stride-2 downsampling is applied to P3', halving its width and height and doubling its channels so that it matches P4; after the ReLU, the result is added to P4 to obtain P4'. This step outputs the feature map P4'.
S3: the inputs of this step are the feature map P1 output by step S1 and the feature maps P2', P3' output by step S2. The feature maps P1, P2', P3' are passed through the residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of ResNet50 in step S1 to obtain the feature maps P2, P3 and P4 respectively. As shown in Fig. 3, the feature maps P1', P2' and their intermediate network form a cross-layer residual network module (spanning multiple residual layers of a ResNet50 residual network module), the feature maps P2', P3' and their intermediate network form a cross-layer residual network module, and the feature maps P3', P4' and their intermediate network form a cross-layer residual network module, thereby realizing the cross-layer residual network (Cross-layer ResNet50) based on ResNet50.
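Functionally, each cross-layer residual module above adds a projected shortcut to the output of an entire backbone stage, so the skip spans every residual layer inside that stage rather than a single one. A toy scalar sketch (the stage and projection functions below are stand-ins for illustration, not the patent's actual layers):

```python
def stage(x: float) -> float:
    # Stand-in for a whole ResNet stage (e.g. conv3_x): some learned mapping.
    return 3.0 * x + 1.0

def project(x: float) -> float:
    # Stand-in for the 1x1, stride-2 downsampling shortcut that reshapes
    # the fused map to match the stage output.
    return 2.0 * x

def cross_layer_block(x: float) -> float:
    # The shortcut spans the entire stage, not a single residual layer.
    return stage(x) + project(x)

assert cross_layer_block(2.0) == (3.0 * 2.0 + 1.0) + (2.0 * 2.0)  # 11.0
```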
S4: the inputs of this step are the feature maps P1', P2', P3', P4' output by step S2. They are input into the Feature Pyramid Network (FPN), with which the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) is established. After processing by the FPN, the output feature maps P1''', P2''', P3''', P4''', P5''' are obtained, completing all steps of extracting feature maps from the original input image with the Cross-layer residual Bi-FPN.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (3)
1. The image feature extraction method based on the cross-layer residual double-path pyramid network is characterized by comprising the following steps of:
Step S1, inputting an original RGB color image into the residual network ResNet50 for preliminary feature extraction, wherein the conv1 convolutional network module 1 of ResNet50 outputs a feature map P0, the conv2_x residual network module 2 outputs feature maps P1, P1' (P1 = P1'), the conv3_x residual network module 3 outputs a feature map P2, the conv4_x residual network module 4 outputs a feature map P3, and the conv5_x residual network module 5 outputs a feature map P4;
step S2, down-sampling the feature map P1' and then fusing it with the feature map P2 to obtain a feature map P2'; down-sampling the feature map P2' and then fusing it with the feature map P3 to obtain a feature map P3'; down-sampling the feature map P3' and then fusing it with the feature map P4 to obtain a feature map P4'; thereby obtaining a bottom-up feature pyramid network DTFPN;
step S3, the inputs to this step are the feature map P1' output by step S1 and the feature maps P2', P3' output by step S2; the feature maps P1', P2', P3' are respectively calculated by the residual network module 3, the residual network module 4, and the residual network module 5 of the residual network ResNet50 in step S1 to obtain feature maps P2, P3, and P4; the feature maps P1', P2' and their intermediate network form a cross-layer residual network module, wherein the intermediate network spans a plurality of residual layers within the residual network modules of the residual network ResNet50; the feature maps P2', P3' and their intermediate network form a cross-layer residual network module; the feature maps P3', P4' and their intermediate network form a cross-layer residual network module; thereby realizing a cross-layer residual network based on the residual network ResNet50;
step S4, inputting the feature maps P1', P2', P3', P4' into a feature pyramid network FPN, and establishing a cross-layer residual double-path pyramid network with the feature pyramid network FPN; the feature maps P1', P2', P3', P4' are processed by the feature pyramid network FPN to obtain output feature maps P1''', P2''', P3''', P4''', P5'''.
2. The image feature extraction method based on the cross-layer residual double-path pyramid network according to claim 1, wherein: in the step S1, the width and height of the feature map Pi+1 are 1/2 of those of the feature map Pi, and the number of channels of the feature map Pi+1 is 2 times the number of channels of the feature map Pi, where i = 0, 1, 2, 3.
3. The image feature extraction method based on the cross-layer residual double-path pyramid network according to claim 1, wherein the step S2 comprises:
step S2.1, applying to the feature map P1' a downsampling operation with a convolution kernel size of 1x1 and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P2 to obtain the feature map P2';
step S2.2, applying to the feature map P2' a downsampling operation with a convolution kernel size of 1x1 and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P3 to obtain the feature map P3';
step S2.3, applying to the feature map P3' a downsampling operation with a convolution kernel size of 1x1 and a stride of 2, reducing the width and height by 1/2 and doubling the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P4 to obtain the feature map P4'.
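The fusion operation claimed in steps S2.1-S2.3 can be sketched as a 1x1 stride-2 convolution followed by ReLU and element-wise addition. This NumPy sketch uses random weights and assumed shapes (a 224x224 input); `conv1x1_s2` and `fuse` are illustrative names, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1_s2(x, w):
    """1x1 convolution with stride 2: stride-2 spatial subsampling
    followed by a per-pixel channel projection. w: (c_out, c_in)."""
    xs = x[:, :, ::2, ::2]
    return np.einsum("oc,nchw->nohw", w, xs)

def fuse(p_prev, p_next):
    """Steps S2.1-S2.3: downsample Pi' (1x1, stride 2, channels doubled),
    pass through a rectified linear unit, then add to P(i+1)."""
    c_in = p_prev.shape[1]
    w = rng.standard_normal((2 * c_in, c_in)) * 0.01  # random stand-in weights
    return np.maximum(conv1x1_s2(p_prev, w), 0.0) + p_next

# Step S2.1 on assumed shapes: P1' (256 ch, 56x56) fused with P2 (512 ch, 28x28).
p1p = rng.standard_normal((1, 256, 56, 56))
p2 = rng.standard_normal((1, 512, 28, 28))
p2p = fuse(p1p, p2)
print(p2p.shape)
```

Because the ReLU output is non-negative, the fused map P2' is an element-wise shift of P2 upwards wherever the downsampled P1' activates, which is the sense in which the shallow features are injected into the deeper level.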
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111002973.5A CN113837199B (en) | 2021-08-30 | 2021-08-30 | Image feature extraction method based on cross-layer residual double-path pyramid network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113837199A CN113837199A (en) | 2021-12-24 |
CN113837199B true CN113837199B (en) | 2024-01-09 |
Family
ID=78961539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111002973.5A Active CN113837199B (en) | 2021-08-30 | 2021-08-30 | Image feature extraction method based on cross-layer residual double-path pyramid network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113837199B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100473A (en) * | 2022-06-29 | 2022-09-23 | 武汉兰丁智能医学股份有限公司 | Lung cell image classification method based on parallel neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339893A (en) * | 2020-02-21 | 2020-06-26 | 哈尔滨工业大学 | Pipeline detection system and method based on deep learning and unmanned aerial vehicle |
CN111507359A (en) * | 2020-03-09 | 2020-08-07 | 杭州电子科技大学 | Self-adaptive weighting fusion method of image feature pyramid |
CN111753677A (en) * | 2020-06-10 | 2020-10-09 | 杭州电子科技大学 | Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure |
CN112163449A (en) * | 2020-08-21 | 2021-01-01 | 同济大学 | Lightweight multi-branch feature cross-layer fusion image semantic segmentation method |
CN112507861A (en) * | 2020-12-04 | 2021-03-16 | 江苏科技大学 | Pedestrian detection method based on multilayer convolution feature fusion |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108428229B (en) * | 2018-03-14 | 2020-06-16 | 大连理工大学 | Lung texture recognition method based on appearance and geometric features extracted by deep neural network |
CN110136136B (en) * | 2019-05-27 | 2022-02-08 | 北京达佳互联信息技术有限公司 | Scene segmentation method and device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
Convolutional networks with cross-layer neurons for image recognition;Zeng Yu et al.;《Information Sciences》;pp. 241-254 *
Research on domain knowledge-driven deep learning for single image deraining;Fu Xueyang;《China Doctoral Dissertations Full-text Database, Information Science and Technology》(No. 12);pp. 1-109 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242288B (en) | Multi-scale parallel deep neural network model construction method for lesion image segmentation | |
WO2022001623A1 (en) | Image processing method and apparatus based on artificial intelligence, and device and storage medium | |
WO2022252272A1 (en) | Transfer learning-based method for improved vgg16 network pig identity recognition | |
CN108304826A (en) | Facial expression recognizing method based on convolutional neural networks | |
CN109949255A (en) | Image rebuilding method and equipment | |
CN113807355A (en) | Image semantic segmentation method based on coding and decoding structure | |
CN108846444A (en) | The multistage depth migration learning method excavated towards multi-source data | |
CN113159073A (en) | Knowledge distillation method and device, storage medium and terminal | |
CN111860528B (en) | Image segmentation model based on improved U-Net network and training method | |
CN113706545A (en) | Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction | |
CN107563430A (en) | A kind of convolutional neural networks algorithm optimization method based on sparse autocoder and gray scale correlation fractal dimension | |
Zhang et al. | Channel-wise and feature-points reweights densenet for image classification | |
CN113837199B (en) | Image feature extraction method based on cross-layer residual double-path pyramid network | |
CN111310820A (en) | Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration | |
Dong et al. | Field-matching attention network for object detection | |
CN116912253B (en) | Lung cancer pathological image classification method based on multi-scale mixed neural network | |
CN117152438A (en) | Lightweight street view image semantic segmentation method based on improved deep LabV3+ network | |
CN114494284B (en) | Scene analysis model and method based on explicit supervision area relation | |
CN114332491A (en) | Saliency target detection algorithm based on feature reconstruction | |
CN113052810B (en) | Small medical image focus segmentation method suitable for mobile application | |
Fan et al. | EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation | |
CN113269702A (en) | Low-exposure vein image enhancement method based on cross-scale feature fusion | |
CN113724266A (en) | Glioma segmentation method and system | |
CN117456286B (en) | Ginseng grading method, device and equipment | |
Wu et al. | Lightweight stepless super-resolution of remote sensing images via saliency-aware dynamic routing strategy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||