CN113837199B - Image feature extraction method based on cross-layer residual double-path pyramid network - Google Patents

Image feature extraction method based on cross-layer residual double-path pyramid network

Info

Publication number
CN113837199B
CN113837199B (application CN202111002973.5A)
Authority
CN
China
Prior art keywords
feature
network
feature map
residual
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111002973.5A
Other languages
Chinese (zh)
Other versions
CN113837199A (en)
Inventor
胡杰
谢礼浩
安永鹏
熊宗权
徐文才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN202111002973.5A
Publication of CN113837199A
Application granted
Publication of CN113837199B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image feature extraction method based on a cross-layer residual double-path pyramid network. An original RGB color image is input into the residual network ResNet50 for preliminary feature extraction, and a bottom-up feature pyramid network DTFPN is constructed; a cross-layer residual network is then realized on the basis of the residual network ResNet50; finally, the output feature maps P1''', P2''', P3''', P4''' and P5''' are obtained after processing by the feature pyramid network FPN. The invention further alleviates the network degradation problem of the residual network ResNet50, mixes and utilizes features of different levels to further extract deep features, and thereby significantly enhances the feature extraction capability of the residual network ResNet50. It also overcomes the defect that high-level features in the feature pyramid network (FPN) lack low-level detail texture information, and realizes efficient fusion of feature map information at each level.

Description

Image feature extraction method based on cross-layer residual double-path pyramid network
Technical Field
The invention relates to the fields of computer vision, artificial intelligence, pattern recognition and the like, and in particular to an image feature extraction method based on a cross-layer residual double-path pyramid network.
Background
With the development of artificial intelligence, convolutional neural networks have become the dominant method for extracting image features. Well-known feature extraction networks include LeNet-5, AlexNet, the Visual Geometry Group network (VGG), GoogLeNet and the residual network (ResNet).
LeNet-5, proposed in 1994, was one of the earliest convolutional neural networks and drove the development of deep learning. It consists of two convolutional layers, two pooling layers and two fully connected layers; the convolutions use 5x5 kernels with a stride of 1, and max-pooling is used for downsampling.
AlexNet won the ImageNet competition in 2012 and can be regarded as a deeper and wider version of LeNet. It contains 630 million connections, 60 million parameters and 650,000 neurons, with 5 convolutional layers, 3 of which are followed by max-pooling layers, and finally 3 fully connected layers. AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a significant margin, reducing the top-5 error rate from the previous 25.8% to 16.4%. The main technical points of AlexNet are: (1) Rectified linear units (ReLU) are used as the activation function of the convolutional neural network (CNN), which solves the gradient vanishing problem of the sigmoid function in deep networks. (2) Dropout is used during training to randomly ignore a portion of the neurons and avoid overfitting of the model. (3) Overlapping max-pooling is used in the CNN, with a stride smaller than the pooling kernel, so that the outputs overlap, which improves the richness of the features. Previously, CNNs commonly used average pooling, whereas AlexNet uses max-pooling throughout, avoiding the blurring effect of average pooling. (4) Data augmentation is used to reduce overfitting and improve the generalization ability of the model.
The Visual Geometry Group network (VGG) was the first network to use smaller 3x3 convolution kernels in each convolutional layer and to combine them into convolution sequences; it is characterized by a large number of successive convolutions and a large computational cost. An important contribution of VGG is showing that the effect of a larger receptive field can be obtained by stacking several 3x3 convolutions in sequence. The VGG models indicate that depth is beneficial for improving classification accuracy, and another important idea is that convolution can replace full connection. The network has about 140 million parameters in total; its main characteristic is that the first fully connected layer is replaced by convolution, which reduces the number of parameters without loss of accuracy.
GoogLeNet, the first Inception architecture, appeared in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 and won first place by a large margin. The Inception network used in that competition is generally called Inception V1; its biggest feature is that it achieves very good classification performance while keeping the amount of computation and the number of parameters under control, with a top-5 error rate of 6.67%, less than half that of AlexNet. Inception V1 is 22 layers deep, deeper than the 8 layers of AlexNet or the 19 layers of VGG. However, it requires only about 1.5 billion floating point operations and only 5 million parameters, about 1/12 of AlexNet's 60 million, while achieving accuracy far superior to AlexNet, making it a very practical and excellent model. Versions V2, V3 and V4 were subsequently derived from Inception V1.
The residual network ResNet was proposed in 2015 and won first place in the ImageNet classification task. Because it is simple and practical, many later methods are built on top of ResNet50 or ResNet101. ResNet introduces the residual structure and, together with batch normalization (Batch Normalization), effectively solves the problems of gradient vanishing or gradient explosion and network degradation in deep networks, so that the performance of very deep ResNets is greatly improved compared with earlier feature extraction networks, and excellent results have been obtained in fields such as image detection, image classification and image segmentation.
The feature pyramid network (FPN) constructs a feature pyramid that can be trained end to end; it upsamples the high-level features extracted by the feature extraction network and fuses them with the low-level features, thereby enriching the semantic information of the low-level features. For small objects, the FPN increases the resolution of the feature maps, i.e. it operates on larger feature maps and thus obtains more information about small objects.
Inspired by the residual network ResNet, the feature maps output by residual network modules can themselves be connected by residual connections to form a cross-layer residual network module (one that spans multiple residual layers of the original residual network modules). That is, assuming the input of a residual network module is x and the expected output is H(x), if we pass the input x directly to the output as an initial result, the target that the module needs to learn becomes F(x) = H(x) - x; this changes the learning target of the module, and learning F(x) is much easier than learning H(x). Optimizing the ResNet structure again in this way to form a cross-layer residual network can further reduce the degradation problem of ResNet and allow features of different levels to be mixed and used to further extract deep features. In the feature pyramid network (FPN), high-level features are fused into low-level features; although this greatly enriches the information of the low-level features, the information of the high-level features is not improved. The high-level features also need to be supplemented with low-level texture information, so the feature fusion is insufficient and the network performance is limited.
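The following is a minimal PyTorch-style sketch of this idea, not taken from the patent itself: a stack of residual layers is wrapped so that the whole stack learns F(x) = H(x) - x, with an optional projection on the skip path when the spatial size or channel count changes. The class and argument names are illustrative assumptions.

```python
import torch.nn as nn

class CrossLayerResidual(nn.Module):
    """Wrap a stack of residual layers so the whole stack learns
    F(x) = H(x) - x: the input is carried across the stack and added
    back to its output (a projection matches shapes when needed)."""
    def __init__(self, layers: nn.Module, projection: nn.Module = None):
        super().__init__()
        self.layers = layers          # e.g. an entire conv3_x-style stage
        self.projection = projection  # e.g. a 1x1 stride-2 conv if shapes differ

    def forward(self, x):
        identity = x if self.projection is None else self.projection(x)
        return self.layers(x) + identity  # H(x) = F(x) + x
```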
Summary of the invention
To overcome the shortcomings of the prior art, the invention provides an image feature extraction method based on a cross-layer residual double-path pyramid network, which achieves two goals while keeping a feature extraction speed comparable to the original network: (1) Further alleviating the network degradation problem of the residual network ResNet50 and mixing and utilizing features of different levels to further extract deep features, thereby significantly enhancing the feature extraction capability of ResNet50. (2) Overcoming the defect that high-level features in the feature pyramid network (FPN) lack low-level detail texture information, and realizing efficient fusion of feature map information at each level.
In order to solve the technical problems, the invention adopts the following technical scheme:
an image feature extraction method based on a cross-layer residual double-path pyramid network comprises the following steps:
step S1, the original RGB color image is input into the residual network ResNet50 for preliminary feature extraction; the conv1 convolutional network module 1 of ResNet50 outputs the feature map P0, the conv2_x residual network module 2 outputs the feature maps P1 and P1' (P1 = P1'), the conv3_x residual network module 3 outputs the feature map P2, the conv4_x residual network module 4 outputs the feature map P3, and the conv5_x residual network module 5 outputs the feature map P4;
step S2, the feature map P1' is downsampled and then fused with the feature map P2 to obtain the feature map P2', the feature map P2' is downsampled and then fused with the feature map P3 to obtain the feature map P3', and the feature map P3' is downsampled and then fused with the feature map P4 to obtain the feature map P4', thereby constructing the bottom-up feature pyramid network DTFPN;
step S3, the feature maps P1', P2' and the intermediate network between them form a cross-layer residual network module (spanning a plurality of residual layers of the residual network ResNet50); the feature maps P2', P3' and their intermediate network form a cross-layer residual network module; the feature maps P3', P4' and their intermediate network form a cross-layer residual network module; a cross-layer residual network based on the residual network ResNet50 is thus realized;
step S4, the feature maps P1', P2', P3', P4' are input to the feature pyramid network FPN, and together with the FPN a cross-layer residual double-path pyramid network is established; the feature maps P1', P2', P3', P4' are processed by the feature pyramid network FPN to obtain the output feature maps P1''', P2''', P3''', P4''' and P5'''.
Preferably, in step S1, the width and height of the feature map P(i+1) are 1/2 those of the feature map Pi, and the number of channels of P(i+1) is 2 times that of Pi, where i = 0, 1, 2, 3.
Preferably, step 2 includes:
S2.1, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P1', halving its width and height and doubling its number of channels; the downsampled feature map of P1' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P2 to obtain the feature map P2';
S2.2, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P2', halving its width and height and doubling its number of channels; the downsampled feature map of P2' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P3 to obtain the feature map P3';
S2.3, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P3', halving its width and height and doubling its number of channels; the downsampled feature map of P3' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P4 to obtain the feature map P4'.
Preferably, in step 3, the feature maps P1, P2' and P3' are passed through the conv3_x residual network module 3, the conv4_x residual network module 4 and the conv5_x residual network module 5 of the residual network ResNet50 of step S1 to obtain the feature maps P2, P3 and P4, respectively.
Preferably, the cross-layer residual network module in step 3 refers to a module that spans a plurality of residual layers of the residual network ResNet50.
The invention has the beneficial effects that:
(1) The invention builds a bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN) on the feature maps output by each residual network module of the residual network ResNet50, so that low-level texture detail information supplements the high-level feature map information. This effectively overcomes the defect that high-level features in the feature pyramid network (Feature Pyramid Network, FPN) lack low-level detail texture information and realizes efficient fusion of feature map information at each level.
(2) On the basis of the bottom-up feature pyramid network (DTFPN) in (1), a cross-layer residual network (Cross-layer ResNet50) based on the residual network ResNet50 is built, which further reduces the network degradation problem of ResNet50, mixes and utilizes features of different levels to further extract deep features, and significantly enhances the feature extraction capability of ResNet50.
Compared with a Faster Region-based Convolutional Neural Network (Faster R-CNN) built on the ResNet50-FPN feature extraction network, a Faster R-CNN built on the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) provided by the invention improves the average target detection accuracy AP(0.5-0.95) on the KITTI dataset by 3.8%, while the network inference speed remains almost unchanged.
Drawings
Fig. 1 is the overall network framework diagram of the technical solution according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing a detail of a bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN) according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a Cross-layer residual structure of a Cross-layer residual network (Cross-layer res net 50) according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
The invention provides an image feature extraction method based on a cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN), which comprises a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a feature pyramid network (FPN), where the cross-layer residual network contains a completely new bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN). The implementation of the invention is divided into the following steps. S1, the feature maps output by the original image through the convolutional network module 1 (conv1), residual network module 2 (conv2_x), residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of the ResNet50 backbone are defined as P0, P1 and P1' (P1 = P1'), P2, P3 and P4, respectively. S2, the inputs of this step are the feature maps P1', P2, P3 and P4 output by step S1; the feature map P1' is downsampled and fused with P2 to obtain P2', P2' is downsampled and fused with P3 to obtain P3', and P3' is downsampled and fused with P4 to obtain P4', thereby constructing the bottom-up feature pyramid network (DTFPN); this step outputs the feature maps P1', P2', P3' and P4'. S3, the inputs of this step are the feature map P1 output by step S1 and the feature maps P2', P3' output by step S2; the feature maps P1, P2' and P3' are passed through residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of ResNet50 to obtain the feature maps P2, P3 and P4, respectively, so that P1', P2' and their intermediate network form a cross-layer residual network module (spanning a plurality of residual layers of ResNet50), P2', P3' and their intermediate network form a cross-layer residual network module, and P3', P4' and their intermediate network form a cross-layer residual network module, thus realizing the cross-layer residual network (Cross-layer ResNet50). S4, the inputs of this step are the feature maps P1', P2', P3', P4' output by step S2; these feature maps are input into the feature pyramid network (FPN), establishing the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) together with the FPN; the feature maps P1', P2', P3', P4' are processed by the FPN to output the feature maps P1''', P2''', P3''', P4''' and P5'''. By designing a new bottom-up feature pyramid network and a cross-layer residual structure, the invention forms a new feature extraction network that, while keeping a feature extraction speed comparable to the original network, further reduces the network degradation problem of ResNet50, mixes and utilizes features of different levels to further extract deep features, overcomes the defect that high-level features in the FPN lack low-level detail information, and realizes efficient fusion of feature map information at each level.
The invention can be applied to tasks such as image target detection and semantic segmentation, where it performs excellently.
Figure 1 is the overall network framework diagram of the technical solution of this embodiment. The constructed cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) comprises a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a feature pyramid network (Feature Pyramid Network, FPN), where the cross-layer residual network contains a completely new bottom-up feature pyramid network (Down to Top Feature Pyramid Network, DTFPN). The detailed steps for building the whole feature extraction network are as follows:
S1, as shown in the attached Table 1, the feature extraction part of the residual network ResNet50 consists of convolutional network module 1 (conv1), residual network module 2 (conv2_x), residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x), where conv2_x and each subsequent residual network module is composed of several residual layer structures. The original RGB color image is input into ResNet50 for preliminary feature extraction, and the outputs of conv1, conv2_x, conv3_x, conv4_x, conv5_x are defined as the feature maps P0, P1 and P1' (P1 = P1'), P2, P3, P4, respectively. The width and height of the feature map P(i+1) are 1/2 those of the feature map Pi, and the number of channels of P(i+1) is 2 times that of Pi (i = 0, 1, 2, 3).
Table 1: Network architecture of the ResNet50 feature extraction part in this example
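As an illustration of step S1 only, the sketch below extracts the five feature maps P0-P4 from a standard torchvision ResNet50; it assumes the torchvision package and uses its module names (conv1, layer1-layer4) to stand in for conv1 and conv2_x-conv5_x. It is a minimal sketch under those assumptions, not the patent's implementation.

```python
import torch
from torchvision.models import resnet50

net = resnet50(weights=None)  # the backbone; pretrained weights are optional

def extract_backbone_features(x, net):
    # conv1 (7x7, stride 2) + BN + ReLU -> P0
    p0 = net.relu(net.bn1(net.conv1(x)))
    # conv2_x: 3x3 max-pool (stride 2) followed by layer1 -> P1 (= P1')
    p1 = net.layer1(net.maxpool(p0))
    p2 = net.layer2(p1)   # conv3_x -> P2
    p3 = net.layer3(p2)   # conv4_x -> P3
    p4 = net.layer4(p3)   # conv5_x -> P4
    return p0, p1, p2, p3, p4

# Example: a 3x800x800 RGB image gives P1..P4 with 256/512/1024/2048 channels.
feats = extract_backbone_features(torch.randn(1, 3, 800, 800), net)
```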
S2, the inputs of this step are the feature maps P1', P2, P3 and P4 output by step S1. The feature map P1' is downsampled and then fused with the feature map P2 to obtain the feature map P2', the feature map P2' is downsampled and then fused with the feature map P3 to obtain the feature map P3', and the feature map P3' is downsampled and then fused with the feature map P4 to obtain the feature map P4', thereby constructing the bottom-up feature pyramid network (DTFPN). The outputs of this step are the feature maps P1', P2', P3', P4'. The details of the downsampling and fusion are shown in Figure 3 and are further described below in conjunction with Figure 3; a minimal code sketch of one fusion step is given after step S2.3:
S2.1, the inputs of this step are the feature maps P1' and P2 output by step S1. A downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P1', which halves its width and height and doubles its number of channels, ensuring that the downsampled P1' has the same size as the feature map P2. The downsampled feature map of P1' is then input to a rectified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P2 to obtain the feature map P2'. This step outputs the feature map P2'.
S2.2, the inputs of this step are the feature map P3 output by step S1 and the feature map P2' output by step S2.1. A downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P2', which halves its width and height and doubles its number of channels, ensuring that the downsampled P2' has the same size as the feature map P3. The downsampled feature map of P2' is then input to a rectified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P3 to obtain the feature map P3'. This step outputs the feature map P3'.
S2.3, the inputs of this step are the feature map P4 output by step S1 and the feature map P3' output by step S2.2. A downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P3', which halves its width and height and doubles its number of channels, ensuring that the downsampled P3' has the same size as the feature map P4. The downsampled feature map of P3' is then input to a rectified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P4 to obtain the feature map P4'. This step outputs the feature map P4'.
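Below is a minimal PyTorch-style sketch of one DTFPN fusion step as described in S2.1-S2.3 (1x1 stride-2 convolution, ReLU, element-wise addition). The class and variable names are illustrative assumptions, not the patent's code.

```python
import torch.nn as nn

class DownsampleFuse(nn.Module):
    """One bottom-up DTFPN fusion step: a 1x1 stride-2 convolution halves the
    width and height and doubles the channels of the lower-level map, a ReLU
    adjusts the distribution, and the result is added to the next-level map."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.down = nn.Conv2d(in_channels, in_channels * 2, kernel_size=1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, p_low, p_high):
        # p_low: P_i' (larger, half the channels); p_high: P_(i+1) from the backbone
        return self.relu(self.down(p_low)) + p_high

# Chaining the three fusion steps of step S2 (channel widths follow ResNet50):
# p2p = DownsampleFuse(256)(p1, p2)    # S2.1 -> P2'
# p3p = DownsampleFuse(512)(p2p, p3)   # S2.2 -> P3'
# p4p = DownsampleFuse(1024)(p3p, p4)  # S2.3 -> P4'
```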
S3, the inputs of this step are the feature map P1 output by step S1 and the feature maps P2', P3' output by step S2. The feature maps P1, P2' and P3' are passed through residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of the residual network ResNet50 of step S1 to obtain the feature maps P2, P3 and P4, respectively. As shown in Figure 3, the feature maps P1', P2' and their intermediate network form a cross-layer residual network module (spanning a plurality of residual layers of the residual network ResNet50), the feature maps P2', P3' and their intermediate network form a cross-layer residual network module, and the feature maps P3', P4' and their intermediate network form a cross-layer residual network module, thereby realizing the cross-layer residual network (Cross-layer ResNet50) based on the residual network ResNet50.
S4, the inputs of this step are the feature maps P1', P2', P3', P4' output by step S2. These feature maps are input into the feature pyramid network (FPN), establishing the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) together with the FPN. The feature maps P1', P2', P3', P4' are processed by the FPN to output the feature maps P1''', P2''', P3''', P4''' and P5'''. This completes all the steps by which the original image is input into the cross-layer residual double-path pyramid network (Cross-layer residual Bi-FPN) to extract feature maps.
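For step S4, the fused maps P1'-P4' can be fed to a standard FPN. The sketch below uses torchvision's generic FeaturePyramidNetwork with a LastLevelMaxPool extra block to produce the fifth output map; this is a stand-in under the assumption that a standard FPN implementation is acceptable here, not the patent's own FPN, and the tensor shapes assume an 800x800 input.

```python
from collections import OrderedDict
import torch
from torchvision.ops import FeaturePyramidNetwork
from torchvision.ops.feature_pyramid_network import LastLevelMaxPool

# Dummy tensors standing in for the fused maps P1'..P4' (800x800 input assumed).
p1p = torch.randn(1, 256, 200, 200)
p2p = torch.randn(1, 512, 100, 100)
p3p = torch.randn(1, 1024, 50, 50)
p4p = torch.randn(1, 2048, 25, 25)

fpn = FeaturePyramidNetwork(
    in_channels_list=[256, 512, 1024, 2048],  # channel widths of P1'..P4'
    out_channels=256,                          # common FPN output width
    extra_blocks=LastLevelMaxPool(),           # adds the extra, coarsest map
)

outs = fpn(OrderedDict([("p1", p1p), ("p2", p2p), ("p3", p3p), ("p4", p4p)]))
# outs holds five maps, corresponding to P1''', P2''', P3''', P4''' and P5'''.
```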
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.

Claims (3)

1. An image feature extraction method based on a cross-layer residual double-path pyramid network, characterized by comprising the following steps:
step S1, an original RGB color image is input into the residual network ResNet50 for preliminary feature extraction, wherein the conv1 convolutional network module 1 of ResNet50 outputs the feature map P0, the conv2_x residual network module 2 outputs the feature maps P1 and P1' (P1 = P1'), the conv3_x residual network module 3 outputs the feature map P2, the conv4_x residual network module 4 outputs the feature map P3, and the conv5_x residual network module 5 outputs the feature map P4;
step S2, the feature map P1' is downsampled and then fused with the feature map P2 to obtain the feature map P2', the feature map P2' is downsampled and then fused with the feature map P3 to obtain the feature map P3', and the feature map P3' is downsampled and then fused with the feature map P4 to obtain the feature map P4', thereby obtaining the bottom-up feature pyramid network DTFPN;
step S3, the feature maps P1', P2' and the intermediate network between them form a cross-layer residual network module, wherein the intermediate network consists of a plurality of residual layers of the residual network ResNet50 that the module spans; the feature maps P2', P3' and their intermediate network form a cross-layer residual network module; the inputs of this step are the feature map P1' output by step S1 and the feature maps P2', P3' output by step S2, and the feature maps P1', P2' and P3' are passed through the residual network module 3, the residual network module 4 and the residual network module 5 of the residual network ResNet50 of step S1 to obtain the feature maps P2, P3 and P4, respectively; the feature maps P3', P4' and their intermediate network form a cross-layer residual network module; a cross-layer residual network based on the residual network ResNet50 is thus realized;
step S4, the feature maps P1', P2', P3', P4' are input to the feature pyramid network FPN, and together with the FPN a cross-layer residual double-path pyramid network is established; the feature maps P1', P2', P3', P4' are processed by the feature pyramid network FPN to obtain the output feature maps P1''', P2''', P3''', P4''' and P5'''.
2. The image feature extraction method based on the cross-layer residual double-path pyramid network according to claim 1, characterized in that: in step S1, the width and height of the feature map P(i+1) are 1/2 those of the feature map Pi, and the number of channels of P(i+1) is 2 times that of Pi, where i = 0, 1, 2, 3.
3. The image feature extraction method based on the cross-layer residual double-path pyramid network according to claim 1, wherein step S2 includes:
S2.1, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P1', halving its width and height and doubling its number of channels; the downsampled feature map of P1' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P2 to obtain the feature map P2';
S2.2, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P2', halving its width and height and doubling its number of channels; the downsampled feature map of P2' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P3 to obtain the feature map P3';
S2.3, a downsampling operation with a 1x1 convolution kernel and a stride of 2 is applied to the feature map P3', halving its width and height and doubling its number of channels; the downsampled feature map of P3' is then input to a rectified linear unit to adjust the distribution of the feature map data, and the adjusted feature map is added to the feature map P4 to obtain the feature map P4'.
CN202111002973.5A 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual double-path pyramid network Active CN113837199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111002973.5A CN113837199B (en) 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual double-path pyramid network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111002973.5A CN113837199B (en) 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual double-path pyramid network

Publications (2)

Publication Number Publication Date
CN113837199A CN113837199A (en) 2021-12-24
CN113837199B (en) 2024-01-09

Family

ID=78961539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111002973.5A Active CN113837199B (en) 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual double-path pyramid network

Country Status (1)

Country Link
CN (1) CN113837199B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100473A (en) * 2022-06-29 2022-09-23 武汉兰丁智能医学股份有限公司 Lung cell image classification method based on parallel neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339893A (en) * 2020-02-21 2020-06-26 哈尔滨工业大学 Pipeline detection system and method based on deep learning and unmanned aerial vehicle
CN111507359A (en) * 2020-03-09 2020-08-07 杭州电子科技大学 Self-adaptive weighting fusion method of image feature pyramid
CN111753677A (en) * 2020-06-10 2020-10-09 杭州电子科技大学 Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112163449A (en) * 2020-08-21 2021-01-01 同济大学 Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN112507861A (en) * 2020-12-04 2021-03-16 江苏科技大学 Pedestrian detection method based on multilayer convolution feature fusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108428229B (en) * 2018-03-14 2020-06-16 大连理工大学 Lung texture recognition method based on appearance and geometric features extracted by deep neural network
CN110136136B (en) * 2019-05-27 2022-02-08 北京达佳互联信息技术有限公司 Scene segmentation method and device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339893A (en) * 2020-02-21 2020-06-26 哈尔滨工业大学 Pipeline detection system and method based on deep learning and unmanned aerial vehicle
CN111507359A (en) * 2020-03-09 2020-08-07 杭州电子科技大学 Self-adaptive weighting fusion method of image feature pyramid
CN111753677A (en) * 2020-06-10 2020-10-09 杭州电子科技大学 Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112163449A (en) * 2020-08-21 2021-01-01 同济大学 Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN112507861A (en) * 2020-12-04 2021-03-16 江苏科技大学 Pedestrian detection method based on multilayer convolution feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Convolutional networks with cross-layer neurons for image recognition; Zeng Yu et al.; Information Sciences; pp. 241-254 *
Research on domain-knowledge-driven deep learning for single image rain removal; Fu Xueyang; China Doctoral Dissertations Full-text Database, Information Science and Technology (No. 12); pp. 1-109 *

Also Published As

Publication number Publication date
CN113837199A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN111242288B (en) Multi-scale parallel deep neural network model construction method for lesion image segmentation
WO2022001623A1 (en) Image processing method and apparatus based on artificial intelligence, and device and storage medium
WO2022252272A1 (en) Transfer learning-based method for improved vgg16 network pig identity recognition
CN108304826A (en) Facial expression recognizing method based on convolutional neural networks
CN109949255A (en) Image rebuilding method and equipment
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN108846444A (en) The multistage depth migration learning method excavated towards multi-source data
CN113159073A (en) Knowledge distillation method and device, storage medium and terminal
CN111860528B (en) Image segmentation model based on improved U-Net network and training method
CN113706545A (en) Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction
CN107563430A (en) A kind of convolutional neural networks algorithm optimization method based on sparse autocoder and gray scale correlation fractal dimension
Zhang et al. Channel-wise and feature-points reweights densenet for image classification
CN113837199B (en) Image feature extraction method based on cross-layer residual double-path pyramid network
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
Dong et al. Field-matching attention network for object detection
CN116912253B (en) Lung cancer pathological image classification method based on multi-scale mixed neural network
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN114332491A (en) Saliency target detection algorithm based on feature reconstruction
CN113052810B (en) Small medical image focus segmentation method suitable for mobile application
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation
CN113269702A (en) Low-exposure vein image enhancement method based on cross-scale feature fusion
CN113724266A (en) Glioma segmentation method and system
CN117456286B (en) Ginseng grading method, device and equipment
Wu et al. Lightweight stepless super-resolution of remote sensing images via saliency-aware dynamic routing strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant