CN113837199A - Image feature extraction method based on cross-layer residual two-way pyramid network - Google Patents

Image feature extraction method based on cross-layer residual two-way pyramid network

Info

Publication number
CN113837199A
Authority
CN
China
Prior art keywords
network
feature map
feature
residual
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111002973.5A
Other languages
Chinese (zh)
Other versions
CN113837199B (en)
Inventor
胡杰
谢礼浩
安永鹏
熊宗权
徐文才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology (WUT)
Priority to CN202111002973.5A
Publication of CN113837199A
Application granted
Publication of CN113837199B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image feature extraction method based on a cross-layer residual two-way pyramid network, which comprises the steps of: inputting an original RGB color image into the residual network ResNet50 for preliminary feature extraction and building a bottom-up feature pyramid network DTFPN; implementing a cross-layer residual network on the basis of ResNet50; and obtaining the output feature maps P1″, P2″, P3″, P4″ and P5″ after processing by the feature pyramid network FPN. The invention further alleviates the network degradation problem of ResNet50 and blends features of different levels to extract deeper features, markedly enhancing the feature extraction capability of ResNet50. It overcomes the defect that high-level features in a feature pyramid network (FPN) lack low-level detail texture information, and achieves efficient fusion of feature map information across layers.

Description

Image feature extraction method based on cross-layer residual two-way pyramid network
Technical Field
The invention relates to the fields of computer vision, artificial intelligence, pattern recognition and the like, in particular to an image feature extraction method based on a cross-layer residual two-way pyramid network.
Background
With the development of artificial intelligence, convolutional neural networks have become the main method for extracting image features; well-known feature extraction networks include LeNet5, AlexNet, the Visual Geometry Group network (VGG), GoogLeNet, the residual network (ResNet), and others.
LeNet5, born in 1994, is one of the earliest convolutional neural networks and helped drive the development of deep learning. It consists of two convolutional layers, two pooling layers and two fully connected layers; the convolutions use 5x5 kernels with a stride of 1, and downsampling uses max pooling.
AlexNet won the ImageNet competition in 2012. It is a deeper and wider version of LeNet, containing 630 million connections, 60 million parameters and 650,000 neurons, with 5 convolutional layers, 3 of which are followed by max pooling layers, and finally 3 fully connected layers. AlexNet won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) by a significant margin, reducing the top-5 error rate from the previous 25.8% to 16.4%. Its main technical points are: (1) using the rectified linear unit (ReLU) as the activation function of the convolutional neural network (CNN), which solves the gradient vanishing problem of the sigmoid function in deep networks; (2) using Dropout during training to randomly ignore a portion of the neurons and avoid overfitting; (3) using overlapping max pooling in the CNN, where the stride is smaller than the pooling kernel so that outputs overlap, improving the richness of the features (earlier CNNs generally used average pooling, while AlexNet uses max pooling throughout to avoid the blurring effect of average pooling); (4) using data augmentation to reduce overfitting and improve the generalization ability of the model.
The Visual Geometry Group network (VGG) was the first to use smaller 3x3 convolution kernels in each convolutional layer and to combine them into convolution sequences; it is characterized by many consecutive convolutions and a huge amount of computation. A key advance of VGG is that stacking multiple 3x3 convolutions in sequence can emulate the effect of a larger receptive field. VGG models showed that depth benefits classification accuracy; another important idea is that convolution can replace full connection. The overall parameter count reaches about 140 million, concentrated mainly in the first fully connected layer; after replacing it with convolution, the parameter count decreases with no loss of accuracy.
GoogLeNet, the first Inception architecture, appeared in the ILSVRC 2014 competition and took first place by a large margin. The Inception network in that competition is commonly called Inception V1; its biggest characteristic is that it achieves very good classification performance while controlling the amount of computation and the number of parameters: a top-5 error rate of 6.67%, less than half that of AlexNet. Inception V1 is 22 layers deep, deeper than AlexNet's 8 layers or VGG's 19 layers, yet it requires only about 1.5 billion floating-point operations and has only about 5 million parameters, roughly 1/12 of AlexNet's 60 million, while achieving far better accuracy, which makes the model both excellent and very practical. Versions V2, V3 and V4 were subsequently introduced on the basis of Inception V1.
The residual network ResNet, proposed in 2015, won first place in the ImageNet classification task; because it is both simple and practical, many later methods build on ResNet50 or ResNet101. ResNet introduces the residual structure and uses Batch Normalization, effectively alleviating the gradient vanishing or explosion and network degradation problems of deep networks; as a result, ultra-deep ResNet feature extraction networks perform far better than earlier networks and achieve excellent results in image detection, image classification, image segmentation and other fields.
The feature pyramid network (FPN) constructs a feature pyramid that can be trained end to end: the high-level features extracted by the feature extraction network are upsampled and fused with the low-level features, enriching the semantic information of the low-level features. For small targets, FPN increases the resolution of the feature map, i.e., it operates on a larger feature map to obtain more information about small targets.
The feature maps output by residual network modules can themselves be connected by residual connections, forming a cross-layer residual network module that spans multiple residual layers of the original module: if the input of a residual network module is x and the desired output is H(x), then by passing the input x directly to the output as an initial result, the target that the module must learn becomes F(x) = H(x) - x. This changes the learning target of the residual network module, and learning F(x) is much easier than learning H(x). Re-optimizing the ResNet structure in this way into a cross-layer residual network can further alleviate the network degradation problem of ResNet and allows features of different levels to be mixed to extract deeper features. In FPN, high-level features are fused into low-level features; although this greatly enriches the information of the low-level features, the high-level features themselves are not improved, and they also need to be supplemented with low-level texture information, so feature fusion is insufficient and network performance is limited.
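The residual learning idea can be written down compactly. The following is a minimal illustrative sketch in PyTorch, not the exact structure claimed below: the wrapper name CrossLayerResidual, the inner stack and the 1x1 projection on the skip path are assumptions borrowed from standard ResNet practice for matching shapes.

    import torch
    import torch.nn as nn

    class CrossLayerResidual(nn.Module):
        """Long-skip residual wrapper: y = F(x) + x, so the wrapped stack
        'inner' only has to learn the residual F(x) = H(x) - x."""
        def __init__(self, inner: nn.Module, in_ch: int, out_ch: int, stride: int = 1):
            super().__init__()
            self.inner = inner  # e.g. several consecutive residual layers
            # 1x1 projection so the skip matches the inner path's output shape
            self.proj = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.inner(x) + self.proj(x)  # H(x) = F(x) + x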
Summary of the invention:
In order to overcome the defects of the background art, the invention provides an image feature extraction method based on a cross-layer residual two-way pyramid network, which achieves two goals while keeping a feature extraction speed comparable to that of the original network: (1) it further alleviates the network degradation problem of the residual network ResNet50 and blends features of different levels to extract deeper features, markedly enhancing the feature extraction capability of ResNet50; (2) it overcomes the defect that high-level features in the feature pyramid network (FPN) lack low-level detail texture information, achieving efficient fusion of feature map information across layers.
In order to solve the above technical problems, the invention adopts the following technical scheme:
An image feature extraction method based on a cross-layer residual two-way pyramid network comprises the following steps:
Step S1, inputting an original RGB color image into the residual network ResNet50 for preliminary feature extraction: the conv1 convolutional network module 1 of ResNet50 outputs a feature map P0; the conv2_x residual network module 2 outputs feature maps P1 and P1′ (P1 = P1′); the conv3_x residual network module 3 outputs a feature map P2; the conv4_x residual network module 4 outputs a feature map P3; the conv5_x residual network module 5 outputs a feature map P4;
Step S2, downsampling the feature map P1′ and fusing it with the feature map P2 to obtain a feature map P2′; downsampling P2′ and fusing it with P3 to obtain P3′; downsampling P3′ and fusing it with P4 to obtain P4′; this yields the bottom-up feature pyramid network DTFPN;
Step S3, forming a cross-layer residual network module from the feature maps P1′, P2, P2′ and their intermediate network (spanning multiple residual layers of a residual network module of ResNet50); forming a cross-layer residual network module from P2′, P3, P3′ and their intermediate network; forming a cross-layer residual network module from P3′, P4, P4′ and their intermediate network; thereby implementing a cross-layer residual network based on ResNet50;
Step S4, inputting the feature maps P1′, P2′, P3′ and P4′ into the feature pyramid network FPN and establishing with it the cross-layer residual two-way pyramid network; after FPN processing, the output feature maps P1″, P2″, P3″, P4″ and P5″ are obtained.
Preferably, in step S1, the width and height of the feature map Pi+1 (i = 0,1,2,3) are 1/2 of those of the feature map Pi, and the number of channels of Pi+1 is twice the number of channels of Pi.
Preferably, step S2 comprises:
S2.1, applying to the feature map P1′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P2 to obtain a feature map P2′;
S2.2, applying to the feature map P2′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P3 to obtain a feature map P3′;
S2.3, applying to the feature map P3′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P4 to obtain a feature map P4′.
Preferably, in step S3, the feature maps P1, P2′ and P3′ are passed through the conv3_x residual network module 3, the conv4_x residual network module 4 and the conv5_x residual network module 5 of the residual network ResNet50 of step S1, respectively, to obtain the feature maps P2, P3 and P4.
Preferably, the cross-layer residual network module in step S3 is a module spanning multiple residual layers of a residual network module of the residual network ResNet50.
The invention has the beneficial effects that:
(1) The method builds a bottom-up feature pyramid network (DTFPN) on the feature maps output by the residual network modules of ResNet50, supplementing high-level feature map information with low-level texture detail; this effectively overcomes the defect that high-level features in the feature pyramid network (FPN) lack low-level detail texture information and achieves efficient fusion of feature map information across layers.
(2) On the basis of the bottom-up feature pyramid network (DTFPN) in (1), a cross-layer residual network (Cross-layer ResNet50) is built on ResNet50, further alleviating the network degradation problem of ResNet50, mixing features of different levels to extract deeper features, and markedly enhancing the feature extraction capability of ResNet50.
Compared with a Faster R-CNN (Faster Region-based Convolutional Neural Network) built on the ResNet50-FPN feature extraction network, a Faster R-CNN built on the proposed cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN) improves the average precision AP(0.5:0.95) of target detection on the KITTI dataset by 3.8%, while the inference speed remains almost unchanged.
Drawings
FIG. 1 is a diagram of an overall network framework for implementing the solution of the present invention;
FIG. 2 is a diagram illustrating a detailed structure of a bottom-up Feature Pyramid Network (DTFPN) according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a Cross-layer residual structure of a Cross-layer residual network (Cross-layer ResNet50) according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples.
The invention provides an image feature extraction method based on a cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN), comprising a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a feature pyramid network (FPN), where the cross-layer residual network contains a brand-new bottom-up feature pyramid network (DTFPN). The invention is realized by the following steps:
S1, the feature maps output by the convolutional network module 1 (conv1), residual network module 2 (conv2_x), residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) when the original image passes through the ResNet50 backbone are defined as P0, P1, P1′ (P1 = P1′), P2, P3 and P4, respectively.
S2, the inputs of this step are the feature maps P1′, P2, P3 and P4 output in step S1. P1′ is downsampled and fused with P2 to obtain P2′; P2′ is downsampled and fused with P3 to obtain P3′; P3′ is downsampled and fused with P4 to obtain P4′, thereby constructing the bottom-up feature pyramid network (DTFPN). The outputs of this step are the feature maps P1′, P2′, P3′ and P4′.
S3, the inputs of this step are the feature map P1 output in step S1 and the feature maps P2′ and P3′ output in step S2. P1, P2′ and P3′ are passed through the residual network modules conv3_x, conv4_x and conv5_x of ResNet50, respectively, to obtain the feature maps P2, P3 and P4, so that P1′, P2, P2′ and their intermediate network form a cross-layer residual network module (spanning multiple residual layers of a residual network module of ResNet50), P2′, P3, P3′ and their intermediate network form a cross-layer residual network module, and P3′, P4, P4′ and their intermediate network form a cross-layer residual network module, thereby realizing the cross-layer residual network (Cross-layer ResNet50).
S4, the inputs of this step are the feature maps P1′, P2′, P3′ and P4′ output in step S2. They are input into the feature pyramid network (FPN), establishing the cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN); after FPN processing, the feature maps P1″, P2″, P3″, P4″ and P5″ are output.
By designing a new bottom-up feature pyramid and a cross-layer residual structure, the invention forms a new feature extraction network that, at a feature extraction speed comparable to that of the original network, further alleviates the degradation problem of ResNet50, mixes features of different levels to extract deeper features, remedies the lack of low-level detail information in the high-level features of the feature pyramid network (FPN), and achieves efficient fusion of feature map information across layers. The method performs excellently in tasks such as image target detection and semantic segmentation.
FIG. 1 is the overall network framework diagram of the technical solution of this embodiment. The network comprises a cross-layer residual network (Cross-layer ResNet50) designed on the basis of the residual network ResNet50 and a feature pyramid network (FPN), where the cross-layer residual network contains a brand-new bottom-up feature pyramid network (DTFPN); together they constitute the cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN). The detailed steps for building the whole feature extraction network are as follows:
s1, as shown in the attached table, the residual network ResNet50 feature extraction part is composed of a convolutional network module 1(conv1), a residual network module 2(conv2_ x), a residual network module 3(conv3_ x), a residual network module 4(conv4_ x), and a residual network module 5(conv5_ x), where "conv 2_ x" and each residual network module thereafter are composed of a plurality of residual layer structures. The original RGB color image is input into a residual network ResNet50 for preliminary feature extraction, and defined as "conv 1", "conv 2_ x", "conv 3_ x", "conv 4_ x" and "conv 5_ x" respectively output feature maps P0, P1, P1 '(P1 ═ P1'), P2, P3 and P4. The width and height of the characteristic map Pi (i is 0,1,2,3) are 1/2 of the characteristic map Pi +1, and the number of channels of the characteristic map Pi is 2 times of the number of channels of the characteristic map Pi + 1.
[Attached table (rendered as an image in the original): network architecture of the ResNet50 feature extraction part in this example.]
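For concreteness, the stage outputs P0-P4 can be read off a standard torchvision ResNet50, whose layer1-layer4 attributes correspond to conv2_x-conv5_x. This is a minimal sketch under that assumption (backbone_features is an illustrative name; the cross-layer rewiring of the later stages is added in step S3):

    import torch
    from torchvision.models import resnet50

    net = resnet50(weights=None)  # or weights="IMAGENET1K_V1" for a pretrained start

    def backbone_features(x: torch.Tensor):
        p0 = net.relu(net.bn1(net.conv1(x)))   # conv1   -> P0
        p1 = net.layer1(net.maxpool(p0))       # conv2_x -> P1 (= P1')
        p2 = net.layer2(p1)                    # conv3_x -> P2
        p3 = net.layer3(p2)                    # conv4_x -> P3
        p4 = net.layer4(p3)                    # conv5_x -> P4
        return p0, p1, p2, p3, p4

    feats = backbone_features(torch.randn(1, 3, 224, 224))
    print([tuple(f.shape) for f in feats])
    # -> [(1, 64, 112, 112), (1, 256, 56, 56), (1, 512, 28, 28),
    #     (1, 1024, 14, 14), (1, 2048, 7, 7)]: each later stage halves
    #    width/height and doubles channels, as stated above.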
The input of step S2 is the feature maps P1′, P2, P3 and P4 output in step S1. The feature map P1′ is downsampled and then fused with P2 to obtain P2′; P2′ is downsampled and then fused with P3 to obtain P3′; P3′ is downsampled and then fused with P4 to obtain P4′, thereby constructing the bottom-up feature pyramid network (DTFPN). The outputs of this step are the feature maps P1′, P2′, P3′ and P4′. The details of downsampling and fusion are shown in FIG. 3 and are further described below:
s2.1, the input of the step is the characteristic map P1' and P2 output by the step S1. The feature map P1 'is downsampled by a convolution kernel size of 1x1 and a step size of 2, the width and the height of the feature map are reduced by 1/2, the number of channels is increased by 1 time, and the feature map P1' is guaranteed to be the same as the feature map P2 after downsampling. Next, the feature map obtained after downsampling the feature map P1 'is input to a modified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P2 to obtain a feature map P2'. This step outputs a feature map P2'.
S2.2, the input of the step is the characteristic diagram P3 output by the step S1 and the characteristic diagram P2' output by the step S2.1. The feature map P2 'is downsampled by a convolution kernel size of 1x1 and a step size of 2, the width and the height of the feature map are reduced by 1/2, the number of channels is increased by 1 time, and the feature map P2' is guaranteed to be the same as the feature map P3 after downsampling. Next, the feature map obtained after downsampling the feature map P2 'is input to a modified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P3 to obtain a feature map P3'. This step outputs a feature map P3'.
S2.3, the input of the step is the characteristic map P4 output by the step S1 and the characteristic map P3' output by the step S2.2. The feature map P3 'is downsampled by a convolution kernel size of 1x1 and a step size of 2, the width and the height of the feature map are reduced by 1/2, the number of channels is increased by 1 time, and the feature map P3' is guaranteed to be the same as the feature map P4 after downsampling. Next, the feature map obtained after downsampling the feature map P3 'is input to a modified linear unit (ReLU) to adjust the distribution of the feature map data, and the output feature map is added to the feature map P4 to obtain a feature map P4'. This step outputs a feature map P4'.
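One DTFPN fusion step can be sketched as follows (a PyTorch sketch under the same assumptions; DownsampleFuse is an illustrative name). The 1x1 convolution with stride 2 halves the width and height and doubles the channels so the two operands have the same shape, the ReLU adjusts the distribution, and element-wise addition performs the fusion:

    import torch
    import torch.nn as nn

    class DownsampleFuse(nn.Module):
        """Fuse a lower-level map Pi' into the next backbone map P(i+1)."""
        def __init__(self, in_ch: int):
            super().__init__()
            # 1x1 kernel, stride 2: width/height halved, channels doubled
            self.down = nn.Conv2d(in_ch, 2 * in_ch, kernel_size=1, stride=2)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, lower: torch.Tensor, upper: torch.Tensor) -> torch.Tensor:
            return self.relu(self.down(lower)) + upper

    fuse2 = DownsampleFuse(256)
    p1p = torch.randn(1, 256, 56, 56)   # P1'
    p2  = torch.randn(1, 512, 28, 28)   # P2 from conv3_x
    p2p = fuse2(p1p, p2)                # P2', shape (1, 512, 28, 28)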
The inputs of step S3 are the feature map P1 output in step S1 and the feature maps P2′ and P3′ output in step S2. P1, P2′ and P3′ are passed through the residual network module 3 (conv3_x), residual network module 4 (conv4_x) and residual network module 5 (conv5_x) of ResNet50 in step S1, respectively, to obtain the feature maps P2, P3 and P4. As shown in FIG. 3, the feature maps P1′, P2, P2′ and their intermediate network (spanning multiple residual layers of a ResNet50 residual network module) form a cross-layer residual network module; P2′, P3, P3′ and their intermediate network form a cross-layer residual network module; and P3′, P4, P4′ and their intermediate network form a cross-layer residual network module, realizing the cross-layer residual network (Cross-layer ResNet50) based on ResNet50.
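Chaining the backbone stages and fusion steps makes the cross-layer residual wiring explicit: each later stage consumes the fused map of the previous level, so each triple (Pi′, P(i+1), P(i+1)′) together with the stage between them forms one cross-layer residual module. A sketch continuing the illustrative modules above:

    def cross_layer_forward(net, fuse2, fuse3, fuse4, x):
        p0  = net.relu(net.bn1(net.conv1(x)))
        p1p = net.layer1(net.maxpool(p0))   # conv2_x; P1' = P1
        p2  = net.layer2(p1p)               # conv3_x
        p2p = fuse2(p1p, p2)                # long skip around conv3_x -> P2'
        p3  = net.layer3(p2p)               # conv4_x consumes P2'
        p3p = fuse3(p2p, p3)                # -> P3'
        p4  = net.layer4(p3p)               # conv5_x consumes P3'
        p4p = fuse4(p3p, p4)                # -> P4'
        return p1p, p2p, p3p, p4p

    # with fuse3 = DownsampleFuse(512) and fuse4 = DownsampleFuse(1024)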
The inputs of step S4 are the feature maps P1′, P2′, P3′ and P4′ output in step S2. They are input into the feature pyramid network (FPN), which together with the preceding stages establishes the cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN). After FPN processing, the feature maps P1″, P2″, P3″, P4″ and P5″ are output, completing all steps of extracting feature maps from an original image with the cross-layer residual two-way pyramid network (Cross-layer Residual Bi-FPN).
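The final FPN pass can be sketched with torchvision's FeaturePyramidNetwork. How P5″ is produced is not spelled out in this text, so the extra max-pooled level below is an assumption borrowed from common FPN practice:

    import torch.nn.functional as F
    from collections import OrderedDict
    from torchvision.ops import FeaturePyramidNetwork

    fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048],
                                out_channels=256)

    def fpn_outputs(p1p, p2p, p3p, p4p):
        outs = fpn(OrderedDict([("p1", p1p), ("p2", p2p), ("p3", p3p), ("p4", p4p)]))
        p1pp, p2pp, p3pp, p4pp = outs.values()
        p5pp = F.max_pool2d(p4pp, kernel_size=1, stride=2)  # assumed P5'' level
        return p1pp, p2pp, p3pp, p4pp, p5pp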
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (5)

1. An image feature extraction method based on a cross-layer residual two-way pyramid network, characterized by comprising the following steps:
step S1, inputting an original RGB color image into the residual network ResNet50 for preliminary feature extraction, wherein the conv1 convolutional network module 1 of ResNet50 outputs a feature map P0, the conv2_x residual network module 2 outputs feature maps P1 and P1′ (P1 = P1′), the conv3_x residual network module 3 outputs a feature map P2, the conv4_x residual network module 4 outputs a feature map P3, and the conv5_x residual network module 5 outputs a feature map P4;
step S2, downsampling the feature map P1′ and fusing it with the feature map P2 to obtain a feature map P2′, downsampling P2′ and fusing it with P3 to obtain P3′, and downsampling P3′ and fusing it with P4 to obtain P4′, thereby obtaining the bottom-up feature pyramid network DTFPN;
step S3, forming a cross-layer residual network module from the feature maps P1′, P2, P2′ and their intermediate network (spanning multiple residual layers of a residual network module of ResNet50), forming a cross-layer residual network module from P2′, P3, P3′ and their intermediate network, and forming a cross-layer residual network module from P3′, P4, P4′ and their intermediate network, thereby implementing a cross-layer residual network based on ResNet50;
step S4, inputting the feature maps P1′, P2′, P3′ and P4′ into the feature pyramid network FPN and establishing with it the cross-layer residual two-way pyramid network, wherein after FPN processing the output feature maps P1″, P2″, P3″, P4″ and P5″ are obtained.
2. The image feature extraction method based on the cross-layer residual two-way pyramid network as claimed in claim 1, wherein: in step S1, the width and height of the feature map Pi+1 (i = 0,1,2,3) are 1/2 of those of the feature map Pi, and the number of channels of Pi+1 is twice the number of channels of Pi.
3. The image feature extraction method based on the cross-layer residual two-way pyramid network as claimed in claim 1, wherein step S2 comprises:
S2.1, applying to the feature map P1′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P2 to obtain a feature map P2′;
S2.2, applying to the feature map P2′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P3 to obtain a feature map P3′;
S2.3, applying to the feature map P3′ a downsampling operation with a 1x1 convolution kernel and a stride of 2, which halves the width and height of the feature map and doubles the number of channels; inputting the downsampled feature map into a rectified linear unit to adjust the distribution of the feature map data, and adding the adjusted feature map to the feature map P4 to obtain a feature map P4′.
4. The image feature extraction method based on the cross-layer residual two-way pyramid network as claimed in claim 1, wherein in step S3 the feature maps P1, P2′ and P3′ are passed through the conv3_x residual network module 3, the conv4_x residual network module 4 and the conv5_x residual network module 5 of the residual network ResNet50 of step S1, respectively, to obtain the feature maps P2, P3 and P4.
5. The image feature extraction method based on the cross-layer residual two-way pyramid network as claimed in claim 1, wherein the cross-layer residual network module in step S3 is a module spanning multiple residual layers of a residual network module of the residual network ResNet50.
CN202111002973.5A 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual two-way pyramid network Active CN113837199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111002973.5A CN113837199B (en) 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual two-way pyramid network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111002973.5A CN113837199B (en) 2021-08-30 2021-08-30 Image feature extraction method based on cross-layer residual two-way pyramid network

Publications (2)

Publication Number Publication Date
CN113837199A true CN113837199A (en) 2021-12-24
CN113837199B CN113837199B (en) 2024-01-09

Family

ID=78961539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111002973.5A Active CN113837199B (en) Image feature extraction method based on cross-layer residual two-way pyramid network

Country Status (1)

Country Link
CN (1) CN113837199B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100473A (en) * 2022-06-29 2022-09-23 武汉兰丁智能医学股份有限公司 Lung cell image classification method based on parallel neural network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339893A (en) * 2020-02-21 2020-06-26 哈尔滨工业大学 Pipeline detection system and method based on deep learning and unmanned aerial vehicle
CN111507359A (en) * 2020-03-09 2020-08-07 杭州电子科技大学 Self-adaptive weighting fusion method of image feature pyramid
US20200258218A1 (en) * 2018-03-14 2020-08-13 Dalian University Of Technology Method based on deep neural network to extract appearance and geometry features for pulmonary textures classification
US20200272825A1 (en) * 2019-05-27 2020-08-27 Beijing Dajia Internet Information Technology Co., Ltd. Scene segmentation method and device, and storage medium
CN111753677A (en) * 2020-06-10 2020-10-09 杭州电子科技大学 Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112163449A (en) * 2020-08-21 2021-01-01 同济大学 Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN112507861A (en) * 2020-12-04 2021-03-16 江苏科技大学 Pedestrian detection method based on multilayer convolution feature fusion

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200258218A1 (en) * 2018-03-14 2020-08-13 Dalian University Of Technology Method based on deep neural network to extract appearance and geometry features for pulmonary textures classification
US20200272825A1 (en) * 2019-05-27 2020-08-27 Beijing Dajia Internet Information Technology Co., Ltd. Scene segmentation method and device, and storage medium
CN111339893A (en) * 2020-02-21 2020-06-26 哈尔滨工业大学 Pipeline detection system and method based on deep learning and unmanned aerial vehicle
CN111507359A (en) * 2020-03-09 2020-08-07 杭州电子科技大学 Self-adaptive weighting fusion method of image feature pyramid
CN111753677A (en) * 2020-06-10 2020-10-09 杭州电子科技大学 Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112163449A (en) * 2020-08-21 2021-01-01 同济大学 Lightweight multi-branch feature cross-layer fusion image semantic segmentation method
CN112507861A (en) * 2020-12-04 2021-03-16 江苏科技大学 Pedestrian detection method based on multilayer convolution feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZENG YU et al.: "Convolutional networks with cross-layer neurons for image recognition", Information Sciences, pages 241-254 *
FU Xueyang: "Research on domain-knowledge-driven deep learning for single image deraining", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 12, pages 1-109 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100473A (en) * 2022-06-29 2022-09-23 武汉兰丁智能医学股份有限公司 Lung cell image classification method based on parallel neural network

Also Published As

Publication number Publication date
CN113837199B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
WO2020244261A1 (en) Scene recognition system for high-resolution remote sensing image, and model generation method
CN110929610B (en) Plant disease identification method and system based on CNN model and transfer learning
Teow Understanding convolutional neural networks using a minimal model for handwritten digit recognition
CN107480726A (en) A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
CN109492666A (en) Image recognition model training method, device and storage medium
CN108764195A (en) Handwriting model training method, hand-written character recognizing method, device, equipment and medium
Wu et al. Convolutional reconstruction-to-sequence for video captioning
CN109817276A (en) A kind of secondary protein structure prediction method based on deep neural network
Wang et al. Sketch-guided scenery image outpainting
Lin et al. Lateral refinement network for contour detection
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN113837199A (en) Image feature extraction method based on cross-layer residual error double-path pyramid network
CN110222817A (en) Convolutional neural networks compression method, system and medium based on learning automaton
Zhang et al. Skip-attention encoder–decoder framework for human motion prediction
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN111914993B (en) Multi-scale deep convolutional neural network model construction method based on non-uniform grouping
CN116704079B (en) Image generation method, device, equipment and storage medium
Asaad Keras Deep Learning for Pupil Detection Method
CN110766083A (en) Alexnet mural image classification method based on feature fusion
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
Lei et al. English Letter Recognition Based on TensorFlow Deep Learning
CN115620064A (en) Point cloud down-sampling classification method and system based on convolutional neural network
Zhou et al. Research on knowledge distillation algorithm based on Yolov5 attention mechanism
Lu et al. Mixseg: a lightweight and accurate mix structure network for semantic segmentation of apple leaf disease in complex environments
Zhang Research on Applying Dense Convolutional Neural Network in Chinese Character Font Recognition

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant