CN111028235B - Image segmentation method for enhancing edge and detail information by utilizing feature fusion - Google Patents


Info

Publication number
CN111028235B
Authority
CN
China
Prior art keywords
feature map
feature
pooling
convolution
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911094462.3A
Other languages
Chinese (zh)
Other versions
CN111028235A (en)
Inventor
朱和贵
苗艳
Original Assignee
东北大学 (Northeastern University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 (Northeastern University)
Priority to CN201911094462.3A
Publication of CN111028235A
Application granted
Publication of CN111028235B
Legal status: Active (current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20172 - Image enhancement details
    • G06T2207/20192 - Edge enhancement; Edge preservation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image segmentation method that uses feature fusion to enhance edge and detail information, and relates to the technical field of computer vision. The method extracts features from an input image with a convolutional neural network; the extracted features are input into a decoding structure to which additional feature fusion has been added, which enriches edge and detail information while restoring the resolution of the image and yields a dense feature map; the maximum values of the different classes are output by a normalization (Softmax) method; a cross-entropy loss function is then calculated, and the weights in the network are updated by stochastic gradient descent. The method can recover the position and boundary detail information lost in the encoding stage while restoring the resolution of the feature map, enrich the information of the picture, and obtain a dense feature map that compensates for the sparse feature map produced by direct up-sampling, so that the segmented boundaries and details are clearer and the segmentation of small, detailed objects is improved.

Description

Image segmentation method for enhancing edge and detail information by utilizing feature fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to an image segmentation method for enhancing edge and detail information by utilizing feature fusion.
Background
With the continuous progress of science and technology and the rapid development of the national economy, artificial intelligence has gradually entered people's field of view and plays an increasingly important role in human production and life, with wide application in many fields. Image semantic segmentation is an important research direction of artificial intelligence and a very important means of realizing automatic scene understanding; it can be applied in many fields such as automatic driving systems and unmanned systems.
Image semantic segmentation is an important branch of machine learning in the field of computer vision: an input image is processed so that the content in the image is automatically segmented and identified. Before deep learning was applied to computer vision, classifiers for image semantic segmentation were typically built from texton forests or random forests. With the appearance and vigorous development of deep convolutional neural networks, a very effective tool became available for semantic segmentation: CNNs applied to semantic segmentation have developed well, promoted the progress of the field, and achieved remarkable results in many application areas.
Many classical segmentation methods have appeared since deep learning was applied to semantic segmentation, such as the fully convolutional network FCN, the SegNet network with an encoder-decoder structure, and DeepLab with hole (dilated) convolution. However, as the hierarchy of the CNN deepens, the repeated pooling and downsampling cause the position information and boundary detail information of the picture to be lost. This process is irreversible and the removed information cannot be completely recovered, so the feature map obtained after up-sampling in the decoding stage is sparse because of the lost information; these methods therefore have certain limitations.
The fully convolutional network FCN and the conventional SegNet network lose position and edge details because of downsampling, and the lost information does not reappear when up-sampling is performed in the decoding stage, so the obtained feature map is sparse. Although the SegNet network recovers position information through pooling indices and enriches boundary and detail information by convolution operations, a great amount of information loss still remains.
Hole (dilated) convolution is a convolution layer capable of obtaining dense feature maps, but its computational cost is relatively high, and processing a large number of high-resolution feature maps occupies a large amount of memory.
The problem with existing image semantic segmentation methods is therefore that the preservation of edge detail features and position information still needs to be further improved, and segmentation accuracy remains to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, an image segmentation method that uses feature fusion to enhance edge and detail information, so as to realize the segmentation of images.
In order to solve the above technical problem, the invention adopts the following technical scheme: an image segmentation method for enhancing edge and detail information by utilizing feature fusion, comprising the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cropping the images in the training data set so that the input images have a uniform size;
step 1.2: fixing the resolution of the input image to 360×480;
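As a hedged illustration of this preprocessing step (a minimal sketch only; the PIL/torchvision tooling and the file name are assumptions of this description, not part of the patent), fixing the input resolution to 360×480 could look like:

```python
# Sketch of step 1: bring every training image to a uniform 360x480 resolution.
# The library choice (PIL + torchvision) and the file name are assumptions.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((360, 480)),  # scale the image to the uniform target size
    transforms.ToTensor(),          # HWC uint8 image -> CHW float tensor in [0, 1]
])

img = Image.open("example.png").convert("RGB")  # hypothetical training image
x = preprocess(img).unsqueeze(0)                # shape: (1, 3, 360, 480)
```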
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network and adopts the first 13 layers of VGG-16, while a maximum pooling index is added during pooling to memorize the maximum pixel values in the image and their positions;
the convolution kernel size of each convolution layer of the coding structure is 3×3, and the feature map after each convolution layer is named conv_i_j, where i=1,2,3,4,5, j=1,2 when i=1,2, and j=1,2,3 when i=3,4,5; each convolution layer is followed by Batch Normalisation and a ReLU activation function; a maximum pooling index is added to each pooling layer, downsampling is realized with 2×2 non-overlapping max pooling, and the positions of the pixel maxima are memorized through the maximum pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r=1,2,3,4,5;
the specific method for memorizing the maximum value of the pixels in the image and the positions thereof by adding the maximum pooling index during pooling is as follows:
for an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, 2×2 non-overlapping max pooling (applied to each channel independently) produces a feature map $Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, whose value at pixel (i, j) is given by

$$Y_{i,j} = \max_{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}} X_{p,q}$$

the position corresponding to the maximum value of the pixel is recorded as $(m_i, n_j)$, as shown in the following formula:

$$(m_i, n_j) = \underset{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}}{\arg\max} X_{p,q}$$
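For concreteness, the following PyTorch sketch (an assumed realisation; the helper name conv_block and the toy tensors are not from the patent) shows one conv + Batch Normalisation + ReLU unit of the coding structure and the 2×2 non-overlapping max pooling that also returns the positions of the maxima, i.e. the maximum pooling index described above:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """One encoder unit: 3x3 convolution -> Batch Normalisation -> ReLU,
    with padding=1 so the spatial size of the feature map is unchanged."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# 2x2 non-overlapping max pooling; return_indices=True keeps the position of
# each maximum so the decoder can later release the value at the same place.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

x = torch.randn(1, 3, 360, 480)           # toy input at the fixed resolution
conv_1_1 = conv_block(3, 64)(x)           # feature map conv_1_1
conv_1_2 = conv_block(64, 64)(conv_1_1)   # feature map conv_1_2
pool_1, idx_1 = pool(conv_1_2)            # pool_1: (1, 64, 180, 240); idx_1: max positions
```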
step 3: inputting a pooling feature map pool_5 obtained through the coding structure into a decoding structure added with more feature fusion, releasing the maximum value of pixels in situ by utilizing the maximum pooling index, filling the rest positions with 0, and realizing up-sampling by 2 times to obtain a sparse feature map upsampling5;
the decoding structure comprises three three-layer convolution structures and two two-layer convolution structures; each convolution layer in the decoding structure is followed by Batch Normalisation and a ReLU activation function;
the value of each pixel in the obtained sparse feature map upsampling5 is given by

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u,v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value of pixel (u, v) in the sparse feature map upsampling5;
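A minimal sketch of this "release the maximum in place and fill the rest with 0" operation, using PyTorch's MaxUnpool2d as an assumed equivalent of the formula for Z_{u,v} above (the tensor shapes are toy values, not the method's actual stage-5 sizes):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# In the real pipeline idx_5 is the maximum pooling index stored while producing
# pool_5 in the encoder; here it comes from a dummy tensor so the sketch runs.
encoder_map = torch.randn(1, 512, 22, 30)
pool_5, idx_5 = pool(encoder_map)            # pool_5: (1, 512, 11, 15)

# Release each stored maximum at its original position and fill everything else
# with 0: this is the sparse feature map "upsampling5" (2x up-sampling).
upsampling5 = unpool(pool_5, idx_5)          # shape: (1, 512, 22, 30), mostly zeros
```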
step 4: performing the first feature fusion operation through the decoding structure: fusing the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, and then fusing the result with the pooling feature map pool_4 of the corresponding size to obtain the fused feature map F_1;
the fusion process adds the pixel values at corresponding positions in the feature maps;
inputting the fused feature map F_1 into the first three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 5, compensating for the information loss caused by pooling and downsampling;
step 5: performing four feature fusion operations through the decoding structure, and repeatedly performing up-sampling, feature fusion and convolution operations until the resolution of the feature map is restored to the original size;
step 5.1: performing the second feature fusion through the decoding structure to recover image information;
step 5.1.1: 2 times up-sampling the conv_decoding 5 by using the maximum pooling index stored when generating the pooling feature map pool_4 to obtain a sparse feature map upsampling4;
step 5.1.2: fusing the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2, which are extracted from the coding structure and have the same resolution, and with the pooling feature map pool_3, to obtain the fused feature map F_2;
step 5.1.3: inputting the fused feature map F_2 into the second three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 4;
step 5.2: performing third feature fusion through the decoding structure to recover image information;
step 5.2.1: performing 2-time up-sampling on the feature map conv_decoding 4 by using a maximum pooling index stored when generating pooling feature map pool_3 to obtain sparse feature map upsampling3;
step 5.2.2: performing feature fusion on the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2, which are extracted from the coding structure and have the same resolution, and with the pooling feature map pool_2, to obtain the fused feature map F_3;
step 5.2.3: inputting the fused feature map F_3 into the third three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 3;
step 5.3: performing fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: performing 2-time up-sampling on the feature map conv_decoding 3 by using a maximum pooling index stored when generating pooling feature map pool_2 to obtain sparse feature map upsampling2;
step 5.3.2: performing feature fusion on the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooling feature map pool_1 to obtain the fused feature map F_4;
step 5.3.3: according to the symmetry of the SegNet network, inputting the fused feature map F_4 into the first two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 2;
step 5.4: performing fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: performing 2-time up-sampling on the feature map conv_decoding 2 by using a maximum pooling index stored when generating pooling feature map pool_1 to obtain sparse feature map upsampling1;
step 5.4.2: performing feature fusion on the sparse feature map upsampling1 and the convolution feature map conv_1_1 to obtain the fused feature map F_5;
step 5.4.3: inputting the fused feature map F_5 into the second two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 1;
step 6: inputting the dense feature map conv_decoding 1 into a Softmax layer to obtain the maximum probability of pixel classification in the image;
step 7: calculating the cross-entropy loss function from the maximum probability of pixel classification in the image, and updating the convolution kernel parameters of each convolution layer and pooling layer in the coding structure and the decoding structure by stochastic gradient descent, thereby realizing the image segmentation.
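Steps 6 and 7 amount to a per-pixel classification head trained with cross-entropy and stochastic gradient descent. The sketch below is a hedged illustration only: the class count, learning rate and momentum are assumptions, and for brevity only the final classifier's parameters are updated, whereas the method updates the parameters of the whole encoder-decoder network.

```python
import torch
import torch.nn as nn

num_classes = 12                                        # assumed; the patent does not fix the class count
classifier = nn.Conv2d(64, num_classes, kernel_size=1)  # maps the 64-channel conv_decoding 1 to class scores
criterion = nn.CrossEntropyLoss()                       # per-pixel cross entropy (applies log-softmax internally)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01, momentum=0.9)  # assumed hyper-parameters

conv_decoding_1 = torch.randn(1, 64, 360, 480)          # toy stand-in for the dense decoder output
labels = torch.randint(0, num_classes, (1, 360, 480))   # toy ground-truth segmentation map

logits = classifier(conv_decoding_1)                    # (1, num_classes, 360, 480)
probs = torch.softmax(logits, dim=1)                    # step 6: Softmax layer, per-pixel class probabilities
pred = probs.argmax(dim=1)                              # class with the maximum probability for each pixel

loss = criterion(logits, labels)                        # step 7: cross-entropy loss
optimizer.zero_grad()
loss.backward()                                         # back-propagate the loss
optimizer.step()                                        # stochastic gradient descent update
```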
The technical principle of the method of the invention is as follows: the decoding stage is improved on the basis of the original SegNet network, and the image position and boundary detail information is recovered while the resolution of the feature map is restored, yielding a dense feature map. The features of the image are extracted by the convolution layers and pooling layers in the coding structure, and layers of different depths extract information at different scales: the shallow structure extracts global low-level semantic information such as edges, directions, textures and chromaticity, while the deep structure extracts local high-level semantic information such as the shape of an object; the deeper the network layer, the more abstract the extracted features. In order to extract the more abstract high-level features, the model selects max pooling rather than average pooling in the coding structure.
Because the maximum pixel values extracted from the feature map and their positions are critical, edge detail information is lost during pooling and position information is lost as the resolution of the feature map decreases. A pooling index is therefore added to the encoding structure to memorize the positions of the pixel maxima; the decoding structure releases each pixel maximum at its original position through the pooling index and fills the remaining positions with 0, thereby realizing 2× up-sampling, recovering important position information and reducing error.
However, as the network level of the decoding structure deepens, the extracted features become more and more abstract, much edge detail information is lost, and each layer loses information at a different scale. In the decoding structure, all positions of an up-sampled feature map except the maxima are 0, i.e. the obtained feature map is sparse, and the lost information does not reappear in it. Feature fusion is therefore added to the decoding structure to restore information: the sparse feature map obtained after each up-sampling is superimposed with the convolved and pooled feature maps of the corresponding size from the encoding stage. In this way the method inputs each up-sampled feature map into the fusion structure, gradually recovers the information lost in the encoding stage, and inputs the fusion result into convolution layers to further enrich the information, obtaining denser feature maps, so that the segmentation effect is better and the accuracy is higher.
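A hedged sketch of this repeated up-sample / fuse / convolve pattern is given below (PyTorch is assumed; the per-stage channel transitions and the exact choice of skip maps, which the steps above spell out, are simplified and left to the caller):

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoding stage: index-based 2x up-sampling, pixel-wise addition of the
    encoder feature maps of the same size, then a stack of conv+BN+ReLU layers."""
    def __init__(self, channels, num_convs):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.convs = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_convs)
        ])

    def forward(self, x, pool_indices, skip_maps):
        sparse = self.unpool(x, pool_indices)   # release maxima in place, zeros elsewhere
        fused = sparse
        for skip in skip_maps:                  # encoder conv/pool maps of matching size
            fused = fused + skip                # fusion = element-wise addition
        return self.convs(fused)                # dense feature map passed to the next stage

# Toy usage for one stage (shapes, channel count and skip maps are illustrative only):
stage = DecoderStage(channels=512, num_convs=3)
pool_5, idx_5 = nn.MaxPool2d(2, stride=2, return_indices=True)(torch.randn(1, 512, 22, 30))
skips = [torch.randn(1, 512, 22, 30) for _ in range(3)]   # e.g. conv_5_1, conv_5_2, pool_4
out = stage(pool_5, idx_5, skips)
```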
The beneficial effect of the above technical scheme is as follows: the image segmentation method using feature fusion to enhance edge and detail information provided by the invention can recover the position and boundary detail information lost in the encoding stage while restoring the resolution of the feature map, enrich the information of the image, and obtain a dense feature map that compensates for the sparse feature map produced by direct up-sampling, so that the segmented boundaries and details are clearer, the segmentation of fine, detailed objects is improved, and the average segmentation accuracy and the mIoU are increased.
Drawings
Fig. 1 is a flowchart of an image segmentation method using feature fusion to enhance edge and detail information according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
In this embodiment, an image segmentation method using feature fusion to enhance edge and detail information, as shown in fig. 1, includes the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cutting the images in the training data set to enable the input images to have uniform sizes;
step 1.2: fixing the resolution of the input image to 360×480;
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network and adopts the first 13 layers of VGG-16, while a maximum pooling index is added during pooling to memorize the maximum pixel values in the image and their positions;
the convolution kernel size of each convolution layer of the coding structure is 3×3, so that the spatial size of the feature map is kept unchanged; the feature map after each convolution layer is named conv_i_j, where i=1,2,3,4,5, j=1,2 when i=1,2, and j=1,2,3 when i=3,4,5; each convolution layer is followed by Batch Normalisation and a ReLU activation function; Batch Normalisation accelerates the convergence of the model and, to a certain extent, alleviates the vanishing-gradient problem in deep networks, making the deep network model easier and more stable to train; the ReLU activation function is selected to counter gradient vanishing and to alleviate network overfitting; a maximum pooling index is added to each pooling layer, downsampling is realized with 2×2 non-overlapping max pooling, and the positions of the pixel maxima are memorized through the maximum pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r=1,2,3,4,5;
the coding structure uses the first 13 layers of VGG-16 to extract picture features, employing convolution layers and pooling layers to extract image features at different scales; the first 4 layers of the structure can be regarded as the shallow structure, which obtains low-level semantic information, while the later layers can be regarded as the deep structure, which obtains high-level abstract information; features at different scales are thus obtained through the coding structure;
for an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, 2×2 non-overlapping max pooling (applied to each channel independently) produces a feature map $Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, whose value at pixel (i, j) is given by

$$Y_{i,j} = \max_{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}} X_{p,q}$$

the position corresponding to the maximum value of the pixel is recorded as $(m_i, n_j)$, as shown in the following formula:

$$(m_i, n_j) = \underset{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}}{\arg\max} X_{p,q}$$
step 3: inputting a pooling feature map pool_5 obtained through the coding structure into a decoding structure added with more feature fusion, releasing the maximum value of pixels in situ by utilizing the maximum pooling index, filling the rest positions with 0, and realizing up-sampling by 2 times to obtain a sparse feature map upsampling5;
the decoding structure comprises three three-layer convolution structures and two two-layer convolution structures; each convolution layer in the decoding structure is followed by Batch Normalisation and a ReLU activation function;
the value of each pixel in the obtained sparse feature map upsampling5 is given by

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u,v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value of pixel (u, v) in the sparse feature map upsampling5.
step 4: because the feature map obtained by up-sampling is sparse, a feature fusion operation is performed through the decoding structure; the convolution feature maps extracted from the coding structure with the same resolution as the sparse feature map upsampling5 are conv_5_1, conv_5_2 and conv_5_3. Because pool_5 is obtained by directly pooling conv_5_3, part of its information is already recovered during the 2× up-sampling; for this reason, and also to reduce the training parameters of the model, only the sparse feature map upsampling5 is fused with the convolution feature maps conv_5_1 and conv_5_2, and the fused result is then fused with the pooling feature map pool_4 of the corresponding size to obtain the fused feature map F_1;
The fusion process is to add the pixel values at corresponding positions in the feature maps;
to maintain the symmetry of the original SegNet network, the fused feature map F_1 is input into the first three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 5, further enriching the information of the picture and compensating for the information loss caused by pooling and downsampling;
step 4 is the first feature fusion operation; the method of the invention performs five feature fusions in total during decoding, which fall into three different fusion forms according to the up-sampling depth, with the first three fusions sharing the same form. The remaining four feature fusions are performed in the following steps.
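To illustrate this first fusion concretely, here is a minimal sketch (the helper conv_block, the toy tensors and the 512-channel stage-5 width taken from VGG-16 are assumptions of this description, not the patent's code):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3 convolution -> Batch Normalisation -> ReLU, size-preserving (padding=1)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Toy stand-ins for the real feature maps; in the method all four have the same
# resolution and channel count at this stage of the network.
upsampling5 = torch.randn(1, 512, 22, 30)  # sparse map from index-based up-sampling
conv_5_1    = torch.randn(1, 512, 22, 30)  # encoder convolution feature maps
conv_5_2    = torch.randn(1, 512, 22, 30)
pool_4      = torch.randn(1, 512, 22, 30)  # encoder pooling feature map of matching size

# Fusion = addition of the pixel values at corresponding positions.
F_1 = upsampling5 + conv_5_1 + conv_5_2 + pool_4

# First three-layer convolution structure of the decoder -> dense map conv_decoding 5.
decode5 = nn.Sequential(conv_block(512, 512), conv_block(512, 512), conv_block(512, 512))
conv_decoding_5 = decode5(F_1)
```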
Step 5: performing four feature fusion operations through the decoding structure, and repeatedly performing up-sampling, feature fusion and convolution operations until the resolution of the feature map is restored to the original size to obtain a dense feature map conv_decoding 1;
step 5.1: performing the second feature fusion through the decoding structure to recover image information;
step 5.1.1: after the step 4, the resolution of the feature map conv_decoding 5 is the same as that of the pooling feature map pool_4, and the conv_decoding 5 is up-sampled by 2 times by utilizing the maximum pooling index stored when the pooling feature map pool_4 is generated, so as to obtain a sparse feature map upsampling4;
step 5.1.2: fusing the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_3, to obtain the fused feature map F_2;
step 5.1.3: inputting the fused feature map F_2 into the second three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 4;
step 5.2: performing third feature fusion through the decoding structure to recover image information;
step 5.2.1: performing 2-time up-sampling on the feature map conv_decoding 4 by using a maximum pooling index stored when generating pooling feature map pool_3 to obtain sparse feature map upsampling3;
step 5.2.2: performing feature fusion on the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2, which are extracted from the coding structure and have the same resolution, and with the pooling feature map pool_2, to obtain the fused feature map F_3;
step 5.2.3: inputting the fused feature map F_3 into the third three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 3;
the first three feature fusions each involve coding feature maps from three stages and share the same fusion structure; the feature maps participating in these fusions have lower resolution and carry local abstract features, so the same fusion form is used to recover the local abstract features.
Step 5.3: performing fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: performing 2-time up-sampling on the feature map conv_decoding 3 by using a maximum pooling index stored when generating pooling feature map pool_2 to obtain sparse feature map upsampling2;
step 5.3.2: since the resolution of the feature map has been restored to 1/2 of the original image after step 5.3.1, the corresponding feature maps at this resolution are conv_2_1, conv_2_2 and pool_1; in order to reduce the parameters of model training, only the sparse feature map upsampling2 is fused with the convolution feature map conv_2_1 and the pooling feature map pool_1 to obtain the fused feature map F_4;
step 5.3.3: according to the symmetry of the SegNet network, the fused feature map F_4 is input into the first two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 2;
unlike the first three feature fusions, this fusion involves coding feature maps from only two stages and is used to recover detail information, so its fusion form is different;
step 5.4: performing fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: performing 2-time up-sampling on the feature map conv_decoding 2 by using a maximum pooling index stored when generating pooling feature map pool_1 to obtain sparse feature map upsampling1;
step 5.4.2: since the resolution of the feature map is restored to the original size after step 5.4.1, the feature maps of the same resolution obtained by the coding structure are the convolution feature maps conv_1_1 and conv_1_2; in order to reduce the parameters of model training, only the sparse feature map upsampling1 and the convolution feature map conv_1_1 are fused to obtain the fused feature map F_5;
step 5.4.3: inputting the fused feature map F_5 into the second two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 1;
this feature fusion only has one stage of coded feature map to participate in the fusion and is used for recovering edge information.
Step 6: the dense feature map conv_decode1 is input to the Softmax layer to get the maximum probability of pixel classification in the image.
Step 7: the cross entropy loss function is calculated through the maximum probability of pixel classification in the image, and the convolution kernel parameters of each convolution layer and pooling layer in the coding structure and the decoding structure are updated through a random gradient descent method, so that the image segmentation is realized.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims (5)

1. An image segmentation method for enhancing edge and detail information by utilizing feature fusion is characterized in that: the method comprises the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as the SegNet network, the first 13 layers of VGG-16 are adopted, and meanwhile, the maximum pooling index is added during pooling to memorize the maximum value of pixels in an image and the position of the maximum value;
the convolution kernel size of each convolution layer of the coding structure is 3×3, and the feature map after each convolution layer is named conv_i_j, where i=1,2,3,4,5, j=1,2 when i=1,2, and j=1,2,3 when i=3,4,5; each convolution layer is followed by Batch Normalisation and a ReLU activation function; a maximum pooling index is added to each pooling layer, downsampling is realized with 2×2 non-overlapping max pooling, and the positions of the pixel maxima are memorized through the maximum pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r=1,2,3,4,5;
step 3: inputting a pooling feature map pool_5 obtained through the coding structure into a decoding structure added with more feature fusion, releasing the maximum value of pixels in situ by utilizing the maximum pooling index, filling the rest positions with 0, and realizing up-sampling by 2 times to obtain a sparse feature map upsampling5;
the decoding structure comprises three three-layer convolution structures and two two-layer convolution structures; each convolution layer in the decoding structure is followed by Batch Normalisation and a ReLU activation function;
step 4: performing the first feature fusion operation through the decoding structure: fusing the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, and then fusing the result with the pooling feature map pool_4 of the corresponding size to obtain the fused feature map F_1;
inputting the fused feature map F_1 into the first three-layer convolution structure of the decoding structure to perform convolution and obtain the dense feature map conv_decoding 5, compensating for the information loss caused by pooling and downsampling;
step 5: performing four feature fusion operations through the decoding structure until the resolution of the feature map is restored to the original size to obtain a dense feature map conv_decoding 1;
step 5.1: performing secondary feature fusion through the decoding structure to recover image information;
step 5.1.1: 2 times up-sampling the conv_decoding 5 by using the maximum pooling index stored when generating the pooling feature map pool_4 to obtain a sparse feature map upsampling4;
step 5.1.2: fusing the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_3, to obtain the fused feature map F_2;
step 5.1.3: inputting the fused feature map F_2 into the second three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 4;
step 5.2: performing third feature fusion through the decoding structure to recover image information;
step 5.2.1: performing 2-time up-sampling on the feature map conv_decoding 4 by using a maximum pooling index stored when generating pooling feature map pool_3 to obtain sparse feature map upsampling3;
step 5.2.2: performing feature fusion on the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2, which are extracted from the coding structure and have the same resolution, and with the pooling feature map pool_2, to obtain the fused feature map F_3;
step 5.2.3: inputting the fused feature map F_3 into the third three-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 3;
step 5.3: performing fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: performing 2-time up-sampling on the feature map conv_decoding 3 by using a maximum pooling index stored when generating pooling feature map pool_2 to obtain sparse feature map upsampling2;
step 5.3.2: performing feature fusion on the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooling feature map pool_1 to obtain the fused feature map F_4;
step 5.3.3: according to the symmetry of the SegNet network, inputting the fused feature map F_4 into the first two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 2;
step 5.4: performing fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: performing 2-time up-sampling on the feature map conv_decoding 2 by using a maximum pooling index stored when generating pooling feature map pool_1 to obtain sparse feature map upsampling1;
step 5.4.2: performing feature fusion on the sparse feature map upsampling1 and the convolution feature map conv_1_1 to obtain the fused feature map F_5;
step 5.4.3: inputting the fused feature map F_5 into the second two-layer convolution structure to perform convolution and obtain the dense feature map conv_decoding 1;
step 6: inputting the dense feature map conv_decoding 1 into a Softmax layer to obtain the maximum probability of pixel classification in the image;
step 7: calculating the cross-entropy loss function from the maximum probability of pixel classification in the image, and updating the convolution kernel parameters of each convolution layer and pooling layer in the coding structure and the decoding structure by stochastic gradient descent, thereby realizing the image segmentation.
2. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: the specific method of the step 1 is as follows:
step 1.1: scaling and cutting the images in the training data set to enable the input images to have uniform sizes;
step 1.2: the resolution of the input image is fixed to 360×480.
3. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: the specific method for memorizing the maximum value of the pixels in the image and the positions thereof by adding the maximum pooling index during pooling in the step 2 is as follows:
for an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, 2×2 non-overlapping max pooling produces a feature map $Y \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, whose value at pixel (i, j) is given by

$$Y_{i,j} = \max_{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}} X_{p,q}$$

the position corresponding to the maximum value of the pixel is recorded as $(m_i, n_j)$, as shown in the following formula:

$$(m_i, n_j) = \underset{p \in \{2i-1,\,2i\},\; q \in \{2j-1,\,2j\}}{\arg\max} X_{p,q}$$
4. An image segmentation method using feature fusion to enhance edge and detail information as claimed in claim 3, wherein: the value of each pixel in the sparse feature map upsampling5 obtained in step 3 is given by

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u,v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value of pixel (u, v) in the sparse feature map upsampling5.
5. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: in step 4, the fusion process performs an addition of the pixel values at corresponding positions in the feature maps.
CN201911094462.3A 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion Active CN111028235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911094462.3A CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911094462.3A CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Publications (2)

Publication Number Publication Date
CN111028235A CN111028235A (en) 2020-04-17
CN111028235B true CN111028235B (en) 2023-08-22

Family

ID=70205321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911094462.3A Active CN111028235B (en) 2019-11-11 2019-11-11 Image segmentation method for enhancing edge and detail information by utilizing feature fusion

Country Status (1)

Country Link
CN (1) CN111028235B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582111B (en) * 2020-04-29 2022-04-29 电子科技大学 Cell component segmentation method based on semantic segmentation
CN111666842B (en) * 2020-05-25 2022-08-26 东华大学 Shadow detection method based on double-current-cavity convolution neural network
CN111784642B (en) * 2020-06-10 2021-12-28 中铁四局集团有限公司 Image processing method, target recognition model training method and target recognition method
CN113052159B (en) * 2021-04-14 2024-06-07 中国移动通信集团陕西有限公司 Image recognition method, device, equipment and computer storage medium
CN113192200B (en) * 2021-04-26 2022-04-01 泰瑞数创科技(北京)有限公司 Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN113280820B (en) * 2021-06-09 2022-11-29 华南农业大学 Orchard visual navigation path extraction method and system based on neural network
CN113496453A (en) * 2021-06-29 2021-10-12 上海电力大学 Anti-network image steganography method based on multi-level feature fusion
CN113724269A (en) * 2021-08-12 2021-11-30 浙江大华技术股份有限公司 Example segmentation method, training method of example segmentation network and related equipment
CN115828079B (en) * 2022-04-20 2023-08-11 北京爱芯科技有限公司 Method and device for maximum pooling operation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN110264483A (en) * 2019-06-19 2019-09-20 东北大学 A kind of semantic image dividing method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Research on Image Semantic Segmentation (图像语义分割问题研究综述); Xiao Zhaoxia et al.; Software Guide (软件导刊); Vol. 17, No. 8, pp. 6-12 *

Also Published As

Publication number Publication date
CN111028235A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028235B (en) Image segmentation method for enhancing edge and detail information by utilizing feature fusion
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN110322495B (en) Scene text segmentation method based on weak supervised deep learning
CN107644006B (en) Automatic generation method of handwritten Chinese character library based on deep neural network
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN111915627B (en) Semantic segmentation method, network, device and computer storage medium
CN101714262A (en) Method for reconstructing three-dimensional scene of single image
CN113569865B (en) Single sample image segmentation method based on class prototype learning
CN110751111B (en) Road extraction method and system based on high-order spatial information global automatic perception
CN113505792B (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN113408471A (en) Non-green-curtain portrait real-time matting algorithm based on multitask deep learning
CN106910202B (en) Image segmentation method and system for ground object of remote sensing image
CN111539887A (en) Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN112270366B (en) Micro target detection method based on self-adaptive multi-feature fusion
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN116797787B (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN113689434A (en) Image semantic segmentation method based on strip pooling
CN110633706B (en) Semantic segmentation method based on pyramid network
CN113837290A (en) Unsupervised unpaired image translation method based on attention generator network
CN116485867A (en) Structured scene depth estimation method for automatic driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant