CN111028235B - Image segmentation method for enhancing edge and detail information by utilizing feature fusion - Google Patents
- Publication number: CN111028235B (application CN201911094462.3A)
- Authority: CN (China)
- Prior art keywords: feature map, feature, pooling, convolution, decoding
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/10 — Segmentation; Edge detection
- G06N3/045 — Combinations of networks
- G06T5/00 — Image enhancement or restoration
- G06T2207/10004 — Still image; Photographic image
- G06T2207/20081 — Training; Learning
- G06T2207/20192 — Edge enhancement; Edge preservation
- Y02T10/40 — Engine management systems
Abstract
The invention provides an image segmentation method that uses feature fusion to enhance edge and detail information, and relates to the technical field of computer vision. The method extracts features from an input image with a convolutional neural network; the extracted features are fed into a decoding structure augmented with additional feature fusion, which enriches edge and detail information while restoring the image resolution, yielding a dense feature map; the maximum class probabilities are produced by a normalization (Softmax) step; a cross-entropy loss is computed, and the network weights are updated by stochastic gradient descent. While restoring the resolution of the feature map, the method recovers the position and boundary detail information lost in the encoding stage, enriches the image information, and obtains a dense feature map that compensates for the sparse feature maps produced by direct upsampling, so that segmented boundaries and details are sharper and the segmentation of small, detailed objects improves.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image segmentation method that uses feature fusion to enhance edge and detail information.
Background
With continuing scientific and technological progress and the rapid development of the national economy, artificial intelligence has gradually entered public view, plays an ever greater role in human production and daily life, and is widely applied across many fields. Image semantic segmentation is an important research direction of artificial intelligence and a key means of automatic scene understanding; it can be applied in fields such as autonomous driving systems and unmanned vehicles.
Image semantic segmentation is an important branch of computer vision within machine learning: an input image is processed so that its content is automatically segmented and recognized. Before deep learning was applied to computer vision, classifiers for image semantic segmentation were typically built with texton forests or random forests. With the emergence and vigorous development of deep convolutional neural networks, a very effective approach to semantic segmentation became available; CNNs applied to semantic segmentation have developed well, have driven the field forward, and have achieved remarkable results in many application areas.
Many classical segmentation methods have appeared since deep learning was applied to semantic segmentation, such as the fully convolutional network FCN, the SegNet network with its encoder-decoder structure, and DeepLab with dilated (atrous) convolution. However, as the CNN hierarchy deepens, repeated pooling and downsampling discard the position information and boundary detail of the picture. This process is irreversible and the removed information cannot be fully recovered, so the feature maps produced by upsampling in the decoding stage are sparse, and these methods therefore have certain limitations.
The fully convolutional network FCN and the conventional SegNet network lose position and edge detail through downsampling, and the lost information does not reappear when upsampling in the decoding stage, so the resulting feature maps are sparse. Although SegNet recovers position information through pooling indices and enriches boundary and detail information with convolution operations, a great deal of information is still lost.
Dilated (atrous) convolution is a convolutional layer that can produce dense feature maps, but its computational cost is relatively high, and processing a large number of high-resolution feature maps occupies a large amount of memory.
The problem with existing image semantic segmentation methods is that the preservation of edge detail features and position information still needs further improvement, and segmentation accuracy remains to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, an image segmentation method that uses feature fusion to enhance edge and detail information, thereby realizing image segmentation.
In order to solve the technical problems, the invention adopts the following technical scheme: an image segmentation method for enhancing edge and detail information by utilizing feature fusion, comprising the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cutting the images in the training data set to enable the input images to have uniform sizes;
step 1.2: fixing the resolution of the input image to 360×480;
Step 2: input the image into a coding structure for feature extraction; the coding structure is the same as in the SegNet network, adopting the first 13 layers of VGG-16, while a max pooling index is added during pooling to record the maximum pixel values in the image and their positions;
The convolution kernel size of each convolution layer in the coding structure is 3×3, and the feature map after each convolution layer is named conv_i_j, where i = 1,…,5; j = 1,2 when i = 1,2 and j = 1,2,3 when i = 3,4,5. Each convolution layer is followed by batch normalization and a ReLU activation function. A max pooling index is added to each pooling layer: downsampling is realized with 2×2 non-overlapping max pooling, and the position of the maximum pixel value is recorded through the index. The feature map produced by each pooling layer is denoted pool_r, where r = 1,…,5;
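The encoder layout above (first 13 layers of VGG-16, five convolution blocks each followed by indexed 2×2 max pooling) can be sketched as follows. The channel counts are an assumption taken from the standard VGG-16 configuration, which the text implies but does not list explicitly:

```python
# Sketch of the encoder layout described in step 2. Assumption: the standard
# VGG-16 channel plan (64, 128, 256, 512, 512); the text only says
# "the first 13 layers of VGG-16" without listing channels.
ENCODER_BLOCKS = [
    # (block index i, output channels, number of conv layers)
    (1, 64, 2),   # conv_1_1, conv_1_2 -> pool_1
    (2, 128, 2),  # conv_2_1, conv_2_2 -> pool_2
    (3, 256, 3),  # conv_3_1 .. conv_3_3 -> pool_3
    (4, 512, 3),  # conv_4_1 .. conv_4_3 -> pool_4
    (5, 512, 3),  # conv_5_1 .. conv_5_3 -> pool_5
]

def feature_map_names():
    """Enumerate the conv_i_j and pool_r feature-map names used in the text."""
    names = []
    for i, _channels, n_convs in ENCODER_BLOCKS:
        names += [f"conv_{i}_{j}" for j in range(1, n_convs + 1)]
        names.append(f"pool_{i}")
    return names
```

This yields 13 convolution feature maps and 5 pooling feature maps, matching the naming scheme used throughout the description.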
The specific method of recording the maximum pixel values in the image and their positions by adding a max pooling index during pooling is as follows:
For an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, 2×2 non-overlapping max pooling yields $Y \in \mathbb{R}^{(h/2) \times (w/2) \times c}$, in which the value of pixel (i, j) is

$$Y_{i,j} = \max_{p,q \in \{0,1\}} X_{2i+p,\;2j+q}$$

The position corresponding to the maximum pixel value is recorded as $(m_i, n_j)$:

$$(m_i, n_j) = \underset{(2i+p,\;2j+q),\;p,q \in \{0,1\}}{\arg\max}\; X_{2i+p,\;2j+q}$$
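As a minimal illustration of the pooling step above, this pure-Python sketch performs 2×2 non-overlapping max pooling on a single-channel feature map while recording the max pooling index (0-based indexing is assumed; the helper name is our own, not the patent's):

```python
def max_pool_2x2_with_index(x):
    """2x2 non-overlapping max pooling over a 2D list `x` (h x w, h, w even).
    Returns the pooled map Y and, for each output pixel (i, j), the input
    position (m_i, n_j) of the maximum -- the max pooling index."""
    h, w = len(x), len(x[0])
    pooled, index = [], []
    for i in range(h // 2):
        row_vals, row_idx = [], []
        for j in range(w // 2):
            # Candidates X[2i+p][2j+q] for p, q in {0, 1}
            window = [(x[2 * i + p][2 * j + q], (2 * i + p, 2 * j + q))
                      for p in (0, 1) for q in (0, 1)]
            val, pos = max(window)  # maximum value and its position
            row_vals.append(val)
            row_idx.append(pos)
        pooled.append(row_vals)
        index.append(row_idx)
    return pooled, index
```

For a 4×4 input this halves the resolution to 2×2 while remembering exactly where each maximum came from, which is what the decoding stage later needs.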
Step 3: input the pooling feature map pool_5 obtained from the coding structure into a decoding structure augmented with additional feature fusion; release the maximum pixel values in situ using the max pooling index, fill the remaining positions with 0, and thereby realize 2× upsampling, obtaining the sparse feature map upsampling5;
The decoding structure comprises three three-layer convolution blocks and two two-layer convolution blocks; each convolution layer in the decoding structure is followed by batch normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u,v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value of pixel (u, v) in the sparse feature map upsampling5;
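The in-situ release of step 3 can be sketched as follows: each pooled maximum is written back at its recorded position and every other position is filled with 0 (single channel, pure Python; the function name is illustrative):

```python
def max_unpool_2x2(pooled, index, out_h, out_w):
    """Reverse of indexed 2x2 max pooling: place each pooled value Y[i][j]
    back at its recorded position (m_i, n_j) and fill every other position
    with 0, yielding the sparse, 2x-upsampled feature map Z."""
    z = [[0] * out_w for _ in range(out_h)]
    for i, row in enumerate(pooled):
        for j, val in enumerate(row):
            m, n = index[i][j]
            z[m][n] = val  # Z[u][v] = Y[i][j] when (u, v) == (m_i, n_j)
    return z
```

Note that all non-maximum positions of the output really are 0, which is exactly why the upsampled map is sparse and why the feature fusion of the following steps is needed.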
Step 4: perform one feature fusion operation through the decoding structure: fuse the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, then fuse the result with the pooling feature map pool_4 of the corresponding size to obtain the fused feature map F1;
The fusion process adds the pixel values at corresponding positions in the feature maps;
Input the fused feature map F1 into the first three-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode5 and compensating for the information loss caused by pooling and downsampling;
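The fusion operation described in step 4 is plain element-wise addition of equally sized feature maps; a minimal sketch (single channel, pure Python, illustrative helper name):

```python
def fuse(*feature_maps):
    """Feature fusion as described in step 4: element-wise addition of
    equally sized 2D feature maps (e.g. upsampling5 + conv_5_1 + conv_5_2,
    then the result + pool_4)."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    assert all(len(f) == h and len(f[0]) == w for f in feature_maps), \
        "all fused maps must share the same resolution"
    return [[sum(f[i][j] for f in feature_maps) for j in range(w)]
            for i in range(h)]
```

Because addition is associative, fusing upsampling5 with conv_5_1 and conv_5_2 first and then with pool_4 gives the same result as one combined sum.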
Step 5: perform four further feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size;
Step 5.1: perform the second feature fusion through the decoding structure to recover image information;
Step 5.1.1: upsample conv_decode5 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_4, obtaining the sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_3, to obtain the fused feature map F2;
Step 5.1.3: input the fused feature map F2 into the second three-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to recover image information;
Step 5.2.1: upsample the feature map conv_decode4 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_3, obtaining the sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_2, to obtain the fused feature map F3;
Step 5.2.3: input the fused feature map F3 into the third three-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode3;
Step 5.3: perform the fourth feature fusion through the decoding structure to recover the detail information of the image;
Step 5.3.1: upsample the feature map conv_decode3 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_2, obtaining the sparse feature map upsampling2;
Step 5.3.2: fuse the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooling feature map pool_1 to obtain the fused feature map F4;
Step 5.3.3: in accordance with the symmetry of the SegNet network, input the fused feature map F4 into the first two-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode2;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover the edge information of the image;
Step 5.4.1: upsample the feature map conv_decode2 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_1, obtaining the sparse feature map upsampling1;
Step 5.4.2: fuse the sparse feature map upsampling1 with the convolution feature map conv_1_1 to obtain the fused feature map F5;
Step 5.4.3: input the fused feature map F5 into the second two-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode1;
Step 6: input the dense feature map conv_decode1 into the Softmax layer to obtain the maximum classification probability for each pixel in the image;
Step 7: compute the cross-entropy loss from the maximum per-pixel classification probabilities, and update the convolution kernel parameters of each convolution layer in the coding and decoding structures by stochastic gradient descent, thereby realizing the image segmentation.
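Steps 6 and 7 rest on a per-pixel Softmax followed by a cross-entropy loss. A minimal sketch for a single pixel's class scores (not the patent's full training loop, and the function names are our own):

```python
import math

def softmax(logits):
    """Normalize one pixel's class scores to probabilities (the Softmax layer)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pixel_cross_entropy(logits, true_class):
    """Cross-entropy loss for one pixel given its ground-truth class index:
    -log p(true_class). Summing this over all pixels gives the image loss
    minimized by stochastic gradient descent in step 7."""
    return -math.log(softmax(logits)[true_class])
```

The loss is small when the score of the true class dominates and grows as probability mass shifts to other classes, which is the gradient signal that drives the weight updates.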
The technical principle of the method is as follows: the decoding stage is improved on the basis of the original SegNet network so that, while the feature map resolution is restored, image position and boundary detail information is also recovered, yielding a dense feature map. Image features are extracted by the convolution and pooling layers of the coding structure, and layers at different depths extract information at different scales: shallow layers extract global low-level semantic information such as edges, orientation, texture, and chromaticity, while deep layers extract local high-level semantic information such as object shape. The deeper the network layer, the more abstract the extracted features; to capture these more abstract high-level features, the model uses max pooling rather than average pooling in the coding structure.
Because the maximum pixel values extracted from the feature map and their positions are critical, pooling loses edge detail information, and the reduced feature map resolution loses position information. A pooling index is therefore added to the encoding structure to record the position of each maximum pixel value; the decoding structure releases the maximum values at their original positions through the pooling index and fills the remaining positions with 0, realizing 2× upsampling, recovering important position information, and reducing error.
However, as the decoding structure deepens, the extracted features become increasingly abstract, much edge detail information is lost, and each layer loses information at a different scale. In the decoding structure, every position of an upsampled feature map other than the maxima is 0, i.e. the obtained feature map is sparse, and the lost information does not reappear in it. Feature fusion is therefore added to the decoding structure to restore information: the sparse feature map obtained after each upsampling is superimposed with the convolved and pooled feature maps of corresponding size from the encoding stage. In this way, each upsampled feature map is fed into the fusion structure, the information lost in the encoding stage is gradually recovered, and the fusion result is passed to convolution layers that further enrich the information, producing denser feature maps, a better segmentation effect, and higher accuracy.
The beneficial effect of the above technical scheme is that the proposed image segmentation method, which uses feature fusion to enhance edge and detail information, recovers the position and boundary detail information lost in the encoding stage while restoring the resolution of the feature map, enriches the image information, and obtains a dense feature map that compensates for the sparse feature maps produced by direct upsampling, so that segmented boundaries and details are clearer, the segmentation of fine, detailed objects improves, and both the average segmentation accuracy and the mIoU increase.
Drawings
Fig. 1 is a flowchart of an image segmentation method using feature fusion to enhance edge and detail information according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the invention with reference to the drawings and examples. The following examples illustrate the invention and are not intended to limit its scope.
In this embodiment, an image segmentation method using feature fusion to enhance edge and detail information, as shown in fig. 1, includes the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 1.1: scaling and cutting the images in the training data set to enable the input images to have uniform sizes;
step 1.2: fixing the resolution of the input image to 360×480;
Step 2: input the image into a coding structure for feature extraction; the coding structure is the same as in the SegNet network, adopting the first 13 layers of VGG-16, while a max pooling index is added during pooling to record the maximum pixel values in the image and their positions;
The convolution kernel size of each convolution layer in the coding structure is 3×3, keeping the image size unchanged; the feature map after each convolution layer is named conv_i_j, where i = 1,…,5; j = 1,2 when i = 1,2 and j = 1,2,3 when i = 3,4,5. Each convolution layer is followed by batch normalization and a ReLU activation function: batch normalization accelerates model convergence and, to a certain extent, alleviates the gradient-dispersion problem in deep networks, making deep models easier and more stable to train, while the ReLU activation function mitigates vanishing gradients and alleviates network overfitting. A max pooling index is added to each pooling layer: downsampling is realized with 2×2 non-overlapping max pooling, and the position of the maximum pixel value is recorded through the index; the feature map produced by each pooling layer is denoted pool_r, where r = 1,…,5;
The coding structure uses the first 13 layers of VGG-16 to extract picture features, employing convolution and pooling layers to extract image features at different scales: the first four layers can be regarded as a shallow structure yielding low-level semantic information, and the later layers as a deep structure yielding high-level abstract information, so features at different scales are obtained through the coding structure;
For an input feature map $X \in \mathbb{R}^{h \times w \times c}$, where h and w are the height and width of the feature map and c is the number of channels, 2×2 non-overlapping max pooling yields $Y \in \mathbb{R}^{(h/2) \times (w/2) \times c}$, in which the value of pixel (i, j) is

$$Y_{i,j} = \max_{p,q \in \{0,1\}} X_{2i+p,\;2j+q}$$

The position corresponding to the maximum pixel value is recorded as $(m_i, n_j)$:

$$(m_i, n_j) = \underset{(2i+p,\;2j+q),\;p,q \in \{0,1\}}{\arg\max}\; X_{2i+p,\;2j+q}$$
Step 3: input the pooling feature map pool_5 obtained from the coding structure into a decoding structure augmented with additional feature fusion; release the maximum pixel values in situ using the max pooling index, fill the remaining positions with 0, and thereby realize 2× upsampling, obtaining the sparse feature map upsampling5;
The decoding structure comprises three three-layer convolution blocks and two two-layer convolution blocks; each convolution layer in the decoding structure is followed by batch normalization and a ReLU activation function;
The value of each pixel in the resulting sparse feature map upsampling5 is

$$Z_{u,v} = \begin{cases} Y_{i,j}, & (u,v) = (m_i, n_j) \\ 0, & \text{otherwise} \end{cases}$$

where $Z_{u,v}$ is the pixel value of pixel (u, v) in the sparse feature map upsampling5.
Step 4: because the feature map obtained by upsampling is sparse, a feature fusion operation is performed through the decoding structure. The convolution feature maps extracted from the coding structure with the same resolution as the sparse feature map upsampling5 are conv_5_1, conv_5_2, and conv_5_3. Since pool_5 is obtained by directly pooling conv_5_3, part of that information is already recovered during the 2× upsampling; therefore, and also to reduce the model's training parameters, only conv_5_1 and conv_5_2 are fused with the sparse feature map upsampling5, and the result is then fused with the pooling feature map pool_4 of the corresponding size to obtain the fused feature map F1;
The fusion process adds the pixel values at corresponding positions in the feature maps;
To maintain the symmetry of the original SegNet network, the fused feature map F1 is input into the first three-layer convolution block for the convolution operation, obtaining the dense feature map conv_decode5, further enriching the picture information and compensating for the information loss caused by pooling and downsampling;
Step 4 constitutes the first feature fusion operation. The method performs five feature fusions in total during decoding, which take three different forms according to the upsampling depth; the first three fusions share the same form. Four further feature fusions follow.
Step 5: perform four further feature fusion operations through the decoding structure, repeating upsampling, feature fusion, and convolution until the resolution of the feature map is restored to the original size, obtaining the dense feature map conv_decode1;
Step 5.1: perform the second feature fusion through the decoding structure to recover image information;
Step 5.1.1: after step 4, the resolution of the feature map conv_decode5 is the same as that of the pooling feature map pool_4; conv_decode5 is upsampled by a factor of 2 using the max pooling index stored when generating pool_4, obtaining the sparse feature map upsampling4;
Step 5.1.2: fuse the sparse feature map upsampling4 with the convolution feature maps conv_4_1 and conv_4_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_3, to obtain the fused feature map F2;
Step 5.1.3: input the fused feature map F2 into the second three-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode4;
Step 5.2: perform the third feature fusion through the decoding structure to recover image information;
Step 5.2.1: upsample the feature map conv_decode4 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_3, obtaining the sparse feature map upsampling3;
Step 5.2.2: fuse the sparse feature map upsampling3 with the convolution feature maps conv_3_1 and conv_3_2 of the same resolution extracted from the coding structure, and with the pooling feature map pool_2, to obtain the fused feature map F3;
Step 5.2.3: input the fused feature map F3 into the third three-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode3;
The first three feature fusions correspond to encoder feature maps from three stages and share the same fusion structure; the feature maps involved have lower resolution and carry local abstract features, so the same fusion form is used to recover those local abstract features.
Step 5.3: performing fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: performing 2-time up-sampling on the feature map conv_decoding 3 by using a maximum pooling index stored when generating pooling feature map pool_2 to obtain sparse feature map upsampling2;
step 5.3.2: since the resolution of the feature map has been restored to the original map after step 5.3.1At this time, the corresponding feature graphs comprise conv_2_1, conv_2_2 and pool_1, so that in order to reduce the parameters of model training, only sparse feature graph upsampling2 is subjected to feature fusion with convolution feature graph conv_2_1 and pooling feature graph pool_1 to obtain a fusion feature graph F 4 ;
Step 5.3.3: according to the symmetry of SegNet network, the feature map F is fused 4 Inputting the two-layer characteristic images into a first two-layer convolution structure to carry out convolution operation to obtain a dense characteristic image conv_decoding 2;
different from the previous three feature fusion, the feature fusion corresponds to two stages of coding feature graphs for recovering detail information, so that the fusion forms are different;
Step 5.4: perform the fifth feature fusion through the decoding structure to recover the edge information of the image;
Step 5.4.1: upsample the feature map conv_decode2 by a factor of 2 using the max pooling index stored when generating the pooling feature map pool_1, obtaining the sparse feature map upsampling1;
Step 5.4.2: since the resolution of the feature map after step 5.4.1 has been restored to the original size, the feature maps of the same resolution obtained by the coding structure are the convolution feature maps conv_1_1 and conv_1_2; to reduce the parameters of model training, only the sparse feature map upsampling1 is fused with the convolution feature map conv_1_1, obtaining the fused feature map F5;
Step 5.4.3: input the fused feature map F5 into the second two-layer convolution block to perform the convolution operation, obtaining the dense feature map conv_decode1;
In this feature fusion, only one stage's encoder feature map participates in the fusion, and it is used to recover edge information.
Step 6: the dense feature map conv_decode1 is input into the Softmax layer to obtain the maximum classification probability for each pixel in the image.
Step 7: compute the cross-entropy loss from the maximum per-pixel classification probabilities, and update the convolution kernel parameters of each convolution layer in the coding and decoding structures by stochastic gradient descent, thereby realizing the image segmentation.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.
Claims (5)
1. An image segmentation method for enhancing edge and detail information by utilizing feature fusion is characterized in that: the method comprises the following steps:
step 1: processing the images in the training data set to obtain images with uniform resolution;
step 2: inputting the image into a coding structure for feature extraction; the coding structure is the same as that of the SegNet network, adopting the first 13 layers of VGG-16, and a max-pooling index is added during pooling to memorize the maximum pixel values in the image and their positions;
the convolution kernel size of each convolution layer of the coding structure is 3×3, and the feature map after each convolution layer is named conv_i_j, where i=1,2,3,4,5, j=1,2 when i=1,2, and j=1,2,3 when i=3,4,5; each convolution layer is followed by Batch Normalisation and a ReLU activation function; a max-pooling index is added to each pooling layer, down-sampling is realized with 2×2 non-overlapping max pooling, and the position of the maximum pixel value is memorized through the max-pooling index; the feature map obtained by each pooling layer is denoted pool_r, where r=1,2,3,4,5;
step 3: inputting the pooling feature map pool_5 obtained through the coding structure into a decoding structure with additional feature fusion, releasing the maximum pixel values in situ using the max-pooling indices and filling the remaining positions with 0, thereby realizing 2× up-sampling and obtaining the sparse feature map upsampling5;
the decoding structure comprises three-layer convolution structures and two-layer convolution structures; each convolutional layer in the decoding structure is followed by a concatenation Batch Normalisation and a ReLU activation function;
step 4: performing the first feature fusion operation through the decoding structure: fuse the sparse feature map upsampling5 with the convolution feature maps conv_5_1 and conv_5_2, then fuse the result with the pooling feature map pool_4 of the corresponding size to obtain the fusion feature map F_1;
input the fusion feature map F_1 into the first three-layer convolution structure of the decoding structure for convolution, obtaining the dense feature map conv_decode5 and compensating the information loss caused by pooling down-sampling;
step 5: performing four further feature fusion operations through the decoding structure until the resolution of the feature map is restored to the original size, obtaining the dense feature map conv_decode1;
step 5.1: performing the second feature fusion through the decoding structure to recover image information;
step 5.1.1: perform 2× up-sampling on conv_decode5 using the max-pooling indices stored when generating the pooling feature map pool_4, obtaining the sparse feature map upsampling4;
step 5.1.2: fuse the sparse feature map upsampling4 with the same-resolution convolution feature maps conv_4_1 and conv_4_2 extracted from the coding structure and with the pooling feature map pool_3, obtaining the fusion feature map F_2;
step 5.1.3: input the fusion feature map F_2 into the second three-layer convolution structure for convolution, obtaining the dense feature map conv_decode4;
step 5.2: performing the third feature fusion through the decoding structure to recover image information;
step 5.2.1: perform 2× up-sampling on the feature map conv_decode4 using the max-pooling indices stored when generating the pooling feature map pool_3, obtaining the sparse feature map upsampling3;
step 5.2.2: fuse the sparse feature map upsampling3 with the same-resolution convolution feature maps conv_3_1 and conv_3_2 extracted from the coding structure and with the pooling feature map pool_2, obtaining the fusion feature map F_3;
step 5.2.3: input the fusion feature map F_3 into the third three-layer convolution structure for convolution, obtaining the dense feature map conv_decode3;
step 5.3: performing the fourth feature fusion through the decoding structure to recover the detail information of the image;
step 5.3.1: perform 2× up-sampling on the feature map conv_decode3 using the max-pooling indices stored when generating the pooling feature map pool_2, obtaining the sparse feature map upsampling2;
step 5.3.2: fuse the sparse feature map upsampling2 with the convolution feature map conv_2_1 and the pooling feature map pool_1, obtaining the fusion feature map F_4;
step 5.3.3: according to the symmetry of the SegNet network, input the fusion feature map F_4 into the first two-layer convolution structure for convolution, obtaining the dense feature map conv_decode2;
step 5.4: performing the fifth feature fusion through the decoding structure to recover the edge information of the image;
step 5.4.1: perform 2× up-sampling on the feature map conv_decode2 using the max-pooling indices stored when generating the pooling feature map pool_1, obtaining the sparse feature map upsampling1;
step 5.4.2: fuse the sparse feature map upsampling1 with the convolution feature map conv_1_1, obtaining the fusion feature map F_5;
step 5.4.3: input the fusion feature map F_5 into the second two-layer convolution structure for convolution, obtaining the dense feature map conv_decode1;
step 6: inputting the dense feature map conv_decode1 into a Softmax layer to obtain the maximum probability of pixel classification in the image;
step 7: the cross-entropy loss function is calculated from the maximum probability of pixel classification in the image, and the convolution kernel parameters of each convolution layer and pooling layer in the coding and decoding structures are updated by stochastic gradient descent, thereby realizing the image segmentation.
2. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: the specific method of the step 1 is as follows:
step 1.1: scaling and cropping the images in the training data set so that the input images have a uniform size;
step 1.2: the resolution of the input image is fixed to 360×480.
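One simple way to realize the fixed 360×480 input resolution of step 1.2 is nearest-neighbour index mapping. The sketch below assumes a single-channel array; the patent does not specify the interpolation method, so this choice is illustrative:

```python
import numpy as np

def resize_nearest(img, out_h=360, out_w=480):
    """Rescale an image to the fixed 360x480 network input resolution
    by nearest-neighbour index mapping."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]
```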
3. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: the specific method for memorizing the maximum value of the pixels in the image and the positions thereof by adding the maximum pooling index during pooling in the step 2 is as follows:
For an input feature map X ∈ R^(h×w×c), where h and w are the height and width of the feature map and c is the number of channels, the pooled feature map Y ∈ R^((h/2)×(w/2)×c) is obtained through 2×2 non-overlapping max pooling, where the value of pixel (i, j) is given by:
Y_(i,j) = max{ X_(2i-1,2j-1), X_(2i-1,2j), X_(2i,2j-1), X_(2i,2j) }
The position corresponding to the maximum pixel value is recorded as (m_i, n_j), given by:
(m_i, n_j) = argmax_{u ∈ {2i-1,2i}, v ∈ {2j-1,2j}} X_(u,v)
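The 2×2 windowed maximum and its recorded position (m_i, n_j) described in this claim can be sketched in vectorized NumPy. This is an illustration only (single channel, 0-based indices rather than the claim's 1-based indices; function name assumed):

```python
import numpy as np

def max_pool_2x2_with_index(x):
    """2x2 non-overlapping max pooling. Returns the pooled map Y and the
    absolute (row, column) position of each window's maximum, i.e. the
    max-pooling index (m_i, n_j) in 0-based form."""
    h, w = x.shape
    # group pixels into (h/2, w/2) windows of shape 2x2
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    y = windows.max(axis=(2, 3))
    k = windows.reshape(h // 2, w // 2, 4).argmax(axis=-1)  # 0..3 in window
    m = 2 * np.arange(h // 2)[:, None] + k // 2   # row of the maximum
    n = 2 * np.arange(w // 2)[None, :] + k % 2    # column of the maximum
    return y, m, n
```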
4. A method of image segmentation using feature fusion to enhance edge and detail information as claimed in claim 3, wherein: the value of each pixel in the sparse feature map upsampling5 obtained in step 3 is given by:
Z_(u,v) = Y_(i,j) if (u, v) = (m_i, n_j), and Z_(u,v) = 0 otherwise,
where Z_(u,v) is the pixel value of pixel (u, v) in the sparse feature map upsampling5.
5. An image segmentation method using feature fusion to enhance edge and detail information as defined in claim 1, wherein: in step 4, the fusion process performs element-wise addition of the pixel values at corresponding positions in the feature maps.
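The fusion of claim 5 is element-wise addition of same-resolution feature maps. A short NumPy sketch; the variadic helper name is an assumption:

```python
import numpy as np

def fuse(*feature_maps):
    """Feature fusion as claimed: add pixel values at corresponding
    positions of same-resolution feature maps (e.g. fusing upsampling5
    with conv_5_1, conv_5_2, and pool_4 to form F_1)."""
    out = np.zeros_like(feature_maps[0])
    for f in feature_maps:
        out = out + f
    return out
```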
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911094462.3A CN111028235B (en) | 2019-11-11 | 2019-11-11 | Image segmentation method for enhancing edge and detail information by utilizing feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111028235A CN111028235A (en) | 2020-04-17 |
CN111028235B true CN111028235B (en) | 2023-08-22 |
Family
ID=70205321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911094462.3A Active CN111028235B (en) | 2019-11-11 | 2019-11-11 | Image segmentation method for enhancing edge and detail information by utilizing feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028235B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10304193B1 (en) * | 2018-08-17 | 2019-05-28 | 12 Sigma Technologies | Image segmentation and object detection using fully convolutional neural network |
CN109903292A (en) * | 2019-01-24 | 2019-06-18 | 西安交通大学 | A kind of three-dimensional image segmentation method and system based on full convolutional neural networks |
CN110264483A (en) * | 2019-06-19 | 2019-09-20 | 东北大学 | A kind of semantic image dividing method based on deep learning |
Non-Patent Citations (1)
Title |
---|
A Survey of Research on Image Semantic Segmentation; Xiao Zhaoxia et al.; Software Guide; Vol. 17, No. 8; pp. 6-12 *
Also Published As
Publication number | Publication date |
---|---|
CN111028235A (en) | 2020-04-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||