CN112329808A - Optimization method and system of Deeplab semantic segmentation algorithm - Google Patents
Optimization method and system of Deeplab semantic segmentation algorithm
- Publication number
- CN112329808A (application CN202011027787.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- semantic segmentation
- data
- segmentation algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
Abstract
The invention provides an optimization method and system for the Deeplab semantic segmentation algorithm. The method comprises the following steps: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on atrous spatial pyramid pooling; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result. The invention introduces the manifold learning layer to preprocess the data, which effectively retains the useful information in the data and performs a preliminary aggregation. It also uses a spatial pyramid structure to process the high-level features from multiple dimensions, so that the prediction of the fully connected layer used to compute the loss is closer to the ground-truth labels and part of the effective information is prevented from being lost.
Description
Technical Field
The invention relates to the technical field of semantic segmentation, in particular to an optimization method and system of a Deeplab semantic segmentation algorithm.
Background
Currently, semantic segmentation algorithms fall into two main categories: traditional semantic segmentation algorithms and deep-learning-based semantic segmentation algorithms. Traditional algorithms segment an image using hand-crafted features as visual information, for example threshold-based and edge-based segmentation methods. They require no training process and have low computational complexity. However, selecting suitable hand-crafted features is often difficult, and the segmentation results of traditional methods in scenes containing multiple semantic classes are unsatisfactory. In recent years, advances in computing power and the exponential growth of visual data have brought computer vision into the deep learning era. Convolutional neural networks achieve excellent performance in image classification, but their repeated downsampling and pooling operations reduce the resolution of the feature map and discard a large amount of image detail, which is detrimental to semantic segmentation. The fully convolutional network (FCN) was the first to use a convolutional neural network for pixel-level classification and laid the foundation for deep-learning-based semantic segmentation. The U-Net structure connects the convolution and pooling layers with the deconvolution layers, further improving pixel classification accuracy, but both FCN and U-Net still suffer from reduced feature-map resolution and loss of image detail during downsampling.
To address these problems, the Deeplabv1 and PSPNet network models replace pooling layers with atrous (dilated) convolution, which effectively enlarges the receptive field of the filter and reduces the loss of detail during downsampling while keeping the number of parameters unchanged. The SegNet model uses an encoder-decoder structure to capture sufficient spatial information in the shallow layers of the network and restore image detail. The encoder structure reduces the number of training parameters and lowers time complexity while maintaining segmentation accuracy, but its sampling scheme can lead to sparse feature maps and low segmentation accuracy; in addition, segmentation results are often inaccurate for multi-scale objects. To capture multi-scale context information, the Deeplabv2, PSPNet and Deeplabv3 network models introduce an atrous spatial pyramid pooling module, which convolves the feature map with kernels of different dilation rates to obtain multi-scale feature information.
These semantic segmentation algorithms supplement image detail by enlarging the receptive field of the convolution kernel, extracting multi-scale feature information, and using encoder-decoder structures. However, much detail is still lost during downsampling and the ability to exploit global context information is lacking, which limits the semantic segmentation effect. In addition, high-level features help with category recognition while low-level features help delineate accurate boundaries; how to use the feature information of all network layers to improve semantic segmentation is a problem worth solving.
Therefore, a new optimization method and system for the Deeplab semantic segmentation algorithm are needed to solve the above problems.
Disclosure of Invention
The invention provides an optimization method and system of a Deeplab semantic segmentation algorithm, which are used for solving the problems that a lot of detail information is lost in the downsampling process of the existing semantic segmentation method, the capability of utilizing global context information is lacked, and the semantic segmentation effect is limited.
In a first aspect, an embodiment of the present invention provides an optimization method for the Deeplab semantic segmentation algorithm, including:
building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;
inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image;
extracting multi-scale feature information of the image based on atrous spatial pyramid pooling;
and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
Further, inputting the dimension-reduced data into the improved encoder network and extracting the low-level features and high-level features of the image further includes:
performing feature-level fusion of the low-level features and the high-level features of the image.
Further, the formula for the feature-level fusion of the low-level features and the high-level features of the image is as follows:
wherein the output of the neural network is $F_1(x)$; $f_1$ and $f_2$ denote the convolution, pooling and activation operations of the residual units; and $w_1$ and $w_2$ are convolution kernels.
Further, extracting the multi-scale feature information of the image based on atrous spatial pyramid pooling includes:
the atrous pyramid pooling is calculated as:
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\,k(t)$$
wherein F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length, t is the offset, and l is the dilation factor; $(F *_l k)(p)$ denotes the pyramid pooling result at p, and $\sum_{s + l t = p}$ denotes the summation over all s and t with s + l t = p.
Further, reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data includes:
constructing a neighbor graph by connecting the sample points such that each point is connected to its k nearest points;
determining the weights between neighboring points using a heat kernel function;
and constructing an optimization objective function for prediction and classification based on the weights between the neighboring points.
Further, the optimization objective function is:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
wherein $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and $W_{ab}$ is the weight between neighboring points.
In a second aspect, an embodiment of the present invention provides an optimization system for the Deeplab semantic segmentation algorithm, including:
a dimensionality reduction module, configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and reduce the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;
a feature extraction module, configured to input the dimension-reduced data into an improved encoder network and extract the low-level features and high-level features of the image;
a pooling module, configured to extract multi-scale feature information of the image based on atrous spatial pyramid pooling;
and a classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the optimization method for the Deeplab semantic segmentation algorithm provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the optimization method for the Deeplab semantic segmentation algorithm provided in the first aspect.
According to the optimization method and system for the Deeplab semantic segmentation algorithm provided by the present invention, a manifold learning layer is introduced to preprocess the data. Compared with other traditional dimensionality reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method builds parallel convolutional network layers to extract detail features such as textures and contours, fuses the features at the pixel level to supplement the semantic information of the original image, and then downsamples the fused feature map to further supplement the detail information. The invention also uses the spatial pyramid structure to process the high-level features from multiple dimensions, so that the prediction of the fully connected layer used to compute the loss is closer to the ground-truth labels and part of the effective information is prevented from being lost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of an optimization method for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the optimized Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 3 is a diagram of the effect of the manifold learning layer according to an embodiment of the present invention;
Fig. 4 is a Block output diagram according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a convolution according to an embodiment of the present invention;
Fig. 6 shows the result of atrous convolution according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an optimization system for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
At present, traditional semantic segmentation algorithms supplement image detail by enlarging the receptive field of the convolution kernel, extracting multi-scale feature information, and using encoder-decoder structures, but much detail is still lost during downsampling and the ability to exploit global context information is lacking, so the semantic segmentation effect is limited.
Therefore, the optimization method and system for the Deeplab semantic segmentation algorithm provided by the embodiments of the present invention introduce a manifold learning layer to preprocess the data. Compared with other traditional dimensionality reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method builds parallel convolutional network layers to extract detail features such as textures and contours, fuses the features at the pixel level to supplement the semantic information of the original image, and then downsamples the fused feature map to further supplement the detail information. The embodiments are described below with reference to the drawings.
Fig. 1 is a schematic flow chart of an optimization method for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
101. building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;
102. inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image;
103. extracting multi-scale feature information of the image based on atrous spatial pyramid pooling;
104. performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
Specifically, Fig. 2 is a schematic structural diagram of the optimized Deeplab semantic segmentation algorithm provided by the embodiment of the present invention. As can be seen from Fig. 2, the method consists of four parts: a manifold learning layer, an improved ResNet-101 layer, a spatial pyramid structure layer, and a bilinear interpolation classification layer, which correspond to steps 101 to 104, respectively. Fig. 3 shows the effect of the manifold learning layer; as shown in Fig. 3, the embodiment of the present invention builds a manifold learning layer to reduce the dimensionality of the data.
Further, in step 102, the embodiment of the present invention constructs an improved ResNet-101 structure, extracts the low-level features and high-level features of the image, and then fuses them at the feature level. With the convolution kernels ConV2 and ConV3 denoted $w_1$ and $w_2$, the output $F_1(x)$ of the improved ResNet-101 network is obtained after Block3, Block4 and the convolution kernel sampling.
In step 103, an atrous spatial pyramid pooling module is used to obtain multi-scale feature information of the image. The module applies atrous convolutions with rates of 6, 12 and 18 to the pooled feature map, together with a 1×1 convolution and a 3×3 max pooling of the feature map; the outputs are then restored to the feature-map size through a fully connected layer and superimposed.
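To make the effect of the three rates concrete, the following minimal sketch computes the span of input pixels covered by one dilated kernel. It assumes 3×3 kernels in the dilated branches, which the text does not state, and the helper name `effective_kernel_size` is illustrative only.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Rates named in the text; the 3x3 kernel size is an assumption.
PYRAMID_RATES = (6, 12, 18)
spans = {rate: effective_kernel_size(3, rate) for rate in PYRAMID_RATES}

for rate, span in spans.items():
    print(f"rate {rate:2d}: a 3x3 kernel covers a {span}x{span} window")
```

Under that assumption the three branches see windows of 13, 25 and 37 pixels per side while each still uses only nine weights, which is how the module gathers context at several scales without adding parameters.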
Finally, in step 104, bilinear interpolation is performed on the feature map to obtain a prediction result.
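As an illustration of the interpolation in step 104, here is a small pure-Python sketch of bilinear upsampling of a single-channel score map. It assumes corner-aligned sampling (the first and last output pixels coincide with the first and last input pixels); the text does not say which alignment convention is used, and `bilinear_resize` is a hypothetical helper.

```python
def bilinear_resize(grid, out_h, out_w):
    """Upsample a 2-D list of floats with corner-aligned bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Fractional source row that output row i maps to.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column that output column j maps to.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


coarse = [[0.0, 2.0],
          [4.0, 6.0]]
fine = bilinear_resize(coarse, 3, 3)
```

In a segmentation network the same operation would be applied to each class score map before taking the per-pixel maximum; that last step is omitted here.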
According to the optimization method and system for the Deeplab semantic segmentation algorithm provided by the present invention, a manifold learning layer is introduced to preprocess the data. Compared with other traditional dimensionality reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method builds parallel convolutional network layers to extract detail features such as textures and contours, fuses the features at the pixel level to supplement the semantic information of the original image, and then downsamples the fused feature map to further supplement the detail information. The invention also uses the spatial pyramid structure to process the high-level features from multiple dimensions, so that the prediction of the fully connected layer used to compute the loss is closer to the ground-truth labels and part of the effective information is prevented from being lost.
In one embodiment, in step 102, inputting the dimension-reduced data into the improved encoder network and extracting the low-level features and high-level features of the image further includes:
performing feature-level fusion of the low-level features and the high-level features of the image.
In one embodiment, the formula for the feature-level fusion of the low-level features and the high-level features of the image is as follows:
wherein the output of the neural network is $F_1(x)$; $f_1$ and $f_2$ denote the convolution, pooling and activation operations of the residual units; and $w_1$ and $w_2$ are convolution kernels.
Specifically, the embodiment of the present invention constructs an improved ResNet-101 structure, extracts the low-level features and high-level features of the image, and then fuses them at the feature level. As shown in Fig. 3, ResNet-101 is used as the encoder network in the Deeplab framework. The input image first passes through a convolution layer with a kernel size of 7 and a stride of 2 and a pooling layer with a size of 3 and a stride of 2, which reduces the number of training parameters, enlarges the receptive field, and retains more global image information. Next, the feature map output by the pooling layer is fed into 4 Blocks, each a stack of residual units, and further downsampled. The 4 Blocks contain 3, 4, 23 and 3 residual units, respectively. Along the way the feature map shrinks to 1/16 of the original image size, and the feature information it carries becomes more complex as its size decreases.
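The size arithmetic in this paragraph can be traced with a short sketch. The stem strides (2 and 2) and the unit counts (3, 4, 23, 3) come from the text; the per-Block strides (1, 2, 2, 1) and the "same" padding rule are assumptions chosen so that the total reduction is the stated 1/16, and the 512-pixel input is purely illustrative.

```python
import math


def same_padding_output(size: int, stride: int) -> int:
    """Output length of a strided layer with 'same' padding (assumed)."""
    return math.ceil(size / stride)


STEM = [("conv 7x7", 2), ("max pool 3x3", 2)]          # strides from the text
BLOCK_STRIDES = [("Block1", 1), ("Block2", 2),
                 ("Block3", 2), ("Block4", 1)]         # assumed split
UNITS_PER_BLOCK = [3, 4, 23, 3]                        # from the text


def trace_sizes(input_size: int):
    """Return (layer name, feature-map side length) for each stage."""
    sizes = [("input", input_size)]
    size = input_size
    for name, stride in STEM + BLOCK_STRIDES:
        size = same_padding_output(size, stride)
        sizes.append((name, size))
    return sizes


trace = trace_sizes(512)
total_stride = 512 // trace[-1][1]

# Each bottleneck residual unit holds 3 convolutions; adding the stem
# convolution and the final classifier layer gives the "101" in the name.
layer_count = 3 * sum(UNITS_PER_BLOCK) + 2
```

With these assumptions a 512-pixel side shrinks to 32 pixels, a factor of 16, and the layer count comes out at 101.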
Fig. 4 is a Block output schematic diagram provided by the embodiment of the present invention. Taking the output Feature map3 of Block2 in Fig. 4 as x, the output F(x) of the conventional ResNet-101 network is obtained after the downsampling in Block3 and Block4, as expressed below.
$$F(x) = f_2(f_1(x))$$
Fig. 5 is a schematic diagram of convolution provided by the embodiment of the present invention. With the convolution kernels ConV2 and ConV3 in Fig. 5 denoted $w_1$ and $w_2$, the output $F_1(x)$ of the improved ResNet-101 network is obtained after Block3, Block4 and the convolution kernel sampling. The expression is shown below.
where $f_1$ and $f_2$ denote the convolution, pooling and activation operations of the residual units in Block3 and Block4.
In one embodiment, in step 103, extracting the multi-scale feature information of the image based on atrous spatial pyramid pooling includes:
the atrous pyramid pooling is calculated as:
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\,k(t)$$
wherein F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length, t is the offset, and l is the dilation factor; $(F *_l k)(p)$ denotes the pyramid pooling result at p, and $\sum_{s + l t = p}$ denotes the summation over all s and t with s + l t = p.
Specifically, the atrous spatial pyramid pooling module is used to obtain multi-scale feature information of the image. The module applies atrous convolutions with rates of 6, 12 and 18 to the pooled feature map, together with a 1×1 convolution and a 3×3 max pooling of the feature map; the outputs are then restored to the feature-map size through a fully connected layer and superimposed. Fig. 6 shows the result of atrous convolution according to the embodiment of the present invention. As shown in Fig. 6, atrous convolution samples the original image at a frequency set by the dilation rate parameter (rate). If rate = 1, the original image is sampled without losing any information, which is the standard convolution operation; if rate > 1, the original data is sampled with (rate − 1) pixels skipped between samples, which enlarges the receptive field. With the dilation factor defined as l, the atrous convolution is calculated as:
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\,k(t)$$
wherein F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length, and t is the offset; $(F *_l k)(p)$ denotes the pyramid pooling result at p, and $\sum_{s + l t = p}$ denotes the summation over all s and t with s + l t = p.
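The sampling behaviour described above can be checked on a one-dimensional signal. The sketch below assumes the usual definition of dilated convolution, in which the output at position p sums F(s)·k(t) over every index pair with s + l·t = p and positions outside the signal contribute nothing; `dilated_conv1d` is an illustrative name, not part of the patent.

```python
def dilated_conv1d(signal, kernel, rate):
    """Full 1-D dilated convolution: out[p] = sum of F(s) * k(t) with s + rate * t = p."""
    n, m = len(signal), len(kernel)
    out = [0.0] * (n + rate * (m - 1))
    for s in range(n):
        for t in range(m):
            # Every (s, t) pair lands on exactly one output position p.
            out[s + rate * t] += signal[s] * kernel[t]
    return out


signal = [1, 2, 3, 4]
kernel = [1, 1, 1]

standard = dilated_conv1d(signal, kernel, rate=1)  # ordinary convolution
dilated = dilated_conv1d(signal, kernel, rate=2)   # one pixel skipped between taps
```

With rate = 1 the three kernel taps touch adjacent samples; with rate = 2 the same three taps span five samples, so the receptive field grows while the number of weights stays at three.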
Based on the above embodiments, in step 101, reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data specifically includes:
constructing a neighbor graph by connecting the sample points such that each point is connected to its k nearest points;
determining the weights between neighboring points using a heat kernel function;
and constructing an optimization objective function for prediction and classification based on the weights between the neighboring points.
In one embodiment, the optimization objective function f(x) is:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
wherein $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and $W_{ab}$ is the weight between neighboring points.
The manifold learning layer in step 101 is built as follows:
Step 1a: construct a neighbor graph by connecting each sample point to its k nearest points, where the value of k is preset.
Step 1b: determine the weights between adjacent points using the heat kernel function:
$$W_{12} = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{t}\right)$$
where $x_1$ and $x_2$ are adjacent points and t is the width of the heat kernel. Alternatively, a default weighting can be used in which the weight is set to 1 when the two points $x_1$ and $x_2$ are connected and to 0 when they are not.
Step 1c: so that similar sample points lie closer together in the space after dimensionality reduction, construct the optimization objective function f(x) as follows:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
where $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and the weights $W_{ab}$ are obtained in step 1b.
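Steps 1a to 1c can be sketched in a few lines of standard-library Python. The sketch assumes the heat kernel takes the common form exp(−‖x₁ − x₂‖² / t) and that the objective is the weighted sum of squared distances between embedded points, as in Laplacian eigenmaps; it only builds the graph and evaluates the cost of a candidate embedding, it does not solve for the minimizing embedding. All function names and the toy points are illustrative.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def heat_kernel_weights(points, k, t):
    """Steps 1a and 1b: k-nearest-neighbor graph with heat kernel weights."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k points closest to point i (excluding i itself).
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: squared_distance(points[i], points[j]),
        )[:k]
        for j in neighbors:
            w = math.exp(-squared_distance(points[i], points[j]) / t)
            weights[i][j] = weights[j][i] = w  # keep the graph symmetric
    return weights


def embedding_cost(embedding, weights):
    """Step 1c: sum over a, b of W[a][b] * ||y_a - y_b||^2."""
    n = len(embedding)
    return sum(
        weights[a][b] * squared_distance(embedding[a], embedding[b])
        for a in range(n)
        for b in range(n)
    )


# Two well-separated pairs of 2-D points (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
W = heat_kernel_weights(points, k=1, t=2.0)

# Candidate 1-D embeddings: one keeps neighbors together, one splits them.
keeps_neighbors = [(0.0,), (0.1,), (1.0,), (1.1,)]
splits_neighbors = [(0.0,), (1.0,), (0.1,), (1.1,)]
```

The embedding that keeps graph neighbors close has the lower cost, which is the property the objective is meant to reward. A practical implementation would also need a scale constraint on the embedding, since shrinking every point to the same location trivially drives the cost to zero.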
To verify the performance of the method, SAR images of orchards in Lingshui County, Hainan, are used. The original data were acquired by the Gaofen-3 satellite and cut into 200 × 200 tiles with a stride of 200, giving a data set of 9669 images; 85% of the data are used for training and the remaining 15% for validation.
The experiments run under Windows 10. To build the improved Deeplab network quickly, the widely used deep learning framework TensorFlow is adopted. To speed up training, a single GPU with 8 GB of memory (an NVIDIA 1080 card) is used, and the computation runs in GPU-accelerated mode.
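The tiling and the 85/15 split described above can be sketched as follows. The scene size, the random seed and the function names are assumptions made for illustration; the text gives only the tile size, the stride, the image count and the split ratio.

```python
import random


def tile_origins(height, width, tile=200, stride=200):
    """Top-left corners of all full tiles that fit inside the image."""
    return [(r, c)
            for r in range(0, height - tile + 1, stride)
            for c in range(0, width - tile + 1, stride)]


def split_indices(n_items, train_fraction=0.85, seed=0):
    """Shuffle item indices and split them into training and validation lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


origins = tile_origins(1000, 600)          # hypothetical 1000 x 600 scene
train_idx, val_idx = split_indices(9669)   # 9669 tiles, as in the data set
```

Because the stride equals the tile size, the tiles do not overlap, so no pixel appears in both the training and the validation subsets.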
To better evaluate the effectiveness of the algorithm, the experiments are assessed with the Kappa coefficient (Kappa), the overall accuracy (OA) and the accuracy of each specific class (Accuracy). P_ab denotes the number of pixels of class a that are classified into class b, and t_a = ∑_b P_ab denotes the total number of pixels belonging to class a. The evaluation indices are defined as follows:
Kappa coefficient: Kappa is a statistic used to measure the consistency between the predictions and the ground truth.
where K is the number of classes and the class indices a and b range over [1, K].
Overall accuracy: OA is the ratio of correctly classified pixels to all pixels in the entire image.
Class-specific accuracy: Accuracy is the percentage of correctly classified pixels for each class:
The values of Kappa, OA and class-specific accuracy lie between 0 and 1, and a higher value indicates better classification performance.
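The three indices can be computed from a confusion matrix as sketched below. The sketch assumes the usual definitions (Cohen's kappa, the trace ratio for OA, and the per-row ratio P_aa / t_a for class accuracy), with rows indexing the true class a and columns the assigned class b; the 2 × 2 matrix is made-up illustrative data, not a result from the experiments.

```python
def overall_accuracy(cm):
    """OA: correctly classified pixels divided by all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[a][a] for a in range(len(cm)))
    return correct / total


def class_accuracy(cm):
    """Per-class accuracy: P_aa / t_a, where t_a is the sum of row a."""
    return [cm[a][a] / sum(cm[a]) for a in range(len(cm))]


def kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[a][a] for a in range(k)) / total
    row_totals = [sum(cm[a]) for a in range(k)]
    col_totals = [sum(cm[a][b] for a in range(k)) for b in range(k)]
    expected = sum(row_totals[a] * col_totals[a] for a in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up confusion matrix for two classes (rows: true class, columns: assigned class).
cm = [[45, 5],
      [10, 40]]
oa = overall_accuracy(cm)
per_class = class_accuracy(cm)
kap = kappa(cm)
```

For this matrix the chance agreement is 0.5, so an overall accuracy of 0.85 corresponds to a kappa of 0.7, which shows how kappa discounts the agreement that would occur by chance.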
To verify the classification effectiveness of the improved Deeplab network on the hyperspectral orchard image, 6 groups of experiments are performed on the orchard data from the Hainan area, of which b-e are 5 groups of comparison experiments, namely GLCM + SVM, Decomposition + SPM, SDU-CNN and Deeplab. The classification accuracy and evaluation indices for each fruit class in each group of experiments are shown in the tables below. The Decomposition + SPM method has the lowest overall classification accuracy, only 64.16%; it seriously misclassifies mangoes in the three stages, and its accuracy on betel nut and longan is also poor. The GLCM + SVM method improves the classification accuracy of betel nut and longan by about 10%, but it still performs poorly on the same fruit at different stages. Compared with these two methods, the SDU-CNN method improves the classification accuracy of mango, betel nut and longan by about 10%. The original Deeplab method raises the classification accuracy of stage I mango, stage II mango and betel nut to 95.62%, 91.56% and 94.33% respectively, but a relatively large error remains for stage III mango and longan. The proposed algorithm achieves the highest classification accuracy on all 5 orchard classes, namely stage I mango, stage II mango, stage III mango, betel nut and longan: 98.56%, 98.33%, 95.62%, 99.23% and 98.32%, respectively.
Table 1 Confusion matrix of the improved Deeplab classification
Table 2 Evaluation indices
In an embodiment, fig. 7 is a schematic structural diagram of an optimization system of a Deeplab semantic segmentation algorithm provided in an embodiment of the present invention. As shown in fig. 7, the system includes a dimension reduction module 701, a feature extraction module 702, a pooling module 703 and a classification module 704, where:
the dimension reduction module 701 is configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and to reduce the data dimension in the manifold learning layer to obtain dimension-reduced data. The feature extraction module 702 is configured to input the dimension-reduced data into an improved encoder network and to extract the bottom-layer features and the high-layer features of an image. The pooling module 703 is configured to extract multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method. The classification module 704 is configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
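The multi-scale idea behind the pooling module can be illustrated with a one-dimensional atrous (dilated, or "cavity") convolution applied at several rates. This is a simplified sketch under stated assumptions: a Deeplab pooling stage uses two-dimensional convolutions with learned kernels in parallel branches, whereas the fixed kernel, the rates and the function names here are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous convolution: y[i] = sum_k x[i + rate * k] * w[k]."""
    span = rate * (len(kernel) - 1)  # distance covered by the dilated kernel
    return [sum(signal[i + rate * k] * w for k, w in enumerate(kernel))
            for i in range(len(signal) - span)]


def multi_rate_features(signal, kernel, rates):
    """Apply the same kernel at several rates, giving one output list per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
features = multi_rate_features(signal, kernel=[1, 0, -1], rates=[1, 2, 3])
```

Increasing the rate widens the receptive field without adding kernel weights: the same three-tap kernel compares samples 2, 4 and 6 positions apart at rates 1, 2 and 3.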
For the specific way in which the dimension reduction module 701, the feature extraction module 702, the pooling module 703 and the classification module 704 optimize the Deeplab semantic segmentation algorithm, reference may be made to the above method embodiment; details are not repeated here.
In an embodiment, based on the same concept, an embodiment of the present invention further provides an electronic device. Fig. 8 is a schematic structural diagram of the electronic device provided in the embodiment of the present invention. As shown in fig. 8, the electronic device may include a processor 801, a communication interface 802, a memory 803 and a bus 804, wherein the processor 801, the communication interface 802 and the memory 803 communicate with each other via the bus 804. The processor 801 may call logic instructions in the memory 803 to perform the following method: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and reducing the data dimension in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the bottom-layer features and the high-layer features of the image; extracting multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In one embodiment, based on the same concept, the present embodiment also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above method embodiments, for example: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and reducing the data dimension in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the bottom-layer features and the high-layer features of the image; extracting multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In one embodiment, based on the same concept, the present embodiment further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the above method embodiments, for example: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm and reducing the data dimension in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the bottom-layer features and the high-layer features of the image; extracting multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
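The final bilinear interpolation step, which brings a coarse feature or score map back to the input resolution, can be sketched in plain Python as follows. The corner-aligned sampling convention and the function name are assumptions of this sketch; the text does not state which convention is used.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of numbers with bilinear interpolation (corners aligned)."""
    in_h, in_w = len(grid), len(grid[0])
    # Scale factors map output indices onto input coordinates; a size-1 axis maps to 0.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


upsampled = bilinear_resize([[0.0, 1.0], [2.0, 3.0]], 3, 3)
```

In a segmentation network the same operation would be applied per channel to the class score maps before taking the per-pixel maximum as the predicted class.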
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A method for optimizing a Deeplab semantic segmentation algorithm is characterized by comprising the following steps:
building a manifold learning layer on the basis of a Deeplab semantic segmentation algorithm and reducing the data dimension in the manifold learning layer to obtain dimension-reduced data;
inputting the dimension-reduced data into an improved encoder network, and extracting the bottom-layer features and the high-layer features of the image;
extracting multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method;
and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
2. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein the step of inputting the dimension-reduced data into an improved encoder network to extract the bottom-layer features and the high-layer features of an image further comprises:
performing image fusion at the image feature level on the bottom-layer features and the high-layer features of the image.
3. The method for optimizing the Deeplab semantic segmentation algorithm according to claim 1, wherein the formula for the image fusion at the image feature level of the bottom-layer features and the high-layer features of the image is as follows:
wherein the output of the neural network is F_1(x), f_1 and f_2 represent the convolution, pooling and activation operations of the residual units, and w_1 and w_2 are convolution kernels.
4. The method for optimizing the Deeplab semantic segmentation algorithm according to claim 3, wherein the extracting multi-scale feature information of the image based on the atrous (dilated) pyramid pooling method includes:
the calculation formula of the atrous pyramid pooling is as follows:
5. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein the reducing the data dimension in the manifold learning layer to obtain the dimension-reduced data comprises:
constructing a neighbor graph by connecting the sample points such that each point is connected to its k nearest points;
determining the weights between neighboring points using a heat kernel function;
and constructing an optimization objective function for prediction classification based on the weights between the neighboring points.
7. An optimization system for a Deeplab semantic segmentation algorithm, comprising:
a dimension reduction module, configured to build a manifold learning layer on the basis of a Deeplab semantic segmentation algorithm and to reduce the data dimension in the manifold learning layer to obtain dimension-reduced data;
a feature extraction module, configured to input the dimension-reduced data into an improved encoder network and to extract the bottom-layer features and the high-layer features of the image;
a pooling module, configured to extract multi-scale feature information of the image based on an atrous (dilated) pyramid pooling method;
and a classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for optimizing a Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for optimizing a Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011027787.2A CN112329808A (en) | 2020-09-25 | 2020-09-25 | Optimization method and system of Deeplab semantic segmentation algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112329808A (en) | 2021-02-05 |
Family
ID=74304261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011027787.2A Pending CN112329808A (en) | 2020-09-25 | 2020-09-25 | Optimization method and system of Deeplab semantic segmentation algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112329808A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||