CN112329808A

CN112329808A - Optimization method and system of Deeplab semantic segmentation algorithm

Info

Publication number: CN112329808A
Application number: CN202011027787.2A
Authority: CN
Inventors: 姜益民; 罗冷坤; 洪勇
Original assignee: Wuhan Optics Valley Information Technology Co ltd
Current assignee: Wuhan Optics Valley Information Technology Co ltd
Priority date: 2020-09-25
Filing date: 2020-09-25
Publication date: 2021-02-05

Abstract

The invention provides an optimization method and system for the Deeplab semantic segmentation algorithm. The method comprises the following steps: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result. The invention introduces the manifold learning layer to preprocess the data, which effectively retains the useful information in the data and performs a preliminary aggregation. The high-level features are processed from multiple dimensions by a spatial pyramid structure, so that the prediction information used by the fully connected layer to calculate the loss is closer to the real label information, and the loss of part of the effective information is prevented.

Description

Optimization method and system of Deeplab semantic segmentation algorithm

Technical Field

The invention relates to the technical field of semantic segmentation, in particular to an optimization method and system of a Deeplab semantic segmentation algorithm.

Background

Currently, semantic segmentation algorithms can be divided into two main categories: traditional semantic segmentation algorithms and deep learning-based semantic segmentation algorithms. Traditional algorithms segment an image by extracting hand-crafted features as visual information, for example threshold-based and edge-based segmentation methods. Because they rely on manually selected features, they need no training process and have low computational complexity. However, selecting suitable hand-crafted features is often difficult, and the segmentation results of traditional methods in scenes containing multiple semantic classes are unsatisfactory. In recent years, computer vision has entered the deep learning era owing to advances in computing power and the exponential growth of visual data. Although convolutional neural networks achieve excellent performance in image classification tasks, their successive downsampling and pooling operations reduce the resolution of the feature map and discard a large amount of image detail, which is detrimental to semantic segmentation. The fully convolutional network (FCN) was the first to use a convolutional neural network for pixel-level classification and laid the basic framework of deep learning-based semantic segmentation. The U-Net structure connects the convolution and pooling layers with deconvolution layers, further improving the classification accuracy of pixels, but the downsampling in both FCN and U-Net still reduces the resolution of the feature map and loses image detail information.
To address these problems, the Deeplabv1 and PSPNet network models replace the pooling layer with atrous (hole) convolution, which effectively enlarges the receptive field of the filter and reduces the loss of detail information during downsampling while keeping the number of parameters unchanged. The SegNet model uses an encoder-decoder structure to capture sufficient spatial information in the shallow layers of the network and restore image detail information. This structure reduces the number of training parameters and lowers the time complexity while maintaining segmentation accuracy, but its sampling scheme can lead to sparse feature maps and low segmentation accuracy, and segmentation results are often inaccurate for multi-scale objects. To capture multi-scale context information, the Deeplabv2, PSPNet and Deeplabv3 network models provide an atrous spatial pyramid pooling module, in which convolution kernels with different atrous rates are applied to the feature map to obtain multi-scale feature information.

The semantic segmentation algorithms described above can supplement image detail information by enlarging the receptive field of the convolution kernel, extracting multi-scale feature information, and using encoder-decoder structures. However, a lot of detail information is still lost during downsampling, and the ability to use global context information is lacking, so the semantic segmentation effect is limited. In addition, high-level features are beneficial for category recognition while low-level features are beneficial for delineating accurate boundaries, so how to improve semantic segmentation by using the feature information of all network layers is a problem worth solving.

Therefore, a new optimization method and system for the Deeplab semantic segmentation algorithm are needed to solve the above problems.

Disclosure of Invention

The invention provides an optimization method and system of a Deeplab semantic segmentation algorithm, which are used for solving the problems that a lot of detail information is lost in the downsampling process of the existing semantic segmentation method, the capability of utilizing global context information is lacked, and the semantic segmentation effect is limited.

In a first aspect, an embodiment of the present invention provides an optimization method for the Deeplab semantic segmentation algorithm, including:

building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;

inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image;

extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method;

and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

Further, the inputting the dimension-reduced data into the improved encoder network and extracting the low-level features and high-level features of the image further includes:

performing feature-level fusion on the low-level features and the high-level features of the image.

Further, the formula for the feature-level fusion of the low-level features and the high-level features of the image is as follows:

where F₁(x) is the output of the neural network, f₁ and f₂ represent the convolution, pooling and activation operations of the residual units, and w₁ and w₂ are convolution kernels.

Further, the extracting the multi-scale feature information of the image based on the atrous spatial pyramid pooling method includes:

the calculation formula of the atrous spatial pyramid pooling is as follows:

$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$

where F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length, t is the offset, and l is the atrous (hole) factor; (F *_l k)(p) denotes the pyramid pooling result in dimension p, and the sum over s + lt = p represents the summation over dimension p.

Further, the reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data comprises:

constructing a neighbor graph and connecting the sample points such that each point is connected to its k nearest points;

determining the weights between neighboring points using a heat kernel function;

and constructing an optimization objective function for prediction classification based on the weights between the neighboring points.

Further, the optimization objective function is:

$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 \, W_{ab}$$

where y_a and y_b are the column vectors of the feature points in the m-dimensional space, and W is the weight between neighboring points.

In a second aspect, an embodiment of the present invention provides an optimization system for the Deeplab semantic segmentation algorithm, including:

the dimension reduction module, configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and to reduce the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;

the feature extraction module, configured to input the dimension-reduced data into an improved encoder network and extract the low-level features and high-level features of the image;

the pooling module, configured to extract multi-scale feature information of the image based on an atrous spatial pyramid pooling method;

and the classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the optimization method of the Deeplab semantic segmentation algorithm provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the optimization method of the Deeplab semantic segmentation algorithm provided in the first aspect.

According to the optimization method and system for the Deeplab semantic segmentation algorithm provided by the invention, a manifold learning layer is introduced to preprocess the data. Compared with other traditional dimension reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method constructs parallel convolution network layers to extract detail feature information such as textures and contours, performs pixel-level feature fusion to supplement the semantic information of the original image, and then performs the downsampling operation on the fused feature map to further supplement the detail information. The invention also uses the spatial pyramid structure to process the high-level features from multiple dimensions, so that the prediction information used by the fully connected layer to calculate the loss is closer to the real label information, and the loss of part of the effective information is prevented.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of an optimization method of the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the optimized Deeplab semantic segmentation algorithm according to an embodiment of the present invention;

Fig. 3 is an effect diagram of the manifold learning layer provided by an embodiment of the present invention;

Fig. 4 is a Block output diagram according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the convolution provided by an embodiment of the present invention;

Fig. 6 shows the result of atrous convolution according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an optimization system of the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

At present, the traditional semantic segmentation algorithm can supplement image detail information by expanding a convolution kernel receptive field range, extracting multi-scale feature information and encoding and decoding structures, but a lot of detail information is still lost in a downsampling process, and the capability of utilizing global context information is lacked, so that the semantic segmentation effect is limited.

Therefore, the optimization method and system for the Deeplab semantic segmentation algorithm provided by the invention introduce a manifold learning layer to preprocess the data. Compared with other traditional dimension reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method constructs parallel convolution network layers to extract detail feature information such as textures and contours, performs pixel-level feature fusion to supplement the semantic information of the original image, and then performs the downsampling operation on the fused feature map to further supplement the detail information. Various embodiments are described below with reference to the drawings.

Fig. 1 is a schematic flow chart of an optimization method of the Deeplab semantic segmentation algorithm according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

101. building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;

102. inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image;

103. extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method;

104. performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

Specifically, Fig. 2 is a schematic structural diagram of the optimized Deeplab semantic segmentation algorithm provided by the embodiment of the present invention. As can be seen from Fig. 2, the method constructs four parts, namely a manifold learning layer, an improved ResNet-101 layer, a spatial pyramid structure layer, and a bilinear interpolation classification layer, which correspond to steps 101 to 104 in the embodiment of the present invention, respectively. Fig. 3 is an effect diagram of the manifold learning layer provided by the embodiment of the present invention. As shown in Fig. 3, the embodiment of the present invention builds a manifold learning layer to implement data dimension reduction.

Further, in step 102, the embodiment of the present invention constructs an improved ResNet-101 structure, extracts the low-level features and high-level features of the image, and then fuses them at the feature level. The convolution kernels ConV2 and ConV3 are denoted w₁ and w₂, and the output F₁(x) of the improved ResNet-101 network is obtained after processing by Block3, Block4 and the convolution kernel sampling.

In step 103, an atrous spatial pyramid pooling module is used to obtain multi-scale feature information of the image. The atrous spatial pyramid pooling structure applies to the pooled feature map atrous convolutions with rates of 6, 12 and 18, a 1×1 convolution, and a 3×3 max pooling; the results are then restored to the feature map size through a fully connected layer and superposed as vectors.
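The way the three rates give access to different scales can be illustrated by computing the span covered by a single atrous kernel. The sketch below assumes a 3×3 kernel, which the text does not specify, and uses the standard relation k_eff = k + (k − 1)(rate − 1); `effective_kernel_size` is a hypothetical helper name.

```python
def effective_kernel_size(kernel_size, rate):
    """Side length of the area spanned by an atrous kernel.

    A kernel with `kernel_size` taps per side and `rate - 1` skipped pixels
    between neighbouring taps spans k + (k - 1) * (rate - 1) pixels.
    """
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Rates from step 103; the 3x3 kernel size is an assumption.
sizes = {rate: effective_kernel_size(3, rate) for rate in (6, 12, 18)}
for rate, size in sizes.items():
    print(f"rate {rate:2d}: 3x3 kernel spans {size}x{size} pixels")
```

Under this assumption each branch still uses nine weights, while the area it covers grows with the rate, which is why parallel branches with different rates respond to structures of different sizes.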

Finally, in step 104, bilinear interpolation is performed on the feature map to obtain a prediction result.
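As an illustration of step 104, the following is a minimal pure-Python sketch of bilinear interpolation on a small 2-D grid. It is not the implementation used in the embodiment; the function name `bilinear_resize` and the "align corners" sampling convention are assumptions made for this example.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation.

    Uses the "align corners" convention: the four corner values of the
    input are mapped exactly onto the four corners of the output.
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Fractional source row that output row i falls on.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


coarse = [[0.0, 2.0],
          [4.0, 6.0]]
fine = bilinear_resize(coarse, 3, 3)
```

In a segmentation network the same operation would typically be applied to each channel of the low-resolution score map, with the class of each pixel then taken from the upsampled scores.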

In one embodiment, the step 102 of inputting the dimension-reduced data into the improved encoder network and extracting the low-level features and high-level features of the image further includes: performing feature-level fusion on the low-level features and the high-level features of the image.

In one embodiment, the formula for the feature-level fusion of the low-level features and the high-level features of the image is as follows:

Specifically, the embodiment of the present invention constructs an improved ResNet-101 structure, extracts the low-level features and high-level features of the image, and then fuses them at the feature level. As shown in Fig. 3, ResNet-101 serves as the encoder network in the Deeplab framework. The input image first passes through a convolution layer with a kernel size of 7 and a stride of 2 and a pooling layer with a size of 3 and a stride of 2, which reduces the number of training parameters, enlarges the receptive field, and retains more global image information. Next, the feature map output by the pooling layer is fed into 4 Blocks, each a stack of residual units, and is further downsampled. The numbers of residual units in the 4 Blocks are 3, 4, 23 and 3 in sequence. During this process the feature map is progressively reduced to 1/16 of the size of the original image, and the feature information it contains becomes increasingly complex as its size shrinks.
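The Block sizes above can be checked against the name of the network with a short calculation. It assumes, as in the standard ResNet-101 design and not stated in the text, that every residual unit is a bottleneck of three convolution layers and that the count includes the initial 7×7 convolution and one final fully connected layer:

```latex
\underbrace{3 \times (3 + 4 + 23 + 3)}_{99\ \text{layers in residual units}}
\;+\; \underbrace{1}_{\text{initial convolution}}
\;+\; \underbrace{1}_{\text{fully connected layer}}
\;=\; 101
```

Similarly, the stride-2 convolution and the stride-2 pooling together reduce each side of the image by a factor of 2 × 2 = 4, so the 4 Blocks account for the remaining factor of 4 needed to reach 1/16.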

Fig. 4 is a Block output schematic diagram provided in the embodiment of the present invention. Taking the output Feature map3 of Block2 in Fig. 4 as x, the output F(x) of the conventional ResNet-101 network is obtained after the downsampling of Block3 and Block4, and is expressed as follows.

$$F(x) = f_2(f_1(x))$$

Fig. 5 is a schematic diagram of the convolution provided by the embodiment of the present invention. Denoting the convolution kernels ConV2 and ConV3 in Fig. 5 as w₁ and w₂, the output F₁(x) of the improved ResNet-101 network is obtained after processing by Block3, Block4 and the convolution kernel sampling, and is expressed as follows.

where f₁ and f₂ represent the convolution, pooling and activation operations of the residual units in Block3 and Block4.

In one embodiment, in step 103, the extracting multi-scale feature information of the image based on the atrous spatial pyramid pooling method includes:

the calculation formula of the atrous spatial pyramid pooling is as follows:

$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$

where the sum over s + lt = p represents the summation over dimension p.

Specifically, the atrous spatial pyramid pooling module is used to acquire multi-scale feature information of the image. The structure applies to the pooled feature map atrous convolutions with rates of 6, 12 and 18, a 1×1 convolution, and a 3×3 max pooling; the results are then restored to the feature map size through a fully connected layer and superposed as vectors. Fig. 6 shows the result of atrous convolution according to the embodiment of the present invention. As shown in Fig. 6, atrous convolution samples the original image, with the sampling frequency set by the rate parameter. If rate = 1, the original image is sampled without losing any information, which is the standard convolution operation; if rate > 1, a sample is taken on the original data after skipping (rate − 1) pixels, which enlarges the receptive field. Defining the atrous factor as l, the atrous convolution is calculated according to the formula:

$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$

where F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length, and t is the offset; (F *_l k)(p) denotes the pyramid pooling result in dimension p, and the sum over s + lt = p represents the summation over dimension p.
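The effect of the atrous factor l can be seen in a one-dimensional sketch. It assumes the atrous convolution takes the usual form (F *_l k)(p) = Σ F(s) k(t) over all s and t with s + l·t = p; the function name `dilated_conv1d` and the toy signal are hypothetical and chosen only for illustration.

```python
def dilated_conv1d(F, k, l=1):
    """Evaluate (F *_l k)(p) = sum over s + l*t = p of F(s) * k(t).

    F and k are plain lists indexed from 0. Only positions p for which
    every required sample F(p - l*t) exists are returned ("valid" mode).
    """
    span = l * (len(k) - 1)          # distance between first and last tap
    out = []
    for p in range(span, len(F)):
        acc = 0.0
        for t, k_t in enumerate(k):
            s = p - l * t            # enforces the constraint s + l*t = p
            acc += F[s] * k_t
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]

standard = dilated_conv1d(signal, kernel, l=1)   # l = 1: ordinary convolution
dilated = dilated_conv1d(signal, kernel, l=2)    # l = 2: taps two samples apart
```

With l = 1 each output compares samples two positions apart, while with l = 2 the same three weights compare samples four positions apart, so the receptive field grows without adding parameters.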

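Based on the content of the above embodiments, in step 101, reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data specifically includes:

constructing a neighbor graph and connecting the sample points such that each point is connected to its k nearest points;

determining the weights between neighboring points using a heat kernel function;

and constructing an optimization objective function for prediction classification based on the weights between the neighboring points.

In one embodiment, the optimization objective function f(x) is:

$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 \, W_{ab}$$

where y_a and y_b are the column vectors of the feature points in the m-dimensional space, and W is the weight between neighboring points.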

The specific implementation of building the manifold learning layer in step 101 is as follows.

Step 1a: construct a neighbor graph by connecting the sample points so that each point is connected to its k nearest points, where the value of k is preset.

Step 1b: determine the weight between adjacent points using a heat kernel function, expressed as follows:

$$W_{12} = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{t}\right)$$

where x₁ and x₂ are adjacent points and t is the width of the heat kernel. Alternatively, a default weight setting can be adopted, in which the weight is set to 1 when the two points x₁ and x₂ are connected and to 0 when they are not connected.

Step 1c: so that similar sample points are closer in space after dimensionality reduction, construct the optimization objective function f(x) as follows:

$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 \, W_{ab}$$

where y_a and y_b are the column vectors of the feature points in the m-dimensional space, and the weight W is obtained in step 1b.
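Steps 1a to 1c can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heat kernel weight is taken as exp(−‖x₁ − x₂‖² / t), the objective as the sum of ‖y_a − y_b‖² W_ab, and the graph is made symmetric by connecting two points if either selects the other. The function names and the toy data are hypothetical. The sketch builds the weights and evaluates the objective for given embeddings; it does not perform the minimization itself.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_heat_kernel_weights(points, k, t):
    """Steps 1a and 1b: k-nearest-neighbour graph with heat kernel weights.

    Returns a symmetric n x n matrix W with
    W[a][b] = exp(-||x_a - x_b||^2 / t) for connected points and 0 otherwise.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for a in range(n):
        others = sorted((b for b in range(n) if b != a),
                        key=lambda b: squared_distance(points[a], points[b]))
        for b in others[:k]:
            w = math.exp(-squared_distance(points[a], points[b]) / t)
            W[a][b] = w
            W[b][a] = w      # connect if either point selects the other
    return W


def objective(Y, W):
    """Step 1c: sum over a, b of ||y_a - y_b||^2 * W[a][b]."""
    n = len(Y)
    return sum(squared_distance(Y[a], Y[b]) * W[a][b]
               for a in range(n) for b in range(n))


# Toy data: two well-separated clusters in 2-D.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
W = knn_heat_kernel_weights(X, k=2, t=1.0)

# Two candidate 1-D embeddings: one keeps each cluster together, one mixes them.
good = [(0.0,), (0.0,), (0.0,), (1.0,), (1.0,), (1.0,)]
bad = [(0.0,), (1.0,), (0.0,), (1.0,), (0.0,), (1.0,)]
print(objective(good, W), objective(bad, W))
```

An embedding that keeps neighbouring samples together scores lower than one that separates them. Because mapping every point to the same location would trivially give a value of zero, formulations of this kind (for example Laplacian Eigenmaps) add a scale constraint and solve an eigenvalue problem; that step is omitted here.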

To verify the performance of the method, SAR images of orchards in Lingshui County, Hainan, in southern China were used. The original data were acquired by the Gaofen-3 satellite and were cut with a stride of 200 into 9669 images of 200 × 200 pixels, which form the data set; 85% of the data are used as training data and the remaining 15% as validation data.

The experiments were run under the Windows 10 operating system. To build the improved Deeplab network quickly, the widely used deep learning framework TensorFlow was adopted. To speed up training of the experimental network, computation was performed on a single GPU with 8 GB of memory, an NVIDIA 1080 card, running in GPU acceleration mode.
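The data preparation described above, non-overlapping 200 × 200 tiles followed by an 85%/15% split, could be sketched as follows. The scene size, the random shuffling, the seed and the function names are assumptions for illustration; the text does not state how the split was drawn.

```python
import random


def tile_origins(height, width, tile=200, stride=200):
    """Top-left corners of all full tiles that fit inside the image."""
    return [(r, c)
            for r in range(0, height - tile + 1, stride)
            for c in range(0, width - tile + 1, stride)]


def split_train_val(items, train_fraction=0.85, seed=0):
    """Shuffle reproducibly and split into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


origins = tile_origins(1000, 800)        # hypothetical scene size in pixels
train, val = split_train_val(origins)
print(len(origins), len(train), len(val))
```

With a stride equal to the tile size the tiles do not overlap, so no pixel appears in both the training and the validation subset.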

To better evaluate the effectiveness of the algorithm, the experiments are assessed with the Kappa coefficient (Kappa), the overall accuracy (OA) and the class-specific accuracy (Accuracy). P_ab denotes the number of pixels of class a that are classified into class b, and t_a = ∑_b P_ab denotes the total number of pixels belonging to class a. The evaluation indices are defined as follows:

Kappa coefficient: Kappa is a statistic used to measure the consistency between the predictions and the ground truth,

where K is the number of classes and a, b ∈ [1, K].

Overall accuracy: OA is the percentage of correctly classified pixels among all pixels in the entire image.

Class-specific accuracy: Accuracy is the percentage of correctly classified pixels for each class.

The values of Kappa, OA and the class-specific accuracy lie between 0 and 1, and the higher the value, the better the classification performance.
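The three indices can be computed from a confusion matrix as sketched below. The code uses the standard definitions (overall accuracy as the trace divided by the total, Cohen's kappa with chance agreement estimated from the row and column totals, and per-class accuracy as P_aa / t_a), which are assumed here to match the definitions intended above. The function name and the 2 × 2 example matrix are hypothetical and are not results from the experiments.

```python
def evaluation_metrics(P):
    """Compute OA, Kappa and per-class accuracy from a confusion matrix.

    P[a][b] is the number of pixels of class a that were classified into
    class b, so row sums give t_a, the number of pixels belonging to class a.
    """
    K = len(P)
    total = sum(sum(row) for row in P)
    row_totals = [sum(P[a]) for a in range(K)]                        # t_a
    col_totals = [sum(P[a][b] for a in range(K)) for b in range(K)]   # predicted as b
    correct = sum(P[a][a] for a in range(K))

    oa = correct / total
    # Agreement expected by chance from the marginal totals.
    pe = sum(row_totals[a] * col_totals[a] for a in range(K)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    per_class = [P[a][a] / row_totals[a] for a in range(K)]
    return oa, kappa, per_class


confusion = [[45, 5],
             [10, 40]]
oa, kappa, per_class = evaluation_metrics(confusion)
print(oa, kappa, per_class)
```

The sketch does not guard against empty classes (t_a = 0) or the degenerate case in which the chance agreement equals 1.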

To verify the classification effectiveness of the improved Deeplab network on the hyperspectral orchard images, 6 groups of experiments were performed on the orchard data from the Hainan area, in which b-e are 5 groups of comparison experiments, namely GLCM + SVM, Decomposition + SPM, SDU-CNN and Deeplab. The classification accuracy and evaluation indices for each kind of fruit in each group of experiments are shown in the tables below. It can be seen that the Decomposition + SPM method has the lowest overall classification accuracy, only 64.16%; its misclassification of mangoes in the three stages is serious, and its classification accuracy for betel nut and longan is also poor. The GLCM + SVM method improves the classification accuracy of betel nut and longan by about 10%, but its classification of the same fruit in different stages is still poor. Compared with these two methods, the SDU-CNN method improves the classification accuracy of mango, betel nut and longan by about 10%. The original Deeplab method raises the classification accuracy of stage I mango, stage II mango and betel nut to 95.62%, 91.56% and 94.33%, respectively, but larger errors remain in the classification of stage III mango and longan. The proposed algorithm achieves the highest classification accuracy on all 5 orchard classes, namely stage I mango, stage II mango, stage III mango, betel nut and longan, with accuracies of 98.56%, 98.33%, 95.62%, 99.23% and 98.32%, respectively.

TABLE 1 Class segmentation confusion matrix of the improved Deeplab

TABLE 2 evaluation index

In an embodiment, Fig. 7 is a schematic structural diagram of an optimization system of the Deeplab semantic segmentation algorithm provided in an embodiment of the present invention. As shown in Fig. 7, the system includes a dimension reduction module 701, a feature extraction module 702, a pooling module 703 and a classification module 704, where:

the dimension reduction module 701 is configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and to reduce the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data. The feature extraction module 702 is configured to input the dimension-reduced data into an improved encoder network and extract the low-level features and high-level features of the image. The pooling module 703 is configured to extract multi-scale feature information of the image based on an atrous spatial pyramid pooling method. The classification module 704 is configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

For the specific way in which the dimension reduction module 701, the feature extraction module 702, the pooling module 703 and the classification module 704 optimize the Deeplab semantic segmentation algorithm, reference may be made to the above method embodiment, and details are not repeated here.

In an embodiment, based on the same concept, an embodiment of the present invention further provides an electronic device. Fig. 8 is a schematic structural diagram of the electronic device provided in the embodiment of the present invention. As shown in Fig. 8, the electronic device may include: a processor 801, a communications interface 802, a memory 803 and a bus 804, where the processor 801, the communications interface 802 and the memory 803 communicate with each other via the bus 804. The processor 801 may call logic instructions in the memory 803 to perform the following method: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

In one embodiment, based on the same concept, the present embodiment also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above method embodiments, for example including: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

In one embodiment, based on the same concept, the present embodiment further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the above method embodiments, for example including: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

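1. A method for optimizing a Deeplab semantic segmentation algorithm, characterized by comprising the following steps: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network, and extracting the low-level features and high-level features of the image; extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

2. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein the step of inputting the dimension-reduced data into the improved encoder network and extracting the low-level features and high-level features of the image further comprises: performing feature-level fusion on the low-level features and the high-level features of the image.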

3. The method for optimizing the Deeplab semantic segmentation algorithm according to claim 1, wherein the formula for the feature-level fusion of the low-level features and the high-level features of the image is as follows:

4. The optimization method of the Deeplab semantic segmentation algorithm according to claim 3, wherein the extracting multi-scale feature information of the image based on the atrous spatial pyramid pooling method includes:

the calculation formula of the atrous spatial pyramid pooling is as follows:

$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$

where the sum over s + lt = p represents the summation over dimension p.

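5. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein the reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data comprises:

constructing a neighbor graph and connecting the sample points such that each point is connected to its k nearest points;

determining the weights between neighboring points using a heat kernel function;

and constructing an optimization objective function for prediction classification based on the weights between the neighboring points.

6. The method of optimizing a Deeplab semantic segmentation algorithm according to claim 5, wherein the optimization objective function is: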

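7. An optimization system for a Deeplab semantic segmentation algorithm, comprising: a dimension reduction module, configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, and to reduce the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data; a feature extraction module, configured to input the dimension-reduced data into an improved encoder network and extract the low-level features and high-level features of the image; a pooling module, configured to extract multi-scale feature information of the image based on an atrous spatial pyramid pooling method; and a classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for optimizing a Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the optimization method of the Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.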