CN112329808A - Optimization method and system of Deeplab semantic segmentation algorithm - Google Patents


Info

Publication number: CN112329808A
Authority: CN (China)
Prior art keywords: image, layer, semantic segmentation, data, segmentation algorithm
Legal status: Pending (as listed by Google; not a legal conclusion)
Application number: CN202011027787.2A
Other languages: Chinese (zh)
Inventors: 姜益民, 罗冷坤, 洪勇
Current assignee: Wuhan Optics Valley Information Technology Co ltd (as listed by Google; may be inaccurate)
Original assignee: Wuhan Optics Valley Information Technology Co ltd
Application filed by Wuhan Optics Valley Information Technology Co ltd
Priority to CN202011027787.2A
Publication of CN112329808A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an optimization method and system for the Deeplab semantic segmentation algorithm. The method comprises the following steps: building a manifold learning layer on top of the Deeplab semantic segmentation algorithm and reducing the dimensionality of the data in that layer to obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network to extract the low-level and high-level features of the image; extracting multi-scale feature information of the image using atrous spatial pyramid pooling; and performing bilinear interpolation on the multi-scale feature information to obtain a prediction result. The manifold learning layer preprocesses the data, which effectively retains its useful information and performs a preliminary aggregation. A spatial pyramid structure processes the high-level features at multiple scales, so that the predictions used by the fully connected layer to compute the loss are closer to the ground-truth labels and less useful information is lost.

Description

Optimization method and system of Deeplab semantic segmentation algorithm
Technical Field
The invention relates to the technical field of semantic segmentation, in particular to an optimization method and system of a Deeplab semantic segmentation algorithm.
Background
Currently, semantic segmentation algorithms fall into two main categories: traditional semantic segmentation algorithms and deep learning-based semantic segmentation. Traditional algorithms segment images using hand-crafted features as visual information, for example threshold-based and edge-based segmentation methods. Because they rely on manually selected features, they need no training and have low computational complexity. However, suitable hand-crafted features are often difficult to choose, and the results of traditional methods in scenes containing multiple semantic classes are unsatisfactory. In recent years, advances in computing power and the exponential growth of visual data have brought computer vision into the deep learning era. Convolutional neural networks have achieved excellent performance in image classification, but their repeated downsampling and pooling operations reduce the resolution of the feature maps and discard a large amount of image detail, which hinders semantic segmentation. The fully convolutional network (FCN) was the first to use a convolutional neural network for pixel-level classification and laid the foundation of deep learning-based semantic segmentation. The U-net architecture connects the convolution and pooling layers to the deconvolution layers, further improving pixel classification accuracy, but both FCN and U-net still suffer from reduced feature-map resolution and loss of image detail during downsampling.
To address these problems, the Deeplabv1 and PSPNet (Pyramid Scene Parsing Network) models use atrous (dilated) convolution in place of pooling layers, which effectively enlarges the receptive field of the filters and reduces the loss of detail during downsampling without increasing the number of parameters. The SegNet model uses an encoder-decoder structure to capture sufficient spatial information in the shallow layers of the network and restore image detail. Its encoder reduces the number of training parameters and the time complexity while maintaining segmentation accuracy, but its sampling scheme can produce sparse feature maps and low segmentation accuracy, and its results are often inaccurate for objects at multiple scales. To capture multi-scale context information, the Deeplabv2, PSPNet and Deeplabv3 models introduce spatial pyramid pooling modules; in Deeplab's atrous spatial pyramid pooling, convolution kernels with different dilation rates are applied to the feature map to obtain multi-scale feature information.
These semantic segmentation algorithms can recover image detail by enlarging the receptive field of convolution kernels, extracting multi-scale feature information and using encoder-decoder structures, but much detail is still lost during downsampling and they make limited use of global context information, which limits segmentation quality. In addition, high-level features help identify categories while low-level features help delineate accurate boundaries, so how to exploit the feature information of all network layers to improve semantic segmentation is a problem worth solving.
Therefore, a new optimization method and system for the Deeplab semantic segmentation algorithm is needed to solve the above problems.
Disclosure of Invention
The invention provides an optimization method and system for the Deeplab semantic segmentation algorithm, intended to solve the problems that existing semantic segmentation methods lose much detail information during downsampling and make limited use of global context information, which limits segmentation quality.
In a first aspect, an embodiment of the present invention provides an optimization method for the Deeplab semantic segmentation algorithm, including:
building a manifold learning layer on top of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;
inputting the dimension-reduced data into an improved encoder network, and extracting the low-level and high-level features of the image;
extracting multi-scale feature information of the image using atrous spatial pyramid pooling; and
performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
Further, inputting the dimension-reduced data into the improved encoder network and extracting the low-level and high-level features of the image further includes:
fusing the low-level and high-level features of the image at the feature level.
Further, the feature-level fusion of the low-level and high-level features of the image is given by the following formula:
Figure BDA0002702611650000031
where $F_1(x)$ is the output of the neural network, $f_1$ and $f_2$ represent the convolution, pooling and activation operations of the residual units, and $w_1$ and $w_2$ are convolution kernels.
Further, extracting the multi-scale feature information of the image based on atrous spatial pyramid pooling includes:
the atrous convolution used in the pyramid pooling is computed as
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$
where $F(s)$ is the input being convolved, $k(t)$ is the convolution kernel, $l$ is the dilation (hole) factor, $s$ is the position in the input and $t$ is the offset within the kernel; $(F *_l k)(p)$ is the output at position $p$, and $\sum_{s + l t = p}$ denotes summation over all pairs $(s, t)$ with $s + l t = p$.
Further, reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data includes:
constructing a neighbor graph by connecting each sample point to its k nearest points;
determining the weights between neighboring points using a heat kernel function;
and constructing an optimization objective function for prediction and classification based on the weights between neighboring points.
Further, the optimization objective function is:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
where $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and $W_{ab}$ is the weight between neighboring points a and b.
In a second aspect, an embodiment of the present invention provides an optimization system for the Deeplab semantic segmentation algorithm, including:
a dimensionality reduction module, configured to build a manifold learning layer on top of the Deeplab semantic segmentation algorithm and reduce the dimensionality of the data in that layer to obtain dimension-reduced data;
a feature extraction module, configured to input the dimension-reduced data into an improved encoder network and extract the low-level and high-level features of the image;
a pooling module, configured to extract multi-scale feature information of the image using atrous spatial pyramid pooling; and
a classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the optimization method for the Deeplab semantic segmentation algorithm provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the optimization method for the Deeplab semantic segmentation algorithm provided in the first aspect.
In the optimization method and system for the Deeplab semantic segmentation algorithm provided by the invention, a manifold learning layer is introduced to preprocess the data; compared with other traditional dimensionality reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method builds parallel convolutional layers to extract detail features such as textures and contours, fuses them at the pixel level to supplement the semantic information of the original image, and then downsamples the fused feature map to further supplement detail information. The invention also uses a spatial pyramid structure to process the high-level features at multiple scales, so that the predictions used by the fully connected layer to compute the loss are closer to the ground-truth labels and less useful information is lost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of an optimization method for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the optimized Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 3 shows the effect of the manifold learning layer according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the Block outputs according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the convolution according to an embodiment of the present invention;
Fig. 6 shows the result of atrous convolution according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an optimization system for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
At present, semantic segmentation algorithms can recover image detail by enlarging the receptive field of convolution kernels, extracting multi-scale feature information and using encoder-decoder structures, but much detail is still lost during downsampling and they make limited use of global context information, which limits segmentation quality.
Therefore, the optimization method and system for the Deeplab semantic segmentation algorithm provided by the invention introduce a manifold learning layer to preprocess the data; compared with other traditional dimensionality reduction approaches, manifold learning effectively retains the useful information in the data and performs a preliminary aggregation. The method builds parallel convolutional layers to extract detail features such as textures and contours, fuses them at the pixel level to supplement the semantic information of the original image, and then downsamples the fused feature map to further supplement detail information. Various embodiments are described below with reference to the drawings.
Fig. 1 is a schematic flow chart of an optimization method for the Deeplab semantic segmentation algorithm according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
101. Building a manifold learning layer on top of the Deeplab semantic segmentation algorithm, and reducing the dimensionality of the data in the manifold learning layer to obtain dimension-reduced data;
102. Inputting the dimension-reduced data into an improved encoder network, and extracting the low-level and high-level features of the image;
103. Extracting multi-scale feature information of the image using atrous spatial pyramid pooling;
104. Performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
Specifically, Fig. 2 is a schematic diagram of the optimized Deeplab semantic segmentation algorithm provided by the embodiment of the present invention. As Fig. 2 shows, the method consists of four parts: a manifold learning layer, an improved ResNet-101 layer, a spatial pyramid structure layer and a bilinear interpolation classification layer, corresponding to steps 101 to 104 respectively. Fig. 3 shows the effect of the manifold learning layer provided by the embodiment; as shown in Fig. 3, the embodiment builds a manifold learning layer to reduce the dimensionality of the data.
Further, in step 102, the embodiment builds an improved ResNet-101 structure that extracts the low-level and high-level features of the image and then fuses them at the feature level. With the convolution kernels ConV2 and ConV3 denoted $w_1$ and $w_2$, the output $F_1(x)$ of the improved ResNet-101 network is obtained after processing by Block3, Block4 and the convolution kernels.
In step 103, an atrous spatial pyramid pooling module obtains multi-scale feature information of the image. The module applies, in parallel to the pooled feature map, atrous convolutions with rates 6, 12 and 18, a 1×1 convolution and a 3×3 max pooling; the outputs are restored to the feature-map size through a fully connected layer and then superposed.
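The rates 6, 12 and 18 set how far apart the taps of each atrous kernel are. As a small worked check, assuming 3×3 kernels in the atrous branches (the kernel size is not stated above), one kernel spans $(k-1)\cdot\text{rate} + 1$ feature-map cells along each axis:

```python
def effective_kernel_size(k: int, rate: int) -> int:
    """Extent (in feature-map cells) covered by k taps spaced `rate` cells apart."""
    # (k - 1) gaps of `rate` cells each, plus the cell under the first tap.
    return (k - 1) * rate + 1

# Rate 1 is ordinary convolution; 6, 12 and 18 are the ASPP rates from the text.
for rate in (1, 6, 12, 18):
    print(f"rate {rate:2d}: 3x3 kernel spans {effective_kernel_size(3, rate)} cells")
```

This prints spans of 3, 13, 25 and 37 cells. For the 200×200 tiles used in the experiments below, a stride-16 feature map is only about 13×13 cells, so at rate 18 every off-centre tap of a 3×3 kernel would land outside the map (in zero padding); this follows from the arithmetic alone and was not measured here.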
Finally, in step 104, bilinear interpolation is performed on the feature map to obtain a prediction result.
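Bilinear interpolation upsamples each coarse per-class score map back to the input resolution before every pixel is assigned its highest-scoring class. A minimal pure-Python sketch for one 2-D map, assuming the align-corners coordinate mapping (framework resize operations often use half-pixel centres instead, which shifts the sample positions slightly):

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of numbers with bilinear interpolation (align-corners mapping)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map output row i onto a (possibly fractional) input row y.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out

print(bilinear_resize([[0, 1], [2, 3]], 3, 3))
# → [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
```

Corner values are preserved and new samples are weighted averages of their four nearest input cells.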
In one embodiment, step 102 of inputting the dimension-reduced data into the improved encoder network to extract the low-level and high-level features of the image further includes:
fusing the low-level and high-level features of the image at the feature level.
In one embodiment, the feature-level fusion of the low-level and high-level features of the image is given by the following formula:
Figure BDA0002702611650000081
where $F_1(x)$ is the output of the neural network, $f_1$ and $f_2$ represent the convolution, pooling and activation operations of the residual units, and $w_1$ and $w_2$ are convolution kernels.
Specifically, the embodiment builds an improved ResNet-101 structure that extracts the low-level and high-level features of the image and then fuses them at the feature level. As shown in Fig. 3, ResNet-101 serves as the encoder network in the Deeplab framework. The input image first passes through a convolutional layer with a kernel size of 7 and stride 2 and a pooling layer of size 3 with stride 2, which reduce the number of training parameters, enlarge the receptive field and retain more of the global image information. The feature map output by the pooling layer is then fed into 4 Blocks, each a stack of residual units, for further downsampling; the Blocks contain 3, 4, 23 and 3 residual units respectively. During this process the feature map shrinks to 1/16 of the original image size, while the feature information it carries becomes increasingly abstract.
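The Block sizes above account for the name ResNet-101, and the strides account for the 1/16 resolution. A short arithmetic check, assuming standard bottleneck residual units (three convolutions each) and the usual Deeplab stride placement at output stride 16 (stride 2 in the stem convolution, the pooling layer, Block2 and Block3; stride 1 in Block1 and in the atrous Block4):

```python
import math

# Residual units per Block, as listed in the text for ResNet-101.
UNITS_PER_BLOCK = [3, 4, 23, 3]

# Assumed strides: 7x7 stem conv, 3x3 max pool, Block1, Block2, Block3, Block4 (atrous).
STRIDES = [2, 2, 1, 2, 2, 1]

def resnet_depth(units_per_block, convs_per_unit=3):
    # Bottleneck units hold 3 conv layers; +1 for the stem conv, +1 for the classifier layer.
    return sum(units_per_block) * convs_per_unit + 2

def feature_map_size(size, strides):
    # With 'same'-style padding, a stride-s stage maps n pixels to ceil(n / s).
    for s in strides:
        size = math.ceil(size / s)
    return size

print(resnet_depth(UNITS_PER_BLOCK))        # 101
print(feature_map_size(200, STRIDES))       # 13, for a 200 x 200 input tile
print(math.prod(STRIDES))                   # 16, i.e. 1/16 of the input resolution
```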
Fig. 4 is a schematic diagram of the Block outputs provided by the embodiment. Taking the output Feature map3 of Block2 in Fig. 4 as $x$, the output $F(x)$ of the conventional ResNet-101 network after downsampling by Block3 and Block4 is:
$$F(x) = f_2(f_1(x))$$
Fig. 5 is a schematic diagram of the convolution provided by the embodiment. With the convolution kernels ConV2 and ConV3 in Fig. 5 denoted $w_1$ and $w_2$, the output $F_1(x)$ of the improved ResNet-101 network after processing by Block3, Block4 and the convolution kernels is:
Figure BDA0002702611650000091
where $f_1$ and $f_2$ represent the convolution, pooling and activation operations of the residual units in Block3 and Block4.
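The fusion formula itself survives only as an image in this document, so it is not reproduced here. To illustrate what feature-level fusion of low-level and high-level features generally involves, the following NumPy sketch uses a different, commonly used scheme similar in spirit to the DeepLabv3+ decoder: upsample the coarse high-level map, project both maps with 1×1 convolutions, and concatenate along channels. The channel counts, the projection weights and the nearest-neighbour upsampling are hypothetical choices for illustration, not the patent's method.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map: (C_out, C_in) x (C_in, H, W) -> (C_out, H, W).
    return np.einsum("oc,chw->ohw", w, x)

def upsample_nearest(x, factor):
    # Repeat every cell `factor` times along height and width.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(low, high, w_low, w_high):
    # low: (C_low, H, W) fine map; high: (C_high, H/f, W/f) coarse map.
    factor = low.shape[1] // high.shape[1]
    high_up = upsample_nearest(high, factor)
    # Project both maps, then stack their channels at full resolution.
    return np.concatenate([conv1x1(low, w_low), conv1x1(high_up, w_high)], axis=0)

rng = np.random.default_rng(0)
low = rng.standard_normal((16, 32, 32))   # hypothetical low-level (early Block) output
high = rng.standard_normal((64, 8, 8))    # hypothetical high-level (Block4) output
fused = fuse_features(low, high, rng.standard_normal((4, 16)), rng.standard_normal((8, 64)))
print(fused.shape)  # (12, 32, 32)
```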
In one embodiment, extracting the multi-scale feature information of the image based on atrous spatial pyramid pooling in step 103 includes:
the atrous convolution used in the pyramid pooling is computed as
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$
where $F(s)$ is the input being convolved, $k(t)$ is the convolution kernel, $l$ is the dilation (hole) factor, $s$ is the position in the input and $t$ is the offset within the kernel; $(F *_l k)(p)$ is the output at position $p$, and $\sum_{s + l t = p}$ denotes summation over all pairs $(s, t)$ with $s + l t = p$.
Specifically, the atrous spatial pyramid pooling module obtains multi-scale feature information of the image. The module applies, in parallel to the pooled feature map, atrous convolutions with rates 6, 12 and 18, a 1×1 convolution and a 3×3 max pooling; the outputs are restored to the feature-map size through a fully connected layer and then superposed. Fig. 6 shows the result of atrous convolution according to the embodiment. As shown in Fig. 6, atrous convolution samples the input at a spacing set by the dilation rate (rate). With rate = 1, every pixel is sampled and no information is skipped, which is standard convolution; with rate > 1, one sample is taken every rate pixels, skipping (rate − 1) pixels between samples, which enlarges the receptive field. Defining the dilation factor as $l$, atrous convolution is computed as:
$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t)$$
where $F(s)$ is the input being convolved, $k(t)$ is the convolution kernel, $s$ is the position in the input and $t$ is the offset within the kernel; $(F *_l k)(p)$ is the output at position $p$, and $\sum_{s + l t = p}$ denotes summation over all pairs $(s, t)$ with $s + l t = p$.
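The atrous convolution formula can be read directly as code. A minimal 1-D sketch in pure Python, assuming zero values outside the input and kernel offsets $t = 0, 1, \ldots, \text{len}(k) - 1$ (the document does not fix the index ranges); with $l = 1$ it reduces to ordinary convolution:

```python
def atrous_conv1d(F, k, l):
    """(F *_l k)(p) = sum over s + l*t = p of F(s) * k(t); F is treated as zero outside its range."""
    n = len(F)
    out = []
    for p in range(n):
        acc = 0.0
        for t, kt in enumerate(k):
            s = p - l * t          # the unique input position s with s + l*t = p
            if 0 <= s < n:
                acc += F[s] * kt
        out.append(acc)
    return out

signal = [1, 2, 3, 4, 5, 6]
print(atrous_conv1d(signal, [1, 1], 1))  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
print(atrous_conv1d(signal, [1, 1], 2))  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

With $l = 2$ each output combines inputs two positions apart instead of adjacent ones, which is how the receptive field grows without adding kernel weights.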
Based on the above embodiments, in step 101, reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data specifically includes:
constructing a neighbor graph by connecting each sample point to its k nearest points;
determining the weights between neighboring points using a heat kernel function;
and constructing an optimization objective function for prediction and classification based on the weights between neighboring points.
In one embodiment, the optimization objective function is:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
where $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and $W_{ab}$ is the weight between neighboring points a and b.
Building the manifold learning layer in step 1 is implemented as follows.
Step 1a: construct the neighbor graph by connecting each sample point to its k nearest points, where k is preset.
Step 1b: determine the weight between neighboring points using the heat kernel function:
$$W_{12} = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{t}\right)$$
where $x_1$ and $x_2$ are neighboring points and $t$ is the width of the heat kernel. Alternatively, a default weighting can be used: the weight is set to 1 when $x_1$ and $x_2$ are connected and to 0 when they are not.
Step 1c: so that similar sample points remain close after dimensionality reduction, the optimization objective function is constructed as:
$$\min \sum_{a,b} \lVert y_a - y_b \rVert^2 W_{ab}$$
where $y_a$ and $y_b$ are the column vectors of the feature points in the m-dimensional space, and the weights $W_{ab}$ are obtained in step 1b.
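Steps 1a to 1c match the Laplacian eigenmaps method. The following NumPy sketch is one way to realize them, assuming the standard Laplacian eigenmaps constraint $Y^\top D Y = I$ (not stated in this document) to rule out the trivial all-equal solution, and hypothetical values for k, t and m. Under that constraint the minimizer is given by the generalized eigenvectors of $L y = \lambda D y$ with the smallest nonzero eigenvalues, where $D$ is the diagonal degree matrix and $L = D - W$.

```python
import numpy as np

def laplacian_eigenmaps(X, k=5, t=1.0, m=2):
    """Embed the rows of X into m dimensions (steps 1a-1c, Laplacian eigenmaps style)."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]        # step 1a: k nearest neighbors, skipping the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)    # step 1b: heat-kernel weights
    W = np.maximum(W, W.T)                       # connect a and b if either is a neighbor of the other
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Step 1c: solve L y = lambda D y through the symmetric matrix D^-1/2 L D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(np.diag(D))
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)              # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, 1:m + 1]   # drop the trivial eigenvector (eigenvalue 0)
    return Y, W

def objective(Y, W):
    """sum over a, b of ||y_a - y_b||^2 * W_ab."""
    diff = Y[:, None, :] - Y[None, :, :]
    return float(np.sum(W * np.sum(diff ** 2, axis=-1)))

X = np.random.default_rng(0).standard_normal((40, 5))
Y, W = laplacian_eigenmaps(X, k=6, t=5.0, m=2)
print(Y.shape, objective(Y, W))
```

Here each row of X stands for one input sample (what a sample is inside the network, for instance a flattened patch, is an assumption), and the matching row of Y is its m-dimensional coordinate.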
To verify the performance of the method, SAR images of orchards in Lingshui, Hainan, China were used; the original data were acquired by the Gaofen-3 satellite. The images were cropped with a stride of 200 into 9669 tiles of 200 × 200 pixels to form the dataset, of which 85% were used as training data and the remaining 15% as validation data.
The experiments were run under Windows 10. To build the improved Deeplab network quickly, the popular deep learning framework TensorFlow was used. To speed up training, a single GPU with 8 GB of memory (an NVIDIA 1080 card) was used for GPU-accelerated computation.
To better evaluate the effectiveness of the algorithm, the experiments use the Kappa coefficient (Kappa), overall accuracy (OA) and class-specific accuracy (Accuracy) as metrics. Let $P_{ab}$ denote the number of pixels of class a that are classified into class b, and let $t_a = \sum_b P_{ab}$ denote the total number of pixels belonging to class a. The evaluation metrics are defined as follows:
Kappa coefficient: Kappa is a statistic that measures the agreement between the predictions and the ground truth.
Figure BDA0002702611650000121
Figure BDA0002702611650000122
Figure BDA0002702611650000123
where K is the number of classes and a, b ∈ [1, K].
Overall accuracy: OA is the ratio of correctly classified pixels to all pixels in the image.
Figure BDA0002702611650000124
Class-specific accuracy: Accuracy is the percentage of correctly classified pixels within each class:
Figure BDA0002702611650000125
OA and the class-specific accuracy lie between 0 and 1, and Kappa is at most 1 (values near 0 indicate chance-level agreement); higher values indicate better classification performance.
To verify the classification effectiveness of the improved Deeplab network on the hyperspectral orchard images, 6 groups of experiments were performed on orchard data from Hainan, of which groups b to e are comparison experiments using, respectively: GLCM + SVM, Decomposition + SPM, SDU-CNN and Deeplab. The classification accuracy and evaluation metrics for each fruit class in each group are given in the tables below. The Decomposition + SPM method has the lowest overall accuracy, only 64.16%; it seriously misclassifies mangoes across the three growth stages, and its accuracy on betel nut and longan is also poor. The GLCM + SVM method improves the accuracy on betel nut and longan by about 10%, but still classifies the same fruit at different stages poorly. Compared with these two methods, SDU-CNN improves the accuracy on mango, betel nut and longan by about 10%. The original Deeplab method raises the accuracy on stage I mango, stage II mango and betel nut to 95.62%, 91.56% and 94.33% respectively, but still shows larger errors on stage III mango and longan. The proposed algorithm achieves the highest accuracy on all 5 orchard classes (stage I mango, stage II mango, stage III mango, betel nut and longan), namely 98.56%, 98.33%, 95.62%, 99.23% and 98.32%.
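The metrics can be computed directly from the confusion matrix $P$. A minimal sketch, assuming rows index the true class and columns the predicted class (consistent with $t_a = \sum_b P_{ab}$), and using the standard Cohen's kappa in place of the Kappa formulas, which survive here only as images; the 2-class matrix is a hypothetical example, not data from Table 1:

```python
def segmentation_metrics(P):
    """P[a][b] = number of pixels of true class a that were predicted as class b."""
    K = len(P)
    N = sum(sum(row) for row in P)                                  # total number of pixels
    t = [sum(P[a]) for a in range(K)]                                # t_a: pixels truly in class a
    predicted = [sum(P[a][b] for a in range(K)) for b in range(K)]   # pixels predicted as class b
    correct = sum(P[a][a] for a in range(K))

    oa = correct / N                                                 # overall accuracy
    accuracy = [P[a][a] / t[a] for a in range(K)]                    # class-specific accuracy
    p_e = sum(t[a] * predicted[a] for a in range(K)) / (N * N)       # chance agreement
    kappa = (oa - p_e) / (1 - p_e)                                   # Cohen's kappa
    return oa, kappa, accuracy

oa, kappa, accuracy = segmentation_metrics([[45, 5], [10, 40]])
print(oa, round(kappa, 4), accuracy)  # 0.85 0.7 [0.9, 0.8]
```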
TABLE 1 Confusion matrix of the improved Deeplab classification
Figure BDA0002702611650000131
TABLE 2 Evaluation indices
Figure BDA0002702611650000132
Figure BDA0002702611650000141
In an embodiment, fig. 7 is a schematic structural diagram of an optimization system for the Deeplab semantic segmentation algorithm provided in an embodiment of the present invention. As shown in fig. 7, the system includes a dimensionality reduction module 701, a feature extraction module 702, a pooling module 703 and a classification module 704, wherein:
the dimensionality reduction module 701 is configured to build a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm, so as to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data. The feature extraction module 702 is configured to input the dimension-reduced data into an improved encoder network and extract the low-level and high-level features of the image. The pooling module 703 is configured to extract multi-scale feature information of the image based on atrous spatial pyramid pooling. The classification module 704 is configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
For the specific way in which the dimensionality reduction module 701, the feature extraction module 702, the pooling module 703 and the classification module 704 optimize the Deeplab semantic segmentation algorithm, refer to the method embodiment above; the details are not repeated here.
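The dimensionality reduction step is spelled out in claims 5 and 6: connect each sample to its k nearest neighbors, weight the edges with a heat kernel, and optimize an objective built from the weights between neighboring points. Those steps match the standard Laplacian Eigenmaps recipe, so the NumPy sketch below assumes that reading. The function name, the neighbor count k, the kernel width t and the target dimension m are illustrative choices, not values from the patent. It also uses dense n x n matrices, so a whole hyperspectral image would need subsampling or a sparse graph.

```python
import numpy as np


def laplacian_eigenmaps(X, k=5, t=1.0, m=2):
    """Embed the rows of X (n_samples x n_features) into m dimensions.

    Minimises sum_ab ||y_a - y_b||^2 * W_ab subject to Y^T D Y = I, which
    reduces to the generalised eigenproblem L y = lambda D y.
    Assumes the resulting k-nearest-neighbour graph is connected.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Step 1: k-nearest-neighbour graph (a point is never its own neighbour).
    masked = d2 + np.diag(np.full(n, np.inf))
    neighbours = np.argsort(masked, axis=1)[:, :k]

    # Step 2: heat-kernel weights W_ab = exp(-||x_a - x_b||^2 / t) on graph edges.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrise: keep an edge if either endpoint chose it

    # Step 3: Laplacian L = D - W; solve L y = lambda D y through the
    # symmetric matrix D^{-1/2} L D^{-1/2} and map back with y = D^{-1/2} v.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    Y = d_inv_sqrt[:, None] * vecs

    # Drop the trivial constant solution (eigenvalue 0) and keep the next m.
    return Y[:, 1:m + 1]
```

For example, `laplacian_eigenmaps(pixels, k=10, t=1.0, m=20)` would map an array of per-pixel spectra (here the hypothetical `pixels`) to 20 dimensions. Solving the symmetric form with `eigh` avoids a dependency on a generalized eigensolver.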
In an embodiment, based on the same concept, an embodiment of the present invention further provides an electronic device. Fig. 8 is a schematic structural diagram of the electronic device provided in the embodiment of the present invention. As shown in fig. 8, the electronic device may include a processor 801, a communications interface 802, a memory 803 and a bus 804, where the processor 801, the communications interface 802 and the memory 803 communicate with one another via the bus 804. The processor 801 may call logic instructions in the memory 803 to perform the following method: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the low-level and high-level features of the image; extracting multi-scale feature information of the image based on atrous spatial pyramid pooling; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In one embodiment, based on the same concept, this embodiment also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the method provided by the above method embodiments, for example: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the low-level and high-level features of the image; extracting multi-scale feature information of the image based on atrous spatial pyramid pooling; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
In one embodiment, based on the same concept, this embodiment further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the above method embodiments, for example: building a manifold learning layer on the basis of the Deeplab semantic segmentation algorithm to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data; inputting the dimension-reduced data into an improved encoder network and extracting the low-level and high-level features of the image; extracting multi-scale feature information of the image based on atrous spatial pyramid pooling; and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
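Each embodiment above ends by bilinearly interpolating the multi-scale feature information to obtain the prediction. In DeepLab-style networks this step upsamples a coarse per-class score map back to the input resolution before a per-pixel argmax. The NumPy sketch below assumes that reading, a (C, H, W) array layout and the align-corners convention; none of these is specified in the text, and the function name is illustrative.

```python
import numpy as np


def bilinear_resize(feat, out_h, out_w):
    """Bilinearly resize a (C, H, W) score map to (C, out_h, out_w).

    Align-corners convention: the corner pixels of the input map exactly
    onto the corner pixels of the output.
    """
    _, h, w = feat.shape
    # Continuous source coordinates for every output row and column.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical weights, broadcast over channels and columns
    wx = (xs - x0)[None, None, :]  # horizontal weights, broadcast over channels and rows

    # Gather the four neighbouring source values for every output location.
    top_left = feat[:, y0][:, :, x0]
    top_right = feat[:, y0][:, :, x1]
    bottom_left = feat[:, y1][:, :, x0]
    bottom_right = feat[:, y1][:, :, x1]

    # Interpolate horizontally, then vertically.
    top = top_left * (1 - wx) + top_right * wx
    bottom = bottom_left * (1 - wx) + bottom_right * wx
    return top * (1 - wy) + bottom * wy
```

A prediction map would then be `bilinear_resize(scores, H, W).argmax(axis=0)`, where `scores` is a hypothetical (C, h, w) class-score array and H, W the input size.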
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for optimizing a Deeplab semantic segmentation algorithm is characterized by comprising the following steps:
establishing a manifold learning layer on the basis of a Deeplab semantic segmentation algorithm to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data;
inputting the dimension-reduced data into an improved encoder network, and extracting low-level features and high-level features of the image;
extracting multi-scale feature information of the image based on an atrous spatial pyramid pooling method;
and performing bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
2. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein the step of inputting the dimension-reduced data into an improved encoder network and extracting low-level features and high-level features of the image further comprises:
performing feature-level image fusion of the low-level features and the high-level features of the image.
3. The method for optimizing the Deeplab semantic segmentation algorithm according to claim 1, wherein the formula for feature-level image fusion of the low-level features and the high-level features of the image is:
Figure FDA0002702611640000011
wherein F1(x) is the output of the neural network, f1 and f2 denote the convolution, pooling and activation operations of the residual units, and w1 and w2 are convolution kernels.
4. The method for optimizing the Deeplab semantic segmentation algorithm according to claim 3, wherein extracting multi-scale feature information of the image based on the atrous spatial pyramid pooling method comprises:
computing the atrous spatial pyramid pooling according to the following formula:
Figure FDA0002702611640000021
wherein F(s) is the vector being convolved, k(t) is the convolution kernel, s is the step length and t is the offset; (F *_l k)(p) denotes the pyramid pooling result at p, and
Figure FDA0002702611640000022
denotes the summation over p.
5. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 1, wherein reducing the dimensionality of the data in the manifold learning layer to obtain the dimension-reduced data comprises:
constructing a neighbor graph and connecting the sample points such that each point is connected to its nearest k points;
determining the weights between neighboring points using a heat kernel function;
and constructing, based on the weights between neighboring points, an optimization objective function for prediction and classification.
6. The method for optimizing a Deeplab semantic segmentation algorithm according to claim 5, wherein the optimization objective function is:
Figure FDA0002702611640000023
wherein ya and yb are the column vectors of the feature points in the m-dimensional space, and W is the weight between neighboring points.
7. An optimization system for a Deeplab semantic segmentation algorithm, comprising:
a dimensionality reduction module, configured to build a manifold learning layer on the basis of a Deeplab semantic segmentation algorithm so as to reduce the dimensionality of the data in the manifold learning layer and obtain dimension-reduced data;
a feature extraction module, configured to input the dimension-reduced data into an improved encoder network and extract low-level features and high-level features of the image;
a pooling module, configured to extract multi-scale feature information of the image based on an atrous spatial pyramid pooling method;
and a classification module, configured to perform bilinear interpolation on the multi-scale feature information of the image to obtain a prediction result.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for optimizing a Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for optimizing a Deeplab semantic segmentation algorithm according to any one of claims 1 to 6.
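The formula in claim 4 is available only as an image, but its symbol list (a convolved vector F(s), a kernel k(t) and an output written (F *_l k)(p)) matches the standard definition of dilated (atrous) convolution, (F *_l k)(p) = sum over s + l*t = p of F(s) k(t), where l is the dilation rate. Reading the claim that way is an assumption. The 1-D NumPy sketch below evaluates that sum literally and then stacks several dilation rates, which is how DeepLab-style atrous spatial pyramid pooling gathers multi-scale context. The function names, the rates (1, 2, 4) and the 1-D setting are illustrative; a real ASPP head operates on 2-D feature maps with a learned kernel per branch.

```python
import numpy as np


def dilated_conv1d(F, k, l):
    """Literal evaluation of (F *_l k)(p) = sum_{s + l*t = p} F(s) k(t).

    Returns the "full" result, treating F as zero outside its support.
    """
    n, m = len(F), len(k)
    out = np.zeros(n + l * (m - 1))
    for s in range(n):
        for t in range(m):
            out[s + l * t] += F[s] * k[t]  # each term contributes at p = s + l*t
    return out


def atrous_pyramid(F, kernels, rates=(1, 2, 4)):
    """Run one dilated branch per (kernel, rate) pair and stack the branches.

    Each branch is centre-cropped to len(F) ("same" padding) so that the
    branches line up position by position and can be concatenated.
    """
    branches = []
    for k, l in zip(kernels, rates):
        full = dilated_conv1d(F, k, l)
        start = l * (len(k) - 1) // 2
        branches.append(full[start:start + len(F)])
    return np.stack(branches)
```

With l = 1 the sum reduces to ordinary convolution (`np.convolve`); larger l spreads the same kernel taps further apart, enlarging the receptive field without adding parameters.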
CN202011027787.2A 2020-09-25 2020-09-25 Optimization method and system of Deeplab semantic segmentation algorithm Pending CN112329808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011027787.2A CN112329808A (en) 2020-09-25 2020-09-25 Optimization method and system of Deeplab semantic segmentation algorithm

Publications (1)

Publication Number Publication Date
CN112329808A true CN112329808A (en) 2021-02-05

Family

ID=74304261

Country Status (1)

Country Link
CN (1) CN112329808A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
WO2020093630A1 (en) * 2018-11-09 2020-05-14 五邑大学 Antenna downward inclination angle measurement method based on multi-scale deep semantic segmentation network
CN111292330A (en) * 2020-02-07 2020-06-16 北京工业大学 Image semantic segmentation method and device based on coder and decoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
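Wang Yunyan et al.: "Polarimetric synthetic aperture radar image classification based on the Deeplab network", Science of Surveying and Mapping, vol. 45, no. 6, pages 110-117 *
Wang Yunyan et al.: "Orchard classification of polarimetric SAR images with an improved DeepLab", Journal of Image and Graphics, vol. 24, no. 11, pages 2035-2044 *
Han Songchen et al.: "Small-object detection algorithm for airport surfaces based on improved Faster-RCNN", Journal of Nanjing University of Aeronautics & Astronautics, vol. 51, no. 6, pages 735-741 *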

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111711A (en) * 2021-03-11 2021-07-13 浙江理工大学 Pooling method based on bilinear pyramid and spatial pyramid
CN114549958A (en) * 2022-02-24 2022-05-27 四川大学 Night and disguised target detection method based on context information perception mechanism
CN114549958B (en) * 2022-02-24 2023-08-04 四川大学 Night and camouflage target detection method based on context information perception mechanism
CN117197651A (en) * 2023-07-24 2023-12-08 移动广播与信息服务产业创新研究院(武汉)有限公司 Method and system for extracting field by combining edge detection and semantic segmentation
CN117197651B (en) * 2023-07-24 2024-03-29 移动广播与信息服务产业创新研究院(武汉)有限公司 Method and system for extracting field by combining edge detection and semantic segmentation

Similar Documents

Publication Publication Date Title
CN110232394B (en) Multi-scale image semantic segmentation method
CN109949255B (en) Image reconstruction method and device
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
CN111488985B (en) Deep neural network model compression training method, device, equipment and medium
CN112329808A (en) Optimization method and system of Deeplab semantic segmentation algorithm
CN108510504B (en) Image segmentation method and device
CN110570353A (en) Dense connection generation countermeasure network single image super-resolution reconstruction method
CN111325271B (en) Image classification method and device
CN112699937B (en) Apparatus, method, device, and medium for image classification and segmentation based on feature-guided network
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN111612017A (en) Target detection method based on information enhancement
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN112465801B (en) Instance segmentation method for extracting mask features in scale division mode
CN113313180B (en) Remote sensing image semantic segmentation method based on deep confrontation learning
CN112270332A (en) Three-dimensional target detection method and system based on sub-stream sparse convolution
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112329801A (en) Convolutional neural network non-local information construction method
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN117576402B (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN111126185A (en) Deep learning vehicle target identification method for road intersection scene
CN114373110A (en) Method and device for detecting target of input image and related products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination