CN111259904A

CN111259904A - Semantic image segmentation method and system based on deep learning and clustering

Info

Publication number: CN111259904A
Application number: CN202010047315.7A
Authority: CN
Inventors: 郭丽; 刘知贵; 张小乾; 白克强; 薛旭倩; 刘道广; 李理; 张活力; 吴均; 付聪; 喻琼
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2020-06-09
Anticipated expiration: 2040-01-16
Also published as: CN111259904B

Abstract

The invention discloses a semantic image segmentation method and system based on deep learning and clustering. The method comprises the following steps: S1, performing convolution and pooling on an original image through a convolutional neural network to obtain a linear feature matrix of the original image; S2, performing subspace clustering on the linear feature matrix to obtain clustered feature data; and S3, restoring the clustered feature data to the same pixel resolution as the original image through deconvolution and upsampling to obtain a segmented image. The invention combines the convolutional neural network (CNN) of deep neural networks with subspace clustering and replaces the fully connected layer of the CNN with a sparse subspace clustering layer, thereby addressing the problems of complicated computation, large data volume, and abundant data but poor information in prior-art semantic image segmentation. Introducing a subspace clustering method into the neural network reduces the large amount of labeled data that the CNN requires and enables unsupervised learning of the CNN.

Description

Semantic image segmentation method and system based on deep learning and clustering

Technical Field

The invention relates to the field of machine vision, in particular to a semantic image segmentation method and a semantic image segmentation system based on deep learning and clustering.

Background

The purpose of semantic image segmentation is to classify the semantics of each region of an image, that is, to determine what object each region is, assign every object in the image to its class, and segment it. At present, deep learning neural networks and subspace clustering are two common ways of implementing image segmentation. Each has advantages and disadvantages.

In the deep learning approach, feature data are extracted through automatic learning by the computer, and feature learning is integrated into the model-building process. This reduces the imperfection caused by hand-designed features and gives better classification performance.

Subspace clustering is an effective way to cluster high-dimensional data. Dimensionality reduction is performed in advance, and subspace clustering is often used to construct a graph that favors segmentation, which helps with complex image segmentation: the feature data of complex image regions are high-dimensional and contain many irrelevant attributes, so clusters are difficult to form and the data are abundant but poor in information. However, subspace clustering methods require that the data can be linearly represented, and real image data generally cannot be linearly represented in the input space, which makes subspace segmentation difficult. Kernel-based methods are generally adopted at present, but kernel methods resemble template-based methods: their performance depends to a great extent on the choice of kernel function, that choice is empirical, and the nonlinear transformation is not explicit, so it cannot be determined whether a predefined kernel produces an implicit feature space suitable for subspace clustering. This makes large-scale data sets difficult to process.

Disclosure of Invention

The invention aims to solve the technical problems of existing semantic image segmentation techniques: the data volume is large, the computation is complex, the feature data are difficult to cluster, and the feature data are inconvenient to extract.

The invention is realized by the following technical scheme:

A semantic image segmentation method based on deep learning and clustering, comprising the following steps:

S1, performing convolution and pooling on the original image through a convolutional neural network to obtain a linear feature matrix of the original image;

S2, performing subspace clustering on the linear feature matrix to obtain clustered feature data;

S3, restoring the clustered feature data to the same pixel resolution as the original image through deconvolution and upsampling to obtain a segmented image.

First, a deep learning method is used: a convolutional neural network applies multiple layers of convolution and pooling to the feature data of the original image, which is a hierarchical nonlinear transformation. The resulting linear feature matrix of the original image is passed from the highest pooling layer to a subspace clustering layer. Because the image features are linear after convolution and pooling, subspace segmentation becomes easy. An affinity matrix is obtained after subspace reconstruction and sparsification, and clustering is performed by spectral clustering. The invention replaces the fully connected layer of the convolutional neural network with the subspace clustering layer, thereby reducing the long computation time and high computational difficulty of classification with a fully connected layer. Finally, to improve segmentation precision, deconvolution and upsampling in a deep neural network restore the segmented image to the original pixel resolution before it is output.

Further, the convolutional neural network comprises a plurality of network layers from low to high, each comprising convolution and pooling, and step S1 comprises the following sub-steps:

in a lower network layer of the convolutional neural network, performing the convolution operation with M convolution kernels and extracting simple common feature data through pooling; in a higher network layer, performing the convolution operation with N convolution kernels and extracting complex common feature data through pooling, wherein M is less than N;

the input to the convolution and pooling of each network layer is the common feature data obtained by the previous layer, and the input to the convolution and pooling of the lowest layer is the original image;

forming a linear matrix from the common feature data obtained by the convolution and pooling of the highest network layer, the linear matrix being the linear feature matrix of the original image.
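The sub-steps of S1 can be illustrated with a small shape trace. The following is a minimal sketch, not the patented implementation: the function name `encoder_shapes`, the 64 × 64 input, the kernel counts, the size-preserving convolution, and the 2 × 2 pooling are all assumptions chosen for illustration. It only tracks how the number of feature maps grows (M less than N) while the spatial size shrinks, and how the top-layer output could be arranged as a matrix.

```python
def encoder_shapes(height, width, kernels_per_layer, pool=2):
    """Trace (channels, height, width) after each convolution + pooling stage.

    Assumption: each convolution preserves height and width ('same' padding),
    and each pooling step divides both by `pool`.
    """
    shapes = []
    h, w = height, width
    for n_kernels in kernels_per_layer:
        h, w = h // pool, w // pool          # pooling shrinks the feature maps
        shapes.append((n_kernels, h, w))     # one feature map per kernel
    return shapes


# Hypothetical encoder: fewer kernels in lower layers, more in higher layers.
shapes = encoder_shapes(64, 64, [16, 32, 64, 128])

# Assumption: each spatial position of the top layer is one data point (column)
# and each channel is one feature (row) of the feature matrix.
channels, h, w = shapes[-1]
feature_matrix_shape = (channels, h * w)
print(shapes)
print(feature_matrix_shape)
```

Under these assumptions the top layer yields 128 feature maps of size 4 × 4, which gives a 128 × 16 matrix as input to the clustering step.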

The invention performs semantic image segmentation based on deep learning and clustering and learns an explicit mapping that makes the subspaces more separable. First, a convolutional neural network extracts feature data through hierarchical nonlinear transformation: lower network layers extract simple common features with fewer convolution kernels, and higher network layers extract complex features with more convolution kernels. The features extracted from the highest pooling layer are then clustered by a sparse subspace clustering method. Finally, a decoding layer formed by a deep neural network restores the segmented image to the original pixel resolution, and the image is output.

Further, the subspace clustering is sparse subspace clustering, and step S2 comprises the following sub-steps: performing sparsification on the linear feature matrix through subspaces to obtain a sparse coefficient matrix, calculating the similarity between all nodes of the image from the sparse coefficient matrix to obtain an affinity matrix, and performing spectral clustering on the affinity matrix to obtain clustered feature data. Preferably, the linear feature matrix is partitioned into different subspaces by the sparse coefficients. Preferably, the spectral clustering uses the K-means clustering algorithm.
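The second half of step S2 (coefficient matrix, then affinity matrix, then spectral clustering with K-means) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the sparse coefficient matrix `C_demo` is hand-made rather than solved for, the normalized graph Laplacian and the farthest-point K-means initialisation are choices made here for a deterministic example, and all function names are hypothetical. NumPy is assumed to be available.

```python
import numpy as np


def affinity_from_coefficients(C):
    """Symmetric affinity matrix W = (|C| + |C^T|) / 2."""
    A = np.abs(C)
    return (A + A.T) / 2.0


def spectral_embedding(W, k):
    """Row-normalized eigenvectors of the normalized Laplacian (k smallest eigenvalues)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)


def kmeans(points, k, iters=50):
    """Lloyd iterations with farthest-point initialisation (deterministic)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_from_coefficients(C, k):
    """Affinity matrix -> spectral embedding -> K-means labels."""
    W = affinity_from_coefficients(C)
    return kmeans(spectral_embedding(W, k), k)


# Hand-made sparse coefficients: points 0 and 1 represent each other, as do 2 and 3.
C_demo = np.array([[0.0, 1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0, 0.0]])
labels = cluster_from_coefficients(C_demo, k=2)
print(labels)
```

In this toy case the affinity matrix is block diagonal, so the two groups of points receive different labels; real coefficient matrices are noisier, which is why the K-means step on the spectral embedding is needed.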

Further, the number of convolution layers used for deconvolution in step S3 is the same as the number of convolution layers used for convolution in step S1. Once the image feature map has been restored to the resolution of the original input image, the convolution operation can stop, and the segmented image is obtained.

In addition, a semantic image segmentation system based on deep learning and clustering comprises:

an input unit: used for inputting the original image;

an encoding unit: used for performing convolution and pooling on the original image through a convolutional neural network to obtain a linear feature matrix of the original image;

a clustering unit: used for performing subspace clustering on the linear feature matrix to obtain clustered feature data;

a decoding unit: used for restoring the clustered feature data to the same pixel resolution as the original image through deconvolution and upsampling to obtain a segmented image;

an output unit: for outputting the segmented image.

Compared with the prior art, the invention has the following advantages and beneficial effects:

The convolutional neural network has the advantage of extracting image data features under supervised learning. At the same time, because a subspace clustering method is introduced into the neural network, the feature learning modules of every layer of the convolutional coding layer can be supervised by the output of spectral clustering. This reduces the large amount of labeled data that the convolutional neural network requires, enables unsupervised learning of the convolutional neural network, and avoids the fully connected layer's operations on large amounts of data. The subspace clustering method is implemented by learning a set of explicit nonlinear mapping functions through the convolutional neural network to map the input into another space, and using the sample representations in the new space to compute the affinity matrix. After the neural network is trained, the noise in the feature data is reduced, and the robustness of the subspace clustering model to noise is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a schematic diagram of the present invention.

FIG. 2 is a design block diagram of embodiment 2.

FIG. 3 is a fully connected layer neural network.

FIG. 4 is a schematic diagram of the design in which a subspace clustering layer replaces the fully connected layer.

FIG. 5 shows the feature classification results of embodiment 4.

FIG. 6 is a schematic diagram of fully connected layer classification.

FIG. 7 is a block diagram of the decoding layer design of the convolutional neural network of embodiment 5.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Example 1

This embodiment is a semantic image segmentation method based on deep learning and clustering; its design principle is shown in FIG. 1. First, the coding layer of a convolutional neural network (CNN) applies a hierarchical nonlinear transformation to the feature data of the original image, and the feature data are passed from the highest pooling layer to the subspace clustering layer. The feature data are then clustered by a sparse subspace clustering method. Finally, to improve segmentation precision, a decoding layer formed by a deep neural network restores the segmented image to the original pixel resolution, and the restored image is output.

Example 2

This embodiment builds on embodiment 1 and elaborates the principle of the deep neural network coding layer. The deep neural network of the coding layer may use architectures such as a BP neural network, an RNN, or a CNN. The invention exploits the strength of the convolutional neural network (CNN) in extracting feature data from image information and adopts a CNN architecture for the encoder layer, consisting of convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, convolutional layer 3, pooling layer 3, convolutional layer 4, pooling layer 4, …. Compared with the common approach of using the raw data directly as input, this design transforms the data hierarchically over multiple layers, which reduces the subspace representation error and improves the robustness of the subspace clustering model to noise. The block diagram of the convolutional neural network coding layer design is shown in FIG. 2.

Example 3

This embodiment builds on embodiment 1 and elaborates the design of the subspace clustering layer, in which a sparse subspace clustering method replaces the fully connected layer of the CNN. The number of parameters of a fully connected layer is very large, often much larger than that of several convolutional layers combined. Taking FIG. 3 as an example, if pooling outputs 20 feature maps of size 12 × 12, then a first fully connected layer of 100 neurons has 100 × 20 × 12 × 12 = 288,000 parameters, which increases computation time and difficulty. The invention avoids classification by full connection. Instead, sparse subspace clustering is introduced into the deep neural network structure and applied to the feature matrix output by the top convolutional layer of the front end to classify the feature data, making the subspaces more separable; for example, in this design the image features of the whole picture are divided into k subspaces.
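The parameter count in the FIG. 3 example (20 pooled feature maps of size 12 × 12 feeding a first fully connected layer of 100 neurons) can be checked with a few lines of arithmetic. This is an illustrative sketch; the helper name `fc_weight_count` is hypothetical, and bias terms are ignored, as they are in the figure quoted in the text.

```python
def fc_weight_count(n_maps, map_height, map_width, n_neurons):
    """Number of weights when every input value connects to every neuron (biases ignored)."""
    inputs = n_maps * map_height * map_width   # flattened pooled feature maps
    return n_neurons * inputs


count = fc_weight_count(n_maps=20, map_height=12, map_width=12, n_neurons=100)
print(count)  # 288000
```

The 20 × 12 × 12 = 2,880 flattened inputs, each connected to 100 neurons, account for the 288,000 weights.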

In addition, the sparse subspace clustering method defines the training objective of the front-end neural network: learning such that the error in reconstructing the input data is minimal. The weights between the nodes in the fully connected layer are taken as the similarities between the corresponding nodes, and together these weights form the coefficient matrix C satisfying X = XC. The coefficient matrix is then used to construct an affinity matrix between data points, and the data are clustered by spectral clustering. A schematic diagram of the design in which the subspace clustering layer replaces the fully connected layer is shown in FIG. 4. The subspace representation coefficients are calculated as follows:

Subspace representation model: $\min \|C\|_1 \quad \text{s.t.} \quad X = XC,\ C_{ii} = 0$.

Using the alternating direction method of multipliers, with $h(C) = X - XC$,

the calculation steps are:

1. […]

2. $\lambda_{k+1} = \lambda_k + \mu_k\, h(C_{k+1})$

Result: the coefficient matrix $C$ is obtained from the subspace representation coefficient calculation.

Similarity matrix calculation: $W = (|C| + |C^{T}|)/2$

The advantage of replacing the fully connected layer with the subspace clustering layer is that it reduces the large amount of labeled data that the CNN requires and enables unsupervised learning of the neural network.
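For readers who want to see how the multiplier update in step 2 fits into a complete iteration, one standard formulation is the augmented Lagrangian scheme below. It is given as an assumption, not as the formula used in this document: the first calculation step is not reproduced in the text, and the constraint residual is taken here to be $h(C) = X - XC$, with multiplier $\lambda$ and penalty parameter $\mu$.

```latex
\begin{aligned}
\mathcal{L}(C,\lambda,\mu) &= \|C\|_1 + \langle \lambda,\, h(C) \rangle
    + \frac{\mu}{2}\,\|h(C)\|_F^2, \qquad h(C) = X - XC, \\
C_{k+1} &= \arg\min_{C:\; C_{ii}=0} \; \mathcal{L}(C,\lambda_k,\mu_k), \\
\lambda_{k+1} &= \lambda_k + \mu_k\, h(C_{k+1}).
\end{aligned}
```

With this sign convention, the last line coincides with the multiplier update in step 2; the minimization over $C$ is the part that an alternating scheme would split into simpler sub-problems.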

Example 4

This embodiment builds on embodiment 3 and describes how the subspace clustering layer replaces the fully connected layer, taking as an example the division of 6 groups of feature data matrices into 2 classes. The sparse matrix is calculated by: $\min \|C\|_1 \quad \text{s.t.} \quad X = XC,\ C_{ii} = 0$, where X represents the matrix of 6 groups of feature data output by the highest pooling layer and C represents the coefficient matrix, which is sparse, as shown in formula (1):

The coefficient matrix in formula (1) is solved using the alternating direction method of multipliers.

The resulting coefficient matrix contains a large number of zero entries, like the matrix C in formula (2).

Because many coefficients in the matrix are 0, each feature can be represented by a small amount of data; X11 in formula (1) is represented by the other corresponding data among the 6 groups, as shown in formula (3):

X11＝X12*0+X13*1/2+X14*1/2+X15*0+X16*0

X12＝X11*0+X13*0+X14*0+X15*1/4+X16*3/4

………………………..

X16＝X11*0+X12*1/2+X13*0+X14*0+X15*1/2

That is, by calculation with a small amount of data, […] and […], these 5 sets of feature vectors are classified into […], as shown in FIG. 5.

Classes A and B need not be known before the feature values are clustered; only the number of classes must be known to complete the classification. Unlike neural network learning, the exact class of each sample need not be known before classification.
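The grouping implied by formula (3) can be reproduced from the three representation rows that are written out (for X11, X12 and X16). The sketch below makes several assumptions that are not in the original: the elided rows are left as zeros rather than guessed, each column of C holds the coefficients that represent one vector, and a simple connected-components search stands in for spectral clustering, which is adequate here only because the affinity matrix is exactly block structured.

```python
from fractions import Fraction as F

names = ["X11", "X12", "X13", "X14", "X15", "X16"]

# Non-zero coefficients from the rows of formula (3) that are written out.
# Assumption: the elided rows are simply left as zeros here.
given = {
    "X11": {"X13": F(1, 2), "X14": F(1, 2)},
    "X12": {"X15": F(1, 4), "X16": F(3, 4)},
    "X16": {"X12": F(1, 2), "X15": F(1, 2)},
}

n = len(names)
idx = {name: i for i, name in enumerate(names)}
C = [[F(0)] * n for _ in range(n)]
for target, coeffs in given.items():
    for source, value in coeffs.items():
        C[idx[source]][idx[target]] = value    # column = represented vector

# Affinity matrix W = (|C| + |C^T|) / 2
W = [[(abs(C[i][j]) + abs(C[j][i])) / 2 for j in range(n)] for i in range(n)]


def connected_groups(W):
    """Group indices that are linked by non-zero affinities (depth-first search)."""
    seen, groups = set(), []
    for start in range(len(W)):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j, w in enumerate(W[i]):
                if w != 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


groups = [[names[i] for i in g] for g in connected_groups(W)]
print(groups)  # [['X11', 'X13', 'X14'], ['X12', 'X15', 'X16']]
```

Only the number of groups emerges from the data; which group is called A and which B is not determined by the computation, in line with the remark that the classes need not be known beforehand.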

Example 5

In this embodiment, under the same conditions as embodiment 4, a fully connected approach is used to classify the 6 groups of feature data matrices into the two classes A and B:

Step 1: labels A and B must be given.

Step 2: a fully connected layer is established and a hidden layer is constructed. Every vector in the 6 groups of feature values is connected by a weight to each neuron of each layer in the hidden layer, and the connection weights are adjusted repeatedly, as shown in FIG. 6.

Example 6

This embodiment builds on embodiment 1 and details the principle of the deep neural network decoding layer. After the convolution and pooling operations of the encoder layer, the input image is compressed into an image of lower resolution. If the image were segmented directly using the highest-layer features, the edge details of the segmented target would be blurred: the higher-layer features of the CNN are more complex and carry richer semantic information, whereas the lower-layer features usually contain many common features and more information such as edges and positions. Therefore, to improve segmentation precision and output a clear segmented image at the output layer, a deep neural network decoding layer is designed that restores the feature map to the size of the original image through upsampling and deconvolution. The decoding layer consists of upsampling layer 1, convolutional layer n+1, upsampling layer 2, convolutional layer n+2, upsampling layer 3, convolutional layer n+3, …. To prevent overfitting, the number of convolutional layers in the decoder can be the same as in the coding layer. The convolution operation can stop once the feature map has been restored to the resolution of the original input image, and the segmented image is finally output at the output layer. The design block diagram of the decoding layer of the deep neural network is shown in FIG. 7.
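The idea of upsampling until the original resolution is reached, and then stopping, can be shown on a tiny label map. This is a simplified sketch under stated assumptions: it uses nearest-neighbour upsampling only and omits the learned convolution layers that the decoding layer places after each upsampling step; the function names are hypothetical, and the target size is assumed to be the input size multiplied by a power of the upsampling factor.

```python
def upsample_nearest(label_map, factor=2):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in label_map:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def decode_to_resolution(label_map, target_height, target_width, factor=2):
    """Upsample repeatedly and stop once the target resolution is reached."""
    current = label_map
    while len(current) < target_height or len(current[0]) < target_width:
        current = upsample_nearest(current, factor)
    return current


# A 2 x 2 clustered label map restored to a hypothetical 4 x 4 input resolution.
restored = decode_to_resolution([[0, 1], [1, 1]], target_height=4, target_width=4)
for row in restored:
    print(row)
```

The blocky result shows why the text places convolution layers after each upsampling layer: upsampling alone restores the size but not the edge detail.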

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A semantic image segmentation method based on deep learning and clustering is characterized by comprising the following steps:

2. The deep learning and clustering based semantic image segmentation method according to claim 1, wherein the convolutional neural network comprises a plurality of layers from low to high, each layer of the layers comprising convolution and pooling, and the step S1 comprises the following sub-steps:

3. The deep learning and clustering based semantic image segmentation method according to claim 1, wherein the subspace clustering is sparse subspace clustering, and the step S2 comprises the following sub-steps: performing sparsification on the linear feature matrix through subspaces to obtain a sparse coefficient matrix, calculating the similarity between all nodes of the image from the sparse coefficient matrix to obtain an affinity matrix, and performing spectral clustering on the affinity matrix to obtain clustered feature data.

4. The deep learning and clustering based semantic image segmentation method according to claim 3, wherein the linear feature matrix is segmented into different subspaces by sparse coefficients.

5. The deep learning and clustering based semantic image segmentation method according to claim 3, wherein the spectral clustering adopts a K-means clustering algorithm.

6. The method for semantic image segmentation based on deep learning and clustering according to claim 1, wherein the number of convolution layers of the deconvolution in step S3 is the same as the number of convolution layers of the convolution in step S1.

7. A semantic image segmentation system based on deep learning and clustering, comprising:

an input unit: used for inputting the original image;

an output unit: for outputting the segmented image.

8. The deep learning and clustering based semantic image segmentation system according to claim 7 wherein the coding unit comprises a plurality of layers of networks from low to high, each layer of networks comprising convolution and pooling, comprising:

the lower layer network of the coding unit uses M convolution kernels to perform convolution operation and extracts simple common characteristic data through pooling, the upper layer network of the coding unit uses N convolution kernels to perform convolution operation and extracts complex common characteristic data through pooling, and the number of the M convolution kernels is less than that of the N convolution kernels;

the object of each layer of network convolution and pooling of the coding unit is common characteristic data obtained by a network of the previous layer, and the object of the lowest layer of network convolution and pooling of the coding unit is an original image;

and forming a linear matrix by the common characteristic data obtained by the convolution and pooling of the highest layer network of the coding unit, wherein the linear matrix is the linear characteristic matrix of the original image.

9. The deep learning and clustering based semantic image segmentation system of claim 7, wherein the clustering unit is a sparse subspace clustering unit configured to: perform sparsification on the feature matrix through subspaces to obtain an affinity matrix, and perform spectral clustering on the affinity matrix to obtain clustered feature data.

10. The deep learning and clustering based semantic image segmentation system according to claim 7, wherein the number of convolution layers that the decoding unit deconvolves is the same as the number of convolution layers that the encoding unit convolves.