CN110675412A - Image segmentation method, training method, device and equipment of image segmentation model - Google Patents


Info

Publication number
CN110675412A
Authority
CN
China
Prior art keywords
model
feature
image
clustering
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910922346.XA
Other languages
Chinese (zh)
Other versions
CN110675412B (en)
Inventor
陈思宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910922346.XA
Publication of CN110675412A
Application granted
Publication of CN110675412B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image segmentation method, and a training method, apparatus and device for an image segmentation model, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a target image; inputting the target image into an image segmentation model, and outputting a segmentation result. The image segmentation model is a model obtained by training with sample images and comprises a feature decomposition model and a feature clustering model: a dimension-reduced feature space is obtained through decomposition by the feature decomposition model, a clustering result is obtained through the feature clustering model, a clustering loss value is determined according to the clustering result, and the model parameters of the feature decomposition model are adjusted accordingly. After the image features of a sample image are decomposed, the dimension-reduced feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the parameters of the feature decomposition model are adjusted, so that the dimension-reduced feature space produced by the feature decomposition model is more amenable to clustering, which improves the accuracy of image segmentation.

Description

Image segmentation method, training method, device and equipment of image segmentation model
Technical Field
The embodiments of the present application relate to the field of artificial intelligence, and in particular to an image segmentation method, and a training method, apparatus and device for an image segmentation model.
Background
Image segmentation is a technique for performing self-supervised segmentation according to the texture features of an image. Illustratively, when the technique is applied to the medical field, an acquired medical image is input into an image segmentation model, and a medical image divided into image blocks is output, where the image blocks divide the medical image into a normal region and a suspected lesion region.
In the related art, the core of image segmentation is decomposing and clustering the image feature space: a decomposition model decomposes the feature matrix of the image into a base matrix and a coefficient matrix, and a clustering model clusters the coefficient matrix to segment the image. During its own training, the decomposition model computes a loss from the base matrix and the coefficient matrix obtained by decomposition, and optimizes its parameters accordingly.
However, the dimension-reduced feature space obtained by decomposition with a decomposition model trained in this way adapts poorly to the clustering model, so the clustering effect of the clustering model is poor and the accuracy of the image segmentation result is low.
Disclosure of Invention
The embodiments of the present application provide an image segmentation method, and a training method, apparatus and device for an image segmentation model, which can solve the problem that, in the image feature decomposition process, the dimension-reduced feature space obtained by decomposition adapts poorly to the clustering model, the clustering effect of the clustering model is poor, and the accuracy of the image segmentation result is low. The technical solution is as follows:
in one aspect, an image segmentation method is provided, and the method includes:
acquiring a target image, wherein the target image is an image whose content is to be segmented;
inputting the target image into an image segmentation model, and outputting a segmentation result of the target image;
wherein the image segmentation model is a model obtained by training with sample images, and the image segmentation model comprises a feature decomposition model and a feature clustering model; the sample image is decomposed by the feature decomposition model to obtain a dimension-reduced feature space, a clustering result is obtained through the feature clustering model, a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
In another aspect, a method for training an image segmentation model is provided, the method including:
extracting image features of sample images, wherein the sample images are images used for training the image segmentation model, and the image segmentation model comprises a feature decomposition model and a feature clustering model, wherein the feature decomposition model is used for carrying out dimension reduction processing on the image features, and the feature clustering model is used for clustering a feature space after dimension reduction;
inputting the image features into the feature decomposition model to obtain a dimension reduction feature space;
inputting a spatial feature matrix corresponding to the dimensionality reduction feature space into the feature clustering model to obtain an attractor matrix, wherein the attractor matrix is used for representing an image segmentation result of the sample image;
determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix;
and adjusting the model parameters of the characteristic decomposition model through the clustering loss value.
In another aspect, an image segmentation apparatus is provided, the apparatus comprising:
the acquisition module is used for acquiring a target image, wherein the target image is an image whose content is to be segmented;
the input module is used for inputting the target image into an image segmentation model and outputting a segmentation result of the target image;
wherein the image segmentation model is a model obtained by training with sample images, and the image segmentation model comprises a feature decomposition model and a feature clustering model; the sample image is decomposed by the feature decomposition model to obtain a dimension-reduced feature space, a clustering result is obtained through the feature clustering model, a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
In another aspect, an apparatus for training an image segmentation model is provided, the apparatus including:
the extraction module is used for extracting image features of a sample image, wherein the sample image is an image used for training the image segmentation model, and the image segmentation model comprises a feature decomposition model and a feature clustering model, the feature decomposition model being used for performing dimension reduction on the image features and the feature clustering model being used for clustering the feature space after dimension reduction;
the input module is used for inputting the image characteristics into the characteristic decomposition model to obtain a dimension reduction characteristic space;
the input module is further configured to input a spatial feature matrix corresponding to the dimensionality reduction feature space into the feature clustering model to obtain an attractor matrix, where the attractor matrix is used to represent an image segmentation result of the sample image;
the determining module is used for determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix;
and the adjusting module is used for adjusting the model parameters of the characteristic decomposition model through the clustering loss values.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the image segmentation method or the training method of the image segmentation model as provided in the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by a processor to implement the image segmentation method or the training method of the image segmentation model as provided in the embodiments of the present application.
In another aspect, a computer program product is provided, which when run on a computer causes the computer to perform the image segmentation method or the training method of the image segmentation model as provided in the embodiments of the present application described above.
The beneficial effects of the technical solutions provided in the embodiments of the present application at least include:
after the image features of the sample image are decomposed, the dimension-reduced feature space obtained by decomposition is clustered, a clustering loss value is calculated according to the clustering result, and the parameters of the feature decomposition model are adjusted in combination with the clustering loss value, so that the dimension-reduced feature space produced by the feature decomposition model is optimized and becomes more amenable to clustering, improving the accuracy of image segmentation.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image segmentation process provided by an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for training an image segmentation model provided by an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a process for clustering a spatial feature matrix provided based on the embodiment shown in FIG. 2;
FIG. 4 is a flowchart of a method for training an image segmentation model provided by another exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a semantic loss value acquisition process provided based on the embodiment shown in FIG. 4;
FIG. 6 is a schematic diagram of a process for training a feature decomposition model according to an exemplary embodiment of the present application;
FIG. 7 is a flowchart of a method for training an image segmentation model provided by another exemplary embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a training process of an image segmentation model provided by another exemplary embodiment of the present application;
FIG. 9 is a flowchart of an image segmentation method provided by an exemplary embodiment of the present application;
fig. 10 is a block diagram of an image segmentation apparatus according to an exemplary embodiment of the present application;
fig. 11 is a block diagram of an image segmentation apparatus according to another exemplary embodiment of the present application;
FIG. 12 is a block diagram illustrating an exemplary embodiment of an apparatus for training an image segmentation model;
fig. 13 is a block diagram of a terminal according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are briefly described:
image segmentation: the method refers to a technology of performing self-supervision segmentation according to the texture features of an image, and optionally, when the image is segmented by the image segmentation technology, only parts with different texture features in the image are distinguished according to the texture features, and label labeling is not performed on the segmented parts. In an exemplary embodiment, after the image a is divided by the image division model, the image a is divided into a region 1, a region 2, a region 3, and a region 4, and the region 1, the region 2, the region 3, and the region 4 are 4 regions divided for texture features in the image a. Alternatively, the image segmentation technology can be applied to the segmentation task of images such as natural images and medical images.
Artificial Intelligence (AI): a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the ability to perceive, reason and make decisions. Artificial intelligence is a broad discipline spanning both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
In the related art, when an image is segmented, feature extraction is first performed on the image through a trained Convolutional Neural Network (CNN). Illustratively, the feature map obtained by applying l convolution layers to image i is

$$A_i^l \in \mathbb{R}^{h \times w \times c}$$

where h is the length of the feature map, w is the width of the feature map, and c is the number of channels of the feature map at that layer. The feature maps of the n images are stacked to obtain a matrix A:

$$A \in \mathbb{R}^{(n \cdot h \cdot w) \times c}$$
The matrix A is decomposed by non-negative matrix factorization into an H matrix and a W matrix:

$$A \approx HW, \qquad H_{ij} \ge 0, \quad W_{ij} \ge 0$$

$$H \in \mathbb{R}^{(n \cdot h \cdot w) \times k}, \qquad W \in \mathbb{R}^{k \times c}$$

where W is a base matrix consisting of k bases, H is a coefficient matrix, H_ij denotes the element in row i and column j of the H matrix, W_ij denotes the element in row i and column j of the W matrix, and k is the maximum number of regions into which the image is segmented, i.e., the image can be segmented into at most k parts by the image segmentation model.
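As a concrete illustration of this decomposition, the following is a minimal NumPy sketch. The multiplicative-update solver (a standard Lee-Seung scheme) and the random initialization are assumptions made for the example; the patent does not prescribe a particular NMF solver.

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix A of shape ((n*h*w), c) into
    H ((n*h*w), k) and W (k, c) such that A ~ H @ W, with H, W >= 0."""
    rng = np.random.default_rng(0)
    n_rows, c = A.shape
    H = rng.random((n_rows, k))  # coefficient matrix: one row per pixel position
    W = rng.random((k, c))       # base matrix consisting of k bases
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates keep both factors non-negative
        H *= (A @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ A) / (H.T @ H @ W + eps)
    return H, W
```

Here each row of H carries the k combination coefficients of one pixel position, matching the coefficient-matrix role described above.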
Referring to fig. 1, schematically, after an image 101, an image 102, and an image 103 are input to a convolutional neural network 110, an image feature 111, an image feature 112, and an image feature 113 are output, the image feature 111, the image feature 112, and the image feature 113 are spliced to obtain a matrix a120, and after the matrix a120 is decomposed, a matrix H and a matrix W are obtained, thereby generating a segmented image 121, an image 122, and an image 123.
However, when an image is segmented in this way, spatial position in the image is represented only by the magnitudes of the coefficient values, i.e., only the heat maps of the respective combination coefficients are used as the spatial segmentation result, so the segmentation result is rough and of low accuracy.
With reference to the above description, a training method for an image segmentation model provided in an embodiment of the present application is described, taking application of the method to a terminal as an example. As shown in fig. 2, the method includes:
step 201, extracting image features of a sample image, where the sample image is an image used for training an image segmentation model, and the image segmentation model includes a feature decomposition model and a feature clustering model.
Optionally, the feature decomposition model is used for performing dimension reduction on the image features, and optionally, the feature decomposition model is used for performing dimension reduction on the image features through a non-negative matrix decomposition algorithm.
Optionally, the feature clustering model is used to cluster the feature space after dimensionality reduction, and optionally, the feature clustering model is used to cluster the feature space through a k-means clustering algorithm.
Optionally, the image features of the sample image are extracted through a trained convolutional neural network model. Optionally, the convolutional neural network comprises at least an input layer, a hidden layer and an output layer, where the hidden layer comprises convolutional layers, pooling layers, fully connected layers and the like; after the sample image is input into the convolutional neural network, it is processed by these layers in turn, and the image features of the sample image are output.
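As an illustration only, the feature-extraction step might look like the following PyTorch sketch, where a pretrained VGG backbone from torchvision stands in for the trained convolutional neural network; the specific backbone, weights, and input size are assumptions of the example, not requirements of this embodiment.

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

# Pretrained convolutional backbone used as the feature extractor (assumed choice)
backbone = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()

sample = torch.randn(1, 3, 224, 224)  # placeholder for one sample image
with torch.no_grad():
    feature_map = backbone(sample)    # shape (1, c, h, w), here (1, 512, 7, 7)
```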
Step 202, inputting the image features into a feature decomposition model to obtain a dimension reduction feature space.
Optionally, after the image features are input into the feature decomposition model, the image features are decomposed through the feature decomposition model according to a non-negative matrix decomposition algorithm, and a base matrix and a coefficient matrix corresponding to the base matrix are obtained through decomposition, wherein the coefficient matrix is a dimension reduction feature space obtained after dimension reduction is performed on the image features.
Optionally, Non-negative Matrix Factorization (NMF) is an algorithm that constrains all decomposed components to be non-negative while achieving nonlinear dimensionality reduction.
Optionally, each column vector (i.e., coefficient vector) in the coefficient matrix corresponds to a pixel point in the sample image, and each row vector in the coefficient matrix is used to represent a contribution value of a category corresponding to the pixel point. Optionally, the higher the contribution value of the category is, the higher the probability that the pixel is segmented into the category is.
And 203, inputting the spatial feature matrix corresponding to the dimensionality reduction feature space into the feature clustering model to obtain an attractor matrix.
Optionally, the attractor matrix is used to represent an image segmentation result of the sample image.
Optionally, the spatial feature matrix and the attractor matrix corresponding to the dimension-reduced feature space are matrices of n columns, where n is a positive integer. Optionally, the image segmentation model is configured to segment an image into k categories, where k is a positive integer, and the dimension-reduced feature space corresponds to a first dimension length h, a second dimension length w, and a third dimension length k, where w and h are positive integers. The number of columns n of the spatial feature matrix is determined by the product of the first dimension length h and the second dimension length w of the dimension-reduced feature space. Optionally, the first dimension length h represents the length (in pixels) of the feature image of the sample image and the second dimension length w represents the width (in pixels) of the feature image, so the feature image contains h × w pixels; the spatial feature matrix then has n (n = h × w) columns, each column corresponding to one pixel in the feature image. Optionally, the third dimension length k is taken as the number of rows k of the spatial feature matrix.
Optionally, after a spatial feature matrix (n columns and k rows of matrices) corresponding to the dimension-reduced feature space is determined, the spatial feature matrix is input into a feature clustering model and output to obtain an attractor matrix.
Optionally, the spatial feature matrix and the attractor matrix are both matrices of k rows and n columns.
Optionally, the feature clustering model clusters the feature space through a k-means clustering algorithm. Optionally, the k-means clustering algorithm is an iteratively solved cluster analysis algorithm: k objects are randomly selected as initial cluster centers, then the distance between each object and each seed cluster center is calculated, and each object is assigned to the closest cluster center. A cluster center and the objects assigned to it represent a cluster. Optionally, after every sample is assigned, the cluster center of each cluster is recalculated according to the objects currently in the cluster, and this process is repeated until a termination condition is satisfied; the termination condition may be, for example, that no objects are reassigned to different clusters, that no cluster center changes again, or that the sum of squared errors is locally minimal.
Optionally, for the coefficient matrix output by the feature decomposition model, k-means clustering is performed on the coefficient vectors in the coefficient matrix to obtain a classification result for the coefficient vectors, and the classification result is mapped onto the sample image according to the correspondence between the coefficient vectors and the pixels of the feature image, thereby segmenting the sample image into different regions and obtaining a soft segmentation result for the sample image.
Referring to fig. 3, schematically, the dimension-reduced feature space 310 is a matrix of (h × w) × k, the dimension-reduced feature space 310 is recombined to obtain a spatial feature matrix 320, the spatial feature matrix 320 is a matrix of n columns and k rows, where n columns are determined according to h × w in the dimension-reduced feature space, the spatial feature matrix 320 is clustered by a k-means clustering algorithm to obtain an attractor matrix 330 composed of n attractors, and the attractor matrix 330 is a matrix of n columns and k rows.
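The reshaping and clustering shown in fig. 3 can be sketched as follows; scikit-learn's KMeans is used as a stand-in implementation of the feature clustering model, which is an assumption of the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def attractors(R):
    """R: dimension-reduced feature space of shape (h, w, k).
    Returns the spatial feature matrix F (k rows, n = h*w columns) and
    the attractor matrix C (same shape), whose ith column is the k-means
    centroid assigned to the ith pixel."""
    h, w, k = R.shape
    F = R.reshape(h * w, k).T                      # recombine: k rows, n columns
    km = KMeans(n_clusters=k, n_init=10).fit(F.T)  # cluster the n column vectors
    C = km.cluster_centers_[km.labels_].T          # attractor for each column
    return F, C
```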
Step 204, determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix.
Optionally, a difference value between the ith column vector in the spatial feature matrix and the ith column attractor in the attractor matrix is determined, i is greater than 0 and less than or equal to n, and a clustering loss value is determined according to the sum of n difference values.
For example, the clustering loss value is calculated by formula one:

Formula one:

$$L_{clu} = \frac{1}{h \times w} \sum_{i=1}^{n} \left\| f_i - c_{s_i} \right\|_2^2$$

where L_clu denotes the clustering loss value, h × w is the number of columns of the spatial feature matrix, F = [f_1, f_2, ..., f_n], i.e., the spatial feature matrix has n columns, f_i is the ith column of the spatial feature matrix, and c_{s_i} is the attractor in the attractor matrix corresponding to s_i, the k-means cluster label of f_i.
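A direct NumPy transcription of formula one as reconstructed above (the squared Euclidean distance and the 1/(h × w) normalization are part of that reconstruction):

```python
import numpy as np

def cluster_loss(F, C):
    """L_clu: mean squared distance between each column f_i of the
    spatial feature matrix F and its attractor c_{s_i} in C."""
    n = F.shape[1]  # n = h * w columns
    return np.sum((F - C) ** 2) / n
```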
Step 205, adjusting the model parameters of the feature decomposition model through the clustering loss values.
Optionally, the model parameters of the feature decomposition model are correspondingly adjusted according to the clustering loss values, so that the dimension reduction feature space obtained by decomposing the feature decomposition model is more friendly to the clustering process.
Optionally, the clustering loss value may also be used to perform corresponding adjustment on the model parameters of the feature clustering model.
In summary, according to the training method for the image segmentation model provided in this embodiment, after the image features of the sample image are decomposed, the dimension reduction feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the parameter adjustment is performed on the feature decomposition model by combining the clustering loss value, so that the dimension reduction feature space obtained by decomposition of the feature decomposition model is optimized, the dimension reduction feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
Optionally, the model parameters of the feature decomposition model are further optimized by a semantic loss function. Fig. 4 is a flowchart of a training method for an image segmentation model according to another exemplary embodiment of the present application. As shown in fig. 4, the method includes:
step 401, extracting image features of a sample image, where the sample image is an image used for training an image segmentation model, and the image segmentation model includes a feature decomposition model and a feature clustering model.
Optionally, the feature decomposition model is used for performing dimension reduction on the image features, and optionally, the feature decomposition model is used for performing dimension reduction on the image features through a non-negative matrix decomposition algorithm.
Optionally, the feature clustering model is used to cluster the feature space after dimensionality reduction, and optionally, the feature clustering model is used to cluster the feature space through a k-means clustering algorithm.
Step 402, inputting the image features into a feature decomposition model to obtain a dimension reduction feature space.
Optionally, after the image features are input into the feature decomposition model, the image features are decomposed through the feature decomposition model according to a non-negative matrix decomposition algorithm, and a base matrix and a coefficient matrix corresponding to the base matrix are obtained through decomposition, wherein the coefficient matrix is a dimension reduction feature space obtained after dimension reduction is performed on the image features.
And 403, inputting the spatial feature matrix corresponding to the dimensionality reduction feature space into the feature clustering model to obtain an attractor matrix.
Optionally, the attractor matrix is used to represent an image segmentation result of the sample image.
Optionally, the spatial feature matrix and the attractor matrix corresponding to the dimension-reduced feature space are n columns of matrices, and n is a positive integer. Optionally, the image segmentation model is configured to segment an image in k categories, where k is a positive integer, the dimension-reduced feature space corresponds to a first dimension length h, a second dimension length w, and a third dimension length k, and w and h are positive integers, and then the column number n of the spatial feature matrix is determined according to a product of the first dimension length h and the second dimension length w of the dimension-reduced feature space.
Optionally, after a spatial feature matrix (n columns and k rows of matrices) corresponding to the dimension-reduced feature space is determined, the spatial feature matrix is input into a feature clustering model and output to obtain an attractor matrix.
Optionally, the spatial feature matrix and the attractor matrix are both matrices of k rows and n columns.
Step 404, determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix.
Optionally, a difference value between the ith column vector in the spatial feature matrix and the ith column attractor in the attractor matrix is determined, i is greater than 0 and less than or equal to n, and a clustering loss value is determined according to the sum of n difference values.
And 405, adjusting model parameters of the characteristic decomposition model through the clustering loss value.
Optionally, the model parameters of the feature decomposition model are correspondingly adjusted according to the clustering loss values, so that the dimension reduction feature space obtained by decomposing the feature decomposition model is more friendly to the clustering process.
Optionally, the clustering loss value may also be used to perform corresponding adjustment on the model parameters of the feature clustering model.
Step 406, determining a semantic loss value based on the difference between the self-segmentation results of the k channels and the image segmentation result.
Optionally, the sample image is an image labeled with an image segmentation result, and the dimension reduction feature space includes self-segmentation results of k channels corresponding to k classes.
Optionally, the image segmentation result of the sample image annotation may be a result of manual annotation, or may be a feature map extracted by a pre-trained feature extractor, such as: and extracting the obtained feature map through a Visual Geometry Group (VGG) network.
Optionally, the feature decomposition model further outputs semantic bases; optionally, the semantic bases are the base matrix output by the feature decomposition model. The semantic loss value is determined by taking the products of the k self-segmentation results and their corresponding semantic bases, and then taking the difference between these k products and the image segmentation result. For example, the semantic loss value is calculated by formula two:

Formula two:

$$L_{SC} = \sum_{u,v} \left\| V(u,v) - \sum_{k} R(k,u,v) \, w_k \right\|^2$$

where L_SC denotes the semantic loss value, V(u,v) denotes the image segmentation result (such as a feature map extracted by a pre-trained Visual Geometry Group (VGG) network), R(k,u,v) denotes the self-segmentation results of the k channels, and w_k denotes the k sets of semantic bases.
Optionally, the k sets of semantic bases are required to be orthogonal. Optionally, the product matrix of a first matrix corresponding to the semantic bases and the transpose of that first matrix is determined, and an orthogonal loss value is determined according to the difference between the product matrix and an identity matrix. Optionally, the identity matrix is determined by the number of sets of semantic bases; for example, when w_k represents k sets of semantic bases, the identity matrix is the identity matrix of order k. Optionally, the semantic bases are adjusted according to the orthogonal loss value.
Illustratively, the orthogonal loss value is calculated by formula three:

Formula three:

$$L_{on} = \left\| \widehat{W} \widehat{W}^{T} - I_k \right\|^2$$

where L_on denotes the orthogonal loss value, $\widehat{W}$ is the first matrix corresponding to the k sets of semantic bases, $\widehat{W}^{T}$ is the transpose of the first matrix, and I_k is the identity matrix of order k.
As an illustration, taking a feature map extracted by a pre-trained feature extractor as the image segmentation result, refer to fig. 5: an image 510 is input into both the pre-trained feature extractor 520 and the feature decomposition model 530; the feature extractor 520 outputs an image segmentation result 521, the feature decomposition model 530 outputs a self-segmentation result 531 in combination with the semantic bases 541 and the orthogonal loss value 542, and the semantic loss value L_SC is determined from the image segmentation result 521 and the self-segmentation result 531.
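For illustration, the semantic loss of formula two and the orthogonal loss of formula three can be sketched in NumPy as follows; the tensor shapes, and the summed-reconstruction reading of formula two, are assumptions consistent with the notation above.

```python
import numpy as np

def semantic_loss(V, R, Wk):
    """L_SC (formula two): V is the reference segmentation / feature map,
    shape (h, w, c); R holds the k self-segmentation maps, shape (k, h, w);
    Wk holds the k sets of semantic bases, shape (k, c)."""
    recon = np.einsum('khw,kc->hwc', R, Wk)  # sum_k R(k,u,v) * w_k
    return np.sum((V - recon) ** 2)

def orthogonal_loss(Wk):
    """L_on (formula three): penalize deviation of Wk @ Wk.T from the
    identity, pushing the k semantic bases toward orthogonality."""
    k = Wk.shape[0]
    return np.sum((Wk @ Wk.T - np.eye(k)) ** 2)
```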
Step 407, adjusting the model parameters of the feature decomposition model through the semantic loss value.
Optionally, a first model parameter of the feature decomposition model is adjusted by a clustering loss value, and a second model parameter of the feature decomposition model is adjusted by a semantic loss value, where the first model parameter and the second model parameter may be the same parameter, or different parameters, or partially the same parameter, and this is not limited in this embodiment of the present application.
In summary, according to the training method for the image segmentation model provided in this embodiment, after the image features of the sample image are decomposed, the dimension reduction feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the parameter adjustment is performed on the feature decomposition model by combining the clustering loss value, so that the dimension reduction feature space obtained by decomposition of the feature decomposition model is optimized, the dimension reduction feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
According to the method provided by the embodiment, the feature decomposition model is trained through the clustering loss function and the semantic loss function, so that the decomposition accuracy of the feature decomposition model is improved, and the adaptability of the dimension reduction feature space obtained by decomposing the feature decomposition model to clustering is improved.
In an optional embodiment, in the process of training the feature decomposition model, the model parameters of the feature decomposition model are adjusted through a semantic loss function, a central loss function and a clustering loss function. For example, referring to fig. 6, a sample image 610 is input into the feature decomposition model 620, which outputs a dimension-reduced feature space R, and the feature decomposition model 620 is then trained according to the dimension-reduced feature space R, the semantic loss function, the central loss function and the clustering loss function.
Fig. 7 is a flowchart of a training method for an image segmentation model according to another exemplary embodiment of the present application, described taking application of the method to a terminal as an example. As shown in fig. 7, the method includes:
step 701, extracting image features of a sample image, wherein the sample image is an image used for training an image segmentation model, and the image segmentation model comprises a feature decomposition model and a feature clustering model.
Optionally, the feature decomposition model is used for performing dimension reduction on the image features, and optionally, the feature decomposition model is used for performing dimension reduction on the image features through a non-negative matrix decomposition algorithm.
Optionally, the feature clustering model is used to cluster the feature space after dimensionality reduction, and optionally, the feature clustering model is used to cluster the feature space through a k-means clustering algorithm.
Step 702, inputting the image characteristics into a characteristic decomposition model to obtain a dimension reduction characteristic space.
Optionally, after the image features are input into the feature decomposition model, the image features are decomposed through the feature decomposition model according to a non-negative matrix decomposition algorithm, and a base matrix and a coefficient matrix corresponding to the base matrix are obtained through decomposition, wherein the coefficient matrix is a dimension reduction feature space obtained after dimension reduction is performed on the image features.
And 703, inputting the spatial feature matrix corresponding to the dimensionality reduction feature space into the feature clustering model to obtain an attractor matrix.
Optionally, the attractor matrix is used to represent an image segmentation result of the sample image.
Step 704, determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix.
Optionally, a difference value between the ith column vector in the spatial feature matrix and the ith column attractor in the attractor matrix is determined, i is greater than 0 and less than or equal to n, and a clustering loss value is determined according to the sum of n difference values.
Step 705, adjusting the model parameters of the feature decomposition model through the clustering loss values.
Optionally, the model parameters of the feature decomposition model are correspondingly adjusted according to the clustering loss values, so that the dimension reduction feature space obtained by decomposing the feature decomposition model is more friendly to the clustering process.
Optionally, the clustering loss value may also be used to perform corresponding adjustment on the model parameters of the feature clustering model.
Step 706, determining a semantic loss value according to the difference between the self-segmentation results of the k channels and the image segmentation result.
Optionally, the sample image is an image labeled with an image segmentation result, and the dimension reduction feature space includes self-segmentation results of k channels corresponding to k classes.
And step 707, adjusting the model parameters of the feature decomposition model through the semantic loss value.
Optionally, a first model parameter of the feature decomposition model is adjusted by a clustering loss value, and a second model parameter of the feature decomposition model is adjusted by a semantic loss value, where the first model parameter and the second model parameter may be the same parameter, or different parameters, or partially the same parameter, and this is not limited in this embodiment of the present application.
Step 708, determining the centroid position of each channel according to the self-segmentation results of the k channels.
Optionally, the centroid position refers to the point at which the mass of the channel is concentrated. Optionally, the centroid position is calculated by formulas four to six:

Formula four:

$$\bar{u}_k = \frac{\sum_{u,v,z} u \cdot R(k,u,v,z)}{\sum_{u,v,z} R(k,u,v,z)}$$

Formula five:

$$\bar{v}_k = \frac{\sum_{u,v,z} v \cdot R(k,u,v,z)}{\sum_{u,v,z} R(k,u,v,z)}$$

Formula six:

$$\bar{z}_k = \frac{\sum_{u,v,z} z \cdot R(k,u,v,z)}{\sum_{u,v,z} R(k,u,v,z)}$$

where the centroid position of the kth channel is $(\bar{u}_k, \bar{v}_k, \bar{z}_k)$ and R(k,u,v,z) is the value at (u,v,z) in the kth channel of the dimension-reduced feature space.
Step 709, calculating a center loss value according to the centroid position of each channel.
Optionally, the spatial variance of each channel is calculated and summed to obtain the central loss value.
Optionally, the central loss value is calculated by formula seven:

Formula seven:

$$L_{con} = \sum_{k} \frac{\sum_{u,v,z} R(k,u,v,z) \left\| (u,v,z) - (\bar{u}_k, \bar{v}_k, \bar{z}_k) \right\|^2}{\sum_{u,v,z} R(k,u,v,z)}$$

where L_con denotes the central loss value and $(\bar{u}_k, \bar{v}_k, \bar{z}_k)$ denotes the centroid position in the kth channel.
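The centroid positions of formulas four to six and the central loss of formula seven, in the mass-weighted-variance reading reconstructed above, can be sketched as follows:

```python
import numpy as np

def center_loss(R):
    """L_con (formula seven): sum over channels of the mass-weighted
    spatial variance of R(k, u, v, z) around each channel's centroid
    (formulas four to six)."""
    k = R.shape[0]
    # coordinate grids for (u, v, z), each of shape R.shape[1:]
    grids = np.meshgrid(*[np.arange(s) for s in R.shape[1:]], indexing='ij')
    loss = 0.0
    for i in range(k):
        mass = R[i].sum() + 1e-9                              # avoid divide-by-zero
        centroid = [(g * R[i]).sum() / mass for g in grids]   # (u_bar, v_bar, z_bar)
        loss += sum(((g - c) ** 2 * R[i]).sum() / mass        # weighted variance
                    for g, c in zip(grids, centroid))
    return loss
```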
And 710, adjusting the model parameters of the characteristic decomposition model through the central loss value.
Optionally, a first model parameter of the feature decomposition model is adjusted by a clustering loss value, a second model parameter of the feature decomposition model is adjusted by a semantic loss value, and a third model parameter of the feature decomposition model is adjusted by a central loss value, where the first model parameter, the second model parameter, and the third model parameter may be the same parameter, or different parameters, or partially the same parameter, and the parameters are not limited in this embodiment of the present application.
In summary, according to the training method for the image segmentation model provided in this embodiment, after the image features of the sample image are decomposed, the dimension reduction feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the parameter adjustment is performed on the feature decomposition model by combining the clustering loss value, so that the dimension reduction feature space obtained by decomposition of the feature decomposition model is optimized, the dimension reduction feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
According to the method provided by the embodiment, the feature decomposition model is trained through the semantic loss function, the central loss function and the clustering loss function, so that the decomposition accuracy of the feature decomposition model is improved, and the adaptability of the dimensionality reduction feature space obtained by decomposing the feature decomposition model to clustering is improved.
Fig. 8 is a schematic diagram of a self-supervised training process for an image segmentation model according to an exemplary embodiment of the present application. As shown in fig. 8, a sample image 810 is input into an image segmentation model 820, which outputs a segmented target image 830, and the image segmentation model 820 is trained according to the target image and the geometric distribution sub-module 841, the argument sub-module 842 and the semantic consistency sub-module 843 in a self-supervised training module 840.
Optionally, in combination with the above self-supervised training method for the image segmentation model, an image segmentation method provided in an embodiment of the present application is now described. Fig. 9 is a flowchart of the image segmentation method provided by an exemplary embodiment of the present application, described taking application of the method to a terminal as an example. As shown in fig. 9, the method includes:
step 901, obtaining a target image, where the target image is an image to be segmented of image content.
And step 902, inputting the target image into the image segmentation model, and outputting the segmentation result of the target image.
Optionally, the image segmentation model is an image obtained by training a sample image, and the image segmentation model includes a feature decomposition model and a feature clustering model, wherein after the sample image is decomposed by the feature decomposition model to obtain a dimension reduction feature space, a clustering result is obtained by the feature clustering model, and a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
Optionally, the training process of the image segmentation model refers to the training method of the image segmentation model shown in fig. 2, fig. 4 and fig. 7.
In summary, in the image segmentation method provided in this embodiment, after the image features of the sample image are decomposed, the reduced-dimension feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the feature decomposition model is subjected to parameter adjustment in combination with the clustering loss value, so that the reduced-dimension feature space obtained by decomposition of the feature decomposition model is optimized, the reduced-dimension feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
Fig. 10 is a block diagram of an image segmentation apparatus according to an exemplary embodiment of the present application, and as shown in fig. 10, the image segmentation apparatus is described as being applied to a terminal, and the image segmentation apparatus includes: an acquisition module 1010 and an input module 1020;
an obtaining module 1010, configured to obtain a target image, where the target image is an image whose content is to be segmented;
an input module 1020, configured to input the target image into an image segmentation model, and output a segmentation result of the target image;
the image segmentation model is a model obtained by training with sample images, and the image segmentation model comprises a feature decomposition model and a feature clustering model; the sample image is decomposed by the feature decomposition model to obtain a dimension-reduced feature space, a clustering result is obtained through the feature clustering model, a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
In an alternative embodiment, as shown in fig. 11, the apparatus further comprises:
an extracting module 1030, configured to extract image features of the sample image, where the feature decomposition model is configured to perform dimension reduction on the image features, and the feature clustering model is configured to cluster feature spaces after dimension reduction;
the input module 1020 is further configured to input the image feature into the feature decomposition model to obtain the dimension reduction feature space;
the input module 1020 is further configured to input a spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model to obtain an attractor matrix, where the attractor matrix is used to represent an image segmentation result of the sample image;
a determining module 1040, configured to determine the clustering loss value according to a difference between the spatial feature matrix and the attractor matrix;
an adjusting module 1050, configured to adjust the model parameter of the feature decomposition model according to the clustering loss value.
In an optional embodiment, the spatial feature matrix and the attractor matrix corresponding to the dimension-reduced feature space are n columns of matrices, where n is a positive integer;
the determining module 1040 is further configured to determine a difference between an ith column vector in the spatial feature matrix and an ith column attractor in the attractor matrix, where i is greater than 0 and less than or equal to n; and determining the clustering loss value according to the sum of the n difference values.
In an optional embodiment, the image segmentation model is configured to segment an image in k categories, where k is a positive integer, the dimension-reduced feature space corresponds to a first dimension length h, a second dimension length w, and a third dimension length k, and w and h are positive integers;
the determining module 1040 is further configured to determine the number of columns n of the spatial feature matrix according to a product of a first dimension length h and a second dimension length w of the dimension-reduced feature space; determining the third dimension length k as the number of rows k of the space characteristic matrix;
the input module 1020 is further configured to input the spatial feature matrix into the feature clustering model.
In an optional embodiment, the sample image is an image labeled with an image segmentation result, and the dimension reduction feature space includes self-segmentation results of k channels corresponding to the k classes;
the determining module 1040 is further configured to determine a semantic loss value according to a difference between the self-segmentation result of the k channels and the image segmentation result;
the adjusting module 1050 is further configured to adjust the model parameters of the feature decomposition model according to the semantic loss value.
In an optional embodiment, the feature decomposition model further outputs a semantic base;
the determining module 1040 is further configured to determine a product between the self-segmentation result of the k channels and the corresponding semantic bases; determining the semantic loss value based on a difference between the k products and the image segmentation result.
In an optional embodiment, the determining module 1040 is further configured to determine a product matrix of a first matrix corresponding to the semantic basis and a transposed matrix of the first matrix; determining an orthogonal loss value according to the difference between the product matrix and the unit matrix;
the adjusting module 1050 is further configured to adjust the semantic base according to the orthogonality loss value.
In an optional embodiment, the determining module 1040 is further configured to determine a centroid position of each channel according to the self-segmentation result of the k channels; calculating a central loss value from the centroid position of each channel;
the adjusting module 1050 is further configured to adjust the model parameters of the feature decomposition model according to the central loss value.
In summary, in the image segmentation apparatus provided in this embodiment, after the image features of the sample image are decomposed, the reduced-dimension feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the feature decomposition model is subjected to parameter adjustment in combination with the clustering loss value, so that the reduced-dimension feature space obtained by decomposition of the feature decomposition model is optimized, the reduced-dimension feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
Fig. 12 is a block diagram of a structure of an apparatus for training an image segmentation model according to an exemplary embodiment of the present application, and as shown in fig. 12, the apparatus is described as being applied to a terminal, and the apparatus includes: an extraction module 1210, an input module 1220, a determination module 1230, and an adjustment module 1240;
an extracting module 1210, configured to extract image features of a sample image, where the sample image is an image used for training the image segmentation model, and the image segmentation model includes a feature decomposition model and a feature clustering model, where the feature decomposition model is used to perform dimension reduction on the image features, and the feature clustering model is used to cluster a feature space after dimension reduction;
an input module 1220, configured to input the image feature into the feature decomposition model, so as to obtain a dimension reduction feature space;
the input module 1220 is further configured to input a spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model to obtain an attractor matrix, where the attractor matrix is used to represent an image segmentation result of the sample image;
a determining module 1230, configured to determine a clustering loss value according to a difference between the spatial feature matrix and the attractor matrix;
and the adjusting module 1240 is used for adjusting the model parameters of the feature decomposition model through the clustering loss values.
In summary, in the training device for the image segmentation model provided in this embodiment, after the image features of the sample image are decomposed, the dimension reduction feature space obtained by decomposition is clustered, the clustering loss value is calculated according to the clustering result, and the parameter adjustment is performed on the feature decomposition model in combination with the clustering loss value, so that the dimension reduction feature space obtained by decomposition of the feature decomposition model is optimized, the dimension reduction feature space obtained by decomposition of the feature decomposition model is more beneficial to clustering, and the accuracy of image segmentation is improved.
It should be noted that: the image segmentation apparatus or the training apparatus for image segmentation models provided in the above embodiments are only illustrated by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the image segmentation apparatus or the training apparatus for the image segmentation model provided in the above embodiments and the image segmentation method or the training method for the image segmentation model provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments, and are not described herein again.
Fig. 13 is a block diagram illustrating a terminal 1300 according to an exemplary embodiment of the present application. The terminal 1300 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1300 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
In general, terminal 1300 includes: a processor 1301 and a memory 1302.
The processor 1301 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 1301 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1301 may also include a main processor and a coprocessor: the main processor processes data in the awake state and is also called a CPU (Central Processing Unit); the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1301 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1301 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 1302 may include one or more computer-readable storage media, which may be non-transitory. The memory 1302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1302 is used to store at least one instruction for execution by the processor 1301 to implement the image segmentation method or the training method of the image segmentation model provided by the method embodiments herein.
In some embodiments, terminal 1300 may further optionally include: a peripheral interface 1303 and at least one peripheral. Processor 1301, memory 1302, and peripheral interface 1303 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral device interface 1303 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1304, touch display 1305, camera 1306, audio circuitry 1307, positioning component 1308, and power supply 1309.
Peripheral interface 1303 may be used to connect at least one peripheral associated with I/O (Input/Output) to processor 1301 and memory 1302. In some embodiments, processor 1301, memory 1302, and peripheral interface 1303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1301, the memory 1302, and the peripheral device interface 1303 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1304 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1304 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1304 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1304 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1304 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 1305 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 1305 is a touch display screen, it can also capture touch signals on or over its surface; such a touch signal may be input to the processor 1301 as a control signal for processing. In this case, the display screen 1305 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1305, provided on the front panel of the terminal 1300; in other embodiments, there may be at least two display screens 1305, disposed on different surfaces of the terminal 1300 or in a folded design; in still other embodiments, the display screen 1305 may be a flexible display disposed on a curved or folded surface of the terminal 1300. The display screen 1305 may even be arranged in a non-rectangular irregular shape, i.e., an irregularly-shaped screen. The display screen 1305 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1306 is used to capture images or video. Optionally, the camera assembly 1306 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that a background blurring function can be realized by fusing the main camera and the depth-of-field camera, and panoramic shooting, VR (Virtual Reality) shooting, and other fusion shooting functions can be realized by fusing the main camera and the wide-angle camera. In some embodiments, the camera assembly 1306 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1307 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 1301 for processing or to the radio frequency circuit 1304 for voice communication. For stereo capture or noise reduction, multiple microphones may be provided at different locations of the terminal 1300. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker converts electrical signals from the processor 1301 or the radio frequency circuit 1304 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1307 may also include a headphone jack.
The positioning component 1308 is used to determine the current geographic position of the terminal 1300 to implement navigation or LBS (Location Based Service). The positioning component 1308 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1309 is used to supply power to the components in the terminal 1300. The power supply 1309 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1309 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
In some embodiments, terminal 1300 also includes one or more sensors 1310. The one or more sensors 1310 include, but are not limited to: acceleration sensor 1311, gyro sensor 1312, pressure sensor 1313, fingerprint sensor 1314, optical sensor 1315, and proximity sensor 1316.
The acceleration sensor 1311 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 1300. For example, the acceleration sensor 1311 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1301 may control the touch display screen 1305 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1311. The acceleration sensor 1311 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1312 may detect the body direction and the rotation angle of the terminal 1300, and the gyro sensor 1312 may cooperate with the acceleration sensor 1311 to acquire a 3D motion of the user with respect to the terminal 1300. Processor 1301, based on the data collected by gyroscope sensor 1312, may perform the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 1313 may be disposed on a side bezel of the terminal 1300 and/or under the touch display screen 1305. When the pressure sensor 1313 is disposed on the side bezel, it can detect the user's grip signal on the terminal 1300, and the processor 1301 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1313. When the pressure sensor 1313 is disposed under the touch display screen 1305, the processor 1301 controls operable controls on the UI according to the user's pressure operation on the touch display screen 1305. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 1314 is used for collecting the fingerprint of the user, and the processor 1301 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 1314, or the fingerprint sensor 1314 identifies the identity of the user according to the collected fingerprint. When the identity of the user is identified as a trusted identity, the processor 1301 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1314 may be disposed on the front, back, or side of the terminal 1300. When a physical button or vendor Logo is provided on the terminal 1300, the fingerprint sensor 1314 may be integrated with the physical button or vendor Logo.
The optical sensor 1315 is used to collect the ambient light intensity. In one embodiment, the processor 1301 can control the display brightness of the touch display screen 1305 according to the intensity of the ambient light collected by the optical sensor 1315. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1305 is increased; when the ambient light intensity is low, the display brightness of the touch display 1305 is turned down. In another embodiment, the processor 1301 can also dynamically adjust the shooting parameters of the camera assembly 1306 according to the ambient light intensity collected by the optical sensor 1315.
The proximity sensor 1316, also known as a distance sensor, is typically disposed on the front panel of the terminal 1300. The proximity sensor 1316 is used to measure the distance between the user and the front face of the terminal 1300. In one embodiment, when the proximity sensor 1316 detects that the distance between the user and the front face of the terminal 1300 gradually decreases, the processor 1301 controls the touch display screen 1305 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1316 detects that the distance gradually increases, the processor 1301 controls the touch display screen 1305 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 13 is not intended to be limiting with respect to terminal 1300 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method of image segmentation, the method comprising:
acquiring a target image, wherein the target image is an image whose image content is to be segmented;
inputting the target image into an image segmentation model and outputting a segmentation result of the target image;
wherein the image segmentation model is a model obtained by training with a sample image, the image segmentation model comprises a feature decomposition model and a feature clustering model, the sample image is decomposed by the feature decomposition model to obtain a dimension-reduced feature space, a clustering result is obtained by the feature clustering model, a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
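As a usage illustration only, the inference path of claim 1 reduces to a single forward pass through a trained model; the tiny stacked-convolution network below is a hypothetical stand-in for the patented model, not the model itself:

```python
import torch
import torch.nn as nn

# Hedged sketch of claim 1: a target image enters a trained image
# segmentation model and a per-pixel segmentation result comes out.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 8, kernel_size=1))   # k = 8 classes (assumed)
target_image = torch.rand(1, 3, 256, 256)                # image to be segmented
with torch.no_grad():
    reduced = model(target_image)                        # dimension-reduced feature space
    segmentation = reduced.argmax(dim=1)                 # (1, 256, 256) class indices
```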
2. The method of claim 1, wherein the training process of the image segmentation model comprises:
extracting image features of the sample image, wherein the feature decomposition model is used for performing dimension reduction processing on the image features, and the feature clustering model is used for clustering the dimension-reduced feature space;
inputting the image features into the feature decomposition model to obtain the dimension-reduced feature space;
inputting a spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model to obtain an attractor matrix, wherein the attractor matrix is used for representing an image segmentation result of the sample image;
determining the clustering loss value according to the difference between the spatial feature matrix and the attractor matrix;
and adjusting the model parameters of the feature decomposition model through the clustering loss value.
3. The method according to claim 2, wherein the spatial feature matrix and the attractor matrix corresponding to the dimension-reduced feature space are each a matrix with n columns, n being a positive integer;
the determining the clustering loss value according to the difference between the spatial feature matrix and the attractor matrix comprises:
determining the difference value between the ith column vector in the spatial feature matrix and the ith column attractor in the attractor matrix, wherein 0 < i ≤ n;
and determining the clustering loss value according to the sum of the n difference values.
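For illustration, claim 3's computation can be sketched in a few NumPy lines; the squared Euclidean norm as the per-column "difference value" is an assumption, since the claim leaves the exact difference measure open:

```python
import numpy as np

# Clustering loss per claim 3: sum the per-column differences between the
# k x n spatial feature matrix F and the k x n attractor matrix A.
k, n = 4, 6
F = np.random.rand(k, n)                      # spatial feature matrix
A = np.random.rand(k, n)                      # attractor matrix
# The claim's i runs 0 < i <= n (1-based); Python indexing is 0-based.
loss = sum(np.linalg.norm(F[:, i] - A[:, i]) ** 2 for i in range(n))
assert np.isclose(loss, np.sum((F - A) ** 2))  # vectorized equivalent
```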
4. The method of claim 2, wherein the image segmentation model is used to segment the image into k classes, and the dimension-reduced feature space corresponds to a first dimension length h, a second dimension length w, and a third dimension length k, where h, w, and k are positive integers;
inputting the spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model, including:
determining the number of columns n of the spatial feature matrix according to the product of the first dimension length h and the second dimension length w of the dimension-reduced feature space;
determining the third dimension length k as the number of rows of the spatial feature matrix;
and inputting the spatial feature matrix into the feature clustering model.
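The reshaping in claim 4 can be sketched directly; the (h, w, k) memory layout of the dimension-reduced feature space is an assumption for illustration:

```python
import numpy as np

# Claim 4: the dimension-reduced feature space of shape (h, w, k) is
# flattened into a spatial feature matrix with k rows and n = h * w columns.
h, w, k = 4, 5, 3
reduced_space = np.random.rand(h, w, k)
n = h * w                                        # number of columns from h * w
spatial_matrix = reduced_space.reshape(n, k).T   # k rows, n columns
assert spatial_matrix.shape == (k, n)
```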
5. The method according to any one of claims 1 to 4, wherein the sample image is an image labeled with an image segmentation result, and the dimension-reduced feature space includes self-segmentation results of k channels corresponding to the k classes;
the method further comprises the following steps:
determining a semantic loss value according to a difference between the self-segmentation result of the k channels and the image segmentation result;
and adjusting the model parameters of the feature decomposition model through the semantic loss value.
6. The method of claim 5, wherein the feature decomposition model further outputs a semantic basis;
the determining a semantic loss value according to a difference between the self-segmentation results of the k channels and the image segmentation result comprises:
determining a product between the self-segmentation results of the k channels and the corresponding semantic bases;
determining the semantic loss value based on a difference between the k products and the image segmentation result.
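A hedged NumPy sketch of the claim 6 computation follows; the outer-product combination of channels and bases, the summation over the k products, and the squared-error loss are all assumptions, since the claim does not fix these details:

```python
import numpy as np

# Semantic loss per claims 5-6: each of the k self-segmentation channels is
# multiplied by its semantic basis, and the combined k products are compared
# with the labeled image segmentation result.
h, w, k, d = 4, 4, 3, 8
channels = np.random.rand(k, h, w)          # self-segmentation results of k channels
bases = np.random.rand(k, d)                # one semantic basis per channel (assumed shape)
label = np.random.rand(h, w, d)             # labeled segmentation result (assumed shape)
recon = np.einsum('khw,kd->hwd', channels, bases)   # sum of the k products
semantic_loss = np.sum((recon - label) ** 2)
```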
7. The method of claim 6, further comprising:
determining a product matrix of a first matrix corresponding to the semantic basis and the transpose of the first matrix;
determining an orthogonal loss value according to the difference between the product matrix and the identity matrix;
and adjusting the semantic basis according to the orthogonal loss value.
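Claim 7's orthogonal loss has a direct matrix form; the squared Frobenius norm below is an assumed choice of "difference":

```python
import numpy as np

# Orthogonal loss per claim 7: the product of the semantic-basis matrix B
# with its transpose is compared against the identity matrix, pushing the
# k semantic bases toward mutual orthogonality.
k, d = 3, 8
B = np.random.rand(k, d)                    # first matrix: one semantic basis per row
product = B @ B.T                           # product matrix, shape (k, k)
orthogonal_loss = np.sum((product - np.eye(k)) ** 2)
```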
8. The method of claim 5, further comprising:
determining the centroid position of each channel according to the self-segmentation result of the k channels;
calculating a center loss value from the centroid position of each channel;
and adjusting the model parameters of the feature decomposition model through the center loss value.
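Claim 8 leaves the exact center-loss formula open; the sketch below assumes a mass-weighted spatial variance around each channel's centroid, which is one plausible reading:

```python
import numpy as np

# Center loss per claim 8: compute the centroid of each channel's
# self-segmentation mass, then penalize mass that lies far from it.
h, w, k = 8, 8, 3
channels = np.random.rand(k, h, w)          # self-segmentation results of k channels
ys, xs = np.mgrid[0:h, 0:w]                 # pixel coordinate grids
center_loss = 0.0
for c in range(k):
    m = channels[c] / channels[c].sum()                 # normalized channel mass
    cy, cx = (m * ys).sum(), (m * xs).sum()             # centroid position
    center_loss += (m * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum()
```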
9. A method for training an image segmentation model, the method comprising:
extracting image features of a sample image, wherein the sample image is an image used for training the image segmentation model, and the image segmentation model comprises a feature decomposition model and a feature clustering model, wherein the feature decomposition model is used for performing dimension reduction processing on the image features, and the feature clustering model is used for clustering the dimension-reduced feature space;
inputting the image features into the feature decomposition model to obtain a dimension-reduced feature space;
inputting a spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model to obtain an attractor matrix, wherein the attractor matrix is used for representing an image segmentation result of the sample image;
determining a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix;
and adjusting the model parameters of the feature decomposition model through the clustering loss value.
10. An image segmentation apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a target image, wherein the target image is an image whose image content is to be segmented;
an input module, configured to input the target image into an image segmentation model and output a segmentation result of the target image;
wherein the image segmentation model is a model obtained by training with a sample image, the image segmentation model comprises a feature decomposition model and a feature clustering model, the sample image is decomposed by the feature decomposition model to obtain a dimension-reduced feature space, a clustering result is obtained by the feature clustering model, a clustering loss value is determined according to the clustering result, and the clustering loss value is used for adjusting model parameters of the feature decomposition model.
11. An apparatus for training an image segmentation model, the apparatus comprising:
an extraction module, configured to extract image features of a sample image, wherein the sample image is an image used for training the image segmentation model, and the image segmentation model comprises a feature decomposition model and a feature clustering model, the feature decomposition model being used for performing dimension reduction processing on the image features and the feature clustering model being used for clustering the dimension-reduced feature space;
an input module, configured to input the image features into the feature decomposition model to obtain a dimension-reduced feature space;
the input module being further configured to input a spatial feature matrix corresponding to the dimension-reduced feature space into the feature clustering model to obtain an attractor matrix, wherein the attractor matrix is used for representing an image segmentation result of the sample image;
a determination module, configured to determine a clustering loss value according to the difference between the spatial feature matrix and the attractor matrix;
and an adjustment module, configured to adjust the model parameters of the feature decomposition model through the clustering loss value.
12. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the image segmentation method according to any one of claims 1 to 8 or the training method of the image segmentation model according to claim 9.
13. A computer readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the image segmentation method according to any one of claims 1 to 8 or the training method of the image segmentation model according to claim 9.
CN201910922346.XA 2019-09-27 2019-09-27 Image segmentation method, training method, device and equipment of image segmentation model Active CN110675412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910922346.XA CN110675412B (en) 2019-09-27 2019-09-27 Image segmentation method, training method, device and equipment of image segmentation model


Publications (2)

Publication Number Publication Date
CN110675412A (en) 2020-01-10
CN110675412B CN110675412B (en) 2023-08-01

Family

ID=69079770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910922346.XA Active CN110675412B (en) 2019-09-27 2019-09-27 Image segmentation method, training method, device and equipment of image segmentation model

Country Status (1)

Country Link
CN (1) CN110675412B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867307A (en) * 2012-09-10 2013-01-09 西安电子科技大学 SAR image segmentation method based on feature vector integration spectral clustering
CN104036012A (en) * 2014-06-24 2014-09-10 中国科学院计算技术研究所 Dictionary learning method, visual word bag characteristic extracting method and retrieval system
CN104200428A (en) * 2014-08-18 2014-12-10 南京信息工程大学 Microscopic image color convolution removal method and cutting method based on non-negative matrix factorization (NMF)
CN105574534A (en) * 2015-12-17 2016-05-11 西安电子科技大学 Significant object detection method based on sparse subspace clustering and low-order expression
CN106650744A (en) * 2016-09-16 2017-05-10 北京航空航天大学 Image object co-segmentation method guided by local shape migration
WO2018183221A1 (en) * 2017-03-28 2018-10-04 Hrl Laboratories, Llc Machine-vision method to classify input data based on object components
US20180336454A1 (en) * 2017-05-19 2018-11-22 General Electric Company Neural network systems
CN107341510A (en) * 2017-07-05 2017-11-10 西安电子科技大学 Image clustering method based on sparse orthogonal digraph Non-negative Matrix Factorization
CN109063757A (en) * 2018-07-20 2018-12-21 西安电子科技大学 It is diagonally indicated based on block and the multifarious multiple view Subspace clustering method of view
CN110189341A (en) * 2019-06-05 2019-08-30 北京青燕祥云科技有限公司 A kind of method, the method and device of image segmentation of Image Segmentation Model training

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUN Jingjing; ZHAO Fei: "Application of Non-negative Matrix Factorization in Space Target Image Recognition", Laser & Optoelectronics Progress, No. 10 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489366A (en) * 2020-04-15 2020-08-04 上海商汤临港智能科技有限公司 Neural network training and image semantic segmentation method and device
CN113706568A (en) * 2020-05-20 2021-11-26 阿里巴巴集团控股有限公司 Image processing method and device
CN113706568B (en) * 2020-05-20 2024-02-13 阿里巴巴集团控股有限公司 Image processing method and device
CN112102351A (en) * 2020-10-14 2020-12-18 平安科技(深圳)有限公司 Medical image analysis method and device, electronic equipment and readable storage medium
CN113112518A (en) * 2021-04-19 2021-07-13 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113112518B (en) * 2021-04-19 2024-03-26 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113192062A (en) * 2021-05-25 2021-07-30 湖北工业大学 Arterial plaque ultrasonic image self-supervision segmentation method based on image restoration
CN115496916A (en) * 2022-09-30 2022-12-20 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device
CN115496916B (en) * 2022-09-30 2023-08-22 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device

Also Published As

Publication number Publication date
CN110675412B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40019526

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant