CN113283433A

CN113283433A - Image semantic segmentation method, system, electronic device and storage medium

Info

Publication number: CN113283433A
Application number: CN202110394315.9A
Authority: CN
Inventors: 李建强; 彭浩然; 吕思锐
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-04-13
Filing date: 2021-04-13
Publication date: 2021-08-20

Abstract

The embodiment of the invention provides an image semantic segmentation method, an image semantic segmentation system, electronic equipment and a storage medium, wherein the method comprises the following steps: determining an image to be semantically segmented; inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model; the image semantic segmentation model is obtained by training based on a sample image and corresponding pixel class labels, and the pixel class labels are predetermined. The invention solves the problems of mismatching of large pieces, multiple divisions and few divisions when the ultrasonic image is segmented, and can effectively improve the image segmentation effect.

Description

Image semantic segmentation method, system, electronic device and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method, a system, an electronic device, and a storage medium for semantic segmentation of an image.

Background

The medical science and the deep learning have more and more close relationship, the interdisciplinary project of the deep learning and the medical science is endless, and through deep learning, a plurality of achievements have been proved to be capable of saving a great amount of manpower and material resources on the treatment of various diseases.

Hydronephrosis is a common nephropathy, and ultrasonic examination is the basic examination commonly done by suspected hydronephrosis patients, and is convenient, rapid, low in price, harmless and radiationless. If the disease can be judged and graded in the ultrasonic examination stage by using a deep learning method, a large amount of capital, manpower and medical resources can be saved, and related patients can be helped.

Image segmentation semantic recognition is indispensable in ultrasound image classification. However, in the process of renal ultrasound image segmentation, the traditional Unet model cannot well outline the boundary of the segmented part, and the phenomena of large mismatch, large amount of segmentation and small amount of segmentation often occur.

Disclosure of Invention

The embodiment of the invention provides an image semantic segmentation method, an image semantic segmentation system, electronic equipment and a storage medium, which are used for solving the problems that the traditional Unet model cannot well outline the boundary of a segmentation part, and the phenomena of large mismatch, more segmentation and less segmentation often occur when an ultrasonic image is segmented.

In a first aspect, an embodiment of the present invention provides an image semantic segmentation method, including:

determining an image to be semantically segmented;

inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model;

the image semantic segmentation model is obtained by training based on a sample image and corresponding pixel class labels, and the pixel class labels are predetermined.

Preferably, the image semantic segmentation model comprises a trunk feature extraction model, a reinforced feature extraction model, a classification model and a segmentation model;

inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model, wherein the image semantic segmentation result comprises the following steps:

inputting the image into the trunk feature extraction model, and outputting image features of a plurality of effective feature layers;

inputting the image features of the effective feature layers into the enhanced feature extraction model, and outputting the image fusion feature of each effective feature layer;

inputting the image fusion characteristics of each effective characteristic layer into the classification model, and outputting the pixel classification result of the image;

and inputting the pixel classification result of the image into the segmentation model, and outputting the semantic segmentation result of the image.

Preferably, the sample image is selected from an image dataset;

the trunk feature extraction model is obtained by training a convolutional neural network VGG16 based on a sample image selected from an image data set and serving as a training sample image after being labeled;

the enhanced feature extraction model comprises a weight block;

inputting the image features of the plurality of effective feature layers into the enhanced feature extraction model, and outputting the image fusion feature of each effective feature layer, wherein the method comprises the following steps:

weighting the image characteristics of the effective characteristic layers by weight values respectively to obtain the image fusion characteristics of each effective characteristic layer; wherein the weight value is adjustable by the weight block.

Preferably, the weighting of the image features of the plurality of effective feature layers is performed to obtain the image fusion feature of each effective feature layer, and the formula is as follows:

wherein, U_n+1Is the result of sampling at layer n +1, R_nIs an adjustment function for adjusting the resolution of the n +1 th layer to be consistent with the resolution of the n-th layer, P_nThe method is a channel number adjusting function based on the weight and the number of channels of the nth layer, and delta is a feature extraction operation function.

In a second aspect, an embodiment of the present invention provides an image semantic segmentation system, including an image determination module and an image semantic segmentation module:

the image determining module is used for determining an image to be subjected to semantic segmentation;

the image semantic segmentation module is used for inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model;

Preferably, the image semantic segmentation module comprises a trunk feature extraction module, a reinforced feature extraction module, a classification module and a segmentation module;

the trunk feature extraction module is used for obtaining image features of a plurality of effective feature layers based on the determined image;

the enhanced feature extraction module is used for obtaining image fusion features of each effective feature layer based on the image features of the effective feature layers;

the classification module is used for obtaining an image pixel classification result based on the image fusion characteristics of each effective characteristic layer;

and the segmentation module is used for obtaining an image semantic segmentation result based on the image pixel classification result.

Preferably, the sample image is selected from an image dataset;

the trunk feature extraction module comprises a trunk feature extraction model, and the trunk feature extraction model is obtained by training a convolutional neural network VGG16 based on a sample image selected from an image data set and serving as a training sample image after being labeled;

the enhanced feature extraction module comprises a weight block;

the weight block is used for weighting the image characteristics of the effective characteristic layers respectively to obtain the image fusion characteristics of each effective characteristic layer; wherein the weight value is adjustable by the weight block.

Preferably, the weight value is configured to weight the image features of the plurality of effective feature layers respectively to obtain an image fusion feature of each effective feature layer, where the formula is as follows:

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the image semantic segmentation method according to any one of the above-mentioned first aspects when executing the program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image semantic segmentation method according to any one of the above-mentioned first aspects.

According to the image semantic segmentation method, the image semantic segmentation system, the electronic equipment and the storage medium, the weight block is added on the basis of the Unet through the novel semantic segmentation network structure based on the Unet, and a plurality of layers can be combined on the basis of self-defining the weight, so that the receptive field is enlarged, the network can extract context information better, and the effect of semantic segmentation network is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart of a semantic segmentation method for an image according to the present invention;

FIG. 2 is a block diagram of an image semantic segmentation model provided by the present invention;

FIG. 3 is a diagram of a Unet model network architecture provided by the present invention;

FIG. 4 is an optimized view of the MwUnet network structure provided by the present invention;

FIG. 5 is a diagram of a weighted skip connection scheme provided by the present invention;

FIG. 6 is a schematic structural diagram of an image semantic segmentation system provided by the present invention;

FIG. 7 is a schematic structural diagram of an image semantic segmentation module provided by the present invention;

FIG. 8 is a schematic structural diagram of an electronic device provided by the present invention;

reference numerals:

1: down-sampling; 2: jump connection; 3: upsampling;

4: performing convolution operation; 5: and (6) weighting the weight.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An image semantic segmentation method, system, electronic device and storage medium provided by the present invention are described below with reference to fig. 1 to 8.

The embodiment of the invention provides an image semantic segmentation method. Fig. 1 is a schematic flow chart of an image semantic segmentation method according to an embodiment of the present invention, as shown in fig. 1, the method includes:

step 110, determining an image to be semantically segmented;

in particular, renal ultrasound images are used in modern medical image recognition for practical applications.

Step 120, inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model;

In particular, the goal of semantic segmentation of images is to label the class of each pixel in the image, and this task is often referred to as dense prediction because every pixel in the image needs to be predicted.

According to the method provided by the embodiment of the invention, the image semantic segmentation model is obtained after sample image training, and the image pixels to be subjected to semantic segmentation are classified by inputting the image to be subjected to semantic segmentation, so that the image segmentation effect can be effectively improved.

Based on any of the above embodiments, as shown in fig. 2, the image semantic segmentation model 200 includes a trunk feature extraction model 210, an enhanced feature extraction model 220, a classification model 230, and a segmentation model 240;

inputting the image into an image semantic segmentation model 200 to obtain an image semantic segmentation result output by the image semantic segmentation model 200, wherein the image semantic segmentation result comprises:

inputting the image into the trunk feature extraction model 210, and outputting image features of a plurality of effective feature layers;

inputting the image features of the plurality of effective feature layers into the enhanced feature extraction model 220, and outputting the image fusion feature of each effective feature layer;

inputting the image fusion features of each effective feature layer into the classification model 230, and outputting the pixel classification result of the image;

the pixel classification result of the image is input into the segmentation model 240, and the semantic segmentation result of the image is output.

Specifically, the image semantic segmentation method of the embodiment of the present invention is a novel semantic segmentation network structure based on the Unet, wherein a weight block is added on the basis of the Unet, and an Unet model structure can be divided into three parts:

1. the first part is a trunk feature extraction part, and a feature layer is acquired by using the trunk part. The backbone feature extraction part of the network is similar to VGG, a convolutional and maximally pooled stack. The five preliminary valid feature layers obtained in this step will be used for feature fusion in the next step.

2. The second section is an enhanced feature extraction section. And performing up-sampling on the five preliminary effective characteristic layers obtained in the first step, and performing characteristic fusion to obtain a final effective characteristic layer fused with all the characteristics.

3. The third part is the classification prediction part. And classifying each feature point by using the finally obtained last effective feature layer, namely classifying each pixel point.

The loss function used by Unet is CE loss (cross entropy), which is defined as follows:

wherein, p (x)_i) Represents a ground truth, i.e., label information of the split network, q (x)_i) Representing information after network segmentation.

In any of the above embodiments, the sample image is selected from an image dataset;

specifically, the image dataset of Imagenet is widely used for training data of an object recognition network in a deep learning network, and at present, 14197122 images are totally contained in the Imagenet, and the images are totally divided into 21841 categories (syncets), and the major categories include: animal, application, bird, coverage, device, fabric, fish, etc.

in particular, the Encoder feature extraction network part adopts VGG16 as a backbone to facilitate the transfer learning of pre-training network parameters downloaded from the official website in VGG 16. And the correctly labeled data is used as the basis for supervised learning of correct samples.

The enhanced feature extraction model comprises a weight block;

Specifically, the embodiment of the invention constructs a novel semantic segmentation network structure based on Unet, adds a weight block on the basis of Unet, and can combine a plurality of layers on the basis of self-defined weight, thereby enlarging the receptive field and better enabling the network to extract context information. The receptive field is the area size of the mapping of the pixel points on the feature map (feature map) output by each layer of the neural network on the input picture. The explanation of the restyle point is that one point on the feature map corresponds to an area on the input map, which is also an area that can be noticed by the neural network in the present hierarchy. Fig. 3 is a simplified schematic diagram of a structure diagram of the network of the Unet model. In contrast to the Unet shown in FIG. 3, the MwUnet network architecture is optimized as shown in FIG. 4.

The overall structure of the MwUnet is greatly changed from the original Unet, and different from the original Unet, the resolution of the network input diagram of the MwUnet is the same as the resolution of the final output diagram. Although a U-type network is adopted, it is not the same to each layer.

Based on any of the above embodiments, the weighting of the image features of the plurality of effective feature layers is performed to obtain the image fusion feature of each effective feature layer, and the formula is as follows:

wherein, U_n+1Is on the n +1 th layerResult of sampling, R_nIs an adjustment function for adjusting the resolution of the n +1 th layer to be consistent with the resolution of the n-th layer, P_nThe method is a channel number adjusting function based on the weight and the number of channels of the nth layer, and delta is a feature extraction operation function.

Specifically, as shown in fig. 3 and 4, unlike the jump connection 2skip connection of the Unet, the MwUnet does not directly connect the same level information in the encoder to the decoder, but adds a weight block 5, and realizes different multi-level combination by manually adjusting the weight. The decoder of each level can receive semantic information extracted after the encoder samples 1 on different levels, and the reception fields of different levels are different, so that the decoder of each level can receive the semantic information extracted by the feature extraction network under different resolutions. This connection is called weighted skip connection (weighted skip connection 2), and the specific operation method is shown in fig. 5. Weighted skip connection is obtained by weighting the result given by the weight 5weight block encoder, and the four levels of the encoder generate a feature matrix of the corresponding channel number according to the weight calculation, for example, the weight in fig. 5 is 1:1:1:1, all the four layers generate 128 channels, and the four-layer result concatemate is to combine the four 128-channel matrices into a 512-channel matrix, and then to combine the 512-channel matrix with the operation result concatemate of the previous layer, that is, to combine the 512-channel matrix with the result of the upsampling 3 of the previous layer, as shown in fig. 5, the result of the previous layer is a 512-channel matrix, that is, to decode the three-dimensional concatemate.

The Weight block can manually change the Weight of each layer. For example, if a segmentation task needs to focus on information of the whole picture, the model should have a wider receptive field, and the weight value can be set to a default value of 1:1:1:1 as shown in fig. 5. If one segmentation task needs attention but background knowledge is de-emphasized, the weight value may be set to 1:1:1: 9. Note that to ensure structural invariance, the sum of the weight values should be a multiple of 4.

As shown in FIG. 5, X^3,1The method of operation of (1), in conjunction with fig. 3 and 4. X^0,0To X^3,0Each of (1)The weights of the layers are the same, so that finally each layer is subjected to convolution operation 4 of maxporoling and 3X 3 to finally become feature maps with the resolution of 64X 64 (consistent with the resolution of the upsampling 3 of the layer) and the channel number of 128, the feature maps of the 4 layers are aggregated to form a Total cube of 512 channels, and then the aggregate operation is performed with the result of the upsampling 3 to obtain X^3,1。

The following describes an image semantic segmentation system provided by the present invention, and the following description and the above-described image semantic segmentation method can be referred to with each other.

Fig. 6 is a schematic structural diagram of an image semantic segmentation system according to an embodiment of the present invention, as shown in fig. 6, the system includes an image determination module 610 and an image semantic segmentation module 620:

the image determining module 610 is configured to determine an image to be semantically segmented;

the image semantic segmentation module 620 is configured to input the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model;

The system provided by the embodiment of the invention obtains the image semantic segmentation model based on sample image training, and can effectively improve the image segmentation effect by inputting the image to be subjected to semantic segmentation to classify the image pixels.

Based on any of the above embodiments, as shown in fig. 7, the image semantic segmentation module includes a trunk feature extraction module 710, an enhanced feature extraction module 720, a classification module 730, and a segmentation module 740;

the trunk feature extraction module 710 is configured to obtain image features of a plurality of valid feature layers based on the determined image;

the enhanced feature extraction module 720 is configured to obtain an image fusion feature of each effective feature layer based on the image features of the plurality of effective feature layers;

the classification module 730 is configured to obtain an image pixel classification result based on the image fusion feature of each effective feature layer;

the segmentation module 740 is configured to obtain an image semantic segmentation result based on the image pixel classification result.

the enhanced feature extraction module comprises a weight block;

Based on any of the above embodiments, the weight block is configured to perform weight weighting on the image features of the plurality of effective feature layers respectively to obtain an image fusion feature of each effective feature layer, and the formula is as follows:

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 8, the electronic device may include: a processor (processor)810, a communication Interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication Interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform an image semantic segmentation method comprising: determining an image to be semantically segmented; inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model; the image semantic segmentation model is obtained by training based on a sample image and corresponding pixel class labels, and the pixel class labels are predetermined.

In addition, the logic instructions in the memory 830 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the image semantic segmentation method provided by the above methods, where the method includes: determining an image to be semantically segmented; inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model; the image semantic segmentation model is obtained by training based on a sample image and corresponding pixel class labels, and the pixel class labels are predetermined.

In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to perform the image semantic segmentation method provided in the foregoing aspects, the method including: determining an image to be semantically segmented; inputting the image into an image semantic segmentation model to obtain an image semantic segmentation result output by the image semantic segmentation model; the image semantic segmentation model is obtained by training based on a sample image and corresponding pixel class labels, and the pixel class labels are predetermined.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An image semantic segmentation method, comprising:

determining an image to be semantically segmented;

2. The image semantic segmentation method according to claim 1, wherein the image semantic segmentation model comprises a trunk feature extraction model, an enhanced feature extraction model, a classification model and a segmentation model;

3. The image semantic segmentation method according to claim 2, characterized in that the sample image is selected from an image dataset;

the enhanced feature extraction model comprises a weight block;

4. The image semantic segmentation method according to claim 3, wherein the weighted value weighting is performed on the image features of the plurality of effective feature layers respectively to obtain the image fusion feature of each effective feature layer, and a formula thereof is as follows:

5. An image semantic segmentation system, characterized by comprising an image determination module and an image semantic segmentation module:

6. The image semantic segmentation system according to claim 5, wherein the image semantic segmentation module comprises a trunk feature extraction module, an enhanced feature extraction module, a classification module, and a segmentation module;

7. The image semantic segmentation system of claim 6 wherein the sample image is selected from an image dataset;

the enhanced feature extraction module comprises a weight block;

8. The image semantic segmentation system according to claim 7, wherein the weight block is configured to weight the image features of the plurality of effective feature layers respectively to obtain the image fusion features of each effective feature layer, and the formula is as follows:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the image semantic segmentation method according to any one of claims 1 to 4 when executing the program.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the image semantic segmentation method according to any one of claims 1 to 4.