CN116310308A - Image segmentation method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116310308A
Authority
CN
China
Prior art keywords
image
region
sample
segmentation
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211653434.2A
Other languages
Chinese (zh)
Inventor
杜立辉
周奇明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huanuokang Technology Co ltd
Original Assignee
Zhejiang Huanuokang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Huanuokang Technology Co ltd filed Critical Zhejiang Huanuokang Technology Co ltd
Priority to CN202211653434.2A
Publication of CN116310308A
Legal status: Pending

Classifications

All G06V entries below fall under G (Physics), G06 (Computing; Calculating or Counting), G06V (Image or video recognition or understanding):
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/50: Extraction of image or video features by performing operations within image blocks, by using histograms (e.g. histogram of oriented gradients [HoG]), by summing image-intensity values, or by projection analysis
    • G06V10/54: Extraction of image or video features relating to texture
    • G06V10/56: Extraction of image or video features relating to colour
    • G06V10/806: Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02T10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation; Y02T10/10: internal combustion engine based vehicles)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image segmentation method, an image segmentation device, computer equipment and a storage medium, wherein the image segmentation method comprises the following steps: acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented; acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model; and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features. The method and device solve the problem of low image segmentation precision in the related art, improve the richness of the description information of the image features, and thereby further improve image segmentation precision.

Description

Image segmentation method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image segmentation method, an image segmentation apparatus, a computer device, and a storage medium.
Background
With the continuous development of deep learning technology, the application of computer vision models based on deep learning in the medical industry has become a current research hotspot. Semantic segmentation of pathological section images based on deep learning has important value for the diagnosis of diseases such as cancer. However, due to factors such as the complexity and specialization of pathological data, the annotation data of such images are often not comprehensive and accurate enough. Therefore, accurate image segmentation with limited annotation data is of great research interest.
In the related art, feature extraction is generally performed directly on an image, and the segmentation result is determined based on the image features. For example, the target foreground is segmented by using a seed-growing mechanism and boundary constraint rules, or the image is cut into a plurality of image blocks with a fixed window size, whose features are extracted and segmented through an image segmentation model. However, for images with heavily overlapping regions, blurred edges, and complex morphological distributions across regions, such as pathological section images, the related art does not fully consider the description information of the image during segmentation, and the segmentation precision is low when the annotation data are limited.
No effective solution has yet been proposed for the technical problem of low image segmentation precision in the related art.
Disclosure of Invention
Based on the above, the application provides an image segmentation method, an image segmentation device, a computer device and a storage medium, which solve the technical problem of lower image segmentation precision in the related art.
In a first aspect, the present application provides an image segmentation method, including:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
In one embodiment, the training method of the image segmentation model includes:
acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation tag information;
acquiring global sample features of the global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample features of the local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain an image segmentation model.
In one embodiment, the segmentation tag information includes a plurality of preset segmentation categories, and the determining the loss function based on the sample segmentation result and the segmentation tag information includes:
and determining a first loss function based on the predicted probability and the actual probability of each preset segmentation category.
In one embodiment, the determining the first loss function further includes:
obtaining a mask vector corresponding to a preset area of the sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in the initial image segmentation model;
acquiring sample feature vectors corresponding to the preset areas based on the sample feature graphs, and determining sample mean vectors corresponding to each preset segmentation category based on the mask vectors and the sample feature vectors;
and determining a second loss function based on a plurality of distance parameters between a target sample mean vector corresponding to a target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
In one embodiment, the determining the second loss function further includes:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
In one embodiment, before the acquiring the global sample image and the local sample image corresponding to the sample image, the method further includes:
acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
and fusing the second sample image and the sub-region image to obtain the sample image.
In one embodiment, the second subnetwork is established based on a hole convolution layer, and parameters of the hole convolution layer include a preset number of hole convolution kernels and expansion rates corresponding to each hole convolution kernel.
In one embodiment, the fusing the first region feature and the second region feature to obtain a fused region feature includes:
performing transposition processing on the first region features, and multiplying a transposition result by the second region features to obtain initial fusion features;
and carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain a fusion region feature.
In one embodiment, the determining the image segmentation result based on the fused region features includes:
and carrying out up-sampling treatment on the fusion region characteristics to obtain an image segmentation result.
In a second aspect, the present application further provides an image segmentation apparatus, including:
the image acquisition module is used for acquiring a first area image and a second area image of an image to be segmented, wherein the first area image contains global information of the image to be segmented, and the second area image contains local information of the image to be segmented;
the feature acquisition module is used for acquiring first region features of the first region image based on a first sub-network in the image segmentation model and acquiring second region features of the second region image based on a second sub-network in the image segmentation model;
and the fusion module is used for fusing the first region features and the second region features to obtain fusion region features, and determining an image segmentation result based on the fusion region features.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
The application provides an image segmentation method, an image segmentation device, computer equipment and a storage medium, wherein the image segmentation method comprises the following steps: acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented; acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model; and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features. The global features, namely the first region features and the local features, namely the second region features, of the image to be segmented are extracted, and are fused to determine an image segmentation result, local information of the image and context information between the local and the whole are combined in the image feature extraction process, so that image segmentation is avoided through only image features with single dimension, the technical problem of low image segmentation precision in the related art is solved, the dependency relationship between the local semantics and the whole semantics of the image is constructed, the richness of description information of the image features is improved, and the image segmentation precision is further improved.
Drawings
FIG. 1 is an application environment diagram of an image segmentation method of one embodiment of the present application;
FIG. 2 is a flow chart of an image segmentation method according to an embodiment of the present application;
FIG. 3 is a flow chart of an image segmentation method according to another embodiment of the present application;
FIG. 4 is a flow chart illustrating a method for determining a sample mean vector according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-scale encoder according to an embodiment of the present application;
FIG. 6 is a flow chart of a feature fusion method according to an embodiment of the present application;
FIG. 7 is a block diagram of the structure of an image segmentation apparatus according to an embodiment of the present application;
FIG. 8 is an internal structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The image segmentation method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
With the continuous development of deep learning technology, the application of computer vision models based on deep learning in the medical industry becomes a current research hotspot. Among them, segmentation of pathological section images based on deep learning has important value for diagnosis of diseases such as cancer.
However, due to factors such as the complexity and specialization of pathological data, annotating image segmentation data places high demands on the annotators' expertise and is also very labor-intensive. Moreover, because the resolution of a single pathological section image is very high and the number of cells is very large, annotators usually label only a small portion of the regions in an image for training of the neural network, out of consideration for annotation cost. In addition, because annotators bring strong subjectivity to the labeling process, even experienced annotators may disagree on certain regions, so the annotation results carry a certain amount of deviation.
Therefore, accurate image segmentation with limited annotation data is of great research interest. Under the condition of limited labeling data, the local information and the global context information of the image are fully utilized, and the precision of image segmentation can be obviously improved. However, due to the high resolution of pathological section images, complex background structure, dense distribution of tissue primitives, high adhesion overlap and other factors, accurate image segmentation using image information is a challenging task.
Referring to fig. 2, fig. 2 is a flow chart of an image segmentation method according to an embodiment of the present application.
In one embodiment, as shown in fig. 2, the image segmentation method includes:
s202: and acquiring a first area image and a second area image of the image to be segmented, wherein the first area image comprises global information of the image to be segmented, and the second area image comprises local information of the image to be segmented.
Specifically, an image to be segmented is obtained, and a first area image and a second area image are determined based on the image to be segmented. It can be appreciated that the first region image may be a sub-region image of the image to be segmented (the sub-region image contains sufficient global information), or the first region image may be directly set as the image to be segmented; the second region image is a sub-region image of the image to be segmented.
The first region image contains global information of the image to be segmented. When no redundant information exists in the image to be segmented, the first region image is preferably set directly to the image to be segmented, which ensures the comprehensiveness of the global information in the first region image; when redundant information does exist in the image to be segmented, the sub-region image corresponding to the remaining image information is taken as the first region image.
The second area image contains local information of the image to be segmented, so that the second area image is only used as a sub-area image of the image to be segmented. In this embodiment, the definition of the second area image is not limited, for example, the center of the image to be segmented may be used as the center of the second area image, and a rectangular area image with a preset size may be defined as the second area image; alternatively, the second area image may be set as an image corresponding to a target area that needs to be focused on; and the second region image may also be a region image in which the image to be segmented is randomly sampled and divided.
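As a concrete illustration of this step, the sketch below (a hypothetical helper written in PyTorch, not part of the patent text) takes the whole image to be segmented as the first region image and a centre crop of a preset size as the second region image:

```python
import torch

def get_region_images(image: torch.Tensor, local_size: int = 256):
    """Return (first_region, second_region) for an image tensor of shape (C, H, W).

    The whole image is used as the first region image (global information);
    a square window around the image centre is used as the second region
    image (local information). `local_size` is an assumed preset size.
    """
    _, h, w = image.shape
    top = max((h - local_size) // 2, 0)
    left = max((w - local_size) // 2, 0)
    first_region = image                                                      # global view
    second_region = image[:, top:top + local_size, left:left + local_size]   # centre crop
    return first_region, second_region
```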
S204: based on a first sub-network in the image segmentation model, a first region feature of a first region image is acquired, and based on a second sub-network in the image segmentation model, a second region feature of a second region image is acquired.
Specifically, a trained image segmentation model is obtained, and a first sub-network and a second sub-network are arranged in the image segmentation model; inputting the first region image into a first sub-network to extract corresponding first region features; and inputting the second region image into a second sub-network to extract corresponding second region features.
The first region feature and the second region feature in this embodiment may be visual features such as color, texture, and edges; manually designed mathematical features such as HOG (Histogram of Oriented Gradients) or SIFT (Scale-Invariant Feature Transform); or convolution features. Preferably, the first region feature and the second region feature are set as convolution features, and the corresponding first sub-network and second sub-network are set as convolutional neural networks; in the embodiments of the present application, the solution is described by taking convolutional neural networks as an example.
Illustratively, the first subnetwork may be configured as a ResNet network and the second subnetwork may be configured as a multi-scale encoder. Wherein, the ResNet network comprises 5 convolution groups capable of carrying out residual error operation, and each convolution group comprises at least one convolution operation and one downsampling operation; the multi-scale encoder is used to extract feature maps for multiple scales based on different convolution parameters.
S206: and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
Specifically, after the first region feature and the second region feature are extracted, the first sub-network and the second sub-network send the first region feature and the second region feature to a backbone network in an image segmentation model, and then the backbone network performs feature fusion on the first region feature and the second region feature to obtain a fused region feature. In this embodiment, the fusion method of the first region feature and the second region feature is not limited, and the fusion region feature may be obtained by methods such as weighted summation, averaging, and vector multiplication.
Specifically, after the fusion region features are acquired, an image segmentation result is acquired based on the fusion region features. For example, upsampling the fused region features to obtain a final segmented image; or, normalizing the fusion region characteristics, classifying pixels and the like, thereby obtaining a segmented image.
Referring to fig. 3, fig. 3 is a flowchart illustrating an image segmentation method according to another embodiment of the present application.
Illustratively, as shown in fig. 3, the image to be segmented is a pathological section image, the first sub-network is a ResNet network, and the second sub-network is a multi-scale encoder. The pathological section image is taken as the first region image, and center cropping is performed on the pathological section image to obtain the second region image; the first region image is input into the ResNet network for feature extraction to obtain the context features, namely the first region features; the second region image is input into the multi-scale encoder to obtain the local features, namely the second region features; the context features and the local features are input into the context embedding module for feature fusion to obtain the fused region features, thereby constructing a dual-branch semantic dependency relationship; and the fused region features are up-sampled by the network to obtain the segmentation result.
It can be understood that, in the image segmentation method shown in fig. 3, in addition to extracting the local features of the image to be segmented with a conventional convolutional neural network, an additional context branch, namely the ResNet network, extracts aggregated information from a larger image region, which ensures the accuracy of the image information described by the fused region features.
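The following PyTorch-style sketch outlines this dual-branch pipeline under stated assumptions; the module names (context_branch, local_branch, fusion) are placeholders for the first sub-network, the second sub-network and the context embedding module, and the decoding step is simplified to a single bilinear upsampling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchSegmenter(nn.Module):
    """Dual-branch segmentation model: a global context branch and a local branch."""

    def __init__(self, context_branch: nn.Module, local_branch: nn.Module,
                 fusion: nn.Module, num_classes: int = 2, feat_dim: int = 256):
        super().__init__()
        self.context_branch = context_branch   # first sub-network, e.g. a ResNet backbone
        self.local_branch = local_branch       # second sub-network, e.g. a multi-scale encoder
        self.fusion = fusion                   # context embedding (feature fusion) module
        self.classifier = nn.Conv2d(feat_dim, num_classes, kernel_size=3, padding=1)

    def forward(self, first_region: torch.Tensor, second_region: torch.Tensor):
        ctx_feat = self.context_branch(first_region)    # first region features
        loc_feat = self.local_branch(second_region)     # second region features
        fused = self.fusion(ctx_feat, loc_feat)         # fused region features
        logits = self.classifier(fused)
        # upsample back to the input resolution to obtain the segmentation result
        return F.interpolate(logits, size=second_region.shape[-2:],
                             mode="bilinear", align_corners=False)
```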
In this embodiment, a first area image and a second area image of an image to be segmented are acquired, wherein the first area image contains global information of the image to be segmented, and the second area image contains local information of the image to be segmented; acquiring a first region characteristic of a first region image based on a first sub-network in the image segmentation model, and acquiring a second region characteristic of a second region image based on a second sub-network in the image segmentation model; and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features. The global features, namely the first region features and the local features, namely the second region features, of the image to be segmented are extracted, and are fused to determine an image segmentation result, local information of the image and context information between the local and the whole are combined in the image feature extraction process, so that image segmentation is avoided through only image features with single dimension, the technical problem of low image segmentation precision in the related art is solved, the dependency relationship between the local semantics and the whole semantics of the image is constructed, the richness of description information of the image features is improved, and the image segmentation precision is further improved.
In another embodiment, a training method of an image segmentation model includes:
step 1: acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation tag information;
step 2: acquiring global sample characteristics of a global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample characteristics of a local sample image based on a second initial sub-network in the initial image segmentation model;
step 3: fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
step 4: and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain the image segmentation model.
Specifically, a sample image is acquired, and a global sample image and a local sample image are defined based on the sample image. The sample image includes preset partition label information, for example, the sample image includes a region boundary corresponding to each target and a category identifier, the region boundary may be defined based on a contour of the target, and the category identifier may be determined based on a category of the target; the determination manners of the global sample image and the local sample image are similar to those of the first area image and the second area image, and the description of this embodiment is omitted.
Specifically, an initial image segmentation model is obtained, where the initial image segmentation model is an image segmentation model that has not yet been trained. The global sample image is input into the first initial sub-network to extract the corresponding global sample features, and the local sample image is input into the second initial sub-network to extract the corresponding local sample features. After the global sample features and the local sample features are obtained, they are fused to obtain the fused sample features. Further, the fused sample features are processed to obtain the sample segmentation result corresponding to the sample image.
Specifically, after a sample segmentation result is obtained, a loss function is determined based on the deviation degree between the sample segmentation result predicted by the initial image segmentation model and the pre-labeled segmentation label information, and the parameters of the initial image segmentation model are reversely adjusted through the loss function so as to enable the loss function to be converged. And after the loss function accords with a preset convergence condition, obtaining a final image segmentation model.
According to the method, the initial image segmentation model is trained through the sample image, parameters in the model are optimized based on the sample segmentation result and the deviation degree in the segmentation label information, so that the prediction result of the model can be continuously approximate to preset segmentation label information, and the segmentation accuracy of the image segmentation model is improved.
In another embodiment, the segmentation tag information includes a plurality of preset segmentation categories, and determining the loss function based on the sample segmentation result and the segmentation tag information includes:
the first loss function is determined based on the predicted probability and the actual probability for each preset segmentation class.
Specifically, in this embodiment, the partition label information includes a plurality of preset partition categories corresponding to the image objects. When the loss of the initial image segmentation model is calculated, the prediction probability of each target or pixel under each preset segmentation category is obtained, and a first loss function is determined according to the deviation between the prediction probabilities corresponding to the preset segmentation categories and the actual probability.
Illustratively, the first loss function is calculated as follows:
Loss_seg = -Σ_{c=1}^{C} z_c · log(l_c)
wherein Loss_seg is the first loss function, whose value is determined by the distribution difference between the predicted probabilities and the actual probabilities of the preset segmentation categories; C is the number of preset segmentation categories; l_c is the predicted probability corresponding to preset segmentation category c; and z_c is the actual probability corresponding to preset segmentation category c.
In the embodiment, the first loss function is determined based on the distribution difference between the prediction probability and the actual probability corresponding to the preset segmentation category, so that the method has higher applicability to the pixel-level image segmentation task and higher convergence speed, and further the training precision and the segmentation precision of the image segmentation model are improved.
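A minimal sketch of such a first loss, assuming the standard cross-entropy form over the preset segmentation categories with one-hot actual probabilities, could look as follows:

```python
import torch
import torch.nn.functional as F

def first_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy style first loss.

    logits: (N, C, H, W) raw scores for the C preset segmentation categories.
    target: (N, H, W) ground-truth category index for every pixel.
    """
    log_probs = F.log_softmax(logits, dim=1)                    # log of predicted probability l_c
    z = F.one_hot(target, num_classes=logits.shape[1]).float()  # actual probability z_c, (N, H, W, C)
    z = z.permute(0, 3, 1, 2)
    return -(z * log_probs).sum(dim=1).mean()                   # -sum_c z_c * log(l_c), averaged over pixels
```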
In another embodiment, determining the first loss function further comprises:
step 1: obtaining a mask vector corresponding to a preset area of a sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
step 2: acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in an initial image segmentation model;
step 3: sample feature vectors corresponding to the preset areas are obtained based on the sample feature graphs, and sample mean value vectors corresponding to each preset segmentation category are determined based on the mask vectors and the sample feature vectors;
step 4: and determining a second loss function based on a plurality of distance parameters between the target sample mean vector corresponding to the target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
Specifically, the loss function of the initial image segmentation model in this embodiment further includes a second loss function. First, a preset region is specified based on a sample image, and a mask vector corresponding to the preset region is determined. Each element in the mask vector is a classification identifier corresponding to the pixel point. For example, if the first target, the second target and the third target exist in the preset area, the pixel point of the first target is identified as 0, the pixel point of the second target is identified as 1, the pixel point of the third target is identified as 2, and the mask vector of the preset area is determined based on the identification.
Specifically, a sample feature map is obtained, wherein the sample feature map is a feature map output by a feature extraction channel in an initial image segmentation model. It is understood that the feature extraction channel in this embodiment refers to a channel formed by a network layer that performs operations on the sample image and the image features, for example, a channel formed by a network layer such as a convolution layer, a pooling layer, an upsampling layer, and the like. Therefore, the feature map output by the feature extraction channel is the feature map output by the feature operation layer of the last layer, and the size of the feature map is consistent with that of the final segmented image.
Specifically, in the sample feature map, sample feature vectors corresponding to the specified preset areas are obtained, and sample mean value vectors corresponding to each preset segmentation category are calculated by combining the sample feature vectors and the mask vectors. Referring to fig. 4, fig. 4 is a flowchart illustrating a method for determining a sample mean vector according to an embodiment of the present application. More specifically, as shown in fig. 4, a product operation is performed on the sample feature vector and the mask vector to obtain a sample product vector; and in the sample product vector, carrying out averaging processing based on the elements corresponding to each preset segmentation class to obtain a sample mean vector corresponding to the preset segmentation class.
Specifically, after the sample mean vector corresponding to each preset segmentation category is obtained, each preset segmentation category is taken in turn as the target preset segmentation category, and a second loss function is established according to all distance parameters between the target sample mean vector of the target preset segmentation category and all sample mean vectors corresponding to all preset segmentation categories. It will be appreciated that the difference between preset segmentation categories determines the magnitude of the distance parameter. The manner of determining the distance parameter is not limited in this embodiment; for example, the Minkowski distance, the Manhattan distance, or the Euclidean distance may be used to determine the distance parameter.
It can be appreciated that the second loss function in this embodiment is used to constrain the similarity of feature representations between the same class of regions in the same batch of images to be segmented, while constraining the dissimilarity of feature representations between different classes of regions.
Illustratively, a sample feature map and a mask vector are obtained, a sample feature vector corresponding to the mask vector is extracted from the sample feature map, and a sample mean vector is further determined. The sample mean vector is determined as follows:
f_c = (1/N_c) · Σ_{(x,y): P_{x,y}=c} f_{x,y}
wherein f_c is the sample mean vector of preset segmentation category c, f_{x,y} is the image feature at position (x, y) of the sample feature vector, P_{x,y} is the category identifier at position (x, y) of the mask vector, and N_c is the number of positions for which P_{x,y} = c.
Illustratively, for each preset segmentation class, a sample mean vector is obtained and a loss function for that preset segmentation class is determined:
l(f_c, f_i) = D(f_c, f_i) if i = c; l(f_c, f_i) = max(0, Δ - D(f_c, f_i)) if i ≠ c
wherein l(f_c, f_i) is the loss term corresponding to preset segmentation category c, D(f_c, f_i) is the distance function between the sample mean vector corresponding to category c and the mean vectors corresponding to all preset segmentation categories, i ranges from 1 to the number C of preset segmentation categories, f_c and f_i are the sample mean vectors of preset segmentation category c and preset segmentation category i respectively, and Δ defines the degree of similarity between the same preset segmentation category and different preset segmentation categories, whose value can be set based on actual needs.
Illustratively, the loss terms corresponding to each preset segmentation category are summed to obtain the second loss function:
Loss_cont = (1/C) · Σ_{c=1}^{C} Σ_{i=1}^{C} l(f_c, f_i)
for the distance function D, the cosine similarity is used as a measure of the distance in the present embodiment, and the calculation method is as follows:
sim(x_c, y_c) = (x_c · y_c) / (||x_c|| · ||y_c||)
wherein x_c and y_c are the two vectors whose cosine similarity is calculated, and sim(x_c, y_c) is the cosine similarity between the vectors x_c and y_c.
According to this embodiment, the sample mean vector is calculated based on the mask vector and the sample feature vector, and the second loss function is determined based on a plurality of distance parameters between the target sample mean vector corresponding to the target preset segmentation category and a plurality of sample mean vectors corresponding to a plurality of preset segmentation categories, so that the feature similarity of regions of the same semantic category in the image to be segmented is constrained and the feature difference between regions of different semantic categories is enlarged, thereby improving the accuracy of the segmentation result.
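The sketch below illustrates these two steps under stated assumptions: per-category mean vectors are computed from the mask vector and the sample feature map, and a margin-based contrast between mean vectors (with cosine similarity as the distance measure, as above) serves as the second loss. The exact margin form and the batch-wise comparison are assumptions, not the patent's literal formulation:

```python
import torch
import torch.nn.functional as F

def class_mean_vectors(feat: torch.Tensor, mask: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-category mean feature vectors f_c.

    feat: (D, H, W) sample feature map restricted to the preset region.
    mask: (H, W) mask vector holding the classification identifier of each pixel.
    Returns a (num_classes, D) tensor; categories absent from the mask give zero vectors.
    """
    means = []
    for c in range(num_classes):
        sel = (mask == c)                           # pixels whose identifier is category c
        if sel.any():
            means.append(feat[:, sel].mean(dim=1))  # average the features of those pixels
        else:
            means.append(feat.new_zeros(feat.shape[0]))
    return torch.stack(means)

def second_loss(batch_means: torch.Tensor, delta: float = 0.5) -> torch.Tensor:
    """Contrast-style second loss over per-image, per-category mean vectors.

    batch_means: (N, C, D), the mean vectors of each image in the batch.
    Same-category means from different images are pulled together, and
    different-category means are pushed apart by the margin `delta`.
    """
    n, c, d = batch_means.shape
    flat = batch_means.reshape(n * c, d)
    labels = torch.arange(c, device=flat.device).repeat(n)                   # category of each mean
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)  # pairwise similarity
    dist = 1.0 - sim                                                         # cosine distance D
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(n * c, dtype=torch.bool, device=flat.device)
    pull_mask, push_mask = same & ~eye, ~same
    pull = dist[pull_mask].mean() if pull_mask.any() else dist.new_tensor(0.0)
    push = F.relu(delta - dist[push_mask]).mean() if push_mask.any() else dist.new_tensor(0.0)
    return pull + push
```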
In another embodiment, determining the second loss function further comprises:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
Specifically, after the first loss function and the second loss function are obtained, weights corresponding to the first loss function and the second loss function are respectively determined, and the first loss function and the second loss function are weighted based on the weights, so that a final target loss function is obtained.
Illustratively, with the weights corresponding to the first loss function and the second loss function being the same, the target loss function Loss of the initial image segmentation model is determined as:
Loss = Loss_seg + Loss_cont
In the embodiment, the first loss function and the second loss function are weighted and summed, so that the comprehensiveness of loss data in the initial image segmentation model is improved, and the accuracy of parameter adjustment in the training process is further improved.
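A one-line sketch of this weighting, with hypothetical weight hyperparameters w_seg and w_cont (equal weights recover Loss = Loss_seg + Loss_cont above):

```python
import torch

def target_loss(loss_seg: torch.Tensor, loss_cont: torch.Tensor,
                w_seg: float = 1.0, w_cont: float = 1.0) -> torch.Tensor:
    # Weighted sum of the first (segmentation) and second (contrast) loss functions.
    return w_seg * loss_seg + w_cont * loss_cont
```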
In another embodiment, before the global sample image and the local sample image corresponding to the sample image are acquired, the method further includes:
step 1: acquiring a first sample image and a second sample image;
step 2: acquiring a subarea image in a first sample image;
step 3: and fusing the second sample image and the sub-region image to obtain a sample image.
Specifically, two training images are acquired from a training set, namely a first sample image and a second sample image; a region is cut out of the first sample image to serve as a sub-region image of the first sample image; and the sub-region image is fused with the second sample image to obtain the sample image, which is used as a new training image.
Specifically, in the process of fusing the second sample image and the sub-region image in this embodiment, the position of the sub-region image in the first sample image may be acquired first, and the sub-region image may then be superimposed onto the second sample image at the same position.
Illustratively, in the process of randomly cutting out a region from the first sample image and adding it to the second sample image, the corresponding segmentation label information is mixed accordingly. In the resulting sample image, the segmentation label information includes the segmentation label information of the second sample image itself and the segmentation label information corresponding to the sub-region image.
In this embodiment, a sub-region image is cut out of the first sample image and added to the second sample image, so that a new sample image is obtained through fusion, which increases the amount of sample data in the training set, reduces the cost of building the training set, and improves the training effect of the initial image segmentation model.
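A sketch of this sample-mixing step (a CutMix-style operation; the box parameterisation and tensor layout are assumptions):

```python
import torch

def mix_samples(first_img: torch.Tensor, first_lbl: torch.Tensor,
                second_img: torch.Tensor, second_lbl: torch.Tensor,
                box: tuple) -> tuple:
    """Cut the sub-region `box` = (top, left, height, width) out of the first
    sample and paste it, at the same position, into the second sample; the
    segmentation label information is mixed in the same way."""
    top, left, h, w = box
    img, lbl = second_img.clone(), second_lbl.clone()
    img[:, top:top + h, left:left + w] = first_img[:, top:top + h, left:left + w]
    lbl[top:top + h, left:left + w] = first_lbl[top:top + h, left:left + w]
    return img, lbl   # the new sample image and its mixed segmentation labels
```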
In another embodiment, the second subnetwork is established based on a hole convolution layer, and the parameters of the hole convolution layer include a preset number of hole convolution kernels and a corresponding expansion rate of each hole convolution kernel.
Specifically, in this embodiment, the second sub-network is built from hole convolution layers, the hole convolution layer of the second sub-network is provided with a plurality of convolution kernels, and each hole convolution kernel is provided with a corresponding expansion rate. Hole convolution, also known as dilated or atrous convolution, inserts spaces (generally zeros) between the elements of the convolution kernel to enlarge it; the expansion rate is the dilation coefficient of the hole convolution kernel and measures the size relationship between the hole convolution kernel and the original convolution kernel.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a multi-scale encoder according to an embodiment of the present application.
Illustratively, as shown in fig. 5, the second sub-network in this embodiment is provided as a multi-scale encoder, which is constructed based on ASPP (Atrous Spatial Pyramid Pooling), in which four convolution layers and one pooling layer are provided. The four convolution kernels are, in order, a 1×1 convolution kernel, a 3×3 convolution kernel (expansion rate of 6), a 3×3 convolution kernel (expansion rate of 12), and a 3×3 convolution kernel (expansion rate of 18). After the image to be segmented passes through the ASPP, the output results of each convolution layer and the pooling layer are obtained, and a further convolution operation based on a 1×1 convolution kernel yields the second region features.
In the embodiment, the second subnetwork is built based on the cavity convolution layer, so that the receptive field of the second subnetwork in the second region image is improved through the cavity convolution kernel, and further, the accuracy of feature extraction and image segmentation are improved.
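A sketch of such a multi-scale encoder, following the ASPP configuration described above (1×1 branch, three 3×3 branches with dilation rates 6, 12 and 18, a pooling branch, and a final 1×1 convolution); the channel sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEncoder(nn.Module):
    """ASPP-style multi-scale encoder used as the second sub-network."""

    def __init__(self, in_ch: int = 3, out_ch: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=1),                           # 1x1
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=6, dilation=6),    # 3x3, rate 6
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=12, dilation=12),  # 3x3, rate 12
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=18, dilation=18),  # 3x3, rate 18
        ])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(5 * out_ch, out_ch, kernel_size=1)            # final 1x1 conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))                # second region features
```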
In another embodiment, fusing the first region feature and the second region feature to obtain a fused region feature includes:
step 1: performing transposition processing on the first region features, and multiplying the transposition result by the second region features to obtain initial fusion features;
step 2: and carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fusion region feature.
Specifically, after the first region features and the second region features are extracted, they are sent to the backbone network of the image segmentation model; the image segmentation model transposes the first region features and multiplies the transposed matrix by the second region features to obtain the initial fusion features. The initial fusion matrix is then normalized by a Softmax function, and the normalized result is multiplied by the first region features again to obtain the final fused region features.
Referring to fig. 6, fig. 6 is a flow chart of a feature fusion method according to an embodiment of the present application.
Illustratively, as shown in fig. 6, the first region features have a size of W×H×256, and the second region features have a size of W×H×256. The first region features are transposed to obtain a transposed matrix with a size of 256×WH, which is multiplied by the second region features to obtain an initial fusion matrix with a size of WH×WH; the initial fusion matrix is normalized by a Softmax function, and the normalized matrix is multiplied by the first region features to obtain the final fused region features, whose size is W×H×256.
In the embodiment, the first region feature is transposed, and the transposed result is multiplied by the second region feature to obtain an initial fusion feature; and carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fusion region feature. The fusion process is simple and easy to implement, thereby improving the speed of image segmentation.
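The sketch below follows this fusion recipe, assuming the two feature maps have the same spatial size and 256 channels; tensors are handled in (N, C, H, W) layout and reshaped to (HW, C) matrices for the multiplications:

```python
import torch
import torch.nn as nn

class ContextEmbeddingFusion(nn.Module):
    """Fuse the first (context) and second (local) region features:
    transpose-multiply, softmax-normalise, then multiply back onto the first features."""

    def forward(self, first_feat: torch.Tensor, second_feat: torch.Tensor) -> torch.Tensor:
        n, c, h, w = first_feat.shape                     # e.g. c = 256
        f1 = first_feat.flatten(2)                        # (N, C, HW)
        f2 = second_feat.flatten(2)                       # (N, C, HW)
        attn = torch.bmm(f2.transpose(1, 2), f1)          # (HW x C) @ (C x HW) -> (N, HW, HW) initial fusion matrix
        attn = torch.softmax(attn, dim=-1)                # Softmax normalisation
        fused = torch.bmm(attn, f1.transpose(1, 2))       # (N, HW, C): multiply back onto the first features
        return fused.transpose(1, 2).reshape(n, c, h, w)  # fused region features, (N, C, H, W)
```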
In another embodiment, determining the image segmentation result based on the fused region features includes:
and carrying out up-sampling treatment on the fusion region characteristics to obtain an image segmentation result.
Specifically, the fusion region features are obtained, and up-sampling is carried out on the fusion region features, so that a final segmented image is obtained, and the segmented image is an image segmentation result.
For the fused region features, the image segmentation model performs 4× up-sampling, performs a convolution operation with a 3×3 convolution kernel, and then performs 4× up-sampling again to obtain the final image segmentation result.
In the embodiment, the up-sampling processing is performed on the fusion region features to obtain the image segmentation result, and the feature processing process is simple, so that the segmentation efficiency of the image is improved.
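A sketch of this decoding step (the number of output classes is an assumed parameter):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Decode fused region features into the segmentation result:
    4x upsampling, a 3x3 convolution, then 4x upsampling again."""

    def __init__(self, in_ch: int = 256, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(fused, scale_factor=4, mode="bilinear", align_corners=False)
        x = self.conv(x)
        x = F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
        return x   # per-pixel class scores, i.e. the image segmentation result
```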
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an image segmentation device for realizing the above-mentioned image segmentation method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the image segmentation device provided below may refer to the limitation of the image segmentation method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 7, there is provided an image segmentation apparatus including:
the image acquisition module 10 is configured to acquire a first area image and a second area image of an image to be segmented, where the first area image includes global information of the image to be segmented, and the second area image includes local information of the image to be segmented;
a feature acquiring module 20, configured to acquire a first region feature of a first region image based on a first sub-network in the image segmentation model, and acquire a second region feature of a second region image based on a second sub-network in the image segmentation model;
the fusion module 30 is configured to fuse the first region feature and the second region feature to obtain a fused region feature, and determine an image segmentation result based on the fused region feature;
The fusion module 30 is further configured to transpose the first region feature, and multiply the transposed result with the second region feature to obtain an initial fusion feature;
normalizing the initial fusion feature, and multiplying the normalization result by the first region feature to obtain a fusion region feature;
the fusion module 30 is further configured to perform upsampling processing on the fusion region feature to obtain an image segmentation result;
the image segmentation device further comprises a training module;
the training module is used for acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation label information;
acquiring global sample characteristics of a global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample characteristics of a local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
determining a loss function based on a sample segmentation result and segmentation label information, and adjusting parameters of an initial image segmentation model based on the loss function to obtain an image segmentation model;
The training module is further used for determining a first loss function based on the prediction probability and the actual probability of each preset segmentation category;
the training module is also used for acquiring a mask vector corresponding to a preset area of the sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in an initial image segmentation model;
sample feature vectors corresponding to the preset areas are obtained based on the sample feature graphs, and sample mean value vectors corresponding to each preset segmentation category are determined based on the mask vectors and the sample feature vectors;
determining a second loss function based on a plurality of distance parameters between a target sample mean vector corresponding to a target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes;
the training module is also used for carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function;
the image segmentation apparatus further comprises a sample establishment module;
the sample establishing module is used for acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
And fusing the second sample image and the sub-region image to obtain a sample image.
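For illustration only, the cooperation of these modules can be sketched as a small PyTorch-style module. The layer choices, channel widths, class count and the use of softmax as the normalization are assumptions made for the sketch; the dilated ("hole") convolutions in the second sub-network follow claim 7 below, but the specific dilation rates are likewise assumed, and this is not the exact patented network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageSegmentationApparatus(nn.Module):
    """Illustrative two-branch sketch of the apparatus (not the exact patented network)."""

    def __init__(self, in_ch=3, feat_ch=64, num_classes=2):
        super().__init__()
        # First sub-network: plain convolutions over the first (global) region image.
        self.first_subnet = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Second sub-network: dilated ("hole") convolutions over the second (local)
        # region image; the dilation rates used here are assumptions.
        self.second_subnet = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(feat_ch, num_classes, 1)

    def fuse(self, first, second):
        # Transpose the first region features, multiply with the second region features,
        # normalize (softmax assumed), then multiply back with the first region features.
        b, c, h, w = first.shape            # assumes both feature maps share this shape
        f1 = first.flatten(2)               # (B, C, HW)
        f2 = second.flatten(2)              # (B, C, HW)
        initial = torch.bmm(f1.transpose(1, 2), f2)   # (B, HW, HW) initial fusion feature
        weights = torch.softmax(initial, dim=-1)      # normalization
        fused = torch.bmm(f1, weights)                # (B, C, HW) fused region features
        return fused.view(b, c, h, w)

    def forward(self, global_img, local_img):
        first_feat = self.first_subnet(global_img)    # first region features
        second_feat = self.second_subnet(local_img)   # second region features
        fused = self.fuse(first_feat, second_feat)
        logits = self.classifier(fused)
        # Upsample to the input resolution to obtain the image segmentation result.
        return F.interpolate(logits, size=global_img.shape[-2:],
                             mode="bilinear", align_corners=False)
```

Note that the initial fusion feature in this sketch is an HW×HW matrix, so the two feature maps need to be at a fairly coarse spatial resolution (or be further downsampled) for the product to remain tractable.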
The respective modules in the above-described image segmentation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication may be realized by WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements an image segmentation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, or keys, a trackball or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, and the processor performing the following steps when executing the computer program:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in the image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
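As a minimal sketch of the first of these steps, one way to obtain the two inputs is to resize the whole image for the first (global) region image and to crop a region of interest at native resolution for the second (local) region image; both choices, and the crop_box parameter below, are assumptions rather than a prescribed procedure.

```python
import torch
import torch.nn.functional as F

def acquire_region_images(image: torch.Tensor, crop_box, size=(512, 512)):
    """Derive a global and a local region image from the image to be segmented.

    image:    (C, H, W) tensor of the image to be segmented.
    crop_box: (top, left, height, width) of the local region (assumed parameter).
    """
    # First region image: the whole image resized, preserving global information.
    global_img = F.interpolate(image.unsqueeze(0), size=size,
                               mode="bilinear", align_corners=False)
    # Second region image: a crop at native resolution, preserving local detail.
    top, left, h, w = crop_box
    local_img = image[:, top:top + h, left:left + w].unsqueeze(0)
    local_img = F.interpolate(local_img, size=size,
                              mode="bilinear", align_corners=False)
    return global_img, local_img
```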
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation label information;
acquiring global sample characteristics of a global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample characteristics of a local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain the image segmentation model.
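One possible shape of this training procedure is the single update step below. The cross-entropy term stands in for the loss determined from the sample segmentation result and the segmentation label information; the optimizer and the handling of the refined two-term loss introduced in the following embodiments are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, global_img, local_img, label):
    """One illustrative parameter update for the initial image segmentation model.

    label: (B, H, W) tensor of per-pixel class indices (segmentation label information).
    """
    optimizer.zero_grad()
    logits = model(global_img, local_img)   # sample segmentation result, (B, K, H, W)
    loss = F.cross_entropy(logits, label)   # loss from result vs. label information
    loss.backward()
    optimizer.step()                        # adjust parameters of the initial model
    return loss.item()
```

Repeating such steps over the sample set yields the trained image segmentation model; the following embodiments refine the loss into two weighted terms.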
In one embodiment, the processor when executing the computer program further performs the steps of:
the first loss function is determined based on the predicted probability and the actual probability for each preset segmentation class.
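A minimal sketch of such a first loss reads it as a per-pixel cross-entropy between the predicted probability and the actual probability of each preset segmentation category; representing the actual probability as a one-hot encoding of the label is an assumption.

```python
import torch
import torch.nn.functional as F

def first_loss(logits, label, num_classes):
    """Illustrative first loss: cross-entropy between predicted and actual probabilities.

    logits: (B, K, H, W) class scores; label: (B, H, W) integer class indices.
    """
    pred_prob = F.log_softmax(logits, dim=1)                                  # predicted probability
    actual_prob = F.one_hot(label, num_classes).permute(0, 3, 1, 2).float()   # actual probability
    return -(actual_prob * pred_prob).sum(dim=1).mean()
```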
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining a mask vector corresponding to a preset area of a sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
Acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in an initial image segmentation model;
acquiring sample feature vectors corresponding to the preset area based on the sample feature map, and determining a sample mean vector corresponding to each preset segmentation category based on the mask vector and the sample feature vectors;
and determining a second loss function based on a plurality of distance parameters between the target sample mean vector corresponding to the target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
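The sketch below follows these steps for a single sample: per-class sample mean vectors are computed from the sample feature map and the mask vector, and distances from the target class mean to every class mean are turned into a loss. The margin-based form of the final line is an assumption, since this section only states that the second loss is determined from those distance parameters.

```python
import torch
import torch.nn.functional as F

def second_loss(feature_map, mask, target_class, num_classes, margin=1.0):
    """Illustrative second loss built from per-class sample mean vectors.

    feature_map: (C, H, W) sample feature map from a feature extraction channel.
    mask:        (H, W) mask vector holding a classification identifier per pixel.
    """
    c = feature_map.shape[0]
    feats = feature_map.flatten(1)            # (C, H*W) sample feature vectors
    labels = mask.flatten()                   # (H*W,) per-pixel classification identifiers

    means = []
    for k in range(num_classes):
        sel = feats[:, labels == k]           # features of pixels labelled with category k
        means.append(sel.mean(dim=1) if sel.numel()
                     else torch.zeros(c, device=feats.device))
    means = torch.stack(means)                # (K, C) sample mean vectors

    # Distance parameters between the target class mean and every class mean.
    dists = torch.norm(means - means[target_class], dim=1)      # (K,)
    other = torch.ones(num_classes, dtype=torch.bool, device=feats.device)
    other[target_class] = False
    # Assumed form: push the other class means at least `margin` away from the target mean.
    return F.relu(margin - dists[other]).mean()
```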
In one embodiment, the processor when executing the computer program further performs the steps of:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
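Reusing the two sketches above, the target loss is then just their weighted combination; the weights alpha and beta are assumed hyperparameters, not values given in the application.

```python
def target_loss(logits, label, feature_map, mask, target_class, num_classes,
                alpha=1.0, beta=0.1):
    """Weighted summation of the first and second loss sketches above (illustrative)."""
    return (alpha * first_loss(logits, label, num_classes)
            + beta * second_loss(feature_map, mask, target_class, num_classes))
```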
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
and fusing the second sample image and the sub-region image to obtain a sample image.
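This reads as a copy-and-paste style of sample construction: a sub-region cut from the first sample image is blended into the second. In the sketch below the sub-region is pasted back at the same coordinates and the blending is a hard replacement; both the box selection and the blending rule are assumptions.

```python
import torch

def build_sample_image(first_img: torch.Tensor, second_img: torch.Tensor,
                       box=(50, 50, 128, 128)):
    """Fuse a sub-region of the first sample image into the second sample image.

    first_img, second_img: (C, H, W) tensors of the same size.
    box: (top, left, height, width) of the sub-region (assumed selection rule).
    """
    top, left, h, w = box
    sub_region = first_img[:, top:top + h, left:left + w]   # sub-region image
    sample = second_img.clone()
    sample[:, top:top + h, left:left + w] = sub_region      # simple paste as the fusion
    return sample
```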
In one embodiment, the processor when executing the computer program further performs the steps of:
performing transposition processing on the first region features, and multiplying the transposition result by the second region features to obtain initial fusion features;
And carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fusion region feature.
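Written out as a standalone function (the same operation used inside the apparatus sketch earlier), with the intermediate shapes spelled out; softmax is assumed as the normalization, which this embodiment leaves unspecified.

```python
import torch

def fuse_features(first: torch.Tensor, second: torch.Tensor) -> torch.Tensor:
    """Fuse first and second region features of identical shape (B, C, H, W)."""
    b, c, h, w = first.shape
    f1 = first.flatten(2)                        # (B, C, HW)
    f2 = second.flatten(2)                       # (B, C, HW)
    # Transpose the first region features and multiply with the second region features.
    initial = torch.bmm(f1.transpose(1, 2), f2)  # (B, HW, HW) initial fusion feature
    # Normalize (softmax assumed) and multiply the result back with the first features.
    weights = torch.softmax(initial, dim=-1)
    fused = torch.bmm(f1, weights)               # (B, C, HW)
    return fused.view(b, c, h, w)                # fused region features
```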
In one embodiment, the processor when executing the computer program further performs the steps of:
and carrying out upsampling processing on the fusion region features to obtain an image segmentation result.
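A brief sketch of this final step: class logits derived from the fusion region features are upsampled to the original resolution and the per-pixel argmax gives the segmentation result. The 1x1 classifier and the bilinear mode are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def segmentation_result(fused: torch.Tensor, classifier: nn.Conv2d, out_size):
    """Upsample fused region features and take the per-pixel argmax as the result."""
    logits = classifier(fused)                                   # (B, K, h, w)
    logits = F.interpolate(logits, size=out_size,
                           mode="bilinear", align_corners=False)  # upsampling
    return logits.argmax(dim=1)                                   # (B, H, W) label map

# Example: segmentation_result(fused, nn.Conv2d(64, 2, 1), out_size=(512, 512))
```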
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in the image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation label information;
acquiring global sample characteristics of a global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample characteristics of a local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain the image segmentation model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the first loss function is determined based on the predicted probability and the actual probability for each preset segmentation class.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a mask vector corresponding to a preset area of a sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in the initial image segmentation model;
acquiring sample feature vectors corresponding to the preset area based on the sample feature map, and determining a sample mean vector corresponding to each preset segmentation category based on the mask vector and the sample feature vectors;
and determining a second loss function based on a plurality of distance parameters between the target sample mean vector corresponding to the target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
and fusing the second sample image and the sub-region image to obtain a sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing transposition processing on the first region features, and multiplying the transposition result by the second region features to obtain initial fusion features;
And carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fusion region feature.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out upsampling processing on the fusion region features to obtain an image segmentation result.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in the image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
and fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation label information;
acquiring global sample characteristics of a global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample characteristics of a local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain the image segmentation model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the first loss function is determined based on the predicted probability and the actual probability for each preset segmentation class.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a mask vector corresponding to a preset area of a sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
Acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in an initial image segmentation model;
acquiring sample feature vectors corresponding to the preset area based on the sample feature map, and determining a sample mean vector corresponding to each preset segmentation category based on the mask vector and the sample feature vectors;
and determining a second loss function based on a plurality of distance parameters between the target sample mean vector corresponding to the target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
and fusing the second sample image and the sub-region image to obtain a sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing transposition processing on the first region features, and multiplying the transposition result by the second region features to obtain initial fusion features;
And carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fusion region feature.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out upsampling processing on the fusion region features to obtain an image segmentation result.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and the computer program, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may be available in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, and the like, without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, the combinations should be considered to be within the scope of this specification.
The above embodiments represent only a few implementations of the present application, and their description is specific and detailed, but they are not therefore to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (13)

1. An image segmentation method, comprising:
acquiring a first region image and a second region image of an image to be segmented, wherein the first region image comprises global information of the image to be segmented, and the second region image comprises local information of the image to be segmented;
acquiring a first region feature of the first region image based on a first sub-network in an image segmentation model, and acquiring a second region feature of the second region image based on a second sub-network in the image segmentation model;
And fusing the first region features and the second region features to obtain fused region features, and determining an image segmentation result based on the fused region features.
2. The image segmentation method as set forth in claim 1, wherein the training method of the image segmentation model includes:
acquiring a global sample image and a local sample image corresponding to the sample image, wherein the sample image contains segmentation label information;
acquiring global sample features of the global sample image based on a first initial sub-network in an initial image segmentation model, and acquiring local sample features of the local sample image based on a second initial sub-network in the initial image segmentation model;
fusing the global sample features and the local sample features to obtain fused sample features, and determining a sample segmentation result based on the fused sample features;
and determining a loss function based on the sample segmentation result and the segmentation label information, and adjusting parameters of the initial image segmentation model based on the loss function to obtain an image segmentation model.
3. The image segmentation method as set forth in claim 2, wherein the segmentation label information includes a plurality of preset segmentation categories, and wherein determining a loss function based on the sample segmentation result and the segmentation label information comprises:
And determining a first loss function based on the predicted probability and the actual probability of each preset segmentation category.
4. The image segmentation method as set forth in claim 3, further comprising, after the determining the first loss function:
obtaining a mask vector corresponding to a preset area of the sample image, wherein the mask vector comprises a classification identifier corresponding to each pixel point;
acquiring a sample feature map, wherein the sample feature map is a feature map output by a feature extraction channel in the initial image segmentation model;
acquiring sample feature vectors corresponding to the preset area based on the sample feature map, and determining a sample mean vector corresponding to each preset segmentation category based on the mask vector and the sample feature vectors;
and determining a second loss function based on a plurality of distance parameters between a target sample mean vector corresponding to a target preset segmentation class and a plurality of sample mean vectors corresponding to a plurality of preset segmentation classes.
5. The image segmentation method as set forth in claim 4, further comprising, after the determining the second loss function:
and carrying out weighted summation on the first loss function and the second loss function to obtain a target loss function.
6. The image segmentation method according to claim 2, wherein before the global sample image and the local sample image corresponding to the sample image are acquired, further comprising:
acquiring a first sample image and a second sample image;
acquiring a sub-region image in the first sample image;
and fusing the second sample image and the sub-region image to obtain the sample image.
7. The image segmentation method according to claim 1, wherein the second sub-network is established based on a hole (dilated) convolution layer, and parameters of the hole convolution layer include a preset number of hole convolution kernels and a dilation rate corresponding to each hole convolution kernel.
8. The image segmentation method according to claim 1, wherein the fusing the first region feature and the second region feature to obtain a fused region feature includes:
performing transposition processing on the first region features, and multiplying a transposition result by the second region features to obtain initial fusion features;
and carrying out normalization processing on the initial fusion feature, and multiplying the normalization result by the first region feature to obtain the fused region feature.
9. The image segmentation method as set forth in claim 1, wherein the determining an image segmentation result based on the fused region features comprises:
and carrying out upsampling processing on the fused region features to obtain an image segmentation result.
10. An image segmentation apparatus, comprising:
the image acquisition module is used for acquiring a first region image and a second region image of an image to be segmented, wherein the first region image contains global information of the image to be segmented, and the second region image contains local information of the image to be segmented;
the feature acquisition module is used for acquiring first region features of the first region image based on a first sub-network in the image segmentation model and acquiring second region features of the second region image based on a second sub-network in the image segmentation model;
and the fusion module is used for fusing the first region features and the second region features to obtain fusion region features, and determining an image segmentation result based on the fusion region features.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN202211653434.2A 2022-12-21 2022-12-21 Image segmentation method, device, computer equipment and storage medium Pending CN116310308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211653434.2A CN116310308A (en) 2022-12-21 2022-12-21 Image segmentation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211653434.2A CN116310308A (en) 2022-12-21 2022-12-21 Image segmentation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116310308A true CN116310308A (en) 2023-06-23

Family

ID=86815606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211653434.2A Pending CN116310308A (en) 2022-12-21 2022-12-21 Image segmentation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116310308A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253156A (en) * 2023-11-17 2023-12-19 深圳元戎启行科技有限公司 Feature description extraction method, device, terminal and medium based on image segmentation
CN117253156B (en) * 2023-11-17 2024-03-29 深圳元戎启行科技有限公司 Feature description extraction method, device, terminal and medium based on image segmentation

Similar Documents

Publication Publication Date Title
CN111104962A (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN109934792B (en) Electronic device and control method thereof
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN111325271B (en) Image classification method and device
CN111860233B (en) SAR image complex building extraction method and system based on attention network selection
US11875424B2 (en) Point cloud data processing method and device, computer device, and storage medium
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
WO2023201924A1 (en) Object defect detection method and apparatus, and computer device and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN114387289B (en) Semantic segmentation method and device for three-dimensional point cloud of power transmission and distribution overhead line
CN115797781A (en) Crop identification method and device, computer equipment and storage medium
CN116310308A (en) Image segmentation method, device, computer equipment and storage medium
CN115082322A (en) Image processing method and device, and training method and device of image reconstruction model
CN116894974A (en) Image classification method, device, computer equipment and storage medium thereof
CN112183303A (en) Transformer equipment image classification method and device, computer equipment and medium
CN115147606B (en) Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
CN116051959A (en) Target detection method and device
CN115908363A (en) Tumor cell counting method, device, equipment and storage medium
CN114240949A (en) Cervical cell segmentation network training method, cervical cell segmentation method and cervical cell segmentation device
CN112785601B (en) Image segmentation method, system, medium and electronic terminal
CN116612474B (en) Object detection method, device, computer equipment and computer readable storage medium
CN116452702B (en) Information chart rapid design method, device, computer equipment and storage medium
CN116894802B (en) Image enhancement method, device, computer equipment and storage medium
CN116612287B (en) Image recognition method, device, computer equipment and storage medium
CN116229130A (en) Type identification method and device for blurred image, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination