CN116681888A - Intelligent image segmentation method and system - Google Patents

Intelligent image segmentation method and system

Info

Publication number
CN116681888A
CN116681888A
Authority
CN
China
Prior art keywords
feature
image segmentation
matrix
modules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310501518.2A
Other languages
Chinese (zh)
Inventor
Name withheld upon request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Chaojing Nanjing Technology Co ltd
Original Assignee
Zhongke Chaojing Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Chaojing Nanjing Technology Co ltd filed Critical Zhongke Chaojing Nanjing Technology Co ltd
Publication of CN116681888A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, in particular to an intelligent image segmentation method and system, and aims to solve the problem of improving image segmentation accuracy and segmentation efficiency at the same time. To this end, the method comprises acquiring an image to be segmented and inputting it into a trained image segmentation network to obtain an image segmentation result, wherein the image segmentation network comprises a plurality of downsampling modules, a plurality of position attention modules, a plurality of upsampling modules and a channel attention module. In this embodiment, the image segmentation result is obtained from the trained image segmentation network: in the encoding stage, the position attention modules extract the position information of specific features, highlighting the relative positions within features of different scales; in the decoding stage, the multi-scale features are processed and the channel attention module extracts multi-scale feature information, so that segmentation efficiency is improved while image segmentation accuracy is effectively improved.

Description

Intelligent image segmentation method and system
Technical Field
The invention relates to the technical field of image processing, in particular to an intelligent image segmentation method and system.
Background
At present, CT images are segmented mainly in three ways: manual annotation by physicians, segmentation by traditional algorithms, and deep-learning-based segmentation.
Manual annotation requires a great deal of time and effort from professional physicians and is gradually being abandoned because of its high time and economic cost.
Traditional algorithms perform edge detection and region delineation from information such as the gray values of the CT image, but their accuracy degrades when the color and texture of small organs are not distinctive, when organs are deformed, or when occasional scans are unclear, so they too are gradually being abandoned.
Deep learning is a statistically grounded method that, by learning from large amounts of data, achieves better generalization and robustness and therefore better segmentation accuracy; it is the segmentation approach currently on the rise.
Existing deep-learning methods mainly improve on the UNET network architecture, introducing a dual attention mechanism between UNET downsampling and upsampling. Other approaches introduce the RepVGG network architecture, using its branched structure and structural re-parameterization to reduce parameters and increase generalization capability.
These approaches, however, introduce the attention mechanism within a single-scale feature between downsampling and upsampling and do not consider information enhancement between features of different scales, resulting in low segmentation accuracy; meanwhile, introducing an attention mechanism between every downsampling and upsampling stage results in low segmentation efficiency.
Accordingly, there is a need in the art for a new solution to the above-mentioned problems.
Disclosure of Invention
The present invention is proposed to overcome the above drawbacks, providing an intelligent image segmentation method and system that solves, or at least partially solves, the technical problem of how to improve image segmentation accuracy and segmentation efficiency at the same time.
In a first aspect, an intelligent image segmentation method is provided, the method comprising:
acquiring an image to be segmented;
inputting the image to be segmented into a trained image segmentation network and obtaining an image segmentation result based on the following steps, wherein the image segmentation network comprises a plurality of downsampling modules, a plurality of position attention modules, a plurality of upsampling modules and a channel attention module:
performing downsampling operations on the image based on the plurality of downsampling modules, respectively, to obtain a plurality of first feature matrices of different scales;
performing a convolution operation and a position attention calculation on the plurality of first feature matrices based on the plurality of position attention modules, respectively, to obtain a plurality of second feature matrices of different scales;
performing upsampling operations on the plurality of second feature matrices based on the plurality of upsampling modules, respectively, to obtain a plurality of third feature matrices of different scales;
processing the plurality of third feature matrices to obtain a fourth feature matrix of a preset output size;
performing a channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module to obtain the image segmentation result.
In one aspect of the above intelligent image segmentation method, performing the convolution operation on the plurality of first feature matrices based on the plurality of position attention modules respectively comprises:
performing multiple convolution operations with convolution kernels of different sizes on all of the first feature matrices, respectively;
or performing multiple convolution operations with convolution kernels of different sizes on some of the first feature matrices and a single convolution operation on the other first feature matrices.
In one aspect of the above intelligent image segmentation method, performing the convolution operation and the position attention calculation on the plurality of first feature matrices based on the plurality of position attention modules respectively to obtain a plurality of second feature matrices of different scales comprises:
performing the convolution operation and the position attention calculation on the first feature matrices of different scales based on different position attention modules, respectively, to obtain corresponding position attention matrices;
adding each position attention matrix to the corresponding first feature matrix to obtain the corresponding plurality of second feature matrices.
In one aspect of the above intelligent image segmentation method, performing the position attention calculation on the plurality of first feature matrices based on the plurality of position attention modules comprises:
acquiring the scale feature matrices obtained by the convolution operation;
obtaining the corresponding position attention matrices based on the scale feature matrices obtained by the convolution operation.
In one aspect of the above intelligent image segmentation method, acquiring the scale feature matrices obtained by the convolution operation comprises:
when the convolution operation consists of multiple convolution operations with convolution kernels of different sizes performed on all of the first feature matrices, acquiring a plurality of scale feature matrices of different scales corresponding to each first feature matrix;
when the convolution operation consists of multiple convolution operations with convolution kernels of different sizes performed on some of the first feature matrices and a single convolution operation performed on the others, acquiring a plurality of scale feature matrices of different scales corresponding to those first feature matrices and a single scale feature matrix corresponding to each of the other first feature matrices.
In one aspect of the above intelligent image segmentation method, processing the plurality of third feature matrices to obtain a fourth feature matrix of a preset output size comprises:
storing the third feature matrix obtained by each upsampling operation to obtain a plurality of third feature matrices of different scales;
adjusting the plurality of third feature matrices to the preset output size to obtain a plurality of third feature matrices of the preset output size;
performing feature fusion on the plurality of third feature matrices of the preset output size to obtain the fourth feature matrix of the preset output size.
In one aspect of the above intelligent image segmentation method, performing the channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module comprises:
performing dimension compression on the fourth feature matrix based on a global average pooling operation;
predicting the weight of each channel based on the dimension-compressed fourth feature matrix and performing the channel attention calculation according to the weights to obtain a fifth feature matrix.
In one aspect of the above intelligent image segmentation method, the method further comprises:
performing a convolution operation on the fifth feature matrix to obtain the image segmentation result containing a predictive label matrix.
In a second aspect, an intelligent image segmentation system is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes adapted to be loaded and run by the processor to perform the intelligent image segmentation method according to any one of the above aspects.
In a third aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and run by a processor to perform the intelligent image segmentation method according to any one of the above aspects.
One or more of the above technical solutions of the present invention has at least one or more of the following beneficial effects:
In the technical solution of the present invention, an image to be segmented is acquired and input into a trained image segmentation network, and an image segmentation result is obtained based on the following steps, wherein the image segmentation network comprises a plurality of downsampling modules, a plurality of position attention modules, a plurality of upsampling modules and a channel attention module:
downsampling operations are performed on the image based on the plurality of downsampling modules, respectively, to obtain a plurality of first feature matrices of different scales; a convolution operation and a position attention calculation are performed on the first feature matrices based on the position attention modules to obtain second feature matrices of different scales; upsampling operations are performed on the second feature matrices based on the upsampling modules, respectively, to obtain third feature matrices of different scales; the plurality of third feature matrices are processed to obtain a fourth feature matrix of a preset output size; and a channel attention calculation is performed on the fourth feature matrix of the preset output size based on the channel attention module to obtain the image segmentation result.
In this way, the image segmentation result is obtained from the trained image segmentation network. In the encoding stage, the position attention modules extract the position information of specific features, so that the relative positions within features of different scales can be highlighted; in the decoding stage, the multi-scale features are processed and the channel attention module extracts the channel-dimension information of the multi-scale features. Compared with existing network structures that attach parallel attention mechanisms to the UNET network, this effectively improves image segmentation accuracy while also improving segmentation efficiency.
Drawings
The present disclosure will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: the drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. Wherein:
FIG. 1 is a flow chart illustrating the main steps of an intelligent image segmentation method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of the architecture of an image segmentation network according to one embodiment of the invention;
FIG. 3 is a diagram of image segmentation network training according to one embodiment of the present invention;
FIG. 4 is a flowchart of the main steps of performing convolution operation and position attention calculation on a plurality of first feature matrices based on a plurality of position attention modules respectively to obtain a plurality of second feature matrices with different scales according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the main steps of performing a position attention calculation on a plurality of first feature matrices based on a plurality of position attention modules, respectively, according to one embodiment of the present invention;
FIG. 6 is a flow chart illustrating the main steps of processing a third feature matrix of multiple scales to obtain a fourth feature matrix of a preset output size according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating the main steps of channel attention calculation for a fourth feature matrix of a preset output size based on a channel attention module according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of the main structure of an embodiment of an intelligent image segmentation system according to the present invention.
List of reference numerals:
801: a processor; 802: a storage device.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module," "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, or software components, such as program code, or a combination of software and hardware. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, and the like. The term "a and/or B" means all possible combinations of a and B, such as a alone, B alone or a and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone or A and B. The singular forms "a", "an" and "the" include plural referents.
Some terms related to the present invention will be explained first.
CNN: convolutional Neural Networks, a convolutional neural network, is a type of feedforward neural network (Feedforward Neural Networks) that includes convolutional calculations and has a deep structure, and is one of representative algorithms of deep learning. Convolutional neural networks have the ability to characterize learning (representation learning), and can perform a Shift-invariant classification on input information according to their hierarchical structure (Shift-invariant classification), and are therefore also referred to as Shift-invariant artificial neural networks (Shift-Invariant Artificial Neural Networks, SIANN)
UNet: the CNN network comprises 4 layers of downsampling, 4 layers of upsampling and similar jump connection structures, and is characterized in that the convolutional layers are completely symmetrical in downsampling and upsampling parts, and a feature map of a downsampling end can skip deep sampling and be spliced to a corresponding upsampling end. UNet was primarily used for semantic segmentation of medical images at the beginning of its presentation and was extended to semantic segmentation of 3-dimensional video data and super-resolution image generation in later application studies. UNet is a full convolution network with better universality, and some improved versions facing specific problems are also derived, such as HDsense-UNet constructed by introducing residual blocks at the downsampling end, unet++ containing deep supervision design and model pruning, and the like.
Encoder-decoder structure: a structure that generates an output sequence y from an input sequence x. Encoding converts the input sequence into a fixed-length vector; decoding converts that fixed vector back into an output sequence.
Downsampling: simple feature extraction by convolutional layers, also commonly referred to as decimation, i.e., extracting a small subset from a larger set. In the image field, downsampling shrinks the image, mainly so that it fits the size of the display area or to generate a thumbnail. For example, the pooling layers and convolutional layers in a CNN both downsample: convolution shrinks the image in the course of extracting features, while pooled downsampling reduces the dimensionality of the features.
Upsampling: after the input image has undergone feature extraction by downsampling, it needs to be restored to its original size so that pixels can be classified, which is known as semantic segmentation. Common upsampling methods are bilinear interpolation, deconvolution and unpooling.
Position attention: used to capture the spatial dependence between any two positions of the feature map; the feature at a particular position is updated by weighting the features at all positions, the weights being the feature similarities between the corresponding pairs of positions. Thus any two positions with similar features reinforce each other regardless of the distance between them.
Channel attention: used to learn the importance of each channel; the spatial dimensions of the feature map are compressed, and the importance of each channel is then learned in the channel dimension.
Convolution: in image processing, a convolution operation applies a convolution kernel to each pixel of an image. The convolution kernel, also called a mask, is a matrix of parameters operated against the original image. It is typically a square grid (e.g., a 3x3 matrix or pixel area) with a weight for each cell. To compute a convolution, the center of the kernel is placed on the pixel to be calculated, the product of each kernel element with the image pixel it covers is computed and summed, and the result is the new pixel value at that position.
Pooling: a method for compressing images whose significance lies in feature dimensionality reduction; pooling greatly reduces the consumption of computing resources and helps reduce model overfitting. The idea derives from aggregate statistics of image features: intuitively, pooling blurs an image without affecting its recognition or the judgment of positions within it. Pooling also has the advantage of translation invariance: if an object is translated a small amount in the image (not exceeding the receptive field), the displacement does not affect the pooling result and thus does not affect the model's feature map extraction.
Linear activation function: a special "activation" that applies no nonlinear function at all, directly multiplying the input by a weight coefficient and taking the result as the output. It is the simplest activation function and simplifies the construction of neural networks in computers, but it cannot perform nonlinear processing, so it does not perform as well as nonlinear activation functions.
Softmax layer: a fully connected layer used to map the multiple neuron outputs computed by the convolutional neural network into the (0, 1) interval, giving the probability of each class.
Sigmoid function: an S-shaped growth curve common in biology. In information science, because it is monotonically increasing and has a monotonically increasing inverse, the Sigmoid function is often used as an activation function of neural networks to map variables into the (0, 1) interval.
As described in the background, existing deep-learning methods introduce the attention mechanism within a single-scale feature between downsampling and upsampling and do not consider information enhancement between features of different scales, so segmentation accuracy is low; and because an attention mechanism is introduced between every downsampling and upsampling stage, segmentation efficiency is also low.
To solve these problems, the present invention provides an intelligent image segmentation method and system.
Referring to fig. 1, fig. 1 is a schematic flow chart of the main steps of an intelligent image segmentation method according to an embodiment of the present invention. As shown in fig. 1, the intelligent image segmentation method in the embodiment of the present invention mainly includes the following steps S101 to S107.
Step S101: acquire an image to be segmented.
Step S102: input the image to be segmented into a trained image segmentation network and obtain an image segmentation result based on the following steps.
The image segmentation network comprises a plurality of downsampling modules, a plurality of position attention modules, a plurality of upsampling modules and a channel attention module.
Step S103: perform downsampling operations on the image based on the plurality of downsampling modules, respectively, to obtain a plurality of first feature matrices of different scales.
Step S104: perform a convolution operation and a position attention calculation on the plurality of first feature matrices based on the plurality of position attention modules, respectively, to obtain a plurality of second feature matrices of different scales.
Step S105: perform upsampling operations on the plurality of second feature matrices based on the plurality of upsampling modules, respectively, to obtain a plurality of third feature matrices of different scales.
Step S106: process the plurality of third feature matrices to obtain a fourth feature matrix of a preset output size.
Step S107: perform a channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module to obtain an image segmentation result.
Based on the method described in steps S101 to S107, an image segmentation result can be obtained from a trained image segmentation network. In the encoding stage, the position attention modules extract the position information of specific features, so that the relative positions within features of different scales can be highlighted; in the decoding stage, the multi-scale features are processed and the channel attention module extracts the channel-dimension information of the multi-scale features. Compared with existing network structures that attach parallel attention mechanisms to the UNET network, this effectively improves image segmentation accuracy while also improving segmentation efficiency.
The above steps S101 to S107 are further described below.
In some embodiments of step S101, the image to be segmented is a single-modality grayscale image, such as computed tomography (CT), single-photon emission computed tomography (SPECT), ultrasound (US) or magnetic resonance imaging (MRI) in medical imaging.
Specifically, taking a CT image as an example, step S101 includes the following steps S1011 to S1012.
Step S1011: perform a CT scan of the case with medical equipment to acquire an initial DCM imaging file.
A DCM file is a file conforming to the Digital Imaging and Communications in Medicine (DICOM) standard, a digital imaging and communications standard capable of storing various kinds of image information. The DICOM standard supports a variety of medical devices, including electrocardiography, magnetic resonance imaging, angioscopy and echocardiography, so DCM files are widely used in the medical industry.
Step S1012: preprocess the DCM imaging file.
Specifically, this comprises the following steps:
(1) Parsing the DCM imaging file into a TIF image.
The DCM file is parsed into a TIF image because it cannot be used directly for neural network training.
A TIF image (also called a TIFF image), i.e., Tag Image File Format, is an image file format commonly used in graphic image processing. The format is quite complex, but because it stores image information flexibly, supports many color systems and is independent of the operating system, it is widely used.
(2) Performing gray-level truncation on the TIF image.
Gray-level truncation better highlights organ colors, textures and boundaries.
(3) Performing a normalization operation on the image.
The normalization operation transforms the image into a fixed standard form through a series of standard processing transformations.
In general, image normalization comprises 4 steps: coordinate centering, x-shearing normalization, scaling normalization and rotation normalization. Normalization makes images resistant to geometric-transformation attacks, since it finds the invariants in the images and thus recognizes images that are identical or belong to the same series.
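For illustration only, the preprocessing of steps S1011 to S1012 can be sketched in Python as follows; the Hounsfield window bounds and file paths are assumptions chosen for illustration, not values fixed by this description.

```python
import numpy as np
import pydicom
from PIL import Image

def preprocess_dcm(dcm_path, tif_path, hu_min=-200.0, hu_max=300.0):
    """Parse a DCM file, truncate gray levels, normalize, save as TIF."""
    ds = pydicom.dcmread(dcm_path)
    # Convert stored pixel values to Hounsfield units (slope/intercept are
    # standard CT DICOM attributes; some files may omit them).
    hu = ds.pixel_array.astype(np.float32)
    hu = hu * float(getattr(ds, "RescaleSlope", 1)) + float(getattr(ds, "RescaleIntercept", 0))
    hu = np.clip(hu, hu_min, hu_max)            # gray-level truncation
    norm = (hu - hu_min) / (hu_max - hu_min)    # normalization to [0, 1]
    Image.fromarray((norm * 255).astype(np.uint8)).save(tif_path, format="TIFF")
    return norm
```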
Through the above steps S1011 to S1012, an image to be segmented can be acquired. It should be noted that the above examples of the image to be segmented and of the acquisition method are only illustrative; the image to be segmented may be any single-modality grayscale image, and in practical applications a person skilled in the art may select a corresponding acquisition method according to the specific scenario, which is not limited herein.
The above is a further explanation of step S101, and the following further explanation of step S102 is continued.
In some embodiments of step S102, the image segmentation network may be obtained by modifying an existing CNN, for example by adding a position attention module between a downsampling module and an upsampling module in the UNET encoder-decoder structure and adding a channel attention module after the upsampling modules.
It should be noted that although the UNET network is taken as an example, the intelligent image segmentation method provided by the present invention may also be applied to any CNN with an encoder-decoder structure, which is not limited herein.
In some implementations, referring to fig. 2, fig. 2 is a schematic diagram of an image segmentation network according to one embodiment of the invention. As shown in fig. 2, the image segmentation network includes four downsampling modules, four position attention modules, four upsampling modules, and one channel attention module.
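For orientation only, one possible reading of this composition is sketched below in PyTorch; the channel widths, class count and sequential encoder wiring are assumptions, and the attention blocks are left as placeholders whose internals are sketched with the corresponding steps later in this description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSegNet(nn.Module):
    """Sketch of fig. 2: four downsampling modules, four position attention
    modules, four upsampling modules and one channel attention module."""

    def __init__(self, in_ch=1, n_classes=14, widths=(64, 128, 256, 512)):
        super().__init__()
        chans = [in_ch] + list(widths)
        self.down = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                          nn.MaxPool2d(2))
            for i in range(4))
        # Position attention internals are sketched with step S104 below.
        self.attn = nn.ModuleList(nn.Identity() for _ in range(4))
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(w, w, 3, stride=2, padding=1, output_padding=1)
            for w in widths)
        self.chan_attn = nn.Identity()   # sketched with step S107 below
        self.head = nn.Conv2d(sum(widths), n_classes, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        thirds, feat = [], x
        for down, attn, up in zip(self.down, self.attn, self.up):
            feat = down(feat)              # first feature matrix (one scale)
            second = attn(feat)            # second feature matrix
            third = up(second)             # third feature matrix
            thirds.append(F.interpolate(third, size=(h, w), mode="bilinear",
                                        align_corners=False))
        fourth = torch.cat(thirds, dim=1)  # fourth matrix, preset output size
        fifth = self.chan_attn(fourth)
        return self.head(fifth)            # predictive label matrix logits
```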
Further referring to fig. 3, fig. 3 is a schematic diagram of an image segmentation network training according to an embodiment of the present invention.
In some embodiments, a method of training the image segmentation network generally comprises the following steps:
(1) Acquiring original images and preprocessing them to obtain a training set.
Taking CT images as an example, the details of acquiring and preprocessing the original images are as described in steps S1011 to S1012 above and are not repeated here.
(2) Converting the training set into Tensors and inputting them into the image segmentation network.
A tensor is a multilinear map defined on a Cartesian product of vector spaces and dual spaces; in an n-dimensional space its components are functions of the coordinates, and under a coordinate transformation the components transform linearly according to certain rules. In deep learning, a Tensor is in practice a multidimensional array, the purpose of which is to create higher-dimensional matrices and vectors.
(3) Training the downsampling modules, the position attention modules, the upsampling modules and the channel attention module in the image segmentation network based on the training set.
(4) Performing an error function calculation on the image segmentation result based on the original classification label matrix, and updating the image segmentation network and back-propagating based on the calculation result.
The original classification label matrix is formed by annotating the original images with classification labels (for example, labeling CT images with organ labels of different categories) and then converting them, through one-hot encoding, into a matrix with the same format as the image segmentation result output by the image segmentation network, so that the error function between the predicted classification label matrices of the different categories output by the network and the original classification labels can be conveniently calculated.
Further, the image segmentation network is updated and back-propagated based on the error function calculation results, including the weight parameters, gradients and the like.
(5) When the error function of the image segmentation network converges to a preset error value, training of the image segmentation network is complete.
The preset error value is a stable low value; a person skilled in the art can set it according to the actual application scenario, which is not limited herein.
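A minimal sketch of such a training loop follows; the loss function, optimizer, learning rate, class count and convergence threshold are all assumptions not fixed by this description, while the one-hot conversion mirrors step (4) above.

```python
import torch
import torch.nn.functional as F

def train(net, loader, n_classes=14, lr=1e-4, eps=1e-3, max_epochs=100):
    """Train until the error function settles below a preset error value."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for epoch in range(max_epochs):
        total = 0.0
        for img, labels in loader:          # labels: integer class map (B, H, W)
            one_hot = F.one_hot(labels.long(), n_classes).permute(0, 3, 1, 2).float()
            logits = net(img)               # predicted label matrix (B, C, H, W)
            loss = F.binary_cross_entropy_with_logits(logits, one_hot)
            opt.zero_grad()
            loss.backward()                 # back-propagation
            opt.step()                      # update weight parameters
            total += loss.item()
        if total / len(loader) < eps:       # converged to the preset error value
            break
```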
Further, after the image segmentation network is trained, the image to be segmented can be input into the trained image segmentation network, and an image segmentation result is obtained.
The above is a further explanation of step S102, and the following further explanation of step S103 is continued.
In some embodiments of step S103 described above, the downsampling module is a multi-scale encoder structure consisting of a convolutional layer, a pooling layer and a linear activation function.
In some embodiments, feature extraction is performed on the image by the downsampling operations of the four downsampling modules shown in fig. 2, yielding four first feature matrices of different scales.
Feature matrices of different scales represent feature matrices with different fields of view, where scale includes the depth, width and number of channels.
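A minimal sketch of one such encoder block, read literally from the description above (convolutional layer, linear activation, pooling layer); the kernel size and channel counts are assumptions, and a nonlinear activation such as ReLU would be the more common choice in practice.

```python
import torch.nn as nn

class DownsamplingModule(nn.Module):
    """Encoder block: convolution, linear activation (identity), pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.Identity()        # linear activation, as described above
        self.pool = nn.MaxPool2d(2)     # halves height and width

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))
```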
The above is a further explanation of step S103, and the following further explanation of step S104 is continued.
In some embodiments of step S104 described above, the plurality of position attention modules includes multi-scale position attention modules and/or single-scale position attention modules.
Specifically, referring to fig. 4, fig. 4 is a schematic flow chart of main steps of performing convolution operation and position attention calculation on a plurality of first feature matrices based on a plurality of position attention modules to obtain a plurality of second feature matrices with different scales according to an embodiment of the present invention.
As shown in fig. 4, step S104 mainly includes the following steps S1041 to S1042:
Step S1041: perform the convolution operation and the position attention calculation on the first feature matrices of different scales based on different position attention modules, respectively, to obtain corresponding position attention matrices.
In some embodiments, performing the convolution operation on the plurality of first feature matrices based on the plurality of position attention modules mainly comprises:
performing multiple convolution operations with convolution kernels of different sizes on all of the first feature matrices, in which case all of the position attention modules are multi-scale position attention modules;
or performing multiple convolution operations with convolution kernels of different sizes on some of the first feature matrices and a single convolution operation on the other first feature matrices, in which case the position attention modules include both multi-scale and single-scale position attention modules.
Specifically, convolution operations with convolution kernels of different sizes can be performed on all four first feature matrices based on four multi-scale position attention modules; or multi-kernel convolution operations can be performed only on the first of the first feature matrices based on one multi-scale position attention module, with single convolution operations performed on the remaining three based on three single-scale position attention modules; or multi-kernel convolution operations can be performed on the first two first feature matrices based on two multi-scale position attention modules, with single convolution operations performed on the last two based on two single-scale position attention modules.
It should be noted that comparison data from ablation experiments show that the last configuration, which applies two multi-scale position attention modules to the first two first feature matrices and two single-scale position attention modules to the last two, performs best.
Further, in some implementations, referring to fig. 5, fig. 5 is a flowchart illustrating the main steps of performing the position attention calculation on a plurality of first feature matrices based on a plurality of position attention modules, respectively, according to an embodiment of the present invention. As shown in fig. 5, this mainly includes the following steps S501 to S502:
Step S501: acquire the scale feature matrices obtained by the convolution operation.
In some embodiments, when the convolution operation consists of multiple convolution operations with convolution kernels of different sizes performed on all of the first feature matrices, a plurality of scale feature matrices of different scales is obtained for each first feature matrix.
Specifically, the four first feature matrices obtained in step S103 may be denoted A1, A2, A3 and A4, and convolution operations with convolution kernels of different sizes are performed on each of A1, A2, A3 and A4 based on the four multi-scale position attention modules.
For example, convolution operations with 3x3, 5x5 and 7x7 kernels are performed on each of A1, A2, A3 and A4 based on four multi-scale position attention modules, so three scale feature matrices are obtained for each of them. Taking A1 as an example, the three scale feature matrices are denoted B3, B5 and B7; three 1x1 convolution operations are then performed on each of B3, B5 and B7 to extract the features Ci, Di and Ei, where i is the kernel size of the scale feature. That is, three 1x1 convolutions on B3 extract the feature matrices C3, D3 and E3; three 1x1 convolutions on B5 extract C5, D5 and E5; and three 1x1 convolutions on B7 extract C7, D7 and E7.
When the convolution operation consists of multiple convolution operations with convolution kernels of different sizes performed on some of the first feature matrices and a single convolution operation performed on the others, a plurality of scale feature matrices of different scales is obtained for those first feature matrices, and a single scale feature matrix is obtained for each of the other first feature matrices.
Specifically, multi-kernel convolution operations can be performed on A1 and A2 based on the two multi-scale position attention modules, and a single convolution operation can be performed on A3 and A4 based on the two single-scale position attention modules.
For example, convolution operations with 3x3, 5x5 and 7x7 kernels are performed on A1 and A2 based on two multi-scale position attention modules, yielding three scale feature matrices for each of A1 and A2; taking A1 as an example, these are denoted B3, B5 and B7, and three 1x1 convolutions on each of B3, B5 and B7 extract C3, D3, E3; C5, D5, E5; and C7, D7, E7, as above.
A single convolution operation is performed on each of A3 and A4 based on the two single-scale position attention modules to obtain a single scale feature matrix B3', and three 1x1 convolutions on B3' then extract the feature matrices C3', D3' and E3'.
The above is a description of step S501.
Step S502: obtain the corresponding position attention matrices based on the scale feature matrices obtained by the convolution operation.
Taking as an example the case in which multi-kernel convolution operations are performed on A1 and A2 based on two multi-scale position attention modules and a single convolution operation is performed on A3 and A4 based on two single-scale position attention modules: for each of the three scale feature matrices B3, B5 and B7 of A1, the extracted feature matrices Ci and Di are multiplied to obtain the position correlation, the result is passed through a softmax layer, and the outcome is multiplied by the feature matrix Ei to obtain the position attention matrix.
The above is a description of step S1041.
Step S1042: add each position attention matrix to the corresponding first feature matrix to obtain the corresponding plurality of second feature matrices.
Specifically, each first feature matrix is added to its corresponding position attention matrix to obtain a position-enhanced feature matrix, denoted G.
Continuing the example above, for each of A1 and A2 three position-enhanced features G3, G5 and G7 are obtained; a 1x1 convolution then performs multi-scale fusion of G3, G5 and G7 to obtain a second feature matrix of the corresponding scale for each of A1 and A2. For A3 and A4, a position-enhanced second feature matrix is obtained directly. Four second feature matrices of different scales are thus obtained from A1, A2, A3 and A4.
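One possible realization of steps S501, S502 and S1042 is sketched below, in the style of DANet-like position attention; the channel reduction ratio and the exact reshaping are assumptions, since the description specifies only the C/D/E extraction, the softmax-weighted position correlation, the residual addition and the 1x1 multi-scale fusion.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position attention at one scale: Bk = k x k conv of the input; three
    1x1 convs extract C, D, E; softmax(C . D) weights E; the result is added
    back to the input as the position-enhanced feature G."""
    def __init__(self, ch, k):
        super().__init__()
        self.scale_conv = nn.Conv2d(ch, ch, k, padding=k // 2)   # Bk
        self.c = nn.Conv2d(ch, ch // 8, 1)   # reduction ratio assumed
        self.d = nn.Conv2d(ch, ch // 8, 1)
        self.e = nn.Conv2d(ch, ch, 1)

    def forward(self, a):
        b, ch, h, w = a.shape
        bk = self.scale_conv(a)
        c = self.c(bk).flatten(2)                             # (B, C/8, N)
        d = self.d(bk).flatten(2)                             # (B, C/8, N)
        e = self.e(bk).flatten(2)                             # (B, C,   N)
        corr = torch.softmax(c.transpose(1, 2) @ d, dim=-1)   # position correlation
        attn = (e @ corr).view(b, ch, h, w)                   # position attention matrix
        return a + attn                                       # enhanced feature G

class MultiScalePositionAttention(nn.Module):
    """Apply position attention with 3x3, 5x5 and 7x7 kernels, then fuse
    G3, G5 and G7 with a 1x1 convolution (step S1042)."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(PositionAttention(ch, k) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, a):
        return self.fuse(torch.cat([br(a) for br in self.branches], dim=1))
```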
The above is a further explanation of step S104, and the following further explanation of step S105 is continued.
In some embodiments of step S105 described above, the upsampling module may be a multi-scale decoder structure consisting of a deconvolution layer, a bilinear interpolation layer or an unpooling layer.
Because the deconvolution operation has better feature expression capability, in some embodiments the upsampling module uses a multi-scale decoder structure consisting of deconvolution layers.
Specifically, the plurality of second feature matrices are fed into the corresponding upsampling modules and upsampled by deconvolution with 3x3 convolution kernels, yielding a plurality of third feature matrices of different scales.
It should be noted that the above configuration of the upsampling module is only illustrative; in practical applications a person skilled in the art may configure the upsampling module according to the specific scenario, which is not limited herein.
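A sketch of one such upsampling module using a 3x3 transposed convolution (deconvolution); the stride and padding are assumptions chosen to give an exact 2x spatial upsampling.

```python
import torch.nn as nn

class UpsamplingModule(nn.Module):
    """Decoder block: a 3x3 deconvolution that doubles height and width."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3,
                                         stride=2, padding=1, output_padding=1)

    def forward(self, x):
        return self.deconv(x)
```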
The above is a further explanation of step S105, and step S106 is further explained below.
In some implementations of step S106, referring to fig. 6, fig. 6 is a schematic flow chart of the main steps of processing the third feature matrices of multiple scales to obtain a fourth feature matrix of a preset output size according to an embodiment of the present invention. As shown in fig. 6, step S106 mainly includes the following steps S601 to S603:
Step S601: store the third feature matrix obtained by each upsampling operation to obtain third feature matrices of multiple scales.
Step S602: adjust the plurality of third feature matrices to the preset output size to obtain a plurality of third feature matrices of the preset output size.
In some embodiments, the preset output size is the same as the size of the image to be segmented that is input into the network, i.e., the third feature matrices of different scales are all adjusted to the size of the image to be segmented.
In other embodiments, the preset output size may differ from the size of the image to be segmented, which is not limited herein.
Step S603: perform feature fusion on the plurality of third feature matrices of the preset output size to obtain a fourth feature matrix of the preset output size.
Specifically, the same-size third feature matrices obtained in step S602 may be concatenated (spliced) to obtain the fourth feature matrix of the preset output size.
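Steps S601 to S603 can be sketched as follows; bilinear resizing is an assumption, since the description only requires adjusting all third feature matrices to the preset output size before concatenation.

```python
import torch
import torch.nn.functional as F

def fuse_to_output_size(thirds, out_hw):
    """Resize each stored third feature matrix to the preset output size and
    concatenate (splice) them along the channel dimension to form the fourth
    feature matrix."""
    resized = [F.interpolate(t, size=out_hw, mode="bilinear", align_corners=False)
               for t in thirds]
    return torch.cat(resized, dim=1)
```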
The above is a further explanation of step S106, and the following further explanation of step S107 is continued.
In some implementations of step S107 described above, referring to fig. 7, fig. 7 is a flowchart illustrating the main steps of performing the channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module according to an embodiment of the present invention. As shown in fig. 7, this mainly includes the following steps S701 to S702:
Step S701: perform dimension compression on the fourth feature matrix based on a global average pooling operation.
Specifically, the scale of the fourth feature matrix is H×W×C (H, W and C denote height, width and number of channels, respectively), and dimension compression reduces this scale from H×W×C to 1×1×C, i.e., H×W to 1×1, which is achieved by the global average pooling operation.
The global average pooling operation averages over the whole feature matrix, compressing the scale from H×W×C to 1×1×C and yielding feature matrices of the same size but different channel dimensions.
Step S702: predict the weight of each channel based on the dimension-compressed fourth feature matrix, and perform the channel attention calculation according to the weights to obtain a fifth feature matrix.
In some embodiments, the 1×1×C fourth feature matrix obtained by the compression is fed into a fully connected layer that predicts the importance of each channel, i.e., assigns a weight to each channel dimension; a sigmoid activation function in the channel attention module then performs the channel attention calculation on the fourth feature matrix to obtain the weighted fifth feature matrix.
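A sketch of the channel attention module as described: global average pooling compresses H×W×C to 1×1×C, a fully connected layer predicts the per-channel weights, and a sigmoid activation produces the weighting applied to the fourth feature matrix.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                   # HxWxC -> 1x1xC
        self.fc = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())

    def forward(self, fourth):
        b, c, _, _ = fourth.shape
        weights = self.fc(self.pool(fourth).view(b, c)).view(b, c, 1, 1)
        return fourth * weights                               # fifth feature matrix
```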
The above is a further explanation of step S107.
In some embodiments, after completing step S107, the method further includes:
and carrying out convolution operation on the fifth feature matrix to obtain an image segmentation result containing the predictive label matrix.
For example, an image to be segmented of an input image segmentation network is a CT image, and a prediction tag matrix containing organ category numbers is obtained through one convolution.
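Sketching this final step: a 1x1 convolution maps the fifth feature matrix to one channel per organ category, and an argmax yields the predictive label matrix; the channel and class counts below are assumptions.

```python
import torch
import torch.nn as nn

head = nn.Conv2d(960, 14, kernel_size=1)   # 960 fused channels, 14 classes: assumed

def predict_labels(fifth: torch.Tensor) -> torch.Tensor:
    logits = head(fifth)                   # (B, n_classes, H, W)
    return logits.argmax(dim=1)            # (B, H, W) organ category numbers
```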
The method provided by the present invention uses the multi-scale dual-attention image segmentation network to enhance the information in shallow CT image features, better improving both the feature recognition accuracy for small organs and the efficiency of image segmentation. The approach can therefore also be transferred to other segmentation networks with general encoder-decoder structures.
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
It will be appreciated by those skilled in the art that the present invention may implement all or part of the above-described methods according to the above-described embodiments, or may be implemented by means of a computer program for instructing relevant hardware, where the computer program may be stored in a computer readable storage medium, and where the computer program may implement the steps of the above-described embodiments of the method when executed by a processor. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable storage medium may include: any entity or device, medium, usb disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory, random access memory, electrical carrier wave signals, telecommunications signals, software distribution media, and the like capable of carrying the computer program code.
Further, the invention also provides an intelligent image segmentation system. Referring to fig. 8, fig. 8 is a schematic diagram of the main structure of an embodiment of an intelligent image segmentation system according to the present invention. As shown in fig. 8, the intelligent image segmentation system in the embodiment of the present invention mainly includes a processor 801 and a storage device 802; the storage device 802 may be configured to store a program for executing the intelligent image segmentation method of the above method embodiment, and the processor 801 may be configured to execute the program in the storage device 802, including but not limited to the program for executing the intelligent image segmentation method of the above method embodiment. For convenience of explanation, only those portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method portions of the embodiments of the present invention.
In some possible embodiments of the invention, the intelligent image segmentation system may comprise a plurality of processors 801 and a plurality of storage devices 802, and the program for performing the intelligent image segmentation method of the above method embodiment may be divided into a plurality of sub-programs, each of which may be loaded and executed by a processor 801 to perform different steps of the intelligent image segmentation method. Specifically, each sub-program may be stored in a different storage device 802, and each processor 801 may be configured to execute the programs in one or more storage devices 802, so that the processors 801, each executing different steps, jointly implement the intelligent image segmentation method of the above method embodiment.
Further, the invention also provides a computer readable storage medium. In one embodiment of a computer-readable storage medium according to the present invention, the computer-readable storage medium may be configured to store a program for performing the intelligent image segmentation method of the above-described method embodiment, which may be loaded and executed by a processor to implement the intelligent image segmentation method described above. For convenience of explanation, only those portions of the embodiments of the present invention that are relevant to the embodiments of the present invention are shown, and specific technical details are not disclosed, please refer to the method portions of the embodiments of the present invention. The computer readable storage medium may be a storage device including various electronic devices, and optionally, the computer readable storage medium in the embodiments of the present invention is a non-transitory computer readable storage medium.
Thus far, the technical solution of the present invention has been described with reference to the embodiments shown in the drawings, but it will be readily appreciated by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent modifications or substitutions to the related technical features without departing from the principles of the present invention, and the technical solutions obtained by such modifications or substitutions will fall within the scope of protection of the present invention.

Claims (10)

1. An intelligent image segmentation method, characterized in that the method comprises the following steps:
acquiring an image to be segmented;
inputting the image to be segmented into a trained image segmentation network and obtaining an image segmentation result based on the following steps, wherein the image segmentation network comprises a plurality of downsampling modules, a plurality of position attention modules, a plurality of upsampling modules and a channel attention module:
performing downsampling operations on the image based on the plurality of downsampling modules, respectively, to obtain a plurality of first feature matrices with different scales;
performing convolution operations and position attention calculations on the plurality of first feature matrices based on the plurality of position attention modules, respectively, to obtain a plurality of second feature matrices with different scales;
performing upsampling operations on the plurality of second feature matrices based on the plurality of upsampling modules, respectively, to obtain a plurality of third feature matrices with different scales;
processing the plurality of third feature matrices to obtain a fourth feature matrix of a preset output size;
and performing a channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module to obtain the image segmentation result.
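By way of illustration only, the data flow recited in claim 1 may be sketched in PyTorch as follows. The module names, channel widths, number of scales, and pooling/interpolation choices below are hypothetical editorial choices, not limitations of the claim; the position attention and channel attention blocks are stubbed with identity placeholders here and sketched separately after claims 3 and 7.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationSkeleton(nn.Module):
    """Claim 1 data flow: downsample -> conv + (position) attention per scale ->
    upsample to a preset output size -> fuse -> (channel) attention -> result."""

    def __init__(self, in_ch=3, num_classes=2, widths=(16, 32, 64)):
        super().__init__()
        # One plain conv per scale; the position and channel attention modules
        # are stubbed with nn.Identity here and sketched after claims 3 and 7.
        self.convs = nn.ModuleList(nn.Conv2d(in_ch, w, 3, padding=1) for w in widths)
        self.pos_attn = nn.ModuleList(nn.Identity() for _ in widths)  # placeholder
        self.chan_attn = nn.Identity()                                # placeholder
        self.head = nn.Conv2d(sum(widths), num_classes, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        thirds = []
        for i, (conv, attn) in enumerate(zip(self.convs, self.pos_attn)):
            # First feature matrices: progressively downsampled views of the input.
            first = F.avg_pool2d(x, 2 ** i) if i > 0 else x
            # Second feature matrices: convolution plus (position) attention.
            second = attn(conv(first))
            # Third feature matrices: upsampled back to the preset output size.
            thirds.append(F.interpolate(second, size=(h, w), mode="bilinear",
                                        align_corners=False))
        # Fourth feature matrix: the resized scales fused by concatenation.
        fourth = torch.cat(thirds, dim=1)
        # Channel attention, then a final convolution, yields the result.
        return self.head(self.chan_attn(fourth))

out = SegmentationSkeleton()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64])
```

Any encoder-decoder realization that wires the four recited module types in this order would fit the same skeleton.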
2. The intelligent image segmentation method according to claim 1, wherein the performing convolution operations on the plurality of first feature matrices based on the plurality of position attention modules respectively comprises:
performing, on all of the first feature matrices, a plurality of convolution operations with convolution kernels of different sizes;
or performing, on some of the first feature matrices, a plurality of convolution operations with convolution kernels of different sizes, and performing a single convolution operation on the other first feature matrices.
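For illustration, the two convolution branches of this claim might look as follows; the kernel sizes (1, 3, 5), channel counts, and "same" padding are assumptions by the editor. The two branches also correspond to the two cases distinguished in claim 5 below.

```python
import torch
import torch.nn as nn

first_a = torch.randn(1, 32, 40, 40)  # a first feature matrix convolved several times
first_b = torch.randn(1, 32, 20, 20)  # a first feature matrix convolved once

# Branch 1: several convolutions with different kernel sizes on the same matrix;
# "same" padding keeps every resulting scale feature matrix at the input resolution.
multi = [nn.Conv2d(32, 32, k, padding=k // 2)(first_a) for k in (1, 3, 5)]

# Branch 2: a single convolution for the remaining first feature matrices.
single = nn.Conv2d(32, 32, 3, padding=1)(first_b)

print([tuple(t.shape) for t in multi], tuple(single.shape))
```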
3. The intelligent image segmentation method according to claim 2, wherein the performing convolution operations and position attention calculations on the plurality of first feature matrices based on the plurality of position attention modules respectively, to obtain a plurality of second feature matrices with different scales, comprises:
performing, based on different position attention modules, the convolution operations and the position attention calculations on the first feature matrices of different scales respectively, to obtain corresponding position attention matrices;
and adding each position attention matrix to the corresponding first feature matrix to obtain the corresponding plurality of second feature matrices.
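A minimal sketch of one way to realize such a position attention module is given below, following the common non-local / DANet-style query-key-value pattern. That pattern, the 1x1 projections, and the channel-reduction factor are assumptions by the editor, since the claim itself does not fix the attention formula; only the residual addition of the position attention matrix to the first feature matrix is recited.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)  # query projection (assumed)
        self.k = nn.Conv2d(ch, ch // 8, 1)  # key projection (assumed)
        self.v = nn.Conv2d(ch, ch, 1)       # value projection (assumed)

    def forward(self, scale_feat, first_feat):
        b, c, h, w = scale_feat.shape
        q = self.q(scale_feat).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.k(scale_feat).flatten(2)                   # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)                 # (b, hw, hw) position weights
        v = self.v(scale_feat).flatten(2)                   # (b, c, hw)
        pos = (v @ attn.transpose(1, 2)).view(b, c, h, w)   # position attention matrix
        # Claim 3: add the position attention matrix to the corresponding
        # first feature matrix to obtain the second feature matrix.
        return first_feat + pos

first = torch.randn(1, 32, 24, 24)
scale = torch.randn(1, 32, 24, 24)  # e.g. the output of one convolution (claim 4)
second = PositionAttention(32)(scale, first)
print(second.shape)  # torch.Size([1, 32, 24, 24])
```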
4. The intelligent image segmentation method according to claim 3, wherein the performing position attention calculations on the plurality of first feature matrices based on the plurality of position attention modules respectively comprises:
acquiring the scale feature matrices obtained by the convolution operations;
and obtaining the corresponding position attention matrices based on the scale feature matrices obtained by the convolution operations.
5. The intelligent image segmentation method according to claim 4, wherein the acquiring the scale feature matrices obtained by the convolution operations comprises:
when the convolution operations are a plurality of convolution operations with convolution kernels of different sizes performed on all of the first feature matrices, acquiring a plurality of scale feature matrices of different scales corresponding to each first feature matrix;
and when the convolution operations are a plurality of convolution operations performed on some of the first feature matrices and a single convolution operation performed on the other first feature matrices, acquiring a plurality of scale feature matrices of different scales corresponding to the some of the first feature matrices, and acquiring a single scale feature matrix corresponding to each of the other first feature matrices.
6. The intelligent image segmentation method according to claim 1, wherein the processing the plurality of third feature matrices to obtain a fourth feature matrix of a preset output size comprises:
storing the third feature matrix obtained by each upsampling operation, to obtain the plurality of third feature matrices with different scales;
resizing the plurality of third feature matrices to the preset output size, to obtain a plurality of third feature matrices of the preset output size;
and performing feature fusion on the plurality of third feature matrices of the preset output size to obtain the fourth feature matrix of the preset output size.
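As a sketch, the resize-and-fuse step of this claim might be realized as follows; bilinear interpolation and fusion by concatenation followed by a 1x1 convolution are assumed choices, and element-wise summation would be an equally plausible fusion under the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Third feature matrices stored after each upsampling operation (sizes assumed).
thirds = [torch.randn(1, 16, s, s) for s in (64, 32, 16)]
preset = (64, 64)  # the preset output size

# Resize every third feature matrix to the preset output size.
resized = [F.interpolate(t, size=preset, mode="bilinear", align_corners=False)
           for t in thirds]

# Feature fusion: concatenate along channels, then mix with a 1x1 convolution
# to obtain the fourth feature matrix of the preset output size.
fourth = nn.Conv2d(16 * len(resized), 16, 1)(torch.cat(resized, dim=1))
print(fourth.shape)  # torch.Size([1, 16, 64, 64])
```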
7. The intelligent image segmentation method according to claim 1, wherein the performing the channel attention calculation on the fourth feature matrix of the preset output size based on the channel attention module comprises:
performing dimension compression on the fourth feature matrix based on a global average pooling operation;
and predicting a weight for each channel based on the dimension-compressed fourth feature matrix, and performing the channel attention calculation according to the weights to obtain a fifth feature matrix.
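A minimal sketch of such a channel attention step, following the squeeze-and-excitation pattern, is given below; the two-layer weight-prediction network, the reduction factor, and the sigmoid gating are assumptions by the editor, while the global-average-pooling compression and per-channel reweighting follow the claim.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        # Small network that predicts one weight per channel (assumed shape).
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, fourth):
        b, c, _, _ = fourth.shape
        squeezed = fourth.mean(dim=(2, 3))             # dimension compression via GAP
        weights = self.mlp(squeezed).view(b, c, 1, 1)  # predicted per-channel weights
        return fourth * weights                        # fifth feature matrix

fourth = torch.randn(1, 16, 64, 64)
fifth = ChannelAttention(16)(fourth)
print(fifth.shape)  # torch.Size([1, 16, 64, 64])
```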
8. The intelligent image segmentation method according to claim 7, further comprising:
performing a convolution operation on the fifth feature matrix to obtain the image segmentation result containing a predictive label matrix.
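For illustration, this final step might look as follows; the class count, the 1x1 kernel, and the argmax used to read off the predictive label matrix are assumed details.

```python
import torch
import torch.nn as nn

fifth = torch.randn(1, 16, 64, 64)
scores = nn.Conv2d(16, 3, 1)(fifth)  # per-class scores (3 hypothetical classes)
labels = scores.argmax(dim=1)        # predictive label matrix: one label per pixel
print(labels.shape, labels.dtype)    # torch.Size([1, 64, 64]) torch.int64
```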
9. An intelligent image segmentation system comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by the processor to perform the intelligent image segmentation method according to any one of claims 1 to 8.
10. A computer-readable storage medium having a plurality of program codes stored therein, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the intelligent image segmentation method according to any one of claims 1 to 8.
CN202310501518.2A 2023-04-28 2023-05-06 Intelligent image segmentation method and system Pending CN116681888A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310480767 2023-04-28
CN2023104807678 2023-04-28

Publications (1)

Publication Number Publication Date
CN116681888A true CN116681888A (en) 2023-09-01

Family

ID=87782687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310501518.2A Pending CN116681888A (en) 2023-04-28 2023-05-06 Intelligent image segmentation method and system

Country Status (1)

Country Link
CN (1) CN116681888A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745745A (en) * 2024-02-18 2024-03-22 湖南大学 CT image segmentation method based on context fusion perception
CN117745745B (en) * 2024-02-18 2024-05-10 湖南大学 CT image segmentation method based on context fusion perception
CN118397430A (en) * 2024-06-24 2024-07-26 广东电网有限责任公司东莞供电局 Insulator infrared image segmentation and temperature extraction method and system based on Repvgg-Unet model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination