CN114266726A - Medical image segmentation method, system, terminal and storage medium

Publication number: CN114266726A
Application number: CN202111387233.8A
Authority: CN (China)
Prior art keywords: modal, image, superpixel, neural network, node
Legal status: Pending
Original language: Chinese (zh)
Inventors: 乐美琰, 秦文健, 谢耀钦
Assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2021-11-22; Filing date: 2021-11-22; Publication date: 2022-04-01

Classifications

  • Image Analysis (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The present application relates to a medical image segmentation method, system, terminal and storage medium. The method comprises the following steps: performing superpixel extraction on a first modal image and a second modal image; constructing a first modal graph neural network and a second modal graph neural network with the superpixels of the first modal image and the second modal image as nodes respectively, the two networks calculating an adjacency matrix between each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, and calculating a feature value of each superpixel in the current mode according to the adjacency matrix; and constructing a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, the multi-modal graph neural network performing multi-modal information fusion on the different modal images, with the first modal image then segmented according to the fused feature map. The invention requires no fine registration of the multi-modal images in advance and improves the segmentation accuracy of medical images.

Description

Medical image segmentation method, system, terminal and storage medium
Technical Field
The present application relates to the field of medical image processing technologies, and in particular, to a medical image segmentation method, system, terminal, and storage medium.
Background
In recent years, the development of artificial intelligence technology in the field of image segmentation has brought new opportunities for the automatic delineation of CT (Computed Tomography) images. Because MR (Magnetic Resonance) images have high soft-tissue contrast, doctors segmenting CT images often delineate the target region in the CT image with reference to the corresponding MR image.
Because CT and MR images are not themselves registered, prior-art solutions require domain alignment of the two images before image segmentation. Current domain alignment methods fall mainly into spatial-domain alignment and gray-domain alignment. Spatial-domain alignment registers the MR image onto the CT image using classical iterative optimization algorithms or deep-learning methods such as VoxelMorph. Once the spatial domains of the MR and CT images are aligned, the multi-modal images are input into a deep neural network as a multi-channel tensor, and the multi-modal information is then adaptively fused by the convolution modules, attention modules, and the like inside the network. This approach can effectively extract the features in each modality that benefit target-region segmentation, thereby improving segmentation accuracy. Its disadvantages are that the segmentation result depends on the registration result; in practice it is difficult for a fixed optimization objective to achieve precise global and local alignment simultaneously, so features of different kinds are blended during the later multi-modal information fusion, confusing the classifier and reducing segmentation accuracy; in addition, classical registration algorithms take a long time, which slows down segmentation.
Gray-domain alignment transforms the CT image into an MR-style image by image generation and then segments the MR-like image. The image generated this way is virtual and lacks physical meaning. In addition, even when the overall image style is close to MR, the contrast between the target segmentation region and the surrounding tissue is difficult to improve substantially. More seriously, this image generation approach carries a certain risk of changing the tissue microstructure, which also reduces segmentation accuracy.
Disclosure of Invention
The present application provides a medical image segmentation method, system, terminal and storage medium, aiming to solve, at least to some extent, one of the above technical problems in the prior art.
In order to solve the above problems, the present application provides the following technical solutions:
a medical image segmentation method, comprising:
acquiring a first modal image and a second modal image, and performing superpixel extraction on the first modal image and the second modal image by using a simple linear iterative clustering algorithm;
respectively constructing a first modal graph neural network and a second modal graph neural network with the superpixels of the first modal image and the second modal image as nodes, the first modal graph neural network and the second modal graph neural network calculating an adjacency matrix between each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, and calculating a feature value of each superpixel in the current mode according to the adjacency matrix;
constructing a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, wherein the multi-modal graph neural network constructs an adjacency matrix for each cross-modal node according to the position of the cross-modal node in the different modal images and its connection pattern with K neighborhood nodes;
acquiring the K neighboring nodes of each cross-modal node in the other modality according to the cross-modal adjacency matrix, and fusing the feature values of the K neighboring nodes of each cross-modal node in the other modality with the feature value of the corresponding superpixel in the current modality to obtain a feature-fused multi-modal feature map;
and performing image segmentation on the multi-modal feature map through a segmentation network.
The technical solution adopted in the embodiment of the present application further includes: the first modal image is a CT image, and the second modal image is an MR image.
The technical solution adopted in the embodiment of the present application further includes: the first modal graph neural network and the second modal graph neural network calculate the adjacency matrix of each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, specifically:

$$W_{k,j} = f_w\left(\left[\Delta P_{k,j} \,\|\, \Delta G_{k,j}\right]\right), \quad j \in \mathcal{N}(k)$$

wherein $W_{k,j}$ denotes the adjacency of the k-th superpixel to the j-th superpixel in its neighborhood, $\Delta P_{k,j}$ and $\Delta G_{k,j}$ respectively denote the center-position distance and gray-mean difference between the k-th and j-th superpixels, $[\cdot\,\|\,\cdot]$ denotes vector concatenation, $f_w$ denotes the mapping from the position-gray difference to the adjacency matrix, and $\mathcal{N}(k)$ denotes the neighborhood of superpixel node k.
The technical solution adopted in the embodiment of the present application further includes: the calculation of the feature value of each superpixel in the current mode according to the adjacency matrix is specifically:

extracting a feature vector of each superpixel from the feature map output by the network convolution layer;

determining the K neighborhood nodes of each superpixel in the current mode through the adjacency matrix, and weighting and summing the feature vectors of the K neighborhood nodes to obtain the new feature of each superpixel in the current mode;

adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain the feature value of each superpixel in the current mode; the feature vector $f_{i,k}$ of the k-th superpixel at the i-th scale is calculated as:

$$f_{i,k} = \frac{1}{|S_k|}\sum_{p \in S_k} F_{i,p}$$

$$f'_{i,k} = \sum_{j \in \mathcal{N}(k)} W_{k,j}\, f_{i,j}$$

wherein $S_k$ denotes the k-th superpixel and $F_{i,p}$ denotes the feature vector of pixel point p within the k-th superpixel at the i-th scale; weighting $f_{i,j}$ by the adjacency matrix $W_{k,j}$ yields the new feature $f'_{i,k}$ of the k-th superpixel at the i-th scale.
The technical solution adopted in the embodiment of the present application further includes: the multi-modal graph neural network constructs the adjacency matrix of each cross-modal node according to its position in the different modal images and its connection pattern with the K neighborhood nodes, specifically:

$$W^{\mathrm{cross}}_{k,j} = f_{\mathrm{cross}}\left(\Delta P_{k,j},\ \mathrm{sim}(W_{k,*}, W_{j,*})\right)$$

wherein $W_{k,*}$ denotes the connection pattern of the k-th node with its neighboring nodes in the first modal image, $\mathrm{sim}(W_{k,*}, W_{j,*})$ denotes the connection-pattern similarity between the k-th node in the first modal image and the j-th node in the second modal image, and $f_{\mathrm{cross}}$ denotes the mapping from the position difference and the connection-pattern similarity to the cross-modal adjacency matrix.
The technical solution adopted in the embodiment of the present application further includes: the fusing of the feature values of the K neighboring nodes of each cross-modal node in the other modality with the feature value of the corresponding superpixel in the current modality is specifically:

weighting and summing the feature vectors of the K neighboring nodes of each cross-modal node in the other modality to obtain the new feature of each cross-modal node in the other modality, and adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain the fused multi-modal feature map;

the new feature $f'^{\mathrm{cross}}_{i,k}$ of the k-th superpixel at the i-th scale in the other modality and the feature value $F'_{i,p}$ of pixel point p within the k-th superpixel in the feature map at the i-th scale are obtained as:

$$f'^{\mathrm{cross}}_{i,k} = \sum_{j} W^{\mathrm{cross}}_{k,j}\, f^{\mathrm{cross}}_{i,j}$$

$$F'_{i,p} = F_{i,p} + f'_{i,k} + f'^{\mathrm{cross}}_{i,k}, \quad p \in S_k$$

wherein $f_{i,j}$ and $f^{\mathrm{cross}}_{i,j}$ respectively denote the feature vectors of the j-th neighboring superpixel of the k-th superpixel in the current modal image and the other modal image at the i-th scale, $W_{k,j}$ and $W^{\mathrm{cross}}_{k,j}$ respectively denote the adjacency values of the j-th neighboring superpixel of the k-th superpixel in the current modal image and the other modal image, and $F_{i,p}$ denotes the feature vector at pixel point p within the k-th superpixel at the i-th scale.
The technical solution adopted in the embodiment of the present application further includes: the image segmentation of the multi-modal feature map through the segmentation network is specifically:

the segmentation network decodes the multi-modal feature map, up-samples it to the size of the first modal image, and outputs the segmentation result of the region of interest in the first modal image.
Another technical solution adopted in the embodiment of the present application is a medical image segmentation system, comprising:

a superpixel extraction module: configured to acquire a first modal image and a second modal image, and to perform superpixel extraction on the first modal image and the second modal image using a simple linear iterative clustering algorithm;

a single-modal network construction module: configured to construct a first modal graph neural network and a second modal graph neural network with the superpixels of the first modal image and the second modal image as nodes respectively, the two networks calculating the adjacency matrix of each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, and calculating the feature value of each superpixel in the current mode according to the adjacency matrix;

a multi-modal network construction module: configured to construct a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, the multi-modal graph neural network constructing the adjacency matrix of each cross-modal node according to its position in the different modal images and its connection pattern with K neighborhood nodes;

a feature fusion module: configured to acquire the K neighboring nodes of each cross-modal node in the other modality according to the cross-modal adjacency matrix, and to fuse the feature values of those K neighboring nodes with the feature value of the corresponding superpixel in the current modality to obtain a feature-fused multi-modal feature map;

an image segmentation module: configured to perform image segmentation on the multi-modal feature map through a segmentation network.
Another technical solution adopted in the embodiment of the present application is a terminal comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the medical image segmentation method;
the processor is configured to execute the program instructions stored in the memory to control medical image segmentation.
Another technical solution adopted in the embodiment of the present application is a storage medium storing program instructions executable by a processor, the program instructions being used to perform the medical image segmentation method.
Compared with the prior art, the embodiment of the present application has the following beneficial effects: the medical image segmentation method, system, terminal and storage medium divide the images of different modalities into a plurality of superpixels and use graph neural networks to establish, for each superpixel in the images of different modalities, its correspondence with the neighborhood superpixels in the current modality and in the other modality; multi-modal information fusion is thereby performed across the modal images, alignment in the spatial domain is achieved, the information-fusion problem of unregistered multi-modal medical images is solved, and the overall segmentation speed is increased. The method requires no fine registration of the multi-modal images in advance, avoids the segmentation error and long runtime introduced by registration, and makes the feature fusion more adaptive, thereby improving the segmentation accuracy of medical images.
Drawings
Fig. 1 is a flowchart of a medical image segmentation method of an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a medical image segmentation system according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Please refer to fig. 1, which is a flowchart illustrating a medical image segmentation method according to an embodiment of the present application. The medical image segmentation method comprises the following steps:
s1: acquiring a first modal image and a second modal image, and respectively performing superpixel extraction on the first modal image and the second modal image by using a simple linear iterative clustering algorithm (SLIC);
in this step, the first modality image and the second modality image are a CT image and an MR image, respectively, and may specifically be medical images of other modalities such as DR.
S2: taking the superpixels of the first modal image and the second modal image as nodes to construct a first modal graph neural network and a second modal graph neural network respectively, extracting through these networks the adjacency relation between each node and its neighborhood nodes in each mode, and calculating the adjacency matrix of each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference;
in this step, the first modal graph neural network and the second modal graph neural network are single modal segmentation networks, and the single modal segmentation network can extract the adjacency relation between each superpixel (node) and its neighborhood nodes in the image of the corresponding modality, and calculate the adjacency matrix of each superpixel by weighting the center position and the gray average value of the superpixel. The process of computing the adjacency matrix W for a superpixel is as follows:
Figure BDA0003367470740000081
wherein Δ Pk,jAnd Δ Gk,jRespectively representing the difference between the central position and the gray average value of the kth super pixel and the jth super pixel [. gt|. gtC]Representing a vector connection, fwRepresenting the mapping relation between the position gray difference and the adjacent matrix,
Figure BDA0003367470740000082
w is obtained by calculation and represents the neighborhood of a super pixel node kk,jIndicating the adjacency of the kth super-pixel with the jth super-pixel in the neighborhood.
S3: extracting a feature vector for each superpixel from the feature map output by the network convolution layer, determining the K neighborhood nodes of each superpixel in the current mode through the adjacency matrix, weighting and summing the feature vectors of the K neighborhood nodes to obtain the new feature of each superpixel in the current mode, and adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain the feature value of each superpixel in the current mode;
in this step, the feature values of all the pixel points in the superpixel refer to the feature of each pixel position in one superpixel, the superpixel index graph is scaled to the size of the feature graph, and then the feature values of all the pixel points in the current superpixel are obtained from the feature graph according to the index. Feature map F of hypothetical graph neural network at ith scaleiThe size is C x H W, the super pixel index map is required to be sampled to H x W, the feature mean value of all pixel points in each super pixel is calculated, and finally a C-dimensional feature vector of each super pixel is obtained. The k-th super-pixel S at the i-th scale can be determined by the adjacency matrixkK neighborhood nodes in the current mode, for superpixel SkThe characteristics of K neighborhood nodes are weighted and summed to obtain a super pixel SkNew feature f 'at Current Modal'i,kThen adding the new feature to the super-pixel SkAnd finishing the feature extraction of the superpixel under the single mode on the feature values of all the internal pixel points. Specifically, the feature vector f of the kth super pixel at the ith scalei,kThe calculation process of (2) is as follows:
Figure BDA0003367470740000091
Figure BDA0003367470740000092
in the formula (2), SkDenotes the kth super pixel, Fi,pAnd representing the characteristic vector of the pixel point p in the kth super pixel under the ith scale. Using the adjacency matrix W to f in equation (3)i,jWeighting to obtain a new characteristic f 'of the kth super pixel at the ith scale'i,k
The above calculation is performed in a plurality of scales, and feature extraction of each super pixel in a single mode is completed.
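The pooling and neighborhood aggregation of formulas (2) and (3) can be sketched as follows, under the shape conventions above (feature map C×H×W, superpixel index map resampled to H×W); the helper names and tensor layout are assumptions:

```python
# Sketch of equations (2)-(3): per-superpixel mean pooling of the CNN feature
# map, then weighted aggregation over each superpixel's K neighborhood nodes.
import torch

def superpixel_features(feat: torch.Tensor, labels: torch.Tensor, n_sp: int) -> torch.Tensor:
    """Equation (2): mean of pixel features F_{i,p} over each superpixel S_k."""
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)                       # (C, H*W)
    idx = labels.reshape(-1)                         # (H*W,) int64 superpixel ids
    sums = torch.zeros(n_sp, C).index_add_(0, idx, flat.t())
    counts = torch.bincount(idx, minlength=n_sp).clamp(min=1).unsqueeze(1)
    return sums / counts                             # (n_sp, C) = f_{i,k}

def aggregate_neighbors(f: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Equation (3): f'_{i,k} = sum_j W_{k,j} f_{i,j} over neighborhood nodes."""
    # W: (n_sp, n_sp) adjacency, nonzero only on each node's K neighbors.
    return W @ f
```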
S4: constructing a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, comparing the connection pattern of each cross-modal node with its K neighborhood nodes across the modal images, and determining the adjacency matrix of each cross-modal node in the multi-modal graph neural network according to the node positions and connection patterns;
in this step, because the multi-modal image has a large gap in the grayscale domain, there is a large risk in directly using the grayscale value or the feature map in the network to calculate the adjacency matrix across the modal nodes, and because the overall tissue organ structure in the multi-modal image is unchanged, the connection modes of the corresponding superpixels and the respective neighborhood nodes are relatively stable, the embodiment of the present application determines the adjacency matrix across the modal nodes by combining the positions and connection modes of the superpixels in the two modal images. For example, a connection weight vector of a super pixel p and neighboring super pixels in the first modality image may represent a connection mode of the super pixel p in the first modality image, and the adjacency matrix of the super pixel p is constructed by finding out the super pixel in the second modality image which is close to the coordinate position of the super pixel p and is similar to the connection mode of the super pixel p in the first modality image (the connection mode similarity is higher than a set threshold value). Specifically, the adjacency matrix construction formula is as follows:
Figure BDA0003367470740000101
wherein Wk,*Represents the connection pattern of the kth node with its neighboring nodes in the first modality image, sim (W)k,*,Wj,*) Representing the connection mode similarity of the kth node in the first modality image and the jth node in the second modality image, fcrossAnd representing the mapping relation among the position difference, the connection mode similarity and the cross-mode adjacency matrix.
Based on the above, the embodiment of the present application compares the connection pattern between each superpixel and its K neighborhood nodes in the different modal images and constructs the cross-modal adjacency matrix from those connection patterns, providing a solution for the gray-domain and feature-domain gaps between different modalities.
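The rule of formula (4) could be prototyped as below; the cosine similarity, the distance and similarity thresholds, and the use of fixed-length K-neighbor weight vectors as "connection patterns" are all illustrative assumptions:

```python
# Sketch of equation (4): a cross-modal pair (k, j) gets a nonzero adjacency
# only if the two superpixels are close in image coordinates and their
# connection patterns (weight vectors to their K neighbors) are similar.
import torch
import torch.nn.functional as F

def cross_modal_adjacency(centers_a, centers_b, pat_a, pat_b,
                          dist_thresh: float = 20.0, sim_thresh: float = 0.8):
    # centers_*: (N, 2) superpixel center coordinates per modality
    # pat_*:     (N, K) each row is a node's weights to its K neighborhood
    #            nodes, assumed ordered consistently so rows are comparable
    dist = torch.cdist(centers_a, centers_b)                         # (Na, Nb)
    sim = F.normalize(pat_a, dim=1) @ F.normalize(pat_b, dim=1).t()  # cosine
    mask = (dist < dist_thresh) & (sim > sim_thresh)
    return torch.where(mask, sim, torch.zeros_like(sim))             # W^cross
```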
S5: acquiring the K neighboring nodes of each cross-modal node in the other modality through the cross-modal adjacency matrix, weighting and summing the feature vectors of those K neighboring nodes to obtain the new feature of each cross-modal node in the other modality, and adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain a structure-preserving multi-modal feature map fused with the other modality's features;
in this step, the new features of the kth super-pixel in the ith scale in other modes
Figure BDA0003367470740000102
And the characteristic value of the pixel point p in the kth super pixel in the characteristic graph under the ith scale
Figure BDA0003367470740000103
The acquisition formula is as follows:
Figure BDA0003367470740000104
Figure BDA0003367470740000105
wherein
Figure BDA0003367470740000106
And
Figure BDA0003367470740000107
respectively representing the characteristic vectors of the jth neighbor superpixel of the kth superpixel in the current mode image and other mode images under the ith scale, Wk,jAnd
Figure BDA0003367470740000108
respectively representing the adjacency value of the jth neighboring superpixel of the kth superpixel in the current modality image and other modality images, Fi,pAnd representing the feature vector on the pixel point p in the kth super pixel under the ith scale.
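A sketch of formulas (5) and (6), reusing the shapes from the earlier sketches; the broadcasting scheme for writing per-superpixel vectors back onto pixels is an assumption:

```python
# Sketch of equations (5)-(6): gather features from K cross-modal neighbors,
# then add both the same-modality aggregate f'_{i,k} and the cross-modal
# aggregate onto every pixel of the corresponding superpixel.
import torch

def fuse_cross_modal(feat, labels, f_same, W_cross, f_other):
    # feat:    (C, H, W) current-modality feature map F_{i,p}
    # labels:  (H, W) superpixel index map at the feature-map resolution
    # f_same:  (Na, C) same-modality aggregates f'_{i,k} from equation (3)
    # W_cross: (Na, Nb) cross-modal adjacency; f_other: (Nb, C) other-modality
    #          superpixel features
    f_cross = W_cross @ f_other                # equation (5): (Na, C)
    add = (f_same + f_cross)[labels]           # (H, W, C) per-pixel lookup
    return feat + add.permute(2, 0, 1)         # equation (6): fused (C, H, W)
```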
S6: decoding the multi-scale multi-modal feature maps through the segmentation network, up-sampling the multi-modal feature map step by step to the size of the first modal image, and outputting the segmentation result of the region of interest in the first modal image;
in the step, the multi-modal feature map is decoded step by step through the segmentation network, and the multi-modal feature map is returned to the size of the first modal image, so that the region-of-interest segmentation result of the first modal image can be obtained. The segmentation network comprises a plurality of convolution layers and deconvolution layers, and long-range jump connection is added in a decoding stage of each scale to combine with a feature map of an encoding stage so as to increase detail information of a segmentation result. And finally, when the size of the decoded characteristic diagram is consistent with that of the first modal image, performing feedback adjustment on the network parameters by calculating the cross entropy loss of the characteristic diagram and the gold standard image to obtain the optimal network parameters.
Based on the above, the medical image segmentation method of the embodiment of the present application divides the images of different modalities into a plurality of superpixels and uses graph neural networks to establish, for each superpixel in the images of different modalities, its correspondence with the neighborhood superpixels in the current modality and in the other modality; multi-modal information fusion is thereby performed across the modal images, alignment in the spatial domain is achieved, the information-fusion problem of unregistered multi-modal medical images is solved, and the overall segmentation speed is increased. The method requires no fine registration of the multi-modal images in advance, avoids the segmentation error and long runtime introduced by registration, and makes the feature fusion more adaptive, thereby improving the segmentation accuracy of medical images.
Please refer to fig. 2, which is a schematic structural diagram of a medical image segmentation system according to an embodiment of the present application. The medical image segmentation system 40 of the embodiment of the present application includes:
the super pixel extraction module 41: the super-pixel extraction method comprises the steps of obtaining a first modal image and a second modal image, and performing super-pixel extraction on the first modal image and the second modal image by using a simple linear iterative clustering algorithm;
the single-modality network construction module 42: the first modal graph neural network and the second modal graph neural network are used for respectively constructing a first modal graph neural network and a second modal graph neural network according to the superpixels of the first modal image and the second modal image as nodes, the first modal graph neural network and the second modal graph neural network calculate an adjacent matrix of each superpixel and a neighborhood superpixel according to the central position distance and the gray mean value difference, and calculate the characteristic value of each superpixel in the current mode according to the adjacent matrix;
the multimodal network construction module 43: the multi-modal graph neural network is used for constructing a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, and the multi-modal graph neural network is used for constructing an adjacency matrix of each cross-modal node according to the position of each cross-modal node in different modal images and the connection mode of K neighborhood nodes;
the feature fusion module 44: the system comprises a multi-modal feature map, a node B and a node C, wherein the multi-modal feature map is used for acquiring K adjacent nodes of each cross-modal node in other modes according to an adjacency matrix of the cross-modal node, and fusing the feature values of the K adjacent nodes of each cross-modal node in other modes with the feature values of corresponding super pixels in the current mode to obtain the feature fused multi-modal feature map;
the image segmentation module 45: the method is used for carrying out image segmentation on the multi-modal feature map through a segmentation network.
Please refer to fig. 3, which is a schematic diagram of a terminal structure according to an embodiment of the present application. The terminal 50 comprises a processor 51, a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the medical image segmentation method described above.
The processor 51 is configured to execute the program instructions stored in the memory 52 to control medical image segmentation.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
Please refer to fig. 4, which is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all of the methods described above. The program file 61 may be stored in the storage medium in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, or a tablet.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A medical image segmentation method, comprising:
acquiring a first modal image and a second modal image, and performing superpixel extraction on the first modal image and the second modal image by using a simple linear iterative clustering algorithm;
respectively constructing a first modal graph neural network and a second modal graph neural network with the superpixels of the first modal image and the second modal image as nodes, the first modal graph neural network and the second modal graph neural network calculating an adjacency matrix between each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, and calculating a feature value of each superpixel in the current mode according to the adjacency matrix;
constructing a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, wherein the multi-modal graph neural network constructs an adjacency matrix for each cross-modal node according to the position of the cross-modal node in the different modal images and its connection pattern with K neighborhood nodes;
acquiring the K neighboring nodes of each cross-modal node in the other modality according to the cross-modal adjacency matrix, and fusing the feature values of the K neighboring nodes of each cross-modal node in the other modality with the feature value of the corresponding superpixel in the current modality to obtain a feature-fused multi-modal feature map;
and performing image segmentation on the multi-modal feature map through a segmentation network.
2. A medical image segmentation method according to claim 1, characterized in that the first modality image is a CT image and the second modality image is an MR image.
3. The medical image segmentation method according to claim 2, wherein the first modal graph neural network and the second modal graph neural network calculate the adjacency matrix of each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference by:

$$W_{k,j} = f_w\left(\left[\Delta P_{k,j} \,\|\, \Delta G_{k,j}\right]\right), \quad j \in \mathcal{N}(k)$$

wherein $W_{k,j}$ denotes the adjacency of the k-th superpixel to the j-th superpixel in its neighborhood, $\Delta P_{k,j}$ and $\Delta G_{k,j}$ respectively denote the center-position distance and gray-mean difference between the k-th and j-th superpixels, $[\cdot\,\|\,\cdot]$ denotes vector concatenation, $f_w$ denotes the mapping from the position-gray difference to the adjacency matrix, and $\mathcal{N}(k)$ denotes the neighborhood of superpixel node k.
4. The medical image segmentation method according to claim 3, wherein the calculating of the feature value of each superpixel in the current modality according to the adjacency matrix is specifically:

extracting a feature vector of each superpixel from the feature map output by the network convolution layer;

determining the K neighborhood nodes of each superpixel in the current mode through the adjacency matrix, and weighting and summing the feature vectors of the K neighborhood nodes to obtain the new feature of each superpixel in the current mode;

adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain the feature value of each superpixel in the current mode; the feature vector $f_{i,k}$ of the k-th superpixel at the i-th scale being calculated as:

$$f_{i,k} = \frac{1}{|S_k|}\sum_{p \in S_k} F_{i,p}$$

$$f'_{i,k} = \sum_{j \in \mathcal{N}(k)} W_{k,j}\, f_{i,j}$$

wherein $S_k$ denotes the k-th superpixel and $F_{i,p}$ denotes the feature vector of pixel point p within the k-th superpixel at the i-th scale; weighting $f_{i,j}$ by the adjacency matrix $W_{k,j}$ yields the new feature $f'_{i,k}$ of the k-th superpixel at the i-th scale.
5. The medical image segmentation method according to claim 4, wherein the multi-modal graph neural network constructs the adjacency matrix of each cross-modal node according to its position in the different modal images and its connection pattern with the K neighborhood nodes by:

$$W^{\mathrm{cross}}_{k,j} = f_{\mathrm{cross}}\left(\Delta P_{k,j},\ \mathrm{sim}(W_{k,*}, W_{j,*})\right)$$

wherein $W_{k,*}$ denotes the connection pattern of the k-th node with its neighboring nodes in the first modality image, $\mathrm{sim}(W_{k,*}, W_{j,*})$ denotes the connection-pattern similarity between the k-th node in the first modality image and the j-th node in the second modality image, and $f_{\mathrm{cross}}$ denotes the mapping from the position difference and the connection-pattern similarity to the cross-modal adjacency matrix.
6. The medical image segmentation method according to claim 5, wherein the fusing of the feature values of the K neighboring nodes of each cross-modal node in the other modality with the feature value of the corresponding superpixel in the current modality is specifically:

weighting and summing the feature vectors of the K neighboring nodes of each cross-modal node in the other modality to obtain the new feature of each cross-modal node in the other modality, and adding the new feature to the feature values of all pixel points within the corresponding superpixel to obtain the fused multi-modal feature map;

the new feature $f'^{\mathrm{cross}}_{i,k}$ of the k-th superpixel at the i-th scale in the other modality and the feature value $F'_{i,p}$ of pixel point p within the k-th superpixel in the feature map at the i-th scale being obtained as:

$$f'^{\mathrm{cross}}_{i,k} = \sum_{j} W^{\mathrm{cross}}_{k,j}\, f^{\mathrm{cross}}_{i,j}$$

$$F'_{i,p} = F_{i,p} + f'_{i,k} + f'^{\mathrm{cross}}_{i,k}, \quad p \in S_k$$

wherein $f_{i,j}$ and $f^{\mathrm{cross}}_{i,j}$ respectively denote the feature vectors of the j-th neighboring superpixel of the k-th superpixel in the current modality image and the other modality image at the i-th scale, $W_{k,j}$ and $W^{\mathrm{cross}}_{k,j}$ respectively denote the adjacency values of the j-th neighboring superpixel of the k-th superpixel in the current modality image and the other modality image, and $F_{i,p}$ denotes the feature vector at pixel point p within the k-th superpixel at the i-th scale.
7. The medical image segmentation method according to any one of claims 1 to 6, wherein the image segmentation of the multi-modal feature map by the segmentation network is specifically:
the segmentation network decodes the multi-modal feature map, enables the multi-modal feature map to be up-sampled to the size of the first modal image, and outputs a segmentation result of the region of interest in the first modal image.
8. A medical image segmentation system, comprising:
a superpixel extraction module: configured to acquire a first modal image and a second modal image, and to perform superpixel extraction on the first modal image and the second modal image using a simple linear iterative clustering algorithm;

a single-modal network construction module: configured to construct a first modal graph neural network and a second modal graph neural network with the superpixels of the first modal image and the second modal image as nodes respectively, the two networks calculating the adjacency matrix of each superpixel and its neighborhood superpixels according to the center position distance and the gray mean difference, and calculating the feature value of each superpixel in the current mode according to the adjacency matrix;

a multi-modal network construction module: configured to construct a multi-modal graph neural network for cross-modal nodes according to the superpixels of the first modal image and the second modal image, the multi-modal graph neural network constructing the adjacency matrix of each cross-modal node according to its position in the different modal images and its connection pattern with K neighborhood nodes;

a feature fusion module: configured to acquire the K neighboring nodes of each cross-modal node in the other modality according to the cross-modal adjacency matrix, and to fuse the feature values of those K neighboring nodes with the feature value of the corresponding superpixel in the current modality to obtain a feature-fused multi-modal feature map;

an image segmentation module: configured to perform image segmentation on the multi-modal feature map through a segmentation network.
9. A terminal, comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the medical image segmentation method according to any one of claims 1 to 7;
the processor is configured to execute the program instructions stored in the memory to control medical image segmentation.
10. A storage medium storing program instructions executable by a processor to perform the medical image segmentation method according to any one of claims 1 to 7.
Priority application: CN202111387233.8A, filed 2021-11-22 (Medical image segmentation method, system, terminal and storage medium)
Publication: CN114266726A, published 2022-04-01 (status: Pending)
Family ID: 80825341

Cited By

  • CN114862823A (priority 2022-05-26, published 2022-08-05, 同心医联科技(北京)有限公司): Region segmentation method and device
  • CN114862823B (priority 2022-05-26, published 2024-02-13, 同心医联科技(北京)有限公司): Region segmentation method and device
  • CN114862881A (priority 2022-07-11, published 2022-08-05, 四川大学): Cross-modal attention tumor segmentation method and system based on PET-CT
  • CN116934754A (priority 2023-09-18, published 2023-10-24, 四川大学华西第二医院): Liver image identification method and device based on graph neural network
  • CN116934754B (priority 2023-09-18, published 2023-12-01, 四川大学华西第二医院): Liver image identification method and device based on graph neural network


Legal Events

  • PB01: Publication
  • SE01: Entry into force of request for substantive examination