CN113139964A

CN113139964A - Multi-modal image segmentation method and device, electronic equipment and storage medium

Info

Publication number: CN113139964A
Application number: CN202010065805.XA
Authority: CN
Inventors: 不公告发明人
Original assignee: Shanghai Microport Medical Group Co Ltd
Current assignee: Shanghai Weiwei Medical Technology Co.,Ltd.
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2021-07-20

Abstract

The invention provides a multi-modal image segmentation method, a multi-modal image segmentation device, electronic equipment and a storage medium, wherein the segmentation method comprises the steps of obtaining images to be segmented under a plurality of modes; performing first preprocessing on the acquired images to be segmented under the multiple modalities so as to register the images to be segmented under the multiple modalities; segmenting the images to be segmented under the plurality of modalities after the first preprocessing by adopting a pre-trained full convolution neural network model to obtain segmented images; the full convolution neural network model comprises a recalibration module, and the recalibration module is used for performing weight redistribution on a channel and/or a space of an input image. The invention can inhibit irrelevant channel and/or spatial characteristic information, enhance relevant channel and/or spatial characteristic information, effectively improve the precision of the integral segmentation algorithm, reduce the complicated operation of man-machine interaction and better assist doctors.

Description

Multi-modal image segmentation method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for multi-modal image segmentation, an electronic device, and a storage medium.

Background

Brain tumors are an abnormal tissue formed by uncontrolled cell proliferation of local tissues, and clinically, brain tumors are diverse in form, which limits early diagnosis and treatment of brain tumors. With the development of medical imaging technology, Magnetic Resonance Imaging (MRI) becomes one of the main methods for diagnosing brain tumor diseases, so that the method has important significance for brain tumor MRI image segmentation, and according to the accurate result of brain tumor segmentation, doctors can obtain information such as the shape, size and position of tumors, and the like, so that the method is used for assisting operation navigation, positioning of a radiation target region and the like, and further helps doctors to make a personalized treatment scheme for patients.

Automatic brain tumor MRI image segmentation has certain technical challenges, which are mainly reflected in the following aspects: (1) brain tumors vary widely among individuals and can occur anywhere in the brain, and their shapes, structures and sizes vary, (2) some brain tumors grow diffusely and infiltratively, (3) brain tissues have complex structures, normal brain tissues include gray matter, white matter, ventricles and cerebrospinal fluid, etc., and diseased tissues include tumor and edema areas, wherein the tumors include necrotic, enhanced and non-enhanced areas.

The current brain tumor segmentation method comprises the following steps: (1) manual segmentation is carried out by means of experienced medical experts; (2) a threshold-based segmentation method; (3) a region-based segmentation method; (4) a segmentation method based on deep learning.

However, the existing brain tumor segmentation method has the following disadvantages:

(1) manual brain tumor segmentation methods not only differ greatly in their results but also take a lot of time and effort.

(2) The threshold-based segmentation method cannot fully utilize information of the image, so that the segmentation precision is not high.

(3) The region-based segmentation method is limited by the diversity of tumors, and universal iteration conditions are difficult to set, so that the segmentation precision is not high.

(4) Compared with the traditional method, the deep learning segmentation method has certain precision improvement, but channels and spaces of most network models have the same weight contribution, which can limit the further improvement of the segmentation precision.

Disclosure of Invention

The invention aims to provide a multi-modal image segmentation method, a multi-modal image segmentation device, electronic equipment and a storage medium, which can improve the precision of an overall segmentation algorithm and effectively reduce the complicated operation of man-machine interaction.

In order to achieve the above object, the present invention provides a multi-modal image segmentation method, including:

acquiring images to be segmented under a plurality of modes;

performing first preprocessing on the acquired images to be segmented under the plurality of modalities so as to register the images to be segmented under the plurality of modalities; and

segmenting the images to be segmented under the plurality of modes after first preprocessing by adopting a pre-trained full convolution neural network model to obtain segmented images;

the full convolution neural network model comprises a recalibration module, and the recalibration module is used for performing weight redistribution on channels and/or spaces of the input image.

Optionally, the recalibration module includes a channel recalibration sub-module and/or a space recalibration sub-module, the channel recalibration sub-module is configured to perform weight reallocation on channels of the input image, and the space recalibration sub-module is configured to perform weight reallocation on a space of the input image.

Optionally, the step of segmenting the to-be-segmented image in the plurality of modalities after the first preprocessing by using a pre-trained fully convolutional neural network model includes:

performing second preprocessing on the images to be segmented under the plurality of modalities after the first preprocessing so as to remove noise in the images to be segmented under each modality; and

and segmenting the images to be segmented under the plurality of modes after second preprocessing by adopting a pre-trained full convolution neural network model.

Optionally, the second preprocessing includes:

and respectively carrying out filtering processing on the images to be segmented under the plurality of modes after the first preprocessing by adopting a three-dimensional Gaussian filter.

Optionally, the step of performing a first preprocessing on the acquired images to be segmented in the multiple modalities to register the images to be segmented in the multiple modalities includes:

and taking the image to be segmented in one of the acquired images to be segmented in the plurality of modalities as a reference, and performing rigid change on the target image to be segmented through mutual information maximization so as to register the images to be segmented in the plurality of modalities.

Optionally, the channel recalibration sub-module includes a global pooling layer, a first full-connection layer, a second full-connection layer, and a first recalibration operation layer, which are cascaded.

Optionally, the spatial recalibration sub-module includes a cascaded convolution layer and a second recalibration operation layer.

Optionally, the full convolutional neural network model includes a decoding network and an encoding network;

the decoding network comprises an input layer, a plurality of cascaded first neural network groups and a first convolution layer, wherein the first neural network group comprises a cascaded second convolution layer, a recalibration module and a pooling layer;

the coding network comprises a plurality of cascaded second neural network groups, a third convolutional layer and an output layer, wherein the second neural network group comprises a cascaded deconvolution layer, a merging layer, a fourth convolutional layer and a recalibration module;

and the merging layer is used for performing linear addition merging on the output of the deconvolution layer and the output image of the recalibration module in the corresponding decoding network.

Optionally, the decoding network comprises a plurality of cascaded first residual connections, and the encoding network comprises a plurality of cascaded second residual connections.

Optionally, the full convolution neural network model is obtained by training through the following steps:

obtaining an original training sample, wherein the original training sample comprises an original training image and a label image corresponding to the original training image;

expanding the original training sample to obtain an expanded training sample, wherein the expanded training sample comprises an expanded training image and a label image corresponding to the expanded training image;

setting initial values of model parameters of the full convolution neural network model; and

and training a pre-built full convolution neural network model according to the expanded training sample and the initial value of the model parameter until a preset training end condition is met.

Optionally, the step of training the pre-built full convolution neural network model according to the expanded training samples and the initial values of the model parameters includes:

and training a pre-built full convolution neural network model by adopting a random gradient descent method according to the expanded training sample and the initial value of the model parameter.

Optionally, the preset training end condition is that a prediction result of a training image in the expanded training sample and an error value of a corresponding label image converge to a preset error value.

Optionally, the step of training the pre-built full convolution neural network model by using a random gradient descent method according to the expanded training sample and the initial value of the model parameter includes:

step A: taking the expanded training image as the input of a full convolution neural network model, and acquiring the prediction result of the expanded training image according to the initial value of the model parameter;

and B: calculating a loss function value according to the prediction result and a label image corresponding to the expanded training image; and

and C: and B, judging whether the loss function value converges to a preset value, if so, finishing training, if not, adjusting the model parameter, updating the initial value of the model parameter to the adjusted model parameter, and returning to execute the step A.

In order to achieve the above object, the present invention also provides a multi-modal image segmentation apparatus including:

the acquisition module is used for acquiring images to be segmented under a plurality of modalities;

the preprocessing module is used for performing first preprocessing on the acquired images to be segmented under the multiple modalities so as to register the images to be segmented under the multiple modalities; and

the segmentation module is used for segmenting the images to be segmented under the plurality of modes after the first preprocessing by adopting a pre-trained full convolution neural network model so as to obtain segmented images;

Optionally, the segmentation module includes:

the preprocessing submodule is used for carrying out second preprocessing on the images to be segmented under the plurality of modalities after the first preprocessing so as to remove noise in the images to be segmented under each modality; and

and the segmentation submodule is used for segmenting the images to be segmented under the plurality of modes after the second preprocessing by adopting a pre-trained full convolution neural network model.

To achieve the above object, the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the multi-modal image segmentation method described above.

To achieve the above object, the present invention further provides a readable storage medium, which stores therein a computer program, and when the computer program is executed by a processor, the computer program implements the multi-modal image segmentation method described above.

Compared with the prior art, the multi-modal image segmentation method, the multi-modal image segmentation device, the electronic equipment and the storage medium have the following advantages: the method comprises the steps of obtaining images to be segmented under a plurality of modes; then, performing first preprocessing on the acquired images to be segmented under the multiple modalities to register the images to be segmented under the multiple modalities; and then, segmenting the images to be segmented under the plurality of modes after the first preprocessing by adopting a pre-trained full convolution neural network model to obtain segmented images. Because the full convolution neural network model adopted by the invention comprises the recalibration module, the channel and/or space of the input image can be subjected to weight redistribution through the recalibration module, so that irrelevant channel and/or space characteristic information can be inhibited, relevant channel and/or space characteristic information can be enhanced, the precision of the integral segmentation algorithm is effectively improved, meanwhile, the complicated operation of man-machine interaction can be reduced, the image segmentation algorithm has strong universality, an end-to-end algorithm flow is realized, and doctors can be better assisted.

Drawings

FIG. 1 is a flow chart of a multi-modal image segmentation method according to an embodiment of the present invention;

fig. 2a is a specific example of an image to be segmented in Flair mode after the second preprocessing;

fig. 2b is a specific example of an image to be segmented in the T1 modality after the second preprocessing;

fig. 2c is a specific example of an image to be segmented in the T1ce modality after the second preprocessing;

fig. 2d is a specific example of an image to be segmented in the T2 modality after the second preprocessing;

FIG. 3 is a schematic structural diagram of a recalibration module in a full convolution neural network model according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a full convolution neural network model according to an embodiment of the present invention;

FIG. 5a is a schematic view of the segmented image superimposed into FIG. 2 a;

FIG. 5b is a schematic view of the segmented image superimposed into FIG. 2 b;

FIG. 5c is a schematic view of the segmented image superimposed into FIG. 2 c;

FIG. 5d is a schematic diagram of the image obtained after segmentation superimposed on FIG. 2 d;

FIG. 6 is a block diagram of a multi-modal image segmentation apparatus according to an embodiment of the present invention;

fig. 7 is a block diagram of an electronic device according to an embodiment of the invention.

Wherein the reference numbers are as follows:

a channel recalibration sub-module-110; global pooling layer-111; a first fully connected layer-112; a second fully connected layer-113; first recalibration operation layer-114; a spatial recalibration sub-module-120; convolutional layer-121; second recalibration operating layer-122; an acquisition module-201; a preprocessing module-202; a segmentation module-203; a processor-301; a communication interface-302; a memory-303; communication bus-304.

Detailed Description

The multi-modal image segmentation method, apparatus, electronic device and storage medium according to the present invention will be described in detail with reference to the accompanying drawings and embodiments. The advantages and features of the present invention will become more apparent from the following description. It is to be noted that the drawings are in a very simplified form and are all used in a non-precise scale for the purpose of facilitating and distinctly aiding in the description of the embodiments of the present invention. To make the objects, features and advantages of the present invention comprehensible, reference is made to the accompanying drawings. It should be understood that the structures, ratios, sizes, and the like shown in the drawings and described in the specification are only used for matching with the disclosure of the specification, so as to be understood and read by those skilled in the art, and are not used to limit the implementation conditions of the present invention, so that the present invention has no technical significance, and any structural modification, ratio relationship change or size adjustment should still fall within the scope of the present invention without affecting the efficacy and the achievable purpose of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The core idea of the invention is to provide a multi-modal image segmentation method, a multi-modal image segmentation device, an electronic device and a storage medium, which can not only improve the precision of the overall segmentation algorithm, but also effectively reduce the complicated operation of man-machine interaction.

In the embodiment of the present invention, the multi-modal image segmentation method and apparatus provided by the present invention are described by taking an organ image, for example, a brain image as an example, and the method and apparatus are not limited to segmentation of an organ image, but may also be applied to segmentation of other images. The multimodal image segmentation method according to the embodiment of the present invention is applicable to a multimodal image segmentation apparatus according to an embodiment of the present invention, which may be configured on an electronic device, such as a personal computer, a mobile terminal, and the like, and the mobile terminal may be a hardware device with various operating systems, such as a mobile phone and a tablet computer.

To achieve the above idea, the present invention provides a multi-modal image segmentation method, referring to fig. 1, which schematically shows a flowchart of the multi-modal image segmentation method according to an embodiment of the present invention, as shown in fig. 1, the multi-modal image segmentation method includes the following steps:

step S100: and acquiring images to be segmented under a plurality of modalities.

In the present invention, the image to be segmented may be a brain image including a brain tumor, or may be an image including other tissue and organ or other non-organ image, which is not limited in the present invention. The image to be segmented can be obtained by scanning and acquisition of various imaging systems, and can also be obtained by transmission of an internal or external storage system such as a storage system image archiving and communication system. The imaging system includes, but is not limited to, one or more of Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and the like in combination. It should be noted that the size of the image to be segmented may be set according to specific situations, and the invention is not limited to this, for example, the size of the image to be segmented in any modality may be 240 × 240 × 120 pixels.

Taking the example of obtaining a brain tumor image by segmenting the brain image, in the present embodiment, MRI brain images including a brain tumor are obtained in four modalities, as shown in fig. 2a to 2d, which are Flair (magnetic resonance imaging liquid attenuation inversion recovery sequence), T1 (longitudinal sequence), T1ce (longitudinal contrast enhancement sequence), and T2 (transverse sequence). It should be noted that, although the embodiment is described by taking an example of obtaining a brain tumor image by segmenting an MRI brain image in four modalities as an example, as will be understood by those skilled in the art, in other embodiments, images to be segmented in less than four modalities or more than four modalities may be segmented, and the images to be segmented may be brain images or images of other organs.

The physical meaning reflected by brain tumor images of different modalities varies. In the T2 mode image, the tumor focus region is enhanced and appears high-brightness white, and the cerebrospinal fluid also appears high-brightness, while other normal tissues appear low-brightness black. The Flair modal image is obtained on the basis of the T2 image, the Flair image inhibits part of cerebrospinal fluid and eliminates the interference of the cerebrospinal fluid on brain tumor focus, so that the tumor focus area presents high brightness, and the contrast between the tumor focus area and normal tissues is relatively high. The T1 modal image is mainly clearly displayed for various tissue structures of the brain image, has no obvious response to a tumor focus area, and has very low contrast with normal brain tissue. The T1ce mode image mainly has higher identification of tumor core, enhanced tumor structure with high brightness, and necrotic area with lower brightness.

Step S200: performing first preprocessing on the acquired images to be segmented under the plurality of modalities to register the images to be segmented under the plurality of modalities.

Therefore, the acquired images to be segmented under the plurality of modalities can be subjected to spatial consistency matching by performing first preprocessing on the acquired images to be segmented under the plurality of modalities.

Preferably, in this step, the images to be segmented in the plurality of modalities may be registered by performing rigid changes such as rotation and translation on the target image to be segmented, that is, the images to be segmented in all modalities except for the acquired image to be segmented in one of the plurality of modalities, by using the image to be segmented in one of the plurality of modalities as a reference and maximizing mutual information. For example, when MRI brain images in the four modalities, namely Flair, T1, T1ce and T2, are acquired, any one of Flair, T1, T1ce and T2 may be used as a reference, and the images to be segmented in the remaining modalities are subjected to rigid change through mutual information maximization, so as to register the images to be segmented in the four modalities.

Mutual information, which is a measure of the similarity of images, was first applied in 1995 to medical image registration, Woods et al in multimodal medical image registration based on the assumption that the correlation is strongest when the same objects in the two images are spatially aligned, and it is believed that when the two images are aligned, the correlation is the greatest and the corresponding mutual information is also the greatest. In the medical image registration problem, since the two images are based on common human anatomy information, when the spatial positions of the two images are completely consistent, the information of one image expressed by the other image, i.e., the mutual information, should be the maximum.

Step S300: and segmenting the images to be segmented under the plurality of modes after the first preprocessing by adopting a pre-trained full convolution neural network model to obtain segmented images.

The full convolution neural network model comprises a recalibration module, and the recalibration module is used for performing weight redistribution on channels and/or spaces of the input image/image to be segmented. Therefore, the re-calibration module can be used for carrying out weight re-distribution on the channels and/or spaces of the input image/image to be segmented, so that irrelevant channel and/or space characteristic information can be inhibited, relevant channel and/or space characteristic information can be enhanced, the precision of the overall segmentation algorithm can be effectively improved, the tedious operations of man-machine interaction can be reduced, the image segmentation algorithm is strong in universality, an end-to-end algorithm process is realized, and doctors can be better assisted.

Preferably, the recalibration module comprises a channel recalibration sub-module and/or a space recalibration sub-module, the channel recalibration sub-module is used for performing weight reallocation on channels of the input images/images to be segmented, and the space recalibration sub-module is used for performing weight reallocation on spaces of the input images/images to be segmented. Therefore, the channel re-calibration sub-module can be used for carrying out weight re-distribution on the channels of the input image/image to be segmented, so that irrelevant channel characteristic information can be inhibited, and relevant channel characteristic information can be enhanced; the space re-calibration sub-module can be used for carrying out weight re-distribution on the space of the input image/image to be segmented, so that irrelevant space characteristic information can be inhibited, and relevant space characteristic information can be enhanced.

Preferably, please refer to fig. 3, which schematically shows a schematic structural diagram of a recalibration module in a full convolution neural network model according to an embodiment of the present invention, as shown in fig. 3, in the embodiment, the recalibration module includes a channel recalibration sub-module 110 and a spatial recalibration sub-module 120, and an output image of the recalibration module is an addition of a spatial recalibration result and a channel recalibration result. Because the recalibration module comprises the channel recalibration sub-module 110 and the space recalibration sub-module 120, the channel and the space of the input image/image to be segmented can be subjected to weight redistribution through the recalibration module, so that irrelevant space and channel characteristic information can be inhibited, relevant space and channel characteristic information can be enhanced, and the precision of the overall segmentation algorithm can be further improved.

Preferably, as shown in fig. 3, the channel recalibration sub-module 110 includes a global pooling layer 111, a first fully-connected layer 112, a second fully-connected layer 113, and a first recalibration operation layer 114 in cascade. The global pooling layer 111 is configured to compress the input image/image to be segmented (e.g., 240 × 240 × 120 × 4, 326 × 326 × 168 × 4, or 128 × 128 × 64 × 4, where 4 represents the number of channels) in three directions, where the directions of the channels are not compressed, and the size is, for example, 1 × 1 × 1 × 4, so as to compress the spatial information of the input image/image to be segmented, and only retain the N-dimensional channel feature information; the first full connection layer 112 is used for performing dimensionality reduction operation on channel characteristic information; the second full connection layer 113 is used for performing dimension increasing operation on the channel feature information after dimension reduction, so as to calculate a weight factor between channels, that is, to obtain an importance degree representation of each channel, and generate different weights for different channels; the first recalibration operation layer 114 is used for multiplying the obtained weight and the input image/image to be segmented, so that recalibration of the channel of the input image/image to be segmented is completed.

Preferably, as shown in fig. 3, the spatial recalibration sub-module 120 includes a cascaded convolution layer 121 and a second recalibration operation layer 122. The convolution layer 121 is configured to perform convolution operation on the input image/image to be segmented, so as to obtain a weight W, where each point in the weight W is a weighted sum of all points of a certain region in the given input image/image to be segmented, where the certain region has a position (i, j) as a central point, in different feature channels, and corresponds to an importance degree of each point in space, and the weight can give higher attention to a region of interest, and give an inhibition effect to a region of no interest; the second recalibration operation layer 122 is used for multiplying the obtained weight by the input image/image to be segmented, so that spatial recalibration of the input image/image to be segmented is completed.

Referring to fig. 4, a schematic structural diagram of a full convolutional neural network model according to an embodiment of the present invention is schematically shown, as shown in fig. 4, the full convolutional neural network model includes a decoding network and an encoding network;

The input of the decoding network is a multi-channel image, the number of the channels is the same as the modality type of the acquired image to be segmented, and since the image to be segmented in four modalities is segmented in the present embodiment, the input of the decoding network is a four-channel image in the present embodiment. The decoding network is used for learning useful characteristic information (such as brain tumor characteristic information) from images to be segmented of four modalities, and the coding network is used for finding the position of a region where the characteristic information is located according to the learned useful characteristic information.

As shown in fig. 4, in the present embodiment, the decoding network includes four cascaded first neural network groups a to D, and the encoding network includes four cascaded second neural network groups a to D. And a first convolution layer E11, an E12, an E13 and a recalibration module E2 are arranged between the first neural network group D and the second neural network group a.

As shown in fig. 4, the first neural network group a includes a second convolutional layer a1, a recalibration module a2, and a pooling layer A3 in cascade; the first neural network group B comprises a second convolutional layer B11, a second convolutional layer B12, a recalibration module B2 and a pooling layer B3 which are cascaded; the first neural network group C comprises a second convolutional layer C11, C12, C13, a recalibration module C2 and a pooling layer C3 which are cascaded; the first neural network group D includes a second convolutional layer D11, D12, D13, a recalibration module D2, and a pooling layer D3 in cascade. The first neural network group A is used for extracting characteristic information of an input image to be segmented under a plurality of modalities, such as brain tumor characteristic information. Specifically, the second convolutional layer a1 is configured to perform convolution processing on an image to be segmented, the recalibration module a2 is configured to perform recalibration on the image after the convolution processing, and the pooling layer A3 is configured to perform pooling operation on the image after the recalibration.

The first neural network group B is used for extracting feature information, such as brain tumor feature information, from the image pooled by the pooling layer a3, and the specific process is similar to that in the first neural network group a and is not described herein again.

The first neural network group C is used for extracting feature information, such as brain tumor feature information, from the image pooled by the pooling layer B3, and the specific process is similar to that in the first neural network group a and is not described herein again.

The first neural network group D is used for extracting feature information, such as brain tumor feature information, from the image pooled by the pooling layer C3, and the specific process is similar to that in the first neural network group a and is not described herein again.

The first convolution layer E11 is configured to convolve the image pooled by the pooling layer D3, the first convolution layer E12 is configured to continue to convolve the image convolved by the first convolution layer E11, the first convolution layer E13 is configured to continue to convolve the image convolved by the first convolution layer E12, and the recalibration module E2 is configured to recalibrate the image convolved by the first convolution layer E13.

The second neural network group a comprises a cascaded deconvolution layer a1, a merging layer a2, a fourth convolution layer a31, a32, a33 and a recalibration module a 4; the second neural network group b comprises a cascaded deconvolution layer b1, a merging layer b2, a fourth convolution layer b31, b32, b33 and a recalibration module b 4; the second neural network group c comprises a cascaded deconvolution layer c1, a merging layer c2, a fourth convolution layer c31, c32, c33 and a recalibration module c 4; the second neural network group d comprises a cascaded deconvolution layer d1, a merging layer d2, a fourth convolution layer d31, d32 and a recalibration module d 4. A third convolutional layer e1 is arranged between the recalibration module d4 and the deepest output layer of the second neural network group d, and the third convolutional layer e1 is used for realizing the logistic regression of the image and does not belong to the second neural network group.

In this embodiment, the second neural network group a is used to restore feature information of the image, for example, brain tumor feature information, to a corresponding position of the image pooled by the pooling layer C3. In particular, the deconvolution layer a1 is used to reverse the operation of the pooling layer D3 to restore the image to the corresponding location in the image before pooling layer D3. The merging layer a2 is used to recover the feature information of the image, such as the feature information of the brain tumor, and the output image of the rescaling module D2 in the decoding network is linearly additively merged with the output of the deconvolution layer a1 through the merging layer a2 as the input of the fourth convolution layer a 31. The fourth convolutional layers a31, a32 and a33 are used to recover image characteristic information, such as brain tumor characteristic information, lost during pooling of the images by the maximum pooling layer D3.

Similar to the second neural network group a, the second neural network groups b to d are also used to restore information of the image, and finally, the corresponding positions of all feature information (e.g., brain tumor feature information) in the finally restored image are output by the retargeting module d4 in the second neural network group d, and finally, through the logistic regression of the third convolutional layer e1, an image segmentation result, e.g., a brain tumor image segmentation result, is obtained. Please refer to fig. 5a to 5d, wherein fig. 5a is a schematic diagram of superimposing the segmented image (brain tumor image) on fig. 2 a; fig. 5b is a schematic diagram of the image obtained after segmentation (brain tumor image) superimposed into fig. 2 b; FIG. 5c is a schematic diagram of the image obtained after segmentation (brain tumor image) superimposed into FIG. 2 c; fig. 5d is a schematic diagram of the image obtained by segmentation (brain tumor image) superimposed on fig. 2 d. Therefore, the full convolution neural network model adopted by the invention comprises the recalibration module, so that the space and the channel of the input image can be subjected to weight redistribution through the recalibration module, thereby inhibiting irrelevant space and channel characteristic information, enhancing the relevant space and channel characteristic information, effectively improving the precision of the integral segmentation algorithm, and simultaneously reducing the complicated operation of man-machine interaction.

Similar to the input of the fourth convolutional layer a31, the output image of the rescaling module C2 in the decoding network is linearly additively combined with the output of the inverse convolutional layer b1 through the combining layer b2 as the input of the fourth convolutional layer b 31; the output image of the recalibration module B2 in the decoding network is subjected to linear addition combination through the output of the merging layer c2 and the deconvolution layer c1 and then serves as the input of a fourth convolution layer c 31; the output image of the rescaling module a2 in the decoding network is linearly additively combined with the output of the deconvolution layer d1 by the combination layer d2 to serve as the input of the fourth convolution layer d 31.

Preferably, the decoding network further comprises a plurality of concatenated first residual connections, and the encoding network further comprises a plurality of concatenated second residual connections. Therefore, by arranging a plurality of cascaded first residual error connections in the decoding network and a plurality of cascaded second residual error connections in the coding network, the problems of gradient disappearance and gradient explosion are effectively relieved as the number of network layers is deeper, the transmission of effective characteristics is ensured, the image restoration is facilitated, and the accuracy of image segmentation is improved.

As shown in fig. 4, in the present embodiment, the decoding networks each include five concatenated first residual connections F1 to F5, and the encoding networks each include four concatenated second residual connections F1 to F4. Wherein, in the decoding network, the output of the input layer and the output of the rescaling module a2 can be added as the input of the pooling layer A3 by a first residual connection F1; the output of the second convolutional layer B11 and the output of the recalibration module B2 may be added as input to the pooling layer B3 by a first residual connection F2; the output of the second convolutional layer C11 and the output of the recalibration module C2 may be added as input of the pooling layer C3 by a first residual connection F3; the output of the second convolutional layer D11 and the output of the recalibration module D2 may be added as the input of the pooling layer D3 through a first residual connection F4; the output of the first convolutional layer E11 and the output of the recalibration module E2 may be added as input to the deconvolution layer a1 by a first residual connection F5. In the coding network, the output of the deconvolution layer a1 and the output of the recalibration module a4 can be added as the input of the deconvolution layer b1 through a second residual connection f 1; the output of the deconvolution layer b1 and the output of the recalibration module b4 can be added as the input of the deconvolution layer c1 through a second residual connection f 2; the output of the deconvolution layer c1 and the output of the recalibration module c4 can be added as the input of the deconvolution layer d1 through a second residual connection f 3; the output of the deconvolution layer d1 may be added to the output of the rescaling module d4 by a second residual connection f4 as an input to a third convolution layer e 1.

In the full convolution neural network model shown in fig. 4, the number of the first neural network groups included in the decoding network and the number of the second neural network groups included in the encoding network are examples, and should not be construed as limiting the embodiments of the present application. The number of the first neural network groups included in the decoding network and the number of the second neural network groups included in the encoding network can be set according to specific needs. It should be noted that, since encoding and decoding have a one-to-one correspondence relationship, in the full convolution neural network model provided in the embodiment of the present application, the number of first neural network groups included in the decoding network is equal to the number of second neural network groups included in the encoding network. In addition, the number of the second convolutional layers included in the first neural network group a is not limited to 1, and may be 2 or more than 2; the number of the second convolutional layers included in the first neural network group B is not limited to 2, and may be 1, 3, or 3 or more; the number of the second convolutional layers included in the first neural network groups C and D is not limited to 3, and may be 1, 2, or 3 or more; the number of the fourth convolutional layers included in the second neural network groups a to c is not limited to 3, and may be 1, 2, or 3 or more; the number of the fourth convolutional layers included in the second neural network group d is not limited to 2, and may be 1, 3, or 3 or more, and the present invention is not limited thereto.

Preferably, step S300 includes:

and segmenting the images to be segmented under the plurality of modes after the second preprocessing by adopting a pre-trained full convolution neural network model.

Therefore, the second preprocessing is carried out on the images to be segmented under the plurality of modalities after the first preprocessing, so that the noise information in the images to be segmented under each modality can be effectively filtered, the image quality of the images to be segmented under each modality can be effectively improved, and the quality of the segmented images can be improved.

Preferably, the images to be segmented in the plurality of modalities after the first preprocessing may be subjected to a second preprocessing, such as a filtering process, respectively, by using a three-dimensional gaussian filter, so as to remove noise in the images to be segmented in each modality. In addition, in some other embodiments, the image to be segmented in each modality may be subjected to second preprocessing, such as filtering, by using another commonly used filter, so as to remove noise in the image to be segmented in each modality, which is not limited by the present invention.

Preferably, the full convolution neural network model is obtained by training through the following steps:

Each original training image comprises a training image under a plurality of modes after registration, and the label image can be a gold standard segmentation result obtained by segmenting the original training image by adopting the existing segmentation method. Because the data of the original training sample is limited, deep learning needs to be performed on certain data to have certain robustness, and in order to increase the robustness, data amplification operation needs to be performed to increase the generalization capability of the full convolution neural network model. Specifically, the same random rigid transformation may be performed on the original training image and the corresponding label image, and specifically includes: rotation, scaling, translation, flipping, and grayscale transformation. More specifically, the original training image and the corresponding label image may be translated by-20 to 20 pixels up and down, translated by-20 to 20 pixels left and right, rotated by-20 ° to 20 °, horizontally flipped, vertically flipped, up and down symmetrically transformed, scaled by 0.8 to 1.2 times, left and right symmetrically transformed, and gray transformed, respectively, to complete data amplification of the image. Through the transformation, the original 20 cases of images can be expanded to 2000 cases, 1500 cases of images can be used for model training, and the remaining 500 cases of images can be used for model testing.

Preferably, in order to improve the accuracy of the model, after the extended training sample is generated and before the model training is performed, the training images in each modality in the extended training sample may be subjected to second preprocessing to remove noise in the images and improve the image quality of the training sample.

The model parameters of the full convolution neural network model include two types: characteristic parameters and hyper-parameters. The feature parameters are parameters for learning the image features, and include a weight parameter and a bias parameter. The hyper-parameters are parameters manually set during training, and the characteristic parameters can be learned from the sample only by setting the proper hyper-parameters. The hyper-parameters may include a learning rate, a number of hidden layers, a convolution kernel size, a number of training iterations, and a batch size per iteration. The learning rate can be considered as a step size.

For example, the learning rate is preferably set to be 0.001, the number of hidden layers is respectively 16, 32, 64, 128 and 256, the size of the convolution kernel is 3 × 3 × 3, the number of training iterations is 30000, and the batch size of each iteration is 2.

Preferably, the preset training end condition is that a prediction result of a training image in the expanded training sample and an error value of a corresponding label image converge to a preset error value. The training purpose of the full convolution neural network model is to make the image segmentation result obtained by the model close to the real and accurate image segmentation result, that is, the error between the two is reduced to a certain range, so that the preset training end condition can be that the error value between the prediction result of the training image in the expanded training sample and the corresponding label image converges to a preset error value. In addition, the training process of the full convolution neural network model is a multiple-cycle iteration process, so that the training can be finished by setting the iteration number, that is, the preset training end condition can be that the iteration number reaches the preset iteration number.

Preferably, the step of training the pre-built full convolution neural network model according to the expanded training samples and the initial values of the model parameters includes: and training a pre-built full convolution neural network model by adopting a random gradient descent method according to the expanded training sample and the initial value of the model parameter. Since the model training process is actually the process of minimizing the loss function, and the derivation can achieve this goal quickly and simply, the derivation method is the gradient descent method. Therefore, the full convolution neural network model is trained by adopting a gradient descent method, and the training of the full convolution neural network model can be quickly and simply realized.

In the deep learning in the implementation mode of the invention, the gradient descent method is mainly used for training the model, and then the back propagation algorithm is used for updating the weight parameters and the bias parameters in the optimized network model. Specifically, the position with the maximum slope of the curve is judged to be the direction of reaching the optimal value faster by adopting a gradient descent method, the partial derivative is solved by adopting a probabilistic chain derivation method to update the weight by adopting a back propagation method, and the parameter is updated by continuously iterative training to learn the image. The method for updating the weight parameter and the bias parameter by the back propagation algorithm is as follows:

1. firstly, carrying out forward propagation, updating parameters through continuous iterative training to learn an image, and calculating activation values of all layers (convolutional layers and anti-convolutional layers), namely obtaining an activation image after the image is subjected to convolution operation;

2. for the output layer (n th)_lLayer), calculating the sensitivity value

Wherein y is the true value of the sample,

in order to output the prediction value of the layer,

a partial derivative representing an output layer parameter;

3. for l ═ n_l-1,n_l-2.... for each layer, calculating a sensitivity value

Wherein, W^lRepresents the weight parameter, δ, of the l-th layer^l+1Represents the sensitivity value, f' (z) of layer l +1^l) Represents the partial derivative of the l-th layer;

4. updating the weight parameter and the bias parameter of each layer:

wherein, W^lAnd b^lRespectively representing the weight parameter and the bias parameter for the l layers,

to the learning rate, a^lRepresents the output value of the l-th layer, δ^l+1The sensitivity value of the l +1 layer is shown.

Preferably, the step of training the pre-built full convolution neural network model by using a random gradient descent method according to the expanded training sample and the initial value of the model parameter includes:

And when the loss function value does not converge to the preset value, the full convolution neural network model is not accurate, the model needs to be trained continuously, in this way, the model parameters are adjusted, the initial values of the model parameters are updated to the adjusted model parameters, the step A is returned to be executed, and the next iteration process is started.

The loss function L (W, b) in the present invention is expressed as:

wherein W and b represent weight parameters and bias parameters of the full convolution network, m is the number of training samples, m is a positive integer, xⁱI-th training sample representing the input, f_W,b(xⁱ) Denotes the prediction result of the i-th training sample, yⁱAnd K is a smoothing parameter to prevent the situation that the denominator is zero and cannot be calculated.

In accordance with the multi-modal image segmentation method, referring to fig. 6, a block diagram of a multi-modal image segmentation apparatus according to an embodiment of the present invention is schematically shown, and as shown in fig. 6, the multi-modal image segmentation apparatus includes:

an obtaining module 201, configured to obtain images to be segmented in multiple modalities;

the preprocessing module 202 is configured to perform first preprocessing on the acquired images to be segmented in the multiple modalities, so as to register the images to be segmented in the multiple modalities; and

the segmentation module 203 is configured to segment the to-be-segmented image in the multiple modalities after the first preprocessing by using a pre-trained full convolution neural network model to obtain a segmented image;

Preferably, the recalibration module comprises a channel recalibration sub-module and/or a space recalibration sub-module, the channel recalibration sub-module is used for performing weight reallocation on channels of the input images, and the space recalibration sub-module is used for performing weight reallocation on spaces of the input images.

Preferably, the segmentation module 203 includes:

Preferably, the channel recalibration sub-module comprises a global pooling layer, a first full-connection layer, a second full-connection layer and a first recalibration operation layer which are cascaded.

Preferably, the spatial recalibration sub-module comprises a cascade of a convolution layer and a second recalibration operation layer.

Preferably, the full convolution neural network model comprises a decoding network and an encoding network;

Preferably, the decoding network comprises a plurality of concatenated first residual connections and the encoding network comprises a plurality of concatenated second residual connections.

According to the multi-modal image segmentation device provided by the invention, the weight re-distribution can be carried out on the channels and/or the spaces of the input images through the re-calibration module, so that irrelevant channel and/or space characteristic information can be inhibited, relevant channel and/or space characteristic information can be enhanced, the precision of the overall segmentation algorithm can be effectively improved, meanwhile, the complicated operation of man-machine interaction can be reduced, the image segmentation algorithm is strong in universality, the end-to-end algorithm process is realized, and doctors can be better assisted.

Based on the above inventive concept, the present invention further provides an electronic device, please refer to fig. 7, which schematically shows a block structure diagram of the electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device comprises a processor 301 and a memory 303, the memory 303 having stored thereon a computer program, which when executed by the processor 301, implements the multi-modal image segmentation method as described above.

The electronic equipment provided by the invention can perform weight redistribution on the channel and/or space of the input image through the recalibration module, thereby inhibiting irrelevant channel and/or space characteristic information, enhancing relevant channel and/or space characteristic information, effectively improving the precision of the integral segmentation algorithm, and simultaneously reducing the complicated operation of man-machine interaction.

As shown in fig. 7, the electronic device further includes a communication interface 302 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 complete communication with each other through the communication bus 304. The communication bus 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 304 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface 302 is used for communication between the electronic device and other devices.

The Processor 301 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor 301 or the like, the processor 301 being the control center of the electronic device and connecting the various parts of the entire electronic device with various interfaces and lines.

The memory 303 is used for storing the computer program, and the processor 301 implements various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling data stored in the memory 303.

The memory 303 comprises non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The present invention also provides a readable storage medium having stored therein a computer program which, when executed by a processor, may implement the multi-modal image segmentation method described above.

The readable storage medium provided by the invention can perform weight redistribution on the channel and/or space of the input image through the recalibration module, thereby inhibiting irrelevant channel and/or space characteristic information, enhancing relevant channel and/or space characteristic information, effectively improving the precision of the overall segmentation algorithm, and simultaneously reducing the complicated operation of man-machine interaction.

The readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this context, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

In summary, compared with the prior art, the multimodal image segmentation method, the multimodal image segmentation apparatus, the electronic device and the storage medium provided by the invention have the following advantages: the method comprises the steps of obtaining images to be segmented under a plurality of modes; then, performing first preprocessing on the acquired images to be segmented under the multiple modalities to register the images to be segmented under the multiple modalities; and then, segmenting the images to be segmented under the plurality of modes after the first preprocessing by adopting a pre-trained full convolution neural network model to obtain segmented images. Because the full convolution neural network model adopted by the invention comprises the recalibration module, the channel and/or space of the input image can be subjected to weight redistribution through the recalibration module, so that irrelevant channel and/or space characteristic information can be inhibited, relevant channel and/or space characteristic information can be enhanced, the precision of the integral segmentation algorithm is effectively improved, meanwhile, the complicated operation of man-machine interaction can be reduced, the image segmentation algorithm has strong universality, an end-to-end algorithm flow is realized, and doctors can be better assisted.

It should be noted that the apparatuses and methods disclosed in the embodiments herein can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments herein. In this regard, each block in the flowchart or block diagrams may represent a module, a program, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments herein may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims. It will be apparent to those skilled in the art that various changes and modifications may be made in the invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A multi-modality image segmentation method, characterized by comprising:

acquiring images to be segmented under a plurality of modes;

2. The multi-modal image segmentation method according to claim 1, wherein the re-calibration module comprises a channel re-calibration sub-module and/or a spatial re-calibration sub-module, the channel re-calibration sub-module is configured to re-weight channels of the input image, and the spatial re-calibration sub-module is configured to re-weight a space of the input image.

3. The multi-modal image segmentation method according to claim 1, wherein the step of segmenting the image to be segmented in the plurality of modalities after the first preprocessing by using a pre-trained fully-convolutional neural network model comprises:

4. The multi-modality image segmentation method according to claim 1, wherein the step of performing the first pre-processing on the acquired images to be segmented in the plurality of modalities to register the images to be segmented in the plurality of modalities includes:

5. The multi-modal image segmentation method according to claim 2, wherein the channel recalibration sub-module comprises a global pooling layer, a first fully connected layer, a second fully connected layer, and a first recalibration operation layer in cascade.

6. The multi-modal image segmentation method of claim 2 wherein the spatial retargeting sub-module comprises a cascaded convolutional layer and a second retargeting operational layer.

7. The multi-modal image segmentation method of claim 1 wherein the full convolutional neural network model comprises a decoding network and an encoding network;

8. A multi-modality image segmentation apparatus, characterized by comprising:

9. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1 to 7.

10. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.