CN114359194A - Multi-mode stroke infarct area image processing method based on improved U-Net network - Google Patents

Multi-mode stroke infarct area image processing method based on improved U-Net network

Info

Publication number
CN114359194A
CN114359194A (application CN202111606856.XA, granted as CN114359194B)
Authority
CN
China
Prior art keywords
attention
net
image
network
cerebral blood
Prior art date
Legal status
Granted
Application number
CN202111606856.XA
Other languages
Chinese (zh)
Other versions
CN114359194B (en)
Inventor
金心宇
陈榕
赵晗
金昀程
陈智鸿
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111606856.XA priority Critical patent/CN114359194B/en
Publication of CN114359194A publication Critical patent/CN114359194A/en
Application granted granted Critical
Publication of CN114359194B publication Critical patent/CN114359194B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/0012 Biomedical image inspection
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T7/11 Region-based segmentation
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30016 Brain
    • G06T2207/30104 Vascular flow; Blood flow; Perfusion


Abstract

The invention discloses a multi-modal stroke infarct region image processing method based on an improved U-Net network. A brain medical image of a patient, comprising a cerebral blood flow map, a cerebral blood volume map and a diffusion-weighted imaging image, is acquired and input into an upper computer; using the comprehensive analysis tool 3D Slicer, the 2-modality data of the cerebral blood flow map and cerebral blood volume map of the same patient are registered against the diffusion-weighted imaging image as reference to obtain registered NIFTI-format files. The trained U-Net-Attention network then segments the registered NIFTI files to obtain a two-dimensional segmentation result image, which is displayed in the upper computer. By computing spatial and channel attention weights, the method strengthens the attention paid, in the low-level feature maps, to the regions relevant to the final infarct-region segmentation result, thereby improving the prediction level of the model.

Description

Multi-mode stroke infarct area image processing method based on improved U-Net network
Technical Field
The invention relates to the technical field of medical image processing, in particular to an image processing method for multi-mode stroke infarction region segmentation based on an improved U-Net network.
Background
Stroke has become the leading threat to the health and life safety of middle-aged and elderly people in China. Perfusion images from nuclear magnetic resonance imaging (NMRI) and computed tomography (CT) are the criteria by which a physician determines the infarcted area of a patient's brain. With the rapid development of medical image processing technology, the Digital Imaging and Communications in Medicine (DICOM) standard is widely used in radiological diagnosis and treatment equipment, and the calculation of stroke diagnostic indices increasingly depends on DICOM medical images. At present, clinical diagnosis of stroke still relies on manually labeled NMRI or CT perfusion images; this is time-consuming, risks delaying treatment, and requires an experienced physician, without whom the assessment of the condition is easily inaccurate.
With the development of deep learning in recent years, deep learning techniques have succeeded in target detection, image classification, image segmentation and other fields. Their application in medical image processing is also increasing, achieving higher accuracy on many problems than traditional machine learning and statistical algorithms. Segmentation of the ischemic stroke infarct region is a typical medical image segmentation problem and can be predicted with the U-Net network model widely used in medical image segmentation. However, traditional prediction models use only a single modality of the patient's brain images and do not fully mine the rich information provided by data of different modalities; in addition, because the infarct region has extremely irregular edges and complex, variable shape and position, the existing U-Net network does not address this problem in a targeted manner. Therefore, an image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network is needed.
Disclosure of Invention
The invention aims to solve the technical problem of providing an image processing method for multi-mode stroke infarct region segmentation based on an improved U-Net network, which is used for carrying out image processing on medical brain images of stroke patients.
In order to solve the technical problem, the invention provides an image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network, the specific process of which is as follows: acquire a brain medical image of a patient, comprising a cerebral blood flow map, a cerebral blood volume map and a diffusion-weighted imaging image, and input it into an upper computer; using the comprehensive analysis tool 3D Slicer, register the 2-modality data of the cerebral blood flow map and cerebral blood volume map of the same patient against the diffusion-weighted imaging image as reference to obtain registered NIFTI-format files; segment the registered NIFTI files with the trained U-Net-Attention network to obtain a two-dimensional segmentation result image, and display the output result image in the upper computer;
the U-NetAttentionThe network comprises a U-Net network as a baseline model, an attention mechanism module is introduced at a long connection position of the baseline model for improvement, and a target area is emphasized in a low-level feature map by calculating weights through the feature map of a codec.
As an improvement of the image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network of the present invention:
the attention mechanism module includes a channel attention weight calculation and a spatial attention weight calculation:
(1) spatial attention weight calculation
The features of the codec are each extracted by a 1 × 1 convolution, and the attention distribution F and attention score A are obtained as follows:

F(x_l, x_h) = σ1(w_α·x_l + b_α) + σ1(w_β·x_h + b_β)  (formula 1)

A = σ2(w_γ·F(x_l, x_h) + b_γ)  (formula 2)

where x_l is the low-level feature map from the encoder, x_h is the high-level feature map from the decoder, x_l and x_h are the inputs of the attention module, σ1 is the ReLU activation function, σ2 is the Sigmoid activation function, and w_α, w_β, w_γ and b_α, b_β, b_γ are the convolution weights and biases, respectively;
(2) channel attention weight calculation
The spatial size of the whole feature map is compressed to 1 × 1 by a global average pooling (GAP) operation; the channel feature map after the GAP operation is convolved to generate the attention weight on each channel, which is multiplied with the low-level features to finally realize the channel attention weighting:

GAP(x) = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} x(i, j)  (formula 3)

where x(i, j) is the element of the feature map at position (i, j), and W and H are the two-dimensional spatial dimensions of the feature map.
As a further improvement of the image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network of the present invention:
the trained U-NetAttentionThe network acquisition process comprises the following steps:
(1) the loss function L is defined as:
L = α·L_DSC + β·L_focal_loss  (formula 4)

where α and β are the weights of the two loss functions;
L_focal_loss = −α·(1 − y′)^γ · y_gt · log(y′) − (1 − α)·(y′)^γ · (1 − y_gt) · log(1 − y′)  (formula 5)

where y_gt is the true annotation value of a pixel, y′ is the final prediction output through the Sigmoid activation function, and α and γ are two adjustment coefficients: α balances positive and negative samples, and γ adjusts the weight of samples;
L_DSC = 1 − 2·|X ∩ Y| / (|X| + |Y|)  (formula 6)

where X is the real sample label region, Y is the prediction output result region, |X| is the area of the real sample label region, and |Y| is the area of the prediction output result region;
(2) U-Net-Attention network training
Acquire medical brain images of patients, including cerebral blood flow maps, cerebral blood volume maps and diffusion-weighted imaging image data; using the comprehensive analysis tool 3D Slicer, register the different modality data of each patient against the diffusion-weighted imaging image as reference, and use the registered NIFTI-format files as the input data set for training and testing the U-Net-Attention network; manually label the infarct-region image on the diffusion-weighted imaging modality image as the corresponding label; take 75% of the input data set and corresponding labels as the training data set, and the remaining 25% as the test data set;
use the Adam algorithm to optimize the loss function L, adopt four-fold cross-validation, and use the averaged result as the final evaluation index of the algorithm; the initial training learning rate is set to 1 × 10⁻³ and the lowest learning rate to 1 × 10⁻¹⁰, the learning rate is reduced dynamically with the number of training rounds in an exponential-decay manner, and 260 rounds of iteration are trained to obtain a trained pt model file;
(3) U-Net-Attention network testing

The U-Net-Attention network loads the trained pt model file, and the test data set is taken as the network input to obtain segmentation results, thereby verifying and obtaining the trained U-Net-Attention network.
As a further improvement of the image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network of the present invention:
the U-Net network includes an encoder, a decoder, and a long connection.
It is emphasized that the images obtained by the present invention are only for the user's observation and are not to be used as a basis for diagnosing stroke infarction.
The invention has the following beneficial effects:
the method comprises the steps of establishing a base line model for segmenting the infarct area based on a U-Net network coding and decoding structure aiming at multi-modal brain medical images, improving an original long connection attention module aiming at the defect that splicing operation at a long connection part of the base line model is simple, and strengthening the attention degree of a region related to a final infarct area segmentation result in a bottom layer characteristic diagram through calculation of space and channel attention weight so as to improve the prediction level of the model.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of an image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network according to the present invention;
FIG. 2 is a topology diagram of a baseline network U-Net network of the present invention;
FIG. 3 is a diagram of a attention mechanism module network architecture of the present invention;
FIG. 4 is a comparison of segmentation result images from manual labeling, the U-Net-Attention network and the baseline U-Net network in comparative experiment 1.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Embodiment 1: an image processing method for multi-modal stroke infarct region segmentation based on an improved U-Net network, as shown in FIGS. 1 to 3, comprises: obtaining CT perfusion multi-modal brain medical images of a stroke patient and a lesion information image labeled in the diffusion-weighted imaging (DWI) modality, and registering them against the DWI modality image as reference so that all data of the same patient have maximum similarity; taking a U-Net network as the baseline model and improving it at the long connections based on an attention mechanism, where weights computed from the codec feature maps emphasize target regions in the low-level feature maps, forming a new network called the U-Net-Attention network; taking the CT perfusion multi-modal brain images as input and the labeled lesion information images as labels, and training the network to extract feature descriptions of the lesion information, thereby improving the recognition accuracy of the model over the whole infarct region. The specific steps are as follows:
s1, acquiring images and registering:
acquire the CT perfusion multi-modal brain images of a patient and the lesion information image labeled in the diffusion-weighted imaging (DWI) modality, wherein the CT perfusion multi-modal brain images comprise images of the 2 modalities cerebral blood flow (CBF) and cerebral blood volume (CBV);
the data used in this example are from published datasets (458 slices) provided by the desensitization-treated Ischemic Stroke CT perfusion multimodal brain imaging (286 slices) and Ischemic Stroke Lesion Segmentation (Ischemic Stroke Lesion Segmentation)2018 contest provided by the present laboratory collaboration hospital, both in Neuroimaging Informatics Technology Initiative (NIFTI) format. Each patient acquired images within 8 hours of ischemic stroke onset, first the patient generated CT perfusion multimodality brain images in dynamic scans with 1-2 second intervals by using contrast agent. The CT perfusion scanning is obtained in the form of sparse regions (with axial distance of 5mm) covered with stroke lesions, the slice range is 2-22, a Diffusion Weighted Imaging (DWI) image is obtained through magnetic resonance scanning within a few hours after scanning, and manual labeling of the infarct region is carried out on the Diffusion Weighted Imaging (DWI) image to obtain labeling data. All images are manually screened twice by medical experts, and some data which do not meet the requirements, such as imaging errors, labeling errors and omissions, are eliminated.
Because the data come from different sources and acquisition equipment protocols, the images must be registered uniformly, i.e., images of different modalities, fields of view and times undergo a series of spatial transformations so that the images attain maximum similarity and are thereby matched. Using 3D Slicer, a comprehensive analysis tool for magnetic resonance and computed tomography imaging, the data of the 2 CT perfusion modalities, cerebral blood flow (CBF) and cerebral blood volume (CBV), of the same patient are registered against the diffusion-weighted imaging (DWI) image as reference to obtain registered NIFTI-format files, so that all data of the same patient have maximum similarity.
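For illustration, a minimal numpy sketch of how the two registered modalities might be normalized and stacked into a single multi-channel network input; the min-max normalization and all function names here are assumptions for illustration, not part of the patented method, and the actual registration itself is performed in 3D Slicer:

```python
import numpy as np

def stack_modalities(cbf, cbv):
    """Hypothetical post-registration step: intensity-normalize the
    registered CBF and CBV slices and stack them along the channel
    axis as a 2-channel network input (the DWI image serves only as
    the registration reference and label source)."""
    def minmax(img):
        img = img.astype(np.float32)
        span = img.max() - img.min()
        return (img - img.min()) / span if span > 0 else np.zeros_like(img)
    return np.stack([minmax(cbf), minmax(cbv)], axis=-1)

rng = np.random.default_rng(0)
cbf_slice = rng.integers(0, 255, (256, 256))  # stand-in for one registered CBF slice
cbv_slice = rng.integers(0, 255, (256, 256))  # stand-in for one registered CBV slice
x = stack_modalities(cbf_slice, cbv_slice)    # shape (256, 256, 2), values in [0, 1]
```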
S2: Construction of the U-Net-Attention network

Take a U-Net network as the baseline model and improve it at the long connections based on an attention mechanism; weights computed from the codec feature maps emphasize target regions in the low-level feature maps, forming a new network called the U-Net-Attention network.
S201: building U-Net network model
A U-Net network is built as the baseline network. As shown in FIG. 2, the U-Net network comprises three parts: an encoder, a decoder and long connections. The encoder is mainly responsible for extracting features, obtaining deep feature maps through a series of convolution and down-sampling operations; the decoder is mainly responsible for restoring the image, recovering the image size through a series of convolution and up-sampling operations and generating the prediction result; the long connections splice the feature maps of corresponding spatial sizes from the encoder and decoder, so that the low-level encoder features are used to obtain a finer restoration result.
S202: module for constructing attention mechanism
An attention mechanism module is introduced at the long connections of the U-Net network model; weights computed from the codec feature maps emphasize the target region in the low-level feature maps and ultimately improve the feature representation of the decoder. The network structure of the attention mechanism module is shown in FIG. 3;
the calculation of the attention mechanism module can be divided into two parts, channel attention weight calculation and spatial attention weight calculation:
(1) spatial attention weight calculation
First, the features of the codec are each extracted by a 1 × 1 convolution, and the attention distribution F and attention score A are obtained by the following formulas:

F(x_l, x_h) = σ1(w_α·x_l + b_α) + σ1(w_β·x_h + b_β)  (formula 1)

A = σ2(w_γ·F(x_l, x_h) + b_γ)  (formula 2)

where x_l is the low-level feature map from the encoder, x_h is the high-level feature map from the decoder, x_l and x_h are the inputs of the attention module, σ1 is the ReLU activation function, introduced to make the attention calculation nonlinear and increase the complexity of the weight calculation, σ2 is the Sigmoid activation function, used to generate a weight map in (0, 1), and w_α, w_β, w_γ and b_α, b_β, b_γ are the convolution weights and biases, respectively;
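As a sketch only, formulas 1 and 2 can be written in numpy with the 1 × 1 convolutions expressed as per-pixel channel-wise linear maps; the weights below are random placeholders standing in for trained parameters, and the intermediate channel count `c_int` is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def spatial_attention(x_l, x_h, c_int=8):
    """Formulas 1-2: a 1x1 convolution acts per pixel as a linear map
    over channels; the gate A in (0, 1) reweights the low-level map."""
    c = x_l.shape[-1]
    w_a, b_a = rng.standard_normal((c, c_int)), np.zeros(c_int)
    w_b, b_b = rng.standard_normal((c, c_int)), np.zeros(c_int)
    w_g, b_g = rng.standard_normal((c_int, 1)), np.zeros(1)
    f = relu(x_l @ w_a + b_a) + relu(x_h @ w_b + b_b)  # formula 1
    a = sigmoid(f @ w_g + b_g)                         # formula 2
    return x_l * a                                     # gated low-level features

x_l = rng.standard_normal((32, 32, 16))  # encoder (low-level) feature map
x_h = rng.standard_normal((32, 32, 16))  # decoder (high-level) feature map
out = spatial_attention(x_l, x_h)        # same shape as x_l
```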
(2) channel attention weight calculation
The spatial size of the whole feature map is compressed to 1 × 1 by a global average pooling operation, where W and H in formula 3 are the two-dimensional spatial dimensions of the feature map; the channel feature map after the global average pooling (GAP) operation is convolved to generate the attention weight on each channel, which is multiplied with the low-level features to finally realize the channel attention weighting:

GAP(x) = (1 / (W × H)) · Σ_{i=1..W} Σ_{j=1..H} x(i, j)  (formula 3)

where x(i, j) is the element of the feature map at position (i, j).
S3: Training and testing of the U-Net-Attention model
S301: establishing a loss function
To increase the U-Net-Attention model's ability to identify the whole infarct region, Dice similarity coefficient (DSC) loss is introduced as a second term alongside Focal Loss, and the weighted sum of Focal Loss and DSC Loss is used as the loss function, defined as follows:

L = α·L_DSC + β·L_focal_loss  (formula 4)

where α and β adjust the weights of the two loss functions.
The Focal Loss is calculated as follows:

L_focal_loss = −α·(1 − y′)^γ · y_gt · log(y′) − (1 − α)·(y′)^γ · (1 − y_gt) · log(1 − y′)  (formula 5)

where y_gt is the true annotation value of a pixel, y′ is the final prediction output through the Sigmoid activation function, and α and γ are two adjustment coefficients: α balances positive and negative samples, and γ adjusts the weight of samples;
the DSC Loss is calculated as follows:
Figure BDA0003434259000000063
wherein X represents a real sample label region; y denotes a prediction output result region. | X | represents the area of the true sample label region and | Y | represents the area of the prediction output result region.
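For illustration, the weighted loss of formulas 4-6 in numpy; the weights a = b = 0.5 and the focal parameters α = 0.25, γ = 2 are assumptions, since the patent does not fix their values:

```python
import numpy as np

def focal_loss(y_gt, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-pixel focal loss (formula 5), averaged over the image."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    pos = -alpha * (1.0 - y_pred) ** gamma * y_gt * np.log(y_pred)
    neg = -(1.0 - alpha) * y_pred ** gamma * (1.0 - y_gt) * np.log(1.0 - y_pred)
    return float(np.mean(pos + neg))

def dsc_loss(y_gt, y_pred, eps=1e-7):
    """Soft Dice loss (formula 6): 1 - 2|X ∩ Y| / (|X| + |Y|)."""
    inter = float(np.sum(y_gt * y_pred))
    return 1.0 - 2.0 * inter / (float(np.sum(y_gt) + np.sum(y_pred)) + eps)

def combined_loss(y_gt, y_pred, a=0.5, b=0.5):
    """Formula 4: L = a * L_DSC + b * L_focal."""
    return a * dsc_loss(y_gt, y_pred) + b * focal_loss(y_gt, y_pred)

y_gt = np.array([[1.0, 0.0], [1.0, 0.0]])
perfect = combined_loss(y_gt, y_gt)    # near 0
bad = combined_loss(y_gt, 1.0 - y_gt)  # much larger
```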
S302: U-Net-Attention network training

Obtain the medical brain images of patients according to step S1, including the 2 CT perfusion modality images cerebral blood flow (CBF) and cerebral blood volume (CBV) and the diffusion-weighted imaging (DWI) image data, input them into the upper computer, and register them uniformly: using the comprehensive analysis tool 3D Slicer, register the different modality data of each patient against the DWI image as reference, and use the registered NIFTI-format files as the input data set for training and testing the U-Net-Attention network; meanwhile, manually label the infarct-region image on the DWI image as the label corresponding to the input image; take 75% of the input data set and labels as the training data set and the remaining 25% as the test data set;
the training data set and the test data set contain as far as possible image data of different patients.
The Adam algorithm is used to optimize the loss function L. Note that, to avoid biased results caused by an unreasonable split of the training and test data sets, four-fold cross-validation is adopted in the experiments: the training data set is divided into four parts by patient, three parts are used as the training set and the remaining part as the validation set each time, the experiment is repeated four times, and the averaged result is used as the final evaluation index of the algorithm;
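A plain-Python sketch of the patient-wise four-fold split described above; the round-robin grouping is an illustration, as the patent does not specify how patients are assigned to folds:

```python
from itertools import chain

def patient_folds(patient_ids, k=4):
    """Group slices by patient, then rotate one patient group out as
    the validation set so no patient spans both sides of a fold."""
    unique = sorted(set(patient_ids))
    folds = [unique[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = set(folds[i])
        train = set(chain.from_iterable(folds[j] for j in range(k) if j != i))
        splits.append((train, val))
    return splits

ids = [f"patient_{n}" for n in range(8) for _ in range(3)]  # 8 patients, 3 slices each
splits = patient_folds(ids)
```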
The training of the U-Net-Attention network is implemented on the TensorFlow framework. The initial learning rate is set to 1 × 10⁻³ and the lowest learning rate to 1 × 10⁻¹⁰, and the learning rate is reduced dynamically with the number of training rounds in an exponential-decay manner. After 260 rounds of iteration, a trained pt model file is obtained; the weights and parameters of the neural network nodes are stored in the pt model file and used for testing the test data set in step S303.
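As a sketch only, a dynamic learning-rate schedule matching the stated endpoints; the decay factor 0.94 is an assumption, since the patent specifies only the initial rate, the floor and the 260 training rounds:

```python
def learning_rate(epoch, lr0=1e-3, lr_min=1e-10, decay=0.94):
    """Exponential decay per training round, clamped at the floor."""
    return max(lr0 * decay ** epoch, lr_min)

schedule = [learning_rate(e) for e in range(260)]  # monotonically non-increasing
```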
S303: U-Net-Attention network testing

In the test phase, the U-Net-Attention network loads the trained pt model file, i.e., the model weights obtained by training, and the multi-modal images of the test data set, which did not participate in training, are used as the network input to obtain prediction results. A two-dimensional segmentation result image with higher prediction accuracy than the U-Net baseline model is successfully obtained, thereby verifying and obtaining the trained U-Net-Attention network.
S4: Online use
Acquire a medical brain image of a patient, comprising a cerebral blood flow (CBF) map, a cerebral blood volume (CBV) map and diffusion-weighted imaging (DWI) image data, and input it into the upper computer; using the comprehensive analysis tool 3D Slicer, register the different modality data of the patient against the DWI image as reference to obtain registered NIFTI-format files; segment the registered NIFTI files with the trained U-Net-Attention network obtained in step S303 to obtain a two-dimensional segmentation result image, and display and output the two-dimensional segmentation result image in the upper computer.
Comparative experiment 1:
Acquire medical brain images of the same patient, including a cerebral blood flow (CBF) map, a cerebral blood volume (CBV) map and diffusion-weighted imaging (DWI) image data, and input them into the upper computer; using the comprehensive analysis tool 3D Slicer, register the different modality data of the patient against the DWI image as reference to obtain registered NIFTI-format files; then, with the actual infarct region labeled by the doctor on the DWI image as reference, input the registered NIFTI files into the U-Net-Attention network and the baseline U-Net network respectively to obtain two-dimensional infarct-region segmentation result images. The segmentation result images from manual labeling, the U-Net-Attention network and the baseline U-Net network are shown in FIG. 4: the first column shows the actual infarct region labeled by the doctor on the DWI image, the second column the segmentation result of the baseline U-Net, and the third column the segmentation result of the U-Net-Attention model of the present invention. As can be seen from FIG. 4, the region segmented by the baseline U-Net network is obviously larger than the actual infarct region and prone to shifts and missed detections; after the improvement of the long-connection attention mechanism, the U-Net-Attention model of the present invention reduces missed and false detections of the infarct region, and its predicted infarct region is closer to the real labeling result.
Comparative experiment 2:
The data used in this comparative experiment come from the desensitized ischemic stroke CT perfusion multi-modal brain images (286 slices) provided by the collaborating hospital and the public data set (458 slices) of the Ischemic Stroke Lesion Segmentation (ISLES) 2018 challenge. Using the comprehensive analysis tool 3D Slicer, the different modality data of each patient are registered against the DWI image as reference to obtain registered NIFTI-format files; the registered NIFTI files are then input into the U-Net-Attention network and the baseline U-Net network respectively to obtain two-dimensional infarct-region segmentation result images.
The DSC coefficient, Sensitivity (Sensitivity), and Hausdorff Distance (HD for short) are used as performance indexes for comparison, and the calculation method is described as follows:
1) DSC coefficient: let X be the pixel set on the infarct-region segmentation image obtained by the model and Y the pixel set of the real infarct-region annotation image, where 1 means the pixel belongs to the infarct region and 0 means it belongs to the normal region; the DSC coefficient of the two sets is then calculated as:

DSC = 2·|X ∩ Y| / (|X| + |Y|)  (formula 7)

where |X ∩ Y| is the size of the intersection of the sets X and Y, and |X| + |Y| is the sum of the sizes of the point sets X and Y.
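A minimal numpy sketch of the DSC coefficient of formula 7 on binary masks:

```python
import numpy as np

def dsc(pred_mask, label_mask):
    """Formula 7: 2|X ∩ Y| / (|X| + |Y|) for binary infarct masks."""
    x = np.asarray(pred_mask, dtype=bool)
    y = np.asarray(label_mask, dtype=bool)
    denom = int(x.sum() + y.sum())
    return 2.0 * int(np.logical_and(x, y).sum()) / denom if denom else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]])
label = np.array([[1, 0, 0], [0, 1, 0]])
score = dsc(pred, label)  # 2 * 2 / (3 + 2) = 0.8
```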
2) Sensitivity represents how many of the pixels truly belonging to the infarct region are correctly predicted, and is calculated as:

Sensitivity = TP / (TP + FN)  (formula 8)

where TP counts pixels whose true value belongs to the infarct region and which the model predicts as infarct region, and FN counts pixels whose true value belongs to the infarct region but which the model predicts as normal region.
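For illustration, a sketch of the sensitivity metric in numpy, counting TP and FN on binary masks:

```python
import numpy as np

def sensitivity(pred_mask, label_mask):
    """Formula 8: TP / (TP + FN), the fraction of pixels truly in the
    infarct region that the model also predicts as infarct region."""
    pred = np.asarray(pred_mask, dtype=bool)
    label = np.asarray(label_mask, dtype=bool)
    tp = int(np.logical_and(pred, label).sum())
    fn = int(np.logical_and(~pred, label).sum())
    return tp / (tp + fn) if (tp + fn) else 1.0

label = np.array([1, 1, 1, 0, 0])
pred = np.array([1, 1, 0, 0, 1])  # recovers 2 of the 3 infarct pixels
sens = sensitivity(pred, label)
```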
3) The Hausdorff Distance (HD) measures the similarity between two point sets, here the pixel set X on the infarct area segmentation image obtained by model calculation and the pixel set Y of the real infarct area labeling image; it is calculated as:
d_H(X, Y) = max{ sup_{x∈X} inf_{y∈Y} d(x, y), sup_{y∈Y} inf_{x∈X} d(x, y) } (Formula 9)
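The three indexes above can be sketched for binary masks as follows; SciPy's `directed_hausdorff` computes each of the two one-sided distances in Formula 9. This is an illustrative implementation, not the patent's own code:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(x, y):
    """Dice similarity coefficient of two binary masks."""
    inter = np.logical_and(x, y).sum()
    return 2.0 * inter / (x.sum() + y.sum())

def sensitivity(pred, gt):
    """TP / (TP + FN): fraction of true infarct pixels recovered."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    return tp / (tp + fn)

def hausdorff(x, y):
    """Symmetric Hausdorff distance (Formula 9) between the
    coordinates of the foreground pixels of two binary masks."""
    px, py = np.argwhere(x), np.argwhere(y)
    return max(directed_hausdorff(px, py)[0],
               directed_hausdorff(py, px)[0])

pred = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [0, 0, 0]])
gt = np.array([[0, 1, 1],
               [0, 1, 0],
               [0, 0, 0]])
print(dsc(pred, gt), sensitivity(pred, gt), hausdorff(pred, gt))
```

On a perfect prediction, as here, DSC and Sensitivity are 1.0 and HD is 0.0; any missed infarct pixel lowers Sensitivity, and any stray far-away prediction inflates HD.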
The DSC coefficient is the most important evaluation index; when DSC values are close, the Sensitivity and Hausdorff Distance (HD) indexes are used as supplementary comparisons. The results of the comparative experiments are as follows:
(Table of comparative results: shown as an image in the original document; the numerical values are not reproduced in the text.)
A higher DSC coefficient, a higher Sensitivity value and a lower HD value indicate a higher degree of coincidence between the predicted infarct area and the actual infarct area, i.e., a greater similarity. From the results in the table it can be seen that the improved attention module, through the calculation of spatial and channel weights, strengthens the model's attention to the important regions of the feature map and improves the segmentation accuracy of the final infarct area.
Finally, it should be noted that the above merely lists a few specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All modifications that a person skilled in the art can directly derive or suggest from the disclosure of the present invention should be considered to fall within the scope of the invention.

Claims (4)

1. A multi-mode stroke infarct region image processing method based on an improved U-Net network is characterized by comprising the following steps:
acquiring a brain medical image of a patient and inputting it into an upper computer, wherein the brain medical image comprises a cerebral blood flow map, a cerebral blood volume map and a diffusion weighted imaging image; matching the 2-modality data of the cerebral blood flow map and the cerebral blood volume map of the same patient by using the comprehensive analysis tool library 3D Slicer with the diffusion weighted imaging image as a reference, to obtain registered NIfTI format files; segmenting the registered NIfTI format files with the trained U-NetAttention network to obtain a two-dimensional segmentation result image, and displaying the output result image on the upper computer;
the U-NetAttention network takes a U-Net network as a baseline model and is improved by introducing an attention mechanism module at the long connections of the baseline model, which emphasizes the target area in the low-level feature map by calculating weights from the feature maps of the encoder and the decoder.
2. The image processing method of multi-modal stroke infarct region segmentation based on the improved U-Net network according to claim 1, characterized in that:
the attention mechanism module includes a channel attention weight calculation and a spatial attention weight calculation:
(1) spatial attention weight calculation
The features of the encoder and the decoder are extracted by two 1 × 1 convolutions, respectively, and the attention distribution F and the attention score A are obtained as follows:
F(xl,xh)=σ1(wαxl+bα)+σ1(wβxh+bβ) (formula 1)
A=σ2(wγF(xl,xh)+bγ) (formula 2)
wherein xl is the low-level feature map from the encoder, xh is the high-level feature map from the decoder, xl and xh are the inputs of the attention module, σ1 is the ReLU activation function, σ2 is the Sigmoid activation function, and wα, wβ, wγ and bα, bβ, bγ represent the parameters and biases of the convolutions, respectively;
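Formulas 1 and 2 can be traced numerically for a single pair of feature maps: a 1 × 1 convolution reduces to a weighted sum over the channel axis, written below with np.tensordot. All weights are random stand-ins for learned parameters; this is an illustrative sketch, not the patent's implementation:

```python
import numpy as np

def relu(z):      # sigma_1 in Formula 1
    return np.maximum(z, 0.0)

def sigmoid(z):   # sigma_2 in Formula 2
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x_l, x_h, w_a, b_a, w_b, b_b, w_g, b_g):
    """Spatial attention score A for feature maps of shape (C, H, W)."""
    # Formula 1: attention distribution F from encoder and decoder features
    f = relu(np.tensordot(w_a, x_l, axes=1) + b_a) + \
        relu(np.tensordot(w_b, x_h, axes=1) + b_b)
    # Formula 2: one attention weight in (0, 1) per spatial position
    return sigmoid(np.tensordot(w_g, f, axes=1) + b_g)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x_l = rng.standard_normal((C, H, W))   # low-level map from the encoder
x_h = rng.standard_normal((C, H, W))   # high-level map from the decoder
w_a, w_b = rng.standard_normal((1, C)), rng.standard_normal((1, C))
w_g = rng.standard_normal((1, 1))
a = spatial_attention(x_l, x_h, w_a, 0.0, w_b, 0.0, w_g, 0.0)
```

The resulting map `a` has one weight per pixel, which can then be multiplied with the low-level features to suppress irrelevant regions.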
(2) channel attention weight calculation
the spatial size of the whole feature map is compressed to 1 × 1 by a global average pooling (GAP) operation; the channel feature vector obtained by the GAP operation is then passed through a convolution to generate the attention weights on the channels, which are multiplied with the low-level features, finally realizing the attention weight calculation on the channels:
GAP(x) = (1/(W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} x(i, j) (Formula 3)
where x (i, j) is an element in the feature map at position (i, j), and W and H represent the size of the two-dimensional space of the feature map.
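The channel branch can be sketched the same way: Formula 3 averages each channel to a single value, and a Sigmoid-activated transform of that pooled vector re-weights the low-level channels. The (C, C) weight matrix below stands in for the 1 × 1 convolution on the pooled vector and is illustrative only:

```python
import numpy as np

def channel_attention(x_l, w, b):
    """Re-weight the channels of a low-level feature map x_l
    of shape (C, H, W)."""
    gap = x_l.mean(axis=(1, 2))                      # Formula 3: GAP, one value per channel
    weights = 1.0 / (1.0 + np.exp(-(w @ gap + b)))   # Sigmoid weights in (0, 1)
    return x_l * weights[:, None, None]              # multiply with the low-level features

rng = np.random.default_rng(1)
x_l = rng.standard_normal((4, 8, 8))
out = channel_attention(x_l, rng.standard_normal((4, 4)), 0.0)
```

Because the weights lie in (0, 1), each channel of the output is a scaled-down copy of the corresponding input channel, so uninformative channels are attenuated rather than removed.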
3. The image processing method of multi-modal stroke infarct region segmentation based on the improved U-Net network according to claim 2, characterized in that:
the trained U-NetAttentionThe network acquisition process comprises the following steps:
(1) the loss function L is defined as:
L = α·LDSC + β·Lfocal_loss (Formula 4)
Wherein, alpha and beta are the weights of the two loss functions respectively;
Lfocal_loss = −α·(1 − y′)^γ·ygt·log(y′) − (1 − α)·(y′)^γ·(1 − ygt)·log(1 − y′) (Formula 5)
wherein ygt is the true labeled value of the pixel, y′ is the final prediction output through a Sigmoid activation function, and α and γ are two adjusting coefficients, wherein α is used for balancing positive and negative samples and γ is used for adjusting the weight of the samples;
LDSC = 1 − 2|X∩Y| / (|X| + |Y|) (Formula 6)
wherein X represents a real sample label region, Y represents a prediction output result region, | X | represents the area of the real sample label region, | Y | represents the area of the prediction output result region;
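The composite loss of Formula 4 can be sketched as follows. The focal term uses the standard binary focal-loss form, assumed from the description of α (class balance) and γ (sample re-weighting); the combination weights `a` and `b` are placeholders for the patent's α and β, whose values are not stated:

```python
import numpy as np

def focal_loss(y_pred, y_gt, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss (Formula 5), standard form."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # keep log() finite
    pos = -alpha * (1.0 - y_pred) ** gamma * y_gt * np.log(y_pred)
    neg = -(1.0 - alpha) * y_pred ** gamma * (1.0 - y_gt) * np.log(1.0 - y_pred)
    return (pos + neg).mean()

def dice_loss(y_pred, y_gt, eps=1e-7):
    """Dice loss (Formula 6): 1 - DSC on soft predictions."""
    inter = (y_pred * y_gt).sum()
    return 1.0 - 2.0 * inter / (y_pred.sum() + y_gt.sum() + eps)

def total_loss(y_pred, y_gt, a=0.5, b=0.5):
    """Weighted combination L = a*L_DSC + b*L_focal (Formula 4);
    the weights a and b here are illustrative."""
    return a * dice_loss(y_pred, y_gt) + b * focal_loss(y_pred, y_gt)
```

A perfect prediction drives both terms toward zero, while the Dice term penalizes region overlap errors and the focal term concentrates the gradient on hard, misclassified pixels.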
(2) U-NetAttention network training
acquiring medical brain images of patients, including cerebral blood flow map, cerebral blood volume map and diffusion weighted imaging image data; matching the different modal data of each patient by using the comprehensive analysis tool library 3D Slicer with the diffusion weighted imaging image as a reference, and obtaining registered NIfTI format files as the input data set for training and testing of the U-NetAttention network; manually labeling the infarct area on the diffusion weighted imaging modality image as the corresponding label; taking 75% of the input data set and the corresponding labels as the training data set, and the remaining 25% of the input data set and labels as the test data set;
using the Adam algorithm as the optimizer to minimize the loss function L; adopting four-fold cross-validation and taking the average result as the index of the final evaluation; the initial training learning rate is set to 1e-3 and the lowest learning rate to 1e-10, the learning rate is dynamically reduced by exponential decay according to the number of training rounds, and 260 rounds of iteration are trained to obtain a trained pt model file;
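The exponential learning-rate decay with a floor described above can be sketched as below; the decay factor 0.92 is an illustrative assumption, since the patent does not state it:

```python
def exp_decay_lr(epoch, lr0=1e-3, lr_min=1e-10, decay=0.92):
    """Exponential learning-rate decay per training round,
    floored at lr_min, for a 260-round training run."""
    return max(lr0 * decay ** epoch, lr_min)

# lr falls monotonically from 1e-3 toward the 1e-10 floor
lrs = [exp_decay_lr(e) for e in range(260)]
```

With this factor the schedule reaches the 1e-10 floor before round 260, after which the learning rate stays constant for the remaining iterations.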
(3) U-NetAttention network testing
loading the trained pt model file with the U-NetAttention network, and taking the test data set as the input of the U-NetAttention network to obtain segmentation results, thereby verifying and obtaining the trained U-NetAttention network.
4. The image processing method of multi-modal stroke infarct region segmentation based on the improved U-Net network according to claim 3, characterized in that:
the U-Net network includes an encoder, a decoder, and a long connection.
CN202111606856.XA 2021-12-27 2021-12-27 Multimode cerebral apoplexy infarct region image processing method based on improved U-Net network Active CN114359194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111606856.XA CN114359194B (en) 2021-12-27 2021-12-27 Multimode cerebral apoplexy infarct region image processing method based on improved U-Net network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111606856.XA CN114359194B (en) 2021-12-27 2021-12-27 Multimode cerebral apoplexy infarct region image processing method based on improved U-Net network

Publications (2)

Publication Number Publication Date
CN114359194A true CN114359194A (en) 2022-04-15
CN114359194B CN114359194B (en) 2024-07-12

Family

ID=81100550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111606856.XA Active CN114359194B (en) 2021-12-27 2021-12-27 Multimode cerebral apoplexy infarct region image processing method based on improved U-Net network

Country Status (1)

Country Link
CN (1) CN114359194B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422788A (en) * 2023-12-19 2024-01-19 英瑞云医疗科技(烟台)有限公司 Method for generating DWI image based on CT brain stem image

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658401A (en) * 2018-12-14 2019-04-19 上海商汤智能科技有限公司 Image processing method and device, electronic equipment and storage medium
CN110223285A (en) * 2019-06-13 2019-09-10 中南大学湘雅二医院 Imaging results prediction technique and system based on animal migration and neural network model
CN110322528A (en) * 2019-06-26 2019-10-11 浙江大学 Nuclear magnetic resonance brain image reconstructing blood vessel method based on 3T, 7T
US20200381096A1 (en) * 2019-06-03 2020-12-03 The Board Of Trustees Of The Leland Stanford Junior University Methods of Predicting Disorder Progression for Control Arms Within an Experimental Trial
CN112150449A (en) * 2020-09-29 2020-12-29 太原理工大学 Cerebral apoplexy focus segmentation method and system
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network
CN112767374A (en) * 2021-01-27 2021-05-07 天津理工大学 Alzheimer disease focus region semantic segmentation algorithm based on MRI
CN113095382A (en) * 2021-03-30 2021-07-09 浙江大学 Interpretable tuberculosis classification network identification method based on CT image
CN113177943A (en) * 2021-06-29 2021-07-27 中南大学 Cerebral apoplexy CT image segmentation method
WO2021184817A1 (en) * 2020-03-16 2021-09-23 苏州科技大学 Method for segmenting liver and focus thereof in medical image
CN113627564A (en) * 2021-08-23 2021-11-09 李永鑫 Deep learning-based CT medical image processing model training method and diagnosis and treatment system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AN YANG et al.: "CT Images Recognition of Pulmonary Tuberculosis Based on Improved Faster RCNN and U-Net", 《2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY IN MEDICINE AND EDUCATION》, 23 January 2020 (2020-01-23) *
TAO SONG: "Generative Model-Based Ischemic Stroke Lesion Segmentation", 《IMAGE AND VIDEO PROCESSING》, 6 June 2019 (2019-06-06) *
HE Honglin; XIE Jun; LI Yi; LEI Tiao; QIAN Jun; MA Taolin: "Research on a fast three-dimensional reconstruction method for cerebrovascular CT images based on naked-eye 3D visual effects", Chinese Medical Equipment Journal, no. 09, 15 September 2015 (2015-09-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422788A (en) * 2023-12-19 2024-01-19 英瑞云医疗科技(烟台)有限公司 Method for generating DWI image based on CT brain stem image
CN117422788B (en) * 2023-12-19 2024-03-26 英瑞云医疗科技(烟台)有限公司 Method for generating DWI image based on CT brain stem image

Also Published As

Publication number Publication date
CN114359194B (en) 2024-07-12

Similar Documents

Publication Publication Date Title
Khan et al. Deep neural architectures for medical image semantic segmentation
WO2016192612A1 (en) Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN107016395B (en) Identification system for sparsely expressed primary brain lymphomas and glioblastomas
CN112086197B (en) Breast nodule detection method and system based on ultrasonic medicine
Xie et al. Computer‐Aided System for the Detection of Multicategory Pulmonary Tuberculosis in Radiographs
EP3654343A1 (en) Application of deep learning for medical imaging evaluation
CN111755118B (en) Medical information processing method, device, electronic equipment and storage medium
CN114782307A (en) Enhanced CT image colorectal cancer staging auxiliary diagnosis system based on deep learning
EP2996058A1 (en) Method for automatically generating representations of imaging data and interactive visual imaging reports
Li et al. Optical coherence tomography vulnerable plaque segmentation based on deep residual U-Net
CN112508884A (en) Comprehensive detection device and method for cancerous region
CN116091490A (en) Lung nodule detection method based on YOLOv4-CA-CBAM-K-means++ -SIOU
CN116433654A (en) Improved U-Net network spine integral segmentation method
JP2024528381A (en) Method and system for automatically tracking and interpreting medical image data
CN110570425A (en) Lung nodule analysis method and device based on deep reinforcement learning algorithm
CN114359194B (en) Multimode cerebral apoplexy infarct region image processing method based on improved U-Net network
Saeed et al. 3D MRU-Net: A novel mobile residual U-Net deep learning model for spine segmentation using computed tomography images
CN113902738A (en) Heart MRI segmentation method and system
Singha Deo et al. Supremacy of attention based convolution neural network in classification of oral cancer using histopathological images
Zhang et al. Femoral image segmentation based on two-stage convolutional network using 3D-DMFNet and 3D-ResUnet
CN117711576A (en) Method and system for providing a template data structure for medical reports
Quy et al. Multi-view digital mammography mass classification: A convolutional neural network model approach
CN116631584A (en) General medical image report generation method and system, electronic equipment and readable storage medium
CN113362350B (en) Method, device, terminal equipment and storage medium for segmenting cancer medical record image
WO2020099941A1 (en) Application of deep learning for medical imaging evaluation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant