Detailed Description
The invention is described in further detail below with reference to the attached drawings and detailed description:
Example 1
Embodiment 1 describes an intracranial aneurysm image detection method that performs feature extraction directly on the original 3D TOF-MRA image, so as to preserve the original features of the image to the greatest extent and improve detection performance for intracranial aneurysms. In addition, a linear network is used to replace part of the convolution modules in the original model, so that the complexity of the model is reduced and the training time is shortened while the detection performance remains essentially unchanged. To this end, the invention provides an automatic intracranial aneurysm detection model based on an attention mechanism and a multi-layer perceptron: first, a PCA module (pyramid channel attention) and an MC module (multi-scale convolution) are added to a 3D U-Net to extract features at different levels in depth; then a decoder performs feature fusion across the levels; finally, deep supervision guides the network training to obtain the final intracranial aneurysm detection result. Based on this model, the invention further proposes replacing part of the original convolution modules with a three-dimensional tokenized multi-layer perceptron module (3D Tokenized MLP) to reduce model complexity.
As shown in fig. 1, the intracranial aneurysm image detection method in the embodiment includes the following steps:
Step 1: preprocess the original 3D TOF-MRA image.
The preprocessing operations include resampling, data normalization, and random data enhancement.
Resampling normalizes voxels of different sizes in the original images to the same size; the median voxel spacing is set to [0.7, 0.43, 0.43].
Data normalization scales data of different ranges into the same interval; the calculation formula is:

z = (x − μ) / σ

where σ is the standard deviation of the image, μ is the mean value, x is the voxel value at each point, and z is the normalized score.
Random data enhancement: during training experiments, random data enhancement applies random elastic deformation, random rotation, and random scale transformation to the data in real time.
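As a minimal numpy sketch of this preprocessing step, the following code performs the z-score normalization above together with an illustrative random augmentation; a random 90-degree rotation and an axis flip stand in here for the elastic, rotation, and scale transforms, and all function names are hypothetical:

```python
import numpy as np

def zscore_normalize(volume):
    """Scale a 3D volume to zero mean and unit variance: z = (x - mu) / sigma."""
    mu = volume.mean()
    sigma = volume.std()
    return (volume - mu) / sigma

def random_augment(volume, rng):
    """Illustrative random augmentation: a random 90-degree rotation and a
    random axis flip (the embodiment also uses random elastic deformation
    and scale transformation, omitted here for brevity)."""
    k = rng.integers(0, 4)                     # random number of quarter turns
    volume = np.rot90(volume, k, axes=(1, 2))  # rotate within one plane
    if rng.random() < 0.5:
        volume = volume[:, :, ::-1]            # random flip along one axis
    return volume

rng = np.random.default_rng(0)
vol = rng.normal(loc=100.0, scale=20.0, size=(8, 16, 16))
norm = zscore_normalize(vol)
aug = random_augment(norm, rng)
```

Because the rotation acts within a plane of equal extent, the augmented volume keeps the original shape, as the real-time augmentation requires.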
Step 2: build a 3D MAU-Net detection model, send the preprocessed image into the 3D MAU-Net detection model, and perform intracranial aneurysm detection on the image to obtain an intracranial aneurysm segmentation result.
Sending the original TOF-MRA image into the 3D MAU-Net detection model to obtain the segmentation result allows the original characteristics of the image to be fully learned, effectively improving intracranial aneurysm detection performance.
The invention provides an intracranial aneurysm automatic detection model combining an attention mechanism with a 3D U-Net model in deep learning, and provides a pyramid channel attention module, a multi-scale convolution module and a 3D Tokenized MLP module.
Specifically, the 3D MAU-Net detection model is based on the traditional 3D U-Net model, into which a context-aware pyramid feature extraction module (CPFE), a channel attention module (CA), and a multi-scale convolution module (MC) are introduced to extract deep contextual features and the edge and contour information of the image. In addition, the invention proposes using a three-dimensional tokenized MLP (3D Tokenized MLP) module to replace part of the convolution modules, fully reducing the complexity of the 3D MAU-Net detection model while keeping the detection performance essentially unchanged. The method addresses the problems that the existing manual interpretation approach is easily influenced by a physician's subjective experience and is time-consuming and labor-intensive, thereby further improving the accuracy and efficiency of intracranial aneurysm detection.
After step 1 is completed, the intracranial aneurysm image is pre-cropped and then sent into the automatic intracranial aneurysm detection model based on the attention mechanism and the multi-layer perceptron, i.e., the 3D MAU-Net detection model. The detection model takes a 3D U-Net structure as its backbone and is improved by introducing the PCA and MC functional modules and the 3D Tokenized MLP module so as to be suitable for intracranial aneurysm detection. The network architecture of the 3D MAU-Net detection model is shown in fig. 2 and comprises an encoder, a PCA module, an MC module, a decoder, a 3D Tokenized MLP module, and a deep supervision module.
the processing flow of the preprocessed 3D TOF-MRA image in the 3D MAU-Net detection model is as follows:
First, the preprocessed 3D TOF-MRA image is sent to the encoder, where ordinary convolutions and residual connections extract image features; the features extracted by the encoder are then input to the 3D Tokenized MLP module to perform the shifted MLP operation.
the encoder outputs five layers of features f1, f2, f3, f4 and f5 together with the 3D Tokenized MLP;
the advanced features f1 and f2 are subjected to feature learning through the MC module, extraction of image edge information is enhanced, and feature images of f1 'and f 2' are output; extracting the advanced features f3, f4 and f5 by using a PCA module, outputting f3 ', f4 ' and f5 ' feature images to obtain more complete bottom features, and then passing through a 3D enabled MLP module and a decoder module;
The f3' feature map is fused with the output S4 of the upper decoder layer and input into the third-layer decoder to obtain S3; the f2' feature map is fused with S3 and input into the second-layer decoder to obtain S2; the f1' feature map is fused with S2 and input into the first-layer decoder to obtain S1. The outputs S1, S2, and S3 of the upper three decoder layers are trained under the guidance of a mixed loss function through the deep supervision module, yielding the final intracranial aneurysm detection result.
Each encoder stage is formed by two Conv3d convolution layers and a max pooling layer; the number of feature channels is doubled after each downsampling, and the five levels of output features f1, f2, f3, f4, and f5 are obtained through convolution and LeakyReLU activation.
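The channel and spatial-size progression of such an encoder can be traced with a small helper. This is a sketch only: the base channel count of 16 and the input size are assumed values, not stated in the embodiment.

```python
def encoder_shapes(in_shape, base_channels=16, levels=5):
    """Trace feature shapes through the encoder: each level applies two Conv3d
    layers (padding assumed to preserve size) followed by 2x max pooling; the
    channel count doubles after each downsampling."""
    shapes = []
    c = base_channels
    d, h, w = in_shape
    for level in range(levels):
        shapes.append((c, d, h, w))       # feature f1..f5 at this level
        d, h, w = d // 2, h // 2, w // 2  # max pooling halves each axis
        c *= 2                            # channels double after downsampling
    return shapes

feats = encoder_shapes((64, 128, 128))
```

With these assumptions, f1 has 16 channels at full resolution and f5 has 256 channels at 1/16 resolution per axis.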
Since the main improvement of 3D U-Net is based on feature fusion at the decoder, the features extracted before entering the decoder are critical for intracranial aneurysm detection. When feature learning is performed along the channel dimension, the importance of each channel is emphasized through different module structures, improving the feature extraction capability of the deep learning network at the encoder end and thereby the overall detection performance of the model.
The PCA module structure is shown in fig. 3. The PCA module is composed of the context-aware pyramid feature extraction (CPFE) module and the channel attention (CA) module. The CPFE module first convolves the feature map from the encoder with dilated (hole) convolutions of kernel size 3×3 and different dilation rates (3, 5, and 7, respectively) to obtain multi-scale, multi-receptive-field features; it then fuses these feature maps with the feature map from a convolution of kernel size 1×1 to obtain an output feature layer P. The fused feature map P is sent into the CA module, which assigns different weights to different channels so as to make full use of effective information; the output feature information is finally input into the decoder of the same level. The dilated convolution is defined as follows:
y[i] = Σ_k x[i + r·k] · w[k]

where x is the input signal, y is the output signal, w represents a filter of length K, and r corresponds to the dilation rate used when sampling x; in standard convolution, r = 1.
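The definition above can be checked with a direct one-dimensional numpy implementation; this is a sketch of the sampling pattern only, and the 3D case applies the same pattern along each axis:

```python
import numpy as np

def dilated_conv1d(x, w, r):
    """Direct implementation of the dilated (hole) convolution definition:
    y[i] = sum_k x[i + r*k] * w[k], where r is the dilation rate.
    With r = 1 this reduces to standard (valid) convolution."""
    K = len(w)
    span = r * (K - 1)               # receptive field grows with dilation
    n_out = len(x) - span
    y = np.empty(n_out)
    for i in range(n_out):
        y[i] = sum(x[i + r * k] * w[k] for k in range(K))
    return y

x = np.arange(10.0)
w = np.array([1.0, 1.0, 1.0])
y1 = dilated_conv1d(x, w, r=1)   # standard convolution, sums x[i..i+2]
y2 = dilated_conv1d(x, w, r=2)   # dilated convolution, sums x[i], x[i+2], x[i+4]
```

The dilated variant covers a wider receptive field with the same number of filter weights, which is exactly why the CPFE module uses several dilation rates.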
The dilated (hole) convolution is illustrated in fig. 4. The CA module assigns larger weights to channels that play an important role in image detection: P is first reshaped to A ∈ R^(C×N), the reshaped A is multiplied with its transpose by matrix multiplication, and the channel attention map is obtained with softmax:

x_ji = exp(A_i · A_j) / Σ_{i=1..C} exp(A_i · A_j)

where x_ji represents the effect of the i-th channel on the j-th channel. Thereafter, matrix multiplication is performed between the transpose of the attention map and the reshaped P, and the result is reshaped back to the original shape of P. Finally, the result is multiplied by a scale parameter β and summed element-wise with P to obtain the final output:

E_j = β Σ_{i=1..C} (x_ji A_i) + P_j

where E_j represents the output feature of the j-th channel, x_ji the attention weight of channel i on channel j, and A_i the i-th channel of the reshaped feature; β is a scale parameter whose weight is learned gradually starting from 0.
This formula shows that the final feature of each channel is a weighted sum of the features of all channels plus the original feature, which helps improve the discriminability between features.
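A numpy sketch of this channel attention computation follows. In training, the scale parameter β is learned starting from zero; a fixed illustrative value is used here, so this is a demonstration of the formulas rather than the trained module:

```python
import numpy as np

def channel_attention(P, beta=0.1):
    """Channel attention per the formulas above: reshape P (C, D, H, W) to
    A (C, N), compute the C x C attention map with softmax, then form
    E_j = beta * sum_i x_ji A_i + P_j and reshape back."""
    C = P.shape[0]
    A = P.reshape(C, -1)                         # reshape to (C, N)
    energy = A @ A.T                             # channel-to-channel similarity
    energy -= energy.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(energy)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax -> attention map x_ji
    out = beta * (attn @ A) + A                  # weighted sum plus original
    return out.reshape(P.shape)

rng = np.random.default_rng(1)
P = rng.normal(size=(4, 2, 3, 3))
E = channel_attention(P)
```

With β = 0 the module is an identity, which matches the design choice of learning β gradually from zero so the attention branch is blended in smoothly.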
As shown in fig. 5, the MC module applies 3×3, 5×5, and 7×7 convolutions to the input feature map while convolving it in parallel with a 1×1 convolution, obtaining new feature maps that carry different feature information; the different feature maps are then fused to obtain the final output feature map. Multi-scale convolution does not change the size of the original feature map; it enriches the image features through convolution operations with different kernels and extracts feature information of interest from a global viewpoint, thereby improving model performance. Because multi-scale convolution makes full use of convolution kernels of different sizes, it not only obtains rich context information during feature extraction but also better preserves the edge and contour information of the image.
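The fusion idea can be illustrated in one dimension with numpy. Uniform averaging kernels stand in for the learned 1×1, 3×3, 5×5, and 7×7 convolutions, so this is a sketch of the multi-scale fusion pattern, not the actual module:

```python
import numpy as np

def multi_scale_conv1d(x):
    """1D sketch of the MC module: convolve the same input with kernels of
    several widths (averaging kernels of width 1, 3, 5, 7 as stand-ins for
    learned kernels), then fuse the branch outputs by summation. 'same'
    padding keeps the original feature size, as the MC module requires."""
    branches = []
    for k in (1, 3, 5, 7):
        kernel = np.ones(k) / k                      # stand-in learned kernel
        branches.append(np.convolve(x, kernel, mode='same'))
    return np.sum(branches, axis=0)                  # fuse the branches

x = np.sin(np.linspace(0, np.pi, 32))
fused = multi_scale_conv1d(x)
```

Each branch sees a different receptive field, so the fused output mixes local detail (width 1) with broader context (width 7) while the feature size is unchanged.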
As shown in FIG. 6, the 3D Tokenized MLP is a three-dimensional extension of the 2D Tokenized MLP block. In the 3D Tokenized MLP module, the feature map is processed as follows:
The feature map is first passed to the first shifted multi-layer perceptron (Shifted MLP), which performs the shifted MLP operation across the width, and the shifted feature information then undergoes a depthwise convolution (DWConv). After the depthwise convolution, a GELU activation function is applied and the output is passed on: the second and third shifted multi-layer perceptrons perform the shifted MLP operation across the height and across the depth, respectively, followed by another depthwise convolution and LN (layer normalization) of the feature map. Here the 3D Tokenized MLP uses a residual connection to add the initial feature map tokens linearly to the shifted MLP feature map, and passes the output features to the PCA module of the same level.
The mathematical computation in the 3D Tokenized MLP module can be expressed as:

X_shift = Shift_W(T);

X = GELU(DWConv(MLP(Tokenize(X_shift))));

Y_shift = Shift_H(X);

Y = GELU(DWConv(MLP(Tokenize(Y_shift))));

Z_shift = Shift_D(Y);

Z = LN(T + MLP(GELU(Tokenize(Z_shift)))).

where X_shift represents the output shifted in the X (width) direction, Y_shift the output shifted in the Y (height) direction, and Z_shift the output shifted in the Z (depth) direction; Tokenize denotes the tokenization operation; T represents the original feature map tokens; H, W, and D represent the height, width, and depth; DWConv denotes depthwise convolution; LN denotes layer normalization; GELU is the activation function; X is the output of the tokenized MLP in the X direction, Y the output in the Y direction, and Z the final output of the module.
When the depthwise convolution performs the convolution operation, the number of convolution kernels is consistent with the number of channels of the previous layer, ensuring that the number of output feature maps is consistent with the number of input feature maps, as shown in fig. 7.
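The axis-wise shift at the heart of the Shifted MLP can be sketched in numpy by splitting the channels into groups and rolling each group by a different offset along one spatial axis. The group count of five and the offsets follow the 2D UNeXt convention; the embodiment does not state these values, so they are assumptions:

```python
import numpy as np

def axial_shift(feat, axis):
    """Shift operation of the Shifted MLP, sketched in numpy: split the channel
    dimension into equal groups and roll each group by a different offset along
    the chosen spatial axis (width, height, or depth). feat has shape
    (C, D, H, W)."""
    chunks = np.array_split(feat, 5, axis=0)     # 5 channel groups (assumed)
    offsets = (-2, -1, 0, 1, 2)                  # one offset per group
    shifted = [np.roll(c, s, axis=axis) for c, s in zip(chunks, offsets)]
    return np.concatenate(shifted, axis=0)

rng = np.random.default_rng(2)
T = rng.normal(size=(10, 4, 4, 4))
Xs = axial_shift(T, axis=3)   # shift across the width
Ys = axial_shift(T, axis=2)   # shift across the height
Zs = axial_shift(T, axis=1)   # shift across the depth
```

Shifting different channel groups by different offsets gives the subsequent MLP a view of local neighborhoods along that axis, which is what lets a linear layer stand in for part of a convolution.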
The decoder is composed of the functional modules described above; the decoding stage mirrors the encoding stage.
In the decoding stage, the resolution is doubled with each upsampling, and skip connections combine the information of the encoding and decoding stages, increasing the amount of image feature information contained in the network.
After the 3D MAU-Net detection model is built, the model needs to be trained, and the training process is as follows:
The data set of TOF-MRA images obtained after preprocessing is sent into the 3D MAU-Net detection model for training.
The training optimization algorithm of the model adopts the ADAM algorithm with an initial learning rate lr of 0.001;
the weight decay coefficient is 10⁻⁵; an exponential moving average of the training loss is maintained and monitored every 30 iteration cycles, and whenever it does not decrease by at least 5×10⁻⁴, lr is reduced by a factor of 5; the network adopts a mixed loss function;
the batch size is 30, the maximum number of training iteration cycles is 300, and each iteration cycle comprises 150 iterations; when the iteration cycle reaches the maximum value or lr falls below 10⁻⁸, the network stops training;
the model weights at the point of highest accuracy and low loss on the current validation set are saved, completing the training process.
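The learning-rate schedule described above can be sketched as a small simulation. The loss curve below is synthetic and the exponential moving average smoothing is omitted for brevity, so this illustrates only the decay rule itself:

```python
def schedule_lr(loss_history, lr0=1e-3, window=30, min_delta=5e-4, factor=5.0):
    """Sketch of the schedule: check the monitored loss every `window` epochs;
    if it has not decreased by at least `min_delta` since the last best value,
    divide the learning rate by `factor`."""
    lr = lr0
    best = float('inf')
    lrs = []
    for epoch, loss in enumerate(loss_history, start=1):
        if epoch % window == 0:
            if best - loss < min_delta:
                lr /= factor          # no sufficient improvement -> decay lr
            best = min(best, loss)
        lrs.append(lr)
    return lrs

# Synthetic loss that improves steadily, then stalls after epoch 30,
# so lr drops at the epoch-60 and epoch-90 checks.
losses = [1.0 - 0.01 * min(e, 30) for e in range(1, 91)]
lrs = schedule_lr(losses)
```

Combined with the stopping criterion in the text, training ends once lr has been divided by 5 often enough to fall below 10⁻⁸ or the epoch cap of 300 is reached.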
In the training process of the 3D MAU-Net detection model, a Focal loss function and a Lovasz-Softmax loss function jointly represent the global and local losses; the combined loss is expressed as: loss = loss_F + loss_L.
where loss represents the loss function of the 3D MAU-Net detection model, loss_F represents the Focal loss, and loss_L represents the Lovasz-Softmax loss. Focal loss is used to deal with the extreme foreground-background class imbalance problem by focusing the loss on hard-to-classify samples, starting from the difficulty of sample classification; it is defined as follows:
loss_F = −(1/N) Σ_{i=1..N} α (1 − p_i)^γ log(p_i)

where y_i is the true class of input instance x_i and p_i is the predicted probability that x_i belongs to its true class. For extremely imbalanced classes, the modulating factor (1 − p_i)^γ is added so that samples predicted with low probability contribute more, making the influence of hard positive samples larger; γ is an adjustable focusing factor, N represents the total number of samples, and α represents a weighting parameter.
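A numpy sketch of this Focal loss follows; the α and γ values are typical defaults, not values stated in the embodiment:

```python
import numpy as np

def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss per the formula above: loss_F = -(1/N) * sum_i
    alpha * (1 - p_i)^gamma * log(p_i), where p_i is the predicted
    probability of the true class of sample i."""
    p = np.clip(p_true, 1e-7, 1.0)   # guard against log(0)
    return float(-np.mean(alpha * (1.0 - p) ** gamma * np.log(p)))

easy = np.array([0.95, 0.97, 0.99])   # well-classified samples
hard = np.array([0.10, 0.20, 0.30])   # hard, poorly classified samples
```

Well-classified samples are down-weighted by (1 − p_i)^γ, so the loss concentrates on the hard samples, which is the stated motivation for using it on the foreground-background imbalance.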
Lovasz-Softmax can improve the intersection-over-union (IoU) score of the segmentation result. The IoU score, also called the Jaccard index, is defined as:

J(y*, y) = |y* ∩ y| / |y* ∪ y|

where y* represents the standard (ground-truth) result and y represents the network prediction; the Jaccard loss is therefore defined as:

Δ_J(y*, y) = 1 − J(y*, y).
Optimizing the Jaccard loss with Lovasz-Softmax is often used to fill gaps in the Jaccard-loss segmentation, recover small objects, and form a more reasonable global segmentation; it is defined as follows:

loss_L = (1/|C|) Σ_{c∈C} Δ̄_{J_c}(m(c));

m_i(c) = 1 − f_i(c) if c = y*_i, and f_i(c) otherwise.

where p represents the number of pixels (i = 1..p), Δ̄_{J_c} denotes the Lovasz extension of the Jaccard loss for class c, f_i(c) represents the predicted probability of class c at the i-th element of the network prediction, and y*_i represents the value of the i-th element of the standard result.
Furthermore, in order to verify the effectiveness of the method of the invention, the following experiments are also presented:
the data used in this experiment was provided by published data in a hospital and MICCAI2020 aneurysm detection and segmentation (ADAM) challenge, and the experimental data were 1043 TOF-MRA images, 500 negative samples and 543 positive samples.
S1, preprocessing a data set:
The labels used in the experiment are binarized images. Since negative samples have no original labels, label images consistent in size with the original images and with all voxel values equal to 0 must be generated when constructing the data set. All subject data were then divided into large, medium, small, and non-aneurysm categories: intracranial aneurysms greater than 7 mm in diameter were labeled large aneurysms, those between 3 mm and 7 mm in diameter were labeled medium aneurysms, those less than 3 mm in diameter were labeled small aneurysms, and subjects without an intracranial aneurysm were labeled normal samples. The data were then split into training and test sets.
S2, the training set of TOF-MRA images obtained after preprocessing is sent into the MLP-based lightweight 3D MAU-Net detection model for training.
The network training optimization algorithm adopts the ADAM algorithm with initial learning rate lr = 0.001 and a weight decay coefficient of 10⁻⁵; an exponential moving average of the training loss is maintained and monitored every 30 iteration cycles, and whenever it does not decrease by at least 5×10⁻⁴, lr is reduced by a factor of 5. The network employs a mixed loss function. The batch size is 30, the maximum number of training iteration cycles is 300, and each iteration cycle comprises 150 iterations; when the iteration cycle reaches the maximum value or lr falls below 10⁻⁸, the network stops training. The model weights at the point of highest accuracy and low loss on the validation set are saved.
S3, the test set is sent into the lightweight 3D MAU-Net to obtain the intracranial aneurysm detection results, and the images are visualized to obtain the visualization results. The detection model outputs intracranial aneurysm binarized images; the original MIP images and the corresponding binarized images are opened simultaneously with the AMIDE software and stitched together, and the aneurysm regions are marked in different colors for convenient observation.
Example 2
This embodiment 2 describes an intracranial aneurysm image detection system based on the same inventive concept as the intracranial aneurysm image detection method described in embodiment 1 above.
Specifically, an intracranial aneurysm image detection system comprising:
the preprocessing module is used for preprocessing the original 3D TOF-MRA;
and the intracranial aneurysm identification module is used for building a 3D MAU-Net detection model, sending the preprocessed image into the 3D MAU-Net detection model, and carrying out intracranial aneurysm detection on the image to obtain an intracranial aneurysm segmentation result.
It should be noted that, in the intracranial aneurysm image detection system, the implementation process of the functions and roles of each functional module is specifically detailed in the implementation process of the corresponding steps in the method in the above embodiment 1, and will not be described herein.
Example 3
Embodiment 3 describes a computer apparatus for implementing the intracranial aneurysm image detection method described in embodiment 1.
In particular, the computer device includes a memory and one or more processors. Executable code is stored in the memory, and when executed by the processor it implements the steps of the intracranial aneurysm image detection method described above.
In this embodiment, the computer device is any device or apparatus having data processing capability, which is not described herein.
Example 4
Embodiment 4 describes a computer-readable storage medium for implementing the intracranial aneurysm image detection method described in embodiment 1.
Specifically, the computer-readable storage medium in this embodiment 4 has stored thereon a program for implementing the steps of the intracranial aneurysm image detection method described above when executed by a processor.
The computer readable storage medium may be an internal storage unit of any device or apparatus having data processing capability, such as a hard disk or a memory, or may be an external storage device of any device having data processing capability, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), or the like, which are provided on the device.
The foregoing description is, of course, merely illustrative of preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above-described embodiments, but is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.