CN109886273B - CMR image segmentation and classification system - Google Patents


Publication number: CN109886273B
Authority: CN (China)
Prior art keywords: image, segmentation, feature map, layer, module
Legal status: Active (granted)
Application number: CN201910149716.0A
Other languages: Chinese (zh)
Other versions: CN109886273A
Inventors: 陈玉成, 吴锡, 李孝杰
Original and current assignee: West China Hospital of Sichuan University
Application filed by West China Hospital of Sichuan University; priority to CN201910149716.0A
Published as CN109886273A (application) and CN109886273B (grant)


Abstract

The invention discloses a CMR image segmentation and classification system comprising an image preprocessing module and an image segmentation and classification module. The image preprocessing module acquires an original CMR image and crops it to obtain a target area image. The image segmentation and classification module is a trained convolutional neural network comprising an encoder, a decoder and an image post-processing module. The encoder comprises a convolution layer, a spatial pyramid pooling layer and a first softmax classifier which are sequentially connected; the convolution layer outputs a shallow feature map of the target area image, and the first softmax classifier outputs a deep feature map of the target area image together with a classification result. The decoder fuses the shallow feature map and the deep feature map to obtain a segmentation probability response map, from which the image post-processing module obtains a segmentation mask of the target area image. The technical scheme provided by the invention can automatically and accurately segment the target area of a CMR image and classify the image, assisting doctors in disease diagnosis.

Description

CMR image segmentation and classification system
Technical Field
The invention relates to the technical field of image processing, in particular to a CMR image segmentation and classification system.
Background
Semantic Segmentation and Image Classification are classic research topics in the fields of artificial intelligence and computer vision. Semantic segmentation aims to use a computer algorithm to automatically classify, pixel by pixel, the regions belonging to different objects in an image, forming a segmentation mask. It requires that the segmentation mask produced by the algorithm preserve accurate edge detail and the same resolution as the original image, and that the different objects in the image be classified correctly. Unlike semantic segmentation, image classification treats the image as a whole and assigns to it the most likely label from an existing label set, indicating the category of the entire image.
In clinical work, doctors rely on medical imaging technology to accurately analyze and diagnose certain diseases. For example, dilated cardiomyopathy (DCM) requires cardiac magnetic resonance (CMR) imaging for diagnosis and for determining disease severity. CMR is an advanced cardiac imaging technology that has emerged in recent years; through the signal values of the image and their variation, it shows the morphology, structure, function and tissue characteristics of the myocardium. Doctors judge cardiac diseases, DCM in particular, by qualitatively and quantitatively analyzing and interpreting the images, and guide treatment according to the result.
In the prior art, when analyzing a CMR image a doctor manually delineates the anatomical regions of interest, such as the myocardium and blood pool, on the image and then further analyzes the signal values within those regions. This manual delineation is tedious and inefficient, and its accuracy depends to a large extent on the personal experience and professional level of the doctor. How to automatically segment the target area of a CMR image and accurately classify the image, so as to assist doctors in diagnosis, is therefore a problem to be solved.
Disclosure of Invention
The invention aims to provide a CMR image segmentation and classification system that can automatically and accurately segment the target area of a CMR image, classify the image, and assist doctors in disease diagnosis.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a CMR image segmentation classification system comprising: the image preprocessing module is used for segmenting and classifying images; the image preprocessing module is used for acquiring an original CMR image, cutting the original CMR image and acquiring a target area image; the image segmentation and classification module is a trained convolutional neural network, and the trained convolutional neural network comprises an encoder, a decoder and an image post-processing module; the encoder comprises a convolution layer, a spatial pyramid pooling layer and a first softmax classifier which are sequentially connected; the input of the convolution layer is connected with the output of the image preprocessing module; outputting a shallow characteristic diagram of the target area image by the convolutional layer; the first softmax classifier outputs a deep feature map of the target region image and a classification result of the target region image; the decoder is used for fusing the shallow layer feature map and the deep layer feature map to obtain a first segmentation probability response map; the image post-processing module is used for acquiring a segmentation mask of the target area image according to the first segmentation probability response map.
Further, the encoder further comprises a first up-sampling module for up-sampling the deep feature map to obtain a first deep feature map; the decoder comprises a decoding module, a feature combination module and a second softmax classifier which are connected in sequence; the decoding module is used for receiving and decoding the shallow feature map and the first deep feature map to obtain a decoded shallow feature map and a decoded first deep feature map; the feature combination module is used for combining the decoded shallow feature map and the decoded first deep feature map to obtain a combined feature map; the second softmax classifier receives the combined feature map and outputs the first segmentation probability response map according to the combined feature map.
Further, the decoder further comprises a second up-sampling module for up-sampling the first segmentation probability response map to obtain a second segmentation probability response map; the image post-processing module is further used for acquiring the segmentation mask of the target area image according to the second segmentation probability response map.
Preferably, the image post-processing module includes: the binarization module is used for binarizing the second segmentation probability response map to obtain a first segmentation mask; and the morphology processing module is used for carrying out image morphology processing on the first segmentation mask to obtain the segmentation mask of the target area image.
Preferably, the convolution layer comprises a first convolution layer and a second convolution layer which are connected in sequence; the second convolutional layer is a depth separable convolutional layer; the input of the first convolution layer is connected with the output of the image preprocessing module; the output of the second convolution layer is connected to the spatial pyramid pooling layer.
Furthermore, a residual connection structure is adopted between the first convolution layer and the second convolution layer, the first convolution layer and the second convolution layer forming a residual unit; there are two or more residual units connected in sequence.
Further, a batch normalization module and a ReLU activation function are sequentially arranged between the second convolution layer and the spatial pyramid pooling layer.
Preferably, the method by which the feature combination module combines the decoded shallow feature map and the decoded first deep feature map comprises: combining the plurality of decoded shallow feature maps with the decoded first deep feature map stage by stage, and up-sampling the combination result once after each stage of feature combination is completed.
Preferably, the spatial pyramid pooling layer comprises two or more 3x3 atrous (dilated) convolution layers, each configured with a different dilation rate.
Further, the image preprocessing module is also used for sampling and normalizing the original CMR image.
The CMR image segmentation and classification system provided by the embodiment of the invention is based on a convolutional neural network containing an encoder and a decoder. The encoder acquires a deep feature map of the target area image and classifies the image through the first softmax classifier. The decoder receives the shallow feature map and the deep feature map from the encoder; the shallow feature map provides detail information of the image and the deep feature map provides semantic information, so after the decoder fuses the two an accurate segmentation probability response map is obtained, from which the target area image can be accurately segmented. With the trained convolutional neural network, the target area of a CMR image can therefore be segmented and the image classified automatically and accurately, assisting doctors in disease diagnosis and greatly improving their working efficiency.
Drawings
FIG. 1 is a first block diagram of a system according to an embodiment of the present invention;
FIG. 2 is a second block diagram of the system according to the embodiment of the present invention;
FIG. 3 is a block diagram of a convolutional neural network in an embodiment of the present invention;
FIG. 4 is a block diagram of the entry to a convolutional neural network in an embodiment of the present invention;
FIG. 5 is a functional image of a ReLU activation function according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a spatial pyramid pooling layer according to an embodiment of the present invention;
FIG. 7 is a block diagram of a decoder according to an embodiment of the present invention;
FIG. 8 is a qualitative effect graph of myocardial segmentation using the system of the present invention;
FIG. 9 is a flow chart of a method for CMR image segmentation and classification using the present system;
In Figs. 3 and 7, 1 denotes the target area image and 2 denotes the segmentation mask of the target area image.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The invention is based on a convolutional neural network. It uses atrous convolution, depthwise separable convolution, residual connections, spatial pyramid pooling and related techniques to solve the problem that traditional segmentation and classification networks cope poorly with the multi-scale nature of objects, and it fuses the two different tasks of segmentation and classification into one network by exploiting the characteristics of the encoder-decoder structure. Specifically, the dense feature representation learned by the encoder is used for classification, the classifier exploiting this highly abstract representation to obtain an accurate classification result; meanwhile, the high-resolution feature map recovered by the decoder serves as the segmentation mask feature, and the segmentation mask of the cardiac magnetic resonance image is generated fully automatically in an end-to-end manner.
Fig. 1 is a structural diagram of the system according to an embodiment of the present invention, comprising an image preprocessing module and an image segmentation and classification module. The image preprocessing module is used for acquiring an original CMR image and cropping it to obtain a target area image; it is further configured to sample and normalize the original CMR image. The image segmentation and classification module is a trained convolutional neural network comprising an encoder, a decoder and an image post-processing module. The encoder comprises a convolution layer, a spatial pyramid pooling layer and a first softmax classifier which are sequentially connected; the input of the convolution layer is connected with the output of the image preprocessing module; the convolution layer outputs a shallow feature map of the target area image; the first softmax classifier outputs a deep feature map of the target area image and a classification result of the target area image. The decoder is used for fusing the shallow feature map and the deep feature map to obtain a first segmentation probability response map; the image post-processing module is used for acquiring a segmentation mask of the target area image according to the first segmentation probability response map.
In this embodiment, as shown in Fig. 2, the encoder further comprises a first up-sampling module for up-sampling the deep feature map to obtain a first deep feature map. The decoder comprises a decoding module, a feature combination module and a second softmax classifier which are connected in sequence; the decoding module receives and decodes the shallow feature map and the first deep feature map, obtaining a decoded shallow feature map and a decoded first deep feature map; the feature combination module combines the two to obtain a combined feature map; the second softmax classifier receives the combined feature map and outputs the first segmentation probability response map according to it. Further, the decoder also comprises a second up-sampling module for up-sampling the first segmentation probability response map to obtain a second segmentation probability response map, and the image post-processing module is further used for acquiring the segmentation mask of the target area image according to the second segmentation probability response map.
In this embodiment, the image post-processing module includes: a binarization module, configured to binarize the second segmentation probability response map to obtain a first segmentation mask; and the morphology processing module is used for carrying out image morphology processing on the first segmentation mask to obtain the segmentation mask of the target area image.
Fig. 3 is a structural diagram of a convolutional neural network with an encoder, a decoder and a spatial pyramid pooling structure according to the present invention, i.e., a structural diagram of a system according to an embodiment of the present invention. The convolutional neural network comprises computer vision technologies such as image input, image feature extraction, spatial pyramid pooling, encoder and decoder structures and the like. By utilizing the computer vision technology and the processing module, the multi-scale characteristics of the image can be processed, and the deep characteristic information of the image can be effectively explored. We now decompose and explain this multitasking network to give a detailed description of the structure and function of each part, and the logic flow of the processing between parts.
Fig. 4 is a structural diagram of the entry block of the convolutional neural network, which is composed of convolution layers comprising a first convolution layer and a second convolution layer connected in sequence, the second convolution layer being a depthwise separable convolution layer. The input of the first convolution layer is connected with the output of the image preprocessing module, and the output of the second convolution layer is connected with the spatial pyramid pooling layer.
A target area image I of size m x m x 1, i.e. a two-dimensional grayscale image, is input to the convolution layer. The first convolution layer performs feature extraction on the target area image using a convolution with kernel size k x k and n1 output channels. The size of the resulting feature map is given by:
N = (M - K + 2P) / S + 1
where M is the side length of the M x M target area image (M = m), K is the side length of the K x K convolution kernel, P is the padding of the first convolution layer, S is its stride, and the feature map obtained after the first convolution layer has size N x N.
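As a concrete check of this formula, the following minimal sketch computes the output size (Python is our choice here; the patent itself specifies no language):

```python
# Minimal sketch of the feature-map size formula N = (M - K + 2P) / S + 1.
def conv_output_size(m: int, k: int, padding: int, stride: int) -> int:
    """Side length N of the feature map from a k x k convolution over an
    m x m input with the given padding and stride."""
    return (m - k + 2 * padding) // stride + 1

# Example: a 224 x 224 input, 3 x 3 kernel, padding 1, stride 2 -> 112 x 112.
print(conv_output_size(224, 3, 1, 2))  # 112
```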
The result then enters the second convolution layer, a depthwise separable convolution layer. A separable convolution decomposes the convolution kernel operation into multiple steps. Denote the convolution operation by y = conv(x, k), where y is the output image, x the input image and k the kernel, and assume k can be written as k = k1 · k2, the product of two one-dimensional kernels. A separable convolution is then obtained because, instead of performing one two-dimensional convolution with k, the same effect is achieved by performing two one-dimensional convolutions with k1 and k2 in turn. In neural networks, the depthwise separable variant of this idea is commonly used: it realizes spatial convolution while keeping the channels separate, followed by a pointwise fusion across channels. For example, assume a 3 x 3 convolution layer with 16 input channels and 32 output channels. In the standard case, 32 kernels of size 3 x 3 each traverse the 16 channels of data, producing 16 x 32 = 512 intermediate maps; the 16 maps belonging to each kernel are then summed into one feature map, yielding the required 32 output channels. With a depthwise separable convolution, 16 feature maps are first obtained by traversing each of the 16 channels with its own 3 x 3 kernel; these 16 maps are then traversed by 32 kernels of size 1 x 1 and fused additively. This process uses 16 x 3 x 3 + 16 x 32 x 1 x 1 = 656 parameters, far fewer than the 16 x 32 x 3 x 3 = 4608 parameters above. Here the depth multiplier is set to 1, the common setting for this type of layer; the purpose is to decouple spatial information from depth (channel) information. The last depthwise separable convolution has a stride of 2 and is not followed by a pooling layer, so it takes the place of a pooling layer.
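The 4608-versus-656 comparison above can be reproduced with a short sketch; PyTorch and the exact layer shapes are our assumptions, not something the patent specifies:

```python
# Hedged sketch: parameter counts of a standard 3x3 convolution (16 -> 32
# channels) versus its depthwise separable counterpart.
import torch.nn as nn

standard = nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=False)  # 16*32*3*3 = 4608

depthwise_separable = nn.Sequential(
    # Depthwise: one 3x3 kernel per input channel (groups=16) -> 16*3*3 = 144.
    nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16, bias=False),
    # Pointwise: 32 kernels of size 1x1 over 16 channels -> 16*32*1*1 = 512.
    nn.Conv2d(16, 32, kernel_size=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(depthwise_separable))  # 4608 656
```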
Above the depthwise separable convolutions there is a skip connection that adds a feature map carried over, after a convolution, from an earlier layer. This special structure is called a residual connection. A deep network generally performs better than a shallow one, so the most direct way to further improve model accuracy would be to make the network ever deeper. However, as the depth grows, accuracy first improves and then, beyond a certain number of layers, training and test accuracy drop rapidly: once a network becomes very deep, it becomes harder to train. This is mainly due to the vanishing gradient problem in deep models caused by the backpropagation mechanism of deep learning. The principle of the residual connection is to pass the input x directly to the output as an initial result, so that the output becomes H(x) = F(x) + x. In this embodiment, a residual connection structure is adopted between the first convolution layer and the second convolution layer, the two layers forming a residual unit. This skip structure breaks the convention of traditional neural networks that the output of layer n-1 can only serve as the input of layer n: the output of a layer can cross several layers and serve as the input of a later layer, which solves the problem that stacking many layers causes the error rate of the whole model to rise rather than fall. In this embodiment there are two or more residual units connected in sequence; the repeated residual units extract high-level semantic features of the image through a deep network structure.
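A minimal residual unit along these lines might look as follows; the channel widths and layer choices are illustrative assumptions rather than the embodiment's exact configuration:

```python
# Sketch of a residual unit: H(x) = F(x) + x, where F is a first convolution
# followed by a depthwise separable convolution.
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # first conv layer
            nn.ReLU(inplace=True),
            # second conv layer, depthwise separable: depthwise then pointwise
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x  # the shortcut carries x straight to the output

# Two or more units connected in sequence, as the embodiment describes.
encoder_body = nn.Sequential(ResidualUnit(64), ResidualUnit(64), ResidualUnit(64))
```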
In this embodiment, a batch normalization module and a ReLU activation function are further arranged in sequence between the second convolution layer and the spatial pyramid pooling layer. Batch normalization normalizes the activations over each mini-batch at every SGD (Stochastic Gradient Descent) step, so that each dimension of the output has zero mean and unit variance; this helps prevent vanishing gradients during backpropagation.
ReLU stands for Rectified Linear Unit; its formula is ReLU(x) = max(0, x), and its function graph is shown in Fig. 5. Using ReLU as the activation function has several advantages: it gives the network strong non-linear expressive power, especially in deep networks, and its gradient is constant over the positive interval, so the vanishing gradient problem does not arise there and the convergence rate of the model remains stable.
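A one-line sketch of this batch-normalization-plus-ReLU stage (the 256-channel width is an assumed placeholder):

```python
import torch.nn as nn

# Normalize each channel over the mini-batch to zero mean and unit variance,
# then apply ReLU(x) = max(0, x), before the spatial pyramid pooling layer.
post_conv = nn.Sequential(nn.BatchNorm2d(256), nn.ReLU(inplace=True))
```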
Spatial pyramid pooling is an image feature extraction technique that accepts images of varying sizes as input, without the cropping and scaling of conventional methods that alter or lose image features and reduce classification accuracy. The method applies kernels of different sizes to the convolved features, for example a group of 5x5, 3x3, 2x2 and 1x1 kernels convolving the features at different scales, and combines the resulting multi-scale features into a fixed scale, which can finally be fed to a fully connected layer or another classifier. The input image scale therefore need not be fixed, and the extracted features are rich in scale. Spatial pyramid pooling has three advantages: first, it overcomes the drawback of differing input sizes, requiring neither cropping nor scaling and preserving the original characteristics of the image; second, because a feature map is processed from different angles and scales and then aggregated, the information utilization rate improves; third, the accuracy of target detection increases. In this embodiment, the spatial pyramid pooling layer comprises two or more 3x3 atrous (dilated) convolution layers, each configured with a different dilation rate so as to act as a multi-scale feature extraction convolution; finally the extracted features are combined by concatenation (concat). The structure of the spatial pyramid pooling layer is shown schematically in Fig. 6.
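A hedged sketch of such a layer: parallel 3x3 atrous convolutions at assumed dilation rates, merged by concatenation:

```python
# Sketch of the spatial pyramid pooling layer: two or more 3x3 atrous
# convolutions with different dilation rates (the rates 1, 6, 12, 18 and the
# channel widths are assumptions), concatenated along the channel axis.
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding = dilation keeps the spatial size of a 3x3 conv unchanged
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)  # concat
        return self.project(feats)
```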
Fig. 7 is a schematic structural diagram of a decoder according to an embodiment of the present invention. Because of the built-in invariance of convolutional neural networks to spatial transformations, denser extraction of image features benefits classification; but this property works against image segmentation, which must output a segmentation mask of the same size as the original image with clear edges. The dense feature map produced by repeated downsampling therefore needs to be decoded to recover spatial resolution and detail information. The decoder adopted by the invention uses a shallow feature map from the encoder's convolution layer and a deep feature map from the first softmax classifier: the shallow feature map provides detail information and the deep feature map provides semantic information, and the two are complementary, yielding an accurate classification result and a segmentation mask with clear details. The upsampling operation in the decoder restores the output to the original size.
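The fusion step might be sketched as follows, assuming a DeepLab-style decoder; all channel widths are illustrative and the patent's exact wiring may differ:

```python
# Sketch of the decoder: upsample the deep (semantic) feature map to the
# resolution of the shallow (detail) feature map, combine the two, and apply
# the second softmax classifier to produce the segmentation probability
# response map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, shallow_ch: int, deep_ch: int, n_classes: int):
        super().__init__()
        self.decode_shallow = nn.Conv2d(shallow_ch, 48, 1, bias=False)      # decoding module
        self.fuse = nn.Conv2d(48 + deep_ch, 256, 3, padding=1, bias=False)  # feature combination
        self.classify = nn.Conv2d(256, n_classes, 1)                        # before softmax

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                             align_corners=False)                           # first upsampling
        x = torch.cat([self.decode_shallow(shallow), deep], dim=1)
        x = self.fuse(x)
        return torch.softmax(self.classify(x), dim=1)  # second softmax classifier
```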
FIG. 8 is a qualitative effect chart of myocardial segmentation using the system of the present invention. Each row is a different sample and its results. From left to right, the first column is the original image, the second the manually annotated ground truth, the third the segmentation result of the system, and the fourth the error between the system's segmentation result and the ground truth. From the color labeling in the fourth column, the accurately segmented, over-segmented and under-segmented regions can be identified.
As can be seen from the qualitative results, the segmentation produced by the system is very close to the ground truth delineated manually by an expert physician, and even in particularly fine regions it avoids errors that manual delineation may introduce. The error analysis in the rightmost column shows that over-segmented and under-segmented regions are well controlled: the errors are small and the precision is high.
Meanwhile, to demonstrate the effect of the system more convincingly, K-fold cross-validation was used for testing, with K = 5, and the average score over the 5 folds taken as the final result (a sketch of this protocol follows the tables below). The segmentation indices in Table 1 and the classification indices in Table 2 allow a quantitative and accurate evaluation of each method, and the system is compared with other segmentation and classification systems to illustrate its effect. Taking DSC and Jaccard, the indices of greatest interest for segmentation, the system outperforms the existing DeepLabv3+ network, reaching 0.7945 and 0.6681 respectively. Compared with the existing Xception network, the classification result also improves to a certain extent, reaching an accuracy of 0.855, a sensitivity of 0.896 and a specificity of 0.731.
TABLE 1. Evaluation indices of the myocardial segmentation results

Method       DSC     Jaccard  AUC     F-Measure
U-Net        0.6150  0.4546   0.7695  0.6150
DeepLabv3+   0.7476  0.5942   0.9610  0.7476
This system  0.7945  0.6681   0.8861  0.7945

TABLE 2. Evaluation indices of the CMR classification results

Method       Accuracy  Sensitivity  Specificity
ResNet50     0.741     0.888        0.293
Xception     0.849     0.880        0.756
This system  0.855     0.896        0.731
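A sketch of the 5-fold protocol referenced above; train_fn, predict_fn and the per-case image/mask attributes are hypothetical stand-ins for the system's actual training and inference routines:

```python
# Sketch: K-fold cross-validation (K = 5) reporting the mean Dice similarity
# coefficient (DSC) over the held-out folds.
import numpy as np
from sklearn.model_selection import KFold

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """DSC = 2|A intersect B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def cross_validate(cases, train_fn, predict_fn, k: int = 5) -> float:
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(cases):
        model = train_fn([cases[i] for i in train_idx])       # hypothetical trainer
        fold = [dice(predict_fn(model, cases[i].image), cases[i].mask)
                for i in test_idx]                            # hypothetical inference
        scores.append(np.mean(fold))
    return float(np.mean(scores))  # final score: average over the 5 folds
```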
FIG. 9 is a flow chart of a method for CMR image segmentation and classification using the system, comprising the following steps:
Step one, an original CMR image is input.
step two, the image preprocessing module preprocesses the original CMR image, and the specific processing process is as follows: resampling the original CMR image to ensure that the spatial resolution of the image is 1x1x1m 3 (ii) a Normalizing the resampled image to ensure that the brightness value of the image is between-1.0 and 1.0; the normalized image is cropped, and only the ROI area, namely the interested anatomical structure area such as cardiac muscle, blood pool and the like is reserved. And obtaining the target area image after cutting.
Step three, feature extraction is performed on the target area image using the convolution layers in the encoder. The convolution layers in this embodiment comprise a first convolution layer, an ordinary convolution layer, and a second convolution layer, a depthwise separable convolution layer. A residual connection structure between the first and second convolution layers forms a residual unit; in this embodiment several residual units are used in sequence to extract features from the target area image.
Step four, the extracted features are sent to the spatial pyramid pooling layer for pooling.
Step five, the pooled features are sent to the first softmax classifier for classification, yielding the classification result of the target area image.
Step six, the deep feature map from the first softmax classifier is up-sampled and then sent to the decoder. The decoding module of the decoder receives the shallow feature map from the encoder's convolution layer and the up-sampled deep feature map, decodes them, and obtains a decoded shallow feature map and a decoded first deep feature map. The feature combination module of the decoder combines the decoded shallow feature map and the decoded first deep feature map to obtain a combined feature map. The second softmax classifier of the decoder receives the combined feature map, and the second up-sampling module up-samples the output of the second softmax classifier and outputs the second segmentation probability response map.
In this step, the feature combination module combines the plurality of decoded shallow feature maps with the decoded first deep feature map stage by stage, up-sampling the combination result once after each stage of feature combination is completed.
Step seven, the image post-processing module acquires the segmentation mask of the target area image according to the second segmentation probability response map. Specifically, the binarization module in the image post-processing module binarizes the second segmentation probability response map to obtain a first segmentation mask, and the morphology processing module then performs image morphological processing on the first segmentation mask to obtain the segmentation mask of the target area image.
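A sketch of this step; the 0.5 threshold and the 3x3 structuring element are assumed values that the patent does not specify:

```python
# Sketch of the post-processing: binarize the probability response map, then
# clean the mask with morphological opening (removes speckle) and closing
# (fills small holes).
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def postprocess(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    mask = prob_map >= threshold                      # binarization module
    structure = np.ones((3, 3), dtype=bool)
    mask = binary_opening(mask, structure=structure)  # morphology processing
    mask = binary_closing(mask, structure=structure)
    return mask.astype(np.uint8)                      # segmentation mask
```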
The invention can accurately segment and classify CMR images and fuses the two computer vision tasks into one network, achieving automation and synchronization; the segmentation and classification tasks promote each other, and accurate results are obtained. For image segmentation, the convolutional neural network can completely segment medical anatomical structures such as the myocardium and blood pool, providing the prerequisite for further diagnostic analysis. For image classification, the network can accurately classify CMR images, judge whether the subject is ill, and provide a reference for the doctor's diagnosis. The invention replaces traditional convolution with atrous convolution, whose advantage is that the receptive field of the convolution kernel is larger, so more context information can be captured. With the spatial pyramid pooling layer, image information can be extracted at different scales, so that even very small objects are effectively captured despite the multi-scale nature of the image. The encoder-decoder structure lets the different branches play to their strengths and enables multi-task cooperation: the encoder branch obtains high-level semantics, the decoder branch recovers detail information, and fusing the shallow and deep feature maps yields a segmentation mask rich in semantics and complete in detail. By reasonably combining these deep learning techniques, the invention can effectively and automatically obtain high-quality cardiac segmentation masks and accurate cardiac disease classification results, greatly reducing the workload of medical personnel, which is of great significance for disease analysis and subsequent treatment-plan evaluation.
The above description covers only specific embodiments of the present invention, but the scope of the invention is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope of the invention shall be covered by the scope of the invention.

Claims (8)

1. A CMR image segmentation classification system, comprising: an image preprocessing module and an image segmentation and classification module;
the image preprocessing module is used for acquiring an original CMR image, cutting the original CMR image and acquiring a target area image;
the image segmentation and classification module is a trained convolutional neural network, and the trained convolutional neural network comprises an encoder, a decoder and an image post-processing module;
the encoder comprises a convolution layer, a spatial pyramid pooling layer and a first softmax classifier which are sequentially connected; the input of the convolution layer is connected with the output of the image preprocessing module; the convolution layer outputs a shallow feature map of the target area image; the first softmax classifier outputs a deep feature map of the target area image and a classification result of the target area image;
the decoder is used for fusing the shallow feature map and the deep feature map to obtain a first segmentation probability response map; the encoder further comprises a first up-sampling module for up-sampling the deep feature map to obtain a first deep feature map; the decoder comprises a decoding module, a feature combination module and a second softmax classifier which are connected in sequence; the decoding module is used for receiving and decoding the shallow feature map and the first deep feature map to obtain a decoded shallow feature map and a decoded first deep feature map; the feature combination module is used for combining the decoded shallow feature map and the decoded first deep feature map to obtain a combined feature map; the second softmax classifier receives the combined feature map and outputs the first segmentation probability response map according to the combined feature map;
the method by which the feature combination module combines the decoded shallow feature map and the decoded first deep feature map comprises:
combining the plurality of decoded shallow feature maps with the decoded first deep feature map stage by stage, and up-sampling the combination result once after each stage of feature combination is completed;
the image post-processing module is used for acquiring a segmentation mask of the target area image according to the first segmentation probability response map.
2. The CMR image segmentation classification system of claim 1, wherein the decoder further comprises: a second up-sampling module for up-sampling the first segmentation probability response map to obtain a second segmentation probability response map; the image post-processing module is further used for obtaining the segmentation mask of the target area image according to the second segmentation probability response map.
3. The CMR image segmentation classification system of claim 2, wherein the image post-processing module includes:
the binarization module is used for binarizing the second segmentation probability response map to obtain a first segmentation mask;
and the morphology processing module is used for carrying out image morphology processing on the first segmentation mask to obtain the segmentation mask of the target area image.
4. The CMR image segmentation classification system of claim 1, wherein the convolution layers comprise a first convolution layer and a second convolution layer connected in sequence; the second convolution layer is a depthwise separable convolution layer; the input of the first convolution layer is connected with the output of the image preprocessing module; the output of the second convolution layer is connected with the spatial pyramid pooling layer.
5. The CMR image segmentation classification system according to claim 4, wherein a residual connection structure is further adopted between the first convolution layer and the second convolution layer, the first convolution layer and the second convolution layer forming a residual unit; there are two or more residual units connected in sequence.
6. The CMR image segmentation classification system of claim 4, wherein a batch normalization module and a ReLU activation function are further sequentially disposed between the second convolution layer and the spatial pyramid pooling layer.
7. The CMR image segmentation classification system of claim 1, wherein the spatial pyramid pooling layer comprises two or more 3x3 atrous convolution layers; each of the 3x3 atrous convolution layers is configured with a different dilation rate.
8. The CMR image segmentation classification system of claim 1, wherein the image pre-processing module is further configured to sample and normalize the raw CMR image.
Priority application: CN201910149716.0A, filed 2019-02-26 by West China Hospital of Sichuan University.

Publications:
CN109886273A (application), published 2019-06-14
CN109886273B (grant), published 2022-12-16

Family ID: 66929841



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant