Disclosure of Invention
The embodiments of the invention provide a method for training a ventricular myocardium segmentation model for cardiac nuclear magnetic resonance images, a segmentation method and a device, which eliminate or mitigate one or more defects of the prior art and address the low generalization capability of models caused by the scarcity of medical image samples.
The technical scheme of the invention is as follows:
in one aspect, the present invention provides a ventricular myocardium segmentation model training method in a cardiac nuclear magnetic resonance image, including:
acquiring a plurality of cardiac nuclear magnetic resonance images, and preprocessing each cardiac nuclear magnetic resonance image, wherein the preprocessing comprises contrast limited adaptive histogram equalization and Gaussian blur;
using the annotated contour of the ventricular myocardium in each cardiac nuclear magnetic resonance image as a label, and forming a training sample set from the labels and the preprocessed cardiac nuclear magnetic resonance images;
acquiring a preset neural network model, wherein the preset neural network model has been pre-trained on a set number of source-domain samples;
and training the preset neural network model with the training sample set, iteratively updating the preset neural network model using a mixed loss function that combines a cross-entropy loss, a Dice loss and an edge loss, to obtain a target segmentation model.
In some embodiments, the preset neural network model is a U-Net model.
In some embodiments, the down-sampling path of the U-Net model comprises, in succession, a first down-sampling layer, a second down-sampling layer, a third down-sampling layer and a fourth down-sampling layer; the up-sampling path of the U-Net model comprises, in succession, a first up-sampling layer, a second up-sampling layer, a third up-sampling layer and a fourth up-sampling layer; the down-sampling path and the up-sampling path are connected by a bottleneck layer; the first down-sampling layer has a skip connection to the fourth up-sampling layer, the second down-sampling layer has a skip connection to the third up-sampling layer, the third down-sampling layer has a skip connection to the second up-sampling layer, and the fourth down-sampling layer has a skip connection to the first up-sampling layer.
In some embodiments, the first down-sampling layer, the second down-sampling layer, the third down-sampling layer and the fourth down-sampling layer each comprise a Dense Block and a pooling layer; the first up-sampling layer, the second up-sampling layer, the third up-sampling layer and the fourth up-sampling layer each comprise a transposed convolutional layer and a Dense Block; the bottleneck layer is a Dense Block.
In some embodiments, the Gaussian blur assigns weights according to a two-dimensional normal distribution.
In some embodiments, in the mixed loss function, a -log function is used to amplify the Dice loss to the same order of magnitude as the cross-entropy loss, and a variable α is used to control the weights of the cross-entropy loss, the Dice loss and the edge loss.
In some embodiments, the mixed loss function is:
wherein LOSS is the mixed loss function, $L_{CE}$ is the cross-entropy loss, $L_{Dice}$ is the Dice loss, and $\alpha$ is a variable with $0 < \alpha < 1$.
In some embodiments, the variable α is gradually increased and approaches 1 during the training process.
In another aspect, the present invention further provides a ventricular myocardium segmentation method in a cardiac nuclear magnetic resonance image, including:
acquiring a cardiac nuclear magnetic resonance image to be segmented and preprocessing the cardiac nuclear magnetic resonance image, wherein the preprocessing comprises contrast limited adaptive histogram equalization and Gaussian blur;
and inputting the preprocessed cardiac nuclear magnetic resonance image to be segmented into the target segmentation model obtained by the above training method for the ventricular myocardium segmentation model, so as to output the segmentation result of the ventricular myocardium.
In another aspect, the present invention provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method when executing the program.
The invention has the beneficial effects that:
According to the training method, the segmentation method and the device for ventricular myocardium segmentation in cardiac nuclear magnetic resonance images, the training method preprocesses the cardiac nuclear magnetic resonance images with contrast-limited adaptive histogram equalization and Gaussian blur, which effectively improves the image contrast of the training samples and thereby improves recognition and segmentation. Meanwhile, during training of the segmentation model, transfer learning effectively reduces the dependence on training data and improves recognition accuracy when training data are scarce. Furthermore, a mixed loss function combining cross-entropy loss, Dice loss and edge loss is used in training, which reduces the influence of irrelevant background and focuses on boundary contour features; while keeping the training process stable, it mitigates class imbalance, makes the training result more accurate, and reduces the amount of training data required to a certain extent.
Furthermore, an improved U-Net model is adopted in which Dense Blocks replace ordinary convolutional layers; this greatly reduces the number of parameters and has a regularizing effect, so that overfitting is reduced even on small training sets and the model is well suited to the scarcity of medical image samples.
Furthermore, during training the proportion of the edge loss in the mixed loss function is gradually increased, so that boundary features receive more attention, the precision is improved, and the number of training samples required is reduced to a certain extent.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In recent years, owing to factors such as lack of exercise, irregular rest and irregular diet, the incidence of cardiovascular disease and the number of cardiovascular patients have been increasing year by year. According to the World Health Organization, cardiovascular disease was the direct cause of about 17.9 million deaths in 2016, accounting for 31% of all deaths, 85% of which were caused by heart disease and stroke. Cardiovascular disease is a leading cause of death worldwide. Clinical studies show that cardiovascular diseases affect the shape and function of the heart and cause lesions. Non-invasive medical imaging allows these lesions to be observed. Segmenting cardiac image data yields the intracardiac structures and lays a foundation for subsequent research and analysis of cardiovascular diseases. Magnetic resonance imaging (MRI) is non-invasive and radiation-free and offers high tissue contrast, so it can accurately quantify intracardiac tissue structures and support the calculation of cardiac function parameters. Cardiac MRI is therefore regarded as the gold standard for quantitative analysis of the heart. However, manual segmentation of cardiac MRI images is lengthy and tedious: a clinician needs about 15 minutes to segment one volume. In addition, segmentation results can be inaccurate because of intra-observer and inter-observer variability. Given the large volume of clinical diagnostic demand, relying solely on manual segmentation is inefficient and undesirable.
Therefore, using advanced algorithms to label cardiac nuclear magnetic resonance images automatically, efficiently and accurately frees doctors from the heavy and time-consuming task of manual segmentation and is of great significance for improving the diagnosis and treatment of cardiovascular diseases.
The invention provides a ventricular myocardium segmentation model training method for cardiac nuclear magnetic resonance images, which comprises the following steps S101 to S104.
It should be noted that steps S101 to S104 do not impose a fixed order; in a specific scenario, the steps may be performed in parallel or in a different order.
Step S101: acquiring a plurality of cardiac nuclear magnetic resonance images and preprocessing each cardiac nuclear magnetic resonance image, wherein the preprocessing comprises contrast-limited adaptive histogram equalization and Gaussian blur.
Step S102: using the annotated contour of the ventricular myocardium in each cardiac nuclear magnetic resonance image as a label, and forming a training sample set from the labels and the preprocessed cardiac nuclear magnetic resonance images.
Step S103: acquiring a preset neural network model, wherein the preset neural network model has been pre-trained on a set number of source-domain samples.
Step S104: training the preset neural network model with the training sample set, and iteratively updating the preset neural network model using a mixed loss function that combines a cross-entropy loss, a Dice loss and an edge loss, to obtain a target segmentation model.
In step S101, the nuclear magnetic resonance images are preprocessed mainly so that the data are better suited to network training. The preprocessing methods include, but are not limited to, contrast enhancement, denoising and cropping. Prior knowledge about medical images can also be used in the processing.
In this embodiment, the cardiac nuclear magnetic resonance image is a grayscale image and may be preprocessed by histogram equalization. Histogram equalization is a simple and efficient image enhancement method, but its effect is often unsatisfactory when the gray-level distributions of different image regions differ greatly. Medical images often require the enhancement of local details, yet their different regions typically have uneven gray-level distributions. Local histogram equalization considers only the pixels in a local region and ignores those elsewhere in the image, and it tends to over-amplify noise in homogeneous regions.
In this embodiment, contrast-limited adaptive histogram equalization is used for image enhancement. The contrast amplification around a given pixel is determined by the slope of the transform function, which is proportional to the slope of the cumulative histogram of the region. The method restricts the strength of local contrast enhancement by limiting the height of the local histogram, which reduces noise amplification and avoids over-enhancing local contrast. Compared with local histogram equalization, contrast-limited adaptive histogram equalization limits the local contrast, and interpolation can be used to speed up the computation.
Specifically, assuming that the sliding window of the contrast-limited adaptive histogram equalization has size $M \times M$ and that the image has 256 gray levels, the local mapping function is:
$$m(i) = \frac{255 \cdot CDF(i)}{M \times M} \tag{1}$$
wherein $CDF(i)$ is the cumulative distribution function of the local histogram of the sliding window. Since the derivative of $CDF(i)$ is the histogram $Hist(i)$, the slope $S$ of the local mapping function can be expressed as:
$$S = \frac{\mathrm{d}\,m(i)}{\mathrm{d}i} = \frac{255 \cdot Hist(i)}{M \times M} \tag{2}$$
wherein $Hist(i)$ is the histogram function. Defining the maximum slope as $S_{max}$, the maximum allowable histogram height is:
$$H_{max} = \frac{S_{max} \cdot M \times M}{255} \tag{3}$$
Histogram bins higher than $H_{max}$ are truncated. Assume that the histogram is truncated at a truncation threshold $T$ and that the truncated part is distributed uniformly over the entire gray-level range, so that the total area of the histogram is unchanged and the whole histogram rises by a height $L$. Then:
$$H_{max} = T + L \tag{4}$$
The resulting histogram function is:
$$Hist'(i) = \begin{cases} Hist(i) + L, & Hist(i) < T \\ H_{max}, & Hist(i) \ge T \end{cases} \tag{5}$$
With the above algorithm, images with different enhancement effects can be obtained by changing the maximum slope $S_{max}$ of the mapping function and the corresponding maximum histogram height $H_{max}$.
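The truncation and redistribution of the local histogram can be illustrated with a minimal NumPy sketch for a single window. The function name `contrast_limited_mapping`, the parameter `clip_threshold` (standing for T) and the scaling of the mapping to the range [0, 255] are assumptions made for illustration, not details taken from the embodiment; tiling and interpolation between windows are omitted.

```python
import numpy as np

def contrast_limited_mapping(window, clip_threshold, levels=256):
    """Clip a local histogram at T, redistribute the excess, and build the mapping.

    window: 2-D uint8 array (one M x M sliding window).
    clip_threshold: truncation threshold T (hypothetical parameter name).
    """
    hist = np.bincount(window.ravel(), minlength=levels).astype(np.float64)
    # Total count that lies above the truncation threshold T.
    excess = np.maximum(hist - clip_threshold, 0.0).sum()
    # Truncate, then raise every bin by L = excess / levels so the area is unchanged.
    rise = excess / levels
    clipped = np.minimum(hist, clip_threshold) + rise
    # Assumed scaling: the cumulative histogram stretched to [0, levels - 1].
    mapping = (levels - 1) * np.cumsum(clipped) / window.size
    return clipped, rise, mapping

rng = np.random.default_rng(0)
window = rng.integers(100, 130, size=(8, 8), dtype=np.uint8)  # low-contrast toy window
clip_threshold = 4.0
clipped, rise, mapping = contrast_limited_mapping(window, clip_threshold)
equalized = mapping[window]  # apply the mapping to the pixels of the window
```

No bin of the clipped histogram exceeds T + L, which corresponds to the relation H_max = T + L; lowering `clip_threshold` flattens the mapping and weakens the enhancement.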
Furthermore, in this embodiment, the images are smoothed with Gaussian blur so that the trained model is more robust. Blurring replaces each pixel in turn with an average of the pixels around it. Numerically this is a smoothing operation; visually it produces a blurred effect. Gaussian blur assigns the weights according to a two-dimensional normal distribution, and the elements of the weight matrix satisfy:
$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\} \tag{6}$$
wherein $f(x, y)$ represents an element of the weight matrix, $x$ and $y$ are the coordinates of the element in the weight matrix, and $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$ are constants, namely the parameters of the two-dimensional normal distribution.
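As an illustration of weights drawn from a two-dimensional normal distribution, the following NumPy sketch builds a small weight matrix and applies it as a blur. The kernel size, the default parameter values, the centring of the distribution on the kernel (both means set to zero) and the reflect padding are assumptions made for the example; the weights are normalized to sum to 1, so the constant prefactor of the density is not needed.

```python
import numpy as np

def gaussian_weights(size=5, sigma1=1.0, sigma2=1.0, rho=0.0):
    """Weight matrix sampled from a 2-D normal density centred on the kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    z = (x / sigma1) ** 2 - 2.0 * rho * x * y / (sigma1 * sigma2) + (y / sigma2) ** 2
    weights = np.exp(-z / (2.0 * (1.0 - rho ** 2)))
    return weights / weights.sum()  # normalize so the blur preserves mean intensity

def gaussian_blur(image, weights):
    """Weighted average of each pixel's neighbourhood (reflect padding at the border)."""
    half = weights.shape[0] // 2
    padded = np.pad(image.astype(np.float64), half, mode="reflect")
    out = np.zeros(image.shape, dtype=np.float64)
    rows, cols = image.shape
    for dy in range(weights.shape[0]):
        for dx in range(weights.shape[1]):
            out += weights[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

weights = gaussian_weights(size=5, sigma1=1.0, sigma2=1.0, rho=0.0)
flat = np.full((6, 6), 7.0)
blurred_flat = gaussian_blur(flat, weights)  # a constant image stays constant
```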
In step S102, the contour of the ventricular myocardium is labeled as a label by manual labeling, and forms a training sample set with the preprocessed cardiac nuclear magnetic resonance image, where the preprocessed cardiac nuclear magnetic resonance image serves as an input and the contour of the ventricular myocardium serves as an output.
In steps S103 and S104, the preset neural network model is trained using the training sample set from step S102. In typical medical scenarios, the number of samples is limited by factors such as annotation cost and patient privacy, so that only thousands or even hundreds of training samples are available; with so few samples, conventional training rarely gives the model good generalization. Therefore, the model in step S103 may be trained by transfer learning or meta-learning. In transfer learning, the preset neural network model is pre-trained on source-domain training samples, and the pre-trained model is then transferred to, and trained on, the training sample set of step S102 of this embodiment. For example, the preset neural network model is pre-trained on cardiac nuclear magnetic resonance images annotated with the overall contour of the heart, and the segmentation model is then obtained by training on the training sample set annotated with the ventricular myocardium contour in step S102. This embodiment may also be trained by meta-learning, for example in the classical MAML form. The preset neural network model may be an image segmentation model such as a U-Net model or a DeepMask model.
In the training process of step S104, a mixed loss function based on cross-entropy loss, Dice loss and edge loss is used. Statistical learning aims to select an optimal model from a hypothesis space; when a model is used to simulate the behavior of an object, the loss function describes the deviation between the model and the behavior of the object. On this basis, the goal of training is to minimize this error so that the behavior predicted by the model is as close as possible to the true behavior of the object. The loss function is the bridge between the model and the algorithm and converts supervised learning into an optimization problem. Loss functions commonly used in medical image segmentation include cross-entropy loss, Dice loss and Focal loss.
Cross-entropy loss is the loss function most commonly used in image segmentation. It computes the prediction error by comparing, class by class for each pixel, the predicted probability $y_{pred}$ with the one-hot encoded vector $y_{true}$ of the true class, as in the following formula:
$$L_{CE} = -\sum_{classes} y_{true} \log(y_{pred}) \tag{7}$$
wherein $L_{CE}$ represents the cross-entropy loss and $classes$ represents the classes of the pixel. Equation (7) shows that the cross-entropy loss is calculated separately for the class prediction of each pixel and then averaged over all pixels, so that all pixels are learned equally. However, most medical images are class-imbalanced: the background occupies most of the image, so training is dominated by the background class with its many pixels, while the objects that actually need to be detected have few pixels and are difficult to learn. Small objects are therefore hard to learn if only the cross-entropy loss is used.
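The per-pixel computation followed by averaging can be written as a short NumPy sketch. The array layout (height, width, classes) and the small constant `eps` that guards the logarithm are assumptions made for the example.

```python
import numpy as np

def cross_entropy_loss(y_pred, y_true, eps=1e-12):
    """Pixel-wise cross entropy averaged over all pixels.

    y_pred: predicted class probabilities, shape (H, W, classes).
    y_true: one-hot ground-truth labels, same shape.
    """
    per_pixel = -np.sum(y_true * np.log(y_pred + eps), axis=-1)
    return float(per_pixel.mean())

# Toy example: a 2 x 2 image with two classes and an uninformative prediction.
y_true = np.zeros((2, 2, 2))
y_true[..., 0] = 1.0                 # every pixel belongs to class 0
y_pred = np.full((2, 2, 2), 0.5)     # the model predicts 50 % for each class
loss_ce = cross_entropy_loss(y_pred, y_true)
```

Because every pixel contributes with the same weight, a large background region dominates the average, which is the imbalance issue discussed above.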
To mitigate the effect of class imbalance, this embodiment adds the Dice loss, a class-level loss function, to the cross-entropy loss, a pixel-level loss function. The Dice coefficient is an objective index for evaluating segmentation quality; it measures the overlap between the prediction and the ground-truth label, so the loss is computed over all pixels of the same class as a whole. With the Dice loss, the network is supervised directly by the index used to evaluate segmentation quality, and the large number of irrelevant background pixels is ignored when the overlap is computed, which greatly alleviates class imbalance and speeds up convergence. The Dice loss is suitable for extreme class imbalance, but using it alone is generally unfavorable for backpropagation and makes the training process unstable.
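A minimal NumPy sketch of a Dice loss for a single foreground class follows. The soft formulation (one minus the Dice coefficient, with a smoothing constant `eps`) is a common choice and is assumed here; the embodiment does not specify the exact variant.

```python
import numpy as np

def dice_loss(pred_fg, target_fg, eps=1e-7):
    """1 - Dice coefficient for one class; background pixels do not enter the overlap."""
    intersection = np.sum(pred_fg * target_fg)
    dice = (2.0 * intersection + eps) / (np.sum(pred_fg) + np.sum(target_fg) + eps)
    return float(1.0 - dice)

# Toy example: a 2 x 2 object in a 64 x 64 image (extreme class imbalance).
target = np.zeros((64, 64))
target[30:32, 30:32] = 1.0
loss_perfect = dice_loss(target, target)            # prediction equals the label
loss_missed = dice_loss(np.zeros((64, 64)), target)  # object missed entirely
```

A prediction that labels everything as background is more than 99.9 % correct pixel by pixel in this toy case, yet its Dice loss is close to 1, which is why the Dice term counteracts class imbalance.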
In addition, in the later stage of training the segmentation accuracy is mainly affected by the details at class edges. To classify the pixels at class edges more accurately, this embodiment introduces an edge loss that minimizes the distance between the segmentation boundary and the boundary of the ground-truth label. As shown in fig. 2, inspired by discrete optimization techniques for curves, when the boundaries $\partial G$ and $\partial S$ of two regions $G$ and $S$ are sufficiently close, the distance between them can be expressed as:
$$Dist(\partial G, \partial S) = \int_{\partial G} \left\| y_{\partial S}(p) - p \right\|^2 \mathrm{d}p \tag{8}$$
wherein $p$ represents a point on $\partial G$ and $y_{\partial S}(p)$ represents the point on $\partial S$ closest to $p$. Equation (8) calculates the distance between the two boundaries in differential form and cannot be used directly as a loss function. In integral form, the distance between the two boundaries can be expressed as:
$$Dist(\partial G, \partial S) \approx 2\int_{\Delta S} D_G(q)\,\mathrm{d}q \tag{9}$$
wherein, as shown in fig. 3, $D_G(q)$ is the distance feature map obtained from $\partial G$, and $\Delta S$ is the region enclosed between the predicted boundary and the true boundary, which can be expressed as:
$$\Delta S = (G \cup S) - (G \cap S) \tag{10}$$
Defining a level set function $\phi_G$ as:
$$\phi_G(q) = \begin{cases} -D_G(q), & q \in G \\ D_G(q), & \text{otherwise} \end{cases} \tag{11}$$
the distance between the two boundaries can be written as a region integral of the level set function:
$$\frac{1}{2} Dist(\partial G, \partial S) = \int_S \phi_G(q)\,\mathrm{d}q - \int_G \phi_G(q)\,\mathrm{d}q \tag{12}$$
If the whole image domain is denoted by $\Omega$ and binary indicator functions $s_\theta(q)$ and $g(q)$ are introduced for the predicted result and the true label respectively, the optimization objective can be expressed as:
$$\frac{1}{2} Dist(\partial G, \partial S) = \int_\Omega \phi_G(q)\, s_\theta(q)\,\mathrm{d}q - \int_\Omega \phi_G(q)\, g(q)\,\mathrm{d}q \tag{13}$$
wherein $s_\theta(q)$ is given by the softmax output of the network, while the last term is independent of the network parameters, so the edge loss takes the form:
$$L_{Edge} = \int_\Omega \phi_G(q)\, s_\theta(q)\,\mathrm{d}q \tag{14}$$
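The level set formulation of the edge loss can be sketched in NumPy as follows. The signed distance map is computed here by brute force on a small image purely for illustration (negative inside the labelled region, positive outside, measured to the nearest pixel of the other class); in practice a library routine such as `scipy.ndimage.distance_transform_edt` would usually be preferred. The function names and the use of a mean instead of a sum over the image domain are assumptions.

```python
import numpy as np

def signed_distance_map(mask):
    """Brute-force level set map: negative inside the region G, positive outside.

    Assumes `mask` contains both foreground and background pixels.
    """
    inside = mask.astype(bool).ravel()
    coords = np.argwhere(np.ones(mask.shape, dtype=bool)).astype(np.float64)
    fg, bg = coords[inside], coords[~inside]

    def nearest(points, targets):
        diff = points[:, None, :] - targets[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

    phi = np.empty(mask.size, dtype=np.float64)
    phi[inside] = -nearest(fg, bg)   # inside G: minus the distance to the background
    phi[~inside] = nearest(bg, fg)   # outside G: distance to the region G
    return phi.reshape(mask.shape)

def edge_loss(prob_fg, phi):
    """Discrete region integral of phi_G times the predicted foreground probability."""
    return float(np.mean(phi * prob_fg))

label = np.zeros((16, 16), dtype=np.uint8)
label[4:12, 4:12] = 1
phi = signed_distance_map(label)
exact = label.astype(np.float64)        # prediction that matches the label
shifted = np.roll(exact, 3, axis=1)     # same shape, boundary displaced by 3 pixels
loss_exact = edge_loss(exact, phi)
loss_shifted = edge_loss(shifted, phi)
```

The displaced prediction covers pixels with positive distance values and misses pixels with negative ones, so its loss is larger than that of the exact prediction.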
In summary, adopting a mixed loss function that combines cross-entropy loss, Dice loss and edge loss mitigates class imbalance and makes the training result more accurate while keeping the model training process stable.
In some embodiments, in the mixed loss function, a -log function may be used to amplify the Dice loss to the same order of magnitude as the cross-entropy loss, and a variable α is used to control the weights of the cross-entropy loss, the Dice loss and the edge loss. The mixed loss function is:
wherein LOSS is the mixed loss function, $L_{CE}$ is the cross-entropy loss, $L_{Dice}$ is the Dice loss, and $\alpha$ is a variable with $0 < \alpha < 1$. In some embodiments, the variable $\alpha$ is gradually increased and approaches 1 during the training process. The -log function is used to amplify $1 - L_{Dice}$ to the same order of magnitude as $L_{CE}$. Meanwhile, the variable $\alpha$ controls the weight of each loss at different stages: as training approaches its later stage, $\alpha$ increases continuously toward 1 and the model focuses more on the boundary contour.
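One possible way to combine the three terms is sketched below. The specific combination, in which the region terms are weighted by (1 - α) and the edge term by α, the linear schedule and all names are assumptions made for illustration; they are not the formula of the embodiment, which is not reproduced in this text.

```python
import math

def alpha_schedule(epoch, total_epochs, alpha_start=0.01, alpha_end=0.99):
    """Hypothetical linear schedule: alpha grows towards 1 but stays inside (0, 1)."""
    frac = epoch / max(total_epochs - 1, 1)
    return alpha_start + (alpha_end - alpha_start) * frac

def mixed_loss(l_ce, l_dice, l_edge, alpha):
    """Assumed combination of cross-entropy, Dice and edge losses.

    -log(1 - L_Dice) rescales the Dice term to the order of magnitude of L_CE.
    """
    log_dice = -math.log(max(1.0 - l_dice, 1e-7))  # guard against log(0)
    return (1.0 - alpha) * (l_ce + log_dice) + alpha * l_edge

alphas = [alpha_schedule(e, 10) for e in range(10)]
example = mixed_loss(l_ce=0.5, l_dice=0.0, l_edge=2.0, alpha=0.5)
```

With such a schedule the region terms dominate early training, when the prediction is still far from the label, and the edge term dominates later, when boundary details limit accuracy.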
In some embodiments, the preset neural network model is a densely connected model with a U-Net structure. As shown in fig. 4, the down-sampling path of the U-Net model comprises, in succession, a first down-sampling layer, a second down-sampling layer, a third down-sampling layer and a fourth down-sampling layer; the up-sampling path of the U-Net model comprises, in succession, a first up-sampling layer, a second up-sampling layer, a third up-sampling layer and a fourth up-sampling layer; the down-sampling path and the up-sampling path are connected by a bottleneck layer; the first down-sampling layer has a skip connection to the fourth up-sampling layer, the second down-sampling layer has a skip connection to the third up-sampling layer, the third down-sampling layer has a skip connection to the second up-sampling layer, and the fourth down-sampling layer has a skip connection to the first up-sampling layer.
In some embodiments, the first down-sampling layer, the second down-sampling layer, the third down-sampling layer and the fourth down-sampling layer each comprise a Dense Block and a pooling layer; the first up-sampling layer, the second up-sampling layer, the third up-sampling layer and the fourth up-sampling layer each comprise a transposed convolutional layer and a Dense Block; the bottleneck layer is a Dense Block. Compared with the traditional U-Net, this model structure greatly reduces the number of model parameters while improving the feature extraction capability and the segmentation performance of the model.
The Dense Block is a densely connected convolutional neural network in which any two layers are directly connected: the input of each layer is the concatenation of the outputs of all preceding layers, and the features learned by that layer are passed directly to all subsequent layers as input. This dense connectivity exists only within a single Dense Block, not between different Dense Blocks. The main contributions of the densely connected convolutional network DenseNet are reflected in four aspects: (1) the vanishing-gradient problem is alleviated; (2) feature propagation is strengthened and feature reuse is encouraged; (3) the number of parameters is greatly reduced; (4) a regularizing effect is obtained, so that overfitting is reduced even on small training sets.
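The connectivity pattern of a Dense Block can be illustrated with a simplified NumPy sketch. To keep the example short, each layer is reduced to a per-pixel linear map (equivalent to a 1 x 1 convolution) followed by a ReLU, with random weights; the layer count, the growth rate and these simplifications are assumptions and do not reproduce the layers of the embodiment.

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=8, seed=0):
    """Simplified Dense Block on a feature map of shape (H, W, C).

    Every layer receives the concatenation of the block input and the outputs
    of all earlier layers, and contributes `growth_rate` new channels.
    """
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        layer_input = np.concatenate(features, axis=-1)
        # 1 x 1 convolution stand-in: one weight matrix applied at every pixel.
        weight = rng.standard_normal((layer_input.shape[-1], growth_rate)) * 0.1
        features.append(np.maximum(layer_input @ weight, 0.0))  # ReLU
    return np.concatenate(features, axis=-1)

x = np.ones((4, 4, 16))
out = dense_block(x, num_layers=4, growth_rate=8)
```

Each layer adds only `growth_rate` channels while reusing all earlier features, so the output has C + num_layers * growth_rate channels (16 + 4 * 8 = 48 here); this reuse is what keeps the parameter count of each layer small.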
In another aspect, the invention also provides a ventricular myocardium segmentation method for cardiac nuclear magnetic resonance images, which comprises the following steps S201 to S202:
Step S201: acquiring a cardiac nuclear magnetic resonance image to be segmented and preprocessing it, wherein the preprocessing comprises contrast-limited adaptive histogram equalization and Gaussian blur.
Step S202: inputting the preprocessed cardiac nuclear magnetic resonance image to be segmented into the target segmentation model obtained by the training method of steps S101 to S104, so as to output the segmentation result of the ventricular myocardium.
In another aspect, the present invention provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method when executing the program.
In summary, in the training method, the segmentation method and the device for ventricular myocardium segmentation in cardiac nuclear magnetic resonance images of the present invention, the training method preprocesses the cardiac nuclear magnetic resonance images with contrast-limited adaptive histogram equalization and Gaussian blur, which effectively improves the image contrast of the training samples and thereby improves recognition and segmentation. Meanwhile, during training of the segmentation model, transfer learning effectively reduces the dependence on training data and improves recognition accuracy when training data are scarce. Furthermore, a mixed loss function combining cross-entropy loss, Dice loss and edge loss is used in training, which reduces the influence of irrelevant background and focuses on boundary contour features; while keeping the training process stable, it mitigates class imbalance, makes the training result more accurate, and reduces the amount of training data required to a certain extent.
Furthermore, an improved U-Net model is adopted in which Dense Blocks replace ordinary convolutional layers; this greatly reduces the number of parameters and has a regularizing effect, so that overfitting is reduced even on small training sets and the model is well suited to the scarcity of medical image samples.
Furthermore, in the training process, the proportion of the edge loss in the mixed loss function is gradually increased, so that the boundary characteristics are concerned more, the precision is improved, and the requirement on the number of training samples is reduced to a certain extent.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.