CN117274599A - Brain magnetic resonance segmentation method and system based on combined double-task self-encoder - Google Patents


Info

Publication number
CN117274599A
Authority
CN
China
Prior art keywords: segmentation, data, self-encoder, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311273017.XA
Other languages
Chinese (zh)
Inventor
田智强
李皓冰
施展艺
杜少毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202311273017.XA
Publication of CN117274599A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/045 Combinations of networks; auto-encoder networks; encoder-decoder networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06N 3/0985 Hyperparameter optimisation; meta-learning; learning-to-learn
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753 Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention discloses a brain magnetic resonance segmentation method and system based on a combined double-task self-encoder. The segmentation training set of the downstream segmentation task is registered, the registered training set is center-cropped, and the cropped data are resampled to obtain feature data. A pre-trained self-encoder extracts basic features from the feature data, and the basic features are decoded to obtain a segmentation result. A network segmentation model is trained with the decoded segmentation result and the corresponding segmentation training set, and the trained model is used to segment MR images. Segmenting with the dual model greatly improves the accuracy of the segmentation result, and a combined double-task framework at the pixel level and the object level lets the model learn pixel-level detail and object-level discriminative information respectively, fusing modality information through parameter sharing to further improve the segmentation result.

Description

Brain magnetic resonance segmentation method and system based on combined double-task self-encoder
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a brain magnetic resonance segmentation method and system based on a combined double-task self-encoder.
Background
Medical image segmentation plays an important role in computer-aided diagnosis and treatment and helps doctors analyze disease data. The task is to accurately delineate target areas, such as organs, lesions and tissues, at the pixel level in medical images. Current medical image segmentation is mainly performed by manual annotation, which requires deep medical expertise and experience and is generally done by doctors; lay annotators can hardly perform the task, so the output of annotated data is inefficient. Meanwhile, because of imaging noise, divergence in experts' subjective judgments, and fatigue caused by a great deal of tedious repetitive labor, manual annotation is prone to subjective human error. A need therefore arises to develop accurate automated medical image segmentation algorithms. Compared with manual annotation, such algorithms are highly objective, can rapidly annotate images in batches, and greatly reduce the workload of doctors.
There is currently growing interest in self-supervised methods, but little work in the medical field employs them. Some studies indicate that self-supervised learning can be applied directly to the medical domain, since unlabeled medical images carry valuable information about organ structures, and self-supervision enables models to derive concepts about these structures without additional annotation cost. Unlike natural images, medical images are inherently 3D, i.e. they are presented as a sequence of slices. Many current self-supervised approaches convert the 3D imaging task to 2D by extracting slices along an arbitrary axis (e.g., the axial dimension). Obtaining a data representation of a 3D image through a 2D context is a suboptimal solution that reduces the performance of downstream tasks.
Deep learning models are typically trained under a supervised learning paradigm, where the model learns to map inputs (e.g., magnetic resonance images or health records) to outputs. For the model to learn the relevant patterns in the data, the training process requires a large dataset in which each input carries corresponding label information. Deep learning has been very successful with supervised models. However, such work places more emphasis on building and testing models than on building annotated datasets. This is partly because large-scale expert annotation of multi-modality patient data is non-trivial, expensive and time-consuming for most medical tasks, and is associated with privacy-exposure risks; even semi-automated software tools may not adequately reduce annotation costs.
On the other hand, unlike natural images, the physical structure of medical images is relatively stable despite individual differences, because they depict the anatomical structure of the human body; they therefore present naturally consistent contextual information, and lesions also have their specific textures and appearances. Self-supervised proxy tasks can be used to learn the basic patterns of human anatomy; however, when an accurate segmentation result cannot be obtained in the self-supervised learning task, the discrimination accuracy of the model is affected.
Disclosure of Invention
The invention aims to provide a brain magnetic resonance segmentation method and system based on a combined double-task self-encoder, in order to solve the prior-art problems that an accurate segmentation result cannot be obtained and that the discrimination accuracy of the model is affected.
A brain magnetic resonance segmentation method based on a combined double-task self-encoder comprises the following steps:
registering the segmentation training set of the downstream segmentation task, center-cropping the registered segmentation training set, and resampling the center-cropped data to obtain feature data;
extracting the characteristics of the acquired characteristic data by utilizing a pre-trained self-encoder to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result;
and training a network segmentation model by using the decoded segmentation result and the corresponding segmentation training set, and segmenting the MR image by using the trained network segmentation model.
Preferably, the specific training process of the pre-trained self-encoder is: collecting pre-training images as a pre-training set, converting the pre-training set into the Brain Imaging Data Structure (BIDS) format, standardizing the registration of the converted pre-training set to the same template, then center-cropping the registered pre-training data to obtain pre-training feature data, and training the self-encoder with the pre-training feature data.
Preferably, registration of the pre-training set to the same template is standardized using the Clinica platform.
Preferably, random rotation operations are performed on the same batch of pre-training feature data to obtain two positively correlated, data-augmented views of every sample in the batch, and a random masking operation is applied to each view to obtain a feature map in which part of the patches are occluded; the augmented and randomly masked feature maps are input into the self-encoder network for feature extraction, reconstructed image patches and contrastive coding features are obtained through a pixel-level and an object-level prediction head respectively, and the self-encoder is pre-trained with the self-supervision information of the pre-training images.
Preferably, in the pre-training of the self-encoder, the network parameters are optimized with a back-propagation strategy, a loss function assists the training, and the network parameters are updated according to the value of the loss function so that the loss keeps decreasing until it converges to a set value; training then ends, completing the pre-training of the self-encoder.
Preferably, standardizing the registration of the pre-training set to the same template using the Clinica platform specifically comprises:
computing a maximal bounding box around the largest foreground region across all modalities after registration, excluding all-zero regions; unifying the spatial size of every sample in the registered pre-training set to the same level; and then resampling to obtain feature data, where the resampling target spacing is obtained by averaging over the whole dataset.
Preferably, for single-modality data, the self-encoder is loaded directly as the feature-extraction encoder, and through a U-shaped network structure, cross-layer connections link the low-level semantics of the encoding stage with the high-level semantics of the decoding stage at the same downsampling ratio, to obtain the segmentation result.
Preferably, for multi-modality data, a simple modality-shared encoder is adopted: the data of different modalities are input into an encoder with shared parameters, which captures the features common to all modalities, to obtain the segmentation result.
Preferably, the multi-modality encoder output is decoded while cross-layer connections link the low-level semantics of the encoding stage with the high-level semantics of the decoding stage at the same downsampling ratio, and the segmentation result is finally obtained by decoding.
A brain magnetic resonance segmentation system based on a combined double-task self-encoder comprises a data preprocessing module, a self-supervision module and a segmentation module:
the data preprocessing module is used for registering the segmentation training set of the downstream segmentation task, center-cropping the registered segmentation training set, and resampling the center-cropped data to obtain feature data;
the self-supervision module is used for extracting the characteristics of the acquired characteristic data to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result;
the segmentation module trains a network segmentation model by utilizing the decoded segmentation result and the corresponding segmentation training set, and segments the MR image by utilizing the trained network segmentation model.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a brain magnetic resonance segmentation method based on a combined double-task self-encoder, which is characterized in that a segmentation training set of a downstream segmentation task is registered, then the registered segmentation training set is subjected to middle cutting, and then data subjected to center cutting are resampled to obtain characteristic data; extracting the characteristics of the acquired characteristic data by utilizing a pre-trained self-encoder to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result; the decoded segmentation result and the corresponding segmentation training set are utilized to train the network segmentation model, the trained network segmentation model is utilized to segment the MR image, and the double model is utilized to segment, so that the accuracy of the segmentation result can be greatly improved.
Furthermore, a combined double-task framework at the pixel level and the object level is adopted so that the model learns pixel-level detail and object-level discriminative information respectively. In the segmentation task, a modality-based self-encoder loading strategy is provided for multi-modality data, and modality information is fused through parameter sharing, improving the model's segmentation result.
Furthermore, for multi-modality data, a dedicated normalization layer in each branch counts the data distribution of that branch, preserving modality-private information, reducing the amount of data processing, and improving image-processing precision.
Furthermore, the combined use of cross-entropy loss, soft Dice loss and deep supervision loss promotes gradient feedback, strengthens model convergence, and further improves the training effect.
Drawings
Fig. 1 is a flowchart of an implementation of a brain magnetic resonance segmentation method in an embodiment of the present invention.
Fig. 2 is a block diagram of the network segmentation model based on the combined double-task self-encoder in an embodiment of the present invention.
FIG. 3 is a schematic diagram of the focused-mask self-encoder in the self-supervision-stage combined double-task model in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the contrast-based self-encoder architecture in the self-supervision-stage combined double-task model in an embodiment of the present invention.
Figure 5 is a diagram of the modality-based downstream brain magnetic resonance segmentation task network framework in an embodiment of the invention.
Figure 6 is a feature-extraction layer diagram of the modality-based downstream brain magnetic resonance segmentation encoder in an embodiment of the present invention.
Fig. 7 is a graph showing a segmentation effect of a brain magnetic resonance segmentation method model in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 and 2, the invention provides a brain magnetic resonance segmentation method based on a combined double-task self-encoder, which specifically comprises the following steps:
s1, data preprocessing: the method comprises the steps of collecting a pre-training image as a pre-training set, converting pre-training set data into a brain imaging data structure, standardizing registration of the data structure of the pre-training set converted into the brain imaging data structure to the same template, and then performing center cutting operation on the registered pre-training set data to obtain pre-training characteristic data.
In the invention, a Clinica platform is adopted to standardize the registration of the data structure of the pre-training set to the same template.
Self-encoder pre-training process: random rotation operations are performed on the same batch of pre-training feature data to obtain two positively correlated, data-augmented views of every sample in the batch, and a random masking operation is applied to each view to obtain a feature map in which part of the patches are occluded; the augmented and randomly masked feature maps are input into the self-encoder network for feature extraction, reconstructed image patches and contrastive coding features are obtained through a pixel-level and an object-level prediction head respectively, and the self-encoder is pre-trained with the self-supervision information of the pre-training images.
In the pre-training of the self-encoder, the network parameters are optimized with a back-propagation strategy, a loss function assists the training, and the network parameters are updated according to the value of the loss function so that the loss keeps decreasing until it converges to a set value; training then ends, completing the pre-training of the self-encoder.
Specifically, in the pre-training stage of the self-encoder, a public dataset is adopted as the pre-training dataset, and the whole of it is used as the pre-training set.
The pre-training set is converted into the Brain Imaging Data Structure (BIDS) format, and the Clinica platform is adopted to standardize its registration to the same template, specifically comprising:
computing a maximal bounding box around the largest foreground region across all modalities after registration, excluding all-zero regions; unifying the spatial size of every sample in the registered pre-training set to the same level; and then resampling to obtain feature data, where the resampling target spacing is obtained by averaging over the whole dataset;
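The foreground bounding-box and cropping step can be illustrated with a minimal NumPy sketch (the function names and the zero threshold are illustrative assumptions, not from the patent):

```python
import numpy as np

def foreground_bbox(volume, threshold=0.0):
    """Tightest bounding box around foreground voxels (values above `threshold`)."""
    coords = np.argwhere(volume > threshold)
    lo = coords.min(axis=0)           # inclusive lower corner
    hi = coords.max(axis=0) + 1       # exclusive upper corner
    return tuple(slice(a, b) for a, b in zip(lo, hi))

def crop_to_foreground(volume, threshold=0.0):
    """Crop a 3D volume to its foreground bounding box, excluding all-zero margins."""
    return volume[foreground_bbox(volume, threshold)]
```

Resampling the cropped volume to the dataset-average spacing would follow this crop; it is omitted here for brevity.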
specifically, the pre-dataset is used entirely for training, and the tag information of the dataset is not used in training. In order to make training more stable during cutting, an oversampling strategy is adopted, so that at least one third of data in one batch is guaranteed to contain prospects.
The specific process of self-encoder training is as follows: the self-encoder is pre-trained with a combined dual proxy task at the pixel level and the object level. Given a batch of N input samples of 3D voxel volumes X ∈ R^{H×W×D×C}, a rotation data augmentation is applied at random, i.e. each input instance is randomly transformed into two positively correlated views of the same instance, so that the augmented batch contains 2N data points. The rotation augmentation formula is:
r_k = RandomChoice(R), R = {0°, 90°, 180°, 270°}
wherein: r_k is the random rotation angle and k indexes the samples in a batch of size N.
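The rotation augmentation can be illustrated with a short NumPy sketch; the choice of rotation plane `axes=(0, 1)` is an assumption, since the patent does not fix the axis:

```python
import numpy as np

ANGLES = {0: 0, 90: 1, 180: 2, 270: 3}   # degrees -> number of 90-degree turns

def rotate(volume, degrees, axes=(0, 1)):
    """Rotate a 3D volume by a multiple of 90 degrees in the given plane."""
    return np.rot90(volume, k=ANGLES[degrees], axes=axes)

def two_views(volume, rng):
    """Two independently rotated (positively correlated) views of one sample."""
    choices = list(ANGLES)
    return [rotate(volume, rng.choice(choices)) for _ in range(2)]
```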
The augmented samples x̂_i and x̂_j are first reshaped into a series of flattened 3D patches. To preserve position information, a symmetric position encoding is added to the patch embedding, exploiting the bilaterally symmetric structure of the brain. The position is computed as:
pos = h·x − |w/2 − y| + w/2 + d²·z
and the encoding alternates sine and cosine components:
PE(pos, 2i) = sin(pos / 10000^{2i/dim}), PE(pos, 2i+1) = cos(pos / 10000^{2i/dim})
wherein: dim is the dimension of the patch embedding, pos is the position of the patch embedding with coordinates (x, y, z), and i indexes the dimensions of the position encoding. Since sine and cosine take values between -1 and 1, adding the position encoding to the patch embedding induces no significant distortion. The final input to the self-encoder is the sum of the position encoding and the patch embedding.
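The symmetric position encoding may be sketched as follows; the sinusoidal form mirrors the standard transformer encoding the text describes, and the base 10000 is an assumption carried over from that convention:

```python
import math

def symmetric_pos(x, y, z, h, w, d):
    """Position value in which mirrored left/right y coordinates coincide."""
    return h * x - abs(w / 2 - y) + w / 2 + d ** 2 * z

def sincos_encoding(pos, dim):
    """Alternating sine/cosine encoding of the scalar position `pos`."""
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** ((2 * (i // 2)) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc
```

Note that a voxel at y and its mirror at w - y receive the same position value, which is the bilateral-symmetry property the patent relies on.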
A random masking operation is applied to the patch data, the visible patch regions are sent to the encoder, and the three-dimensional patch sequence is projected to a fixed-dimension space through an embedding layer. To model patch-embedding interactions more efficiently, features of input size H'×W'×D' are uniformly divided into non-overlapping windows of size M×M×M, and local self-attention is computed within each region. The windows are then shifted by (M/2, M/2, M/2) voxels, so that features from different windows of the previous partition appear in the same window after the shift; recomputing local self-attention within the shifted windows realizes information exchange between windows.
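The random masking of the patch sequence can be illustrated with a minimal sketch (the mask ratio is a free parameter here; the shifted-window attention itself is omitted for brevity):

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask over a patch sequence; True marks patches hidden from the encoder."""
    n_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:n_masked]] = True
    return mask
```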
As shown in fig. 3 and 4, the feature output of the self-encoder is decoded, and finally the reconstructed image patches and the contrastive coding features are obtained through the pixel-level and object-level prediction heads.
Training uses the training data; during training, the network parameters are optimized with a back-propagation strategy, and loss functions assist the training, comprising the focused reconstruction loss and the contrastive loss. These losses are back-propagated to optimize the network parameters, encouraging the model to learn the basic features of the image.
For the focused reconstruction loss, exploiting the relatively stable structure of brain tissue, the gradient of each voxel is computed to obtain a gradient image in each direction; the calculation formula is:
G_i = I * D_i, i ∈ {x, y, z}
wherein: I is the input feature, D_i is the derivative filter in direction i, and * denotes a convolution operation.
The gradient direction θ of each voxel is then obtained as:
θ = arctan(G_y / G_x)
and the gradient magnitude is:
A = sqrt(G_x² + G_y²)
For each image patch, a 2D histogram of oriented gradients is created: each voxel is traversed, the bin (along the histogram's X axis) into which the voxel's gradient direction falls is found, and the voxel's gradient magnitude is added to that bin. After the traversal, the vector representing the histogram is L2-normalized. The importance of each masked image patch within the whole brain tissue is then obtained as:
s_k = ||H_k|| / H̄, with H̄ = (1/N) Σ_j ||H_j||
wherein: H̄ represents the average histogram norm, and N is the number of randomly masked image patches.
According to the importance given by the histogram of oriented gradients, different weights are applied when measuring the pixel differences between the restored image region and the original image, encouraging the model to pay more attention to important regions. The focused reconstruction loss is computed as:
L_rec = (1/N) Σ_k s_k · ||ŷ_k − y_k||²
wherein ŷ_k and y_k are the reconstructed and original masked patches.
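A hedged sketch of the importance weighting and the focused reconstruction loss follows; since the patent gives the formulas only in outline, the histogram-norm importance and the weighted mean-squared error below are plausible reconstructions, not the definitive implementation:

```python
import numpy as np

def patch_importance(histograms):
    """Importance of each masked patch: its gradient-histogram norm over the mean norm."""
    norms = np.linalg.norm(histograms, axis=1)
    return norms / (norms.mean() + 1e-8)

def focused_reconstruction_loss(pred, target, importance):
    """Importance-weighted mean-squared error over the masked patches."""
    per_patch_mse = ((pred - target) ** 2).mean(axis=1)
    return float((importance * per_patch_mse).mean())
```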
loss of contrast reconstruction is to set a pair of enhanced samples in the same batch as positive sample z i And z j Taking other 2 (N-1) enhancement samples in the same batch asNegative examples. And calculating mutual information between the two vectors through cosine similarity, wherein the rest chord similarity formulas are as follows:
the contrast loss calculation formula is thus:
and updating the network parameters according to the value of the loss function, so that the loss function is continuously reduced until the loss function is converged to a smaller value, and at the moment, training is finished, and the trained pre-trained self-encoder is stored.
Collecting an image to be trained of a downstream segmentation task as a segmentation training set, registering the segmentation training set, then performing intermediate cutting on the registered segmentation training set, and then performing resampling to obtain feature data; inputting the acquired feature data into a self-encoder obtained in a self-supervision stage to extract basic features, then performing decoding operation on the extracted basic features, simultaneously connecting low-level semantics of convolution processing with high-level semantics of a decoding stage under the same downsampling multiplying power by using cross-layer connection, and finally decoding to obtain a segmentation result; and training a network segmentation model by using the segmentation result obtained by decoding and the corresponding segmentation training set, and segmenting the MR image by using the trained network segmentation model.
In the downstream brain MRI segmentation stage, for single-mode data, a self-encoder is directly loaded as an extraction feature encoder, and the low-level semantics of the encoding stage are connected with the high-level semantics of the decoding stage under the same downsampling multiplying power by using a cross-layer connection through a U-shaped network structure, so as to obtain a segmentation result. For multi-mode data, a simple mode sharing encoder is adopted, different mode data are input into the encoder sharing parameters, and common characteristics of all modes are captured, so that a segmentation result is obtained.
For the modal private information, after convolution, carrying out separate normalization operation on the multiple modes, and independently counting the modal private characteristics, wherein the specific formula is as follows:
wherein: u (u) LRepresenting the mean and variance over the whole sample. Is a very small constant and prevents the denominator from being 0. Alpha m ,β m Is a trainable parameter, respectively a scaling factor and a translation parameter in affine transformation, for restoring the expressive power on data. By modality privatization (alpha) mm ) So as to achieve the effect of distinguishing statistical modal information.
And decoding the multi-modal coded output, simultaneously connecting the low-level semantics of the coding stage with the high-level semantics of the decoding stage under the same downsampling multiplying power by using cross-layer connection, and finally decoding to obtain a segmentation result.
In the network segmentation model training process, parameters of a network are optimized by using a back propagation strategy, and training is assisted by using a loss function, wherein the loss function comprises cross entropy loss, softprice loss and full resolution depth supervision loss; cross entropy loss, softrace loss, and full resolution depth supervision loss are used to help train, back-propagate parameters of the optimization network.
Cross entropy is the most common loss in image segmentation algorithms; it compares each pixel with the truth map one by one. Its formula is as follows:

L_ce = −(1/(D×W×H)) · Σᵢ [yᵢ · log pᵢ + (1 − yᵢ) · log(1 − pᵢ)]

wherein: D×W×H is the number of pixels of the entire three-dimensional image; yᵢ ∈ {0,1} is the true label of the i-th element, where 0 is background and 1 is foreground; and pᵢ ∈ [0,1] is the probability, predicted by the network, that the pixel belongs to the foreground.
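The cross entropy above can be written directly over a flat list of voxels. The probability clamp is an added numerical safeguard, not part of the formula itself:

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Voxel-wise binary cross entropy averaged over all D*W*H voxels.
    p: predicted foreground probabilities in [0,1]; y: labels in {0,1}."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(p)
```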
The soft Dice loss is formulated as follows:

L_dice = 1 − (2 · Σᵢ pᵢ · yᵢ + ε) / (Σᵢ pᵢ + Σᵢ yᵢ + ε)

wherein: ε is a small constant that prevents the denominator from being 0.
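A direct implementation of this soft Dice formulation (one of several common variants; some square the terms in the denominator):

```python
def soft_dice_loss(p, y, eps=1e-7):
    """Soft Dice loss: 1 - (2*overlap + eps) / (|P| + |Y| + eps).
    p: predicted probabilities; y: binary ground-truth labels."""
    inter = sum(pi * yi for pi, yi in zip(p, y))
    return 1.0 - (2.0 * inter + eps) / (sum(p) + sum(y) + eps)
```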
The decoding layer of each stage of the network is taken as an intermediate output; each such output is upsampled according to the downsampling ratio of its stage, introducing an auxiliary (Side) loss at full resolution for deep supervision. The final loss function is expressed as follows:

L = L_seg(P, Y) + Σᵢ₌₁ᴺ λᵢ · L_seg(g(Pᵢ, uᵢ), Y)

wherein: P is the prediction probability map, Y is the truth map, g(x, u) denotes upsampling x by the ratio u, λᵢ is a hyperparameter regulating the loss of each intermediate layer, and N is the number of intermediate layers.
The network parameters are updated according to the value of the loss function so that the loss keeps decreasing until it converges to a small value; training then ends and the trained network model is saved. The saved trained model is used to construct the brain magnetic resonance segmentation model.
A brain magnetic resonance segmentation system based on a combined double-task self-encoder comprises a data preprocessing module, a self-supervision module and a segmentation module:
the data preprocessing module is used for registering the segmentation training set of the downstream segmentation task, performing center cropping on the registered segmentation training set, and then resampling the center-cropped data to obtain feature data;
the self-supervision module is used for extracting the characteristics of the acquired characteristic data to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result;
the segmentation module trains a network segmentation model by utilizing the decoded segmentation result and the corresponding segmentation training set, and segments the MR image by utilizing the trained network segmentation model.
The invention relates to a brain magnetic resonance segmentation method based on a combined double-task self-encoder, which aims at brain tissue priori knowledge to design a reconstruction agent task to pay attention to important image features; in order to learn the basic mode of the anatomical structure of the brain region, a combined double-task framework of a pixel level and an object level which are suitable for brain magnetic resonance imaging is provided, so that the model learns pixel level detail and object level distinguishing information respectively; the reconstruction loss and the contrast loss are combined, so that gradient feedback is promoted, model convergence is enhanced, and model learning basic characteristics are further encouraged.
In the downstream task, a modality-based self-encoder loading strategy (MALS) is applied according to the number of modalities: for single-modality data the self-encoder is directly loaded as the feature-extraction encoder in segmentation, and a U-shaped network yields the segmentation result. For multi-modal data, a feature-extraction encoder with shared parameters extracts the modality-common information, while a separate normalization of the modality-private information counts the private features of each modality independently.
By combining the cross entropy loss, the soft Dice loss and the deep supervision loss, gradient feedback is promoted, model convergence is strengthened, and the training effect of the model is further improved;
the method and the device have the advantages that the competitive Dice and HD results are obtained by performing fine adjustment of the downstream segmentation task on the three public data sets, and the method and the device can be superior to the current popular self-supervision medical image segmentation model.
Examples
A brain magnetic resonance segmentation method based on a combined double-task self-encoder comprises the following steps:
self-supervised learning of self-encoders: the 3D medical source data is pre-processed to make it suitable for training of the model. The specific working procedure is as follows:
(1.1) employing two sets of public data sets as pre-training data sets;
(1.2) converting the data set in the step (1.1) into a brain imaging data structure, and standardizing data registration on the same template by adopting a Clinica platform;
(1.3) calculating a maximal adjacent bounding box for the maximal foreground region of all modalities and excluding the zero-valued regions of the foreground; unifying the spatial size of each sample of the whole registered training set to a consistent level so that the convolution kernel traverses the data with the same receptive field when extracting features; then resampling, the resampling target spacing being obtained by averaging over the whole data set;
(1.4) dividing the data set processed in the step (1.3) into training sets, wherein the training process does not use label information of any data set.
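Step (1.3) above can be sketched as follows. The volume is represented as a nested Python list and the helper names (`foreground_bbox`, `crop`) are illustrative, not from the patent:

```python
def foreground_bbox(volume):
    """Return the tightest bounding box (z0, z1, y0, y1, x0, x1) enclosing
    all non-zero (foreground) voxels of a 3-D volume."""
    zs, ys, xs = [], [], []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, v in enumerate(row):
                if v != 0:
                    zs.append(z); ys.append(y); xs.append(x)
    return (min(zs), max(zs) + 1, min(ys), max(ys) + 1, min(xs), max(xs) + 1)

def crop(volume, box):
    """Crop the volume to the given bounding box (half-open intervals)."""
    z0, z1, y0, y1, x0, x1 = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]
```

In practice the box would be taken as the union over all modalities of a sample before the unified resampling step.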
A reconstruction agent task is designed aiming at brain tissue priori knowledge to pay attention to important image features, and an agent task based on contrast coding is added on the basis of the important image features, so that a combined double-task framework of a pixel level and an object level is formed. The specific working procedure is as follows:
(2.1) for the data set obtained in step (1.4), the same batch of input samples X ∈ R^(H×W×D×C) undergoes data enhancement by random rotation, producing two positively correlated views of each sample after enhancement;
(2.2) the enhanced samples of step (2.1) are reshaped into a sequence of flattened 3-D patches, and a symmetric position code is added to the patch embedding so that, in keeping with the bilateral symmetry of the brain structure, positional information is preserved while mirrored positions share a code: pos = H·x − |W/2 − y| + W/2 + D²·z;
(2.3) carrying out random masking operation on the feature patch sequence added with the position codes and obtained in the step (2.2), and sending the visible patch area to an encoder;
(2.4) patch-embedding interactions are modeled more efficiently in the encoder using paired window self-attention and shifted-window self-attention computation modules.
And (2.5) decoding the output of the encoder obtained in the step (2.4), and finally obtaining the reconstructed image patch and the contrast coding characteristic through the prediction heads of the pixel level and the object level.
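The symmetric position code of step (2.2) can be illustrated as follows. The exact form pos = H·x − |W/2 − y| + W/2 + D²·z is a reconstruction of the patent's formula; the key property it demonstrates is that positions mirrored across the mid-sagittal plane receive the same code:

```python
def symmetric_pos(x, y, z, H, W, D):
    """Symmetric position code: the |W/2 - y| term makes y and its mirror
    W - y map to the same value, matching the brain's bilateral symmetry."""
    return H * x - abs(W / 2 - y) + W / 2 + D ** 2 * z
```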
The focus reconstruction loss and the contrast loss are adopted in the acquired self-encoder training process to promote gradient feedback, strengthen model convergence and further encourage model learning basic characteristics;
the 3D medical source data is pre-processed to make it suitable for training of the model. The specific working procedure is as follows:
(4.1) employing three sets of public data sets as training data sets;
(4.2) converting the data set in the step (4.1) into a brain imaging data structure, and standardizing data registration on the same template by adopting a Clinica platform;
(4.3) calculating a maximal adjacent bounding box for the maximal foreground region of all modalities and excluding the zero-valued regions of the foreground; unifying the spatial size of each sample of the whole registered training set to a consistent level so that the convolution kernel traverses the data with the same receptive field when extracting features; then resampling, the resampling target spacing being obtained by averaging over the whole data set;
(4.4) dividing the data set processed in the step (4.3) into a training set and a testing set.
For different modality data, a modality-based self-encoder loading strategy (Modal Autoencoder Loading Strategy, MALS) is proposed, as shown in fig. 5, 6. The specific working procedure is as follows:
(5.1) for single-modality data, the self-encoder is directly loaded as the feature-extraction encoder.
(5.2) for multi-modal data, a simple modality-shared encoder is adopted: the data of the different modalities are input into the parameter-sharing encoder, which captures the features common to all modalities.
(5.3) for the modality-private information in step (5.2), a separate normalization operation is applied to the modalities after convolution, counting each modality's private features independently.
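Steps (5.1)–(5.3) can be sketched as follows. The encoder and per-modality normalization callables are illustrative stand-ins for the shared convolutional encoder and the separate normalization layers:

```python
def shared_encode(modalities, encoder, private_norms):
    """MALS sketch for multi-modal input: one parameter-shared encoder
    processes every modality (common features), while each modality applies
    its own private normalization (private statistics)."""
    return [norm(encoder(x)) for x, norm in zip(modalities, private_norms)]
```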
Decoding basic features subjected to feature extraction by a self-encoder, simultaneously connecting low-level semantics of an encoding stage with high-level semantics of a decoding stage under the same downsampling multiplying power by using cross-layer connection, and finally decoding to obtain a segmentation result;
for the network segmentation model, the cross entropy loss, the soft Dice loss and the full-resolution deep supervision loss are adopted in the training process to promote gradient feedback, strengthen model convergence and further improve the training effect;
for the trained network segmentation model, the test image is taken as input, and the result of automatic segmentation is obtained, as shown in fig. 7. The specific working procedure is as follows:
according to the invention, training set data are converted into brain imaging data structures in a self-supervision stage, data registration is standardized to the same template, then center cutting operation is carried out on the registered training set, and finally resampling is carried out to obtain characteristic data; performing random rotation operation on the obtained same batch of characteristic data to obtain two positive correlation views of each sample in the same batch after data enhancement, and performing random masking operation on each view to obtain the characteristics of the shielded part patch; the method comprises the steps of inputting a positive correlation characteristic map obtained after data enhancement and random mask operation into a self-encoder network to extract characteristic operation, obtaining reconstructed image patches and contrast encoding characteristics through a pixel-level prediction head and an object-level prediction head respectively, finally training the self-encoder by utilizing self-supervision information of an image, introducing symmetrical position encoding according to priori knowledge of brain magnetic resonance imaging, enabling symmetrical positions to have the same position information, simultaneously considering smoothness of a medical image relative to a natural image, classifying the characteristics of different areas according to the distribution of local intensity gradients of a three-dimensional voxel directional gradient histogram, applying different weights when reconstruction loss is measured and restored to pixel differences between an image area and an original image, thereby encouraging the model to pay more attention to the important areas, providing a combined dual-task framework of a pixel level and an object level customized for brain magnetic resonance imaging, enabling the model to learn pixel-level detail and object-level distinguishing information respectively, providing a self-encoder 
loading strategy based on multi-modal data in a downstream brain segmentation task, and improving a mode fusion mode and sharing the mode of the model by measuring and sharing parameter information.

Claims (10)

1. The brain magnetic resonance segmentation method based on the combined double-task self-encoder is characterized by comprising the following steps of:
registering the segmentation training set of the downstream segmentation task, performing center cropping on the registered segmentation training set, and resampling the center-cropped data to obtain feature data;
extracting the characteristics of the acquired characteristic data by utilizing a pre-trained self-encoder to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result;
and training a network segmentation model by using the decoded segmentation result and the corresponding segmentation training set, and segmenting the MR image by using the trained network segmentation model.
2. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 1, wherein the pre-training self-encoder training specific process is as follows: the method comprises the steps of collecting a pre-training image as a pre-training set, converting pre-training set data into a brain imaging data structure, standardizing registration of the data structure of the pre-training set converted into the brain imaging data structure to the same template, then performing center cutting operation on the registered pre-training set data to obtain pre-training characteristic data, and training a self-encoder by utilizing the pre-training characteristic data.
3. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 2, wherein the data structure registration of the pre-training set is standardized to the same template by using the Clinica platform.
4. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 2, wherein the obtained same batch of pre-training characteristic data is subjected to random rotation operation to obtain two positive correlation views of each sample in the same batch after data enhancement, and each positive correlation view is subjected to random masking operation to obtain a positive correlation characteristic map of a shielded part patch; and inputting the positive correlation characteristic map subjected to data enhancement and random mask operation into a self-encoder network to extract characteristic operation, respectively obtaining reconstructed image patches and contrast coding characteristics through a pixel-level prediction head and an object-level prediction head, and performing self-encoder pre-training by utilizing self-supervision information of a pre-training image.
5. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 2, wherein in the pre-training process of the self-encoder, parameters of the network are optimized by using a back propagation strategy, training is assisted by using a loss function, and the parameters of the network are updated according to the value of the loss function, so that the loss function continuously descends until the loss function converges to a set value, and at the moment, the training is finished, and the pre-training of the self-encoder is completed.
6. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 3, characterized in that standardizing the registration of the data structures of the pre-training set onto the same template by using the Clinica platform specifically comprises the following steps:
and calculating a maximum adjacent rectangular frame for the maximum foreground region of all the modes after the registration, excluding the region with 0, unifying the space size of each sample to be consistent level for the whole pre-training set after the registration, and then resampling to obtain feature data, wherein the resampled target space is obtained by averaging the whole data set.
7. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 1, wherein for single-mode data, the self-encoder is directly loaded as an extraction feature encoder, and the low-level semantics of the encoding stage are connected with the high-level semantics of the decoding stage under the same downsampling multiplying power by using a cross-layer connection through a U-shaped network structure, so as to obtain a segmentation result.
8. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 1, wherein for multi-modal data, a simple modal shared encoder is adopted, different modal data are input into the encoder sharing parameters, and common characteristics of all modes are captured to obtain segmentation results.
9. The brain magnetic resonance segmentation method based on the combined double-task self-encoder according to claim 8, wherein the multi-modal encoded output is decoded, and the low-level semantics of the encoding stage are connected with the high-level semantics of the decoding stage under the same downsampling ratio by using cross-layer connection, and the segmentation result is finally obtained by decoding.
10. A brain magnetic resonance segmentation system based on a combined double-task self-encoder, characterized by comprising a data preprocessing module, a self-supervision module and a segmentation module:
the data preprocessing module is used for registering the segmentation training set of the downstream segmentation task, performing center cropping on the registered segmentation training set, and then resampling the center-cropped data to obtain feature data;
the self-supervision module is used for extracting the characteristics of the acquired characteristic data to obtain basic characteristics, and decoding the acquired basic characteristics to obtain a segmentation result;
the segmentation module trains a network segmentation model by utilizing the decoded segmentation result and the corresponding segmentation training set, and segments the MR image by utilizing the trained network segmentation model.
CN202311273017.XA 2023-09-28 2023-09-28 Brain magnetic resonance segmentation method and system based on combined double-task self-encoder Pending CN117274599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311273017.XA CN117274599A (en) 2023-09-28 2023-09-28 Brain magnetic resonance segmentation method and system based on combined double-task self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311273017.XA CN117274599A (en) 2023-09-28 2023-09-28 Brain magnetic resonance segmentation method and system based on combined double-task self-encoder

Publications (1)

Publication Number Publication Date
CN117274599A true CN117274599A (en) 2023-12-22

Family

ID=89217464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311273017.XA Pending CN117274599A (en) 2023-09-28 2023-09-28 Brain magnetic resonance segmentation method and system based on combined double-task self-encoder

Country Status (1)

Country Link
CN (1) CN117274599A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117476240A (en) * 2023-12-28 2024-01-30 中国科学院自动化研究所 Disease prediction method and device with few samples
CN117476240B (en) * 2023-12-28 2024-04-05 中国科学院自动化研究所 Disease prediction method and device with few samples
CN117557568A (en) * 2024-01-12 2024-02-13 吉林省迈达医疗器械股份有限公司 Focal region segmentation method in thermal therapy process based on infrared image
CN117557568B (en) * 2024-01-12 2024-05-03 吉林省迈达医疗器械股份有限公司 Focal region segmentation method in thermal therapy process based on infrared image

Similar Documents

Publication Publication Date Title
CN111091589B (en) Ultrasonic and nuclear magnetic image registration method and device based on multi-scale supervised learning
CN109978850B (en) Multi-modal medical image semi-supervised deep learning segmentation system
CN112465827B (en) Contour perception multi-organ segmentation network construction method based on class-by-class convolution operation
CN110992351B (en) sMRI image classification method and device based on multi-input convolution neural network
CN117274599A (en) Brain magnetic resonance segmentation method and system based on combined double-task self-encoder
CN113902761B (en) Knowledge distillation-based unsupervised segmentation method for lung disease focus
CN110782427B (en) Magnetic resonance brain tumor automatic segmentation method based on separable cavity convolution
CN112132878B (en) End-to-end brain nuclear magnetic resonance image registration method based on convolutional neural network
CN112785593A (en) Brain image segmentation method based on deep learning
CN115147600A (en) GBM multi-mode MR image segmentation method based on classifier weight converter
CN115496720A (en) Gastrointestinal cancer pathological image segmentation method based on ViT mechanism model and related equipment
WO2021184195A1 (en) Medical image reconstruction method, and medical image reconstruction network training method and apparatus
Zhai et al. An improved full convolutional network combined with conditional random fields for brain MR image segmentation algorithm and its 3D visualization analysis
CN117218453B (en) Incomplete multi-mode medical image learning method
CN116958094A (en) Method for dynamically enhancing magnetic resonance image characteristics to generate pathological image characteristics
CN115496732B (en) Semi-supervised heart semantic segmentation algorithm
CN116485853A (en) Medical image registration method and device based on deep learning neural network
Mortazi et al. Weakly supervised segmentation by a deep geodesic prior
Lei et al. Generative adversarial networks for medical image synthesis
CN115526898A (en) Medical image segmentation method
CN114581459A (en) Improved 3D U-Net model-based segmentation method for image region of interest of preschool child lung
Kwarciak et al. Deep generative networks for heterogeneous augmentation of cranial defects
CN114332018A (en) Medical image registration method based on deep learning and contour features
CN112967295A (en) Image processing method and system based on residual error network and attention mechanism
Xu et al. Correlation via synthesis: End-to-end image generation and radiogenomic learning based on generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination