CN115311183A - Medical image cross-mode synthesis method and system and readable storage medium - Google Patents

Medical image cross-mode synthesis method and system and readable storage medium

Info

Publication number
CN115311183A
Authority
CN
China
Prior art keywords
image
target
training
mri image
mri
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210942137.3A
Other languages
Chinese (zh)
Inventor
罗玉
洪奕开
凌捷
柳毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210942137.3A
Publication of CN115311183A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]


Abstract

The method comprises the steps of: determining a medical image pair corresponding to the same anatomical region, the pair comprising a matched MRI image and CT image for training; performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network (GAN) model to obtain a corresponding CT image, wherein the initial GAN model comprises a joint attention residual processing module for determining a joint attention feature by combining channel attention features and global attention features extracted from the image; performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training, and obtaining a target GAN model when training ends; and inputting the MRI image to be processed into the target GAN model to obtain a corresponding CT image synthesis result.

Description

Medical image cross-mode synthesis method and system and readable storage medium
Technical Field
The application relates to the technical field of medical image optimization, in particular to a medical image cross-mode synthesis method, a medical image cross-mode synthesis system and a readable storage medium.
Background
Medical imaging plays an important role in the diagnosis and treatment of various diseases. Because the insights provided by different imaging modalities tend to be complementary, more than one modality is often involved in clinical decision-making. For example, Magnetic Resonance (MR) images are widely used in clinical diagnosis and cancer monitoring because they are obtained by non-invasive imaging protocols and provide good soft-tissue contrast. However, MR images do not provide the electron density information that Computed Tomography (CT) images can provide, which is crucial for dose calculation in radiotherapy treatment planning. CT has the advantage of providing the electron and physical density of tissue, which is essential in radiation therapy dose planning for cancer patients. On the other hand, radiation exposure during CT acquisition may increase the risk of secondary cancer, especially in younger patients. Magnetic Resonance Imaging (MRI) provides very good soft-tissue contrast and, compared with CT, is also safer, involving no ionizing radiation. In general, two or more imaging examinations may be required to make an accurate diagnosis, yet acquiring a set of different clinical images is a time-consuming and expensive process that many patients cannot afford. Therefore, cross-modality synthesis of medical images, such as MR-to-CT synthesis, is desirable for many diagnostic and therapeutic purposes.
The above observations reflect a general dilemma: some form of medical image is desirable but impractical to acquire. A system capable of synthesizing the images of interest from a different source (e.g., another imaging modality or acquisition protocol) can therefore offer significant benefits, providing imaging data in high demand for certain clinical uses without the additional cost and risk of a real acquisition.
Recently developed generative adversarial networks (GANs) [Goodfellow I et al., Generative adversarial networks, 2014], a deep-learning-based generative method, are becoming increasingly popular in medical image synthesis. In several studies [Emami H et al., Generating synthetic CTs from magnetic resonance images using generative adversarial networks, 2018] [Kazemifar S et al., MRI-only brain radiotherapy: Assessing the dosimetric accuracy of synthetic CT images generated using a generative adversarial network, 2019], GANs are used to model the nonlinear mapping between imaging modalities and estimate realistic synthetic CT images. Different variants of GANs show excellent performance in medical image synthesis. Despite their power, previous learning-based synthesis models are built essentially on convolutional architectures, using compact filters to extract local image features. This inductive bias reduces the number of model parameters and facilitates learning by exploiting correlations between small neighborhoods of image pixels. However, it also limits the expression of contextual features that reflect long-range spatial dependencies [X. Wang et al.]. Medical images contain contextual relationships between healthy and pathological tissue. For example, the bone of the skull or the cerebrospinal fluid in the brain ventricles is widely distributed over spatially adjacent or separated brain regions, producing dependencies between distant voxels. Although pathological tissues have a less regular anatomical basis, their spatial distribution (e.g., location, number, shape) may still show disease-specific patterns. In principle, synthesis can be improved by capturing these relationships as priors. Vision Transformers (ViT) hold great promise for this goal, because the attention operation that learns contextual features can increase sensitivity to long-range interactions and focus on key image regions, improving generalization to atypical anatomy.
Disclosure of Invention
The purpose of the embodiments of the present application is to provide a medical image cross-modality synthesis method, system, and readable storage medium, which can avoid limiting the expression of contextual features and improve estimation accuracy.
The embodiments of the present application provide a medical image cross-modality synthesis method, comprising the following steps:
determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, wherein the initial generative adversarial network model comprises a joint attention residual processing module for determining a joint attention feature by combining channel attention features and global attention features extracted from the image;
performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In a second aspect, an embodiment of the present application further provides a medical image cross-modality synthesis system, comprising an image processing module, a model building module, a model training module, and an image synthesis module, wherein:
the image processing module is used for determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
the model building module is used for performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
the model training module is used for performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and the image synthesis module is used for inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In a third aspect, an embodiment of the present application further provides a readable storage medium storing a program of the medical image cross-modality synthesis method; when executed by a processor, the program implements the steps of the medical image cross-modality synthesis method described in any of the above.
As can be seen from the above, the medical image cross-modality synthesis method, system, and readable storage medium provided in the embodiments of the present application combine the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis. Conversion between cross-modality medical image data can then be performed with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a method for cross-modal synthesis of a medical image according to an embodiment of the present application;
FIG. 2 is a block diagram of a joint attention residual processing module;
FIG. 3 is a schematic diagram of a channel attention block;
FIG. 4 is a schematic diagram of a network architecture for performing downsampling and upsampling operations on an input target MRI image in conjunction with a joint attention residual processing module;
fig. 5 is a schematic structural diagram of a medical image cross-modality synthesis system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart of a cross-modal synthesis method of a medical image according to some embodiments of the present application. The method is exemplified by being applied to a computer device (the computer device may specifically be a terminal or a server, and the terminal may specifically be but is not limited to various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, the server may be an independent server or a server cluster composed of a plurality of servers), and the method includes the following steps:
step S100, a medical image pair corresponding to the same part is determined, and the medical image pair includes an MRI image and a CT image for training which are matched.
And S200, carrying out image cross-mode synthesis on the MRI image based on the constructed initially generated confrontation network model to obtain a corresponding CT image, wherein the initially generated confrontation network model comprises a joint attention residual error processing module for determining a corresponding joint attention characteristic by combining the channel attention characteristic and the global attention characteristic extracted from the image.
And step S300, performing model training based on the matched medical image, performing down-sampling and up-sampling operations on the input MRI image by combining with the joint attention residual error processing module in the training process so as to cooperatively store local and global contexts, and obtaining a target generation confrontation network model when the training is finished.
And step S400, inputting the obtained MRI image to be processed into a target generation confrontation network model to obtain a corresponding CT image synthesis result.
Therefore, according to the medical image cross-modal synthesis method provided by the embodiment of the application, the context sensitivity of the visual Transformer, the precision of the convolution operator and the reality of the counterstudy are fitted, and the corresponding network model is established for the medical image cross-modal synthesis, so that the conversion between cross-modal medical image data can be performed based on the network model, the problem that the expression of the context characteristics is limited by the existing convolution neural network technology is overcome, the dependence between long-distance voxels is lacking, the image conversion efficiency is improved, and the estimation accuracy is improved.
In one embodiment, step S100 of determining the medical image pair corresponding to the same anatomical region comprises:
Step S1001, a medical image pair to be processed corresponding to the same anatomical region is acquired, the pair comprising a mutually matched initial MRI image and initial CT image.
Step S1002, the initial MRI image and the initial CT image are preprocessed according to a preset preprocessing mode to obtain the matched MRI image and CT image for training, wherein the preprocessing mode comprises at least one of N4 bias correction and image denoising.
Step S1003, the medical image pair corresponding to the same anatomical region is determined from the matched MRI image and CT image for training.
Before model training is performed on the MRI and CT data (i.e., the medical image pair to be processed), preprocessing such as N4 bias correction and denoising is applied. The preprocessed data are then accurately registered and sliced into corresponding 2D data sets, which improves registration accuracy.
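As a rough illustration of this preprocessing stage, a minimal sketch using SimpleITK is given below; the Otsu-mask settings and the curvature-flow denoising parameters are illustrative assumptions, not values specified by this application:

```python
import SimpleITK as sitk

def preprocess_scan(path: str) -> sitk.Image:
    """N4 bias correction followed by light denoising (illustrative settings)."""
    image = sitk.ReadImage(path, sitk.sitkFloat32)
    # Rough foreground mask for N4 (standard Otsu-threshold recipe).
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    corrected = corrector.Execute(image, mask)
    # Simple edge-preserving denoising step.
    return sitk.CurvatureFlow(corrected, timeStep=0.125, numberOfIterations=5)
```

After correction and denoising, the MRI and CT volumes would be registered and sliced into matched 2D pairs, as described above.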
In one embodiment, please refer to fig. 2, the joint attention residual processing module is composed of a channel attention block and a plurality of cascaded Swin Transformer blocks, wherein:
the channel attention block consists of a pooling layer, a first full-connection layer, a dot-product operation layer, a second full-connection layer and a sigmiod activation layer which are connected in sequence.
Specifically, please refer to fig. 3, which is different from the general channel attention block in that only one pooling operation is used to extract the statistical information of the input feature channel. In the current embodiment, the statistical information of the input feature channels is extracted based on the pooling layer covering two kinds of pooling operations, i.e., the maximum pooling operation, and the average pooling operation.
In one embodiment, the pooling layer activates the operation steps of full connectivity layer-dot product operation-full connectivity layer-sigmiod in sequence, and obtains the channel attention feature based on the extracted statistical information of the input feature channel. The obtained channel attention characteristics are multiplied by the input global attention characteristics, so that the required output characteristics are obtained, and the model can be better self-adapted to acquire the statistical information of the channel.
The pooling layer is composed of a maximum pooling branch layer for extracting statistical information of the input feature channels by a maximum pooling operation, and an average pooling branch layer for extracting statistical information of the input feature channels by an average pooling operation.
In particular, the commonly used pooling methods are max pooling and mean pooling. According to the relevant theory, the error of feature extraction comes mainly from two sources:
(1) the variance of the estimate increases because of the limited neighborhood size;
(2) convolutional-layer parameter errors shift the estimated mean.
In general, mean pooling reduces the first error and preserves more of the image's background information, while max pooling reduces the second error and preserves more of its texture information.
In one embodiment, the max pooling kernel size is typically 2×2. For very large inputs the kernel size may be set to 4×4; however, with a large kernel the signal size shrinks significantly and excessive information may be lost. In general, non-overlapping pooling windows perform best.
In one embodiment, in step S300, when channel attention features are extracted from an input MRI image based on a channel attention block, the method includes:
step S3001, extracting a channel attention feature P from the input target MRI image by the following formula C
Figure BDA0003786138740000081
Wherein x is T For a global attention feature, δ, extracted from an input target MRI image 1 Is the first fully-connected layer, δ 2 A second fully connected layer;
Figure BDA0003786138740000082
the function is activated for sigmoid.
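The following PyTorch sketch mirrors this structure; the channel count and reduction ratio are illustrative assumptions, not values given in this application:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, channels // reduction)  # delta_1
        self.fc2 = nn.Linear(channels // reduction, channels)  # delta_2
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        s_max = self.fc1(self.max_pool(x).view(b, c))      # delta_1(MaxPool(x))
        s_avg = self.fc1(self.avg_pool(x).view(b, c))      # delta_1(AvgPool(x))
        w = self.sigmoid(self.fc2(s_max * s_avg))          # dot-product fusion, delta_2, sigmoid
        return x * w.view(b, c, 1, 1)                      # reweight the input feature
```

Here the forward pass returns the input feature reweighted by P_C, matching the multiplication with the input global attention feature described above.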
In one embodiment, in step S300, when the joint attention feature of the image is determined by combining the channel attention feature and the global attention feature extracted from the input target MRI image, the method includes:
step S3002, determining the joint attention feature P of the image by the following formula SRTB
P SRTB =Conv(P STB (x T )+P C (x T ));
Wherein x is T For global attention features, P, extracted from an input target MRI image STB Is a cascade block composed of the plurality of cascade Swin transform blocks, P C Conv is the convolutional layer for the channel attention block.
Specifically, referring to fig. 2, the cascade block is composed of six Swin Transformer blocks in cascade. In the present embodiment, the computer device will input the feature x T Through the above-mentioned channel attention block P C Extraction of channel attention features is performed. Wherein the channel attention feature is further connected to an output of the concatenation block, thereby determining a joint attention feature of the image based on the extracted channel attention feature and the global attention feature.
In one embodiment, in step S300, the down-sampling and up-sampling operations performed on the input target MRI image in combination with the joint attention residual processing module include: performing four down-sampling operations and four corresponding up-sampling operations on the input target MRI image so as to maintain the original size of the image, wherein each down-sampling operation comprises a max pooling operation and each up-sampling operation comprises a deconvolution operation; at the first sampling level, skip-connecting the output feature obtained by down-sampling to the initial input feature of the corresponding up-sampling; and at the second, third, and fourth sampling levels, passing the output feature obtained by down-sampling through the joint attention residual processing module before connecting it to the initial input feature of the corresponding up-sampling.
Specifically, the computer device may perform the down-sampling operations on the target MRI image through preset max pooling layers, and perform the up-sampling operations through preset deconvolution layers.
As shown in fig. 4, at the first sampling level, the computer device skip-connects the target output feature produced by the max pooling layer of the down-sampling operation to the initial input feature of the deconvolution layer of the corresponding up-sampling operation.
It should be noted that the target output features of the 2nd, 3rd, and 4th down-sampling layers are passed through the joint attention residual processing module (i.e., the RSTB illustrated in fig. 4) and then connected to the initial input features of the corresponding deconvolution layers for up-sampling.
Therefore, the method can effectively overcome the defect that the dependency among remote voxels is lacked due to the limitation of the expression of the context characteristics in the conventional convolutional neural network technology, and improve the data prediction accuracy.
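Combining the above, a schematic encoder-decoder generator consistent with fig. 4 could be organized as below; the channel widths, the single-convolution encoder blocks, and the 1-channel input/output are illustrative assumptions, and JointAttentionResidualBlock refers to the earlier sketch:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
        self.enc = nn.ModuleList([block(1, 64), block(64, 128), block(128, 256), block(256, 512)])
        self.pool = nn.MaxPool2d(2)                      # each down-sampling is a max pooling
        self.bottleneck = block(512, 1024)
        # Skips from levels 2-4 pass through a joint attention residual block (RSTB).
        self.rstb = nn.ModuleList([JointAttentionResidualBlock(c) for c in (128, 256, 512)])
        self.up = nn.ModuleList([nn.ConvTranspose2d(c * 2, c, 2, stride=2) for c in (512, 256, 128, 64)])
        self.dec = nn.ModuleList([block(1024, 512), block(512, 256), block(256, 128), block(128, 64)])
        self.head = nn.Conv2d(64, 1, 1)                  # synthesized CT slice

    def forward(self, x):                                # x: (B, 1, H, W), H and W divisible by 16
        skips = []
        for e in self.enc:
            x = e(x)
            skips.append(x)
            x = self.pool(x)                             # four max-pooling down-samplings
        x = self.bottleneck(x)
        # First (shallowest) skip is connected directly; the deeper three go through RSTB.
        skips = [skips[0]] + [r(s) for r, s in zip(self.rstb, skips[1:])]
        for up, d, s in zip(self.up, self.dec, reversed(skips)):
            x = d(torch.cat([up(x), s], dim=1))          # deconvolution up-sampling + skip
        return self.head(x)
```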
In one embodiment, in step S300, model training is performed based on the target medical image pair, and a target generative adversarial network model is obtained when training ends, including:
Step S3003, constructing a target loss function, the target loss function comprising at least one of a generative adversarial loss function, a mean absolute error loss function determined from the mean absolute error between the target CT image and the corresponding CT image synthesis result, and a frequency loss function determined from the frequency difference between the target CT image and the corresponding CT image synthesis result.
Specifically, the computer device may construct a corresponding loss function based on a deviation between the target CT image and the corresponding CT image synthesis result.
When a plurality of types of loss functions are involved, the total loss function can be determined as a weighted sum of them. Model optimization is then performed based on the total loss function; the optimization targets include, but are not limited to, adjusting network parameters and adjusting the network structure, which is not limited in this embodiment of the present application.
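For instance, the weighted combination could be expressed as below; the λ weights are illustrative assumptions, not values given in this application:

```python
def total_loss(adv_loss, l1_loss, fre_loss,
               lam_adv: float = 1.0, lam_l1: float = 100.0, lam_fre: float = 10.0):
    """Weighted sum of the adversarial, mean-absolute-error and frequency terms."""
    return lam_adv * adv_loss + lam_l1 * l1_loss + lam_fre * fre_loss
```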
Step S3004, during training, model optimization is performed by gradient descent based on the target loss function, and the target generative adversarial network model is obtained when a preset training-end condition is reached.
Specifically, in machine learning algorithms, minimizing the loss function can be solved iteratively by gradient descent, yielding the minimized loss function and the corresponding model parameter values.
In one embodiment, if the maximum of the loss function needs to be solved, gradient ascent can be used for iterative calculation. It should be noted that gradient descent and gradient ascent are interchangeable: when the minimum of a loss function f(ω) is needed, gradient descent is used for the iterative solution; conversely, the maximum of f(ω) can be solved with gradient ascent, which is equivalent to gradient descent on -f(ω).
In one embodiment, the training-end condition includes, but is not limited to, the target loss function approaching 0, reaching a maximum number of iterations, and the like, which is not limited in the embodiments of the present application.
In one embodiment, the generative adversarial loss function L_cGAN is calculated as:

L_cGAN = E_{x,y}[log D(x, y)] + E_{x,y}[log(1 - D(x, G(x)))];

where x ∈ R^N is the input target MRI image and y ∈ R^N is the target CT image matched with the target MRI image; E_{x,y}[·] is the expected value over the data distribution; D(x, y) discriminates the matching between the target MRI image and the corresponding target CT image, and G(x) is the CT image synthesis result output via the model.
Specifically, D(x, y) can further be understood as a discriminator that judges the degree of matching between the target MRI image x and the corresponding generated CT image y. G(x) can further be understood as a generator that produces the corresponding CT image from the input target MRI image x.
It can be understood that, in the above equation, log D(x, y) corresponds to the probability that the discriminator judges real data to be real, and log(1 - D(x, G(x))) to the probability that the discriminator judges the fake data produced by the generator to be fake.
More generally, the adversarial framework specifies a network structure with two models: a generative model and a discriminative model. The discriminative model determines whether a given picture is real (i.e., drawn from the data set), while the task of the generative model is to create pictures that look real. Initially neither model is trained; they are then trained adversarially together: the generative model produces a picture to deceive the discriminative model, and the discriminative model judges whether the picture is real or fake. During this joint training both models become increasingly capable, eventually reaching a steady state.
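A hedged PyTorch sketch of this adversarial game, assuming a discriminator D(x, y) with sigmoid output and a generator G(x); the non-saturating generator objective is used here, a common stand-in for the min-max form:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, x, y):
    """D's objective: score real pairs (x, y) as 1 and synthesized pairs as 0."""
    real = D(x, y)
    fake = D(x, G(x).detach())          # detach so only D is updated here
    return (F.binary_cross_entropy(real, torch.ones_like(real)) +
            F.binary_cross_entropy(fake, torch.zeros_like(fake)))

def generator_adv_loss(D, G, x):
    """Non-saturating generator objective: fool D into scoring G(x) as real."""
    fake = D(x, G(x))
    return F.binary_cross_entropy(fake, torch.ones_like(fake))
```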
The mean absolute error loss function L_1 is calculated as:

L_1 = E_{x,y}[||y - G(x)||_1];

where ||·||_1 denotes the L_1 norm.
Specifically, the mean absolute error is a commonly used regression loss: the sum of absolute differences between the target and predicted values, representing the average magnitude of the prediction error regardless of its direction. Compared with the mean error, the mean absolute error takes the absolute value of each deviation, so positive and negative deviations do not cancel; it therefore better reflects the actual prediction error and improves the accuracy of model training.
The frequency loss function L_fre is calculated as:

L_fre = E_{x,y}[||y_l - G(x)_l|| + ||y_h - G(x)_h||];

where y_l is the low-frequency information of the target CT image and G(x)_l is the low-frequency information of the CT image synthesis result; y_h is the high-frequency information of the target CT image and G(x)_h is the high-frequency information of the CT image synthesis result.
Specifically, the computer device may employ a Gaussian kernel function, retaining the low-frequency information by filtering out the high-frequency components:

K[i, j] = (1 / (2πσ²)) · exp(-(i² + j²) / (2σ²));

where [i, j] denotes a spatial position within the image and σ² is the variance, which increases in proportion to the Gaussian kernel size.
In one embodiment, taking y_l as an example, when determining the low-frequency information y_l of an image y, the computer device may perform a convolution with the Gaussian kernel:

y_l[i, j] = Σ_m Σ_n K[m, n] · y[i + m, j + n];

where y_l is the low-frequency information extracted from image y, [i, j] denotes a spatial position within the image, m and n index the two-dimensional Gaussian kernel, and K[m, n] is the Gaussian kernel function.
In one embodiment, having filtered the low-frequency information y_l out of the image y, the computer device further extracts the high-frequency information from y as y_h = y - y_l.
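A minimal sketch of these frequency-domain terms, assuming single-channel image batches of shape (B, 1, H, W); the kernel size, σ, and the use of a mean absolute difference for ||·|| are illustrative assumptions:

```python
import math
import torch
import torch.nn.functional as F

def gaussian_kernel(size: int = 7, sigma: float = 1.5) -> torch.Tensor:
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * math.pi * sigma**2)
    return (k / k.sum()).view(1, 1, size, size)        # normalized 2D Gaussian

def frequency_loss(y: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """L_fre = ||y_l - G(x)_l|| + ||y_h - G(x)_h|| with y_h = y - y_l."""
    k = gaussian_kernel().to(y.device)
    pad = k.shape[-1] // 2
    y_l = F.conv2d(y, k, padding=pad)                  # low-pass component
    g_l = F.conv2d(g, k, padding=pad)
    y_h, g_h = y - y_l, g - g_l                        # high-pass residuals
    return (y_l - g_l).abs().mean() + (y_h - g_h).abs().mean()
```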
Thus, during model training the computer device trains the constructed target generative adversarial network model with a fusion of loss functions such as the frequency constraint and the adversarial constraint, achieving a higher-quality synthesis effect than a common generative adversarial network model and improving image synthesis quality.
Referring to fig. 5, the present application discloses a medical image cross-modality synthesis system 500, comprising an image processing module 501, a model building module 502, a model training module 503, and an image synthesis module 504, wherein:
The image processing module 501 is configured to determine a medical image pair corresponding to the same anatomical region, the pair comprising a matched MRI image and CT image for training.
The model building module 502 is configured to perform image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module configured to determine a joint attention feature by combining channel attention features and global attention features extracted from the image.
The model training module 503 is configured to perform model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and to obtain a target generative adversarial network model when training ends.
The image synthesis module 504 is configured to input the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In one embodiment, the modules are further configured to implement the method in any optional implementation manner of the embodiment.
Thus, the medical image cross-modality synthesis system disclosed in the present application combines the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis, so that cross-modality medical image data can be converted with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
An embodiment of the present application provides a readable storage medium whose computer program, when executed by a processor, performs the method in any optional implementation of the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The readable storage medium combines the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis, so that cross-modality medical image data can be converted with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A medical image cross-modality synthesis method, characterized by comprising the following steps:
determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, wherein the initial generative adversarial network model comprises a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
2. The method of claim 1, wherein determining the medical image pair corresponding to the same anatomical region comprises:
acquiring a medical image pair to be processed corresponding to the same anatomical region, wherein the medical image pair to be processed comprises a mutually matched initial MRI image and initial CT image;
preprocessing the initial MRI image and the initial CT image according to a preset preprocessing mode to obtain the matched MRI image and CT image for training, wherein the preprocessing mode comprises at least one of N4 bias correction and image denoising;
and determining the medical image pair corresponding to the same anatomical region from the matched MRI image and CT image for training.
3. The method of claim 1, wherein the joint attention residual processing module is composed of a channel attention block and a plurality of cascaded Swin Transformer blocks, wherein:
the channel attention block consists of a pooling layer, a first fully connected layer, a dot-product operation layer, a second fully connected layer, and a sigmoid activation layer connected in sequence;
the pooling layer is composed of a max pooling branch for extracting statistics of the input feature channels by a max pooling operation, and an average pooling branch for extracting statistics of the input feature channels by an average pooling operation.
4. The method of claim 3, wherein extracting channel attention features from an input MRI image based on the channel attention block comprises:
extracting the channel attention feature P_C from the input target MRI image through the following formula:

P_C(x_T) = σ(δ_2(δ_1(MaxPool(x_T)) ⊙ δ_1(AvgPool(x_T))));

where x_T is the global attention feature extracted from the input target MRI image, δ_1 is the first fully connected layer, δ_2 is the second fully connected layer, and σ is the sigmoid activation function.
5. The method of claim 3, wherein determining the joint attention feature of the image in combination with the channel attention feature and the global attention feature extracted from the input target MRI image comprises:
determining the joint attention feature P_SRTB of the image by the following formula:

P_SRTB = Conv(P_STB(x_T) + P_C(x_T));

where x_T is the global attention feature extracted from the input target MRI image, P_STB is the cascade block composed of the plurality of cascaded Swin Transformer blocks, P_C is the channel attention block, and Conv is the convolutional layer.
6. The method of claim 1, wherein the down-sampling and up-sampling operations performed on the input target MRI image in combination with the joint attention residual processing module comprise:
performing four down-sampling operations and four corresponding up-sampling operations on the input target MRI image to maintain the original size of the image, wherein the down-sampling operations comprise max pooling operations and the up-sampling operations comprise deconvolution operations;
at the first sampling level, skip-connecting the output feature obtained by down-sampling to the initial input feature of the corresponding up-sampling;
and at the second, third, and fourth sampling levels, passing the output feature obtained by down-sampling through the joint attention residual processing module before connecting it to the initial input feature of the corresponding up-sampling.
7. The method of claim 1, wherein performing model training based on the target medical image pair and obtaining the target generative adversarial network model when training ends comprises:
constructing a target loss function comprising at least one of a generative adversarial loss function, a mean absolute error loss function determined from the mean absolute error between the target CT image and the corresponding CT image synthesis result, and a frequency loss function determined from the frequency difference between the target CT image and the corresponding CT image synthesis result;
during training, performing model optimization by gradient descent based on the target loss function, and obtaining the target generative adversarial network model when a preset training-end condition is reached.
8. The method according to claim 7, wherein the generative adversarial loss function L_cGAN is calculated as:

L_cGAN = E_{x,y}[log D(x, y)] + E_{x,y}[log(1 - D(x, G(x)))];

where x ∈ R^N is the input target MRI image and y ∈ R^N is the target CT image matched with the target MRI image; E_{x,y}[·] is the expected value over the data distribution; D(x, y) discriminates the matching between the target MRI image and the corresponding target CT image, and G(x) is the CT image synthesis result output via the model;
the mean absolute error loss function L_1 is calculated as:

L_1 = E_{x,y}[||y - G(x)||_1];

where ||·||_1 denotes the L_1 norm;
the frequency loss function L_fre is calculated as:

L_fre = E_{x,y}[||y_l - G(x)_l|| + ||y_h - G(x)_h||];

where y_l is the low-frequency information of the target CT image and G(x)_l is the low-frequency information of the CT image synthesis result; y_h is the high-frequency information of the target CT image and G(x)_h is the high-frequency information of the CT image synthesis result.
9. A medical image cross-modality synthesis system, characterized in that the system comprises an image processing module, a model building module, a model training module, and an image synthesis module, wherein:
the image processing module is used for determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
the model building module is used for performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
the model training module is used for performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and the image synthesis module is used for inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
10. A readable storage medium, characterized in that the readable storage medium comprises a program of the medical image cross-modality synthesis method which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202210942137.3A 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium Pending CN115311183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210942137.3A CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210942137.3A CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Publications (1)

Publication Number Publication Date
CN115311183A 2022-11-08

Family

ID=83860808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210942137.3A Pending CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Country Status (1)

Country Link
CN (1) CN115311183A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129235A (en) * 2023-04-14 2023-05-16 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical images from cerebral infarction CT to MRI conventional sequence
CN116152235A (en) * 2023-04-18 2023-05-23 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical image from CT (computed tomography) to PET (positron emission tomography) of lung cancer
CN116563189A (en) * 2023-07-06 2023-08-08 长沙微妙医疗科技有限公司 Medical image cross-contrast synthesis method and system based on deep learning
CN116563189B (en) * 2023-07-06 2023-10-13 长沙微妙医疗科技有限公司 Medical image cross-contrast synthesis method and system based on deep learning
CN116778021A (en) * 2023-08-22 2023-09-19 北京大学 Medical image generation method, device, electronic equipment and storage medium
CN116778021B (en) * 2023-08-22 2023-11-07 北京大学 Medical image generation method, device, electronic equipment and storage medium
CN118071865A (en) * 2024-04-17 2024-05-24 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method and device for medical images from brain peduncles CT to T1

Similar Documents

Publication Publication Date Title
CN115311183A (en) Medical image cross-mode synthesis method and system and readable storage medium
Serte et al. Deep learning in medical imaging: A brief review
Zhang et al. Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks
CN111368849B (en) Image processing method, image processing device, electronic equipment and storage medium
CN113239755B (en) Medical hyperspectral image classification method based on space-spectrum fusion deep learning
Gaggion et al. Improving anatomical plausibility in medical image segmentation via hybrid graph neural networks: applications to chest x-ray analysis
Bengs et al. Three-dimensional deep learning with spatial erasing for unsupervised anomaly segmentation in brain MRI
Singh et al. Medical image generation using generative adversarial networks
Sun et al. Classification for thyroid nodule using ViT with contrastive learning in ultrasound images
Seo et al. Neural contrast enhancement of CT image
Zhang et al. Boundary-oriented network for automatic breast tumor segmentation in ultrasound images
Zhou et al. Automatic multi‐label temporal bone computed tomography segmentation with deep learning
Mahapatra Registration of histopathogy images using structural information from fine grained feature maps
Rasoulian et al. Weakly supervised intracranial hemorrhage segmentation using head-wise gradient-infused self-attention maps from a swin transformer in categorical learning
Poonkodi et al. 3D-MedTranCSGAN: 3D medical image transformation using CSGAN
Jiang et al. A hybrid enhanced attention transformer network for medical ultrasound image segmentation
Perez–Gonzalez et al. Probabilistic learning coherent point drift for 3D ultrasound fetal head registration
Safari et al. MedFusionGAN: multimodal medical image fusion using an unsupervised deep generative adversarial network
Khan et al. Transformers in medical image segmentation: a narrative review
Zhang et al. U-net-and-a-half: convolutional network for biomedical image segmentation using multiple expert-driven annotations
Goyal et al. An efficient medical assistive diagnostic algorithm for visualisation of structural and tissue details in CT and MRI fusion
Chatterjee et al. MICDIR: Multi-scale inverse-consistent deformable image registration using UNetMSS with self-constructing graph latent
CN115965785A (en) Image segmentation method, device, equipment, program product and medium
CN112950654B (en) Brain tumor image segmentation method based on multi-core learning and super-pixel nuclear low-rank representation
Zhou et al. GMRE-iUnet: Isomorphic Unet fusion model for PET and CT lung tumor images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination