CN115311183A - Medical image cross-mode synthesis method and system and readable storage medium - Google Patents

Medical image cross-mode synthesis method and system and readable storage medium

Info

Publication number
CN115311183A
Authority
CN
China
Prior art keywords
image
target
training
mri image
mri
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210942137.3A
Other languages
Chinese (zh)
Inventor
罗玉
洪奕开
凌捷
柳毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210942137.3A
Publication of CN115311183A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10088Magnetic resonance imaging [MRI]


Abstract

The method comprises the steps of: determining a medical image pair corresponding to the same anatomical region, the pair comprising a matched MRI image and CT image for training; performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network (GAN) model to obtain a corresponding CT image, wherein the initial GAN model comprises a joint attention residual processing module for determining a joint attention feature by combining channel attention features and global attention features extracted from the image; performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training, and obtaining a target GAN model when training ends; and inputting the MRI image to be processed into the target GAN model to obtain a corresponding CT image synthesis result.

Description

Medical image cross-mode synthesis method and system and readable storage medium
Technical Field
The application relates to the technical field of medical image optimization, in particular to a medical image cross-mode synthesis method, a medical image cross-mode synthesis system and a readable storage medium.
Background
Medical imaging plays an important role in the diagnosis and treatment of various diseases. Because the insights provided by different imaging modalities tend to be complementary, more than one modality is often involved in clinical decision-making. For example, Magnetic Resonance (MR) images are widely used in clinical diagnosis and cancer monitoring because they are obtained by non-invasive imaging protocols and provide good soft-tissue contrast. However, MR images do not provide the electron density information that Computed Tomography (CT) images can provide, which is crucial for dose calculation in radiotherapy treatment planning. CT has the advantage of providing the electron and physical density of tissue, which is essential in radiation therapy dose planning for cancer patients. On the other hand, radiation exposure during CT acquisition may increase the risk of secondary cancer, especially in younger patients. Magnetic Resonance Imaging (MRI) provides very good soft-tissue contrast and, compared with CT, is also safer, involving no ionizing radiation. In general, two or more imaging examinations may be required to make an accurate diagnosis, yet acquiring a set of different clinical images is a time-consuming and expensive process that many patients cannot afford. Therefore, cross-modality synthesis of medical images, such as MR-to-CT synthesis, is desirable for many diagnostic and therapeutic purposes.
The above observations reflect a general dilemma: some form of medical image is desirable but impractical to acquire. A system capable of synthesizing the images of interest from a different source (e.g., another imaging modality or acquisition protocol) can therefore offer significant benefits, providing imaging data in high demand for certain clinical uses without the additional cost and risk of a real acquisition.
Recently developed generative adversarial networks (GANs) [Goodfellow I et al., Generative adversarial networks, 2014], a deep-learning-based generative method, are becoming increasingly popular in medical image synthesis. In several studies [Emami H et al., Generating synthetic CTs from magnetic resonance images using generative adversarial networks, 2018] [Kazemifar S et al., MRI-only brain radiotherapy: Assessing the dosimetric accuracy of synthetic CT images generated using a generative adversarial network, 2019], GANs are used to model the nonlinear mapping between imaging modalities and estimate realistic synthetic CT images. Different variants of GANs show excellent performance in medical image synthesis. Despite their power, previous learning-based synthesis models are built essentially on convolutional architectures, using compact filters to extract local image features. This inductive bias reduces the number of model parameters and facilitates learning by exploiting correlations between small neighborhoods of image pixels. However, it also limits the expression of contextual features that reflect long-range spatial dependencies [X. Wang et al.]. Medical images contain contextual relationships between healthy and pathological tissue. For example, the bone of the skull or the cerebrospinal fluid in the brain ventricles is widely distributed over spatially adjacent or separated brain regions, producing dependencies between distant voxels. Although pathological tissues have a less regular anatomical basis, their spatial distribution (e.g., location, number, shape) may still show disease-specific patterns. In principle, synthesis can be improved by capturing these relationships as priors. Vision Transformers (ViT) hold great promise for this goal, because the attention operation that learns contextual features can increase sensitivity to long-range interactions and focus on key image regions, improving generalization to atypical anatomy.
Disclosure of Invention
The purpose of the embodiments of the present application is to provide a medical image cross-modality synthesis method, system, and readable storage medium, which can avoid limiting the expression of contextual features and improve estimation accuracy.
The embodiments of the present application provide a medical image cross-modality synthesis method, comprising the following steps:
determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, wherein the initial generative adversarial network model comprises a joint attention residual processing module for determining a joint attention feature by combining channel attention features and global attention features extracted from the image;
performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In a second aspect, an embodiment of the present application further provides a medical image cross-modality synthesis system, comprising an image processing module, a model building module, a model training module, and an image synthesis module, wherein:
the image processing module is used for determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
the model building module is used for performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
the model training module is used for performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and the image synthesis module is used for inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In a third aspect, an embodiment of the present application further provides a readable storage medium storing a program of the medical image cross-modality synthesis method; when executed by a processor, the program implements the steps of the medical image cross-modality synthesis method described in any of the above.
As can be seen from the above, the medical image cross-modality synthesis method, system, and readable storage medium provided in the embodiments of the present application combine the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis. Conversion between cross-modality medical image data can then be performed with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a method for cross-modal synthesis of a medical image according to an embodiment of the present application;
FIG. 2 is a block diagram of a joint attention residual processing module;
FIG. 3 is a schematic diagram of a channel attention block;
FIG. 4 is a schematic diagram of a network architecture for performing downsampling and upsampling operations on an input target MRI image in conjunction with a joint attention residual processing module;
fig. 5 is a schematic structural diagram of a medical image cross-modality synthesis system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart of a cross-modal synthesis method of a medical image according to some embodiments of the present application. The method is exemplified by being applied to a computer device (the computer device may specifically be a terminal or a server, and the terminal may specifically be but is not limited to various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, the server may be an independent server or a server cluster composed of a plurality of servers), and the method includes the following steps:
step S100, a medical image pair corresponding to the same part is determined, and the medical image pair includes an MRI image and a CT image for training which are matched.
And S200, carrying out image cross-mode synthesis on the MRI image based on the constructed initially generated confrontation network model to obtain a corresponding CT image, wherein the initially generated confrontation network model comprises a joint attention residual error processing module for determining a corresponding joint attention characteristic by combining the channel attention characteristic and the global attention characteristic extracted from the image.
And step S300, performing model training based on the matched medical image, performing down-sampling and up-sampling operations on the input MRI image by combining with the joint attention residual error processing module in the training process so as to cooperatively store local and global contexts, and obtaining a target generation confrontation network model when the training is finished.
And step S400, inputting the obtained MRI image to be processed into a target generation confrontation network model to obtain a corresponding CT image synthesis result.
Therefore, according to the medical image cross-modal synthesis method provided by the embodiment of the application, the context sensitivity of the visual Transformer, the precision of the convolution operator and the reality of the counterstudy are fitted, and the corresponding network model is established for the medical image cross-modal synthesis, so that the conversion between cross-modal medical image data can be performed based on the network model, the problem that the expression of the context characteristics is limited by the existing convolution neural network technology is overcome, the dependence between long-distance voxels is lacking, the image conversion efficiency is improved, and the estimation accuracy is improved.
In one embodiment, step S100 of determining the medical image pair corresponding to the same anatomical region comprises:
Step S1001, a medical image pair to be processed corresponding to the same anatomical region is acquired, the pair comprising a mutually matched initial MRI image and initial CT image.
Step S1002, the initial MRI image and the initial CT image are preprocessed according to a preset preprocessing mode to obtain the matched MRI image and CT image for training, wherein the preprocessing mode comprises at least one of N4 bias correction and image denoising.
Step S1003, the medical image pair corresponding to the same anatomical region is determined from the matched MRI image and CT image for training.
Before model training is performed on the MRI and CT data (i.e., the medical image pair to be processed), preprocessing such as N4 bias correction and denoising is applied. The preprocessed data are then accurately registered and sliced into corresponding 2D data sets, which improves registration accuracy.
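As a rough illustration of this preprocessing stage, a minimal sketch using SimpleITK is given below; the Otsu-mask settings and the curvature-flow denoising parameters are illustrative assumptions, not values specified by this application:

```python
import SimpleITK as sitk

def preprocess_scan(path: str) -> sitk.Image:
    """N4 bias correction followed by light denoising (illustrative settings)."""
    image = sitk.ReadImage(path, sitk.sitkFloat32)
    # Rough foreground mask for N4 (standard Otsu-threshold recipe).
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    corrected = corrector.Execute(image, mask)
    # Simple edge-preserving denoising step.
    return sitk.CurvatureFlow(corrected, timeStep=0.125, numberOfIterations=5)
```

After correction and denoising, the MRI and CT volumes would be registered and sliced into matched 2D pairs, as described above.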
In one embodiment, please refer to fig. 2, the joint attention residual processing module is composed of a channel attention block and a plurality of cascaded Swin Transformer blocks, wherein:
the channel attention block consists of a pooling layer, a first full-connection layer, a dot-product operation layer, a second full-connection layer and a sigmiod activation layer which are connected in sequence.
Specifically, please refer to fig. 3, which is different from the general channel attention block in that only one pooling operation is used to extract the statistical information of the input feature channel. In the current embodiment, the statistical information of the input feature channels is extracted based on the pooling layer covering two kinds of pooling operations, i.e., the maximum pooling operation, and the average pooling operation.
In one embodiment, the pooling layer activates the operation steps of full connectivity layer-dot product operation-full connectivity layer-sigmiod in sequence, and obtains the channel attention feature based on the extracted statistical information of the input feature channel. The obtained channel attention characteristics are multiplied by the input global attention characteristics, so that the required output characteristics are obtained, and the model can be better self-adapted to acquire the statistical information of the channel.
The pooling layer is composed of a maximum pooling branch layer for extracting statistical information of the input feature channels by a maximum pooling operation, and an average pooling branch layer for extracting statistical information of the input feature channels by an average pooling operation.
In particular, the commonly used pooling methods are max pooling and mean pooling. According to the relevant theory, the error of feature extraction comes mainly from two sources:
(1) the variance of the estimate increases because of the limited neighborhood size;
(2) convolutional-layer parameter errors shift the estimated mean.
In general, mean pooling reduces the first error and preserves more of the image's background information, while max pooling reduces the second error and preserves more of its texture information.
In one embodiment, the max pooling kernel size is typically 2×2. For very large inputs the kernel size may be set to 4×4; however, with a large kernel the signal size shrinks significantly and excessive information may be lost. In general, non-overlapping pooling windows perform best.
In one embodiment, in step S300, when channel attention features are extracted from an input MRI image based on a channel attention block, the method includes:
step S3001, extracting a channel attention feature P from the input target MRI image by the following formula C
Figure BDA0003786138740000081
Wherein x is T For a global attention feature, δ, extracted from an input target MRI image 1 Is the first fully-connected layer, δ 2 A second fully connected layer;
Figure BDA0003786138740000082
the function is activated for sigmoid.
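The following PyTorch sketch mirrors this structure; the channel count and reduction ratio are illustrative assumptions, not values given in this application:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, channels // reduction)  # delta_1
        self.fc2 = nn.Linear(channels // reduction, channels)  # delta_2
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        s_max = self.fc1(self.max_pool(x).view(b, c))      # delta_1(MaxPool(x))
        s_avg = self.fc1(self.avg_pool(x).view(b, c))      # delta_1(AvgPool(x))
        w = self.sigmoid(self.fc2(s_max * s_avg))          # dot-product fusion, delta_2, sigmoid
        return x * w.view(b, c, 1, 1)                      # reweight the input feature
```

Here the forward pass returns the input feature reweighted by P_C, matching the multiplication with the input global attention feature described above.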
In one embodiment, in step S300, when the joint attention feature of the image is determined by combining the channel attention feature and the global attention feature extracted from the input target MRI image, the method includes:
step S3002, determining the joint attention feature P of the image by the following formula SRTB
P SRTB =Conv(P STB (x T )+P C (x T ));
Wherein x is T For global attention features, P, extracted from an input target MRI image STB Is a cascade block composed of the plurality of cascade Swin transform blocks, P C Conv is the convolutional layer for the channel attention block.
Specifically, referring to fig. 2, the cascade block is composed of six Swin Transformer blocks in cascade. In the present embodiment, the computer device will input the feature x T Through the above-mentioned channel attention block P C Extraction of channel attention features is performed. Wherein the channel attention feature is further connected to an output of the concatenation block, thereby determining a joint attention feature of the image based on the extracted channel attention feature and the global attention feature.
In one embodiment, in step S300, the down-sampling and up-sampling operations performed on the input target MRI image in combination with the joint attention residual processing module include: performing four down-sampling operations and four corresponding up-sampling operations on the input target MRI image so as to maintain the original size of the image, wherein each down-sampling operation comprises a max pooling operation and each up-sampling operation comprises a deconvolution operation; at the first sampling level, skip-connecting the output feature obtained by down-sampling to the initial input feature of the corresponding up-sampling; and at the second, third, and fourth sampling levels, passing the output feature obtained by down-sampling through the joint attention residual processing module before connecting it to the initial input feature of the corresponding up-sampling.
Specifically, the computer device may perform the down-sampling operations on the target MRI image through preset max pooling layers, and perform the up-sampling operations through preset deconvolution layers.
As shown in fig. 4, at the first sampling level, the computer device skip-connects the target output feature produced by the max pooling layer of the down-sampling operation to the initial input feature of the deconvolution layer of the corresponding up-sampling operation.
It should be noted that the target output features of the 2nd, 3rd, and 4th down-sampling layers are passed through the joint attention residual processing module (i.e., the RSTB illustrated in fig. 4) and then connected to the initial input features of the corresponding deconvolution layers for up-sampling.
Therefore, the method can effectively overcome the defect that the dependency among remote voxels is lacked due to the limitation of the expression of the context characteristics in the conventional convolutional neural network technology, and improve the data prediction accuracy.
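Combining the above, a schematic encoder-decoder generator consistent with fig. 4 could be organized as below; the channel widths, the single-convolution encoder blocks, and the 1-channel input/output are illustrative assumptions, and JointAttentionResidualBlock refers to the earlier sketch:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
        self.enc = nn.ModuleList([block(1, 64), block(64, 128), block(128, 256), block(256, 512)])
        self.pool = nn.MaxPool2d(2)                      # each down-sampling is a max pooling
        self.bottleneck = block(512, 1024)
        # Skips from levels 2-4 pass through a joint attention residual block (RSTB).
        self.rstb = nn.ModuleList([JointAttentionResidualBlock(c) for c in (128, 256, 512)])
        self.up = nn.ModuleList([nn.ConvTranspose2d(c * 2, c, 2, stride=2) for c in (512, 256, 128, 64)])
        self.dec = nn.ModuleList([block(1024, 512), block(512, 256), block(256, 128), block(128, 64)])
        self.head = nn.Conv2d(64, 1, 1)                  # synthesized CT slice

    def forward(self, x):                                # x: (B, 1, H, W), H and W divisible by 16
        skips = []
        for e in self.enc:
            x = e(x)
            skips.append(x)
            x = self.pool(x)                             # four max-pooling down-samplings
        x = self.bottleneck(x)
        # First (shallowest) skip is connected directly; the deeper three go through RSTB.
        skips = [skips[0]] + [r(s) for r, s in zip(self.rstb, skips[1:])]
        for up, d, s in zip(self.up, self.dec, reversed(skips)):
            x = d(torch.cat([up(x), s], dim=1))          # deconvolution up-sampling + skip
        return self.head(x)
```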
In one embodiment, in step S300, model training is performed based on the target medical image pair, and a target generative adversarial network model is obtained when training ends, including:
Step S3003, constructing a target loss function, the target loss function comprising at least one of a generative adversarial loss function, a mean absolute error loss function determined from the mean absolute error between the target CT image and the corresponding CT image synthesis result, and a frequency loss function determined from the frequency difference between the target CT image and the corresponding CT image synthesis result.
Specifically, the computer device may construct a corresponding loss function based on a deviation between the target CT image and the corresponding CT image synthesis result.
When a plurality of types of loss functions are involved, the total loss function can be determined as a weighted sum of them. Model optimization is then performed based on the total loss function; the optimization targets include, but are not limited to, adjusting network parameters and adjusting the network structure, which is not limited in this embodiment of the present application.
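For instance, the weighted combination could be expressed as below; the λ weights are illustrative assumptions, not values given in this application:

```python
def total_loss(adv_loss, l1_loss, fre_loss,
               lam_adv: float = 1.0, lam_l1: float = 100.0, lam_fre: float = 10.0):
    """Weighted sum of the adversarial, mean-absolute-error and frequency terms."""
    return lam_adv * adv_loss + lam_l1 * l1_loss + lam_fre * fre_loss
```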
Step S3004, during training, model optimization is performed by gradient descent based on the target loss function, and the target generative adversarial network model is obtained when a preset training-end condition is reached.
Specifically, in machine learning algorithms, minimizing the loss function can be solved iteratively by gradient descent, yielding the minimized loss function and the corresponding model parameter values.
In one embodiment, if the maximum of the loss function needs to be solved, gradient ascent can be used for iterative calculation. It should be noted that gradient descent and gradient ascent are interchangeable: when the minimum of a loss function f(ω) is needed, gradient descent is used for the iterative solution; conversely, the maximum of f(ω) can be solved with gradient ascent, which is equivalent to gradient descent on -f(ω).
In one embodiment, the training-end condition includes, but is not limited to, the target loss function approaching 0, reaching a maximum number of iterations, and the like, which is not limited in the embodiments of the present application.
In one embodiment, the generative adversarial loss function L_cGAN is calculated as:

L_cGAN = E_{x,y}[log D(x, y)] + E_{x,y}[log(1 - D(x, G(x)))];

where x ∈ R^N is the input target MRI image and y ∈ R^N is the target CT image matched with the target MRI image; E_{x,y}[·] is the expected value over the data distribution; D(x, y) discriminates the matching between the target MRI image and the corresponding target CT image, and G(x) is the CT image synthesis result output via the model.
Specifically, D(x, y) can further be understood as a discriminator that judges the degree of matching between the target MRI image x and the corresponding generated CT image y. G(x) can further be understood as a generator that produces the corresponding CT image from the input target MRI image x.
It can be understood that, in the above equation, log D(x, y) corresponds to the probability that the discriminator judges real data to be real, and log(1 - D(x, G(x))) to the probability that the discriminator judges the fake data produced by the generator to be fake.
More generally, the adversarial framework specifies a network structure with two models: a generative model and a discriminative model. The discriminative model determines whether a given picture is real (i.e., drawn from the data set), while the task of the generative model is to create pictures that look real. Initially neither model is trained; they are then trained adversarially together: the generative model produces a picture to deceive the discriminative model, and the discriminative model judges whether the picture is real or fake. During this joint training both models become increasingly capable, eventually reaching a steady state.
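A hedged PyTorch sketch of this adversarial game, assuming a discriminator D(x, y) with sigmoid output and a generator G(x); the non-saturating generator objective is used here, a common stand-in for the min-max form:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, x, y):
    """D's objective: score real pairs (x, y) as 1 and synthesized pairs as 0."""
    real = D(x, y)
    fake = D(x, G(x).detach())          # detach so only D is updated here
    return (F.binary_cross_entropy(real, torch.ones_like(real)) +
            F.binary_cross_entropy(fake, torch.zeros_like(fake)))

def generator_adv_loss(D, G, x):
    """Non-saturating generator objective: fool D into scoring G(x) as real."""
    fake = D(x, G(x))
    return F.binary_cross_entropy(fake, torch.ones_like(fake))
```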
The mean absolute error loss function L_1 is calculated as:

L_1 = E_{x,y}[||y - G(x)||_1];

where ||·||_1 denotes the L_1 norm.
Specifically, the mean absolute error is a commonly used regression loss: the sum of absolute differences between the target and predicted values, representing the average magnitude of the prediction error regardless of its direction. Compared with the mean error, the mean absolute error takes the absolute value of each deviation, so positive and negative deviations do not cancel; it therefore better reflects the actual prediction error and improves the accuracy of model training.
The frequency loss function L_fre is calculated as:

L_fre = E_{x,y}[||y_l - G(x)_l|| + ||y_h - G(x)_h||];

where y_l is the low-frequency information of the target CT image and G(x)_l is the low-frequency information of the CT image synthesis result; y_h is the high-frequency information of the target CT image and G(x)_h is the high-frequency information of the CT image synthesis result.
Specifically, the computer device may employ a Gaussian kernel function, retaining the low-frequency information by filtering out the high-frequency components:

K[i, j] = (1 / (2πσ²)) · exp(-(i² + j²) / (2σ²));

where [i, j] denotes a spatial position within the image and σ² is the variance, which increases in proportion to the Gaussian kernel size.
In one embodiment, taking y_l as an example, when determining the low-frequency information y_l of an image y, the computer device may perform a convolution with the Gaussian kernel:

y_l[i, j] = Σ_m Σ_n K[m, n] · y[i + m, j + n];

where y_l is the low-frequency information extracted from image y, [i, j] denotes a spatial position within the image, m and n index the two-dimensional Gaussian kernel, and K[m, n] is the Gaussian kernel function.
In one embodiment, having filtered the low-frequency information y_l out of the image y, the computer device further extracts the high-frequency information from y as y_h = y - y_l.
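A minimal sketch of these frequency-domain terms, assuming single-channel image batches of shape (B, 1, H, W); the kernel size, σ, and the use of a mean absolute difference for ||·|| are illustrative assumptions:

```python
import math
import torch
import torch.nn.functional as F

def gaussian_kernel(size: int = 7, sigma: float = 1.5) -> torch.Tensor:
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * math.pi * sigma**2)
    return (k / k.sum()).view(1, 1, size, size)        # normalized 2D Gaussian

def frequency_loss(y: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """L_fre = ||y_l - G(x)_l|| + ||y_h - G(x)_h|| with y_h = y - y_l."""
    k = gaussian_kernel().to(y.device)
    pad = k.shape[-1] // 2
    y_l = F.conv2d(y, k, padding=pad)                  # low-pass component
    g_l = F.conv2d(g, k, padding=pad)
    y_h, g_h = y - y_l, g - g_l                        # high-pass residuals
    return (y_l - g_l).abs().mean() + (y_h - g_h).abs().mean()
```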
Thus, during model training the computer device trains the constructed target generative adversarial network model with a fusion of loss functions such as the frequency constraint and the adversarial constraint, achieving a higher-quality synthesis effect than a common generative adversarial network model and improving image synthesis quality.
Referring to fig. 5, the present application discloses a medical image cross-modality synthesis system 500, comprising an image processing module 501, a model building module 502, a model training module 503, and an image synthesis module 504, wherein:
The image processing module 501 is configured to determine a medical image pair corresponding to the same anatomical region, the pair comprising a matched MRI image and CT image for training.
The model building module 502 is configured to perform image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module configured to determine a joint attention feature by combining channel attention features and global attention features extracted from the image.
The model training module 503 is configured to perform model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and to obtain a target generative adversarial network model when training ends.
The image synthesis module 504 is configured to input the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
In one embodiment, the modules are further configured to implement the method in any optional implementation manner of the embodiment.
Thus, the medical image cross-modality synthesis system disclosed in the present application combines the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis, so that cross-modality medical image data can be converted with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
An embodiment of the present application provides a readable storage medium whose computer program, when executed by a processor, performs the method in any optional implementation of the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The readable storage medium combines the context sensitivity of the vision Transformer, the precision of convolution operators, and the realism of adversarial learning to build a network model for medical image cross-modality synthesis, so that cross-modality medical image data can be converted with this model, overcoming the limited expression of contextual features and the lack of dependencies between distant voxels in existing convolutional neural network techniques, and improving image conversion efficiency and estimation accuracy.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A medical image cross-modality synthesis method, characterized by comprising the following steps:
determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, wherein the initial generative adversarial network model comprises a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
2. The method of claim 1, wherein determining the medical image pair corresponding to the same anatomical region comprises:
acquiring a medical image pair to be processed corresponding to the same anatomical region, wherein the medical image pair to be processed comprises a mutually matched initial MRI image and initial CT image;
preprocessing the initial MRI image and the initial CT image according to a preset preprocessing mode to obtain the matched MRI image and CT image for training, wherein the preprocessing mode comprises at least one of N4 bias correction and image denoising;
and determining the medical image pair corresponding to the same anatomical region from the matched MRI image and CT image for training.
3. The method of claim 1, wherein the joint attention residual processing module is composed of a channel attention block and a plurality of cascaded Swin Transformer blocks, wherein:
the channel attention block consists of a pooling layer, a first fully connected layer, a dot-product operation layer, a second fully connected layer, and a sigmoid activation layer connected in sequence;
the pooling layer is composed of a max pooling branch for extracting statistics of the input feature channels by a max pooling operation, and an average pooling branch for extracting statistics of the input feature channels by an average pooling operation.
4. The method of claim 3, wherein extracting channel attention features from an input MRI image based on the channel attention block comprises:
extracting the channel attention feature P_C from the input target MRI image through the following formula:

P_C(x_T) = σ(δ_2(δ_1(MaxPool(x_T)) ⊙ δ_1(AvgPool(x_T))));

where x_T is the global attention feature extracted from the input target MRI image, δ_1 is the first fully connected layer, δ_2 is the second fully connected layer, and σ is the sigmoid activation function.
5. The method of claim 3, wherein determining the joint attention feature of the image in combination with the channel attention feature and the global attention feature extracted from the input target MRI image comprises:
determining the joint attention feature P_SRTB of the image by the following formula:

P_SRTB = Conv(P_STB(x_T) + P_C(x_T));

where x_T is the global attention feature extracted from the input target MRI image, P_STB is the cascade block composed of the plurality of cascaded Swin Transformer blocks, P_C is the channel attention block, and Conv is the convolutional layer.
6. The method of claim 1, wherein the down-sampling and up-sampling operations performed on the input target MRI image in combination with the joint attention residual processing module comprise:
performing four down-sampling operations and four corresponding up-sampling operations on the input target MRI image to maintain the original size of the image, wherein the down-sampling operations comprise max pooling operations and the up-sampling operations comprise deconvolution operations;
at the first sampling level, skip-connecting the output feature obtained by down-sampling to the initial input feature of the corresponding up-sampling;
and at the second, third, and fourth sampling levels, passing the output feature obtained by down-sampling through the joint attention residual processing module before connecting it to the initial input feature of the corresponding up-sampling.
7. The method of claim 1, wherein performing model training based on the target medical image pair and obtaining the target generative adversarial network model when training ends comprises:
constructing a target loss function comprising at least one of a generative adversarial loss function, a mean absolute error loss function determined from the mean absolute error between the target CT image and the corresponding CT image synthesis result, and a frequency loss function determined from the frequency difference between the target CT image and the corresponding CT image synthesis result;
during training, performing model optimization by gradient descent based on the target loss function, and obtaining the target generative adversarial network model when a preset training-end condition is reached.
8. The method according to claim 7, wherein the generative adversarial loss function L_cGAN is calculated as:

L_cGAN = E_{x,y}[log D(x, y)] + E_{x,y}[log(1 - D(x, G(x)))];

where x ∈ R^N is the input target MRI image and y ∈ R^N is the target CT image matched with the target MRI image; E_{x,y}[·] is the expected value over the data distribution; D(x, y) discriminates the matching between the target MRI image and the corresponding target CT image, and G(x) is the CT image synthesis result output via the model;
the mean absolute error loss function L_1 is calculated as:

L_1 = E_{x,y}[||y - G(x)||_1];

where ||·||_1 denotes the L_1 norm;
the frequency loss function L_fre is calculated as:

L_fre = E_{x,y}[||y_l - G(x)_l|| + ||y_h - G(x)_h||];

where y_l is the low-frequency information of the target CT image and G(x)_l is the low-frequency information of the CT image synthesis result; y_h is the high-frequency information of the target CT image and G(x)_h is the high-frequency information of the CT image synthesis result.
9. A medical image cross-modality synthesis system, characterized in that the system comprises an image processing module, a model building module, a model training module, and an image synthesis module, wherein:
the image processing module is used for determining a medical image pair corresponding to the same anatomical region, wherein the medical image pair comprises a matched MRI image and CT image for training;
the model building module is used for performing image cross-modality synthesis on the MRI image based on a constructed initial generative adversarial network model to obtain a corresponding CT image, the initial generative adversarial network model comprising a joint attention residual processing module for jointly extracting channel attention features and global attention features from the image and determining a corresponding joint attention feature;
the model training module is used for performing model training based on the matched medical image pair, applying down-sampling and up-sampling operations to the input MRI image in combination with the joint attention residual processing module during training so as to jointly preserve local and global context, and obtaining a target generative adversarial network model when training ends;
and the image synthesis module is used for inputting the MRI image to be processed into the target generative adversarial network model to obtain a corresponding CT image synthesis result.
10. A readable storage medium, characterized in that the readable storage medium comprises a program of the medical image cross-modality synthesis method which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202210942137.3A 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium Pending CN115311183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210942137.3A CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210942137.3A CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Publications (1)

Publication Number Publication Date
CN115311183A 2022-11-08

Family

ID=83860808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210942137.3A Pending CN115311183A (en) 2022-08-08 2022-08-08 Medical image cross-mode synthesis method and system and readable storage medium

Country Status (1)

Country Link
CN (1) CN115311183A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129235A (en) * 2023-04-14 2023-05-16 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical images from cerebral infarction CT to MRI conventional sequence
CN116152235A (en) * 2023-04-18 2023-05-23 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method for medical image from CT (computed tomography) to PET (positron emission tomography) of lung cancer
CN116563189A (en) * 2023-07-06 2023-08-08 长沙微妙医疗科技有限公司 Medical image cross-contrast synthesis method and system based on deep learning
CN116563189B (en) * 2023-07-06 2023-10-13 长沙微妙医疗科技有限公司 Medical image cross-contrast synthesis method and system based on deep learning
CN116778021A (en) * 2023-08-22 2023-09-19 北京大学 Medical image generation method, device, electronic equipment and storage medium
CN116778021B (en) * 2023-08-22 2023-11-07 北京大学 Medical image generation method, device, electronic equipment and storage medium
CN118071865A (en) * 2024-04-17 2024-05-24 英瑞云医疗科技(烟台)有限公司 Cross-modal synthesis method and device for medical images from brain peduncles CT to T1

Similar Documents

Publication Publication Date Title
CN115311183A (en) Medical image cross-mode synthesis method and system and readable storage medium
Serte et al. Deep learning in medical imaging: A brief review
Zhang et al. Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks
CN111368849B (en) Image processing method, image processing device, electronic equipment and storage medium
CN113239755B (en) Medical hyperspectral image classification method based on space-spectrum fusion deep learning
Gaggion et al. Improving anatomical plausibility in medical image segmentation via hybrid graph neural networks: applications to chest x-ray analysis
Bengs et al. Three-dimensional deep learning with spatial erasing for unsupervised anomaly segmentation in brain MRI
Singh et al. Medical image generation using generative adversarial networks
Sun et al. Classification for thyroid nodule using ViT with contrastive learning in ultrasound images
Seo et al. Neural contrast enhancement of CT image
Zhang et al. Boundary-oriented network for automatic breast tumor segmentation in ultrasound images
Zhou et al. Automatic multi‐label temporal bone computed tomography segmentation with deep learning
Mahapatra Registration of histopathogy images using structural information from fine grained feature maps
Rasoulian et al. Weakly supervised intracranial hemorrhage segmentation using head-wise gradient-infused self-attention maps from a swin transformer in categorical learning
Poonkodi et al. 3D-MedTranCSGAN: 3D medical image transformation using CSGAN
Jiang et al. A hybrid enhanced attention transformer network for medical ultrasound image segmentation
Perez–Gonzalez et al. Probabilistic learning coherent point drift for 3D ultrasound fetal head registration
Safari et al. MedFusionGAN: multimodal medical image fusion using an unsupervised deep generative adversarial network
Khan et al. Transformers in medical image segmentation: a narrative review
Zhang et al. U-net-and-a-half: convolutional network for biomedical image segmentation using multiple expert-driven annotations
Goyal et al. An efficient medical assistive diagnostic algorithm for visualisation of structural and tissue details in CT and MRI fusion
Chatterjee et al. MICDIR: Multi-scale inverse-consistent deformable image registration using UNetMSS with self-constructing graph latent
CN115965785A (en) Image segmentation method, device, equipment, program product and medium
CN112950654B (en) Brain tumor image segmentation method based on multi-core learning and super-pixel nuclear low-rank representation
Zhou et al. GMRE-iUnet: Isomorphic Unet fusion model for PET and CT lung tumor images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination