CN116503515A - Brain lesion image generation method and system based on text and image multimodality
- Publication number
- CN116503515A (application number CN202310463730.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- text
- model
- brain
- lesion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multimodal text-and-image brain lesion image generation method and system, comprising: a data collection module, which collects public datasets of brain lesions and pairs each image with a text template to form a text-image template dataset; a fine-tuning module, which encodes the image and the text separately into a common embedding space and matches them by computing similarity; a data expansion module, which uses the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt; and an annotated-image generation module, which edits a target image with the cyclically fine-tuned model, given a mask that specifies the editing region and a text description, to generate a target lesion, thereby producing a labeled lesion image. The advantages of the invention are that it processes image and text information simultaneously, generates more realistic, accurate and diverse brain lesion images, and realizes the unsupervised brain lesion segmentation task more efficiently, accurately and diversely.
Description
Technical Field
The invention relates to the technical field of artificial intelligence generation of medical images, and in particular to a deep-learning-based multimodal text-and-image brain lesion image generation method and system.
Background
A brain lesion image is an image of an abnormal lesion found by examining the human brain with imaging techniques such as magnetic resonance imaging (MRI) and computed tomography (CT). Doctors use these images to diagnose and treat brain diseases. Common brain lesions include cerebral hemorrhage, cerebral infarction, tumors, and the like. With computer-aided diagnosis (CAD) and artificial intelligence techniques, brain lesions can be identified and localized more accurately, providing important references for doctors when formulating treatment plans.
The prior art proposes deep-learning-based brain disease image generation models such as GANs (Generative Adversarial Networks) [1][2]. These models are typically based on a single data source, such as brain MRI images, to generate brain lesion images. However, these methods have several drawbacks. First, they cannot exploit other available data sources, such as lesion description text, to improve the accuracy and realism of the generated images, nor can they inject prior information through a text description. Second, these methods typically require a large amount of training data, which is often difficult to obtain, especially for rare brain diseases. Moreover, GAN models are highly sensitive to hyperparameters, their training is unstable, and the diversity of the generated images leaves room for improvement.
In addition, the prior art also proposes GAN-based multimodal image generation models [3], such as text-to-image generation models, which combine text and image information to generate an image. However, these models are often not well suited to generating brain disease images, because the pathological features of brain diseases differ from those of ordinary objects and require specialized models to handle finer textures. Most importantly, these generative models provide labels neither for the generated image nor for the edited lesion region, and therefore cannot create the labeled image pairs needed to train a segmentation model.
From the foregoing, prior art models are typically based on a single data source and struggle to generate realistic, accurate brain lesion images from combined text and image multimodal information. Furthermore, the prior art requires large amounts of training data and computational resources to train a model, which does not suit the special needs of the medical imaging field; merely generating images, without producing labels for the edited regions (e.g., added lesions), cannot yield the paired images and labels required to train a segmentation model.
[1] Han, Changhee, et al. "GAN-based synthetic brain MR image generation." 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018.
[2] Huang, Yawen, et al. "MCMT-GAN: Multi-task coherent modality transferable GAN for 3D brain image synthesis." IEEE Transactions on Image Processing 29 (2020): 8187-8198.
[3] Zhu, Minfeng, et al. "DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multimodal text-and-image brain lesion image generation method and system that can process image and text information simultaneously. The method uses the multimodal information of text and image, together with a specified editing region, to generate more realistic and accurate labeled brain lesion images, while reducing the requirements on training data and computational resources.
In order to achieve the above object, the present invention adopts the following technical scheme:
A multimodal text-and-image brain lesion image generation method, comprising the following steps:
Step 1, collect public datasets of brain lesions, pair each image with a text template, and fill the reserved slots in the template according to each image sample, forming a text-image template dataset.
Step 2, use the paired text-image template dataset to fine-tune the CLIP module of DALLE2 so as to align the image and text feature spaces.
Step 3, use the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt. The CLIP module encodes the prompt into the text feature space, a prior model converts the text features into image features, and a decoder generates an image with the required semantics from those features. The generated images serve as expansion data, further satisfying the DALLE2 model's demand for data on top of the limited public datasets.
Step 4, use the paired image-text data generated in Step 3, together with the original text-image template dataset of Step 1, to cyclically fine-tune the model again.
Step 5, edit a target image with the cyclically fine-tuned model: provide a mask specifying the region of the image to edit and input a text description to generate a target lesion, thereby producing a labeled lesion image for subsequent segmentation tasks.
Further, the fine-tuning in Step 2 is specifically: a model pre-trained on large-scale text-image pairs is loaded as the initialization, and training is continued on it with the locally constructed text-image data of brain lesion images, completing the fine-tuning of the model weights.
Further, the cyclic fine-tuning in Step 4 is specifically: the preliminarily fine-tuned model of Step 2 is taken as the initialization, and the model is trained again with the expansion data generated in Step 3 together with the original data, completing a second fine-tuning of the model parameters.
The invention further discloses a brain lesion image generation system that can be used to implement the above brain lesion image generation method, and that specifically comprises: a data collection module, a fine-tuning module, a data expansion module, and an annotated-image generation module.
Data collection module: collects public datasets of brain lesions and pairs each image with a text template to form a text-image template dataset.
Fine-tuning module: encodes the image and the text separately, i.e., converts both into a common embedding space, and matches them by computing the similarity between them.
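As an illustration only, the following minimal sketch (Python/PyTorch; the `image_encoder` and `text_encoder` names are placeholders standing in for the CLIP image and text towers, not part of the disclosure) shows how the module's matching by similarity in a common embedding space can be computed:

```python
import torch
import torch.nn.functional as F

def match(image_encoder, text_encoder, images, tokenized_texts):
    """Embed images and texts into the shared space and score all pairs.

    image_encoder / text_encoder are assumed to map their inputs to
    feature vectors of the same dimension, as the CLIP towers do.
    """
    img_emb = F.normalize(image_encoder(images), dim=-1)          # (N, D), unit norm
    txt_emb = F.normalize(text_encoder(tokenized_texts), dim=-1)  # (M, D), unit norm
    sim = img_emb @ txt_emb.t()        # (N, M) matrix of cosine similarities
    best_text = sim.argmax(dim=1)      # index of the best-matching text per image
    return sim, best_text
```

Normalizing both embeddings makes the dot product equal to the cosine similarity, which is the matching criterion used throughout the fine-tuning below.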
Data expansion module: uses the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt.
Annotated-image generation module: edits a target image with the cyclically fine-tuned model, given a mask that specifies the editing region and a text description, to generate a target lesion, thereby producing a labeled lesion image.
The invention also discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the above brain lesion image generation method.
Compared with the prior art, the invention has the following advantages:
1. It exploits the strengths of large multimodal models (such as CLIP and DALLE2), processes image and text information simultaneously, and can use language priors to guide image generation on demand.
2. It generates more realistic, accurate and diverse brain lesion images, realizes annotation-free generation of lesion-segmentation image pairs, and provides better tools and support for medical diagnosis and research.
3. It is better adapted to the brain lesion image generation task.
4. It adopts a state-of-the-art diffusion generation model (DALLE2) to produce highly realistic and diverse brain lesion images with corresponding lesion labels, and uses the generated image-label pairs to train segmentation models, realizing the unsupervised brain lesion segmentation task.
5. Using the large multimodal model DALLE2, a state-of-the-art diffusion generation model, the method generates brain lesion image-annotation pairs more efficiently, accurately and diversely, thereby better supporting medical lesion segmentation, diagnosis and research.
Drawings
Fig. 1 is a flowchart of the brain lesion image generation method according to an embodiment of the invention.
Fig. 2 is a flowchart of the fine-tuning process according to an embodiment of the invention.
Fig. 3 is a flowchart of generating a labeled lesion image according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, a multimodal text-and-image brain lesion image generation method includes the following steps:
step1, collecting relevant public data sets of brain focus, such as brain tumor segmentation data set BraTS, old cerebral apoplexy segmentation data ATLAS, ischemic cerebral apoplexy data set ISLES, multiple sclerosis focus segmentation data set MSSEG, etc. And each image is provided with a text template, for example: [ Stroke ] with [ hyper ] intensity on [ DWI ] module ] fills [ ] locations in the text template from each image sample, forming a text-image template dataset.
Step 2, use the paired text-image template dataset to fine-tune the CLIP module of DALLE2 so as to align the image and text feature spaces. Specifically, a model pre-trained on large-scale text-image pairs is loaded as the initialization, and training is continued on it with the locally constructed text-image data of brain lesion images, completing the fine-tuning of the model weights.
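A sketch of one such fine-tuning round is given below, using the symmetric contrastive loss with which CLIP is trained; the use of the open_clip library, the "ViT-B-32" checkpoint, and the learning rate are assumptions for illustration, not requirements of the method:

```python
import torch
import torch.nn.functional as F
import open_clip  # assumed available; any CLIP implementation exposing
                  # encode_image / encode_text would work the same way

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")          # pre-trained initialization
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def clip_finetune_step(images, captions):
    """One contrastive step on a batch of brain-lesion text-image pairs."""
    img = F.normalize(model.encode_image(images), dim=-1)
    txt = F.normalize(model.encode_text(tokenizer(captions)), dim=-1)
    logits = model.logit_scale.exp() * img @ txt.t()   # (B, B) similarities
    labels = torch.arange(len(images))                 # matched pairs lie on the diagonal
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2   # symmetric CLIP loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The symmetric loss pulls matched brain-image/caption pairs together in the shared space, which is exactly the alignment that the prior and decoder of Step 3 rely on.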
Step 3, use the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt. Specifically, the CLIP module encodes the prompt into the text feature space, a prior model converts the text features into image features, and a decoder generates an image with the required semantics from those features. Since DALLE2 generates through a diffusion model, the same model with fixed weight parameters produces different outputs under different initialization noise, so the corresponding generated data also differ from one another. The generated images therefore amount to a data expansion, further satisfying the DALLE2 model's demand for data on top of the limited public datasets.
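The text-to-image path of this step can be sketched as follows; the `prior` and `decoder` callables stand in for the diffusion prior and decoder of a DALLE2-style implementation, and their interfaces here are assumed for illustration only:

```python
import torch

@torch.no_grad()
def generate_variants(model, tokenizer, prior, decoder, prompt, n_variants=4):
    """Generate n image variants for one text prompt (DALLE2-style pipeline).

    prior:   assumed callable mapping CLIP text features -> CLIP image features
    decoder: assumed callable mapping CLIP image features -> pixel images
    Because both stages are diffusion models, different initial noise yields
    different images for the same fixed weights, which is what makes the
    outputs usable as expansion data.
    """
    text_feat = model.encode_text(tokenizer([prompt] * n_variants))
    img_feat = prior(text_feat)    # text feature space -> image feature space
    images = decoder(img_feat)     # image features -> images carrying the
    return images                  #   semantics required by the prompt

# Hypothetical usage with the fine-tuned CLIP from Step 2:
# expanded = generate_variants(model, tokenizer, prior, decoder,
#                              "[Stroke] with [hyper] intensity on [DWI] modality")
```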
Step 4, use the paired image-text data generated in Step 3, together with the original text-image template dataset of Step 1, to cyclically fine-tune the model again. Specifically, the preliminarily fine-tuned model of Step 2 serves as the initialization of this step; that is, starting from its parameter weights, the model is trained again with the expansion data generated in Step 3 together with the original data, completing a second fine-tuning of the model parameters.
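A minimal sketch of this cyclic round, assuming the original and generated pairs are wrapped in PyTorch datasets of the same form and reusing the fine-tuning step sketched under Step 2:

```python
from torch.utils.data import ConcatDataset, DataLoader

# original_pairs:  the Step-1 text-image template dataset (assumed Dataset)
# generated_pairs: the Step-3 expansion data in the same Dataset form
cyclic_data = ConcatDataset([original_pairs, generated_pairs])
loader = DataLoader(cyclic_data, batch_size=32, shuffle=True)

# Starting from the weights of the preliminarily fine-tuned model (not from
# scratch), run the same contrastive step over the enlarged dataset:
for images, captions in loader:
    clip_finetune_step(images, captions)   # second fine-tuning of the weights
```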
Step 5, edit the target image with the cyclically fine-tuned model, providing a mask that specifies the editing region and a text description to generate the target lesion, thereby producing a labeled lesion image that can be used for subsequent segmentation tasks.
Specifically, the details of the fine-tuning of the CLIP module and the diffusion model in Step 2 are shown in Fig. 2. Matching is performed by encoding the image and the text separately, i.e., converting both into a common embedding space, and then computing the similarity between them.
In Step 5, as shown in Fig. 3, the input text (marked (2) in Fig. 3) is "A tumor at axial view with hyper-intensity", and a binary mask (the label indicated by the arrow at (3) in Fig. 3) specifies a region near the ventricle of the input image to be edited (marked (1) in Fig. 3); from these, the model generates an image containing the lesion (marked (3) in Fig. 3).
As can be seen from Fig. 3, for the same input image, different language descriptions and specified editing regions can be applied to generate lesion images on demand, while all features and details of the original image outside the editing region are preserved. The generated sample data and its labeled region can be used for subsequent segmentation tasks, greatly reducing, and even removing, the dependence on annotated data.
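One plausible realization of this mask-guided editing is RePaint/GLIDE-style diffusion inpainting, sketched below: at each denoising step, the region outside the mask is reset to a correspondingly noised copy of the original image, so only the masked region is synthesized from the text while everything outside it is preserved. The `denoise_step` and `add_noise` interfaces are assumptions standing in for a concrete diffusion decoder:

```python
import torch

@torch.no_grad()
def inpaint_lesion(decoder, image, mask, text_feat, num_steps=50):
    """Generate a lesion inside `mask` (1 = editable region), conditioned on
    text features, while keeping the original image outside the mask.

    decoder.denoise_step(x_t, t, cond) -> x_{t-1}   (assumed interface)
    decoder.add_noise(x_0, t)          -> noised x  (assumed interface)
    """
    x = torch.randn_like(image)                          # start from pure noise
    for t in reversed(range(num_steps)):
        x = decoder.denoise_step(x, t, cond=text_feat)   # text-guided denoising
        known = decoder.add_noise(image, t)              # original at noise level t
        x = mask * x + (1 - mask) * known                # keep unedited region intact
    label = mask    # the editing mask directly serves as the lesion label
    return x, label
```

Because the unedited region is copied from the original at every step, the output differs from the input only inside the mask, which is what allows the mask to double as a pixel-accurate lesion annotation for the segmentation task.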
In yet another embodiment of the invention, a brain lesion image generation system is provided that can be used to implement the above brain lesion image generation method, specifically comprising: a data collection module, a fine-tuning module, a data expansion module, and an annotated-image generation module;
the data collection module collects public datasets of brain lesions and pairs each image with a text template to form a text-image template dataset;
the fine-tuning module encodes the image and the text separately, i.e., converts both into a common embedding space, and matches them by computing the similarity between them;
the data expansion module uses the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt;
the annotated-image generation module edits a target image with the cyclically fine-tuned model, given a mask that specifies the editing region and a text description, to generate a target lesion, thereby producing a labeled lesion image.
In a further embodiment of the invention, a storage medium is provided, in particular a computer-readable storage medium (memory), which is a memory device in a terminal device for storing programs and data. The computer-readable storage medium here may include both a built-in storage medium of the terminal device and an extended storage medium supported by the terminal device. The computer-readable storage medium provides a storage space that stores the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are adapted to be loaded and executed by a processor. The computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory.
One or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to implement the corresponding steps of the brain lesion image generation method in the above embodiments; the one or more instructions are loaded by the processor and perform the following steps:
Step 1, collect public datasets of brain lesions, pair each image with a text template, and fill the reserved slots in the template according to each image sample, forming a text-image template dataset.
Step 2, use the paired text-image template dataset to fine-tune the CLIP module of DALLE2 so as to align the image and text feature spaces.
Step 3, use the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt. The CLIP module encodes the prompt into the text feature space, a prior model converts the text features into image features, and a decoder generates an image with the required semantics from those features. The generated images serve as expansion data, further satisfying the DALLE2 model's demand for data on top of the limited public datasets.
Step 4, use the paired image-text data generated in Step 3, together with the original text-image template dataset of Step 1, to cyclically fine-tune the model again.
Step 5, edit a target image with the cyclically fine-tuned model, provide a mask specifying the region of the image to edit, and input a text description to generate a target lesion, thereby producing a labeled lesion image for subsequent segmentation tasks.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to aid the reader in understanding the practice of the invention and that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.
Claims (5)
1. A multimodal text-and-image brain lesion image generation method, characterized in that it comprises the following steps:
Step 1, collect public datasets of brain lesions, pair each image with a text template, and fill the reserved slots in the template according to each image sample, forming a text-image template dataset;
Step 2, use the paired text-image template dataset to fine-tune the CLIP module of DALLE2 so as to align the image and text feature spaces;
Step 3, use the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt; the CLIP module encodes the prompt into the text feature space, a prior model converts the text features into image features, and a decoder generates an image with the required semantics from those features; the generated images serve as expansion data, further satisfying the DALLE2 model's demand for data on top of the limited public datasets;
Step 4, use the paired image-text data generated in Step 3, together with the original text-image template dataset of Step 1, to cyclically fine-tune the model again;
Step 5, edit a target image with the cyclically fine-tuned model, provide a mask specifying the region of the image to edit, and input a text description to generate a target lesion, thereby producing a labeled lesion image for subsequent segmentation tasks.
2. The multimodal text-and-image brain lesion image generation method according to claim 1, characterized in that the fine-tuning in Step 2 is specifically: loading a model pre-trained on large-scale text-image pairs as the initialization, and continuing training on it with the locally constructed text-image data of brain lesion images, completing the fine-tuning of the model weights.
3. The multimodal text-and-image brain lesion image generation method according to claim 1, characterized in that the cyclic fine-tuning in Step 4 is specifically: taking the preliminarily fine-tuned model of Step 2 as the initialization, and training the model again with the expansion data generated in Step 3 together with the original data, completing a second fine-tuning of the model parameters.
4. A brain lesion image generation system, characterized in that the system can be used to implement the brain lesion image generation method of any one of claims 1 to 3;
the brain lesion image generation system comprises: a data collection module, a fine-tuning module, a data expansion module, and an annotated-image generation module;
the data collection module collects public datasets of brain lesions and pairs each image with a text template to form a text-image template dataset;
the fine-tuning module encodes the image and the text separately, i.e., converts both into a common embedding space, and matches them by computing the similarity between them;
the data expansion module uses the fine-tuned DALLE2 model to perform image generation and variant generation from the required text prompt;
the annotated-image generation module edits a target image with the cyclically fine-tuned model, given a mask that specifies the editing region and a text description, to generate a target lesion, thereby producing a labeled lesion image.
5. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the brain lesion image generation method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310463730.4A | 2023-04-26 | 2023-04-26 | Brain lesion image generation method and system based on text and image multimodality
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310463730.4A | 2023-04-26 | 2023-04-26 | Brain lesion image generation method and system based on text and image multimodality
Publications (1)
Publication Number | Publication Date
---|---
CN116503515A | 2023-07-28
Family
ID=87327980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310463730.4A | CN116503515A (pending) | 2023-04-26 | 2023-04-26
Country Status (1)
Country | Link
---|---
CN | CN116503515A (pending)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116775960A (en) * | 2023-08-23 | 2023-09-19 | 成都安哲斯生物医药科技有限公司 | Multi-mode medical data question-answering method and storage medium |
CN116775960B (en) * | 2023-08-23 | 2023-10-20 | 成都安哲斯生物医药科技有限公司 | Multi-mode medical data question-answering method and storage medium |
CN117558394A (en) * | 2023-09-28 | 2024-02-13 | 兰州交通大学 | Cross-modal network-based chest X-ray image report generation method |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination