CN117636099B - Medical image and medical report pairing training model - Google Patents

Medical image and medical report pairing training model

Info

Publication number
CN117636099B
CN117636099B (application CN202410090308.3A)
Authority
CN
China
Prior art keywords
medical
medical image
image
report
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410090308.3A
Other languages
Chinese (zh)
Other versions
CN117636099A (en)
Inventor
马韵洁
宋国磊
王飞
王佐成
吴艳平
谢浩天
徐晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Data Space Research Institute
Original Assignee
Data Space Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Data Space Research Institute filed Critical Data Space Research Institute
Priority to CN202410090308.3A priority Critical patent/CN117636099B/en
Publication of CN117636099A publication Critical patent/CN117636099A/en
Application granted granted Critical
Publication of CN117636099B publication Critical patent/CN117636099B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a medical image and medical report pairing training model which is trained on a set of registered medical image and medical report pairs. The training steps of the model are: S1, image encoding; S2, text encoding; S3, attention-weighted image representation; S4, establishment of the training model function. The invention can automatically learn useful feature representations from medical images and medical reports and, through joint learning of the image and report data, can capture the complex relations between them, improving the characterization capability and information-extraction effect of the data. It introduces modern deep learning techniques into the medical field, thereby realizing integrated analysis of multi-modal medical data and providing more accurate and comprehensive support for clinical diagnosis and disease monitoring.

Description

Medical image and medical report pairing training model
Technical Field
The invention relates to the technical field of medical information processing, in particular to a medical image and medical report pairing training model.
Background
Image and text data accumulation in the medical field has increased dramatically, and these data sources cover a wide variety of information from X-ray and MRI scans to clinical reports and medical records. These data are both rich in anatomical and pathological features and bear the clinical experience and professional judgment of the physician. However, the efficient use of such data, particularly in combining medical images with text data, remains one of the important challenges facing the medical field.
Interpretation of medical images is critical to accurately conduct clinical diagnosis. However, due to the variety and complexity of medical image data, accurate identification of lesions, localization of abnormalities, and analysis of anatomical structures requires a great deal of experience and expertise from a physician. In addition, medical texts contain a large amount of important information about patient condition, treatment regimen, and doctor diagnosis. However, there are challenges in correlating and combining these two data types to extract more comprehensive information.
In recent years, the application of deep learning techniques in the fields of medical images and texts is increasing. However, current deep learning methods focus mainly on single modality data analysis, ignoring the rich correlation between medical images and text. To fully exploit multimodal data, in particular medical image and text pairing data, we need an innovative approach to achieve joint analysis and comprehensive interpretation.
Disclosure of Invention
In order to solve the problems, the invention provides a medical image and medical report pairing training model, which is realized by the following technical scheme.
A medical image and medical report pairing training model employing a set of registered medical images and medical reports {(a_i, b_i) | i = 1, …, K} to achieve training, wherein K represents the number of paired medical images and medical reports;
wherein a_i and b_i represent a medical image and a medical report, respectively, W and H represent the width and height of the medical image, respectively, and C represents the number of color channels of the medical image source file;
wherein a_i ∈ R^(W×H×C);
the training steps of the medical image and medical report pairing training model are as follows:
s1, image coding, namely partitioning a medical image, and coding sub-regions to obtain feature vectors of the sub-regions;
s2, text coding, namely extracting entity information in the medical report to code so as to acquire embedded representation of the entity information;
s3, attention weighted image representation, namely weighting the subareas of the medical image according to the importance of each subarea in the medical image relative to each medical report to obtain the final representation of the medical image;
s4, building a training model function.
Preferably, in the step S1, a target detection segmentation model is used to identify the key entity regions and the weak semantic feature regions in the medical image, and a ResNet-50 model is used as an encoder to encode the key entity regions and the weak semantic feature regions respectively, obtaining f = {f_1, …, f_M} and f′, wherein f represents the feature vectors of the key entity regions and f′ represents the feature vectors of the weak semantic feature regions;
wherein M represents the number of key entity regions on each medical image, and M is 5;
the global features extracted by the final adaptive average pooling layer of the ResNet-50 model are denoted as f_g.
Preferably, in the step S2, entity information is extracted from the medical report using the existing MetaMap model; the BioClinicalBERT model is then used as an encoder to encode the extracted entity information, obtaining embedded representations of the entity information and of the overall report, comprising:
mapping the representations into 128-dimensional feature vectors by a projection mapping.
Preferably, in said step S3, the final representation A_i of the medical image is:
A_i = λ1·F_i + λ2·f_g,
wherein F_i is the attention-weighted representation of the key entity regions of the i-th medical image, λ1 and λ2 are hyper-parameters, and λ1+λ2=1.
Preferably, the attention-weighted representation is
F_i = Σ_{j=1..M} α_ij · f_j^i,
wherein F_i is the attention weighting of the key entity regions on the i-th medical image based on the medical report; α_ij is the attention weight, representing the influence of the entity information of the i-th medical report on the j-th key entity region of the i-th medical image; and f_j^i is the feature vector of the j-th key entity region on the medical image paired with the i-th medical report.
Preferably, the attention weights are calculated as
α_ij = exp(s_ij/τ) / Σ_{j'=1..M} exp(s_ij'/τ),
wherein τ is a hyper-parameter and s_ij is the similarity between the embedded representation t_i of the entity information of the i-th report and the j-th sub-region in the i-th medical image, computed as the dot product
s_ij = t_i^T · f_j^i,
wherein t_i^T represents the transpose of the vector t_i.
Preferably, the training model is optimized using the following loss function:
wherein w(·) is a weight function, z is taken to be the feature vector of either a positive sample or a weak semantic negative sample, and τ is a hyper-parameter.
Preferably, in the step S4, each medical image and its corresponding medical report are used as a positive sample pair, and each medical image and the other medical reports are used as negative sample pairs; the noise-contrastive estimation loss function in contrastive learning is finally obtained, which is the function of the training model:
wherein a hyper-parameter is used to control the weight of the weak semantic negative samples, and neg represents the set of negative sample pairs formed by each image with the other examination reports.
The invention can automatically learn useful feature representations from medical images and medical reports and, through joint learning of the image and report data, can capture the complex relations between them, improving the characterization capability and information-extraction effect of the data. It introduces modern deep learning techniques into the medical field, thereby realizing integrated analysis of multi-modal medical data and providing more accurate and comprehensive support for clinical diagnosis and disease monitoring. The invention has the following beneficial effects:
extraction of weak semantic features: by extracting image regions with weak semantic information. The method is beneficial to better capturing local low-level features by the model and improving the performance of the model in processing scenes such as medical images.
Consider local features: the generated weak semantic negative sample contains local features and other texture features of the target, so that the local structure of the medical image can be more comprehensively described, and the model can more accurately understand the details of the image.
Enhancing semantic information: the medical report is embedded in combination with the medical image representation, the image representation is weighted with an attention mechanism, and important areas related to the text entities are further extracted. This helps the model to better capture semantic information in the medical image.
Enhanced image-report association: by computing the similarity between the medical image sub-regions and the medical report, an attention weighted image representation is generated. This helps to strengthen the association between the image and the report, improving the similarity calculation capability of the model between the medical report and the image.
Contrast learning framework: the model is trained by comparing the differences between positive and negative samples using a contrast learning framework. This allows the model to better distinguish between different samples in the feature space, improving the robustness and performance of the model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the specific embodiments will be briefly described below, it being obvious that the drawings in the following description are only some examples of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1: the invention relates to a process for establishing a medical image and medical report pairing training model;
fig. 2: the medical image processing flow comprises a medical image processing flow;
fig. 3: the invention relates to a medical report processing flow.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in figures 1-3 of the drawings,
example 1
A medical image and medical report pairing training model employing a set of registered medical images and medical reports {(a_i, b_i) | i = 1, …, K} to achieve training, wherein K represents the number of paired medical images and medical reports;
wherein a_i and b_i represent a medical image and a medical report, respectively, W and H represent the width and height of the medical image, respectively, and C represents the number of color channels of the medical image source file;
wherein a_i ∈ R^(W×H×C);
the training steps of the medical image and medical report pairing training model are as follows:
s1, image coding, namely partitioning a medical image, and coding sub-regions to obtain feature vectors of the sub-regions;
s2, text coding, namely extracting entity information in the medical report to code so as to acquire embedded representation of the entity information;
s3, attention weighted image representation, namely weighting the subareas of the medical image according to the importance of each subarea in the medical image relative to each medical report to obtain the final representation of the medical image;
s4, building a training model function.
According to the invention, the medical report can be embedded into the medical image representation, the image representation is weighted, and the important regions related to the entity information in the medical report are extracted, so that the semantic information in the medical image is better captured. Weighting the sub-regions of the medical image in this way helps strengthen the association between the medical image and the medical report and improves the model's ability to compute the similarity between them.
Example 2
In step S1, a target detection segmentation model is used to identify the key entity regions and the weak semantic feature regions in the medical image, and a ResNet-50 model is used as an encoder to encode the key entity regions and the weak semantic feature regions respectively, obtaining f = {f_1, …, f_M} and f′, wherein f represents the feature vectors of the key entity regions and f′ represents the feature vectors of the weak semantic feature regions;
wherein M represents the number of key entity regions on each medical image, and M is 5; the fixed value of M can be set manually according to the task requirements, and is set to 5 in the invention, i.e. five key entity regions are selected on each medical image.
The global features extracted by the final adaptive average pooling layer of the ResNet-50 model are denoted as f_g.
In the application, the target detection segmentation model adopts an existing model such as Faster R-CNN, YOLO, or U-Net; these models can identify key entity regions in medical images and help locate anatomical structures and lesion areas of interest. The ResNet-50 model is then used as an encoder to encode the key entity regions, obtaining their feature vectors f.
In the medical image field, weak semantic feature regions refer to those areas in the image that may not be major lesions or structures, such as blood vessels, bones, organ boundaries, textures, and the color distribution of the image. These regions are identified with the target detection segmentation model and encoded by the ResNet-50 model to obtain the feature vectors f′ of the weak semantic feature regions.
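As an illustration of step S1, the NumPy sketch below mimics the region and global feature extraction. In practice the encoder would be a pretrained ResNet-50 and the boxes would come from the detection/segmentation model; the toy encoder (a per-channel spatial mean, which is also exactly what an adaptive average pool to 1×1 computes), the example boxes, and all shapes here are illustrative assumptions.

```python
import numpy as np

def adaptive_avg_pool(feature_map):
    """Global feature: mean over spatial dimensions, (C, H, W) -> (C,).
    This is what an adaptive average pooling layer to 1x1 computes."""
    return feature_map.mean(axis=(1, 2))

def encode_regions(image, boxes, encoder):
    """Encode each cropped region with the given encoder.
    image: (C, H, W); boxes: list of (x0, y0, x1, y1); returns (M, D)."""
    feats = []
    for (x0, y0, x1, y1) in boxes:
        region = image[:, y0:y1, x0:x1]   # crop the key entity region
        feats.append(encoder(region))
    return np.stack(feats)

# Toy stand-in for the ResNet-50 encoder (assumption, demo only).
toy_encoder = adaptive_avg_pool

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 64, 64))       # toy image, C=3
boxes = [(0, 0, 16, 16), (16, 16, 48, 48)]     # hypothetical key entity regions
f = encode_regions(image, boxes, toy_encoder)  # region features, shape (2, 3)
f_g = adaptive_avg_pool(image)                 # global feature f_g, shape (3,)
```

With a real ResNet-50, `encoder` would return the pooled backbone features of each crop instead of a per-channel mean, but the data flow is the same.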
Example 3
In step S2, entity information is extracted from the medical report using the existing MetaMap model; the BioClinicalBERT model is then used as an encoder to encode the extracted entity information, obtaining embedded representations of the entity information and of the overall report, comprising:
mapping the representations into 128-dimensional feature vectors by a projection mapping.
This process helps better capture key semantic information in medical reports.
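The projection mapping of step S2 can be sketched as a linear map followed by L2 normalization, so that image and text embeddings live in the same 128-dimensional space. The 768-dimensional input (a BERT-style hidden size) and the random weights are assumptions; the patent only specifies the 128-dimensional output.

```python
import numpy as np

def project(x, W, b):
    """Projection head: linear map into a shared 128-d embedding space,
    followed by L2 normalization of each row."""
    z = x @ W + b
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_in, d_out = 768, 128                       # 768 = assumed BERT hidden size
W = rng.standard_normal((d_in, d_out)) * 0.02  # toy weights (learned in practice)
b = np.zeros(d_out)
entity_emb = rng.standard_normal((4, d_in))  # 4 toy entity embeddings
t = project(entity_emb, W, b)                # shape (4, 128), unit norm
```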
Further, in step S3, the final representation A_i of the medical image is:
A_i = λ1·F_i + λ2·f_g,
wherein F_i is the attention-weighted representation of the key entity regions of the i-th medical image, λ1 and λ2 are hyper-parameters, and λ1+λ2=1.
Further, the attention-weighted representation is
F_i = Σ_{j=1..M} α_ij · f_j^i,
wherein F_i is the attention weighting of the key entity regions on the i-th medical image based on the medical report; α_ij is the attention weight, representing the influence of the entity information of the i-th medical report on the j-th key entity region of the i-th medical image; and f_j^i is the feature vector of the j-th key entity region on the medical image paired with the i-th medical report.
Further, the attention weights are calculated as
α_ij = exp(s_ij/τ) / Σ_{j'=1..M} exp(s_ij'/τ),
wherein τ is a hyper-parameter and s_ij is the similarity between the embedded representation t_i of the entity information of the i-th report and the j-th sub-region in the i-th medical image, computed as the dot product
s_ij = t_i^T · f_j^i,
wherein t_i^T represents the transpose of the vector t_i.
The entity information is defined in conformity with the medical language system, i.e., it is the keyword information in the medical report.
In the present embodiment it should be noted that, unlike natural images, the regions of interest in medical images are often indicated only by subtle visual cues. Using global features alone may not adequately capture these regions of interest, so a different approach is adopted: a learned attention mechanism weights the key entity regions of different medical images according to their importance to the given entity information.
To generate an attention-weighted image representation based on the entity information, the similarity between all key entity regions and the entity information is first calculated using the dot-product similarity formula:
For each medical report, an attention-weighted image representation is then calculated based on its similarity to all the key entity regions in the paired medical image.
The attention weighting reflects the impact of the medical report on the different key entity regions.
The final representation of the image is finally obtained:
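The similarity, attention weighting, and final-representation steps above can be sketched as follows. The temperature `tau` and the values of `lam1`/`lam2` are placeholder assumptions (the patent only requires λ1+λ2=1), and a softmax over dot-product similarities is one standard reading of the attention formula.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention_weighted_image(t_i, f_regions, f_g, tau=0.1, lam1=0.7, lam2=0.3):
    """t_i: (D,) report/entity embedding; f_regions: (M, D) key entity
    region features; f_g: (D,) global feature.
    Returns A_i = lam1 * F_i + lam2 * f_g and the attention weights."""
    s = f_regions @ t_i          # dot-product similarities s_ij, shape (M,)
    alpha = softmax(s / tau)     # attention weights alpha_ij, sum to 1
    f_att = alpha @ f_regions    # attention-weighted representation F_i
    return lam1 * f_att + lam2 * f_g, alpha

rng = np.random.default_rng(0)
t_i = rng.standard_normal(128)             # toy report embedding
f_regions = rng.standard_normal((5, 128))  # M = 5 key entity regions
f_g = rng.standard_normal(128)             # global feature
A_i, alpha = attention_weighted_image(t_i, f_regions, f_g)
```

Regions most similar to the report embedding dominate `alpha` and hence `A_i`, which is how the text steers the image representation toward clinically relevant sub-regions.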
example 4
The training model is optimized using the following loss function:
wherein w(·) is a weight function that dynamically adjusts the weights of the positive samples and the weak semantic negative samples; it can be calculated based on the distance between the feature vectors. z is taken to be the feature vector of either a positive sample or a weak semantic negative sample, and τ is a hyper-parameter.
The purpose of this loss function is to learn a better feature representation of the medical image.
Because the weights can be dynamically adjusted according to the feature distance between samples, the model can learn weak semantic features in a more targeted way. This helps increase the model's sensitivity to weak semantic information and thus improves contrastive learning and feature learning. With this loss function, the back-propagation algorithm can be better utilized to optimize the model parameters.
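The patent does not give the exact form of the weight function w(·); the sketch below uses a hypothetical Gaussian kernel over the Euclidean distance between feature vectors, purely to illustrate the stated idea that the weight can be adjusted dynamically from feature distance.

```python
import numpy as np

def distance_weight(z1, z2, sigma=1.0):
    """Hypothetical weight function (assumption): a Gaussian kernel over
    the Euclidean distance, so closer feature vectors get larger weight.
    sigma is an illustrative bandwidth hyper-parameter."""
    d = np.linalg.norm(z1 - z2)
    return np.exp(-d**2 / (2 * sigma**2))

anchor = np.array([1.0, 0.0])
near = np.array([0.9, 0.1])    # feature vector close to the anchor
far = np.array([-1.0, 0.0])    # feature vector far from the anchor
w_near = distance_weight(anchor, near)
w_far = distance_weight(anchor, far)  # smaller than w_near
```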
Example 5
In step S4, each medical image and its corresponding medical report are used as a positive sample pair, and each medical image and the other medical reports are used as negative sample pairs; the noise-contrastive estimation loss function in contrastive learning is finally obtained, which is the function of the training model:
wherein a hyper-parameter is used to control the weight of the weak semantic negative samples, and neg represents the set of negative sample pairs formed by each image with the other examination reports.
This function serves to optimize the pairing training of the medical image and the medical report, so as to meet the requirements of subsequent automated medical image interpretation tasks.
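A minimal sketch of the noise-contrastive estimation objective described above, assuming in-batch negatives and cosine similarity with a temperature; the extra hyper-parameter that re-weights weak semantic negatives is omitted here for brevity.

```python
import numpy as np

def info_nce_loss(img_emb, rep_emb, tau=0.07):
    """NCE-style contrastive loss over a batch: row i of img_emb pairs
    with row i of rep_emb (positive pair); all other reports in the
    batch act as negatives. tau is a temperature hyper-parameter."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    rep = rep_emb / np.linalg.norm(rep_emb, axis=1, keepdims=True)
    logits = img @ rep.T / tau                   # (K, K) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # -log p(correct report)

rng = np.random.default_rng(0)
K, D = 8, 128                                    # toy batch of K pairs
rep = rng.standard_normal((K, D))
aligned = rep + 0.01 * rng.standard_normal((K, D))  # images near own reports
shuffled = rng.standard_normal((K, D))              # unrelated images
loss_aligned = info_nce_loss(aligned, rep)
loss_random = info_nce_loss(shuffled, rep)  # well-aligned pairs score lower loss
```

Minimizing this loss pulls each image embedding toward its paired report and pushes it away from the other reports in the batch, which is the behavior the patent's step S4 relies on.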
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (5)

1. A medical image and medical report pairing training model, characterized in that the model employs a set of registered medical images and medical reports {(a_i, b_i) | i = 1, …, K} to achieve training, wherein K represents the number of paired medical images and medical reports;
wherein a_i and b_i represent a medical image and a medical report, respectively, W and H represent the width and height of the medical image, respectively, and C represents the number of color channels of the medical image source file;
wherein a_i ∈ R^(W×H×C);
the training steps of the medical image and medical report pairing training model are as follows:
s1, image coding, namely partitioning a medical image, and coding sub-regions to obtain feature vectors of the sub-regions;
s2, text coding, namely extracting entity information in the medical report to code so as to acquire embedded representation of the entity information;
s3, attention weighted image representation, namely weighting the subareas of the medical image according to the importance of each subarea in the medical image relative to each medical report to obtain the final representation of the medical image;
final representation A of medical image i The method comprises the following steps:
λ1 and λ2 are hyper-parameters and λ1+λ2=1;
to generate an attention weighted image representation based on entity information, we first calculate the similarity of all key entity regions and entity information, using the dot product similarity calculation formula:
wherein,transposition of the representative vector;
wherein,is a super parameter; />Corresponding to->Embedding representation of information of individual reporting entity with the +.sup.th in the ith medical image>Similarity between sub-regions;
for each medical report, we calculate an attention weighted image representation based on its similarity to all key solid regions in the paired medical image
Wherein the method comprises the steps ofWeighting the critical entity area on the ith medical image based on the attention of the medical report; />Is the effect of entity information on the ith medical report on the jth critical entity area on the ith medical image,/->Represents an attention weight; />Is the feature vector of the jth critical entity region on the medical image paired with the ith medical report;
s4, building a training model function.
2. The medical image and medical report pairing training model according to claim 1, wherein in step S1, a target detection segmentation model is used to identify the key entity regions and the weak semantic feature regions in the medical image, and a ResNet-50 model is used as the encoder to encode the key entity regions and the weak semantic feature regions respectively, obtaining f = {f_1, …, f_M} and f′, wherein f represents the feature vectors of the key entity regions and f′ represents the feature vectors of the weak semantic feature regions;
wherein M represents the number of key entity regions on each medical image, and M is 5;
the global features extracted by the final adaptive average pooling layer of the ResNet-50 model are denoted as f_g.
3. The medical image and medical report pairing training model according to claim 2, wherein in step S2, entity information is extracted from the medical report using the existing MetaMap model; the BioClinicalBERT model is then used as the encoder to encode the extracted entity information, obtaining embedded representations of the entity information and of the overall report, comprising:
mapping the representations into 128-dimensional feature vectors by a projection mapping.
4. A medical image and medical report pairing training model according to claim 3, characterized in that the training model is optimized with the following loss function:
wherein w(·) is a weight function, z is taken to be the feature vector of either a positive sample or a weak semantic negative sample, and τ is a hyper-parameter.
5. The medical image and medical report pairing training model according to claim 4, wherein in step S4, each medical image and its corresponding medical report are used as a positive sample pair, and each medical image and the other medical reports are used as negative sample pairs; the noise-contrastive estimation loss function in contrastive learning is finally obtained, which is the function of the training model:
wherein a hyper-parameter is used to control the weight of the weak semantic negative samples, and neg represents the set of negative sample pairs formed by each image with the other examination reports.
CN202410090308.3A 2024-01-23 2024-01-23 Medical image and medical report pairing training model Active CN117636099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410090308.3A CN117636099B (en) 2024-01-23 2024-01-23 Medical image and medical report pairing training model


Publications (2)

Publication Number Publication Date
CN117636099A CN117636099A (en) 2024-03-01
CN117636099B true CN117636099B (en) 2024-04-12

Family

ID=90021849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410090308.3A Active CN117636099B (en) 2024-01-23 2024-01-23 Medical image and medical report pairing training model

Country Status (1)

Country Link
CN (1) CN117636099B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112992308A (en) * 2021-03-25 2021-06-18 腾讯科技(深圳)有限公司 Training method of medical image report generation model and image report generation method
CN115861641A (en) * 2022-10-31 2023-03-28 浙江工业大学 Medical image report generation method based on fine-grained attention
CN115910264A (en) * 2022-11-10 2023-04-04 上海师范大学 Medical image classification method, device and system based on CT and medical report
WO2023204944A1 (en) * 2022-04-19 2023-10-26 Microsoft Technology Licensing, Llc Training of text and image models
CN117391092A (en) * 2023-12-12 2024-01-12 中南大学 Electronic medical record multi-mode medical semantic alignment method based on contrast learning
CN117392473A (en) * 2023-10-30 2024-01-12 齐鲁工业大学(山东省科学院) Interpretable medical image classification system based on multi-modal prototype network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11583239B2 (en) * 2017-03-24 2023-02-21 The United States Of America, As Represented By The Secretary, Department Of Health And Human Service Method and system of building hospital-scale chest X-ray database for entity extraction and weakly-supervised classification and localization of common thorax diseases
US20220122250A1 (en) * 2020-10-19 2022-04-21 Northwestern University Brain feature prediction using geometric deep learning on graph representations of medical image data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text; Zifeng Wang et al.; arXiv:2210.10163; 2022-10-18; pp. 1-12 *

Also Published As

Publication number Publication date
CN117636099A (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant