CN117542469A

CN117542469A - Diagnostic report generation method, system and medium for multi-mode medical data

Info

Publication number: CN117542469A
Application number: CN202311517989.9A
Authority: CN
Inventors: 黄飞跃; 马勇; 徐宇辰; 柏志安
Original assignee: Ruijin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Current assignee: Ruijin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-02-09

Abstract

The disclosure provides a diagnostic report generation method, system and medium for multi-modal medical data. The method comprises the following steps: acquiring a multi-modal training data set comprising a medical image and a text report; splitting the text report according to preset keywords to determine a negative description text and a positive description text; inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model and outputting a loss function; optimizing the pre-trained diagnostic report generation model according to the loss function to determine a multi-modal medical diagnostic report generation model; and inputting a medical image to be detected into the multi-modal medical diagnostic report generation model to generate the corresponding diagnostic report. The method addresses the imbalance of training samples in the model training stage and improves the model's ability to detect abnormal descriptions for medical images.

Description

Diagnostic report generation method, system and medium for multi-mode medical data

Technical Field

The present disclosure relates to the field of computer vision and natural language processing technologies, and in particular, to a method, a system, and a medium for generating a diagnostic report for multimodal medical data.

Background

With advances in medical care and medical imaging technology, medical image data has become an important part of a patient's electronic record. Writing a qualified medical image report requires a doctor to have professional knowledge of medical imaging, clinical knowledge and clinical experience, together with the ability to assess how a lesion changes over time and to make a comprehensive judgment that takes the patient's medical history and other examination results into account.

However, medical image data is now growing exponentially, and the sheer volume places tremendous pressure on clinicians. The medical field has accumulated a large amount of paired <medical image, text report> data, and the images and texts in these pairs are strongly correlated. If a computer-aided diagnosis system built on this mass of hospital data is used to streamline the doctors' workflow, with a computer analyzing the medical images, extracting information from examination reports and automatically generating the image report, so that doctors only need to review and revise the final report, the working pressure on doctors can be greatly relieved. Such a system therefore has important practical value.

When writing an image report, a doctor refers to the features of the medical image, to examination reports and to similar material. The text reports include laboratory test reports, clinical records and the like. Such information is difficult to process comprehensively with traditional techniques; a deep learning method is needed to extract features from the image data and the text data, align them, and generate the image report text from the aligned features. At the same time, image report text contains a large number of statements that describe no abnormality. These are defined as negative descriptions, whereas descriptions of abnormal features in the image, such as lesions, are defined as positive descriptions. The large number of negative descriptions causes an imbalance between positive and negative samples, which lowers the rate at which the final report generation task detects abnormal descriptions.

Disclosure of Invention

In view of the drawbacks in the prior art, an object of the present disclosure is to provide a method, a system and a medium for generating a diagnostic report for multimodal medical data.

To achieve the above object, according to a first aspect of the present disclosure, there is provided a diagnostic report generating method for multimodal medical data, including:

acquiring a multi-modal training dataset, the multi-modal training dataset comprising a medical image and a text report;

splitting the text report according to preset keywords, and determining a negative description text and a positive description text;

inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model, wherein the loss function comprises a negative loss function and a positive loss function;

optimizing the pre-trained diagnostic report generation model according to the loss function, and determining a multi-mode medical diagnostic report generation model;

inputting the medical image to be detected into the multi-mode medical diagnosis report generation model, and generating a diagnosis report corresponding to the medical image to be detected.

Optionally, the pre-trained diagnostic report generation model includes a text editor, an image editor, and a multimodal text generator.

Optionally, the inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generating model, outputting a loss function, and performing model training on the pre-trained diagnostic report generating model includes:

inputting the medical image into the image editor for image feature extraction processing, and outputting the image features of the medical image;

inputting the negative description text and the positive description text into the text editor, and outputting text features of the text report, wherein the text features of the text report comprise negative description text features and positive description text features;

inputting the image features and the text features into the multi-modal text generator, and outputting the negative loss function and the positive loss function.

Optionally, the inputting the image feature and the text feature into the multi-modal text generator outputs the negative loss function and the positive loss function, including:

performing feature alignment processing on the image features and the text features in the same feature space, and determining the image features and the text features subjected to the alignment processing;

generating a diagnosis report corresponding to the medical image according to the image features and the text features subjected to the alignment processing;

and outputting the negative loss function and the positive loss function according to the diagnosis report and the text report corresponding to the medical image.

Optionally, the optimizing the pre-trained diagnostic report generating model according to the loss function determines a multi-mode medical diagnostic report generating model, including:

setting the ratio of the negative description text to the positive description text in the text report according to the negative loss function and the positive loss function, optimizing the parameters of the pre-trained diagnostic report generation model, and determining the multi-modal medical diagnostic report generation model.

Optionally, the pre-trained diagnostic report generation model is pre-trained using text reports that consist entirely of negative description text.

Optionally, the inputting the medical image to be tested into the multi-mode medical diagnostic report generating model, generating the diagnostic report corresponding to the medical image to be tested includes:

inputting the medical image to be detected into the image editor, and outputting the image characteristics of the medical image to be detected;

inputting the image characteristics of the medical image to be detected into the multi-mode text generator to generate a diagnosis report corresponding to the medical image to be detected.

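According to a second aspect of the present disclosure, there is provided a diagnostic report generating system for multimodal medical data, comprising:

an acquisition module for acquiring a multi-modal training dataset, the multi-modal training dataset comprising a medical image and a text report;

a text processing module for splitting the text report according to preset keywords and determining a negative description text and a positive description text;

a model training module for inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model, wherein the loss function comprises a negative loss function and a positive loss function;

a model optimization module for optimizing the pre-trained diagnostic report generation model according to the loss function and determining a multi-mode medical diagnostic report generation model;

a diagnostic report generation module for inputting the medical image to be detected into the multi-mode medical diagnostic report generation model and generating a diagnostic report corresponding to the medical image to be detected.

According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of generating a diagnostic report for multimodal medical data provided by the first aspect of the present disclosure.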

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method for generating a diagnostic report for multimodal medical data provided by the first aspect of the disclosure.

Compared with the prior art, the embodiment of the disclosure has at least one of the following beneficial effects:

According to the technical solution, the negative loss function and the positive loss function are output while the pre-trained diagnostic report generation model is being trained, and the proportion of negative description text to positive description text in the text report can be set according to these two loss functions. The positive description text thereby guides the training of the pre-trained diagnostic report generation model, which addresses the imbalance of training samples. Adjusting the loss weights of the negative description text and the positive description text improves the accuracy with which abnormal descriptions of medical images are detected and effectively prevents abnormal lesions from being missed. Moreover, using the trained multi-modal medical diagnostic report generation model to generate diagnostic reports automatically relieves the working pressure on doctors and improves working efficiency.

Drawings

Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings in which:

FIG. 1 is a flowchart illustrating a method of generating a diagnostic report for multimodal medical data, according to an exemplary embodiment.

FIG. 2 is a schematic diagram illustrating the structure of a pre-trained diagnostic report generation model, according to an example embodiment.

FIG. 3 is a flowchart illustrating a method for model training of a pre-trained diagnostic report generation model, according to an exemplary embodiment.

FIG. 4 is a block diagram illustrating a diagnostic report generation system for multimodal medical data according to an exemplary embodiment.

FIG. 5 is a block diagram of an electronic device, according to an exemplary embodiment.

Detailed Description

The present disclosure is described in detail below with reference to specific embodiments. The following examples will assist those skilled in the art in further understanding the present disclosure, but are not intended to limit the disclosure in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the spirit of the present disclosure. These are all within the scope of the present disclosure.

FIG. 1 is a flowchart illustrating a method of generating a diagnostic report for multimodal medical data, according to an exemplary embodiment. As shown in FIG. 1, the diagnostic report generating method for multi-modal medical data includes S11 to S15.

S11, acquiring a multi-mode training data set.

The multimodal training dataset comprises medical images and text reports, and may take the form of multiple pairs, each combining a medical image with its corresponding text report. The multimodal training dataset is used for model training of a pre-trained diagnostic report generation model.

S12, splitting the text report according to preset keywords, and determining a negative description text and a positive description text.

The negative description text is the text in the text report stating that abnormal lesion features do not appear in the medical image, and the positive description text is the text in the text report describing abnormal lesion features that do appear in the medical image.
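The keyword-based split in S12 can be illustrated with a minimal sketch. The source only says "preset keywords" and does not list them, so the negation cues in `NEGATIVE_PATTERN`, the sentence-splitting rule, and the function name `split_report` are all assumptions made for illustration.

```python
import re

# Hypothetical negation cues; the source only says "preset keywords".
NEGATIVE_PATTERN = re.compile(
    r"\b(no|not|without|normal|unremarkable)\b", re.IGNORECASE
)


def split_report(report: str):
    """Return (negative_sentences, positive_sentences) for one text report."""
    # Split after sentence-ending punctuation that is followed by whitespace,
    # so decimals such as "2.5" are not broken apart.
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", report) if s.strip()]
    negative, positive = [], []
    for sentence in sentences:
        # Word boundaries keep "abnormal" from matching the cue "normal".
        if NEGATIVE_PATTERN.search(sentence):
            negative.append(sentence)
        else:
            positive.append(sentence)
    return negative, positive


report = (
    "No pleural effusion. Heart size is normal. "
    "A 2.5 cm nodule is seen in the right upper lobe."
)
negative_text, positive_text = split_report(report)
```

In this example the first two sentences land in `negative_text` and the nodule finding lands in `positive_text`. A real keyword list would have to be chosen for the report language and the imaging modality.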

S13, inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model.

The loss function comprises a negative loss function and a positive loss function, wherein the negative loss function represents a loss function corresponding to a negative description text, and the positive loss function represents a loss function corresponding to a positive description text.
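One way to picture the two-part loss is the sketch below. The source does not give a formula, so this assumes a token-level negative log-likelihood that is averaged separately over tokens from negative and from positive description text and then combined with two weights. The names `split_nll`, `total_loss`, `w_neg` and `w_pos` are hypothetical.

```python
import math


def _mean(values):
    """Average of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def split_nll(token_probs, is_positive):
    """Return (negative_loss, positive_loss).

    token_probs[i] is the probability the model assigned to the i-th
    reference token; is_positive[i] marks tokens from positive text.
    """
    negative_terms, positive_terms = [], []
    for prob, positive in zip(token_probs, is_positive):
        target = positive_terms if positive else negative_terms
        target.append(-math.log(prob))
    return _mean(negative_terms), _mean(positive_terms)


def total_loss(negative_loss, positive_loss, w_neg=1.0, w_pos=1.0):
    """Weighted sum of the two losses (the weighting form is an assumption)."""
    return w_neg * negative_loss + w_pos * positive_loss


neg_loss, pos_loss = split_nll([0.5, 0.5, 0.25], [False, False, True])
combined = total_loss(neg_loss, pos_loss, w_neg=1.0, w_pos=2.0)
```

Keeping the two terms separate is what allows the positive term to be emphasized later without touching the negative term.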

In some possible embodiments, a preset diagnostic report generation model is first pre-trained on text reports consisting entirely of negative description text, together with their corresponding medical images, to obtain the pre-trained diagnostic report generation model. Text reports containing both negative description text and positive description text are then used to train the pre-trained diagnostic report generation model step by step, which improves the model's ability to recognize positive description text.
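The two-stage procedure can be sketched as a schedule for how much positive description text is mixed in at each epoch. The source only says that training proceeds "gradually"; the linear ramp, the function name `positive_fraction`, and every numeric default below are assumptions chosen for illustration.

```python
def positive_fraction(epoch, pretrain_epochs=5, ramp_epochs=10, max_fraction=0.5):
    """Fraction of positive description text used at a given epoch.

    Stage 1 (epoch < pretrain_epochs): negative-only reports.
    Stage 2: the fraction rises linearly until it reaches max_fraction.
    """
    if epoch < pretrain_epochs:
        return 0.0
    progress = min(1.0, (epoch - pretrain_epochs + 1) / ramp_epochs)
    return max_fraction * progress


schedule = [positive_fraction(epoch) for epoch in range(16)]
```

With these defaults, epochs 0 to 4 use negative-only reports, and the positive fraction then climbs in equal steps until it reaches 0.5 at epoch 14.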

As shown in FIG. 2, the diagnostic report generation model includes a text editor, an image editor, and a multimodal text generator. The text editor (that is, a text encoder) extracts text features from the text report, and the image editor (that is, an image encoder) extracts image features from the medical image. The multi-modal text generator aligns the text features and the image features in the same feature space and generates the diagnostic report from the aligned text features and image features.
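The data flow between the three components can be sketched as follows. The components the text calls the "text editor" and "image editor" are treated here as encoders, since they extract features. The class `ReportGenerationModel` and the toy stand-in functions are hypothetical and only show how the pieces connect; they are not real networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class ReportGenerationModel:
    image_encoder: Callable[[Sequence[float]], Vector]    # "image editor"
    text_encoder: Callable[[str], Vector]                 # "text editor"
    generator: Callable[[Vector, Optional[Vector]], str]  # multimodal text generator

    def training_forward(self, image: Sequence[float], report_text: str) -> str:
        """Training path: both image and text features reach the generator."""
        image_features = self.image_encoder(image)
        text_features = self.text_encoder(report_text)
        return self.generator(image_features, text_features)

    def generate(self, image: Sequence[float]) -> str:
        """Inference path: only image features reach the generator."""
        return self.generator(self.image_encoder(image), None)


# Toy stand-ins so the sketch runs end to end.
def toy_image_encoder(pixels):
    return [sum(pixels) / len(pixels)]


def toy_text_encoder(text):
    return [float(len(text.split()))]


def toy_generator(image_features, text_features):
    text_part = "none" if text_features is None else f"{text_features[0]:.1f}"
    return f"report(image={image_features[0]:.2f}, text={text_part})"


model = ReportGenerationModel(toy_image_encoder, toy_text_encoder, toy_generator)
train_output = model.training_forward([0.0, 1.0], "no effusion")
test_output = model.generate([0.0, 1.0])
```

The split into `training_forward` and `generate` mirrors the description: text features are available only during training, while at test time the report is produced from the image alone.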

As an example, pre-training the preset diagnostic report generation model includes:

Inputting text reports consisting entirely of negative description text, together with their corresponding medical images, into the preset diagnostic report generation model, and outputting a negative loss function.

Specifically, the negative description text is input into the text editor, which outputs the text features of the negative description text; the medical image is input into the image editor, which outputs the image features of the medical image; and the text features of the negative description text and the image features of the medical image are input into the multi-modal text generator, which outputs a negative loss function.

As another example, the pre-trained diagnostic report generation model is trained using text reports containing both negative description text and positive description text. The positive description text guides the training of the pre-trained diagnostic report generation model, which prevents the trained multi-modal medical diagnostic report generation model from struggling to detect abnormal descriptions because of imbalanced training samples.

S14, optimizing the pre-trained diagnostic report generating model according to the loss function, and determining the multi-modal medical diagnostic report generating model.

In one possible embodiment, the ratio of negative description text to positive description text in the text report is set according to the negative loss function and the positive loss function, the parameters of the pre-trained diagnostic report generation model are optimized, and the multi-modal medical diagnostic report generation model is thereby determined.

In the model training stage, the proportion of negative description text to positive description text can be adjusted automatically. Adjusting the proportion of positive description text in the text report adjusts the weights of the negative loss function and the positive loss function. This improves, to a certain extent, the ability of the trained multi-modal medical diagnostic report generation model to recognize abnormal lesion features in medical images, so that diagnostic reports are generated accurately from the medical images and abnormal lesions are effectively prevented from being missed.
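The link between the proportion of positive text and the loss weights can be illustrated with inverse-frequency weighting. The source does not state how the weights are derived, so the formula below (total count divided by twice the class count) and the name `balanced_loss_weights` are assumptions, not the patented rule.

```python
def balanced_loss_weights(num_negative, num_positive):
    """Return (w_neg, w_pos) so that both classes contribute equally overall.

    Uses weight = total / (2 * class_count). Falls back to equal weights
    when one class is absent, for example during negative-only pre-training.
    """
    if num_negative == 0 or num_positive == 0:
        return 1.0, 1.0
    total = num_negative + num_positive
    return total / (2 * num_negative), total / (2 * num_positive)


# 90 negative sentences and 10 positive sentences in the training text.
w_neg, w_pos = balanced_loss_weights(90, 10)
```

With a 90 to 10 split, each positive sentence is weighted nine times as heavily as each negative sentence, so the two groups carry the same total weight in the loss.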

S15, inputting the medical image to be detected into a multi-mode medical diagnosis report generation model, and generating a diagnosis report corresponding to the medical image to be detected.

As an example, in the test phase, the medical image to be detected is input into the trained multi-modal medical diagnostic report generation model. First, the medical image to be detected is input into the image editor, which outputs its image features. Second, the image features of the medical image to be detected are input into the multi-modal text generator, which generates the corresponding diagnostic report. Finally, the multi-modal medical diagnostic report generation model outputs the diagnostic report corresponding to the medical image to be detected.

As shown in FIG. 3, in some possible embodiments, inputting the medical image, the negative description text and the positive description text into the pre-trained diagnostic report generation model, outputting the loss function, and performing model training on the pre-trained diagnostic report generation model includes S21 to S23.

S21, inputting the medical image into an image editor for image feature extraction processing, and outputting the image features of the medical image.

S22, inputting the negative description text and the positive description text into a text editor, and outputting text characteristics of a text report.

Wherein the text features of the text report include negative descriptive text features and positive descriptive text features.

S23, inputting the image features and the text features into a multi-modal text generator, and outputting a negative loss function and a positive loss function.

In one possible embodiment, the multi-modal text generator performs feature alignment processing on the image features and the text features in the same feature space and determines the aligned image features and text features; generates a diagnostic report corresponding to the medical image according to the aligned image features and text features; and outputs a negative loss function and a positive loss function according to the generated diagnostic report and the text report corresponding to the medical image.

According to the technical solution, adjusting the loss weights of the negative description text and the positive description text lets the positive description text guide the model training and optimization of the pre-trained diagnostic report generation model. The parameters of the text editor, the image editor and the multi-modal text generator of the diagnostic report generation model are optimized, the model training of the pre-trained diagnostic report generation model is completed, and the multi-modal medical diagnostic report generation model is obtained.
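Feature alignment in a shared space can be sketched as projecting the features, comparing them by cosine similarity, and applying a contrastive objective. The source does not name an alignment technique, so the linear projection, the image-to-text contrastive loss and the `temperature` value are assumptions swapped in for illustration. All function names are hypothetical.

```python
import math


def project(vector, matrix):
    """Linear projection into the shared feature space (one row per output dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.1):
    """Image-to-text contrastive loss: image i should match text i."""
    loss = 0.0
    for i, image_emb in enumerate(image_embs):
        logits = [cosine(image_emb, text_emb) / temperature for text_emb in text_embs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(image_embs)


# Two 3-d image features projected into a 2-d space shared with the text features.
image_projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_embs = [project(f, image_projection) for f in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]
text_embs = [[2.0, 0.0], [0.0, 3.0]]

aligned_loss = contrastive_alignment_loss(image_embs, text_embs)
mismatched_loss = contrastive_alignment_loss(image_embs, text_embs[::-1])
```

The loss is small when each image embedding points in the same direction as its paired text embedding and large when the pairs are swapped, which is the property an alignment step needs.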

Based on the same concept, the present disclosure also provides a diagnostic report generating system for multi-modal medical data. Referring to FIG. 4, the diagnostic report generating system 100 for multi-modal medical data includes: an acquisition module 110, a text processing module 120, a model training module 130, a model optimization module 140, and a diagnostic report generation module 150.

An acquisition module 110 for acquiring a multimodal training dataset comprising medical images and text reports;

the text processing module 120 is configured to split the text report according to a preset keyword, and determine a negative description text and a positive description text;

the model training module 130 is configured to input the medical image, the negative description text, and the positive description text into a pre-trained diagnostic report generation model, and output a loss function, and perform model training on the pre-trained diagnostic report generation model, where the loss function includes a negative loss function and a positive loss function;

the model optimization module 140 is configured to perform optimization processing on the pre-trained diagnostic report generation model according to the loss function, and determine a multi-mode medical diagnostic report generation model;

the diagnostic report generating module 150 is configured to input a medical image to be tested into the multi-mode medical diagnostic report generating model, and generate a diagnostic report corresponding to the medical image to be tested.

Optionally, the model training module 130 includes:

the image feature extraction submodule is used for inputting the medical image into the image editor for image feature extraction processing and outputting the image features of the medical image;

a text feature extraction sub-module, configured to input the negative description text and the positive description text into the text editor, and output text features of the text report, where the text features of the text report include negative description text features and positive description text features;

and the multi-mode processing submodule is used for inputting the image features and the text features into the multi-mode text generator and outputting the negative loss function and the positive loss function.

Optionally, the multi-modal processing sub-module includes:

an alignment processing sub-module, configured to perform feature alignment processing on the image feature and the text feature in the same feature space, and determine the image feature and the text feature that are subjected to the alignment processing;

the diagnostic report generation sub-module is used for generating a diagnostic report corresponding to the medical image according to the image characteristics and the text characteristics which are subjected to the alignment processing;

and the loss submodule is used for outputting the negative loss function and the positive loss function according to the diagnosis report and the text report corresponding to the medical image.

Optionally, the model optimization module 140 includes:

and the optimizing sub-module is used for setting the proportion of the negative description text and the positive description text in the text report according to the negative loss function and the positive loss function, optimizing the parameters of the pre-trained diagnostic report generation model and determining the multi-modal medical diagnostic report generation model.

Optionally, the diagnostic report generation module 150 includes:

the image feature extraction submodule to be detected is used for inputting the medical image to be detected into the image editor and outputting the image features of the medical image to be detected;

and the diagnostic report generating sub-module is used for inputting the image characteristics of the medical image to be detected into the multi-mode text generator to generate a diagnostic report corresponding to the medical image to be detected.

The specific manner in which the modules of the above system embodiments perform their operations has been described in detail in the method embodiments and is not repeated here.

As shown in FIG. 5, in some possible embodiments, the present disclosure may also provide an electronic device, such as a terminal for medical diagnostic reporting. The electronic device 500 may include a processor 501 and a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output interface 504, and a communication component 505.

The processor 501 is configured to control the overall operation of the electronic device 500 so as to complete all or part of the steps of the above-described method for generating a diagnostic report for multimodal medical data according to the first aspect. The memory 502 is used to store various types of data to support operation of the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500, as well as application-related data such as contact data, messages sent and received, pictures, audio and video. The memory 502 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 502 or transmitted through the communication component 505. The audio component further comprises at least one speaker for outputting audio signals. The input/output interface 504 provides an interface between the processor 501 and other interface modules, which may be a keyboard, a mouse, buttons and the like. These buttons may be virtual buttons or physical buttons.
The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G or the like, or a combination of one or more of them, which is not limited herein. The communication component 505 may accordingly comprise a Wi-Fi module, a Bluetooth module, an NFC module and the like.

In another exemplary embodiment, there is also provided a non-transitory computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described multimodal medical data oriented diagnostic report generating method of the first aspect. For example, the computer readable storage medium may be the memory described above including program instructions executable by a processor of an electronic device to perform a method of generating a diagnostic report for multimodal medical data.

In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described multimodal medical data oriented diagnostic report generating method when executed by the programmable apparatus.

The foregoing has described specific embodiments of this disclosure. It is to be understood that the present disclosure is not limited to the particular embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the disclosure. The above-described preferred features may be used in any combination, provided they do not conflict.

Claims

1. A method for generating a diagnostic report for multimodal medical data, comprising:

2. The method of claim 1, wherein the pre-trained diagnostic report generation model comprises a text editor, an image editor, and a multimodal text generator.

3. The method of claim 2, wherein the inputting the medical image, the negative descriptive text, and the positive descriptive text into a pre-trained diagnostic report generation model, outputting a loss function, model training the pre-trained diagnostic report generation model, comprises:

4. The method of claim 3, wherein said inputting the image feature and the text feature into the multimodal text generator outputs the negative loss function and the positive loss function comprises:

5. The method of claim 1, wherein optimizing the pre-trained diagnostic report generation model according to the loss function determines a multi-modal medical diagnostic report generation model, comprising:

6. The method of claim 5, wherein the pre-trained diagnostic report generation model performs the pre-training process using a text report of a full amount of negative descriptive text.

7. The method according to claim 2, wherein the inputting the medical image to be measured into the multi-modal medical diagnostic report generating model generates a diagnostic report corresponding to the medical image to be measured, comprising:

8. A multi-modal medical data oriented diagnostic report generating system comprising:

the acquisition module is used for acquiring a multi-modal training data set, wherein the multi-modal training data set comprises a medical image and a text report;

the text processing module is used for splitting the text report according to preset keywords to determine a negative description text and a positive description text;

the model training module is used for inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and carrying out model training on the pre-trained diagnostic report generation model, wherein the loss function comprises a negative loss function and a positive loss function;

the model optimization module is used for optimizing the pre-trained diagnostic report generation model according to the loss function and determining a multi-mode medical diagnostic report generation model;

the diagnostic report generation module is used for inputting the medical image to be detected into the multi-mode medical diagnostic report generation model and generating a diagnostic report corresponding to the medical image to be detected.

9. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the method according to any of claims 1-7.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-7.