CN117174240A

CN117174240A - Medical image report generation method based on large model field migration

Info

Publication number: CN117174240A
Application number: CN202311401131.6A
Authority: CN
Inventors: 宋彦; 刘畅; 田元贺; 陈伟东; 张勇东
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2023-10-26
Filing date: 2023-10-26
Publication date: 2023-12-05
Anticipated expiration: 2043-10-26
Also published as: CN117174240B

Abstract

The invention relates to the technical field of image report generation and discloses a medical image report generation method based on large model domain migration. The training process of the generation model comprises the following steps: encoding the radiological image into a visual vector using a visual encoding module; inputting the visual vector and the generation prompt into a large model domain instance migration module to obtain an intermediate report and an instance migration loss; inputting the visual vector, the refinement prompt and the intermediate report into a large model fine decoding module to obtain a final image report and a cross-entropy loss; and calculating the total loss and updating the parameters of the large model domain instance migration module and the large model fine decoding module using a back propagation algorithm. Through the intra-domain instance ordering process, the method can quickly align a large language model with task-specific information in a specialized domain using only a small number of data samples, and it can further improve the text generation capability of the large language model on the medical image report generation task.

Description

Medical image report generation method based on large model field migration

Technical Field

The invention relates to the technical field of image report generation, and in particular to a medical image report generation method based on large model domain migration.

Background

Medical image report generation aims to automatically generate a corresponding medical image report for a given radiological image. The method of the invention addresses two technical problems:

First, existing methods tend to perform medical image report generation by training a task-specific model from scratch on an existing public radiological image dataset. This makes it difficult to align the image features of the medical image with the text features of the report, so such models often perform poorly on the task. The method of the invention builds on the strong generation capability that a large language model (the "large model") acquires from very large training data, and can thereby overcome the insufficient generation capability of small models in medical image report generation.

Second, existing large language models, pre-trained on very large general-purpose corpora, perform well on text generation tasks in the general domain. When applied directly to medical image report generation, however, they are no longer effective because of the gap between general-domain knowledge and medical-domain knowledge. The method of the invention introduces intra-domain instance migration and coarse-to-fine decoding: it compares and ranks task-specific and task-independent instance information in the medical domain, and then refines the generated text. This lets the model align quickly with task-specific information in the domain and generate more accurate and detailed medical image reports.

Disclosure of Invention

In order to solve the technical problems, the invention provides a medical image report generation method based on large model field migration.

In order to solve the technical problems, the invention adopts the following technical scheme:

A medical image report generation method based on large model field migration, in which a radiological image is input into a generation model and the generation model generates an image report according to a generation prompt and a refinement prompt; the generation model comprises a visual encoding module, a large model domain instance migration module and a large model fine decoding module; the training process of the generation model comprises the following steps:

Step one, using the visual encoding module to encode the input radiological image into a visual vector;

Step two, inputting the visual vector and the generation prompt into the large model domain instance migration module to obtain an intermediate report and an instance migration loss, which specifically comprises:

S21, using the radiological image, the training data, a public medical corpus and a retrieval tool, obtaining a related instance list through a related instance retrieval process;

S22, using the visual vector, the generation prompt and the related instance list, obtaining the intermediate report and the instance migration loss;

Step three, inputting the visual vector, the refinement prompt and the intermediate report into the large model fine decoding module to obtain a final image report and a cross-entropy loss, which specifically comprises:

S31, inputting the visual vector, the refinement prompt and the intermediate report into the text generator to obtain the final image report;

S32, comparing each word of the image report predicted by the text generator with the corresponding word of the manually annotated image report, and calculating the loss with a loss function, the loss function being a cross-entropy loss function;

Step four, calculating the total loss and updating the parameters of the large model domain instance migration module and the large model fine decoding module using a back propagation algorithm.
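As an illustration of step four, the sketch below combines the two training losses into one scalar. The text states only that a total loss is calculated; the weighted-sum form, the function name `total_loss` and the `migration_weight` parameter are assumptions made for this example.

```python
def total_loss(instance_migration_loss: float,
               cross_entropy_loss: float,
               migration_weight: float = 1.0) -> float:
    """Combine the two losses into the scalar that is back-propagated.

    Assumption: a weighted sum. With migration_weight = 1.0 this is a plain sum.
    """
    return cross_entropy_loss + migration_weight * instance_migration_loss


# Toy values only; they are not results from the patent.
example_total = total_loss(instance_migration_loss=0.6, cross_entropy_loss=0.98)
example_half = total_loss(0.6, 0.98, migration_weight=0.5)
```

In a real training loop the gradient of this scalar would be propagated into the two large model modules named in step four.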

Further, the visual encoding module comprises a visual encoding layer, a semantic space alignment layer and a linear mapping layer, and encoding the input radiological image into a visual vector with the visual encoding module specifically comprises:

S11, inputting the radiological image into the visual encoding layer to extract visual features;

S12, inputting the extracted visual features into the semantic space alignment layer, which aligns them with the text semantic space to obtain a visual representation;

S13, inputting the visual representation into the linear mapping layer, which aligns the dimension of the visual representation with the hidden-space dimension of the text generator to obtain the visual vector.

Further, in step S21, obtaining the related instance list through the related instance retrieval process specifically comprises:

S211, randomly sampling a number of texts from the training data to form a training data text sequence, and randomly sampling a number of texts from the public medical corpus to form a medical corpus text sequence;

S212, for the texts sampled from the training data, using the image encoder and the text encoder of the retrieval tool to map the radiological image and the sampled training data text sequence into the same feature space, obtaining an image feature and a training data text feature sequence; calculating the cosine similarity between the image feature and each training data text feature, sorting by similarity, and taking the top-ranked related instances;

S213, for the texts sampled from the public medical corpus, using the image encoder and the text encoder of the retrieval tool to map the radiological image and the sampled medical corpus text sequence into the same feature space, obtaining an image feature and a medical corpus text feature sequence; calculating the cosine similarity between the image feature and each medical corpus text feature, sorting by similarity, and taking the top-ranked related instances;

S214, combining the related instances selected from the training data with the related instances selected from the public medical corpus to obtain the related instance list.
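The retrieval in S212 to S214 can be sketched as a cosine-similarity top-k search over two text pools. This is a minimal illustration that assumes the image and text features have already been produced by the retrieval tool's encoders; the function names, the parameters `k_train` and `k_corpus`, and the toy feature values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_related(image_feature: Sequence[float],
                  text_features: Sequence[Sequence[float]],
                  texts: Sequence[str],
                  k: int) -> List[str]:
    """Rank texts by similarity to the image feature and keep the k best."""
    scored = [(cosine_similarity(image_feature, f), t)
              for f, t in zip(text_features, texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_related_instance_list(image_feature, train_features, train_texts,
                                corpus_features, corpus_texts,
                                k_train: int, k_corpus: int) -> List[str]:
    """S214: concatenate the instances retrieved from both sources."""
    return (top_k_related(image_feature, train_features, train_texts, k_train)
            + top_k_related(image_feature, corpus_features, corpus_texts, k_corpus))


# Toy features in a shared 2-D space (illustrative values only).
related = build_related_instance_list(
    image_feature=[1.0, 0.0],
    train_features=[[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]],
    train_texts=["no acute findings", "right pleural effusion", "cardiomegaly"],
    corpus_features=[[0.9, 0.2], [-1.0, 0.0]],
    corpus_texts=["pleural effusion is fluid in the pleural space",
                  "a fracture is a broken bone"],
    k_train=2,
    k_corpus=1,
)
```

Keeping the two pools separate until the final concatenation mirrors the split between S212 and S213: each source contributes its own top-ranked instances.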

Further, in step S22, obtaining the intermediate report and the instance migration loss through a contrastive semantic ordering process specifically comprises:

S221, inputting the visual vector and the generation prompt into the text generator to obtain the intermediate report, which is a sequence of words;

S222, inputting the intermediate report, together with the visual vector and the generation prompt, into the text generator again, and obtaining the intermediate report representation from the last layer of the text generator;

S223, taking instances from the related instance list obtained in the retrieval process and combining them into instance pairs, the instance pairs together forming a set of instance pairs;

S224, inputting the first instance of each instance pair, together with the visual vector and the generation prompt, into the text generator to obtain the representation of that instance; likewise inputting the second instance of the pair, together with the visual vector and the generation prompt, into the text generator to obtain the representation of that instance;

S225, comparing the intermediate report representation pairwise with the representations of the two instances in each instance pair, and calculating the instance migration loss:

；

Here the control parameter sets the size of the margin.
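S225 describes the instance migration loss here only in words (a pairwise comparison with a margin), so the following is a sketch of one common loss of that shape, a pairwise hinge (margin ranking) loss, and not necessarily the exact formula of the invention. Assumptions for illustration: cosine similarity is the comparison, the first instance of each pair is the one that should lie closer to the intermediate report, and the loss is averaged over pairs. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def margin_ranking_loss(report_repr: Vector,
                        pair_reprs: List[Tuple[Vector, Vector]],
                        margin: float) -> float:
    """Mean hinge loss over instance pairs.

    For each pair (preferred, other) the loss is zero once the report is more
    similar to `preferred` than to `other` by at least `margin`.
    """
    if not pair_reprs:
        return 0.0
    total = 0.0
    for preferred, other in pair_reprs:
        sim_preferred = cosine_similarity(report_repr, preferred)
        sim_other = cosine_similarity(report_repr, other)
        total += max(0.0, margin - sim_preferred + sim_other)
    return total / len(pair_reprs)


# Toy representations: the first pair is ordered correctly, the second is not.
example_loss = margin_ranking_loss(
    report_repr=[1.0, 0.0],
    pair_reprs=[([1.0, 0.0], [0.0, 1.0]),
                ([0.0, 1.0], [1.0, 0.0])],
    margin=0.2,
)
```

A larger margin demands a wider similarity gap before a pair stops contributing to the loss, which is the role the control parameter plays in the text.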

Further, the loss is calculated using the cross-entropy loss function.
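For reference, a word-level cross-entropy loss of the kind named here can be sketched as the negative log-probability that the text generator assigns to each annotated word, summed over the report. Whether the loss is summed or averaged over words is an assumption of this example, as are the function name and the toy probabilities.

```python
import math
from typing import List, Sequence


def cross_entropy_loss(predicted_probs: Sequence[Sequence[float]],
                       target_ids: Sequence[int]) -> float:
    """Sum of -log p(target word) over all positions of the report.

    predicted_probs[t] is the generator's probability distribution over the
    vocabulary at position t; target_ids[t] is the annotated word's index.
    """
    if len(predicted_probs) != len(target_ids):
        raise ValueError("one distribution is required per target word")
    loss = 0.0
    for probs, target in zip(predicted_probs, target_ids):
        p = probs[target]
        if p <= 0.0:
            raise ValueError("target word has zero predicted probability")
        loss -= math.log(p)
    return loss


# Two-word report over a two-word vocabulary (illustrative values only).
example_ce = cross_entropy_loss(
    predicted_probs=[[0.5, 0.5], [0.25, 0.75]],
    target_ids=[0, 1],
)
```

The loss is zero only when every annotated word receives probability 1, and it grows as the predicted report drifts from the annotation.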

Compared with the prior art, the invention has the following beneficial technical effects:

Through the intra-domain instance ordering process, the invention can quickly align a large language model with task-specific information in a specialized domain using only a small number of data samples. This addresses the domain adaptation problem that conventional large language models face between the general domain and a specialized domain, and allows the strong generation capability of the large language model to be used for medical image report generation.

Through the coarse-to-fine generation process, the text generation capability of the large language model on the medical image report generation task is further improved, and a more effective instance-level cross-modal alignment capability is built. As a result, the generated symptom descriptions correspond more closely to the image and are more accurate, which improves the quality of the generated report.

Drawings

Fig. 1 is a flowchart of a medical image report generation method according to the present invention.

Detailed Description

A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

The generation model in the invention comprises three modules, namely a visual coding module, a large model field instance migration module and a large model fine decoding module, and the flow of the medical image report generation method based on large model field migration is shown in figure 1.

The training process of the generation model comprises the following steps:

Step one, using the visual encoding module to encode the input radiological image into a visual vector.

The visual encoding module comprises a visual encoding layer, a semantic space alignment layer and a linear mapping layer, and encoding the input radiological image into a visual vector with the visual encoding module specifically comprises:

S11, inputting the radiological image into the visual encoding layer to extract visual features. In the invention, the visual encoding layer uses a Transformer.

S12, inputting the extracted visual features into the semantic space alignment layer, which aligns them with the text semantic space to obtain a visual representation. In the invention, the semantic space alignment layer uses a Q-Former.

S13, inputting the visual representation into the linear mapping layer, which aligns the dimension of the visual representation with the hidden-space dimension of the text generator to obtain the visual vector. The text generator in the invention is the Vicuna text generator.
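The linear mapping in S13 amounts to a learned affine projection applied to every visual token, so that its width matches the text generator's hidden size. The sketch below shows only that projection, in plain Python; the dimensions, weights and the name `linear_map` are illustrative assumptions, and the visual encoding layer and the alignment layer are not modelled.

```python
from typing import List

Matrix = List[List[float]]


def linear_map(tokens: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Project each token from d_in to d_out: out = token @ weight + bias.

    tokens: n_tokens x d_in, weight: d_in x d_out, bias: d_out.
    """
    d_in, d_out = len(weight), len(bias)
    projected = []
    for token in tokens:
        if len(token) != d_in:
            raise ValueError("token width must equal the weight's input size")
        projected.append([
            sum(token[i] * weight[i][j] for i in range(d_in)) + bias[j]
            for j in range(d_out)
        ])
    return projected


# Two visual tokens of width 3 mapped to a hidden size of 4 (toy numbers).
visual_representation = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
weight = [[1.0, 0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.5]
visual_vector = linear_map(visual_representation, weight, bias)
```

After the projection each token has the same width as the text generator's embeddings, so the visual tokens can be fed to it alongside the prompt.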

S21, using the radiological image, the training data, a public medical corpus and a retrieval tool, obtaining a related instance list through a related instance retrieval process. This specifically comprises the following steps:

S211, randomly sampling a number of texts from the training data to form a training data text sequence, and randomly sampling a number of texts from the public medical corpus to form a medical corpus text sequence;

S212, for the texts sampled from the training data, using the image encoder and the text encoder of the retrieval tool to map the radiological image and the sampled training data text sequence into the same feature space, obtaining an image feature and a training data text feature sequence; calculating the cosine similarity between the image feature and each training data text feature, sorting by similarity, and taking the top-ranked related instances;

S213, for the texts sampled from the public medical corpus, using the image encoder and the text encoder of the retrieval tool to map the radiological image and the sampled medical corpus text sequence into the same feature space, obtaining an image feature and a medical corpus text feature sequence; calculating the cosine similarity between the image feature and each medical corpus text feature, sorting by similarity, and taking the top-ranked related instances;

S22, using the visual vector, the generation prompt and the related instance list, obtaining the intermediate report and the instance migration loss. This specifically comprises the following steps:

；

Here the control parameter sets the size of the margin.

Step three, inputting the visual vector, the refinement prompt and the intermediate report into the large model fine decoding module to obtain a final image report and a cross-entropy loss. This specifically comprises the following steps:

S31, inputting the visual vector, the refinement prompt and the intermediate report into the text generator to obtain the final image report.

S32, comparing all words of the image report predicted by the text generator with all words of the manually annotated image report, and calculating the loss with a loss function.

The loss function is the cross-entropy loss function.

The prediction process of the generation model comprises the following steps:

S51, using the visual encoding module to encode the radiological image into a visual vector.

S52, inputting the visual vector and the generation prompt into the large model domain instance migration module to obtain an intermediate report.

S53, inputting the visual vector, the refinement prompt and the intermediate report into the large model fine decoding module to obtain the final image report.
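The prediction steps S51 to S53 form a two-pass, coarse-to-fine pipeline, which can be sketched as follows. The encoder and the text generator are passed in as plain callables and replaced by toy stand-ins here; the names, the way the refinement prompt and the intermediate report are concatenated, and the toy outputs are assumptions for illustration only.

```python
from typing import Any, Callable, List


def generate_report(image: Any,
                    encode_image: Callable[[Any], Any],
                    generate_text: Callable[[Any, str], str],
                    generation_prompt: str,
                    refinement_prompt: str) -> str:
    """Run the coarse pass, then the refinement pass, and return the final report."""
    visual_vector = encode_image(image)                                   # S51
    intermediate_report = generate_text(visual_vector, generation_prompt)  # S52
    # Assumed input format for S53: refinement prompt followed by the draft.
    refine_input = f"{refinement_prompt}\n{intermediate_report}"
    return generate_text(visual_vector, refine_input)                     # S53


# Toy stand-ins that only record what the generator was asked to do.
calls: List[str] = []


def toy_encoder(image: str) -> List[float]:
    return [float(len(image))]


def toy_generator(visual_vector: List[float], text: str) -> str:
    calls.append(text)
    return f"report#{len(calls)}"


final_report = generate_report("chest_xray.png", toy_encoder, toy_generator,
                               "Describe the image.", "Refine the draft report.")
```

The second call receives the first call's output, which is what lets the refinement pass correct or extend the draft instead of starting from the image alone.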

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should treat the specification as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims

1. A medical image report generation method based on large model field migration, wherein a radiological image is input into a generation model and the generation model generates an image report according to a generation prompt and a refinement prompt; the generation model comprises a visual encoding module, a large model domain instance migration module and a large model fine decoding module; the training process of the generation model comprises the following steps:

Step two, inputting the visual vector and the generation prompt into the large model domain instance migration module to obtain an intermediate report and an instance migration loss, which specifically comprises:

S22, using the visual vector, the generation prompt and the related instance list, obtaining the intermediate report and the instance migration loss;

Step three, inputting the visual vector, the refinement prompt and the intermediate report into the large model fine decoding module to obtain a final image report and a cross-entropy loss, which specifically comprises:

S31, inputting the visual vector, the refinement prompt and the intermediate report into the text generator to obtain the final image report;

S32, comparing each word of the image report predicted by the text generator with the corresponding word of the manually annotated image report, and calculating the loss with a loss function, the loss function being a cross-entropy loss function;

2. The medical image report generation method based on large model field migration of claim 1, wherein the visual encoding module comprises a visual encoding layer, a semantic space alignment layer and a linear mapping layer, and encoding the input radiological image into a visual vector with the visual encoding module specifically comprises:

S12, inputting the extracted visual features into the semantic space alignment layer, which aligns them with the text semantic space to obtain a visual representation;

3. The medical image report generation method based on large model field migration of claim 1, wherein in step S21, obtaining the related instance list through the related instance retrieval process specifically comprises:

S211, randomly sampling a number of texts from the training data to form a training data text sequence, and randomly sampling a number of texts from the public medical corpus to form a medical corpus text sequence;

S212, for the texts sampled from the training data, using the image encoder and the text encoder of the retrieval tool to map the radiological image and the sampled training data text sequence into the same feature space, obtaining an image feature and a training data text feature sequence; calculating the cosine similarity between the image feature and each training data text feature, sorting by similarity, and taking the top-ranked related instances;

4. The medical image report generation method based on large model field migration of claim 3, wherein in step S22, obtaining the intermediate report and the instance migration loss through a contrastive semantic ordering process specifically comprises:

S221, inputting the visual vector and the generation prompt into the text generator to obtain the intermediate report, which is a sequence of words;

；

Here the control parameter sets the size of the margin.

5. The medical image report generation method based on large model field migration of claim 3, wherein the loss is calculated using the cross-entropy loss function.