CN116168796A - Structured medical image report generation method based on visual question answering

Info

Publication number: CN116168796A
Application number: CN202310198891.5A
Authority: CN
Inventors: 周子杰; 余宙; 俞俊; 朱耕蔚; 高梓豪
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2023-03-03
Filing date: 2023-03-03
Publication date: 2023-05-26
Anticipated expiration: 2043-03-03
Also published as: ZA202307473B; CN116168796B

Abstract

The invention provides a structured medical image report generation method based on visual question answering (VQA). The method comprises the following steps: 1. VQA model design and modification. 2. "Problem-state tree" design. 3. Automatic information extraction. 4. Structured information integration. Based on visual question-answering technology and targeted at generating medical image diagnosis reports, the invention aims to strengthen the interaction between questions and the model, and designs a series of data structures centred on the "problem-state tree", together with conversion algorithms between them. The invention reduces, to a certain extent, the randomness of question organization in visual question answering and helps the VQA model acquire more effective information from medical images. A model built with this technique is highly extensible, can cover a broader range of tasks at lower training cost, and in practical use can flexibly generate logically complete, information-rich medical image diagnosis reports for different application scenarios.

Description

Structured medical image report generation method based on visual question answering

Technical Field

The invention belongs to the technical field of computer vision and relates to a structured medical image report generation method that is based on computer visual question-answering technology, guides a model to mine image information through tree-structured question design and controllable decision logic, and integrates the resulting data automatically.

Background

In the era of big data, the number of images that humans acquire, from nature or by direct capture, is growing explosively, and the amount of information they contain far exceeds what humans can process unaided. To mine the information people need from this huge volume of images, computer vision technology has emerged and is now among the most mature directions in applied artificial intelligence. In the medical field in particular, many clinical procedures depend on images. As medical imaging technologies such as X-ray and MRI mature and become more widely used, reliable modern diagnosis relies on images, and the amount of information worth mining from them keeps increasing. Therefore, to ensure the reliability of medical diagnosis and to reduce the burden on doctors, combining computer vision technology with medical diagnosis has become an inevitable trend.

Several computer vision techniques already have preliminary applications in medical diagnosis. Conventional medical diagnosis models use machine learning: through supervised learning on a large number of similar images and their labels, they learn to identify a specific disease type from an image. Although such methods achieve high accuracy, training is costly and requires building a large-scale data set; moreover, the model returns only a final decision, so the features mined from the medical image are not shown to the user, which does not help the doctor diagnose further. Visual description is an emerging technology in medical diagnosis that can generate an objective summary description of a whole medical image. However, a medical image contains a great deal of information and may show several lesions, and because reports generated by visual description are random to some degree, important information in the image may be insufficiently mined, creating hidden risks for subsequent treatment.

Compared with these techniques, visual question answering (VQA) is more flexible and targeted. On the one hand, because the training data of a VQA model covers images of multiple modalities and organs together with questions on many different aspects, a single model can diagnose different conditions in practice, avoiding the time cost of training many conventional machine learning models. On the other hand, because questions in natural language can be designed by people, the system can search for and extract key information in the image according to the user's needs, and its output is more reliable than visual description. Although VQA technology is developing rapidly, because of limited model accuracy and the randomness of questions, mature VQA systems can so far only extract information from everyday images, for example to assist blind people, and are difficult to apply to medical image processing, which is highly specialized and information-dense.

Disclosure of Invention

To better promote the practical deployment of computer vision technology in the medical field, the invention provides a structured medical image report generation method based on visual question answering. Based on visual question-answering technology and targeted at generating medical image diagnosis reports, the invention aims to strengthen the interaction between questions and the model, and designs a series of data structures centred on the "problem-state tree", together with conversion algorithms between them. Compared with conventional medical decision models, the method reduces, to a certain extent, the randomness of question organization in visual question answering and helps the VQA model acquire more effective information from medical images. A model built with this technique is highly extensible, can cover a broader range of tasks at lower training cost, and in practical use can flexibly generate logically complete, information-rich medical image diagnosis reports for different application scenarios.

The medical image report structured generation method based on visual question and answer comprises a VQA model design and transformation module, a problem-state tree design module, an information automatic extraction module and a structured information integration module.

The VQA model design and transformation module is specifically realized as follows:

First, a medical image VQA model with strong feature extraction and reasoning capability and good generalization needs to be constructed; concretely, the model should achieve good test results on several current medical VQA data sets. For any given medical image $V$ and question text $Q$ as input, the model outputs a single definite text as the answer $A$; for the VQA model $M(\cdot)$ this is written $A = M(V, Q)$.

Mainstream VQA models essentially output the most probable answer in the candidate answer space as the true answer. In real applications, a VQA model is limited in accuracy and may misjudge, giving an answer outside the user's expectation, so the invention specifically modifies the conventional VQA model. The modified VQA model $N(\cdot)$ takes as input a given medical image $V$, a given question text $Q$ and a preset candidate answer set $A_p = \{A_1, A_2, \ldots, A_c\}$, and selects the most suitable answer within the given candidate set, written $A = N(V, Q, A_p)$. The selection is based on the probability that the conventional VQA model assigns to each answer: the candidate with the largest probability is output as $A$, and if the selected answer is not the globally optimal answer given by the model, the decision is marked in the "problem-state tree" as a warning. Once a suitable model has been selected, reproduced and modified, it is used as a "black box" component in the rest of the process; that is, unless otherwise stated, the model parameters are not changed.
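The selection rule of $N(\cdot)$ can be illustrated with a minimal Python sketch. It assumes the base model $M$ exposes a probability for every answer in its answer space; the function name `select_from_candidates` and the example scores are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable, Tuple


def select_from_candidates(probs: Dict[str, float],
                           candidates: Iterable[str]) -> Tuple[str, bool]:
    """Sketch of A = N(V, Q, A_p) on top of the scores of a base model M.

    probs      : answer -> probability over the full answer space, assumed to
                 come from M for one (image, question) pair.
    candidates : the preset candidate answer set A_p.
    Returns (answer, flagged); flagged is True when the chosen candidate is
    not the globally most probable answer, i.e. the warning case.
    """
    usable = [a for a in candidates if a in probs]
    if not usable:
        raise ValueError("no candidate lies in the model's answer space")
    best_candidate = max(usable, key=probs.__getitem__)
    global_best = max(probs, key=probs.__getitem__)
    return best_candidate, best_candidate != global_best


# Hypothetical scores for one image and one question.
probs = {"lung": 0.30, "heart": 0.15, "fracture": 0.45, "normal": 0.10}
answer, flagged = select_from_candidates(probs, ["lung", "heart"])
```

Here the best candidate is "lung", but the global optimum "fracture" lies outside the candidate set, so the result is flagged for the doctor's attention.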

The 'problem-state tree' design module

This module comprises a pathological information demand analysis unit, a pathological state generation and combination unit, a pathological problem design unit, a tree structure optimization unit and an auxiliary information insertion unit.

Pathological information demand analysis unit

This unit first analyses which information is necessary to generate a medical image diagnosis report. A complete report should derive diagnostic results from observed phenomena; the phenomena themselves can be divided into different levels, corresponding to information at different levels. For the analysis, the invention adopts a bottom-up method: starting from the type of disease to be judged, it determines which information is required and which more basic information that information depends on, so that the information forms a progressive relationship from global to local.

The invention denotes a piece of image information by $\sigma$ and, following the bottom-up method, organizes the information required for different diseases into a set $I = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. The information represented by $\sigma_1$ to $\sigma_n$ is arranged in order from global to local, and every piece of information should have a corresponding representation in the answer space of the VQA model $N(\cdot)$; that is, when $N(\cdot)$ identifies that answer as the most probable, $N(\cdot)$ is taken to have obtained the corresponding information from image $V$. In particular, $\sigma_1$ should represent global information about the entire image.

Following this approach, one can analyse the information $I_w$ that doctors need to decide on a condition and, on that basis, the information $I_h$ that images of each modality can provide, and then take their intersection $I_{all} = I_w \cap I_h$. $I_{all}$ is the main pathological information that the final "problem-state tree" needs to contain.
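As a small illustration of $I_{all} = I_w \cap I_h$, the following Python sketch intersects the two information sets while keeping the global-to-local order of $I_w$. The helper name `intersect_ordered` and the example entries are hypothetical.

```python
from typing import Iterable, List


def intersect_ordered(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """I_all = I_w ∩ I_h, preserving the global-to-local order of I_w."""
    available_set = set(available)
    return [sigma for sigma in required if sigma in available_set]


# Hypothetical example: what the doctor needs (I_w) versus what an
# X-ray image can provide (I_h).
I_w = ["modality", "body part", "lobar consolidation", "blood cell count"]
I_h = ["modality", "body part", "lobar consolidation", "pleural effusion"]
I_all = intersect_ordered(I_w, I_h)
```

Information that the image cannot supply (here the blood cell count) drops out, as does information the image offers but the doctor did not ask for.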

Pathological state generation and combination unit

This unit integrates the pathological information $I_{all}$ analysed in the previous step into pathological states. Any pathological state is a superposition of several pieces of pathological information, and the more finely a pathological state is divided, the more pathological information it requires. From the pathological information obtained by the pathological information demand analysis unit, one can compute

$$s_k = \sum_{i=1}^{m} \sigma_{a_i}$$

(in the invention, addition between information and states is not addition in the strict mathematical sense but denotes integration and division of information: addition denotes integration and subtraction denotes division), where $1 \le a_i \le n$, $1 \le m \le n$, the indices $a_i$ are taken in increasing order, and $k = 1, 2, \ldots$, and thereby obtain the pathological state set $S = \{s_1, s_2, \ldots, s_z\}$. In particular, $s_1 = \sigma_1$ represents the initial state. More finely divided pathological states are often built on more macroscopic ones, so inclusion relationships between states can be found in $S$. Based on these inclusion relationships, the states can be organized into a tree: with $s_1$ as the root node, the remaining states $s_{i'} \in S$ are divided into several mutually disjoint finite sets $T_1, T_2, \ldots, T_l, \ldots, T_g$; each $T_l$ is itself a tree, and the nodes of $T_l$ other than its root can again be divided into several mutually disjoint finite sets, and so on. In particular, for a sub-tree $T_{m'}$ of $T_{n'}$, if $s_{m'} \in T_{m'}$, $s_{n'} \in T_{n'}$ and $s_{n'}$ is the root node of $T_{n'}$, then $s_{m'} - s_{n'} > 0$ should hold; that is, a node always contains more information than its ancestor nodes. Following this approach, a tree $T_{ST}$ can be constructed from the pathological state set $S$; the invention denotes this process $T_{ST} = \mathrm{Tree}(S)$, where $S$ is the set of pathological states, and calls $T_{ST}$ the "state tree". To accommodate later adjustments to the tree structure, all leaf nodes are designated feature state nodes and all branch nodes flag state nodes.
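One way to picture $\mathrm{Tree}(S)$ is to model each state as a set of information items and to attach every state to the largest other state it strictly contains. The Python sketch below makes exactly that assumption; the set-based representation and the names `build_state_tree` and `node_kinds` are illustrative choices, not the patent's implementation.

```python
from typing import Dict, FrozenSet, List, Optional

State = FrozenSet[str]  # assumption: a state is the set of information it integrates


def build_state_tree(states: List[State]) -> Dict[State, Optional[State]]:
    """Map each state to its parent, using inclusion between states."""
    root = min(states, key=len)                      # s_1, the initial state
    parent: Dict[State, Optional[State]] = {root: None}
    for s in states:
        if s == root:
            continue
        ancestors = [t for t in states if t < s]     # states strictly contained in s
        if not ancestors:
            raise ValueError("state is not built on the initial state")
        parent[s] = max(ancestors, key=len)          # closest ancestor
    return parent


def node_kinds(parent: Dict[State, Optional[State]]) -> Dict[State, str]:
    """Leaf nodes are feature state nodes, branch nodes are flag state nodes."""
    branches = {p for p in parent.values() if p is not None}
    return {s: ("flag" if s in branches else "feature") for s in parent}


# Hypothetical states built by adding information to the initial state.
s1 = frozenset({"X-ray"})
s2 = s1 | {"chest"}
s3 = s2 | {"lobar consolidation"}
s4 = s2 | {"fracture"}
tree = build_state_tree([s1, s2, s3, s4])
kinds = node_kinds(tree)
```

Because a child is always a strict superset of its parent, the property that a node contains more information than its ancestors holds by construction.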

Pathological problem design unit

Starting from the state tree $T_{ST}$, this unit designs questions and inserts them into $T_{ST}$ to obtain the infrastructure $T_{QS}$ of the "problem-state tree". In the invention, the state tree represents a process of continuously refining states from top to bottom; each state transition from a parent node to a child node requires a question as its medium, and the answer to each question can lead to several different state transitions. For a parent state node that has several child state nodes, the set of child state nodes is partitioned into blocks according to their characteristics, and for each block a question $q_a$ with candidate answer set $A_{pa}$ is generated such that, for $A_a = N(V, q_a, A_{pa})$, there always exists a child state node in the block equal to the parent state plus $A_a$. To further improve the accuracy of the VQA model, $q_a$ should as far as possible be generated from the information of the child state nodes in the block, with additional constraining words added where appropriate, so that the VQA model can more easily find the correct result when searching the answer space. Repeating these steps for all non-leaf nodes (i.e., parent state nodes) of the state tree $T_{ST}$ generates a question set $Q_u$; this process can be written $Q_u = \mathrm{Question}(T_{ST})$. Inserting each problem node between the corresponding state nodes forms the infrastructure $T_{QS}$ of the "problem-state tree" data structure defined in the invention. In the case above, for example, if $A_a = N(V, q_a, A_{pa})$ yields a child state node equal to the parent state plus $A_a$, then $q_a$ becomes a child node of the parent state node, and that child state node becomes a child node of $q_a$. The infrastructure of the "problem-state tree" can be expressed as $T_{QS} = \{Q_u, S\}$, where $Q_u$ is the set of questions and $S$ is the set of pathological states. The root node and the leaf nodes of $T_{QS}$ are all state nodes; the child nodes of a state node are necessarily problem nodes, and the child nodes of a problem node are necessarily state nodes.
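The alternating layout of state nodes and problem nodes can be sketched with two small Python data classes. The class and field names (`StateNode`, `ProblemNode`, `insert_problem`) and the example question are hypothetical; the sketch only assumes that each candidate answer of a problem node leads to exactly one child state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StateNode:
    info: str                                        # information added at this state
    problems: List["ProblemNode"] = field(default_factory=list)


@dataclass
class ProblemNode:
    question: str                                    # question text q_a
    children: Dict[str, StateNode] = field(default_factory=dict)  # answer -> child state

    @property
    def candidates(self) -> List[str]:
        """Candidate answer set A_pa passed to N(V, q_a, A_pa)."""
        return list(self.children)


def insert_problem(parent: StateNode, question: str,
                   block: Dict[str, StateNode]) -> ProblemNode:
    """Insert one problem node between a parent state and one block of its child states."""
    node = ProblemNode(question, dict(block))
    parent.problems.append(node)
    return node


root = StateNode("X-ray image")
q = insert_problem(root, "Which body part does this X-ray image show?",
                   {"chest": StateNode("chest"), "hand": StateNode("hand")})
```

Typing the children of a state as problem nodes and the children of a problem node as states enforces the alternation described above.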

Tree-type structure optimizing unit

To maximize the self-checking capability of a model built with this technique, the "problem-state tree" infrastructure $T_{QS}$ can be processed further. Each sub-tree $T_{QSx}$ of the infrastructure is contained in $T_{QS}$, and all state nodes in $T_{QSx}$ are built on its root node $s_x$, i.e., they all contain the information provided by $s_x$. The invention therefore regards all nodes in such a sub-tree as carrying the same label, which in practice can represent a certain type of disease that can be qualitatively determined. This is called a label set $L$, initially defined as $L = T_{QS}$, and the root node of $T_{QS}$ is called the label node of the label set $L$. From the inclusion relationships within the infrastructure $T_{QS}$, it follows that label sets have inclusion relationships as well. With the initial definitions $L_x = T_{QSx}$ and $L_y = T_{QSy}$, in the initial state $L_y \subseteq L_x$ and $T_{QSy} \subseteq T_{QSx}$ are necessary and sufficient conditions for each other. For a sub-label set $L_y$, suppose its corresponding $T_{QSy}$ directly divides $T_{QSx}$, where $s_x$ is the root node of $T_{QSx}$ and $s_y$ is the root node of $T_{QSy}$. Then, among all child problem nodes of $s_x$, the problem nodes that have only feature state nodes as children can be selected and, together with those feature state nodes, removed from $T_{QSx}$ while being retained in $T_{QSy}$ and treated as belonging to $s_y$; they must be placed immediately to the left of the corresponding parent problem node. By executing this $\mathrm{Label}(\cdot)$ operation, the invention realizes a process of first checking symptoms and then making a decision, which strengthens the logic of the generated text and helps the doctor judge whether the diagnosis given by the model is reasonable. Note that the label set $L_x$ finally obtained is not identical to $T_{QSx}$, so the operation is written $L = \mathrm{Label}(T_{QS})$. The invention distinguishes the two because the infrastructure $T_{QS}$ describes the tree structure actually obtained, whereas the label set $L$ describes the logical meaning of that structure: the former is more useful for the subsequent checking and text generation, and the latter helps the designer of the "problem-state tree" understand the design.

Auxiliary information inserting unit

After the tree structure has been optimized, several auxiliary nodes can be inserted into the label set $L$ as required. Auxiliary nodes are of two types. The first type further mines the information contained in existing feature state nodes; the invention calls these feature state expansion nodes. The second type holds treatment advice, follow-up examination advice and the like for the condition to which the current label belongs; these are called label supplement nodes. A feature state expansion node becomes a child of the corresponding feature state node and may contain several problem nodes and several feature state nodes, used to further refine features that were not fully mined before. A label supplement node is a child of the corresponding label node and must be its leftmost child, so that after the model makes the corresponding diagnosis, the generated report can give suitable advice for specific findings and assist the doctor's further diagnosis and treatment. Because a feature state expansion node is essentially a combination of problem nodes and feature state nodes, its problem nodes can be added to the question set $Q_u$ and its feature state nodes to the pathological state set $S$. Label supplement nodes introduce medical information not previously present, so they are collected in a label supplement node set $E$, and the information contained in each label supplement node $t$ is denoted $\epsilon$. In summary, the "problem-state tree" finally constructed by the invention can be expressed as $T_{QSE} = \{Q_u, S, E\}$.

Information automatic extraction module

After the "problem-state tree" $T_{QSE} = \{Q_u, S, E\}$ has been constructed, it can be searched to automatically generate an information tree $T_{IN}$. If the "problem-state tree" is a generic template, the information tree is a concrete instance of that template. The information tree consists of several information nodes; here the information $\sigma_b$ contained in a node is used directly to refer to that node. Define $T_{IN} = \{I_s, I_E\}$, where $I_s$ collects the information nodes obtained from state nodes, each being the difference between a state and its parent state as in the construction below, and $I_E = \{\epsilon_1, \epsilon_2, \ldots, \epsilon_h\}$ collects the information copied from label supplement nodes.

The construction principle is as follows. For a given image and a given "problem-state tree" $T_{QSE} = \{Q_u, S, E\}$, suppose the system is currently in state $s_t \in S$, $q_v$ is a child problem node of $s_t$ that has not yet been traversed, $s_v \in S$ is a child state node of $q_v$, and $\sigma_v = s_v - s_t$. If $s_v = s_t + A_v$, where $A_v = N(V, q_v, A_{pv})$, i.e., $\sigma_v = A_v$, then the information $\sigma_v$ contained in node $s_v$ is copied into the information tree as a child of the information node $\sigma_t$ corresponding to $s_t$, the system state changes to $s_v$, and so on. If a label supplement node $t$ is encountered during the traversal, its information $\epsilon$ is copied directly into the information tree as a child of the most recently generated node. If the current state has no untraversed child problem node, the system returns to the previous state. Note that when information nodes are inserted into the information tree, sibling nodes are arranged from left to right in insertion order.

The process of generating the information tree resembles a pre-order traversal of a tree, but differs in that only state nodes and label supplement nodes are kept, following the overall running logic of the program, to generate the information tree. Problem nodes are responsible for the logical guidance of the program: once a problem node determines that the program state moves to the state indicated by one answer, the other states are automatically ignored. This design takes the characteristics of different diseases into account and prevents the generated text from containing too much irrelevant information.
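The extraction procedure can be sketched as a recursive traversal in Python. The sketch assumes the modified VQA model is available as a callable `ask(question, candidates)` that returns one candidate answer for the current image; the class names, the `advice` field used for label supplement information, and the canned answers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    info: str                                        # descriptive text of this state
    problems: List["Problem"] = field(default_factory=list)
    advice: Optional[str] = None                     # label supplement information, if any


@dataclass
class Problem:
    question: str
    answers: Dict[str, Optional[State]]              # candidate answer -> child state (None: no transition)


@dataclass
class Info:
    text: str
    children: List["Info"] = field(default_factory=list)


def extract(state: State, ask: Callable[[str, List[str]], str]) -> Info:
    """Build the information tree for one image from a problem-state tree."""
    node = Info(state.info)
    if state.advice is not None:                     # label supplement node is the leftmost child
        node.children.append(Info(state.advice))
    for p in state.problems:                         # children keep left-to-right insertion order
        answer = ask(p.question, list(p.answers))    # stands in for A_v = N(V, q_v, A_pv)
        child = p.answers.get(answer)
        if child is not None:                        # states of the other answers are ignored
            node.children.append(extract(child, ask))
    return node


# Hypothetical problem-state tree and canned model answers.
pneumonia = State("pneumonia is diagnosed", advice="bed rest is recommended")
chest = State("the chest is imaged",
              [Problem("Is there lobar consolidation?", {"yes": pneumonia, "no": None})])
root = State("this is an X-ray image",
             [Problem("Which body part is imaged?",
                      {"chest": chest, "hand": State("the hand is imaged")})])
canned = {"Which body part is imaged?": "chest", "Is there lobar consolidation?": "yes"}
tree = extract(root, lambda question, candidates: canned[question])
```

Problem nodes steer the traversal but leave no node of their own in the result, which matches the description above.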

Structured information integration module

After the information tree $T_{IN}$ is obtained, because it has the basic properties of a tree data structure, a pre-order traversal can be performed directly on $T_{IN}$ and all extracted information concatenated into text, completing the generation of the medical image diagnosis report.
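A minimal Python sketch of this integration step follows. It assumes each information node already stores a ready-made descriptive phrase; the `Info` class, the comma separator and the example tree are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Info:
    text: str                                        # preset descriptive phrase
    children: List["Info"] = field(default_factory=list)


def preorder(node: Info) -> Iterator[str]:
    """Yield the node's own text first, then its children from left to right."""
    yield node.text
    for child in node.children:
        yield from preorder(child)


def make_report(root: Info) -> str:
    """Concatenate all extracted information into one report string."""
    return ", ".join(preorder(root)) + "."


# Hypothetical information tree.
tree = Info("this is an X-ray image", [
    Info("the chest is imaged", [
        Info("there is lobar consolidation"),
        Info("pneumonia is diagnosed", [Info("bed rest is recommended")]),
    ]),
])
report = make_report(tree)
```

Because findings sit to the left of the diagnosis they support, the pre-order output lists symptoms first, then the diagnosis, then the advice.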

The advantages and beneficial effects of the invention are as follows:

1. The invention provides a structured medical image report generation technique that helps bring VQA technology into practical use in medicine.

2. The invention designs a "problem-state tree" in which states are generated from the information itself and questions are then generated from the states. This fully exploits the association between questions and states, reduces the blindness and randomness of question organization in visual question answering, and helps doctors obtain more accurate and practical information.

3. The invention designs an algorithm that generates an information tree from the "problem-state tree". With little human intervention, it helps extract the various kinds of information in a medical image completely and store and represent them in structured form, which facilitates subsequent comprehensive processing of the information.

4. The "problem-state tree" is simple and flexible in organization, can be changed according to specific requirements and is easy to construct, so a complete medical image diagnosis system can be built in a short time, reducing the training time cost of artificial intelligence algorithms in application.

5. The "problem-state tree" has an inherent logical rule. First, this rule is reflected in the text generated when the whole tree is traversed, so the text is more logical and matches human reading habits; a doctor can also judge from the logic reflected in the text whether the model's decision is correct, which improves the reliability of the model. Second, the inclusion, causal and other relationships in the rule are highly compatible with knowledge graphs, so in implementation the "problem-state tree" can be generated directly by converting a medical knowledge graph, which makes the system convenient to build.

6. The "problem-state tree" is highly extensible: many auxiliary nodes containing diagnostic and treatment advice can be added, making the final text report richer and more helpful for doctors' decisions.

Drawings

FIG. 1 is a schematic diagram of the "problem-state tree" infrastructure of the present invention;

FIG. 2 is a schematic diagram of a process for optimizing a "problem-state tree" structure in accordance with the present invention;

FIG. 3 is a schematic diagram of a final "problem-state tree" structure of the present invention;

FIG. 4 is a schematic diagram of an "information tree" generated by the present invention and a diagnostic report generated;

FIG. 5 is a schematic diagram of the execution logic of the VQA model as modified by the present invention.

Detailed Description

The invention is further described below with reference to specific embodiments.

Step 1, VQA model design and transformation, is implemented as follows:

A currently mature medical image VQA model can be selected as the base model $M(\cdot)$. Its output stage is then changed so that, instead of outputting the result with the highest probability overall, it outputs the result with the highest probability among the candidate options, yielding the VQA model $N(\cdot)$. To meet this requirement, the model is extended so that it accepts an image, questions and candidate sets together: one image corresponds to many questions, and each question corresponds to one candidate set. If the result with the highest overall probability is not in the candidate set, the final result for that question is marked to alert the doctor that the model may not have obtained the best information for that question. The structure after modification is shown in FIG. 5.

Step 2, "problem-state tree" design, is implemented as follows:

The design of the "problem-state tree" should be entrusted to doctors with a professional medical background. After the pathological information demand analysis is completed, in theory the information in the intersection $I_{all}$ is arranged logically from global to local, increasing subsequences are taken and summed to obtain the state set $S$, and a "state tree" is constructed from the hierarchical relationships among the states.

In practice, to simplify construction of the "state tree", one can exploit the similarity between the logical structure of the "problem-state tree" and a knowledge graph and generate the tree directly from a medical knowledge graph. Disease entities in the knowledge graph have inclusion and subdivision relationships and can be used directly as flag state nodes of the state tree, while the attributes of the entities correspond to feature state nodes in the "problem-state tree". With this goal, based on the $I_w$ obtained from the pathological information demand analysis, trunk nodes are selected from the medical knowledge graph as root nodes of sub-state trees, and depth-first traversal of the graph yields several spanning trees; then, according to the $I_h$ obtained from the pathological information demand analysis, they are combined into a complete state tree, so that a prototype "state tree" can be built quickly. On this basis, the nodes generated from the selected entities and attributes can be pruned manually to obtain the required "state tree".
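The depth-first step can be sketched in Python as follows. The knowledge graph is assumed to be a plain adjacency mapping from each entity to its sub-entities and attributes; the function name `dfs_spanning_tree` and the tiny example graph are hypothetical and carry no clinical authority.

```python
from typing import Dict, List


def dfs_spanning_tree(graph: Dict[str, List[str]], root: str) -> Dict[str, List[str]]:
    """Depth-first spanning tree of `graph` from `root`, as parent -> children."""
    tree: Dict[str, List[str]] = {root: []}

    def visit(node: str) -> None:
        for nxt in graph.get(node, []):
            if nxt not in tree:                      # skip entities already reached
                tree[nxt] = []
                tree[node].append(nxt)
                visit(nxt)

    visit(root)
    return tree


# Hypothetical fragment of a medical knowledge graph.
graph = {
    "pneumonia": ["pneumococcal pneumonia", "legionella pneumonia", "lobar consolidation"],
    "pneumococcal pneumonia": ["air bronchogram", "lobar consolidation"],
    "legionella pneumonia": ["pleural effusion"],
}
tree = dfs_spanning_tree(graph, "pneumonia")
# Leaves of the spanning tree play the role of feature state nodes,
# branch nodes the role of flag state nodes.
feature_nodes = [n for n, kids in tree.items() if not kids]
```

An attribute reachable along several paths is attached only where the traversal first meets it, which is one reason the manual pruning and review described above remains necessary.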

After the "state tree" is obtained, questions are designed according to the pathological problem design method described above and inserted into the "state tree" to obtain the "problem-state tree" infrastructure. FIG. 1 is a schematic diagram of such an infrastructure, using an X-ray image as an example.

On this basis, the infrastructure is optimized and completed in turn. Label supplement nodes can be obtained entirely from the medical knowledge graph, so in practice these auxiliary nodes can be retained during pruning. FIG. 2 shows, taking the infrastructure of FIG. 1 as an example, a schematic diagram of optimizing the "problem-state tree".

After the above steps are completed, the final "problem-state tree" is obtained; the result of the optimization shown in FIG. 2 is shown in FIG. 3.

Step 3, automatic information extraction, is implemented as follows:

Step 3 performs the conversion from the "problem-state tree" to the "information tree". Since the conversion procedure has been described in detail above, the emphasis here is on how information is stored and represented in the "information tree". Because the "information tree" is converted from the "problem-state tree" and the diagnosis report is generated from it in step 4, the way information is expressed in all three must be unified. To facilitate report generation in step 4, each information node should store a descriptive phrase or sentence about a property or quantity in the image. An example "information tree" is shown in the upper half of FIG. 4; it is the result obtained by applying step 3 to the "problem-state tree" of FIG. 3 and is given for reference. To achieve this in practice, the state of each state node should be analysed while designing the "problem-state tree" and the descriptive phrases or sentences preset, so that the traversal can directly extract information expressed as text and the diagnosis report can finally be generated automatically.

Step 4, structured information integration, is implemented as follows:

Starting from the root node of the "information tree", the whole tree is traversed in pre-order and the information of each information node is extracted as one sentence of the diagnostic conclusion. Keywords can be preset at key positions, for example a preset description in the information node "X-ray image", to make the generated report read more smoothly. The lower half of FIG. 4 shows the report generated from the "information tree" in the upper half. The final diagnosis report, given for reference, reads: "This is an X-ray image, the front of the chest is imaged, there is lobar consolidation, there is a pathological abnormality in the right upper lung, there is thickening of the lung texture, pneumonia is diagnosed, bed rest and plenty of fluids are recommended, there is an air bronchogram sign, pneumococcal pneumonia is diagnosed, penicillin medication is recommended, there is no pleural effusion, it is not legionella pneumonia; there is a fracture, bed rest and joint immobilization are recommended".

The above embodiments describe the implementation of the invention in further detail, but the invention is not limited to these examples; changes, modifications, additions or substitutions made by those skilled in the art within the technical scope of the invention also fall within its scope of protection.

Claims

1. A medical image report structured generation method based on visual question and answer, characterized by comprising a VQA model design and transformation module, a "problem-state tree" design module, an information automatic extraction module, and a structured information integration module;

the VQA model design and transformation module is specifically realized as follows:

first, a VQA model is constructed: given any medical image V and question text Q as input, the model outputs a single definite text as the answer A; for a VQA model M(·), this gives A = M(V, Q);

the traditional VQA model is then specially adapted: the adapted VQA model N(·) takes as input a given medical image V, a given question text Q, and a preset candidate answer set $A_p = \{A_1, A_2, \ldots, A_c\}$, and selects the most appropriate answer from the given candidate answer set, which can be written as $A = N(V, Q, A_p)$. The selection is based on the probability that the traditional VQA model assigns to each answer: the candidate with the highest probability is selected as the output A, and if the answer finally selected is not the globally optimal answer given by the model, this decision is marked in the "problem-state tree" as a warning;
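The candidate-restricted selection can be sketched as follows, assuming the traditional VQA model exposes one probability per answer in its answer space. The function name select_answer and the dictionary representation of the model output are hypothetical; how the probabilities are actually obtained depends on the VQA model used.

```python
from typing import Dict, Sequence, Tuple


def select_answer(probs: Dict[str, float],
                  candidates: Sequence[str]) -> Tuple[str, bool]:
    """Pick the most probable answer among the preset candidates.

    probs      -- probability of every answer in the model's answer space
                  (assumed output of the traditional VQA model)
    candidates -- preset candidate answer set A_p
    Returns (answer, warn); warn is True when the chosen candidate is not
    the globally most probable answer, so the decision can be flagged.
    """
    allowed = [a for a in candidates if a in probs]
    if not allowed:
        raise ValueError("no candidate answer is in the model's answer space")
    best_candidate = max(allowed, key=probs.__getitem__)
    global_best = max(probs, key=probs.__getitem__)
    return best_candidate, best_candidate != global_best


# Hypothetical probabilities for one image/question pair.
probs = {"yes": 0.5, "no": 0.3, "left lung": 0.2}
print(select_answer(probs, ["no", "left lung"]))
print(select_answer(probs, ["yes", "no"]))
```

In the first call the globally most probable answer is outside the candidate set, so the returned flag is set; this corresponds to the warning mark placed in the "problem-state tree".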

the problem-state tree design module comprises five parts, namely a pathological information demand analysis unit, a pathological state generation and combination unit, a pathological problem design unit, a tree structure optimization unit and an auxiliary information insertion unit;

the information automatic extraction module is used for: after the construction of the problem-state tree is completed, searching is carried out in the problem-state tree, and an information tree is automatically generated;

the structured information integration module is used for: after the information tree is obtained, because the information tree has the basic characteristics of a tree data structure, a pre-order traversal can be performed directly on the information tree $T_{IN}$, and all extracted information is concatenated into text, finally completing the generation of the medical image diagnosis report.

2. The visual question-answering based medical image report structured generation method according to claim 1, wherein the pathological information demand analysis unit is specifically implemented as follows:

image information is denoted $\sigma$, and the information set required for different diseases is constructed by a bottom-up method as $I = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$. The information represented by $\sigma_1$ through $\sigma_n$ is arranged in global-to-local order, and every item has a corresponding representation in the answer space of the VQA model N(·); that is, when N(·) determines that an answer has the highest probability, N(·) is taken to have acquired the corresponding information from the image V. In particular, $\sigma_1$ should represent global information about the entire image;

following this idea, the information $I_w$ that doctors need in order to make a decision about the illness is analyzed; on this basis, the information $I_h$ that images of the different modalities can each provide is further analyzed; and the intersection of the two is taken, $I_{all} = I_w \cap I_h$. $I_{all}$ is the main pathological information that the final "problem-state tree" needs to contain.
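As a small worked example of the intersection $I_{all} = I_w \cap I_h$, the sketch below uses Python sets. The variable names and the individual information items are hypothetical and chosen only to show the operation.

```python
# Information a doctor needs for the decision (I_w); items are hypothetical.
needed_by_doctor = {"modality", "body part", "consolidation", "patient history"}

# Information an image of the given modality can provide (I_h); hypothetical.
available_from_image = {"modality", "body part", "consolidation", "bone density"}

# I_all: information that is both needed and obtainable from the image.
i_all = needed_by_doctor & available_from_image
print(sorted(i_all))
```

Items that the doctor needs but the image cannot provide, and items the image provides but the doctor does not need, are both excluded from the tree.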

3. The visual question-answering based medical image report structured generation method according to claim 2, wherein the pathological state generation and combination unit is specifically implemented as follows:

according to the pathological information $I_{all}$ obtained by the pathological information demand analysis unit, the pathological information $I_{all}$ is integrated into pathological states. Any pathological state is a superposition of several pieces of pathological information, and the more finely a pathological state is divided, the more pathological information it requires. The pathological state is calculated as follows:

[formula missing in the source]

where $1 \le a_i \le n$, $1 \le m \le n$, [condition missing in the source], and $k = 1, 2, \ldots$; this yields the pathological state set $S = \{s_1, s_2, \ldots, s_z\}$, in which the pathological state $s_1 = \sigma_1$ represents the initial state. A finer pathological state builds on a more macroscopic one, so inclusion relations can be found among the states in the pathological state set S. The states are then organized into a tree structure according to these inclusion relations: with $s_1$ as the root node, the remaining states $s_{i'} \in S$ are divided into several mutually disjoint finite sets $T_1, T_2, \ldots, T_l, \ldots, T_g$, and the nodes of each $T_l$ other than its root node can in turn be divided into several mutually disjoint finite sets [formula missing in the source], and so on. In particular, for [formula missing in the source], if there are $s_{m'} \in T_{m'}$ and $s_{n'} \in T_{n'}$, and $s_{n'}$ is [text missing in the source] of $T_{n'}$, then $s_{m'} - s_{n'} > 0$ should hold, i.e., a node always contains more information than its ancestor node. Following this idea, a tree $T_{ST}$ is constructed from the pathological state set S; this process is denoted $T_{ST} = \mathrm{Tree}(S)$, where S is the pathological state set and $T_{ST}$ is called the "state tree". To accommodate later adjustments to the tree structure, all leaf nodes are designated feature state nodes and all branch nodes are designated flag state nodes.
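One way to picture the Tree(S) operation is sketched below. It assumes each pathological state is modelled as a frozenset of information items and that the parent of a state is the largest other state that is a proper subset of it; both are assumptions made for illustration, as are the names build_state_tree and node_kind and the example states.

```python
from typing import Dict, FrozenSet, Iterable, List

State = FrozenSet[str]


def build_state_tree(states: Iterable[State]) -> Dict[State, List[State]]:
    """Map every state to its child states using the inclusion relation.

    Assumption: each non-root state has exactly one largest proper subset
    among the other states, which becomes its parent.
    """
    ordered = sorted(set(states), key=len)          # root (smallest state) first
    children: Dict[State, List[State]] = {s: [] for s in ordered}
    for s in ordered[1:]:
        parents = [p for p in ordered if p < s]     # proper subsets of s
        if not parents:
            raise ValueError("state is not built on the root state")
        children[max(parents, key=len)].append(s)
    return children


def node_kind(children: Dict[State, List[State]], s: State) -> str:
    """Leaf nodes are feature state nodes, branch nodes are flag state nodes."""
    return "feature" if not children[s] else "flag"


# Hypothetical states: each one adds information to a more general state.
root = frozenset({"X-ray image"})
chest = root | {"front chest"}
consolidation = chest | {"consolidation"}
no_effusion = chest | {"no pleural effusion"}

tree = build_state_tree([consolidation, root, no_effusion, chest])
print(node_kind(tree, chest), node_kind(tree, consolidation))
```

With this representation the requirement that a node always contains more information than its ancestors corresponds to the set difference between a node and any ancestor being non-empty.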

4. The visual question-answering based medical image report structured generation method according to claim 3, wherein the pathological problem design unit is specifically implemented as follows:

starting from the state tree $T_{ST}$, questions are designed and inserted into the state tree $T_{ST}$, finally yielding the infrastructure $T_{QS}$ of the "problem-state tree". From top to bottom, the state tree $T_{ST}$ presents a process of continuous refinement of states; every state transition from a parent node to a child node must be mediated by a question, and the answer to each question can lead to several different possible state transitions. For a parent state node [symbol missing in the source] that has several child state nodes [symbols missing in the source], a question $q_a$ is generated for the parent state node such that, for $A_a = N(V, q_a, A_{pa})$, there always exists [formula missing in the source] such that [formula missing in the source] holds;

to further improve the accuracy of the VQA model, $q_a$ should as far as possible be generated from the information of the parent state node [symbol missing in the source], with constraining qualifiers added, so that the VQA model can more easily find the correct result when searching the answer space. The above steps are repeated for all non-leaf nodes of the state tree $T_{ST}$ to generate the question set [formula missing in the source]; this process is denoted $Q_u = \mathrm{Question}(T_{ST})$;

inserting each problem node between its corresponding state nodes forms the infrastructure $T_{QS}$ of the defined data structure, the "problem-state tree". The infrastructure of the "problem-state tree" is written $T_{QS} = \{Q_u, S\}$, where $Q_u$ is the question set and S is the pathological state set. Here, the root node and the leaf nodes of $T_{QS}$ are all state nodes, the child nodes of a state node are necessarily problem nodes, and the child nodes of a problem node are necessarily state nodes.
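The alternating structure of $T_{QS}$ can be illustrated with a minimal sketch. It assumes one problem (question) node per non-leaf state node and a caller-supplied function that phrases the question from the parent state's information; the names Node, insert_questions, and is_well_formed are hypothetical, and the question template is only an example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    kind: str                      # "state" or "question"
    label: str
    children: List["Node"] = field(default_factory=list)


def insert_questions(state: Node, make_question: Callable[[str], str]) -> Node:
    """Return a copy of a state tree with a question node placed between
    every non-leaf state node and its child state nodes."""
    if not state.children:
        return Node("state", state.label)
    question = Node(
        "question",
        make_question(state.label),            # question built from parent info
        [insert_questions(c, make_question) for c in state.children],
    )
    return Node("state", state.label, [question])


def is_well_formed(node: Node) -> bool:
    """Check that state and question nodes alternate and leaves are states."""
    if node.kind == "question" and not node.children:
        return False
    expected = "question" if node.kind == "state" else "state"
    return all(c.kind == expected and is_well_formed(c) for c in node.children)


# Hypothetical state tree and question template.
state_tree = Node("state", "X-ray image", [
    Node("state", "front chest", [
        Node("state", "consolidation"),
        Node("state", "no pleural effusion"),
    ]),
])
t_qs = insert_questions(state_tree, lambda info: f"Given {info}, what is observed?")
print(is_well_formed(t_qs))
```

The child state nodes of each question node double as its candidate answers, which is what lets an answer from the VQA model drive the transition to the next state.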

5. The visual question-answering based medical image report structuring generation method according to claim 4, wherein the tree structure optimizing unit is specifically implemented as follows:

to give the tree good self-checking capability, the "problem-state tree" infrastructure $T_{QS}$ is processed further. For each sub-"problem-state tree" infrastructure [symbol missing in the source], there is [formula missing in the source], and all state nodes contained in [symbol missing in the source] are built on the basis of its root node [symbol missing in the source], i.e., these state nodes all contain the information provided by the root node [symbol missing in the source]. Therefore, all nodes contained in [symbol missing in the source] are regarded as having the same label, which in actual use represents a certain type of disease that can be determined qualitatively. This is referred to here as a label set L, initially defined as $L = T_{QS}$, and the root node of $T_{QS}$ is designated the label node of the label set L. From the inclusion relations within the "problem-state tree" infrastructure $T_{QS}$ it follows that inclusion relations also exist between label sets: with the initial definitions $L_x = T_{QSx}$ and $L_y = T_{QSy}$, in the initial state [formula missing in the source] and [formula missing in the source] are necessary and sufficient conditions for each other. For a sub-label set $L_y$, suppose its corresponding $T_{QSy}$ can directly divide $T_{QSx}$, i.e., the root node $s_x$ of $T_{QSx}$ is [relation missing in the source] of the root node $s_y$ of $T_{QSy}$; then, among the child problem nodes of $s_x$, those that contain only child feature state nodes are selected and removed, together with those child feature state nodes, from $T_{QSx}$, while still being retained in $T_{QSy}$ as [relation missing in the source] of $s_y$, placed immediately to the left of the parent problem node of $s_x$. Executing this Label(·) operation realizes the procedure of examining the symptoms first and then reaching a conclusion, which strengthens the logic of the generated text and helps the doctor judge whether the diagnosis given by the model is reasonable. Note that the resulting label set $L_x$ is no longer identical to $T_{QSx}$, so the result can be written as $L = \mathrm{Label}(T_{QS})$.

6. The visual question-answering based medical image report structuring generation method according to claim 5, wherein the auxiliary information inserting unit is specifically implemented as follows:

after the tree structure has been optimized, several auxiliary nodes are inserted into the label set L according to specific requirements. Auxiliary nodes are of two types: one type further mines the information contained in existing feature state nodes and is called a feature state expansion node; the other type gives treatment advice and follow-up examination advice for the disease to which the current label belongs and is called a label supplement node. A feature state expansion node should become a child node of the corresponding feature state node and may comprise several problem nodes and several feature state nodes, in order to further refine features that were not fully mined before. A label supplement node serves as a child node of the corresponding label node and must be placed leftmost, so that once the model has made the corresponding diagnosis, appropriate advice on specific findings can be given in the generated diagnostic report, assisting doctors in their further diagnosis and treatment work. Because a feature state expansion node is essentially a combination of problem nodes and feature state nodes, the problem nodes in it can be added to the question set $Q_u$ and the feature state nodes in it can be added to the pathological state set S. A label supplement node, by contrast, introduces medical information not previously present, so these nodes are recorded as the label supplement node set E, and the information contained in each label supplement node t is denoted $\epsilon$. The finally constructed "problem-state tree" can thus be written as $T_{QSE} = \{Q_u, S, E\}$.

7. The visual question-answering based medical image report structured generation method according to claim 6, wherein the information automatic extraction module is specifically implemented as follows:

after the construction of the "problem-state tree" $T_{QSE} = \{Q_u, S, E\}$ is completed, a search is performed in the "problem-state tree" $T_{QSE}$ and the information tree $T_{IN}$ is generated automatically. The information tree consists of several information nodes; here the information $\sigma_b$ contained in a node is used directly to refer to that information node. Define $T_{IN} = \{I_s, I_E\}$, [definition missing in the source], $I_E = \{\epsilon_1, \epsilon_2, \ldots, \epsilon_h\}$, satisfying: for [formula missing in the source], [formula missing in the source] holds.

The construction principle is as follows: for a given picture and a given "problem-state tree" $T_{QSE} = \{Q_u, S, E\}$, suppose the system is currently in state $s_t \in S$, $q_v$ is a child problem node of $s_t$ that has not yet been traversed, $s_v \in S$ is a child state node of $q_v$, and $\sigma_v = s_v - s_t$. If at the same time $s_v = s_t + A_v$, where $A_v = N(V, q_v, A_{pv})$, i.e., $\sigma_v = A_v$ is satisfied, then the information $\sigma_v$ contained in node $s_v$ is copied into the information tree as a child node of the information node $\sigma_t$ corresponding to $s_t$, the system state is changed to $s_v$, and so on. If a label supplement node t is encountered during the traversal, its information $\epsilon$ is copied directly into the information tree as a child node of the most recently generated node. If the current state has no child problem nodes that have not yet been traversed, the system returns to the previous state. Note that when information nodes are inserted into the information tree, multiple child nodes should be arranged from left to right in order of insertion.
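The construction principle can be sketched as a depth-first search, under several stated assumptions: the VQA model N(·) is passed in as a plain function, each state node stores only the information it adds to its parent, label supplement texts are stored on the state node they belong to, and recursion stands in for "returning to the previous state". All class and function names, the stand-in model, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class StateNode:
    info: str                                   # sigma added relative to the parent
    questions: List["QuestionNode"] = field(default_factory=list)
    supplements: List[str] = field(default_factory=list)   # epsilon texts


@dataclass
class QuestionNode:
    text: str
    answers: List[StateNode]                    # child states = candidate answers


@dataclass
class InfoNode:
    text: str
    children: List["InfoNode"] = field(default_factory=list)


Vqa = Callable[[object, str, Sequence[str]], str]   # stand-in for N(V, q, A_p)


def extract(state: StateNode, image: object, vqa: Vqa) -> InfoNode:
    """Build the information subtree for one state of the problem-state tree."""
    node = InfoNode(state.info)
    for eps in state.supplements:               # supplements are inserted first (leftmost)
        node.children.append(InfoNode(eps))
    for question in state.questions:            # visit each untraversed question once
        candidates = [s.info for s in question.answers]
        answer = vqa(image, question.text, candidates)
        for child in question.answers:
            if child.info == answer:            # sigma_v == A_v: follow this transition
                node.children.append(extract(child, image, vqa))
    return node                                  # no questions left: back to previous state


# Hypothetical tree and a fake model that always picks the first candidate.
pneumonia = StateNode("pneumonia is diagnosed", supplements=["bed rest is recommended"])
normal = StateNode("no abnormality is found")
root_state = StateNode("this is an X-ray image",
                       [QuestionNode("Is there lung consolidation?", [pneumonia, normal])])

info_tree = extract(root_state, None, lambda image, q, cands: cands[0])
print(info_tree)
```

Child nodes are appended in the order in which they are generated, which matches the requirement that information nodes be arranged from left to right in insertion order.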