CN117370933A - Multi-mode unified feature extraction method, device, equipment and medium

Info

Publication number
CN117370933A
Authority
CN
China
Prior art keywords
feature extraction
mode
feature
medical data
data
Prior art date
Legal status
Granted
Application number
CN202311434500.1A
Other languages
Chinese (zh)
Other versions
CN117370933B (en)
Inventor
何昆仑 (He Kunlun)
赵亚威 (Zhao Yawei)
柳青河 (Liu Qinghe)
Current Assignee
Chinese PLA General Hospital
Original Assignee
Chinese PLA General Hospital
Priority date
Filing date
Publication date
Application filed by Chinese PLA General Hospital
Priority to CN202311434500.1A
Publication of CN117370933A
Application granted
Publication of CN117370933B
Legal status: Active

Classifications

    • G06F18/253 — Pattern recognition; analysing; fusion techniques of extracted features
    • G06F18/213 — Pattern recognition; analysing; feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F18/24 — Pattern recognition; analysing; classification techniques
    • G16H50/70 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining; for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention provides a multi-mode unified feature extraction method, device, equipment and medium, wherein the multi-mode unified feature extraction method comprises the following steps: acquiring multi-modal target medical data; extracting the target medical data with a feature extraction model selected according to the modality of the target medical data to obtain a first feature extraction result; complementing the missing features through the first feature extraction result to obtain a second feature extraction result; fusing the second feature extraction results to obtain fusion features; and identifying the fusion features to obtain a medical feature identification result of the target medical data. The beneficial effects of the invention are as follows: the multi-modal data are projected through a large model by expert-knowledge prompt engineering, the large model is induced to generate the corresponding features, and cyclic aggregation is performed, so that data of different modalities are fused into a unified representation, fusion and complementation of the medical multi-modal data are realized, and the stability and accuracy of feature extraction and identification of medical data are improved.

Description

Multi-mode unified feature extraction method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for extracting multi-mode unified features.
Background
Most multi-modal intelligent learning algorithms extract multi-modal data into a unified feature representation through a dedicated neural network and then execute downstream tasks on that representation. However, medical data typically exist in multiple modalities, and the absence of a data modality can degrade the accuracy of medical data identification.
Disclosure of Invention
The embodiment of the invention mainly aims to provide a multi-mode unified feature extraction method, device, equipment and medium, which realize fusion and complementation among medical multi-mode data and improve the stability and accuracy of feature extraction and identification of the medical data.
One aspect of the present invention provides a method for extracting multi-modal unified features, including:
acquiring target medical data according to a medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data;
according to the mode of the target medical data, extracting the target medical data by adopting a feature extraction model to obtain a first feature extraction result;
determining missing features in the target medical data according to the first feature extraction result, and complementing the missing features through the first feature extraction result to obtain a second feature extraction result;
fusing the second feature extraction results to obtain fusion features, wherein the fusion features represent the unified characterization of the target medical data;
and identifying the fusion characteristics to obtain a medical characteristic identification result of the target medical data.
The method for extracting the multi-mode unified feature according to the embodiment, wherein extracting the target medical data by using a feature extraction model according to the mode of the target medical data to obtain a first feature extraction result comprises the following steps:
and identifying and classifying the modes of the target medical data, and selecting a corresponding type of feature extraction model to perform feature extraction based on each classification to obtain a first feature extraction result of each mode.
The multi-mode unified feature extraction method according to the present invention, wherein determining missing features in the target medical data according to the first feature extraction result to obtain a second feature extraction result includes:
inducing, from the first feature extraction result of a first modality and the first feature extraction results of the modalities other than the first modality, an induced feature of the feature extraction model, wherein the first modality is any one modality of the multi-modal data;
if the first feature extraction result of the first modality cannot characterize the induced feature, complementing the induced feature of the first modality with the first feature extraction results of the other modalities, and determining the second feature extraction result; and determining the second feature extraction result according to the comparison of the first feature extraction result of the first modality and the first feature extraction results of the modalities other than the first modality with the induced feature.
The multi-mode unified feature extraction method according to the present invention, wherein determining the second feature extraction result according to the comparison of the first feature extraction result of the first modality and the first feature extraction results of the modalities other than the first modality with the induced feature includes:
if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are inconsistent with one another, taking the first feature extraction result of the first modality as the second feature extraction result;
and if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are consistent with one another, taking the first feature extraction results of the other modalities as the second feature extraction result.
The multi-mode unified feature extraction method according to the present invention, wherein the method further comprises:
determining a modality-missing prompt library and a question-answer library according to at least one of disease guidelines, treatment principles and expert discussions, wherein the modality-missing prompt library is used for detecting whether a modality is missing, and the question-answer library is used for determining the modality of medical data;
detecting the medical data by means of the question-answer library to determine a first modality, acquiring the feature vector of the first modality and the feature vectors of the other modalities, and complementing the feature vector of the first modality, when that modality is missing, by means of the modality-missing prompt library and the feature vectors of the other modalities.
According to the multi-mode unified feature extraction method, fusing the second feature extraction results to obtain the fusion features comprises:
performing cyclic aggregation on the first feature extraction results of all first modalities to obtain second feature extraction results that no longer change, wherein the cyclic aggregation serves to complement and associate the representative features;
and fusing the second feature extraction results that no longer change across all first modalities to obtain the fusion features.
Another aspect of the embodiments of the present invention provides a multi-mode unified feature extraction device, including:
the first module is used for acquiring target medical data according to the medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data;
the second module is used for extracting the target medical data by adopting a feature extraction model according to the mode of the target medical data to obtain a first feature extraction result;
a third module, configured to determine a missing feature in the target medical data according to the first feature extraction result, and complement the missing feature through the first feature extraction result to obtain a second feature extraction result;
a fourth module, configured to fuse the second feature result to obtain a fusion feature, where the fusion feature represents a unified representation of the target medical data;
and a fifth module, configured to identify the fusion feature, and obtain a medical feature identification result of the target medical data.
Another aspect of an embodiment of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device may read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method described previously.
The beneficial effects of the invention are as follows: clinical data covering multiple modalities such as images, text and structured data are projected through a large model by expert-knowledge prompt engineering, the large model is induced to generate the corresponding features, and cyclic aggregation is performed, so that data of different modalities are fused into a unified representation, fusion and complementation among medical multi-modal data are realized, and the stability and accuracy of feature extraction and identification of medical data are improved.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of a multimodal unified feature extraction system in accordance with an embodiment of the invention.
FIG. 2 is a flow chart of a method for multimodal unified feature extraction in accordance with an embodiment of the invention.
FIG. 3 is a schematic diagram of a missing feature completion and association flow in an embodiment of the invention.
Fig. 4 is a schematic diagram of a multi-modal feature unified feature extraction flow based on a modal missing library and a question-answer library according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a multi-modal feature fusion process according to an embodiment of the invention.
FIG. 6 is a diagram illustrating multi-modal unified feature extraction in accordance with an embodiment of the invention.
FIG. 7 is a schematic diagram of multi-modal unified feature extraction in the absence of a modality according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of an apparatus for multimodal unified feature extraction in accordance with an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or to elements having like or similar functions throughout. In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the description of the present invention and have no particular meaning in themselves; thus "module", "component" and "unit" may be used interchangeably. "First", "second" and the like are used only to distinguish technical features and are not to be construed as indicating or implying relative importance, the number of the indicated technical features, or their precedence. In the following description, consecutive numbering of the method steps is used to facilitate examination and understanding; provided the overall technical scheme and the logical relations among the steps are respected, adjusting the order of implementation of the steps does not affect the technical effects achieved. The embodiments described below by reference to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a multi-mode unified feature extraction system according to an embodiment of the present invention, which includes a terminal 100 and a server 200. The terminal 100 includes medical terminals, medical devices and the like, and is configured to generate medical data of different modalities, such as image data, text data and structured data. The server 200 obtains the multi-modal data and a feature extraction or identification request from the terminal 100, and acquires target medical data according to the medical feature extraction request, wherein the target medical data is multi-modal data whose modalities include multimedia data, text data and structured data; extracts the target medical data with a feature extraction model according to the modality of the target medical data to obtain a first feature extraction result; determines missing features in the target medical data according to the first feature extraction result, and complements the missing features through the first feature extraction result to obtain a second feature extraction result; fuses the second feature extraction results to obtain fusion features, wherein the fusion features represent the unified characterization of the target medical data; and identifies the fusion features to obtain a medical feature identification result of the target medical data.
Referring to fig. 2, fig. 2 is a flowchart of a multi-mode unified feature extraction method according to an embodiment of the invention, which may include, but is not limited to, steps S100-S500:
s100, acquiring target medical data according to a medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data.
And S200, extracting the target medical data by adopting a feature extraction model according to the mode of the target medical data to obtain a first feature extraction result.
In some embodiments, the modalities of the target medical data are identified and classified, and based on each classification, a feature extraction model of a corresponding type is selected for feature extraction, so as to obtain a first feature extraction result of each modality.
In some embodiments, the feature extraction model is a large model, that is, a model capable of processing complex, large-scale data and completing a variety of complex tasks.
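To make step S200 concrete, the following Python sketch shows one way the modality classification and per-modality model selection could be wired together; the record key names, the classify_modality heuristic and the stub extractor callables are illustrative assumptions, not part of the patent text.

```python
from typing import Any, Callable, Dict, List

def classify_modality(record: Dict[str, Any]) -> str:
    """Crude modality classifier; the key names are hypothetical."""
    if "pixels" in record:
        return "image"
    if "report_text" in record:
        return "text"
    return "structured"

def extract_first_results(
    records: List[Dict[str, Any]],
    extractors: Dict[str, Callable[[Dict[str, Any]], Dict[str, str]]],
) -> Dict[str, Dict[str, str]]:
    """Step S200: select the feature extraction model matching each record's
    modality and collect the first feature extraction result per modality."""
    first_results: Dict[str, Dict[str, str]] = {}
    for record in records:
        modality = classify_modality(record)
        first_results[modality] = extractors[modality](record)
    return first_results

# Toy usage with stubs standing in for the per-modality large models.
extractors = {
    "image": lambda r: {"lobe_status": "consolidation, right lower lobe"},
    "text": lambda r: {"symptoms": "fever, cough"},
    "structured": lambda r: {"inflammatory_status": "WBC elevated"},
}
records = [{"pixels": b"..."}, {"report_text": "..."}, {"wbc": 13.2}]
print(extract_first_results(records, extractors))
```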
S300, determining missing features in the target medical data according to the first feature extraction result, and complementing the missing features through the first feature extraction result to obtain a second feature extraction result.
In some embodiments, reference is made to the missing feature completion and association flow diagram shown in fig. 3, which includes, but is not limited to, the steps of:
s310, inducing a first feature extraction result of a first mode and a first feature extraction result corresponding to the first mode, so as to obtain a feature extraction model induction feature, wherein the first mode is any one of multi-mode data.
S320, if the first feature extraction result of the first mode cannot characterize the induced feature, complementing the induced feature of the first mode with the first feature extraction result except the first mode, and determining a second feature extraction result; and determining a second feature extraction result according to the first feature extraction result of the first modality, the first feature extraction result except for the first modality accident, and the comparison result of the induced features.
In some embodiments, taking the multi-modal data of lobar pneumonia as an example, in the feature extraction of the text modality, the case data are input together with the already extracted CT image features and routine blood features, inducing the large model to output the following features: "sex, age, symptoms, signs".
The following determination is then made: if the case data are insufficient to embody these features, the features are complemented through the extracted CT image features and routine blood features.
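A minimal sketch of this induction step follows; `llm` is a placeholder callable (prompt in, text out) rather than any specific model API, and the prompt wording is an assumption made for illustration.

```python
from typing import Callable, Dict

def induce_features(
    llm: Callable[[str], str],
    target_modality: str,
    first_results: Dict[str, str],
) -> str:
    """Prompt the large model with the target modality's first result plus the
    other modalities' results, inducing it to output features such as
    'sex, age, symptoms, signs' and to complete what the target lacks."""
    own = first_results.get(target_modality, "(missing)")
    others = {m: r for m, r in first_results.items() if m != target_modality}
    prompt = (
        f"Features extracted from the {target_modality} data: {own}\n"
        f"Features extracted from the other modalities: {others}\n"
        "Summarise the patient's sex, age, symptoms and signs. Where the "
        f"{target_modality} data are insufficient, complete the missing "
        "features from the other modalities."
    )
    return llm(prompt)

# Toy usage with an echo stub in place of a real large model.
print(induce_features(lambda p: p[:60] + "...", "text",
                      {"text": "male, 62", "image": "lobar consolidation"}))
```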
In some embodiments, if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are inconsistent with one another, the first feature extraction result of the first modality is used as the second feature extraction result;
and if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are consistent with one another, the first feature extraction results of the other modalities are used as the second feature extraction result.
Illustratively, taking the multi-modal data of lobar pneumonia as an example:
if the feature result extracted from the case data is inconsistent with the feature results induced from the CT image features and the routine blood features, and the CT-derived and blood-derived results are also inconsistent with each other, the feature result extracted from the case data is taken as the second feature extraction result;
if the feature result extracted from the case data is inconsistent with the feature results induced from the CT image features and the routine blood features, but the CT-derived and blood-derived results are consistent with each other, that consistent result is taken as the second feature extraction result.
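Expressed as code, the two rules reduce to a small consistency vote. This is a sketch of the comparison logic only, with string equality standing in for whatever feature-consistency test the embodiment actually uses.

```python
from typing import List

def resolve_second_result(own: str, others: List[str]) -> str:
    """Pick the second feature extraction result for one modality.
    `own` is the result extracted from that modality's data (e.g. case data);
    `others` are the results induced from the remaining modalities
    (e.g. CT image features and routine blood features)."""
    if own in others:
        return own  # at least one other modality agrees; no conflict
    others_agree = len(set(others)) == 1
    # Others agree with each other but not with us -> adopt their result;
    # others also disagree among themselves -> keep our own result.
    return others[0] if others_agree else own

print(resolve_second_result("age 62", ["age 65", "age 65"]))  # -> age 65
print(resolve_second_result("age 62", ["age 65", "age 70"]))  # -> age 62
```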
In some embodiments, referring to the multi-modal unified feature extraction flow diagram based on the modality-missing library and the question-answer library shown in fig. 4, the flow includes, but is not limited to, steps S330 to S340:
s330, determining a mode missing prompt library and a question-answer library according to at least one of disease guidelines, treatment principles and expert discussions, wherein the mode missing prompt library is used for detecting whether a mode is true, and the question-answer library is used for determining the mode of medical data;
s340, detecting the medical data by adopting a question-answer library, determining a first mode, acquiring a feature vector of the first mode and a feature vector except the first mode, and complementing the feature vector with the mode missing first mode by the mode missing prompt library and the feature vector except the first mode.
Illustratively, a question-answer library Q_i and a modality-missing prompt library L_i are constructed based on prior knowledge such as disease guidelines, treatment principles and expert discussions, and the feature vectors under the different modalities (written here as h_i^0) are randomly initialized. At the t-th cycle, the large model N_i corresponding to the modality of Q_i is selected, and the prompt-engineering data Q_i, together with the other modalities' feature vectors h_j^{t-1}, are extracted into the feature vector h_i^t. If a modality is missing, the large model instead invokes the missing-prompt library L_i and the other modalities' feature vectors to extract h_i^t. The cycle is repeated until the features converge, and the converged features are fused into a feature matrix W for downstream tasks. Here Q_i denotes the question-answer library constructed from prior knowledge such as disease guidelines, treatment principles and expert discussions; L_i denotes the modality-missing prompt library; i denotes a given modality; h_i^t denotes the features extracted from the modality-i data after t rounds of aggregation; and W denotes the feature matrix obtained by concatenating all modality features h_i.
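The loop just described can be sketched in Python as follows; the vector dimension, the convergence test (element-wise closeness) and the model call signatures are all assumptions made for illustration, with stub callables standing in for the per-modality large models N_i.

```python
from typing import Any, Callable, Dict
import numpy as np

def cyclic_aggregation(
    models: Dict[str, Callable[..., np.ndarray]],   # stand-ins for N_i
    qa_library: Dict[str, str],                     # Q_i per modality
    miss_library: Dict[str, str],                   # L_i per modality
    data: Dict[str, Any],                           # raw data, None if missing
    max_rounds: int = 10,
) -> np.ndarray:
    modalities = list(models)
    h = {i: np.random.randn(64) for i in modalities}  # random init, h_i^0
    for _ in range(max_rounds):
        h_new = {}
        for i in modalities:
            others = [h[j] for j in modalities if j != i]  # h_j^(t-1)
            if data.get(i) is None:
                # modality i is missing: fall back on the missing-prompt library L_i
                h_new[i] = models[i](miss_library[i], others)
            else:
                h_new[i] = models[i](qa_library[i], others, data[i])
        converged = all(np.allclose(h[i], h_new[i]) for i in modalities)
        h = h_new
        if converged:
            break
    return np.concatenate([h[i] for i in modalities])  # feature matrix W
```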
S400, fusing the second characteristic results to obtain fusion characteristics, wherein the fusion characteristics represent unified characterization of the target medical data.
In some embodiments, referring to the feature fusion flow diagram shown in fig. 5, the flow includes, but is not limited to, steps S410-S420:
S410, performing cyclic aggregation on the first feature extraction results of all first modalities to obtain second feature extraction results that no longer change, wherein the cyclic aggregation serves to complement and associate the representative features;
S420, fusing the second feature extraction results that no longer change across all first modalities to obtain the fusion features.
In some embodiments, with reference to the missing-feature complementation and association flow diagram shown in fig. 3 and taking the multi-modal data of lobar pneumonia as an example, the processing flow includes: performing image-modality feature extraction, in which the CT images, together with the already extracted case data and routine blood features, induce the large model to output the following features: "lung lobe status, mediastinal status, main bronchus status, skeletal status, neurological status", and performing the association and complementation of lobar pneumonia features; performing structured-data feature extraction, in which the routine blood data, together with the extracted case data and CT image features, induce the large model to output the following features: "inflammatory status, coagulation status, immune status", and performing the association and complementation of lobar pneumonia features; cyclically processing the data of the three modalities until the output features no longer change; and fusing the features that no longer change to obtain the unified features for downstream tasks.
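Wiring the lobar pneumonia example through the `cyclic_aggregation` sketch above might look like this (it reuses that function and numpy); the stub models just return fixed vectors, purely to show the control flow when the structured (routine blood) modality is missing.

```python
import numpy as np

# Stub "large models": ignore their inputs and return a fixed vector, so the
# loop converges on the second round. Real models would encode prompts + data.
stub = lambda *args: np.ones(64)
models = {"text": stub, "image": stub, "structured": stub}
qa_lib = {m: f"Q_{m}" for m in models}      # question-answer library entries
miss_lib = {m: f"L_{m}" for m in models}    # modality-missing prompt entries
data = {"text": "case record", "image": "CT series", "structured": None}

W = cyclic_aggregation(models, qa_lib, miss_lib, data)
print(W.shape)  # (192,) -- unified representation for the downstream task
```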
S500, identifying the fusion characteristics to obtain medical characteristic identification results of the target medical data.
In some embodiments, the fusion features characterize the fused lobar pneumonia features of a case, such as sex, age, symptoms, signs, lung lobe status, mediastinal status, main bronchus status, skeletal status, neurological status, inflammatory status, coagulation status, immune status, and the like.
In some embodiments, referring to the multi-mode unified feature extraction schematic diagram shown in fig. 6, the clinical data in this embodiment include data of multiple modalities such as images, text and structured data, and expert-knowledge prompt engineering induces the large model to generate the corresponding features. The generated feature vector and the data of another modality are then input into the large model to generate, under that modality, features associated with the previous modality, and this is repeated cyclically until the features converge.
In some embodiments, referring to the schematic diagram of multi-mode unified feature extraction in the absence of a modality shown in fig. 7, this embodiment addresses the problem of modality missing: through the association with the other modalities' features, the large model is induced to complement the features of the missing modality.
FIG. 8 is a diagram of a multi-mode unified feature extraction device in accordance with an embodiment of the invention. The device includes a first module 810, a second module 820, a third module 830, a fourth module 840 and a fifth module 850.
The first module is used for acquiring target medical data according to the medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data; the second module is used for extracting the target medical data by adopting a feature extraction model according to the mode of the target medical data to obtain a first feature extraction result; the third module is used for determining missing features in the target medical data according to the first feature extraction result, and complementing the missing features through the first feature extraction result to obtain a second feature extraction result; a fourth module, configured to fuse the second feature result to obtain a fused feature, where the fused feature represents a unified representation of the target medical data; and a fifth module for identifying the fusion characteristics to obtain the medical characteristic identification result of the target medical data.
The device according to this embodiment, through the cooperation of the first module, the second module, the third module, the fourth module and the fifth module, can implement any of the foregoing multi-mode unified feature extraction methods, that is: acquiring target medical data according to a medical feature extraction request, wherein the target medical data is multi-modal data whose modalities include multimedia data, text data and structured data; extracting the target medical data with a feature extraction model according to the modality of the target medical data to obtain a first feature extraction result; determining missing features in the target medical data according to the first feature extraction result, and complementing the missing features through the first feature extraction result to obtain a second feature extraction result; fusing the second feature extraction results to obtain fusion features, wherein the fusion features represent the unified characterization of the target medical data; and identifying the fusion features to obtain a medical feature identification result of the target medical data. The beneficial effects of the invention are as follows: clinical data covering multiple modalities such as images, text and structured data are projected through a large model by expert-knowledge prompt engineering, the large model is induced to generate the corresponding features, and cyclic aggregation is performed, so that data of different modalities are fused into a unified representation, fusion and complementation among medical multi-modal data are realized, and the stability and accuracy of feature extraction and identification of medical data are improved.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory;
the memory stores a program;
the processor executes the program to implement the multi-mode unified feature extraction method described above; the electronic device has the functionality of hosting and running the software system for multi-mode unified feature extraction provided by the embodiments of the invention, and may be, for example, a personal computer, a minicomputer, a mainframe, a workstation, a network or distributed computing environment, a separate or integrated computer platform, or a platform in communication with a charged particle tool or other imaging device.
Embodiments of the present invention also provide a computer-readable storage medium storing a program that is executed by a processor to implement a multi-modal unified feature extraction method as described above.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device may read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the multi-mode unified feature extraction method described previously.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A method for multi-modal unified feature extraction, comprising:
acquiring target medical data according to a medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data;
according to the mode of the target medical data, extracting the target medical data by adopting a feature extraction model to obtain a first feature extraction result;
determining missing features in the target medical data according to the first feature extraction result, and complementing the missing features through the first feature extraction result to obtain a second feature extraction result;
fusing the second feature extraction results to obtain fusion features, wherein the fusion features represent the unified characterization of the target medical data;
and identifying the fusion characteristics to obtain a medical characteristic identification result of the target medical data.
2. The method of claim 1, wherein extracting the target medical data using a feature extraction model according to the modality of the target medical data to obtain a first feature extraction result comprises:
and identifying and classifying the modes of the target medical data, and selecting a corresponding type of feature extraction model to perform feature extraction based on each classification to obtain a first feature extraction result of each mode.
3. The method of claim 2, wherein determining missing features in the target medical data according to the first feature extraction result, to obtain a second feature extraction result, comprises:
inducing, from the first feature extraction result of a first modality and the first feature extraction results of the modalities other than the first modality, an induced feature of the feature extraction model, wherein the first modality is any one modality of the multi-modal data;
if the first feature extraction result of the first modality cannot characterize the induced feature, complementing the induced feature of the first modality with the first feature extraction results of the other modalities, and determining the second feature extraction result; and determining the second feature extraction result according to the comparison of the first feature extraction result of the first modality and the first feature extraction results of the modalities other than the first modality with the induced feature.
4. The method of claim 3, wherein determining the second feature extraction result according to the comparison of the first feature extraction result of the first modality and the first feature extraction results of the modalities other than the first modality with the induced feature comprises:
if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are inconsistent with one another, taking the first feature extraction result of the first modality as the second feature extraction result;
and if the first feature extraction result of the first modality is inconsistent with at least one of the first feature extraction results of the other modalities, and the first feature extraction results of the other modalities are consistent with one another, taking the first feature extraction results of the other modalities as the second feature extraction result.
5. The method of claim 3, further comprising:
determining a modality-missing prompt library and a question-answer library according to at least one of disease guidelines, treatment principles and expert discussions, wherein the modality-missing prompt library is used for detecting whether a modality is missing, and the question-answer library is used for determining the modality of medical data;
detecting the medical data by means of the question-answer library to determine a first modality, acquiring the feature vector of the first modality and the feature vectors of the other modalities, and complementing the feature vector of the first modality, when that modality is missing, by means of the modality-missing prompt library and the feature vectors of the other modalities.
6. The method of claim 3, wherein fusing the second feature extraction results to obtain the fusion features comprises:
performing cyclic aggregation on the first feature extraction results of all first modalities to obtain second feature extraction results that no longer change, wherein the cyclic aggregation serves to complement and associate the representative features;
and fusing the second feature extraction results that no longer change across all first modalities to obtain the fusion features.
7. The method of claim 1, wherein the feature extraction model is a large model.
8. A multi-modal unified feature extraction apparatus, comprising:
the first module is used for acquiring target medical data according to the medical feature extraction request, wherein the target medical data is multi-mode data, and the modes of the target medical data comprise multimedia data, text data and structured data;
the second module is used for extracting the target medical data by adopting a feature extraction model according to the mode of the target medical data to obtain a first feature extraction result;
a third module, configured to determine a missing feature in the target medical data according to the first feature extraction result, and complement the missing feature through the first feature extraction result to obtain a second feature extraction result;
a fourth module, configured to fuse the second feature result to obtain a fusion feature, where the fusion feature represents a unified representation of the target medical data;
and a fifth module, configured to identify the fusion feature, and obtain a medical feature identification result of the target medical data.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
execution of the program by the processor implements the multimodal unified feature extraction method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the storage medium stores a program that is executed by a processor to implement the multi-modal unified feature extraction method of any one of claims 1-7.
CN202311434500.1A 2023-10-31 2023-10-31 Multi-mode unified feature extraction method, device, equipment and medium Active CN117370933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311434500.1A CN117370933B (en) 2023-10-31 2023-10-31 Multi-mode unified feature extraction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311434500.1A CN117370933B (en) 2023-10-31 2023-10-31 Multi-mode unified feature extraction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN117370933A 2024-01-09
CN117370933B 2024-05-07

Family ID: 89402148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311434500.1A Active CN117370933B (en) 2023-10-31 2023-10-31 Multi-mode unified feature extraction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117370933B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819052A (en) * 2021-01-25 2021-05-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-modal fine-grained mixing method, system, device and storage medium
CN113870259A (en) * 2021-12-02 2021-12-31 天津御锦人工智能医疗科技有限公司 Multi-modal medical data fusion assessment method, device, equipment and storage medium
CN114519898A (en) * 2020-11-02 2022-05-20 北京眼神智能科技有限公司 Biological characteristic multi-mode fusion recognition method and device, storage medium and equipment
CN114564593A (en) * 2022-02-21 2022-05-31 北京百度网讯科技有限公司 Completion method and device of multi-mode knowledge graph and electronic equipment
CN115545093A (en) * 2022-09-13 2022-12-30 珠海高凌信息科技股份有限公司 Multi-mode data fusion method, system and storage medium
CN115952466A (en) * 2022-06-28 2023-04-11 电子科技大学 Communication radiation source cross-mode identification method based on multi-mode information fusion
CN116344028A (en) * 2023-02-14 2023-06-27 北京深睿博联科技有限责任公司 Method and device for automatically identifying lung diseases based on multi-mode heterogeneous data
US20230206121A1 (en) * 2020-06-23 2023-06-29 Huawei Cloud Computing Technologies Co., Ltd. Modal information completion method, apparatus, and device
CN116383766A (en) * 2023-04-12 2023-07-04 平安科技(深圳)有限公司 Auxiliary diagnosis method, device, equipment and storage medium based on multi-mode data
CN116487031A (en) * 2023-04-17 2023-07-25 莆田市数字集团有限公司 Multi-mode fusion type auxiliary diagnosis method and system for pneumonia
CN116628263A (en) * 2023-06-07 2023-08-22 平安科技(深圳)有限公司 Video retrieval method and device based on multiple modes, electronic equipment and storage medium
CN116758397A (en) * 2023-06-27 2023-09-15 华东师范大学 Single-mode induced multi-mode pre-training method and system based on deep learning
CN116861363A (en) * 2023-07-12 2023-10-10 中国电信股份有限公司技术创新中心 Multi-mode feature processing method and device, storage medium and electronic equipment

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230206121A1 (en) * 2020-06-23 2023-06-29 Huawei Cloud Computing Technologies Co., Ltd. Modal information completion method, apparatus, and device
CN114519898A (en) * 2020-11-02 2022-05-20 北京眼神智能科技有限公司 Biological characteristic multi-mode fusion recognition method and device, storage medium and equipment
CN112819052A (en) * 2021-01-25 2021-05-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-modal fine-grained mixing method, system, device and storage medium
WO2023098524A1 (en) * 2021-12-02 2023-06-08 天津御锦人工智能医疗科技有限公司 Multi-modal medical data fusion evaluation method and apparatus, device, and storage medium
CN113870259A (en) * 2021-12-02 2021-12-31 天津御锦人工智能医疗科技有限公司 Multi-modal medical data fusion assessment method, device, equipment and storage medium
CN114564593A (en) * 2022-02-21 2022-05-31 北京百度网讯科技有限公司 Completion method and device of multi-mode knowledge graph and electronic equipment
CN115952466A (en) * 2022-06-28 2023-04-11 电子科技大学 Communication radiation source cross-mode identification method based on multi-mode information fusion
CN115545093A (en) * 2022-09-13 2022-12-30 珠海高凌信息科技股份有限公司 Multi-mode data fusion method, system and storage medium
CN116344028A (en) * 2023-02-14 2023-06-27 北京深睿博联科技有限责任公司 Method and device for automatically identifying lung diseases based on multi-mode heterogeneous data
CN116383766A (en) * 2023-04-12 2023-07-04 平安科技(深圳)有限公司 Auxiliary diagnosis method, device, equipment and storage medium based on multi-mode data
CN116487031A (en) * 2023-04-17 2023-07-25 莆田市数字集团有限公司 Multi-mode fusion type auxiliary diagnosis method and system for pneumonia
CN116628263A (en) * 2023-06-07 2023-08-22 平安科技(深圳)有限公司 Video retrieval method and device based on multiple modes, electronic equipment and storage medium
CN116758397A (en) * 2023-06-27 2023-09-15 华东师范大学 Single-mode induced multi-mode pre-training method and system based on deep learning
CN116861363A (en) * 2023-07-12 2023-10-10 中国电信股份有限公司技术创新中心 Multi-mode feature processing method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐曼 (XU MAN); 沈江 (SHEN JIANG); 余海燕 (YU HAIYAN): "A Review of Data-Driven Decision Support for Medicine and Health" (数据驱动的医疗与健康决策支持研究综述), Industrial Engineering and Management (工业工程与管理), no. 01, 10 February 2017 (2017-02-10) *

Also Published As

Publication number Publication date
CN117370933B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
Hu et al. SA-Net: A scale-attention network for medical image segmentation
KR102177568B1 (en) Method for semi supervised reinforcement learning using data with label and data without label together and apparatus using the same
CN107273883B (en) Decision tree model training method, and method and device for determining data attributes in OCR (optical character recognition) result
Wong et al. Smartannotator an interactive tool for annotating indoor rgbd images
Ahmad et al. SiNC: Saliency-injected neural codes for representation and efficient retrieval of medical radiographs
JP2009528117A (en) Identifying image feature sets for assessing image similarity
Tang et al. Research on medical image classification based on machine learning
CN112802013B (en) Brain disease detection method and device based on graph neural network and multi-task learning
EP4173000A2 (en) Machine learning based medical data checker
CN113469981A (en) Image processing method, device and storage medium
Ogiela et al. Natural user interfaces in medical image analysis
CN115410717A (en) Model training method, data retrieval method, image data retrieval method and device
CN114003758B (en) Training method and device of image retrieval model and retrieval method and device
Lin et al. Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation
CN117370933B (en) Multi-mode unified feature extraction method, device, equipment and medium
WO2021116011A1 (en) Medical image segmentation and atlas image selection
Dastider et al. Rescovnet: A deep learning-based architecture for covid-19 detection from chest ct scan images
Gu et al. The effect of pulmonary vessel suppression on computerized detection of nodules in chest CT scans
CN112241470A (en) Video classification method and system
Han et al. Multimodal 3D convolutional neural networks for classification of brain disease using structural MR and FDG-PET images
CN116958957A (en) Training method of multi-mode feature extraction network and three-dimensional feature representation method
Kalyani et al. Deep learning-based detection and classification of adenocarcinoma cell nuclei
CN113361584B (en) Model training method and device, and pulmonary arterial hypertension measurement method and device
Mursalin et al. EpNet: A deep neural network for ear detection in 3D point clouds
CN113903433A (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant