CN116344028A

CN116344028A - Method and device for automatically identifying lung diseases based on multi-mode heterogeneous data

Info

Publication number: CN116344028A
Application number: CN202310123255.6A
Authority: CN
Inventors: 俞益洲; 马杰超; 张树; 李一鸣; 乔昕
Original assignee: Beijing Shenrui Bolian Technology Co Ltd; Shenzhen Deepwise Bolian Technology Co Ltd
Current assignee: Beijing Shenrui Bolian Technology Co Ltd; Shenzhen Deepwise Bolian Technology Co Ltd
Priority date: 2023-02-14
Filing date: 2023-02-14
Publication date: 2023-06-27

Abstract

The invention provides a method and a device for automatically identifying lung diseases based on multi-modal heterogeneous data. The method comprises the following steps: preprocessing unstructured text data by using a global unified character embedding feature; preprocessing structured text data; performing feature extraction on medical image data by using an image feature extraction model to obtain image features, wherein the image feature extraction model uses a Transformer structure as its backbone model; performing relation mapping expression among vocabularies; performing feature extraction and analysis of multiple dimensions, and performing feature fusion on multi-modal data, wherein the multi-modal data comprises text features and image features, the text features being obtained by preprocessing the unstructured text data and preprocessing the structured text data; and classifying the fused features to obtain an output result.

Description

Method and device for automatically identifying lung diseases based on multi-mode heterogeneous data

Technical Field

The invention relates to the field of computers, in particular to an automatic lung disease identification method and device based on multi-mode heterogeneous data.

Background

With the rapid development of medical informatization and the iterative upgrading of medical equipment, a vast variety of medical data has been generated, which can be roughly divided into clinical text data and image data. The text data mainly comprises structured test data, such as hemoglobin, routine urinalysis and genetic test results, and unstructured text data, such as patient complaints and pathology texts recorded by doctors. The image data includes images such as ultrasound, CT, X-ray and magnetic resonance images, and signal data such as electrocardiograms and electroencephalograms. Currently, most applications of artificial intelligence in medicine use single-modality data to handle specific tasks, such as single-disease diagnosis from computed tomography (CT) or retinal images. Such applications neglect the broader clinical context, which inevitably limits the potential of artificial intelligence models. In contrast, clinicians routinely process multi-modal data from multiple sources when diagnosing lung infections, performing prognostic evaluations and determining treatment plans. Medical data of different modalities provide diagnosis and treatment information about a patient from different angles, and combining them further improves the accuracy of diagnosis and treatment and brings artificial intelligence closer to clinical practice. In theory, an artificial intelligence model should also be able to use the data resources generally available to clinicians, and even resources that are not available to most of them (for example, most clinicians do not review thousands of multi-modal records from different regions, hospitals and departments), and integrating data of different modalities often increases the robustness and accuracy of diagnosis.
However, the information in data of different modalities is both complementary and redundant. How to effectively use the complementary information between modalities to overcome the shortcomings of each single modality, reduce the influence of redundant information between modalities, and improve the grasp of the patient's overall state is therefore a key problem in multi-modal identification of common lung diseases.

Disclosure of Invention

The present invention aims to provide an automatic pulmonary disease identification method and device based on multimodal heterogeneous data, which overcomes or at least partially solves the above-mentioned problems.

In order to achieve the above purpose, the technical scheme of the invention is specifically realized as follows:

One aspect of the present invention provides an automatic lung disease identification method based on multi-modal heterogeneous data, comprising: preprocessing unstructured text data by using a global unified character embedding feature; preprocessing structured text data; performing feature extraction on medical image data by using an image feature extraction model to obtain image features, wherein the image feature extraction model uses a Transformer structure as its backbone model; performing relation mapping expression among vocabularies; performing feature extraction and analysis of multiple dimensions, and performing feature fusion on multi-modal data, wherein the multi-modal data comprises text features and image features, the text features being obtained by preprocessing the unstructured text data and preprocessing the structured text data; and classifying the fused features to obtain an output result.

Preprocessing the unstructured text data comprises: converting the unstructured text data using a rule-oriented structuring algorithm.

Preprocessing the structured data comprises: judging whether the preset value is within a reasonable interval, and normalizing data of different orders of magnitude.

The image feature extraction model comprises: a plurality of multi-layer encoders, wherein the input of each encoder first flows into a Self-Attention layer, and the convolution kernel is 16 × 16.

The image feature extraction model comprises: a symptom-based abnormality detection model and a disease-based diagnostic model, wherein the symptom-based abnormality detection model is used to enhance the features of the disease-based diagnostic model.

Performing feature extraction and analysis of multiple dimensions and performing feature fusion on the multi-modal data comprises: performing feature fusion by using a multi-modal attention fusion mechanism, which is expressed as:

wherein Y(i) represents the output of the relationship between a certain character and all other characters; x_i and y_j represent two of the characters in the fused vector; i is the index of the position whose response is to be calculated, and j is the index enumerating all possible positions; θ(x_i, y_j) calculates the relationship between two different feature positions; g(x_j) calculates the feature at position j; and the final relationship result is normalized by the factor 1/C(x).

In another aspect, the present invention provides an automatic lung disease recognition apparatus based on multimodal heterogeneous data, comprising: the data structuring module is used for preprocessing unstructured text data by using global unified character embedding characteristics; the data preprocessing module is used for preprocessing the structured text data; the convolutional neural network module is used for carrying out feature extraction on medical image data by using an image feature extraction model of the structure to obtain image features, wherein the image feature extraction model of the structure uses a transformer structure as a trunk model; the text embedding module is used for carrying out relation mapping expression among vocabularies; the feature fusion module is used for carrying out feature extraction and analysis of multiple dimensions, carrying out feature fusion on multi-mode data, wherein the multi-mode data comprises text features and image features obtained by preprocessing unstructured text data and preprocessing structured text data; and the classifier is used for classifying the fused features to obtain an output result.

The data structuring module preprocesses unstructured text data in the following mode: unstructured text data is converted using a rule-oriented structuring algorithm.

The data preprocessing module preprocesses the structured data in the following mode: judging whether the preset value is in a reasonable interval or not, and normalizing the data of different orders of magnitude.

The feature fusion module performs feature extraction and analysis of multiple dimensions in the following manner, and performs feature fusion on the multi-mode data: feature fusion is performed by using a multi-modal attention fusion mechanism, which is expressed as:

wherein Y(i) represents the output of the relationship between a certain character and all other characters; x_i and y_j represent two of the characters in the fused vector; i is the index of the position whose response is to be calculated, and j is the index enumerating all possible positions; θ(x_i, y_j) calculates the relationship between two different feature positions; g(x_j) calculates the feature at position j; and the final relationship result is normalized by the factor 1/C(x).

Therefore, by fusing text information, such as patient complaints, clinical medical records and laboratory examinations, with image information such as CT (computed tomography), the method and the device for automatically identifying lung diseases based on multi-modal heterogeneous data provided by the invention can identify multiple common chest diseases, together with their shared and distinguishing signs and their finely classified subtypes. The multi-modal heterogeneous model for common chest diseases can observe the patient from a macroscopic perspective, thereby effectively overcoming the limitations of the prior art and assisting clinicians in making more accurate judgments.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for automatically identifying pulmonary diseases based on multimodal heterogeneous data according to an embodiment of the present invention;

FIG. 2 is a flowchart of a specific implementation of a method for automatically identifying pulmonary diseases based on multimodal heterogeneous data according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an automatic lung disease identification device based on multi-modal heterogeneous data according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The invention relates to a scheme for automatically identifying lung diseases based on multi-modal heterogeneous data. The lung diseases specifically include chronic obstructive pulmonary disease, bronchiectasis, pneumothorax, lung malignant tumor, emphysema, pneumonia, pulmonary tuberculosis, lung tumor, pleural effusion, interstitial lung disease and the like. The invention mainly uses heterogeneous data of several modalities, namely imaging data (CT, X-ray and the like) and clinical text data (medical records, epidemiological and statistical information, and laboratory blood biochemical examinations), to perform the final identification of multiple chest diseases. On this basis, for image data, a deep neural network is used to efficiently extract the image texture features and the semantic sign information present in the image (bronchial obstruction, bronchiectasis, bronchially enlarged lymph node, bronchial stenosis, pneumothorax, mediastinal lymph node enlargement, atelectasis, pneumoconiosis, bulla, lung consolidation, lung patch shadow, lung streak shadow, emphysema, ground-glass opacity, lung cavitation, lung cavity, lung grid shadow, lung honeycomb shadow, hilar lymph node enlargement, pleural effusion, pleural thickening, augmentation lymph node enlargement, calcification, tumor, nodule and the like), so as to enhance the features used for image diagnosis. For clinical text data, the inputs are the general clinical characteristics of the patient (age, body temperature and maximum temperature at admission), the complaints and the diagnosis and treatment course of the case, laboratory blood biochemical tests (albumin, serum lactate dehydrogenase (LDH), indirect bilirubin, thrombin time, activated partial thromboplastin time (APTT), platelet count, C-reactive protein (CRP), white blood cells, lymphocyte count, neutrophil count, PCT, IL-6, etc.), routine epidemiological information about the patient, and the like.
To address the phenomena that the same disease may present differently and that different diseases may present alike, multi-modal data from the whole diagnosis and treatment process are gathered on a big data platform, and the text analysis and image processing capabilities of artificial intelligence are used to construct a fine-grained, accurate diagnosis and treatment model for lung diseases based on multi-modal data. For example, for the broad class of pulmonary infectious diseases, fine-grained identification covers common respiratory viruses (influenza virus, H1N1, H5N1, H7N9, respiratory syncytial virus, coronavirus), bacteria (Acinetobacter baumannii, Haemophilus influenzae, Klebsiella pneumoniae), fungal pneumonia, pulmonary tuberculosis and the like.

The following describes, with reference to FIG. 1 and FIG. 2, a method for automatically identifying lung diseases based on multi-modal heterogeneous data according to an embodiment of the present invention. The method includes:

S1, preprocessing unstructured text data by using a global unified character embedding feature.

Specifically, the global unified character embedding feature is used to model unstructured text data, such as medical records, complaints and discharge diagnosis reports, in a unified way. For this kind of weakly structured data, data quality is critical to the performance of the system.

As an optional implementation of the embodiment of the present invention, preprocessing the unstructured text data includes: converting the unstructured text data using a rule-oriented structuring algorithm. Specifically, the input data is first converted using a rule-guided structuring algorithm so that its format is as orderly as possible. The invention thereby establishes a structuring preprocessing method for original clinical records.
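The source does not list the rules themselves. As a minimal sketch of what rule-guided structuring could look like, the example below applies a few regular expressions to an English free-text note. The field names, the patterns and the sample note are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules: field name -> regular expression with one capture group.
# A real clinical rule set would be far richer and would need validation.
RULES = {
    "age_years": re.compile(r"(\d{1,3})\s*-?\s*year-old"),
    "max_temp_c": re.compile(
        r"(?:temperature|temp)\D{0,20}?(\d{2}(?:\.\d)?)\s*°?C", re.IGNORECASE
    ),
    "cough_days": re.compile(r"cough\s+for\s+(\d+)\s+days?", re.IGNORECASE),
}


def structure_note(note: str) -> dict:
    """Apply each rule to the free-text note and return a flat record.

    Fields whose rule does not match are set to None rather than guessed.
    """
    record = {}
    for field, pattern in RULES.items():
        match = pattern.search(note)
        record[field] = float(match.group(1)) if match else None
    return record


note = "A 63-year-old man with cough for 5 days; peak temperature 38.6 C."
record = structure_note(note)
```

Leaving unmatched fields as `None` keeps the record format fixed, so later steps can tell "not mentioned" apart from a real value.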

S2, preprocessing the structured text data.

Specifically, data that is already structured, such as epidemiological data and laboratory blood biochemical examinations, also needs to be preprocessed.

As an optional implementation of the embodiment of the present invention, preprocessing the structured data includes: judging whether the preset value is within a reasonable interval, and normalizing data of different orders of magnitude. Specifically, for values such as white blood cell count and platelet count, it is judged whether each value lies within a reasonable interval, and data of different orders of magnitude are normalized.
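A minimal sketch of this step follows, assuming min-max scaling as the normalization (the source does not name a specific scheme). The interval bounds and field names are hypothetical placeholders, not clinical reference ranges.

```python
# Hypothetical plausibility intervals (lower, upper); illustrative only,
# not clinical reference ranges and not taken from the source.
PLAUSIBLE = {
    "wbc_1e9_per_l": (0.1, 100.0),
    "platelets_1e9_per_l": (1.0, 2000.0),
    "crp_mg_per_l": (0.0, 500.0),
}


def preprocess_labs(labs: dict) -> dict:
    """Flag implausible values as None, then min-max scale the rest to [0, 1]."""
    out = {}
    for name, value in labs.items():
        low, high = PLAUSIBLE[name]
        if value is None or not (low <= value <= high):
            out[name] = None  # missing or outside the plausible interval
        else:
            out[name] = (value - low) / (high - low)
    return out


scaled = preprocess_labs(
    {"wbc_1e9_per_l": 6.2, "platelets_1e9_per_l": 250.0, "crp_mg_per_l": 9000.0}
)
```

After scaling, a platelet count in the hundreds and a white blood cell count in single digits share the same [0, 1] range, which is the point of normalizing values of different orders of magnitude.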

S3, performing feature extraction on the medical image data by using an image feature extraction model to obtain image features, wherein the image feature extraction model uses a Transformer structure as its backbone model.

As an optional implementation of the embodiment of the present invention, the image feature extraction model includes: a plurality of multi-layer encoders, wherein the input of each encoder first flows into a Self-Attention layer, and the convolution kernel is 16 × 16. Specifically, for medical images, the invention uses an image feature extraction model with a Transformer structure as its backbone. The model is composed of a plurality of multi-layer encoders, and the input of each encoder first flows into the Self-Attention layer. In the present invention, the image is first passed through a 16 × 16 convolution kernel to generate a series of tokens, which the encoder can then encode as it would individual words. The resulting image features are then stored in an intelligent multimodal database for subsequent fusion with text features.
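The tokenization step can be sketched as follows. The source states only the 16 × 16 kernel size. The stride of 16, the embedding dimension of 8, the 64 × 64 toy image and the random weights are assumptions made for illustration.

```python
import random

PATCH = 16  # kernel size stated in the text; a stride of 16 is an assumption


def patchify(image, patch=PATCH):
    """Split a 2-D image (list of rows) into flattened non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def embed(patches, weights):
    """Project each flattened patch with a shared weight matrix.

    A convolution whose kernel size and stride both equal the patch size
    computes exactly this per-patch linear projection.
    """
    return [
        [sum(w * p for w, p in zip(row, flat)) for row in weights]
        for flat in patches
    ]


random.seed(0)
image = [[random.random() for _ in range(64)] for _ in range(64)]  # toy slice
weights = [[random.gauss(0.0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(8)]
tokens = embed(patchify(image), weights)  # 16 tokens, each of dimension 8
```

Each token then plays the role that a word embedding plays in a text Transformer, which is why the encoder can process the image as a sequence.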

As an optional implementation of the embodiment of the present invention, the image feature extraction model includes: a symptom-based abnormality detection model and a disease-based diagnostic model, wherein the symptom-based abnormality detection model is used to enhance the features of the disease-based diagnostic model. Specifically, the invention designs the two models to simulate the way clinical experts reason about chest disease diagnosis: first, the abnormal regions appearing in the image are identified, and then the final disease diagnosis is made for the patient based on this prior knowledge of the abnormalities found. In other words, for image-based disease diagnosis, the final model diagnosis refers to the features produced by abnormality detection: the diagnostic model is enhanced by a network model based on multiple signs, and the diagnosis is ultimately made by identifying the signs in the image.

S4, performing relation mapping expression among vocabularies.

Specifically, in conventional NLP, words can be treated as discrete symbols and represented using one-hot vectors. However, different words share similarities in their encoding, and a relation mapping between the context and the target word needs to be found. This complex contextual relationship is expressed by modeling it with a network. In the invention, Chinese BERT is used to perform the relation mapping expression among vocabularies, thereby obtaining the text features to be fused with the image features.
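As a small illustration of why a learned mapping is useful, distinct one-hot vectors are mutually orthogonal and so carry no notion of relatedness between words, whereas dense vectors can. The dense vectors below are hand-picked toy values, not outputs of BERT, and the vocabulary is hypothetical.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vocab = ["cough", "sputum", "fracture"]
discrete = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

# Hypothetical dense embeddings, chosen by hand for illustration only.
dense = {"cough": [0.9, 0.1], "sputum": [0.8, 0.3], "fracture": [-0.2, 0.9]}

sim_one_hot = cosine(discrete["cough"], discrete["sputum"])
sim_dense_related = cosine(dense["cough"], dense["sputum"])
sim_dense_unrelated = cosine(dense["cough"], dense["fracture"])
```

With one-hot codes every pair of different words has similarity zero, while the dense vectors can place related terms closer together than unrelated ones.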

S5, performing feature extraction and analysis of multiple dimensions, and performing feature fusion on the multi-modal data, wherein the multi-modal data comprises text features and image features, the text features being obtained by preprocessing the unstructured text data and preprocessing the structured text data.

Specifically, feature extraction and analysis of multiple dimensions are performed in the intelligent multi-modal feature database, the corresponding data matched to the same examination of the same case are found, and feature fusion is performed on the multi-modal data. First, a one-dimensional convolution operation and a Dropout operation are applied in sequence to the image features and to the text features. The convolution layer extracts the main features; the Dropout layer prevents the network model from over-fitting and reduces the computational complexity. Then, the resulting image features and the corresponding text features are concatenated along the vector dimension, that is, fusion is performed at the feature level.
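The sequence of convolution, Dropout and concatenation can be sketched as below. The kernel values, dropout rate and toy feature vectors are assumptions. Inverted dropout is used here as one common formulation, which the source does not specify.

```python
import random


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep-learning code)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def fuse(image_feat, text_feat, kernel, rate, rng, training=True):
    """Convolve and drop out each modality, then concatenate the two vectors."""
    img = dropout(conv1d(image_feat, kernel), rate, rng, training)
    txt = dropout(conv1d(text_feat, kernel), rate, rng, training)
    return img + txt  # list concatenation along the feature dimension


rng = random.Random(0)
image_feat = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7]  # toy image feature vector
text_feat = [1.0, 0.0, 0.3, 0.8]             # toy text feature vector
fused = fuse(image_feat, text_feat, [0.5, 0.5], rate=0.1, rng=rng, training=False)
```

Concatenation keeps the two modalities in separate positions of one vector, leaving it to later layers to learn how they interact.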

As an optional implementation of the embodiment of the present invention, performing feature extraction and analysis of multiple dimensions and performing feature fusion on the multi-modal data includes: performing feature fusion by using a multi-modal attention fusion mechanism, which is expressed as:

wherein Y(i) represents the output of the relationship between a certain character and all other characters; x_i and y_j represent two of the characters in the fused vector; i is the index of the position whose response is to be calculated, and j is the index enumerating all possible positions; θ(x_i, y_j) calculates the relationship between two different feature positions; g(x_j) calculates the feature at position j; and the final relationship result is normalized by the factor 1/C(x).

Specifically, the present invention uses multiple focused attention mechanisms to extract relevance information between images and text. The multimodal attention fusion mechanism can be expressed as:

wherein Y(i) represents the output of the relationship between a certain token and all other tokens; x_i and y_j represent two of the tokens in the fused vector; i is the index of the position whose response is to be calculated, and j is the index enumerating all possible positions; θ(x_i, y_j) calculates the relationship between two different feature positions; g(x_j) calculates the feature at position j; and the final relationship result is normalized by the factor 1/C(x).
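The expression itself is not reproduced in this text. A form consistent with the symbol definitions given here, assuming the mechanism follows the usual normalized weighted-sum pattern of attention, would be:

```latex
Y(i) = \frac{1}{C(x)} \sum_{\forall j} \theta(x_i, y_j)\, g(x_j)
```

This is a reconstruction from the definitions only. The concrete choices of θ, g and C(x) are not stated in the source.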

Therefore, each feature in the image and each feature in the text data can be regarded as one node in the Transformer network, so that multiple modalities can be learned directly, simultaneously and in real time, and an efficient learned representation of the heterogeneous clinical data can be obtained.
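A minimal sketch of such a fusion over image and text nodes follows. It assumes that θ is the exponential of a dot product, that g is the identity, and that C(x) is the sum of θ over all positions, which makes the weights a softmax. None of these choices is stated in the source, and the toy tokens are hypothetical.

```python
import math


def attention_fuse(tokens):
    """Compute Y(i) = (1 / C) * sum_j theta(t_i, t_j) * g(t_j) for every token i.

    Assumptions (not from the source): theta is exp(dot product), g is the
    identity, and C is the sum of theta over j, so the weights form a softmax.
    """
    outputs = []
    for t_i in tokens:
        scores = [sum(a * b for a, b in zip(t_i, t_j)) for t_j in tokens]
        peak = max(scores)  # subtract the maximum for numerical stability
        theta = [math.exp(s - peak) for s in scores]
        norm = sum(theta)   # plays the role of C(x)
        outputs.append(
            [
                sum(w * t_j[d] for w, t_j in zip(theta, tokens)) / norm
                for d in range(len(t_i))
            ]
        )
    return outputs


image_tokens = [[1.0, 0.0], [0.8, 0.2]]  # toy image features
text_tokens = [[0.0, 1.0]]               # toy text feature
fused = attention_fuse(image_tokens + text_tokens)
```

Because image and text tokens sit in one sequence, every output mixes information from both modalities, weighted by how strongly each pair of nodes is related.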

S6, classifying the fused features to obtain an output result.

Specifically, multiple classification rules are used to classify multiple diseases, and the word classification error is back-propagated to optimize the classifier. The diagnosis parsed from the discharge report is taken as the gold standard, and feature analysis and diagnosis of common lung diseases are performed by combining multiple dimensions.
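The source does not specify the loss or the classifier form. As one plausible sketch, the example below treats the task as multi-label classification with an independent sigmoid output per disease, trained by gradient descent on binary cross-entropy. The feature vector, labels, learning rate and step count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """One independent probability per disease (multi-label assumption)."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def bce(probs, labels):
    """Mean binary cross-entropy over the disease labels."""
    eps = 1e-12
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)


def sgd_step(weights, bias, features, labels, lr=0.5):
    """One gradient step; for sigmoid plus BCE the output gradient is (p - y)."""
    probs = predict(weights, bias, features)
    for k, (p, y) in enumerate(zip(probs, labels)):
        grad = (p - y) / len(labels)
        weights[k] = [w - lr * grad * f for w, f in zip(weights[k], features)]
        bias[k] -= lr * grad


features = [0.35, 0.30, 0.50, 0.65]  # toy fused feature vector
labels = [1, 0, 1]                   # toy gold-standard labels for 3 diseases
weights = [[0.0] * len(features) for _ in labels]
bias = [0.0] * len(labels)

loss_before = bce(predict(weights, bias, features), labels)
for _ in range(50):
    sgd_step(weights, bias, features, labels)
loss_after = bce(predict(weights, bias, features), labels)
```

Independent outputs allow several diseases or signs to be flagged for the same patient, which a single softmax over mutually exclusive classes would not permit.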

Therefore, the invention provides a method and device for automatically detecting and identifying common lung diseases using the heterogeneous data found in hospital clinical practice, including multi-modal image data and image-related text data such as electronic medical record clinical reports, laboratory blood examinations and imaging reports. The invention incorporates multiple features from biochemical signs and image detection, and provides multiple data fusion strategies based on the multi-modal data, thereby improving the model's ability to diagnose a patient from multiple angles and providing more comprehensive support for clinicians in diagnosis.

For example, respiratory infections often have similar clinical symptoms, signs, laboratory test results and imaging manifestations. Typical clinical manifestations are: fever, cough, chest tightness, shortness of breath and dyspnea; throat congestion and dry or moist rales in the lungs; increased heart rate; decreased blood oxygen saturation, etc. Laboratory tests for the same class of pathogen also share certain similarities. For viral infections, for example: normal total white blood cell (WBC) count in routine blood tests, decreased total lymphocyte count, increased C-reactive protein (CRP), normal procalcitonin (PCT), increased lactate dehydrogenase (LDH), etc. Pulmonary imaging findings: interstitial or ground-glass changes in both lungs, etc. Building on the identification of common chest diseases, multi-modal data are used as fully as possible to perform identification at a fine classification granularity, thereby providing further support for clinicians in diagnosis.

Therefore, the invention provides automatic detection of multiple common lung diseases. Based on a variety of clinical data from different departments, it combines medical records with images so that image analysis is guided by the doctor's findings, avoids attending to excessive irrelevant image regions, and addresses the identification and diagnosis problem accurately. For image data, the input is a chest CT image or an X-ray image, and a deep neural network is used to efficiently extract the image texture features and semantic sign information present in the image. For clinical text data, the inputs are the general clinical characteristics of the patient (age, body temperature and highest temperature at admission), the complaints and the diagnosis and treatment course of the case, laboratory blood biochemical examinations (albumin, serum lactate dehydrogenase (LDH), indirect bilirubin, thrombin time, activated partial thromboplastin time (APTT), platelet count, C-reactive protein (CRP), white blood cells, lymphocyte count, neutrophil count, PCT, IL-6, etc.), routine epidemiological information about the patient, and the like; the text data are first structured, and feature analysis is then performed on the text using a natural language processing algorithm.

With the invention, more than 10 typical abnormal lung signs or diseases can be detected simultaneously. The detection system can help doctors achieve higher sensitivity in lesion detection, and can provide more comprehensive support for clinicians in diagnosis.

FIG. 3 is a schematic structural diagram of a device for automatically identifying lung diseases based on multi-modal heterogeneous data according to an embodiment of the present invention. The device applies the method described above, so only its structure is briefly described below; for other details, refer to the description of the method above. Referring to FIG. 3, the device for automatically identifying lung diseases based on multi-modal heterogeneous data according to the embodiment of the present invention includes:

the data structuring module is used for preprocessing unstructured text data by using global unified character embedding characteristics;

the data preprocessing module is used for preprocessing the structured text data;

the convolutional neural network module is used for carrying out feature extraction on medical image data by using an image feature extraction model of the structure to obtain image features, wherein the image feature extraction model of the structure uses a transformer structure as a trunk model;

the text embedding module is used for carrying out relation mapping expression among vocabularies;

the feature fusion module is used for carrying out feature extraction and analysis of multiple dimensions, carrying out feature fusion on multi-mode data, wherein the multi-mode data comprises text features and image features obtained by preprocessing unstructured text data and preprocessing structured text data;

and the classifier is used for classifying the fused features to obtain an output result.

As a first alternative implementation manner of the embodiment of the present invention, the data structuring module performs preprocessing on unstructured text data in the following manner: unstructured text data is converted using a rule-oriented structuring algorithm.

As a first alternative implementation manner of the embodiment of the present invention, the data preprocessing module preprocesses structured data in the following manner: judging whether the preset value is in a reasonable interval or not, and normalizing the data of different orders of magnitude.

As a first optional implementation manner of the embodiment of the present invention, the image feature extraction model includes: a plurality of multi-layer encoders, wherein the input of each encoder first flows into a Self-Attention layer, and the convolution kernel is 16 × 16.

As a first optional implementation manner of the embodiment of the present invention, the image feature extraction model of the structure includes: a symptom-based abnormality detection model and a disease-based diagnostic model, wherein the symptom-based abnormality detection model is used to feature enhance the disease-based diagnostic model.

As a first optional implementation manner of the embodiment of the present invention, the feature fusion module performs feature extraction and analysis of multiple dimensions, and performs feature fusion on the multimodal data in the following manner: feature fusion is performed by using a multi-modal attention fusion mechanism, which is expressed as:

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. An automatic lung disease identification method based on multi-modal heterogeneous data, which is characterized by comprising the following steps:

preprocessing unstructured text data by using a global unified character embedding feature;

preprocessing the structured text data;

performing feature extraction on medical image data by using an image feature extraction model of a structure to obtain image features, wherein the image feature extraction model of the structure uses a transformer structure as a trunk model;

carrying out relation mapping expression among vocabularies;

performing feature extraction and analysis of multiple dimensions, and performing feature fusion on multi-modal data, wherein the multi-modal data comprises text features and image features, wherein the text features are obtained by preprocessing unstructured text data and preprocessing structured text data;

and classifying the fused features to obtain an output result.

2. The method of claim 1, wherein the preprocessing unstructured text data comprises:

the unstructured text data is converted using a rule-oriented structuring algorithm.

3. The method of claim 1, wherein the preprocessing of structured data comprises:

judging whether the preset value is in a reasonable interval or not, and normalizing the data of different orders of magnitude.

4. The method of claim 1, wherein the image feature extraction model of the structure comprises: a plurality of multi-layer encoders, wherein the input of each encoder first flows into a Self-Attention layer, and the convolution kernel is a 16 × 16 convolution kernel.

5. The method of claim 1, wherein the image feature extraction model of the structure comprises:

a symptom-based anomaly detection model and a disease-based diagnostic model, wherein the symptom-based anomaly detection model is used to feature enhance the disease-based diagnostic model.

6. The method of claim 1, wherein performing feature extraction and analysis in multiple dimensions, feature fusion in the multimodal data comprises:

feature fusion is performed by using a multi-modal attention fusion mechanism, which is expressed as:

7. An automatic lung disease recognition device based on multi-modal heterogeneous data, comprising:

the convolutional neural network module is used for carrying out feature extraction on medical image data by using an image feature extraction model of a structure to obtain image features, wherein the image feature extraction model of the structure uses a transformer structure as a trunk model;

the feature fusion module is used for carrying out feature extraction and analysis of multiple dimensions and carrying out feature fusion on multi-mode data, wherein the multi-mode data comprises text features and image features, wherein the text features are obtained by preprocessing unstructured text data and preprocessing structured text data;

8. The apparatus of claim 7, wherein the data structuring module pre-processes unstructured text data by:

9. The apparatus of claim 7, wherein the data preprocessing module preprocesses structured data by:

10. The apparatus of claim 7, wherein the image feature extraction model of the structure comprises: a plurality of multi-layer encoders, wherein the input of each encoder first flows into a Self-Attention layer, and the convolution kernel is a 16 × 16 convolution kernel.

11. The apparatus of claim 7, wherein the image feature extraction model of the structure comprises:

12. The apparatus of claim 7, wherein the feature fusion module performs feature extraction and analysis of multiple dimensions by performing feature fusion on the multimodal data by: