CN115830017B - Tumor detection system, method, equipment and medium based on image-text multi-mode fusion - Google Patents


Info

Publication number
CN115830017B
Authority
CN
China
Prior art keywords
feature vector
image
text
medical record
image feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310087066.8A
Other languages
Chinese (zh)
Other versions
CN115830017A (en)
Inventor
刘伟华
左勇
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd
Priority to CN202310087066.8A
Publication of CN115830017A
Application granted
Publication of CN115830017B
Legal status: Active

Links

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The application discloses a tumor detection system, method, equipment and storage medium based on image-text multi-modal fusion, relating to the field of medical artificial intelligence detection. The system comprises an electronic medical record feature extraction module, an image feature extraction module, a feature fusion module and a tumor detection module. The electronic medical record feature extraction module obtains the text feature vector of the electronic medical record; the image feature extraction module obtains the computed tomography (CT) image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector; the feature fusion module obtains the fused feature vector; and the tumor detection module outputs a tumor probability. By fully learning a patient's symptom-complaint information together with the CT, magnetic resonance and biopsy pathology images, the application absorbs and fuses the effective features of all four modalities, so that liver tumor detection reaches a higher accuracy.

Description

Tumor detection system, method, equipment and medium based on image-text multi-mode fusion
Technical Field
The invention relates to the field of medical artificial intelligence detection, and in particular to a tumor detection system, method, equipment and storage medium based on image-text multi-modal fusion.
Background
Liver cancer is a disease that severely jeopardizes public life and health. Recent statistics indicate that liver cancer ranks sixth in incidence and third in mortality among cancers worldwide. Hepatocellular carcinoma (HCC), which has a high mortality rate, is the most common primary liver cancer, accounting for about 75% of liver cancer cases; intrahepatic cholangiocarcinoma (ICC) is the second most common primary liver cancer after HCC, accounting for 5-30% of cases.
Artificial intelligence technology for tumor detection has developed rapidly. It is concentrated on image recognition and detection, where it has great advantages, but it remains dominated by single-modality models. Multi-modal AI models, which can fuse multiple types of images and data, are the direction in which AI will be applied in the medical field: they are closer to the clinic and better suited to clinical scenarios. Existing tumor detection schemes mainly take the following forms. The first is based on the CT (computed tomography) modality: computer-aided segmentation and detection play an important role in the early detection and treatment of lesions in vital organs. A CT image gives an overall view of the liver and can display the relationship between a tumor and the hepatic portal vessels and bile ducts. Owing to its high resolution and low cost, CT imaging is widely applied to the medical detection of human liver tumors. The second is based on the MRI (magnetic resonance imaging) modality: although CT and ultrasound are widely used as basic examinations of liver lesions, MRI provides incomparable soft-tissue contrast, does not rely on iodine contrast agents as CT does or on the experience and subjective judgment of the operating physician as ultrasound does, and can provide a more detailed, specialized examination and assessment of the liver parenchyma, intrahepatic bile ducts and vascular system. The third is pathological biopsy.
Under B-ultrasound or CT positioning, a puncture needle is guided into the liver to obtain a tissue sample, which is observed and examined under a microscope; this has definite diagnostic significance for finding cancer cells, and liver puncture can reveal microscopic lesions of liver tissue cells earlier and more accurately. The fourth is multi-modal liver tumor detection, in which CT and MRI images are fused through a deep learning algorithm so that their complementary advantages improve the accuracy of liver tumor detection. These solutions share two problems. First, diagnosis based on medical imaging alone does not consider the textual information of the patient's complaint, including age, symptoms, time of illness, symptom severity and concurrent symptoms. Patient complaint information is of paramount importance, yet in actual practice it is weighed only against the doctor's own experience. Second, judging from single-modality input alone yields low prediction accuracy: liver tumor detection usually requires multiple modalities to be input simultaneously, complementing and corroborating one another, before accuracy can be ensured.
Disclosure of Invention
In view of the above, the present invention aims to provide a tumor detection system, method, equipment and storage medium based on image-text multi-modal fusion that can deeply fuse four modalities of data: CT images, MRI images, biopsy pathology and patient complaint symptoms. Compared with existing single-modality or two-modality fusion detection techniques, it fully, comprehensively and effectively improves the efficiency and accuracy of liver tumor detection. The specific scheme is as follows:
in a first aspect, the present application discloses a tumor detection system based on image-text multi-mode fusion, comprising:
the electronic medical record feature extraction module is used for inputting information in the electronic medical record into the first pre-training model to obtain text feature vectors of the electronic medical record;
the image feature extraction module is used for respectively inputting image data of the three modalities of computed tomography, magnetic resonance imaging and biopsy pathology into a second pre-training model, so as to obtain a computed tomography image feature vector, a magnetic resonance image feature vector and a biopsy pathology image feature vector;
the feature fusion module is used for performing feature fusion on the text feature vector of the electronic medical record, the computed tomography image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector by using a pre-constructed attention mechanism, so as to obtain a fused feature vector;
And the tumor detection module is used for inputting the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has a tumor.
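As a rough sketch of the four-module data flow only (this is not the patented implementation: the pre-trained RoBERTa/ResNet extractors, the attention fusion and the trained detection head are all replaced here by deterministic stubs, and every name below is hypothetical):

```python
import math
import random

class TumorDetectionPipeline:
    """Skeleton of the four-module system. Only the wiring between the
    modules is real; the extractors, fusion and head are stand-ins."""

    def __init__(self, dim=4):
        self.dim = dim
        rng = random.Random(0)                       # fixed seed: reproducible stub head
        self.weights = [rng.uniform(-1, 1) for _ in range(dim * 4)]

    def _stub_features(self, raw):
        # Stand-in for a pre-trained feature extractor: any deterministic
        # mapping from input data to a dim-length vector suffices here.
        s = sum(ord(c) for c in raw)
        return [((s + i) % 17) / 17.0 for i in range(self.dim)]

    def fuse(self, vectors):
        # Stand-in for the attention-based fusion: plain concatenation.
        return [v for vec in vectors for v in vec]

    def detect(self, fused):
        # Logistic "tumor detection model": returns a probability in (0, 1).
        z = sum(w * v for w, v in zip(self.weights, fused))
        return 1.0 / (1.0 + math.exp(-z))

    def run(self, emr_text, ct_image, mri_image, biopsy_image):
        feats = [self._stub_features(x)
                 for x in (emr_text, ct_image, mri_image, biopsy_image)]
        return self.detect(self.fuse(feats))

p = TumorDetectionPipeline().run("abdominal pain, 6 weeks", "ct", "mri", "biopsy")
```

The stub head always yields a value strictly between 0 and 1, matching the probability-result interface of the tumor detection module.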
Optionally, the electronic medical record feature extraction module further includes:
the corpus construction unit is used for collecting and aggregating a plurality of electronic medical records and constructing a corpus in the medical text field based on text data of the electronic medical records and related documents in the medical field;
and the text data input unit is used for inputting the text data in the corpus into a pre-selected text pre-training model for pre-training so as to obtain the first pre-training model.
Optionally, the electronic medical record feature extraction module includes:
the electronic medical record feature acquisition unit is used for inputting the basic information of the patient and the complaint information of the patient in the electronic medical record into the first pre-training model so as to obtain the text feature vector of the electronic medical record.
Optionally, the image feature extraction module further includes:
the image library construction unit is used for collecting a plurality of examination image results and constructing an image library in the medical image field based on the examination image results and image data of medical field related documents;
The image data input unit is used for inputting the image data in the image library into a pre-selected image pre-training model for pre-training so as to obtain the second pre-training model.
Optionally, the feature fusion module is specifically configured to perform feature fusion on the text feature vector of the electronic medical record, the computed tomography image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector by using matrix-stacking concatenation together with a pre-constructed attention mechanism, so as to obtain a fused feature vector.
Optionally, the feature fusion module includes:
the first concatenation unit, used for concatenating the text feature vector of the electronic medical record, the computed tomography image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector by matrix stacking, so as to obtain a concatenated feature vector;
the second concatenation unit, used for combining, with the electronic medical record text as the center, the text feature vector with the computed tomography, magnetic resonance and biopsy pathology image feature vectors through the pre-constructed attention mechanism, so as to obtain three corresponding groups of attention feature vectors;
And the feature fusion unit, used for concatenating the concatenated feature vector with the three groups of attention feature vectors to obtain the fused feature vector.
In a second aspect, the application discloses a tumor detection method based on image-text multi-mode fusion, which comprises the following steps:
inputting information in the electronic medical record into a first pre-training model to obtain text feature vectors of the electronic medical record;
respectively inputting image data of the three modalities of computed tomography, magnetic resonance imaging and biopsy pathology into a second pre-training model to obtain a computed tomography image feature vector, a magnetic resonance image feature vector and a biopsy pathology image feature vector;
performing feature fusion on the text feature vector of the electronic medical record, the computed tomography image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector by using a pre-constructed attention mechanism to obtain a fused feature vector;
and inputting the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has tumor.
Optionally, the method further comprises:
acquiring a target label determined in advance from the doctor-confirmed tumor examination result corresponding to the electronic medical record information, wherein a target label of 1 represents the presence of a tumor and a target label of 0 represents its absence;
and determining a cross-entropy loss function value from the target label and the probability result, and assessing the accuracy of the probability result by means of this loss value.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the steps of the tumor detection method based on the image-text multi-mode fusion.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above tumor detection method based on image-text multi-modal fusion.
From the above, the application provides a tumor detection system and method based on image-text multi-modal fusion. The electronic medical record feature extraction module inputs information in an electronic medical record into a first pre-training model to obtain the text feature vector of the electronic medical record; the image feature extraction module inputs image data of the three modalities of computed tomography, magnetic resonance imaging and biopsy pathology into a second pre-training model to obtain a computed tomography image feature vector, a magnetic resonance image feature vector and a biopsy pathology image feature vector; the feature fusion module performs feature fusion on these four feature vectors using a pre-constructed attention mechanism to obtain a fused feature vector; and the tumor detection module inputs the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has a tumor. The application thus fuses multi-modal information (the text of the patient's electronic medical record, the medical examination images and the biopsy pathology results) and detects tumors with a deep learning algorithm. It can overcome the drawback that computed tomography renders some findings at low density with unclear boundaries, while exploiting the advantages of magnetic resonance imaging and using the other imaging results to compensate for the weaknesses of magnetic resonance detection. By fusing the crossed features of multiple modalities, the efficiency and accuracy of tumor detection are fully, comprehensively and effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments are briefly described below. The drawings in the following description show only embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic device structure diagram of a tumor detection system based on image-text multi-mode fusion disclosed in the application;
FIG. 2 is a flow chart of a tumor detection technique based on image-text multi-mode fusion disclosed in the application;
FIG. 3 is a schematic view of an electronic medical record of a hospital;
FIG. 4 is a schematic view of an image disclosed in the present application, wherein FIG. 4 (a) is a CT image, FIG. 4 (b) is an MRI image, and FIG. 4 (c) is a biopsy pathology image;
FIG. 5 is a schematic diagram of a multi-modal feature fusion scheme disclosed herein;
FIG. 6 is a flowchart of a tumor detection method based on image-text multi-mode fusion disclosed in the present application;
fig. 7 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The prior art has two common problems. First, current diagnosis based on medical imaging does not take into account the textual information of patient complaints, including age, symptoms, time of illness, symptom severity and concurrent symptoms; patient complaint information is of paramount importance, yet in actual diagnosis it is weighed only against the physician's own experience. Second, judging from single-modality input alone, the accuracy of predicting the disease is likely to be low, because tumor detection often requires several modalities to be input simultaneously, complementing and corroborating one another, before accuracy can be ensured. The present method deeply fuses four modalities of data (CT images, MRI images, biopsy pathology and patient complaints) and fully, comprehensively and effectively improves tumor detection efficiency and accuracy.
Referring to fig. 1, the embodiment of the invention discloses a tumor detection system based on image-text multi-mode fusion, which comprises:
the system comprises an electronic medical record feature extraction module 11, an image feature extraction module 12, a feature fusion module 13 and a tumor detection module 14.
The electronic medical record feature extraction module 11 is configured to input information in an electronic medical record into a first pre-training model to obtain a text feature vector of the electronic medical record.
It will be appreciated that the overall system of the application is divided into four modules: the electronic medical record feature extraction module, the image feature extraction module, the feature fusion module and the tumor detection module. The image feature extraction module extracts features from the CT, MRI and biopsy pathology examination images. Analyzing the data types of the four modalities shows that the electronic medical record is text data, while the CT images, MRI images and biopsy pathology are image data. So in a specific embodiment, for image modality feature extraction the application may use a ResNet (Deep Residual Learning for Image Recognition) pre-training model to extract features; other image pre-training models, such as NFNet (Normalizer-Free ResNet), may be used instead. For feature extraction from the textual electronic medical record modality, the application may adopt a RoBERTa (A Robustly Optimized BERT Pretraining Approach) pre-training model; other natural language pre-training models, such as BERT, may also be used. The technical flowchart of the feature extraction is shown in fig. 2; feature extraction from the electronic medical record is described first.
The application first analyzes a desensitized electronic medical record document of a tumor patient. As shown in fig. 3, the document mainly comprises three categories of information. The first category is the patient's basic information, including age, sex, visit date and department. The second category is complaint information, including existing symptoms, symptom severity, concurrent symptoms, past medical history, allergy history and the like. The third category is the doctor's assessment of the illness based on the patient's statements and the examination results. It should be noted that because each hospital has its own specific electronic medical record format, the application assumes structured document data by default. Specifically, the electronic medical record feature extraction module may include an electronic medical record feature acquisition unit, and may further include a corpus construction unit and a text data input unit. The corpus construction unit collects and aggregates a large number of electronic medical records and constructs a corpus of the medical text domain from their text data together with relevant medical literature; the text data input unit inputs the text data in the corpus into a pre-selected text pre-training model for continued pre-training, so as to obtain the first pre-training model. The electronic medical record feature acquisition unit inputs the patient's basic information and complaint information from the electronic medical record into the first pre-training model to obtain the text feature vector of the electronic medical record. It can be appreciated that the first pre-training model may be a RoBERTa model; the last hidden-layer feature of the RoBERTa model is selected as the final feature of the textual electronic medical record.
Concretely, a large number of textual electronic medical records from tertiary Grade-A hospitals are first collected and aggregated, and combined with a large body of medical literature to form a corpus of the medical text domain. Then the RoBERTa model is selected as the text pre-training model, and the corpus text data (electronic medical records, disease information and other medical-domain text) is fed into it for continued pre-training, producing a medical-domain RoBERTa text pre-training model. Finally, the patient basic-information and patient-complaint sections of the electronic medical record are input into the pre-trained RoBERTa model to obtain the text characterization vector X = [B, L, E] of the electronic medical record, where B is the batch size, L is the text length and E is the feature vector dimension.
The image feature extraction module 12 is configured to input image data of the three modalities of computed tomography, magnetic resonance imaging and biopsy pathology into the second pre-training model respectively, so as to obtain a computed tomography image feature vector, a magnetic resonance image feature vector and a biopsy pathology image feature vector.
It should be noted that the main images for tumor examination in the application are CT images, MRI images and biopsy pathology images; the method of the application can be used with any conventional tumor detection image from which a tumor can be identified. The application takes liver tumors as its running example; for liver tumor detection, a conventional liver CT image, MRI image and biopsy pathology image are shown in fig. 4. In this embodiment, the image feature extraction module may further include an image library construction unit and an image data input unit. The image library construction unit collects a large number of examination image results and constructs an image library of the medical imaging domain from them together with image data from relevant medical literature; the image data input unit inputs the image data in the image library into a pre-selected image pre-training model for continued pre-training, so as to obtain the second pre-training model. First, a large number of examination images from tertiary Grade-A hospitals are collected, and a large amount of image data such as medical literature is introduced to form an image library of the medical imaging domain. Second, the ResNet model is selected as the image pre-training model, and the large volume of medical image data is fed into it for continued pre-training, finally forming the medical-imaging-domain pre-trained ResNet model. Finally, the last hidden-layer feature of the ResNet model is selected as the final image feature.
Specifically, image data of the CT modality is input into the pre-trained ResNet model to obtain an examination image characterization vector Y = [B, C, W, H], where B is the batch size, C is the number of channels, W is the width of the characterization vector and H is its height. Tiling (flattening) the W and H dimensions transforms the image characterization vector Y into Y1 = [B, C, P], where P = W × H. Repeating these steps for the MRI images and the liver biopsy pathology images yields the image feature data of each modality.
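The W, H tiling step can be illustrated with plain Python lists (in practice this would be a single tensor reshape, e.g. `Y.flatten(2)` in PyTorch; the toy numbers below are illustrative only):

```python
def flatten_hw(y):
    """Flatten the (W, H) dimensions of a [B, C, W, H] nested list into
    [B, C, P] with P = W * H, mirroring the feature-tiling step above."""
    return [[[val for row in channel for val in row]  # concatenate the W rows
             for channel in sample]
            for sample in y]

# Toy "characterization vector": B = 1, C = 2, W = 2, H = 3.
y = [[[[1, 2, 3], [4, 5, 6]],
      [[7, 8, 9], [10, 11, 12]]]]
y1 = flatten_hw(y)  # shape [1, 2, 6], since P = 2 * 3 = 6
```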
And the feature fusion module 13 is used for performing feature fusion on the text feature vector of the electronic medical record, the computed tomography image feature vector, the magnetic resonance image feature vector and the biopsy pathology image feature vector by using a pre-constructed attention mechanism, so as to obtain a fused feature vector.
It can be understood that the invention mainly involves data of four modalities: first, the text data of the electronic medical record document; second, the CT examination image data; third, the MRI examination image data; and fourth, the biopsy pathology image data. The fusion strategy and training process for these four modalities are shown in fig. 5. The feature fusion module is specifically configured to fuse the text feature vector of the electronic medical record with the CT, MRI and biopsy pathology image feature vectors using matrix-stacking concatenation together with a pre-constructed attention mechanism, so as to obtain a fused feature vector. The feature fusion module comprises a first concatenation unit, a second concatenation unit and a feature fusion unit. The first concatenation unit concatenates the text feature vector and the three image feature vectors by matrix stacking to obtain a concatenated feature vector. The second concatenation unit, taking the electronic medical record text as the center, combines the text feature vector with each of the three image feature vectors through the pre-constructed attention mechanism to obtain three corresponding groups of attention feature vectors. The feature fusion unit concatenates the concatenated feature vector with the three groups of attention feature vectors to obtain the fused feature vector.
Specifically, the invention fuses the four modal feature layers through an Attention mechanism and finally generates the fused feature vector. The attention formula is:
Attention(Q, K, V) = softmax(QKᵀ)V
Here Q, K and V denote Query, Key and Value respectively; they are encoding vectors from the word embedding layer, and for self-attention Q = K = V. The dot product of Q and K measures their similarity, but this similarity is not normalized, so a softmax function is applied to normalize the result of QKᵀ; the output of the softmax is a mask matrix whose values all lie between 0 and 1. V represents the features after the input linear transformation, and multiplying the mask matrix by V yields the filtered V features.
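A minimal pure-Python sketch of this attention computation follows. Note one assumption: it includes the 1/sqrt(d_k) scaling of the standard Transformer formulation, which the text above does not mention; the shapes and toy values are illustrative only.

```python
import math

def softmax(row):
    m = max(row)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q Kᵀ / sqrt(d_k)) V for plain [L, d] lists of lists."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K]            # Q Kᵀ: similarity of each Q row to each K row
              for q_row in Q]
    mask = [softmax(row) for row in scores]   # each row sums to 1, entries in (0, 1)
    return [[sum(w * v for w, v in zip(m_row, col)) for col in zip(*V)]
            for m_row in mask]            # mask matrix multiplied by V

X = [[1.0, 0.0], [0.0, 1.0]]
out = attention(X, X, X)                  # self-attention: Q = K = V
```

Because each mask row sums to 1, every output row is a convex combination of the V rows, i.e. "filtered V features".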
To fuse the four different types of main features effectively, two fusion modes are mainly adopted. The first is simple concatenation without any learning cost: the electronic medical record text feature X1, the CT image feature X2, the MRI image feature X3 and the biopsy image feature X4 are matrix-stacked to form the output feature X = [X1, X2, X3, X4]. The second introduces the attention mechanism, which has learnable parameters, applying the attention formula in a global-attention mode. The application takes the text as the center: the electronic medical record text feature X1 serves as the K and V vectors, while the CT image feature X2, the MRI image feature X3 and the biopsy image feature X4 each serve as Q in turn, giving three groups of attention vector results: Attention_1 = (K = V = X1, Q = X2), Attention_2 = (K = V = X1, Q = X3) and Attention_3 = (K = V = X1, Q = X4). This patent adopts a hybrid of the two modes, concatenating the stacked feature with the attention features again to obtain the final output feature X_OUT = [X, Attention_1, Attention_2, Attention_3], which is the fused feature vector.
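The hybrid fusion can be sketched as follows, assuming each modality's feature is a small [L, E] list of rows (the row counts, toy values and the 1/sqrt(d) scaling inside the helper are illustrative assumptions, not the patent's actual shapes):

```python
import math

def _softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def _attn(Q, KV):
    """Attention with K = V = KV (the text feature) and Q = an image feature."""
    d = len(KV[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d)
               for k_row in KV] for q_row in Q]
    return [[sum(w * v for w, v in zip(_softmax(row), col))
             for col in zip(*KV)] for row in scores]

def fuse(x1, x2, x3, x4):
    """Hybrid fusion: stack X = [X1, X2, X3, X4] row-wise, then append the
    three text-centered results Attention_i(K = V = X1, Q = Xi), giving
    X_OUT = [X, Attention_1, Attention_2, Attention_3]."""
    stacked = x1 + x2 + x3 + x4
    a1, a2, a3 = (_attn(q, x1) for q in (x2, x3, x4))
    return stacked + a1 + a2 + a3

# Toy features: each modality contributes 2 rows of dimension 2.
x1 = [[0.1, 0.9], [0.8, 0.2]]   # EMR text feature X1
x2 = [[1.0, 0.0], [0.0, 1.0]]   # CT feature X2
x3 = [[0.5, 0.5], [0.3, 0.7]]   # MRI feature X3
x4 = [[0.9, 0.1], [0.2, 0.8]]   # biopsy feature X4
x_out = fuse(x1, x2, x3, x4)    # 8 stacked rows + 3 * 2 attention rows = 14 rows
```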
The tumor detection module 14 is configured to input the fused feature vector into a pre-trained tumor detection model and obtain a probability result characterizing that the current detection object has a tumor.
It can be understood that, by extracting the multi-modal features of the sample to be predicted and feeding them to the trained tumor detection model, the invention achieves a high-accuracy tumor detection technique.
From the above, the information in the electronic medical record can be input into the first pre-training model through the electronic medical record feature extraction module so as to obtain the text feature vector of the electronic medical record; the image data of three modes of the electronic computer tomography, the nuclear magnetic resonance and the biopsy pathology can be respectively input into a second pre-training model through an image feature extraction module so as to obtain an electronic computer tomography image feature vector, a nuclear magnetic resonance image feature vector and a biopsy pathology image feature vector; feature fusion can be carried out on the text feature vector, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathological image feature vector of the electronic medical record by utilizing a pre-constructed attention mechanism through a feature fusion module so as to obtain a fused feature vector; and inputting the fused feature vector into a pre-trained tumor detection model through a tumor detection module to acquire a probability result representing that the current detection object has tumor. Therefore, the method and the device for detecting the tumor of the patient can be used for fusing multi-mode information such as text information, medical examination images, living medical record detection and the like of the electronic medical record of the patient, combining a deep learning algorithm to detect the tumor of the patient, and can overcome the defects that the electronic computer tomography image shows low density of examination results and has unclear boundaries. Meanwhile, the advantages of the nuclear magnetic resonance image are adopted, and other image detection results are used for compensating for the defects of nuclear magnetic resonance image detection. 
By fusing the multi-modal cross feature data, the tumor detection efficiency and accuracy are fully, comprehensively and effectively improved.
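As an illustrative sketch only (not the patented implementation: the vector dimension, the mean-pooling of raw image features, and the single-head dot-product attention are all assumptions for demonstration), the text-centered attention fusion described above can be outlined as follows:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_centered_attention(text_vec, image_vecs):
    """Fuse the medical-record text vector (query) with one image
    feature matrix (keys/values) via scaled dot-product attention."""
    d = text_vec.shape[-1]
    scores = image_vecs @ text_vec / np.sqrt(d)   # one score per patch
    weights = softmax(scores)                     # attention weights
    return weights @ image_vecs                   # attended (d,) vector

rng = np.random.default_rng(0)
d = 8                                             # assumed feature dimension
text = rng.standard_normal(d)                     # EMR text feature vector
ct = rng.standard_normal((4, d))                  # CT patch features
mri = rng.standard_normal((4, d))                 # MRI patch features
biopsy = rng.standard_normal((4, d))              # biopsy patch features

# One attention vector per image modality, text as the center (query).
attended = [text_centered_attention(text, m) for m in (ct, mri, biopsy)]
# Stack-splice the raw vectors, then concatenate the attended vectors.
fused = np.concatenate([text, ct.mean(0), mri.mean(0), biopsy.mean(0), *attended])
print(fused.shape)  # (56,)
```

In a real system, the text and image vectors would come from the first and second pre-training models, and the fused vector would feed the tumor detection model.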
Referring to fig. 6, the embodiment of the invention discloses a tumor detection method based on image-text multi-mode fusion, which comprises the following steps:
Step S11, inputting the information in the electronic medical record into a first pre-training model to obtain the text feature vector of the electronic medical record.
Step S12, respectively inputting the image data of the three modes of the electronic computer tomography, the nuclear magnetic resonance and the biopsy pathology into a second pre-training model to obtain an electronic computer tomography image feature vector, a nuclear magnetic resonance image feature vector and a biopsy pathology image feature vector.
Step S13, carrying out feature fusion on the text feature vector of the electronic medical record, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector by using a pre-constructed attention mechanism so as to obtain a fused feature vector.
Step S14, inputting the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has a tumor.
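The four steps S11 to S14 can be sketched as a toy end-to-end pipeline. This is a minimal sketch under stated assumptions: the feature extractors below are hypothetical stand-ins (a seeded random vector and mean-pooled pixels) for the first and second pre-training models, the fusion is plain concatenation in place of the attention mechanism, and the detector is a single logistic unit rather than the trained tumor detection model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D = 8  # assumed feature dimension

def extract_text_features(record):
    # Step S11 stand-in: a real system would run the first pre-training
    # model (a text encoder) over the electronic medical record.
    rng = np.random.default_rng(len(record))
    return rng.standard_normal(D)

def extract_image_features(image):
    # Step S12 stand-in: a real system would run the second pre-training
    # model (an image encoder); here we simply mean-pool the pixels.
    return image.reshape(-1, D).mean(axis=0)

def fuse(text_vec, ct_vec, mri_vec, biopsy_vec):
    # Step S13 stand-in: concatenation in place of attention fusion.
    return np.concatenate([text_vec, ct_vec, mri_vec, biopsy_vec])

def detect(fused_vec, weights, bias=0.0):
    # Step S14: map the fused vector to a tumor probability.
    return sigmoid(fused_vec @ weights + bias)

text_vec = extract_text_features("patient reports right upper abdominal pain")
ct_vec = extract_image_features(np.ones((16, 16, D)))
mri_vec = extract_image_features(np.zeros((16, 16, D)))
biopsy_vec = extract_image_features(np.full((16, 16, D), 0.5))
fused = fuse(text_vec, ct_vec, mri_vec, biopsy_vec)
prob = detect(fused, weights=np.zeros(fused.size))
print(fused.shape, prob)  # (32,) 0.5 with all-zero weights
```

The structure mirrors the four steps; every concrete choice here (dimension 8, pooling, logistic detector) is illustrative only.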
In this embodiment, for tumor detection, a target label determined in advance based on the tumor examination result corresponding to the electronic medical record information confirmed by a doctor is obtained; wherein a target label of 1 represents the presence of a tumor, and a target label of 0 represents the absence of a tumor. A cross entropy loss function value is determined by using the target label and the probability result, and the accuracy of the probability result is determined by using the cross entropy loss function value. Specifically, the liver tumor detection result confirmed by the doctor in the text medical record is used as the label for training, and the cross entropy loss is used as the discrimination loss function, with the following expression:

CrossEntropyLoss = -[y·log(p̂) + (1-y)·log(1-p̂)]

wherein p̂ is the probability, predicted by the model, that the sample is a liver tumor, and y is the sample label, which takes the value 1 if the sample belongs to liver tumor and 0 otherwise. Therefore, based on the trained liver tumor detection model, a high-accuracy liver tumor detection technology can be achieved by obtaining the multi-modal features of the object to be predicted. For details concerning steps S11, S12 and S13, reference may be made to the relevant content in the above embodiments, and details are not repeated here.
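The cross entropy loss used as the discrimination loss function can be sketched numerically as follows; the clamping constant `eps` is an assumption added for numerical safety and is not part of the original formula:

```python
import math

def cross_entropy_loss(p_hat, y, eps=1e-12):
    # -[y*log(p) + (1-y)*log(1-p)], with p clamped away from 0 and 1.
    p = min(max(p_hat, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

# A confident and correct prediction incurs a small loss ...
low = cross_entropy_loss(0.9, 1)   # -ln(0.9) ≈ 0.105
# ... while a confident but wrong prediction is heavily penalised.
high = cross_entropy_loss(0.9, 0)  # -ln(0.1) ≈ 2.303
print(low, high)
```

Minimising this loss over doctor-confirmed labels is what drives the detection model toward well-calibrated tumor probabilities.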
As can be seen from the above, the present application inputs the information in the electronic medical record into the first pre-training model to obtain the text feature vector of the electronic medical record; inputs the image data of the three modes of electronic computer tomography, nuclear magnetic resonance and biopsy pathology into the second pre-training model to obtain the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector; performs feature fusion on the text feature vector of the electronic medical record, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector by using a pre-constructed attention mechanism to obtain a fused feature vector; and inputs the fused feature vector into a tumor detection model trained in advance to obtain a probability result representing that the current detection object has a tumor. Therefore, based on the image-text multi-modal liver tumor detection technology, the present application fully learns data of four modes, namely the patient's self-reported symptom complaint information, medical CT images, MRI images and pathology biopsy images, and fully absorbs and fuses the effective features of the four modes, so that tumor detection achieves higher accuracy.
It will be appreciated that the prior art has several methods for the detection of tumors (for example, liver tumors). The first is based on CT: computer-aided segmentation and detection plays an important role in the early detection and treatment of lesions in vital organs of the human body. The CT image mode provides an overall view of the liver and can display the relation between the tumor and the hepatic portal vessels and bile ducts. CT medical imaging technology is widely applied to the medical detection of human liver tumors due to its high resolution, low cost and other characteristics. The second approach is based on the MRI (nuclear magnetic resonance) imaging mode: although CT and ultrasound examinations are widely used as basic examinations of liver lesions, MRI examination can provide incomparable soft tissue contrast, does not rely on iodine contrast agents as CT does or on the experience and subjective judgment of the operating physician as ultrasound does, and can provide a more detailed, specialized examination and assessment of the liver parenchyma, intrahepatic bile ducts and vascular system. The third is pathological biopsy: under B-ultrasound or CT positioning, a puncture needle is used to enter the liver to obtain part of the liver tissue, which is observed and examined under a microscope; this has definite diagnostic significance for finding cancer cells. Liver puncture can reveal the microscopic lesions of liver tissue cells earlier and more accurately. The fourth is multi-modal liver tumor detection: the CT image and the MRI image are fused through a deep learning algorithm, and their advantages are complementary, so that the accuracy of liver tumor detection is improved. However, in practical applications, the above solutions have some problems.
For example, for the first approach, the main problem is that some small liver tumors are difficult to detect by CT; liver CT examination can find space-occupying liver lesions, which are mainly represented as low density with unclear boundaries. For the second approach, the disadvantage is that liver lesions are still difficult to diagnose by MRI alone, unlike endoscopy, which can obtain both imaging and pathology results. Furthermore, CT examination is preferable for the liver, pancreas, adrenal gland and prostate. Meanwhile, patients with metal objects left in the body cannot undergo MRI, and critically ill patients and patients within the first three months of pregnancy cannot undergo MRI examination. For the third approach, in some cases, liver biopsy may not be able to determine the specific pathology because the sampled tissue is too small, and there is a probability of false negatives. For the fourth approach, the accuracy of liver tumor detection is improved to a certain extent, but the text information of the patient's medical record is not considered and the liver tumor biopsy information is not utilized, so part of the main information is lost.
In contrast, the present application is mainly based on domain knowledge such as linguistics, probability theory, image processing and artificial intelligence, integrates multi-modal information such as the text information of the patient's electronic medical record, medical examination images and biopsy pathology results, and detects liver tumors in the patient in combination with a deep learning algorithm. Compared with the existing technical solutions, the present solution has the following advantages. Compared with the CT image detection solution, the invention adopts multiple images, biopsy pathology and other detection results, and can make up for the defect that CT images present detection results with low density and unclear boundaries. Compared with the MRI (nuclear magnetic resonance) image solution, the present solution can adopt the advantages of MRI images while using other image detection results to overcome the shortcomings of MRI detection. Compared with the single biopsy pathology solution, the present solution can fully fuse multi-modal cross feature data, so that liver tumor diseases of patients can be detected more accurately. Compared with the existing single-mode detection technology or two-mode fusion detection technology, the present solution can deeply fuse the four modal data of CT images, MRI images, biopsy pathology and patient complaints, and fully, comprehensively and effectively improves the efficiency and accuracy of liver tumor detection. Likewise, the advantages of the present application over the prior art are manifested not only in the detection of liver tumors, but also in other tumors to which the above detection schemes are applicable.
Furthermore, the embodiment of the present application also discloses an electronic device. Fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement relevant steps in the tumor detection method based on the image-text multi-mode fusion disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it complies with may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting data externally, and its specific interface type may be selected according to the specific application requirements, which is not limited herein.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices and the computer program 222 on the electronic device 20, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program capable of performing the tumor detection method based on image-text multi-mode fusion performed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further comprise computer programs capable of performing other specific tasks.
Further, the present application also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned tumor detection method based on image-text multi-mode fusion. For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not repeated here.
In this specification, each embodiment is described in a progressive manner, and each embodiment focuses on its differences from the other embodiments; for the same or similar parts between the embodiments, reference may be made to each other. For the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively brief, and for relevant points, reference may be made to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing has described in detail the preferred embodiments of the present application. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and core concepts of the present application; meanwhile, those skilled in the art may make modifications to the specific embodiments and application scope in accordance with the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (7)

1. A tumor detection system based on image-text multi-mode fusion, characterized by comprising:
the electronic medical record feature extraction module is used for inputting information in the electronic medical record into the first pre-training model to obtain text feature vectors of the electronic medical record; the information in the electronic medical record comprises patient basic information and patient complaint information; the patient complaint information comprises existing symptoms, symptom degrees, concurrent symptoms, past medical history and allergy history;
the image feature extraction module is used for respectively inputting image data of three modes of the electronic computer tomography, the nuclear magnetic resonance and the biopsy pathology into the second pre-training model so as to obtain an electronic computer tomography image feature vector, a nuclear magnetic resonance image feature vector and a biopsy pathology image feature vector;
the feature fusion module is used for carrying out feature fusion on the text feature vector, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathological image feature vector of the electronic medical record by utilizing a pre-constructed attention mechanism so as to obtain a fused feature vector;
the tumor detection module is used for inputting the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has tumor;
wherein, the feature fusion module includes:
the first splicing unit is used for splicing the text feature vector, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathological image feature vector of the electronic medical record based on a rectangular stacking splicing mode so as to obtain a spliced feature vector;
the second splicing unit is used for respectively fusing the text feature vector of the electronic medical record with the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector by taking the text of the electronic medical record as a center and based on the attention mechanism constructed in advance so as to obtain three corresponding groups of attention feature vectors;
and the feature fusion unit is used for splicing the spliced feature vector with the three groups of attention feature vectors to obtain the fused feature vector.
2. The tumor detection system based on the image-text multi-mode fusion according to claim 1, wherein the electronic medical record feature extraction module further comprises:
the corpus construction unit is used for collecting and aggregating a plurality of electronic medical records and constructing a corpus in the medical text field based on text data of the electronic medical records and related documents in the medical field;
and the text data input unit is used for inputting the text data in the corpus into a pre-selected text pre-training model for pre-training so as to obtain the first pre-training model.
3. The tumor detection system based on the image-text multi-mode fusion according to claim 1, wherein the image feature extraction module further comprises:
the image library construction unit is used for collecting a plurality of examination image results and constructing an image library in the medical image field based on the examination image results and image data of medical field related documents;
the image data input unit is used for inputting the image data in the image library into a pre-selected image pre-training model for pre-training so as to obtain the second pre-training model.
4. A tumor detection method based on image-text multi-mode fusion, characterized by comprising the following steps:
inputting information in the electronic medical record into a first pre-training model to obtain text feature vectors of the electronic medical record;
respectively inputting image data of three modes of electronic computer tomography, nuclear magnetic resonance and biopsy pathology into a second pre-training model to obtain an electronic computer tomography image feature vector, a nuclear magnetic resonance image feature vector and a biopsy pathology image feature vector;
performing feature fusion on the text feature vector, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathological image feature vector of the electronic medical record by utilizing a pre-constructed attention mechanism to obtain a fused feature vector;
inputting the fused feature vector into a pre-trained tumor detection model to obtain a probability result representing that the current detection object has tumor;
the feature fusion of the text feature vector, the electronic computed tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector of the electronic medical record by utilizing a pre-constructed attention mechanism to obtain a fused feature vector comprises:
splicing the text feature vector, the electronic computer tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathological image feature vector of the electronic medical record based on a rectangular stacking splicing mode to obtain a spliced feature vector;
taking the electronic medical record text as a center and based on the pre-constructed attention mechanism, respectively fusing the text feature vector of the electronic medical record with the electronic computed tomography image feature vector, the nuclear magnetic resonance image feature vector and the biopsy pathology image feature vector to obtain three corresponding groups of attention feature vectors;
and splicing the spliced feature vector with three groups of attention feature vectors to obtain the fused feature vector.
5. The method for detecting tumors based on the graphic multi-modal fusion according to claim 4, further comprising:
acquiring a target label which is determined in advance based on a tumor examination result corresponding to the electronic medical record information confirmed by a doctor; wherein a target label of 1 represents the presence of a tumor, and a target label of 0 represents the absence of a tumor;
and determining a cross entropy loss function value by using the target label and the probability result, and determining the accuracy of the probability result by using the cross entropy loss function value.
6. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the tumor detection method based on the teletext multi-modality fusion as claimed in claim 4 or 5.
7. A computer-readable storage medium storing a computer program; wherein the computer program when executed by a processor implements the steps of the method for detecting tumors based on the graph-text multi-modal fusion according to claim 4 or 5.
CN202310087066.8A 2023-02-09 2023-02-09 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion Active CN115830017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310087066.8A CN115830017B (en) 2023-02-09 2023-02-09 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310087066.8A CN115830017B (en) 2023-02-09 2023-02-09 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion

Publications (2)

Publication Number Publication Date
CN115830017A CN115830017A (en) 2023-03-21
CN115830017B true CN115830017B (en) 2023-07-25

Family

ID=85520927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310087066.8A Active CN115830017B (en) 2023-02-09 2023-02-09 Tumor detection system, method, equipment and medium based on image-text multi-mode fusion

Country Status (1)

Country Link
CN (1) CN115830017B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523914B (en) * 2023-07-03 2023-09-19 智慧眼科技股份有限公司 Aneurysm classification recognition device, method, equipment and storage medium
CN117744026A (en) * 2024-02-18 2024-03-22 四川省肿瘤医院 Information fusion method based on multiple modes and tumor malignancy probability recognition system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110298383A (en) * 2019-05-28 2019-10-01 中国科学院计算技术研究所 Pathological classification method and system based on multi-modal deep learning
CN111916207A (en) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 Disease identification method and device based on multi-modal fusion

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
EP3510505A1 (en) * 2016-09-07 2019-07-17 Koninklijke Philips N.V. Systems, methods, and apparatus for diagnostic inferencing with a multimodal deep memory network
CN110364234B (en) * 2019-06-26 2022-02-18 浙江大学 Intelligent storage, analysis and retrieval system and method for electronic medical records
CN111883251A (en) * 2020-07-28 2020-11-03 平安科技(深圳)有限公司 Medical misdiagnosis detection method and device, electronic equipment and storage medium
CN116438796A (en) * 2020-10-28 2023-07-14 Oppo广东移动通信有限公司 Image prediction method, encoder, decoder, and computer storage medium
CN113689927B (en) * 2021-10-26 2022-01-18 湖北经济学院 Ultrasonic image processing method and device based on deep learning model
CN113870259B (en) * 2021-12-02 2022-04-01 天津御锦人工智能医疗科技有限公司 Multi-modal medical data fusion assessment method, device, equipment and storage medium
CN114628034A (en) * 2022-03-16 2022-06-14 平安科技(深圳)有限公司 Alzheimer's disease assessment method, system, device and storage medium
CN114864076A (en) * 2022-05-07 2022-08-05 扬州大学 Multi-modal breast cancer classification training method and system based on graph attention network
CN115019405A (en) * 2022-05-27 2022-09-06 中国科学院计算技术研究所 Multi-modal fusion-based tumor classification method and system
CN115115897B (en) * 2022-08-26 2022-12-09 紫东信息科技(苏州)有限公司 Multi-modal pre-trained gastric tumor classification system
CN115512110A (en) * 2022-09-23 2022-12-23 南京邮电大学 Medical image tumor segmentation method related to cross-modal attention mechanism

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN110298383A (en) * 2019-05-28 2019-10-01 中国科学院计算技术研究所 Pathological classification method and system based on multi-modal deep learning
CN111916207A (en) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 Disease identification method and device based on multi-modal fusion

Also Published As

Publication number Publication date
CN115830017A (en) 2023-03-21

Similar Documents

Publication Publication Date Title
Richardson et al. Noninterpretive uses of artificial intelligence in radiology
KR101818074B1 (en) Artificial intelligence based medical auto diagnosis auxiliary method and system therefor
US10902588B2 (en) Anatomical segmentation identifying modes and viewpoints with deep learning across modalities
Hauser et al. Explainable artificial intelligence in skin cancer recognition: A systematic review
CN115830017B (en) Tumor detection system, method, equipment and medium based on image-text multi-mode fusion
Depeursinge et al. Building a reference multimedia database for interstitial lung diseases
CN105074708B (en) The summary view of the background driving of radiology discovery
CN102915400B (en) The method and apparatus for for computer supported showing or analyzing medical examination data
CN112712879B (en) Information extraction method, device, equipment and storage medium for medical image report
US20170262584A1 (en) Method for automatically generating representations of imaging data and interactive visual imaging reports (ivir)
Gillmann et al. Uncertainty‐aware Visualization in Medical Imaging‐A Survey
US7471816B2 (en) Virtual resolution enhancement in diagnostic imaging using FEA
Beddiar et al. Automatic captioning for medical imaging (MIC): a rapid review of literature
Rubin et al. Biomedical imaging informatics
US10650923B2 (en) Automatic creation of imaging story boards from medical imaging studies
CN115994902A (en) Medical image analysis method, electronic device and storage medium
CN112819831A (en) Segmentation model generation method and device based on convolution Lstm and multi-model fusion
Lekadir et al. Current and future role of artificial intelligence in cardiac imaging
Ghabri et al. Transfer learning for accurate fetal organ classification from ultrasound images: a potential tool for maternal healthcare providers
CN116759076A (en) Unsupervised disease diagnosis method and system based on medical image
CN114359194A (en) Multi-mode stroke infarct area image processing method based on improved U-Net network
Kalapos et al. Automated T1 and T2 mapping segmentation on cardiovascular magnetic resonance imaging using deep learning
Amran et al. BV-GAN: 3D time-of-flight magnetic resonance angiography cerebrovascular vessel segmentation using adversarial CNNs
US20230099284A1 (en) System and method for prognosis management based on medical information of patient
Moreno et al. Representation and indexing of medical images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000

Patentee after: Wisdom Eye Technology Co.,Ltd.

Address before: 410205, Changsha high tech Zone, Hunan Province, China

Patentee before: Wisdom Eye Technology Co.,Ltd.