WO2021012225A1 - Artificial intelligence system for medical diagnosis based on machine learning - Google Patents

Info

Publication number
WO2021012225A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
sub
symptom
patient
artificial intelligence
Application number
PCT/CN2019/097538
Other languages
French (fr)
Inventor
Mingyang Sun
Xiaoqing Yang
Zang Li
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/097538
Publication of WO2021012225A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • the present disclosure relates to artificial intelligence (AI) systems and methods for medical diagnosis, and more particularly to AI systems and methods for making medical diagnoses from patients’ descriptions using machine learning.
  • Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right doctors. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or the flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
  • Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses.
  • hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent to.
  • having practitioners perform the pre-diagnosis wastes valuable resources.
  • Automated pre-diagnosis methods are used to improve efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on the patient’s described symptoms, e.g., based on preprogrammed mappings between diseases and known symptoms.
  • Patient descriptions are, however, often inaccurate or unclear.
  • the patient may be under the influence of the illness or medication and unable to express herself accurately.
  • patients are not practitioners and are therefore not familiar with medical terminologies for describing symptoms.
  • patients, especially when describing symptoms orally, may use informal language while medical terminologies are usually formal.
  • existing automated methods cannot readily perform accurate medical diagnosis from patient descriptions.
  • Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for automatically making medical diagnosis from patient’s descriptions using machine learning.
  • Embodiments of the disclosure provide an artificial intelligence system for training a learning model for medical diagnosis.
  • An exemplary artificial intelligence system includes a storage device, a processor, and a communication interface.
  • the storage device is configured to store a sample patient description of a patient and a known disease of the patient.
  • the processor is configured to train the learning model.
  • the learning model includes a first sub-model and a second sub-model.
  • the processor is configured to determine a first symptom feature matrix based on the sample patient description and input the first symptom feature matrix to the first sub-model.
  • the processor is further configured to determine a second symptom feature matrix based on the known disease and input the second symptom feature matrix to the second sub-model.
  • the processor is also configured to jointly optimize the first sub-model and the second sub-model.
  • the communication interface is configured to provide the learning model for automatically diagnosing a disease from a patient description.
  • Embodiments of the disclosure further provide an artificial intelligence system for making medical diagnosis based on a learning model.
  • An exemplary artificial intelligence system includes a patient interaction interface, a storage device, and a processor.
  • the patient interaction interface is configured to receive a patient description of a patient.
  • the storage device is configured to store the learning model.
  • the learning model includes a first sub-model and a second sub-model that are jointly trained.
  • the processor is configured to determine a symptom feature matrix based on the patient description, obtain a feature map by applying the first sub-model to the symptom feature matrix, and identify a disease for the patient by applying the second sub-model on the feature map.
  • Embodiments of the disclosure further provide an artificial intelligence method for training a learning model for medical diagnosis.
  • the learning model includes a first sub-model and a second sub-model.
  • An exemplary artificial intelligence method includes receiving, from a storage device, a sample patient description of a patient and a known disease of the patient. The method further includes determining, by a processor, a first symptom feature matrix based on the sample patient description, and inputting, by the processor, the first symptom feature matrix to the first sub-model. The method also includes determining, by the processor, a second symptom feature matrix based on the known disease, and inputting the second symptom feature matrix to the second sub-model. The method additionally includes jointly optimizing, by the processor, the first sub-model and the second sub-model. The method yet further includes providing, by a communication interface, the learning model for automatically diagnosing a disease from a patient description.
  • Embodiments of the disclosure further provide an artificial intelligence method for making medical diagnosis based on a learning model.
  • the learning model includes a first sub-model and a second sub-model that are jointly trained.
  • An exemplary artificial intelligence method includes receiving, through a patient interaction interface, a patient description of a patient.
  • the method further includes determining, by a processor, a symptom feature matrix based on the patient description.
  • the method also includes obtaining, by the processor, a feature map by applying the first sub-model to the symptom feature matrix.
  • the method additionally includes identifying, by the processor, a disease for the patient by applying the second sub-model on the feature map.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for training a learning model for medical diagnosis.
  • the learning model includes a first sub-model and a second sub-model.
  • the artificial intelligence method includes determining a first symptom feature matrix based on a sample patient description of a patient, and inputting the first symptom feature matrix to the first sub-model.
  • the method further includes determining a second symptom feature matrix based on a known disease of the patient, and inputting the second symptom feature matrix to the second sub-model.
  • the method additionally includes jointly optimizing the first sub-model and the second sub-model.
  • the method also includes providing the learning model for automatically diagnosing a disease from a patient description.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for making medical diagnosis based on a learning model.
  • the learning model includes a first sub-model and a second sub-model that are jointly trained.
  • the artificial intelligence method includes determining a symptom feature matrix based on a patient description of a patient.
  • the method further includes obtaining a feature map by applying the first sub-model to the symptom feature matrix.
  • the method additionally includes identifying a disease for the patient by applying the second sub-model on the feature map.
  • FIG. 1 illustrates a schematic diagram of an exemplary medical diagnosis system, according to embodiments of the disclosure.
  • FIG. 2 illustrates a block diagram of an exemplary AI system for training a diagnosis model, according to embodiments of the disclosure.
  • FIG. 3 illustrates a schematic diagram of an exemplary diagnosis model, according to embodiments of the disclosure.
  • FIG. 4 illustrates a flowchart of an exemplary method for training a diagnosis model, according to embodiments of the disclosure.
  • FIG. 5 illustrates a block diagram of an exemplary AI system for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure.
  • FIG. 6 illustrates a flowchart of an exemplary method for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure.
  • FIG. 1 illustrates a schematic diagram of an exemplary medical diagnosis system 100, according to embodiments of the disclosure.
  • medical diagnosis system 100 is configured to perform medical diagnosis from patient descriptions (e.g., patient descriptions 103) based on a diagnosis model 105 trained using sample patient descriptions and corresponding known diseases (e.g., included in training data 101) .
  • medical diagnosis system 100 may include components shown in FIG. 1, including a training database 110, a model training device 120, a diagnosis device 130, a patient description database 140, a patient terminal 150, and a network 160 to facilitate communications among the various components. It is contemplated that medical diagnosis system 100 may include more or fewer components compared to those shown in FIG. 1.
  • medical diagnosis system 100 may receive patient descriptions (e.g., sample patient descriptions as part of training data 101 or patient descriptions 103) from a patient terminal 150.
  • patient terminal 150 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc.
  • Patient terminal 150 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients.
  • patient terminal 150 may include a keyboard, hard or soft, for patients to type in the patient description.
  • Patient terminal 150 may additionally or alternatively include a touch screen for patients to handwrite the patient description. Accordingly, patient terminal 150 may record the patient description as texts. If the input is handwriting, patient terminal 150 may automatically recognize the handwriting and convert it to text information.
  • patient terminal 150 may include a microphone, for recording the patient description provided by patients orally. Patient terminal 150 may automatically transcribe the recorded audio data into texts. In some alternative embodiments, the handwriting recognition and audio transcription may be performed automatically by patient terminal 150 or other components of medical diagnosis system 100, before being stored in training database 110 or patient description database 140.
  • sample patient descriptions may be patient descriptions received for medical diagnosis system 100 to train diagnosis model 105.
  • sample patient descriptions and known diseases of the respective patients whose symptoms are described in the sample patient descriptions may be used by model training device 120 to train diagnosis model 105.
  • the known diseases of the patients may be benchmark diagnoses made by professionals based on the sample patient descriptions.
  • Sample patient descriptions and their respective known diseases may be stored in pairs in training database 110 as training data 101.
  • patient descriptions 103 may be patient descriptions for medical diagnosis system 100 to process and make automatic diagnosis.
  • diagnosis device 130 may predict a disease of the patient who made patient description 103 using diagnosis model 105.
  • Patient descriptions 103 may be stored in patient description database 140.
  • patient description 103, along with its diagnosis (e.g., predicted disease) may be periodically provided to update training database 110.
  • medical diagnosis system 100 may include components for performing two stages, a training stage and a diagnosis stage.
  • medical diagnosis system 100 may include training database 110 and model training device 120.
  • medical diagnosis system 100 may include a diagnosis device 130 and a patient description database 140.
  • after a learning model (e.g., diagnosis model 105) is trained, medical diagnosis system 100 may include only diagnosis device 130 and patient description database 140.
  • Medical diagnosis system 100 may optionally include network 160 to facilitate the communication among the various components of medical diagnosis system 100, such as databases 110 and 140, devices 120 and 130, and terminal 150.
  • network 160 may be a local area network (LAN) , a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service) , a client-server, a wide area network (WAN) , etc.
  • network 160 may be replaced by wired data communication systems or devices.
  • the various components of medical diagnosis system 100 may be remote from each other or in different locations, and be connected through network 160 as shown in FIG. 1.
  • certain components of medical diagnosis system 100 may be located on the same site or inside one device.
  • training database 110 may be located on-site with or be part of model training device 120.
  • model training device 120 and diagnosis device 130 may be inside the same computer or processing device.
  • model training device 120 may communicate with training database 110 to receive one or more sets of training data 101.
  • Each set of training data may include a sample patient description and its corresponding ground truth diagnosis that indicates the known disease of the patient.
  • Model training device 120 may use training data 101 received from training database 110 to train a learning model, diagnosis model 105, for diagnosing patient diseases based on their symptom descriptions.
  • Model training device 120 may be implemented with hardware specially programmed by software that performs the training process.
  • model training device 120 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 2) .
  • the processor may conduct the training by performing instructions of a training process stored in the computer-readable medium.
  • Model training device 120 may additionally include input and output interfaces to communicate with training database 110, network 160, and/or a user interface (not shown) .
  • the user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing diagnosis results associated with a sample patient description for training.
  • diagnosis model 105 may be a convolutional neural network (CNN) model, a recurrent neural network (RNN) model or a combination of the two. Diagnosis model 105 may be trained using supervised learning.
  • the architecture of diagnosis model 105 includes a stack of distinct layers that transform the input into the output.
  • “training” a learning model refers to determining one or more parameters of at least one layer in the learning model.
  • a convolutional layer of a CNN model may include at least one filter or kernel.
  • One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
  • diagnosis model 105 may include a CNN sub-model to process data using the known diseases and their symptoms as input.
  • Diagnosis model 105 may additionally include an RNN sub-model to process data using the symptoms recognized from the sample patient descriptions as input.
  • the CNN and RNN sub-models are connected and jointly trained.
  • intermediate outputs from the two sub-models may be jointly optimized. For example, a difference between an output from the CNN max-pooling layer and another output from the RNN hidden layer may be minimized to derive the optimal model parameters.
  • Diagnosis device 130 may receive diagnosis model 105 from model training device 120.
  • Diagnosis device 130 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 5) .
  • the processor may perform instructions of a diagnosis process stored in the medium.
  • Diagnosis device 130 may additionally include input and output interfaces to communicate with patient description database 140, network 160, and/or a user interface (not shown) .
  • the user interface may be used for selecting a patient description 103 for diagnosis, initiating the diagnosis process, or displaying a diagnosis result 107.
  • Diagnosis device 130 may communicate with patient description database 140 to receive one or more patient descriptions 103.
  • the patient descriptions stored in patient description database 140 may be received from patient terminal 150.
  • Diagnosis device 130 may use the trained model received from model training device 120 to predict a disease or illness of the patient whose symptoms are described by patient description 103, and output diagnosis result 107.
  • FIG. 2 illustrates a block diagram of an exemplary AI system 200 for training a diagnosis model, according to embodiments of the disclosure.
  • AI system 200 may be an embodiment of model training device 120.
  • AI system 200 may include a communication interface 202, a processor 204, a memory 206, and a storage 208.
  • AI system 200 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) ) , or separate devices with dedicated functions.
  • one or more components of AI system 200 may be located in a cloud, in a single location (such as inside a mobile device), or in distributed locations. Components of AI system 200 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, AI system 200 may be configured to train diagnosis model 105 based on training data 101; the trained model is then provided to diagnosis device 130 for processing patient descriptions 103.
  • Communication interface 202 may send data to and receive data from components such as training database 110 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods.
  • communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 202.
  • communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 202 may receive training data 101 including sample patient descriptions and their respective known diseases from training database 110.
  • the sample patient descriptions may be received as text or in their original format as acquired by patient terminal 150, such as audio or handwriting.
  • a sample patient description may include one sentence or multiple sentences that describe the symptoms and feelings of a patient. When the patient description is made orally, the description may additionally contain spoken-language filler words, including, e.g., hmm, well, all right, you know, okay, so, etc.
  • patient 530 may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache. ”
  • Communication interface 202 may further provide the received data to memory 206 and/or storage 208 for storage or to processor 204 for processing.
  • Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to training a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to model training.
  • Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate.
  • Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein.
  • memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to train diagnosis model 105.
  • Memory 206 and/or storage 208 may be further configured to store information and data used by processor 204.
  • storage 208 may be configured to store a knowledge database 282 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data.
  • knowledge database 282 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 284 and an entity list 286.
  • stop-word list 284 may include stop-words that do not carry substantive meanings for the purpose of medical diagnosis.
  • stop-word list 284 may include relational words. Linguistically, words may be divided into notional words, which have substantive meanings, and relational words, which merely express a grammatical relationship between notional words. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have independent meaning and must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 284.
  • Stop-word list 284 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as nouns used as the subject, e.g., “I,” “we,” “you,” “it,” and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have,” “seem,” “look,” “feel,” and “a little bit,” may be included as stop-words. Stop-word list 284 may be periodically updated, e.g., to include additional stop-words used by patients.
  • entity list 286 may include entities associated with known symptoms.
  • the entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses.
  • entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ” “vomit, ” etc.
  • when an entity contains a phrase it may be further divided into words and stored separately. For example, “joint pain” may be further divided into two words “joint” and “pain. ”
  • entity list 286 may be periodically updated, e.g., to include entities describing new symptoms.
  • knowledge database 282 may further include symptom vectors for each symptom entity in entity list 286.
  • the symptom vectors may be word vectors of the entity word.
  • the word vectors are determined using word embedding, which maps the words to vectors of real numbers.
  • the word vectors may be of several hundred dimensions.
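As a rough illustration of this embedding lookup, the sketch below maps symptom words to vectors. The table, function name, and 4-dimensional vectors are invented for readability; as noted above, real word vectors would have several hundred dimensions, and the averaging of multi-word entities is an assumption, not the patent's stated method.

```python
import numpy as np

# Toy embedding table mapping symptom words to real-valued vectors.
# These 4-dimensional vectors are invented for readability; the
# disclosure suggests vectors of several hundred dimensions.
EMBEDDINGS = {
    "fever": np.array([0.9, 0.1, 0.3, 0.0]),
    "joint": np.array([0.5, 0.2, 0.1, 0.6]),
    "pain":  np.array([0.3, 0.6, 0.2, 0.2]),
}

def symptom_vector(entity: str) -> np.ndarray:
    """Look up the word vector for a symptom entity.  A multi-word entity
    such as "joint pain" is averaged over its constituent word vectors
    (one possible way to handle entities stored as separate words)."""
    return np.mean([EMBEDDINGS[w] for w in entity.split()], axis=0)

print(symptom_vector("fever"))       # the "fever" vector itself
print(symptom_vector("joint pain"))  # average of "joint" and "pain"
```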
  • memory 206 and/or storage 208 may also store intermediate data such as the word segments in sample patient descriptions, feature maps output by layers of the learning model, and optimization loss functions, etc.
  • Memory 206 and/or storage 208 may additionally store various learning models including their model parameters, such as a CNN model and an RNN model, etc. that will be described.
  • the various types of data may be stored permanently, removed periodically, or disregarded immediately after the data is processed.
  • processor 204 may include multiple modules, such as an RNN processing unit 242, a CNN processing unit 244, a joint optimization unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components or software units implemented by processor 204 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions.
  • Although FIG. 2 shows units 242-246 all within one processor 204, it is contemplated that these units may be distributed among different processors located closely or remotely from each other.
  • FIG. 3 illustrates a schematic diagram of an exemplary diagnosis model 300, according to embodiments of the disclosure.
  • diagnosis model 300 may include a plurality of sub-models, such as a CNN 310 and an RNN 320. Both CNN 310 and RNN 320 may include multiple layers.
  • CNN 310 may include one or more convolution layers or fully-convolutional layers, non-linear operator layers, pooling or subsampling layers, fully connected layers, and/or final loss layers.
  • RNN 320 may include input layers, hidden layers, and output layers. Each layer may connect one upstream layer and one downstream layer.
  • data processing is performed to determine symptom features associated with known diseases. These symptom features are used as input to CNN 310, which predicts the disease based on the symptom features. On the other hand, data processing is performed to determine symptom features associated with sample patient descriptions. The symptom features are used as input to RNN 320. Consistent with the present disclosure, CNN 310 and RNN 320 are internally connected at given layers and jointly trained. For example, the max-pooling layer of CNN 310 and the hidden layer of RNN 320 are connected, as they both provide feature maps describing the symptoms.
  • RNN 320 should produce the same feature map output based on the sample patient description as that produced by CNN 310 based on the known disease corresponding to the sample patient description. Accordingly, these outputs from the two sub-models may be jointly optimized to derive the optimal model parameters, e.g., by minimizing a difference between the outputs of the max-pooling layer and the hidden layer.
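One plausible reading of this joint-optimization step is sketched below. The patent only says a "difference" between the two feature maps is minimized; the mean-squared-error metric, the function name, and the toy 8-dimensional feature maps are assumptions for illustration, not the disclosure's actual training objective.

```python
import numpy as np

def joint_loss(cnn_feature_map: np.ndarray, rnn_hidden: np.ndarray) -> float:
    """Mean-squared difference between the feature map from the CNN
    max-pooling layer and the feature map from the RNN hidden layer.
    Joint training minimizes this term so that both sub-models produce
    matching symptom feature maps for the same case."""
    return float(np.mean((cnn_feature_map - rnn_hidden) ** 2))

cnn_out = np.ones(8)          # stand-in for a CNN max-pooling output
rnn_out = np.ones(8)          # stand-in for an RNN hidden-layer output
print(joint_loss(cnn_out, rnn_out))        # 0.0 when the maps agree
print(joint_loss(cnn_out, rnn_out + 0.5))  # 0.25 when they differ
```

In a full training loop this term would be combined with the diagnosis loss and minimized by backpropagation through both sub-models.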
  • the sample patient description may be “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired” and the known diagnosis is “flu.”
  • CNN 310 may look up symptoms of “flu” such as “fever, ” “running nose, ” “sore throat, ” “fatigue, ” etc. and derive the symptom features.
  • RNN 320 may also recognize the same symptoms from the sample patient description and derive the symptom features accordingly.
  • the feature maps output by the max-pooling layer of CNN 310 and the hidden layer of RNN 320 should therefore have a minimal difference.
  • FIG. 4 illustrates a flowchart of an exemplary method 400 for training diagnosis model 300, according to embodiments of the disclosure.
  • Method 400 may include steps S402-S426 as described below.
  • communication interface 202 may perform step S402
  • RNN processing unit 242 may perform steps S404-S414
  • CNN processing unit 244 may perform steps S416-S422
  • joint optimization unit 246 may perform steps S424-S426. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 4.
  • RNN processing unit 242 may perform steps S404-S414 while CNN processing unit 244 is performing steps S416-S422.
  • communication interface 202 may receive training data 101 including sample patient descriptions and known diseases corresponding to the sample patient descriptions.
  • a large number of training data sets may be received to train the learning model.
  • Each sample patient description may include one or more sentences that describe the symptoms of a patient. For example, the patient may describe her symptom as “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired. ”
  • RNN processing unit 242 segments each sample patient description into multiple word segments.
  • a word segment is the smallest unit in a sentence that has semantic meanings.
  • a word segment may be a word or a combination of two or more words. If the patient description includes multiple sentences, it may be segmented into different sentences first.
  • each sample patient description may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description is segmented into a plurality of word segments.
  • the exemplary description above can be segmented as follows:
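As an illustration only, the segmentation step can be sketched with a simple rule-based tokenizer. This is a hypothetical stand-in for the trained sentence segmentation model described above; a trained model could also emit multi-word segments such as "entire body".

```python
import re

def segment(description):
    """Split a patient description into sentences, then into word segments.

    A toy, rule-based stand-in for the trained segmentation model;
    it splits on sentence-ending punctuation, then extracts words.
    """
    sentences = [s for s in re.split(r"[.!?]+", description) if s.strip()]
    word_segments = []
    for sentence in sentences:
        word_segments.extend(re.findall(r"[A-Za-z']+", sentence))
    return word_segments

segments = segment("I feel really warm in my entire body, my nose is running.")
```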
  • RNN processing unit 242 may filter the word segments in the sample patient description using stop-word list 284.
  • RNN processing unit 242 may search for each stop-word on stop-word list 284, and if the stop-word is found in the sample patient description, the corresponding word segment will be removed.
  • stop-word list 284 is used to filter the sample patient description to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as “I, ” “feel, ” “really, ” “in, ” “my, ” “by the way, ” “too, ” “oh, ” “yes, ” “just, ” “so, ” etc. may be identified and removed as stop-words. Removing these stop-words “cleans up” the sample patient description and conditions it for the later symptom recognition processes. For example, in the sample patient description above – “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired” – the word segments/spans remaining after the filtering may be “warm ... entire body, ” “nose is running, ” “throat ... sore, ” and “tired. ”
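A minimal sketch of the filtering step follows, assuming a hypothetical miniature stop-word list (the actual contents of stop-word list 284 are not disclosed here):

```python
# Hypothetical miniature stand-in for stop-word list 284.
STOP_WORDS = {"i", "feel", "really", "in", "my", "by", "the", "way",
              "too", "oh", "yes", "just", "so", "am", "and", "is"}

def filter_stop_words(word_segments):
    """Remove word segments that appear on the stop-word list."""
    return [w for w in word_segments if w.lower() not in STOP_WORDS]

remaining = filter_stop_words(
    ["I", "feel", "really", "warm", "in", "my", "entire", "body",
     "my", "nose", "is", "running", "and", "tired"])
```

Note that this single-word sketch drops "is" from "nose is running"; the disclosure instead keeps multi-word spans such as "nose is running" intact.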
  • RNN processing unit 242 may match the remaining word segments with the symptom entities.
  • the medical symptoms may be recognized using various methods.
  • a span searching method may be applied to find the entity with the highest matching value with each span between two word segments.
  • the symptom entities on entity list 286 may be searched and matched to the word segments.
  • an end-to-end learning network may be used to identify the matched symptom entities to the word segments.
  • the remaining word segments/spans may be matched with symptom entities “fever, ” “running nose, ” “sore throat, ” and “fatigue. ”
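The span searching idea can be sketched with a simple set-overlap score. The keyword sets below are hypothetical stand-ins for entity list 286, and the actual matching value (or the end-to-end learning network) may differ:

```python
def jaccard(a, b):
    """Overlap score between two keyword sets (one possible matching value)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical miniature entity list 286: entity -> descriptive keywords.
ENTITIES = {
    "fever": {"warm", "hot", "fever"},
    "running nose": {"nose", "running", "runny"},
    "sore throat": {"throat", "sore"},
    "fatigue": {"tired", "fatigue", "exhausted"},
}

def match_entity(span_words):
    """Return the symptom entity with the highest matching value, if any."""
    span = {w.lower() for w in span_words}
    best = max(ENTITIES, key=lambda e: jaccard(span, ENTITIES[e]))
    return best if jaccard(span, ENTITIES[best]) > 0 else None

matched = match_entity(["nose", "is", "running"])
```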
  • RNN processing unit 242 may construct a first symptom feature matrix M1 for the symptom entities.
  • RNN processing unit 242 may retrieve the symptom vectors corresponding to the symptom entities from knowledge database 282.
  • the symptom vectors may be pre-stored in the database for entities on entity list 286.
  • RNN processing unit 242 may also determine the symptom vectors using symptom embedding on the fly.
  • the symptom feature matrix may be constructed using the symptom vectors. For example, if n symptoms are detected, and each symptom vector is d dimension, the symptom feature matrix may be n ⁇ d in size.
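For instance, with hypothetical 4-dimensional symptom vectors (real embeddings would have many more dimensions), the n × d construction might look like:

```python
# Hypothetical 4-dimensional symptom vectors; in practice these would be
# retrieved from knowledge database 282 or computed by symptom embedding.
SYMPTOM_VECTORS = {
    "fever":        [0.9, 0.1, 0.0, 0.2],
    "running nose": [0.1, 0.8, 0.1, 0.0],
    "sore throat":  [0.0, 0.2, 0.9, 0.1],
    "fatigue":      [0.3, 0.0, 0.1, 0.7],
}

def build_feature_matrix(symptoms):
    """Stack the d-dimensional vectors of the n detected symptoms into an
    n x d symptom feature matrix (represented here as a list of rows)."""
    return [SYMPTOM_VECTORS[s] for s in symptoms]

M1 = build_feature_matrix(["fever", "running nose", "sore throat", "fatigue"])
n, d = len(M1), len(M1[0])
```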
  • RNN processing unit 242 may input the first symptom feature matrix M1 to RNN 320.
  • a first output may be obtained from the hidden layer of RNN 320.
  • the first output may be a feature map.
  • CNN processing unit 244 may determine symptoms associated with the known disease.
  • CNN processing unit 244 may query knowledge database 282 to map the disease to a list of symptoms.
  • the diseases and their associated symptoms may be embedded in a dynamic trans-matrix (referred to as TransD) .
  • the associated symptoms may include “fever, ” “running nose, ” “sore throat, ” and “fatigue. ”
  • CNN processing unit 244 may construct a second symptom feature matrix M2 similar to step S410.
  • CNN processing unit 244 may input the second symptom feature matrix M2 to CNN 310.
  • a second output may be obtained from the max-pooling layer of CNN 310.
  • the second output may also be a feature map.
  • joint optimization unit 246 may minimize a difference between the first output obtained in step S414 and the second output obtained in step S422.
  • the difference may be a mean square loss (i.e., norm-2 difference) between the two outputs. It is contemplated that the difference may be formulated differently, such as a norm-1 difference, a square root of a norm-2 difference, etc. Any suitable method may be used to solve the optimization problem, such as various iterative methods.
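The candidate difference formulations can be written out directly; the toy feature maps below are hypothetical:

```python
import math

def norm2_loss(out1, out2):
    """Mean square (norm-2) difference between two flattened feature maps."""
    return sum((a - b) ** 2 for a, b in zip(out1, out2)) / len(out1)

def norm1_loss(out1, out2):
    """Mean absolute (norm-1) difference, an alternative formulation."""
    return sum(abs(a - b) for a, b in zip(out1, out2)) / len(out1)

def sqrt_norm2_loss(out1, out2):
    """Square root of the norm-2 difference, another alternative."""
    return math.sqrt(norm2_loss(out1, out2))

rnn_out = [0.2, 0.5, 0.1]   # hypothetical RNN hidden-layer output
cnn_out = [0.2, 0.3, 0.5]   # hypothetical CNN max-pooling output
loss = norm2_loss(rnn_out, cnn_out)
```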
  • the optimized diagnosis model is provided to a diagnosis device. For example, as shown in FIG. 1, model training device 120 provides diagnosis model 105 to diagnosis device 130.
  • FIG. 5 illustrates a block diagram of an exemplary AI system 500 for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure.
  • AI system 500 may be an embodiment of diagnosis device 130.
  • AI system 500 may include a communication interface 502, a processor 504, a memory 506, and a storage 508.
  • AI system 500 may have hardware components and configurations similar to AI system 200.
  • AI system 500 may be configured to make medical diagnosis from patient descriptions 103 based on diagnosis model 105 provided by model training device 120.
  • Communication interface 502 may be configured similarly to communication interface 202.
  • communication interface 502 may send data to and receive data from components such as model training device 120, patient description database 140, and display 550.
  • communication interface 502 may receive diagnosis model 105 from model training device 120, and patient descriptions 103 from patient description database 140.
  • the patient descriptions 103 may be provided by patients 530 through patient terminal 150.
  • Communication interface 502 may send diagnosis result 107 to display 550 to be displayed to patients 530.
  • Processor 504 may include hardware components similar to those in processor 204. Processor 504 may be configured as a separate processor module dedicated to making medical diagnosis using a learning model. Alternatively, processor 504 may be configured as a shared processor module for performing other functions in addition to medical diagnosis. Memory 506 and storage 508 may be similar to memory 206 and storage 208. For example, memory 506 and/or storage 508 may be configured to store program (s) that may be executed by processor 504 to make medical diagnosis using diagnosis model 105. Storage 508 may also store knowledge database 282, which may include stop-word list 284, entity list 286, and various mappings such as between symptoms and their corresponding symptom vectors.
  • memory 506 and/or storage 508 may store intermediate data such as the word segments in patient descriptions 103, feature maps output by layers of diagnosis model 105, etc.
  • Memory 506 and/or storage 508 may additionally store various learning models including their model parameters, such as a sentence segmentation model, a symptom recognition model, and diagnosis model 105, etc.
  • processor 504 may include multiple modules, such as a symptom recognition unit 542, a diagnosis unit 544, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 504 designed for use with other components or software units implemented by processor 504 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 504, it may perform one or more functions.
  • Although FIG. 5 shows units 542-544 all within one processor 504, it is contemplated that these units may be distributed among different processors located closely or remotely with each other.
  • FIG. 6 illustrates a flowchart of an exemplary method 600 for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure.
  • Method 600 may be implemented by AI system 500 and particularly processor 504 or a separate processor not shown in FIG. 5.
  • Method 600 may include steps S602-S61 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 6.
  • communication interface 502 may receive a patient description, e.g., patient description 103.
  • Patient description 103 may be received as texts or in its original format as acquired by patient terminal 150, such as audio or handwriting. If received as audio, patient description 103 may be transcribed into texts. If received in handwriting, patient description 103 may be automatically recognized and converted into texts.
  • Patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 530. For example, the patient may describe her symptom as “I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too. ”
  • symptom recognition unit 542 may segment the patient description into word segments. The segmentation may be performed as described for step S404 of method 400. For example, the above exemplary description may be divided into three sentences: “I am having a recurring pain in the head. ” “Also feeling a bit dizzy. ” and “And my nose seems running too. ” Symptom recognition unit 542 may further segment each of the sentences into word segments, such as:
  • symptom recognition unit 542 may filter the word segments with stop-word list 284, similar to step S406 of method 400. For example, in the description above, word segments such as “I, ” “had, ” “all night, ” “last night, ” “so, ” “woke up, ” “you know, ” “by the way, ” etc. may be identified and removed as stop-words.
  • symptom recognition unit 542 may match the remaining word segments with symptom entities on entity list 286, similar to step S408 of method 400.
  • the remaining word segments in the above exemplary description may be matched to symptom entities such as “headache, ” “dizzy, ” and “running nose. ”
  • diagnosis unit 544 may determine a symptom vector for each symptom/entity matched to patient description 103 in step S608.
  • the symptom vectors are determined using symptom embedding, which maps the entities to vectors of real numbers.
  • the symptom vectors may be of several hundred dimensions. For example, for symptoms s1, s2, and s3, symptom embedding may generate symptom vectors v1, v2, and v3, respectively.
  • symptom embedding may be implemented as a Continuous Bag of Words (CBOW) learning model or a GloVe learning model, etc.
  • the symptom vectors for entities on entity list 286 may be pre-generated and stored in knowledge database 282. Accordingly, in step S610, diagnosis unit 544 may request and receive the respective symptom vectors from knowledge database 282.
  • diagnosis unit 544 may input the symptom vectors to the trained RNN, e.g., RNN 320. In some embodiments, diagnosis unit 544 may construct a feature matrix using the symptom vectors. In step S614, diagnosis unit 544 may obtain a feature map from a hidden layer of the RNN.
  • diagnosis unit 544 may input the feature map to a classifier layer of the trained CNN, e.g., CNN 310.
  • the classifier layer may be implemented using a feedforward neural network, such as softmax.
  • softmax may be configured to determine the probabilities of possible diseases that the patient sustained. For example, if the matched symptoms are “fever, ” “nausea, ” “muscle pain, ” and “fatigue, ” the softmax layer in CNN 310 may predict that patient 530 has a 50% possibility to have a flu caused by viruses, a 30% possibility to have a regular cold, and a 20% possibility to have another disease, such as yellow fever.
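A standard softmax computation is sketched below; the raw scores are hypothetical, since the actual classifier weights are learned during training:

```python
import math

def softmax(scores):
    """Map raw classifier scores to probabilities that sum to one."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for (flu, regular cold, other disease).
probs = softmax([2.0, 1.0, 0.5])
```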
  • communication interface 502 may provide the diagnosis result output by CNN 310.
  • diagnosis result 107 may be provided to patient 530 through display 550.
  • Display 550 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction.
  • the display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user.
  • the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™.
  • display 550 may be part of patient terminal 150.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Artificial intelligence systems and methods for training a learning model for medical diagnosis are provided. The artificial intelligence system includes a storage device, a processor, and a communication interface. The storage device is configured to store a sample patient description of a patient and a known disease of the patient. The processor is configured to train the learning model, including a first sub-model and a second sub-model. A first symptom feature matrix is determined based on the sample patient description and input to the first sub-model. A second symptom feature matrix is determined based on the known disease and input to the second sub-model. The first sub-model and the second sub-model are jointly optimized. The communication interface is configured to provide the learning model for automatically diagnosing a disease from a patient description.

Description

ARTIFICIAL INTELLIGENCE SYSTEM FOR MEDICAL DIAGNOSIS BASED ON MACHINE LEARNING TECHNICAL FIELD
The present disclosure relates to artificial intelligence (AI) systems and methods for medical diagnosis, and more particularly to, AI systems and methods for making medical diagnosis from the patient’s descriptions using machine learning.
BACKGROUND
Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right doctors. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or a flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses. For example, hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent to. However, having practitioners perform the pre-diagnosis wastes valuable resources. Automated pre-diagnosis methods are used to improve the efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on patient’s described symptoms, e.g., based on preprogramed mappings between diseases and known symptoms.
Patient descriptions are, however, not always accurate or clear. For example, the patient may be under the influence of the illness or medicine and may not be able to express herself accurately. In addition, patients are not practitioners and are therefore not familiar with medical terminologies for describing symptoms. Indeed, patients, especially when describing symptoms orally, may use informal language while medical terminologies are usually formal. As a result, existing automated methods cannot readily perform accurate medical diagnosis from patient descriptions.
Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for automatically making medical diagnosis from patient’s descriptions using machine learning.
SUMMARY
Embodiments of the disclosure provide an artificial intelligence system for training a learning model for medical diagnosis. An exemplary artificial intelligence system includes a storage device, a processor, and a communication interface. The storage device is configured to store a sample patient description of a patient and a known disease of the patient. The processor is configured to train the learning model. The learning model includes a first sub-model and a second sub-model. To train the learning model, the processor is configured to determine a first symptom feature matrix based on the sample patient description and input the first symptom feature matrix to the first sub-model. The processor is further configured to determine a second symptom feature matrix based on the known disease and input the second symptom feature matrix to the second sub-model. The processor is also configured to jointly optimize the first sub-model and the second sub-model. The communication interface is configured to provide the learning model for automatically diagnosing a disease from a patient description.
Embodiments of the disclosure further provide an artificial intelligence system for making medical diagnosis based on a learning model. An exemplary artificial intelligence system includes a patient interaction interface, a storage device, and a processor. The patient interaction interface is configured to receive a patient description of a patient. The storage device is configured to store the learning model. The learning model includes a first  sub-model and a second sub-model that are jointly trained. The processor is configured to determine a symptom feature matrix based on the patient description, obtain a feature map by applying the first sub-model to the symptom feature matrix, and identify a disease for the patient by applying the second sub-model on the feature map.
Embodiments of the disclosure further provide an artificial intelligence method for training a learning model for medical diagnosis. The learning model includes a first sub-model and a second sub-model. An exemplary artificial intelligence method includes receiving, from a storage device, a sample patient description of a patient and a known disease of the patient. The method further includes determining, by a processor, first symptom feature matrix based on the sample patient description, and inputting, by the processor, the first symptom feature matrix to the first sub-model. The method also includes determining, by the processor, a second symptom feature matrix based on the known disease, and inputting the second symptom feature matrix to the second sub-model. The method additionally includes jointly optimizing, by the processor, the first sub-model and the second sub-model. The method yet further includes providing, by a communication interface, the learning model for automatically diagnosing a disease from a patient description.
Embodiments of the disclosure further provide an artificial intelligence method for making medical diagnosis based on a learning model. The learning model includes a first sub-model and a second sub-model that are jointly trained. An exemplary artificial intelligence method includes receiving, through a patient interaction interface, a patient description of a patient. The method further includes determining, by a processor, a symptom feature matrix based on the patient description. The method also includes obtaining, by the processor, a feature map by applying the first sub-model to the symptom feature matrix. The method additionally includes identifying, by the processor, a disease for the patient by applying the second sub-model on the  feature map.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for training a learning model for medical diagnosis. The learning model includes a first sub-model and a second sub-model. The artificial intelligence method includes receiving, determining a first symptom feature matrix based on a sample patient description of a patient, and inputting the first symptom feature matrix to the first sub-model. The method further includes determining a second symptom feature matrix based on a known disease of the patient, and inputting the second symptom feature matrix to the second sub-model. The method additionally includes jointly optimizing the first sub-model and the second sub-model. The method also includes providing the learning model for automatically diagnosing a disease from a patient description.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for making medical diagnosis based on a learning model. The learning model includes a first sub-model and a second sub-model that are jointly trained. The artificial intelligence method includes determining a symptom feature matrix based on a patient description of a patient. The method further includes obtaining a feature map by applying the first sub-model to the symptom feature matrix. The method additionally includes identifying a disease for the patient by applying the second sub-model on the feature map.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary medical diagnosis system, according to embodiments of the disclosure;
FIG. 2 illustrates a block diagram of an exemplary AI system for training a diagnosis model, according to embodiments of the disclosure;
FIG. 3 illustrates a schematic diagram of an exemplary diagnosis model, according to embodiments of the disclosure;
FIG. 4 illustrates a flowchart of an exemplary method for training a diagnosis model, according to embodiments of the disclosure;
FIG. 5 illustrates a block diagram of an exemplary AI system for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure; and
FIG. 6 illustrates a flowchart of an exemplary method for making medical diagnosis based on a diagnosis model, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 illustrates a schematic diagram of an exemplary medical diagnosis system 100, according to embodiments of the disclosure. Consistent with the present disclosure, medical diagnosis system 100 is configured to perform medical diagnosis from patient descriptions (e.g., patient descriptions 103) based on a diagnosis model 105 trained using sample patient descriptions and corresponding known diseases (e.g., included in training data 101). In some embodiments, medical diagnosis system 100 may include components shown in FIG. 1, including a training database 110, a model training device 120, a diagnosis device 130, a patient description database 140, a patient terminal 150, and a network 160 to facilitate communications among the various components. It is contemplated that medical diagnosis system 100 may include more or fewer components compared to those shown in FIG. 1.
Consistent with the present disclosure, medical diagnosis system 100 may receive patient descriptions (e.g., sample patient descriptions as part of training data 101 or patient descriptions 103) from a patient terminal 150. For example, patient terminal 150 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc. Patient terminal 150 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients. In some embodiments, patient terminal 150 may include a keyboard, hard or soft, for patients to type in the patient description. Patient terminal 150 may additionally or alternatively include a touch screen for patients to handwrite the patient description. Accordingly, patient terminal 150 may record the patient description as texts. If the input is handwriting, patient terminal 150 may automatically recognize the handwriting and convert it to text information. In some other embodiments, patient terminal 150 may include a microphone, for recording the patient description provided by patients orally. Patient terminal 150 may automatically transcribe the recorded audio data into texts. In some alternative embodiments, the handwriting recognition and audio transcription may be performed automatically by patient terminal 150 or other components of medical diagnosis system 100, before being stored in training database 110 or patient description database 140.
In some embodiments, sample patient descriptions may be patient descriptions received for medical diagnosis system 100 to train diagnosis model 105. For example, sample patient descriptions and known diseases of the respective patients whose symptoms are described in the sample patient descriptions may be used by model training device 120 to train diagnosis model 105. The known diseases of the patients may be benchmark diagnoses made by professionals based on the sample patient descriptions. Sample patient descriptions and their respective known diseases may be stored in pairs in training database 110 as training data 101.
In some embodiments, patient descriptions 103 may be patient descriptions for medical diagnosis system 100 to process and make automatic diagnosis. For example, diagnosis device 130 may predict a disease of the patient who made patient description 103 using diagnosis model 105. Patient descriptions 103 may be stored in patient description database 140. In some embodiments, patient description 103, along with its diagnosis (e.g., predicted disease) , may be periodically provided to update training database 110.
As shown in FIG. 1, medical diagnosis system 100 may include components for performing two stages, a training stage and a diagnosis stage. To perform the training stage, medical diagnosis system 100 may include training database 110 and model training device 120. To perform the diagnosis stage, medical diagnosis system 100 may include a diagnosis device 130 and a patient description database 140. In some embodiments, when a learning model (e.g., diagnosis model 105) for disease diagnosis is pre-trained, medical diagnosis system 100 may only include diagnosis device 130 and patient description database 140.
Medical diagnosis system 100 may optionally include network 160 to facilitate the communication among the various components of medical diagnosis system 100, such as databases 110 and 140, devices 120 and 130, and terminal 150. For example, network 160 may be a local area network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service), a client-server, a wide area network (WAN), etc. In some embodiments, network 160 may be replaced by wired data communication systems or devices.
In some embodiments, the various components of medical diagnosis system 100 may be remote from each other or in different locations, and be connected through network 160 as shown in FIG. 1. In some alternative embodiments, certain components of medical diagnosis system 100 may be located on the same site or inside one device. For example, training database 110 may be located on-site with or be part of model training device 120. As another example, model training device 120 and diagnosis device 130 may be inside the same computer or processing device.
As shown in FIG. 1, model training device 120 may communicate with training database 110 to receive one or more sets of training data 101. Each set of training data may include a sample patient description and its corresponding ground truth diagnosis that indicates the known disease of the patient. Model training device 120 may use training data 101 received from training database 110 to train a learning model, diagnosis model 105, for diagnosing patient diseases based on their symptom descriptions. Model training device 120 may be implemented with hardware specially programmed by software that performs the training process. For example, model training device 120 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 2) . The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium. Model training device 120 may additionally include input and output interfaces to communicate with training database 110, network 160, and/or a user interface (not shown) . The user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing diagnosis results associated with a sample patient description for training.
Consistent with some embodiments, diagnosis model 105 (discussed in detail in connection with FIG. 3) may be a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, or a combination of the two. Diagnosis model 105 may be trained using supervised learning. The architecture of diagnosis model 105 includes a stack of distinct layers that transform the input into the output. As used herein, “training” a learning model refers to determining one or more parameters of at least one layer in the learning model. For example, a convolutional layer of a CNN model may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by, e.g., a backpropagation-based training process.
Consistent with the present disclosure, diagnosis model 105 may include a CNN sub-model to process data using the known diseases and their symptoms as input. Diagnosis model 105 may additionally include an RNN sub-model to process data using the symptoms recognized from the sample patient descriptions as input. Consistent with the present disclosure, the CNN and RNN sub-models are connected and jointly trained. In some embodiments, intermediate outputs from the two sub-models may be jointly optimized. For example, a difference between an output from the CNN max-pooling layer and another output from the RNN hidden layer may be minimized to derive the optimal model parameters.
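As a minimal sketch of this joint optimization, assume each sub-model is reduced to a single scalar weight (the real sub-models are a CNN and an RNN, and in practice this agreement term would be combined with the supervised diagnosis objective rather than optimized alone):

```python
# Hypothetical scalar "sub-models": output = weight * input.
w_rnn, w_cnn = 0.0, 1.0      # initial parameters of the two sub-models
x_rnn, x_cnn = 2.0, 1.0      # fixed inputs from the two feature matrices
lr = 0.1                     # learning rate

for _ in range(100):
    diff = w_rnn * x_rnn - w_cnn * x_cnn      # intermediate-output difference
    # Gradient descent on the squared difference w.r.t. both weights,
    # updating the two sub-models jointly.
    w_rnn -= lr * 2 * diff * x_rnn
    w_cnn -= lr * 2 * diff * (-x_cnn)

final_diff = abs(w_rnn * x_rnn - w_cnn * x_cnn)
```

Updating both weights against the same difference term drives the two intermediate outputs toward agreement, mirroring the minimization of the difference between the CNN max-pooling output and the RNN hidden-layer output.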
Diagnosis device 130 may receive diagnosis model 105 from model training device 120. Diagnosis device 130 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with FIG. 5). The processor may perform instructions of a medical diagnosis process stored in the medium. Diagnosis device 130 may additionally include input and output interfaces to communicate with patient description database 140, network 160, and/or a user interface (not shown). The user interface may be used for selecting a patient description 103 for diagnosis, initiating the diagnosis process, or displaying a diagnosis result 107.
Diagnosis device 130 may communicate with patient description database 140 to receive one or more patient descriptions 103. In some embodiments, the patient descriptions stored in patient description database 140 may be received from patient terminal 150. Diagnosis device 130 may use the trained model received from model training device 120 to predict a  disease or illness of the patient whose symptoms are described by patient description 103, and output diagnosis result 107.
FIG. 2 illustrates a block diagram of an exemplary AI system 200 for training a diagnosis model, according to embodiments of the disclosure. Consistent with the present disclosure, AI system 200 may be an embodiment of model training device 120. In some embodiments, as shown in FIG. 2, AI system 200 may include a communication interface 202, a processor 204, a memory 206, and a storage 208. In some embodiments, AI system 200 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of AI system 200 may be located in a cloud, or may alternatively be in a single location (such as inside a mobile device) or distributed locations. Components of AI system 200 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, AI system 200 may be configured to train diagnosis model 105 based on training data 101; the trained model is then provided to diagnosis device 130 for processing patient descriptions 103.
Communication interface 202 may send data to and receive data from components such as training database 110 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM) , or other communication methods. In some embodiments, communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication  interface 202. In such an implementation, communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 202 may receive training data 101 including sample patient descriptions and their respective known diseases from training database 110. The sample patient descriptions may be received as text or in their original format as acquired by patient terminal 150, such as audio or handwriting. A sample patient description may include one sentence or multiple sentences that describe the symptoms and feelings of a patient. When the patient description is made orally, the description may additionally contain various spoken-language fillers and exclamations, including, e.g., hmm, well, all right, you know, okay, so, etc. For example, patient 530 may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache.” Communication interface 202 may further provide the received data to memory 206 and/or storage 208 for storage or to processor 204 for processing.
Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to training a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to model training.
Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate. Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 206 and/or storage  208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein. For example, memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to train diagnosis model 105.
Memory 206 and/or storage 208 may be further configured to store information and data used by processor 204. For instance, storage 208 may be configured to store a knowledge database 282 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data. In some embodiments, knowledge database 282 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 284 and an entity list 286.
In some embodiments, stop-word list 284 may include stop-words that do not carry substantive meanings for the purpose of medical diagnosis. In some embodiments, stop-word list 284 may include relational words. Linguistically, words may be divided into notional words, which have substantive meanings, and relational words, which merely express a grammatical relationship between notional words. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have an independent meaning and must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 284.
Stop-word list 284 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as pronouns used as the subject, e.g., “I,” “we,” “you,” and “it,” and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have,” “seem,” “look,” “feel,” and “a little bit,” may be included as stop-words. Stop-word list 284 may be periodically updated, e.g., to include additional stop-words used by the patients.
In some embodiments, entity list 286 may include entities associated with known symptoms. The entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses. For example, entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ” “vomit, ” etc. In some embodiments, when an entity contains a phrase, it may be further divided into words and stored separately. For example, “joint pain” may be further divided into two words “joint” and “pain. ” In some embodiments, entity list 286 may be periodically updated, e.g., to include entities describing new symptoms.
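By way of illustration only (this sketch is not part of the disclosed embodiments; the entity names and function names are hypothetical), an entity list storing multi-word symptom entities both whole and split into component words might look like the following:

```python
# Hypothetical sketch of entity list 286: multi-word symptom entities
# are stored whole and also divided into their component words.
def build_entity_list(entities):
    """Return a dict mapping each entity to its component words."""
    entity_list = {}
    for entity in entities:
        words = entity.split()          # "joint pain" -> ["joint", "pain"]
        entity_list[entity] = words
    return entity_list

entities = ["fever", "headache", "joint pain", "running nose"]
entity_list = build_entity_list(entities)
# entity_list["joint pain"] == ["joint", "pain"]
```

Storing the component words separately allows partial matches (e.g., “pain” alone) to be related back to the full entity during symptom recognition.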
Consistent with the present disclosure, knowledge database 282 may further include symptom vectors for each symptom entity in entity list 286. The symptom vectors may be word vectors of the entity word. In some embodiments, the word vectors are determined using word embedding, which maps the words to vectors of real numbers. In some embodiments, the word vectors may be of several hundred dimensions.
In some embodiments, memory 206 and/or storage 208 may also store intermediate data such as the word segments in sample patient descriptions, feature maps output by layers of the learning model, optimization loss functions, etc. Memory 206 and/or storage 208 may additionally store various learning models, including their model parameters, such as the CNN model and the RNN model that will be described below. The various types of data may be stored permanently, removed periodically, or discarded immediately after the data is processed.
As shown in FIG. 2, processor 204 may include multiple modules, such as an RNN processing unit 242, a CNN processing unit 244, a joint optimization unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components, or software units implemented by processor 204 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions. Although FIG. 2 shows units 242-246 all within one processor 204, it is contemplated that these units may be distributed among different processors located closely to or remotely from each other.
Units 242-246 are configured to train a diagnosis model using training data 101. FIG. 3 illustrates a schematic diagram of an exemplary diagnosis model 300, according to embodiments of the disclosure. Consistent with the present disclosure, diagnosis model 300 may include a plurality of sub-models, such as a CNN 310 and an RNN 320. Both CNN 310 and RNN 320 may include multiple layers. For example, CNN 310 may include one or more convolution layers or fully-convolutional layers, non-linear operator layers, pooling or subsampling layers, fully connected layers, and/or final loss layers. RNN 320 may include input layers, hidden layers, and output layers. Each layer may connect one upstream layer and one downstream layer.
Consistent with the present disclosure, data processing is performed to determine symptom features associated with known diseases. These symptom features are used as input to CNN 310, which predicts the disease based on the symptom features. On the other hand, data processing is performed to determine symptom features associated with sample patient descriptions. The symptom features are used as input to RNN 320. Consistent with the present disclosure, CNN 310 and RNN 320 are internally connected at given layers and jointly trained. For example, the max-pooling layer of CNN 310 and the hidden layer of RNN 320 are connected, as they both provide feature maps describing the symptoms.
For a set of training data 101, RNN 320 should produce the same feature map output based on the sample patient description as that produced by CNN 310 based on the known disease corresponding to the sample patient description. Accordingly, these outputs from the two sub-models may be jointly optimized to derive the optimal model parameters, e.g., by minimizing a difference between the outputs of the max-pooling layer and the hidden layer. For example, the sample patient description may be “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired” and the known diagnosis is “flu.” Accordingly, CNN 310 may look up symptoms of “flu” such as “fever,” “running nose,” “sore throat,” “fatigue,” etc. and derive the symptom features. On the other hand, RNN 320 may also recognize the same symptoms from the sample patient description and derive the symptom features accordingly. The feature maps output by the max-pooling layer of CNN 310 and the hidden layer of RNN 320, therefore, should have a minimum difference. By jointly optimizing the outputs of the sub-models, optimal model parameters for CNN 310 and RNN 320 may be obtained, and diagnosis model 300 is trained.
In some embodiments, units 242-246 of FIG. 2 may execute computer instructions to perform the training. For example, FIG. 4 illustrates a flowchart of an exemplary method 400 for training diagnosis model 300, according to embodiments of the disclosure. Method 400 may include steps S402-S426 as described below. In some embodiments, communication interface 202 may perform step S402, RNN processing unit 242 may perform steps S404-S414, CNN processing unit 244 may perform steps S416-S422, and joint optimization unit 246 may perform steps S424-S426. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 4. For example, RNN processing unit 242 may perform steps S404-S414 while CNN processing unit 244 is performing steps S416-S422.
In step S402, communication interface 202 may receive training data 101 including sample patient descriptions and known diseases corresponding to the sample patient descriptions. In some embodiments, a large amount of training data may be received to train the learning model. Each sample patient description may include one or more sentences that describe the symptoms of a patient. For example, the patient may describe her symptom as “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired.”
In step S404, RNN processing unit 242 segments each sample patient description into multiple word segments. A word segment is the smallest unit in a sentence that has semantic meaning. A word segment may be a word or a combination of two or more words. If the patient description includes multiple sentences, it may be segmented into different sentences first. In some embodiments, each sample patient description may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description is segmented into a plurality of word segments. The exemplary description above can be segmented as follows:
I //feel //really //warm //in //my //entire body //My //nose //is running //
and //by the way //throat //is //pretty //sore //too //oh //yes //I //am //just //so //tired
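The disclosure uses a trained sentence segmentation model for this step; as a simplified, purely illustrative stand-in (the phrase lexicon and function names below are hypothetical), a greedy longest-match segmenter over a small phrase list produces segmentation of the same shape:

```python
# Illustrative stand-in for the trained segmentation model: a greedy
# longest-match segmenter over a small, hypothetical phrase lexicon.
PHRASES = {"entire body", "is running", "by the way", "last night", "woke up"}

def segment(sentence):
    tokens = sentence.lower().replace(",", "").replace(".", "").split()
    segments, i = [], 0
    while i < len(tokens):
        for n in (3, 2, 1):             # prefer the longest lexicon match
            cand = " ".join(tokens[i:i + n])
            if n == 1 or cand in PHRASES:
                segments.append(cand)
                i += n
                break
    return segments

segs = segment("My nose is running, and by the way, throat is pretty sore too")
# -> ['my', 'nose', 'is running', 'and', 'by the way',
#     'throat', 'is', 'pretty', 'sore', 'too']
```

A production segmenter would be learned from sample sentences and known word segments, as described above, rather than relying on a fixed lexicon.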
In step S406, RNN processing unit 242 may filter the word segments in the sample patient description using stop-word list 284. In some embodiments, RNN processing unit 242 may search for each stop-word on stop-word list 284, and if the stop-word is found in the sample patient description, the corresponding word segment will be removed. In other words, stop-word list 284 is used to filter the sample patient description to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as “I,” “feel,” “really,” “in,” “my,” “by the way,” “too,” “oh,” “yes,” “just,” “so,” etc. may be identified and removed as stop-words. Removing these stop-words “cleans up” the sample patient description and conditions it for the later symptom recognition processes. For example, in the sample patient description above – “I feel really warm in my entire body, my nose is running, and by the way, throat is pretty sore too, oh yes, I am just so tired” – the word segments/spans remaining after the filtering may be “warm ... entire body,” “nose is running,” “throat ... sore,” and “tired.”
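The filtering of step S406 may be sketched as follows (the stop-word set shown is illustrative only and is not the actual contents of stop-word list 284):

```python
# Sketch of step S406: remove word segments that appear on a stop-word
# list (set contents are illustrative, not the actual stop-word list 284).
STOP_WORDS = {"i", "feel", "really", "in", "my", "is", "and", "by the way",
              "too", "oh", "yes", "am", "just", "so", "pretty"}

def filter_stop_words(segments):
    return [s for s in segments if s.lower() not in STOP_WORDS]

remaining = filter_stop_words(
    ["I", "feel", "really", "warm", "in", "my", "entire body"])
# -> ['warm', 'entire body']
```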
In step S408, RNN processing unit 242 may match the remaining word segments with the symptom entities. The medical symptoms may be recognized using various methods. In some embodiments, a span-searching method may be applied to find the entity with the highest matching value for each span between two word segments. For example, the symptom entities on entity list 286 may be searched and matched to the word segments. In some embodiments, an end-to-end learning network may be used to match symptom entities to the word segments. For example, the remaining word segments/spans may be matched with symptom entities “fever,” “running nose,” “sore throat,” and “fatigue.”
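A simplified stand-in for the span-searching method of step S408 (the keyword sets below are hypothetical; a real implementation would score spans against entity list 286) matches segments to entities by word overlap:

```python
# Sketch of step S408: match remaining word segments against symptom
# entities by word overlap (a stand-in for the span-searching method;
# the keyword sets are illustrative).
ENTITIES = {"fever": {"fever", "warm"},
            "running nose": {"running", "nose"},
            "sore throat": {"sore", "throat"},
            "fatigue": {"tired", "fatigue"}}

def match_entities(segments):
    words = set()
    for seg in segments:
        words.update(seg.lower().split())
    matched = []
    for entity, keywords in ENTITIES.items():
        if words & keywords:            # any overlap counts as a match
            matched.append(entity)
    return matched

matched = match_entities(["warm", "entire body", "nose is running",
                          "throat", "sore", "tired"])
# matched contains "fever", "running nose", "sore throat", and "fatigue"
```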
In step S410, RNN processing unit 242 may construct a first symptom feature matrix M1 for the symptom entities. In some embodiments, RNN processing unit 242 may retrieve the symptom vectors corresponding to the symptom entities from knowledge database 282. For example, the symptom vectors may be pre-stored in the database for entities on entity list 286. In some embodiments, RNN processing unit 242 may also determine the symptom vectors using symptom embedding on the fly. The symptom feature matrix may be constructed using the symptom vectors. For example, if n symptoms are detected, and each symptom vector is d-dimensional, the symptom feature matrix may be n × d in size.
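The n × d construction can be sketched as follows (the vectors are toy values with d = 4; real symptom vectors would have several hundred dimensions, as noted elsewhere in the disclosure):

```python
# Sketch of step S410: stack one d-dimensional symptom vector per
# detected symptom into an n x d feature matrix (toy vectors, d = 4).
SYMPTOM_VECTORS = {
    "fever":        [0.9, 0.1, 0.0, 0.2],
    "running nose": [0.1, 0.8, 0.1, 0.0],
    "sore throat":  [0.0, 0.2, 0.9, 0.1],
}

def build_feature_matrix(symptoms):
    return [SYMPTOM_VECTORS[s] for s in symptoms]

M1 = build_feature_matrix(["fever", "running nose", "sore throat"])
# len(M1) == 3 rows (n symptoms), each row of dimension 4 (d)
```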
In step S412, RNN processing unit 242 may input the first symptom feature matrix M1 to RNN 320. In step S414, a first output may be obtained from the hidden layer of RNN 320. For example, the first output may be a  feature map.
In step S416, CNN processing unit 244 may determine symptoms associated with the known disease. In some embodiments, CNN processing unit 244 may query knowledge database 282 to map the disease to a list of symptoms. The diseases and their associated symptoms may be embedded in a dynamic trans-matrix (referred to as TransD). For example, if the patient is known to have a flu, the associated symptoms may include “fever,” “running nose,” “sore throat,” and “fatigue.” In step S418, CNN processing unit 244 may construct a second symptom feature matrix M2, similar to step S410.
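As a minimal illustration of the disease-to-symptom lookup in step S416 (a plain dictionary stands in for both knowledge database 282 and the TransD embedding; the mapping shown is illustrative):

```python
# Sketch of step S416: look up the symptoms associated with a known
# disease. A plain dict stands in for the TransD embedding described
# in the disclosure; the entries are illustrative.
DISEASE_SYMPTOMS = {
    "flu":  ["fever", "running nose", "sore throat", "fatigue"],
    "cold": ["running nose", "sore throat"],
}

def symptoms_for(disease):
    return DISEASE_SYMPTOMS.get(disease, [])

flu_symptoms = symptoms_for("flu")
# -> ['fever', 'running nose', 'sore throat', 'fatigue']
```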
In step S420, CNN processing unit 244 may input the second symptom feature matrix M2 to CNN 310. In step S422, a second output may be obtained from the max-pooling layer of CNN 310. For example, the second output may also be a feature map.
In step S424, joint optimization unit 246 may minimize a difference between the first output obtained in step S414 and the second output obtained in step S422. In some embodiments, the difference may be a mean square loss (i.e., norm-2 difference) between the two outputs. It is contemplated that the difference may be formulated differently, such as a norm-1 difference, a square root of a norm-2 difference, etc. Any suitable method may be used to solve the optimization problem, such as various iterative methods. By jointly training CNN 310 and RNN 320, the data processing of the two sub-models is connected, and medical diagnosis model 300 is optimized as a whole. In step S426, the optimized diagnosis model is provided to a diagnosis device. For example, as shown in FIG. 1, model training device 120 provides diagnosis model 105 to diagnosis device 130.
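The mean square loss of step S424 can be written out directly; the sketch below operates on flat feature vectors for simplicity (real feature maps would be tensors, and the minimization would be carried out iteratively by an optimizer):

```python
# Sketch of step S424: the joint loss is the mean square (norm-2)
# difference between the RNN hidden-layer output and the CNN
# max-pooling output, shown here over flat feature vectors.
def mean_square_loss(first_output, second_output):
    assert len(first_output) == len(second_output)
    return sum((a - b) ** 2
               for a, b in zip(first_output, second_output)) / len(first_output)

rnn_features = [0.5, 0.8, 0.1]   # illustrative first output (step S414)
cnn_features = [0.4, 0.8, 0.3]   # illustrative second output (step S422)
loss = mean_square_loss(rnn_features, cnn_features)
# loss == (0.1**2 + 0.0**2 + 0.2**2) / 3, i.e., about 0.0167
```

Identical feature maps would give a loss of zero, which is what the joint training drives the two sub-models toward.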
FIG. 5 illustrates a block diagram of an exemplary AI system 500 for making a medical diagnosis based on a diagnosis model, according to embodiments of the disclosure. Consistent with the present disclosure, AI system 500 may be an embodiment of diagnosis device 130. In some embodiments, as shown in FIG. 5, AI system 500 may include a communication interface 502, a processor 504, a memory 506, and a storage 508. In some embodiments, AI system 500 may have hardware components and configurations similar to those of AI system 200. Consistent with the present disclosure, AI system 500 may be configured to make a medical diagnosis from patient descriptions 103 based on diagnosis model 105 provided by model training device 120.
Communication interface 502 may be configured similarly as communication interface 202. In some embodiments, communication interface 502 may send data to and receive data from components such as model training device 120, patient description database 140, and display 550. For example, communication interface 502 may receive diagnosis model 105 from model training device 120, and patient descriptions 103 from patient description database 140. The patient descriptions 103 may be provided by patients 530 through patient terminal 150. Communication interface 502 may send diagnosis result 107 to display 550 to be displayed to patients 530.
Processor 504 may include hardware components similar to those in processor 204. Processor 504 may be configured as a separate processor module dedicated to making medical diagnosis using a learning model. Alternatively, processor 504 may be configured as a shared processor module for performing other functions in addition to medical diagnosis. Memory 506 and storage 508 may be similar to memory 206 and storage 208. For example, memory 506 and/or storage 508 may be configured to store program (s) that may be executed by processor 504 to make medical diagnosis using diagnosis model 105. Storage 508 may also store knowledge database 282, which may include stop-word list 284, entity list 286, and various mappings such as between symptoms and their corresponding symptom vectors.
In some embodiments, memory 506 and/or storage 508 may store intermediate data such as the word segments in patient descriptions 103,  feature maps output by layers of diagnosis model 105, etc. Memory 506 and/or storage 508 may additionally store various learning models including their model parameters, such as a sentence segmentation model, a symptom recognition model, and diagnosis model 105, etc.
As shown in FIG. 5, processor 504 may include multiple modules, such as a symptom recognition unit 542, a diagnosis unit 544, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 504 designed for use with other components, or software units implemented by processor 504 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 504, it may perform one or more functions. Although FIG. 5 shows units 542-544 all within one processor 504, it is contemplated that these units may be distributed among different processors located closely to or remotely from each other.
In some embodiments, units 542-544 may execute computer instructions to perform the diagnosis. For example, FIG. 6 illustrates a flowchart of an exemplary method 600 for making a medical diagnosis based on a diagnosis model, according to embodiments of the disclosure. Method 600 may be implemented by AI system 500, and particularly by processor 504 or a separate processor not shown in FIG. 5. Method 600 may include steps S602-S618 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 6.
In step S602, communication interface 502 may receive a patient description, e.g., patient description 103. Patient description 103 may be received as text or in its original format as acquired by patient terminal 150, such as audio or handwriting. If received as audio, patient description 103 may be transcribed into text. If received in handwriting, patient description 103 may be automatically recognized and converted into text. Patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 530. For example, the patient may describe her symptom as “I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too.”
In step S604, symptom recognition unit 542 may segment the patient description into word segments. The segmentation may be performed as described for step S404 of method 400. For example, the above exemplary description may be divided into three sentences: “I had a headache all night last night.” “So I woke up feeling very dizzy, you know.” and “And by the way my nose seems running too.” Symptom recognition unit 542 may further segment each of the sentences into word segments, such as:
I //had //a headache //all night //last night.
So //I //woke up //feeling //very //dizzy //you know.
And //by the way //my //nose //seems //running //too.
In step S606, symptom recognition unit 542 may filter the word segments with stop-word list 284, similar to step S406 of method 400. For example, in the description above, word segments such as “I, ” “had, ” “all night, ” “last night, ” “so, ” “woke up, ” “you know, ” “by the way, ” etc. may be identified and removed as stop-words.
In step S608, symptom recognition unit 542 may match the remaining word segments with symptom entities on entity list 286, similar to step S408 of method 400. For example, the remaining word segments in the above exemplary description may be matched to symptom entities such as “headache, ” “dizzy, ” and “running nose. ”
In step S610, diagnosis unit 544 may determine a symptom vector for each symptom/entity matched to patient description 103 in step S608. In some embodiments, the symptom vectors are determined using symptom embedding, which maps the entities to vectors of real numbers. In some embodiments, the word vectors may be of several hundred dimensions. For example, for symptoms s1, s2, and s3, symptom embedding may generate symptom vectors v1, v2, and v3, respectively. In some embodiments, symptom embedding may be implemented as a Continuous Bag of Words (CBOW) learning model or a GloVe learning model, etc. In some embodiments, the symptom vectors for entities on entity list 286 may be pre-generated and stored in knowledge database 282. Accordingly, in step S610, diagnosis unit 544 may request and receive the respective symptom vectors from knowledge database 282.
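The pre-generated-vector path of step S610 can be sketched as a simple lookup (a dict stands in for knowledge database 282; the toy vectors use d = 3 rather than several hundred dimensions):

```python
# Sketch of step S610: retrieve a pre-generated symptom vector for each
# matched entity. A dict stands in for knowledge database 282; the
# vectors are illustrative toy values with d = 3.
KNOWLEDGE_DB = {
    "headache":     [0.7, 0.1, 0.3],
    "dizzy":        [0.2, 0.9, 0.1],
    "running nose": [0.1, 0.2, 0.8],
}

def symptom_vectors(entities):
    return {e: KNOWLEDGE_DB[e] for e in entities if e in KNOWLEDGE_DB}

vectors = symptom_vectors(["headache", "dizzy", "running nose"])
# maps symptoms s1, s2, s3 to vectors v1, v2, v3 as in the example above
```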
In step S612, diagnosis unit 544 may input the symptom vectors to the trained RNN, e.g., RNN 320. In some embodiments, diagnosis unit 544 may construct a feature matrix using the symptom vectors. In step S614, diagnosis unit 544 may obtain a feature map from a hidden layer of the RNN.
In step S616, diagnosis unit 544 may input the feature map to a classifier layer of the trained CNN, e.g., CNN 310. For example, the classifier layer may be implemented using a feedforward neural network with a softmax layer. Based on the symptom features, the softmax layer may be configured to determine the probabilities of possible diseases that the patient may have. For example, if the matched symptoms are “fever,” “nausea,” “muscle pain,” and “fatigue,” the softmax layer in CNN 310 may predict that patient 530 has a 50% possibility of having a flu caused by viruses, a 30% possibility of having a regular cold, and a 20% possibility of having another disease, such as yellow fever.
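The softmax operation itself can be sketched as follows (the raw classifier scores are illustrative; in CNN 310 they would be produced by the classifier layer from the feature map):

```python
# Sketch of step S616: a softmax over the classifier layer's raw scores
# turns them into disease probabilities (scores are illustrative).
import math

def softmax(scores):
    exps = {d: math.exp(s) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: v / total for d, v in exps.items()}

probs = softmax({"flu": 2.0, "cold": 1.5, "yellow fever": 1.1})
best = max(probs, key=probs.get)
# best == "flu"; the probabilities sum to 1
```

The disease with the highest probability may then be reported as diagnosis result 107, optionally together with the full probability distribution.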
In step S618, communication interface 502 may provide the diagnosis result output by CNN 310. For example, as shown by FIG. 5, diagnosis result 107 may be provided to patient 530 through display 550. Display 550 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. The display may include a number of different  types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user. For example, the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass TM, or substantially pliable, such as Willow Glass TM. In some embodiments, display 550 may be part of patient terminal 150.
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (24)

  1. An artificial intelligence system for training a learning model for medical diagnosis, comprising:
    a storage device configured to store a sample patient description of a patient and a known disease of the patient;
    a processor configured to train the learning model, wherein the learning model includes a first sub-model and a second sub-model, wherein the processor is configured to:
    determine a first symptom feature matrix based on the sample patient description;
    input the first symptom feature matrix to the first sub-model;
    determine a second symptom feature matrix based on the known disease;
    input the second symptom feature matrix to the second sub-model; and
    jointly optimize the first sub-model and the second sub-model; and
    a communication interface configured to provide the learning model for automatically diagnosing a disease from a patient description.
  2. The artificial intelligence system of claim 1, wherein to determine the first symptom feature matrix, the processor is further configured to:
    recognize at least one symptom from the sample patient description; and
    determine a symptom vector for each recognized symptom.
  3. The artificial intelligence system of claim 2, wherein to recognize the at least one symptom, the processor is further configured to:
    segment the sample patient description into word segments;
    filter the word segments with a stop-word list; and
    match the remaining word segments with the at least one symptom.
  4. The artificial intelligence system of claim 1, wherein to determine the second symptom feature matrix, the processor is further configured to:
    determine a plurality of symptoms associated with the known disease; and
    determine a symptom vector for each symptom.
  5. The artificial intelligence system of claim 1, wherein the first sub-model is a recurrent neural network (RNN) and the second sub-model is a convolutional neural network (CNN) .
  6. The artificial intelligence system of claim 5, wherein to jointly optimize the first and second sub-models, the processor is further configured to:
    obtain a first output from the RNN;
    obtain a second output from the CNN; and
    optimize a loss indicative of a difference between the first output and the second output.
  7. The artificial intelligence system of claim 6, wherein the first output is obtained from a hidden layer of the RNN, and the second output is obtained from a max-pooling layer of the CNN.
  8. The artificial intelligence system of claim 6, wherein the loss is a mean variance of the difference between the first output and the second output.
  9. An artificial intelligence system for making medical diagnosis based on a learning model, comprising:
    a patient interaction interface configured to receive a patient description of a patient;
    a storage device configured to store the learning model, wherein the learning model includes a first sub-model and a second sub-model that are  jointly trained; and
    a processor configured to:
    determine a symptom feature matrix based on the patient description;
    obtain a feature map by applying the first sub-model to the symptom feature matrix; and
    identify a disease for the patient by applying one or more layers of the second sub-model on the feature map.
  10. The artificial intelligence system of claim 9, wherein to determine the symptom feature matrix, the processor is further configured to:
    recognize at least one symptom from the patient description; and
    determine a symptom vector for each recognized symptom.
  11. The artificial intelligence system of claim 10, wherein to recognize the at least one symptom, the processor is further configured to:
    segment the patient description into word segments;
    filter the word segments with a stop-word list; and
    match the remaining word segments with the at least one symptom.
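The three steps of claim 11 (segment, filter with a stop-word list, match against known symptoms) can be sketched as follows. The stop-word list and symptom lexicon here are illustrative placeholders, and real segmentation, especially for Chinese text, would use a proper tokenizer rather than whitespace splitting:

```python
STOP_WORDS = {"i", "have", "a", "and", "my", "the", "been"}    # illustrative
SYMPTOM_LEXICON = {"headache", "fever", "cough", "nausea"}     # illustrative

def recognize_symptoms(patient_description):
    """Segment the description, filter the segments with a stop-word list,
    then match the remaining segments against known symptoms."""
    segments = patient_description.lower().replace(",", " ").split()
    remaining = [w for w in segments if w not in STOP_WORDS]
    return [w for w in remaining if w in SYMPTOM_LEXICON]

symptoms = recognize_symptoms("I have a headache and my fever has been high")
# symptoms -> ["headache", "fever"]
```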
  12. The artificial intelligence system of claim 9, wherein the first sub-model is a recurrent neural network (RNN) and the second sub-model is a convolutional neural network (CNN).
  13. The artificial intelligence system of claim 12, wherein the feature map is obtained from a hidden layer of the RNN and inputted to a softmax layer of the CNN.
  14. The artificial intelligence system of claim 13, wherein the softmax layer is configured to:
    calculate probabilities of the patient having a plurality of potential diseases based on the feature map; and
    identify the potential disease associated with the highest probability as the disease for the patient.
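The softmax step of claim 14 (probabilities over candidate diseases, then selection of the highest-probability one) can be sketched with NumPy. The disease names and scores below are invented for illustration:

```python
import numpy as np

def identify_disease(scores, disease_names):
    """Softmax over per-disease scores, then pick the highest-probability disease."""
    exp = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    probs = exp / exp.sum()
    return disease_names[int(np.argmax(probs))], probs

disease, probs = identify_disease(
    np.array([1.2, 3.4, 0.5]),
    ["common cold", "influenza", "allergic rhinitis"],  # hypothetical candidates
)
# disease -> "influenza"
```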
  15. An artificial intelligence method for training a learning model for medical diagnosis, the learning model including a first sub-model and a second sub-model, the artificial intelligence method comprising:
    receiving, from a storage device, a sample patient description of a patient and a known disease of the patient;
    determining, by a processor, a first symptom feature matrix based on the sample patient description;
    inputting, by the processor, the first symptom feature matrix to the first sub-model;
    determining, by the processor, a second symptom feature matrix based on the known disease;
    inputting, by the processor, the second symptom feature matrix to the second sub-model;
    jointly optimizing, by the processor, the first sub-model and the second sub-model; and
    providing, by a communication interface, the learning model for automatically diagnosing a disease from a patient description.
  16. The artificial intelligence method of claim 15, wherein determining the first symptom feature matrix further comprises:
    segmenting the sample patient description into word segments;
    filtering the word segments with a stop-word list;
    matching the remaining word segments with at least one symptom; and
    determining a symptom vector for each recognized symptom.
  17. The artificial intelligence method of claim 15, wherein determining the second symptom feature matrix further comprises:
    determining a plurality of symptoms associated with the known disease; and
    determining a symptom vector for each symptom.
  18. The artificial intelligence method of claim 15, wherein the first sub-model is a recurrent neural network (RNN) and the second sub-model is a convolutional neural network (CNN), and wherein jointly optimizing the first and second sub-models further comprises:
    obtaining a first output from a hidden layer of the RNN;
    obtaining a second output from a max-pooling layer of the CNN; and
    optimizing a loss indicative of a difference between the first output and the second output.
  19. An artificial intelligence method for making medical diagnosis based on a learning model, wherein the learning model includes a first sub-model and a second sub-model that are jointly trained, the artificial intelligence method comprising:
    receiving, through a patient interaction interface, a patient description of a patient;
    determining, by a processor, a symptom feature matrix based on the patient description;
    obtaining, by the processor, a feature map by applying the first sub-model to the symptom feature matrix; and
    identifying, by the processor, a disease for the patient by applying one or more layers of the second sub-model on the feature map.
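Claim 19's two-stage inference (the first sub-model produces a feature map; one or more layers of the second sub-model then identify the disease) reduces to a short pipeline once the trained sub-models are available. The callables below are placeholders standing in for the trained RNN and the CNN's final layers, which the claims do not specify in code:

```python
import numpy as np

def diagnose(symptom_matrix, apply_rnn, apply_cnn_head):
    """Apply the first sub-model to get a feature map, then the second
    sub-model's remaining layers to get per-disease probabilities."""
    feature_map = apply_rnn(symptom_matrix)        # first sub-model (e.g. RNN)
    probabilities = apply_cnn_head(feature_map)    # e.g. the CNN softmax layer
    return int(np.argmax(probabilities))           # index of the predicted disease

# Identity placeholders: the "feature map" is passed straight through,
# so the highest input score wins.
predicted = diagnose(np.array([0.1, 0.9]), lambda m: m, lambda f: f)  # -> 1
```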
  20. The artificial intelligence method of claim 19, wherein determining the symptom feature matrix further comprises:
    segmenting the patient description into word segments;
    filtering the word segments with a stop-word list;
    matching the remaining word segments with at least one symptom; and
    determining a symptom vector for each recognized symptom.
  21. The artificial intelligence method of claim 19, wherein the first sub-model is a recurrent neural network (RNN) and the second sub-model is a convolutional neural network (CNN), and wherein the feature map is obtained from a hidden layer of the RNN and inputted to a softmax layer of the CNN.
  22. The artificial intelligence method of claim 19, wherein identifying the disease further comprises:
    calculating probabilities of the patient having a plurality of potential diseases based on the feature map; and
    identifying the potential disease associated with the highest probability as the disease for the patient.
  23. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for training a learning model for medical diagnosis, the learning model including a first sub-model and a second sub-model, the artificial intelligence method comprising:
    determining a first symptom feature matrix based on a sample patient description of a patient;
    inputting the first symptom feature matrix to the first sub-model;
    determining a second symptom feature matrix based on a known disease of the patient;
    inputting the second symptom feature matrix to the second sub-model;
    jointly optimizing the first sub-model and the second sub-model; and
    providing the learning model for automatically diagnosing a disease from a patient description.
  24. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for making medical diagnosis based on a learning model, wherein the learning model includes a first sub-model and a second sub-model that are jointly trained, the artificial intelligence method comprising:
    determining a symptom feature matrix based on a patient description of a patient;
    obtaining a feature map by applying the first sub-model to the symptom feature matrix; and
    identifying a disease for the patient by applying one or more layers of the second sub-model on the feature map.
PCT/CN2019/097538 (priority date 2019-07-24; filed 2019-07-24): Artificial intelligence system for medical diagnosis based on machine learning, WO2021012225A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/097538 WO2021012225A1 (en) 2019-07-24 2019-07-24 Artificial intelligence system for medical diagnosis based on machine learning


Publications (1)

Publication Number Publication Date
WO2021012225A1 (en) 2021-01-28

Family

ID=74193063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097538 WO2021012225A1 (en) 2019-07-24 2019-07-24 Artificial intelligence system for medical diagnosis based on machine learning

Country Status (1)

Country Link
WO (1) WO2021012225A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9767557B1 (en) * 2016-06-23 2017-09-19 Siemens Healthcare Gmbh Method and system for vascular disease detection using recurrent neural networks
CN108198620A (en) * 2018-01-12 2018-06-22 洛阳飞来石软件开发有限公司 Intelligent auxiliary diagnosis system for skin diseases based on deep learning
CN108399619A (en) * 2017-12-22 2018-08-14 联想(北京)有限公司 System and device for medical diagnosis
CN108717869A (en) * 2018-05-03 2018-10-30 中国石油大学(华东) Auxiliary diagnosis system for diabetic retinopathy based on convolutional neural networks
CN109754852A (en) * 2019-01-08 2019-05-14 中南大学 Cardiovascular disease risk prediction method based on electronic health records


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114512236A (en) * 2022-04-18 2022-05-17 山东师范大学 Intelligent auxiliary diagnosis system for Alzheimer's disease
CN117194604A (en) * 2023-11-06 2023-12-08 临沂大学 Intelligent medical patient inquiry corpus construction method
CN117194604B (en) * 2023-11-06 2024-01-30 临沂大学 Intelligent medical patient inquiry corpus construction method

Similar Documents

Publication Publication Date Title
US11580415B2 (en) Hierarchical multi-task term embedding learning for synonym prediction
US11853903B2 (en) SGCNN: structural graph convolutional neural network
WO2021233112A1 (en) Multimodal machine learning-based translation method, device, equipment, and storage medium
WO2022007823A1 (en) Text data processing method and device
US20240013055A1 (en) Adversarial pretraining of machine learning models
CN106407333B (en) Spoken language query identification method and device based on artificial intelligence
CA3061432A1 (en) Identifying entities in electronic medical records
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
KR102424085B1 (en) Machine-assisted conversation system and medical condition inquiry device and method
JP7143456B2 (en) Medical Fact Verification Method and Verification Device, Electronic Device, Computer Readable Storage Medium, and Computer Program
US11244755B1 (en) Automatic generation of medical imaging reports based on fine grained finding labels
CN110287337A System and method for obtaining medical synonyms based on deep learning and knowledge graphs
Mukherjee et al. Deep learning for spoken language identification: Can we visualize speech signal patterns?
CN115223678A (en) X-ray chest radiography diagnosis report generation method based on multi-task multi-mode deep learning
WO2021012225A1 (en) Artificial intelligence system for medical diagnosis based on machine learning
WO2023173823A1 (en) Method for predicting interaction relationship of drug pair, and device and medium
US20220375605A1 (en) Methods of automatically generating formatted annotations of doctor-patient conversations
WO2020113544A1 (en) Artificial intelligence medical symptom recognition system based on end-to-end learning
US20210312173A1 (en) Method, apparatus and device for recognizing bill and storage medium
WO2021129411A1 (en) Text processing method and device
US20220108070A1 (en) Extracting Fine Grain Labels from Medical Imaging Reports
JP2022158736A (en) Learning device, learning method, and learning program
Lu et al. A deep learning-based text classification of adverse nursing events
CN116757195A (en) Implicit emotion recognition method based on prompt learning
Gao et al. Accuracy analysis of triage recommendation based on CNN, RNN and RCNN models

Legal Events

Code 121 (EP): the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19938989; country of ref document: EP; kind code of ref document: A1.

Code NENP: non-entry into the national phase. Ref country code: DE.

Code 122 (EP): PCT application non-entry in European phase. Ref document number: 19938989; country of ref document: EP; kind code of ref document: A1.