WO2021012222A1 - Artificial intelligence system for processing patient descriptions - Google Patents

Artificial intelligence system for processing patient descriptions

Info

Publication number
WO2021012222A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
stop
patient
artificial intelligence
descriptions
Prior art date
Application number
PCT/CN2019/097534
Other languages
French (fr)
Inventor
Mingyang Sun
Xiaoqing Yang
Zang Li
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/097534
Publication of WO2021012222A1

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 - ICT specially adapted for the handling or processing of medical references
    • G16H70/60 - ICT specially adapted for the handling or processing of medical references relating to pathologies
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Definitions

  • the present disclosure relates to artificial intelligence (AI) systems and methods for processing patient descriptions, and more particularly to, AI systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
  • Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right physicians. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or a flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
  • Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses.
  • hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent.
  • having practitioners perform the pre-diagnosis wastes valuable resources.
  • Automated pre-diagnosis methods are used to improve efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on the patient’s described symptoms, e.g., based on preprogrammed mappings between diseases and known symptoms.
  • Patient descriptions are, however, often inaccurate or unclear.
  • the patient may be under the influence of the illness or medication and may be unable to express herself accurately.
  • the patient may use informal language that contains stop-words that do not convey substantive meanings, such as modal words and exclamation words.
  • Existing automated methods rely on a predetermined stop-word list to filter out the stop-words from the patient description, before using the description to recognize medical symptoms.
  • the existing stop-word lists are incomplete and need to be supplemented as new vocabulary appears in patient descriptions.
  • Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
  • Embodiments of the disclosure provide an artificial intelligence system for constructing a stop-word list used for processing patient descriptions.
  • An exemplary artificial intelligence system includes a patient interaction interface configured to receive sample patient descriptions.
  • the artificial intelligence system further includes a storage device configured to store a plurality of entities corresponding to known medical symptoms.
  • the artificial intelligence system also includes a processor.
  • the processor is configured to identify a candidate stop-word from the sample patient descriptions.
  • the processor is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the processor is also configured to determine similarity values between the candidate stop-word and the matching entities.
  • the processor is additionally configured to add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  • Embodiments of the disclosure also provide an artificial intelligence method for constructing a stop-word list used for processing patient descriptions.
  • the artificial intelligence method includes receiving, by a patient interaction interface, sample patient descriptions.
  • the method further includes identifying, by a processor, a candidate stop-word from the sample patient descriptions.
  • the method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the method also includes determining, by the processor, similarity values between the candidate stop-word and the matching entities.
  • the method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions.
  • the artificial intelligence method includes identifying a candidate stop-word from sample patient descriptions.
  • the method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the method also includes determining similarity values between the candidate stop-word and the matching entities.
  • the method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
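The three claim sets above recite the same construction pipeline. It can be sketched end-to-end as a short program; this is a minimal illustration only, assuming a naive whitespace tokenizer, and the toy data and names such as `build_stop_word_list` are hypothetical, not from the disclosure:

```python
from collections import Counter

def similarity(entity: str, candidate: str) -> float:
    """Word-overlap ratio S = |w1 ∩ w2| / |w2|, where w1 is the matching
    entity and w2 is the candidate stop-word (per the disclosure)."""
    w1, w2 = set(entity.split()), set(candidate.split())
    return len(w1 & w2) / len(w2)

def build_stop_word_list(descriptions, entities, top_n=3, threshold=0.5):
    # 1. Segment each description into word segments (naive whitespace
    #    split here; the disclosure uses a trained segmentation model).
    segments = [w for d in descriptions for w in d.lower().replace(",", "").split()]
    # 2. Remove segments that are already known symptom entities.
    segments = [w for w in segments if w not in entities]
    # 3. Rank the rest by term frequency and keep the top-N as candidates.
    candidates = [w for w, _ in Counter(segments).most_common(top_n)]
    # 4. A candidate is added only if even its best-matching entity
    #    falls below the similarity threshold.
    return [c for c in candidates
            if max((similarity(e, c) for e in entities), default=0.0) < threshold]

descriptions = [
    "okay I am having a pain in the head you know",
    "well I have a headache you know",
]
entities = {"headache", "pain", "fever"}
print(build_stop_word_list(descriptions, entities))  # → ['i', 'a', 'you']
```

Here the frequent filler words survive the entity filter but match no entity, so they become stop-words, while "pain" and "headache" never reach the candidate stage.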
  • FIG. 1 illustrates a schematic diagram of an exemplary AI system for processing patient descriptions, according to embodiments of the disclosure
  • FIG. 2 illustrates a flowchart of an exemplary method for constructing a stop-word list used for processing patient descriptions, according to embodiments of the disclosure
  • FIG. 3 illustrates a flowchart of an exemplary method for processing a patient description using a stop-word list, according to embodiments of the disclosure.
  • FIG. 1 illustrates a block diagram of an exemplary AI system 100 for processing patient descriptions, according to embodiments of the disclosure.
  • AI system 100 may receive patient descriptions (e.g., sample patient descriptions 101 or patient descriptions 103) from a patient terminal 120.
  • patient terminal 120 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc.
  • Patient terminal 120 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients 130.
  • patient terminal 120 may include a keyboard, hard or soft, for patients 130 to type in the patient description.
  • Patient terminal 120 may additionally or alternatively include a touch screen for patients 130 to handwrite the patient description. Accordingly, patient terminal 120 may record the patient description as texts.
  • patient terminal 120 may automatically recognize the handwriting and convert it to text information.
  • patient terminal 120 may include a microphone, for recording the patient description provided by patients 130 orally.
  • Patient terminal 120 may automatically transcribe the recorded audio data into texts.
  • AI system 100 may receive the patient descriptions in their original format as captured by patient terminal 120, and the handwriting recognition and audio transcription may be performed automatically by patient terminal 120 or AI system 100.
  • sample patient descriptions 101 may be patient descriptions received for AI system 100 to construct or update a stop-word list.
  • AI system 100 may identify stop-words from sample patient descriptions 101 and add them to the stop-word list.
  • patient descriptions 103 may be patient descriptions for AI system to process and identify medical symptoms based on the stop-word list. For example, each patient description 103 may be filtered using the stop-word list.
  • AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108.
  • AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) ) , or separate devices with dedicated functions.
  • one or more components of AI system 100 may be located in a cloud, or may be alternatively in a single location (such as inside a mobile device) or distributed locations.
  • Components of AI system 100 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown) .
  • AI system 100 may be configured to construct a stop-word list based on sample patient descriptions 101, which is then used as a filter for processing patient descriptions 103.
  • Communication interface 102 may send data to and receive data from components such as patient terminal 120 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM ) , or other communication methods.
  • communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 102.
  • communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 102 may receive data such as sample patient descriptions 101 and/or patient descriptions 103 from patient terminal 120.
  • the patient descriptions may be received as texts or in their original format as acquired by patient terminal 120, such as an audio or in handwriting.
  • a patient description may include one sentence or multiple sentences that describe the symptoms and feelings of patient 130.
  • the description may additionally contain various spoken-language fillers such as exclamation words, including, e.g., hmm, well, all right, you know, okay, so, etc.
  • patient 130 may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache. ”
  • Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
  • Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to constructing and updating a stop-word list for processing patient descriptions 103. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to the stop-word list or patient description processing.
  • Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate.
  • Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform functions disclosed herein.
  • memory 106 and/or storage 108 may be configured to store program (s) that may be executed by processor 104 to generate diagnosis result 105 for patient 130.
  • Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104.
  • storage 108 may be configured to store a knowledge database 182 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data.
  • knowledge database 182 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 184 and an entity list 186.
  • stop-word list 184 may include stop-words that do not carry substantive meanings for the purpose of medical diagnosis.
  • stop-word list 184 may include relational words. Linguistically, words may be divided into notional words, which have substantive meanings, and relational words, which merely express a grammatical relationship between notional words. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have an independent meaning and must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 184.
  • Stop-word list 184 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as non-substantive nouns used as the subject, e.g., “I,” “we,” “you,” “it,” and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have,” “seem,” “look,” “feel,” and “a little bit,” may be included as stop-words.
  • Stop-word list 184 may be constructed and updated by stop-word list construction unit 142 in processor 104, which will be described in greater detail in this disclosure. Consistent with the present disclosure, stop-word list 184 may be periodically updated using, e.g., sample patient descriptions 101. For example, additional stop-words used by the patients may be added to stop-word list 184.
  • entity list 186 may include entities associated with known symptoms (not shown) .
  • the entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses.
  • entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ” “vomit, ” etc.
  • when an entity contains a phrase, it may be further divided into words and stored separately. For example, “joint pain” may be further divided into the two words “joint” and “pain.”
  • entity list 186 may be periodically updated, e.g., to include entities describing new symptoms.
  • memory 106 and/or storage 108 may also store intermediate data such as the word segments in sample patient description 101 and patient description 103, term frequencies of word segments, candidate stop-words, and similarity values between the candidate stop-words and the entities, etc.
  • Memory 106 and/or storage 108 may additionally store various learning models including their model parameters, such as a sentence segmentation model, an entity matching model, a pre-diagnosis model, etc. that will be described.
  • the various types of data may be stored permanently, removed periodically, or disregarded immediately after the data is processed.
  • processor 104 may include multiple modules, such as a stop-word list construction unit 142, a patient description processing unit 144, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions.
  • Although FIG. 1 shows units 142 and 144 both within one processor 104, it is contemplated that these units may be distributed among different processors located closely to or remotely from each other.
  • Stop-word list construction unit 142 is configured to construct or update stop-word list 184 using sample patient descriptions 101.
  • stop-word list construction unit 142 may execute computer instructions to perform a method.
  • FIG. 2 illustrates a flowchart of an exemplary method 200 for constructing or updating a stop-word list based on sample patient descriptions 101, according to embodiments of the disclosure.
  • Method 200 may include steps S202-S220 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 2.
  • stop-word list construction unit 142 may receive sample patient descriptions 101, e.g., from communication interface 102. In some embodiments, a large number of sample patient descriptions may be received to construct stop-word list 184. As described, sample patient descriptions 101 may be received as texts or in their original format as acquired by patient terminal 120, such as an audio recording or handwriting. In some embodiments, sample patient descriptions 101 may be processed and converted into texts. Each sample patient description 101 may include one sentence that describes the symptoms of a patient. For example, the patient may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache.”
  • stop-word list construction unit 142 segments each sample patient description 101 into multiple word segments.
  • a word segment is the smallest unit in a sentence that has semantic meanings.
  • a word segment may be a word or a combination of two or more words.
  • each sample patient description 101 may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description 101 is segmented into a plurality of word segments.
  • the exemplary description above can be segmented as follows:
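As a rough stand-in for the trained segmentation model described above, a greedy longest-match segmenter over a phrase vocabulary illustrates the idea; the vocabulary and the function name are hypothetical, not from the disclosure:

```python
def max_match(sentence, vocab):
    """Greedy longest-match segmentation: repeatedly take the longest
    vocabulary phrase that prefixes the remaining words (a common
    baseline; the patent instead trains a segmentation model)."""
    words = sentence.lower().replace(",", "").rstrip(".").split()
    segments, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):        # try longest span first
            phrase = " ".join(words[i:j])
            if j - i == 1 or phrase in vocab:     # single words always match
                segments.append(phrase)
                i = j
                break
    return segments

vocab = {"in the head", "you know", "am having"}
print(max_match(
    "Yeah, okay, I am having a recurring pain in the head, you know, headache.",
    vocab))
```

Multi-word phrases in the vocabulary, such as "in the head" and "you know", come out as single segments, matching the notion that a word segment is the smallest semantic unit.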
  • stop-word list construction unit 142 may compare the word segments of sample patient descriptions 101 with the entities in entity list 186, and remove any word segment that is an entity. For example, the word segment “headache” is an entity in entity list 186, and should not be included as a stop-word. Accordingly, the word “headache” will be removed from the word segments, for the purpose of identifying stop-words.
  • stop-word list construction unit 142 calculates a term frequency (TF) for each remaining word segment in sample patient descriptions 101.
  • the term frequency measures how frequently each word segment appears in patient descriptions.
  • stop-word list construction unit 142 may determine whether a word segment is among the top N most frequent words. In some embodiments, the remaining word segments may be ranked according to their term frequencies as determined in step S208, and the most frequent N words may be determined. N may be selected as a proper number based on various factors, including, e.g., the total number of word segments in sample patient descriptions 101. For example, N may be 500 for a sample size of 5000 word segments, or 5000 for a sample size of 20,000 word segments. In some alternative embodiments, N may be a percentage value, such as 10%, 20%, etc.
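The frequency ranking can be sketched as follows, with N expressed as a percentage of the distinct word segments (one of the two options mentioned above); the segment data and the 20% figure are illustrative only:

```python
from collections import Counter

# Remaining word segments after known entities have been removed.
segments = ["you know", "i", "a", "you know", "i",
            "recurring", "i", "a", "well", "so"]
tf = Counter(segments)                 # term frequency of each segment

# N may be an absolute count or a percentage of the distinct segments;
# here a 20% cut is used for illustration.
pct = 0.20
n = max(1, int(len(tf) * pct))
candidates = [w for w, _ in tf.most_common(n)]
print(candidates)                      # → ['i']
```

Only the segments that clear the top-N cut go on to the entity-matching and similarity steps; the rest are revisited only if the list is rebuilt with a larger sample.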
  • method 200 may proceed to step S212, where stop-word list construction unit 142 identifies the word segment as a candidate stop-word. Otherwise (S210: no) , method 200 may return to step S208 to analyze another word segment.
  • stop-word list construction unit 142 may identify entities from entity list 186 that potentially match with the candidate stop-word.
  • entities that are at least remotely relevant to the candidate stop-word may be identified as potentially matching entities. For example, entities such as “cutaneous pain,” “joint pain,” “cardiac pain,” “back pain,” “pain when breathing,” etc. may be identified for the candidate stop-word “muscle pain.”
  • an entity is remotely relevant to the candidate stop-word if a similarity value between the two is higher than a nominal value.
  • matching entities can be identified using a unigram model.
  • a unigram is a single item from a given sample of text or speech. The item can be a phoneme, a syllable, a letter, a word, or a base pair, depending on the application.
  • a unigram model is a probabilistic language model for predicting the next item in a sequence. In some embodiments, a higher order n-gram model may be used.
  • An n-gram is a contiguous sequence of n items.
  • An n-gram model can be in the form of an (n-1)-order Markov model.
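The unigram-based identification of potentially matching entities can be sketched as a one-word overlap test: any entity sharing at least one unigram with the candidate counts as remotely relevant. The entity list and candidate below are illustrative, not from the disclosure:

```python
# Entities sharing at least one word (unigram) with the candidate are
# treated as potentially matching; the rest are ignored.
entities = ["cutaneous pain", "joint pain", "cardiac pain", "back pain",
            "pain when breathing", "running nose"]
candidate = "muscle pain"

cand_unigrams = set(candidate.split())
matching = [e for e in entities if cand_unigrams & set(e.split())]
print(matching)
```

Here every "pain" entity is retained while "running nose" is screened out, so the more expensive similarity computation runs only on plausible matches.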
  • stop-word list construction unit 142 may calculate similarity values between the candidate stop-word and the matching entities.
  • the similarity value may be determined as a ratio between the number of overlapping words between the candidate stop-word and the matching entity and the number of words in the candidate stop-word.
  • similarity value S is determined as S = |w1 ∩ w2| / |w2|, where w1 is the matching entity, w2 is the candidate stop-word, |w1 ∩ w2| is the number of words shared by the two, and |w2| is the number of words in the candidate stop-word.
  • the similarity value may be determined as a measure of similarity in semantic meaning between the candidate stop-word and the matching entity.
  • a learning model may be used to determine the semantic similarity. Using a semantic similarity learning model, the similarity value between the candidate stop-word “all night” and the entity “cough” may be, e.g., 10% or 20%.
  • stop-word list construction unit 142 may identify the highest similarity value determined for the candidate stop-word.
  • the highest similarity value may be compared with a threshold.
  • the threshold may be predetermined or dynamically adjusted. For example, the threshold may be 0.1, 0.2, 0.5 (or 10%, 20%, 50%) , etc. In other words, when the most similar entity to the candidate stop-word is still quite different, the candidate stop-word can be decided as a true stop-word that does not carry any meaningful information related to patient symptoms and be added to the stop-word list.
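The similarity, highest-value, and threshold steps can be sketched together as follows; the candidate, entity strings, and the 0.6 threshold are hypothetical, chosen only to show the decision:

```python
def similarity(entity: str, candidate: str) -> float:
    # S = |w1 ∩ w2| / |w2|: overlapping words over candidate length.
    w1, w2 = set(entity.split()), set(candidate.split())
    return len(w1 & w2) / len(w2)

candidate = "all night"
matching_entities = ["night sweats", "pain when breathing"]

# Highest similarity between the candidate and any matching entity.
highest = max(similarity(e, candidate) for e in matching_entities)

threshold = 0.6
if highest < threshold:
    # Even the best-matching entity is too dissimilar, so the candidate
    # carries no symptom information and joins the stop-word list.
    print(f"'{candidate}' added to the stop-word list (highest S = {highest})")
```

With these values the highest similarity is 0.5 (one shared word of two), which falls below the threshold, so "all night" is accepted as a stop-word.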
  • method 200 may return to step S208 to analyze the next word segment remaining in sample patient descriptions 101 using steps S208-S220. Method 200 may conclude when all word segments are analyzed.
  • patient description processing unit 144 is configured to process new patient descriptions 103 and provide diagnosis result 105.
  • the processing of patient description processing unit 144 takes advantage of knowledge database 182, particularly stop-word list 184 as constructed/updated by stop-word list construction unit 142 described above.
  • patient description processing unit 144 may also execute computer instructions to perform a method.
  • FIG. 3 illustrates a flowchart of an exemplary method 300 for processing a patient description using stop-word list 184, according to embodiments of the disclosure.
  • Method 300 may include steps S302-S310 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
  • patient description processing unit 144 may receive a patient description, such as patient description 103 provided by patient 130 through patient terminal 120. Similar to a sample patient description, patient description 103 may be received as texts or in its original format as acquired by patient terminal 120, such as an audio recording or handwriting. In some embodiments, when patient description 103 is originally received in a non-text form, it may be automatically processed and converted into texts by, e.g., patient terminal 120 or processor 104. Each patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 130. For example, the patient may describe her symptom as “I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too.”
  • patient description processing unit 144 may segment patient description 103 into multiple word segments.
  • patient description processing unit 144 may first divide patient description 103 into different sentences. For example, the above exemplary description may be divided into three sentences: “I had a headache all night last night, ” “So I woke up feeling very dizzy, you know, ” and “And by the way my nose seems running too. ”
  • Patient description processing unit 144 may further segment each of the sentences into word segments.
  • patient description processing unit 144 may apply a sentence segmentation model trained using sample sentences and known word segments of those sentences.
  • the exemplary description above can be segmented as:
  • patient description processing unit 144 may remove the word segments in patient description 103 that are on stop-word list 184.
  • patient description processing unit 144 may search each stop-word on stop-word list 184, and if the stop-word is found in patient description 103, the corresponding word segment will be removed.
  • stop-word list 184 is used to “filter” patient description 103 to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as “I, ” “had, ” “all night, ” “last night, ” “so, ” “woke up, ” “you know, ” “by the way, ” etc. may be identified and removed as stop-words. Removing these stop-words “cleans up” patient description 103 and conditions it for the later symptom recognition and illness diagnosis processes.
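The filtering step above amounts to a set-membership test over the segmented description. A minimal sketch, assuming the segments and list contents shown (illustrative, not the patent's actual data):

```python
# Stop-word list as constructed/updated from sample descriptions.
stop_word_list = {"i", "had", "all night", "last night", "so",
                  "woke up", "you know", "by the way"}

# Word segments of an incoming patient description.
segments = ["i", "had", "a", "headache", "all night", "last night",
            "so", "i", "woke up", "feeling", "very", "dizzy", "you know"]

# "Filter" the description: drop every segment found on the list.
remaining = [s for s in segments if s not in stop_word_list]
print(remaining)   # → ['a', 'headache', 'feeling', 'very', 'dizzy']
```

The surviving segments ("headache", "dizzy", ...) are exactly the ones passed on to symptom recognition, which is why a more complete stop-word list directly improves the later diagnosis steps.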
  • patient description processing unit 144 may recognize medical symptoms from the remaining word segments.
  • the medical symptoms may be recognized using various methods.
  • a span searching method may be applied to find the entity with the highest matching value with each span between two word segments.
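The span-searching idea can be sketched as follows, scoring every contiguous span of remaining segments against each entity; the word-overlap score stands in for whatever matching value an implementation would use, and the function name and data are hypothetical:

```python
def best_entity_for_spans(segments, entities):
    """For each contiguous span of word segments, find the entity with
    the highest matching value (word-overlap ratio used here as an
    illustrative stand-in)."""
    matches = {}
    for i in range(len(segments)):
        for j in range(i + 1, len(segments) + 1):
            span = " ".join(segments[i:j])
            span_words = set(span.split())
            scored = [(len(span_words & set(e.split())) / len(e.split()), e)
                      for e in entities]
            score, entity = max(scored)
            if score > 0:                 # keep only spans that match something
                matches[span] = (entity, score)
    return matches

segments = ["running", "nose", "dizzy"]
entities = ["running nose", "headache", "dizziness"]
print(best_entity_for_spans(segments, entities)["running nose"])
```

The two-word span "running nose" matches its entity perfectly (score 1.0), while single-word spans only partially match, showing why searching over spans rather than individual segments recovers multi-word symptoms.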
  • an end-to-end learning network may be used to identify the matched entities to the word segments.
  • the end-to-end learning network may use word embedding, a bi-directional Long Short-Term Memory (LSTM) model, and classifiers such as softmax to find the matched entities. Medical symptoms can then be determined based on the matched entities.
  • patient description processing unit 144 may make a preliminary diagnosis based on the medical symptoms recognized from patient description 103 and provide diagnosis result 105.
  • symptoms detected from patient description 103 may include “headache, ” “faint, ” and “running nose. ”
  • patient description processing unit 144 may pre-diagnose the illness sustained by the patient.
  • patient description processing unit 144 may predict that the patient likely has a flu.
  • patient description processing unit 144 may use a learning model to predict the illness based on the symptoms.
  • a CNN learning model may be used. The learning model may be trained with sample symptoms of patients and the final diagnosis of the patients made by physicians.
  • diagnosis result 105 may include, e.g., the diagnosed illness, the symptoms, and relevant patient descriptions and patient information.
  • diagnosis result 105 may be provided to patient 130 and/or a medical professional through a display 150.
  • Display 150 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction.
  • the display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user.
  • the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™.
  • display 150 may be part of patient terminal 120. Based on diagnosis result 105, patient terminal 120 may automatically instruct patient 130 to visit the appropriate physician or facility for further diagnosis.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.

Abstract

An artificial intelligence system (100) and method for constructing a stop-word list (184) used for processing patient descriptions is provided. An exemplary artificial intelligence system (100) includes a patient interaction interface configured to receive sample patient descriptions (101). The artificial intelligence system (100) further includes a storage device (108) configured to store a plurality of entities (186) corresponding to known medical symptoms. The artificial intelligence system (100) also includes a processor (104). The processor (104) is configured to identify a candidate stop-word from the sample patient descriptions (101). The processor (104) is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word. The processor is also configured to determine similarity values between the candidate stop-word and the matching entities, and add the candidate stop-word to the stop-word list (184) when the highest similarity value is lower than a threshold.

Description

ARTIFICIAL INTELLIGENCE SYSTEM FOR PROCESSING PATIENT DESCRIPTIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
TECHNICAL FIELD
The present disclosure relates to artificial intelligence (AI) systems and methods for processing patient descriptions, and more particularly to, AI systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
BACKGROUND
Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right physicians. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or a flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses. For example, hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent. However, having practitioners perform the pre-diagnosis wastes valuable resources. Automated pre-diagnosis methods are used to improve efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on the patient's described symptoms, e.g., based on preprogrammed mappings between diseases and known symptoms.
Patient descriptions are, however, often inaccurate or unclear. For example, the patient may be under the influence of the illness or medication and may be unable to express herself accurately. In addition, when describing symptoms orally, the patient may use informal language that contains stop-words that do not convey substantive meanings, such as modal words and exclamation words. Existing automated methods rely on a predetermined stop-word list to filter out the stop-words from the patient description, before using the description to recognize medical symptoms. However, the existing stop-word lists are incomplete and need to be supplemented as new vocabulary appears in patient descriptions.
Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
SUMMARY
Embodiments of the disclosure provide an artificial intelligence system for constructing a stop-word list used for processing patient descriptions. An exemplary artificial intelligence system includes a patient interaction interface configured to receive sample patient descriptions. The artificial intelligence system further includes a storage device configured to store a plurality of entities corresponding to known medical symptoms. The artificial intelligence system also includes a processor. The processor is configured to identify a candidate stop-word from the sample patient descriptions. The processor is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word. The processor is also configured to determine similarity values between the candidate stop-word and the matching entities. The processor is additionally configured to add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
Embodiments of the disclosure also provide an artificial intelligence method for constructing a stop-word list used for processing patient descriptions. The artificial intelligence method includes receiving, by a patient interaction interface, sample patient descriptions. The method further includes identifying, by a processor, a candidate stop-word from the sample patient descriptions. The method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word. The method also includes determining, by the processor, similarity values between the candidate stop-word and the matching entities. The method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions. The artificial intelligence method includes identifying a candidate stop-word from sample patient descriptions. The method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word. The method also includes determining similarity values between the candidate stop-word and the matching entities. The method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary AI system for  processing patient descriptions, according to embodiments of the disclosure;
FIG. 2 illustrates a flowchart of an exemplary method for constructing a stop-word list used for processing patient descriptions, according to embodiments of the disclosure;
FIG. 3 illustrates a flowchart of an exemplary method for processing a patient description using a stop-word list, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 illustrates a block diagram of an exemplary AI system 100 for processing patient descriptions, according to embodiments of the disclosure. Consistent with the present disclosure, AI system 100 may receive patient descriptions (e.g., sample patient descriptions 101 or patient descriptions 103) from a patient terminal 120. For example, patient terminal 120 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc. Patient terminal 120 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients 130. In some embodiments, patient terminal 120 may include a keyboard, hard or soft, for patients 130 to type in the patient description. Patient terminal 120 may additionally or alternatively include a touch screen for patients 130 to handwrite the patient description. Accordingly, patient terminal 120 may record the patient description as texts. If the input is handwriting, patient terminal 120 may automatically recognize the handwriting and convert it to text information. In  some other embodiments, patient terminal 120 may include a microphone, for recording the patient description provided by patients 130 orally. Patient terminal 120 may automatically transcribe the recorded audio data into texts. In some alternative embodiments, AI system 100 may receive the patient descriptions in their original format as captured by patient terminal 120, and the handwriting recognition and audio transcription may be performed automatically by patient terminal 120 or AI system 100.
In some embodiments, sample patient descriptions 101 may be patient descriptions received for AI system 100 to construct or update a stop-word list. For example, AI system 100 may identify stop-words from sample patient descriptions 101 and add them to the stop-word list. In some embodiments, patient descriptions 103 may be patient descriptions for AI system to process and identify medical symptoms based on the stop-word list. For example, each patient description 103 may be filtered using the stop-word list.
In some embodiments, as shown in FIG. 1, AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108. In some embodiments, AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of AI system 100 may be located in a cloud, or may alternatively be in a single location (such as inside a mobile device) or distributed locations. Components of AI system 100 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, AI system 100 may be configured to construct a stop-word list based on sample patient descriptions 101, which is then used as a filter for processing patient descriptions 103.
Communication interface 102 may send data to and receive data from components such as patient terminal 120 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM) , or other communication methods. In some embodiments, communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 102. In such an implementation, communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 102 may receive data such as sample patient descriptions 101 and/or patient descriptions 103 from patient terminal 120. The patient descriptions may be received as texts or in their original format as acquired by patient terminal 120, such as audio or handwriting. A patient description may include one sentence or multiple sentences that describe the symptoms and feelings of patient 130. When the patient description is made orally, the description may additionally contain various spoken-language fillers such as exclamation words, including, e.g., hmm, well, all right, you know, okay, so, etc. For example, patient 130 may describe her symptom as "Yeah, okay, I am having a recurring pain in the head, you know, headache." Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to constructing and updating a stop-word list for processing patient descriptions 103. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to the stop-word list or patient description processing.
Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate. Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform functions disclosed herein. For example, memory 106 and/or storage 108 may be configured to store program (s) that may be executed by processor 104 to generate diagnosis result 105 for patient 130.
Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104. For instance, storage 108 may be configured to store a knowledge database 182 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data. In some embodiments, knowledge database 182 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 184 and an entity list 186.
In some embodiments, stop-word list 184 may include stop-words that do  not carry substantive meanings for the purpose of medical diagnosis. In some embodiments, stop-word list 184 may include relational words. Linguistically, words may include notional words that have substantive meanings and relational words that merely express a grammatical relationship between notional words to express the meanings. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have independent meanings and it must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 184.
Stop-word list 184 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as nouns used as the subject, e.g., “I, ” “we, ” “you, ” “it” as non-substantive, and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have, ” “seem, ” “look, ” “feel, ” and “a little bit, ” may be included as stop-words.
Stop-word list 184 may be constructed and updated by stop-word list construction unit 142 in processor 104, which will be described in greater detail in this disclosure. Consistent with the present disclosure, stop-word list 184 may be periodically updated using, e.g., sample patient descriptions 101. For example, additional stop-words used by the patients may be added to stop-word list 184.
In some embodiments, entity list 186 may include entities associated with known symptoms (not shown) . The entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses. For example, entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ”  “vomit, ” etc. In some embodiments, when an entity contains a phrase, it may be further divided into words and stored separately. For example, “joint pain” may be further divided into two words “joint” and “pain. ” In some embodiments, entity list 186 may be periodically updated, e.g., to include entities describing new symptoms.
In some embodiments, memory 106 and/or storage 108 may also store intermediate data such as the word segments in sample patient description 101 and patient description 103, term frequencies of word segments, candidate stop-words, and similarity values between the candidate stop-words and the entities, etc. Memory 106 and/or storage 108 may additionally store various learning models including their model parameters, such as a sentence segmentation model, an entity matching model, a pre-diagnosis model, etc. that will be described. The various types of data may be stored permanently, removed periodically, or disregarded immediately after the data is processed.
As shown in FIG. 1, processor 104 may include multiple modules, such as a stop-word list construction unit 142, a patient description processing unit 144, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions. Although FIG. 1 shows units 142 and 144 both within one processor 104, it is contemplated that these units may be distributed among different processors located close to or remote from each other.
Stop-word list construction unit 142 is configured to construct or update stop-word list 184 using sample patient descriptions 101. In some embodiments, stop-word list construction unit 142 may execute computer instructions to perform a method. For example, FIG. 2 illustrates a flowchart of an exemplary method 200 for constructing or updating a stop-word list based on sample patient descriptions 101, according to embodiments of the disclosure. Method 200 may include steps S202-S220 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 2.
In step S202, stop-word list construction unit 142 may receive sample patient descriptions 101, e.g., from communication interface 102. In some embodiments, a large number of sample patient descriptions may be received to construct stop-word list 184. As described, sample patient descriptions 101 may be received as texts or in their original format as acquired by patient terminal 120, such as audio or handwriting. In some embodiments, sample patient descriptions 101 may be processed and converted into texts. Each sample patient description 101 may include one sentence that describes the symptoms of a patient. For example, the patient may describe her symptom as "Yeah, okay, I am having a recurring pain in the head, you know, headache."
In step S204, stop-word list construction unit 142 segments each sample patient description 101 into multiple word segments. A word segment is the smallest unit in a sentence that has semantic meaning. A word segment may be a word or a combination of two or more words. In some embodiments, each sample patient description 101 may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description 101 is segmented into a plurality of word segments. The exemplary description above can be segmented as follows:
Yeah //okay //I //am having //a //recurring //pain //in //the //head// you know //headache //
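The segmentation of step S204 can be illustrated with a minimal sketch. The disclosure contemplates a trained sentence segmentation model; the greedy tokenizer below merely stands in for it, and the small lexicon of multi-word segments (`KNOWN_PHRASES`) is a hypothetical assumption for illustration.

```python
# A minimal stand-in for the trained segmentation model of step S204.
# KNOWN_PHRASES is an assumed lexicon, not part of the disclosure.
KNOWN_PHRASES = {"am having", "you know"}

def segment(description):
    words = description.lower().replace(",", "").replace(".", "").split()
    segments, i = [], 0
    while i < len(words):
        # Prefer a two-word phrase from the lexicon over single words.
        pair = " ".join(words[i:i + 2])
        if pair in KNOWN_PHRASES:
            segments.append(pair)
            i += 2
        else:
            segments.append(words[i])
            i += 1
    return segments

# Reproduces the segmentation shown above:
print(segment("Yeah, okay, I am having a recurring pain in the head, you know, headache."))
```

A production system would replace the greedy scan and fixed lexicon with the trained sentence segmentation model described in the text.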
In step S206, stop-word list construction unit 142 may compare the word segments of sample patient descriptions 101 with the entities in entity list 186, and remove any word segment that is an entity. For example, the word segment “headache” is an entity in entity list 186, and should not be included as a stop-word. Accordingly, the word “headache” will be removed from the word segments, for the purpose of identifying stop-words.
In step S208, stop-word list construction unit 142 calculates a term frequency (TF) for each remaining word segment in sample patient descriptions 101. In some embodiments, the term frequency measures how frequently each word segment appears in the patient descriptions. For example, the term frequency of word segment i may be calculated as TF_i = F(word segment i) / F(all word segments), where F(word segment i) is the number of times word segment i appears among all sample patient descriptions 101 and F(all word segments) is the total number of remaining word segments in sample patient descriptions 101.
In step S210, stop-word list construction unit 142 may determine whether a word segment is among the top N most frequent words. In some embodiments, the remaining word segments may be ranked according to their term frequencies as determined in step S208, and the most frequent N words may be determined. N may be selected as a proper number based on various factors, including, e.g., the total number of word segments in sample patient descriptions 101. For example, N may be 500 for a sample size of 5000 word segments, or 5000 for a sample size of 20,000 word segments. In some alternative embodiments, N may be a percentage value, such as 10%, 20%, etc.
If a word segment is among the top N most frequent words (S210: yes) , method 200 may proceed to step S212, where stop-word list construction unit  142 identifies the word segment as a candidate stop-word. Otherwise (S210: no) , method 200 may return to step S208 to analyze another word segment.
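Steps S206 through S212 can be sketched together: segments matching entities are removed, term frequencies are computed as in step S208, and the N most frequent segments become candidate stop-words. The sample descriptions and entity set below are illustrative assumptions.

```python
from collections import Counter

def top_n_candidates(descriptions_segments, entities, n):
    """Steps S206-S212 sketched: drop entity segments, rank the rest by
    term frequency TF_i = F(segment_i) / F(all remaining segments), and
    return the N most frequent segments as candidate stop-words."""
    counts = Counter()
    for segments in descriptions_segments:
        counts.update(s for s in segments if s not in entities)
    total = sum(counts.values())
    tf = {seg: c / total for seg, c in counts.items()}
    return [seg for seg, _ in sorted(tf.items(), key=lambda kv: -kv[1])[:n]]

# Illustrative sample data (assumed, not from the disclosure):
samples = [["yeah", "okay", "i", "headache"],
           ["okay", "i", "fever"],
           ["i", "you know", "nausea"]]
entities = {"headache", "fever", "nausea"}
print(top_n_candidates(samples, entities, 2))  # ['i', 'okay']
```

As the text notes, N (or a percentage cutoff) would be chosen based on the total number of word segments in the sample.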
In step S214, stop-word list construction unit 142 may identify entities from entity list 186 that potentially match the candidate stop-word. In some embodiments, entities that are at least remotely relevant to the candidate stop-word may be identified as potentially matching entities. For example, entities such as "cutaneous pain," "joint pain," "cardiac pain," "back pain," "pain when breathing," etc. may be identified for the candidate stop-word "muscle pain." In some embodiments, an entity is remotely relevant to the candidate stop-word if a similarity value between the two is higher than a nominal value.
In some embodiments, matching entities can be identified using a unigram model. A unigram is a single item from a given sample of text or speech. The item can be a phoneme, syllable, letter, word, or base pair, depending on the application. A unigram model is a probabilistic language model for predicting the next item in a sequence. In some embodiments, a higher-order n-gram model may be used. An n-gram is a contiguous sequence of n items. An n-gram model can be in the form of an (n−1)-order Markov model.
In step S216, stop-word list construction unit 142 may calculate similarity values between the candidate stop-word and the matching entities. In some embodiments, the similarity value may be determined as the ratio between the number of overlapping words shared by the candidate stop-word and the matching entity and the number of words in the candidate stop-word. For example, similarity value S is determined as:

S = |w1 ∩ w2| / |w2|

where w1 is the matching entity and w2 is the candidate stop-word. Using this formula, for example, the similarity value between the candidate stop-word "muscle pain" and the entity "joint pain" is 50%.
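The word-overlap ratio of step S216 is straightforward to compute; the sketch below reproduces the 50% example from the text.

```python
def similarity(entity, candidate):
    """Word-overlap similarity of step S216:
    S = |words(entity) ∩ words(candidate)| / |words(candidate)|."""
    entity_words = set(entity.split())
    candidate_words = candidate.split()
    return sum(1 for w in candidate_words if w in entity_words) / len(candidate_words)

print(similarity("joint pain", "muscle pain"))  # 0.5, i.e. the 50% example
```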
In some embodiments, the similarity value may be determined as a measure of similarity in semantic meaning between the candidate stop-word and the matching entity. In some embodiments, a learning model may be used to determine the semantic similarity. Using a semantic similarity learning model, the similarity value between the candidate stop-word "all night" and the entity "cough" may be, e.g., 10% or 20%.
In some embodiments, stop-word list construction unit 142 may identify the highest similarity value determined for the candidate stop-word. The highest similarity value may be compared with a threshold. In step S218, if the highest similarity value is less than the threshold (S218: yes), method 200 may proceed to step S220 to add the candidate stop-word to stop-word list 184. In some embodiments, the threshold may be predetermined or dynamically adjusted. For example, the threshold may be 0.1, 0.2, 0.5 (or 10%, 20%, 50%), etc. In other words, when even the most similar entity to the candidate stop-word is still quite different, the candidate stop-word can be decided to be a true stop-word that does not carry any meaningful information related to patient symptoms and be added to the stop-word list. If the highest similarity value is no less than the threshold (S218: no), the candidate stop-word is determined not to be a true stop-word, and method 200 may return to step S208 to analyze the next word segment remaining in sample patient descriptions 101 using steps S208-S220. Method 200 may conclude when all word segments have been analyzed.
Referring back to FIG. 1, patient description processing unit 144 is configured to process new patient descriptions 103 and provide diagnosis result 105. In some embodiments, the processing of patient description processing unit 144 takes advantage of knowledge database 182, particularly stop-word list 184 as constructed/updated by stop-word list construction unit 142 described  above.
In some embodiments, like stop-word list construction unit 142, patient description processing unit 144 may also execute computer instructions to perform a method. For example, FIG. 3 illustrates a flowchart of an exemplary method 300 for processing a patient description using stop-word list 184, according to embodiments of the disclosure. Method 300 may include steps S302-S310 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
In step S302, patient description processing unit 144 may receive a patient description, such as patient description 103 provided by patient 130 through patient terminal 120. Similar to a sample patient description, patient description 103 may be received as texts or in its original format as acquired by patient terminal 120, such as audio or handwriting. In some embodiments, when patient description 103 is originally received in a non-text form, it may be automatically processed and converted into texts by, e.g., patient terminal 120 or processor 104. Each patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 130. For example, the patient may describe her symptom as "I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too."
In step S304, patient description processing unit 144 may segment patient description 103 into multiple word segments. In some embodiments, when patient description 103 contains multiple sentences, patient description processing unit 144 may first divide patient description 103 into different sentences. For example, the above exemplary description may be divided into  three sentences: “I had a headache all night last night, ” “So I woke up feeling very dizzy, you know, ” and “And by the way my nose seems running too. ” 
Patient description processing unit 144 may further segment each of the sentences into word segments. In some embodiments, patient description processing unit 144 may apply a sentence segmentation model trained using sample sentences and known word segments of those sentences. The exemplary description above can be segmented as:
I //had //a headache //all night //last night.
So //I //woke up //feeling //very //dizzy //you know.
And //by the way //my //nose //seems //running //too
In step S306, patient description processing unit 144 may remove the word segments in patient description 103 that appear on stop-word list 184. In some embodiments, patient description processing unit 144 may search for each stop-word on stop-word list 184, and if the stop-word is found in patient description 103, the corresponding word segment will be removed. In other words, stop-word list 184 is used to "filter" patient description 103 to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as "I," "had," "all night," "last night," "so," "woke up," "you know," "by the way," etc. may be identified and removed as stop-words. Removing these stop-words "cleans up" patient description 103 and conditions it for the later symptom recognition and illness diagnosis processes.
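Step S306's filtering amounts to a membership test over the word segments. In the minimal sketch below, the stop-word set is an assumed miniature of stop-word list 184.

```python
def filter_description(segments, stop_words):
    """Step S306 sketched: drop every word segment found on the
    stop-word list, keeping only symptom-relevant segments."""
    return [s for s in segments if s not in stop_words]

# Assumed miniature stop-word list, for illustration only:
stop_words = {"i", "had", "all night", "last night", "so", "woke up",
              "you know", "by the way", "my", "seems", "too", "and", "very"}
segments = ["i", "had", "a headache", "all night", "last night"]
print(filter_description(segments, stop_words))  # ['a headache']
```

Note that the multi-word segment "a headache" survives because the list is matched against whole word segments, not individual words.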
In step S308, patient description processing unit 144 may recognize medical symptoms from the remaining word segments. The medical symptoms may be recognized using various methods. In some embodiments, a span searching method may be applied to find the entity with the highest matching value with each span between two word segments. In some embodiments, an  end-to-end learning network may be used to identify the matched entities to the word segments. For example, the end-to-end learning network may use word embedding, a bi-directional Long Short-Term Memory (LSTM) model, and classifiers such as softmax to find the matched entities. Medical symptoms can then be determined based on the matched entities.
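The span searching approach of step S308 can be illustrated with a toy matcher that scores spans of one or two consecutive word segments against the entity list using word overlap. The threshold value and the example entities are assumptions for illustration; the end-to-end LSTM network mentioned in the text is not reproduced here.

```python
def recognize_symptoms(segments, entities, threshold=0.5):
    """A toy stand-in for step S308's span search: for each span of one
    or two consecutive segments, keep the best-scoring entity if its
    word-overlap score clears an assumed threshold."""
    def score(entity, span):
        entity_words, span_words = set(entity.split()), span.split()
        return sum(1 for w in span_words if w in entity_words) / len(span_words)

    found = set()
    for i in range(len(segments)):
        for width in (1, 2):
            span = " ".join(segments[i:i + width])
            best = max(entities, key=lambda e: score(e, span))
            if score(best, span) >= threshold:
                found.add(best)
    return found

print(recognize_symptoms(["headache", "running", "nose", "dizzy"],
                         ["headache", "running nose", "dizziness"]))
```

Note the limitation of pure word overlap: "dizzy" fails to match the entity "dizziness," which is why the text also contemplates semantic models for matching.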
In step S310, patient description processing unit 144 may make a preliminary diagnosis based on the medical symptoms recognized from patient description 103 and provide diagnosis result 105. For example, symptoms detected from patient description 103 may include "headache," "faint," and "running nose." Based on these symptoms, patient description processing unit 144 may pre-diagnose the illness sustained by the patient. For example, patient description processing unit 144 may predict that the patient likely has a flu. In some embodiments, patient description processing unit 144 may use a learning model to predict the illness based on the symptoms. For example, a convolutional neural network (CNN) learning model may be used. The learning model may be trained with sample symptoms of patients and the final diagnoses of the patients made by physicians. Based on the pre-diagnosis, patient description processing unit 144 provides diagnosis result 105, which may include, e.g., the diagnosed illness, the symptoms, and relevant patient descriptions and patient information.
Referring back to FIG. 1, diagnosis result 105 may be provided to patient 130 and/or a medical professional through a display 150. Display 150 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. The display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user. For example, the display may include a touch-sensitive material that is  substantially rigid, such as Gorilla Glass TM, or substantially pliable, such as Willow Glass TM. In some embodiments, display 150 may be part of patient terminal 120. Based on diagnosis result 105, patient terminal 120 may automatically instruct patient 130 to visit the appropriate physician or facility for further diagnosis.
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

  1. An artificial intelligence system for constructing a stop-word list used for processing patient descriptions, comprising:
    a patient interaction interface configured to receive sample patient descriptions;
    a storage device configured to store a plurality of entities corresponding to known medical symptoms; and
    a processor configured to:
    identify a candidate stop-word from the sample patient descriptions;
    identify matching entities, from the plurality of entities, that are matched with the candidate stop-word;
    determine similarity values between the candidate stop-word and the matching entities; and
    add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
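As an illustrative, non-limiting sketch of the admission logic recited in claim 1 (the function and variable names below are hypothetical, not part of the application; the similarity measure is supplied by the caller):

```python
def maybe_add_stop_word(candidate, matching_entities, similarity, stop_words, threshold):
    """Add the candidate to the stop-word list only if it is not
    sufficiently similar to any matching symptom entity."""
    scores = [similarity(candidate, entity) for entity in matching_entities]
    # Admit the candidate when even the best match falls below the threshold
    # (or when no entity matched it at all).
    if not scores or max(scores) < threshold:
        stop_words.append(candidate)
    return stop_words
```

With a word-overlap similarity, a filler phrase such as "feel like" would be admitted as a stop-word, while a phrase closely matching a known symptom entity would be kept out of the list.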
  2. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    segment the sample patient descriptions into word segments; and
    remove any word segment that is among the plurality of entities from the sample patient descriptions.
  3. The artificial intelligence system of claim 2, wherein the processor is further configured to:
    calculate a term frequency for each remaining word segment in the sample patient descriptions; and
    identify a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  4. The artificial intelligence system of claim 3, wherein the predetermined condition is the word segment being among the top N most frequent words based on its term frequency.
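The candidate-selection steps of claims 2 through 4 can be sketched as follows (a minimal illustration only; whitespace splitting stands in for the word segmentation step, and all names are hypothetical):

```python
from collections import Counter

def candidate_stop_words(descriptions, known_entities, top_n):
    """Segment sample descriptions, drop segments that are known
    symptom entities, and return the top-N most frequent remaining
    segments as candidate stop-words."""
    entity_set = set(known_entities)
    counts = Counter(
        word
        for text in descriptions
        for word in text.split()  # stand-in for a real word segmenter
        if word not in entity_set
    )
    return [word for word, _ in counts.most_common(top_n)]
```

Because known symptom entities are removed before counting, high-frequency medical terms cannot crowd out the filler words the list is meant to capture.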
  5. The artificial intelligence system of claim 1, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
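The ratio recited in claim 5 can be written out directly (an illustrative sketch; the function name is hypothetical):

```python
def similarity(candidate, entity):
    """Claim-5 ratio: the number of words shared between the candidate
    stop-word and the matching entity, divided by the number of words
    in the candidate stop-word."""
    cand_words = candidate.split()
    shared = set(cand_words) & set(entity.split())
    return len(shared) / len(cand_words)
```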
  6. The artificial intelligence system of claim 1, wherein the matching entities are matched with the candidate stop-word using unigram matching.
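Unigram matching, as recited in claim 6, can be read as retaining any entity that shares at least one word with the candidate. A minimal sketch under that reading (names hypothetical):

```python
def unigram_matches(candidate, entities):
    """Return the entities that share at least one word (unigram)
    with the candidate stop-word."""
    cand_words = set(candidate.split())
    return [e for e in entities if cand_words & set(e.split())]
```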
  7. The artificial intelligence system of claim 1, wherein the patient interaction interface is further configured to receive the patient descriptions for processing, and
    wherein the processor is further configured to filter the patient descriptions with the stop-word list.
  8. The artificial intelligence system of claim 1, wherein the patient interaction interface is a microphone configured to receive the sample patient descriptions in the form of audio, wherein the processor is further configured to transcribe the audio into text.
  9. An artificial intelligence method for constructing a stop-word list used for processing patient descriptions, comprising:
    receiving, by a patient interaction interface, sample patient descriptions;
    identifying, by a processor, a candidate stop-word from the sample patient descriptions;
    identifying matching entities, from a plurality of entities corresponding to known medical symptoms, that are matched with the candidate stop-word;
    determining, by the processor, similarity values between the candidate stop-word and the matching entities; and
    adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  10. The artificial intelligence method of claim 9, further comprising:
    segmenting the sample patient descriptions into word segments; and
    removing any word segment that is among the plurality of entities from the sample patient descriptions.
  11. The artificial intelligence method of claim 10, further comprising:
    calculating a term frequency for each remaining word segment in the sample patient descriptions; and
    identifying a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  12. The artificial intelligence method of claim 11, wherein the predetermined condition is the word segment being among the top N most frequent words based on its term frequency.
  13. The artificial intelligence method of claim 9, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
  14. The artificial intelligence method of claim 9, wherein the matching entities are matched with the candidate stop-word using unigram matching.
  15. The artificial intelligence method of claim 9, further comprising:
    receiving, by the patient interaction interface, the patient descriptions for processing, and
    filtering the patient descriptions with the stop-word list.
  16. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions, the artificial intelligence method comprising:
    identifying a candidate stop-word from sample patient descriptions;
    identifying matching entities that are matched with the candidate stop-word;
    determining similarity values between the candidate stop-word and the matching entities; and
    adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  17. The non-transitory computer-readable medium of claim 16, wherein the artificial intelligence method further comprises:
    segmenting the sample patient descriptions into word segments; and
    removing any word segment that is among a plurality of entities corresponding to known medical symptoms from the sample patient descriptions.
  18. The non-transitory computer-readable medium of claim 17, wherein the artificial intelligence method further comprises:
    calculating a term frequency for each remaining word segment in the sample patient descriptions; and
    identifying a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  19. The non-transitory computer-readable medium of claim 16, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
  20. The non-transitory computer-readable medium of claim 16, wherein the artificial intelligence method further comprises filtering the patient descriptions for processing with the stop-word list.
Application PCT/CN2019/097534, filed 2019-07-24: Artificial intelligence system for processing patient descriptions (WO2021012222A1, en).

Publications (1)

Publication Number: WO2021012222A1 — Publication Date: 2021-01-28

Family ID: 74192963

Citations (6)

* Cited by examiner, † Cited by third party

- WO2012054657A2 * (Mobilemed Apps, LLC; priority 2010-10-20, published 2012-04-26): Mobile medical information system and methods of use
- CN106557653A * (Hefei University of Technology; priority 2016-11-15, published 2017-04-05): Portable intelligent medical guidance system and method
- CN107247868A * (Deep Thinking Artificial Intelligence Robot Technology (Beijing) Co., Ltd.; priority 2017-05-18, published 2017-10-13): Artificial-intelligence-assisted medical inquiry system
- CN108874773A * (Ping An Medical Technology Co., Ltd.; priority 2018-05-31, published 2018-11-23): Method, apparatus, computer device, and storage medium for adding new keywords
- CN109192211A * (Gree Electric Appliances, Inc. of Zhuhai; priority 2018-10-29, published 2019-01-11): Method, device, and equipment for voice signal recognition
- CN109346171A * (Beijing Huimei Cloud Technology Co., Ltd.; priority 2018-10-31, published 2019-02-15): Auxiliary diagnosis method, device, and computer device



Legal Events

- 121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19938988; country: EP; kind code: A1)
- NENP — Non-entry into the national phase (ref country code: DE)
- 122 — Ep: PCT application non-entry in European phase (ref document number: 19938988; country: EP; kind code: A1)