WO2021012222A1 - Artificial intelligence system for processing patient descriptions - Google Patents

Artificial intelligence system for processing patient descriptions

Info

Publication number
WO2021012222A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
stop
patient
artificial intelligence
descriptions
Prior art date
Application number
PCT/CN2019/097534
Other languages
French (fr)
Inventor
Mingyang Sun
Xiaoqing Yang
Zang Li
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/097534
Publication of WO2021012222A1

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 - ICT specially adapted for the handling or processing of medical references
    • G16H70/60 - ICT specially adapted for the handling or processing of medical references relating to pathologies
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Definitions

  • the present disclosure relates to artificial intelligence (AI) systems and methods for processing patient descriptions, and more particularly to, AI systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
  • Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right physicians. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or a flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
  • Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses.
  • hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent.
  • having practitioners perform the pre-diagnosis wastes valuable resources.
  • Automated pre-diagnosis methods are used to improve efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on the patient’s described symptoms, e.g., based on preprogrammed mappings between diseases and known symptoms.
  • Patient descriptions are, however, often inaccurate or unclear.
  • the patient may be under the influence of the illness or medication and may be unable to express herself accurately.
  • the patient may use informal language that contains stop-words that do not convey substantive meanings, such as modal words and exclamation words.
  • Existing automated methods rely on a predetermined stop-word list to filter out the stop-words from the patient description, before using the description to recognize medical symptoms.
  • the existing stop-word lists are incomplete and need to be supplemented as new vocabulary appears in patient descriptions.
  • Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
  • Embodiments of the disclosure provide an artificial intelligence system for constructing a stop-word list used for processing patient descriptions.
  • An exemplary artificial intelligence system includes a patient interaction interface configured to receive sample patient descriptions.
  • the artificial intelligence system further includes a storage device configured to store a plurality of entities corresponding to known medical symptoms.
  • the artificial intelligence system also includes a processor.
  • the processor is configured to identify a candidate stop-word from the sample patient descriptions.
  • the processor is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the processor is also configured to determine similarity values between the candidate stop-word and the matching entities.
  • the processor is additionally configured to add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  • Embodiments of the disclosure also provide an artificial intelligence method for constructing a stop-word list used for processing patient descriptions.
  • the artificial intelligence method includes receiving, by a patient interaction interface, sample patient descriptions.
  • the method further includes identifying, by a processor, a candidate stop-word from the sample patient descriptions.
  • the method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the method also includes determining, by the processor, similarity values between the candidate stop-word and the matching entities.
  • the method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions.
  • the artificial intelligence method includes identifying a candidate stop-word from sample patient descriptions.
  • the method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word.
  • the method also includes determining similarity values between the candidate stop-word and the matching entities.
  • the method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
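The three claim sets above recite the same construction pipeline. It can be sketched end-to-end as a short program; this is a minimal illustration only, assuming a naive whitespace tokenizer, and the toy data and names such as `build_stop_word_list` are hypothetical, not from the disclosure:

```python
from collections import Counter

def similarity(entity: str, candidate: str) -> float:
    """Word-overlap ratio S = |w1 ∩ w2| / |w2|, where w1 is the matching
    entity and w2 is the candidate stop-word (per the disclosure)."""
    w1, w2 = set(entity.split()), set(candidate.split())
    return len(w1 & w2) / len(w2)

def build_stop_word_list(descriptions, entities, top_n=3, threshold=0.5):
    # 1. Segment each description into word segments (naive whitespace
    #    split here; the disclosure uses a trained segmentation model).
    segments = [w for d in descriptions for w in d.lower().replace(",", "").split()]
    # 2. Remove segments that are already known symptom entities.
    segments = [w for w in segments if w not in entities]
    # 3. Rank the rest by term frequency and keep the top-N as candidates.
    candidates = [w for w, _ in Counter(segments).most_common(top_n)]
    # 4. A candidate is added only if even its best-matching entity
    #    falls below the similarity threshold.
    return [c for c in candidates
            if max((similarity(e, c) for e in entities), default=0.0) < threshold]

descriptions = [
    "okay I am having a pain in the head you know",
    "well I have a headache you know",
]
entities = {"headache", "pain", "fever"}
print(build_stop_word_list(descriptions, entities))  # → ['i', 'a', 'you']
```

Here the frequent filler words survive the entity filter but match no entity, so they become stop-words, while "pain" and "headache" never reach the candidate stage.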
  • FIG. 1 illustrates a schematic diagram of an exemplary AI system for processing patient descriptions, according to embodiments of the disclosure
  • FIG. 2 illustrates a flowchart of an exemplary method for constructing a stop-word list used for processing patient descriptions, according to embodiments of the disclosure
  • FIG. 3 illustrates a flowchart of an exemplary method for processing a patient description using a stop-word list, according to embodiments of the disclosure.
  • FIG. 1 illustrates a block diagram of an exemplary AI system 100 for processing patient descriptions, according to embodiments of the disclosure.
  • AI system 100 may receive patient descriptions (e.g., sample patient descriptions 101 or patient descriptions 103) from a patient terminal 120.
  • patient terminal 120 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc.
  • Patient terminal 120 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients 130.
  • patient terminal 120 may include a keyboard, hard or soft, for patients 130 to type in the patient description.
  • Patient terminal 120 may additionally or alternatively include a touch screen for patients 130 to handwrite the patient description. Accordingly, patient terminal 120 may record the patient description as texts.
  • patient terminal 120 may automatically recognize the handwriting and convert it to text information.
  • patient terminal 120 may include a microphone, for recording the patient description provided by patients 130 orally.
  • Patient terminal 120 may automatically transcribe the recorded audio data into texts.
  • AI system 100 may receive the patient descriptions in their original format as captured by patient terminal 120, and the handwriting recognition and audio transcription may be performed automatically by patient terminal 120 or AI system 100.
  • sample patient descriptions 101 may be patient descriptions received for AI system 100 to construct or update a stop-word list.
  • AI system 100 may identify stop-words from sample patient descriptions 101 and add them to the stop-word list.
  • patient descriptions 103 may be patient descriptions for AI system to process and identify medical symptoms based on the stop-word list. For example, each patient description 103 may be filtered using the stop-word list.
  • AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108.
  • AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) ) , or separate devices with dedicated functions.
  • one or more components of AI system 100 may be located in a cloud, or may be alternatively in a single location (such as inside a mobile device) or distributed locations.
  • Components of AI system 100 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown) .
  • AI system 100 may be configured to construct a stop-word list based on sample patient descriptions 101, which is then used as a filter for processing patient descriptions 103.
  • Communication interface 102 may send data to and receive data from components such as patient terminal 120 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM ) , or other communication methods.
  • communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 102.
  • communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 102 may receive data such as sample patient descriptions 101 and/or patient descriptions 103 from patient terminal 120.
  • the patient descriptions may be received as texts or in their original format as acquired by patient terminal 120, such as an audio or in handwriting.
  • a patient description may include one sentence or multiple sentences that describe the symptoms and feelings of patient 130.
  • the description may additionally contain various spoken-language fillers such as exclamation words, including, e.g., hmm, well, all right, you know, okay, so, etc.
  • patient 130 may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache. ”
  • Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
  • Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to constructing and updating a stop-word list for processing patient descriptions 103. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to the stop-word list or patient description processing.
  • Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate.
  • Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform functions disclosed herein.
  • memory 106 and/or storage 108 may be configured to store program (s) that may be executed by processor 104 to generate diagnosis result 105 for patient 130.
  • Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104.
  • storage 108 may be configured to store a knowledge database 182 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data.
  • knowledge database 182 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 184 and an entity list 186.
  • stop-word list 184 may include stop-words that do not carry substantive meanings for the purpose of medical diagnosis.
  • stop-word list 184 may include relational words. Linguistically, words may be divided into notional words, which have substantive meanings, and relational words, which merely express a grammatical relationship between notional words. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have an independent meaning and must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 184.
  • Stop-word list 184 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as non-substantive nouns used as the subject, e.g., “I,” “we,” “you,” “it,” and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have,” “seem,” “look,” “feel,” and “a little bit,” may be included as stop-words.
  • Stop-word list 184 may be constructed and updated by stop-word list construction unit 142 in processor 104, which will be described in greater detail in this disclosure. Consistent with the present disclosure, stop-word list 184 may be periodically updated using, e.g., sample patient descriptions 101. For example, additional stop-words used by the patients may be added to stop-word list 184.
  • entity list 186 may include entities associated with known symptoms (not shown) .
  • the entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses.
  • entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ” “vomit, ” etc.
  • when an entity contains a phrase, it may be further divided into words and stored separately. For example, “joint pain” may be further divided into the two words “joint” and “pain.”
  • entity list 186 may be periodically updated, e.g., to include entities describing new symptoms.
  • memory 106 and/or storage 108 may also store intermediate data such as the word segments in sample patient description 101 and patient description 103, term frequencies of word segments, candidate stop-words, and similarity values between the candidate stop-words and the entities, etc.
  • Memory 106 and/or storage 108 may additionally store various learning models including their model parameters, such as a sentence segmentation model, an entity matching model, a pre-diagnosis model, etc. that will be described.
  • the various types of data may be stored permanently, removed periodically, or disregarded immediately after the data is processed.
  • processor 104 may include multiple modules, such as a stop-word list construction unit 142, a patient description processing unit 144, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions.
  • Although FIG. 1 shows units 142 and 144 both within one processor 104, it is contemplated that these units may be distributed among different processors located closely to or remotely from each other.
  • Stop-word list construction unit 142 is configured to construct or update stop-word list 184 using sample patient descriptions 101.
  • stop-word list construction unit 142 may execute computer instructions to perform a method.
  • FIG. 2 illustrates a flowchart of an exemplary method 200 for constructing or updating a stop-word list based on sample patient descriptions 101, according to embodiments of the disclosure.
  • Method 200 may include steps S202-S220 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 2.
  • stop-word list construction unit 142 may receive sample patient descriptions 101, e.g., from communication interface 102. In some embodiments, a large number of sample patient descriptions may be received to construct stop-word list 184. As described, sample patient descriptions 101 may be received as texts or in their original format as acquired by patient terminal 120, such as an audio recording or handwriting. In some embodiments, sample patient descriptions 101 may be processed and converted into texts. Each sample patient description 101 may include one sentence that describes the symptoms of a patient. For example, the patient may describe her symptom as “Yeah, okay, I am having a recurring pain in the head, you know, headache.”
  • stop-word list construction unit 142 segments each sample patient description 101 into multiple word segments.
  • a word segment is the smallest unit in a sentence that has semantic meanings.
  • a word segment may be a word or a combination of two or more words.
  • each sample patient description 101 may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description 101 is segmented into a plurality of word segments.
  • the exemplary description above can be segmented as follows:
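As a rough stand-in for the trained segmentation model described above, a greedy longest-match segmenter over a phrase vocabulary illustrates the idea; the vocabulary and the function name are hypothetical, not from the disclosure:

```python
def max_match(sentence, vocab):
    """Greedy longest-match segmentation: repeatedly take the longest
    vocabulary phrase that prefixes the remaining words (a common
    baseline; the patent instead trains a segmentation model)."""
    words = sentence.lower().replace(",", "").rstrip(".").split()
    segments, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):        # try longest span first
            phrase = " ".join(words[i:j])
            if j - i == 1 or phrase in vocab:     # single words always match
                segments.append(phrase)
                i = j
                break
    return segments

vocab = {"in the head", "you know", "am having"}
print(max_match(
    "Yeah, okay, I am having a recurring pain in the head, you know, headache.",
    vocab))
```

Multi-word phrases in the vocabulary, such as "in the head" and "you know", come out as single segments, matching the notion that a word segment is the smallest semantic unit.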
  • stop-word list construction unit 142 may compare the word segments of sample patient descriptions 101 with the entities in entity list 186, and remove any word segment that is an entity. For example, the word segment “headache” is an entity in entity list 186, and should not be included as a stop-word. Accordingly, the word “headache” will be removed from the word segments, for the purpose of identifying stop-words.
  • stop-word list construction unit 142 calculates a term frequency (TF) for each remaining word segment in sample patient descriptions 101.
  • the term frequency measures how frequently each word segment appears in patient descriptions.
  • stop-word list construction unit 142 may determine whether a word segment is among the top N most frequent words. In some embodiments, the remaining word segments may be ranked according to their term frequencies as determined in step S208, and the most frequent N words may be determined. N may be selected as a proper number based on various factors, including, e.g., the total number of word segments in sample patient descriptions 101. For example, N may be 500 for a sample size of 5000 word segments, or 5000 for a sample size of 20,000 word segments. In some alternative embodiments, N may be a percentage value, such as 10%, 20%, etc.
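The frequency ranking can be sketched as follows, with N expressed as a percentage of the distinct word segments (one of the two options mentioned above); the segment data and the 20% figure are illustrative only:

```python
from collections import Counter

# Remaining word segments after known entities have been removed.
segments = ["you know", "i", "a", "you know", "i",
            "recurring", "i", "a", "well", "so"]
tf = Counter(segments)                 # term frequency of each segment

# N may be an absolute count or a percentage of the distinct segments;
# here a 20% cut is used for illustration.
pct = 0.20
n = max(1, int(len(tf) * pct))
candidates = [w for w, _ in tf.most_common(n)]
print(candidates)                      # → ['i']
```

Only the segments that clear the top-N cut go on to the entity-matching and similarity steps; the rest are revisited only if the list is rebuilt with a larger sample.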
  • method 200 may proceed to step S212, where stop-word list construction unit 142 identifies the word segment as a candidate stop-word. Otherwise (S210: no) , method 200 may return to step S208 to analyze another word segment.
  • stop-word list construction unit 142 may identify entities from entity list 186 that potentially match with the candidate stop-word.
  • entities that are at least remotely relevant to the candidate stop-word may be identified as potentially matching entities. For example, entities such as “cutaneous pain,” “joint pain,” “cardiac pain,” “back pain,” “pain when breathing,” etc. may be identified for the candidate stop-word “muscle pain.”
  • an entity is remotely relevant to the candidate stop-word if a similarity value between the two is higher than a nominal value.
  • matching entities can be identified using a unigram model.
  • a unigram is a single item from a given sample of text or speech. The item can be a phoneme, a syllable, a letter, a word, or a base pair, depending on the application.
  • a unigram model is a probabilistic language model for predicting the next item in a sequence. In some embodiments, a higher order n-gram model may be used.
  • An n-gram is a contiguous sequence of n items.
  • An n-gram model can be in the form of an (n-1)-order Markov model.
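The unigram-based identification of potentially matching entities can be sketched as a one-word overlap test: any entity sharing at least one unigram with the candidate counts as remotely relevant. The entity list and candidate below are illustrative, not from the disclosure:

```python
# Entities sharing at least one word (unigram) with the candidate are
# treated as potentially matching; the rest are ignored.
entities = ["cutaneous pain", "joint pain", "cardiac pain", "back pain",
            "pain when breathing", "running nose"]
candidate = "muscle pain"

cand_unigrams = set(candidate.split())
matching = [e for e in entities if cand_unigrams & set(e.split())]
print(matching)
```

Here every "pain" entity is retained while "running nose" is screened out, so the more expensive similarity computation runs only on plausible matches.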
  • stop-word list construction unit 142 may calculate similarity values between the candidate stop-word and the matching entities.
  • the similarity value may be determined as a ratio between the number of overlapping words between the candidate stop-word and the matching entity and the number of words in the candidate stop-word.
  • similarity value S is determined as S = |w1 ∩ w2| / |w2|, where w1 is the matching entity, w2 is the candidate stop-word, |w1 ∩ w2| is the number of words shared by the two, and |w2| is the number of words in the candidate stop-word.
  • the similarity value may be determined as a measure of similarity in semantic meaning between the candidate stop-word and the matching entity.
  • a learning model may be used to determine the semantic similarity. Using a semantic similarity learning model, the similarity value between the candidate stop-word “all night” and the entity “cough” may be, e.g., 10% or 20%.
  • stop-word list construction unit 142 may identify the highest similarity value determined for the candidate stop-word.
  • the highest similarity value may be compared with a threshold.
  • the threshold may be predetermined or dynamically adjusted. For example, the threshold may be 0.1, 0.2, 0.5 (or 10%, 20%, 50%) , etc. In other words, when the most similar entity to the candidate stop-word is still quite different, the candidate stop-word can be decided as a true stop-word that does not carry any meaningful information related to patient symptoms and be added to the stop-word list.
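The similarity, highest-value, and threshold steps can be sketched together as follows; the candidate, entity strings, and the 0.6 threshold are hypothetical, chosen only to show the decision:

```python
def similarity(entity: str, candidate: str) -> float:
    # S = |w1 ∩ w2| / |w2|: overlapping words over candidate length.
    w1, w2 = set(entity.split()), set(candidate.split())
    return len(w1 & w2) / len(w2)

candidate = "all night"
matching_entities = ["night sweats", "pain when breathing"]

# Highest similarity between the candidate and any matching entity.
highest = max(similarity(e, candidate) for e in matching_entities)

threshold = 0.6
if highest < threshold:
    # Even the best-matching entity is too dissimilar, so the candidate
    # carries no symptom information and joins the stop-word list.
    print(f"'{candidate}' added to the stop-word list (highest S = {highest})")
```

With these values the highest similarity is 0.5 (one shared word of two), which falls below the threshold, so "all night" is accepted as a stop-word.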
  • method 200 may return to step S208 to analyze the next word segment remaining in sample patient descriptions 101 using steps S208-S220. Method 200 may conclude when all word segments are analyzed.
  • patient description processing unit 144 is configured to process new patient descriptions 103 and provide diagnosis result 105.
  • the processing of patient description processing unit 144 takes advantage of knowledge database 182, particularly stop-word list 184 as constructed/updated by stop-word list construction unit 142 described above.
  • patient description processing unit 144 may also execute computer instructions to perform a method.
  • FIG. 3 illustrates a flowchart of an exemplary method 300 for processing a patient description using stop-word list 184, according to embodiments of the disclosure.
  • Method 300 may include steps S302-S310 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
  • patient description processing unit 144 may receive a patient description, such as patient description 103 provided by patient 130 through patient terminal 120. Similar to a sample patient description, patient description 103 may be received as texts or in its original format as acquired by patient terminal 120, such as an audio recording or handwriting. In some embodiments, when patient description 103 is originally received in a non-text form, it may be automatically processed and converted into texts by, e.g., patient terminal 120 or processor 104. Each patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 130. For example, the patient may describe her symptom as “I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too.”
  • patient description processing unit 144 may segment patient description 103 into multiple word segments.
  • patient description processing unit 144 may first divide patient description 103 into different sentences. For example, the above exemplary description may be divided into three sentences: “I had a headache all night last night, ” “So I woke up feeling very dizzy, you know, ” and “And by the way my nose seems running too. ”
  • Patient description processing unit 144 may further segment each of the sentences into word segments.
  • patient description processing unit 144 may apply a sentence segmentation model trained using sample sentences and known word segments of those sentences.
  • the exemplary description above can be segmented as:
  • patient description processing unit 144 may remove the word segments in patient description 103 that are on stop-word list 184.
  • patient description processing unit 144 may search each stop-word on stop-word list 184, and if the stop-word is found in patient description 103, the corresponding word segment will be removed.
  • stop-word list 184 is used to “filter” patient description 103 to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as “I, ” “had, ” “all night, ” “last night, ” “so, ” “woke up, ” “you know, ” “by the way, ” etc. may be identified and removed as stop-words. Removing these stop-words “cleans up” patient description 103 and conditions it for the later symptom recognition and illness diagnosis processes.
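The filtering step above amounts to a set-membership test over the segmented description. A minimal sketch, assuming the segments and list contents shown (illustrative, not the patent's actual data):

```python
# Stop-word list as constructed/updated from sample descriptions.
stop_word_list = {"i", "had", "all night", "last night", "so",
                  "woke up", "you know", "by the way"}

# Word segments of an incoming patient description.
segments = ["i", "had", "a", "headache", "all night", "last night",
            "so", "i", "woke up", "feeling", "very", "dizzy", "you know"]

# "Filter" the description: drop every segment found on the list.
remaining = [s for s in segments if s not in stop_word_list]
print(remaining)   # → ['a', 'headache', 'feeling', 'very', 'dizzy']
```

The surviving segments ("headache", "dizzy", ...) are exactly the ones passed on to symptom recognition, which is why a more complete stop-word list directly improves the later diagnosis steps.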
  • patient description processing unit 144 may recognize medical symptoms from the remaining word segments.
  • the medical symptoms may be recognized using various methods.
  • a span searching method may be applied to find the entity with the highest matching value with each span between two word segments.
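The span-searching idea can be sketched as follows, scoring every contiguous span of remaining segments against each entity; the word-overlap score stands in for whatever matching value an implementation would use, and the function name and data are hypothetical:

```python
def best_entity_for_spans(segments, entities):
    """For each contiguous span of word segments, find the entity with
    the highest matching value (word-overlap ratio used here as an
    illustrative stand-in)."""
    matches = {}
    for i in range(len(segments)):
        for j in range(i + 1, len(segments) + 1):
            span = " ".join(segments[i:j])
            span_words = set(span.split())
            scored = [(len(span_words & set(e.split())) / len(e.split()), e)
                      for e in entities]
            score, entity = max(scored)
            if score > 0:                 # keep only spans that match something
                matches[span] = (entity, score)
    return matches

segments = ["running", "nose", "dizzy"]
entities = ["running nose", "headache", "dizziness"]
print(best_entity_for_spans(segments, entities)["running nose"])
```

The two-word span "running nose" matches its entity perfectly (score 1.0), while single-word spans only partially match, showing why searching over spans rather than individual segments recovers multi-word symptoms.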
  • an end-to-end learning network may be used to identify the matched entities to the word segments.
  • the end-to-end learning network may use word embedding, a bi-directional Long Short-Term Memory (LSTM) model, and classifiers such as softmax to find the matched entities. Medical symptoms can then be determined based on the matched entities.
  • patient description processing unit 144 may make a preliminary diagnosis based on the medical symptoms recognized from patient description 103 and provide diagnosis result 105.
  • symptoms detected from patient description 103 may include “headache, ” “faint, ” and “running nose. ”
  • patient description processing unit 144 may pre-diagnose the illness sustained by the patient.
  • patient description processing unit 144 may predict that the patient likely has a flu.
  • patient description processing unit 144 may use a learning model to predict the illness based on the symptoms.
  • a CNN learning model may be used. The learning model may be trained with sample symptoms of patients and the final diagnosis of the patients made by physicians.
  • diagnosis result 105 may include, e.g., the diagnosed illness, the symptoms, and relevant patient descriptions and patient information.
  • diagnosis result 105 may be provided to patient 130 and/or a medical professional through a display 150.
  • Display 150 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction.
  • the display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user.
  • the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™.
  • display 150 may be part of patient terminal 120. Based on diagnosis result 105, patient terminal 120 may automatically instruct patient 130 to visit the appropriate physician or facility for further diagnosis.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.

Abstract

An artificial intelligence system (100) and method for constructing a stop-word list (184) used for processing patient descriptions is provided. An exemplary artificial intelligence system (100) includes a patient interaction interface configured to receive sample patient descriptions (101). The artificial intelligence system (100) further includes a storage device (108) configured to store a plurality of entities (186) corresponding to known medical symptoms. The artificial intelligence system (100) also includes a processor (104). The processor (104) is configured to identify a candidate stop-word from the sample patient descriptions (101). The processor (104) is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word. The processor is also configured to determine similarity values between the candidate stop-word and the matching entities, and add the candidate stop-word to the stop-word list (184) when the highest similarity value is lower than a threshold.

Description

ARTIFICIAL INTELLIGENCE SYSTEM FOR PROCESSING PATIENT DESCRIPTIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
TECHNICAL FIELD
The present disclosure relates to artificial intelligence (AI) systems and methods for processing patient descriptions, and more particularly to, AI systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
BACKGROUND
Pre-diagnosis is usually performed in hospitals to preliminarily determine the illnesses of patients before sending them to the right physicians. Pre-diagnosis is typically based on symptoms described by the patient. For example, if the patient says she has a fever and a running nose, she will be pre-diagnosed as having a cold or a flu and be sent to an internal medicine doctor. If the patient says that she has itchy rashes on her skin, she will be pre-diagnosed as having skin allergies and be sent to a dermatologist.
Pre-diagnosis is typically performed by medical practitioners, such as physicians or nurses. For example, hospitals usually have pre-diagnosis personnel available at the check-in desk to determine where the patient should be sent. However, having practitioners perform the pre-diagnosis wastes valuable resources. Automated pre-diagnosis methods are used to improve efficiency. For example, diagnosis robots are being developed to perform the pre-diagnosis. These automated methods provide a preliminary diagnosis based on the patient's described symptoms, e.g., based on preprogrammed mappings between diseases and known symptoms.
Patient descriptions are, however, often inaccurate or unclear. For example, the patient may be under the influence of the illness or medication and may be unable to express herself accurately. In addition, when describing symptoms orally, the patient may use informal language that contains stop-words that do not convey substantive meanings, such as modal words and exclamation words. Existing automated methods rely on a predetermined stop-word list to filter out the stop-words from the patient description, before using the description to recognize medical symptoms. However, the existing stop-word lists are incomplete and need to be supplemented as new vocabulary appears in patient descriptions.
Embodiments of the disclosure address the above problems by providing improved artificial intelligence systems and methods for constructing a stop-word list for a knowledge database used to process patient descriptions.
SUMMARY
Embodiments of the disclosure provide an artificial intelligence system for constructing a stop-word list used for processing patient descriptions. An exemplary artificial intelligence system includes a patient interaction interface configured to receive sample patient descriptions. The artificial intelligence system further includes a storage device configured to store a plurality of entities corresponding to known medical symptoms. The artificial intelligence system also includes a processor. The processor is configured to identify a candidate stop-word from the sample patient descriptions. The processor is further configured to identify matching entities, from the plurality of entities, that are matched with the candidate stop-word. The processor is also configured to determine similarity values between the candidate stop-word and the matching entities. The processor is additionally configured to add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
Embodiments of the disclosure also provide an artificial intelligence method for constructing a stop-word list used for processing patient descriptions. The artificial intelligence method includes receiving, by a patient interaction interface, sample patient descriptions. The method further includes identifying, by a processor, a candidate stop-word from the sample patient descriptions. The method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word. The method also includes determining, by the processor, similarity values between the candidate stop-word and the matching entities. The method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, causes the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions. The artificial intelligence method includes identifying a candidate stop-word from sample patient descriptions. The method further includes identifying matching entities, from the plurality of entities, that are matched with the candidate stop-word. The method also includes determining similarity values between the candidate stop-word and the matching entities. The method additionally includes adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary AI system for  processing patient descriptions, according to embodiments of the disclosure;
FIG. 2 illustrates a flowchart of an exemplary method for constructing a stop-word list used for processing patient descriptions, according to embodiments of the disclosure;
FIG. 3 illustrates a flowchart of an exemplary method for processing a patient description using a stop-word list, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 illustrates a block diagram of an exemplary AI system 100 for processing patient descriptions, according to embodiments of the disclosure. Consistent with the present disclosure, AI system 100 may receive patient descriptions (e.g., sample patient descriptions 101 or patient descriptions 103) from a patient terminal 120. For example, patient terminal 120 may be a mobile phone, a desktop computer, a laptop, a PDA, a robot, a kiosk, etc. Patient terminal 120 may include a patient interaction interface configured to receive the patient descriptions provided by one or more patients 130. In some embodiments, patient terminal 120 may include a keyboard, hard or soft, for patients 130 to type in the patient description. Patient terminal 120 may additionally or alternatively include a touch screen for patients 130 to handwrite the patient description. Accordingly, patient terminal 120 may record the patient description as texts. If the input is handwriting, patient terminal 120 may automatically recognize the handwriting and convert it to text information. In  some other embodiments, patient terminal 120 may include a microphone, for recording the patient description provided by patients 130 orally. Patient terminal 120 may automatically transcribe the recorded audio data into texts. In some alternative embodiments, AI system 100 may receive the patient descriptions in their original format as captured by patient terminal 120, and the handwriting recognition and audio transcription may be performed automatically by patient terminal 120 or AI system 100.
In some embodiments, sample patient descriptions 101 may be patient descriptions received for AI system 100 to construct or update a stop-word list. For example, AI system 100 may identify stop-words from sample patient descriptions 101 and add them to the stop-word list. In some embodiments, patient descriptions 103 may be patient descriptions for AI system to process and identify medical symptoms based on the stop-word list. For example, each patient description 103 may be filtered using the stop-word list.
In some embodiments, as shown in FIG. 1, AI system 100 may include a communication interface 102, a processor 104, a memory 106, and a storage 108. In some embodiments, AI system 100 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of AI system 100 may be located in a cloud, or may alternatively be in a single location (such as inside a mobile device) or distributed locations. Components of AI system 100 may be in an integrated device, or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, AI system 100 may be configured to construct a stop-word list based on sample patient descriptions 101, which is then used as a filter for processing patient descriptions 103.
Communication interface 102 may send data to and receive data from components such as patient terminal 120 via communication cables, a Wireless Local Area Network (WLAN) , a Wide Area Network (WAN) , wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth TM) , or other communication methods. In some embodiments, communication interface 102 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 102 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 102. In such an implementation, communication interface 102 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 102 may receive data such as sample patient descriptions 101 and/or patient descriptions 103 from patient terminal 120. The patient descriptions may be received as texts or in their original format as acquired by patient terminal 120, such as audio or handwriting. A patient description may include one sentence or multiple sentences that describe the symptoms and feelings of patient 130. When the patient description is made orally, the description may additionally contain various spoken-language fillers such as exclamation words, including, e.g., hmm, well, all right, you know, okay, so, etc. For example, patient 130 may describe her symptom as "Yeah, okay, I am having a recurring pain in the head, you know, headache." Communication interface 102 may further provide the received data to memory 106 and/or storage 108 for storage or to processor 104 for processing.
Processor 104 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 104 may be configured as a separate processor module dedicated to constructing and updating a stop-word list for processing patient descriptions 103. Alternatively, processor 104 may be configured as a shared processor module for performing other functions unrelated to the stop-word list or patient description processing.
Memory 106 and storage 108 may include any appropriate type of mass storage provided to store any type of information that processor 104 may need to operate. Memory 106 and storage 108 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 106 and/or storage 108 may be configured to store one or more computer programs that may be executed by processor 104 to perform functions disclosed herein. For example, memory 106 and/or storage 108 may be configured to store program (s) that may be executed by processor 104 to generate diagnosis result 105 for patient 130.
Memory 106 and/or storage 108 may be further configured to store information and data used by processor 104. For instance, storage 108 may be configured to store a knowledge database 182 including the various types of data associated with patients, symptoms, diseases, diagnoses, images, treatments, and other medical data. In some embodiments, knowledge database 182 may include various lists used for automatically recognizing medical symptoms from patient descriptions, such as a stop-word list 184 and an entity list 186.
In some embodiments, stop-word list 184 may include stop-words that do  not carry substantive meanings for the purpose of medical diagnosis. In some embodiments, stop-word list 184 may include relational words. Linguistically, words may include notional words that have substantive meanings and relational words that merely express a grammatical relationship between notional words to express the meanings. For example, notional words may include nouns, verbs, adjectives, numerals, qualifiers, pronouns, etc. In contrast, a relational word does not have independent meanings and it must be attached to a notional word to express a substantive meaning. For example, relational words may include adverbs, articles, prepositions, conjunctions, particles, exclamations, etc. Because relational words carry no substantive meanings, they can be automatically included on stop-word list 184.
Stop-word list 184 may further include notional words that are unrelated to medical symptoms. Accordingly, certain notional words, such as nouns used as the subject, e.g., “I, ” “we, ” “you, ” “it” as non-substantive, and verbs and adjectives that do not meaningfully describe a symptom, e.g., “have, ” “seem, ” “look, ” “feel, ” and “a little bit, ” may be included as stop-words.
Stop-word list 184 may be constructed and updated by stop-word list construction unit 142 in processor 104, which will be described in greater detail in this disclosure. Consistent with the present disclosure, stop-word list 184 may be periodically updated using, e.g., sample patient descriptions 101. For example, additional stop-words used by the patients may be added to stop-word list 184.
In some embodiments, entity list 186 may include entities associated with known symptoms (not shown) . The entities associated with known symptoms may be provided or reviewed by medical professionals such as physicians or nurses. For example, entities may include “fever, ” “headache, ” “nausea, ” “migraine, ” “joint pain, ” “running nose, ” “bleeding, ” “swelling, ” “upset stomach, ”  “vomit, ” etc. In some embodiments, when an entity contains a phrase, it may be further divided into words and stored separately. For example, “joint pain” may be further divided into two words “joint” and “pain. ” In some embodiments, entity list 186 may be periodically updated, e.g., to include entities describing new symptoms.
In some embodiments, memory 106 and/or storage 108 may also store intermediate data such as the word segments in sample patient description 101 and patient description 103, term frequencies of word segments, candidate stop-words, and similarity values between the candidate stop-words and the entities, etc. Memory 106 and/or storage 108 may additionally store various learning models including their model parameters, such as a sentence segmentation model, an entity matching model, a pre-diagnosis model, etc. that will be described. The various types of data may be stored permanently, removed periodically, or disregarded immediately after the data is processed.
As shown in FIG. 1, processor 104 may include multiple modules, such as a stop-word list construction unit 142, a patient description processing unit 144, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 104 designed for use with other components or software units implemented by processor 104 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 104, it may perform one or more functions. Although FIG. 1 shows units 142 and 144 both within one processor 104, it is contemplated that these units may be distributed among different processors located close to or remote from each other.
Stop-word list construction unit 142 is configured to construct or update stop-word list 184 using sample patient descriptions 101. In some embodiments, stop-word list construction unit 142 may execute computer instructions to perform a method. For example, FIG. 2 illustrates a flowchart of an exemplary method 200 for constructing or updating a stop-word list based on sample patient descriptions 101, according to embodiments of the disclosure. Method 200 may include steps S202-S220 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 2.
In step S202, stop-word list construction unit 142 may receive sample patient descriptions 101, e.g., from communication interface 102. In some embodiments, a large number of sample patient descriptions may be received to construct stop-word list 184. As described, sample patient descriptions 101 may be received as texts or in their original format as acquired by patient terminal 120, such as audio or handwriting. In some embodiments, sample patient descriptions 101 may be processed and converted into texts. Each sample patient description 101 may include one sentence that describes the symptoms of a patient. For example, the patient may describe her symptom as "Yeah, okay, I am having a recurring pain in the head, you know, headache."
In step S204, stop-word list construction unit 142 segments each sample patient description 101 into multiple word segments. A word segment is the smallest unit in a sentence that has semantic meaning. A word segment may be a word or a combination of two or more words. In some embodiments, each sample patient description 101 may be segmented using a sentence segmentation model trained using sample sentences and known word segments of those sentences. Applying the segmentation model, each sample patient description 101 is segmented into a plurality of word segments. The exemplary description above can be segmented as follows:
Yeah //okay //I //am having //a //recurring //pain //in //the //head// you know //headache //
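The segmentation of step S204 can be illustrated with a minimal sketch. The disclosure contemplates a trained sentence segmentation model; the greedy tokenizer below merely stands in for it, and the small lexicon of multi-word segments (`KNOWN_PHRASES`) is a hypothetical assumption for illustration.

```python
# A minimal stand-in for the trained segmentation model of step S204.
# KNOWN_PHRASES is an assumed lexicon, not part of the disclosure.
KNOWN_PHRASES = {"am having", "you know"}

def segment(description):
    words = description.lower().replace(",", "").replace(".", "").split()
    segments, i = [], 0
    while i < len(words):
        # Prefer a two-word phrase from the lexicon over single words.
        pair = " ".join(words[i:i + 2])
        if pair in KNOWN_PHRASES:
            segments.append(pair)
            i += 2
        else:
            segments.append(words[i])
            i += 1
    return segments

# Reproduces the segmentation shown above:
print(segment("Yeah, okay, I am having a recurring pain in the head, you know, headache."))
```

A production system would replace the greedy scan and fixed lexicon with the trained sentence segmentation model described in the text.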
In step S206, stop-word list construction unit 142 may compare the word segments of sample patient descriptions 101 with the entities in entity list 186, and remove any word segment that is an entity. For example, the word segment “headache” is an entity in entity list 186, and should not be included as a stop-word. Accordingly, the word “headache” will be removed from the word segments, for the purpose of identifying stop-words.
In step S208, stop-word list construction unit 142 calculates a term frequency (TF) for each remaining word segment in sample patient descriptions 101. In some embodiments, the term frequency measures how frequently each word segment appears in the patient descriptions. For example, the term frequency of word segment i may be calculated as TF_i = F(word segment i) / F(all word segments), where F(word segment i) is the number of times word segment i appears among all sample patient descriptions 101 and F(all word segments) is the total number of remaining word segments in sample patient descriptions 101.
In step S210, stop-word list construction unit 142 may determine whether a word segment is among the top N most frequent words. In some embodiments, the remaining word segments may be ranked according to their term frequencies as determined in step S208, and the most frequent N words may be determined. N may be selected as a proper number based on various factors, including, e.g., the total number of word segments in sample patient descriptions 101. For example, N may be 500 for a sample size of 5000 word segments, or 5000 for a sample size of 20,000 word segments. In some alternative embodiments, N may be a percentage value, such as 10%, 20%, etc.
If a word segment is among the top N most frequent words (S210: yes) , method 200 may proceed to step S212, where stop-word list construction unit  142 identifies the word segment as a candidate stop-word. Otherwise (S210: no) , method 200 may return to step S208 to analyze another word segment.
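Steps S206 through S212 can be sketched together: segments matching entities are removed, term frequencies are computed as in step S208, and the N most frequent segments become candidate stop-words. The sample descriptions and entity set below are illustrative assumptions.

```python
from collections import Counter

def top_n_candidates(descriptions_segments, entities, n):
    """Steps S206-S212 sketched: drop entity segments, rank the rest by
    term frequency TF_i = F(segment_i) / F(all remaining segments), and
    return the N most frequent segments as candidate stop-words."""
    counts = Counter()
    for segments in descriptions_segments:
        counts.update(s for s in segments if s not in entities)
    total = sum(counts.values())
    tf = {seg: c / total for seg, c in counts.items()}
    return [seg for seg, _ in sorted(tf.items(), key=lambda kv: -kv[1])[:n]]

# Illustrative sample data (assumed, not from the disclosure):
samples = [["yeah", "okay", "i", "headache"],
           ["okay", "i", "fever"],
           ["i", "you know", "nausea"]]
entities = {"headache", "fever", "nausea"}
print(top_n_candidates(samples, entities, 2))  # ['i', 'okay']
```

As the text notes, N (or a percentage cutoff) would be chosen based on the total number of word segments in the sample.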
In step S214, stop-word list construction unit 142 may identify entities from entity list 186 that potentially match the candidate stop-word. In some embodiments, entities that are at least remotely relevant to the candidate stop-word may be identified as potentially matching entities. For example, entities such as "cutaneous pain," "joint pain," "cardiac pain," "back pain," "pain when breathing," etc. may be identified for the candidate stop-word "muscle pain." In some embodiments, an entity is remotely relevant to the candidate stop-word if a similarity value between the two is higher than a nominal value.
In some embodiments, matching entities can be identified using a unigram model. A unigram is a single item from a given sample of text or speech. The item can be a phoneme, syllable, letter, word, or base pair, depending on the application. A unigram model is a probabilistic language model for predicting the next item in a sequence. In some embodiments, a higher-order n-gram model may be used. An n-gram is a contiguous sequence of n items. An n-gram model can be in the form of an (n−1)-order Markov model.
In step S216, stop-word list construction unit 142 may calculate similarity values between the candidate stop-word and the matching entities. In some embodiments, the similarity value may be determined as the ratio between the number of overlapping words shared by the candidate stop-word and the matching entity and the number of words in the candidate stop-word. For example, similarity value S is determined as:

S = |w1 ∩ w2| / |w2|

where w1 is the matching entity and w2 is the candidate stop-word. Using this formula, for example, the similarity value between the candidate stop-word "muscle pain" and the entity "joint pain" is 50%.
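The word-overlap ratio of step S216 is straightforward to compute; the sketch below reproduces the 50% example from the text.

```python
def similarity(entity, candidate):
    """Word-overlap similarity of step S216:
    S = |words(entity) ∩ words(candidate)| / |words(candidate)|."""
    entity_words = set(entity.split())
    candidate_words = candidate.split()
    return sum(1 for w in candidate_words if w in entity_words) / len(candidate_words)

print(similarity("joint pain", "muscle pain"))  # 0.5, i.e. the 50% example
```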
In some embodiments, the similarity value may be determined as a measure of similarity in semantic meaning between the candidate stop-word and the matching entity. In some embodiments, a learning model may be used to determine the semantic similarity. Using a semantic similarity learning model, the similarity value between the candidate stop-word "all night" and the entity "cough" may be, e.g., 10% or 20%.
In some embodiments, stop-word list construction unit 142 may identify the highest similarity value determined for the candidate stop-word. The highest similarity value may be compared with a threshold. In step S218, if the highest similarity value is less than the threshold (S218: yes), method 200 may proceed to step S220 to add the candidate stop-word to stop-word list 184. In some embodiments, the threshold may be predetermined or dynamically adjusted. For example, the threshold may be 0.1, 0.2, 0.5 (or 10%, 20%, 50%), etc. In other words, when even the most similar entity to the candidate stop-word is still quite different, the candidate stop-word can be decided to be a true stop-word that does not carry any meaningful information related to patient symptoms and be added to the stop-word list. If the highest similarity value is no less than the threshold (S218: no), the candidate stop-word is determined not to be a true stop-word, and method 200 may return to step S208 to analyze the next word segment remaining in sample patient descriptions 101 using steps S208-S220. Method 200 may conclude when all word segments have been analyzed.
Referring back to FIG. 1, patient description processing unit 144 is configured to process new patient descriptions 103 and provide diagnosis result 105. In some embodiments, the processing of patient description processing unit 144 takes advantage of knowledge database 182, particularly stop-word list 184 as constructed/updated by stop-word list construction unit 142 described  above.
In some embodiments, like stop-word list construction unit 142, patient description processing unit 144 may also execute computer instructions to perform a method. For example, FIG. 3 illustrates a flowchart of an exemplary method 300 for processing a patient description using stop-word list 184, according to embodiments of the disclosure. Method 300 may include steps S302-S310 as described below. It is to be appreciated that some of the steps may be optional to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
In step S302, patient description processing unit 144 may receive a patient description, such as patient description 103 provided by patient 130 through patient terminal 120. Similar to a sample patient description, patient description 103 may be received as texts or in its original format as acquired by patient terminal 120, such as audio or handwriting. In some embodiments, when patient description 103 is originally received in a non-text form, it may be automatically processed and converted into texts by, e.g., patient terminal 120 or processor 104. Each patient description 103 may include one sentence or multiple sentences that describe the symptoms of patient 130. For example, the patient may describe her symptom as "I had a headache all night last night, so I woke up feeling very dizzy, you know, and by the way my nose seems running too."
In step S304, patient description processing unit 144 may segment patient description 103 into multiple word segments. In some embodiments, when patient description 103 contains multiple sentences, patient description processing unit 144 may first divide patient description 103 into different sentences. For example, the above exemplary description may be divided into  three sentences: “I had a headache all night last night, ” “So I woke up feeling very dizzy, you know, ” and “And by the way my nose seems running too. ” 
Patient description processing unit 144 may further segment each of the sentences into word segments. In some embodiments, patient description processing unit 144 may apply a sentence segmentation model trained using sample sentences and known word segments of those sentences. The exemplary description above can be segmented as:
I //had //a headache //all night //last night.
So //I //woke up //feeling //very //dizzy //you know.
And //by the way //my //nose //seems //running //too
In step S306, patient description processing unit 144 may remove the word segments in patient description 103 that appear on stop-word list 184. In some embodiments, patient description processing unit 144 may search for each stop-word on stop-word list 184, and if the stop-word is found in patient description 103, the corresponding word segment will be removed. In other words, stop-word list 184 is used to "filter" patient description 103 to remove word segments that are known to be irrelevant to patient symptoms. For example, in the description above, word segments such as "I," "had," "all night," "last night," "so," "woke up," "you know," "by the way," etc. may be identified and removed as stop-words. Removing these stop-words "cleans up" patient description 103 and conditions it for the later symptom recognition and illness diagnosis processes.
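Step S306's filtering amounts to a membership test over the word segments. In the minimal sketch below, the stop-word set is an assumed miniature of stop-word list 184.

```python
def filter_description(segments, stop_words):
    """Step S306 sketched: drop every word segment found on the
    stop-word list, keeping only symptom-relevant segments."""
    return [s for s in segments if s not in stop_words]

# Assumed miniature stop-word list, for illustration only:
stop_words = {"i", "had", "all night", "last night", "so", "woke up",
              "you know", "by the way", "my", "seems", "too", "and", "very"}
segments = ["i", "had", "a headache", "all night", "last night"]
print(filter_description(segments, stop_words))  # ['a headache']
```

Note that the multi-word segment "a headache" survives because the list is matched against whole word segments, not individual words.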
In step S308, patient description processing unit 144 may recognize medical symptoms from the remaining word segments. The medical symptoms may be recognized using various methods. In some embodiments, a span searching method may be applied to find the entity with the highest matching value with each span between two word segments. In some embodiments, an  end-to-end learning network may be used to identify the matched entities to the word segments. For example, the end-to-end learning network may use word embedding, a bi-directional Long Short-Term Memory (LSTM) model, and classifiers such as softmax to find the matched entities. Medical symptoms can then be determined based on the matched entities.
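The span searching approach of step S308 can be illustrated with a toy matcher that scores spans of one or two consecutive word segments against the entity list using word overlap. The threshold value and the example entities are assumptions for illustration; the end-to-end LSTM network mentioned in the text is not reproduced here.

```python
def recognize_symptoms(segments, entities, threshold=0.5):
    """A toy stand-in for step S308's span search: for each span of one
    or two consecutive segments, keep the best-scoring entity if its
    word-overlap score clears an assumed threshold."""
    def score(entity, span):
        entity_words, span_words = set(entity.split()), span.split()
        return sum(1 for w in span_words if w in entity_words) / len(span_words)

    found = set()
    for i in range(len(segments)):
        for width in (1, 2):
            span = " ".join(segments[i:i + width])
            best = max(entities, key=lambda e: score(e, span))
            if score(best, span) >= threshold:
                found.add(best)
    return found

print(recognize_symptoms(["headache", "running", "nose", "dizzy"],
                         ["headache", "running nose", "dizziness"]))
```

Note the limitation of pure word overlap: "dizzy" fails to match the entity "dizziness," which is why the text also contemplates semantic models for matching.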
In step S310, patient description processing unit 144 may make a preliminary diagnosis based on the medical symptoms recognized from patient description 103 and provide diagnosis result 105. For example, symptoms detected from patient description 103 may include "headache," "faint," and "running nose." Based on these symptoms, patient description processing unit 144 may pre-diagnose the illness sustained by the patient. For example, patient description processing unit 144 may predict that the patient likely has a flu. In some embodiments, patient description processing unit 144 may use a learning model to predict the illness based on the symptoms. For example, a convolutional neural network (CNN) learning model may be used. The learning model may be trained with sample symptoms of patients and the final diagnoses of the patients made by physicians. Based on the pre-diagnosis, patient description processing unit 144 provides diagnosis result 105, which may include, e.g., the diagnosed illness, the symptoms, and relevant patient descriptions and patient information.
Referring back to FIG. 1, diagnosis result 105 may be provided to patient 130 and/or a medical professional through a display 150. Display 150 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. The display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user. For example, the display may include a touch-sensitive material that is  substantially rigid, such as Gorilla Glass TM, or substantially pliable, such as Willow Glass TM. In some embodiments, display 150 may be part of patient terminal 120. Based on diagnosis result 105, patient terminal 120 may automatically instruct patient 130 to visit the appropriate physician or facility for further diagnosis.
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

  1. An artificial intelligence system for constructing a stop-word list used for processing patient descriptions, comprising:
    a patient interaction interface configured to receive sample patient descriptions;
    a storage device configured to store a plurality of entities corresponding to known medical symptoms; and
    a processor configured to:
    identify a candidate stop-word from the sample patient descriptions;
    identify matching entities, from the plurality of entities, that are matched with the candidate stop-word;
    determine similarity values between the candidate stop-word and the matching entities; and
    add the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
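As an illustrative, non-limiting sketch of the admission logic recited in claim 1 (the function and variable names below are hypothetical, not part of the application; the similarity measure is supplied by the caller):

```python
def maybe_add_stop_word(candidate, matching_entities, similarity, stop_words, threshold):
    """Add the candidate to the stop-word list only if it is not
    sufficiently similar to any matching symptom entity."""
    scores = [similarity(candidate, entity) for entity in matching_entities]
    # Admit the candidate when even the best match falls below the threshold
    # (or when no entity matched it at all).
    if not scores or max(scores) < threshold:
        stop_words.append(candidate)
    return stop_words
```

With a word-overlap similarity, a filler phrase such as "feel like" would be admitted as a stop-word, while a phrase closely matching a known symptom entity would be kept out of the list.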
  2. The artificial intelligence system of claim 1, wherein the processor is further configured to:
    segment the sample patient descriptions into word segments; and
    remove any word segment that is among the plurality of entities from the sample patient descriptions.
  3. The artificial intelligence system of claim 2, wherein the processor is further configured to:
    calculate a term frequency for each remaining word segment in the sample patient descriptions; and
    identify a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  4. The artificial intelligence system of claim 3, wherein the predetermined condition is the word segment being among the top N most frequent words based on its term frequency.
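The candidate-selection steps of claims 2 through 4 can be sketched as follows (a minimal illustration only; whitespace splitting stands in for the word segmentation step, and all names are hypothetical):

```python
from collections import Counter

def candidate_stop_words(descriptions, known_entities, top_n):
    """Segment sample descriptions, drop segments that are known
    symptom entities, and return the top-N most frequent remaining
    segments as candidate stop-words."""
    entity_set = set(known_entities)
    counts = Counter(
        word
        for text in descriptions
        for word in text.split()  # stand-in for a real word segmenter
        if word not in entity_set
    )
    return [word for word, _ in counts.most_common(top_n)]
```

Because known symptom entities are removed before counting, high-frequency medical terms cannot crowd out the filler words the list is meant to capture.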
  5. The artificial intelligence system of claim 1, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
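The ratio recited in claim 5 can be written out directly (an illustrative sketch; the function name is hypothetical):

```python
def similarity(candidate, entity):
    """Claim-5 ratio: the number of words shared between the candidate
    stop-word and the matching entity, divided by the number of words
    in the candidate stop-word."""
    cand_words = candidate.split()
    shared = set(cand_words) & set(entity.split())
    return len(shared) / len(cand_words)
```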
  6. The artificial intelligence system of claim 1, wherein the matching entities are matched with the candidate stop-word using unigram matching.
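Unigram matching, as recited in claim 6, can be read as retaining any entity that shares at least one word with the candidate. A minimal sketch under that reading (names hypothetical):

```python
def unigram_matches(candidate, entities):
    """Return the entities that share at least one word (unigram)
    with the candidate stop-word."""
    cand_words = set(candidate.split())
    return [e for e in entities if cand_words & set(e.split())]
```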
  7. The artificial intelligence system of claim 1, wherein the patient interaction interface is further configured to receive the patient descriptions for processing, and
    wherein the processor is further configured to filter the patient descriptions with the stop-word list.
  8. The artificial intelligence system of claim 1, wherein the patient interaction interface is a microphone configured to receive the sample patient descriptions in the form of audio, wherein the processor is further configured to transcribe the audio into text.
  9. An artificial intelligence method for constructing a stop-word list used for processing patient descriptions, comprising:
    receiving, by a patient interaction interface, sample patient descriptions;
    identifying, by a processor, a candidate stop-word from the sample patient descriptions;
    identifying matching entities, from a plurality of entities corresponding to known medical symptoms, that are matched with the candidate stop-word;
    determining, by the processor, similarity values between the candidate stop-word and the matching entities; and
    adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  10. The artificial intelligence method of claim 9, further comprising:
    segmenting the sample patient descriptions into word segments; and
    removing any word segment that is among the plurality of entities from the sample patient descriptions.
  11. The artificial intelligence method of claim 10, further comprising:
    calculating a term frequency for each remaining word segment in the sample patient descriptions; and
    identifying a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  12. The artificial intelligence method of claim 11, wherein the predetermined condition is the word segment being among the top N most frequent words based on its term frequency.
  13. The artificial intelligence method of claim 9, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
  14. The artificial intelligence method of claim 9, wherein the matching entities are matched with the candidate stop-word using unigram matching.
  15. The artificial intelligence method of claim 9, further comprising:
    receiving, by the patient interaction interface, the patient descriptions for processing, and
    filtering the patient descriptions with the stop-word list.
  16. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to perform an artificial intelligence method for constructing a stop-word list used for processing patient descriptions, the artificial intelligence method comprising:
    identifying a candidate stop-word from sample patient descriptions;
    identifying matching entities that are matched with the candidate stop-word;
    determining similarity values between the candidate stop-word and the matching entities; and
    adding the candidate stop-word to the stop-word list when the highest similarity value is lower than a threshold.
  17. The non-transitory computer-readable medium of claim 16, wherein the artificial intelligence method further comprises:
    segmenting the sample patient descriptions into word segments; and
    removing any word segment that is among a plurality of entities corresponding to known medical symptoms from the sample patient descriptions.
  18. The non-transitory computer-readable medium of claim 17, wherein the artificial intelligence method further comprises:
    calculating a term frequency for each remaining word segment in the sample patient descriptions; and
    identifying a word segment associated with the term frequency satisfying a predetermined condition as the candidate stop-word.
  19. The non-transitory computer-readable medium of claim 16, wherein the similarity value is calculated as a ratio between a first number of overlapping words between the candidate stop-word and the matching entity and a second number of words in the candidate stop-word.
  20. The non-transitory computer-readable medium of claim 16, wherein the artificial intelligence method further comprises filtering the patient descriptions for processing with the stop-word list.
Application PCT/CN2019/097534, filed 2019-07-24: Artificial intelligence system for processing patient descriptions (WO2021012222A1, en).

Publications (1)

Publication Number: WO2021012222A1 — Publication Date: 2021-01-28

Family ID: 74192963

Citations (6)

* Cited by examiner, † Cited by third party

- WO2012054657A2 * (Mobilemed Apps, LLC; priority 2010-10-20, published 2012-04-26): Mobile medical information system and methods of use
- CN106557653A * (Hefei University of Technology; priority 2016-11-15, published 2017-04-05): Portable intelligent medical guidance system and method
- CN107247868A * (Deep Thinking Artificial Intelligence Robot Technology (Beijing) Co., Ltd.; priority 2017-05-18, published 2017-10-13): Artificial-intelligence-assisted medical inquiry system
- CN108874773A * (Ping An Medical Technology Co., Ltd.; priority 2018-05-31, published 2018-11-23): Method, apparatus, computer device, and storage medium for adding new keywords
- CN109192211A * (Gree Electric Appliances, Inc. of Zhuhai; priority 2018-10-29, published 2019-01-11): Method, device, and equipment for voice signal recognition
- CN109346171A * (Beijing Huimei Cloud Technology Co., Ltd.; priority 2018-10-31, published 2019-02-15): Auxiliary diagnosis method, device, and computer device



Legal Events

- 121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19938988; country: EP; kind code: A1)
- NENP — Non-entry into the national phase (ref country code: DE)
- 122 — Ep: PCT application non-entry in European phase (ref document number: 19938988; country: EP; kind code: A1)