WO2020244534A1 - Medical question answering method, medical question answering system, electronic device and computer-readable storage medium


Publication number
WO2020244534A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
patient
medical
intention
answer
Prior art date
Application number
PCT/CN2020/094068
Other languages
English (en)
French (fr)
Inventor
胡玉兰
Original Assignee
京东方科技集团股份有限公司
Priority date
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Priority to US17/282,035 (US20210375404A1)
Publication of WO2020244534A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring

Definitions

  • the present disclosure relates to the field of Internet technology, and in particular to a medical question answering method, a medical question answering system, an electronic device and a computer-readable storage medium.
  • the embodiments of the present disclosure provide a medical question answering method, a medical question answering system, an electronic device, and a non-transitory computer-readable storage medium.
  • the first aspect of the present disclosure provides a medical question and answer method, including:
  • the synonym mapping table includes a mapping relationship between a plurality of standard expression words and their respective corresponding entity words;
  • the corresponding answer is output according to the semantic analysis result.
  • the recognizing the patient's intention according to the medical consultation sentence input by the patient includes:
  • the patient's intention is determined.
  • the extracting at least one entity word corresponding to the condition feature from the medical consultation sentence according to the patient's intention includes:
  • the semantic slot template including a plurality of semantic slots for characterizing the characteristics of the condition
  • the extracting entity words corresponding to the multiple semantic slots in the semantic slot template from the medical consultation sentence includes:
  • a sequence labeling model is used to perform sequence labeling on the medical consultation sentence, and the entity words corresponding to the multiple semantic slots in the semantic slot template are obtained according to the sequence labeling result.
  • the generating a semantic analysis result according to the patient's intention and the standard expression word includes:
  • the semantic analysis result is generated according to the patient's intention, each semantic slot and its filling value.
  • the output of the corresponding answer according to the semantic analysis result includes:
  • each of the sample groups including question samples and corresponding answer samples
  • the calculating the degree of matching between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base includes:
  • the matching degree is generated according to the similarity degree and the first weighting coefficient, and the correlation degree and the second weighting coefficient.
  • the disease characteristics include: at least one of onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
  • before recognizing the patient's intention according to the medical consultation sentence input by the patient, the medical question and answer method further includes:
  • the synonym mapping table is generated.
  • the second aspect of the present disclosure provides a medical question answering system, including:
  • the intention recognizer is used to recognize the patient’s intention according to the medical consultation sentence entered by the patient;
  • an entity word extractor, which is used to extract at least one entity word corresponding to a feature of the condition from the medical consultation sentence according to the patient's intention;
  • the standard word acquisition unit is configured to acquire standard expression words that are synonymous with each of the at least one entity word according to a preset synonym mapping table; wherein the synonym mapping table includes a mapping relationship between a plurality of standard expression words and their respective corresponding entity words;
  • a parser configured to generate a semantic analysis result according to the patient's intention and the standard expression words;
  • the output unit outputs the corresponding answer according to the semantic analysis result.
  • the intention recognizer is also used for:
  • the patient's intention is determined.
  • the entity word extractor includes:
  • a template obtaining unit configured to obtain a semantic slot template corresponding to the patient's intention, the semantic slot template including a plurality of semantic slots for characterizing the characteristics of the condition;
  • the recognition unit is configured to extract entity words corresponding to the multiple semantic slots in the semantic slot template from the medical consultation sentence.
  • the recognition unit is further configured to:
  • perform sequence labeling on the medical consultation sentence using a sequence labeling model, and obtain the entity words corresponding to the multiple semantic slots in the semantic slot template according to the sequence labeling result.
  • the parser includes:
  • the filling unit is used to fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots of the plurality of semantic slots;
  • a judging unit for judging whether there is an unfilled semantic slot in the semantic slot template
  • the inquiry unit is used to generate an inquiry question corresponding to the unfilled semantic slot when there is an unfilled semantic slot in the semantic slot template, and to fill the unfilled semantic slot according to the answer sentence entered by the patient for the inquiry question, until all the semantic slots of the current semantic slot template are filled;
  • the parsing unit is used to generate the semantic parsing result according to the patient's intention, each semantic slot and its filling value.
  • the output unit includes:
  • the matching degree calculator is used to calculate the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base, each sample group includes a question sample and its corresponding answer sample;
  • the output unit is used to output the answer sample corresponding to the maximum matching degree.
  • the matching degree calculator includes:
  • the generating subunit is configured to generate the matching degree according to the similarity and the first weighting coefficient, and the correlation and the second weighting coefficient.
  • the medical question answering system further includes:
  • a standard word database generator, used to generate a standard word database in which multiple standard expression word samples are stored;
  • a synonym collector, used to collect at least one synonym corresponding to each standard expression word sample;
  • a filter, used to calculate the similarity between each standard expression word sample and each of its corresponding synonyms, keep the synonyms whose similarity is greater than the preset value, and remove the synonyms whose similarity is less than or equal to the preset value;
  • a mapping table generator, used to generate the synonym mapping table according to each standard expression word sample and its corresponding, currently retained synonyms.
  • a third aspect of the present disclosure provides an electronic device including a memory and a processor, and a computer program is stored on the memory, wherein the computer program is executed by the processor to implement the method according to the first aspect of the present disclosure.
  • the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, any one of the medical question and answer methods described in the embodiments of the first aspect of the present disclosure is implemented.
  • FIG. 1 is a flowchart of a medical question-and-answer method provided by an embodiment of the disclosure
  • FIG. 3 is a schematic diagram of the process of generating a synonym mapping table provided by an embodiment of the disclosure
  • FIG. 4 is a schematic structural diagram of a medical question answering system provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic structural diagram of a medical question answering system provided by an embodiment of the disclosure.
  • the inventor of the present disclosure found that when a patient (or user) consults an online disease question and answer website, the question input by the patient is generally colloquial and can be phrased in many different ways, so the website (or the related question answering system) cannot answer the patient's question well.
  • Fig. 1 is a flowchart of a medical question-and-answer method provided by an embodiment of the disclosure.
  • the medical question answering method may be executed by a medical question answering system, the system may be implemented by software and/or hardware, and the system may be integrated in an electronic device.
  • the medical question-and-answer method may include the following steps S11 to S15.
  • Step S11 Identify the patient's intention according to the medical consultation sentence input by the patient (or user).
  • the types of the patient's intentions may include “disease diagnosis”, “treatment”, “medicine”, “medicine effect consultation”, “pathogenesis consultation”, “surgical consultation”, and the like.
  • step S11 may use a preset intention recognition model to determine the specific type of patient intention.
  • Step S12 According to the patient's intention, extract at least one entity word corresponding to the condition feature from the medical consultation sentence.
  • the patient's condition characteristics include at least one of onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
  • Each intention can correspond to one or more preset disease characteristics.
  • the entity word may be a word corresponding to at least one of the patient's onset symptoms, symptom occurrence time, symptom duration, accompanying symptoms, medical history, treatment history, and age.
  • the patient's intention is identified as “treatment”, and based on the intention of "treatment”, extract: corresponding to "patient age”
  • Step S13 Obtain a standard expression word that is synonymous with each of the at least one entity word according to a preset synonym mapping table.
  • the synonym mapping table may include a mapping relationship between a plurality of standard expression words and their respective corresponding synonyms (ie, the entity words).
  • the entity words extracted in step S12 can be colloquial words, such as a colloquial term for diarrhea, "cannot eat", or "bad appetite"; according to the synonym mapping table, the colloquial term for diarrhea corresponds to the standard expression word "diarrhea", and "cannot eat" and "bad appetite" both correspond to the standard expression word "anorexia".
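The lookup in step S13 can be sketched as a dictionary inversion. This is a minimal illustration rather than the disclosed implementation: the table contents (including "the runs" and "loose bowels"), the function names, and the choice to pass unknown words through unchanged are all assumptions made for the example.

```python
# Hypothetical excerpt of a synonym mapping table: each standard expression
# word maps to the colloquial entity words collected for it.
SYNONYM_TABLE = {
    "diarrhea": ["loose bowels", "the runs"],
    "anorexia": ["cannot eat", "bad appetite"],
}

def build_reverse_index(table):
    """Invert the table so each entity word points at its standard word."""
    index = {}
    for standard, synonyms in table.items():
        index[standard] = standard  # a standard word maps to itself
        for synonym in synonyms:
            index[synonym] = standard
    return index

def to_standard(entity_words, index):
    """Return the standard expression word for each entity word.

    Words missing from the table are returned unchanged (an assumption;
    the source does not say how unknown words are handled).
    """
    return [index.get(word, word) for word in entity_words]

INDEX = build_reverse_index(SYNONYM_TABLE)
standard_words = to_standard(["the runs", "bad appetite", "fever"], INDEX)
```

Inverting the table once keeps each lookup a constant-time dictionary access, which matches the later remark that standard expression words can be searched directly from the table.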
  • Step S14 Generate a semantic analysis result according to the patient's intention and standard expression words.
  • Step S15 Output a corresponding answer according to the semantic analysis result.
  • FIG. 2 is a flowchart of another medical question and answer method provided by an embodiment of the disclosure. As shown in FIG. 2, the medical question and answer method may include the following steps S21 to S25.
  • Step S21 Identify the patient's intention according to the medical consultation sentence input by the patient.
  • this step S21 may include the following steps S211 to S213.
  • Step S211 Obtain the document subject information of the medical consultation sentence input by the patient; and convert the medical consultation sentence input by the patient from text data into vector data.
  • a document topic generation (also called Latent Dirichlet Allocation, referred to as LDA) model may be used to generate document topic information of a medical consultation sentence, and the word2vec model may be used to convert the medical consultation sentence into an embedding vector.
  • Step S212 According to the document subject information and vector data corresponding to the medical consultation sentence, a score corresponding to each preset intention of the medical consultation sentence is obtained.
  • document subject information and vector data corresponding to medical consultation sentences can be spliced to obtain a vector matrix containing word information and subject information, and the vector matrix can be input to the bidirectional gated recurrent unit (BiGRU) to obtain the medical consultation sentence correspondence
  • Step S213 Determine the patient's intention according to the score of the medical consultation sentence corresponding to each preset intention.
  • a softmax classifier is used to map the score corresponding to each intent to a probability in the interval (0, 1), and the patient's intent is determined as the one with the maximum probability.
  • the softmax classifier is only an illustration; other classifiers, such as an SVM classifier, can also be applied.
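Step S213 can be illustrated with a small softmax sketch. The intent list is taken from the intention types named in the description; the raw scores and the function names are invented for illustration, and in the described model the scores would come from the BiGRU.

```python
import math

# Intent types listed in the description.
INTENTS = [
    "disease diagnosis",
    "treatment",
    "medicine consultation",
    "medicine effect consultation",
    "inquiry about the cause of disease",
    "surgical consultation",
    "other",
]

def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(scores, intents=INTENTS):
    """Return the intent with the maximum probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return intents[best], probs[best]

# Hypothetical scores for one medical consultation sentence.
example_scores = [0.2, 0.1, 2.5, 0.3, 0.0, 0.1, -1.0]
intent, probability = pick_intent(example_scores)
```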
  • step S21 may be performed using a preset intent recognition model, which may include a word2vec model, a document topic generation (LDA) model, a bidirectional gated recurrent unit (BiGRU), and a softmax classifier.
  • the intent recognition model of the required function can be obtained through training.
  • For training, doctor-patient question and answer data is collected from professional medical websites or apps (such as Haodafu (see www.haodf.com), Dingxiang Doctor (see www.dxy.com), Ping An Good Doctor (see www.jk.cn), etc.) or from medical inquiry records (records of patients' consultations with doctors); the patients' medical consultation sentences are extracted, and data cleaning is performed on the text of the medical consultation sentences (that is, non-keywords such as "Hello" are removed from the text).
  • A clustering algorithm is then used to cluster the text data, and the types of intentions commonly expressed by patients are determined by sampling; the specific type of each intention is determined by professionals (doctors or other professionals with medical knowledge). The intention recognition model is then trained on each medical consultation sentence and its corresponding intention type.
  • the types of intentions may include: “disease diagnosis”, “treatment”, “medicine consultation”, “medicine effect consultation”, “inquiry about the cause of disease”, “surgical consultation” and “other”.
  • if the intention recognition model determines that the patient's intention is "other", the system can directly prompt the user that it cannot answer such questions.
  • Step S22 According to the patient's intention, extract at least one entity word corresponding to the condition feature from the medical consultation sentence.
  • the disease characteristics include: at least one of onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
  • this step S22 may include the following steps S221 and S222.
  • Step S221 Obtain a semantic slot template corresponding to the patient's intention, and each semantic slot template includes a plurality of semantic slots for characterizing the characteristics of the condition.
  • the semantic slot template corresponding to each intent can be preset.
  • the semantic slot template corresponding to "medicine consultation” includes multiple semantic slots for representing "symptoms", “time when symptoms occur”, “accompanying symptoms”, “medical history”, and "treatment history”.
  • Step S222 Extract entity words corresponding to the semantic slot in the semantic slot template from the medical consultation sentence.
  • a named entity recognition method may be used to extract entity words corresponding to semantic slots in the semantic slot template from the medical consultation sentence.
  • step S222 may include: sequence labeling the medical consultation sentence using a sequence labeling model, and obtaining the entity word corresponding to the semantic slot in the semantic slot template according to the sequence labeling result.
  • the sequence annotation model may be the BiLSTM-CRF model, which uses the BIO annotation set to perform named entity recognition based on the name of the semantic slot.
  • for example, suppose the semantic slot template includes two semantic slots: "disease" and "symptom name". The tags are then as follows:
  • B-DIS represents the first word of the disease
  • I-DIS represents a non-first word of the disease
  • B-SYM represents the first word of the symptom
  • I-SYM represents a non-first word of the symptom
  • O means that the word is not part of a named entity.
  • a numbered variant of the tag set can distinguish further semantic slots:
  • B1-DIS can represent the first word of the disease
  • I1-DIS can represent a non-first word of the disease
  • B1-SYM represents the first word of the symptom
  • I1-SYM represents a non-first word of the symptom
  • B2-DIS represents the first word of the onset time
  • I2-DIS represents a non-first word of the onset time
  • B2-SYM represents the first word of the medication history
  • I2-SYM represents a non-first word of the medication history
  • O means that the word is not part of a named entity.
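Once a sequence labeling model has produced one tag per word, the entity words can be read off the tag sequence. The sketch below decodes the simpler B-DIS / I-DIS / B-SYM / I-SYM / O scheme; the tokens and tags are hand-written stand-ins for model output, and the function name is hypothetical.

```python
def decode_bio(tokens, tags):
    """Collect (label, entity words) pairs from a BIO-tagged sequence."""
    entities = []
    words, label = [], None

    def flush():
        # Close the entity currently being built, if any.
        if words:
            entities.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            words, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity and skip this token.
            flush()
            words, label = [], None
    flush()
    return entities

# Hand-written example; a trained BiLSTM-CRF model would supply the tags.
tokens = ["I", "caught", "a", "cold", "and", "have", "a", "dry", "throat"]
tags = ["O", "O", "O", "B-DIS", "O", "O", "O", "B-SYM", "I-SYM"]
entities = decode_bio(tokens, tags)
```

Each decoded label ("DIS", "SYM") identifies the semantic slot that the entity words belong to.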
  • the BiLSTM-CRF model can be obtained through training. During training, multiple sample sequences and their corresponding label sequences are prepared, each sample sequence having the same length as its label sequence; the sample sequence is used as the input of the initial BiLSTM-CRF model and the corresponding label sequence as its expected output, and the BiLSTM-CRF model with the required function is obtained after multiple rounds of training.
  • Step S23 Obtain a standard expression word that is synonymous with each of the at least one entity word according to a preset synonym mapping table.
  • the synonym mapping table includes a mapping relationship between a plurality of standard expression words and their corresponding synonyms (ie, the entity words).
  • the synonym mapping table may be provided before step S21.
  • FIG. 3 is a schematic diagram of a process of generating a synonym mapping table provided by an embodiment of the disclosure. As shown in FIG. 3, the process of generating the synonym mapping table may include the following steps S301 to S304.
  • Step S301 Generate a standard vocabulary.
  • the standard vocabulary stores multiple standard expression word samples.
  • Step S302 Collect at least one synonym corresponding to each standard expression word sample.
  • synonyms corresponding to a sample of standard expressions mean that they have the same or basically the same meaning as the standard expressions.
  • Synonyms corresponding to standard expression word samples can be collected from major medical websites, forums, Baidu Encyclopedia (see baike.baidu.com) and other websites. The synonyms collected in this step can be colloquial non-standard expression words.
  • standard expressions can be obtained from authoritative medical textbooks, dictionaries, manuals, etc., such as the diagnosis and treatment guidelines for various diseases issued by the medical and health management department, the clinical diagnosis and treatment guidelines issued by medical industry associations, the Physicians' Desk Reference (PDR), pharmacopoeias, etc.
  • Step S303 Calculate the similarity (such as cosine similarity) between each standard expression word sample and each of its corresponding synonyms; keep the synonyms whose similarity is greater than the preset value, and remove the synonyms whose similarity is less than or equal to the preset value.
  • an existing synonym recognition model (for example, word2vec) can be used to calculate the semantic similarity between words.
  • if the similarity is too small, it indicates that the collected synonym does not have the same meaning as the corresponding standard expression word sample, and the synonym can be removed.
  • the similarity may be a value between 0 and 1.
  • the preset value for determining the similarity between each standard expression word sample and its corresponding synonym can be set according to actual needs.
  • In the field of natural language processing, a variety of models for identifying synonyms have been developed, for example the Synonyms toolkit, the LRWE model, etc. The embodiments of the present disclosure may also use these known models to identify synonyms.
  • Step S304 Generate the synonym mapping table according to each standard expression word sample and its corresponding, currently retained synonyms.
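Steps S302 to S304 can be sketched as a filter over candidate synonyms. The two-dimensional vectors below are toy values chosen for illustration (real word vectors would come from a model such as word2vec), and the threshold of 0.8 and all function names are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_synonym_table(candidates, vectors, threshold):
    """Keep only synonyms whose similarity to the standard word exceeds threshold."""
    table = {}
    for standard, synonyms in candidates.items():
        table[standard] = [
            s for s in synonyms
            if cosine_similarity(vectors[standard], vectors[s]) > threshold
        ]
    return table

# Toy embedding vectors, invented for this example.
VECTORS = {
    "anorexia": [1.0, 0.0],
    "bad appetite": [0.9, 0.1],
    "headache": [0.0, 1.0],
}
# Candidate synonyms as collected in step S302; "headache" is a wrong candidate.
CANDIDATES = {"anorexia": ["bad appetite", "headache"]}

synonym_table = build_synonym_table(CANDIDATES, VECTORS, threshold=0.8)
```

With these toy vectors, "bad appetite" is retained and "headache" is removed, mirroring the rule that synonyms at or below the preset value are dropped.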
  • Table 2 exemplarily shows a part of the synonym mapping table.
  • in step S23, the standard expression words that are synonymous with the entity words can be looked up directly in the synonym mapping table.
  • Step S24 Generate a semantic analysis result according to the patient's intention and standard expression words.
  • step S24 specifically includes the following steps S241 to S244.
  • Step S241 Fill the standard expression words corresponding to the patient's medical consultation sentence into corresponding semantic slots among the multiple semantic slots of the current semantic slot template.
  • for example, the medical consultation sentence entered by the patient is "a cold, dry throat, what kind of medicine do you need to take", and the patient's intention can be identified as "medicine consultation" according to the medical consultation sentence.
  • the semantic slot template corresponding to this intention includes multiple semantic slots: "symptoms", "time when symptoms occurred", "accompanying symptoms", "medical history" and "treatment history".
  • the entity words related to "symptoms" are "cold, dry throat"; the standard expression word "dry throat" is obtained using the synonym mapping table, and "cold, dry throat" is then filled into the "symptoms" semantic slot.
  • Step S242 Determine whether there is an unfilled semantic slot in the current semantic slot template.
  • If the result of the judgment is that there is no unfilled semantic slot in the current semantic slot template, proceed to step S243, as described below.
  • If the result of the judgment is that there is an unfilled semantic slot in the current semantic slot template, proceed to step S244.
  • In step S244, an inquiry question corresponding to the unfilled semantic slot is generated, and the unfilled semantic slot is filled according to the answer sentence input by the patient for the inquiry question, until all the semantic slots are filled.
  • the medical consultation sentence entered by the patient for the first time may contain only a few disease characteristics, for example, only the symptoms and onset time; and in most cases, the time, characteristics, state and accompanying symptoms directly determine the likelihood that the patient has a certain disease.
  • vomiting is a common symptom. It may be a symptom caused by a cold, or it may be a symptom caused by other reasons. The time of vomiting is different, and the result of the diagnosis may be different.
  • the medical question and answer method breaks through the traditional single-round question and answer method, and realizes multiple rounds of interaction.
  • Step S243 Generate a semantic analysis result according to the patient's intention, each semantic slot and its filling value.
  • for example, the intent is "medicine consultation", and the semantic slots include "symptoms", "time when symptoms occur", "accompanying symptoms", "medical history" and "treatment history";
  • the semantic slot "symptoms" has a slot value of "headache";
  • the semantic slot "time when symptoms occur" has a slot value of "one day ago";
  • the semantic slot "accompanying symptoms" has a slot value of "retching";
  • the semantic slot "medical history" has a slot value of "three positives";
  • the semantic slot "treatment history" has a slot value of "anti-virus".
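The slot-filling loop of steps S241 to S244 and the resulting semantic analysis result can be sketched as follows. The template contents follow the "medicine consultation" example in the text; the `ask` callback, the question wording, and the scripted answers are hypothetical, and a real system would run each answer through entity extraction and synonym mapping before filling the slot.

```python
# Hypothetical preset templates: one list of semantic slot names per intent.
SLOT_TEMPLATES = {
    "medicine consultation": [
        "symptoms",
        "time when symptoms occurred",
        "accompanying symptoms",
        "medical history",
        "treatment history",
    ],
}

def parse_semantics(intent, standard_words, ask):
    """Fill the template for `intent`, asking about each unfilled slot.

    standard_words: slot name -> standard expression word from the first sentence.
    ask: callable(slot_name, question) -> the patient's answer (hypothetical hook).
    """
    template = SLOT_TEMPLATES[intent]
    slots = {name: standard_words.get(name) for name in template}
    for name in template:
        # Keep asking until the patient supplies a non-empty value.
        while not slots[name]:
            slots[name] = ask(name, f"Could you tell me about your {name}?").strip()
    return {"intent": intent, "slots": slots}

# Scripted answers stand in for a live patient in this sketch.
SCRIPTED = {
    "accompanying symptoms": "retching",
    "medical history": "three positives",
    "treatment history": "anti-virus",
}
asked = []

def scripted_ask(slot_name, question):
    asked.append(slot_name)
    return SCRIPTED[slot_name]

result = parse_semantics(
    "medicine consultation",
    {"symptoms": "headache", "time when symptoms occurred": "one day ago"},
    scripted_ask,
)
```

The returned dictionary pairs the intent with every slot and its filling value, which is the information the semantic analysis result is described as carrying.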
  • Step S25 Output a corresponding answer according to the semantic analysis result.
  • this step S25 may include the following steps S251 and S252.
  • Step S251 Calculate the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base, each sample group includes a question sample (that is, a sample of the question) and its corresponding answer sample (that is, a sample of the answer).
  • step S251 may specifically include the following steps S251a and S251b.
  • Step S251a Calculate the similarity between the semantic analysis result and the question sample, and the correlation between the semantic analysis result and the answer sample.
  • the similarity between the semantic analysis result and the question sample and the correlation between the semantic analysis result and the answer sample can all be calculated using existing correlation calculation methods, such as the BM25 algorithm.
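As one possible reading of "existing correlation calculation methods", the following is a compact Okapi BM25 scorer over pre-tokenized texts. The source only names BM25; the tokenization, the parameter values k1 = 1.5 and b = 0.75, the non-negative IDF variant, and the sample data are assumptions made for this sketch. It expects a non-empty list of non-empty documents.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in documents for term in set(d))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

# Hypothetical tokenized question samples from a knowledge base.
question_samples = [
    ["headache", "retching", "medicine"],
    ["cough", "fever", "medicine"],
    ["knee", "pain", "surgery"],
]
query = ["headache", "retching", "medicine"]
scores = bm25_scores(query, question_samples)
```

The same function could be applied to the answer samples to obtain the correlation values.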
  • Step S251b Generate a matching degree according to the similarity and the first weighting coefficient, and the correlation and the second weighting coefficient. That is, the matching degree is the sum of the product of the similarity and the first weighting coefficient and the product of the correlation and the second weighting coefficient.
  • the first weighting coefficient and the second weighting coefficient can be set according to actual needs; each of them is between 0 and 1, and their sum is equal to 1.
  • Step S252 Output the answer sample corresponding to the maximum matching degree.
  • since the matching degree is the weighted sum of the similarity between the semantic analysis result and the question sample and the correlation between the semantic analysis result and the answer sample, in the sample group with the largest matching degree both the similarity between the question sample and the semantic analysis result and the correlation between the answer sample and the semantic analysis result are relatively high.
  • Table 3 lists the similarity between the question corresponding to a semantic analysis result and each question sample and the correlation with each answer sample.
  • the question samples and answer samples in the same row in Table 3 are the same sample group.
  • the semantic analysis result has the highest similarity with the first question sample and the highest correlation with the first answer sample.
  • the semantic analysis result has the highest matching degree with the first sample group, so the first answer sample is output.
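Steps S251b and S252 amount to a weighted sum followed by an argmax. In the sketch below the similarity and correlation values, the answer labels, and the weights 0.6 and 0.4 are invented for illustration; they are not the contents of Table 3.

```python
def matching_degree(similarity, correlation, w1, w2):
    """Weighted sum of similarity and correlation; w1 and w2 must sum to 1."""
    if not (0 <= w1 <= 1 and 0 <= w2 <= 1) or abs(w1 + w2 - 1) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return w1 * similarity + w2 * correlation

def best_answer(sample_groups, w1, w2):
    """Return the answer sample of the group with the maximum matching degree."""
    best = max(
        sample_groups,
        key=lambda g: matching_degree(g["similarity"], g["correlation"], w1, w2),
    )
    return best["answer"]

# Hypothetical scores for three sample groups in the knowledge base.
SAMPLE_GROUPS = [
    {"answer": "answer sample 1", "similarity": 0.9, "correlation": 0.8},
    {"answer": "answer sample 2", "similarity": 0.6, "correlation": 0.7},
    {"answer": "answer sample 3", "similarity": 0.3, "correlation": 0.4},
]

chosen = best_answer(SAMPLE_GROUPS, w1=0.6, w2=0.4)
```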
  • the following example illustrates the medical question answering method.
  • the patient entered the medical consultation sentence as "cold, nasal congestion, headache, dry throat, sore back, pain in the temples, I started to get sick yesterday morning, probably a runny nose, a little pain in the head in the afternoon, and dry throat last night.
  • the stomach turned a bit, and then I had insomnia until more than two o’clock. I woke up this morning with a clear nose in one nostril, and a yellow nose in one nostril. It turned into a clear nose after rubbing it three or four times. This afternoon, I had a fever and a fever. Sweating. What medicine do I need to take?".
  • FIG. 4 is a schematic structural diagram of a medical question answering system provided by an embodiment of the disclosure.
  • the medical question answering system can be used to execute the above medical question answering method.
  • the medical question answering system may include: an intention recognizer 10, an entity word extractor 20, a standard word acquisition unit 30, a parser 40, and an output unit 50.
  • the intention recognizer 10 is used to recognize the patient's intention based on a medical consultation sentence input by the patient.
  • the intention recognizer 10 can be used to convert the medical consultation sentence input by the patient from text data into vector data; input the vector data into a preset intention recognition model to recognize the patient's intention.
  • the intention recognition model is a classification model based on a document topic generation model and a bidirectional gated recurrent unit.
  • the intention recognizer 10 is further used to: obtain document topic information of the medical consultation sentence input by the patient; convert the medical consultation sentence input by the patient from text data into vector data; obtain, according to the document topic information corresponding to the medical consultation sentence and the vector data, the score of the medical consultation sentence for each preset intention; and determine the patient's intention according to the score of the medical consultation sentence for each preset intention.
  • the entity word extractor 20 is used for extracting at least one entity word corresponding to the condition feature from the medical consultation sentence according to the patient's intention.
  • the characteristics of the condition include: at least one of onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
  • the standard word acquiring unit 30 is configured to acquire standard expression words that are synonymous with the entity word according to a preset synonym mapping table.
  • the synonym mapping table includes a mapping relationship between a plurality of standard expression words and their corresponding synonyms (ie, the entity words).
  • the parser 40 is used to generate a semantic analysis result according to the patient's intention and standard expression words.
  • the output unit 50 is configured to output a corresponding answer according to the semantic analysis result.
  • FIG. 5 is a schematic structural diagram of a medical question answering system provided by an embodiment of the disclosure.
  • in addition to the above-mentioned intention recognizer 10, entity word extractor 20, standard word acquisition unit 30, parser 40, and output unit 50, the medical question answering system also includes: a standard vocabulary generator 60, a synonym collector 70, a filter 80, and a mapping table generator 90.
  • the entity word extractor 20 includes a template acquisition unit 21 and a recognition unit 22.
  • the template obtaining unit 21 is used to obtain a semantic slot template corresponding to the patient's intention, and each semantic slot template includes a plurality of semantic slots for characterizing disease characteristics.
  • the recognition unit 22 is used to extract entity words corresponding to the semantic slot in the semantic slot template from the medical consultation sentence.
  • the recognition unit 22 may be configured to perform sequence labeling on the medical consultation sentence using a sequence labeling model, and obtain the entity words corresponding to the semantic slots in the semantic slot template according to the sequence labeling result.
  • the recognition unit may also be used to perform sequence labeling on the medical consultation sentence using a sequence labeling model, and obtain entity words corresponding to the plurality of semantic slots in the semantic slot template according to the sequence labeling result.
  • the parser 40 includes: a filling unit 41, a judgment unit 42, an inquiry unit 43 and a parsing unit 44.
  • the filling unit 41 is used to fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots of the multiple semantic slots of the current semantic slot template.
  • the judging unit 42 is used to judge whether there is an unfilled semantic slot in the current semantic slot template.
  • the inquiring unit 43 is used to generate, when there is an unfilled semantic slot in the current semantic slot template, an inquiry question corresponding to the unfilled semantic slot, and to fill the unfilled semantic slot according to the answer sentence entered by the patient in response to the inquiry question, until all semantic slots in the current semantic slot template are filled.
  • the parsing unit 44 is used to generate a semantic parsing result according to the patient's intention, each semantic slot and its filling value.
  • the output unit 50 includes: a matching degree calculator 51 and an output unit 52.
  • the matching degree calculator 51 is used to calculate the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base.
  • Each sample group includes question samples and their corresponding answer samples.
  • the matching degree calculator 51 includes: a calculation subunit 511 and a generation subunit 512.
  • the calculation subunit 511 is configured to calculate the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample.
  • the generating subunit 512 is configured to generate the matching degree according to the similarity and the first weighting coefficient, and the relevance and the second weighting coefficient.
  • the output unit 52 is configured to output the answer sample corresponding to the maximum matching degree.
  • the standard vocabulary generator 60 is used to generate a standard vocabulary, and a plurality of standard expression word samples are stored in the standard vocabulary.
  • the synonym collector 70 is used to collect at least one synonym corresponding to each standard expression word sample.
  • the filter 80 is used to calculate the similarity between each standard expression word sample and its corresponding synonyms, retain the synonyms whose similarity is greater than the preset value, and remove the synonyms whose similarity is less than or equal to the preset value.
  • the preset value may be 0.5, 0.6, 0.7, 0.8, 0.9, etc.
  • the mapping table generator 90 is used to generate the synonym mapping table according to each standard expression word sample and its corresponding, currently retained synonyms.
  • the medical question answering system shown in FIG. 4 or FIG. 5 may be a single computer or a single computing device, or multiple computers or multiple computing devices connected through a wired network and/or a wireless network.
  • the various components of the medical question-and-answer system shown in FIG. 4 or FIG. 5 may be implemented in hardware, or in a combination of hardware and software.
  • the various components of the medical question answering system shown in FIG. 4 or FIG. 5 can be implemented by a central processing unit (CPU), an application processor (AP), a digital signal processor (DSP), a field programmable gate array (FPGA), a microprocessor (MCU), an integrated circuit (IC), or an application specific integrated circuit (ASIC) having the corresponding functions described in the embodiments of the present disclosure.
  • the various components of the medical question answering system shown in FIG. 4 or FIG. 5 can be implemented by a combination of a processor, a memory, and a computer program.
  • the computer program is stored in the memory, and the processor reads the computer program from the memory and executes it, thereby serving as each component of the medical question answering system shown in FIG. 4 or FIG. 5.
  • the embodiments of the present disclosure also provide an electronic device, the electronic device includes: one or more processors and a storage device; wherein one or more programs are stored on the storage device, and when the one or more programs are executed by the one or more processors, the one or more processors implement the medical question answering method provided in the foregoing embodiments.
  • the embodiments of the present disclosure also provide a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to implement the medical question and answer method provided in the foregoing embodiments.
  • the word embedding technology, word2vec model, softmax classifier, LDA model, BiLSTM (Bi-directional Long Short-Term Memory) model, bidirectional gated recurrent unit (BiGRU), BiLSTM-CRF model, BIO annotation set, BM25 algorithm, etc. are all known technologies in the fields of artificial intelligence and natural language processing.
  • for the word embedding technology and the word2vec model, please refer to Mikolov T, Chen K, Corrado G S, et al. Efficient Estimation of Word Representations in Vector Space[C]. International Conference on Learning Representations, 2013.
  • for the bidirectional gated recurrent unit (BiGRU), please refer to Cho K, Van Merrienboer B, Gulcehre C, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation[J]. arXiv: Computation and Language, 2014.
  • for the BiLSTM-CRF model, please refer to Huang Z, Xu W, Yu K, et al. Bidirectional LSTM-CRF Models for Sequence Tagging[J]. arXiv: Computation and Language, 2015.
  • for the BIO annotation set, please refer to Sang E F, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition[C]. North American Chapter of the Association for Computational Linguistics, 2003: 142-147.
  • some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or a microprocessor (MCU), or implemented as hardware, or implemented as an integrated circuit, such as an application specific integrated circuit (ASIC).
  • the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, tape, magnetic disk storage or other magnetic storage device, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Machine Translation (AREA)

Abstract

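A medical question answering method, a system, an electronic device, and a computer-readable storage medium. The method includes: recognizing a patient's intention based on a condition description sentence input by the patient (S11); extracting, according to the patient's intention, at least one entity word corresponding to a condition feature from the condition description sentence (S12); acquiring, according to a preset synonym mapping table, a standard expression word synonymous with each of the at least one entity word (S13), wherein the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding entity words; generating a semantic analysis result according to the patient's intention and the standard expression words (S14); and outputting a corresponding answer according to the semantic analysis result (S15).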

Description

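Medical question answering method, medical question answering system, electronic device, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201910484808.4, filed on June 5, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of Internet technology, and in particular to a medical question answering method, a medical question answering system, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of the Internet, many online disease question-and-answer websites have appeared in the health-related medical field, and they can give patients constructive diagnostic suggestions at an early stage. However, because patients' consultations are colloquial and their descriptions vary widely, the questions they input must be understood correctly before they can be answered.
Summary
Embodiments of the present disclosure provide a medical question answering method, a medical question answering system, an electronic device, and a non-transitory computer-readable storage medium.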
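A first aspect of the present disclosure provides a medical question answering method, including:
recognizing a patient's intention based on a medical consultation sentence input by the patient;
extracting, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence;
acquiring, according to a preset synonym mapping table, a standard expression word synonymous with each of the at least one entity word, wherein the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding entity words;
generating a semantic analysis result according to the patient's intention and the standard expression words; and
outputting a corresponding answer according to the semantic analysis result.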
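In one embodiment, recognizing the patient's intention based on the medical consultation sentence input by the patient includes:
acquiring document topic information of the medical consultation sentence input by the patient;
converting the medical consultation sentence input by the patient from text data into vector data;
obtaining, according to the document topic information corresponding to the medical consultation sentence and the vector data, a score of the medical consultation sentence for each preset intention; and
determining the patient's intention according to the score of the medical consultation sentence for each preset intention.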
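In one embodiment, extracting, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence includes:
acquiring a semantic slot template corresponding to the patient's intention, the semantic slot template including a plurality of semantic slots for characterizing condition features; and
extracting, from the medical consultation sentence, entity words corresponding to the plurality of semantic slots in the semantic slot template.
In one embodiment, extracting, from the medical consultation sentence, the entity words corresponding to the plurality of semantic slots in the semantic slot template includes:
performing sequence labeling on the medical consultation sentence using a sequence labeling model, and obtaining, according to the sequence labeling result, the entity words corresponding to the plurality of semantic slots in the semantic slot template.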
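In one embodiment, generating the semantic analysis result according to the patient's intention and the standard expression words includes:
filling the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots;
judging whether there is an unfilled semantic slot in the current semantic slot template;
if there is an unfilled semantic slot in the current semantic slot template, generating an inquiry question corresponding to the unfilled semantic slot, and filling the unfilled semantic slot according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots of the current semantic slot template are filled; and
generating the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.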
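In one embodiment, outputting the corresponding answer according to the semantic analysis result includes:
calculating a matching degree between the semantic analysis result and each sample group in a doctor-patient question and answer knowledge base, each sample group including a question sample and its corresponding answer sample; and
outputting the answer sample corresponding to the maximum matching degree.
In one embodiment, calculating the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base includes:
calculating the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample; and
generating the matching degree according to the similarity and a first weighting coefficient, and the relevance and a second weighting coefficient.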
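In one embodiment, the condition features include at least one of: onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
In one embodiment, before recognizing the patient's intention based on the medical consultation sentence input by the patient, the medical question answering method further includes:
generating a standard vocabulary in which a plurality of standard expression word samples are stored;
collecting at least one synonym corresponding to each standard expression word sample;
calculating the similarity between each standard expression word sample and its corresponding synonyms, retaining the synonyms whose similarity is greater than a preset value, and removing the synonyms whose similarity is less than or equal to the preset value; and
generating the synonym mapping table according to each synonym and its corresponding, currently retained synonyms.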
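A second aspect of the present disclosure provides a medical question answering system, including:
an intention recognizer, used to recognize a patient's intention based on a medical consultation sentence input by the patient;
an entity word extractor, used to extract, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence;
a standard word acquisition unit, used to acquire, according to a preset synonym mapping table, a standard expression word synonymous with each of the at least one entity word, wherein the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding entity words;
a parser, used to generate a semantic analysis result according to the patient's intention and the standard expression words; and
an output unit, which outputs a corresponding answer according to the semantic analysis result.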
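In one embodiment, the intention recognizer is further used to:
acquire document topic information of the medical consultation sentence input by the patient;
convert the medical consultation sentence input by the patient from text data into vector data;
obtain, according to the document topic information corresponding to the medical consultation sentence and the vector data, a score of the medical consultation sentence for each preset intention; and
determine the patient's intention according to the score of the medical consultation sentence for each preset intention.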
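In one embodiment, the entity word extractor includes:
a template acquisition unit, used to acquire a semantic slot template corresponding to the patient's intention, the semantic slot template including a plurality of semantic slots for characterizing condition features; and
a recognition unit, used to extract, from the medical consultation sentence, entity words corresponding to the plurality of semantic slots in the semantic slot template.
In one embodiment, the recognition unit is further used to:
perform sequence labeling on the medical consultation sentence using a sequence labeling model, and obtain, according to the sequence labeling result, the entity words corresponding to the plurality of semantic slots in the semantic slot template.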
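In one embodiment, the parser includes:
a filling unit, used to fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots;
a judgment unit, used to judge whether there is an unfilled semantic slot in the semantic slot template;
an inquiry unit, used to generate, when there is an unfilled semantic slot in the semantic slot template, an inquiry question corresponding to the unfilled semantic slot, and to fill the unfilled semantic slot according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots of the current semantic slot template are filled; and
a parsing unit, used to generate the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.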
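In one embodiment, the output unit includes:
a matching degree calculator, used to calculate a matching degree between the semantic analysis result and each sample group in a doctor-patient question and answer knowledge base, each sample group including a question sample and its corresponding answer sample; and
an output unit, used to output the answer sample corresponding to the maximum matching degree.
In one embodiment, the matching degree calculator includes:
a calculation subunit, used to calculate the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample; and
a generation subunit, used to generate the matching degree according to the similarity and a first weighting coefficient, and the relevance and a second weighting coefficient.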
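In one embodiment, the medical question answering system further includes:
a standard vocabulary generator, used to generate a standard vocabulary in which a plurality of standard expression word samples are stored;
a synonym collector, used to collect at least one synonym corresponding to each standard expression word sample;
a filter, used to calculate the similarity between each standard expression word sample and its corresponding synonyms, retain the synonyms whose similarity is greater than a preset value, and remove the synonyms whose similarity is less than or equal to the preset value; and
a mapping table generator, used to generate the synonym mapping table according to each synonym and its corresponding, currently retained synonyms.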
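A third aspect of the present disclosure provides an electronic device including a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, implements the medical question answering method according to any one of the embodiments of the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the medical question answering method according to any one of the embodiments of the first aspect of the present disclosure.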
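Brief Description of the Drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below they serve to explain the present disclosure, but they do not limit it. In the drawings:
FIG. 1 is a flowchart of a medical question answering method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of another medical question answering method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of generating a synonym mapping table provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a medical question answering system provided by an embodiment of the present disclosure; and
FIG. 5 is a schematic structural diagram of a medical question answering system provided by an embodiment of the present disclosure.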
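Detailed Description
To help those skilled in the art better understand the technical solutions of the present disclosure, the medical question answering method, medical question answering system, electronic device, and computer-readable storage medium provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments are described more fully below with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure is thorough and complete and fully conveys its scope to those skilled in the art. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is only for describing particular embodiments and is not intended to limit the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The embodiments described herein may be described with reference to plan views and/or cross-sectional views by way of idealized schematic diagrams of the present disclosure. Accordingly, the example illustrations may be modified according to manufacturing techniques and/or tolerances. The embodiments are therefore not limited to those shown in the drawings, but include modifications of configurations formed on the basis of manufacturing processes. The regions illustrated in the drawings are thus schematic in nature, and the shapes of the regions shown illustrate specific shapes of regions of elements but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventors of the present disclosure found that when a patient (or user) consults a related online disease question-and-answer website, the questions the patient inputs are generally colloquial and described in diverse ways, so the website (or the related question answering system) cannot answer them well.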
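FIG. 1 is a flowchart of a medical question answering method provided by an embodiment of the present disclosure. For example, the method may be executed by a medical question answering system, which may be implemented in software and/or hardware and may be integrated in an electronic device. As shown in FIG. 1, the medical question answering method may include the following steps S11 to S15.
Step S11: recognize the patient's intention based on a medical consultation sentence input by the patient (or user).
For example, the types of patient intention may include "disease diagnosis", "treatment", "medication", "medication effect consultation", "cause-of-disease consultation", "surgery consultation", and so on.
For example, step S11 may use a preset intention recognition model to determine the specific type of the patient's intention.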
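Step S12: according to the patient's intention, extract at least one entity word corresponding to a condition feature from the medical consultation sentence.
In some embodiments, the patient's condition features include at least one of: onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age. Each intention may correspond to one or more preset condition features. For example, an entity word may be a word corresponding to at least one of the patient's onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and age.
For example, if the medical consultation sentence input by the patient is "What should an adult do after two days of a 38.5-degree fever?", the patient's intention is recognized as "treatment". According to this intention, the following are extracted: the entity word "adult" corresponding to "patient age", the entity word "fever" corresponding to "onset symptom", the entity word "two days" corresponding to "symptom duration", and so on.
Step S13: according to a preset synonym mapping table, acquire a standard expression word synonymous with each of the at least one entity word. For example, the synonym mapping table may include mapping relationships between a plurality of standard expression words and their respective corresponding synonyms (that is, the entity words).
For example, the entity words extracted in step S12 may be colloquial words, such as "拉肚子" (loose bowels), "吃不下饭" (cannot eat), "胃口不好" (poor appetite), and "食欲不好" (no appetite). According to the synonym mapping table, the standard expression word corresponding to "拉肚子" is "腹泻" (diarrhea), and the standard expression word corresponding to "吃不下饭", "胃口不好", and "食欲不好" is "厌食" (anorexia).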
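Step S14: generate a semantic analysis result according to the patient's intention and the standard expression words.
Step S15: output a corresponding answer according to the semantic analysis result.
In related medical question answering systems, because patients' consultations are colloquial and their descriptions vary widely, the true meaning of what the patient said cannot be determined accurately, and so the question cannot be answered accurately. In the embodiments of the present disclosure, by contrast, after the words relating to diseases, symptoms, and symptom characteristics (that is, the entity words) are extracted from the medical consultation sentence input by the patient, they are converted into standard expression words, which helps the system give an accurate answer.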
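FIG. 2 is a flowchart of another medical question answering method provided by an embodiment of the present disclosure. As shown in FIG. 2, the medical question answering method may include the following steps S21 to S25.
Step S21: recognize the patient's intention based on the medical consultation sentence input by the patient.
In some embodiments, step S21 may include the following steps S211 to S213.
Step S211: acquire document topic information of the medical consultation sentence input by the patient, and convert the medical consultation sentence input by the patient from text data into vector data.
Optionally, a document topic generation model (also known as Latent Dirichlet Allocation, LDA for short) may be used to generate the document topic information of the medical consultation sentence, and a word2vec model may be used to convert the medical consultation sentence into word embedding vectors.
Step S212: according to the document topic information corresponding to the medical consultation sentence and the vector data, obtain a score of the medical consultation sentence for each preset intention.
For example, the document topic information and the vector data corresponding to the medical consultation sentence may be concatenated to obtain a vector matrix containing both word information and topic information, and this vector matrix may be input to a bidirectional gated recurrent unit (BiGRU) to obtain the score of the medical consultation sentence for each preset intention. Each preset intention may be obtained through prior learning.
Step S213: determine the patient's intention according to the score of the medical consultation sentence for each preset intention.
For example, a softmax classifier maps the score for each intention to a probability in (0, 1), and the patient's intention is determined according to the maximum probability. The softmax classifier here is only illustrative; other classifiers, such as an SVM classifier, may also be applied.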
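Step S213 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the intention names and raw scores are made up, and `pick_intent` is a hypothetical helper that stands in for the final layer after the BiGRU.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_intent(scores_by_intent):
    """Return the intention with the largest softmax probability (hypothetical helper)."""
    intents = list(scores_by_intent)
    probs = softmax([scores_by_intent[name] for name in intents])
    best = max(range(len(intents)), key=lambda k: probs[k])
    return intents[best], probs[best]


# Made-up scores standing in for the BiGRU output of step S212.
scores = {
    "disease diagnosis": 0.3,
    "treatment": 2.1,
    "medication consultation": 1.2,
}
intent, prob = pick_intent(scores)
print(intent, round(prob, 3))
```

Because softmax is monotonic, the selected intention is the same as the one with the largest raw score; the probabilities are mainly useful when a confidence value is needed, for example to fall back to the "other" intention.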
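For example, step S21 may be performed using a preset intention recognition model, which may include a word2vec model, a document topic generation (LDA) model, a bidirectional gated recurrent unit (BiGRU), and a softmax classifier.
An intention recognition model with the required function can be obtained by training. For training, samples are gathered by collecting doctor-patient question and answer data from professional medical websites or apps (such as Haodf (好大夫, see www.haodf.com), DXY (丁香医生, see www.dxy.com), and Ping An Good Doctor (平安好医生, see www.jk.cn)) or from medical consultation records (records of consultations between patients and doctors). The patients' medical consultation sentences are extracted from this data, and their text is cleaned (that is, non-key words such as "hello" are removed). A clustering algorithm is then applied to the text data, and the types of intention that patients commonly ask about are determined by sampling; professionals (doctors or people with medical knowledge) determine the specific type of each class of intention. The intention recognition model is then trained on each medical consultation sentence and its corresponding intention type.
Table 1 shows examples of some of the collected medical consultation sentences and their corresponding intention types.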
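Table 1
Medical consultation sentence | Intention type
What should I do if the inflammation from severe pancreatitis has not improved? | Treatment
I have polycystic ovaries and took letrozole; I want to confirm whether I ovulated | Medication effect consultation
Red rashes have appeared one after another in various places on my body | Disease diagnosis
Recovery time and medication for lumbar muscle strain | Medication consultation
What is going on, and what caused the fever? | Cause-of-disease inquiry
Are the symptoms of eye squeezing, mouth opening, and head nodding serious? | Disease diagnosis
Polycystic ovaries; no period seven days after taking progesterone | Medication effect consultation
What should I do if I have multi-follicular ovarian cysts and want a child? | Treatment
A two-and-a-half-year-old child has frictional lichenoid dermatitis; what medication should be used? | Medication consultation
Can dental implants be done when multiple teeth are missing? | Surgery consultation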
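In some embodiments, the types of intention may include "disease diagnosis", "treatment", "medication consultation", "medication effect consultation", "cause-of-disease inquiry", "surgery consultation", and "other". When the intention recognition model determines that the patient's intention is "other", the user may be told directly that this type of question cannot be answered.
Step S22: according to the patient's intention, extract at least one entity word corresponding to a condition feature from the medical consultation sentence.
Illustratively, the condition features include at least one of: onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
In some embodiments, step S22 may include the following steps S221 and S222.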
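Step S221: acquire a semantic slot template corresponding to the patient's intention; each semantic slot template includes a plurality of semantic slots for characterizing condition features.
For example, the semantic slot template corresponding to each intention may be preset. For example, the semantic slot template corresponding to "medication consultation" includes a plurality of semantic slots for characterizing "symptom", "symptom onset time", "accompanying symptoms", "medical history", and "treatment history".
Step S222: extract, from the medical consultation sentence, entity words corresponding to the semantic slots in the semantic slot template.
In some embodiments, a named entity recognition method may be used to extract, from the medical consultation sentence, the entity words corresponding to the semantic slots in the semantic slot template.
Specifically, step S222 may include: performing sequence labeling on the medical consultation sentence using a sequence labeling model, and obtaining, according to the sequence labeling result, the entity words corresponding to the semantic slots in the semantic slot template.
For example, the sequence labeling model may be a BiLSTM-CRF model, which uses the BIO annotation set to perform named entity recognition based on the names of the semantic slots. For example, if the semantic slot template includes two semantic slots, "disease" and "symptom name", then when labeling with the BIO annotation set, B-DIS denotes the first character of a disease, I-DIS a non-first character of a disease, B-SYM the first character of a symptom, I-SYM a non-first character of a symptom, and O a character that is not part of any named entity. When the semantic slot template includes a different number of semantic slots, such as "disease", "onset time", "medication history", and "symptom name", B1-DIS may denote the first character of a disease, I1-DIS a non-first character of a disease, B1-SYM the first character of a symptom, I1-SYM a non-first character of a symptom, B2-DIS the first character of an onset time, I2-DIS a non-first character of an onset time, B2-SYM the first character of a medication history, and I2-SYM a non-first character of a medication history; O denotes a character that is not part of any named entity.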
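To make the BIO scheme concrete, the sketch below turns a character-level tag sequence into (slot label, entity word) pairs. It assumes the tags have already been predicted by some sequence labeling model; the function name `decode_bio` and the example sentence are illustrative and not taken from the source.

```python
def decode_bio(chars, tags):
    """Group characters into entities from B-/I-/O tags.

    A "B-X" tag opens a new entity of type X, a following "I-X" tag extends
    it, and anything else (including an I- tag that does not continue the
    open entity) closes the current entity.
    """
    entities = []
    label, buf = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if buf:
                entities.append((label, "".join(buf)))
            label, buf = tag[2:], [ch]
        elif tag.startswith("I-") and buf and tag[2:] == label:
            buf.append(ch)
        else:
            if buf:
                entities.append((label, "".join(buf)))
            label, buf = None, []
    if buf:
        entities.append((label, "".join(buf)))
    return entities


# Illustrative sentence "感冒引起头痛" (a cold causing a headache), hand-labeled.
chars = list("感冒引起头痛")
tags = ["B-DIS", "I-DIS", "O", "O", "B-SYM", "I-SYM"]
print(decode_bio(chars, tags))
```

Each returned label (here DIS and SYM) identifies the semantic slot that the extracted entity word belongs to.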
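For example, the BiLSTM-CRF model may be obtained by training. During training, a plurality of sample sequences and their respective labeling sequences are set up, each sample sequence having the same length as its labeling sequence. The sample sequence is used as the input of the initial BiLSTM-CRF model and the corresponding labeling sequence as its output, and a BiLSTM-CRF model with the required function is obtained through repeated training.
Step S23: according to a preset synonym mapping table, acquire a standard expression word synonymous with each of the at least one entity word. For example, the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding synonyms (that is, the entity words).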
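For example, the synonym mapping table may be provided before step S21. FIG. 3 is a schematic flowchart of generating a synonym mapping table provided by an embodiment of the present disclosure. As shown in FIG. 3, the process of generating the synonym mapping table may include the following steps S301 to S304.
Step S301: generate a standard vocabulary in which a plurality of standard expression word samples (that is, samples of standard expression words) are stored.
Step S302: collect at least one synonym corresponding to each standard expression word sample.
For example, a synonym corresponding to a standard expression word sample is a word whose meaning is the same or substantially the same as that of the standard expression word. Synonyms corresponding to the standard expression word samples may be collected from major medical websites, forums, Baidu Baike (百度百科, see baike.baidu.com), and similar sites; the synonyms collected in this step may be colloquial, non-standard expressions.
For example, standard expression words may be obtained from authoritative medical textbooks, dictionaries, and manuals, such as diagnosis and treatment guidelines for various diseases issued by health administration departments, clinical guidelines issued by medical industry associations, the Physician's Desk Reference (PDR), and pharmacopoeias.
Step S303: calculate the similarity (for example, cosine similarity) between each standard expression word sample and its corresponding synonyms; retain the synonyms whose similarity is greater than a preset value, and remove the synonyms whose similarity is less than or equal to the preset value.
For example, an existing synonym recognition model may be used to calculate the similarity (for example, word2vec may be used to calculate the semantic similarity between words). If the similarity is too small, the standard expression word sample and the collected synonym do not express the same meaning, and the synonym is simply removed. For example, the similarity may be a value between 0 and 1.
For example, the preset value used to judge the similarity between each standard expression word sample and its corresponding synonyms may be set according to actual needs.
In the field of natural language processing, a variety of models for recognizing synonyms have been developed, for example the Synonyms toolkit and the LRWE model. The embodiments of the present disclosure may also use these known models to recognize synonyms.
Step S304: generate the synonym mapping table according to each standard expression word sample (that is, each sample of a standard expression word) and its corresponding, currently retained synonyms.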
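Steps S303 and S304 can be sketched as below. The two-dimensional word vectors are toy values invented for illustration (a real system would use, for example, word2vec embeddings), the default threshold of 0.7 is just one of the preset values the text mentions, and `build_synonym_map` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_synonym_map(candidates, vectors, threshold=0.7):
    """Keep synonyms whose similarity exceeds the threshold (S303) and
    map each retained synonym to its standard expression word (S304)."""
    table = {}
    for standard, synonyms in candidates.items():
        for syn in synonyms:
            if cosine(vectors[standard], vectors[syn]) > threshold:
                table[syn] = standard
    return table


# Toy vectors, assumed for illustration only.
vectors = {
    "腹泻": [1.0, 0.0],    # diarrhea (standard expression word)
    "拉肚子": [0.9, 0.1],  # colloquial synonym, close to the standard word
    "牙疼": [0.0, 1.0],    # toothache, wrongly collected as a synonym
}
candidates = {"腹泻": ["拉肚子", "牙疼"]}
table = build_synonym_map(candidates, vectors)
print(table)
```

Storing the table keyed by synonym makes the later lookup in step S23 a single dictionary access, for example `table.get("拉肚子")`.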
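Table 2 shows, by way of example, part of a synonym mapping table.
Table 2
Figure PCTCN2020094068-appb-000001
In step S23, the standard expression word synonymous with an entity word can be looked up directly in the synonym mapping table.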
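Step S24: generate a semantic analysis result according to the patient's intention and the standard expression words.
In some embodiments, step S24 specifically includes the following steps S241 to S244.
Step S241: fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots of the current semantic slot template.
For example, the medical consultation sentence input by the patient is "I have a cold and a dry throat; what medicine should I take?". From this sentence the patient's intention can be recognized as "medication consultation", and the semantic slots in the corresponding semantic slot template are "symptom", "symptom onset time", "accompanying symptoms", "medical history", and "treatment history". Named entity recognition on the sentence yields the entity words for "symptom": "感冒" (cold) and "嗓子发干" (a colloquial expression for a dry throat). The synonym mapping table gives "喉咙干" (dry throat) as the standard expression word for "嗓子发干", so "感冒、喉咙干" (cold, dry throat) is filled into the "symptom" semantic slot.
Step S242: judge whether there is an unfilled semantic slot in the current semantic slot template.
If there is no unfilled semantic slot in the current semantic slot template, proceed to step S243, described below.
If there is an unfilled semantic slot in the current semantic slot template, proceed to step S244. In step S244, an inquiry question corresponding to the unfilled semantic slot is generated, and the unfilled semantic slot is filled according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots are filled.
In some practical scenarios, the first medical consultation sentence input by the patient may contain only a few condition features, for example only the symptom and the onset time. In most cases, however, the time at which the symptoms appeared, their characteristics, their state, and the accompanying symptoms directly determine how likely the patient is to have a given disease. For example, vomiting is a common symptom that may be caused by a cold or by other things, and different vomiting times may lead to different diagnoses. In the embodiments of the present disclosure, when the user's medical consultation sentence contains too little useful information to fill all semantic slots in the semantic slot template, inquiry questions can be output to the user (or patient) to obtain more complete information. In this way, the medical question answering method according to the embodiments of the present disclosure goes beyond the traditional single-round question answering approach and achieves multi-round interaction.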
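The fill, check, and ask loop of steps S241, S242, and S244 might look like the sketch below. It makes several simplifying assumptions that are not in the source: the patient's answer is used directly as the slot value (the described method would run entity extraction and synonym normalization on it first), the question wording is invented, and `max_attempts` is an added guard so the loop cannot run forever on empty answers.

```python
def fill_slots(template, extracted, ask, max_attempts=3):
    """Fill every slot in `template`, asking follow-up questions for missing ones.

    template  -- list of slot names for the recognized intention
    extracted -- dict of slot name -> value found in the first sentence
    ask       -- callable taking a question string and returning the answer
    """
    slots = {name: extracted.get(name) for name in template}
    for name in template:
        attempts = 0
        while slots[name] is None and attempts < max_attempts:
            answer = ask(f"Could you tell me about your {name}?")
            slots[name] = answer.strip() or None  # treat blank answers as unfilled
            attempts += 1
    return slots


# Scripted answers stand in for a real patient; the first one is blank.
scripted = iter(["", "hepatitis B", "antiviral therapy"])
template = ["symptom", "medical history", "treatment history"]
extracted = {"symptom": "cold, dry throat"}
result = fill_slots(template, extracted, ask=lambda question: next(scripted))
print(result)
```

Passing `ask` as a callable keeps the dialogue channel (console, chat widget, test script) separate from the slot-filling logic.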
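Step S243: generate the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.
For example, the semantic analysis result may take the triple form act(slot1=value1, slot2=value2, ...), where act denotes the intention, slot1 and slot2 are semantic slots, and value1 and value2 are the slot values filled into them. For example, suppose the intention is "medication consultation" and the semantic slots are "symptom", "symptom onset time", "accompanying symptoms", "medical history", and "treatment history"; the slot value of "symptom" is "headache", that of "symptom onset time" is "one day ago", that of "accompanying symptoms" is "retching", that of "medical history" is "hepatitis B (大三阳)", and that of "treatment history" is "antiviral therapy". The semantic analysis result in triple form is then: "medication consultation(symptom=headache, symptom onset time=one day ago, accompanying symptoms=retching, medical history=hepatitis B (大三阳), treatment history=antiviral therapy)".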
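As a small illustration of the act(slot=value, ...) form, the sketch below serializes an intention and its filled slots into that string. The function name is hypothetical, and the source does not prescribe a particular in-memory representation.

```python
def format_parse_result(intent, slots):
    """Render an intention and its slot values as act(slot1=value1, slot2=value2, ...)."""
    inner = ", ".join(f"{name}={value}" for name, value in slots.items())
    return f"{intent}({inner})"


parsed = format_parse_result(
    "medication consultation",
    {
        "symptom": "headache",
        "symptom onset time": "one day ago",
        "accompanying symptoms": "retching",
    },
)
print(parsed)
```

Since Python dictionaries preserve insertion order, the slots appear in the order in which the template lists them.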
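Step S25: output a corresponding answer according to the semantic analysis result.
In some embodiments, step S25 may include the following steps S251 and S252.
Step S251: calculate the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base; each sample group includes a question sample (that is, a sample of a question) and its corresponding answer sample (that is, a sample of an answer).
For example, step S251 may specifically include the following steps S251a and S251b.
Step S251a: calculate the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample. For example, both the similarity between the semantic analysis result and the question sample and the relevance between the semantic analysis result and the answer sample may be calculated using an existing relevance calculation method, such as the BM25 algorithm.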
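For reference, a compact pure-Python sketch of Okapi BM25 scoring follows. The source only names BM25; the parameter values k1=1.5 and b=0.75, the smoothed IDF variant, and the tokenized toy documents are assumptions made for illustration, and the function expects a non-empty document list.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()  # number of documents containing each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


query = ["missing", "teeth", "implant"]
docs = [
    ["multiple", "missing", "teeth", "implant"],
    ["fever", "medicine"],
    ["teeth", "whitening"],
]
scores_bm25 = bm25_scores(query, docs)
print(scores_bm25)
```

The same function can score the semantic analysis result against question samples (as a similarity) and against answer samples (as a relevance), provided all texts are tokenized in the same way.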
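Step S251b: generate the matching degree according to the similarity and the first weighting coefficient, and the relevance and the second weighting coefficient. That is, the matching degree is the sum of the product of the similarity and the first weighting coefficient and the product of the relevance and the second weighting coefficient.
For example, the first weighting coefficient and the second weighting coefficient may be set according to actual needs; each of them is between 0 and 1, and their sum equals 1.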
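The weighted sum of step S251b, together with the selection of the best sample group, can be sketched as follows. The weight 0.6 and the similarity and relevance numbers are made up; the only constraint taken from the text is that both weights lie between 0 and 1 and sum to 1, which is why the second weight is derived from the first.

```python
def matching_degree(similarity, relevance, w_question=0.6):
    """Weighted sum: w1 * similarity + w2 * relevance, with w2 = 1 - w1."""
    if not 0.0 <= w_question <= 1.0:
        raise ValueError("the first weighting coefficient must be in [0, 1]")
    return w_question * similarity + (1.0 - w_question) * relevance


def best_answer(sample_groups, w_question=0.6):
    """Return the answer of the sample group with the largest matching degree.

    sample_groups -- list of (answer_text, similarity, relevance) tuples
    """
    best = max(sample_groups, key=lambda g: matching_degree(g[1], g[2], w_question))
    return best[0]


# Made-up scores for two sample groups.
groups = [
    ("answer A", 0.9, 0.8),
    ("answer B", 0.4, 0.7),
]
chosen = best_answer(groups)
print(chosen, matching_degree(0.9, 0.8))
```

A larger first weight favors sample groups whose question resembles the patient's question, while a smaller one favors groups whose answer is relevant to the parsed content.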
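Step S252: output the answer sample corresponding to the maximum matching degree.
Because the matching degree is the weighted sum of the similarity between the semantic analysis result and the question sample and the relevance between the semantic analysis result and the answer sample, in the sample group with the maximum matching degree both the similarity between the question sample and the semantic analysis result and the relevance between the answer sample and the semantic analysis result are relatively high.
Of course, the answer corresponding to the semantic analysis result may also be selected in other ways. For example, if the similarity between the semantic analysis result and a question sample exceeds a preset first threshold, and the relevance between the semantic analysis result and the answer sample corresponding to that question sample exceeds a second threshold, that answer sample is output.
Table 3 lists the similarity between the question corresponding to a semantic analysis result and each question sample, and its relevance to each answer sample.
For example, the question sample and the answer sample in the same row of Table 3 belong to the same sample group. For the patient's medical consultation sentence in Table 3, "Can dental implants be done when multiple teeth are missing?", both the similarity between its semantic analysis result and the first question sample and its relevance to the first answer sample are the highest. The semantic analysis result therefore has the highest matching degree with the first sample group, so the first answer sample is output.
Table 3
Figure PCTCN2020094068-appb-000002
The medical question answering method is illustrated below with an example.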
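For example, the medical consultation sentence input by the patient is: "I have a cold, nasal congestion, a headache, a dry throat, a sore back, and a stabbing pain in my temples. I got sick yesterday morning, at first mostly a runny nose; my head ached a little in the afternoon, and by last night my throat was dry and my stomach was a bit upset, and then I could not sleep until after two o'clock. This morning one nostril was running clear mucus and the other a little yellow mucus, which turned clear after I wiped it three or four times. This afternoon I had a fever and was sweating while feverish. What medicine do I need to take?". First, the intention recognition model recognizes the patient's intention as "medication consultation", whose semantic slot template includes the semantic slots: symptom, symptom onset time, accompanying symptoms, medical history, and treatment history. The entity words corresponding to condition features are extracted from the medical consultation sentence and converted into standard expression words, and each standard expression word is filled into the corresponding semantic slot of the current semantic slot template, giving: symptom = "cold, nasal congestion, headache, dry throat, sore back, stabbing pain in the temples", symptom onset time = "since yesterday morning", accompanying symptoms = "runny nose, fever, sweating while feverish". Next, the inquiry sentence "Do you have any medical history?" is generated for "medical history", and the inquiry sentence "Do you have any treatment history?" is generated for "treatment history". Suppose the user answers: "I have had hepatitis B (大三阳) and have been on antiviral treatment." Then the slot value "hepatitis B (大三阳)" is filled into the "medical history" semantic slot and the slot value "antiviral therapy" into the "treatment history" semantic slot, which yields the semantic analysis result. Finally, the answer is given according to the semantic analysis result: "Your illness is a cold (a self-limiting disease). Suggested medication: Tylenol (泰诺), Banlangen (板蓝根)".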
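FIG. 4 is a schematic structural diagram of a medical question answering system provided by an embodiment of the present disclosure; the system may be used to execute the medical question answering method described above. As shown in FIG. 4, the medical question answering system may include: an intention recognizer 10, an entity word extractor 20, a standard word acquisition unit 30, a parser 40, and an output unit 50.
For example, the intention recognizer 10 is used to recognize the patient's intention based on a medical consultation sentence input by the patient.
In some embodiments, the intention recognizer 10 may be used to convert the medical consultation sentence input by the patient from text data into vector data, and to input the vector data into a preset intention recognition model to recognize the patient's intention.
In some embodiments, the intention recognition model is a classification model based on a document topic generation model and a bidirectional gated recurrent unit.
In some embodiments, the intention recognizer 10 is further used to: acquire document topic information of the medical consultation sentence input by the patient; convert the medical consultation sentence input by the patient from text data into vector data; obtain, according to the document topic information corresponding to the medical consultation sentence and the vector data, a score of the medical consultation sentence for each preset intention; and determine the patient's intention according to the score of the medical consultation sentence for each preset intention.
The entity word extractor 20 is used to extract, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence. Optionally, the condition features include at least one of: onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
The standard word acquisition unit 30 is used to acquire, according to a preset synonym mapping table, standard expression words synonymous with the entity words. For example, the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding synonyms (that is, the entity words).
The parser 40 is used to generate a semantic analysis result according to the patient's intention and the standard expression words.
The output unit 50 is used to output a corresponding answer according to the semantic analysis result.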
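FIG. 5 is a schematic structural diagram of a medical question answering system provided by an embodiment of the present disclosure. As shown in FIG. 5, in addition to the intention recognizer 10, entity word extractor 20, standard word acquisition unit 30, parser 40, and output unit 50 described above, the medical question answering system also includes: a standard vocabulary generator 60, a synonym collector 70, a filter 80, and a mapping table generator 90.
In some embodiments, the entity word extractor 20 includes a template acquisition unit 21 and a recognition unit 22.
The template acquisition unit 21 is used to acquire a semantic slot template corresponding to the patient's intention; each semantic slot template includes a plurality of semantic slots for characterizing condition features.
The recognition unit 22 is used to extract, from the medical consultation sentence, entity words corresponding to the semantic slots in the semantic slot template.
For example, the recognition unit 22 may be used to perform sequence labeling on the medical consultation sentence using a sequence labeling model, and to obtain, according to the sequence labeling result, the entity words corresponding to the semantic slots in the semantic slot template.
In addition, the recognition unit may also be used to perform sequence labeling on the medical consultation sentence using a sequence labeling model, and to obtain, according to the sequence labeling result, the entity words corresponding to the plurality of semantic slots in the semantic slot template.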
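In some embodiments, the parser 40 includes: a filling unit 41, a judgment unit 42, an inquiry unit 43, and a parsing unit 44.
For example, the filling unit 41 is used to fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots of the current semantic slot template.
The judgment unit 42 is used to judge whether there is an unfilled semantic slot in the current semantic slot template.
The inquiry unit 43 is used to generate, when there is an unfilled semantic slot in the current semantic slot template, an inquiry question corresponding to the unfilled semantic slot, and to fill the unfilled semantic slot according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots in the current semantic slot template are filled.
The parsing unit 44 is used to generate the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.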
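In some embodiments, the output unit 50 includes: a matching degree calculator 51 and an output unit 52.
For example, the matching degree calculator 51 is used to calculate the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base. Each sample group includes a question sample and its corresponding answer sample.
In some embodiments, the matching degree calculator 51 includes: a calculation subunit 511 and a generation subunit 512.
For example, the calculation subunit 511 is used to calculate the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample.
The generation subunit 512 is used to generate the matching degree according to the similarity and the first weighting coefficient, and the relevance and the second weighting coefficient.
The output unit 52 is used to output the answer sample corresponding to the maximum matching degree.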
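The standard vocabulary generator 60 is used to generate a standard vocabulary in which a plurality of standard expression word samples are stored.
The synonym collector 70 is used to collect at least one synonym corresponding to each standard expression word sample.
The filter 80 is used to calculate the similarity between each standard expression word sample and its corresponding synonyms, retain the synonyms whose similarity is greater than the preset value, and remove the synonyms whose similarity is less than or equal to the preset value. For example, the preset value may be 0.5, 0.6, 0.7, 0.8, 0.9, or the like.
The mapping table generator 90 is used to generate the synonym mapping table according to each standard expression word sample and its corresponding, currently retained synonyms.
For the implementation details and technical effects of the modules and units above, refer to the description of the foregoing method embodiments; they are not repeated here.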
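It should be understood that the medical question answering system shown in FIG. 4 or FIG. 5 may be a single computer or a single computing device, or may be multiple computers or multiple computing devices connected through a wired network and/or a wireless network. In addition, the components of the medical question answering system shown in FIG. 4 or FIG. 5 may be implemented in hardware, or in a combination of hardware and software. For example, the components of the medical question answering system shown in FIG. 4 or FIG. 5 may be implemented by a central processing unit (CPU), an application processor (AP), a digital signal processor (DSP), a field programmable gate array (FPGA), a microprocessor (MCU), an integrated circuit (IC), or an application specific integrated circuit (ASIC) having the corresponding functions described in the embodiments of the present disclosure. For example, the components of the medical question answering system shown in FIG. 4 or FIG. 5 may be implemented by a combination of a processor, a memory, and a computer program; the computer program is stored in the memory, and the processor reads the computer program from the memory and executes it, thereby serving as the components of the medical question answering system shown in FIG. 4 or FIG. 5.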
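An embodiment of the present disclosure also provides an electronic device including one or more processors and a storage device, wherein one or more programs are stored on the storage device, and when the one or more programs are executed by the one or more processors, the one or more processors implement the medical question answering method provided in the foregoing embodiments.
An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the medical question answering method provided in the foregoing embodiments.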
It should be understood that the word embedding technology, word2vec model, softmax classifier, LDA model, BiLSTM (Bi-directional Long Short-Term Memory) model, bidirectional gated recurrent unit (BiGRU), BiLSTM-CRF model, BIO annotation set, BM25 algorithm, and so on mentioned above are all known technologies in the fields of artificial intelligence and natural language processing. For example, for further information on the word embedding technology and the word2vec model, see Mikolov T, Chen K, Corrado G S, et al. Efficient Estimation of Word Representations in Vector Space[C]. International Conference on Learning Representations, 2013, and the code open-sourced by Google at https://code.google.com/p/word2vec/. For further information on the softmax classifier, see https://pytorch.org/docs/master/generated/torch.nn.Softmax.html. For further information on the LDA model, see Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. For further information on the BiLSTM (Bi-directional Long Short-Term Memory) model, see Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. For further information on the bidirectional gated recurrent unit (BiGRU), see Cho K, Van Merrienboer B, Gulcehre C, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation[J]. arXiv: Computation and Language, 2014. For further information on the BiLSTM-CRF model, see Huang Z, Xu W, Yu K, et al. Bidirectional LSTM-CRF Models for Sequence Tagging[J]. arXiv: Computation and Language, 2015. For further information on the BIO annotation set, see Sang E F, De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition[C]. North American Chapter of the Association for Computational Linguistics, 2003: 142-147. For further information on the BM25 algorithm, see the book Introduction to Information Retrieval (《信息检索导论》).
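Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or a microprocessor (MCU), or implemented as hardware, or implemented as an integrated circuit, such as an application specific integrated circuit (ASIC). Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory computer-readable storage media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.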
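Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted only in a generic and descriptive sense and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.
It can be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present disclosure, and the present disclosure is not limited to them. Those of ordinary skill in the art may make various modifications and improvements without departing from the protection scope of the present disclosure as defined by the appended claims, and these modifications and improvements also fall within the protection scope of the present disclosure.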

Claims (19)

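  1. A medical question answering method, comprising:
    recognizing a patient's intention based on a medical consultation sentence input by the patient;
    extracting, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence;
    acquiring, according to a preset synonym mapping table, a standard expression word synonymous with each of the at least one entity word, wherein the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding entity words;
    generating a semantic analysis result according to the patient's intention and the standard expression words; and
    outputting a corresponding answer according to the semantic analysis result.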
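  2. The medical question answering method according to claim 1, wherein recognizing the patient's intention based on the medical consultation sentence input by the patient comprises:
    acquiring document topic information of the medical consultation sentence input by the patient;
    converting the medical consultation sentence input by the patient from text data into vector data;
    obtaining, according to the document topic information corresponding to the medical consultation sentence and the vector data, a score of the medical consultation sentence for each preset intention; and
    determining the patient's intention according to the score of the medical consultation sentence for each preset intention.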
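  3. The medical question answering method according to claim 1 or 2, wherein extracting, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence comprises:
    acquiring a semantic slot template corresponding to the patient's intention, the semantic slot template including a plurality of semantic slots for characterizing condition features; and
    extracting, from the medical consultation sentence, entity words corresponding to the plurality of semantic slots in the semantic slot template.
  4. The medical question answering method according to claim 3, wherein extracting, from the medical consultation sentence, the entity words corresponding to the plurality of semantic slots in the semantic slot template comprises:
    performing sequence labeling on the medical consultation sentence using a sequence labeling model, and obtaining, according to the sequence labeling result, the entity words corresponding to the plurality of semantic slots in the semantic slot template.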
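  5. The medical question answering method according to claim 3, wherein generating the semantic analysis result according to the patient's intention and the standard expression words comprises:
    filling the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots;
    judging whether there is an unfilled semantic slot in the current semantic slot template;
    if there is an unfilled semantic slot in the current semantic slot template, generating an inquiry question corresponding to the unfilled semantic slot, and filling the unfilled semantic slot according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots of the current semantic slot template are filled; and
    generating the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.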
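  6. The medical question answering method according to any one of claims 1 to 5, wherein outputting the corresponding answer according to the semantic analysis result comprises:
    calculating a matching degree between the semantic analysis result and each sample group in a doctor-patient question and answer knowledge base, each sample group including a question sample and its corresponding answer sample; and
    outputting the answer sample corresponding to the maximum matching degree.
  7. The medical question answering method according to claim 6, wherein calculating the matching degree between the semantic analysis result and each sample group in the doctor-patient question and answer knowledge base comprises:
    calculating the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample; and
    generating the matching degree according to the similarity and a first weighting coefficient, and the relevance and a second weighting coefficient.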
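  8. The medical question answering method according to any one of claims 1 to 7, wherein the condition features include at least one of: onset symptoms, symptom onset time, symptom duration, accompanying symptoms, medical history, treatment history, and patient age.
  9. The medical question answering method according to any one of claims 1 to 7, wherein, before recognizing the patient's intention based on the medical consultation sentence input by the patient, the medical question answering method further comprises:
    generating a standard vocabulary in which a plurality of standard expression word samples are stored;
    collecting at least one synonym corresponding to each standard expression word sample;
    calculating the similarity between each standard expression word sample and its corresponding synonyms, retaining the synonyms whose similarity is greater than a preset value, and removing the synonyms whose similarity is less than or equal to the preset value; and
    generating the synonym mapping table according to each synonym and its corresponding, currently retained synonyms.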
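  10. A medical question answering system, comprising:
    an intention recognizer, used to recognize a patient's intention based on a medical consultation sentence input by the patient;
    an entity word extractor, used to extract, according to the patient's intention, at least one entity word corresponding to a condition feature from the medical consultation sentence;
    a standard word acquisition unit, used to acquire, according to a preset synonym mapping table, a standard expression word synonymous with each of the at least one entity word, wherein the synonym mapping table includes mapping relationships between a plurality of standard expression words and their respective corresponding entity words;
    a parser, used to generate a semantic analysis result according to the patient's intention and the standard expression words; and
    an output unit, which outputs a corresponding answer according to the semantic analysis result.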
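  11. The medical question answering system according to claim 10, wherein the intention recognizer is further used to:
    acquire document topic information of the medical consultation sentence input by the patient;
    convert the medical consultation sentence input by the patient from text data into vector data;
    obtain, according to the document topic information corresponding to the medical consultation sentence and the vector data, a score of the medical consultation sentence for each preset intention; and
    determine the patient's intention according to the score of the medical consultation sentence for each preset intention.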
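  12. The medical question answering system according to claim 10 or 11, wherein the entity word extractor comprises:
    a template acquisition unit, used to acquire a semantic slot template corresponding to the patient's intention, the semantic slot template including a plurality of semantic slots for characterizing condition features; and
    a recognition unit, used to extract, from the medical consultation sentence, entity words corresponding to the plurality of semantic slots in the semantic slot template.
  13. The medical question answering system according to claim 12, wherein the recognition unit is further used to:
    perform sequence labeling on the medical consultation sentence using a sequence labeling model, and obtain, according to the sequence labeling result, the entity words corresponding to the plurality of semantic slots in the semantic slot template.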
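  14. The medical question answering system according to claim 12, wherein the parser comprises:
    a filling unit, used to fill the standard expression words corresponding to the patient's medical consultation sentence into the corresponding semantic slots among the plurality of semantic slots;
    a judgment unit, used to judge whether there is an unfilled semantic slot in the semantic slot template;
    an inquiry unit, used to generate, when there is an unfilled semantic slot in the semantic slot template, an inquiry question corresponding to the unfilled semantic slot, and to fill the unfilled semantic slot according to the answer sentence input by the patient in response to the inquiry question, until all semantic slots of the current semantic slot template are filled; and
    a parsing unit, used to generate the semantic analysis result according to the patient's intention, each semantic slot, and its filled value.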
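  15. The medical question answering system according to any one of claims 10 to 14, wherein the output unit comprises:
    a matching degree calculator, used to calculate a matching degree between the semantic analysis result and each sample group in a doctor-patient question and answer knowledge base, each sample group including a question sample and its corresponding answer sample; and
    an output unit, used to output the answer sample corresponding to the maximum matching degree.
  16. The medical question answering system according to claim 15, wherein the matching degree calculator comprises:
    a calculation subunit, used to calculate the similarity between the semantic analysis result and the question sample, and the relevance between the semantic analysis result and the answer sample; and
    a generation subunit, used to generate the matching degree according to the similarity and a first weighting coefficient, and the relevance and a second weighting coefficient.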
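  17. The medical question answering system according to any one of claims 10 to 16, further comprising:
    a standard vocabulary generator, used to generate a standard vocabulary in which a plurality of standard expression word samples are stored;
    a synonym collector, used to collect at least one synonym corresponding to each standard expression word sample;
    a filter, used to calculate the similarity between each standard expression word sample and its corresponding synonyms, retain the synonyms whose similarity is greater than a preset value, and remove the synonyms whose similarity is less than or equal to the preset value; and
    a mapping table generator, used to generate the synonym mapping table according to each synonym and its corresponding, currently retained synonyms.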
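  18. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, implements the medical question answering method according to any one of claims 1 to 9.
  19. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the medical question answering method according to any one of claims 1 to 9.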
PCT/CN2020/094068 2019-06-05 2020-06-03 Medical question answering method, medical question answering system, electronic device, and computer-readable storage medium WO2020244534A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/282,035 US20210375404A1 (en) 2019-06-05 2020-06-03 Medical question-answering method, medical question-answering system, electronic device, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910484808.4A CN110176315B (zh) 2019-06-05 2019-06-05 Medical question answering method and system, electronic device, and computer-readable medium
CN201910484808.4 2019-06-05

Publications (1)

Publication Number Publication Date
WO2020244534A1 (zh) 2020-12-10

Family

ID=67697060

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094068 WO2020244534A1 (zh) 2019-06-05 2020-06-03 Medical question-answering method, medical question-answering system, electronic device, and computer readable storage medium

Country Status (3)

Country Link
US (1) US20210375404A1 (zh)
CN (1) CN110176315B (zh)
WO (1) WO2020244534A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
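CN116662374A (zh) * 2023-07-31 2023-08-29 天津市扬天环保科技有限公司 Information technology consulting service system based on correlation analysis
CN117476163A (zh) * 2023-12-27 2024-01-30 万里云医疗信息科技(北京)有限公司 Method, apparatus, and storage medium for determining a disease conclusion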

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
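CN110176315B (zh) * 2019-06-05 2022-06-28 京东方科技集团股份有限公司 Medical question-answering method and system, electronic device, and computer readable medium
CN110781677B (zh) * 2019-10-12 2023-02-07 深圳平安医疗健康科技服务有限公司 Drug information matching processing method and apparatus, computer device, and storage medium
CN111063429A (zh) * 2019-10-25 2020-04-24 中国科学院自动化研究所 Medical consultation method, apparatus, device, and computer readable storage medium
CN111180077A (zh) * 2019-11-29 2020-05-19 厦门快商通科技股份有限公司 Medical aesthetics topic recognition method, apparatus, device, and storage medium
CN111177334A (zh) * 2019-11-29 2020-05-19 厦门快商通科技股份有限公司 Medical aesthetics topic switching method, apparatus, device, and storage medium
CN111339252B (zh) * 2020-02-25 2021-05-11 腾讯科技(深圳)有限公司 Search method, apparatus, and storage medium
CN111694942A (zh) * 2020-05-29 2020-09-22 平安科技(深圳)有限公司 Question-answering method, apparatus, device, and computer readable storage medium
CN113742480A (zh) * 2020-06-18 2021-12-03 北京汇钧科技有限公司 Customer service response method and apparatus
CN111966781B (zh) * 2020-06-28 2024-02-20 北京百度网讯科技有限公司 Interaction method and apparatus for data query, electronic device, and storage medium
CN112037903B (zh) * 2020-08-31 2023-08-15 康键信息技术(深圳)有限公司 Online consultation and prescription system
CN112102840B (zh) * 2020-09-09 2024-05-03 中移(杭州)信息技术有限公司 Semantic recognition method, apparatus, terminal, and storage medium
CN113076403A (zh) * 2021-04-21 2021-07-06 深圳追一科技有限公司 User message processing method and related device
US20220375626A1 (en) * 2021-05-21 2022-11-24 Nuance Communications, Inc. Telehealth System and Method
CN113449089B (zh) * 2021-06-11 2023-12-01 车智互联(北京)科技有限公司 Intent recognition method for query statements, question-answering method, and computing device
CN113707303A (zh) * 2021-08-30 2021-11-26 康键信息技术(深圳)有限公司 Knowledge graph-based medical question answering method, apparatus, device, and medium
CN113764112A (zh) * 2021-09-16 2021-12-07 山东大学第二医院 Online medical question-answering method
CN113642327A (zh) * 2021-10-14 2021-11-12 中国光大银行股份有限公司 Method and apparatus for constructing a standard knowledge base
CN114357144B (zh) * 2022-03-09 2022-08-09 北京大学 Small-sample-based medical numerical value extraction and understanding method and apparatus
CN115878790B (zh) * 2022-04-08 2023-08-25 北京中关村科金技术有限公司 Intelligent question-answering method, apparatus, storage medium, and electronic device
CN115080751B (zh) * 2022-08-16 2022-11-11 之江实验室 General model-based medical standard terminology management system and method
CN115840808B (zh) * 2022-12-27 2023-08-11 广州汉申科技中介服务有限公司 Science and technology project consultation method, apparatus, server, and computer readable storage medium
CN116578682B (zh) * 2023-05-22 2024-02-13 浙江法之道信息技术有限公司 Intelligent consultation method and system for legal services
CN116756579B (zh) * 2023-08-22 2023-12-12 腾讯科技(深圳)有限公司 Training method for a large language model and text processing method based on a large language model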

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
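CN102663129A (zh) * 2012-04-25 2012-09-12 中国科学院计算技术研究所 Deep question-answering method for the medical field and medical retrieval system
US9367608B1 (en) * 2009-01-07 2016-06-14 Guangsheng Zhang System and methods for searching objects and providing answers to queries using association data
CN106897559A (zh) * 2017-02-24 2017-06-27 黑龙江特士信息技术有限公司 Multi-data-source-oriented method and apparatus for recognizing symptom and sign entities
CN109637674A (zh) * 2018-10-30 2019-04-16 北京健康有益科技有限公司 Method, system, medium, and device for automatically obtaining answers to health and medical questions
CN109840275A (zh) * 2019-01-31 2019-06-04 北京嘉和美康信息技术有限公司 Method, apparatus, and device for processing medical search statements
CN110176315A (zh) * 2019-06-05 2019-08-27 京东方科技集团股份有限公司 Medical question-answering method and system, electronic device, and computer readable medium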

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004126981A (ja) * 2002-10-03 2004-04-22 Medifocus:Kk Diagnostic apparatus, diagnostic method, and program
US8200501B2 (en) * 2006-01-26 2012-06-12 International Business Machines Corporation Methods, systems and computer program products for synthesizing medical procedure information in healthcare databases
US20120141962A1 (en) * 2010-06-07 2012-06-07 Williamson Gabrielle R Systems and methods for matching a patient with a mental health care provider
US20130218596A1 (en) * 2012-02-16 2013-08-22 dbMotion Ltd. Method And System For Facilitating User Navigation Through A Computerized Medical Information System
CN103810218B (zh) * 2012-11-14 2018-06-08 北京百度网讯科技有限公司 Question cluster-based automatic question-answering method and apparatus
CN103400054A (zh) * 2013-08-27 2013-11-20 哈尔滨工业大学 Computer-assisted psychological counseling automatic question-answering robot system
US10311206B2 (en) * 2014-06-19 2019-06-04 International Business Machines Corporation Electronic medical record summary and presentation
CN104573028B (zh) * 2015-01-14 2019-01-25 百度在线网络技术(北京)有限公司 Method and system for implementing intelligent question answering
US20180005123A1 (en) * 2016-07-01 2018-01-04 Xerox Corporation Combining semantic and business process modeling in a multi-layer framework
US11294942B2 (en) * 2016-09-29 2022-04-05 Koninklijke Philips N.V. Question generation
CN107247868B (zh) * 2017-05-18 2020-05-12 深思考人工智能机器人科技(北京)有限公司 Artificial intelligence-assisted medical inquiry system
US20180365742A1 (en) * 2017-06-20 2018-12-20 Crystal A. Thomas System for Data Analysis and Payment Expedition
CN107516110B (zh) * 2017-08-22 2020-02-18 华南理工大学 Medical question-answering semantic clustering method based on ensemble convolutional encoding
US11430347B2 (en) * 2017-09-18 2022-08-30 Microsoft Technology Licensing, Llc Providing diet assistance in a session
CN108090127B (zh) * 2017-11-15 2021-02-12 北京百度网讯科技有限公司 Method and apparatus for establishing a question-answer text evaluation model and evaluating question-answer text
CN108170684B (zh) * 2018-01-22 2020-06-05 京东方科技集团股份有限公司 Text similarity calculation method and system, data query system, and computer product
EP3788632A1 (en) * 2018-04-30 2021-03-10 Koninklijke Philips N.V. Visual question answering using on-image annotations
CN108804532B (zh) * 2018-05-03 2020-06-26 腾讯科技(深圳)有限公司 Method and apparatus for mining query intents and recognizing query intents
CN108733837B (zh) * 2018-05-28 2021-04-27 上海依智医疗技术有限公司 Natural language structuring method and apparatus for medical record text
CN109117474B (zh) * 2018-06-25 2022-05-03 广州多益网络股份有限公司 Sentence similarity calculation method, apparatus, and storage medium
CN109378077B (zh) * 2018-08-13 2022-07-29 北京左医科技有限公司 Pre-consultation medical history collection method and machine readable storage medium for executing the method
CN109684445B (zh) * 2018-11-13 2021-05-28 中国科学院自动化研究所 Colloquial medical question-answering method and system
CN109741824B (zh) * 2018-12-21 2023-08-04 质直(上海)教育科技有限公司 Machine learning-based medical inquiry method
CN109817329B (zh) * 2019-01-21 2021-06-29 暗物智能科技(广州)有限公司 Medical inquiry dialogue system and reinforcement learning method applied to the system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
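CN116662374A (zh) * 2023-07-31 2023-08-29 天津市扬天环保科技有限公司 Information technology consulting service system based on correlation analysis
CN116662374B (zh) * 2023-07-31 2023-10-20 天津市扬天环保科技有限公司 Information technology consulting service system based on correlation analysis
CN117476163A (zh) * 2023-12-27 2024-01-30 万里云医疗信息科技(北京)有限公司 Method, apparatus, and storage medium for determining a disease conclusion
CN117476163B (zh) * 2023-12-27 2024-03-08 万里云医疗信息科技(北京)有限公司 Method, apparatus, and storage medium for determining a disease conclusion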

Also Published As

Publication number Publication date
US20210375404A1 (en) 2021-12-02
CN110176315A (zh) 2019-08-27
CN110176315B (zh) 2022-06-28

Similar Documents

Publication Publication Date Title
WO2020244534A1 (zh) Medical question-answering method, medical question-answering system, electronic device, and computer readable storage medium
Abacha et al. Recognizing question entailment for medical question answering
US10331659B2 (en) Automatic detection and cleansing of erroneous concepts in an aggregated knowledge base
WO2023029506A1 (zh) Medical condition analysis method and apparatus, electronic device, and storage medium
US10380251B2 (en) Mining new negation triggers dynamically based on structured and unstructured knowledge
EP3376400A1 (en) Dynamic context adjustment in language models
JP7357614B2 (ja) Machine-assisted dialogue system, and medical condition interview apparatus and method therefor
US20200211709A1 (en) Method and system to provide medical advice to a user in real time based on medical triage conversation
Li et al. MLEC-QA: A Chinese multi-choice biomedical question answering dataset
US20160098456A1 (en) Implicit Durations Calculation and Similarity Comparison in Question Answering Systems
Miftahutdinov et al. KFU at CLEF eHealth 2017 Task 1: ICD-10 Coding of English Death Certificates with Recurrent Neural Networks.
US20190198137A1 (en) Automatic Summarization of Patient Data Using Medically Relevant Summarization Templates
JP2022537759A (ja) Deep learning approach to computing spans
CN116992002A (zh) Intelligent nursing plan response method and system
Alghoson Medical document classification based on MeSH
Liu et al. Integrated cTAKES for Concept Mention Detection and Normalization.
Zhou et al. Converting semi-structured clinical medical records into information and knowledge
Montenegro et al. The HoPE model architecture: A novel approach to pregnancy information retrieval based on conversational agents
Hu et al. Label-indicator morpheme growth on LSTM for Chinese healthcare question department classification
Roberts et al. Annotating Question Decomposition on Complex Medical Questions.
Park et al. A deep learning-based depression trend analysis of korean on social media
Li et al. Extrinsic factors affecting the accuracy of biomedical NER
Liu Research on question classification methods in the medical field
Yip et al. Construction of UMLS Metathesaurus with Knowledge-Infused Deep Learning.
Li et al. Medical text entity recognition based on CRF and joint entity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20818741; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20818741; Country of ref document: EP; Kind code of ref document: A1)

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/07/2022))