CN113889259A - Automatic diagnosis dialogue system under assistance of knowledge graph - Google Patents

Automatic diagnosis dialogue system under assistance of knowledge graph

Info

Publication number
CN113889259A
CN113889259A (application CN202111036730.3A)
Authority
CN
China
Prior art keywords
module
symptom
user
doctor
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111036730.3A
Other languages
Chinese (zh)
Inventor
王万良
王媛媛
徐新黎
赵燕伟
尹晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111036730.3A priority Critical patent/CN113889259A/en
Publication of CN113889259A publication Critical patent/CN113889259A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; computer-aided diagnosis, e.g. based on medical expert systems
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G06F 16/3344 - Query execution using natural language analysis
    • G06F 16/3346 - Query execution using probabilistic model
    • G06F 16/35 - Clustering; Classification of unstructured textual data
    • G06F 16/367 - Creation of semantic tools; Ontology
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 40/216 - Parsing using statistical methods
    • G06F 40/295 - Named entity recognition
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 - Neural network learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to an automatic diagnosis dialogue system assisted by a knowledge graph, comprising a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module. The front-end module transmits user input to the doctor-patient dialogue processing module and displays the user input and the corresponding system feedback; the data acquisition module collects doctor-patient dialogue corpora and disease entries; the data storage management module receives data from the data acquisition module, constructs the knowledge graph, and populates a user symptom information table and a medical data selection table; the symptom expansion module expands the training corpus of the doctor-patient dialogue model by querying the knowledge graph for the key symptoms and the most probable symptoms of each disease and writing them into the medical data selection table; the doctor-patient dialogue processing module trains the doctor-patient dialogue model, receives input from the front-end user, uses the model to decide the next action, and generates either a follow-up question or a diagnosis result, which is passed to the front-end module and displayed to the user.

Description

Automatic diagnosis dialogue system under assistance of knowledge graph
Technical Field
The invention relates to a dialogue system, in particular to an automatic diagnosis dialogue system assisted by a knowledge graph.
Background
The digital medical industry is developing rapidly, and the demand for contact-free auxiliary diagnosis technologies such as online consultation is growing quickly. However, current online consultation mainly relies on doctors diagnosing online rather than on intelligent automatic diagnosis: patients often have to wait for a doctor's reply, the diagnosis lacks real-time responsiveness, and the large volume of questions increases the doctors' workload. Researchers have therefore proposed automatic diagnosis dialogue systems based on knowledge graphs.
A knowledge graph is a structured semantic knowledge base that expresses massive amounts of internet information as machine-recognizable semantic representations of the objective world. It has strong semantic expression and storage capabilities and is widely used in chatbots, recommendation systems and the like. Doctors determine a patient's disease from medical knowledge and clinical experience; a knowledge graph is a summary of such human knowledge and experience, and a professional knowledge graph for a specific field can provide strong knowledge support for downstream tasks. Dialogue systems are divided into non-task-oriented chat systems and task-oriented dialogue systems; the latter accomplish a specific task through natural language interaction between computer and user and are widely applied in recommendation systems, booking systems and the like.
Existing disease diagnosis systems have the following problems. Most are simple single-hop question-answering systems that return a diagnosis from a single user input by keyword matching; they ignore the relation between symptoms and diseases, do not match the diagnostic process used in real life, and therefore lack rigor. Models obtained by machine learning and deep learning from large amounts of corpora and cases suffer from the 'black box' problem, so the diagnosis result lacks interpretability and doctors distrust such systems. Training corpora that are too simple or that lack doctor-patient dialogue leave the model with insufficient symptom coverage, so the trained model has low efficiency and accuracy.
Disclosure of Invention
To solve these problems, the invention provides an automatic diagnosis dialogue system assisted by a knowledge graph. It realizes automatic diagnosis of common diseases, learns the diagnostic reasoning of doctors through reinforcement learning to realize automatic doctor-patient dialogue, mines symptoms from user input for disease verification, and simulates the whole consultation process. The invention introduces a medical knowledge graph into the dialogue system, providing a theoretical basis for disease diagnosis and making the result more persuasive and interpretable. The knowledge graph is used for symptom expansion and for completing the training corpus through retrieval and matching, so that common diseases are diagnosed automatically and more accurately, and comprehensive diagnostic advice is given through knowledge graph queries. The technical scheme adopted by the invention is as follows:
a system for automated diagnosis dialog with assistance of a knowledge graph, comprising: the system comprises a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient conversation processing module.
Wherein:
The front-end module is the main window for human-computer interaction: it displays user input and the corresponding system feedback, and passes the symptoms entered by the user to the doctor-patient dialogue processing module.
The data acquisition module performs data collection with Python web-crawler techniques; the collected content consists of doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation.
The data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
The medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases. The medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
The knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. The key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency. Specifically:
Symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. First, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description; the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains. Specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
where A is the label-transition matrix and P_{i, y_i} is the BiLSTM score of label y_i at position i, from which the BIO tag value of each entity is obtained.
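The patent does not publish source code; the following is a minimal sketch of the segmentation-plus-tagging pipeline described above, assuming the jieba, transformers, torch and pytorch-crf packages. The lexicon file, label set and example sentence are illustrative placeholders.

```python
# Minimal sketch of the Jieba + BERT + BiLSTM + CRF symptom tagger (illustrative only).
import jieba
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertModel
from torchcrf import CRF  # pytorch-crf package (assumed available)

jieba.load_userdict("snomed_ct_terms.txt")   # hypothetical file of standard symptom terms
LABELS = ["O", "B-SYM", "I-SYM"]             # BIO tags for symptom entities

class BertBiLstmCrf(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)           # step 1: BERT embeddings
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)  # step 2: BiLSTM
        self.fc = nn.Linear(2 * hidden, len(LABELS))                # per-token label scores
        self.crf = CRF(len(LABELS), batch_first=True)               # step 3: CRF layer

    def forward(self, input_ids, attention_mask, tags=None):
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.fc(self.lstm(emb)[0])
        mask = attention_mask.bool()
        if tags is not None:                                        # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)                # inference: best BIO tag sequence

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
words = jieba.lcut("咳嗽三天并伴有发热")                            # segmentation with the custom lexicon
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
model = BertBiLstmCrf()
print(model(enc["input_ids"], enc["attention_mask"]))               # predicted BIO label ids (untrained)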
The TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions.
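As a small illustration of how equations (2)-(4) can pick key symptoms from segmented disease descriptions (a sketch only; the patent does not specify its TF-IDF implementation):

```python
# Illustrative TF-IDF scoring of candidate symptom terms across disease descriptions (eqs. 2-4).
import math
from collections import Counter

def key_symptoms(descriptions, top_k=3):
    """descriptions: list of token lists, one per disease entry; returns top TF-IDF terms per entry."""
    n_docs = len(descriptions)
    doc_freq = Counter(term for doc in descriptions for term in set(doc))
    results = []
    for doc in descriptions:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / (1 + doc_freq[t])) for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return results

docs = [["咳嗽", "发热", "咳嗽", "乏力"], ["头痛", "发热", "恶心"]]   # toy, pre-segmented descriptions
print(key_symptoms(docs))
```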
The knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library.
The disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model. Let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights.
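The patent does not give the estimation procedure for the two edge weights; a simple assumption is to estimate them from co-occurrence counts in the diagnosed dialogue corpus, as in this illustrative sketch:

```python
# Hypothetical estimation of the two edge weights P(dis|sym) and P(sym|dis) from corpus counts.
from collections import Counter

records = [                                     # toy corpus: (diagnosed disease, symptoms mentioned)
    ("上呼吸道感染", ["咳嗽", "发热"]),
    ("上呼吸道感染", ["咳嗽", "咽痛"]),
    ("偏头痛", ["头痛", "恶心"]),
]
dis_count, sym_count, pair_count = Counter(), Counter(), Counter()
for dis, syms in records:
    dis_count[dis] += 1
    for sym in syms:
        sym_count[sym] += 1
        pair_count[(dis, sym)] += 1

edges = {}
for (dis, sym), c in pair_count.items():
    edges[(dis, sym)] = {
        "P(dis|sym)": c / sym_count[sym],       # weight used when reasoning from symptom to disease
        "P(sym|dis)": c / dis_count[dis],       # weight used when reasoning from disease to symptom
    }
print(edges[("上呼吸道感染", "咳嗽")])
```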
The knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language.
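A sketch of the batch import and a Cypher query, assuming the official neo4j Python driver (v5 API); the connection details, CSV layout, node labels and relationship properties are illustrative, not taken from the patent:

```python
# Sketch of loading disease/symptom data from CSV and querying it with Cypher (names illustrative).
import csv
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_edges(tx, rows):
    tx.run(
        "UNWIND $rows AS r "
        "MERGE (d:Disease {name: r.disease}) "
        "MERGE (s:Symptom {name: r.symptom}) "
        "MERGE (d)-[rel:HAS_SYMPTOM]->(s) "
        "SET rel.p_sym_given_dis = toFloat(r.p_sym_given_dis)",
        rows=rows,
    )

with open("disease_symptom.csv", newline="", encoding="utf-8") as f:   # hypothetical CSV export
    rows = list(csv.DictReader(f))

with driver.session() as session:
    session.execute_write(load_edges, rows)
    top = session.run(
        "MATCH (d:Disease {name: $name})-[r:HAS_SYMPTOM]->(s) "
        "RETURN s.name ORDER BY r.p_sym_given_dis DESC LIMIT 2", name="上呼吸道感染")
    print([rec["s.name"] for rec in top])
```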
User symptom information table: stores the symptoms extracted from the user's chief complaint.
Medical data selection table: stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease. In this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1.
The symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training. Specifically:
The key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1. According to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained. The two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1; a symptom that is already present is added only once.
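A toy walk-through of equations (5) and (6): the graph lookups are replaced by hard-coded dictionaries, and the top-2 selection and duplicate handling follow the rule above. The weight values are illustrative.

```python
# Illustrative symptom expansion following eqs. (5) and (6); graph lookups replaced by dicts.
p_dis_given_sym = {("上呼吸道感染", "咳嗽"): 0.6}          # P(dis|sym) edge weights
p_sym_given_dis = {                                         # P(sym|dis) edge weights
    ("上呼吸道感染", "发热"): 0.7,
    ("上呼吸道感染", "咽痛"): 0.5,
    ("上呼吸道感染", "乏力"): 0.3,
}

def expand(disease, explicit_symptom, selection_table):
    p_prior = 1.0                                            # prior of the user-stated symptom
    p_dis = p_dis_given_sym[(disease, explicit_symptom)] * p_prior                    # eq. (5)
    scored = {s: w * p_dis for (d, s), w in p_sym_given_dis.items() if d == disease}  # eq. (6)
    for sym in sorted(scored, key=scored.get, reverse=True)[:2]:                      # top-2 symptoms
        selection_table.setdefault(sym, 1)                   # mark 1, keep only one copy of duplicates
    return selection_table

print(expand("上呼吸道感染", "咳嗽", {"咳嗽": 1}))
```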
The doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
The doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method. Specifically:
and the natural language understanding grandchild module identifies user intention and slot filling operation. The user intentions are classified into four categories, namely "ask for a disease", "confirm a symptom", "deny a symptom", and "not determine whether or not to have the symptom". The method based on deep learning is adopted for the doctor-patient dialogue corpus, named entity recognition and symptom extraction are carried out by utilizing a Jieba tool, a BERT + BilsTM + CRF model and a BIO sequence marking method, simply, the first input of a user is classified into 'inquiry disease', and the subsequent input is classified according to a keyword matching mode and is respectively filled into a user symptom information table and a medical data selection table.
The dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, and realizes the dialogue interaction between the system agent and the user. It controls the whole process until a diagnosis result is obtained. Specifically:
A rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed. In each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user.
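A minimal sketch of such a rule-based state tracker; the fields kept in s_t are an assumption based on the description above:

```python
# Minimal rule-based state tracker: s_t keeps confirmed/denied symptoms and the latest actions.
class DialogueStateTracker:
    def __init__(self, explicit_symptoms):
        self.state = {"turn": 0, "symptoms": {s: 1 for s in explicit_symptoms},
                      "agent_action": None, "user_action": None}

    def update(self, agent_action, user_action, symptom=None, value=None):
        self.state["turn"] += 1
        self.state["agent_action"] = agent_action
        self.state["user_action"] = user_action
        if symptom is not None:
            self.state["symptoms"][symptom] = value      # 1 confirmed, -1 denied, 0 unknown
        return self.state

tracker = DialogueStateTracker(["咳嗽"])
print(tracker.update("request:发热", "deny_symptom", "发热", -1))
```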
The DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure. Specifically:
The data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'. The 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table.
The input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t. Each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better.
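A schematic DQN update implementing equation (7) with a target network, an ε-greedy policy and an experience pool, written with PyTorch; the network sizes, hyperparameters and state encoding are placeholders rather than the patent's settings.

```python
# Schematic DQN update implementing eq. (7): target = r_t + gamma * max_a' Q'(s_{t+1}, a').
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, EPS = 20, 8, 0.95, 0.1   # toy sizes; real ones depend on the symptom set
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())        # periodically refreshed copy of the Q network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)                           # experience pool of (s_t, a_t, r_t, s_{t+1}, done)

def select_action(state):                              # epsilon-greedy policy
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return int(q_net(state).argmax())

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(dim=1).values * (1 - done)   # eq. (7) target
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

print(select_action(torch.zeros(STATE_DIM)))           # pick an action for an all-zero toy state
```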
The natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule.
Specifically, the doctor-patient dialogue model training process is as follows (a code sketch of one simulated episode follows the list):
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent.
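A sketch of one simulated training episode corresponding to steps 4) and 5); the agent interface (reset/act/observe) and the reward values are hypothetical, chosen only to illustrate the success, failure and turn-limit cases.

```python
# Sketch of one simulated episode: the user simulator answers symptom requests from its hidden
# "implicit symptoms" until the agent issues a diagnosis or the turn limit is reached.
MAX_TURNS = 10

def run_episode(agent, user_goal):
    """user_goal: {'disease': ..., 'explicit': [...], 'implicit': {symptom: 1 or -1}}"""
    state = agent.reset(user_goal["explicit"])              # hypothetical agent API
    for _ in range(MAX_TURNS):
        action = agent.act(state)                            # 'request:<symptom>' or 'diagnose:<disease>'
        if action.startswith("diagnose:"):
            success = action.split(":", 1)[1] == user_goal["disease"]
            agent.observe(state, action, reward=20 if success else -10, done=True)
            return success
        sym = action.split(":", 1)[1]
        answer = user_goal["implicit"].get(sym, 0)           # 1 confirm, -1 deny, 0 not sure
        state = agent.observe(state, action, reward=-1, done=False, symptom=(sym, answer))
    return False                                             # turn limit reached: dialogue fails
```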
The doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule. The agent receives user input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page. Specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module: it contains the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit. If the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.
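A sketch of the template filling for the final diagnosis; the template wording and field names are illustrative, not the patent's actual template:

```python
# Template-based natural language generation for the final diagnosis (wording illustrative).
DIAGNOSIS_TEMPLATE = (
    "根据您描述的症状，初步判断为{disease}，建议前往{department}就诊。"
    "可能的诱因：{cause}。就诊前建议：{advice}。注意事项：{contraindication}。"
)
FALLBACK = "暂时无法给出明确判断，建议您及时前往医院就诊。"   # used when the turn limit is reached

def render_diagnosis(result):
    if result is None:
        return FALLBACK
    return DIAGNOSIS_TEMPLATE.format(**result)

print(render_diagnosis({"disease": "上呼吸道感染", "department": "呼吸内科",
                        "cause": "病毒感染", "advice": "多饮水、注意休息",
                        "contraindication": "避免辛辣刺激饮食"}))
```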
The invention has the beneficial effects that:
By simulating the doctor-patient dialogue, the invention mines implicit symptoms from the user's chief complaint and finally performs automatic diagnosis from the user's current symptoms, making the diagnosis result available in real time. Reinforcement learning is used to generate the doctor-patient dialogue automatically, and a medical knowledge graph containing key symptoms and disease-symptom weight relations is used to expand the corpus of confirmed cases, which compensates for inquiry symptoms missing from the training corpus because of limited doctor experience and improves the diagnostic accuracy of the dialogue system. Introducing the medical knowledge graph into the dialogue system and combining it with reinforcement learning provides a basis for diagnosis, making it more persuasive and interpretable. The invention can reduce doctors' repetitive work and their burden, helps allocate medical resources reasonably, effectively alleviates the difficulty and cost of seeking medical care, and enables people to understand their health condition in time and seek treatment appropriately.
Drawings
Fig. 1 is a detailed block diagram of a module according to an embodiment of the present invention.
Fig. 2 is a flowchart of a doctor-patient dialogue model implementation according to an embodiment of the invention.
Fig. 3 is a flow chart of doctor-patient interaction in accordance with an embodiment of the present invention.
FIG. 4 is a flow chart of a module implementation of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, an automatic diagnosis dialogue system assisted by a knowledge graph includes a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module; the data storage management module includes a medical knowledge graph construction submodule, and the doctor-patient dialogue processing module includes a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule. The modules are realized as follows:
the front-end module displays an important window of human-computer interaction, displays user input and corresponding system feedback, and transmits the symptoms input by the user into the doctor-patient dialogue processing module.
The data acquisition module performs data collection with Python web-crawler techniques; the collected content consists of doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation. The data sources include books and literature, Baidu Baike, and professional medical websites such as Chunyu Doctor and Haodf Online.
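A hypothetical sketch of such a crawler using requests and BeautifulSoup; the URL and page structure are placeholders, and each real source would need its own parsing rules and crawl policy:

```python
# Hypothetical crawler sketch for collecting a disease entry (URL and selectors are placeholders).
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_disease_entry(url):
    html = requests.get(url, headers=HEADERS, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text(strip=True)                      # disease name
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]  # symptom description text
    return {"disease": title, "description": "\n".join(paragraphs)}

# entry = fetch_disease_entry("https://example.com/disease/upper-respiratory-infection")
```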
The data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
The medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases. The medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
The knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. The key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency. Specifically:
Symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. First, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description; the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains. Specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
where A is the label-transition matrix and P_{i, y_i} is the BiLSTM score of label y_i at position i, from which the BIO tag value of each entity is obtained.
The TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions.
The knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library.
The disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model. Let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights.
The knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language.
User symptom information table: stores the symptoms extracted from the user's chief complaint.
Medical data selection table: stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease. In this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1.
The symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training. Specifically:
The key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1. According to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained. The two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1; a symptom that is already present is added only once.
The doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
The doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method. Specifically:
The natural language understanding sub-submodule identifies the user intention and performs slot filling. User intentions are classified into four categories: 'ask about a disease', 'confirm a symptom', 'deny a symptom' and 'not sure whether the symptom is present'. A deep-learning-based method is applied to the doctor-patient dialogue corpus, performing named entity recognition and symptom extraction with the Jieba tool, the BERT + BiLSTM + CRF model and BIO sequence labelling. In brief, the first user input is classified as 'ask about a disease'; subsequent inputs are classified by keyword matching and filled into the user symptom information table and the medical data selection table respectively.
The dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, and realizes the dialogue interaction between the system agent and the user. It controls the whole process until a diagnosis result is obtained. Specifically:
A rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed. In each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user.
The DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure.
Specifically:
The data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'. The 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table.
The input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t. Each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better.
The natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule.
Specifically, as shown in fig. 3, the doctor-patient dialogue model training process is as follows:
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent.
The doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule. Referring to fig. 3, the agent receives user input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page. Specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module: it contains the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit. If the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.

Claims (1)

1. A knowledge-graph-aided automatic diagnosis dialogue system, characterized by comprising a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module, wherein:
the front-end module is the main window for human-computer interaction: it displays user input and the corresponding system feedback, and passes the symptoms entered by the user to the doctor-patient dialogue processing module;
the data acquisition module performs data collection with Python web-crawler techniques; the collected content includes doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation;
the data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
the medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases; the medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
the knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling; the key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency; specifically:
symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling; first, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description, and the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains; specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
from which the BIO tag value of each entity is obtained;
the TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions;
the knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library;
the disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model; let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights;
the knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language;
the user symptom information table stores the symptoms extracted from the user's chief complaint;
the medical data selection table stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease; in this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1;
the symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training; specifically:
the key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1; according to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained; the two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1, and a symptom that is already present is added only once;
the doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
the doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method; specifically:
the natural language understanding sub-submodule identifies the user intention and performs slot filling; user intentions are classified into four categories: 'ask about a disease', 'confirm a symptom', 'deny a symptom' and 'not sure whether the symptom is present'; a deep-learning-based method is applied to the doctor-patient dialogue corpus, performing named entity recognition and symptom extraction with the Jieba tool, the BERT + BiLSTM + CRF model and BIO sequence labelling; in brief, the first user input is classified as 'ask about a disease', and subsequent inputs are classified by keyword matching and filled into the user symptom information table and the medical data selection table respectively;
the dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, realizes the dialogue interaction between the system agent and the user, and controls the whole process until a diagnosis result is obtained; specifically:
a rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed; in each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user;
the DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure; specifically:
the data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'; the 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table;
the input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t; each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better;
the natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule;
specifically, the doctor-patient dialogue model training process is as follows:
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent;
the doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule; the agent receives input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page; specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module, containing the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit; if the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.
CN202111036730.3A 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph Pending CN113889259A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111036730.3A CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111036730.3A CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Publications (1)

Publication Number Publication Date
CN113889259A true CN113889259A (en) 2022-01-04

Family

ID=79008277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111036730.3A Pending CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Country Status (1)

Country Link
CN (1) CN113889259A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155962A (en) * 2022-02-10 2022-03-08 北京妙医佳健康科技集团有限公司 Data cleaning method and method for constructing disease diagnosis by using knowledge graph
CN114496234A (en) * 2022-04-18 2022-05-13 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN115482926A (en) * 2022-09-20 2022-12-16 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN116306687A (en) * 2023-05-25 2023-06-23 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116612879A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116936080A (en) * 2023-07-27 2023-10-24 中日友好医院(中日友好临床医学研究所) Preliminary diagnosis guiding method and device based on dialogue and electronic medical record
CN116955579A (en) * 2023-09-21 2023-10-27 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN117194604A (en) * 2023-11-06 2023-12-08 临沂大学 Intelligent medical patient inquiry corpus construction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112632997A (en) * 2020-12-14 2021-04-09 河北工程大学 Chinese entity identification method based on BERT and Word2Vec vector fusion
CN112784051A (en) * 2021-02-05 2021-05-11 北京信息科技大学 Patent term extraction method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112632997A (en) * 2020-12-14 2021-04-09 河北工程大学 Chinese entity identification method based on BERT and Word2Vec vector fusion
CN112784051A (en) * 2021-02-05 2021-05-11 北京信息科技大学 Patent term extraction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN Ying et al., "Research on an automatic question-answering system for depression based on knowledge graph", Journal of Hubei University (Natural Science Edition) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155962A (en) * 2022-02-10 2022-03-08 北京妙医佳健康科技集团有限公司 Data cleaning method and method for constructing disease diagnosis by using knowledge graph
CN114496234A (en) * 2022-04-18 2022-05-13 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
WO2024060508A1 (en) * 2022-09-20 2024-03-28 浙江大学 Knowledge-driven system and method for visualized question-and-answer assisted differential diagnosis of rare disease
CN115482926A (en) * 2022-09-20 2022-12-16 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN115482926B (en) * 2022-09-20 2024-04-09 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN116306687A (en) * 2023-05-25 2023-06-23 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116306687B (en) * 2023-05-25 2023-08-18 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116612879A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116612879B (en) * 2023-07-19 2023-09-26 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116936080A (en) * 2023-07-27 2023-10-24 中日友好医院(中日友好临床医学研究所) Preliminary diagnosis guiding method and device based on dialogue and electronic medical record
CN116775911B (en) * 2023-08-22 2023-11-03 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116955579B (en) * 2023-09-21 2023-12-29 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN116955579A (en) * 2023-09-21 2023-10-27 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN117194604A (en) * 2023-11-06 2023-12-08 临沂大学 Intelligent medical patient inquiry corpus construction method
CN117194604B (en) * 2023-11-06 2024-01-30 临沂大学 Intelligent medical patient inquiry corpus construction method

Similar Documents

Publication Publication Date Title
CN113889259A (en) Automatic diagnosis dialogue system under assistance of knowledge graph
CN110765257B (en) Intelligent consulting system of law of knowledge map driving type
EP3910492A2 (en) Event extraction method and apparatus, and storage medium
CN112507696B (en) Human-computer interaction diagnosis guiding method and system based on global attention intention recognition
CN111708874A (en) Man-machine interaction question-answering method and system based on intelligent complex intention recognition
CN111125309A (en) Natural language processing method and device, computing equipment and storage medium
US11481387B2 (en) Facet-based conversational search
CN102663129A (en) Medical field deep question and answer method and medical retrieval system
CN110517767B (en) Auxiliary diagnosis method, auxiliary diagnosis device, electronic equipment and storage medium
US20200279147A1 (en) Method and apparatus for intelligently recommending object
US8990246B2 (en) Understanding and addressing complex information needs
CN112420151A (en) Method, system, equipment and medium for structured analysis after ultrasonic report
US11947578B2 (en) Method for retrieving multi-turn dialogue, storage medium, and electronic device
CN114153994A (en) Medical insurance information question-answering method and device
CN115714030A (en) Medical question-answering system and method based on pain perception and active interaction
CN117809798A (en) Verification report interpretation method, system, equipment and medium based on large model
CN113705207A (en) Grammar error recognition method and device
GUO et al. Design and implementation of intelligent medical customer service robot based on deep learning
JP2022044016A (en) Automatically recommending existing machine learning project adaptable for use in new machine learning project
Zhang et al. Medical Q&A statement NER based on ECA attention mechanism and lexical enhancement
CN113569124A (en) Medical title matching method, device, equipment and storage medium
CN113704481B (en) Text processing method, device, equipment and storage medium
CN116991982B (en) Interactive dialogue method, device, equipment and storage medium based on artificial intelligence
CN112541056B (en) Medical term standardization method, device, electronic equipment and storage medium
CN117574917A (en) Model generation method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220104

RJ01 Rejection of invention patent application after publication