CN113889259A - Automatic diagnosis dialogue system under assistance of knowledge graph - Google Patents

Automatic diagnosis dialogue system under assistance of knowledge graph

Info

Publication number
CN113889259A
CN113889259A (application CN202111036730.3A)
Authority
CN
China
Prior art keywords
module
symptom
user
doctor
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111036730.3A
Other languages
Chinese (zh)
Inventor
王万良
王媛媛
徐新黎
赵燕伟
尹晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111036730.3A priority Critical patent/CN113889259A/en
Publication of CN113889259A publication Critical patent/CN113889259A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; computer-aided diagnosis, e.g. based on medical expert systems
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G06F 16/3344 - Query execution using natural language analysis
    • G06F 16/3346 - Query execution using probabilistic model
    • G06F 16/35 - Clustering; Classification of unstructured textual data
    • G06F 16/367 - Creation of semantic tools; Ontology
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 40/216 - Parsing using statistical methods
    • G06F 40/295 - Named entity recognition
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 - Neural network learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to an automatic diagnosis dialogue system assisted by a knowledge graph, comprising a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module. The front-end module transmits user input to the doctor-patient dialogue processing module and displays the user input and the corresponding system feedback; the data acquisition module collects doctor-patient dialogue corpora and disease entries; the data storage management module receives data from the data acquisition module, constructs the knowledge graph, and populates a user symptom information table and a medical data selection table; the symptom expansion module expands the training corpus of the doctor-patient dialogue model by querying the knowledge graph for the key symptoms and the most probable symptoms of each disease and writing them into the medical data selection table; the doctor-patient dialogue processing module trains the doctor-patient dialogue model, receives input from the front-end user, uses the model to decide the next action, and generates either a follow-up question or a diagnosis result, which is passed to the front-end module and displayed to the user.

Description

Automatic diagnosis dialogue system under assistance of knowledge graph
Technical Field
The invention relates to a dialogue system, in particular to an automatic diagnosis dialogue system assisted by a knowledge graph.
Background
The digital medical industry is developing rapidly, and the demand for contact-free auxiliary diagnosis technologies such as online consultation is growing quickly. However, current online consultation mainly relies on doctors diagnosing online rather than on intelligent automatic diagnosis: patients often have to wait for a doctor's reply, the diagnosis lacks real-time responsiveness, and the large volume of questions increases the doctors' workload. Researchers have therefore proposed automatic diagnosis dialogue systems based on knowledge graphs.
A knowledge graph is a structured semantic knowledge base that expresses massive amounts of internet information as machine-recognizable semantic representations of the objective world. It has strong semantic expression and storage capabilities and is widely used in chatbots, recommendation systems and the like. Doctors determine a patient's disease from medical knowledge and clinical experience; a knowledge graph is a summary of such human knowledge and experience, and a professional knowledge graph for a specific field can provide strong knowledge support for downstream tasks. Dialogue systems are divided into non-task-oriented chat systems and task-oriented dialogue systems; the latter accomplish a specific task through natural language interaction between computer and user and are widely applied in recommendation systems, booking systems and the like.
Existing disease diagnosis systems have the following problems. Most are simple single-hop question-answering systems that return a diagnosis from a single user input by keyword matching; they ignore the relation between symptoms and diseases, do not match the diagnostic process used in real life, and therefore lack rigor. Models obtained by machine learning and deep learning from large amounts of corpora and cases suffer from the 'black box' problem, so the diagnosis result lacks interpretability and doctors distrust such systems. Training corpora that are too simple or that lack doctor-patient dialogue leave the model with insufficient symptom coverage, so the trained model has low efficiency and accuracy.
Disclosure of Invention
To solve these problems, the invention provides an automatic diagnosis dialogue system assisted by a knowledge graph. It realizes automatic diagnosis of common diseases, learns the diagnostic reasoning of doctors through reinforcement learning to realize automatic doctor-patient dialogue, mines symptoms from user input for disease verification, and simulates the whole consultation process. The invention introduces a medical knowledge graph into the dialogue system, providing a theoretical basis for disease diagnosis and making the result more persuasive and interpretable. The knowledge graph is used for symptom expansion and for completing the training corpus through retrieval and matching, so that common diseases are diagnosed automatically and more accurately, and comprehensive diagnostic advice is given through knowledge graph queries. The technical scheme adopted by the invention is as follows:
a system for automated diagnosis dialog with assistance of a knowledge graph, comprising: the system comprises a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient conversation processing module.
Wherein:
The front-end module is the main window for human-computer interaction: it displays user input and the corresponding system feedback, and passes the symptoms entered by the user to the doctor-patient dialogue processing module.
The data acquisition module performs data collection with Python web-crawler techniques; the collected content consists of doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation.
The data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
The medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases. The medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
The knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. The key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency. Specifically:
Symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. First, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description; the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains. Specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
where A is the label-transition matrix and P_{i, y_i} is the BiLSTM score of label y_i at position i, from which the BIO tag value of each entity is obtained.
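The patent does not publish source code; the following is a minimal sketch of the segmentation-plus-tagging pipeline described above, assuming the jieba, transformers, torch and pytorch-crf packages. The lexicon file, label set and example sentence are illustrative placeholders.

```python
# Minimal sketch of the Jieba + BERT + BiLSTM + CRF symptom tagger (illustrative only).
import jieba
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertModel
from torchcrf import CRF  # pytorch-crf package (assumed available)

jieba.load_userdict("snomed_ct_terms.txt")   # hypothetical file of standard symptom terms
LABELS = ["O", "B-SYM", "I-SYM"]             # BIO tags for symptom entities

class BertBiLstmCrf(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)           # step 1: BERT embeddings
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)  # step 2: BiLSTM
        self.fc = nn.Linear(2 * hidden, len(LABELS))                # per-token label scores
        self.crf = CRF(len(LABELS), batch_first=True)               # step 3: CRF layer

    def forward(self, input_ids, attention_mask, tags=None):
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.fc(self.lstm(emb)[0])
        mask = attention_mask.bool()
        if tags is not None:                                        # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)                # inference: best BIO tag sequence

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
words = jieba.lcut("咳嗽三天并伴有发热")                            # segmentation with the custom lexicon
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
model = BertBiLstmCrf()
print(model(enc["input_ids"], enc["attention_mask"]))               # predicted BIO label ids (untrained)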
The TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions.
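As a small illustration of how equations (2)-(4) can pick key symptoms from segmented disease descriptions (a sketch only; the patent does not specify its TF-IDF implementation):

```python
# Illustrative TF-IDF scoring of candidate symptom terms across disease descriptions (eqs. 2-4).
import math
from collections import Counter

def key_symptoms(descriptions, top_k=3):
    """descriptions: list of token lists, one per disease entry; returns top TF-IDF terms per entry."""
    n_docs = len(descriptions)
    doc_freq = Counter(term for doc in descriptions for term in set(doc))
    results = []
    for doc in descriptions:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / (1 + doc_freq[t])) for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return results

docs = [["咳嗽", "发热", "咳嗽", "乏力"], ["头痛", "发热", "恶心"]]   # toy, pre-segmented descriptions
print(key_symptoms(docs))
```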
The knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library.
The disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model. Let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights.
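The patent does not give the estimation procedure for the two edge weights; a simple assumption is to estimate them from co-occurrence counts in the diagnosed dialogue corpus, as in this illustrative sketch:

```python
# Hypothetical estimation of the two edge weights P(dis|sym) and P(sym|dis) from corpus counts.
from collections import Counter

records = [                                     # toy corpus: (diagnosed disease, symptoms mentioned)
    ("上呼吸道感染", ["咳嗽", "发热"]),
    ("上呼吸道感染", ["咳嗽", "咽痛"]),
    ("偏头痛", ["头痛", "恶心"]),
]
dis_count, sym_count, pair_count = Counter(), Counter(), Counter()
for dis, syms in records:
    dis_count[dis] += 1
    for sym in syms:
        sym_count[sym] += 1
        pair_count[(dis, sym)] += 1

edges = {}
for (dis, sym), c in pair_count.items():
    edges[(dis, sym)] = {
        "P(dis|sym)": c / sym_count[sym],       # weight used when reasoning from symptom to disease
        "P(sym|dis)": c / dis_count[dis],       # weight used when reasoning from disease to symptom
    }
print(edges[("上呼吸道感染", "咳嗽")])
```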
The knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language.
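A sketch of the batch import and a Cypher query, assuming the official neo4j Python driver (v5 API); the connection details, CSV layout, node labels and relationship properties are illustrative, not taken from the patent:

```python
# Sketch of loading disease/symptom data from CSV and querying it with Cypher (names illustrative).
import csv
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_edges(tx, rows):
    tx.run(
        "UNWIND $rows AS r "
        "MERGE (d:Disease {name: r.disease}) "
        "MERGE (s:Symptom {name: r.symptom}) "
        "MERGE (d)-[rel:HAS_SYMPTOM]->(s) "
        "SET rel.p_sym_given_dis = toFloat(r.p_sym_given_dis)",
        rows=rows,
    )

with open("disease_symptom.csv", newline="", encoding="utf-8") as f:   # hypothetical CSV export
    rows = list(csv.DictReader(f))

with driver.session() as session:
    session.execute_write(load_edges, rows)
    top = session.run(
        "MATCH (d:Disease {name: $name})-[r:HAS_SYMPTOM]->(s) "
        "RETURN s.name ORDER BY r.p_sym_given_dis DESC LIMIT 2", name="上呼吸道感染")
    print([rec["s.name"] for rec in top])
```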
User symptom information table: stores the symptoms extracted from the user's chief complaint.
Medical data selection table: stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease. In this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1.
The symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training. Specifically:
The key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1. According to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained. The two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1; a symptom that is already present is added only once.
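A toy walk-through of equations (5) and (6): the graph lookups are replaced by hard-coded dictionaries, and the top-2 selection and duplicate handling follow the rule above. The weight values are illustrative.

```python
# Illustrative symptom expansion following eqs. (5) and (6); graph lookups replaced by dicts.
p_dis_given_sym = {("上呼吸道感染", "咳嗽"): 0.6}          # P(dis|sym) edge weights
p_sym_given_dis = {                                         # P(sym|dis) edge weights
    ("上呼吸道感染", "发热"): 0.7,
    ("上呼吸道感染", "咽痛"): 0.5,
    ("上呼吸道感染", "乏力"): 0.3,
}

def expand(disease, explicit_symptom, selection_table):
    p_prior = 1.0                                            # prior of the user-stated symptom
    p_dis = p_dis_given_sym[(disease, explicit_symptom)] * p_prior                    # eq. (5)
    scored = {s: w * p_dis for (d, s), w in p_sym_given_dis.items() if d == disease}  # eq. (6)
    for sym in sorted(scored, key=scored.get, reverse=True)[:2]:                      # top-2 symptoms
        selection_table.setdefault(sym, 1)                   # mark 1, keep only one copy of duplicates
    return selection_table

print(expand("上呼吸道感染", "咳嗽", {"咳嗽": 1}))
```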
The doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
The doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method. Specifically:
and the natural language understanding grandchild module identifies user intention and slot filling operation. The user intentions are classified into four categories, namely "ask for a disease", "confirm a symptom", "deny a symptom", and "not determine whether or not to have the symptom". The method based on deep learning is adopted for the doctor-patient dialogue corpus, named entity recognition and symptom extraction are carried out by utilizing a Jieba tool, a BERT + BilsTM + CRF model and a BIO sequence marking method, simply, the first input of a user is classified into 'inquiry disease', and the subsequent input is classified according to a keyword matching mode and is respectively filled into a user symptom information table and a medical data selection table.
The dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, and realizes the dialogue interaction between the system agent and the user. It controls the whole process until a diagnosis result is obtained. Specifically:
A rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed. In each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user.
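A minimal sketch of such a rule-based state tracker; the fields kept in s_t are an assumption based on the description above:

```python
# Minimal rule-based state tracker: s_t keeps confirmed/denied symptoms and the latest actions.
class DialogueStateTracker:
    def __init__(self, explicit_symptoms):
        self.state = {"turn": 0, "symptoms": {s: 1 for s in explicit_symptoms},
                      "agent_action": None, "user_action": None}

    def update(self, agent_action, user_action, symptom=None, value=None):
        self.state["turn"] += 1
        self.state["agent_action"] = agent_action
        self.state["user_action"] = user_action
        if symptom is not None:
            self.state["symptoms"][symptom] = value      # 1 confirmed, -1 denied, 0 unknown
        return self.state

tracker = DialogueStateTracker(["咳嗽"])
print(tracker.update("request:发热", "deny_symptom", "发热", -1))
```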
The DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure. Specifically:
The data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'. The 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table.
The input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t. Each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better.
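A schematic DQN update implementing equation (7) with a target network, an ε-greedy policy and an experience pool, written with PyTorch; the network sizes, hyperparameters and state encoding are placeholders rather than the patent's settings.

```python
# Schematic DQN update implementing eq. (7): target = r_t + gamma * max_a' Q'(s_{t+1}, a').
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, EPS = 20, 8, 0.95, 0.1   # toy sizes; real ones depend on the symptom set
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())        # periodically refreshed copy of the Q network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)                           # experience pool of (s_t, a_t, r_t, s_{t+1}, done)

def select_action(state):                              # epsilon-greedy policy
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return int(q_net(state).argmax())

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(dim=1).values * (1 - done)   # eq. (7) target
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

print(select_action(torch.zeros(STATE_DIM)))           # pick an action for an all-zero toy state
```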
The natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule.
Specifically, the doctor-patient dialogue model training process is as follows (a code sketch of one simulated episode follows the list):
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent.
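A sketch of one simulated training episode corresponding to steps 4) and 5); the agent interface (reset/act/observe) and the reward values are hypothetical, chosen only to illustrate the success, failure and turn-limit cases.

```python
# Sketch of one simulated episode: the user simulator answers symptom requests from its hidden
# "implicit symptoms" until the agent issues a diagnosis or the turn limit is reached.
MAX_TURNS = 10

def run_episode(agent, user_goal):
    """user_goal: {'disease': ..., 'explicit': [...], 'implicit': {symptom: 1 or -1}}"""
    state = agent.reset(user_goal["explicit"])              # hypothetical agent API
    for _ in range(MAX_TURNS):
        action = agent.act(state)                            # 'request:<symptom>' or 'diagnose:<disease>'
        if action.startswith("diagnose:"):
            success = action.split(":", 1)[1] == user_goal["disease"]
            agent.observe(state, action, reward=20 if success else -10, done=True)
            return success
        sym = action.split(":", 1)[1]
        answer = user_goal["implicit"].get(sym, 0)           # 1 confirm, -1 deny, 0 not sure
        state = agent.observe(state, action, reward=-1, done=False, symptom=(sym, answer))
    return False                                             # turn limit reached: dialogue fails
```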
The doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule. The agent receives user input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page. Specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module: it contains the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit. If the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.
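A sketch of the template filling for the final diagnosis; the template wording and field names are illustrative, not the patent's actual template:

```python
# Template-based natural language generation for the final diagnosis (wording illustrative).
DIAGNOSIS_TEMPLATE = (
    "根据您描述的症状，初步判断为{disease}，建议前往{department}就诊。"
    "可能的诱因：{cause}。就诊前建议：{advice}。注意事项：{contraindication}。"
)
FALLBACK = "暂时无法给出明确判断，建议您及时前往医院就诊。"   # used when the turn limit is reached

def render_diagnosis(result):
    if result is None:
        return FALLBACK
    return DIAGNOSIS_TEMPLATE.format(**result)

print(render_diagnosis({"disease": "上呼吸道感染", "department": "呼吸内科",
                        "cause": "病毒感染", "advice": "多饮水、注意休息",
                        "contraindication": "避免辛辣刺激饮食"}))
```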
The invention has the beneficial effects that:
By simulating the doctor-patient dialogue, the invention mines implicit symptoms from the user's chief complaint and finally performs automatic diagnosis from the user's current symptoms, making the diagnosis result available in real time. Reinforcement learning is used to generate the doctor-patient dialogue automatically, and a medical knowledge graph containing key symptoms and disease-symptom weight relations is used to expand the corpus of confirmed cases, which compensates for inquiry symptoms missing from the training corpus because of limited doctor experience and improves the diagnostic accuracy of the dialogue system. Introducing the medical knowledge graph into the dialogue system and combining it with reinforcement learning provides a basis for diagnosis, making it more persuasive and interpretable. The invention can reduce doctors' repetitive work and their burden, helps allocate medical resources reasonably, effectively alleviates the difficulty and cost of seeking medical care, and enables people to understand their health condition in time and seek treatment appropriately.
Drawings
Fig. 1 is a detailed block diagram of a module according to an embodiment of the present invention.
Fig. 2 is a flowchart of a doctor-patient dialogue model implementation according to an embodiment of the invention.
Fig. 3 is a flow chart of doctor-patient interaction in accordance with an embodiment of the present invention.
FIG. 4 is a flow chart of a module implementation of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, an automatic diagnosis dialogue system assisted by a knowledge graph includes a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module; the data storage management module includes a medical knowledge graph construction submodule, and the doctor-patient dialogue processing module includes a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule. The modules are realized as follows:
the front-end module displays an important window of human-computer interaction, displays user input and corresponding system feedback, and transmits the symptoms input by the user into the doctor-patient dialogue processing module.
The data acquisition module performs data collection with Python web-crawler techniques; the collected content consists of doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation. The data sources include books and literature, Baidu Baike, and professional medical websites such as Chunyu Doctor and Haodf Online.
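A hypothetical sketch of such a crawler using requests and BeautifulSoup; the URL and page structure are placeholders, and each real source would need its own parsing rules and crawl policy:

```python
# Hypothetical crawler sketch for collecting a disease entry (URL and selectors are placeholders).
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_disease_entry(url):
    html = requests.get(url, headers=HEADERS, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text(strip=True)                      # disease name
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]  # symptom description text
    return {"disease": title, "description": "\n".join(paragraphs)}

# entry = fetch_disease_entry("https://example.com/disease/upper-respiratory-infection")
```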
The data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
The medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases. The medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
The knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. The key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency. Specifically:
Symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling. First, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description; the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains. Specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
where A is the label-transition matrix and P_{i, y_i} is the BiLSTM score of label y_i at position i, from which the BIO tag value of each entity is obtained.
The TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions.
The knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library.
The disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model. Let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights.
The knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language.
User symptom information table: stores the symptoms extracted from the user's chief complaint.
Medical data selection table: stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease. In this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1.
The symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training. Specifically:
The key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1. According to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained. The two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1; a symptom that is already present is added only once.
The doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
The doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method. Specifically:
The natural language understanding sub-submodule identifies the user intention and performs slot filling. User intentions are classified into four categories: 'ask about a disease', 'confirm a symptom', 'deny a symptom' and 'not sure whether the symptom is present'. A deep-learning-based method is applied to the doctor-patient dialogue corpus, performing named entity recognition and symptom extraction with the Jieba tool, the BERT + BiLSTM + CRF model and BIO sequence labelling. In brief, the first user input is classified as 'ask about a disease'; subsequent inputs are classified by keyword matching and filled into the user symptom information table and the medical data selection table respectively.
The dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, and realizes the dialogue interaction between the system agent and the user. It controls the whole process until a diagnosis result is obtained. Specifically:
A rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed. In each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user.
The DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure.
Specifically:
The data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'. The 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table.
The input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t. Each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better.
The natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule.
Specifically, as shown in fig. 3, the doctor-patient dialogue model training process is as follows:
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent.
The doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule. Referring to fig. 3, the agent receives user input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page. Specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module: it contains the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit. If the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.

Claims (1)

1. A knowledge-graph-aided automatic diagnosis dialogue system, characterized by comprising a front-end module, a data acquisition module, a data storage management module, a symptom expansion module and a doctor-patient dialogue processing module, wherein:
the front-end module is the main window for human-computer interaction: it displays user input and the corresponding system feedback, and passes the symptoms entered by the user to the doctor-patient dialogue processing module;
the data acquisition module performs data collection with Python web-crawler techniques; the collected content includes doctor-patient dialogue corpora and disease entries, which are output to the data storage management module for knowledge graph construction and training-sample preparation;
the data storage management module comprises a medical knowledge graph construction submodule, a user symptom information table and a medical data selection table, wherein:
the medical knowledge graph contains entities such as diseases, symptoms, treatment departments, treatment modes and key symptoms, together with weighted relations between symptoms and diseases; the medical knowledge graph construction submodule comprises a knowledge acquisition sub-submodule, a knowledge fusion sub-submodule, a disease-symptom weight calculation sub-submodule and a knowledge storage sub-submodule, wherein:
the knowledge acquisition sub-submodule receives the input of the data acquisition module and applies a deep-learning-based method to the symptom description part of each disease entry, performing named entity recognition and symptom extraction with the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling; the key symptoms of each disease description are obtained with TF-IDF, where TF is the term frequency and IDF is the inverse document frequency; specifically:
symptom recognition on the symptom description part of a disease entry uses the Jieba tool, a BERT + BiLSTM + CRF model and BIO sequence labelling; first, standard terminology sets such as SNOMED CT are added to a dedicated Jieba segmentation lexicon to segment the symptom description, and the segmented sentence is then passed through BERT + BiLSTM + CRF to identify the symptoms it contains; specifically:
1) the segmented sentence is mapped to word vectors with pretrained BERT embeddings, which serve as the input of the subsequent BiLSTM network;
2) for the input sequence X = (x_1, x_2, x_3, ..., x_n), the BiLSTM outputs a prediction sequence y = (y_1, y_2, y_3, ..., y_n); this output, the prediction score of every label for each word of the sentence, is used as the input of the third layer, the CRF layer;
3) the CRF layer defines the scoring function of the sequence
score(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}   (1)
from which the BIO tag value of each entity is obtained;
the TF-IDF formulas are:
TF(t, d) = n_{t,d} / Σ_k n_{k,d}   (2)
IDF(t) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )   (3)
TF-IDF = TF * IDF   (4)
where n_{t,d} is the number of occurrences of term t in description d and |D| is the total number of disease descriptions;
the knowledge fusion sub-submodule maps each disease to its ICD-10 standard code using a Chinese disease-name synonym library keyed to ICD-10 diagnosis codes, and normalizes symptoms with the SNOMED CT standard library;
the disease-symptom weight calculation sub-submodule computes the weighted relation between diseases and symptoms with a probabilistic-graph (Noisy-OR) model; let P(dis|sym) denote the conditional probability of a disease given a symptom and P(sym|dis) the conditional probability of a symptom given a disease; each relational edge connects exactly one symptom and one disease and carries both weights;
the knowledge storage sub-submodule organizes the knowledge as a graph structure, imports the entity data in batches into a Neo4j database from CSV files, and visualizes and queries the knowledge graph with the Cypher language;
the user symptom information table stores the symptoms extracted from the user's chief complaint;
the medical data selection table stores the symptoms that the doctor asks the patient about in the doctor-patient dialogue corpus in order to verify the suspected disease; in this table a symptom denied by the user is marked -1 and a symptom confirmed by the user is marked 1;
the symptom expansion module adds the key symptoms and the most probable symptoms of each disease to the medical data selection table as additional training data, improving the accuracy of model training; specifically:
the key symptoms of the suspected disease are retrieved from the knowledge graph with the Cypher query language, and the prior probability of each explicit symptom stated by the user is set to P_prior(sym) = 1; according to
P(dis) = P(dis|sym) * P_prior(sym)   (5)
the probability of having the disease is obtained, and according to
P(sym) = P(sym|dis) * P(dis)   (6)
the probability of each symptom caused by that disease is obtained; the two symptoms with the largest P(sym) values are added to the medical data selection table with the mark 1, and a symptom that is already present is added only once;
the doctor-patient dialogue processing module comprises a doctor-patient dialogue model training submodule and a doctor-patient dialogue system agent submodule, wherein:
the doctor-patient dialogue model training submodule comprises a natural language understanding sub-submodule, a dialogue management sub-submodule and a natural language generation sub-submodule, and trains the model's decision process with a reinforcement learning method; specifically:
the natural language understanding sub-submodule identifies the user intention and performs slot filling; user intentions are classified into four categories: 'ask about a disease', 'confirm a symptom', 'deny a symptom' and 'not sure whether the symptom is present'; a deep-learning-based method is applied to the doctor-patient dialogue corpus, performing named entity recognition and symptom extraction with the Jieba tool, the BERT + BiLSTM + CRF model and BIO sequence labelling; in brief, the first user input is classified as 'ask about a disease', and subsequent inputs are classified by keyword matching and filled into the user symptom information table and the medical data selection table respectively;
the dialogue management sub-submodule comprises dialogue state tracking and dialogue policy learning, realizes the dialogue interaction between the system agent and the user, and controls the whole process until a diagnosis result is obtained; specifically:
a rule-based dialogue state tracker stores and updates the state of the symptoms after natural language understanding is completed; in each dialogue turn, a state s_t saves the actions of previous turns, the known symptoms, and the current-turn information of both the agent and the user;
the DQN network is trained by letting a user simulator and the system agent simulate the task-driven dialogue process: when the doctor-patient dialogue model makes a correct diagnosis, the dialogue terminates successfully; when the diagnosis is wrong or the number of dialogue turns reaches a preset limit, the dialogue terminates as a failure; specifically:
the data in the user simulator comes from the data storage management module and maintains the user goal of each dialogue, consisting of a 'diagnosed disease', 'explicit symptoms' and 'implicit symptoms'; the 'diagnosed disease' is the disease finally confirmed in the current dialogue, the 'explicit symptoms' come from the user symptom information table, and the 'implicit symptoms' come from the medical data selection table;
the input of the DQN is the current dialogue state s_t, and its output Q(s_t, a_t; θ) is the discounted cumulative reward obtained by selecting action a_t in the current state:
Q(s_t, a_t; θ) = r_t + γ max_{a'} Q'(s_{t+1}, a'; θ')   (7)
where θ' are the parameters of the target network, γ is the discount factor, and r_t is the immediate reward for the system action taken in the current dialogue state s_t; each stage is trained with an ε-greedy strategy, and the experience of each time step e_t = (s_t, a_t, r_t, s_{t+1}) is stored in an experience pool, which is updated whenever the network performs better;
the natural language generation sub-submodule generates, from templates, either a natural-language question to the user or the final diagnosis result, according to the action decided by the dialogue management sub-submodule;
specifically, the doctor-patient dialogue model training process is as follows:
1) the data acquisition module collects disease entries and doctor-patient dialogue corpora with a crawler and passes them to the data storage management module;
2) the data storage management module performs named entity recognition on the corpora, constructs the knowledge graph from the disease entries, and fills the user symptom information table and the medical data selection table from the doctor-patient dialogue corpora;
3) the symptom expansion module queries the knowledge graph in Cypher, using the diagnosed disease name of each dialogue, to obtain the key symptoms and the most probable symptoms of the disease and expands the medical data selection table;
4) a user simulator is built from the diagnosed-disease labels of the dialogue corpora, the user symptom information table and the medical data selection table;
5) the user simulator and the system agent simulate the task-driven dialogue process to train the DQN network for dialogue decision learning, yielding an automatic dialogue system model that serves as the doctor-patient dialogue system agent;
the doctor-patient dialogue system agent submodule is the doctor-patient dialogue model obtained by the doctor-patient dialogue model training submodule; the agent receives input from the front-end module and keeps generating the next dialogue turn automatically according to the user's feedback until a diagnosis result is obtained, which is displayed to the user on the front-end page; specifically, if a disease can be diagnosed, a diagnosis template is filled in automatically and passed to the front-end module, containing the name of the diagnosed disease, the recommended department, the probable causes, and the recommended relief measures and contraindications before the visit; if the number of dialogue turns reaches the fixed limit, a fixed sentence is output reminding the user to go to a hospital in time.
CN202111036730.3A 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph Pending CN113889259A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111036730.3A CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111036730.3A CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Publications (1)

Publication Number Publication Date
CN113889259A true CN113889259A (en) 2022-01-04

Family

ID=79008277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111036730.3A Pending CN113889259A (en) 2021-09-06 2021-09-06 Automatic diagnosis dialogue system under assistance of knowledge graph

Country Status (1)

Country Link
CN (1) CN113889259A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155962A (en) * 2022-02-10 2022-03-08 北京妙医佳健康科技集团有限公司 Data cleaning method and method for constructing disease diagnosis by using knowledge graph
CN114496234A (en) * 2022-04-18 2022-05-13 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN115482926A (en) * 2022-09-20 2022-12-16 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN116306687A (en) * 2023-05-25 2023-06-23 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116612879A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116936080A (en) * 2023-07-27 2023-10-24 中日友好医院(中日友好临床医学研究所) Preliminary diagnosis guiding method and device based on dialogue and electronic medical record
CN116955579A (en) * 2023-09-21 2023-10-27 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN117194604A (en) * 2023-11-06 2023-12-08 临沂大学 Intelligent medical patient inquiry corpus construction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112632997A (en) * 2020-12-14 2021-04-09 河北工程大学 Chinese entity identification method based on BERT and Word2Vec vector fusion
CN112784051A (en) * 2021-02-05 2021-05-11 北京信息科技大学 Patent term extraction method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN112632997A (en) * 2020-12-14 2021-04-09 河北工程大学 Chinese entity identification method based on BERT and Word2Vec vector fusion
CN112784051A (en) * 2021-02-05 2021-05-11 北京信息科技大学 Patent term extraction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN Ying et al., "Research on an automatic question-answering system for depression based on knowledge graph", Journal of Hubei University (Natural Science Edition) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155962A (en) * 2022-02-10 2022-03-08 北京妙医佳健康科技集团有限公司 Data cleaning method and method for constructing disease diagnosis by using knowledge graph
CN114496234A (en) * 2022-04-18 2022-05-13 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
WO2024060508A1 (en) * 2022-09-20 2024-03-28 浙江大学 Knowledge-driven system and method for visualized question-and-answer assisted differential diagnosis of rare disease
CN115482926A (en) * 2022-09-20 2022-12-16 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN115482926B (en) * 2022-09-20 2024-04-09 浙江大学 Knowledge-driven rare disease visual question-answer type auxiliary differential diagnosis system and method
CN116306687A (en) * 2023-05-25 2023-06-23 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116306687B (en) * 2023-05-25 2023-08-18 北京梆梆安全科技有限公司 Medical consultation platform self-detection system and medical consultation platform
CN116612879A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116612879B (en) * 2023-07-19 2023-09-26 北京惠每云科技有限公司 Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium
CN116936080A (en) * 2023-07-27 2023-10-24 中日友好医院(中日友好临床医学研究所) Preliminary diagnosis guiding method and device based on dialogue and electronic medical record
CN116775911B (en) * 2023-08-22 2023-11-03 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116955579B (en) * 2023-09-21 2023-12-29 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN116955579A (en) * 2023-09-21 2023-10-27 武汉轻度科技有限公司 Chat reply generation method and device based on keyword knowledge retrieval
CN117194604A (en) * 2023-11-06 2023-12-08 临沂大学 Intelligent medical patient inquiry corpus construction method
CN117194604B (en) * 2023-11-06 2024-01-30 临沂大学 Intelligent medical patient inquiry corpus construction method

Similar Documents

Publication Publication Date Title
CN113889259A (en) Automatic diagnosis dialogue system under assistance of knowledge graph
CN110765257B (en) Intelligent consulting system of law of knowledge map driving type
EP3910492A2 (en) Event extraction method and apparatus, and storage medium
CN112507696B (en) Human-computer interaction diagnosis guiding method and system based on global attention intention recognition
CN111708874A (en) Man-machine interaction question-answering method and system based on intelligent complex intention recognition
CN111125309A (en) Natural language processing method and device, computing equipment and storage medium
US11481387B2 (en) Facet-based conversational search
CN102663129A (en) Medical field deep question and answer method and medical retrieval system
CN110517767B (en) Auxiliary diagnosis method, auxiliary diagnosis device, electronic equipment and storage medium
US20200279147A1 (en) Method and apparatus for intelligently recommending object
US8990246B2 (en) Understanding and addressing complex information needs
CN112420151A (en) Method, system, equipment and medium for structured analysis after ultrasonic report
US11947578B2 (en) Method for retrieving multi-turn dialogue, storage medium, and electronic device
CN114153994A (en) Medical insurance information question-answering method and device
CN115714030A (en) Medical question-answering system and method based on pain perception and active interaction
CN117809798A (en) Verification report interpretation method, system, equipment and medium based on large model
CN113705207A (en) Grammar error recognition method and device
GUO et al. Design and implementation of intelligent medical customer service robot based on deep learning
JP2022044016A (en) Automatically recommending existing machine learning project adaptable for use in new machine learning project
Zhang et al. Medical Q&A statement NER based on ECA attention mechanism and lexical enhancement
CN113569124A (en) Medical title matching method, device, equipment and storage medium
CN113704481B (en) Text processing method, device, equipment and storage medium
CN116991982B (en) Interactive dialogue method, device, equipment and storage medium based on artificial intelligence
CN112541056B (en) Medical term standardization method, device, electronic equipment and storage medium
CN117574917A (en) Model generation method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220104

RJ01 Rejection of invention patent application after publication