CN115565655A - Enhanced auxiliary inquiry method - Google Patents


Info

Publication number
CN115565655A
Authority
CN
China
Prior art keywords
doctor
symptom
model
patient
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211233086.3A
Other languages
Chinese (zh)
Inventor
胡松
张云
林钰久
朱嘉静
李巧勤
傅翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202211233086.3A
Publication of CN115565655A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the technical field of intelligent triage, and particularly relates to an enhanced auxiliary inquiry method. The technical scheme combines three prediction modules. The clustering model draws on historical doctor-patient conversation data and uses the doctor's diagnosis as the pre-diagnosis result, which improves reliability and scientific soundness; the body node symptom graph matches the patient's colloquial descriptions more closely, so symptom information can be obtained more accurately; the neural-network-based disease prediction model accepts a wider range of input descriptions and therefore has broader coverage. The results of the three modules are integrated and the final pre-diagnosis result is produced by voting, which can improve the accuracy and scientific soundness of the diagnosis.

Description

Enhanced auxiliary inquiry method
Technical Field
The invention belongs to the technical field of intelligent triage, and particularly relates to an enhanced auxiliary inquiry method.
Background
At present, seeing a doctor is both difficult and expensive for ordinary people, which is a prominent problem in the medical system. Even for minor problems, such as a child's cold, fever or cough, a person often has to go to a hospital. Lacking professional medical knowledge, the patient may register with the wrong department; even after registering successfully, the patient may wait several hours to be seen, or may fail to obtain a registration at all. After the long wait, the consultation itself lasts only a few minutes. So even a common ailment usually takes half a day to a full day, while the actual inquiry time is only a few minutes.
Because there are so many patients, the doctor cannot spend much time on each one and has to complete each inquiry within a few minutes. In such a short time it is difficult to conduct a careful and patient inquiry. Since most patients have only minor everyday illnesses, doctors spend most of the day asking the same simple questions over and over.
In order to reduce the time taken for patient registration and visit and reduce the workload of doctors, various auxiliary diagnosis systems based on techniques such as artificial intelligence and data mining have been developed. Chinese patent publication No. CN109166622 discloses a disease pre-diagnosis system based on knowledge-graph, which includes a database and a pre-diagnosis subsystem, wherein the database stores medical knowledge-graph including disease name and symptom information obtained from a third party; the pre-diagnosis subsystem comprises an inquiry module, a screening module and an output module, wherein the inquiry module inquires disease data from a database after receiving symptom information of a patient, the screening module is used for receiving the symptom screening information and the disease data from the inquiry module, screening the received disease data according to the received symptom screening information, sequentially generating suspected disease sets according to screening results, and the output module outputs and displays the suspected disease sets.
Chinese patent "CN104102816 automatic diagnosis system and method based on symptom matching and machine learning" provides a disease/symptom database for storing each known disease/symptom and its corresponding symptom; the user interaction module is used for receiving a symptom keyword set input by a user; the symptom matching module is used for matching the symptom keyword set input by the user with symptoms in the disease/symptom database and calculating the matching degree of the symptom keyword set and each disease/symptom; and the diagnosis module is used for determining corresponding diseases/symptoms according to the matching degree of the symptom keyword set and each disease/symptom.
The system of Chinese patent CN109166622 (disease pre-diagnosis system based on a knowledge graph) depends entirely on the data already in the database; if some symptom information does not appear in the database or knowledge graph, the pre-diagnosis output cannot be completed.
The system of Chinese patent CN104102816 (automatic diagnosis system and method based on symptom matching and machine learning) matches the symptom keyword set input by the user against the symptoms in the disease/symptom database. Its disadvantage is that patients mostly describe their symptoms colloquially, in terms that differ greatly from standard medical terminology, which reduces the final accuracy.
Disclosure of Invention
Aiming at the problems, the invention provides an enhanced auxiliary inquiry method, which improves the accuracy of auxiliary inquiry, saves the time for patients to queue for seeing a doctor and reduces the workload of doctors.
The technical scheme of the invention is as follows:
An enhanced auxiliary inquiry method, comprising the following steps:
S1, acquiring the patient's self-described information;
S2, preprocessing the acquired self-described information and inputting it into a clustering model for matching to obtain a cluster matching result; the clustering model is a cluster-based headword matching model, constructed as follows:
first, constructing a doctor-patient conversation database: collecting doctor-patient conversation data and, after preprocessing, retaining the patient's description of his or her own symptoms, the doctor's questions about the symptoms, and the doctor's diagnosis results and suggestions, to establish a doctor-patient conversation data set D = {Q, A}, where Q represents the patient's description of the symptoms and A represents the doctor's answers, diagnosis results and suggestions; based on the data set D, dividing similar doctor-patient conversation data into subsets D1, D2, …, Dn using a cosine similarity algorithm, where Di stores doctor-patient conversation data of the same or similar cases and n represents the number of disease types involved in the conversation data set; and establishing a doctor-patient conversation information database from the divided conversation data set D = {D1, D2, D3, …, Dn};
dividing the doctor's answers within the same subset of the doctor-patient conversation database into short sentences, and then performing N-gram mining to obtain fragments of N consecutive segments;
merging and filtering the obtained fragments and then clustering them so that answers for the same type of disease form an answer set, and selecting the longest sentence in the answer set as the center A of the set to obtain a clustering model {Q', A}; where Q' = {q1', q2', q3', …} is obtained as follows: let the subset of similar patient self-statements corresponding to the disease type and treatment recommendation given by the doctor in center A be Q = {q1, q2, q3, …}, where qi is the self-statement of a particular patient,
Figure BDA0003882346170000021
by extracting the keywords of each element in Q to obtain Q', and then calculating the $\mathrm{TF\text{-}IDF}_{Gram}$ feature values, a representation $f_i^{tfidf}$ of each $q_i'$ is obtained, where the subscript Gram indicates that the term-frequency statistics count each N-gram fragment as one word;
the clustering model is matched as follows: denote the acquired self-described information as $q_{new}$ and calculate its $\mathrm{TF\text{-}IDF}_{Gram}$ feature $f_{new}^{tfidf}$; for the representation $f_i^{tfidf}$ of each $q_i'$, calculate the similarity to $f_{new}^{tfidf}$:
$$\mathrm{Sim}(q_i', q_{new}) = \frac{f_i^{tfidf} \cdot f_{new}^{tfidf}}{\lVert f_i^{tfidf} \rVert \, \lVert f_{new}^{tfidf} \rVert}$$
where $\lVert x \rVert$ denotes the norm (length) of x; the question $q_{max}'$ with the highest similarity is selected, and the center $A_i^*$ of the question set to which $q_{max}'$ belongs is taken as the cluster matching prediction result;
obtaining a first prediction result R1 and a first matching degree V1 after the self-described information passes through the clustering model, where R1 is the disease type given by the corresponding doctor in the matched cluster center A, and V1 is the normalized value output by the clustering model;
S3, judging whether the first matching degree V1 is greater than a set first threshold; if so, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, proceeding to step S4;
S4, inputting the self-described information into a body node symptom graph matching model for matching to obtain a symptom graph matching result; the body node symptom graph model is constructed as follows:
dividing the human body into a plurality of parts, including the head, neck, chest, abdomen, hip, left upper limb, right upper limb, left lower limb and right lower limb; and dividing each part into a plurality of different nodes to obtain a body node graph;
using the doctor-patient conversation database constructed in S2, collecting different patients' descriptions for each part to construct a description set K, removing sentences whose similarity is higher than 90% using a cosine similarity algorithm, and retaining the longest, most informative description to obtain a set K';
labeling the descriptions in the set K' with the corresponding professional medical description or medical symptom;
matching the labeled description set K' to the divided body node graph to obtain the body node symptom graph model, such that each node contains a set of colloquial descriptions from multiple patients of the common symptoms of that body part together with the doctor's professional symptom descriptions, and is connected to the diseases that include those symptoms;
the body node symptom graph model is matched as follows: extracting body part information from the self-described information, matching the colloquial symptom description against the colloquial description set of the corresponding part nodes in the body node symptom graph model to obtain the corresponding professional medical symptoms, obtaining the corresponding diseases from the symptoms, and taking the disease type that occurs most often as the symptom graph matching result;
obtaining a second prediction result R2 and a second matching degree V2 after the self-described information passes through the body node symptom graph matching model, where V2 is the normalized value output by the body node symptom graph matching model;
S5, judging whether the first prediction result R1 is the same as the second prediction result R2; if so, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, judging whether the second matching degree V2 is greater than a second threshold; if so, taking the second prediction result R2 as the final result R and proceeding to step S8; otherwise, proceeding to step S6;
S6, converting the self-described information into a vector and inputting it into a trained disease prediction model to obtain a predicted disease type; the trained disease prediction model is obtained as follows:
automatically extracting, from the collected original doctor-patient conversation data, the patient's self-description and the corresponding doctor's disease information, treatment medication and diagnosis suggestions, using natural language processing and keyword extraction techniques;
performing semantic feature representation on the extracted patient self-description and the doctor's diagnosis results and treatment suggestions, and converting them into a one-hot style 0/1 vector to be used as a training sample, where the code length equals the number of disease-related entities, 0 indicates that the entity at that position is not mentioned, and 1 indicates that the entity at that position is mentioned; and setting the disease type given by the doctor as the label of the sample;
constructing a disease prediction model using a DQN neural network model, inputting the vector-encoded training samples into the model, fitting, outputting a prediction result, calculating the loss, adjusting the parameters through iterative fitting, and stopping training when the loss on the validation set no longer changes, to obtain the disease prediction model based on doctor-patient conversation data;
the trained disease prediction model predicts as follows: converting the acquired self-described information into a 0/1 code, inputting it into the trained model, and outputting the probability of each disease, where the disease type with the highest probability is the final prediction result;
obtaining a third prediction result R3 and a third matching degree V3 after the self-described information passes through the trained disease prediction model, where V3 is the normalized value output by the disease prediction model;
S7, integrating the first prediction result R1, the second prediction result R2 and the third prediction result R3 and voting according to the matching degree, specifically: if two of the results are the same, that is, if R3 and R1 are the same, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, if R3 and R2 are the same, taking the second prediction result R2 as the final result R and proceeding to step S8; otherwise, selecting as the final result R the prediction with the largest weighted score among V1 × 45% (for R1), V2 × 30% (for R2) and V3 × 25% (for R3), where V1, V2 and V3 are the matching values output by the models for R1, R2 and R3 respectively, and proceeding to step S8;
S8, outputting the final result R, and recommending departments and treatment suggestions according to R.
The normalization calculation mode of the model in the scheme is generally as follows:
Figure BDA0003882346170000041
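The cluster matching of step S2 can be illustrated with a minimal Python sketch. It assumes that each stored question q' has already been reduced to a list of keywords or N-gram fragments, uses a smoothed IDF variant (the source does not specify which TF-IDF variant is used), and returns the raw cosine similarity without the normalization described above. The function names `tfidf_vectors`, `weigh`, `cosine` and `match` are hypothetical.

```python
import math
from collections import Counter


def weigh(tokens, idf):
    """Sparse TF-IDF vector of one tokenized text; terms unknown to idf are dropped."""
    tf = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: an assumption, not taken from the source.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weigh(doc, idf) for doc in docs], idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match(q_new, questions, centers):
    """Return the center of the most similar stored question and its similarity."""
    vectors, idf = tfidf_vectors(questions)
    f_new = weigh(q_new, idf)
    scores = [cosine(f, f_new) for f in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return centers[best], scores[best]


# Toy data, invented for illustration only.
questions = [
    ["stomach", "pain", "nausea"],
    ["cough", "fever", "sore", "throat"],
    ["headache", "dizzy"],
]
centers = ["gastritis", "common cold", "migraine"]
center, score = match(["fever", "cough"], questions, centers)
```

In practice a library vectorizer could replace the hand-written TF-IDF; the pure-Python version is used here only to keep the sketch self-contained.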
The beneficial effects of the invention are as follows:
Compared with the prior art, the technical scheme of the invention combines three prediction modules. The clustering model draws on historical doctor-patient conversation data and uses the doctor's diagnosis as the pre-diagnosis result, which improves reliability and scientific soundness; the body node symptom graph matches the patient's colloquial descriptions more closely, so symptom information can be obtained more accurately; the neural-network-based disease prediction model accepts a wider range of input descriptions and therefore has broader coverage. The results of the three modules are integrated and the final pre-diagnosis result is produced by voting, which can improve the accuracy and scientific soundness of the diagnosis.
Drawings
FIG. 1 is a flow chart of the process of the present invention.
FIG. 2 is a schematic diagram of a clustering model according to the present invention.
FIG. 3 is a process of constructing a disease prediction model according to the present invention.
Fig. 4 is a diagram of body node symptoms in the present invention.
FIG. 5 is a partial node symptom graph of the present invention.
Detailed Description
The present invention is described in detail below with reference to the attached drawings.
Fig. 1 is the main flow chart of the technical solution of the present invention, which mainly includes: preprocessing doctor-patient conversation data, constructing a local doctor-patient conversation database, constructing a cluster-based headword matching model, constructing a disease prediction model, constructing a body node symptom graph, and generating an inquiry result.
The construction of each model is described in detail below.
(1) Doctor-patient session data collection and pre-processing
a) Doctor-patient conversation data are collected online (from the Dingxiangyuan, Chunyu Doctor and Haodf Online websites) and offline at hospitals.
b) Preprocessing the conversation data: irrelevant sentences such as greetings are removed, and the patient's description of his or her own symptoms, the doctor's questions about the symptoms, and the doctor's diagnosis results and suggestions are retained, giving a doctor-patient conversation data set D = {Q, A}, where Q represents the patient's description of the symptoms and A represents the doctor's answers, diagnosis results and suggestions.
(2) Constructing local doctor-patient conversation database
a) Dividing the set D obtained in step (1) into smaller subsets D1, D2, …, Dn using a cosine similarity algorithm, where Di stores doctor-patient conversation data of the same or similar cases and n represents the number of disease types involved in the conversation data set. Example: doctor-patient conversation data related to stomach illness are grouped into the same category, as follows:
Figure BDA0003882346170000061
b) Storing the divided conversation data set D = {D1, D2, D3, …, Dn} in a MySQL database to establish a local doctor-patient conversation information database.
(3) Construction of clustering-based headword matching model
a) Splitting each sentence of the doctor's answers within the same subset of the database obtained in step (2) at commas, periods, semicolons and all other pause punctuation; for example, splitting a sentence S yields its segment set S = {a1, a2, a3, …, am}, where a1 denotes the 1st segment of S and am denotes the m-th segment.
b) Performing N-gram mining on the segments obtained in step a); to keep each fragment meaningful, N takes the values |S|, |S|-1, |S|-2, |S|-3, …, 2 in turn, where |S| is the length of the sentence (its number of segments), and a sliding window over the segmented text outputs every run of N consecutive segments, generating the segment N-grams of each answer;
Example:
if the sentence S contains 5 segments, S is first taken as one 5-gram: S = {a1, a2, a3, a4, a5}; it is then divided into two 4-grams: {a1, a2, a3, a4} and {a2, a3, a4, a5}; the two 4-grams are then divided into 3-grams: {a1, a2, a3}, {a2, a3, a4} from the first and {a2, a3, a4}, {a3, a4, a5} from the second; and so on.
c) Next, merging and counting identical n-gram fragments and filtering out fragments that occur fewer than 3 times, since such fragments are not considered frequent; in addition, if one fragment is contained in another, longer fragment, removing the shorter one;
d) For the merged and filtered n-grams, placing two or more overlapping fragments into one answer cluster set C, such as a3a4a5 and a1a2a3a4 in the cluster diagram, so that the answers for the same kind of disease finally form one answer set;
e) Finding the center of the cluster set C: first selecting the sentences that contain the most common fragments, and then selecting the longest of those sentences as the center of the whole set C; for example, in the figure, a3a4a5 and a1a2a3a4 fall into the same cluster set, and the longest, $A_i^*$ = a1a2a3a4, is selected as the center. The content of center A contains the disease type and treatment recommendations given by the doctor and corresponds to a subset of similar patient self-statements Q = {q1, q2, q3, …}, where qi is the self-statement of a particular patient,
Figure BDA0003882346170000071
f) For Q = {q1, q2, q3, …}, extracting the keywords of each q using the TextRank method in the jieba package, removing modal particles, stop words and the like and keeping only key nouns, adjectives and verbs, to obtain Q' = {q1', q2', q3', …}.
g) For Q' = {q1', q2', q3', …}, calculating the $\mathrm{TF\text{-}IDF}_{Gram}$ feature value of each $q_i'$ (here the term-frequency statistics count each N-gram fragment as one word) to obtain $f_i^{tfidf}$ as the representation of $q_i'$;
h) When a new question arrives from a patient, denoting it $q_{new}$ and calculating its $\mathrm{TF\text{-}IDF}_{Gram}$ feature $f_{new}^{tfidf}$;
i) Similarity calculation: for the representation $f_i^{tfidf}$ of each $q_i'$ and the feature $f_{new}^{tfidf}$ of $q_{new}$, the similarity is calculated as:
$$\mathrm{Sim}(q_i', q_{new}) = \frac{f_i^{tfidf} \cdot f_{new}^{tfidf}}{\lVert f_i^{tfidf} \rVert \, \lVert f_{new}^{tfidf} \rVert}$$
where $\lVert x \rVert$ denotes the norm (length) of x. The question $q_{max}'$ with the highest similarity is selected, and the answer of the center $A_i^*$ of the question set to which $q_{max}'$ belongs is used as a candidate answer, which the system evaluates together with the outputs of the other modules before producing the final result.
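Steps a) to c) above (segment splitting, N-gram mining and frequency filtering) can be sketched in Python as follows. This is an illustration under assumptions rather than the patented implementation: the punctuation set is a guess at "all sentence pause symbols", each N-gram is counted at most once per answer, and the function names are hypothetical.

```python
import re
from collections import Counter


def split_segments(answer):
    """Step a): split an answer at ASCII and Chinese pause punctuation."""
    parts = re.split(r"[,.;!?，。；！？、]", answer)
    return [p.strip() for p in parts if p.strip()]


def segment_ngrams(segments):
    """Step b): every run of N consecutive segments, for N = |S| down to 2."""
    grams = []
    for n in range(len(segments), 1, -1):
        for start in range(len(segments) - n + 1):
            grams.append(tuple(segments[start:start + n]))
    return grams


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def mine_fragments(answers, min_count=3):
    """Step c): keep fragments seen at least min_count times, drop contained ones."""
    counts = Counter()
    for answer in answers:
        # Counting once per answer is an assumption, not stated in the source.
        counts.update(set(segment_ngrams(split_segments(answer))))
    frequent = [g for g, c in counts.items() if c >= min_count]
    return [
        g for g in frequent
        if not any(len(h) > len(g) and contains(h, g) for h in frequent)
    ]


# Toy answers, invented for illustration only.
answers = ["rest well, drink water, take medicine"] * 3
answers.append("drink water, take medicine, see doctor")
fragments = mine_fragments(answers)
```

For a five-segment sentence, `segment_ngrams` yields one 5-gram, two 4-grams, three 3-grams and four 2-grams, in line with the worked example in step b).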
(4) Construction of disease prediction models
The model building process is shown in fig. 3. The method comprises the following specific steps:
a) Automatically extracting, from the collected original conversation data, the patient's self-description and the corresponding doctor's disease information, treatment medication and diagnosis suggestions, using natural language processing and keyword extraction techniques;
b) Performing semantic feature representation on the patient self-description and the doctor's diagnosis results and treatment suggestions extracted in the previous step, and converting them into a one-hot style 0/1 vector, such as {0, 1, 0, …}, to be used as a training sample, where the code length equals the number of disease-related entities, 0 indicates that the entity at that position is not mentioned, and 1 indicates that the entity at that position is mentioned; the disease type given by the doctor is set as the label of the sample;
c) Constructing a disease prediction model using a DQN neural network model, inputting the vector-encoded training samples into the model, fitting, outputting a prediction result, calculating the loss, adjusting the parameters through iterative fitting, and stopping training when the loss on the validation set no longer changes, to obtain the disease prediction model based on doctor-patient conversation data.
d) Converting the symptom information of the patient to be predicted into a 0/1 code, inputting it into the trained model, and outputting the probability of each disease, where the disease type with the highest probability is the final prediction result.
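The 0/1 encoding of steps b) and d) can be sketched as below. The sketch covers only the encoding and the final arg-max over disease probabilities, not the DQN model itself. Detecting a mention by substring search and the example entity list are assumptions made for illustration; the source relies on NLP and keyword extraction for this step.

```python
def encode_mentions(text, entities):
    """Return a 0/1 vector with one position per known entity (1 = mentioned)."""
    return [1 if entity in text else 0 for entity in entities]


def pick_disease(probabilities, disease_types):
    """Return the disease type with the highest predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return disease_types[best], probabilities[best]


# Hypothetical entity vocabulary and model output, for illustration only.
entities = ["cough", "fever", "nausea", "headache"]
vector = encode_mentions("I have a fever and a bad cough", entities)
disease, probability = pick_disease([0.1, 0.7, 0.2], ["gastritis", "cold", "migraine"])
```

The vector length always equals the size of the entity vocabulary, so the same vocabulary must be used at training time and at prediction time.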
(5) Constructing a body node symptom graph (examples are shown in figure 4 and figure 5)
Most patients describe their symptoms with reference to a specific part of the body. Patients' medical knowledge varies widely, so many descriptions are colloquial: the wording used for the same symptom differs greatly from patient to patient and does not follow standard medical terminology, which makes the symptom information difficult to extract.
Therefore, the scheme provides a symptom graph that takes body parts as nodes and adds colloquial descriptions of symptom characteristics to the description set of each body node. Each set contains both colloquial descriptions and professional medical symptom descriptions, so that the patient's self-description can be matched directly to a specific part of the symptom graph.
The construction process is as follows:
1) Dividing the entire body into parts, including: head, neck, chest, abdomen, buttocks, left upper limb, right upper limb, left lower limb and right lower limb; each part is then divided more finely, for example the head comprises: the crown, forehead, eyes, nose, mouth, ears, chin and back of the head.
2) Using the local database constructed in step (2), collecting the descriptions given by different patients for each part to form the description set K of that part, then removing statements whose similarity is higher than 90% using a cosine similarity algorithm and retaining the longest, most informative description to obtain a set K', and adding the description set K' to the body node symptom graph as an attribute so that later input can be matched to a specific part;
3) Having a professional physician label each description set with the corresponding professional medical description or medical symptom;
4) The resulting body node symptom graph includes nodes for a plurality of body parts. Each node contains a set of colloquial descriptions from multiple patients of the common symptoms of that part, together with the physician's professional symptom descriptions, and is connected to the diseases that include those symptoms.
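A body node symptom graph of this kind, and a lookup against it, might be represented as in the following Python sketch. The graph contents, the keyword-based body part detection, the substring matching of colloquial (spoken) descriptions and the vote-share score are all assumptions made for illustration; the source does not specify how V2 is computed beyond saying that it is a normalized value.

```python
from collections import Counter

# Hypothetical toy graph: body part -> detection keywords and symptom nodes.
GRAPH = {
    "abdomen": {
        "keywords": ["stomach", "belly", "tummy"],
        "nodes": [
            {"spoken": ["stomach hurts", "tummy ache"],
             "symptom": "abdominal pain",
             "diseases": ["gastritis", "gastric ulcer"]},
            {"spoken": ["feel like throwing up"],
             "symptom": "nausea",
             "diseases": ["gastritis"]},
        ],
    },
    "head": {
        "keywords": ["head"],
        "nodes": [
            {"spoken": ["head is pounding"],
             "symptom": "headache",
             "diseases": ["migraine"]},
        ],
    },
}


def match_symptom_graph(statement, graph):
    """Return (most frequent disease, vote share, matched professional symptoms)."""
    text = statement.lower()
    votes = Counter()
    symptoms = []
    for part in graph.values():
        # Only search the nodes of body parts mentioned in the statement.
        if not any(keyword in text for keyword in part["keywords"]):
            continue
        for node in part["nodes"]:
            if any(phrase in text for phrase in node["spoken"]):
                symptoms.append(node["symptom"])
                votes.update(node["diseases"])
    if not votes:
        return None, 0.0, symptoms
    disease, count = votes.most_common(1)[0]
    return disease, count / sum(votes.values()), symptoms


result = match_symptom_graph("My stomach hurts and I feel like throwing up", GRAPH)
```

Keeping the colloquial phrases on the node, next to the professional symptom name, is what lets a patient's everyday wording resolve to a standard symptom without a separate terminology mapping step.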
(6) Generating an interrogation result
a) The system acquires the self-described symptom information input by the patient;
b) The symptom information is preprocessed to remove irrelevant sentences;
c) The preprocessed data are input into the clustering model to obtain a first prediction result R1 and a first matching degree V1, where R1 is the disease type given by the corresponding doctor in the matched cluster center A. If V1 reaches a first threshold T1 (85%), the first prediction result R1 is output as the final result R; otherwise, the process continues with step d);
d) The patient's symptom description is input into the body node symptom graph matching module, which extracts the body part information, matches the colloquial symptom description against the colloquial description set of the corresponding part nodes in the body node symptom graph to obtain the corresponding professional medical symptoms, and obtains the corresponding diseases from the symptoms; the disease type that occurs most often is taken as the matching result, giving a second prediction result R2 and a second matching degree V2. If R2 and R1 are the same, the final result R is the first prediction result R1; if R2 differs from R1 and the matching degree V2 reaches a second threshold T2 (90%), the second prediction result R2 is taken as the final result R; otherwise, the process continues with step e);
e) After the symptom information is extracted, it is represented as a vector and input into the trained disease prediction model, which outputs a predicted disease type, giving a third prediction result R3 and a third matching degree V3;
f) The output results of the three modules are integrated by voting according to the matching degree, specifically: if two of the results are the same, that is, if R3 and R1 are the same, R1 is output; if R3 and R2 are the same, R2 is output; if no two of the three results agree, the prediction with the largest weighted score among V1 × 45% (for R1), V2 × 30% (for R2) and V3 × 25% (for R3) is selected as the final prediction result R. Departments and treatment suggestions are then recommended according to the result.
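The decision cascade of steps c) to f) can be summarized in a short Python sketch. The three models are passed in as callables that return a (prediction, matching degree) pair, which is a hypothetical interface chosen so that later modules run only when needed; the thresholds and weights are the values quoted in the text, and "reaches" is read as greater than or equal to.

```python
def triage(statement, cluster_model, graph_model, network_model,
           t1=0.85, t2=0.90, weights=(0.45, 0.30, 0.25)):
    """Return the final result R following the three-module cascade."""
    r1, v1 = cluster_model(statement)
    if v1 >= t1:                      # step c): confident cluster match
        return r1
    r2, v2 = graph_model(statement)
    if r2 == r1:                      # step d): first two modules agree
        return r1
    if v2 >= t2:                      # step d): confident symptom graph match
        return r2
    r3, v3 = network_model(statement)
    if r3 == r1:                      # step f): majority vote
        return r1
    if r3 == r2:
        return r2
    # No two results agree: take the prediction with the largest weighted score.
    scored = [(v1 * weights[0], r1), (v2 * weights[1], r2), (v3 * weights[2], r3)]
    return max(scored, key=lambda pair: pair[0])[1]


# Stub models with invented outputs, for illustration only.
final = triage(
    "self-described symptoms",
    lambda s: ("gastritis", 0.50),
    lambda s: ("common cold", 0.60),
    lambda s: ("migraine", 0.80),
)
```

With these stub values the weighted scores are 0.225, 0.18 and 0.20, so the cluster model's prediction is returned even though its raw matching degree is the lowest, which shows how strongly the 45% weight favors the cluster module.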
A clustering model is applied to historical doctor-patient dialogue data, grouping identical or similar symptoms, diseases, and the corresponding doctors' diagnosis results and suggestions into one cluster; a new matching mode is provided, in which the TF-IDF_Gram features of the key segments of the patient's self-described symptom information are used to calculate the similarity to each cluster, so that the current patient is matched with the diagnosis results and suggestions that doctors gave to historical patients with similar descriptions;
spoken description information is added to the body node symptom graph, which improves the matching degree and generality and allows the patient's description information to be extracted more reliably;
and the prediction results and matching degrees of the three modules (cluster matching, body node symptom graph matching, and network model diagnosis) are considered together to generate the final inquiry result.

Claims (1)

1. An enhanced auxiliary inquiry method, comprising the following steps:
S1, acquiring self-describing information of a patient;
S2, preprocessing the acquired self-describing information and inputting it into a clustering model for matching to obtain a cluster matching result; the clustering model is a cluster-based headword matching model, constructed as follows:
firstly, constructing a doctor-patient dialogue database: collecting doctor-patient dialogue data and, after preprocessing, retaining the patient's description of their own symptoms, the doctor's questions about the symptoms, and the doctor's diagnosis results and suggestions; establishing a doctor-patient dialogue data set D = {Q, A}, where Q represents the patient's description of the symptoms and A represents the doctor's answers, diagnosis results, and suggestions; based on the data set D, dividing similar doctor-patient dialogue data into subsets D1, D2, …, Dn using a cosine similarity algorithm, where Di stores the doctor-patient dialogue data of the same or similar cases and n represents the number of disease types involved in the dialogue data set; and establishing the doctor-patient dialogue information database from the divided dialogue data set D = {D1, D2, D3, …, Dn};
dividing the doctor answers under the same subset in the doctor-patient dialogue database into short sentences, and then performing N-gram mining to obtain N-gram segments of consecutive words;
merging and filtering the obtained segments and then clustering them, so that the answers for the same type of disease form an answer set; selecting the longest sentence in the answer set as the center A of the set to obtain the clustering model {Q', A}, where Q' = {q1', q2', q3', …} is obtained as follows: let Q = {q1, q2, q3, …} be the subset of similar patient self-descriptions associated with the disease type and treatment suggestions given by the doctor in center A, where qi is the self-description of one patient;
[Figure FDA0003882346160000011: formula image, not reproduced in this text]
the keywords of each element in Q are extracted and the $\text{TF-IDF}_{Gram}$ feature values are then calculated to obtain the representation $f^{tfidf}_i$ of each $q'_i$, where the subscript Gram indicates that the term-frequency statistics count each N-gram segment as one word;
the matching method of the clustering model is as follows: the acquired self-describing information is defined as $q_{new}$; the $\text{TF-IDF}_{Gram}$ feature $f^{tfidf}_{new}$ of $q_{new}$ is calculated, and the similarity between the representation $f^{tfidf}_i$ of each $q'_i$ and the feature $f^{tfidf}_{new}$ of $q_{new}$ is computed:
$$\mathrm{Sim}(q'_i, q_{new}) = \frac{f^{tfidf}_i \cdot f^{tfidf}_{new}}{\lVert f^{tfidf}_i \rVert \, \lVert f^{tfidf}_{new} \rVert}$$
where $\lVert x \rVert$ denotes the norm (length) of $x$; the question $q'_{max}$ with the highest similarity is selected, and the center $A_i^{*}$ of the question set to which $q'_{max}$ belongs is taken as the cluster matching prediction result;
obtaining a first prediction result R1 and a first matching degree V1 after the self-describing information passes through the clustering model, wherein R1 is the disease type given by the corresponding doctor in the matched cluster center A, and V1 is a normalized numerical value output by the clustering model;
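The cluster matching in step S2 can be illustrated with a small sketch that counts each N-gram segment as one term, builds TF-IDF vectors, and picks the stored question with the highest cosine similarity. Several details are assumptions made for illustration only: the helper names (`ngrams`, `tfidf_gram`, `cosine`, `match`), the use of pre-tokenized input, the smoothed IDF `log(N / df) + 1`, treating unseen N-grams as if they occurred in one stored question, and using the cosine similarity itself as the matching degree V1.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the consecutive n-token segments of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_gram(docs, n=2):
    """Build sparse TF-IDF vectors where each N-gram counts as one term."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter(gram for c in counts for gram in c)
    idf = {gram: math.log(len(docs) / df[gram]) + 1.0 for gram in df}
    vectors = [{g: tf * idf[g] for g, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(q_new_tokens, vectors, idf, centers, n=2):
    """Return the center of the most similar stored question and the score."""
    unseen_idf = math.log(len(vectors)) + 1.0   # assumption: treat as df = 1
    counts = Counter(ngrams(q_new_tokens, n))
    f_new = {g: tf * idf.get(g, unseen_idf) for g, tf in counts.items()}
    sims = [cosine(f_i, f_new) for f_i in vectors]
    best = max(range(len(sims)), key=sims.__getitem__)
    return centers[best], sims[best]
```

Since all weights are non-negative, the similarity lies in [0, 1], which is convenient if it is compared directly against the first threshold.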
S3, judging whether the first matching degree V1 is greater than a set first threshold; if so, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, proceeding to step S4;
S4, inputting the self-describing information into a body node symptom graph matching model for matching to obtain a symptom graph matching result; the body node symptom graph model is constructed as follows:
dividing a human body into a plurality of parts including a head, a neck, a chest, an abdomen, a hip, a left upper limb, a right upper limb, a left lower limb and a right lower limb; dividing each part into a plurality of different nodes to obtain a body node map;
using the doctor-patient dialogue database constructed in S2 to collect the descriptions given by different patients for each part to construct a description set K; using a cosine similarity algorithm to remove sentences whose similarity is higher than 90%, retaining the longest description, which carries the most information, to obtain a set K';
labeling each description in the set K' with the corresponding professional medical description or medical symptom;
correspondingly matching the labeled description set K' with the divided body node graph to obtain a body node symptom graph model, enabling each node in the body node symptom graph model to contain spoken description sets of a plurality of patients with common symptoms of the current part and professional symptom descriptions of doctors, and connecting the spoken description sets and the professional symptom descriptions to diseases containing the symptoms;
the method for matching the body node symptom graph model comprises the following steps: extracting body part information based on the self-describing information, performing spoken language symptom description matching according to a symptom spoken language description set of a corresponding part node in a body node symptom graph model to obtain a corresponding professional medical symptom, obtaining a corresponding disease according to the symptom, and taking the disease type with the largest occurrence frequency as a symptom graph matching result;
obtaining a second prediction result R2 and a second matching degree V2 after the self-describing information passes through the body node symptom graph matching model, wherein V2 is a normalized numerical value output by the body node symptom graph matching model;
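The body node symptom graph matching in step S4 can be pictured with the toy sketch below. Everything in it is hypothetical and chosen only to show the data flow: the `GRAPH` contents are made-up examples rather than medical knowledge, body parts are detected by simple substring lookup, spoken descriptions are compared with Jaccard word overlap (the claim does not fix a similarity measure), and V2 is taken to be the share of votes won by the most frequent disease.

```python
from collections import Counter

# Hypothetical graph: body part -> nodes, each node holding spoken
# descriptions, the professional symptom, and the linked diseases.
GRAPH = {
    "head": [
        {"spoken": ["my head is pounding", "head hurts on one side"],
         "symptom": "headache", "diseases": ["migraine", "influenza"]},
        {"spoken": ["i feel dizzy", "the room is spinning"],
         "symptom": "vertigo", "diseases": ["migraine"]},
    ],
    "chest": [
        {"spoken": ["tight chest", "chest feels heavy"],
         "symptom": "chest tightness", "diseases": ["asthma"]},
    ],
}


def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def match_graph(text, graph=GRAPH, min_sim=0.3):
    """Return (most frequent disease, vote share) or (None, 0.0)."""
    text = text.lower()
    votes = Counter()
    for part, nodes in graph.items():
        if part not in text:            # naive body-part extraction
            continue
        for node in nodes:
            best = max(jaccard(text, s) for s in node["spoken"])
            if best >= min_sim:         # spoken description matched
                votes.update(node["diseases"])
    if not votes:
        return None, 0.0
    disease, count = votes.most_common(1)[0]
    return disease, count / sum(votes.values())
```

Keeping the spoken descriptions and the professional symptom on the same node is what lets a colloquial phrase be mapped to a medical term before the disease lookup.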
S5, judging whether the first prediction result R1 is the same as the second prediction result R2; if so, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, judging whether the second matching degree V2 is greater than a second threshold; if so, taking the second prediction result R2 as the final result R and proceeding to step S8; otherwise, proceeding to step S6;
S6, converting the self-describing information into vectors and inputting the vectors into a trained disease prediction model to obtain the predicted disease type; the trained disease prediction model is obtained as follows:
automatically extracting the patient's self-description and the corresponding doctor's disease information, treatment medication, and diagnosis suggestions from the collected original doctor-patient dialogue data by using natural language processing and keyword extraction technology;
performing semantic feature representation on the extracted patient self-description and the doctor's diagnosis result and treatment suggestions, and converting them into vector form by one-hot coding to serve as training samples, wherein the coding length is equal to the number of related disease entities, 0 indicates that the entity at that position is not mentioned, and 1 indicates that the entity at that position is mentioned; setting the disease type given by the doctor as the label of the sample;
establishing a disease prediction model by using a DQN neural network model, inputting the vector-represented training samples into the model, fitting, outputting a prediction result, calculating the loss, adjusting the parameters through an iterative fitting process, and stopping training when the loss on the validation set no longer changes, to obtain a disease prediction model based on doctor-patient dialogue data;
the prediction method of the trained disease prediction model is as follows: converting the acquired self-describing information into a 0/1 code, inputting it into the trained model, and outputting the probability of each disease, wherein the disease type with the highest probability is the prediction result;
obtaining a third prediction result R3 and a third matching degree V3 after the self-describing information passes through a trained disease prediction model, wherein V3 is a normalized numerical value output by the disease prediction model;
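The 0/1 coding and the final selection in step S6 can be sketched as follows. The network itself is not reproduced here. The entity list `VOCAB`, the substring-based entity detection, and the use of a softmax to turn raw model scores into normalized probabilities (with the top probability serving as V3) are all assumptions made for this illustration.

```python
import math

# Hypothetical entity vocabulary; the code length equals its size.
VOCAB = ["fever", "cough", "headache", "nausea"]


def encode(text, vocab=VOCAB):
    """Return a 0/1 vector: 1 where the entity at that position is mentioned."""
    text = text.lower()
    return [1 if entity in text else 0 for entity in vocab]


def to_prediction(scores, labels):
    """Turn raw model scores into (most probable label, its probability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

A trained model would sit between the two functions, mapping the vector from `encode` to one score per disease type.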
S7, integrating the obtained first prediction result R1, second prediction result R2, and third prediction result R3 and voting according to the matching degree, specifically: if two of the results are the same, that is, if R3 is the same as R1, taking the first prediction result R1 as the final result R and proceeding to step S8; otherwise, if R3 is the same as R2, taking the second prediction result R2 as the final result R and proceeding to step S8; otherwise, selecting the result with the largest weighted matching degree among V1 × 45% (for R1), V2 × 30% (for R2), and V3 × 25% (for R3) as the final prediction result R and proceeding to step S8;
S8, outputting the final result R, and recommending a department and treatment suggestions according to R.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211233086.3A CN115565655A (en) 2022-10-10 2022-10-10 Enhanced auxiliary inquiry method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211233086.3A CN115565655A (en) 2022-10-10 2022-10-10 Enhanced auxiliary inquiry method

Publications (1)

Publication Number Publication Date
CN115565655A true CN115565655A (en) 2023-01-03

Family

ID=84745368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211233086.3A Pending CN115565655A (en) 2022-10-10 2022-10-10 Enhanced auxiliary inquiry method

Country Status (1)

Country Link
CN (1) CN115565655A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117059283A (en) * 2023-08-15 2023-11-14 宁波市鄞州区疾病预防控制中心 Speech database classification and processing system based on pulmonary tuberculosis early warning
CN117059283B (en) * 2023-08-15 2024-07-02 宁波市鄞州区疾病预防控制中心 Speech database classification and processing system based on pulmonary tuberculosis early warning

Similar Documents

Publication Publication Date Title
CN107705839B (en) Disease automatic coding method and system
US20180322954A1 (en) Method and device for constructing medical knowledge graph and assistant diagnosis method
CN111897967A (en) Medical inquiry recommendation method based on knowledge graph and social media
CN110489566A (en) A kind of hospital guide's method of intelligence hospital guide's service robot
CN112786194A (en) Medical image diagnosis guide inspection system, method and equipment based on artificial intelligence
CN112802575B (en) Medication decision support method, device, equipment and medium based on graphic state machine
CN113724882B (en) Method, device, equipment and medium for constructing user portrait based on inquiry session
US20190057773A1 (en) Method and system for performing triage
Teng et al. Automatic medical code assignment via deep learning approach for intelligent healthcare
CN113051905A (en) Medical named entity recognition training model and medical named entity recognition method
CN112183026A (en) ICD (interface control document) encoding method and device, electronic device and storage medium
CN111191415A (en) Operation classification coding method based on original operation data
WO2022227203A1 (en) Triage method, apparatus and device based on dialogue representation, and storage medium
CN113764112A (en) Online medical question and answer method
Adhikari et al. A Comparative Study of Machine Learning and NLP Techniques for Uses of Stop Words by Patients in Diagnosis of Alzheimer's Disease
CN112037909A (en) Diagnostic information rechecking system
CN116992002A (en) Intelligent care scheme response method and system
CN116910172A (en) Follow-up table generation method and system based on artificial intelligence
CN117033568A (en) Medical data index interpretation method, device, storage medium and equipment
CN112182168A (en) Medical record text analysis method and device, electronic equipment and storage medium
CN115565655A (en) Enhanced auxiliary inquiry method
JabaSheela et al. A hybrid model for detecting linguistic cues in alzheimer’s disease patients
CN117194604B (en) Intelligent medical patient inquiry corpus construction method
Liao et al. Medical data inquiry using a question answering model
CN113643825A (en) Medical case knowledge base construction method and system based on clinical key characteristic information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination