CN116936080A - Preliminary diagnosis guiding method and device based on dialogue and electronic medical record

Info

Publication number
CN116936080A
Authority
CN
China
Prior art keywords
dialogue
diagnosis
model
electronic medical
patient
Prior art date
Legal status
Pending
Application number
CN202310933373.3A
Other languages
Chinese (zh)
Inventor
尹琳
史晟辉
张拓
李张岩
卢清君
杨学来
张何明
彭丽丽
Current Assignee
Beijing University of Chemical Technology
China Japan Friendship Hospital
Original Assignee
Beijing University of Chemical Technology
China Japan Friendship Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology and China Japan Friendship Hospital
Priority to CN202310933373.3A
Publication of CN116936080A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure provides a preliminary diagnosis guiding method based on dialogue and electronic medical records, comprising the following steps: acquiring a patient's diagnosis guiding dialogue and electronic medical record; preprocessing the diagnosis guiding dialogue and the electronic medical record and extracting short text data; inputting the short text data into a pre-trained guided diagnosis model to obtain a predicted list of receiving doctors and a predicted value, where the pre-trained guided diagnosis model comprises a BERT embedding layer and a joint model of BERT and CRF; acquiring auxiliary data; correcting the predicted value with a post-processing method based on the auxiliary data to obtain a corrected predicted value; and outputting the predicted list of receiving doctors and the corrected predicted value. The disclosure enables preliminary diagnosis guiding prediction from the dialogue and the electronic medical record. In particular, even when the input dialogue is non-standard, the method can still predict the receiving doctor with relatively high accuracy and match patients and doctors quickly and accurately.

Description

Preliminary diagnosis guiding method and device based on dialogue and electronic medical record
Technical Field
The disclosure relates to the technical field of medical information, in particular to a preliminary diagnosis guiding method based on dialogue and electronic medical records.
Background
Internet hospitals break the limits of time and space, extend the services of medical institutions by applying information technologies such as the Internet, and build an integrated online and offline medical service model covering the stages before, during and after diagnosis. In particular, during the epidemic of novel coronavirus infection, internet hospitals provided timely help and played an important role in online follow-up visits, remote consultation, online prescribing and the like, which increased people's willingness to accept online diagnosis and treatment.
At present, the service functions of internet hospitals suffer from two pain points: the preliminary diagnosis guiding step is missing, or the preliminary diagnosis guiding program is inaccurate. Most internet hospital platforms guide patients to a department and cannot guide them to a specific doctor. On the one hand, patients lack medical knowledge and describe their symptoms inaccurately. On the other hand, existing diagnosis guiding programs make insufficient use of patients' basic information, patients tend to choose doctors with senior titles, the chosen experts are overly concentrated, and the chosen expert is not necessarily suited to the patient's condition. As a result, patients cannot easily find the doctor who best matches their condition, which harms the experience of seeking medical care.
Disclosure of Invention
A preliminary diagnosis guiding method based on dialogue and electronic medical records is provided. The patient only needs to enter a dialogue and an electronic medical record in the diagnosis guiding program, and even when the entered dialogue is non-standard, the method can predict the receiving doctor with relatively high accuracy. This achieves rapid and accurate matching of patients and doctors, makes the preliminary diagnosis guiding program of an internet hospital more intelligent, improves the patient's experience, promotes the rational use of medical resources, and ultimately raises the service level of the internet hospital.
To achieve the above purpose, the technical solution provided by this disclosure is as follows:
In a first aspect, a preliminary diagnosis guiding method based on dialogue and electronic medical records is provided, including the following steps:
S1: acquiring a patient's diagnosis guiding dialogue and electronic medical record;
S2: preprocessing the diagnosis guiding dialogue and the electronic medical record, and extracting short text data;
S3: inputting the short text data into a pre-trained guided diagnosis model to obtain a predicted list of receiving doctors and a predicted value, wherein the pre-trained guided diagnosis model comprises a BERT embedding layer and a joint model of BERT and CRF;
S4: acquiring auxiliary data;
S5: based on the auxiliary data, correcting the predicted value by using a post-processing method to obtain a corrected predicted value;
S6: outputting the predicted list of receiving doctors and the corrected predicted value.
Preferably, the patient's diagnosis guiding dialogue and electronic medical record include: the symptom description information given by the patient in the diagnosis guiding dialogue, and the patient's electronic medical record.
Preferably, preprocessing the patient's diagnosis guiding dialogue and electronic medical record in S2 and extracting the short text data includes the following steps:
S21: segmenting the symptom description information in the diagnosis guiding dialogue with a word segmentation tool in combination with a medical proprietary dictionary;
S22: removing stop words, noise words and punctuation marks from the word segmentation result;
S23: extracting gender, age group and past medical history information from the electronic medical record by using a custom template;
S24: merging synonyms in the word segmentation result with the help of the medical proprietary dictionary to obtain a simplified word segmentation result;
S25: combining the simplified word segmentation result, the gender, the age group and the past medical history information into short text data.
Preferably, the word segmentation tool supports custom dictionaries and includes jieba, HanLP, ICTCLAS, or LTP.
Preferably, before the short text data is input into the pre-trained guided diagnosis model in S3 to obtain the predicted list of receiving doctors and the predicted value, the method further includes:
S0: training a guided diagnosis model to be trained;
training the guided diagnosis model to be trained in S0 comprises the following steps:
S01: acquiring a sample data set of historical patient diagnosis guiding dialogues, related patient visit records and electronic medical records;
S02: preprocessing the sample data set, and extracting a sample short text data set;
S03: acquiring the information of the doctor each patient visited from the related patient visit records, labeling each piece of short text data in the sample short text data set with that doctor information, and constructing a training data set;
S04: training the guided diagnosis model to be trained based on the training data set;
S05: obtaining the pre-trained guided diagnosis model.
Preferably, the related patient visit records include: the patient's actual visit records within a period of time after completing the diagnosis guiding dialogue.
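As an illustration of steps S01 to S03, the sketch below pairs each historical short text with the doctor the same patient actually visited shortly afterwards. It is a minimal sketch under stated assumptions: the record field names (`patient_id`, `date`, `short_text`, `doctor_id`), the 7-day window and the rule of taking the earliest visit are all hypothetical choices, since the text only says "a period of time".

```python
from datetime import date, timedelta


def build_training_set(dialogues, visits, window_days=7):
    """Label each short text with the doctor visited within the window.

    window_days is an assumed value; the source only says "a period of time".
    """
    samples = []
    for dialogue in dialogues:
        deadline = dialogue["date"] + timedelta(days=window_days)
        candidates = [
            v for v in visits
            if v["patient_id"] == dialogue["patient_id"]
            and dialogue["date"] <= v["date"] <= deadline
        ]
        if candidates:
            # Assumption: the earliest visit after the dialogue is the label.
            first = min(candidates, key=lambda v: v["date"])
            samples.append((dialogue["short_text"], first["doctor_id"]))
    return samples


dialogues = [
    {"patient_id": "p1", "date": date(2023, 3, 1), "short_text": "headache fever"},
    {"patient_id": "p2", "date": date(2023, 3, 2), "short_text": "cough"},
]
visits = [
    {"patient_id": "p1", "date": date(2023, 3, 3), "doctor_id": "dr_a"},
    {"patient_id": "p1", "date": date(2023, 3, 5), "doctor_id": "dr_b"},
    {"patient_id": "p2", "date": date(2023, 4, 20), "doctor_id": "dr_c"},
]
training_set = build_training_set(dialogues, visits)
print(training_set)
```

Dialogues with no visit inside the window are dropped rather than labeled, which keeps unrelated later visits out of the training data.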
Preferably, inputting the short text data into the pre-trained guided diagnosis model in S3 to obtain the predicted list of receiving doctors and the predicted value includes:
inputting the short text data into the BERT embedding layer to obtain a vector representation of the text data, inputting the vector representation into the BERT model, taking the feature sequence obtained from the BERT model as the input of the CRF model, labeling and decoding the short text to obtain the tag sequence of the short text, and outputting a tag sequence probability value, where the prediction equation S(X, y) for the tag sequence is as follows:
$$S(X, y) = \sum_{i} A_{y_i, y_{i+1}} + \sum_{i} P_{i, y_i} \tag{1}$$
wherein X is the feature sequence, y is the tag sequence, $A_{y_i, y_{i+1}}$ is the transition score from tag $y_i$ to tag $y_{i+1}$ over the feature sequence of the short text data, and $P_{i, y_i}$ is the score of assigning tag $y_i$ at position i. The ratio of the total score of the correctly labeled feature sequence to the total score of all possible labelings of the feature sequence in the short text data gives the predicted value p(y|X) of the tag sequence:
$$p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}} \tag{2}$$
wherein $Y_X$ represents all possible tag sequences.
Preferably, acquiring the auxiliary data includes: acquiring the proportion of consultations of each department within a selected time period and the list of all receiving doctors.
Preferably, correcting the predicted value by using a post-processing method based on the auxiliary data to obtain the corrected predicted value includes:
obtaining the corrected predicted value based on formula (3):
$$P(C) = \frac{P(y \mid X)\, P(A)}{P(y)} \tag{3}$$
wherein P(C) is the corrected predicted value; P(A) is, for each department, the proportion of consultations in the selected time period divided by the total number of doctors in the department; P(y|X) is the predicted value of the tag sequence; and P(y) is the label occurrence ratio. The total number of doctors in a department is the number of doctors on the full list of receiving doctors who belong to that department.
In a second aspect, a preliminary diagnosis guiding device based on dialogue and electronic medical records is provided. The device comprises:
an acquisition unit, used for acquiring the patient's diagnosis guiding dialogue and electronic medical record;
a preprocessing unit, used for preprocessing the diagnosis guiding dialogue and the electronic medical record and extracting short text data;
a prediction unit, used for inputting the short text data into a pre-trained guided diagnosis model to obtain a predicted list of receiving doctors and a predicted value, wherein the pre-trained guided diagnosis model comprises a BERT embedding layer and a joint model of BERT and CRF;
an auxiliary data unit, used for acquiring auxiliary data;
a correction unit, used for correcting the predicted value by using a post-processing method based on the auxiliary data to obtain a corrected predicted value;
an output unit, used for outputting the predicted list of receiving doctors and the corrected predicted value.
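The division into units can be pictured as a small pipeline object in which each unit is a replaceable callable. This is only an illustrative sketch: the class name `GuidingDevice`, the callable signatures and the stub units in the demo are hypothetical and stand in for real data access, preprocessing and model code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class GuidingDevice:
    """Hypothetical wiring of the six units; each field is one unit."""
    acquire: Callable[[str], Tuple[str, dict]]          # acquisition unit
    preprocess: Callable[[str, dict], str]              # preprocessing unit
    predict: Callable[[str], Dict[str, float]]          # prediction unit
    get_auxiliary: Callable[[], dict]                   # auxiliary data unit
    correct: Callable[[Dict[str, float], dict], Dict[str, float]]  # correction unit

    def run(self, patient_id: str) -> List[Tuple[str, float]]:
        """Output unit: return doctors ranked by corrected predicted value."""
        dialogue, record = self.acquire(patient_id)
        short_text = self.preprocess(dialogue, record)
        predictions = self.predict(short_text)
        corrected = self.correct(predictions, self.get_auxiliary())
        return sorted(corrected.items(), key=lambda kv: kv[1], reverse=True)


# Demo with stub units (all values are made up for illustration).
device = GuidingDevice(
    acquire=lambda pid: ("headache for three days", {"gender": "female"}),
    preprocess=lambda dialogue, record: dialogue + " " + record["gender"],
    predict=lambda text: {"dr_a": 0.4, "dr_b": 0.6},
    get_auxiliary=lambda: {"dr_a": 2.0, "dr_b": 1.0},
    correct=lambda preds, aux: {d: p * aux[d] for d, p in preds.items()},
)
ranking = device.run("p1")
print(ranking)
```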
Compared with the prior art, the technical solution has at least the following beneficial effects:
The solution uses several kinds of data: in addition to the patient's diagnosis guiding dialogue and electronic medical record, it includes data such as the visit volume of doctors in the hospital. Using these data reduces the effect of inaccurate symptom descriptions, takes the seasonal characteristics of diseases into account, improves department prediction, and ultimately achieves rapid and accurate matching of patients and doctors.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow chart of a preliminary diagnosis guiding method based on dialogue and electronic medical records of the present disclosure;
FIG. 2 is a flowchart of a training method of the guided diagnosis model of the present disclosure;
FIG. 3 is a block diagram of a guided diagnosis model of the present disclosure;
FIG. 4 is a block diagram of a preliminary diagnosis guiding device based on dialogue and electronic medical records according to the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the terms "a," "an," or "the" and similar terms do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
It should be noted that "up", "down", "left", "right", "front", "rear", and the like are used in this disclosure only to indicate a relative positional relationship, and when the absolute position of the object to be described is changed, the relative positional relationship may be changed accordingly.
To address the problem that current internet hospital services generally lack a preliminary diagnosis guiding step, or that the preliminary diagnosis guiding program guides inaccurately, the disclosure provides a preliminary diagnosis guiding method based on dialogue and electronic medical records. The method addresses issues such as patients lacking medical knowledge, inaccurate symptom descriptions, and existing diagnosis guiding programs not using the patient's basic information.
As shown in FIG. 1, the preliminary diagnosis guiding method 100 based on dialogue and electronic medical records includes six steps.
S1: acquiring a patient's diagnosis guiding dialogue and electronic medical record;
In some embodiments, the patient's dialogue comes from the diagnosis guiding program of an internet consultation. In the diagnosis guiding program, a first-visit patient is first guided through creating a profile. During profile creation, the patient fills in an electronic medical record and provides basic personal information such as gender, age and past medical history. The electronic medical record is built through selection-style inputs rather than free text, which avoids non-standard input. In the diagnosis guiding program, the dialogue may be typed by the patient, or the patient may be guided to select labels. While the patient is typing, prompts can be given so that the input is as complete and standard as possible. For example, after the patient enters a symptom, the dialogue program prompts the patient to provide the duration and severity of the symptom. The program may also set a word count for the free-text input section.
S2: preprocessing the patient's diagnosis guiding dialogue and electronic medical record, and extracting short text data;
In some embodiments, the diagnosis guiding dialogue is processed first. A word segmentation tool, in combination with a medical proprietary dictionary, segments the symptom description information in the diagnosis guiding dialogue; stop words, noise words and punctuation marks are removed from the word segmentation result; gender, age group and past medical history information are extracted from the electronic medical record by using a custom template; synonyms in the word segmentation result are merged with the help of the medical proprietary dictionary to obtain a simplified word segmentation result; and the simplified word segmentation result, the gender, the age group and the past medical history information are combined into short text data.
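The preprocessing chain above can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions: the tiny dictionaries are invented for illustration, and a forward maximum matching function stands in for a real segmenter such as jieba, HanLP, ICTCLAS or LTP loaded with a medical dictionary. The function names `segment` and `build_short_text` are hypothetical.

```python
import re

# Hypothetical mini-resources; a deployed system would load a full medical
# dictionary into a real segmenter instead.
MEDICAL_TERMS = {"头痛", "头疼", "发烧", "发热", "三天"}  # headache (two forms), fever (two forms), three days
STOP_WORDS = {"我", "了", "还"}                          # "I", aspect particle, "also"
SYNONYMS = {"头疼": "头痛", "发热": "发烧"}               # variant -> canonical term


def segment(text, vocab, max_len=4):
    """Forward maximum matching: a simple stand-in for a real segmenter."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_short_text(dialogue, gender, age_group, history):
    tokens = segment(dialogue, MEDICAL_TERMS | STOP_WORDS)
    # Drop stop words and tokens made only of punctuation or whitespace.
    tokens = [t for t in tokens
              if t not in STOP_WORDS and not re.fullmatch(r"[\W_]+", t)]
    # Merge synonyms, keeping the first occurrence of each canonical term.
    simplified = []
    for token in tokens:
        canonical = SYNONYMS.get(token, token)
        if canonical not in simplified:
            simplified.append(canonical)
    # Join with the fields extracted from the electronic medical record.
    return " ".join(simplified + [gender, age_group] + list(history))


# "I have had a headache for three days, and also a fever"; female, middle-aged, hypertension
short_text = build_short_text("我头疼三天了，还发热", "女", "中年", ["高血压"])
print(short_text)
```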
In some embodiments, a medical proprietary dictionary is established first. It can be built by expanding a basic Chinese medical dictionary. Several commonly used medical dictionaries are as follows. CMeSH, the Chinese medical subject headings thesaurus, corresponds to the English MeSH and is the standard thesaurus in the Chinese medical field. CDT, the Chinese disease and diagnosis codes, is the disease and diagnosis classification standard issued by the Chinese national Ministry of Health and contains a large number of Chinese medical terms. CUMT, the Chinese medical vocabulary, is a medical vocabulary issued by the Chinese national drug administration and contains common terms of the Chinese medical field.
In some embodiments, Chinese word segmentation is an important step in Chinese natural language processing. The Chinese word segmenter may be jieba, HanLP, ICTCLAS, or LTP. The selected segmenter needs to support a custom dictionary. The reference indexes for word segmentation performance are precision, recall and the combined F1 score.
These segmenters are all open source tools that can be used and modified for free, and they offer relatively high segmentation accuracy and speed.
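The segmentation metrics mentioned above are usually computed over character spans: a predicted word counts as correct only if both its start and end match a word in the reference segmentation. The following sketch assumes that convention; the function names are illustrative.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for token in tokens:
        spans.add((pos, pos + len(token)))
        pos += len(token)
    return spans


def segmentation_prf(gold_tokens, predicted_tokens):
    """Return (precision, recall, F1) of a predicted segmentation."""
    gold, pred = to_spans(gold_tokens), to_spans(predicted_tokens)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Reference: "头痛 / 三天" (headache / three days); the prediction wrongly splits "头痛".
precision, recall, f1 = segmentation_prf(["头痛", "三天"], ["头", "痛", "三天"])
print(precision, recall, f1)
```

In this example one of three predicted words is correct and one of two reference words is recovered, so precision is 1/3, recall is 1/2 and F1 is 0.4.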
In some embodiments, information in the electronic medical record, such as medical history, physical examination reports and diagnostic records, can be extracted by formulating matching templates. Template matching may be implemented using rule-based methods. Extracted key information is generally converted directly into phrases or words. For example, the age is extracted and converted into an age group. The past medical history is extracted and converted directly into disease names or treatment measures such as hospitalization and surgery.
In some embodiments, the diagnosis guiding dialogue and the gender, age group and past medical history information may be combined into short text data either by direct concatenation or by deduplicated merging. The patient may mention a past medical history during the dialogue, and it may be inconsistent with the electronic medical record. The deduplicated merging can give more weight to the past medical history in the electronic medical record, because that history is recorded by professionals.
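A rule-based template can be as simple as one regular expression per field. The sketch below assumes a hypothetical English-language record layout (`Gender:`, `Age:`, `Past history:`) and made-up age brackets; neither is specified in the text.

```python
import re

# Assumed age brackets (exclusive upper bounds); the source does not define them.
AGE_GROUPS = [(18, "minor"), (45, "young adult"), (60, "middle-aged")]

# Hypothetical templates for a hypothetical record layout.
TEMPLATES = {
    "gender": re.compile(r"Gender:\s*(\w+)"),
    "age": re.compile(r"Age:\s*(\d+)"),
    "history": re.compile(r"Past history:\s*([^\n]+)"),
}


def age_to_group(age):
    for upper, name in AGE_GROUPS:
        if age < upper:
            return name
    return "elderly"


def extract_record(emr_text):
    """Extract gender, age group and past history as words or phrases."""
    found = {key: pattern.search(emr_text) for key, pattern in TEMPLATES.items()}
    record = {}
    if found["gender"]:
        record["gender"] = found["gender"].group(1).lower()
    if found["age"]:
        record["age_group"] = age_to_group(int(found["age"].group(1)))
    if found["history"]:
        items = re.split(r"[;,]\s*", found["history"].group(1).strip())
        record["history"] = [item for item in items if item]
    return record


emr = "Gender: Female\nAge: 52\nPast history: hypertension; appendectomy\n"
record = extract_record(emr)
print(record)
```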
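The two combination strategies can be contrasted in code. This sketch shows one possible reading of the deduplicated variant, in which any past-history term mentioned in the dialogue is dropped in favor of the professionally recorded history; the `HISTORY_VOCAB` set and both function names are hypothetical.

```python
# Hypothetical vocabulary of terms that count as past medical history.
HISTORY_VOCAB = {"diabetes", "hypertension", "appendectomy"}


def merge_direct(dialogue_tokens, emr_tokens):
    """Direct combination: simply concatenate both sources."""
    return list(dialogue_tokens) + list(emr_tokens)


def merge_deduplicated(dialogue_tokens, emr_tokens):
    """Deduplicated combination: trust the record for past history."""
    # Drop history terms the patient mentioned; the record is authoritative.
    kept = [t for t in dialogue_tokens if t not in HISTORY_VOCAB]
    merged = []
    for token in kept + list(emr_tokens):
        if token not in merged:  # remove exact repeats, keep order
            merged.append(token)
    return merged


dialogue_tokens = ["headache", "diabetes", "female"]
emr_tokens = ["female", "middle-aged", "hypertension"]
direct = merge_direct(dialogue_tokens, emr_tokens)
deduplicated = merge_deduplicated(dialogue_tokens, emr_tokens)
print(direct)
print(deduplicated)
```

Here the patient's own mention of "diabetes" conflicts with the recorded "hypertension", and the deduplicated result keeps only the recorded term.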
S3: inputting the short text data into a pre-trained guided diagnosis model to obtain a predicted list of receiving doctors and a predicted value, wherein the pre-trained guided diagnosis model comprises a BERT embedding layer and a joint model of BERT and CRF;
In some embodiments, the guided diagnosis model is built on the BERT model. BERT (Bidirectional Encoder Representations from Transformers) is a bidirectional pre-trained language model based on the Transformer and is one of the best-performing pre-trained models in natural language processing at present. The purpose of the BERT embedding layer is to convert the input text sequence into a vector representation for subsequent model training and inference. As shown in FIG. 2, the guided diagnosis model comprises an input layer, a BERT model and a CRF model, where the BERT model and the CRF form one joint model. The BERT model includes an embedding layer and a plurality of Transformer Encoder layers. The BERT embedding layer comprises three parts: Token embedding, Segment embedding, and Position embedding. Token embedding breaks the input text into words or subwords and then maps each word or subword to a vector. Segment embedding divides the input text into segments and then assigns a vector to each segment. Position embedding assigns a vector to each input position. The three embedding vectors are spliced together to form a vector sequence that serves as the input of the model. The BERT model is a bidirectional model built from stacked Transformer Encoder layers. Each Transformer Encoder layer consists of a multi-head self-attention mechanism and a feed-forward neural network. The outputs of these layers are passed through a linear transformation layer and then classified or regressed using the Softmax function. The Transformer Encoder layers ultimately output vectors.
The multi-head self-attention mechanism in BERT can improve the input of the convolution layer: each word of the patient's symptom description is represented as a vector, and a sentence is represented as an embedding matrix whose rows are word vectors. BERT is therefore an important step, because it maps natural language into a word vector matrix more effectively while retaining context information.
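To make the three-part embedding layer concrete, the toy sketch below builds token, segment and position tables with random values and combines them per input position. It is not a trained model: the vocabulary, the dimension `DIM` and the table sizes are made up. The default mode splices (concatenates) the three vectors as the text describes; a `sum` mode is included because the publicly released BERT implementation adds the three embeddings element-wise instead.

```python
import random

DIM = 4                      # toy embedding size (assumption)
rng = random.Random(0)       # fixed seed so the demo is repeatable


def make_table(rows):
    return [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(rows)]


VOCAB = {"[CLS]": 0, "[SEP]": 1, "headache": 2, "fever": 3, "female": 4}
token_table = make_table(len(VOCAB))   # one vector per word or subword
segment_table = make_table(2)          # one vector per segment
position_table = make_table(16)        # one vector per input position


def embed(tokens, segment_ids, mode="concat"):
    """Return one combined vector per input position."""
    rows = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        t = token_table[VOCAB[token]]
        s = segment_table[segment]
        p = position_table[position]
        if mode == "concat":
            rows.append(t + s + p)  # list concatenation: length 3 * DIM
        else:
            rows.append([a + b + c for a, b, c in zip(t, s, p)])  # length DIM
    return rows


tokens = ["[CLS]", "headache", "fever", "female", "[SEP]"]
segment_ids = [0, 0, 0, 0, 0]
spliced = embed(tokens, segment_ids)
summed = embed(tokens, segment_ids, mode="sum")
print(len(spliced), len(spliced[0]), len(summed[0]))
```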
As shown in FIG. 2, the BERT model and the CRF are joined by feeding these output vectors into the CRF layer for label prediction. When predicting the label of the current word, the CRF layer depends on the label of the previous word, and it outputs a tag sequence probability value. During training, the model computes the loss and back-propagates gradients based on the difference between the predicted entity tags and the annotated labels. The prediction equation S(X, y) for the tag sequence is as follows:
$$S(X, y) = \sum_{i} A_{y_i, y_{i+1}} + \sum_{i} P_{i, y_i} \tag{1}$$
wherein X is the feature sequence, y is the tag sequence, $A_{y_i, y_{i+1}}$ is the transition score from tag $y_i$ to tag $y_{i+1}$ over the feature sequence of the short text data, and $P_{i, y_i}$ is the score of assigning tag $y_i$ at position i. The ratio of the total score of the correctly labeled feature sequence to the total score of all possible labelings of the feature sequence in the short text data gives the predicted value p(y|X) of the tag sequence:
$$p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}} \tag{2}$$
wherein $Y_X$ represents all possible tag sequences. During training of the guided diagnosis model, a value of p(y|X) equal to 1 indicates that the predicted entity tags are consistent with the annotated labels.
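The sequence score and its normalization can be checked on a tiny example by enumerating every possible tag sequence. The sketch below assumes the usual linear-chain CRF form, in which S(X, y) is the sum of tag-to-tag transition scores and per-position tag scores, and p(y|X) is the exponentiated score divided by the sum over all candidate sequences. The scores and tag names are made up; a real CRF layer would use dynamic programming instead of enumeration.

```python
import math
from itertools import product


def sequence_score(emissions, transitions, tags):
    """S(X, y): per-position scores plus transition scores between neighbors."""
    emit = sum(emissions[i][tag] for i, tag in enumerate(tags))
    trans = sum(transitions.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return emit + trans


def sequence_probability(emissions, transitions, tags, labels):
    """p(y|X): exp(S(X, y)) normalized over all possible tag sequences."""
    all_scores = [
        sequence_score(emissions, transitions, candidate)
        for candidate in product(labels, repeat=len(emissions))
    ]
    top = max(all_scores)  # subtract the maximum for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in all_scores))
    return math.exp(sequence_score(emissions, transitions, tags) - log_z)


labels = ["A", "B"]                                   # toy tag set
emissions = [{"A": 2.0, "B": 0.5}, {"A": 1.5, "B": 1.0}]
transitions = {("A", "A"): 0.5, ("B", "B"): 0.5}      # missing pairs score 0

candidates = list(product(labels, repeat=len(emissions)))
probabilities = {
    y: sequence_probability(emissions, transitions, y, labels) for y in candidates
}
best_sequence = max(probabilities, key=probabilities.get)
print(best_sequence, round(probabilities[best_sequence], 3))
```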
In some embodiments, the output of the guided diagnosis model is a list of receiving doctors and a predicted value. Typically, the output of the guided diagnosis model is a doctor.
In some embodiments, the purpose of the guided diagnosis model is to provide pre-examination triage. Pre-examination triage is a preliminary medical evaluation of the patient before the outpatient visit, which provides more accurate diagnostic advice and treatment options by evaluating and analyzing the patient's condition. In offline hospitals, pre-examination triage is typically the responsibility of the hospital's nurses or doctors. With the development of artificial intelligence, patients can complete pre-examination triage before the outpatient visit by telephone, online consultation and other means. In an internet hospital, pre-examination triage is usually completed by a triage model. The guided diagnosis model may output a doctor; this mainly depends on the granularity of the data annotation used for model training.
S4: acquiring auxiliary data;
In some embodiments, a diagnosis guiding dialogue, especially an internet one, may carry too little information. Because diseases are seasonal and regional, auxiliary data such as department heat is often an important basis for diagnosis guiding. Department heat refers to the number and frequency of patient visits to a department over a period of time. Analyzing department heat gives a better understanding of patients' demand for doctors. Department heat data therefore needs to be collected and analyzed. The hospital can collect heat data for each department, such as the number, frequency and time of patient visits, through the outpatient registration system, visit records and the like. Such data may be acquired from the electronic medical record system and analyzed and processed with data analysis software. A complete list of receiving doctors also needs to be collected.
In some embodiments, the consultation volume of each department within the selected time period reflects the department's heat. Department heat statistics generally use different time granularities such as year, month, week and day.
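Department heat over a selected period reduces to counting visits per department and normalizing. The sketch below assumes visit records with hypothetical `department` and `date` fields; changing the start and end dates gives yearly, monthly, weekly or daily granularity.

```python
from collections import Counter
from datetime import date


def department_heat(visits, start, end):
    """Share of consultations per department within [start, end]."""
    counts = Counter(
        v["department"] for v in visits if start <= v["date"] <= end
    )
    total = sum(counts.values())
    return {dept: n / total for dept, n in counts.items()} if total else {}


visits = [
    {"department": "respiratory", "date": date(2023, 1, 3)},
    {"department": "respiratory", "date": date(2023, 1, 9)},
    {"department": "respiratory", "date": date(2023, 1, 20)},
    {"department": "cardiology", "date": date(2023, 1, 15)},
    {"department": "dermatology", "date": date(2023, 7, 1)},
]
january_heat = department_heat(visits, date(2023, 1, 1), date(2023, 1, 31))
print(january_heat)
```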
S5: based on the auxiliary data, correcting the predicted value by using a post-processing method to obtain a corrected predicted value;
In some embodiments, department heat may be used to supplement the patient's symptom description. Patients may be unaware of some symptoms or may omit them from their description. Meanwhile, the visit volume of a certain department may increase sharply during the same period. A suitable department can therefore be inferred by combining the incomplete symptom description with the department visit volume.
In some embodiments, the Bayesian equation is a mathematical formula for inferring the probability of occurrence of an event, the basic principle of which is to combine a priori knowledge with new evidence to arrive at a posterior probability.
In particular, bayesian equations describe how the estimate of the probability of occurrence of an event is updated with new evidence given some prior conditions. The equation is as follows:
P(A|B)=P(B|A)*P(A)/P(B)
where P(A|B) represents the probability that event A occurs given that B occurs; P(B|A) represents the probability that B occurs given that event A occurs; P(A) is the prior probability, i.e. the probability of event A without new evidence; and P(B) is a normalization constant ensuring that the posterior probabilities sum to 1.
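As a small worked illustration of the Bayesian equation, the sketch below computes a posterior from a prior and a likelihood, obtaining P(B) by the law of total probability. The function names and the numbers are made up for illustration and are not taken from the disclosure.

```python
def evidence(likelihood_a, prior_a, likelihood_not_a):
    """P(B) = P(B|A) * P(A) + P(B|not A) * (1 - P(A))."""
    return likelihood_a * prior_a + likelihood_not_a * (1.0 - prior_a)


def posterior(likelihood_a, prior_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_a * prior_a / p_b


# Illustrative numbers: P(A) = 0.2, P(B|A) = 0.5, P(B|not A) = 0.25
p_b = evidence(0.5, 0.2, 0.25)    # 0.5*0.2 + 0.25*0.8 = 0.3
p_a_given_b = posterior(0.5, 0.2, p_b)
```

With these numbers the evidence B raises the probability of A from the prior 0.2 to a posterior of one third.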
In some embodiments, the predicted value is corrected based on the Bayesian equation. Obtaining the corrected predicted value includes:
where P(C) is the corrected predicted value; P(A) is each department's share of visits in the selected time period divided by the total number of doctors in that department; P(y|X) is the predicted value of the label sequence; and P(y) is the label occurrence ratio. The total number of doctors in a department is the number of doctors on the receiving-doctor list who belong to that department.
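The correction formula itself is not reproduced in this text. By analogy with the Bayesian equation given earlier, one plausible reading is P(C) = P(y|X) * P(A) / P(y), with P(A) taken as the department's visit share divided by its number of doctors. The sketch below implements that reading only; the formula, the function name `corrected_scores` and the dictionary layouts are assumptions, not the authors' confirmed method.

```python
from collections import Counter


def corrected_scores(predicted, doctor_dept, dept_share, label_ratio):
    """Assumed correction: P(C) = P(y|X) * P(A) / P(y).

    predicted:   {doctor: P(y|X)} from the diagnosis guiding model
    doctor_dept: {doctor: department} for every doctor on the full list
    dept_share:  {department: share of visits in the selected period}
    label_ratio: {doctor: P(y), how often the label occurs}
    """
    doctors_per_dept = Counter(doctor_dept.values())
    corrected = {}
    for doctor, p_y_given_x in predicted.items():
        dept = doctor_dept[doctor]
        # Assumed prior: department share spread evenly over its doctors
        prior = dept_share.get(dept, 0.0) / doctors_per_dept[dept]
        corrected[doctor] = p_y_given_x * prior / label_ratio[doctor]
    return corrected


doctor_dept = {"dr_a": "respiratory", "dr_b": "respiratory", "dr_c": "cardiology"}
dept_share = {"respiratory": 0.6, "cardiology": 0.4}
predicted = {"dr_a": 0.5, "dr_c": 0.5}
label_ratio = {"dr_a": 0.25, "dr_c": 0.5}
scores = corrected_scores(predicted, doctor_dept, dept_share, label_ratio)
```

In this made-up example the two doctors start with equal model scores, and the correction favours the doctor whose label is rarer in training relative to the department's current demand.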
S6: outputting the list of predicted receiving doctors and the corrected predicted value;
In some embodiments, the doctor predicted to match the patient may be recommended based on the corrected predicted value. Only the single doctor with the highest corrected value may be provided. Multiple doctors may also be provided if their corrected values are relatively close.
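The selection rule above can be sketched as follows. The text does not say how close "relatively close" is, so the `margin` parameter and its default value are purely hypothetical.

```python
def recommend(corrected, margin=0.05):
    """Return the top doctor plus any doctor within `margin` of the top score.

    corrected: {doctor: corrected predicted value}
    margin:    hypothetical closeness threshold (not given in the source)
    """
    if not corrected:
        return []
    ranked = sorted(corrected.items(), key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    return [doctor for doctor, score in ranked if best_score - score <= margin]


close_case = recommend({"dr_a": 0.60, "dr_b": 0.58, "dr_c": 0.30})
clear_case = recommend({"dr_a": 0.60, "dr_c": 0.30})
```

Here `close_case` contains two doctors because their scores differ by less than the margin, while `clear_case` contains only the top doctor.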
It should be noted that the application scenarios of preliminary diagnosis guiding based on the dialogue and the electronic medical record include an Internet hospital, an online medical self-alert, an offline hospital intelligent diagnosis guiding system, or a family doctor system. Diagnosis guiding based on the dialogue and the electronic medical record can improve the efficiency of patient treatment and avoid wasting medical resources. The barrier to use is low for the patient. Meanwhile, the preliminary guidance gives the doctor an initial understanding of the patient's condition.
Fig. 3 shows a flowchart 300 of the training method of the diagnosis guiding model.
As shown in Fig. 3, the training method of the diagnosis guiding model includes the following steps:
S01: acquiring a sample data set of historical patient diagnosis guiding dialogues, related patient visit records and electronic medical records;
In some embodiments, a large amount of patient diagnosis guiding dialogue data and electronic medical record data is required as training data. There are generally two kinds of data source: self-collected data and public data sources. Self-collection includes capturing offline diagnosis guiding dialogues with recording equipment or the like and converting them into data. Dialogue information may also be extracted from medical records.
Currently known public Chinese data sources include MedDialog, a medical question-and-answer data set released by the Natural Language Processing and Social Humanities Computing Laboratory of Tsinghua University, and cMedQA, a medical question-and-answer data set released by the National University of Defense Technology. A drawback of the public data sources is that they lack electronic medical records, which can be randomly generated as a supplement.
S02: preprocessing the sample data set, and extracting a sample short text data set;
First, the diagnosis guiding dialogue is segmented into words using a word segmentation tool in combination with a medical proprietary dictionary. Next, stop words, noise words and punctuation marks are removed from the word segmentation result. If an electronic medical record exists, gender, age group and past medical history information are extracted from it using a custom template. Synonyms in the word segmentation result are then merged with the help of the medical proprietary dictionary. Finally, the word segmentation result, gender, age group and past medical history information are combined into short text data.
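The preprocessing steps above can be sketched without any third-party library. This is a simplified illustration: a greedy forward maximum-matching segmenter stands in for a real word segmentation tool, the small dictionaries are invented examples, and the medical record is assumed to have already been reduced to a dictionary with hypothetical keys `gender`, `age_group` and `history`.

```python
# Hypothetical mini-dictionaries, for illustration only
MEDICAL_DICT = {"咳嗽", "发烧", "发热", "头痛"}   # cough, fever, fever, headache
STOP_WORDS = {"我", "了", "的", "和", "有点"}
SYNONYMS = {"发烧": "发热"}                      # map variants to one canonical term
PUNCTUATION = set("，。！？、；,.!?; ")


def segment(text, vocab, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def to_short_text(dialogue, record):
    """Segment, filter, merge synonyms, then append fields taken from the record."""
    tokens = segment(dialogue, MEDICAL_DICT | STOP_WORDS)
    tokens = [t for t in tokens if t not in STOP_WORDS and t not in PUNCTUATION]
    merged = []
    for token in tokens:
        token = SYNONYMS.get(token, token)
        if token not in merged:  # drop repeats created by synonym merging
            merged.append(token)
    fields = [record.get("gender"), record.get("age_group"), record.get("history")]
    return " ".join(merged + [f for f in fields if f])


short_text = to_short_text(
    "我咳嗽了，有点发烧和头痛",
    {"gender": "female", "age_group": "30-39", "history": "asthma"},
)
```

In practice the segmenter would be replaced by a tool that supports custom dictionaries, and the record fields would come from template-based extraction rather than a ready-made dictionary.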
S03: acquiring the patient's visiting doctor information from the related patient visit records, labeling each piece of short text data in the sample short text data set with the visiting doctor information, and constructing a training data set.
Regarding the acquisition of the patient's visiting doctor, the doctor actually visited can be used as the label. The dialogues can also be annotated by personnel with a medical background, based on medical knowledge. Alternatively, the advice of the diagnosis guiding staff, taken from offline diagnosis guiding records, can be used as the label.
It should be noted that the doctor actually visited is the most accurate label. However, associating the doctor actually visited with the dialogue information is often challenging. For example, the patient may not provide sufficient information during diagnosis guiding, and the hospital may not process the diagnosis guiding data adequately. Alternatively, the patient may never have visited a doctor for the condition discussed. Therefore, if the doctor actually visited is used as the label, the data must be reviewed manually to ensure that the visit is closely related to the dialogue.
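One way to attach a real visit to each dialogue is to look for the same patient's first visit within a fixed window after the dialogue. The sketch below shows this under stated assumptions: the window length `window_days`, the record layouts and the function name `build_training_set` are hypothetical, and the manual review mentioned above would still be needed afterwards.

```python
from datetime import date, timedelta


def build_training_set(dialogues, visits, window_days=7):
    """Label each dialogue with the doctor of the first visit inside the window.

    dialogues: list of dicts with keys patient_id, date, short_text (assumed layout)
    visits:    list of dicts with keys patient_id, date, doctor (assumed layout)
    Dialogues without a matching visit are skipped rather than labeled.
    """
    window = timedelta(days=window_days)
    samples = []
    for dialogue in dialogues:
        candidates = [
            v for v in visits
            if v["patient_id"] == dialogue["patient_id"]
            and dialogue["date"] <= v["date"] <= dialogue["date"] + window
        ]
        if not candidates:
            continue
        first_visit = min(candidates, key=lambda v: v["date"])
        samples.append((dialogue["short_text"], first_visit["doctor"]))
    return samples


dialogues = [
    {"patient_id": 1, "date": date(2023, 7, 1), "short_text": "cough fever"},
    {"patient_id": 2, "date": date(2023, 7, 1), "short_text": "chest pain"},
]
visits = [
    {"patient_id": 1, "date": date(2023, 7, 3), "doctor": "dr_a"},
    {"patient_id": 1, "date": date(2023, 7, 5), "doctor": "dr_b"},
    {"patient_id": 2, "date": date(2023, 8, 20), "doctor": "dr_c"},
]
training_set = build_training_set(dialogues, visits)
```

The second patient's visit falls outside the window, so that dialogue is left unlabeled instead of being paired with an unrelated visit.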
In some embodiments, model training draws on a large corpus. BERT is a pre-trained language model: it is pre-trained on a large corpus to learn a general representation of the language, which improves the model's language understanding. Compared with traditional sequence labeling models, BERT can use more context information and therefore captures the semantic and grammatical information of the language better, which lowers the barrier to use for patients. The CRF model is a conditional random field model that can be used to label or classify sequential data. It models the state transitions of the entire sequence, so it can also exploit more context information. By combining the advantages of BERT and CRF, the joint model can reach state-of-the-art performance on a variety of sequence labeling tasks.
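To make the CRF part concrete: a linear-chain CRF scores a tag sequence by adding per-position tag scores (which, in a joint model, would come from the BERT features) to tag-to-tag transition scores, and then normalizes over all candidate sequences. The sketch below does this by brute-force enumeration, which is only practical for tiny inputs. It omits start and end transitions for brevity, and all names and numbers are illustrative assumptions.

```python
import math
from itertools import product


def path_score(emissions, transitions, tags):
    """Sum of per-position tag scores plus transition scores between adjacent tags."""
    score = sum(emissions[i][tag] for i, tag in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def sequence_probability(emissions, transitions, tags):
    """exp(score of `tags`) divided by the sum of exp(score) over every tag sequence."""
    n_tags = len(transitions)
    all_paths = product(range(n_tags), repeat=len(emissions))
    log_z = math.log(sum(math.exp(path_score(emissions, transitions, p)) for p in all_paths))
    return math.exp(path_score(emissions, transitions, tags) - log_z)


# Two tags, three positions; scores are invented for illustration
emissions = [[2.0, 0.5], [0.3, 1.5], [1.0, 1.0]]
transitions = [[0.5, -0.2], [0.1, 0.4]]
p_best = sequence_probability(emissions, transitions, (0, 1, 1))
total = sum(
    sequence_probability(emissions, transitions, p)
    for p in product(range(2), repeat=3)
)
```

A real implementation would compute the normalizer with the forward algorithm and decode with Viterbi instead of enumerating every sequence.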
S04: training the diagnosis guiding model to be trained based on the training data set;
The diagnosis guiding model comprises a BERT embedding layer, a BERT model and a CRF model. As shown in Fig. 3, the goal of the diagnosis guiding model is to assign labels to incoming dialogues. Word segmentation and labeling are necessary steps before training the joint BERT and CRF model. The training process should be optimized with an appropriate optimizer, such as Adam or SGD. The hyper-parameters of the model, such as the learning rate and batch size, should be tuned carefully to obtain the best performance. The model should be evaluated on a held-out validation set to monitor its performance and prevent overfitting.
S05: obtaining the pre-trained diagnosis guiding model; the pre-trained model can then be used for diagnosis guiding prediction. When new training data is acquired or doctor information is updated, the model needs to be retrained.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the present disclosure through examples of apparatus.
Fig. 4 shows a block diagram 400 of a preliminary diagnosis guiding device based on dialogue and electronic medical records. As shown in Fig. 4, the device 400 includes: an acquisition unit 410, configured to acquire the patient diagnosis guiding dialogue and the electronic medical record; a preprocessing unit 420, configured to preprocess the patient diagnosis guiding dialogue and the electronic medical record and extract short text data; a prediction unit 430, configured to input the short text data into a pre-trained diagnosis guiding model to obtain a list of predicted receiving doctors and a predicted value, wherein the pre-trained diagnosis guiding model comprises a BERT embedding layer and a joint model of a BERT model and a CRF model; an auxiliary data unit 440, configured to acquire auxiliary data; a correction unit 450, configured to correct the predicted value with a post-processing method based on the auxiliary data to obtain a corrected predicted value; and an output unit 460, configured to output the list of predicted receiving doctors and the corrected predicted value.
In summary, the present disclosure uses a variety of data: in addition to patient diagnosis guiding dialogues and electronic medical records, it uses data such as the number of visits handled by doctors in the hospital. Using these data reduces the effect of inaccurate symptom descriptions, takes the seasonal characteristics of diseases into account, improves department prediction, and ultimately achieves rapid and accurate matching of patients and doctors.
The following points need to be described:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) In the drawings for describing embodiments of the present disclosure, the thickness of layers or regions is exaggerated or reduced for clarity, i.e., the drawings are not drawn to actual scale. It will be understood that when an element such as a layer, film, region or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "under" the other element or intervening elements may be present.
(3) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The above is merely a specific embodiment of the disclosure, but the protection scope of the disclosure should not be limited thereto, and the protection scope of the disclosure should be subject to the claims.

Claims (10)

1. A preliminary diagnosis guiding method based on dialogue and electronic medical records, characterized by comprising the following steps:
S1: acquiring a patient diagnosis guiding dialogue and an electronic medical record;
S2: preprocessing the patient diagnosis guiding dialogue and the electronic medical record, and extracting short text data;
S3: inputting the short text data into a pre-trained diagnosis guiding model to obtain a list of predicted receiving doctors and a predicted value; the pre-trained diagnosis guiding model comprises a BERT embedding layer and a joint model of a BERT model and a CRF model;
S4: acquiring auxiliary data;
S5: based on the auxiliary data, correcting the predicted value by using a post-processing method to obtain a corrected predicted value;
S6: outputting the list of predicted receiving doctors and the corrected predicted value.
2. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1, wherein the patient diagnosis guiding dialogue and electronic medical record comprise: the symptom description information in the diagnosis guiding dialogue and the electronic medical record of the patient.
3. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1 or 2, wherein preprocessing the patient diagnosis guiding dialogue and the electronic medical record and extracting the short text data in S2 comprises the following steps:
S21: segmenting the symptom description information in the diagnosis guiding dialogue into words using a word segmentation tool in combination with a medical proprietary dictionary;
S22: removing stop words, noise words and punctuation marks from the word segmentation result;
S23: extracting gender, age group and past medical history information from the electronic medical record by using a custom template;
S24: merging the synonyms in the word segmentation result in combination with the medical proprietary dictionary to obtain a simplified word segmentation result;
S25: combining the simplified word segmentation result, the gender, the age group and the past medical history information into short text data.
4. The method according to claim 3, wherein the word segmentation tool is a tool that supports custom dictionaries, including jieba, HanLP, ICTCLAS or LTP.
5. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1, wherein before inputting the short text data into the pre-trained diagnosis guiding model in S3 to obtain the list of predicted receiving doctors and the predicted value, the method further comprises:
S0: training a diagnosis guiding model to be trained;
wherein training the diagnosis guiding model to be trained in S0 comprises the following steps:
S01: acquiring a sample data set of historical patient diagnosis guiding dialogues, related patient visit records and electronic medical records;
S02: preprocessing the sample data set, and extracting a sample short text data set;
S03: acquiring the patient's visiting doctor information from the related patient visit records, labeling each piece of short text data in the sample short text data set with the visiting doctor information, and constructing a training data set;
S04: training the diagnosis guiding model to be trained based on the training data set;
S05: obtaining the pre-trained diagnosis guiding model.
6. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 5, wherein the related patient visit records comprise: the patient's real visit records within a period of time after completing the diagnosis guiding dialogue.
7. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1, wherein inputting the short text data into the pre-trained diagnosis guiding model in S3 to obtain the list of predicted receiving doctors and the predicted value comprises:
inputting the short text data into the BERT embedding layer to obtain a vector representation of the text data, inputting the vector representation into the BERT model, taking the feature sequence obtained from the BERT model as the input of the CRF model, labeling and decoding the short text to obtain the tag sequence of the short text, and outputting the tag sequence probability value, wherein the score S(X, y) of a tag sequence is given by formula (1):
$$S(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \tag{1}$$
wherein X is the feature sequence, y is the tag sequence, $A_{y_i, y_{i+1}}$ is the transition score from tag $y_i$ to tag $y_{i+1}$, and $P_{i, y_i}$ is the score of tag $y_i$ at position i of the feature sequence; the ratio of the total score of the correctly labeled sequence to the total score of all possible labelings of the short text data gives the predicted value p(y|X) of the tag sequence, according to formula (2):
$$p(y \mid X) = \frac{e^{S(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X, \tilde{y})}} \tag{2}$$
wherein $Y_X$ represents all possible tag sequences.
8. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1, wherein acquiring the auxiliary data comprises: acquiring, for the selected time period, the proportion of consultations in each department and the list of all receiving doctors.
9. The preliminary diagnosis guiding method based on dialogue and electronic medical records according to claim 1, wherein correcting the predicted value based on the auxiliary data by using a post-processing method to obtain the corrected predicted value comprises:
obtaining the corrected predicted value based on formula (3):
wherein P(C) is the corrected predicted value; P(A) is each department's share of visits in the selected time period divided by the total number of doctors in that department; P(y|X) is the predicted value of the tag sequence; and P(y) is the label occurrence ratio; the total number of doctors in a department is the number of doctors on the list of all receiving doctors who belong to that department.
10. A preliminary diagnosis guiding device based on dialogue and electronic medical records, characterized by comprising:
an acquisition unit, configured to acquire a patient diagnosis guiding dialogue and an electronic medical record;
a preprocessing unit, configured to preprocess the patient diagnosis guiding dialogue and the electronic medical record and extract short text data;
a prediction unit, configured to input the short text data into a pre-trained diagnosis guiding model to obtain a list of predicted receiving doctors and a predicted value, wherein the pre-trained diagnosis guiding model comprises a BERT embedding layer and a joint model of a BERT model and a CRF model;
an auxiliary data unit, configured to acquire auxiliary data;
a correction unit, configured to correct the predicted value by using a post-processing method based on the auxiliary data to obtain a corrected predicted value;
an output unit, configured to output the list of predicted receiving doctors and the corrected predicted value.
CN202310933373.3A 2023-07-27 2023-07-27 Preliminary diagnosis guiding method and device based on dialogue and electronic medical record Pending CN116936080A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310933373.3A CN116936080A (en) 2023-07-27 2023-07-27 Preliminary diagnosis guiding method and device based on dialogue and electronic medical record

Publications (1)

Publication Number Publication Date
CN116936080A true CN116936080A (en) 2023-10-24

Family

ID=88392198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310933373.3A Pending CN116936080A (en) 2023-07-27 2023-07-27 Preliminary diagnosis guiding method and device based on dialogue and electronic medical record

Country Status (1)

Country Link
CN (1) CN116936080A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination