CN116665922A - Doctor-patient communication method and system
- Publication number: CN116665922A
- Application number: CN202310949211.9A
- Authority: CN (China)
- Prior art keywords: doctor, patient, data, category, accuracy
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H80/00—ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the field of doctor-patient communication, in particular to a doctor-patient communication method and system that improve the accuracy and stability of doctor-patient communication. The doctor-patient communication method comprises the following steps: collecting medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient; preprocessing the collected medical information, wherein the preprocessing comprises cleaning, denoising and normalization; extracting specific features from the preprocessed medical information, wherein the specific features comprise the age, sex and severity of illness of the patient; training on the extracted specific features by using a naive Bayes algorithm to establish a prediction model of doctor-patient communication; evaluating the established prediction model by using a test data set to determine the prediction accuracy and stability of the model; and carrying out doctor-patient communication based on the evaluated prediction model. The invention is suitable for communication between doctors and patients.
Description
Technical Field
The invention relates to the field of doctor-patient communication, in particular to a doctor-patient communication method and system.
Background
Traditional modes of doctor-patient communication include the following:
1. Face-to-face communication: the patient visits a hospital or clinic in person and communicates face to face with a doctor, who makes a diagnosis and gives advice based on the patient's condition and symptoms.
2. Telephone consultation: the patient describes his or her condition and symptoms to the doctor by telephone, and the doctor makes a preliminary diagnosis and gives advice based on the information provided during the call.
3. Short message or mail consultation: the patient describes his or her illness and symptoms to the doctor by short message or mail, and the doctor communicates and gives guidance by replying to the short message or mail.
4. Online inquiry: the patient consults a doctor through an Internet platform, and the doctor diagnoses and advises the patient through text, voice, video and the like.
Face-to-face communication gives the doctor a more direct understanding of the patient's illness and symptoms, but it requires the patient to travel to a hospital or clinic, takes a great deal of time, and is inefficient and inconvenient; telephone consultation and short message/mail consultation are convenient and quick, but the patient may not describe the condition accurately and the doctor may not fully understand the patient's illness, so the accuracy of communication is low; online inquiry saves time and cost, but requires the patient to have certain network skills and equipment, and the doctor can only make a one-sided diagnosis from the patient's description when giving advice, so online inquiry is inefficient and inaccurate.
In the prior art, CN105812376A discloses a doctor-patient multiparty instant messaging system constructed using strophe. After the client program is started, the bottom-layer communication module starts monitoring. A data packet is converted into the strophe protocol format by a protocol analysis module, and the request is then sent to the communication server for processing through the bottom-layer communication protocol. The instant messaging system platform provides various functional interfaces for the instant messaging system, so that clients can extend the corresponding functions according to their own needs.
The embodiment of that scheme provides an instant messaging technology based on strophe and applies a doctor-patient multiparty instant messaging management mechanism constructed using strophe in the communication process, which can greatly improve the success rate of message updating. This improves the efficiency of doctor-patient communication, but the accuracy and stability of doctor-patient communication remain poor.
Disclosure of Invention
The invention aims to provide a doctor-patient communication method, which greatly improves the accuracy and stability of a doctor-patient communication system and ensures the efficiency of doctor-patient communication.
The invention adopts the following technical scheme to achieve the aim, and the doctor-patient communication method comprises the following steps:
step 1, acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
step 2, preprocessing the acquired medical information, wherein the preprocessing comprises cleaning, denoising and normalizing;
step 3, extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
step 4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication;
step 5, evaluating the established prediction model of the doctor-patient communication by using a test data set to determine the prediction accuracy and stability of the model;
step 6, performing doctor-patient communication based on the evaluated prediction model, specifically including: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
Further, step 2 specifically includes:
step 201, in a data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing duplicate data by using the drop_duplicates function in the pandas library in Python, filling missing values by using the fillna function, and detecting and processing abnormal values by using an outlier detection function;
step 202, in a data denoising stage, removing noise and interference in signals, specifically including: performing a first denoising of the data by using the signal module in the scipy library in Python; and then performing median filtering by using the medfilt function and low-pass filtering by using the lfilter function, the median filtering and the low-pass filtering providing a second denoising and interference removal;
step 203, in a data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization by using the MinMaxScaler class in the sklearn library in Python, which, for each feature, subtracts the minimum value from the data and then divides by the difference between the maximum value and the minimum value, so as to scale the data range to between 0 and 1.
The repeated data, the missing data and the abnormal values are removed in the data cleaning stage, the noise and the interference in the signals are removed in the data denoising stage, and the data ranges of different characteristics are scaled to be within the same range in the data normalization stage, so that the data quality is greatly improved, the data format is unified, the subsequent analysis difficulty is reduced, and the subsequent modeling efficiency is improved.
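The cleaning stage can be sketched as follows. This is a minimal illustration, not the patented implementation: the column names, the median fill, and the 1.5 × IQR outlier rule are assumptions chosen for the example, since the text does not specify how missing values are filled or how abnormal values are detected.

```python
import pandas as pd

def clean_records(df, numeric_cols):
    """Drop duplicate rows, fill missing numeric values, remove IQR outliers."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()
    # Fill missing numeric values with the column median (assumed strategy).
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    # Keep only rows within 1.5 * IQR of the quartiles (assumed outlier rule).
    for col in numeric_cols:
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out = out[out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return out.reset_index(drop=True)

# Hypothetical patient table: one duplicate row, one missing age, one implausible age.
records = pd.DataFrame({
    "age": [34, 34, 51, None, 47, 40, 400],
    "severity": [1, 1, 3, 2, 2, 2, 2],
})
cleaned = clean_records(records, ["age"])
print(cleaned)
```

In this example the duplicate row and the row with age 400 are dropped, and the missing age is replaced by the median of the remaining values.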
Further, extracting specific features from the preprocessed medical information specifically includes:
after the data normalization is completed, feature selection is performed by using the SelectKBest class in the scikit-learn library in Python: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, the k highest-scoring features are selected as the specific features, and k is an integer greater than 0. This scheme improves the accuracy of feature extraction.
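A minimal sketch of this selection step on synthetic data is given below. The data, the choice of `f_classif` as the scoring function, and `k=1` are assumptions for illustration; `mutual_info_classif` from the same module could be substituted when an information-gain style score is preferred.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
# Synthetic binary target (hypothetical communication category).
y = rng.integers(0, 2, size=n)
# Feature 0 closely tracks the target; features 1 to 3 are pure noise.
informative = y + 0.1 * rng.standard_normal(n)
noise = rng.standard_normal((n, 3))
X = np.column_stack([informative, noise])

# Score every feature against the target and keep the k best.
selector = SelectKBest(score_func=f_classif, k=1).fit(X, y)
selected = selector.get_support(indices=True)
X_selected = selector.transform(X)
print(selected, X_selected.shape)
```

The selector keeps the column whose score against the target is highest, which here is the informative feature at index 0.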
Further, training the extracted specific features by using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the features and the category information by using Bayes' theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
step 403, building a classifier for prediction according to the calculated posterior probability distribution, specifically including: taking the posterior probability of each category as the weight of that category, weighting and summing the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that a sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
Through the above training process, for data sets with many features, the number of features can be effectively reduced, improving the efficiency and accuracy of the model; where certain categories have few samples, the data set can be balanced by methods such as weighting.
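The posterior calculation of step 402 can be illustrated with a small count-based example. This is a sketch under stated assumptions: it uses a single categorical feature, and the category names "urgent" and "routine" are hypothetical, since the text does not name the categories.

```python
from collections import Counter

def posterior(samples, labels, x):
    """Return P(y | x) for every class y, estimated from counts via Bayes' theorem."""
    n = len(labels)
    joint = {}
    for y, n_y in Counter(labels).items():
        prior = n_y / n                                   # P(y)
        matches = sum(1 for s, l in zip(samples, labels) if l == y and s == x)
        likelihood = matches / n_y                        # P(x | y)
        joint[y] = likelihood * prior                     # P(x | y) * P(y)
    evidence = sum(joint.values())                        # P(x)
    if evidence == 0:
        raise ValueError("feature value never seen in the training data")
    return {y: p / evidence for y, p in joint.items()}

# Hypothetical training data: one feature (severity) and a category per sample.
samples = ["severe", "severe", "mild", "mild", "mild", "severe"]
labels = ["urgent", "urgent", "routine", "routine", "urgent", "routine"]
post = posterior(samples, labels, "severe")
print(post)
```

Here P(urgent) = 1/2 and P(severe | urgent) = 2/3, while P(severe) = 1/2, so the posterior for "urgent" given "severe" is (2/3 × 1/2) / (1/2) = 2/3.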
Further, the evaluating the established predictive model of the doctor-patient communication using the test data set specifically includes:
step 501, inputting each sample in the test data set into the prediction model, calculating the probability, according to the prediction model, that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the lowest probability as a second prediction result of the sample;
step 502, comparing the first prediction results and the second prediction results of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and second accuracy;
step 503, multiplying the first accuracy by a first weight to obtain a first accuracy comparison reference value; multiplying the second accuracy by a second weight to obtain a second accuracy comparison reference value; the first weight is the weight of the category with the highest probability, and the second weight is the weight of the category with the lowest probability;
step 504, calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, comparing the difference with a set threshold range, and if the difference is within the set threshold range, judging that the accuracy of the prediction model is within a reasonable range.
The above procedure makes the evaluation more accurate. Using the test data set also verifies the generalization ability of the model on unseen data, i.e., whether the model can correctly predict new data.
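Steps 501 to 504 can be sketched in a few lines. The weights, the threshold range, and the category names "a", "b", "c" below are hypothetical values chosen for illustration; the text does not specify how they are set.

```python
def evaluate(prob_rows, true_labels, w_first, w_second, threshold_range):
    """Compare weighted accuracies of the most and least probable predictions."""
    # Step 501: most probable and least probable category for each sample.
    first = [max(p, key=p.get) for p in prob_rows]
    second = [min(p, key=p.get) for p in prob_rows]
    # Step 502: accuracy of each prediction list against the real labels.
    n = len(true_labels)
    acc_first = sum(f == t for f, t in zip(first, true_labels)) / n
    acc_second = sum(s == t for s, t in zip(second, true_labels)) / n
    # Step 503: weighted comparison reference values.
    ref_first = acc_first * w_first
    ref_second = acc_second * w_second
    # Step 504: the difference must fall inside the set threshold range.
    diff = ref_first - ref_second
    low, high = threshold_range
    return acc_first, acc_second, diff, low <= diff <= high

# Hypothetical model outputs for four test samples and their real labels.
prob_rows = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.6, "c": 0.3},
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.2, "b": 0.3, "c": 0.5},
]
true_labels = ["a", "b", "c", "c"]
acc1, acc2, diff, ok = evaluate(prob_rows, true_labels, 0.6, 0.4, (0.2, 0.8))
print(acc1, acc2, diff, ok)
```

With these inputs the first accuracy is 0.75 and the second accuracy is 0.25, so the difference of the weighted values is 0.75 × 0.6 − 0.25 × 0.4 = 0.35, which lies inside the assumed range.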
A doctor-patient communication system, the doctor-patient communication system comprising:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of the doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, specifically including: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
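The role of the communication module can be sketched as a small function that passes patient input to the evaluated model and formats the output for the doctor. Everything here is hypothetical: `stub_model` stands in for the trained model, and the category names and message format are assumptions made for the example.

```python
def communicate(patient_input, model, format_for_doctor):
    """Run the evaluated model on patient input and build the doctor's view."""
    probabilities = model(patient_input)
    # The category with the highest probability is presented as the suggestion.
    suggestion = max(probabilities, key=probabilities.get)
    return format_for_doctor(suggestion, probabilities)

def stub_model(patient_input):
    # Hypothetical stand-in for the evaluated prediction model.
    severe = patient_input.get("severity", 0) >= 2
    if severe:
        return {"urgent": 0.8, "routine": 0.2}
    return {"urgent": 0.1, "routine": 0.9}

def format_for_doctor(suggestion, probabilities):
    # The doctor reads this summary and then replies to the patient.
    return f"suggested category: {suggestion} (p={probabilities[suggestion]:.2f})"

summary = communicate({"age": 47, "sex": "F", "severity": 3},
                      stub_model, format_for_doctor)
print(summary)
```

The doctor remains the one who replies to the patient; the model output only informs that reply.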
Further, the data preprocessing module is specifically configured to remove duplicate data, missing data and abnormal values in a data cleaning stage, specifically including: removing duplicate data by using the drop_duplicates function in the pandas library in Python, filling missing values by using the fillna function, and detecting and processing abnormal values by using an outlier detection function;
in a data denoising stage, removing noise and interference in signals, specifically including: performing a first denoising of the data by using the signal module in the scipy library in Python; and then performing median filtering by using the medfilt function and low-pass filtering by using the lfilter function, the median filtering and the low-pass filtering providing a second denoising and interference removal;
in a data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization by using the MinMaxScaler class in the sklearn library in Python, which, for each feature, subtracts the minimum value from the data and then divides by the difference between the maximum value and the minimum value, so as to scale the data range to between 0 and 1.
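The denoising stage of the preprocessing module can be sketched with `scipy.signal` as follows. The test signal, the kernel size, and the Butterworth coefficients passed to `lfilter` are assumptions for illustration; the text names `medfilt` and `lfilter` but does not specify the filter design.

```python
import numpy as np
from scipy import signal

def denoise(x, kernel_size=5, cutoff=0.2, order=2):
    """Median-filter spikes, then low-pass filter the result."""
    # Median filtering suppresses isolated spikes.
    despiked = signal.medfilt(x, kernel_size=kernel_size)
    # Low-pass Butterworth coefficients (assumed design); cutoff is a
    # fraction of the Nyquist frequency.
    b, a = signal.butter(order, cutoff)
    return signal.lfilter(b, a, despiked)

# Synthetic measurement: a slow sine wave, small noise, and one large spike.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
noisy = np.sin(2 * np.pi * 2 * t) + 0.1 * rng.standard_normal(t.size)
noisy[50] += 5.0
smoothed = denoise(noisy)
print(noisy[50], smoothed[50])
```

Note that `lfilter` is a causal filter and introduces a small delay; `signal.filtfilt` is a common alternative when zero phase shift matters.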
Further, the feature extraction module is specifically configured to, after the data normalization is completed, perform feature selection by using the SelectKBest class in the scikit-learn library in Python: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, the k highest-scoring features are selected as the specific features, and k is an integer greater than 0.
The model training module is specifically used for dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the features and the category information by using Bayes' theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
building a classifier for prediction according to the calculated posterior probability distribution, specifically including: taking the posterior probability of each category as the weight of that category, weighting and summing the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that a sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
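The final mapping and selection can be illustrated as below. This sketch covers only the softmax mapping and the choice of the most probable category; how the per-category scores are produced by the weighted sum is not fully specified in the text, so the scores and category names here are hypothetical inputs.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(class_scores):
    """Return the most probable category and the full probability map."""
    names = list(class_scores)
    probs = softmax([class_scores[name] for name in names])
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], dict(zip(names, probs))

# Hypothetical per-category scores for one sample.
label, probabilities = predict({"urgent": 2.0, "routine": 0.5, "follow_up": -1.0})
print(label, probabilities)
```

Because softmax is monotonic, the category with the highest raw score is also the category with the highest probability.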
Further, the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate the probability, according to the prediction model, that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the lowest probability as a second prediction result of the sample;
comparing the first prediction results and the second prediction results of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and second accuracy;
multiplying the first accuracy by a first weight to obtain a first accuracy comparison reference value; multiplying the second accuracy by a second weight to obtain a second accuracy comparison reference value; the first weight is the weight of the category with the highest probability, and the second weight is the weight of the category with the lowest probability;
and calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, comparing the difference with a set threshold range, and if the difference is within the set threshold range, judging that the accuracy of the prediction model is within a reasonable range.
The beneficial effects of the invention are as follows:
the naive Bayes-based doctor-patient communication can realize rapid automatic classification and prediction, and save time and energy of doctors.
The naive Bayesian algorithm provided by the invention assumes that sample data obeys Gaussian distribution, can effectively process unbalanced data sets, and has higher accuracy and stability.
The naive Bayes-based doctor-patient communication can conveniently perform model training and application expansion, and support large-scale data processing and analysis.
According to the naive Bayesian-based doctor-patient communication method, automatic classification and prediction can be realized, the requirement for manual intervention is reduced, and the doctor-patient communication efficiency and accuracy are improved.
Drawings
Fig. 1 is a flowchart of a method for communicating between a doctor and a patient according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a doctor-patient communication method, as shown in fig. 1, comprising the following steps:
S1, collecting medical information of a patient
Wherein the medical information includes basic information of the patient, medical history record, diagnosis result and treatment scheme.
In one embodiment of the invention, the manner in which medical information of a patient is acquired includes:
Determining the type of information to collect: depending on the purpose and scope of the doctor-patient communication, it is determined which types of medical information need to be collected, such as basic information, medical history, diagnosis results and treatment regimen.
Collecting the patient's basic information: including name, gender, age, contact details, etc. Such information may be obtained from forms filled in by the patient, inquiries by the doctor, and the like.
Collecting the patient's medical history: the past medical history, family medical history, allergy history, etc. of the patient are collected by questioning the patient or consulting medical records.
Confirming the patient's diagnosis result: if the patient has been examined and diagnosed at a hospital or clinic, the doctor or medical institution may be asked for the patient's diagnosis.
Collecting the patient's treatment regimen: if the patient has begun to receive treatment, the patient's treatment regimen, medication, etc. may be collected from the doctor or medical institution.
Recording patient symptoms and feedback: during doctor-patient communication, the doctor may ask the patient about symptoms and feedback, and this information helps the doctor better understand the patient's condition and the effect of treatment.
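The collected information can be held in a simple record structure. The sketch below is only one possible layout; the class name, field names and example values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class PatientRecord:
    """One patient's collected medical information (hypothetical layout)."""
    # Basic information.
    name: str
    sex: str
    age: int
    contact: str
    # Medical history: past, family and allergy history entries.
    medical_history: list = field(default_factory=list)
    # Diagnosis result and treatment regimen, empty until confirmed.
    diagnosis: str = ""
    treatment_plan: str = ""
    # Symptoms and feedback recorded during communication.
    symptoms: list = field(default_factory=list)

    def missing_fields(self):
        """List the items that still have to be collected."""
        pending = {"diagnosis": self.diagnosis, "treatment_plan": self.treatment_plan}
        return [key for key, value in pending.items() if not value]

record = PatientRecord(name="Patient A", sex="F", age=47, contact="n/a",
                       medical_history=["drug allergy"], diagnosis="common cold")
print(record.missing_fields())
```

A helper such as `missing_fields` makes it easy to see which of the information types listed above have not yet been collected.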
S2, preprocessing the acquired medical information
The preprocessing operations include cleaning, denoising and normalization.
In one embodiment of the present invention, the specific method for cleaning, denoising and normalizing the acquired medical information includes:
step 201, in a data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing duplicate data by using the drop_duplicates function in the pandas library in Python, filling missing values by using the fillna function, and detecting and processing abnormal values by using an outlier detection function;
step 202, in a data denoising stage, removing noise and interference in signals, specifically including: performing a first denoising of the data by using the signal module in the scipy library in Python; and then performing median filtering by using the medfilt function and low-pass filtering by using the lfilter function, the median filtering and the low-pass filtering providing a second denoising and interference removal;
step 203, in a data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization by using the MinMaxScaler class in the sklearn library in Python, which, for each feature, subtracts the minimum value from the data and then divides by the difference between the maximum value and the minimum value, so as to scale the data range to between 0 and 1.
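The normalization of step 203 can be checked against the min-max formula directly. The small feature matrix below (age and a severity score) is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: columns are age and a severity score.
X = np.array([[25.0, 1.0],
              [40.0, 3.0],
              [65.0, 2.0]])

# Library version: each column is scaled to the range [0, 1].
scaled = MinMaxScaler().fit_transform(X)

# Manual version of the same formula: (x - min) / (max - min) per feature.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

print(scaled)
```

Both versions give 0, 0.375 and 1 for the age column and 0, 1 and 0.5 for the severity column, because the divisor is the range of each feature rather than its maximum.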
S3, extracting specific characteristics from the preprocessed medical information
Specific characteristics may include, among others, the age, sex, severity of the condition, etc. of the patient.
In one embodiment of the present invention, the method for extracting specific features from the preprocessed medical information specifically includes:
after the data normalization is completed, feature selection is performed by using the SelectKBest class in the scikit-learn library in Python: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, the k highest-scoring features are selected as the specific features, and k is an integer greater than 0.
S4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication
In one embodiment of the present invention, a method for training an extracted specific feature by using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the bayesian theorem, wherein the calculation formula is as follows:
wherein y is i Representing the class of the sample, x i Representing the characteristics of the sample, P (x i |y i ) Expressed in given y i Feature x i Probability of occurrence, P (y i ) Representing class y i Is a priori probability of P (x) i ) Representing sample characteristics x i Probability of occurrence in training set, P (y i |x i ) Representing posterior probability distribution;
step 403, building a classifier for prediction according to the calculated posterior probability distribution, which specifically comprises: taking the posterior probability of each category as the weight of that category, computing a weighted sum of the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that a sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
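Steps 401 to 403 can be illustrated with a small from-scratch naive Bayes classifier for categorical features. This is a sketch under stated assumptions, not the patented classifier: it normalizes the joint probabilities directly with Bayes' theorem instead of applying the weighted sum and softmax mapping described in step 403, it adds Laplace smoothing (`alpha`), and the feature values and the labels `"routine"` and `"urgent"` are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count classes and per-class feature values from categorical training data."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of class-c samples whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values


def posterior(model, x, alpha=1.0):
    """Return P(y | x) for every class y via Bayes' theorem."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    joint = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior P(y)
        for j, v in enumerate(x):
            # Laplace-smoothed likelihood P(x_j | y); smoothing is an assumption.
            p *= (feature_counts[c][j][v] + alpha) / (n_c + alpha * len(values[j]))
        joint[c] = p  # P(x | y) * P(y) under the naive independence assumption
    evidence = sum(joint.values())  # P(x)
    return {c: p / evidence for c, p in joint.items()}


def predict(model, x):
    """Select the class with the highest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)


# Toy training set: (age group, severity) -> hypothetical triage label.
samples = [("young", "mild"), ("young", "severe"), ("old", "severe"),
           ("old", "severe"), ("old", "mild"), ("young", "mild")]
labels = ["routine", "urgent", "urgent", "urgent", "routine", "routine"]

model = train_naive_bayes(samples, labels)
post = posterior(model, ("old", "severe"))
prediction = predict(model, ("old", "severe"))
```

For the query `("old", "severe")` the smoothed joint probabilities are 0.24 for `"urgent"` and 0.04 for `"routine"`, so the posterior for `"urgent"` is 0.24 / 0.28, roughly 0.857.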
S5, evaluating the established prediction model of the doctor-patient communication by using the test data set
The purpose of the evaluation is to determine the accuracy and stability of the predictive model.
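In one embodiment of the present invention, the accuracy of the established prediction model of doctor-patient communication may be evaluated using the test data set as follows:
step 501, inputting each sample in the test data set into the prediction model, calculating the probability that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the lowest probability as a second prediction result of the sample;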
step 502, comparing a first prediction result and a second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and a corresponding second accuracy;
step 503, multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
step 504, calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
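The accuracy check can be sketched in plain Python as below. The text does not specify how the two weights or the threshold range are obtained, so this sketch treats them as caller-supplied values; the function name `evaluate_accuracy`, the dictionary keys, and the example numbers are hypothetical.

```python
def evaluate_accuracy(probabilities, true_labels, first_weight, second_weight,
                      threshold_range):
    """Compare weighted accuracies of the most and least likely predictions.

    probabilities: one dict per test sample mapping category -> predicted probability.
    """
    # First result: most probable category; second result: least probable category.
    first_preds = [max(p, key=p.get) for p in probabilities]
    second_preds = [min(p, key=p.get) for p in probabilities]

    # Accuracy of each set of predictions against the real labels.
    n = len(true_labels)
    first_acc = sum(pred == true for pred, true in zip(first_preds, true_labels)) / n
    second_acc = sum(pred == true for pred, true in zip(second_preds, true_labels)) / n

    # Weighted comparison reference values (weights are assumed inputs).
    first_ref = first_acc * first_weight
    second_ref = second_acc * second_weight

    # Difference checked against the set threshold range.
    difference = first_ref - second_ref
    low, high = threshold_range
    return {
        "first_accuracy": first_acc,
        "second_accuracy": second_acc,
        "difference": difference,
        "reasonable": low <= difference <= high,
    }


# Illustrative model outputs for four test samples over categories a, b, c.
probs = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.6, "c": 0.3},
    {"a": 0.2, "b": 0.3, "c": 0.5},
    {"a": 0.5, "b": 0.4, "c": 0.1},
]
labels = ["a", "b", "c", "b"]
report = evaluate_accuracy(probs, labels, first_weight=0.8, second_weight=0.2,
                           threshold_range=(0.5, 1.0))
```

In this example three of the four most likely predictions match the labels and none of the least likely ones do, so the difference is 0.75 × 0.8 − 0 × 0.2 = 0.6, which falls inside the assumed range.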
In this embodiment, the real label refers to the correct answer or target value corresponding to the test data, and is also referred to as the true value. In machine learning, real labels are used to evaluate the performance and accuracy of a model, and the model is optimized and improved according to the evaluation result.
In one embodiment of the present invention, the stability evaluation of the established predictive model of the doctor-patient communication using the test dataset may be performed in the following manner:
The prediction results are compared with the real labels, and the mean absolute error over the samples is calculated. The mean absolute error reflects how much the model's predictions fluctuate; a smaller mean absolute error indicates better prediction stability.
The prediction stability of the model can also be determined by plotting ROC curves and comparing the performance of the model under different thresholds.
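Both stability measures can be computed with scikit-learn, as in the following sketch. It assumes a binary task in which the error is taken between the real label (0 or 1) and the predicted probability of the positive class; the arrays are made-up values, not results from the patent.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, roc_auc_score, roc_curve

# Real labels and predicted positive-class probabilities (illustrative values).
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# Mean absolute error between labels and predicted probabilities.
mae = mean_absolute_error(y_true, y_score)

# ROC curve: false and true positive rates at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve as a single summary of performance across thresholds.
auc = roc_auc_score(y_true, y_score)
```

Plotting `fpr` against `tpr`, for instance with matplotlib, gives the ROC curve; a curve that changes little between data splits would suggest stable behavior across thresholds.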
S6, performing doctor-patient communication based on the evaluated prediction model
During doctor-patient communication, the information entered by the patient is used as the input of the evaluated prediction model, the output of the model is presented to the doctor, and the doctor replies to the patient according to that output.
For example, after the patient enters a name, age, and a preliminary description of cold symptoms, the evaluated prediction model outputs relevant information to the doctor, such as the patient's previous cold symptoms, treatment plans, and drug allergies. The doctor analyzes the presented information as a whole and communicates with the patient according to the analysis result, which saves the doctor's working time and improves the efficiency and accuracy of communication.
The invention also provides a doctor-patient communication system for implementing the doctor-patient communication method according to the embodiments of the invention, wherein the doctor-patient communication system comprises:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
In one embodiment of the present invention, the data preprocessing module is specifically configured to remove duplicate data, missing data, and outliers during a data cleansing stage, and specifically includes: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
in the data denoising stage, removing noise and interference in signals, specifically including: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
in the data normalization stage, the data ranges of different features are scaled to be within the same range, and specifically comprises the following steps: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
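The three preprocessing stages can be sketched as one function. Several choices here are assumptions rather than details from the text: the pandas calls used are `drop_duplicates` and `fillna`, which are the pandas functions I know for removing duplicates and filling missing values (they differ from the function names given above); outliers are clipped with a 1.5 × IQR rule; the low-pass coefficients come from a second-order Butterworth design; and each column is treated as an ordered sequence of readings. The scikit-learn class used is `MinMaxScaler`, which computes (x − min) / (max − min) per feature. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import signal
from sklearn.preprocessing import MinMaxScaler


def preprocess(df, kernel_size=3):
    """Clean, denoise, and normalize a numeric DataFrame."""
    # Cleaning: drop duplicate rows and fill missing values with column medians.
    df = df.drop_duplicates().copy()
    df = df.fillna(df.median(numeric_only=True))

    # Cleaning: clip outliers to the 1.5 * IQR fences (assumed rule).
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    df = df.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

    # Denoising: median filter followed by a low-pass IIR filter on each column.
    b, a = signal.butter(2, 0.5)  # assumed filter design
    for col in df.columns:
        smoothed = signal.medfilt(df[col].to_numpy(dtype=float), kernel_size=kernel_size)
        df[col] = signal.lfilter(b, a, smoothed)

    # Normalization: scale every feature to the range [0, 1].
    scaled = MinMaxScaler().fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)


# Illustrative readings with one duplicate row, two missing values, and one outlier.
raw = pd.DataFrame({
    "heart_rate": [72, 72, 75, np.nan, 300, 74, 73, 76],
    "temperature": [36.6, 36.6, 36.8, 37.0, 36.7, np.nan, 36.9, 36.5],
})
clean = preprocess(raw)
```

Note that `lfilter` is causal, so the first few filtered values carry a start-up transient; whether that matters depends on how the readings are used downstream.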
In one embodiment of the present invention, the feature extraction module is specifically configured to, after the data normalization process is completed, perform feature selection using a SelectKBest class in a scikit-learn library in Python, select the top k features, and calculate correlation coefficients or information gain indexes between them and the target variable to determine which features are the most important features, and take the most important features as specific features, where k is an integer greater than 0.
In one embodiment of the present invention, the model training module is specifically configured to divide the extracted specific features into a training set and a verification set, and determine the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the Bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\, P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given class $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that the sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
establishing a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the samples belong to each category, and finally, selecting the category with the maximum probability as a prediction result.
In one embodiment of the present invention, the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate the probability that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the lowest probability as a second prediction result of the sample;
comparing the first prediction result and the second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain corresponding first accuracy and second accuracy;
multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
In summary, the invention improves the accuracy and stability of the doctor-patient communication system while ensuring the efficiency of doctor-patient communication.
Claims (10)
1. A doctor-patient communication method, characterized in that the doctor-patient communication method comprises:
step 1, acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
step 2, preprocessing the acquired medical information, wherein the preprocessing comprises cleaning, denoising and normalizing;
step 3, extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
step 4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication;
step 5, evaluating the established prediction model of the doctor-patient communication by using a test data set to determine the prediction accuracy and stability of the model;
step 6, performing doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
2. The doctor-patient communication method according to claim 1, wherein step 2 specifically comprises:
step 201, in a data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
step 202, removing noise and interference in signals in a data denoising stage, which specifically includes: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
step 203, in the data normalization stage, scaling the data ranges of different features to the same range, specifically including: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
3. The doctor-patient communication method according to claim 2, wherein extracting specific features from the preprocessed medical information specifically comprises:
after the data normalization process is completed, the SelectKBest class in the scikit-learn library in Python is used for carrying out feature selection, the top k features are selected, the correlation coefficient or information gain index between the top k features and the target variable is calculated to determine which features are the most important, the most important features are taken as specific features, and k is an integer larger than 0.
4. The doctor-patient communication method according to claim 1, wherein training the extracted specific features using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\, P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given class $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that the sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
step 403, building a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function, and finally, selecting the category with the highest probability as a prediction result.
5. The method of doctor-patient communication according to claim 1, wherein evaluating the established predictive model of the doctor-patient communication using the test data set specifically comprises:
step 501, inputting each sample in the test data set into the prediction model, calculating the probability that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the lowest probability as a second prediction result of the sample;
step 502, comparing a first prediction result and a second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and a corresponding second accuracy;
step 503, multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
step 504, calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
6. A doctor-patient communication system for implementing the doctor-patient communication method according to any one of claims 1-5, wherein the doctor-patient communication system includes:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
7. The doctor-patient communication system according to claim 6, wherein the data preprocessing module is specifically configured to remove duplicate data, missing data, and outliers during the data cleansing phase, and specifically includes: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
in the data denoising stage, removing noise and interference in signals, specifically including: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
in the data normalization stage, the data ranges of different features are scaled to be within the same range, and specifically comprises the following steps: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
8. The doctor-patient communication system according to claim 7, wherein the feature extraction module is specifically configured to, after the data normalization process is completed, perform feature selection using a SelectKBest class in a scikit-learn library in Python, select the top k features, and calculate correlation coefficients or information gain indexes between the top k features and the target variable to determine which features are most important, and take the most important features as specific features, where k is an integer greater than 0.
9. The doctor-patient communication system of claim 6, wherein the model training module is specifically configured to divide the extracted specific features into a training set and a verification set, and determine the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the Bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\, P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given class $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that the sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
establishing a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the samples belong to each category, and finally, selecting the category with the maximum probability as a prediction result.
10. The doctor-patient communication system according to claim 6, wherein the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate the probability that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the lowest probability as a second prediction result of the sample;
comparing the first prediction result and the second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain corresponding first accuracy and second accuracy;
multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310949211.9A CN116665922A (en) | 2023-07-31 | 2023-07-31 | Doctor-patient communication method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116665922A (en) | 2023-08-29 |
Family
ID=87722822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310949211.9A Pending CN116665922A (en) | 2023-07-31 | 2023-07-31 | Doctor-patient communication method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116665922A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304887A (en) * | 2018-02-28 | 2018-07-20 | 云南大学 | Naive Bayesian data processing system and method based on the synthesis of minority class sample |
CN109036568A (en) * | 2018-09-03 | 2018-12-18 | 浪潮软件集团有限公司 | Method for establishing prediction model based on naive Bayes algorithm |
CN114093445A (en) * | 2021-11-18 | 2022-02-25 | 重庆邮电大学 | Patient screening and marking method based on multi-label learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230829 |