CN116665922A - Doctor-patient communication method and system - Google Patents

Info

Publication number
CN116665922A
Authority
CN
China
Prior art keywords
doctor
patient
data
category
accuracy
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310949211.9A
Other languages
Chinese (zh)
Inventor
叶桄希
陈巧林
杨林
吕佳忆
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Tianfu Zhilian Health Technology Co ltd
Original Assignee
Sichuan Tianfu Zhilian Health Technology Co ltd
Application filed by Sichuan Tianfu Zhilian Health Technology Co ltd
Priority to CN202310949211.9A
Publication of CN116665922A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention relates to the field of doctor-patient communication, in particular to a doctor-patient communication method and system, which greatly improve the accuracy and stability of a doctor-patient communication system. The doctor-patient communication method comprises the following steps: collecting medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient; preprocessing the collected medical information, wherein the preprocessing comprises cleaning, denoising and normalizing; extracting specific features from the preprocessed medical information, wherein the specific features comprise the age, sex and severity of illness of the patient; training on the extracted specific features using a naive Bayesian algorithm to establish a prediction model of doctor-patient communication; evaluating the established prediction model of doctor-patient communication using a test data set to determine the prediction accuracy and stability of the model; and carrying out doctor-patient communication based on the evaluated prediction model. The invention is suitable for communication between doctors and patients.

Description

Doctor-patient communication method and system
Technical Field
The invention relates to the field of doctor-patient communication, in particular to a doctor-patient communication method and system.
Background
Traditional doctor-patient communication modes include the following:
1. Face-to-face communication: the patient visits a hospital or clinic in person and communicates face to face with a doctor, who makes a diagnosis and gives advice based on the patient's condition and symptoms.
2. Telephone consultation: the patient describes his or her condition and symptoms to the doctor by telephone, and the doctor makes a preliminary diagnosis and gives advice based on the information provided during the call.
3. Short message or mail consultation: the patient describes his or her condition and symptoms to the doctor by short message or mail, and the doctor communicates and gives guidance by replying to the short message or mail.
4. Online inquiry: the patient consults through an Internet platform, and the doctor diagnoses and advises the patient by text, voice or video.
Face-to-face communication lets the doctor learn the patient's condition and symptoms more directly, but it requires the patient to travel to a hospital or clinic, takes a great deal of time, and is inefficient and inconvenient; telephone consultation and short message/mail consultation are convenient and quick, but the patient may describe the condition inaccurately and the doctor may be unable to understand it fully, so the accuracy of communication is low; online inquiry saves time and cost, but requires the patient to have certain network skills and equipment, and the doctor can only make a one-sided diagnosis and give advice based on the patient's description, so online inquiry has low efficiency and low accuracy.
In the prior art, CN105812376A discloses a doctor-patient multiparty instant messaging system built with strophe. After the client program starts, the underlying communication module begins listening. A protocol parsing module converts each data packet into the strophe protocol format, and the request is re-sent to the communication server for processing through the underlying communication protocol. The instant messaging platform provides various functional interfaces for the instant messaging system, so that clients can extend the corresponding functions according to their own needs.
The embodiment of that scheme provides an instant messaging technique based on strophe, and applies a doctor-patient multiparty instant messaging management mechanism built with strophe during communication, which can greatly improve the success rate of message updates. The efficiency of doctor-patient communication is improved, but its accuracy and stability remain poor.
Disclosure of Invention
The invention aims to provide a doctor-patient communication method, which greatly improves the accuracy and stability of a doctor-patient communication system and ensures the efficiency of doctor-patient communication.
The invention adopts the following technical scheme to achieve the aim, and the doctor-patient communication method comprises the following steps:
step 1, acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
step 2, preprocessing the acquired medical information, wherein the preprocessing comprises cleaning, denoising and normalizing;
step 3, extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
step 4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication;
step 5, evaluating the established prediction model of the doctor-patient communication by using a test data set to determine the prediction accuracy and stability of the model;
step 6, performing doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
Further, step 2 specifically includes:
step 201, in the data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing duplicate rows with the drop_duplicates function of the pandas library in Python, filling missing values with the fillna function, and detecting and handling abnormal values with an outlier-detection function;
step 202, in the data denoising stage, removing noise and interference from signals, specifically including: using the signal module of the scipy library in Python to perform a first round of data denoising; performing median filtering with the medfilt function and low-pass filtering with the lfilter function, the median filtering and low-pass filtering providing a second round of denoising and interference removal;
step 203, in the data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization with the MinMaxScaler class of the sklearn library in Python, which for each feature subtracts the minimum value from the data and then divides by the difference between the maximum and minimum values, scaling the data range to between 0 and 1.
The repeated data, the missing data and the abnormal values are removed in the data cleaning stage, the noise and the interference in the signals are removed in the data denoising stage, and the data ranges of different characteristics are scaled to be within the same range in the data normalization stage, so that the data quality is greatly improved, the data format is unified, the subsequent analysis difficulty is reduced, and the subsequent modeling efficiency is improved.
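The cleaning stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the column names (`age`, `severity`), the median fill strategy and the 1.5 x IQR outlier rule are all hypothetical choices, since the text does not name a specific outlier method. The pandas calls used are `DataFrame.drop_duplicates` and `Series.fillna`.

```python
import pandas as pd


def clean_records(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Remove duplicate rows, fill missing values, and drop IQR outliers."""
    out = df.drop_duplicates().copy()
    for col in numeric_cols:
        # Fill missing values with the column median (assumed strategy).
        out[col] = out[col].fillna(out[col].median())
    for col in numeric_cols:
        # Keep rows within 1.5 * IQR of the quartiles (assumed outlier rule).
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out = out[out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return out.reset_index(drop=True)


# Toy data: one duplicate row, two missing values, one implausible age.
raw = pd.DataFrame({
    "age": [34, 34, 51, None, 47, 29, 300],
    "severity": [2, 2, 3, 1, None, 1, 2],
})
cleaned = clean_records(raw, ["age", "severity"])
```

On this toy table the duplicate row and the row with age 300 are removed and the two gaps are filled, leaving five complete rows.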
Further, extracting specific features from the preprocessed medical information specifically includes:
after the data normalization process is completed, the SelectKBest class of the scikit-learn library in Python is used for feature selection: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, and the top k features are selected and taken as the specific features, where k is an integer greater than 0. By this scheme, the accuracy of feature extraction is improved.
Further, training the extracted specific features by using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability of feature $x_i$ occurring given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability of sample feature $x_i$ occurring in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
step 403, building a classifier for prediction according to the calculated posterior probability distribution, specifically comprising: taking the posterior probability of each category as the weight of that category, computing a weighted sum over the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
Through the training process, for some data sets with a plurality of characteristics, the number of the characteristics can be effectively reduced, the efficiency and the accuracy of the model can be improved, and for the case of a few samples in certain categories, the data sets can be balanced by a weighting method and the like.
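As a worked illustration of the Bayes' theorem computation in step 402, the sketch below estimates the prior, likelihood and evidence from counts and combines them into a posterior. The toy samples, the feature values (`high`, `low`) and the category names (`urgent`, `routine`) are invented for illustration and do not come from the source.

```python
from collections import Counter

# (feature value, category) pairs; purely illustrative toy data.
samples = [
    ("high", "urgent"), ("high", "urgent"), ("low", "urgent"),
    ("high", "routine"), ("low", "routine"), ("low", "routine"),
    ("low", "routine"), ("low", "routine"),
]


def posterior(feature: str, data: list[tuple[str, str]]) -> dict[str, float]:
    """Return P(y | x) for every category y, estimated from counts."""
    n = len(data)
    class_counts = Counter(label for _, label in data)
    evidence = sum(1 for x, _ in data if x == feature) / n  # P(x)
    if evidence == 0:
        raise ValueError("feature value never observed in the data")
    result = {}
    for label, count in class_counts.items():
        prior = count / n  # P(y)
        # P(x | y): share of this category's samples that show the feature.
        likelihood = sum(1 for x, y in data if x == feature and y == label) / count
        result[label] = likelihood * prior / evidence
    return result


post = posterior("high", samples)
```

Here P(urgent | high) = (2/3 x 3/8) / (3/8) = 2/3 and P(routine | high) = (1/5 x 5/8) / (3/8) = 1/3, so the posteriors sum to 1.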
Further, the evaluating the established predictive model of the doctor-patient communication using the test data set specifically includes:
step 501, inputting each sample in the test data set into the prediction model, calculating with the prediction model the probability that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the lowest probability as a second prediction result of the sample;
step 502, comparing a first prediction result and a second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and a corresponding second accuracy;
step 503, multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
step 504, computing the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, comparing the difference with a set threshold range, and if the difference is within the set threshold range, judging that the accuracy of the prediction model is within a reasonable range.
By the above evaluation process, the accuracy of the evaluation process can be improved. By using the test dataset, the generalization ability of the model on unknown data can be verified, i.e., whether the model can correctly predict new data.
A doctor-patient communication system, the doctor-patient communication system comprising:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
Further, the data preprocessing module is specifically configured to remove duplicate data, missing data and abnormal values in the data cleaning stage, specifically including: removing duplicate rows with the drop_duplicates function of the pandas library in Python, filling missing values with the fillna function, and detecting and handling abnormal values with an outlier-detection function;
in the data denoising stage, removing noise and interference from signals, specifically including: using the signal module of the scipy library in Python to perform a first round of data denoising; performing median filtering with the medfilt function and low-pass filtering with the lfilter function, the median filtering and low-pass filtering providing a second round of denoising and interference removal;
in the data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization with the MinMaxScaler class of the sklearn library in Python, which for each feature subtracts the minimum value from the data and then divides by the difference between the maximum and minimum values, scaling the data range to between 0 and 1.
Further, the feature extraction module is specifically configured to, after the data normalization process is completed, perform feature selection with the SelectKBest class of the scikit-learn library in Python: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, and the top k features are selected and taken as the specific features, where k is an integer greater than 0.
The model training module is specifically used for dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the Bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability of feature $x_i$ occurring given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability of sample feature $x_i$ occurring in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
building a classifier for prediction according to the calculated posterior probability distribution, specifically comprising: taking the posterior probability of each category as the weight of that category, computing a weighted sum over the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
Further, the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate with the prediction model the probability that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the lowest probability as a second prediction result of the sample;
comparing the first prediction result and the second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain corresponding first accuracy and second accuracy;
multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
and computing the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, comparing the difference with a set threshold range, and if the difference is within the set threshold range, judging that the accuracy of the prediction model is within a reasonable range.
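One way to picture how the preprocessing, feature extraction and model training modules chain together is a scikit-learn `Pipeline`. The sketch below is an assumption about composition, not the system's actual architecture: the step names, the `f_classif` scoring function, `k=2` and the synthetic age-like, sex-like and severity-like columns are all hypothetical, and the data acquisition and communication modules are left out.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Normalization -> feature selection -> naive Bayes classifier.
pipeline = Pipeline([
    ("normalize", MinMaxScaler()),
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("classify", GaussianNB()),
])

# Synthetic data: columns 0 and 2 depend on the label, column 1 does not.
rng = np.random.default_rng(1)
n = 120
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    40 + 20 * y + 5 * rng.standard_normal(n),  # age-like feature
    rng.integers(0, 2, size=n),                # sex-like feature
    2 + 3 * y + rng.standard_normal(n),        # severity-like feature
])

pipeline.fit(X[:90], y[:90])
test_accuracy = pipeline.score(X[90:], y[90:])
selected = pipeline.named_steps["select"].get_support(indices=True)
```

Wrapping the stages in one object means the scaling ranges and the selected feature set learned on the training rows are reused unchanged on the held-out rows.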
The beneficial effects of the invention are as follows:
the naive Bayes-based doctor-patient communication can realize rapid automatic classification and prediction, and save time and energy of doctors.
The naive Bayesian algorithm provided by the invention assumes that sample data obeys Gaussian distribution, can effectively process unbalanced data sets, and has higher accuracy and stability.
The naive Bayes-based doctor-patient communication can conveniently perform model training and application expansion, and support large-scale data processing and analysis.
According to the naive Bayesian-based doctor-patient communication method, automatic classification and prediction can be realized, the requirement for manual intervention is reduced, and the doctor-patient communication efficiency and accuracy are improved.
Drawings
Fig. 1 is a flowchart of a method for communicating between a doctor and a patient according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a doctor-patient communication method, as shown in fig. 1, comprising the following steps:
S1, collecting medical information of a patient
Wherein the medical information includes basic information of the patient, medical history record, diagnosis result and treatment scheme.
In one embodiment of the invention, the manner in which medical information of a patient is acquired includes:
determining the type of the acquired information: depending on the purpose and scope of the physician-patient communication, it is determined which types of medical information, such as basic information, medical history, diagnostic results, treatment regimen, etc., need to be collected.
Collecting patient basic information: including name, gender, age, contact, etc. Such information may be obtained by way of patient filled forms, physician inquiries, and the like.
Collecting a patient's medical history: the past medical history, family medical history, allergic history, etc. of the patient are collected by querying the patient or looking up a record of medical records, etc.
Confirming the diagnosis result of the patient: if the patient has received an examination and diagnosis from a hospital or clinic, the doctor or medical institution may be queried for the patient's diagnosis.
Collecting the patient's treatment regimen: if the patient has begun to receive treatment, the patient's treatment regimen, medication, etc. may be collected from the doctor or medical facility.
Recording patient symptoms and feedback: during doctor-patient communication, the doctor may ask the patient about symptoms and feedback; this information helps the doctor better understand the patient's condition and the effect of treatment.
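The categories of information collected above could be held in a single record type. The following is a hypothetical sketch: the class name `PatientRecord`, its field names and the `missing_fields` helper are illustrative assumptions, not structures defined in the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientRecord:
    # Basic information.
    name: str
    sex: str
    age: int
    contact: str
    # Medical history.
    past_history: list[str] = field(default_factory=list)
    family_history: list[str] = field(default_factory=list)
    allergies: list[str] = field(default_factory=list)
    # Diagnosis and treatment, filled in once available.
    diagnosis: Optional[str] = None
    treatment_plan: Optional[str] = None
    # Symptoms and feedback gathered during communication.
    symptoms: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of the optional fields that have not been collected yet."""
        pending = {"diagnosis": self.diagnosis, "treatment_plan": self.treatment_plan}
        return [key for key, value in pending.items() if value is None]


record = PatientRecord(name="Patient A", sex="F", age=52, contact="n/a",
                       allergies=["penicillin"])
todo = record.missing_fields()
```

A helper such as `missing_fields` makes it easy to see which items still need to be requested from the doctor or medical institution.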
S2, preprocessing the acquired medical information
The preprocessing operation comprises cleaning, denoising and normalization.
In one embodiment of the present invention, the specific method for cleaning, denoising and normalizing the acquired medical information includes:
step 201, in the data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing duplicate rows with the drop_duplicates function of the pandas library in Python, filling missing values with the fillna function, and detecting and handling abnormal values with an outlier-detection function;
step 202, in the data denoising stage, removing noise and interference from signals, specifically including: using the signal module of the scipy library in Python to perform a first round of data denoising; performing median filtering with the medfilt function and low-pass filtering with the lfilter function, the median filtering and low-pass filtering providing a second round of denoising and interference removal;
step 203, in the data normalization stage, scaling the data ranges of different features to the same range, specifically including: performing data normalization with the MinMaxScaler class of the sklearn library in Python, which for each feature subtracts the minimum value from the data and then divides by the difference between the maximum and minimum values, scaling the data range to between 0 and 1.
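The denoising and normalization stages can be sketched as below. The text does not say how the low-pass filter coefficients are obtained, so designing them with `scipy.signal.butter` is an assumption, as are the kernel size, the filter order, the cutoff and the synthetic test signal. Min-max scaling uses `sklearn.preprocessing.MinMaxScaler`, which computes (x - min) / (max - min) per feature.

```python
import numpy as np
from scipy import signal
from sklearn.preprocessing import MinMaxScaler


def denoise(x: np.ndarray, kernel_size: int = 5, cutoff: float = 0.2) -> np.ndarray:
    """Median filter to remove spikes, then a causal low-pass filter."""
    despiked = signal.medfilt(x, kernel_size=kernel_size)
    # Second-order Butterworth low-pass; cutoff is a fraction of Nyquist
    # (assumed design, the source only names lfilter).
    b, a = signal.butter(2, cutoff)
    return signal.lfilter(b, a, despiked)


# Synthetic 1-D signal: slow sine wave, small noise, one isolated spike.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
noisy = np.sin(2 * np.pi * 2 * t) + 0.1 * rng.standard_normal(t.size)
noisy[50] += 5.0
smoothed = denoise(noisy)

# Min-max normalization: each column is rescaled to the range [0, 1].
features = np.array([[25.0, 1.0], [40.0, 3.0], [70.0, 2.0]])
scaled = MinMaxScaler().fit_transform(features)
```

Because `lfilter` is causal, the smoothed signal lags the input slightly; a zero-phase alternative such as `scipy.signal.filtfilt` would avoid that at the cost of not being usable on streaming data.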
S3, extracting specific characteristics from the preprocessed medical information
Specific characteristics may include, among others, the age, sex, severity of the condition, etc. of the patient.
In one embodiment of the present invention, the method for extracting specific features from the preprocessed medical information specifically includes:
after the data normalization process is completed, the SelectKBest class of the scikit-learn library in Python is used for feature selection: a correlation coefficient or information gain index between each feature and the target variable is calculated to determine which features are the most important, and the top k features are selected and taken as the specific features, where k is an integer greater than 0.
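A minimal sketch of this selection step is given below, using mutual information as a stand-in for the "information gain index"; that choice, the value `k=1`, the fixed `random_state` and the synthetic data are all assumptions made for illustration.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: only the middle column carries information about y.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.standard_normal(n),            # uninformative
    y + 0.3 * rng.standard_normal(n),  # informative
    rng.standard_normal(n),            # uninformative
])

# Score every feature against the target, then keep the k best.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=1)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)
```

`selector.scores_` holds the per-feature scores after fitting, which can be inspected to justify the chosen value of k.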
S4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication
In one embodiment of the present invention, a method for training an extracted specific feature by using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using the bayesian theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the features of the sample, $P(x_i \mid y_i)$ represents the probability of feature $x_i$ occurring given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability of sample feature $x_i$ occurring in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
step 403, building a classifier for prediction according to the calculated posterior probability distribution, specifically comprising: taking the posterior probability of each category as the weight of that category, computing a weighted sum over the features of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the sample belongs to each category, and finally selecting the category with the highest probability as the prediction result.
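Steps 401 to 403 can be approximated with scikit-learn's `GaussianNB`, as sketched below. Note the substitution: `GaussianNB` computes normalized class posteriors directly under a Gaussian likelihood, rather than the weighted-sum-plus-softmax formulation described in step 403, so this illustrates the train, predict-probabilities and pick-the-maximum flow, not the exact claimed computation. The synthetic features and the 75/25 split are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic features in [0, 1]: normalized age, sex, severity.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    np.clip(0.4 + 0.2 * y + 0.1 * rng.standard_normal(n), 0, 1),
    rng.integers(0, 2, size=n).astype(float),
    np.clip(0.3 + 0.4 * y + 0.1 * rng.standard_normal(n), 0, 1),
])

# Step 401: split into a training set and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 402: fit class priors and per-class Gaussian likelihoods.
model = GaussianNB().fit(X_train, y_train)

# Step 403: per-class probabilities, then the most probable category.
proba = model.predict_proba(X_val)
pred = model.classes_[np.argmax(proba, axis=1)]
val_accuracy = float(np.mean(pred == y_val))
```

After fitting, `model.class_prior_` exposes the estimated priors P(y), which correspond to the prior term in the formula above.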
S5, evaluating the established prediction model of the doctor-patient communication by using the test data set
The purpose of the evaluation is to determine the accuracy and stability of the predictive model.
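In one embodiment of the present invention, the accuracy of the established prediction model of doctor-patient communication may be evaluated using the test data set as follows:
step 501, inputting each sample in the test data set into the prediction model, calculating with the prediction model the probability that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the lowest probability as a second prediction result of the sample;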
step 502, comparing a first prediction result and a second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and a corresponding second accuracy;
step 503, multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
step 504, calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
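The accuracy evaluation steps can be sketched as follows. This is only an illustration under stated assumptions: the embodiment does not give numeric values for the two weights or for the threshold range, so `first_weight`, `second_weight` and `threshold_range` are hypothetical parameters, as are the function names and the example data.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the real labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def evaluate_accuracy(probabilities, labels, first_weight, second_weight, threshold_range):
    """probabilities: one dict per test sample, mapping category -> predicted probability."""
    # First prediction: most probable category; second prediction: least probable category.
    first_preds = [max(p, key=p.get) for p in probabilities]
    second_preds = [min(p, key=p.get) for p in probabilities]

    # Accuracy of each prediction against the real labels.
    first_acc = accuracy(first_preds, labels)
    second_acc = accuracy(second_preds, labels)

    # Weighted comparison reference values (the weights are assumed inputs).
    first_ref = first_acc * first_weight
    second_ref = second_acc * second_weight

    # Difference compared with the set threshold range.
    difference = first_ref - second_ref
    low, high = threshold_range
    return {
        "first_accuracy": first_acc,
        "second_accuracy": second_acc,
        "difference": difference,
        "reasonable": low <= difference <= high,
    }


# Hypothetical test set with three categories.
probs = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.6, "c": 0.3},
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.2, "b": 0.3, "c": 0.5},
]
real_labels = ["a", "b", "b", "a"]
report = evaluate_accuracy(probs, real_labels, first_weight=0.8,
                           second_weight=0.2, threshold_range=(0.2, 1.0))
```

A well-behaved model should score far higher on its most probable category than on its least probable one, so a large positive difference is the expected outcome in this sketch.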
In this embodiment, the real label refers to the correct answer or target value corresponding to the test data, and is also referred to as the true value. In machine learning, real labels are used to evaluate the performance and accuracy of a model, and the model is optimized and improved according to the evaluation result.
In one embodiment of the present invention, the stability evaluation of the established predictive model of the doctor-patient communication using the test dataset may be performed in the following manner:
The prediction result is compared with the real label, and the mean absolute error over the samples is calculated. The mean absolute error reflects how much the model fluctuates during prediction; a smaller mean absolute error indicates better prediction stability of the model.
The prediction stability of the model can also be determined by plotting ROC curves and comparing the performance of the model under different thresholds.
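The mean absolute error used for the stability evaluation can be computed as in the sketch below. It assumes that predictions and real labels are numeric, for example a predicted probability compared with a 0/1 label; that encoding, the function name and the example values are assumptions made for illustration.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over all samples."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical predicted probabilities and 0/1 real labels.
predicted = [0.9, 0.2, 0.6, 0.4]
real = [1, 0, 1, 0]
mae = mean_absolute_error(predicted, real)
```

Comparing this value across several test subsets gives a rough view of how much the model's error fluctuates.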
S6, performing doctor-patient communication based on the evaluated prediction model
In the communication process of the doctor and the patient, the input information of the patient is used as the input of the evaluated prediction model, the output information of the evaluated prediction model is presented to the doctor, and the doctor replies to the patient according to the output information.
For example, after the patient inputs a name, an age and a preliminary description of cold symptoms, the evaluated prediction model outputs to the doctor relevant information such as the patient's previous cold symptoms, treatment schemes and drug allergies. The doctor analyzes the presented information and communicates with the patient according to the analysis result, which saves the doctor's working time and improves communication efficiency and accuracy.
The invention also provides a doctor-patient communication system for implementing the doctor-patient communication method according to the embodiment of the invention, wherein the doctor-patient communication system comprises:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of the doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
In one embodiment of the present invention, the data preprocessing module is specifically configured to remove duplicate data, missing data, and outliers during a data cleansing stage, and specifically includes: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
in the data denoising stage, removing noise and interference in signals, specifically including: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
in the data normalization stage, the data ranges of different features are scaled to be within the same range, and specifically comprises the following steps: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
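The three preprocessing stages can be sketched without third-party dependencies as follows. The helper names are this sketch's own and are not pandas, scipy or scikit-learn API. Several choices are assumptions: missing values are filled with the mean of the known values, the median filter replicates the edge values as padding, and the normalization divides by the feature's range (maximum minus minimum), which is the usual min-max formula for mapping values onto 0 to 1. Outlier handling and the low-pass filter are omitted.

```python
from statistics import mean, median


def drop_duplicates(rows):
    """Remove repeated rows while keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(values):
    """Replace None entries with the mean of the known values (assumed fill strategy)."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot fill missing values when every value is missing")
    fill = mean(known)
    return [fill if v is None else v for v in values]


def median_filter(values, kernel_size=3):
    """Sliding-window median; the edges are padded by repeating the end values."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    if not values:
        return []
    half = kernel_size // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(values))]


def min_max_scale(values):
    """Scale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical usage on small inputs.
rows = drop_duplicates([(1, 2), (1, 2), (3, 4)])
filled = fill_missing([1.0, None, 3.0])
smoothed = median_filter([1, 9, 2, 3, 2])
scaled = min_max_scale([10, 20, 30])
```

The median filter suppresses the isolated spike (9) in the example signal while leaving the slowly varying values largely unchanged, which is why it is a common first pass before a low-pass filter.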
In one embodiment of the present invention, the feature extraction module is specifically configured to, after the data normalization process is completed, perform feature selection using a SelectKBest class in a scikit-learn library in Python, select the top k features, and calculate correlation coefficients or information gain indexes between them and the target variable to determine which features are the most important features, and take the most important features as specific features, where k is an integer greater than 0.
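Selecting the top k features by their correlation with the target variable can be sketched as below. This is a dependency-free illustration that ranks features by the absolute Pearson correlation coefficient; it is not the scikit-learn implementation, and the function names, feature names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_k_best(features, target, k):
    """Return the names of the k features most correlated with the target, plus all scores.

    features: dict mapping feature name -> list of normalized values.
    """
    if k <= 0:
        raise ValueError("k must be an integer greater than 0")
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical normalized features and a binary target variable.
features = {
    "age": [0.1, 0.4, 0.6, 0.9],
    "sex": [0, 1, 0, 1],
    "severity": [0.0, 0.2, 0.8, 1.0],
}
target = [0, 0, 1, 1]
selected, scores = select_k_best(features, target, k=2)
```

In this toy data the "sex" column is uncorrelated with the target and is dropped, while "severity" and "age" are retained in that order.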
In one embodiment of the present invention, the model training module is specifically configured to divide the extracted specific features into a training set and a verification set, and determine the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using Bayes' theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the characteristics of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
establishing a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the samples belong to each category, and finally, selecting the category with the maximum probability as a prediction result.
In one embodiment of the present invention, the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate the probability that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the smallest probability as a second prediction result of the sample;
compare the first prediction result and the second prediction result of the prediction model on the test data set with the real labels respectively, and calculate the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and second accuracy;
multiply the first accuracy by a first weight to obtain a first accuracy comparison reference value; multiply the second accuracy by a second weight to obtain a second accuracy comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
and calculate the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, compare the difference with a set threshold range, and, if the difference is within the set threshold range, judge that the accuracy of the prediction model is within a reasonable range.
In conclusion, the accuracy and the stability of the doctor-patient communication system are greatly improved, and meanwhile, the efficiency of doctor-patient communication is guaranteed.

Claims (10)

1. A doctor-patient communication method, characterized in that the doctor-patient communication method comprises:
step 1, acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
step 2, preprocessing the acquired medical information, wherein the preprocessing comprises cleaning, denoising and normalizing;
step 3, extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
step 4, training the extracted specific features by using a naive Bayesian algorithm, and establishing a prediction model of doctor-patient communication;
step 5, evaluating the established prediction model of the doctor-patient communication by using a test data set to determine the prediction accuracy and stability of the model;
step 6, performing doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
2. The doctor-patient communication method according to claim 1, wherein step 2 specifically comprises:
step 201, in a data cleaning stage, removing duplicate data, missing data and abnormal values, specifically including: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
step 202, removing noise and interference in signals in a data denoising stage, which specifically includes: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
step 203, in the data normalization stage, scaling the data ranges of different features to the same range, specifically including: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
3. The doctor-patient communication method according to claim 2, wherein extracting specific features from the preprocessed medical information specifically comprises:
after the data normalization process is completed, the SelectKBest class in the scikit-learn library in Python is used for carrying out feature selection, the top k features are selected, the correlation coefficient or information gain index between the top k features and the target variable is calculated to determine which features are the most important, the most important features are taken as specific features, and k is an integer larger than 0.
4. The doctor-patient communication method according to claim 1, wherein training the extracted specific features using a naive bayes algorithm specifically includes:
step 401, dividing the extracted specific features into a training set and a verification set, and determining the number of samples and the number of categories in the training set;
step 402, for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using Bayes' theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the characteristics of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
step 403, building a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function, and finally, selecting the category with the highest probability as a prediction result.
5. The method of doctor-patient communication according to claim 1, wherein evaluating the established predictive model of the doctor-patient communication using the test data set specifically comprises:
step 501, inputting each sample in the test data set into the prediction model, calculating the probability that the sample belongs to each category, selecting the category with the highest probability as a first prediction result of the sample, and selecting the category with the smallest probability as a second prediction result of the sample;
step 502, comparing a first prediction result and a second prediction result of the prediction model on the test data set with the real labels respectively, and calculating the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and a corresponding second accuracy;
step 503, multiplying the first accuracy rate by a first weight to obtain a first accuracy rate comparison reference value; multiplying the second accuracy rate by a second weight to obtain a second accuracy rate comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
step 504, calculating the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, and comparing the difference with a set threshold range; if the difference is within the set threshold range, the accuracy of the prediction model is judged to be within a reasonable range.
6. A doctor-patient communication system for implementing the doctor-patient communication method according to any one of claims 1-5, wherein the doctor-patient communication system includes:
the data acquisition module is used for acquiring medical information of a patient, wherein the medical information comprises basic information, medical history records, diagnosis results and treatment schemes of the patient;
the data preprocessing module is used for preprocessing acquired medical information, and the preprocessing operation comprises cleaning, denoising and normalization;
the characteristic extraction module is used for extracting specific characteristics from the preprocessed medical information, wherein the specific characteristics comprise the age, sex and severity of the illness state of the patient;
the model training module is used for training the extracted specific features by using a naive Bayesian algorithm and establishing a prediction model of doctor-patient communication;
the model evaluation module is used for evaluating the established prediction model of the doctor-patient communication by using the test data set so as to determine the prediction accuracy and stability of the model;
the communication module is used for carrying out doctor-patient communication based on the evaluated prediction model, which specifically comprises: taking the input information of the patient as the input of the evaluated prediction model, and presenting the output information of the evaluated prediction model to a doctor, wherein the doctor replies to the patient according to the output information.
7. The doctor-patient communication system according to claim 6, wherein the data preprocessing module is specifically configured to remove duplicate data, missing data, and outliers during the data cleansing phase, and specifically includes: removing repeated data by using a drop_redundant function in a pandas library in Python, filling missing values by using a filter function, detecting abnormal values by using an outlier function, and processing;
in the data denoising stage, removing noise and interference in signals, specifically including: using a signal module in a scipy library in Python to perform data denoising for the first time; performing median filtering by using a medfilt function, performing low-pass filtering by using an lfilter function, and performing secondary denoising and interference removal by the median filtering and the low-pass filtering;
in the data normalization stage, the data ranges of different features are scaled to be within the same range, and specifically comprises the following steps: data normalization processing is performed using the MinMaxScale class in the sklearn library in Python, and includes subtracting a minimum value from the data for each feature and then dividing by a maximum value to scale the data range to between 0 and 1.
8. The doctor-patient communication system according to claim 7, wherein the feature extraction module is specifically configured to, after the data normalization process is completed, perform feature selection using a SelectKBest class in a scikit-learn library in Python, select the top k features, and calculate correlation coefficients or information gain indexes between the top k features and the target variable to determine which features are most important, and take the most important features as specific features, where k is an integer greater than 0.
9. The doctor-patient communication system of claim 6, wherein the model training module is specifically configured to divide the extracted specific features into a training set and a verification set, and determine the number of samples and the number of categories in the training set;
for each category, calculating the prior probability distribution according to the historical data, and calculating the posterior probability distribution of each category according to the characteristics and the category information by using Bayes' theorem, wherein the calculation formula is as follows:
$$P(y_i \mid x_i) = \frac{P(x_i \mid y_i)\,P(y_i)}{P(x_i)}$$
wherein $y_i$ represents the class of the sample, $x_i$ represents the characteristics of the sample, $P(x_i \mid y_i)$ represents the probability that feature $x_i$ occurs given $y_i$, $P(y_i)$ represents the prior probability of class $y_i$, $P(x_i)$ represents the probability that sample feature $x_i$ occurs in the training set, and $P(y_i \mid x_i)$ represents the posterior probability distribution;
establishing a classifier according to the posterior probability distribution obtained by calculation to predict, specifically comprising taking the posterior probability of each category as the weight of the category, weighting and summing the characteristics of all samples, mapping the result to between 0 and 1 through a softmax function to obtain the probability that the samples belong to each category, and finally, selecting the category with the maximum probability as a prediction result.
10. The doctor-patient communication system according to claim 6, wherein the model evaluation module is specifically configured to input each sample in the test data set into the prediction model, calculate the probability that the sample belongs to each category, select the category with the highest probability as a first prediction result of the sample, and select the category with the smallest probability as a second prediction result of the sample;
compare the first prediction result and the second prediction result of the prediction model on the test data set with the real labels respectively, and calculate the accuracy of the prediction model on the test data set to obtain a corresponding first accuracy and second accuracy;
multiply the first accuracy by a first weight to obtain a first accuracy comparison reference value; multiply the second accuracy by a second weight to obtain a second accuracy comparison reference value; the first weight is the weight of the category with the largest probability, and the second weight is the weight of the category with the smallest probability;
and calculate the difference between the first accuracy comparison reference value and the second accuracy comparison reference value, compare the difference with a set threshold range, and, if the difference is within the set threshold range, judge that the accuracy of the prediction model is within a reasonable range.
CN202310949211.9A 2023-07-31 2023-07-31 Doctor-patient communication method and system Pending CN116665922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310949211.9A CN116665922A (en) 2023-07-31 2023-07-31 Doctor-patient communication method and system

Publications (1)

Publication Number Publication Date
CN116665922A true CN116665922A (en) 2023-08-29

Family

ID=87722822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310949211.9A Pending CN116665922A (en) 2023-07-31 2023-07-31 Doctor-patient communication method and system

Country Status (1)

Country Link
CN (1) CN116665922A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304887A (en) * 2018-02-28 2018-07-20 云南大学 Naive Bayesian data processing system and method based on the synthesis of minority class sample
CN109036568A (en) * 2018-09-03 2018-12-18 浪潮软件集团有限公司 Method for establishing prediction model based on naive Bayes algorithm
CN114093445A (en) * 2021-11-18 2022-02-25 重庆邮电大学 Patient screening and marking method based on multi-label learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230829