CN110970129B - Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics - Google Patents

Publication number
CN110970129B
Authority
CN
China
Prior art keywords
syndrome
symptom
symptoms
score
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911333121.7A
Other languages
Chinese (zh)
Other versions
CN110970129A (en)
Inventor
许玉龙
马锦地
李新安
柳忠勇
王忠义
吕雅丽
朱红磊
宋婷
刘方方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Traditional Chinese Medicine HUTCM
Original Assignee
Henan University of Traditional Chinese Medicine HUTCM
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Traditional Chinese Medicine HUTCM
Priority to CN201911333121.7A
Publication of CN110970129A
Application granted
Publication of CN110970129B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/48 Other medical applications
    • A61B5/4854 Diagnosis based on concepts of traditional oriental medicine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Alternative & Traditional Medicine (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Veterinary Medicine (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medicines Containing Plant Substances (AREA)

Abstract

The invention relates to a method for studying traditional Chinese medicine (TCM) syndromes: a method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, applicable to the analysis of all TCM syndrome diseases. Taking consumptive lung disease as an example, the syndrome thresholds and symptom scores are calculated with a Bayesian statistical algorithm from the established symptom data of consumptive lung disease patients and the expert group's syndrome classification data, and the classification rule of the consumptive lung disease syndromes is studied as follows: the prior probability of each syndrome and symptom is calculated from the symptom and syndrome classification data of all consumptive lung disease patients; the probabilities of occurrence of symptoms and syndromes are counted, and the contribution score of every symptom and the syndrome threshold of each syndrome are determined through a logarithmic ratio; and a Laplace smoothing improvement is applied when calculating the contribution scores and syndrome thresholds so as to improve the statistical results, with the ten highest-scoring symptoms taken as the syndrome typing rule. This improves the robustness of the syndrome differentiation rules and provides a reference for clinical syndrome diagnosis of consumptive lung disease.

Description

Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics
Technical Field
The invention relates to a method for studying traditional Chinese medicine (TCM) syndromes using Bayesian statistics, which can be applied to all TCM syndrome diseases. Building on earlier research, an established database of lung-system diseases treated by renowned veteran TCM practitioners is analyzed. Taking consumptive lung disease as an example, its symptoms and syndromes are summarized and studied with an improved Bayesian statistical method, and the threshold of each syndrome and the scores of its corresponding symptoms are derived and calculated from the logarithmic ratio. This establishes a syndrome typing diagnosis rule for consumptive lung disease and provides a reference for its clinical syndrome diagnosis.
Background
Diagnosing disease and differentiating syndromes is the essence of traditional Chinese medicine, but the differentiation process resides in the practitioner's mind and is difficult to articulate. Making this process explicit through data mining, and establishing widely recognized TCM syndrome differentiation rules, is therefore of considerable value. TCM literature and the experience of renowned veteran TCM practitioners embody this syndrome differentiation thinking; studying syndrome types and diagnostic standards against that background, combined with modern technology, can uncover TCM experience of high value.
Several methods exist for analyzing and mining TCM data, including factor analysis, principal component analysis, classification, and Bayesian statistics. Bayesian classification is the most common. Its main feature is that no search or optimization process is needed: only the frequency of each attribute value in the training set is counted, the prior probability of each attribute is estimated separately under the class-conditional independence assumption, and the posterior probability is then obtained from the prior probabilities. Because the whole procedure rests on solid mathematical reasoning, Bayesian classification is both efficient and interpretable. Bayesian algorithms are widely used in medical diagnosis. Lebodong et al. set out the advantages of the Bayesian approach for computer-aided disease diagnosis and epidemic assessment by comparing its principles and methods, current applications, strengths, and the various algorithms. Research by Sunyan men et al. shows that the Bayesian algorithm classifies well in a TCM clinical diagnosis model for coronary heart disease, helping to improve clinical syndrome differentiation and to find new syndrome differentiation elements. Based on local learning, a naive Bayes classification algorithm with instance weighting by cosine similarity has been proposed for disease prediction, improving the classification accuracy of the algorithm. Liujing Hua applied a Bayesian model to medical image classification with satisfactory results.
The above analysis and research results show that using Bayesian methods to study TCM syndromes is feasible. Generally, the syndrome typing diagnosis rule is described mathematically as follows: given that a group of symptoms X1, …, Xn occurs, does syndrome Z take the value s (the patient belongs to the syndrome) or ~s (the patient does not)? Here the symptoms X1, …, Xn all take the value 1, and the syndrome Z takes the value 1 (s) or 0 (~s), where s denotes presence of the syndrome and ~s its absence. The conventional method is to calculate the posterior probability distribution P(Z | X1, ..., Xn) of Z from the Bayesian formula and then check whether the probability of Z = s is at least that of Z = ~s, that is, whether formula (2) holds:
P(Z = s | X1, ..., Xn) ≥ P(Z = ~s | X1, ..., Xn)    (2)
If so, the patient is classified as Z = s; otherwise the patient is classified as Z = ~s.
However, existing methods report their results as probability values, which makes them inconvenient to popularize: they require the TCM physician to carry out probabilistic reasoning, which is impractical in clinical use.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, which can be applied to the analysis of all TCM syndrome diseases.
Based on the earlier stage research foundation (Li Jiansheng, Ma jin Di, Yuan Qing, etc., based on the modern famous and old traditional Chinese medicine experience, the medicine rule research [ J ]. the traditional Chinese medicine journal, 2016,57(18): 1598-.
The technical scheme adopted by the invention is as follows:
A method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, characterized in that, taking consumptive lung disease as an example, the symptom, syndrome type, and syndrome score data of patients are obtained from the established symptom data of consumptive lung disease patients and the expert group's syndrome classification data, the syndrome thresholds and symptom scores are calculated using Bayesian statistics, and the classification rule of the consumptive lung disease syndromes is studied. The method comprises the following steps:
step 1) calculating the prior probability of each syndrome and symptom from the symptom and syndrome classification data of all consumptive lung disease patients;
step 2) counting the probabilities of occurrence of symptoms and syndromes, and determining, through a logarithmic ratio, the contribution score of every symptom and the syndrome threshold for each syndrome;
step 3) calculating the contribution scores of all symptoms and the syndrome thresholds with a Laplace smoothing improvement so as to improve the statistical results, and taking the ten highest-scoring symptoms as the syndrome typing rule so as to improve the robustness of the syndrome differentiation rule.
In the method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, in step 2), the syndrome typing diagnosis rule is described mathematically: given that a group of symptoms X1, …, Xn occurs, determine whether syndrome Z takes the value s (the patient belongs to the syndrome) or ~s (the patient does not). The symptoms X1, …, Xn all take the value 1; the syndrome Z takes the value 1 (s) or 0 (~s), where Z = s denotes presence of the syndrome and Z = ~s its absence. The posterior probability distribution P(Z | X1, ..., Xn) of Z is calculated from the Bayesian formula, and it is checked whether the probability of Z = s is at least the probability of Z = ~s, that is, whether formula (2) holds:
P(Z = s | X1, ..., Xn) ≥ P(Z = ~s | X1, ..., Xn)    (2)
Following the established practice of western medicine disease diagnosis standards, the diagnosis rule is expressed in scoring form: the prior probability of each symptom is counted from the original data, a score is assigned to each symptom by logarithmic ratio conversion, and a syndrome threshold is determined. When the patient's total symptom score reaches or exceeds the threshold, the patient is classified as having syndrome Z (Z = s); otherwise the patient does not have syndrome Z (Z = ~s).
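The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `belongs_to_syndrome`, the symptom names, and every numeric value are hypothetical, and treating an unlisted symptom as contributing 0 is an assumption of this sketch.

```python
from typing import Dict, Iterable


def belongs_to_syndrome(present_symptoms: Iterable[str],
                        scores: Dict[str, float],
                        threshold: float) -> bool:
    """Return True when the summed scores of the present symptoms reach the threshold.

    Symptoms with no entry in `scores` contribute 0 (an assumption of this sketch).
    """
    total = sum(scores.get(name, 0.0) for name in present_symptoms)
    return total >= threshold


# Hypothetical rule for one syndrome: per-symptom scores and a threshold.
example_scores = {"shortness of breath": 2.5, "fatigue": 1.5, "spontaneous sweating": 1.0}
example_threshold = 3.5

# Total 4.0 reaches the threshold; total 2.5 does not.
print(belongs_to_syndrome(["shortness of breath", "fatigue"], example_scores, example_threshold))
print(belongs_to_syndrome(["fatigue", "spontaneous sweating"], example_scores, example_threshold))
```

The physician only adds up scores and compares the total with one number, which is the practical advantage the text claims over reporting a posterior probability.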
In the method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, the formulas for the symptom score and the syndrome threshold are derived as follows:
[Formula (3): equation image BDA0002330204600000031, not reproduced in this text]
[Formula (4): equation image BDA0002330204600000032, not reproduced in this text]
Substituting formulas (3) and (4) into formula (2) yields formula (5):
[Formula (5): equation image BDA0002330204600000033, not reproduced in this text]
Considering formula (5), since the symptom variables are mutually independent:
[Equation image BDA0002330204600000034, not reproduced in this text]
Since the symptoms X1, …, Xn in the above formula are all equal to 1, and in order to account for symptoms that do not appear, the terms for a symptom taking the value 0 are subtracted from both sides of the above formula, which converts the inequality into
[Formula (6): equation image BDA0002330204600000035, not reproduced in this text]
For formula (6) above, the left side of the inequality is
[Equation image BDA0002330204600000036, not reproduced in this text]
That is, the score of symptom Xi is Score(Xi): the ratio of symptom presence to symptom absence when the syndrome is 1, divided by the ratio of symptom presence to symptom absence when the syndrome is 0, called the ratio for short. The right side of the inequality is the syndrome threshold, denoted Threshold. From the relationship between the symptom scores and the threshold, the syndrome differentiation rule is given by formula (7):
Score(X1) + Score(X2) + … + Score(Xn) ≥ Threshold    (7)
If the sum of the patient's symptom scores is greater than or equal to the threshold, the patient is judged to have syndrome Z; otherwise the patient is judged not to have it.
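The equation images for formulas (3) to (6) are not available in this text. The block below is one reconstruction that is consistent with the surrounding description (Bayes' formula for both values of Z, independence of the symptom variables given the syndrome, and a logarithmic ratio). It is a sketch under those assumptions, and the exact form of the original images may differ. The symbol x_i in {0, 1}, the observed value of symptom X_i, is introduced only for this sketch, and the sums are assumed to run over all symptoms in the rule.

```latex
\begin{align}
P(Z=s \mid X_1,\dots,X_n) &= \frac{P(X_1,\dots,X_n \mid Z=s)\,P(Z=s)}{P(X_1,\dots,X_n)} \tag{3}\\
P(Z={\sim}s \mid X_1,\dots,X_n) &= \frac{P(X_1,\dots,X_n \mid Z={\sim}s)\,P(Z={\sim}s)}{P(X_1,\dots,X_n)} \tag{4}\\
P(X_1,\dots,X_n \mid Z=s)\,P(Z=s) &\ge P(X_1,\dots,X_n \mid Z={\sim}s)\,P(Z={\sim}s) \tag{5}
\end{align}
% Independence of the symptoms given the syndrome:
\[
P(Z=s)\prod_{i=1}^{n} P(X_i=x_i \mid Z=s) \;\ge\; P(Z={\sim}s)\prod_{i=1}^{n} P(X_i=x_i \mid Z={\sim}s)
\]
% Take logarithms and use
% \log P(X_i=x_i \mid Z=z) = x_i \log\frac{P(X_i=1 \mid Z=z)}{P(X_i=0 \mid Z=z)} + \log P(X_i=0 \mid Z=z):
\begin{equation}
\sum_{i=1}^{n} x_i \log
\frac{P(X_i=1 \mid Z=s)\,/\,P(X_i=0 \mid Z=s)}{P(X_i=1 \mid Z={\sim}s)\,/\,P(X_i=0 \mid Z={\sim}s)}
\;\ge\;
\log\frac{P(Z={\sim}s)}{P(Z=s)} + \sum_{i=1}^{n} \log\frac{P(X_i=0 \mid Z={\sim}s)}{P(X_i=0 \mid Z=s)} \tag{6}
\end{equation}
% Each summand on the left with x_i = 1 is Score(X_i); the right-hand side is Threshold.
```

Under this reading, only the symptoms that are present (x_i = 1) contribute a score, while the absence terms are collected once into the threshold, which matches the form of formula (7).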
In the method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, in step 1), the acquired symptom, syndrome type, and expert group syndrome classification data of all patients are traversed in the following order: each syndrome is traversed; for each syndrome, all of its corresponding symptoms are traversed; and for each symptom, every row of patient data is traversed. The following syndrome typing algorithm framework is established. In the innermost loop, for each patient whose value for the syndrome is non-empty: if the syndrome is 1, s1 is incremented by 1 and, provided the corresponding symptom is non-empty, s1x0 is incremented by 1 when the symptom is 0 and s1x1 is incremented by 1 when the symptom is 1; if the syndrome is 0, s0 is incremented by 1 and, provided the corresponding symptom is non-empty, s0x0 is incremented by 1 when the symptom is 0 and s0x1 is incremented by 1 when the symptom is 1;
after the inner loop finishes, the counts s1x0, s1x1, s0x0, and s0x1 are available; they are used to calculate the symptom score and the symptom's part of the syndrome threshold, and the calculation results are stored in a dynamic array;
after every symptom has been traversed, each value is taken from the dynamic array and the result is calculated with the score and threshold formulas. The output is a rule file for each syndrome, containing the symptom names, the syndrome threshold, and the score of each symptom.
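The counting framework above can be sketched as follows. This is a minimal illustration under stated assumptions: each patient row is represented as a dictionary mapping a column name to 0, 1, or `None` for an empty cell, and the names `count_pairs`, `count_all`, and the column names are hypothetical. Only the counter names s1, s0, s1x1, s1x0, s0x1, and s0x0 come from the text.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]  # column name -> 0, 1, or None (empty cell)


def count_pairs(rows: List[Row], syndrome: str, symptom: str) -> Dict[str, int]:
    """Innermost loop: tally one (syndrome, symptom) pair over all patient rows."""
    counts = {"s1": 0, "s0": 0, "s1x1": 0, "s1x0": 0, "s0x1": 0, "s0x0": 0}
    for row in rows:
        z = row.get(syndrome)
        if z is None:          # syndrome label is empty: skip this patient
            continue
        counts[f"s{z}"] += 1   # s1 or s0
        x = row.get(symptom)
        if x is not None:      # symptom recorded: s1x1, s1x0, s0x1, or s0x0
            counts[f"s{z}x{x}"] += 1
    return counts


def count_all(rows: List[Row], syndromes: List[str],
              symptoms: List[str]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Outer loops: every syndrome, then every symptom, then every row."""
    return {z: {x: count_pairs(rows, z, x) for x in symptoms} for z in syndromes}


# Hypothetical data: one syndrome column and two symptom columns.
rows = [
    {"qi_deficiency": 1, "fatigue": 1, "cough": 0},
    {"qi_deficiency": 1, "fatigue": 1, "cough": None},
    {"qi_deficiency": 0, "fatigue": 0, "cough": 1},
    {"qi_deficiency": None, "fatigue": 1, "cough": 1},
]
table = count_all(rows, ["qi_deficiency"], ["fatigue", "cough"])
print(table["qi_deficiency"]["fatigue"])
```

Keeping the six counters per pair, rather than probabilities, leaves the choice of smoothing to a later step.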
In the method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, the syndrome typing algorithm is improved with Laplace smoothing:
when a probability is calculated, a small number is added to the occurrence count of each symptom. This has little effect on the result, and when the data set is large enough the effect on the probability is negligible, while the zero-probability problem is solved;
under Laplace smoothing, the prior probabilities P(c) and P(x_i | c) are calculated as:
P(c) = (|D_c| + 1) / (|D| + N)
P(x_i | c) = (|D_{c,x_i}| + 1) / (|D_c| + N_i)
where D denotes the training set; D_c denotes the set of samples of class c in the training set; D_{c,x_i} denotes the set of samples in D_c whose i-th attribute takes the value x_i; N denotes the number of possible classes in D; N_i denotes the number of possible values of the i-th attribute; and the small constant added to solve the zero-probability problem is 1. During smoothing, when the initial score of a symptom is negative infinity and its value becomes positive after smoothing, the smoothed score is multiplied by -1;
processing the data with this algorithm yields the threshold of each syndrome and its ten highest-scoring symptoms, which constitute the classification rule of the consumptive lung disease syndromes.
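A minimal Python sketch of Laplace-smoothed scoring is given below. It makes several assumptions that the text does not confirm: the score is taken as the logarithm of the smoothed odds ratio, the threshold is the log prior odds against the syndrome plus one absence term per symptom, the conditional probabilities are estimated over rows where the symptom is recorded, and an unsmoothed score is treated as negative infinity when the symptom never occurs with the syndrome but does occur without it. All function names are hypothetical; the counter names follow the text.

```python
import math
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # keys: s1, s0, s1x1, s1x0, s0x1, s0x0


def smoothed(count: int, total: int, n_values: int = 2) -> float:
    """Laplace estimate: (count + 1) / (total + n_values)."""
    return (count + 1) / (total + n_values)


def symptom_score(c: Counts) -> float:
    """Smoothed log odds ratio of one symptom for one syndrome (assumed form)."""
    n1 = c["s1x1"] + c["s1x0"]  # syndrome present, symptom recorded
    n0 = c["s0x1"] + c["s0x0"]  # syndrome absent, symptom recorded
    odds_present = smoothed(c["s1x1"], n1) / smoothed(c["s1x0"], n1)
    odds_absent = smoothed(c["s0x1"], n0) / smoothed(c["s0x0"], n0)
    score = math.log(odds_present / odds_absent)
    # Sign rule from the text: a score that was negative infinity before
    # smoothing must not turn positive afterwards.
    unsmoothed_is_neg_inf = c["s1x1"] == 0 and c["s1x0"] > 0 and c["s0x1"] > 0
    if unsmoothed_is_neg_inf and score > 0:
        score = -score
    return score


def syndrome_threshold(per_symptom: Dict[str, Counts]) -> float:
    """Assumed threshold: log prior odds against the syndrome plus absence terms."""
    first = next(iter(per_symptom.values()))
    p_s = smoothed(first["s1"], first["s1"] + first["s0"])
    threshold = math.log((1 - p_s) / p_s)
    for c in per_symptom.values():
        n1 = c["s1x1"] + c["s1x0"]
        n0 = c["s0x1"] + c["s0x0"]
        threshold += math.log(smoothed(c["s0x0"], n0) / smoothed(c["s1x0"], n1))
    return threshold


def top_symptoms(per_symptom: Dict[str, Counts], k: int = 10) -> List[Tuple[str, float]]:
    """The k highest-scoring symptoms, highest first."""
    scored = [(name, symptom_score(c)) for name, c in per_symptom.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical counts for one syndrome and two symptoms.
counts = {
    "fatigue": {"s1": 2, "s0": 1, "s1x1": 2, "s1x0": 0, "s0x1": 0, "s0x0": 1},
    "cough": {"s1": 2, "s0": 1, "s1x1": 0, "s1x0": 1, "s0x1": 1, "s0x0": 0},
}
print(top_symptoms(counts, k=10))
print(syndrome_threshold(counts))
```

With binary symptoms and two classes, N and N_i are both 2, which is why `n_values` defaults to 2 here.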
In the method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics, the Bayesian statistical formula is as follows:
Let the sample space of experiment E be S, let A be an event of E, and let B_1, B_2, …, B_n be a partition of S, with P(A) > 0 and P(B_i) > 0 (i = 1, 2, …, n). Then
P(B_i | A) = P(A | B_i) P(B_i) / Σ_{j=1}^{n} P(A | B_j) P(B_j)    (1)
When analyzing TCM symptoms and syndromes, A in the above formula represents a symptom and B represents a syndrome;
the right-hand side of formula (1) consists of prior probabilities, which are obtained directly from the sample data. The denominator is the sum of the numerators, so only the numerator is analyzed. In the numerator, P(B_i) is the probability of syndrome B, with two values: P(B = 0) and P(B = 1), the probabilities that syndrome B is absent and present, respectively. P(A | B_i) is the probability of symptom A given a state of syndrome B, with four values: P(A = 0 | B = 0), P(A = 1 | B = 0), P(A = 0 | B = 1), and P(A = 1 | B = 1), where 0 and 1 denote absence and presence, respectively. Once these 6 probability values are obtained, Bayesian inference and statistics can be carried out.
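As a small worked illustration of Bayes' formula for one symptom A and one syndrome B, the sketch below computes P(B = 1 | A = a) from the six probability values. The function name and the numbers are hypothetical. Only three of the six values are passed in, because the other three are their complements, and the sketch assumes the observed symptom value has nonzero probability.

```python
def posterior_syndrome_present(p_b1: float,
                               p_a1_given_b1: float,
                               p_a1_given_b0: float,
                               symptom_present: bool) -> float:
    """P(B=1 | A=a) by Bayes' formula with the two-part partition B=0, B=1."""
    p_b0 = 1.0 - p_b1                          # P(B=0)
    if symptom_present:
        like_b1, like_b0 = p_a1_given_b1, p_a1_given_b0
    else:                                      # P(A=0 | B) = 1 - P(A=1 | B)
        like_b1, like_b0 = 1.0 - p_a1_given_b1, 1.0 - p_a1_given_b0
    numerator = like_b1 * p_b1
    denominator = numerator + like_b0 * p_b0   # sum of the numerators over B
    return numerator / denominator


# Hypothetical values: P(B=1)=0.5, P(A=1|B=1)=0.75, P(A=1|B=0)=0.25.
print(posterior_syndrome_present(0.5, 0.75, 0.25, symptom_present=True))
print(posterior_syndrome_present(0.5, 0.75, 0.25, symptom_present=False))
```

Because the denominator is the same for B = 0 and B = 1, comparing the two posteriors only requires comparing the numerators, which is why the text analyzes the numerator alone.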
The invention has the beneficial effects that:
1. The invention discloses a method for calculating the prior probabilities of TCM syndromes and symptoms based on improved Bayesian statistics. Starting from patient data, it uses improved Bayesian statistics to calculate the prior probabilities of syndromes and symptoms, derives and calculates each syndrome threshold and the scores of the syndrome's corresponding symptoms through the logarithmic ratio, and provides a Laplace-smoothed score calculation to improve the robustness of the syndrome typing rules. To validate the method, the rules were verified using all the data as the test set, giving an average classification accuracy of 96.72%. The rules were also learned with 80% of the data as the training set and verified with the remaining 20% as the test set; the resulting typing rules were then tested on the original data and on the 20% test data respectively, and the classification accuracy of the verified rules averaged 97.07%.
2. The rules obtained by the method are largely consistent with the relevant TCM theory, and the syndrome classifier built from the syndrome differentiation rules has high accuracy, offering a new method and line of thought for future clinical TCM syndrome diagnosis of consumptive lung disease. Interpreted in TCM terms, the symptom scores obtained for syndromes such as exterior deficiency syndrome, lung-spleen qi deficiency syndrome, and lung qi-yin deficiency syndrome all accord with TCM theory and reveal the experts' syndrome differentiation experience well.
3. The method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics can be applied to the analysis of all TCM syndrome diseases. It mines and derives the relationship between symptoms and syndromes from the original data in rigorous probabilistic form, and expresses the syndrome differentiation rule as scores and a threshold, which is convenient for doctors and patients to use. The syndrome differentiation thinking of renowned veteran TCM practitioners can thus be mined effectively, the correspondence between symptoms and syndromes determined, and a reference provided for clinical TCM syndrome diagnosis of the disease.
Drawings
FIG. 1: screenshot of part of the symptom, syndrome type, and syndrome data of the consumptive lung disease patients;
FIG. 2: comparison of the accuracy of the obtained syndrome differentiation rules on all the data;
FIG. 3: comparison of the accuracy of the obtained syndrome differentiation rules on the 20% test data.
Detailed Description
The technical solution of the present invention is further described in detail below by means of specific embodiments and with reference to the accompanying drawings.
Example 1
The invention discloses a method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics. The symptom, syndrome type, and syndrome data of patients are obtained from the established symptom data of consumptive lung disease patients and the expert group's syndrome classification data, the syndrome thresholds and symptom scores are calculated with a Bayesian statistical algorithm, and the classification rule of the consumptive lung disease syndromes is studied. The method comprises the following steps:
step 1) calculating the prior probability of each syndrome and symptom from the symptom and syndrome classification data of all consumptive lung disease patients;
step 2) counting the probabilities of occurrence of symptoms and syndromes, and determining, through a logarithmic ratio, the contribution score of every symptom and the syndrome threshold for each syndrome;
step 3) calculating the contribution scores of all symptoms and the syndrome thresholds with a Laplace smoothing improvement so as to improve the statistical results, and taking the ten highest-scoring symptoms as the syndrome typing rule so as to improve the robustness of the syndrome differentiation rule.
Example 2
The method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics in this embodiment differs from Embodiment 1 in that, in step 2), the syndrome typing diagnosis rule is described mathematically: given that a group of symptoms X1, …, Xn occurs, determine whether syndrome Z takes the value s (the patient belongs to the syndrome) or ~s (the patient does not). The symptoms X1, …, Xn all take the value 1; the syndrome Z takes the value 1 (s) or 0 (~s), where Z = s denotes presence of the syndrome and Z = ~s its absence. The posterior probability distribution P(Z | X1, ..., Xn) of Z is calculated from the Bayesian formula, and it is checked whether the probability of Z = s is at least the probability of Z = ~s, that is, whether formula (2) holds:
P(Z = s | X1, ..., Xn) ≥ P(Z = ~s | X1, ..., Xn)    (2)
Following the established practice of western medicine disease diagnosis standards [16,18], the diagnosis rule is expressed in scoring form: the prior probability of each symptom is counted from the original data, a score is assigned to each symptom by logarithmic ratio conversion, and a syndrome threshold is determined. When the patient's total symptom score reaches or exceeds the threshold, the patient is classified as Z = s; otherwise the patient is classified as Z = ~s.
Example 3
The method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics in this embodiment differs from Embodiment 2 in that the formulas for the symptom score and the syndrome threshold are derived as follows:
[Formula (3): equation image BDA0002330204600000061, not reproduced in this text]
[Formula (4): equation image BDA0002330204600000062, not reproduced in this text]
Substituting formulas (3) and (4) into formula (2) yields formula (5):
[Formula (5): equation image BDA0002330204600000063, not reproduced in this text]
Considering formula (5), since the symptom variables are mutually independent:
[Equation images BDA0002330204600000064 and BDA0002330204600000071, not reproduced in this text]
Since the symptoms X1, …, Xn in the above formula are all equal to 1, and in order to account for symptoms that do not appear, the terms for a symptom taking the value 0 are subtracted from both sides of the above formula, which converts the inequality into
[Formula (6): equation image BDA0002330204600000072, not reproduced in this text]
For formula (6) above, the left side of the inequality is
[Equation image BDA0002330204600000073, not reproduced in this text]
That is, the score of symptom Xi is Score(Xi): the ratio of symptom presence to symptom absence when the syndrome is 1, divided by the ratio of symptom presence to symptom absence when the syndrome is 0, called the ratio for short. The right side of the inequality is the syndrome threshold, denoted Threshold. From the relationship between the symptom scores and the threshold, the syndrome differentiation rule is given by formula (7):
Score(X1) + Score(X2) + … + Score(Xn) ≥ Threshold    (7)
If the sum of the patient's symptom scores is greater than or equal to the threshold, the patient is judged to have syndrome Z; otherwise the patient is judged not to have it.
Example 4
The method for calculating TCM syndrome thresholds and symptom scores based on improved Bayesian statistics in this embodiment differs from the previous embodiments in that, in step 1), the acquired symptom, syndrome type, and expert group syndrome classification data of all patients are traversed in the following order: each syndrome is traversed; for each syndrome, all of its corresponding symptoms are traversed; and for each symptom, every row of patient data is traversed. The following syndrome typing algorithm framework is established:
in the innermost loop, for each patient whose value for the syndrome is non-empty: if the syndrome is 1, s1 is incremented by 1 and, provided the corresponding symptom is non-empty, s1x0 is incremented by 1 when the symptom is 0 and s1x1 is incremented by 1 when the symptom is 1; if the syndrome is 0, s0 is incremented by 1 and, provided the corresponding symptom is non-empty, s0x0 is incremented by 1 when the symptom is 0 and s0x1 is incremented by 1 when the symptom is 1;
after the inner loop finishes, the counts s1x0, s1x1, s0x0, and s0x1 are available; they are used to calculate the symptom score and the symptom's part of the syndrome threshold, and the calculation results are stored in a dynamic array;
after every symptom has been traversed, each value is taken from the dynamic array and the result is calculated with the score and threshold formulas. The output is a rule file for each syndrome, containing the symptom names, the syndrome threshold, and the score of each symptom.
Example 5
In the method for calculating the traditional Chinese medicine syndrome threshold value and the symptom score based on the improved bayesian statistics, the difference between the previous embodiments is as follows: and (3) improving the syndrome typing algorithm by adopting a Laplace smoothing algorithm:
when the probability is calculated, a small number is added to the occurrence frequency of each symptom, so that the influence on the result is small, when the data set is large enough, the influence on the probability is ignored, and the problem of zero probability can be solved;
under Laplace smoothing, the prior probabilities P(c) and P(x_i | c) are calculated as:
$$P(c) = \frac{|D_c| + 1}{|D| + N}$$
$$P(x_i \mid c) = \frac{|D_{c,x_i}| + 1}{|D_c| + N_i}$$
wherein D denotes the training set; D_c denotes the set of samples of class c in the training set; D_{c,x_i} denotes the set of samples in D_c whose i-th attribute takes the value x_i; N denotes the number of possible classes in D; N_i denotes the number of possible values of the i-th attribute; and the added 1 is the small constant chosen to solve the zero-probability problem; in the smoothing process, when the initial score of a symptom is negative infinity and the smoothed score is positive, the smoothed score is multiplied by -1;
the data are processed by the algorithm to obtain the threshold of each syndrome and its ten highest-scoring symptoms, which together form the classification rule of the consumptive lung disease syndromes.
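The two smoothed estimates can be illustrated with a short sketch. It is only a minimal example under stated assumptions, not the implementation of the invention: the function names `smoothed_prior` and `smoothed_conditional` and the six-patient toy data are hypothetical, and classes and attribute values are assumed to be coded as integers starting at 0.

```python
from collections import Counter

def smoothed_prior(labels, n_classes=2):
    """P(c) = (|D_c| + 1) / (|D| + N) for each class c in 0..N-1."""
    counts = Counter(labels)
    return {c: (counts[c] + 1) / (len(labels) + n_classes)
            for c in range(n_classes)}

def smoothed_conditional(values, labels, c, n_values=2):
    """P(x_i = v | c) = (|D_{c,x_i}| + 1) / (|D_c| + N_i) for each value v."""
    in_class = [v for v, y in zip(values, labels) if y == c]
    counts = Counter(in_class)
    return {v: (counts[v] + 1) / (len(in_class) + n_values)
            for v in range(n_values)}

# Toy data (hypothetical): six patients, one binary syndrome, one binary symptom.
syndrome = [1, 1, 0, 0, 0, 0]
symptom = [1, 1, 0, 0, 0, 0]
prior = smoothed_prior(syndrome)
cond0 = smoothed_conditional(symptom, syndrome, c=0)
```

In the toy data the symptom never occurs when the syndrome is 0, so the unsmoothed estimate of P(symptom = 1 | syndrome = 0) would be zero; the smoothed estimate is small but positive.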
The invention discloses a method for calculating a traditional Chinese medicine syndrome threshold value and a symptom score based on improved Bayesian statistics, wherein a Bayesian statistical formula is as follows:
let the sample space of experiment E be S, let A be an event of E, and let B_1, B_2, …, B_n be a partition of S, with P(A) > 0 and P(B_i) > 0 (i = 1, 2, …, n); then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)} \qquad (1)$$
When analyzing the symptoms and syndromes of traditional Chinese medicine, A in the formula represents a symptom and B represents a syndrome;
the right-hand side of formula (1) consists of prior probabilities, which are obtained directly from the sample data, and the denominator equals the sum of the numerators, so only the numerator is analyzed; in the numerator, P(B_i) is the probability of syndrome B and takes two values: P(B = 0) and P(B = 1) are the probabilities that syndrome B is absent and present, respectively; P(A | B_i) is the probability of symptom A given a state of syndrome B and takes four values: P(A = 0 | B = 0), P(A = 1 | B = 0), P(A = 0 | B = 1) and P(A = 1 | B = 1), where 0 and 1 denote absence and presence, respectively; once these 6 probability values are obtained, Bayesian inference and statistics can be carried out.
Example 6
In this embodiment of the method for calculating the traditional Chinese medicine syndrome threshold and symptom score based on improved Bayesian statistics, the established database of lung diseases treated by renowned veteran traditional Chinese medicine practitioners is analyzed on the basis of earlier research. The symptoms and syndromes of consumptive lung disease are summarized and studied with the improved Bayesian statistical method, each syndrome threshold and the scores of the symptoms corresponding to each syndrome are derived as log ratios, and a Laplace-smoothed score calculation is proposed. A syndrome typing diagnostic rule for consumptive lung disease is thereby established, providing a reference for the clinical syndrome diagnosis of consumptive lung disease.
1. Related background
1.1 Data source
The data come from the established database of lung diseases treated by modern renowned veteran traditional Chinese medicine practitioners and from the database of traditional Chinese medicine lung diseases in periodicals. In earlier work the project group obtained symptom, syndrome type and syndrome data for 412 patients with consumptive lung disease; the symptom data are shown in Fig. 1. Each case has 88 symptom variables, and an expert group classified the syndromes of the 412 records according to experience. In the data table, 1 indicates that a syndrome or symptom is present and 0 that it is absent.
1.2 Introduction to Bayesian theory
Bayesian theory assumes that, if the outcome of an event is uncertain, the only way to quantify it is through the probability of the event. If the occurrence of events in past experiments is known, the probability of events occurring in future experiments can be calculated mathematically. Bayes' theorem can be expressed by the following Bayesian formula.
Let the sample space of experiment E be S, let A be an event of E, and let B_1, B_2, …, B_n be a partition of S, with P(A) > 0 and P(B_i) > 0 (i = 1, 2, …, n); then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)} \qquad (1)$$
When analyzing the symptoms and syndromes of traditional Chinese medicine, A in the above formula represents a symptom and B represents a syndrome.
The right-hand side of formula (1) consists of prior probabilities, which can be found directly from the sample data, and the denominator equals the sum of the numerators. In the numerator, P(B_i) is the probability of syndrome B and takes two values: P(B = 0) and P(B = 1) are the probabilities that syndrome B is absent and present, respectively. P(A | B_i) is the probability of symptom A given a state of syndrome B and takes four values: P(A = 0 | B = 0), P(A = 1 | B = 0), P(A = 0 | B = 1) and P(A = 1 | B = 1), where 0 and 1 denote absence and presence, respectively. Once these 6 probability values are obtained, Bayesian inference and statistics can be carried out.
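As an illustration of how the 6 probability values might be estimated by relative frequency from one symptom column and one syndrome column, consider the following minimal sketch. The function name `six_probabilities` and the eight-patient toy data are hypothetical, and the sketch assumes complete 0/1 data in which both syndrome states occur at least once.

```python
def six_probabilities(symptom, syndrome):
    """Estimate P(B=b) and P(A=a | B=b) for a, b in {0, 1} by relative frequency."""
    n = len(syndrome)
    probs = {}
    for b in (0, 1):
        # Symptom values of the patients whose syndrome state is b.
        rows = [a for a, z in zip(symptom, syndrome) if z == b]
        probs[f"P(B={b})"] = len(rows) / n
        for a in (0, 1):
            probs[f"P(A={a}|B={b})"] = rows.count(a) / len(rows)
    return probs

# Toy data (hypothetical): eight patients.
symptom = [1, 1, 0, 1, 0, 0, 0, 0]
syndrome = [1, 1, 1, 1, 0, 0, 0, 0]
probs = six_probabilities(symptom, syndrome)
```

In this toy data P(A=1|B=0) is exactly zero, which is the kind of estimate that later makes a log-ratio score infinite.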
2. Improved Bayes statistics for establishing syndrome typing rule
2.1 establishing syndrome typing rules
Starting from all the symptom and syndrome data for consumptive lung disease, the prior probability of each syndrome and symptom is calculated, and the formulas for the symptom contribution score and the syndrome threshold are derived. The statistics are then improved with Laplace smoothing, the contribution scores of all symptoms and the threshold are calculated for each syndrome, and the ten highest-scoring symptoms are taken as the syndrome typing rule. Finally, the resulting typing rules are tested on the full original data and on a held-out 20% of the data to verify the classification accuracy of the rules.
2.2 regular symptom score and syndrome threshold calculation derivation
In general, a syndrome typing diagnostic rule can be stated mathematically as follows: given that a group of symptoms X1, …, Xn has occurred, does the patient belong to syndrome Z (Z = s) or not (Z = ~s)? Here the symptoms X1, …, Xn all take the value 1, and syndrome Z takes the value 1 (s) or 0 (~s), where Z = s means the syndrome is present and Z = ~s means it is absent.
The conventional method is as follows: calculate the posterior probability distribution P(Z | X1, …, Xn) of Z with the Bayesian formula and check whether the probability of Z = s is no smaller than that of Z = ~s, i.e. whether formula (2) is satisfied:
P(Z = s | X1, …, Xn) ≥ P(Z = ~s | X1, …, Xn)   (2)
If so, the patient is classified as Z = s; otherwise the patient is classified as Z = ~s.
Such a method requires the doctor to carry out probabilistic reasoning, which is impractical and difficult to apply in the clinic. Following the experience of existing Western medicine diagnostic criteria, a score-based diagnostic rule is established instead: the prior probability of each symptom is counted from the original data, a score is assigned to each symptom, and a syndrome threshold is determined. When the total score of the patient's symptoms reaches or exceeds the threshold, the patient is classified into syndrome Z (s); otherwise the patient does not belong to syndrome Z (s).
The symptom score and syndrome threshold formula derivation proves as follows:
Figure BDA0002330204600000101
Figure BDA0002330204600000102
substituting formulae (3) and (4) into formula (2) yields formula (5):
Figure BDA0002330204600000103
considering formula (5), and since the symptom variables are assumed to be mutually independent:
Figure BDA0002330204600000104
since the symptoms X1, …, Xn in the above formula are all equal to 1, and in order to account for symptoms that do not appear, the terms for the case in which each symptom is 0 are subtracted from both sides of the above formula, and the inequality becomes
Figure BDA0002330204600000111
For the above equation (6), the left side of the inequality
Figure BDA0002330204600000112
That is, the score of symptom X_i, denoted Score(X_i), is computed from the ratio of the symptom appearing to not appearing when the syndrome is 1, divided by the ratio of the symptom appearing to not appearing when the syndrome is 0 (the "ratio" for short). The right side of the inequality is the syndrome threshold, denoted Threshold. The syndrome differentiation rule is then expressed by the relationship between the symptom scores and the threshold, as in formula (7):
Score(X1) + Score(X2) + … + Score(Xn) ≥ Threshold   (7)
If the sum of the patient's symptom scores is greater than or equal to the threshold, the patient is judged to have syndrome Z; otherwise the patient is judged not to have it.
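The equation images for formulae (3) to (6) are not reproduced in this text. As a reading aid, the following is a reconstruction sketch under the independence assumption stated above; it is not the original figure content. The index m (the total number of symptom variables), the observed values x_j and the choice of logarithm base are notation assumed here for illustration, and all probabilities are assumed to be strictly positive so that the logarithms exist.

```latex
\begin{align*}
& P(Z=s)\prod_{j=1}^{m} P(X_j=x_j \mid Z=s)
  \;\ge\; P(Z={\sim}s)\prod_{j=1}^{m} P(X_j=x_j \mid Z={\sim}s) \\[4pt]
\Longleftrightarrow\;
& \sum_{j=1}^{m}\left[
    \log\frac{P(X_j=x_j \mid Z=s)}{P(X_j=0 \mid Z=s)}
  - \log\frac{P(X_j=x_j \mid Z={\sim}s)}{P(X_j=0 \mid Z={\sim}s)}
  \right]
  \;\ge\;
  \log\frac{P(Z={\sim}s)}{P(Z=s)}
  + \sum_{j=1}^{m}\log\frac{P(X_j=0 \mid Z={\sim}s)}{P(X_j=0 \mid Z=s)} \\[4pt]
& \mathrm{Score}(X_i) =
  \log\frac{P(X_i=1 \mid Z=s)\,/\,P(X_i=0 \mid Z=s)}
           {P(X_i=1 \mid Z={\sim}s)\,/\,P(X_i=0 \mid Z={\sim}s)},
  \qquad
  \mathrm{Threshold} =
  \log\frac{P(Z={\sim}s)}{P(Z=s)}
  + \sum_{j=1}^{m}\log\frac{P(X_j=0 \mid Z={\sim}s)}{P(X_j=0 \mid Z=s)}
\end{align*}
```

In the second line every term with x_j = 0 vanishes, so only the symptoms that are present contribute their Score, which is consistent with the additive rule (7).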
2.3 establishing syndrome typing Algorithm framework
All syndromes, symptoms and patients in the data are traversed in the following order: first each syndrome, then all symptoms for each syndrome, and finally each row of patient data for each symptom. The output is a rule file for each syndrome, containing the syndrome threshold and all symptom scores.
The idea of the algorithm is as follows. In the innermost loop, for each patient whose value for the syndrome is not empty: if the syndrome is 1, s1 is incremented by 1 and, provided the corresponding symptom is not empty, s1x0 is incremented by 1 when the symptom is 0 and s1x1 is incremented by 1 when the symptom is 1; if the syndrome is 0, s0 is incremented by 1 and, provided the corresponding symptom is not empty, s0x0 is incremented by 1 when the symptom is 0 and s0x1 is incremented by 1 when the symptom is 1. After the inner loop finishes, the counts s1x0, s1x1, s0x0 and s0x1 are available for calculating the symptom score and the syndrome threshold, and the calculation results are stored in a dynamic array. After all symptoms have been traversed, each value is taken from the dynamic array and the result is calculated with the score and threshold formulas. The scores are sorted by absolute value, and the symptom names and their scores are output.
The algorithm framework is as follows:
Input: the symptom data and the data of a given syndrome for patients with consumptive lung disease
Output: the symptom scores and the threshold of that syndrome
Figure BDA0002330204600000113
Figure BDA0002330204600000121
Algorithm 1 Bayesian statistical syndrome typing algorithm
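The three nested loops and the six counters described above can be sketched as follows. This is a minimal illustration rather than the algorithm of the figure: the function name `count_all`, the dictionary-of-columns data layout, the use of `None` for an empty cell, and the names `cough` and `lung_yin_deficiency` in the toy data are all assumptions.

```python
def count_all(symptoms, syndromes):
    """symptoms / syndromes: dict name -> list of 0, 1 or None (one entry per patient).

    Returns a dict keyed by (syndrome name, symptom name) holding the six counters.
    """
    tables = {}
    for syn_name, syn_col in syndromes.items():        # outer loop: syndromes
        for sym_name, sym_col in symptoms.items():     # middle loop: symptoms
            c = {"s1": 0, "s0": 0, "s1x1": 0, "s1x0": 0, "s0x1": 0, "s0x0": 0}
            for x, z in zip(sym_col, syn_col):         # inner loop: patients
                if z is None:                          # empty syndrome: skip the row
                    continue
                tag = "s1" if z == 1 else "s0"
                c[tag] += 1
                if x is not None:                      # empty symptom: count syndrome only
                    c[f"{tag}x{x}"] += 1
            tables[(syn_name, sym_name)] = c
    return tables

# Toy data (hypothetical): six patients, one symptom, one syndrome.
symptoms = {"cough": [1, 0, 1, None, 0, 1]}
syndromes = {"lung_yin_deficiency": [1, 1, 0, 1, 0, None]}
tables = count_all(symptoms, syndromes)
```

Keeping the raw counts per (syndrome, symptom) pair, rather than probabilities, makes it straightforward to detect zero counts before any ratio is taken.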
Notes on the original algorithm: while processing the data, we found the following special cases when computing the scores:
① The score is positive infinity:
For example, for the lung yang deficiency syndrome there are 8 records with the syndrome equal to 1; among these 8 records, 4 have the symptom enuresis equal to 1 and 4 have it equal to 0. There are 404 records with the syndrome equal to 0; among these, 0 records have the symptom enuresis equal to 1 (i.e., it never occurs) and 404 have it equal to 0. By the score formula, the enuresis score is positive infinity.
② The score is negative infinity:
For example, for the lung yang deficiency syndrome there are 8 records with the syndrome equal to 1; among these 8 records, 0 have the symptom hollow pulse equal to 1 (i.e., it never occurs) and 8 have it equal to 0. There are 404 records with the syndrome equal to 0; among these, 2 have the symptom hollow pulse equal to 1 and 402 have it equal to 0. By the score formula, the hollow pulse score is negative infinity.
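The two special cases can be reproduced from the counts quoted above. The sketch below assumes that the score is the natural logarithm of the ratio described in section 2.2, written in terms of the counters as log((s1x1 / s1x0) / (s0x1 / s0x0)); the function name `score_from_counts` and the convention of returning an infinite value instead of raising an error are assumptions made for illustration.

```python
import math

def score_from_counts(s1x1, s1x0, s0x1, s0x0):
    """Log ratio log((s1x1 / s1x0) / (s0x1 / s0x0)), with infinities made explicit."""
    num = s1x1 * s0x0
    den = s1x0 * s0x1
    if num == 0 and den == 0:
        return math.nan          # ratio is 0/0: no information either way
    if den == 0:
        return math.inf          # symptom never seen without the syndrome (or always with it)
    if num == 0:
        return -math.inf         # symptom never seen with the syndrome (or always without it)
    return math.log(num / den)

# Counts taken from the two cases in the text (lung yang deficiency syndrome).
first_case = score_from_counts(s1x1=4, s1x0=4, s0x1=0, s0x0=404)
second_case = score_from_counts(s1x1=0, s1x0=8, s0x1=2, s0x0=402)
```

Here `first_case` evaluates to positive infinity and `second_case` to negative infinity, matching the two cases described in the text.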
2.4 improvement of the syndrome typing Algorithm
Symptom scores become positive or negative infinity because Bayesian statistics multiplies probabilities together: when the training set is incomplete or extreme, an estimated probability can be 0, so that the numerator or denominator of the score formula is 0 and the score is positive or negative infinity. To solve this problem we use Laplace smoothing [10]: when calculating probabilities, a very small number is added to the occurrence count of each symptom. This has little influence on the result, its effect on the probabilities is negligible when the data set is large enough, and it solves the zero-probability problem. Under Laplace smoothing, the prior probabilities P(c) and P(x_i | c) are calculated as:
$$P(c) = \frac{|D_c| + 1}{|D| + N}$$
$$P(x_i \mid c) = \frac{|D_{c,x_i}| + 1}{|D_c| + N_i}$$
where D denotes the training set; D_c denotes the set of samples of class c in the training set; D_{c,x_i} denotes the set of samples in D_c whose i-th attribute takes the value x_i; N denotes the number of possible classes in D; N_i denotes the number of possible values of the i-th attribute; and the added 1 is the small constant chosen to solve the zero-probability problem. That is, under Laplace smoothing: when the syndrome is 0 and the number of records with the symptom equal to 0, or with the symptom equal to 1, is 0, two records with the syndrome equal to 0 are added, one with the symptom equal to 0 and one with the symptom equal to 1; when the syndrome is 1 and the number of records with the symptom equal to 0, or with the symptom equal to 1, is 0, two records with the syndrome equal to 1 are added, one with the symptom equal to 0 and one with the symptom equal to 1. In addition, because positive and negative scores have different meanings, a score that was originally negative infinity must remain negative after smoothing: when the initial score of a symptom is negative infinity and the smoothed score is positive, the smoothed score is multiplied by -1. The improved smoothing algorithm framework is as follows:
The pseudocode of the smoothing part is as follows:
Input: the symptom and syndrome data, together with the original scores and original threshold from the preliminary statistics
Output: the smoothed syndrome threshold and symptom scores
Figure BDA0002330204600000141
Algorithm 2 Improved syndrome typing algorithm
Processing the data with this algorithm yields the threshold of each syndrome and its ten highest-scoring symptoms, which together form the classification rules for the consumptive lung disease syndromes.
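The smoothing step described in this section (adding one record with the symptom equal to 0 and one with the symptom equal to 1 whenever a count is zero, then restoring the sign of a score that was originally negative infinity) might be sketched as follows. This is an illustration under assumptions, not the pseudocode of Algorithm 2: the function names are hypothetical, the natural logarithm is assumed, and the score is written as log((s1x1 / s1x0) / (s0x1 / s0x0)).

```python
import math

def raw_score(s1x1, s1x0, s0x1, s0x0):
    """Unsmoothed log-ratio score; may be +inf, -inf or nan."""
    num, den = s1x1 * s0x0, s1x0 * s0x1
    if den == 0:
        return math.inf if num > 0 else math.nan
    if num == 0:
        return -math.inf
    return math.log(num / den)

def smooth_counts(s1x1, s1x0, s0x1, s0x0):
    """Add one record of each symptom value to a syndrome group that has a zero count."""
    if s1x1 == 0 or s1x0 == 0:
        s1x1, s1x0 = s1x1 + 1, s1x0 + 1
    if s0x1 == 0 or s0x0 == 0:
        s0x1, s0x0 = s0x1 + 1, s0x0 + 1
    return s1x1, s1x0, s0x1, s0x0

def smoothed_score(s1x1, s1x0, s0x1, s0x0):
    before = raw_score(s1x1, s1x0, s0x1, s0x0)
    after = raw_score(*smooth_counts(s1x1, s1x0, s0x1, s0x0))
    if before == -math.inf and after > 0:
        after = -after           # keep an originally negative-infinite score negative
    return after

# Counts from the two special cases reported for lung yang deficiency syndrome.
enuresis = smoothed_score(4, 4, 0, 404)
hollow_pulse = smoothed_score(0, 8, 2, 402)
```

With these counts the first score becomes finite and positive, while the second would turn positive after smoothing and is therefore multiplied by -1, as the text prescribes.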
3. Test and analysis of the obtained rules
To verify the validity of the rules, the top ten symptoms of each syndrome are traversed: whenever the patient has one of these symptoms, its score is taken, and after the traversal the scores are summed and compared with the threshold. If the sum is greater than or equal to the threshold, the patient is judged to have the syndrome; otherwise the patient is judged not to have it.
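Applying a rule to one patient can be sketched as below. The five scores are the exterior deficiency scores quoted in the discussion of Table 5 in this document; the threshold value, the function name `judge_syndrome`, the dictionary layout and the two patients are hypothetical and serve only to make the example runnable.

```python
def judge_syndrome(patient, rule_scores, threshold):
    """Sum the scores of the rule symptoms the patient has and compare with the threshold."""
    total = sum(score for symptom, score in rule_scores.items()
                if patient.get(symptom, 0) == 1)
    return total, total >= threshold

# Scores quoted in the text for exterior deficiency syndrome (five of the top ten).
exterior_deficiency_rule = {
    "spontaneous sweating": 8.21,
    "aversion to wind": 2.13,
    "mental fatigue": 1.63,
    "aversion to cold": 1.39,
    "headache": 1.12,
}
HYPOTHETICAL_THRESHOLD = 5.0   # assumption: the real threshold is not given in the text

patient_a = {"spontaneous sweating": 1, "headache": 1}
patient_b = {"aversion to cold": 1, "headache": 1}
total_a, has_a = judge_syndrome(patient_a, exterior_deficiency_rule, HYPOTHETICAL_THRESHOLD)
total_b, has_b = judge_syndrome(patient_b, exterior_deficiency_rule, HYPOTHETICAL_THRESHOLD)
```

Symptoms that are absent or missing from the patient record simply contribute nothing to the total, which mirrors the additive form of the rule.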
Syndromes are classified from the patient's symptom data with the obtained rules, the expert group's classification results are taken as the reference values for comparison, and the accuracy and the comprehensive index F1-measure of the syndrome classification rules are calculated. Precision:
$$P = \frac{TP}{TP + FP}$$
reflects the proportion of true positive samples among the samples that the classifier judges to be positive. Comprehensive evaluation index (F1-measure):
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$
is the weighted harmonic mean of precision P and recall R (with equal weights for precision and recall). Comparison tests are carried out on the full data set and on the test set, respectively.
3.1 testing all data
The data of all 412 patients with consumptive lung disease are used as both the training set and the test set: the syndrome differentiation rules are obtained from them, and the accuracy and the F1-measure of the rules are then verified on the same data. The accuracy of the original syndrome typing algorithm and of the algorithm after smoothing optimization are compared in Fig. 2.
The average accuracy of the original and optimized algorithms is 86.04% and 96.72%, respectively, and the average F1 value is 27.84% and 42.94%, respectively. F1 combines precision and recall; the higher the F1 value, the better balanced precision and recall are and the more stable the classification model. The F1 of the original algorithm is small because the zero-probability problem produces symptom scores of positive and negative infinity, and the resulting uncertainty in the rules makes the classification model less stable.
(1) Test of the typing rules without smoothing
Table 1 Test results without smoothing
Figure BDA0002330204600000151
Figure BDA0002330204600000161
Notes:
Explanation of the accuracy, precision, recall and F1-measure for the syndrome of wind-heat invading the lung: comparing the test output with the expert judgments for this syndrome, the number of samples for which the expert result is 1 and the classification result is also 1 is 3, i.e. TP = 3; the number for which the expert result is 1 and the classification result is 0 is 0, i.e. FN = 0; the number for which the expert result is 0 and the classification result is 1 is 218, i.e. FP = 218; and the number for which the expert result is 0 and the classification result is 0 is 191, i.e. TN = 191. Therefore accuracy A = 194/412 = 0.47087, precision P = 3/(3 + 218) = 0.01357, recall R = 3/(3 + 0) = 1, and the comprehensive evaluation index F1 = 0.026776.
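The four indices can be recomputed from a confusion matrix with a few lines of code. The sketch below uses the counts reported above for the syndrome of wind-heat invading the lung; the function name `metrics` and the choice to return `None` when a denominator is zero are assumptions made for illustration.

```python
def metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall, f1); undefined ratios are returned as None."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts reported in the text: TP = 3, FN = 0, FP = 218, TN = 191.
accuracy, precision, recall, f1 = metrics(tp=3, fn=0, fp=218, tn=191)
```

Computed without intermediate rounding, F1 equals 6/224, about 0.0268, which agrees with the reported 0.026776 up to rounding of the intermediate precision value. Returning `None` for a zero denominator keeps undefined precision or F1 values from being silently reported as 0.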
(2) Test of the rules obtained with smoothing
Scores and thresholds are trained on the data of the 412 patients with the smoothing-optimized rule classification algorithm; the patients are then classified with the rules, and the results are compared with the expert group's classification to obtain the accuracy and the F1-measure.
Table 2 Test results after smoothing optimization
Figure BDA0002330204600000162
Figure BDA0002330204600000171
Notes:
Explanation of the accuracy, precision, recall and F1-measure for the lung-kidney qi deficiency syndrome: comparing the test output with the expert judgments for this syndrome, the number of samples for which the expert result is 1 and the classification result is also 1 is 0, i.e. TP = 0; the number for which the expert result is 1 and the classification result is 0 is 1, i.e. FN = 1; the number for which the expert result is 0 and the classification result is 1 is 0, i.e. FP = 0; and the number for which the expert result is 0 and the classification result is 0 is 411, i.e. TN = 411. Therefore accuracy A = 411/412 = 0.99757; precision P = 0/0, which is undefined; recall R = 0/1 = 0; and the comprehensive evaluation index F1 is therefore also undefined.
3.2 testing 20% of the data
80% of the 412 patients with consumptive lung disease are randomly selected as the training set, from which the syndrome differentiation rules are obtained; the remaining 20% of the data are then used as the test set to verify the rules. The resulting accuracy is shown in Fig. 3.
The average accuracy of the original and optimized algorithms is 85.66% and 97.07%, respectively, and the average F1 value is 25.27% and 24.17%, respectively. The average F1 on the 20% test set is clearly lower than on the full data set. The 20% test set contains few records, so for many syndromes the number of records whose reference label is 1 is zero or very small; TP is then 0 for those syndromes, F1 = 0 occurs more often, and the overall mean F1 is smaller.
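A random 80/20 split of the 412 records could be drawn as in the following sketch. The text does not state how the split was performed, so the function name `split_80_20`, the fixed seed and the rounding rule (integer division, giving 329 training and 83 test records) are assumptions.

```python
import random

def split_80_20(n_records, seed=0):
    """Shuffle record indices and cut them into 80% training and 20% test indices."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)   # local generator: reproducible, no global state
    cut = n_records * 4 // 5
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_80_20(412)
```

With only about 80 test records and several rare syndromes, some syndromes may have no positive test case at all, which is the situation the text gives as the reason for the low mean F1.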
(1) No smoothing test:
TABLE 3 No smoothing test results
Figure BDA0002330204600000172
Figure BDA0002330204600000181
(2) Testing after smoothing optimization:
80% of the symptom data of the patients with consumptive lung disease are used as the training set; after the data are smoothed, the scores and thresholds are calculated and a classifier is built from the syndrome differentiation rules. The remaining 20% of the data are classified as the test set, and the resulting syndrome classifications are compared with the expert group's experience-based classifications to calculate the corresponding accuracy, precision, recall and F1-measure.
TABLE 4 test results after smoothing
Figure BDA0002330204600000182
Figure BDA0002330204600000191
4. Interpretation and conclusion of rules in traditional Chinese medicine
The study researches 25 syndromes of the patients with consumptive lung disease, including deficiency syndromes (exterior deficiency syndrome, lung-spleen qi deficiency syndrome, lung-kidney yin deficiency syndrome, lung-stomach yin deficiency syndrome, lung yang deficiency syndrome, lung yin deficiency syndrome, spleen-kidney yang deficiency syndrome, spleen-stomach qi deficiency syndrome, heart-kidney yang deficiency syndrome, yin-yang deficiency syndrome, lung qi and yin deficiency syndrome), excess syndromes (lung heat exuberance syndrome, wind-cold attacking lung syndrome, wind-heat invading lung syndrome, heat-toxin lung blocking syndrome, phlegm-heat lung blocking syndrome, phlegm-stasis lung blocking syndrome, turbid phlegm-phlegm lung blocking syndrome, blood stasis lung collateral blocking syndrome) and deficiency-excess syndromes (spleen-dampness stagnation syndrome, yang deficiency water-flood syndrome, lung dryness syndrome).
The threshold of each syndrome and the scores of its top ten symptoms are tabulated in the following form; to save space, not every table is listed.
TABLE 5 exterior deficiency syndrome
Figure BDA0002330204600000192
Figure BDA0002330204600000201
For the exterior deficiency syndrome, spontaneous sweating has the highest score, 8.21; the scores of the other main symptoms are aversion to wind 2.13, mental fatigue 1.63, aversion to cold 1.39 and headache 1.12. When the defensive exterior is weak and the opening and closing of the striae is impaired, spontaneous sweating results, and it is the most common symptom of exterior deficiency. Exterior deficiency weakens the defence of the body surface against pathogenic factors, so the patient is susceptible to exogenous pathogens, which manifests as aversion to wind, aversion to cold, headache and floating pulse; exterior deficiency generally involves qi deficiency, which manifests as lassitude, anorexia and a pale tongue; and spontaneous sweating due to exterior deficiency discharges and consumes yin fluid, so it may be accompanied by night sweats. The obtained syndrome differentiation rules agree well with traditional Chinese medicine theory.
The dialectical rules of all syndromes obtained by longitudinal observation show that most of the conclusions are consistent with the knowledge of traditional Chinese medicine, only a few symptoms are not completely consistent with the syndromes, and the analysis reasons are that the number of the included original documents is small, and the syndromes are mostly related to the concurrent syndromes. For example, the spleen deficiency and damp retention syndrome may be manifested by moist coating, slow pulse, deep pulse, pale tongue, thready pulse, rapid pulse, red tongue, dyspnea, white coating, dyspnea and dyspnea. The moist coating, slow pulse, deep pulse, pale tongue and white coating can prompt damp evil, but the common symptoms of spleen deficiency and damp encumbering syndrome, such as epigastric and abdominal fullness, distending pain, loss of appetite, loose stool, limb encumbering and the like, are not reflected in the result, and the rapid pulse, red tongue, wheezing and dyspnea and shortness are frequently seen in the lung heat syndrome because the spleen deficiency and damp encumbering syndrome in the consumptive lung disease can have various syndromes (lung heat) which are added together, so that the symptoms of the heat syndrome appear.
This research is based on Bayesian statistics. From the patient data, the prior probabilities of syndromes and symptoms are calculated with improved Bayesian statistics, each syndrome threshold and the scores of the corresponding symptoms are derived as log ratios, and a Laplace-smoothed score calculation is proposed, which improves the robustness of the syndrome typing rules. With all data used as the test set, the original rules reach an average classification accuracy of 86.04% and an average F1 of 27.84%, and the improved rules reach an average classification accuracy of 96.72% and an average F1 of 42.94%. With 80% of the data used as the training set to learn the rules and 20% as the test set to verify them, the original rules reach an average classification accuracy of 85.66% and an average F1 of 25.27%, and the improved rules reach an average classification accuracy of 97.07% and an average F1 of 24.17%. The obtained rules are basically consistent with the relevant traditional Chinese medicine theory, the syndrome classifier built from the rules has high accuracy, and the work offers a new method and approach for future clinical traditional Chinese medicine syndrome diagnosis of consumptive lung disease. The study explores the relationship between the symptoms and syndromes of consumptive lung disease; its limitation is the small sample size of the current database. Establishing traditional Chinese medicine syndrome diagnostic rules requires a large amount of real and accurate data, whereas most of the current samples come from the literature database of renowned veteran traditional Chinese medicine practitioners. In later work, actual medical records of diagnosis and treatment by renowned veteran practitioners will be added, the database will be improved, and the related methods will be optimized.
The above examples are only for illustrating the present invention and should not be construed as limiting the scope of the claims of the present invention. The implementation can be changed by the simple modification of the technical scheme of the invention or the equivalent replacement by combining the prior art with the technical scheme of the invention. It should be understood that they are within the scope of the following claims and are not to be considered as limitations on the scope of the invention.

Claims (4)

1. A method for calculating traditional Chinese medicine syndrome threshold and symptom score based on improved Bayesian statistics is used for obtaining symptom, syndrome and symptom score data of a patient from established symptoms and expert group syndrome classification data of a consumptive lung disease patient, calculating the syndrome threshold and the symptom score by using a Bayesian statistical algorithm, and researching a classification rule of the consumptive lung disease syndrome, and comprises the following steps:
step 1) calculating the prior probability of each syndrome and symptom from the classification data of all symptoms and syndromes of the consumptive lung disease patients;
step 2) counting the probability of occurrence of symptoms and syndromes, and calculating and determining contribution scores of all symptoms and syndrome thresholds in each syndrome through a logarithmic ratio;
the syndrome typing diagnostic rule is described mathematically as follows: given that a group of symptoms X1, …, Xn has occurred, determine whether the patient belongs to syndrome Z (Z = s) or does not belong to it (Z = ~s), wherein the symptoms X1, …, Xn all take the value 1; syndrome Z takes the value 1 (s) or 0 (~s), Z = s denoting that the syndrome is present and Z = ~s that it is absent; the posterior probability distribution P(Z | X1, …, Xn) of Z is calculated with the Bayesian formula, and it is checked whether the probability of Z = s is no smaller than that of Z = ~s, that is, whether formula (2) is satisfied:
P(Z = s | X1, …, Xn) ≥ P(Z = ~s | X1, …, Xn)   (2)
a score-based diagnostic rule is established following the experience of existing Western medicine diagnostic criteria: the prior probability of each symptom is counted from the original data, a score is assigned to each symptom by log-ratio conversion, and a syndrome threshold is determined; when the total score of the symptoms reaches or exceeds the threshold, the patient is classified as Z = s, and otherwise as Z = ~s;
step 3) calculating contribution scores and syndrome thresholds of all symptoms by adopting a Laplace smoothness improvement algorithm so as to improve statistical results, and taking the first ten symptoms with higher scores as syndrome typing rules to improve the robustness of the syndrome differentiation rules;
the formula derivation of symptom score and syndrome threshold proves as follows:
Figure FDA0003722264760000011
Figure FDA0003722264760000012
substituting formulae (3) and (4) for formula (2) yields formula (5):
Figure FDA0003722264760000013
considering equation (5), since the symptomatic variables satisfy mutually independent distributions:
Figure FDA0003722264760000014
Figure FDA0003722264760000021
since the symptoms X1, …, Xn in the above formula are all equal to 1, and in order to account for symptoms that do not appear, the terms for the case in which each symptom is 0 are subtracted from both sides of the above formula, and the inequality becomes
Figure FDA0003722264760000022
For the above equation (6), the left side of the inequality
Figure FDA0003722264760000023
That is, the score of symptom X_i, denoted Score(X_i), is computed from the ratio of the symptom appearing to not appearing when the syndrome is 1, divided by the ratio of the symptom appearing to not appearing when the syndrome is 0 (the "ratio" for short); the right side of the inequality is the syndrome threshold, denoted Threshold; the syndrome differentiation rule is then expressed by the relationship between the symptom scores and the threshold, as in formula (7):
Score(X1) + Score(X2) + … + Score(Xn) ≥ Threshold   (7)
if the sum of the patient's symptom scores is greater than or equal to the threshold, the patient is judged to have syndrome Z; otherwise the patient is judged not to have it.
2. The method for calculating the traditional Chinese medicine syndrome threshold and symptom score based on the improved Bayesian statistics as claimed in claim 1, wherein: in the step 1), traversing the acquired symptoms, syndrome types and expert group syndrome classification data of all patients according to the following sequence: traversing each syndrome, traversing all symptoms corresponding to each syndrome, and traversing each row of patient data of each symptom; the following syndrome typing algorithm framework is established:
in the innermost loop, for each patient whose value for the syndrome is not empty: if the syndrome is 1, s1 is incremented by 1 and, provided the corresponding symptom is not empty, s1x0 is incremented by 1 when the symptom is 0 and s1x1 is incremented by 1 when the symptom is 1; if the syndrome is 0, s0 is incremented by 1 and, provided the corresponding symptom is not empty, s0x0 is incremented by 1 when the symptom is 0 and s0x1 is incremented by 1 when the symptom is 1;
after the inner loop finishes, the counts s1x0, s1x1, s0x0 and s0x1 are available for calculating the symptom score and the syndrome threshold, and the calculation results are stored in a dynamic array;
after all symptoms have been traversed, each value is taken from the dynamic array and the result is calculated with the score and threshold calculation formulas; the output is a rule file for each syndrome, which lists the symptom names, the syndrome threshold and the score of each symptom.
3. The method for calculating the traditional Chinese medicine syndrome threshold and symptom score based on the improved Bayesian statistics as claimed in claim 2, wherein: and (3) improving the syndrome typing algorithm by adopting a Laplace smoothing algorithm:
when the probability is calculated, a small number is added to the occurrence frequency of each symptom, so that the influence on the result is small, when the data set is large enough, the influence on the probability is ignored, and the problem of zero probability can be solved;
under Laplace smoothing, the prior probability P(c) and the conditional probability P(x_i | c) are calculated as:
$$\hat{P}(c) = \frac{|D_c| + 1}{|D| + N}$$
$$\hat{P}(x_i \mid c) = \frac{|D_{c,x_i}| + 1}{|D_c| + N_i}$$
wherein D denotes the training set; D_c denotes the set of samples of class c in the training set; D_{c,x_i} denotes the set of samples in D_c whose i-th attribute takes the value x_i; N denotes the number of possible classes in D; N_i denotes the number of possible values of the i-th attribute; the added 1 is a small constant chosen to solve the zero-probability problem; in the smoothing process, when the initial score of a symptom is negative infinity and its value after smoothing is positive, the smoothed score is multiplied by -1;
the data are processed by the algorithm to obtain the threshold of each syndrome and the ten highest-scoring symptoms of each syndrome, which together constitute the classification rule of the consumptive lung syndrome.
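By way of illustration only, the smoothing of claim 3 can be sketched in Python as follows. The sketch assumes the usual add-one (Laplace) form, in which 1 is added to each count and the number of possible classes or attribute values is added to the corresponding total; this is consistent with the symbols D, D_c, N and N_i defined in the claim. The function names and the sample counts are hypothetical.

```python
from fractions import Fraction


def smoothed_prior(n_c, n_total, n_classes):
    """Add-one estimate of P(c): (|D_c| + 1) / (|D| + N)."""
    return Fraction(n_c + 1, n_total + n_classes)


def smoothed_conditional(n_c_x, n_c, n_values):
    """Add-one estimate of P(x_i | c): (|D_c,x_i| + 1) / (|D_c| + N_i)."""
    return Fraction(n_c_x + 1, n_c + n_values)


# Hypothetical counts: 4 samples in total, 2 of them in class c,
# and none of those 2 shows the value x_i (binary attribute, so N_i = 2).
n_total, n_c, n_c_x = 4, 2, 0

unsmoothed = Fraction(n_c_x, n_c)                 # exactly zero without smoothing
p_prior = smoothed_prior(n_c, n_total, n_classes=2)
p_cond = smoothed_conditional(n_c_x, n_c, n_values=2)
print(unsmoothed, p_prior, p_cond)
```

Exact fractions are used here only so that the small example can be checked by hand; the zero count no longer produces a zero probability, and the smoothed conditional probabilities over both attribute values still sum to 1.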
4. The method for calculating the traditional Chinese medicine syndrome threshold and symptom score based on improved Bayesian statistics as recited in claim 1, 2 or 3, wherein the Bayesian statistical formula is as follows:
let the sample space of experiment E be S, let A be an event of E, and let B_1, B_2, …, B_n be a partition of S, with P(A) > 0 and P(B_i) > 0 (i = 1, 2, …, n); then
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)} \tag{1}$$
when analyzing the symptoms and syndromes of traditional Chinese medicine, A in the formula represents the symptom and B represents the syndrome;
the quantities on the right-hand side of formula (1) are prior probabilities, obtained directly from the sample data, and the denominator is the sum of the numerator terms over all B_j, so only the numerator needs to be analyzed; in the numerator, P(B_i) is the probability of occurrence of syndrome B and takes two values: P(B = 0) and P(B = 1), the probabilities that syndrome B is absent and present, respectively; P(A | B_i) is the probability of symptom A given a state of syndrome B and takes four values: P(A = 0 | B = 0), P(A = 1 | B = 0), P(A = 0 | B = 1) and P(A = 1 | B = 1), where 0 and 1 denote absence and presence, respectively; once these 6 probability values are obtained, Bayesian inference and statistics can be carried out.
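By way of illustration only, the use of the six probability values of claim 4 can be sketched in Python as follows. For an observed state of symptom A, the sketch forms the numerator P(A | B) P(B) for B = 0 and B = 1 and divides each by their sum, as Bayes' theorem prescribes for a two-part partition. The function name posterior, the dictionary layout and the numerical values are assumptions made for the example and are not taken from the specification.

```python
from fractions import Fraction


def posterior(prior, likelihood, a):
    """Return P(B = b | A = a) for every syndrome state b.

    prior: dict mapping b -> P(B = b), for b in {0, 1}.
    likelihood: dict mapping (a, b) -> P(A = a | B = b).
    """
    # Numerators of Bayes' theorem, one per syndrome state.
    numerators = {b: likelihood[(a, b)] * prior[b] for b in prior}
    # The denominator is the sum of the numerators.
    total = sum(numerators.values())
    return {b: num / total for b, num in numerators.items()}


# Hypothetical values for the six probabilities.
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
likelihood = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(3, 4),  # P(A | B = 0)
    (0, 1): Fraction(1, 2), (1, 1): Fraction(1, 2),  # P(A | B = 1)
}

post = posterior(prior, likelihood, a=1)  # symptom A observed as present
print(post)
```

With these hypothetical values the numerators are 3/8 and 1/4, so the posterior probabilities of syndrome B being absent and present are 3/5 and 2/5.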

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333121.7A CN110970129B (en) 2019-12-23 2019-12-23 Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911333121.7A CN110970129B (en) 2019-12-23 2019-12-23 Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics

Publications (2)

Publication Number Publication Date
CN110970129A CN110970129A (en) 2020-04-07
CN110970129B true CN110970129B (en) 2022-08-16

Family

ID=70035904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333121.7A Active CN110970129B (en) 2019-12-23 2019-12-23 Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics

Country Status (1)

Country Link
CN (1) CN110970129B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111863199B (en) * 2020-07-30 2023-11-14 上海群有信息技术有限公司 Decision making system for assisting western medicine in dialectical application of Chinese patent medicine
CN116825364A (en) * 2023-08-29 2023-09-29 江苏盛泰科技集团有限公司 High-risk group health identification judgment system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874655A (en) * 2017-01-16 2017-06-20 西北工业大学 Traditional Chinese medical science disease type classification Forecasting Methodology based on Multi-label learning and Bayesian network
CN107887022A (en) * 2017-11-09 2018-04-06 淮阴工学院 A kind of tcm syndrome intelligent diagnosing method based on SSTM
CN109036568A (en) * 2018-09-03 2018-12-18 浪潮软件集团有限公司 Method for establishing prediction model based on naive Bayes algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110257988A1 (en) * 2010-04-14 2011-10-20 Carmel-Haifa University Economic Corp. Ltd. Multi-phase anchor-based diagnostic decision-support method and system
KR101830314B1 (en) * 2017-07-26 2018-02-20 재단법인 구미전자정보기술원 A method of providing information for the diagnosis of pancreatic cancer using bayesian network based on artificial intelligence, computer program, and computer-readable recording media using the same

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A Traffic Prediction Algorithm Based on Bayesian Spatio-Temporal Model in Cellular Network;Zhang Z et al.;《2017 INTERNATIONAL SYMPOSIUM ON WIRELESS COMMUNICATION SYSTEMS》;20171231;full text *
Construction and Application of Bayesian Network in Early Diagnosis of Alzheimer Disease's System;Yan Sun;《2007 IEEE/ICME International Conference on Complex Medical Engineering》;20071112;full text *
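Analysis of TCM "symptom-syndrome" based on the Bayesian method;李仕进 et al.;《计算机工程》;20080105(No. 01);full text *
Study of TCM syndrome patterns in comorbid anxiety and depression based on Bayesian network technology;薛亚静;《中国优秀硕士学位论文全文数据库(医药卫生科技辑)》;20180815(No. 8);full text *
Association patterns between syndromes and symptoms in acute exacerbation of chronic obstructive pulmonary disease based on Bayesian networks;王至婉 et al.;《中华中医杂志》;20190901(No. 9);full text *
Application of the naive Bayes method to the classification and recognition of TCM syndromes;张丽伟 et al.;《内蒙古大学学报(自然科学版)》;20070915(No. 05);full text *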

Also Published As

Publication number Publication date
CN110970129A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN111292853B (en) Multi-parameter-based cardiovascular disease risk prediction network model and construction method thereof
CN110970129B (en) Method for judging traditional Chinese medicine syndrome based on improved Bayesian statistics
CN111739634A (en) Method, device and equipment for intelligently grouping similar patients and storage medium
CN103678534A (en) Physiological information and health correlation acquisition method based on rough sets and fuzzy inference
CN109935337A (en) A kind of medical record lookup method and system based on similarity measurement
CN108133752A (en) A kind of optimization of medical symptom keyword extraction and recovery method and system based on TFIDF
CN115099149A (en) Result prediction method based on multiple feature comparison and random forest algorithm
Çinare et al. Determination of Covid-19 possible cases by using deep learning techniques
Manhar et al. A improving feature selection on heart disease dataset with Boruta approach
Anooj Implementing decision tree fuzzy rules in clinical decision support system after comparing with fuzzy based and neural network based systems
CN110610766A (en) Apparatus and storage medium for deriving probability of disease based on symptom feature weight
CN116564521A (en) Chronic disease risk assessment model establishment method, medium and system
Li et al. Covid-19 detection in chest radiograph based on yolo v5
Biswas et al. A belief rule base expert system for staging non-small cell lung cancer under uncertainty
Dai et al. Risk Prediction of Diabetes Based on Spark and Random Forest Algorithm
Wang Identification of Cardiovascular Diseases Based on Machine Learning
CN114242178A (en) Method for quantitatively predicting biological activity of ER alpha antagonist based on gradient lifting decision tree
Xu et al. Research on Liver Disease Diagnosis Based on RS_LMBP Neural Network
Malaysha et al. Classification and Prediction of Low-Density Lipoprotein Cholesterol LDL-C in The Palestinian Patients Using Machine Learning Techniques
Angayarkanni et al. Selection OF features associated with coronary artery diseases (cad) using feature selection techniques
Baihaqi et al. Review on fuzzy expert system and data mining techniques for the diagnosis of coronary artery disease
Rihana et al. Artificial intelligence framework for COVID19 patients monitoring
Barus et al. Implementation of the Naive Bayes Algorithm to Predict the Safety of Heart Failure Patients
Oleiwi Using the Fuzzy Logic to Find Optimal Centers of Clusters of K-means
Xie et al. Predicting the risk of stroke based on imbalanced data set with missing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant