CN110033862B - Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium

Publication number
CN110033862B
Authority
CN
China
Prior art keywords
weight
characteristic data
matrix
directed graph
sparse matrix
Prior art date
Legal status
Active
Application number
CN201910295314.1A
Other languages
Chinese (zh)
Other versions
CN110033862A (en)
Inventor
孙鑫亮
杨涛
章颖
李鑫欣
汪叶群
苏璐萍
高佳奕
于婧
Current Assignee
Nanjing University of Chinese Medicine
Original Assignee
Nanjing University of Chinese Medicine
Application filed by Nanjing University of Chinese Medicine
Priority to CN201910295314.1A
Publication of CN110033862A
Application granted
Publication of CN110033862B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph, and a storage medium, and belongs to the field of traditional Chinese medicine information processing. The system comprises: a weight calculation module for calculating the weights of feature data according to a predetermined strategy; a directed graph construction module for constructing a weighted directed graph according to the relation between the weights and the feature data; and an inference diagnosis module for performing inference on a case to be diagnosed by means of the directed graph construction module to obtain the result corresponding to the case. The system expresses the feature data of a complex pathogenesis intuitively as a directed graph and completes the dynamic construction of the pathogenesis through the weight relation between symptoms and syndrome types. Compared with the approach of exhaustively enumerating standard syndrome names used in existing intelligent syndrome differentiation systems, it better presents the evaluation result of the constructed traditional Chinese medicine model, has wide diagnostic adaptability and good accuracy, and can effectively improve the efficiency of traditional Chinese medicine diagnosis.

Description

Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium
Technical Field
The invention belongs to the field of traditional Chinese medicine information processing, and particularly relates to a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph and a storage medium.
Background
Treatment based on syndrome differentiation is the defining characteristic and essence of traditional Chinese medicine, and syndrome differentiation is the prerequisite for establishing a treatment method, prescribing and medicating. Syndrome differentiation is the process of collecting clinical information (symptoms and signs) through the four diagnostic methods (inspection, listening and smelling, inquiry, and palpation), then abstracting and summarizing it according to traditional Chinese medicine theory to arrive at the syndrome type. Because it depends on the experience of traditional Chinese medicine experts, syndrome differentiation is subjective, complex and fuzzy, which makes it difficult to quantify and reproduce and hinders the modernization of traditional Chinese medicine.
With the development of information technology, more and more new methods and techniques have been introduced into traditional Chinese medicine research. Artificial intelligence techniques such as knowledge engineering, machine learning and pattern recognition have gradually been applied to syndrome differentiation research, with some progress. However, most of these studies focus on identifying a single disease or a few syndrome types, and they cope poorly with clinically complicated disease changes. In clinical practice the patient's condition is complicated: a syndrome type rarely appears alone, and several syndrome types are often interwoven and overlapping, so that conventional artificial intelligence techniques cannot model and analyze them effectively. In addition, conventional analysis mostly represents clinical information in binary form ("1" indicates that a symptom is present and "0" that it is absent). Modeling on binary values ignores the weight of the clinical information, so a satisfactory result is difficult to obtain.
Chinese patent publication No. CN102298663A discloses a detection method for automatically identifying syndrome types in traditional Chinese medicine, comprising the following steps: establishing a standardized, objective traditional Chinese medicine case database; for the standardized sample database, calculating the mutual information and symmetrical uncertainty among all attributes by an attribute screening method based on cooperative relations, and selecting, based on a heuristic rule, the set of symptom attributes that contribute most to syndrome detection; constructing a classification training sample set from the selected key attribute set and the sample information in the case database, determining the decision attribute by calculating the information gain ratio of each attribute while controlling the lower limit of samples per node and recording the classification error, reading all training samples and quasi-training samples in an incremental learning manner, and finally obtaining classification rules; and identifying the syndrome type of a new sample using the obtained classification rules. However, that scheme only studies automatic syndrome differentiation for cirrhosis and offers no specific solution for extending it to the automatic differentiation of other syndrome types in traditional Chinese medicine.
Chinese patent publication No. CN104615894B discloses a traditional Chinese medicine diagnosis method and system based on k-nearest neighbor label-specific weighted features. The method comprises the following steps: acquiring the feature data weight information of cases under different categories according to a preset weight determination strategy; according to that weight information, calculating the weighted Euclidean distance between any two cases and selecting a preset number of cases with the smallest weighted Euclidean distance; and processing the selected cases with a k-nearest neighbor label-specific weighted feature multi-label learning method (ML-LSWAKNN) to obtain the evaluation indexes corresponding to the cases. The method takes the influence of feature weighting on classification into account and is stated to greatly improve classification precision.
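The weighted Euclidean distance mentioned for this prior-art scheme can be illustrated with a minimal sketch. The function name, the weight values and the two example cases below are hypothetical and are not taken from CN104615894B; the form sqrt(sum of w_i * (x_i - y_i)^2) is the common textbook definition and is assumed here.

```python
import math

def weighted_euclidean(x, y, w):
    """Common textbook form: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

# Hypothetical binary symptom vectors for two cases and per-feature weights.
case_a = [1, 0, 1, 0]
case_b = [1, 1, 0, 0]
weights = [0.5, 2.0, 1.0, 0.1]

# The cases differ in features 1 and 2, whose weights are 2.0 and 1.0.
distance = weighted_euclidean(case_a, case_b, weights)
# With all weights equal to 1 the ordinary Euclidean distance is recovered.
unweighted = weighted_euclidean(case_a, case_b, [1.0] * 4)
```

In this sketch a mismatch on a heavily weighted feature moves two cases further apart than a mismatch on a lightly weighted one, which is the effect feature weighting is meant to have on neighbor selection.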
The above scheme adopts the k-nearest neighbor (KNN) classification algorithm, one of the simplest methods in data mining classification. The idea is that each sample can be represented by its K nearest neighbors: if the majority of the K most similar samples in feature space (that is, its nearest neighbors in feature space) belong to a certain class, the sample is assigned to that class as well. The neighbors selected in KNN are all objects that have already been correctly classified, and the classification decision depends only on the class of the nearest sample or samples. K is the number of neighbors consulted when predicting a target point, and its value matters. If K is too small, noise has a large influence on the prediction; for example, with K equal to 1, a single noisy nearest point causes an error. A smaller K also means a more complex overall model that is prone to overfitting. If K is too large, the prediction is made from training examples in a larger neighborhood and the approximation error of learning increases, because instances far from the input target point also contribute to the prediction and can make it wrong. Therefore, although weights can be attached, the optimal value of K is difficult to determine and the final result is not accurate enough.
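The sensitivity to K described above can be seen in a small majority-vote sketch. The training points, the labels and the function name `knn_predict` are hypothetical and chosen only to make the effect visible; this is not the ML-LSWAKNN method of the cited patent.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to `query`.

    `train` is a list of (vector, label) pairs. Squared Euclidean distance
    is used, which yields the same neighbor ranking as Euclidean distance.
    """
    ranked = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D samples: one mislabeled ("noise") point sits inside class "A".
train = [
    ((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
    ((0.15, 0.15), "B"),  # noise point
    ((2.0, 2.0), "B"), ((2.1, 1.9), "B"), ((1.9, 2.2), "B"),
]
query = (0.14, 0.16)

pred_k1 = knn_predict(train, query, k=1)  # decided by the single noise point
pred_k3 = knn_predict(train, query, k=3)  # noise is outvoted by two "A" neighbors
pred_k7 = knn_predict(train, query, k=7)  # every sample votes, distant "B" points dominate
```

With these assumed points, K = 1 follows the noise, K = 3 recovers the surrounding class, and K = 7 lets distant samples decide, matching the two failure modes described in the text.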
The main task of similarity calculation is to measure how similar objects are; it is a basic computation in information retrieval, recommendation systems, data mining and the like. The Euclidean metric (also known as Euclidean distance) used in the KNN classification algorithm is a commonly used distance definition: it is the true distance between two points in an m-dimensional space, or the natural length of a vector (that is, the distance from the point to the origin). In two-dimensional and three-dimensional space the Euclidean distance is the actual distance between two points, which is intuitive and easy to understand, but in high-dimensional space it often performs poorly. The traditional Chinese medicine symptom space is a typical high-dimensional space with hundreds of symptoms, each represented by 0 (the symptom is absent) or 1 (the symptom is present). Assume that syndrome type 1 corresponds to the symptom vector [0,0,0,0,0,0,0,0,1] and syndrome type 2 to [1,1,0,0,0,0,0], so that the Euclidean distance between the two is S12 = 2; syndrome type 3 corresponds to the symptom vector [1,1,1,1,1,1,1, 0] and syndrome type 4 to [0,0,0,1,1,1,1], and the Euclidean distance between the two is S34:
[Equation image BDA0002026289760000021: value of S34]
S34 is clearly greater than S12, so by distance syndrome type 1 and syndrome type 2 should be the more similar pair. In fact syndrome type 1 and syndrome type 2 have no correlation at all, whereas syndrome type 3 and syndrome type 4 share several symptoms and are the more similar pair. The cause is that samples in the symptom space contain few positive symptoms (symptoms labeled 1) and many negative symptoms (symptoms labeled 0), so the similarity calculation is dominated by the negative symptoms, which degrades the model.
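The effect can be reproduced with a short sketch. The four 8-dimensional vectors below are hypothetical and are not the vectors quoted above; they are chosen so that the pair with no shared symptom ends up closer in Euclidean distance than the pair sharing four symptoms.

```python
import math

def euclidean(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def shared_positives(u, v):
    """Number of symptoms marked 1 in both vectors."""
    return sum(a & b for a, b in zip(u, v))

# Hypothetical 8-dimensional binary symptom vectors.
sparse_1 = [0, 0, 0, 0, 0, 0, 0, 1]
sparse_2 = [1, 1, 0, 0, 0, 0, 0, 0]
dense_3 = [1, 1, 1, 1, 1, 1, 1, 0]
dense_4 = [0, 0, 0, 1, 1, 1, 1, 1]

d12 = euclidean(sparse_1, sparse_2)  # sqrt(3): three positions differ, none shared
d34 = euclidean(dense_3, dense_4)    # sqrt(4): four positions differ, four shared

overlap_12 = shared_positives(sparse_1, sparse_2)
overlap_34 = shared_positives(dense_3, dense_4)
```

Here `d34` exceeds `d12` even though only the second pair has any symptom in common, because positions where both vectors are 0 count as agreement.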
Disclosure of Invention
1. Problems to be solved
In clinical traditional Chinese medicine, syndrome types rarely appear alone and are often interwoven, and the data mining techniques of the prior art cannot model and analyze them together. To address these problems, the invention provides a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph, which expresses the feature data of a complex pathogenesis intuitively as a directed graph and completes the dynamic construction of the pathogenesis through the weight relation between symptoms and syndrome types.
2. Technical scheme
In a first aspect, the present invention provides a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph, comprising: a weight calculation module for calculating the weights of feature data according to a predetermined strategy; a directed graph construction module for constructing a weighted directed graph according to the relation between the weights and the feature data; and an inference diagnosis module for performing inference on a case to be diagnosed by means of the directed graph construction module to obtain the result corresponding to the case.
Further, the predetermined strategy comprises a mutual information calculation method, a confidence calculation method and an information entropy calculation method.
Further, the weight calculation module comprises a feature data matrix construction submodule, a feature data correlation determination submodule and a feature data weight acquisition submodule, wherein:
the feature data matrix construction submodule is used for converting the feature data into sparse matrices;
the feature data correlation determination submodule is used for calculating the correlation degree of the feature data according to the sparse matrices and the predetermined strategy;
the feature data weight acquisition submodule is used for standardizing the correlation degree of the feature data to obtain the weights of the feature data.
Further, the data processing of the feature data matrix construction submodule is as follows:
(1) constructing a sparse matrix A and a sparse matrix B, one for each type of feature data;
[Image BDA0002026289760000031: sparse matrices A and B]
(2) taking a single column of elements from the sparse matrix A and a single column of elements from the sparse matrix B and performing an AND operation on them to obtain a matrix Ci;
[Image BDA0002026289760000032: matrix Ci]
where m denotes the number of columns in the matrix and n denotes the number of rows in the matrix.
Further, the feature data correlation determination submodule is used for calculating the correlation degree of the feature data from the sparse matrix A, the sparse matrix B and the matrix Ci according to the mutual information calculation method, specifically:
[Equation (1): image BDA0002026289760000041]
[Equation (2): image BDA0002026289760000042]
p(x, y) = c_{i=x, n=y}    (3)
[Equation (4): image BDA0002026289760000043]
wherein x represents a symptom and y represents a syndrome type; p(x) represents the probability that the item a_mn of the sparse matrix A appears in its column, a_mn being an element of the sparse matrix A with value 0 or 1; p(y) represents the probability that the item b_mn of the sparse matrix B appears in its column, b_mn being an element of the sparse matrix B with value 0 or 1; p(x, y) represents the probability that c_mn of the matrix C occurs; PMI(x, y) is the pointwise mutual information of x and y, which reflects how often the corresponding elements of the sparse matrix A and the sparse matrix B occur at the same time; m represents the number of columns in the matrix and n represents the number of rows in the matrix.
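The formulas for p(x), p(y) and PMI(x, y) survive only as image references in this text. Under the definitions given above, one plausible reading is the standard pointwise mutual information estimated from column frequencies. The index i over the n cases, the use of column sums as probability estimates and the unspecified logarithm base are all assumptions of this sketch and are not confirmed by the source.

```latex
% Assumed reading: a_{i,x}, b_{i,y} \in \{0,1\} are the entries of A and B
% for case i, and n is the number of cases (rows).
\begin{align}
  p(x)    &= \frac{1}{n}\sum_{i=1}^{n} a_{i,x}, \\
  p(y)    &= \frac{1}{n}\sum_{i=1}^{n} b_{i,y}, \\
  p(x, y) &= \frac{1}{n}\sum_{i=1}^{n} \bigl(a_{i,x} \wedge b_{i,y}\bigr), \\
  \mathrm{PMI}(x, y) &= \log \frac{p(x, y)}{p(x)\, p(y)} .
\end{align}
```

Under this reading, a symptom present in 4 of 8 cases and a syndrome type present in 2 of 8 cases that co-occur in 2 cases give p(x) = 0.5, p(y) = 0.25, p(x, y) = 0.25 and PMI(x, y) = log 2.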
Further, the processing of the feature data weight acquisition submodule is as follows:
obtaining the correlation degree of the feature data:
[Image BDA0002026289760000044: correlation degree of the feature data]
calculating the weights of the feature data: WF = (wf1, wf2, ..., wfk, ..., wfn), where
[Image BDA0002026289760000045: formula for the weight wfk]
Further, a directed graph is formed from triples according to the sparse matrix A, the sparse matrix B and the weights of the feature data.
Furthermore, the inference on the case to be diagnosed through the directed graph construction module specifically comprises the following steps:
matching the case to be diagnosed against the feature data to obtain the weights corresponding to the feature data;
performing a weighted summation of the weights corresponding to the feature data to obtain reference results and their corresponding weight sums;
sorting the weight sums in descending order and discarding reference results according to a threshold to obtain the optimal result corresponding to the case.
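The three inference steps above can be sketched as follows. The graph is represented as a hypothetical dictionary of symptom-to-syndrome edge weights; the symptom and syndrome names, the weight values, and the rule "keep results whose weight sum is at least the threshold" are assumptions of this sketch rather than details given in the source.

```python
def infer(case_symptoms, edges, threshold):
    """Sum edge weights per syndrome type, sort descending, apply a threshold.

    `edges` maps (symptom, syndrome) pairs to weights, i.e. the weighted
    directed edges symptom -> syndrome.
    """
    totals = {}
    for (symptom, syndrome), weight in edges.items():
        if symptom in case_symptoms:
            totals[syndrome] = totals.get(syndrome, 0.0) + weight
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Assumed threshold rule: discard results whose weight sum is below it.
    return [(syndrome, total) for syndrome, total in ranked if total >= threshold]

# Hypothetical graph and case; names and weights are illustrative only.
edges = {
    ("fatigue", "qi deficiency"): 0.8,
    ("shortness of breath", "qi deficiency"): 0.6,
    ("fatigue", "blood deficiency"): 0.3,
    ("pale complexion", "blood deficiency"): 0.7,
    ("night sweats", "yin deficiency"): 0.9,
}
case = {"fatigue", "shortness of breath"}

result = infer(case, edges, threshold=0.5)    # only the high-scoring syndrome type remains
all_ranked = infer(case, edges, threshold=0.0)  # full descending ranking
```

Returning the ranked list rather than a single label keeps several co-occurring syndrome types visible, which fits the stated goal of handling interwoven syndrome types.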
In a second aspect, the present invention provides a computer-readable storage medium storing the traditional Chinese medicine quantitative diagnosis system according to any one of the above.
3. Advantageous effects
Compared with the prior art, the invention has the beneficial effects that:
(1) the invention provides a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph, which expresses the feature data of a complex pathogenesis intuitively as a directed graph and completes the dynamic construction of the pathogenesis through the weight relation between symptoms and syndrome types; compared with the approach of exhaustively enumerating standard syndrome names used in existing intelligent syndrome differentiation systems, it better presents the evaluation result of the constructed traditional Chinese medicine model, has wide diagnostic adaptability and good accuracy, and can effectively improve the efficiency of traditional Chinese medicine diagnosis;
(2) the invention normalizes the weights of the feature data, converting the original data into dimensionless index values so that all index values are of the same order of magnitude and every feature contributes to the result on an equal footing; this allows comprehensive evaluation and analysis and improves calculation accuracy;
(3) the invention matches the case to be diagnosed against the feature data to obtain the corresponding weights, performs a weighted summation of those weights to obtain reference results and their weight sums, sorts the weight sums in descending order, discards reference results according to a threshold to obtain the optimal result for the case, and outputs the results in sorted order, so that the patient or doctor can conveniently identify and treat the syndrome types with high probability;
(4) the weighted directed graph is constructed from weight information calculated by the mutual information method and follows the diagnostic reasoning of traditional Chinese medicine, so that the experience of traditional Chinese medicine experts can be better extracted and modeled;
(5) the system provided by the invention has the advantages of simple structure, reasonable design and easy use.
Drawings
FIG. 1 is a diagram illustrating a quantitative diagnosis system according to the present invention;
FIG. 2 is a directed graph constructed by the quantitative diagnosis system of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present invention, and therefore are only used as examples, and the protection scope of the present invention is not limited thereby. It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the present invention belongs.
In the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
In particular implementations, the terminals described in embodiments of the present invention include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device, but is a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
Example 1
The present embodiment provides a traditional Chinese medicine quantitative diagnosis system based on a weighted directed graph, as shown in FIG. 1, comprising: a weight calculation module for calculating the weights of feature data according to a predetermined strategy; a directed graph construction module for constructing a weighted directed graph according to the relation between the weights and the feature data; and an inference diagnosis module for performing inference on a case to be diagnosed by means of the directed graph construction module to obtain the result corresponding to the case. The system expresses the feature data of a complex pathogenesis intuitively as a directed graph and completes the dynamic construction of the pathogenesis through the weight relation between symptoms and syndrome types. Compared with the approach of exhaustively enumerating standard syndrome names used in existing intelligent syndrome differentiation systems, it better presents the evaluation result of the constructed traditional Chinese medicine model, has wide diagnostic adaptability and good accuracy, and can effectively improve the efficiency of traditional Chinese medicine diagnosis.
The weight calculation module calculates the weights of the feature data according to a predetermined strategy.
The feature data are the syndrome types and symptoms of traditional Chinese medicine. A symptom is a subjective abnormal feeling or an objective pathological change experienced by a patient as a result of abnormal changes in function, metabolism and morphological structure during the course of a disease. Clinically common important symptoms include fever, pain, weight change, edema, dyspnea, cough, expectoration, hemoptysis, anorexia, dyspepsia, dysphagia, nausea and vomiting, hematemesis, hematochezia, jaundice, abnormal urination, anemia, shock and the like. Syndrome type is a term specific to traditional Chinese medicine: it is a generalization of the pathological attributes at a certain stage of the disease process. In traditional Chinese medicine the human body is described in terms of yin and yang, qi and blood, and etiology in terms of wind, cold, summer-heat, dampness, dryness, heat and phlegm, deficiency and excess, and the like. A syndrome type is a disease state of the human body produced by changes of yin, yang, qi and blood arising from different causes. The feature data can be stored in formats such as Excel, CSV and TXT. The predetermined strategy comprises a mutual information calculation method, a confidence calculation method and an information entropy calculation method. The mutual information calculation method is used in this example, which is not intended to limit the present invention.
Specifically, the weight calculation module comprises a feature data matrix construction submodule, a feature data correlation determination submodule and a feature data weight acquisition submodule. The feature data matrix construction submodule is used for converting the feature data into sparse matrices; the feature data correlation determination submodule is used for calculating the correlation degree of the feature data according to the sparse matrices and the predetermined strategy; the feature data weight acquisition submodule is used for standardizing the correlation degree of the feature data to obtain the weights of the feature data.
Further, the data processing of the feature data matrix construction submodule is as follows:
(1) constructing a sparse matrix A and a sparse matrix B, one for each type of feature data. Specifically, the feature data are divided into symptoms and syndrome types and each group is converted into matrix form. The elements of the sparse matrix A are a_mn and the elements of the sparse matrix B are b_mn, where m represents the number of columns in the matrix and n represents the number of rows. Every value in the matrices is 0 or 1, each column is named after a symptom or a syndrome type, 1 indicates that the symptom or syndrome type is present in the case, and 0 indicates that it is absent.
[Image BDA0002026289760000071: sparse matrices A and B]
(2) taking a single column of elements from the sparse matrix A and a single column of elements from the sparse matrix B and performing an AND operation on them to obtain a matrix Ci.
[Image BDA0002026289760000072: matrix Ci]
The elements of the newly constructed matrix Ci are c_mn, which are also 0 or 1. Because Ci results purely from the operation on the sparse matrices A and B, it needs no column names.
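The two construction steps can be sketched in a few lines. The case records, the symptom and syndrome names and the helper names `one_hot` and `and_columns` are hypothetical, and plain nested lists stand in for a sparse matrix type to keep the example dependency-free. Matrix A is assumed to hold symptoms and matrix B syndrome types, with one row per case.

```python
# Hypothetical cases: each lists observed symptoms and diagnosed syndrome types.
cases = [
    {"symptoms": {"fatigue", "cough"}, "syndromes": {"qi deficiency"}},
    {"symptoms": {"fatigue"}, "syndromes": {"qi deficiency", "blood deficiency"}},
    {"symptoms": {"cough", "fever"}, "syndromes": {"wind-heat"}},
]

def one_hot(cases, key):
    """Return (column_names, rows); rows[i][j] is 1 if case i has column j."""
    columns = sorted(set().union(*(c[key] for c in cases)))
    rows = [[1 if name in c[key] else 0 for name in columns] for c in cases]
    return columns, rows

symptom_names, A = one_hot(cases, "symptoms")    # one row per case, one column per symptom
syndrome_names, B = one_hot(cases, "syndromes")  # one row per case, one column per syndrome type

def and_columns(A, B, x, y):
    """Element-wise AND of column x of A with column y of B."""
    return [row_a[x] & row_b[y] for row_a, row_b in zip(A, B)]

x = symptom_names.index("fatigue")
y = syndrome_names.index("qi deficiency")
C_i = and_columns(A, B, x, y)  # 1 where the case has both the symptom and the syndrome type
```

Each entry of `C_i` marks a case in which the chosen symptom and syndrome type occur together, which is the co-occurrence count that a mutual information calculation needs.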
The feature data correlation determination submodule is used for calculating the correlation degree of the feature data from the sparse matrix A, the sparse matrix B and the matrix Ci according to the mutual information calculation method, specifically:
[Equation (1): image BDA0002026289760000073]
[Equation (2): image BDA0002026289760000074]
p(x, y) = c_{i=x, n=y}    (3)
[Equation (4): image BDA0002026289760000081]
wherein x represents a symptom and y represents a syndrome type; p(x) represents the probability that the item a_mn of the sparse matrix A appears in its column, a_mn being an element of the sparse matrix A with value 0 or 1; p(y) represents the probability that the item b_mn of the sparse matrix B appears in its column, b_mn being an element of the sparse matrix B with value 0 or 1; p(x, y) represents the probability that c_mn of the matrix C occurs; PMI(x, y) is the pointwise mutual information of x and y, which reflects how often the corresponding elements of the sparse matrix A and the sparse matrix B occur at the same time; m represents the number of columns in the matrix and n represents the number of rows in the matrix. It should be noted that mutual information is a useful information measure in information theory: it can be regarded as the amount of information that one random variable contains about another random variable, or as the reduction in uncertainty of one random variable that results from knowing another.
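A minimal sketch of the correlation calculation, assuming the standard pointwise mutual information log(p(x, y) / (p(x) p(y))) with probabilities estimated as column frequencies. The natural logarithm, the function name `pmi`, the choice to return None for pairs that never co-occur, and the example columns are assumptions of this sketch, not details confirmed by the source.

```python
import math

def pmi(col_x, col_y):
    """Pointwise mutual information of two equal-length 0/1 columns.

    Returns None when the pair never co-occurs, since log(0) is undefined.
    """
    n = len(col_x)
    p_x = sum(col_x) / n
    p_y = sum(col_y) / n
    p_xy = sum(a & b for a, b in zip(col_x, col_y)) / n
    if p_xy == 0:
        return None
    return math.log(p_xy / (p_x * p_y))

# Hypothetical columns over 8 cases: a symptom column from A and a syndrome column from B.
symptom_col = [1, 1, 1, 1, 0, 0, 0, 0]
syndrome_col = [1, 1, 0, 0, 0, 0, 0, 0]

# p(x) = 0.5, p(y) = 0.25, p(x, y) = 0.25, so the ratio is 2.
score = pmi(symptom_col, syndrome_col)
never = pmi([1, 0, 0, 0], [0, 1, 0, 0])
```

A positive score means the symptom and the syndrome type co-occur more often than independence would predict; how pairs with no co-occurrence should be treated is left open by the source.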
The feature data weight obtaining sub-module is specifically configured to obtain a correlation degree of the feature data:
Figure BDA0002026289760000082
calculating the weight of the feature data, WF = (wf_1, wf_2, ..., wf_k, ..., wf_n), wherein,
Figure BDA0002026289760000083
In a multi-index evaluation system, the evaluation indexes differ in nature and generally have different dimensions and orders of magnitude. When the levels of the indexes differ greatly, analyzing the original index values directly overemphasizes the indexes with larger values and relatively weakens the indexes with smaller values. Therefore, in this embodiment the weights of the feature data are normalized: the original data are converted into dimensionless evaluation values so that all index values are on the same scale and each feature contributes comparably to the result. Comprehensive evaluation and analysis can then be performed, which improves the calculation accuracy.
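A minimal sketch of such a normalization is given below. The normalization formula itself appears only in the figure, so the scheme used here (dividing each non-negative correlation degree by the sum over all features) is an assumption chosen for illustration, as is the function name `normalize_weights`.

```python
def normalize_weights(relevance):
    """Scale non-negative relevance scores so that they sum to 1.

    This sum-normalization is an assumed scheme, not necessarily the one
    shown in the figure.
    """
    total = sum(relevance.values())
    if total <= 0:
        raise ValueError("relevance scores must have a positive sum")
    return {name: value / total for name, value in relevance.items()}

# Hypothetical correlation degrees for two symptoms of one syndrome type.
weights = normalize_weights({"palpitation": 1.0, "chest distress": 3.0})
```

After this step every weight lies between 0 and 1, so features measured on different scales become directly comparable.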
And the directed graph constructing module is used for constructing a weighted directed graph according to the relation between the weight and the feature data.
Further, a directed graph is formed by utilizing the triples according to the sparse matrix A, the sparse matrix B and the weight of the feature data.
It should be further noted that a directed graph is a graph-theoretic model whose core lies in determining the direction and weight of each edge. In the prior art, the direction of an edge is mostly set manually and its weight is determined by a statistical learning method. In the field of traditional Chinese medicine diagnosis, edge directions are likewise set manually and edge weights are mostly frequencies or conditional probabilities; in this embodiment, pointwise mutual information is used to calculate the weights. The directed graph is constructed as follows: triples are built from the weight information among the different feature data, namely sparse matrix A, sparse matrix B and the weights of the elements in sparse matrix A and sparse matrix B, to form a weighted directed graph.
Specifically, each symptom ZZ is represented by a different vector, for example ZZ = [ZX, WF], and the structure of the triple-based directed graph is represented as {ZZ, ZX, WF}, where ZX represents the syndrome type and WF represents the weight.
As shown in fig. 2, a directed graph is constructed in which ZX(1), ZX(2) and ZX(3) represent three different syndromes and ZZ(1), ZZ(2), ..., ZZ(k) represent the k symptoms of ZX(1). ZZ(1) is a symptom common to ZX(1), ZX(2) and ZX(3); ZZ(2) is a symptom common to ZX(1) and ZX(2); and ZZ(3), ZZ(4), ..., ZZ(k) are symptoms unique to ZX(1). It can be seen from the graph that when a symptom appears in more syndromes, its weight for a given syndrome should be smaller than that of the other symptoms in the syndrome; and when a syndrome has fewer symptoms in total, the weight of a symptom in that syndrome should be greater than the weight of a symptom in other syndromes.
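One way to hold such a graph in memory is a nested mapping built from the triples, sketched below. The node names, the weights, the edge direction (symptom to syndrome) and the helper `build_graph` are hypothetical choices for illustration only.

```python
from collections import defaultdict

# Hypothetical (symptom, syndrome, weight) triples.
triples = [
    ("ZZ1", "ZX1", 0.10), ("ZZ1", "ZX2", 0.25), ("ZZ1", "ZX3", 0.30),
    ("ZZ2", "ZX1", 0.20), ("ZZ2", "ZX2", 0.40),
    ("ZZ3", "ZX1", 0.45),
]

def build_graph(triples):
    """Adjacency map: graph[symptom][syndrome] = weight of edge symptom -> syndrome."""
    graph = defaultdict(dict)
    for symptom, syndrome, weight in triples:
        graph[symptom][syndrome] = weight
    return dict(graph)

graph = build_graph(triples)
# Out-degree of a symptom = number of syndromes it is shared by.
shared_by = {symptom: len(edges) for symptom, edges in graph.items()}
```

In this toy data the shared symptom ZZ1 carries a smaller weight toward ZX1 than the unique symptom ZZ3, mirroring the qualitative behavior described for fig. 2.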
The reasoning diagnosis module matches the case to be examined against the characteristic data to obtain the weights corresponding to the characteristic data; performs a weighted summation over those weights to obtain the reference results and their corresponding weight sums; sorts the weight sums in descending order; and discards reference results using a threshold to obtain the optimal result for the case.
Specifically, as shown in fig. 2, the symptom information extracted from the medical record of a patient includes the symptoms ZZ(1), ZZ(2), ZZ(3), ZZ(4), ..., ZZ(k). The category of each symptom is determined by a binary search algorithm to obtain the corresponding weight; binary search is a common technical means in the field and is not described further here. The probability that the patient has syndrome ZX is then:
P(ZX) = sum(WF(xk)) = WF(11) + WF(12) + WF(13) + WF(14) + WF(15) + ... + WF(1k). The weight sums of the syndromes are sorted in descending order and a threshold is applied to discard reference results: syndromes ranked before the median are kept and syndromes ranked after the median are discarded, giving the optimal result for the case. Here WF(xk) (k = 1, 2, 3, ..., n) is the weight of the feature data, and a larger weight indicates a higher contribution to the result.
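The summation, descending sort and median cutoff can be sketched as follows. The graph contents and the function names are hypothetical, and keeping a syndrome whose sum equals the median is an assumption, since the text does not say how ties at the threshold are handled.

```python
from statistics import median

# Hypothetical weighted graph: graph[symptom][syndrome] = weight.
graph = {
    "a": {"X": 0.5, "Y": 0.25},
    "b": {"X": 0.25, "Z": 0.125},
}

def rank_syndromes(graph, symptoms):
    """Sum edge weights per syndrome over the observed symptoms, sorted descending."""
    totals = {}
    for symptom in symptoms:
        for syndrome, weight in graph.get(symptom, {}).items():
            totals[syndrome] = totals.get(syndrome, 0.0) + weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def keep_above_median(ranked):
    """Keep syndromes whose weight sum is at least the median (tie rule assumed)."""
    if not ranked:
        return []
    cutoff = median(score for _, score in ranked)
    return [(name, score) for name, score in ranked if score >= cutoff]

ranked = rank_syndromes(graph, ["a", "b"])
kept = keep_above_median(ranked)
```

Symptoms that are absent from the graph simply contribute nothing, so an unseen symptom does not cause an error in this sketch.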
The embodiment uses a directed graph to express visually the characteristic data in a complicated pathogenesis and, on the basis of the directed-graph definition of the pathogenesis, completes the dynamic construction of the pathogenesis through the weight relation between symptoms and syndrome types. The case to be examined is matched against the characteristic data to obtain the corresponding weights; a weighted summation over those weights gives the reference results and their corresponding weight sums; the weight sums are sorted in descending order and reference results are discarded using a threshold to obtain the optimal result for the case.
It should be noted that the syndrome-type explosion problem arises as follows. The medical design of a computer system for syndrome differentiation in traditional Chinese medicine determined 48 basic syndrome differentiation contents (i.e. syndrome differentiation elements) and stored 1500 standard syndrome name patterns (i.e. composite syndrome types). That research adopted methods such as "threshold adjustment and compatibility" and applied fuzzy clustering analysis from fuzzy mathematics, for example spatial measurement methods and dimension-reducing (or dimension-increasing) transformations, to the 48 model elements (i.e. standard syndrome type patterns), forming more than 500 syndrome name models. In an intelligent syndrome differentiation model based on pattern matching, the standard syndrome type patterns must be determined.
Theoretically, all permutations and combinations of the 48 basic syndrome differentiation elements would certainly cover every clinical condition. On the one hand, however, the number of combinations (2^48) is astronomical, so enumerating all of them is impossible; on the other hand, syndrome differentiation elements cannot be combined arbitrarily in clinical practice: combinations such as {fire, yang deficiency} or {external wind, cold, fire} cannot form a syndrome type under the theoretical system of traditional Chinese medicine. This is the syndrome-type explosion problem common in the current field of intelligent syndrome differentiation in traditional Chinese medicine.
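For a sense of scale, the number of subsets of 48 elements can be checked directly. The snippet is only an arithmetic illustration; the variable names are arbitrary.

```python
n_elements = 48
# Each element is either included in or excluded from a combination.
n_combinations = 2 ** n_elements
print(f"{n_combinations:,}")  # prints 281,474,976,710,656
```

That is roughly 2.8 x 10^14 candidate combinations, before any clinically impossible ones are excluded.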
As shown in fig. 1, the present embodiment further provides a directed graph quantitative diagnosis method for chinese medicine, which includes the following detailed steps:
Step 1: divide the original data into training data train_data and test data test_data according to the ten-fold cross-validation method;
Step 2: for each label l in the label vector L, execute steps 3 to 5;
Step 3: according to the weight determination method, calculate the pointwise mutual information of each feature from train_data, then normalize it to obtain the weight information of each feature;
Step 4: over all of test_data, calculate according to formula (1) the weight between each unknown instance in test_data and the instances in train_data, and select the K instances N(K) with the largest weights;
Step 5: end for.
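The data split in the first step can be sketched with the standard library alone. This is a generic ten-fold partition under assumed details (shuffling with a fixed seed, round-robin fold assignment, the name `k_fold_splits`); it does not reproduce the later steps.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle (assumption)
    folds = [indices[i::k] for i in range(k)]     # round-robin assignment to k folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_splits(25))
```

Each of the ten iterations would then compute feature weights on `train_idx` only and evaluate on `test_idx`, so no test record influences the weights used to diagnose it.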
To illustrate the effectiveness of the method, a related experiment was performed in which 418 coronary heart disease records were selected as the study subjects and used as samples to train the model. The system automatically computes the resulting triples (symptom, syndrome, weight). To facilitate loading and calling, the back end converts them into JSON format for storage, namely {syndrome: {symptom 1: weight 1, symptom 2: weight 2, ...}}. The syndromes heart-qi deficiency and lung-yin deficiency are taken as examples below: {'Heart-Qi deficiency': {'cardiopalmus': 0.17970033096330984, 'chest distress': 0.1426162709006604, 'dizziness': 0.09675323697280372, 'headache': 0.06133919561179082, 'throat pain': 0.54120158784469415, 'waist soreness': 0.049497041840412495, 'sunken edema of both lower limbs': 0.04621506196625556, 'hypodynamia': 0.04621232129119378, 'anorexia': 0.040777500785163054, 'night sleep lack-ease': 0.030477793526566727, 'frequent urination': 0.027262711151846412, 'constipation': 0.024742634905585082, 'stool regulation': 0.023804154276122487}}.
{'deficiency of lung yin': {'cardiopalmus': 0.14956935140308092, 'chest distress': 0.12866125182845342, 'dizziness': 0.08022981081402662, 'headache': 0.05275585322018142, 'sore throat': 0.052129023680618475, 'soreness of waist': 0.04847449413050493, 'sunken edema of both lower limbs': 0.04621506196625556, 'lack of strength': 0.041422491758973445, 'poor appetite': 0.032739345289794476, 'poor night sleep': 0.028341493055174725, 'frequent urination': 0.024742634905585082, 'constipation': 0.023804154276122487, 'stool regulation': 0.023528103602116983}}.
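Storing and reloading weights in this nested layout can be sketched with the standard `json` module. The rounded weights and the `lookup` helper below are hypothetical and serve only to show the {syndrome: {symptom: weight}} structure.

```python
import json

# Hypothetical rounded weights in the {syndrome: {symptom: weight}} layout.
weights = {
    "heart-qi deficiency": {"palpitation": 0.18, "chest distress": 0.14},
    "lung-yin deficiency": {"palpitation": 0.15, "sore throat": 0.05},
}

# ensure_ascii=False keeps non-ASCII symptom names readable in the stored text.
text = json.dumps(weights, ensure_ascii=False)
loaded = json.loads(text)

def lookup(weights, syndrome, symptom):
    """Return the stored weight, or 0.0 when the pair is not in the graph."""
    return weights.get(syndrome, {}).get(symptom, 0.0)

w = lookup(loaded, "heart-qi deficiency", "palpitation")
```

Note that standard JSON requires double-quoted keys and a colon after every key, so the single-quoted listing above would need that conversion before a strict parser accepts it.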
Testing the effectiveness of the model:
Ten percent of the samples were randomly drawn for testing, and the random sampling was repeated 10 times. Each calculated result was compared with the originally labeled diagnosis and marked 1 if the two agreed and 0 otherwise. The one-error, coverage, ranking loss, average precision and Hamming loss of the model were then calculated to evaluate its effect.
The relevant indexes are defined as follows:
One-error (One Error, OE ↓) examines how often the top-ranked label in a sample's ranked label sequence does not belong to the sample's label set. The index is expressed as follows:
Figure BDA0002026289760000111
wherein H(x_i) = 1 if the top-ranked label does not belong to the label set of sample x_i, and H(x_i) = 0 otherwise.
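A small sketch of one-error under its usual multi-label definition is shown below; it is assumed, not confirmed, that the figure uses the same form. Scores are given as a label-to-score mapping per sample, and the function name `one_error` is illustrative.

```python
def one_error(scores, truths):
    """Fraction of samples whose top-scoring label is not one of the true labels."""
    misses = 0
    for sample_scores, true_labels in zip(scores, truths):
        top = max(sample_scores, key=sample_scores.get)   # highest-ranked label
        misses += top not in true_labels
    return misses / len(scores)

scores = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
truths = [{"a"}, {"a"}]
oe = one_error(scores, truths)
```

Here the first sample's top label is correct and the second's is not, so half of the samples count as errors.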
Coverage (Coverage ↓) examines the search depth required, in a sample's ranked label sequence, to cover all labels that belong to the sample. The index is expressed as follows:
Figure BDA0002026289760000112
wherein C(x_i) = {l | f(x_i, l) ≥ f(x_i, l_i'), l ∈ y}, and
Figure BDA0002026289760000113
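Coverage can be sketched by counting the labels that score at least as high as the weakest true label, following the set C(x_i) described above. Subtracting 1 from the count is the common convention and is an assumption here, as is the function name `coverage`; every sample is assumed to have at least one true label.

```python
def coverage(scores, truths):
    """Average number of extra ranking steps needed to reach every true label."""
    total = 0
    for sample_scores, true_labels in zip(scores, truths):
        lowest = min(sample_scores[label] for label in true_labels)  # weakest true label
        # Size of C(x_i): labels scoring at least as high as the weakest true label.
        depth = sum(1 for value in sample_scores.values() if value >= lowest)
        total += depth - 1
    return total / len(scores)

scores = [{"a": 0.9, "b": 0.5, "c": 0.1}, {"a": 0.9, "b": 0.5, "c": 0.1}]
truths = [{"a", "c"}, {"a"}]
cov = coverage(scores, truths)
```

The first sample needs the whole ranking to reach label c (depth 3, contributing 2), while the second is covered by the top label alone (contributing 0).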
Ranking Loss (RL ↓) examines how often a ranking error occurs in a sample's ranked label sequence. The index is expressed as follows:
Figure BDA0002026289760000114
wherein
Figure BDA0002026289760000115
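The sketch below uses the usual pairwise definition of ranking loss: the share of (true label, other label) pairs in which the true label does not score strictly higher. Whether the figure counts ties as errors is not recoverable from the text, so that choice and the name `ranking_loss` are assumptions.

```python
def ranking_loss(scores, truths):
    """Average fraction of (true, non-true) label pairs that are ranked in the wrong order."""
    total = 0.0
    for sample_scores, true_labels in zip(scores, truths):
        positives = [label for label in sample_scores if label in true_labels]
        negatives = [label for label in sample_scores if label not in true_labels]
        if not positives or not negatives:
            continue  # no pairs to compare; the sample contributes 0
        wrong = sum(
            1 for p in positives for q in negatives
            if sample_scores[p] <= sample_scores[q]   # ties counted as errors (assumption)
        )
        total += wrong / (len(positives) * len(negatives))
    return total / len(scores)

rl = ranking_loss([{"a": 0.9, "b": 0.5, "c": 0.1}], [{"a", "c"}])
```

In the example, label c is a true label but ranks below the non-true label b, so one of the two pairs is misordered.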
Average Precision (AVP ↑) examines, in a sample's ranked label sequence, how often the labels ranked ahead of a label belonging to the sample also belong to the sample's label set.
Figure BDA0002026289760000116
Wherein
Figure BDA0002026289760000117
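Average precision is sketched below in its usual multi-label form: for every true label, take the fraction of true labels among all labels ranked at or above it, then average. The function name and the assumption that each sample has at least one true label are illustrative choices.

```python
def average_precision(scores, truths):
    """Mean over samples of the average precision at each true label's rank."""
    total = 0.0
    for sample_scores, true_labels in zip(scores, truths):
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        hits, accumulated = 0, 0.0
        for position, label in enumerate(ranked, start=1):
            if label in true_labels:
                hits += 1
                accumulated += hits / position   # precision at this true label's rank
        total += accumulated / len(true_labels)
    return total / len(scores)

avp = average_precision([{"a": 0.9, "b": 0.5, "c": 0.1}], [{"a", "c"}])
```

For the example, label a sits at rank 1 (precision 1/1) and label c at rank 3 (precision 2/3), giving an average of 5/6.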
Hamming Loss (HL ↓) examines the misclassification of a sample on individual labels, that is, labels that belong to the sample but do not appear in the predicted label set, and labels that do not belong to the sample but appear in the predicted label set.
Figure BDA0002026289760000118
wherein Q is the total number of labels and h(x_i) is the classification result.
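Hamming loss can be sketched as the size of the symmetric difference between predicted and true label sets, divided by the total number of labels Q and averaged over samples. This is the standard definition and is assumed to match the figure; the function name `hamming_loss` is illustrative.

```python
def hamming_loss(predicted, truths, n_labels):
    """Average share of the Q labels that are wrongly included or wrongly omitted."""
    # p ^ y is the symmetric difference: labels in exactly one of the two sets.
    mismatches = sum(len(p ^ y) for p, y in zip(predicted, truths))
    return mismatches / (n_labels * len(truths))

predicted = [{"a", "b"}, {"c"}]
truths = [{"a"}, {"c"}]
hl = hamming_loss(predicted, truths, n_labels=4)
```

The first prediction wrongly adds label b and the second is exact, so one label decision out of eight is wrong.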
Note: ↑ indicates that a larger value means a better effect, ↓ indicates that a smaller value means a better effect, and m is the number of records.
The calculation results of the indexes of the model are shown in the following table:
TABLE 1 evaluation of the models
Figure BDA0002026289760000121
It should be noted that the obtained results are evaluated with several indexes commonly used for multi-label problems: Hamming_Loss, Average_Precision, One_Error, Ranking_Loss and Coverage. With these indexes the model constructed by the invention can present the evaluation result more accurately. One_Error reflects the misjudgment rate of the model diagnosis result compared with the real result; Average_Precision reflects the similarity between the model diagnosis result and the real diagnosis result; Ranking_Loss reflects the error rate in the ranking of the sub-items of the model diagnosis result relative to the real diagnosis result; Hamming_Loss reflects the misjudgment rate of the individual sub-items of the model diagnosis result relative to the real diagnosis result; and Coverage reflects the redundancy of the model diagnosis result compared with the real diagnosis result. The model test of the invention therefore considers not only the numerical significance of each index but also the interpretation of the results from the perspective of traditional Chinese medicine.
Example 2
The present embodiment provides a computer-readable storage medium storing the system described in embodiment 1.
The computer-readable storage medium may include an internal storage unit of a terminal (computer), such as a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functions in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, those skilled in the art will appreciate that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (6)

1. A quantitative diagnosis system of traditional Chinese medicine based on a weighted directed graph is characterized by comprising:
the weight calculation module is used for calculating the weight of the characteristic data according to a preset strategy;
the directed graph constructing module is used for constructing a weighted directed graph according to the relation between the weight and the feature data;
the reasoning diagnosis module is used for reasoning the case to be detected through the directed graph construction module to obtain a corresponding result of the case;
the weight calculation module comprises a characteristic data matrix construction submodule, a characteristic data relevancy determination submodule and a characteristic data weight acquisition submodule; wherein
The characteristic data matrix construction submodule is used for converting the characteristic data into a sparse matrix;
the method specifically comprises the following steps: classifying the characteristic data according to symptoms and syndrome types, and respectively constructing a sparse matrix A and a sparse matrix B in a matrix form obtained by processing;
the characteristic data relevancy determination submodule calculates the relevancy of the characteristic data according to the sparse matrix and the preset strategy;
the characteristic data weight acquisition submodule is used for carrying out standardization processing on the relevancy of the characteristic data to acquire the weight of the characteristic data;
constructing triples to form a weighted directed graph according to the sparse matrix A, the sparse matrix B and the weights of the elements in the sparse matrix A and the sparse matrix B; when a symptom appears in more syndrome types, the weight of the symptom for a given syndrome type should be smaller than that of the other symptoms in the syndrome type; when a syndrome type has fewer symptoms in total, the weight of a symptom in that syndrome type should be greater than the weight of a symptom in other syndrome types;
the method for reasoning the case to be detected through the directed graph construction module specifically comprises the following steps:
corresponding the case to be detected with the characteristic data to obtain the weight corresponding to the characteristic data;
carrying out weighted summation according to the weight corresponding to the characteristic data to obtain a reference result and a corresponding weight sum;
and sorting the corresponding weight sums in descending order, discarding reference results using a threshold, and obtaining the optimal result corresponding to the case.
2. The quantitative diagnosis system for traditional Chinese medicine based on weighted directed graph according to claim 1, wherein the predetermined strategy comprises a mutual information calculation method, a confidence calculation method and an information entropy calculation method.
3. The quantitative diagnosis system for traditional Chinese medicine based on weighted directed graph according to claim 1, wherein the data processing procedure of the characteristic data matrix construction sub-module is as follows:
(1) respectively constructing a sparse matrix A and a sparse matrix B according to different types of feature data;
Figure FDA0003567147820000021
(2) respectively taking out a single element column from the sparse matrix A and a single element column from the matrix B to perform an AND operation to obtain a matrix C_i;
Figure FDA0003567147820000022
wherein the elements of the sparse matrix A are a_mn, the elements of the sparse matrix B are b_mn, the elements of the matrix C_i are c_mn, m denotes the number of columns in the matrix, n denotes the number of rows in the matrix, and i denotes the index number of the matrix C_i.
4. The quantitative diagnosis system for traditional Chinese medicine based on weighted directed graph according to any one of claims 2 or 3, wherein the characteristic data correlation degree determination submodule is used for calculating the correlation degree of the characteristic data from the sparse matrix A, the sparse matrix B and the matrix C_i according to a mutual information calculation method, specifically comprising the following steps:
Figure FDA0003567147820000023
Figure FDA0003567147820000024
p(x, y) = c_{i=x, n=y}    (3)
Figure FDA0003567147820000025
wherein x represents a symptom, y represents a syndrome type, and p(x) represents the probability that the item a_mn of the sparse matrix A appears in its column, a_mn being an element of the sparse matrix A with value 0 or 1; p(y) represents the probability that the item b_mn of the sparse matrix B appears in its column, b_mn being an element of the sparse matrix B with value 0 or 1; p(x, y) denotes the probability that c_mn of the matrix C occurs; PMI(x, y) is the probability of the simultaneous occurrence of each element in the sparse matrix A and the sparse matrix B; m represents the number of columns in the matrix, and n represents the number of rows in the matrix.
5. The quantitative diagnosis system for traditional Chinese medicine based on weighted directed graph as claimed in claim 3, wherein the processing procedure of said feature data weight obtaining sub-module is:
obtaining the correlation of the characteristic data
Figure FDA0003567147820000031
calculating the weight of the characteristic data: WF = (wf_1, wf_2, ..., wf_k, ..., wf_n), wherein,
Figure FDA0003567147820000032
6. A computer-readable storage medium storing the quantitative diagnosis system of traditional Chinese medicine according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910295314.1A CN110033862B (en) 2019-04-12 2019-04-12 Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910295314.1A CN110033862B (en) 2019-04-12 2019-04-12 Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium

Publications (2)

Publication Number Publication Date
CN110033862A CN110033862A (en) 2019-07-19
CN110033862B true CN110033862B (en) 2022-05-17

Family

ID=67238224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910295314.1A Active CN110033862B (en) 2019-04-12 2019-04-12 Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium

Country Status (1)

Country Link
CN (1) CN110033862B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111477302A (en) * 2020-03-07 2020-07-31 深圳问止中医健康科技有限公司 Traditional Chinese medicine dialectical algorithm of data


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298588A (en) * 2010-06-25 2011-12-28 株式会社理光 Method and device for extracting object from non-structured document
CN103473409A (en) * 2013-08-25 2013-12-25 浙江大学 FPGA (field programmable gate array) fault automatic diagnosing method based on knowledge database
CN106933994A (en) * 2017-02-27 2017-07-07 广东省中医院 Core disease-syndrome relation construction method based on a traditional Chinese medicine knowledge graph
CN107609389A (en) * 2017-08-24 2018-01-19 南京理工大学 A kind of verification method and system of image content-based correlation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
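Research on intelligent disease diagnosis based on a medical knowledge graph (基于医学知识图谱的疾病智能诊断研究); 刘路 (Liu Lu); China Excellent Doctoral and Master's Theses Full-text Database (Master's), Medicine and Health Sciences; 20190115 (No. 1); Sections 3.5 and 4.1 *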

Also Published As

Publication number Publication date
CN110033862A (en) 2019-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant