Disclosure of Invention
In view of the technical problems in the prior art, the invention aims to provide a method and system for mining traditional Chinese medicine dialectical treatment modes (that is, patterns of syndrome differentiation and treatment) based on graph data mining. A graph data structure is used to model the complex logical associations among the elements of a traditional Chinese medicine medical record, such as symptoms, diseases, syndrome types, syndromes, treatment rules, prescriptions and traditional Chinese medicine compositions, and an expanded frequent subgraph mining algorithm is used to extract the key elements of the dialectical treatment process, thereby forming traditional Chinese medicine dialectical treatment modes oriented to diseases, syndrome types and syndromes. The system presents the diagnosis and treatment experience hidden in traditional Chinese medicine medical records as objective, graphical dialectical treatment mode diagrams, which helps to pass on and develop the clinical experience of renowned physicians and has guiding significance and application value for research on traditional Chinese medicine dialectical treatment theory, intelligent auxiliary diagnosis in traditional Chinese medicine and the like.
The technical solution of the invention is as follows. The traditional Chinese medicine dialectical treatment mode mining system based on medical record mining models each traditional Chinese medicine medical record as a graph of the doctor's clinical dialectical treatment process, and uses an expanded graph mining algorithm to extract the key diagnosis and treatment information from the records, forming traditional Chinese medicine dialectical treatment modes for different diseases, syndrome types and syndromes. The system comprises a data preprocessing module, a data modeling module and a dialectical treatment mode mining module, wherein:
The data preprocessing module standardizes and segments the data in the medical records. Standardization assigns standard names to the symptom names, disease names, syndrome names, prescription names and traditional Chinese medicine names appearing in the medical records, according to a vocabulary of traditional Chinese medicine terms. Word segmentation is applied mainly to symptom information and splits a complex symptom description into fine-grained minimal symptom description units. Each symptom consists of a symptom attribute and a symptom value. The symptom attribute is the object that the symptom describes, such as tongue coating color, tongue coating texture or pulse condition; the symptom value describes how that attribute presents, such as dark red, thick or slippery.
The data modeling module models the medical record data processed by the data preprocessing module as a graph data structure. The nodes of the graph represent the symptoms, diseases, syndrome types, syndromes, treatment rules, prescriptions and traditional Chinese medicine compositions in the medical record, namely symptom nodes, disease nodes, syndrome type nodes, syndrome nodes, treatment rule nodes, prescription nodes and traditional Chinese medicine composition nodes. A symptom node is connected by edges to the disease nodes, syndrome type nodes and syndrome nodes, indicating that the symptom is a manifestation of the corresponding disease, syndrome type or syndrome. Syndrome nodes and treatment rule nodes, and treatment rule nodes and prescription nodes, are connected by edges, indicating the treatment rule that corresponds to a syndrome and the prescriptions available under a treatment rule. An edge between a prescription node and a traditional Chinese medicine composition node indicates that the medicine is an ingredient of that prescription.
The dialectical treatment mode mining module applies an expanded frequent subgraph mining algorithm to the graph-structured traditional Chinese medicine medical records to find the key symptom characteristics for diagnosis, the treatment rules and methods adopted, the applicable prescriptions and the core traditional Chinese medicine compositions, forming traditional Chinese medicine dialectical treatment modes for different diseases, syndrome types and syndromes. The module is implemented in the following steps:
1) According to the user's mining target, select from the output of the data preprocessing module the set Z of traditional Chinese medicine medical records that satisfy the user's mining conditions. The mining conditions comprise the doctor name and the name of the disease, syndrome type or syndrome to be mined. The disease name, syndrome type name and syndrome name are the mandatory mining conditions for the disease-oriented, syndrome-type-oriented and syndrome-oriented dialectical treatment modes respectively; the doctor name is optional. The user may mine the dialectical treatment mode of a particular doctor for a given disease, syndrome type or syndrome, or may leave the doctor unspecified and mine the mode for that disease, syndrome type or syndrome across all medical records.
2) Traverse all medical records in the medical record set Z and count the occurrence frequency of each symptom, treatment rule, prescription and traditional Chinese medicine composition, where
occurrence frequency = number of occurrences / total number of medical records.
3) Set the minimum support parameters according to node type.
3.1) Disease, syndrome type and syndrome nodes are tied to the user's mining target: they specify which disease, syndrome type or syndrome the mined dialectical treatment mode addresses. Each is a mandatory node of its respective dialectical treatment mode, so no minimum support parameter needs to be set for these three node types.
3.2) For symptom nodes, obtain in turn the symptom attribute to which each symptom node belongs, then set a separate minimum support parameter for each symptom attribute according to the occurrence frequencies of the symptom values of that attribute. The setting steps are as follows:
3.2.1) Sort the symptom values of the current symptom attribute in descending order of occurrence frequency to obtain a symptom value sequence L, where list(L) denotes the length of L;
3.2.2) Starting from the first element, take each element L(i) of L in turn, where 1 ≤ i ≤ list(L), and accumulate the occurrence frequencies of the elements taken, until the cumulative frequency exceeds a preset threshold;
3.2.3) Take the occurrence frequency of the last element L(i) taken from L as the minimum support parameter of the current symptom attribute.
3.3) For treatment rule nodes, prescription nodes and traditional Chinese medicine composition nodes, set the minimum support parameter of each node type according to actual application requirements.
4) Select from the output of the data modeling module, by medical record ID, the graph data set G(Z) corresponding to the medical record set Z, and mine G(Z) with the expanded frequent subgraph mining algorithm. Each frequent subgraph obtained is a traditional Chinese medicine dialectical treatment mode that consists of key elements such as symptoms, syndrome types, syndromes, diseases, treatment rules, prescriptions and traditional Chinese medicines and reflects the logical associations among them. The mining steps are as follows:
4.1) Using the minimum support parameter of each node type as the filtering condition, mine G(Z) with a frequent subgraph mining algorithm to obtain the frequent nodes and frequent edges, and sort the frequent edges by descending frequency and ascending DFS (depth-first search) code value;
4.2) Take the edges of the frequent edge set in order, compute their DFS codes, and construct frequent subtrees from the frequent edges;
4.3) For each frequent subtree, find in the frequent edge set all inner edges that can be attached to it, and add them to the subtree to form a frequent subgraph, namely the traditional Chinese medicine dialectical treatment mode corresponding to the mining conditions.
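Steps 1) to 3.2.3) above can be illustrated with a minimal Python sketch. It is not part of the described system: the record layout (dictionaries with `doctor`, `diseases` and `symptoms` keys, symptoms stored as (attribute, value) pairs), the function names, and the choice to count a symptom at most once per medical record are all assumptions made for illustration, and only the disease-oriented mining condition is shown.

```python
from collections import Counter, defaultdict


def select_records(records, disease, doctor=None):
    """Step 1: keep the records that match the mining conditions.

    The disease name is mandatory; the doctor name is optional.
    """
    return [
        r for r in records
        if disease in r["diseases"] and (doctor is None or r["doctor"] == doctor)
    ]


def symptom_frequencies(records):
    """Step 2: occurrence frequency of every (attribute, value) symptom.

    Assumption: a symptom is counted once per record, so the frequency is
    (records containing the symptom) / (total records).
    """
    counts = Counter()
    for r in records:
        counts.update(set(r["symptoms"]))
    return {symptom: c / len(records) for symptom, c in counts.items()}


def attribute_min_support(freqs, threshold):
    """Steps 3.2.1 to 3.2.3: one minimum support value per symptom attribute."""
    by_attr = defaultdict(list)
    for (attr, _value), f in freqs.items():
        by_attr[attr].append(f)

    min_support = {}
    for attr, fs in by_attr.items():
        fs.sort(reverse=True)            # 3.2.1: descending frequency
        cumulative = 0.0
        for f in fs:                     # 3.2.2: accumulate from the first element
            cumulative += f
            last = f
            if cumulative > threshold:   # stop once the preset threshold is exceeded
                break
        # 3.2.3: frequency of the last element taken; if the threshold is
        # never exceeded this is the smallest frequency of the attribute.
        min_support[attr] = last
    return min_support
```

Under these assumptions, a lower preset threshold keeps only the dominant values of an attribute, while a higher threshold lowers the resulting minimum support and admits rarer values.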
Compared with the prior art, the invention has the advantages that:
(1) The traditional Chinese medicine medical record is modeled as a graph structure, and the key diagnosis and treatment information in it is extracted by frequent subgraph mining to form a graphical traditional Chinese medicine dialectical treatment mode, which presents traditional Chinese medicine dialectical thinking and methods more intuitively.
(2) The mining needs of the three common kinds of traditional Chinese medicine dialectical treatment mode can be met: disease-oriented, syndrome-type-oriented and syndrome-oriented mode mining. Dialectical treatment modes can also be mined for an individual renowned physician.
(3) The traditional frequent subgraph mining algorithm is expanded so that a different minimum support parameter can be set for each node type, which makes mode mining more flexible and meets users' varied mining requirements. For symptom nodes in particular, a separate minimum support parameter is set for each symptom attribute, which effectively addresses the problem of missing key symptoms and safeguards the completeness of the dialectical treatment mode.
Detailed Description
The present invention will be described in detail with reference to specific examples.
As shown in FIG. 1, the system of the present invention comprises three functional modules: data preprocessing, data modeling and dialectical treatment mode mining.
The data preprocessing module standardizes and segments the data in the medical records. Standardization assigns standard names to the symptom names, disease names, syndrome names, prescription names and traditional Chinese medicine names appearing in the medical records, according to a vocabulary of traditional Chinese medicine terms, which defines the standard name and the aliases of each common term. With this vocabulary the terms appearing in a medical record can be standardized; for example, the different wordings of the symptom 'epigastric pain by pressing' can all be unified under the one standard name 'epigastric pain by pressing'. Word segmentation is applied mainly to symptom information and splits a complex symptom description into fine-grained minimal symptom description units; for example, the symptom 'pulse condition is deep, wiry, thin and rapid' can be split into four independent symptoms: 'pulse condition is deep', 'pulse condition is wiry', 'pulse condition is thin' and 'pulse condition is rapid'.
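The two preprocessing operations can be pictured with the following Python sketch. It is a toy illustration under stated assumptions, not the module itself: the vocabulary layout (standard name mapped to a list of aliases), the alias strings, the function names and the rule of splitting an English description of the form "attribute is value, value and value" are all hypothetical, and a real medical record would need a proper word segmentation method.

```python
import re


def build_alias_table(vocabulary):
    """Map every alias (and every standard name) to its standard name.

    `vocabulary` is assumed to look like {standard_name: [alias, ...]}.
    """
    table = {}
    for standard, aliases in vocabulary.items():
        table[standard] = standard
        for alias in aliases:
            table[alias] = standard
    return table


def standardize(term, alias_table):
    """Return the standard name of a term; unknown terms are left unchanged."""
    return alias_table.get(term, term)


def segment_symptom(description):
    """Split 'attribute is v1, v2 and v3' into (attribute, value) units."""
    attribute, sep, values = description.partition(" is ")
    if not sep:
        # No attribute/value separator found: keep the description whole.
        return [(description.strip(), None)]
    parts = re.split(r",|\band\b", values)
    return [(attribute.strip(), p.strip()) for p in parts if p.strip()]


# Hypothetical vocabulary entry; the alias strings are placeholders.
table = build_alias_table({"epigastric pain by pressing": ["alias 1", "alias 2"]})
units = segment_symptom("pulse condition is deep, wiry, thin and rapid")
```

Each resulting unit pairs one symptom attribute with one symptom value, which is the granularity the later frequency counting works on.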
The data modeling module models the medical record data processed by the data preprocessing module as a graph data structure. The nodes of the graph represent the symptoms, diseases, syndrome types, syndromes, treatment rules, prescriptions and traditional Chinese medicine compositions in the medical record. An edge between a symptom and a disease, syndrome type or syndrome indicates that the symptom is a manifestation of that disease, syndrome type or syndrome. If a record contains several diseases, syndrome types or syndromes, each symptom is connected to every one of them. Syndromes and treatment rules, and treatment rules and prescriptions, are likewise connected by edges, indicating the treatment rule that corresponds to a syndrome and the prescriptions available under a treatment rule. An edge between a prescription and a traditional Chinese medicine composition indicates that the medicine is an ingredient of the prescription. If a record contains several syndromes, treatment rules and prescriptions, the syndromes are connected to the treatment rules, the treatment rules to the prescriptions, and the prescriptions to the traditional Chinese medicines.
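As a rough illustration of this graph model, the Python sketch below builds the node and edge sets for one medical record. The record layout, the node-kind labels and the function name are assumptions introduced here. The sketch also assumes that each prescription is linked only to its own ingredients, and that only syndrome nodes (not syndrome type nodes) are linked to treatment rules, since the text does not state otherwise.

```python
from itertools import product


def build_record_graph(record):
    """Return (nodes, edges) for one record; a node is a (kind, name) pair."""
    nodes, edges = set(), set()

    def add(kind, names):
        labelled = [(kind, name) for name in names]
        nodes.update(labelled)
        return labelled

    symptoms = add("symptom", record["symptoms"])
    diseases = add("disease", record["diseases"])
    types = add("syndrome_type", record["syndrome_types"])
    syndromes = add("syndrome", record["syndromes"])
    rules = add("treatment_rule", record["treatment_rules"])
    prescriptions = add("prescription", record["prescriptions"])  # dict keys

    # Every symptom is linked to every disease, syndrome type and syndrome.
    edges.update(product(symptoms, diseases + types + syndromes))
    # Syndromes link to treatment rules, treatment rules to prescriptions.
    edges.update(product(syndromes, rules))
    edges.update(product(rules, prescriptions))
    # Each prescription links to its own ingredients (assumption).
    for name, medicines in record["prescriptions"].items():
        for medicine in add("medicine", medicines):
            edges.add((("prescription", name), medicine))
    return nodes, edges


# Placeholder values only; they are not taken from any real medical record.
example = {
    "symptoms": ["symptom 1", "symptom 2"],
    "diseases": ["disease D"],
    "syndrome_types": ["syndrome type T"],
    "syndromes": ["syndrome S"],
    "treatment_rules": ["rule R"],
    "prescriptions": {"prescription P": ["medicine A", "medicine B"]},
}
nodes, edges = build_record_graph(example)
```

Storing each node as a (kind, name) pair keeps the node type available later, when a different minimum support parameter has to be looked up for each type.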
The dialectical treatment mode mining module applies an expanded frequent subgraph mining algorithm to the graph-structured traditional Chinese medicine medical records to find the key symptom characteristics for diagnosis, the treatment rules and methods adopted, the applicable prescriptions and the core traditional Chinese medicine compositions, forming traditional Chinese medicine dialectical treatment modes for different diseases, syndrome types and syndromes. The implementation process of the module is shown in FIG. 2. First, the medical record set is screened according to the mining conditions entered by the user. The mining conditions include: 1) the doctor name, which specifies whose medical records are mined; 2) the disease name, syndrome type name or syndrome name, which specifies the disease, syndrome type or syndrome to be mined. Then, different mining parameters, namely minimum support parameters, are set for the different node types in the medical record graph according to the user's actual mining requirements. The node types for which a minimum support parameter can be set independently are treatment rule nodes, prescription nodes, traditional Chinese medicine composition nodes and symptom nodes. The minimum support parameter for symptom nodes can be set in two ways: 1) uniform minimum support, in which a single minimum support parameter is used for all symptom nodes; 2) multiple minimum supports, in which a dedicated minimum support parameter is set for each symptom attribute according to the value range and data distribution of that attribute.
Finally, frequent subgraphs are mined from the graph set, using the minimum support parameters of the different node types as frequency filters, to obtain a traditional Chinese medicine dialectical treatment mode comprising the key symptom characteristics, the commonly used treatment rules and methods, the applicable prescriptions and the core traditional Chinese medicines.
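The role of the type-specific minimum support parameters as frequency filters can be sketched in Python as follows. This covers only the selection of frequent edges, not the construction of subtrees or subgraphs and not DFS codes. The graph layout (a dictionary with an `edges` set of node pairs, each node a (kind, name) pair), the names used, and the rule that an edge must meet the thresholds of both of its endpoints are assumptions made for illustration; the uniform symptom threshold corresponds to the first of the two symptom settings described above.

```python
from collections import Counter

# Node kinds that are mandatory for a mode and therefore carry no threshold.
REQUIRED_KINDS = {"disease", "syndrome_type", "syndrome"}


def node_threshold(node, min_support):
    """Minimum support that applies to a node, looked up by its kind."""
    kind = node[0]
    return 0.0 if kind in REQUIRED_KINDS else min_support[kind]


def frequent_edges(graphs, min_support):
    """Return [(edge, frequency)] for edges that pass the type-specific filter.

    Assumption: an edge is frequent when its frequency reaches the
    thresholds of both endpoints. The result is sorted by descending
    frequency, with ties broken by edge label.
    """
    total = len(graphs)
    counts = Counter(edge for g in graphs for edge in g["edges"])
    kept = []
    for (u, v), count in counts.items():
        freq = count / total
        limit = max(node_threshold(u, min_support), node_threshold(v, min_support))
        if freq >= limit:
            kept.append(((u, v), freq))
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept
```

Because the threshold is looked up per node kind, a rarely prescribed medicine and a rarely seen symptom can be filtered with different strictness in the same mining run.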
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.