CN109065173B - Knowledge path acquisition method - Google Patents

Publication number
CN109065173B
Authority
CN
China
Prior art keywords
path
knowledge
node
preset
paths
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810751261.5A
Other languages
Chinese (zh)
Other versions
CN109065173A (en)
Inventor
谢永红
哈爽
张德政
阿孜古丽
栗辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201810751261.5A
Publication of CN109065173A
Application granted
Publication of CN109065173B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a knowledge path acquisition method. The method comprises: obtaining an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and each node is a concept layer feature associated with the symptom information and/or the patient basic information; determining an end point of the knowledge path to be searched, wherein the end point is the concept layer feature yin or yang reached by path finding from the symptom information and/or patient basic information; performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched. The method solves the technical problem in the prior art that traditional Chinese medicine data cannot be analyzed efficiently during case reasoning because of problems in traditional Chinese medicine symptom data, and achieves the technical effect of analyzing traditional Chinese medicine data efficiently and accurately.

Description

Knowledge path acquisition method
Technical Field
The invention relates to the field of traditional Chinese medicine data analysis, in particular to a knowledge path acquisition method.
Background
With the rapid development of society and the continuous improvement of living standards, people pay more attention to their own health. How to raise the level of medical care and make reasonable use of medical resources has become a hot research topic. As a treasure of China's medical field, traditional Chinese medicine is drawing increasing attention because of its long historical accumulation and its unique methods and curative effects in treating diseases.
Symptoms are the core data of traditional Chinese medicine cases and the main basis for case reasoning, so the data quality of the symptom part directly influences the final case reasoning result. Traditional Chinese medicine has developed over thousands of years and its medical classics are vast in number. At the same time, because China's territory is vast and factors such as geographical environment and natural resources differ between regions, the directions and degrees of development of traditional Chinese medicine differ slightly from region to region. When different veteran traditional Chinese medicine practitioners record medical cases, their differing personal preferences and understanding give rise to the following problems in the symptom data of traditional Chinese medicine medical records:
1) Data loss
Data loss is mainly reflected in tongue diagnosis and pulse diagnosis. Different veteran practitioners describe tongue and pulse information in their medical cases to different degrees. For example, some practitioners record the pulse in full as "wiry pulse", whereas others record it only as "wiry".
2) Irregular terminology
Synonyms and near-synonyms are very common in traditional Chinese medicine medical records. For example, two differently worded terms for a red tongue are synonymous, but different practitioners may record the symptom with either term out of personal habit.
3) Text too short
The symptom description of each case usually contains only the symptom entities themselves, and the number of symptom words is small. The main symptom part of each medical case usually contains 5-10 symptom words, and the tongue diagnosis and pulse diagnosis part usually contains only 1-3 symptom words. Meanwhile, behind the symptom words there is usually rich implicit semantic information that is difficult to obtain directly from the words themselves.
The prior art provides a method for expanding traditional Chinese medicine symptoms into an instance layer and an attribute layer. However, for the technical problem that traditional Chinese medicine data cannot be analyzed efficiently during concept layer case reasoning because of the above problems in the symptom data, no effective solution for acquiring the concept layer has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a knowledge path acquisition method, which at least solves the technical problem in the prior art that traditional Chinese medicine data cannot be analyzed efficiently during case reasoning because of problems in traditional Chinese medicine symptom data.
According to an aspect of the embodiments of the present invention, there is provided a knowledge path acquisition method, including: acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; determining an end point of the knowledge path to be searched, wherein the end point is the concept layer feature yin or yang reached by path finding from the symptom information and/or patient basic information; performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched.
Further, obtaining the initial node of the knowledge path to be searched includes: when the initial node is determined to be consistent with a preset standard word, taking the initial node as the starting point of the knowledge path to be searched, wherein the preset standard word is a standardized word in the symptom information and/or the patient basic information.
Further, when the initial node is determined to be inconsistent with the preset standard words, the method includes: calculating the similarity between each preset standard word and the initial node; finding the preset standard word whose similarity to the initial node exceeds a threshold; and taking that preset standard word as the starting point of the knowledge path to be searched.
Further, performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths includes: performing path finding between the initial node and the end point by combining a path acquisition function with the greedy algorithm to obtain the plurality of knowledge paths, wherein the path acquisition function acquires paths by increasing the path length between the starting point and the end point by a preset step length, and the step length is the amount by which the path length between the starting point and the end point is increased each time during path finding.
Further, performing path finding between the initial node and the end point through the greedy algorithm and the path acquisition function to obtain a plurality of knowledge paths includes: acquiring preset intermediate nodes between the initial node and the end point, wherein the preset intermediate nodes are words of preset types, the preset types being etiology, pathogenesis, disease nature, syndrome and meridian points, the initial node, the end point and the preset intermediate nodes form a path, and the number of preset intermediate nodes is the preset path length minus one; determining that the path contains preset intermediate nodes of the preset types; and taking the path containing preset intermediate nodes of the preset types as a knowledge path.
Further, in a case that the path does not include preset intermediate nodes of all the preset types, the method includes: continuing to increase the path length between the starting point and the end point by the preset step length until the preset path length is reached, wherein the preset path length is the number of preset intermediate nodes; acquiring the preset intermediate nodes between the initial node and the end point; and taking a path containing preset intermediate nodes between the initial node and the end point as a knowledge path.
Further, screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched includes: calculating a score for each knowledge path in the plurality of knowledge paths; prioritizing the knowledge paths according to the scores; and taking the knowledge paths with high priority as the knowledge paths to be searched, wherein high priority means that the score of the knowledge path is high.
According to another aspect of the embodiments of the present invention, there is also provided a knowledge path acquisition system, including: an acquisition unit configured to acquire an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; a determining unit configured to determine an end point of the knowledge path to be searched, wherein the end point is the concept layer feature yin or yang reached by path finding from the symptom information and/or patient basic information; a searching unit configured to perform path finding between the initial node and the end point to obtain a plurality of knowledge paths; and a screening unit configured to screen the plurality of knowledge paths to obtain a predetermined number of knowledge paths to be searched.
In the embodiment of the invention, the following approach is adopted: acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; determining an end point of the knowledge path to be searched, wherein the end point is the concept layer feature yin or yang reached by path finding from the symptom information and/or patient basic information, yin and yang each being a concept layer feature; performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched. This solves the technical problem in the prior art that traditional Chinese medicine data cannot be analyzed efficiently during case reasoning because of problems in traditional Chinese medicine symptom data, and achieves the technical effect of analyzing traditional Chinese medicine data efficiently and accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a knowledge path acquisition method according to an embodiment of the invention;
FIG. 2 is a diagram of optional concept layer features associated with "tongue red" according to an embodiment of the invention;
FIG. 3 is a diagram illustrating an alternative abdominal pain knowledge path query result according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an optimized conceptual feature store according to an embodiment of the invention;
FIG. 5 is a flow diagram of concept level feature acquisition according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a knowledge path acquisition system according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, an embodiment of a knowledge path acquisition method is provided. It is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
Fig. 1 is a flow chart of a knowledge path acquisition method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
step S102, acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and each node is a concept layer feature associated with the symptom information and/or the patient basic information;
step S104, determining an end point of the knowledge path to be searched, wherein the end point is the concept layer feature yin or yang reached by path finding from the symptom information and/or patient basic information, yin and yang each being one of the concept layer features;
step S106, performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths;
and step S108, screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched.
The above steps are performed based on the data storage structure of the knowledge-graph, and since the data storage structure of the knowledge-graph is a graph, the above steps specify the start and end boundaries of the search expansion by setting the start point and the end point, thereby forming the knowledge path.
In the above step S104, the theory of yin and yang is regarded as a way of thinking specific to traditional Chinese medicine, and is widely used to explain the life activities of the human body and the causes and pathological changes of diseases, and to guide the diagnosis and prevention of diseases. In treatment based on syndrome differentiation, renowned veteran practitioners also hold that all things belong to yin or yang, meaning that external disease features can be linked to yin and yang through certain relations. Based on this theory, the end node of the knowledge path described above can be designated as either yin or yang.
After the above step S106, a plurality of knowledge paths are obtained. Each knowledge path corresponds to the situation in which the symptoms in a case carry concept layer features and attribute layer features after expansion based on the symptom label system. For example, fig. 2 shows the nodes that have various semantic relationships with the symptom information "tongue red". That is, when "tongue red" is taken as the initial node for path finding, multiple knowledge paths are found, and each knowledge path includes many nodes that have semantic relationships with "tongue red". Since keeping all the nodes (concept layer features) associated with "tongue red" would greatly increase the subsequent workload and reduce efficiency, the number of knowledge paths is reduced to an appropriate predetermined number in step S108.
The initial node in the above steps refers to an instance layer feature and/or an attribute layer feature. The instance layer features are the set of words in the symptom information and/or the patient basic information, and a single instance layer feature is one word in the instance layer (a node in the path). The attribute layer features describe the basic information of a data object, and attribute data can be obtained directly or indirectly from the data object itself. For example, the attribute layer features include the set of words obtained by decomposing certain words in the symptom information and/or the patient basic information.
Path finding is performed with the above instance layer features and/or attribute layer features as starting nodes, so that all possible knowledge paths between the initial node and the end point are found. Because the plurality of knowledge paths contain all concept layer features associated with the initial node, the semantic features (concept layer features) implied by each symptom word (initial node) are fully mined. This solves the technical problem that traditional Chinese medicine data cannot be analyzed efficiently and accurately during case reasoning because the symptom text is short, and achieves the purpose of analyzing traditional Chinese medicine data efficiently.
Since traditional Chinese medicine symptom data suffer from data loss and irregular terminology, the initial node in the above steps may be either standardized or non-standardized. To obtain the initial node of the knowledge path to be searched, it is first determined whether the initial node is consistent with a preset standard word. Consistency with a preset standard word indicates that the data is standardized and not missing. In an optional implementation, when the word of the initial node is a standard word in the symptom information and/or the patient basic information, the initial node is used as the starting point of the knowledge path to be searched.
If the initial node is inconsistent with the preset standard words, the data is missing or the term is non-standard. In an optional implementation, the similarity between each preset standard word and the initial node is calculated first; the preset standard word whose similarity to the initial node exceeds a threshold is then found; and that preset standard word is taken as the starting point of the knowledge path to be searched. For example, standard symptom words are used as the initial nodes of knowledge paths: a number of standard symptom words are preset, and the set of these words serves as the preset standard words. When a symptom is labeled and no corresponding preset standard word is found, the similarity between the labeled instance layer and attribute layer features of the initial node and the preset standard words is calculated, and the preset standard word whose similarity exceeds the threshold and is the largest is used as the initial node for knowledge path finding.
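The matching step described above can be sketched as follows. This is a minimal illustration only: the text does not say which similarity measure is used, so the character-level ratio from Python's difflib is an assumption, as are the function name resolve_start_node, the threshold value 0.5 and the sample word list.

```python
from difflib import SequenceMatcher


def resolve_start_node(word, standard_words, threshold=0.5):
    """Return the standard word to use as the start node, or None.

    ASSUMPTION: similarity is a character-level ratio; the source does
    not specify the measure or the threshold value.
    """
    # Case 1: the word is already a preset standard word.
    if word in standard_words:
        return word
    # Case 2: take the most similar standard word if it clears the threshold.
    best_word, best_score = None, 0.0
    for candidate in standard_words:
        score = SequenceMatcher(None, word, candidate).ratio()
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word if best_score > threshold else None


# Hypothetical preset standard words.
standard = ["wiry pulse", "thready pulse", "red tongue", "abdominal pain"]
print(resolve_start_node("abdominal pain", standard))  # exact match
print(resolve_start_node("wiry", standard))            # abbreviated record
print(resolve_start_node("xyz", standard))             # no usable match
```

Returning None when nothing clears the threshold leaves the decision about unmatched words to the caller, which the text does not address.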
Through the steps, the problems of data loss and data irregularity can be solved to the maximum extent, and therefore the analysis efficiency of the traditional Chinese medicine medical record data is improved.
In an optional implementation, a plurality of knowledge paths are obtained by performing path finding between the initial node and the end point through a path acquisition function combined with a greedy algorithm, where the path acquisition function acquires paths by increasing the path length between the starting point and the end point by a preset step length, and the step length is the amount by which the path length between the starting point and the end point is increased each time during path finding. For example, the starting node and the end node of the knowledge path are specified first, and the path length between them is then increased gradually by a step length h. So that the path length grows uniformly and every possible knowledge path can be obtained, h is set to 1. When the path length between two nodes exceeds 6, the association between the two nodes becomes very weak, so the upper limit k of the path length is set to 6.
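The length-stepping idea above can be sketched as follows, using h = 1 and k = 6 from the text. The plain depth-first enumeration here stands in for the greedy search and graph query of the embodiment and is not the patented implementation; the adjacency-dict graph, the function names and the toy node names A, B and C are hypothetical.

```python
def paths_of_length(graph, start, end, length):
    """Return all simple paths from start to end with exactly `length` edges."""
    results = []

    def extend(path):
        if len(path) - 1 == length:
            if path[-1] == end:
                results.append(list(path))
            return
        for neighbor in graph.get(path[-1], []):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                path.append(neighbor)
                extend(path)
                path.pop()

    extend([start])
    return results


def collect_paths(graph, start, end, step=1, max_length=6):
    """Grow the path length by `step` up to `max_length`, collecting paths."""
    found = []
    length = step
    while length <= max_length:
        found.extend(paths_of_length(graph, start, end, length))
        length += step
    return found


# Hypothetical toy graph: the start is a symptom and the end point is "yang".
graph = {
    "abdominal pain": ["A", "B"],
    "A": ["yang"],
    "B": ["C"],
    "C": ["yang"],
}
found = collect_paths(graph, "abdominal pain", "yang")
for path in found:
    print(" -> ".join(path))
```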
According to the description of symptoms in the basic theory of traditional Chinese medicine, the concept features of traditional Chinese medicine symptoms can be divided in advance into five types: etiology, pathogenesis, disease nature, syndrome and meridian points. To obtain the concept layer features of symptoms through the greedy algorithm and the path acquisition function, in an optional implementation, preset intermediate nodes between the initial node and the end point are first acquired, wherein the preset intermediate nodes are words of the preset types (etiology, pathogenesis, disease nature, syndrome and meridian points), the initial node, the end point and the preset intermediate nodes form a path, and the number of preset intermediate nodes is the preset path length minus one; next, it is determined whether the path contains preset intermediate nodes of the preset types; then, a path containing preset intermediate nodes of the preset types is taken as a knowledge path.
Through the above steps, all concept features related to each initial node (such as a symptom word) can be fully mined. According to the theory of traditional Chinese medicine, the mined concept features form paths containing the related etiology, pathogenesis, disease nature, syndrome and meridian points, and the plurality of paths form a set of paths containing the different etiologies, pathogeneses, disease natures, syndromes and meridian points related to the initial node. In other words, medical case data matching the initially input symptom words can be found at multiple semantic levels, which greatly improves the efficiency and accuracy of data analysis and facilitates case reasoning.
In the case that the path does not include preset intermediate nodes of all the preset types, in an alternative embodiment, the path length between the starting point and the end point continues to be increased by the preset step length until the preset path length is reached, wherein the preset path length is the number of preset intermediate nodes; the preset intermediate nodes between the initial node and the end point are acquired; and a path containing preset intermediate nodes between the initial node and the end point is taken as a knowledge path. For example, based on the Cypher query language and the idea of the greedy algorithm, the path acquisition function determines, for the knowledge path length specified in each round, whether the acquired paths include intermediate nodes of the five types, namely etiology, pathogenesis, disease nature, syndrome and meridian points. If nodes of all five types cannot be obtained, the path length is increased by the step length until all five types of nodes are obtained.
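The coverage check described above can be sketched as follows. The sketch assumes the candidate paths have already been grouped by length (for example by a graph query) and that each intermediate node carries a type label; the names grow_until_covered, paths_by_length and node_type, and all sample nodes, are hypothetical.

```python
REQUIRED_TYPES = {
    "etiology", "pathogenesis", "disease nature", "syndrome", "meridian point",
}


def intermediate_types(path, node_type):
    """Types of the nodes strictly between the start and the end of a path."""
    return {node_type[n] for n in path[1:-1] if n in node_type}


def grow_until_covered(paths_by_length, node_type, step=1, max_length=6):
    """Increase the path length by `step` until all five types are seen.

    Returns the kept paths and the set of types still missing.
    """
    kept, seen = [], set()
    length = step
    while length <= max_length:
        for path in paths_by_length.get(length, []):
            types = intermediate_types(path, node_type) & REQUIRED_TYPES
            if types:  # the path contains at least one preset-type node
                kept.append(path)
                seen |= types
        if seen == REQUIRED_TYPES:
            break  # all five types obtained, stop growing
        length += step
    return kept, REQUIRED_TYPES - seen


# Hypothetical type labels and candidate paths, grouped by edge count.
node_type = {
    "c1": "etiology", "p1": "pathogenesis", "n1": "disease nature",
    "s1": "syndrome", "m1": "meridian point",
}
paths_by_length = {
    2: [["sym", "s1", "yang"]],
    3: [["sym", "c1", "p1", "yang"]],
    4: [["sym", "n1", "m1", "s1", "yang"]],
    5: [["sym", "x", "y", "z", "w", "yang"]],
}
kept, missing = grow_until_covered(paths_by_length, node_type)
print(len(kept), missing)
```

In this toy input the search stops at length 4, so the length-5 path is never examined.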
The above process is illustrated below by an alternative embodiment:
The concept features of a symptom lie on specific knowledge paths, so the knowledge paths must be acquired in order to acquire the concept layer features. The preset intermediate nodes serve as knowledge path templates, and the knowledge path templates for the five preset types of symptom concept features are etiology, pathogenesis, disease nature, syndrome and meridian points respectively. Based on these, preliminarily simplified concept features can be obtained. For each knowledge path pattern, a number of corresponding instance paths can be expanded in the knowledge graph. As shown in fig. 3, the starting node is the symptom "abdominal pain", the end node is "yang", and the knowledge path pattern is "syndrome relation-disease location relation-sub-concept", so the knowledge path set shown in fig. 3 can be obtained from the knowledge graph. That is, 11 path instances are expanded under the specific knowledge path pattern "syndrome relation-disease location relation-sub-concept", and these 11 paths contain 5 kinds of syndrome information.
As another example, a greedy algorithm is used to perform path finding between the initial node and the end point to obtain a plurality of knowledge paths, where the knowledge paths with a symptom as the initial node include 23 knowledge paths related to syndromes. If the syndromes expanded by these 23 knowledge paths were kept without processing, a large syndrome feature set would be obtained, part of which consists of redundant features irrelevant to retrieval, so a reduction strategy is needed to further simplify the result and obtain the final knowledge paths to be searched. In an optional implementation, the plurality of knowledge paths are screened through feature optimization: the knowledge paths are prioritized, and the knowledge paths with high priority are taken as the knowledge paths to be searched, where high priority means that the score of the knowledge path is high. The score is calculated according to formula (1):
[Formula (1): provided as image BDA0001725594380000071 in the original publication]
where S_p denotes the ranking score of a path P; E_q is the set of query entities, E_q = {e_1, e_2, …, e_i}, where e denotes a node; P is a relation path; and h_{E_q,P}(e) denotes the probability of walking from the starting node to the second node in one step. h_{E_q,P}(e) is calculated according to formula (2):
[Formula (2): provided as image BDA0001725594380000072 in the original publication]
C_p is calculated according to formula (3):
[Formula (3): provided as image BDA0001725594380000073 in the original publication]
C_p represents the importance of the path P formed by nodes e and e'; C_e represents the importance of a node and is calculated according to formula (4):
[Formula (4): provided as image BDA0001725594380000081 in the original publication]
where Degree is the node degree and ClusterCoefficient is the clustering coefficient. To balance the importance of the node degree and the clustering coefficient, α is set to 0.5.
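The node importance term can be illustrated with a short sketch. Formula (4) is available here only as an image, so the weighted sum alpha * Degree + (1 - alpha) * ClusterCoefficient used below is an assumption inferred from the statement that α balances the two quantities. Whether the degree is normalized is likewise unknown, so the raw degree is used. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count the neighbor pairs that are themselves connected.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def node_importance(adj, node, alpha=0.5):
    """ASSUMED form of C_e: alpha * Degree + (1 - alpha) * ClusterCoefficient."""
    degree = len(adj[node])
    return alpha * degree + (1 - alpha) * clustering_coefficient(adj, node)


# Hypothetical undirected graph stored as a dict of neighbor sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for name in sorted(adj):
    print(name, round(node_importance(adj, name), 4))
```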
Through a feature optimization strategy based on PRA, the above steps retain the important concept features and delete concept features with lower scores.
For example, statistical analysis of the path score ranking results with a symptom as the starting point and yin or yang as the end point shows that knowledge paths ranked sixth and below contain too many intermediate nodes, so the ranking threshold K is set to 5.
The concept features corresponding to these paths are used in the subsequent case base construction and case retrieval stages. As shown in fig. 4, for the medical case with id 125, the concept layer features obtained after preliminary acquisition and feature optimization are finally stored in the database.
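The final screening step can be sketched as a plain top-K selection with K = 5 as stated above. The path labels and scores below are invented for illustration, and select_top_paths is a hypothetical helper name.

```python
def select_top_paths(scored_paths, k=5):
    """Keep the k highest-scoring paths, best first."""
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)
    return [path for path, _ in ranked[:k]]


# Hypothetical (path, score) pairs; real scores would come from formula (1).
scored = [
    ("path A", 0.91), ("path B", 0.40), ("path C", 0.75), ("path D", 0.10),
    ("path E", 0.66), ("path F", 0.52), ("path G", 0.05),
]
top = select_top_paths(scored)
print(top)
```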
The entire process is described below in conjunction with fig. 5 according to an alternative embodiment:
This embodiment divides semantic features into three layers: instance layer features, attribute layer features, and concept layer features. The instance layer and attribute layer features are the first two layers of the multi-layer semantic features; in essence they give a first, detailed description of each symptom and can be obtained directly or indirectly from the symptom words. The concept layer features are implicit semantic information that generally cannot be obtained from the symptom data itself and requires additional means. In general, concepts and instances are in a many-to-many relationship, that is, a concept may contain multiple instances and an instance may belong to multiple concepts; instances and attributes are in a one-to-many relationship, that is, an instance may contain several attributes. The three are defined as follows.
Definition 1 (Instance). Let R denote the dialectical information of a medical case and S denote the set consisting of the patient basic information and symptom words in the dialectical information. Dialectical information consisting of m words may be expressed as R = {s_1, s_2, …, s_m}, where s_k ∈ S, k ∈ [1, m]. If a word s_k denotes a particular symptom or an item of patient basic information, s_k is called an Instance. Correspondingly, the set composed of such s_k is called an Instance Set. For example, "abdominal pain" is a symptom instance, and {diarrhea, hematochezia, abdominal pain, poor sleep, dark red, thin white, thready pulse, wiry pulse} is a symptom instance set in which each element is an instance.
Definition 2 (Attribute). Let I denote an instance and let D = {a_1, a_2, …, a_m}, where a_k ∈ I, k ∈ [1, m]. Then a_k is called an Attribute of instance I, and the set D is the Attribute Set of instance I. The attributes of an instance may be derived directly or indirectly from the instance. For example, the attributes of the symptom instance "tongue quality is pale red" include {tongue quality, pale, red, pale red}, and this set is the attribute set of that symptom instance.
Definition 3 (Concept). Let R denote the dialectical information of a medical case and S denote the set consisting of the patient basic information and symptom words in the dialectical information. Dialectical information consisting of m words may be expressed as R = {s_1, s_2, …, s_m}, where s_k ∈ S, k ∈ [1, m], and s_i is an instance in the dialectical information. If some c_k is an implicit feature of s_i or is associated with it, c_k is called a Concept of instance s_i. Correspondingly, a set composed of several c_k is called a Concept Set. For example, the syndrome associated with abdominal pain is an implicit feature of the instance "abdominal pain". As a further example, "red tongue" may belong to several syndromes, such as "intestinal dryness and fluid deficiency syndrome", "small intestine excess heat syndrome", "dampness-heat in spleen", "liver and gallbladder dampness-heat syndrome", and "gallbladder stagnation and phlegm disturbance syndrome"; the corresponding syndrome set Z = {the syndrome of intestinal dryness and fluid deficiency, the syndrome of damp-heat in the small intestine, the syndrome of damp-heat in the spleen, the syndrome of gallbladder stagnation and phlegm disturbance} is a concept set of the symptom "red tongue".
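The three definitions can be pictured with a small data structure. The sketch below is illustrative only: the class name, field names, and helper function are hypothetical, while the example values are taken from the definitions above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SymptomFeatures:
    """Three-layer semantic features of one symptom word."""
    instance: str                                  # instance layer
    attributes: set = field(default_factory=set)   # attribute layer (one-to-many)
    concepts: set = field(default_factory=set)     # concept layer (many-to-many)


def instances_by_concept(features):
    """Invert the instance-to-concept mapping to show the many-to-many relation."""
    index = defaultdict(set)
    for item in features:
        for concept in item.concepts:
            index[concept].add(item.instance)
    return dict(index)


pale_red = SymptomFeatures(
    instance="tongue quality is pale red",
    attributes={"tongue quality", "pale", "red", "pale red"},
)
red_tongue = SymptomFeatures(
    instance="red tongue",
    concepts={"intestinal dryness and fluid deficiency syndrome",
              "dampness-heat in spleen"},
)
concept_index = instances_by_concept([pale_red, red_tongue])
```

Keeping attributes and concepts as sets reflects that the definitions describe unordered collections without duplicates.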
The instance layer and attribute layer features describe the basic information of a symptom and are in essence a detailed description of the symptom. Acquiring the instance layer and attribute layer features gives traditional Chinese medicine symptoms a standardized, structured representation, so that the symptoms receive a first expansion at the semantic level. It also lays the foundation for acquiring the concept layer features. The acquisition method is an automatic feature acquisition method based on a traditional Chinese medicine symptom label system (for the construction method of the label system, refer to the patent numbered 201611235453.8). After expansion through the symptom label system, the symptoms in the medical record are characterized by concept layer and attribute layer features.
The multi-layer semantic features of this embodiment comprise three layers: the instance layer, the attribute layer, and the concept layer. Acquiring concept layer features is complex and can be realized with the help of a dedicated domain knowledge base. This embodiment acquires concept layer features based on the knowledge graph: by analyzing the position of a given entity in the knowledge graph and the concepts that have a semantic relation with it, it determines which concepts should serve as extended semantic features of that entity. The knowledge graph is a complex semantic network based on a graph model, with complex semantic relations between nodes. In theory, without any restriction, a given entity is likely to expand into a very large number of semantic features within the knowledge graph. This readily leads to the following problems: 1) too many features easily cause the curse of dimensionality; 2) the original features may contain a large number of redundant features; 3) in many retrieval applications, computing feature similarity between entities (symptom words) is an important way of measuring entity similarity, but the original feature set contains many features with low weights that contribute little to the similarity computation while increasing the time complexity of the retrieval system.
In order to obtain the concept layer features of a symptom, the core steps of this embodiment include: step one, acquiring knowledge paths between nodes; step two, acquiring concept layer features based on the knowledge paths; and step three, further optimizing and selecting the concept layer features. The specific steps are as follows:
Step one: after expansion through the symptom label system, the symptoms in the medical record are characterized by concept layer and attribute layer features. If no corresponding standard symptom word is found for a symptom after labeling, the similarity to the existing standard words is calculated from the labeled instance layer and attribute layer feature labels, and knowledge path finding is performed from the standard symptom word whose similarity exceeds the threshold and is the highest.
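The fallback matching in step one can be sketched as follows. The text does not state which similarity measure or threshold value is used, so the Jaccard similarity over feature label sets, the threshold of 0.5, and the label vocabulary below are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two label collections (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_standard_word(labels, standard_words, threshold=0.5):
    """Return the standard word with the highest similarity above threshold, else None."""
    best_word, best_score = None, threshold
    for word, word_labels in standard_words.items():
        score = jaccard(labels, word_labels)
        # Strictly greater: the similarity must exceed the threshold.
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical standard words with their instance/attribute layer labels.
standard_words = {
    "abdominal pain": {"abdomen", "pain"},
    "headache": {"head", "pain"},
}
matched = match_standard_word({"abdomen", "pain", "dull"}, standard_words)
unmatched = match_standard_word({"cough"}, standard_words)
```

When no standard word exceeds the threshold the function returns None, in which case no starting point for path finding is available under this sketch.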
Step two: the knowledge graph is a semantic network in which the relationships between nodes are complex. Even when the start point and end point of a knowledge path are known, it is difficult to obtain the intermediate nodes and the relationships between them from prior knowledge alone. To solve this path acquisition problem, a knowledge path acquisition strategy based on a greedy algorithm is proposed. As the name implies, a greedy algorithm always makes the choice that appears best at the current moment when solving a problem. It has no fixed algorithmic framework; its core idea is to select an optimal greedy strategy. For the knowledge path finding problem in this embodiment, the idea of the greedy algorithm can be used to help find the knowledge paths between nodes.
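The greedy idea can be illustrated with a plain greedy walk over a weighted graph. This passage does not specify the concrete greedy strategy, so the choice of "heaviest unvisited edge" as the locally best option, the edge weights, the length bound, and the node names are all assumptions; the sketch returns at most one path and does not backtrack.

```python
def greedy_path(graph, start, end, max_length):
    """Walk from start toward end, always taking the heaviest unvisited edge."""
    path = [start]
    visited = {start}
    while path[-1] != end and len(path) - 1 < max_length:
        current = path[-1]
        candidates = {
            node: weight
            for node, weight in graph.get(current, {}).items()
            if node not in visited
        }
        if not candidates:
            return None  # dead end: a greedy walk does not backtrack
        # Locally best choice; ties are broken by node name for determinism.
        nxt = max(sorted(candidates), key=candidates.get)
        path.append(nxt)
        visited.add(nxt)
    return path if path[-1] == end else None


# Hypothetical weighted graph: node -> {neighbor: weight}.
toy_graph = {
    "symptom": {"etiology_1": 0.6, "etiology_2": 0.4},
    "etiology_1": {"syndrome_1": 0.9},
    "etiology_2": {"syndrome_2": 0.7},
    "syndrome_1": {"yin": 0.2, "yang": 0.8},
    "syndrome_2": {"yang": 1.0},
}
found = greedy_path(toy_graph, "symptom", "yang", max_length=4)
too_short = greedy_path(toy_graph, "symptom", "yang", max_length=2)
```

Because each step commits to the locally best neighbor, the walk is cheap but may miss a path that a full search would find, which is the usual trade-off of a greedy strategy.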
Step three: based on the knowledge path templates from step two, a coarse concept layer feature set is obtained using the most basic concept layer feature acquisition method, which expands each concept feature along all of its knowledge paths.
Step four: based on the acquired knowledge paths, this embodiment obtains a coarse concept layer feature set for the symptoms. It is called coarse because each symptom still expands into many concept layer features, some of which are irrelevant to the practical application. To further optimize the concept layer feature set, a PRA-based feature optimization method is proposed so that the concept layer feature set expanded from the symptom data of each case is cleaner. PRA (Path Ranking Algorithm) can be regarded as a modified version of the Random Walk Algorithm (RWA): it is equivalent to a random walk along a sequence of edges of specified types, i.e., an RWA with restricted walk paths.
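The restricted random walk underlying PRA can be sketched as follows. The sketch follows the commonly used path-constrained random walk, in which the walker starts uniformly over the query entities and at each step moves uniformly over the outgoing edges of the next relation type; it is not necessarily identical to formula (2) of this embodiment, which is not reproduced in the text. The relation names and nodes are hypothetical.

```python
def path_constrained_walk(edges, start_nodes, relation_path):
    """Probability of reaching each node by following relation_path from start_nodes."""
    # Start with a uniform distribution over the query entities.
    dist = {node: 1.0 / len(start_nodes) for node in start_nodes}
    for relation in relation_path:
        next_dist = {}
        for node, prob in dist.items():
            targets = edges.get((node, relation), [])
            for target in targets:
                # Each outgoing edge of this relation type is equally likely.
                next_dist[target] = next_dist.get(target, 0.0) + prob / len(targets)
        dist = next_dist
    return dist


# Hypothetical typed edges: (source node, relation) -> list of target nodes.
toy_edges = {
    ("symptom", "has_cause"): ["cause_1", "cause_2"],
    ("cause_1", "leads_to"): ["yin"],
    ("cause_2", "leads_to"): ["yin", "yang"],
}
reach = path_constrained_walk(toy_edges, ["symptom"], ["has_cause", "leads_to"])
```

Restricting the walk to one relation sequence is what distinguishes this from an unrestricted random walk: nodes that cannot be reached through that sequence receive no probability mass.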
To address the problems of missing data, non-standard terminology, and short texts in the symptom data of traditional Chinese medicine cases, this embodiment performs multi-level feature expansion on the symptoms in a case using multi-layer semantic features and automatically acquires concept layer features (knowledge paths) based on a knowledge graph, thereby improving the efficiency and accuracy of analyzing symptom data in traditional Chinese medicine cases.
An embodiment of the present invention provides a knowledge path acquisition system. Fig. 6 shows a knowledge path acquisition system according to an embodiment of the present invention. As shown in fig. 6, the system includes:
an obtaining unit 62, configured to obtain an initial node of a knowledge path to be searched, where the initial node is symptom information and/or patient basic information, the knowledge path is formed by multiple nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information;
a determining unit 64, configured to determine an end point of the path to be searched, where the end point is a concept layer feature yin or yang obtained by path finding according to the symptom information and/or the patient basic information;
a searching unit 66, configured to perform path finding between the initial node and the end point to obtain a plurality of knowledge paths;
a screening unit 68, configured to screen the plurality of knowledge paths to obtain a predetermined number of knowledge paths to be searched.
The embodiment of the system for acquiring the knowledge path corresponds to the method for acquiring the knowledge path, so the beneficial effects are not described again.
The embodiment of the invention provides a storage medium, which comprises a stored program, wherein when the program runs, a device on which the storage medium is positioned is controlled to execute the method.
The embodiment of the invention provides a processor, which comprises a processing program, wherein when the program runs, a device where the processor is located is controlled to execute the method.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (7)

1. A knowledge path acquisition method is characterized by comprising the following steps:
acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information;
the method for acquiring the initial node of the knowledge path to be searched comprises the following steps:
if the initial node is judged to be consistent with a preset standard word, the initial node is used as a starting point of a path to be searched, wherein the preset standard word is a standardized word in symptom information and/or patient basic information;
and under the condition of judging that the initial node is inconsistent with a preset standard word, the method comprises the following steps:
calculating the similarity between the preset standard words and the initial nodes;
searching a preset standard word with the similarity to the initial node exceeding a threshold value;
taking a preset standard word with the similarity exceeding a threshold value as a starting point of a knowledge path to be searched;
determining an end point of a path to be searched, wherein the end point is a concept layer characteristic yin or yang obtained by searching the path according to symptom information and/or patient basic information;
carrying out routing between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths;
screening a plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched; the method comprises the following steps:
calculating a score of each knowledge path in the plurality of knowledge paths;
prioritizing the knowledge paths according to the scores;
taking a knowledge path with high priority as a knowledge path to be searched, wherein high priority means that the knowledge path has a high score among the plurality of knowledge paths;
calculating the score according to the following calculation formula (1):
Figure FDA0003488658090000011
wherein S_p denotes the ranking score of a path P; E_q is the set of query entities, E_q = {e_1, e_2, …, e_i}, and e denotes a node; p is a relation path; h_{E_q,p}(e) denotes the probability of walking from the starting node to the second node in one step; h_{E_q,p}(e) is calculated according to equation (2):
Figure FDA0003488658090000021
C_p is calculated according to equation (3):
Figure FDA0003488658090000022
C_p represents the importance of the path P formed by the nodes e and e'; C_e represents the importance of a node and is calculated according to equation (4):
Figure FDA0003488658090000023
wherein Degree is the node degree and ClusterCoefficient is the clustering coefficient, and α is set to 0.5 to balance the importance of the node degree and the clustering coefficient.
2. The method of claim 1, wherein obtaining a plurality of knowledge paths by routing between the initial node and the end point using a greedy algorithm comprises:
and performing path searching between the initial node and the end point by combining a path acquisition function and a greedy algorithm to obtain a plurality of knowledge paths, wherein the path acquisition function is used for increasing the path length between the starting point and the end point by a preset step length to acquire the path, and the step length is the path length between the starting point and the end point which is increased each time in the path searching process.
3. The method of claim 2, wherein obtaining a plurality of knowledge paths by routing between the initial node and the end point via the greedy algorithm and the path acquisition function comprises:
acquiring preset intermediate nodes between the initial nodes and the end points, wherein the preset intermediate nodes are words of preset types, the preset types are etiology, pathogenesis, disease nature, syndrome and meridian points respectively, the initial nodes, the end points and the preset intermediate nodes form paths, and the number of the preset intermediate nodes is the length of the preset path minus one;
judging that the path contains all preset intermediate nodes of preset types;
and taking the path containing all the preset intermediate nodes of the preset types as a knowledge path.
4. The method according to claim 3, wherein in case that no preset intermediate nodes of all preset types are included in the path, the method comprises:
continuing to increase the path length between the starting point and the end point by a preset step length until the preset path length is reached, wherein the preset path length is obtained by adding one to the number of preset intermediate nodes;
acquiring a preset intermediate node between the initial node and the terminal;
and taking a path containing a preset intermediate node between the initial node and the end point as a knowledge path.
5. A knowledge path acquisition system, comprising:
an acquisition unit, used for acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information;
the obtaining unit is specifically configured to:
if the initial node is judged to be consistent with a preset standard word, the initial node is used as a starting point of a path to be searched, wherein the preset standard word is a standardized word in symptom information and/or patient basic information;
and under the condition of judging that the initial node is inconsistent with a preset standard word, the method comprises the following steps:
calculating the similarity between the preset standard words and the initial nodes;
searching a preset standard word with the similarity to the initial node exceeding a threshold value;
taking a preset standard word with the similarity exceeding a threshold value as a starting point of a knowledge path to be searched;
the determining unit is used for determining an end point of a path to be searched, wherein the end point is a concept layer feature yin or yang obtained by path finding according to symptom information and/or patient basic information;
the searching unit is used for searching paths between the initial node and the terminal point to obtain a plurality of knowledge paths;
the screening unit is used for screening the plurality of knowledge paths to obtain a predetermined number of to-be-searched knowledge paths;
the screening unit is specifically configured to:
calculating a score of each knowledge path in the plurality of knowledge paths;
prioritizing the knowledge paths according to the scores;
taking a knowledge path with high priority as a knowledge path to be searched, wherein high priority means that the knowledge path has a high score among the plurality of knowledge paths;
calculating the score according to the following calculation formula (1):
Figure FDA0003488658090000031
wherein S_p denotes the ranking score of a path P; E_q is the set of query entities, E_q = {e_1, e_2, …, e_i}, and e denotes a node; p is a relation path; h_{E_q,p}(e) denotes the probability of walking from the starting node to the second node in one step; h_{E_q,p}(e) is calculated according to equation (2):
Figure FDA0003488658090000032
C_p is calculated according to equation (3):
Figure FDA0003488658090000041
C_p represents the importance of the path P formed by the nodes e and e'; C_e represents the importance of a node and is calculated according to equation (4):
Figure FDA0003488658090000042
wherein Degree is the node degree and ClusterCoefficient is the clustering coefficient, and α is set to 0.5 to balance the importance of the node degree and the clustering coefficient.
6. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 4.
7. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 4.
CN201810751261.5A 2018-07-10 2018-07-10 Knowledge path acquisition method Active CN109065173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810751261.5A CN109065173B (en) 2018-07-10 2018-07-10 Knowledge path acquisition method

Publications (2)

Publication Number Publication Date
CN109065173A CN109065173A (en) 2018-12-21
CN109065173B true CN109065173B (en) 2022-04-19

Family

ID=64819404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810751261.5A Active CN109065173B (en) 2018-07-10 2018-07-10 Knowledge path acquisition method

Country Status (1)

Country Link
CN (1) CN109065173B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335675B (en) * 2019-06-20 2021-10-01 北京科技大学 Syndrome differentiation method based on traditional Chinese medicine knowledge graph library
CN110825862B (en) * 2019-11-06 2022-12-06 北京诺道认知医学科技有限公司 Intelligent question and answer method and device based on pharmacy knowledge graph
CN112988994B (en) * 2021-03-04 2023-03-21 网易(杭州)网络有限公司 Conversation processing method and device and electronic equipment
CN113609250A (en) * 2021-06-29 2021-11-05 中国科学院微生物研究所 Method and device for mining knowledge of coronavirus associated data based on scientific angle
CN113611424A (en) * 2021-06-29 2021-11-05 中国科学院微生物研究所 Method and device for knowledge mining of coronavirus associated data based on strain angle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227820A (en) * 2016-07-22 2016-12-14 北京科技大学 A kind of construction method of Basic Theories of Chinese Medicine knowledge picture library
CN106570319A (en) * 2016-10-31 2017-04-19 北京科技大学 Method and device for determining traditional Chinese medicine diagnosis mode
CN106874695A (en) * 2017-03-22 2017-06-20 北京大数医达科技有限公司 The construction method and device of medical knowledge collection of illustrative plates
CN106933994A (en) * 2017-02-27 2017-07-07 广东省中医院 A kind of core disease card relation construction method based on knowledge of TCM collection of illustrative plates

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615182A (en) * 2008-06-27 2009-12-30 西门子公司 Tcm symptom information storage system and tcm symptom information storage means

Also Published As

Publication number Publication date
CN109065173A (en) 2018-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant