CN109065173B

CN109065173B - Knowledge path acquisition method

Info

Publication number: CN109065173B
Application number: CN201810751261.5A
Authority: CN
Inventors: 谢永红; 哈爽; 张德政; 阿孜古丽; 栗辉
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2018-07-10
Filing date: 2018-07-10
Publication date: 2022-04-19
Anticipated expiration: 2038-07-10
Also published as: CN109065173A

Abstract

The invention discloses a knowledge path acquisition method. The method comprises: acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and each node is a concept layer feature associated with the symptom information and/or the patient basic information; determining an end point of the path to be searched, wherein the end point is the concept layer feature yin or yang obtained by searching the path according to the symptom information and/or the patient basic information; performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched. The method solves the technical problem in the prior art that traditional Chinese medicine data cannot be analyzed efficiently during case reasoning because of problems in traditional Chinese medicine symptom data, and achieves the technical effect of analyzing traditional Chinese medicine data efficiently and accurately.

Description

Knowledge path acquisition method

Technical Field

The invention relates to the field of traditional Chinese medicine data analysis, in particular to a knowledge path acquisition method.

Background

With the rapid development of society and the continuous improvement of living standards, people pay increasing attention to their own health. How to raise the level of medical care and make reasonable use of medical resources has become a hot research topic. As a precious asset of China's medical field, traditional Chinese medicine (TCM) attracts growing attention because of its long historical accumulation and its distinctive methods and curative effects in treating disease.

Symptoms are the core data of TCM medical cases and the main basis for case reasoning, so the data quality of the symptom part directly affects the final case reasoning result. TCM has developed over thousands of years and its medical classics are as vast as the sea. Meanwhile, because China's territory is vast and factors such as geographical environment and natural resources differ between regions, TCM has developed in slightly different directions and to slightly different degrees in different regions. When veteran TCM practitioners record medical cases, differences in personal preference and understanding give rise to the following problems in the symptom data:

1) Missing data

Missing data mainly shows up in tongue diagnosis and pulse diagnosis. Different veteran TCM practitioners describe tongue and pulse findings in their medical cases with different degrees of completeness. For example, some practitioners record the pulse in full as "wiry pulse", while others record it only as "wiry".

2) Non-standard terminology

Synonyms and near-synonyms are very common in TCM medical cases. For example, two differently worded terms that both translate as "red tongue" are synonymous, but different practitioners may record the symptom with either term, depending on personal habit.

3) Text too short

The symptom description of each medical case usually contains only the symptom entities themselves, and the number of symptom words is small. The symptom part of each medical case usually contains 5-10 symptom words, and the tongue diagnosis and pulse diagnosis part usually contains only 1-3 symptom words. Meanwhile, the symptom words usually carry rich implicit semantic information that is difficult to obtain directly from the words themselves.

The prior art provides a method for expanding TCM symptoms into an instance layer and an attribute layer. However, for the technical problem that TCM data cannot be analyzed efficiently during concept layer case reasoning because of the above problems in TCM symptom data, no effective solution for acquiring the concept layer has yet been proposed.

Disclosure of Invention

The embodiment of the invention provides a knowledge path acquisition method, which at least solves the technical problem in the prior art that TCM data cannot be analyzed efficiently during case reasoning because of problems in TCM symptom data.

According to an aspect of the embodiments of the present invention, there is provided a method for acquiring a knowledge path, including: acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; determining an end point of a path to be searched, wherein the end point is a concept layer characteristic yin or yang obtained by searching the path according to symptom information and/or patient basic information; carrying out routing between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and screening the plurality of knowledge paths through feature optimization to obtain a preset number of to-be-searched knowledge paths.

Further, obtaining the initial node of the knowledge path to be searched includes: and judging that the initial node is consistent with a preset standard word, and taking the initial node as a starting point of the path to be searched, wherein the preset standard word is a standardized word in the symptom information and/or the basic information of the patient.

Further, the method includes, when it is determined that the initial node is inconsistent with a preset standard word: calculating the similarity between the preset standard words and the initial nodes; searching a preset standard word with the similarity to the initial node exceeding a threshold value; and taking the preset standard words with the similarity exceeding the threshold as the starting points of the knowledge path to be searched.

Further, the obtaining a plurality of knowledge paths by routing between the initial node and the end point through a greedy algorithm comprises: and performing path searching between the initial node and the end point by combining a path acquisition function and a greedy algorithm to obtain a plurality of knowledge paths, wherein the path acquisition function is used for increasing the path length between the starting point and the end point by a preset step length to acquire the path, and the step length is the path length between the starting point and the end point which is increased each time in the path searching process.

Further, the obtaining a plurality of knowledge paths by performing path finding between the initial node and the end point through the greedy algorithm and the path obtaining function includes: acquiring preset intermediate nodes between the initial nodes and the end points, wherein the preset intermediate nodes are words of preset types, the preset types are etiology, pathogenesis, disease nature, syndrome and meridian points respectively, the initial nodes, the end points and the preset intermediate nodes form paths, and the number of the preset intermediate nodes is the length of the preset path minus one; judging that the path contains a preset intermediate node of a preset type; and taking the path containing the preset intermediate node of the preset type as a knowledge path.

Further, in a case that the path does not include all preset intermediate nodes of the preset type, the method includes: continuing to increase the path length between the starting point and the end point by a preset step length until reaching the preset path length, wherein the preset path length is the number of preset intermediate nodes; acquiring a preset intermediate node between the initial node and the terminal; and taking a path containing a preset intermediate node between the initial node and the end point as a knowledge path.

Further, the screening of the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched includes: calculating a score of each knowledge path in the plurality of knowledge paths; prioritizing the knowledge paths according to the scores; and taking the knowledge paths with high priority as the knowledge paths to be searched, wherein a high priority means that the score of the knowledge path is high.

According to another aspect of the embodiments of the present invention, there is also provided a knowledge path acquisition system, including: an acquisition unit, configured to acquire an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; a determining unit, configured to determine an end point of the path to be searched, wherein the end point is the concept layer feature yin or yang obtained by searching the path according to the symptom information and/or the patient basic information; a searching unit, configured to perform path finding between the initial node and the end point to obtain a plurality of knowledge paths; and a screening unit, configured to screen the plurality of knowledge paths to obtain a predetermined number of knowledge paths to be searched.

In the embodiment of the invention, an initial node of a knowledge path to be searched is acquired, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information; an end point of the knowledge path to be searched is determined, wherein the end point is the concept layer feature yin or yang obtained by searching the path according to the symptom information and/or the patient basic information, yin and yang both being concept layer features; path finding is performed between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths; and the plurality of knowledge paths are screened through feature optimization to obtain a predetermined number of knowledge paths to be searched. This solves the technical problem in the prior art that TCM data cannot be analyzed efficiently during case reasoning because of problems in TCM symptom data, and achieves the technical effect of analyzing TCM data efficiently and accurately.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a knowledge path acquisition method according to an embodiment of the invention;

FIG. 2 shows optional concept layer features associated with "red tongue" according to an embodiment of the invention;

FIG. 3 is a diagram illustrating an alternative abdominal pain knowledge path query result according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an optimized conceptual feature store according to an embodiment of the invention;

FIG. 5 is a flow diagram of concept level feature acquisition according to an embodiment of the invention;

FIG. 6 is a schematic diagram of a knowledge path acquisition system according to an embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an embodiment of the present invention, an embodiment of a knowledge path acquisition method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.

FIG. 1 is a flowchart of a knowledge path acquisition method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102: acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and each node is a concept layer feature associated with the symptom information and/or the patient basic information;

Step S104: determining an end point of the path to be searched, wherein the end point is the concept layer feature yin or yang obtained by searching the path according to the symptom information and/or the patient basic information, and yin or yang is one of the concept layer features;

Step S106: performing path finding between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths;

Step S108: screening the plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched.

The above steps are performed based on the data storage structure of the knowledge-graph, and since the data storage structure of the knowledge-graph is a graph, the above steps specify the start and end boundaries of the search expansion by setting the start point and the end point, thereby forming the knowledge path.

In step S104 above, the theory of yin and yang is regarded as a distinctive way of thinking in TCM. It is widely used to explain the life activities of the human body and the causes and pathological changes of diseases, and to guide the diagnosis and prevention of diseases. In treatment based on syndrome differentiation, renowned veteran TCM practitioners also hold that all things belong to yin or yang, meaning that externally observable disease features can be linked to yin and yang through certain relations. Based on this theory, the end node of the knowledge path described above can be designated as yin or yang.

After step S106, a plurality of knowledge paths are obtained. Each knowledge path corresponds to a case in which the symptoms have acquired concept layer features and attribute layer features after expansion based on the symptom label system. For example, as shown in FIG. 2, the symptom information "red tongue" is connected to nodes that have various semantic relationships with it; that is, when "red tongue" is taken as the initial node for path finding, many knowledge paths are found, and each contains many nodes semantically related to "red tongue". Because keeping all the nodes (concept layer features) associated with "red tongue" would greatly increase the subsequent workload and reduce efficiency, step S108 reduces the number of knowledge paths to a suitable predetermined number.

The initial node in the above steps refers to an instance layer feature and/or an attribute layer feature. The instance layer is the set of words in the symptom information and/or the patient basic information, and one instance layer feature is a single word in that layer (one node in a path). The attribute layer features describe the basic information of a data object and can be obtained directly or indirectly from the data object itself; for example, the attribute layer features include the set of words obtained by decomposing certain words in the symptom information and/or the patient basic information.

Path search is performed with the instance layer features and/or attribute layer features from the above steps as start nodes, and all possible knowledge paths between the initial node and the end point are found. Because the plurality of knowledge paths contain all concept layer features associated with the initial node, the semantic features (concept layer features) implied by each symptom word (initial node) can be fully mined. This solves the technical problem that TCM data cannot be analyzed efficiently and accurately during case reasoning because TCM symptom text is short, and achieves efficient analysis of TCM data.

Because TCM symptom data suffers from missing data and non-standard terminology, the initial node in the above steps may be either standard or non-standard. When acquiring the initial node of the knowledge path to be searched, it may first be determined whether the initial node is consistent with a preset standard word. Consistency indicates that the data is standard and not missing. In an optional implementation, when the word of the initial node is a standard word in the symptom information and/or the patient basic information, the initial node is used as the starting point of the knowledge path to be searched.

If the initial node is determined to be inconsistent with the preset standard words, the data is missing or the term is non-standard. In an optional implementation, the similarity between each preset standard word and the initial node is first calculated; the preset standard words whose similarity to the initial node exceeds a threshold are then found; and a preset standard word whose similarity exceeds the threshold is used as the starting point of the knowledge path to be searched. For example, standard symptom words are used as initial nodes of the knowledge path: a number of standard symptom words are preset, and the set of these words serves as the preset standard words. When a symptom is labeled and no corresponding preset standard word is found, the similarity between the labeled instance layer and attribute layer features of the initial node and the preset standard words is calculated, and the preset standard word whose similarity exceeds the threshold and is the highest is used as the initial node for knowledge path finding.
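The start-node normalization described above can be sketched as follows. This is a minimal illustration only: the patent does not state which similarity measure or threshold is used, so the character-level ratio from Python's `difflib`, the threshold value 0.5, the word list `STANDARD_WORDS`, and the function names are all assumptions introduced here.

```python
from difflib import SequenceMatcher

# Hypothetical preset standard words; the patent does not list the real set.
STANDARD_WORDS = ["wiry pulse", "thready pulse", "red tongue", "abdominal pain"]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def resolve_start_node(word, standard_words=STANDARD_WORDS, threshold=0.5):
    """Return the standard word to use as the start node, or None.

    A word that already is a standard word is used directly. Otherwise the
    most similar standard word is used, but only if its similarity exceeds
    the threshold.
    """
    if word in standard_words:
        return word
    best = max(standard_words, key=lambda s: similarity(word, s))
    return best if similarity(word, best) > threshold else None


print(resolve_start_node("wiry"))        # abbreviated record of "wiry pulse"
print(resolve_start_node("red tongue"))  # already standard
```

With these assumed settings, the abbreviated record "wiry" resolves to "wiry pulse", and a word with no sufficiently similar standard word yields `None`, leaving the caller to decide how to handle it.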

Through the steps, the problems of data loss and data irregularity can be solved to the maximum extent, and therefore the analysis efficiency of the traditional Chinese medicine medical record data is improved.

In an optional implementation manner, a plurality of knowledge paths are obtained by performing path finding between an initial node and an end point through a path obtaining function in combination with a greedy algorithm, where the path obtaining function is a function that increases the path length between the start point and the end point by a preset step length to obtain a path, and the step length is the path length between the start point and the end point that is increased each time in the path finding process. For example, the start node and the end node of the knowledge path are first specified, and the path length between the start point and the end point is gradually increased by a certain step h. In order to make the path length grow uniformly, and thus facilitate obtaining each possible knowledge path, the value of h is set to 1. When the path length between two nodes exceeds 6, the association relationship between the two nodes becomes very weak, so that the upper limit k of the path length is set to 6.
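The length-incremental search above can be sketched as follows, assuming the knowledge graph is available as a plain adjacency dictionary. The step h = 1 and the upper limit k = 6 come from the text; the graph contents, the node names "syndrome A" and "cause B", and the function names are hypothetical, and the simple depth-first enumeration stands in for whatever query mechanism the embodiment actually uses.

```python
def paths_of_length(graph, start, end, length):
    """Return all simple paths from start to end with exactly `length` edges."""
    found = []

    def extend(path):
        if len(path) - 1 == length:
            if path[-1] == end:
                found.append(list(path))
            return
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return found


def find_knowledge_paths(graph, start, end, h=1, k=6):
    """Grow the path length from h up to the upper limit k in steps of h."""
    results = []
    for length in range(h, k + 1, h):
        results.extend(paths_of_length(graph, start, end, length))
    return results


# Hypothetical toy graph: node -> list of directly related nodes.
toy_graph = {
    "abdominal pain": ["syndrome A", "cause B"],
    "cause B": ["syndrome A"],
    "syndrome A": ["yang"],
}

for p in find_knowledge_paths(toy_graph, "abdominal pain", "yang"):
    print(" -> ".join(p))
```

Because lengths are tried in increasing order, shorter (more strongly associated) paths appear first in the result list.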

According to the description of symptoms in the basic theory of TCM, the concept features of TCM symptoms can be divided in advance into five types of words: etiology, pathogenesis, disease nature, syndrome and meridian points. To obtain the concept layer features of symptoms through the greedy algorithm and the path acquisition function, in an optional implementation, preset intermediate nodes between the initial node and the end point are first acquired, wherein the preset intermediate nodes are words of the preset types, the preset types are etiology, pathogenesis, disease nature, syndrome and meridian points, the initial node, the end point and the preset intermediate nodes form a path, and the number of preset intermediate nodes is the preset path length minus one; next, it is determined whether the path contains preset intermediate nodes of the preset types; then, a path containing preset intermediate nodes of the preset types is taken as a knowledge path.

Through the above steps, all concept features related to each initial node (such as a symptom word) can be fully mined. According to TCM theory, the mined concept features form paths containing etiology, pathogenesis, disease nature, syndrome and meridian points. Each path contains related etiology, pathogenesis, disease nature, syndrome and meridian point nodes, and the plurality of paths form a set covering the different etiologies, pathogeneses, disease natures, syndromes and meridian points related to the initial node. In other words, medical case data matching the initially input symptom words can be found across multiple semantic layers, which greatly improves the efficiency and accuracy of data analysis and facilitates case reasoning.

If the path does not contain preset intermediate nodes of all the preset types, in an optional implementation, the path length between the starting point and the end point continues to be increased by the preset step length until the preset path length is reached, wherein the preset path length is the number of preset intermediate nodes; the preset intermediate nodes between the initial node and the end point are acquired; and a path containing preset intermediate nodes between the initial node and the end point is taken as a knowledge path. For example, following the ideas of the Cypher query language and the greedy algorithm, the path acquisition function can determine, for the knowledge path length specified in each round, whether the acquired paths contain intermediate nodes of all five types, namely etiology, pathogenesis, disease nature, syndrome and meridian points. If not all of them can be obtained, the path length is increased by the step length until nodes of all five types can be obtained.
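The stopping rule described above might be sketched as follows. This is one possible reading, not the embodiment itself: it assumes the five types may be covered jointly by all paths found so far (rather than by a single path), and that paths have already been grouped by length. The names `REQUIRED_TYPES`, `grow_until_covered`, `paths_by_length` and `node_type`, as well as the placeholder node labels, are hypothetical.

```python
REQUIRED_TYPES = {
    "etiology", "pathogenesis", "disease nature", "syndrome", "meridian points",
}


def grow_until_covered(paths_by_length, node_type, step=1, max_length=6):
    """Increase the path length until all five node types have been seen.

    paths_by_length maps a path length to the paths of that length, and
    node_type maps a node to its type. Returns (length, kept_paths), where
    length is None if max_length is reached without full coverage.
    """
    covered, kept = set(), []
    for length in range(step, max_length + 1, step):
        for path in paths_by_length.get(length, []):
            # Only intermediate nodes count; drop the start and end nodes.
            types = {node_type[n] for n in path[1:-1] if n in node_type}
            hits = types & REQUIRED_TYPES
            if hits:
                kept.append(path)
                covered |= hits
        if covered == REQUIRED_TYPES:
            return length, kept
    return None, kept


# Hypothetical data: c1..c5 stand for nodes of the five preset types.
node_type = {
    "c1": "etiology", "c2": "pathogenesis", "c3": "disease nature",
    "c4": "syndrome", "c5": "meridian points",
}
paths_by_length = {
    2: [["s", "c1", "yang"]],
    3: [["s", "c2", "c3", "yang"]],
    4: [["s", "c4", "c5", "c1", "yang"]],
}
length, kept = grow_until_covered(paths_by_length, node_type)
print(length, len(kept))
```

In this toy data, the search stops at the first length for which nodes of all five types have been collected, so longer and more weakly associated paths are never requested.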

The above process is illustrated below by an alternative embodiment:

The concept features of a symptom lie on specific knowledge paths, so acquiring concept layer features requires acquiring knowledge paths. The preset intermediate nodes act as knowledge path templates, and the five preset types of symptom concept features, namely etiology, pathogenesis, disease nature, syndrome and meridian points, each correspond to knowledge path templates. Based on these, preliminarily simplified concept features can be obtained. For each knowledge path pattern, a number of corresponding path instances can be expanded in the knowledge graph. As shown in FIG. 3, the start node is the symptom "abdominal pain", the end node is "yang", and the knowledge path pattern is "syndrome relation-disease location relation-sub-concept", so the knowledge path set shown in FIG. 3 can be obtained from the knowledge graph. That is, 11 path instances are expanded under the specific knowledge path pattern "syndrome relation-disease location relation-sub-concept", and these 11 paths contain 5 kinds of syndrome information.

As another example, path finding between the initial node and the end point with the greedy algorithm yields a plurality of knowledge paths, among which 23 knowledge paths that start from a symptom are related to syndromes. If the syndromes expanded by these 23 knowledge paths were kept without further processing, a large syndrome feature set would result, part of which consists of redundant features unrelated to retrieval, so a reduction strategy is needed to further simplify the set into the final knowledge paths to be searched. In an optional implementation, the plurality of knowledge paths are screened through feature optimization: a score is calculated for each knowledge path, the knowledge paths are prioritized according to their scores, and the knowledge paths with high priority are taken as the knowledge paths to be searched, where a high priority means a high score. The score is calculated according to formula (1):

where S_p denotes the ranking score of a path P; E_q is the set of query entities, E_q = {e₁, e₂, …, e_i}, in which e denotes a node; p is a relation path; and h_{E_q,p}(e) denotes the probability that the start node reaches the second node in one step. h_{E_q,p}(e) is calculated according to formula (2):

C_p is calculated according to formula (3):

C_p denotes the importance of the path P formed by nodes e and e', and C_e denotes the importance of a node, which is calculated according to formula (4):

where Degree is the node degree and ClusterCoefficient is the clustering coefficient. To balance the importance of the node degree and the clustering coefficient, α is set to 0.5.
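As an illustration of how a node importance value could combine node degree and clustering coefficient with α = 0.5, the sketch below assumes a weighted sum, α · Degree + (1 − α) · ClusterCoefficient. Formula (4) itself is not reproduced in this text, so this form is an assumption rather than the patent's definition; the unnormalized degree, the undirected adjacency structure `adj`, and the function names are likewise hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count pairs of neighbors that are themselves connected.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def node_importance(adj, node, alpha=0.5):
    """Assumed form: alpha * degree + (1 - alpha) * clustering coefficient."""
    return alpha * len(adj[node]) + (1 - alpha) * clustering_coefficient(adj, node)


# Hypothetical undirected graph: node -> set of neighbors.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for n in sorted(adj):
    print(n, round(node_importance(adj, n), 4))
```

In practice the degree would probably need normalizing so that the two terms are on comparable scales; the text does not say whether this is done.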

Through a PRA-based feature optimization strategy, the above steps retain the important parts of the concept features and delete concept features with lower scores.

For example, statistical analysis of the path score ranking results with a symptom as the start point and yin or yang as the end point shows that knowledge paths ranked sixth or lower contain too many intermediate nodes, so the ranking threshold K is set to 5.
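The ranking-threshold step can be sketched as a simple top-K selection. The value K = 5 comes from the text; the path names and scores below are made up for illustration, and `select_top_paths` is a hypothetical helper name.

```python
def select_top_paths(scored_paths, k=5):
    """Keep the k highest-scoring knowledge paths, best first.

    scored_paths is a list of (path, score) pairs.
    """
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)
    return [path for path, _ in ranked[:k]]


# Hypothetical scores for seven candidate knowledge paths.
candidates = [
    ("path-1", 0.42), ("path-2", 0.91), ("path-3", 0.10), ("path-4", 0.77),
    ("path-5", 0.63), ("path-6", 0.05), ("path-7", 0.58),
]
print(select_top_paths(candidates))
```

Paths ranked sixth and lower are dropped, matching the stated observation that they contain too many intermediate nodes.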

The concept features corresponding to these paths are used in the subsequent case base construction and case retrieval stages. As shown in FIG. 4, after preliminary acquisition and feature optimization, the concept layer features of the medical case whose id is 125 are finally stored in the database.

The entire process is described below in conjunction with fig. 5 according to an alternative embodiment:

This embodiment divides semantic features into three layers: instance layer features, attribute layer features, and concept layer features. The instance layer and attribute layer features are the first two layers of the multi-layer semantic features; in essence they give a preliminary detailed description of each symptom and can be obtained directly or indirectly from the symptom words. The concept layer features are implicit semantic information; they generally cannot be obtained from the symptom data itself and require special means. In general, concepts and instances are in a many-to-many relationship, that is, a concept may contain multiple instances and an instance may belong to multiple concepts; instances and attributes are in a one-to-many relationship, that is, an instance may contain several attributes. The three are defined as follows.

Definition 1 (Instance). Let R denote the syndrome differentiation information of a medical case, and let S denote the set consisting of the patient basic information and the symptom words in that information. Syndrome differentiation information consisting of m words can be expressed as R = {s₁, s₂, …, s_m}, where s_k ∈ S, k ∈ [1, m]. If a word s_k denotes a specific symptom or an item of patient basic information, s_k is called an Instance. Correspondingly, the set composed of the s_k is called an Instance Set. For example, "abdominal pain" is a symptom instance, and {diarrhea, hematochezia, abdominal pain, poor sleep, dark red, thin white, thready pulse, wiry pulse} is a symptom instance set in which each element is an instance.

Definition 2 (Attribute). Let I denote an instance, and let there be a set D = {a₁, a₂, …, a_m}, where a_k ∈ I, k ∈ [1, m]. Then a_k is called an Attribute of instance I, and the set D is called the Attribute Set of instance I. The attributes of an instance can be derived directly or indirectly from the instance. For example, the attributes of the symptom instance "tongue quality is pale red" include {tongue quality, pale, red, pale red}, and this set is called the attribute set of the symptom instance "tongue quality is pale red".

Definition 3 (Concept). Let R denote the syndrome differentiation information of a medical case, and let S denote the set consisting of the patient basic information and the symptom words in that information. Syndrome differentiation information consisting of m words can be expressed as R = {s₁, s₂, …, s_m}, where s_k ∈ S, k ∈ [1, m], and s_i is an instance in the syndrome differentiation information. If some c_k is a category to which s_i belongs or an implicit feature associated with it, c_k is called a Concept of instance s_i. Correspondingly, the set composed of several c_k is called a Concept Set. The syndromes of abdominal pain, for instance, are implicit features of the instance "abdominal pain". As a further example, "red tongue" may belong to several syndromes, such as "intestinal dryness and fluid deficiency syndrome", "small intestine excess heat syndrome", "dampness-heat in spleen", "liver and gallbladder dampness-heat syndrome" and "gallbladder stagnation and phlegm disturbance syndrome", and the corresponding syndrome set Z = {intestinal dryness and fluid deficiency syndrome, small intestine excess heat syndrome, dampness-heat in spleen, liver and gallbladder dampness-heat syndrome, gallbladder stagnation and phlegm disturbance syndrome} is a concept set of the symptom "red tongue".
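The three definitions can be mirrored in a small data structure. The sketch below is illustrative only: the class `SymptomInstance`, the helper `instances_by_concept`, and the placeholder instance "symptom X" are hypothetical, while the attribute set and the two syndrome names are taken from the examples in the definitions.

```python
from dataclasses import dataclass, field


@dataclass
class SymptomInstance:
    """One instance (Definition 1) with its attribute and concept sets."""
    word: str
    attributes: set = field(default_factory=set)  # Definition 2, one-to-many
    concepts: set = field(default_factory=set)    # Definition 3, many-to-many


def instances_by_concept(instances):
    """Invert the instance -> concepts mapping into concept -> instances."""
    index = {}
    for inst in instances:
        for concept in inst.concepts:
            index.setdefault(concept, set()).add(inst.word)
    return index


pale_red = SymptomInstance(
    "tongue quality is pale red",
    attributes={"tongue quality", "pale", "red", "pale red"},
)
red_tongue = SymptomInstance(
    "red tongue",
    concepts={
        "intestinal dryness and fluid deficiency syndrome",
        "dampness-heat in spleen",
    },
)
# "symptom X" is a placeholder, used only to show that one concept can
# contain several instances; it makes no medical claim.
symptom_x = SymptomInstance("symptom X", concepts={"dampness-heat in spleen"})

index = instances_by_concept([pale_red, red_tongue, symptom_x])
print(sorted(index["dampness-heat in spleen"]))
```

Keeping attributes inside the instance reflects the one-to-many relation, while the inverted index makes the many-to-many relation between concepts and instances explicit.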

The instance layer and attribute layer features describe the basic information of a symptom; in essence, they are a detailed description of the symptom. Acquiring them gives traditional Chinese medicine symptoms a standardized, structured representation, so that the symptoms achieve a preliminary extension at the semantic level, and it also lays a foundation for obtaining the concept layer features. The acquisition method is an automatic feature acquisition method based on a traditional Chinese medicine symptom label system (for the construction of this label system, refer to the patent with the patent number 201611235453.8). After the extension based on the symptom label system, the symptoms in the medical record are characterized by a concept layer and an attribute layer.

The multi-layer semantic features of this embodiment comprise three layers: an instance layer, an attribute layer and a concept layer. Acquiring the concept layer features is comparatively complex and can be realized with a dedicated domain knowledge base. This embodiment acquires concept layer features along paths of the knowledge graph: by analyzing the position of a given entity in the knowledge graph and which concepts stand in a semantic relation to it, it determines which concepts should serve as extended semantic features of that entity. The knowledge graph is a complex semantic network based on a graph model, with complex semantic relations between its nodes. In theory, without any restriction, a given entity is likely to expand into a very large number of semantic features within the knowledge graph. This easily leads to the following consequences: 1) too many features readily cause the curse of dimensionality; 2) the original features may contain a large number of redundant features; 3) in various retrieval applications, computing the feature similarity between entities (symptom words) is an important way of measuring entity similarity, yet the original feature set contains many features with low feature weights, which contribute little to the similarity computation while increasing the time complexity of the retrieval system.

In order to obtain the concept layer features of a symptom, the core steps of this embodiment are: step one, acquiring knowledge paths between nodes; step two, acquiring concept layer features based on the knowledge paths; and step three, further optimizing and selecting the concept layer features. The specific steps are as follows:

Step one: After the extension based on the symptom label system, the symptoms in the medical record are characterized by a concept layer and an attribute layer. If no corresponding standard symptom word is found for a symptom after labeling, the similarity to the existing standard words is calculated from the marked instance layer and attribute layer feature labels, and knowledge path finding is performed with the standard symptom word whose similarity exceeds a threshold and is the highest.
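The fallback in step one can be sketched as follows. The text does not name the similarity measure, so Jaccard similarity over the label sets is used here purely as an assumption; the function names, the threshold value of 0.5 and the labels given to "abdominal pain" are likewise hypothetical.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two label sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_standard_word(
    labels: Set[str],
    standard_words: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the standard word whose similarity exceeds the threshold and is highest."""
    best_word, best_sim = None, threshold
    for word, word_labels in standard_words.items():
        sim = jaccard(labels, word_labels)
        if sim > best_sim:  # must beat both the threshold and the best so far
            best_word, best_sim = word, sim
    return best_word


# Labels of a non-standard symptom after instance/attribute layer labeling.
labels = {"tongue quality", "pale", "red"}
standard_words = {
    "tongue quality is pale red": {"tongue quality", "pale", "red", "pale red"},
    "abdominal pain": {"abdomen", "pain"},  # hypothetical labels
}
start_node = match_standard_word(labels, standard_words)
print(start_node)
```

If no standard word clears the threshold, the function returns `None`, which corresponds to the case where no starting point for path finding can be determined.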

Step two: The knowledge graph is a semantic network in which the relationships between nodes are complex. Even when the start point and the end point of a knowledge path are known, it is difficult to obtain the intermediate nodes and the relationships between them from prior knowledge. To solve this path acquisition problem, a knowledge path acquisition strategy based on a greedy algorithm is provided. As the name implies, a greedy algorithm always makes the choice that seems best at the current moment when solving a problem. It has no fixed algorithmic framework; its core idea is to select an optimal greedy strategy. For the knowledge path finding problem of this embodiment, the idea of the greedy algorithm can be used to help find the knowledge paths between nodes.
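A minimal sketch of one possible greedy strategy is given below. It assumes a weighted graph in which each edge carries a relevance weight, and at every hop it moves to the unvisited neighbor with the largest weight. The weights, the placeholder node names (`etiology_1`, `syndrome_1`, and so on), the `max_hops` limit and the function name are all hypothetical; the embodiment does not fix a particular greedy strategy here.

```python
from typing import Dict, List, Optional

# Hypothetical weighted knowledge graph: node -> {neighbor: edge weight}.
Graph = Dict[str, Dict[str, float]]


def greedy_path(graph: Graph, start: str, end: str, max_hops: int = 6) -> Optional[List[str]]:
    """Walk from start toward end, always taking the locally best edge."""
    path = [start]
    visited = {start}
    current = start
    while current != end and len(path) - 1 < max_hops:
        candidates = {
            node: weight
            for node, weight in graph.get(current, {}).items()
            if node not in visited
        }
        if not candidates:
            return None  # dead end: the greedy choice does not backtrack
        # Take the end point if it is adjacent, otherwise the heaviest edge.
        nxt = end if end in candidates else max(candidates, key=candidates.get)
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path if current == end else None


graph: Graph = {
    "abdominal pain": {"etiology_1": 0.8, "etiology_2": 0.6},
    "etiology_1": {"syndrome_1": 0.9},
    "etiology_2": {"syndrome_2": 0.7},
    "syndrome_1": {"yin": 1.0},
    "syndrome_2": {"yang": 1.0},
}
print(greedy_path(graph, "abdominal pain", "yin"))
print(greedy_path(graph, "abdominal pain", "yang"))
```

The second call returns `None` even though a path to "yang" exists, because the locally best first hop leads away from it. This is the usual trade-off of a greedy search: it is cheap, but it is not guaranteed to find every existing path.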

Step three: Based on the knowledge path template from step two, a rough concept layer feature set is obtained using the most basic concept layer feature acquisition method, which is to expand each concept feature along all of its knowledge paths.

Step four: Based on the acquired knowledge paths, this embodiment obtains a rough concept layer feature set for the symptoms. It is called rough because each symptom still expands into many concept layer features, some of which are irrelevant to the practical application. To further optimize the concept layer feature set, a feature optimization method based on PRA is provided, so that the concept layer feature set expanded from the symptom data of each case is cleaner. PRA (Path Ranking Algorithm) can be regarded as a modified version of the Random Walk Algorithm (RWA): it is equivalent to a random walk along a sequence of edges carrying specific type information, that is, an RWA whose walk path is restricted.
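The restricted random walk underlying PRA can be sketched as follows. The code computes the generic path-constrained random walk distribution: starting from one node, probability mass is pushed only along edges of the relation type demanded at each step and is split evenly among the targets. It is not the scoring formula of this patent, which is not reproduced in this passage; the relation names, node names and graph layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical typed graph: (node, relation type) -> list of target nodes.
TypedGraph = Dict[Tuple[str, str], List[str]]


def path_constrained_walk(graph: TypedGraph, start: str, relations: List[str]) -> Dict[str, float]:
    """Probability of reaching each node by a walk that follows `relations` in order."""
    dist = {start: 1.0}
    for rel in relations:
        nxt: Dict[str, float] = defaultdict(float)
        for node, prob in dist.items():
            targets = graph.get((node, rel), [])
            for target in targets:
                # Uniform choice among the edges of the required type.
                nxt[target] += prob / len(targets)
        dist = dict(nxt)
    return dist


typed_graph: TypedGraph = {
    ("abdominal pain", "has_syndrome"): ["syndrome_1", "syndrome_2"],
    ("syndrome_1", "has_nature"): ["yin"],
    ("syndrome_2", "has_nature"): ["yin", "yang"],
}
reach = path_constrained_walk(typed_graph, "abdominal pain", ["has_syndrome", "has_nature"])
print(reach)
```

Nodes that cannot be reached along the required relation sequence receive no probability at all, which is how restricting the walk path prunes concept layer features that are irrelevant to a given path type.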

This embodiment addresses problems of the symptom data in traditional Chinese medicine cases, such as missing data, non-standard terminology and short texts. It performs multi-level feature expansion on the symptoms in a case using multi-layer semantic features and automatically obtains concept layer features (knowledge paths) based on a knowledge graph, thereby improving the efficiency and accuracy of analyzing symptom data in traditional Chinese medicine cases.

An embodiment of the present invention provides a knowledge path acquisition system. Fig. 6 shows a knowledge path acquisition system according to an embodiment of the present invention. As shown in Fig. 6, the system includes:

an obtaining unit 62, configured to obtain an initial node of a knowledge path to be searched, where the initial node is symptom information and/or patient basic information, the knowledge path is formed by multiple nodes, and the node is a conceptual layer feature associated with the symptom information and/or the patient basic information;

a determining unit 64, configured to determine an end point of the knowledge path to be searched, where the end point is the concept layer feature yin or yang obtained by path finding according to the symptom information and/or the patient basic information;

a searching unit 66, configured to perform path finding between the initial node and the destination to obtain a plurality of knowledge paths;

and a screening unit 68, configured to screen the plurality of knowledge paths to obtain a predetermined number of knowledge paths to be searched.
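The job of the screening unit, keeping a predetermined number of the best paths, can be sketched as a top-k selection. The scores are assumed to have been computed beforehand; the numbers below are placeholders rather than results of the scoring formula of this patent, and the names `ScoredPath` and `screen_paths` are hypothetical.

```python
import heapq
from typing import List, Tuple

# (score, node sequence); a hypothetical layout for a scored knowledge path.
ScoredPath = Tuple[float, List[str]]


def screen_paths(scored_paths: List[ScoredPath], k: int) -> List[List[str]]:
    """Keep the k highest-scoring knowledge paths, best first."""
    best = heapq.nlargest(k, scored_paths, key=lambda item: item[0])
    return [path for _, path in best]


candidates: List[ScoredPath] = [
    (0.25, ["abdominal pain", "syndrome_2", "yang"]),
    (0.75, ["abdominal pain", "syndrome_1", "yin"]),
    (0.10, ["abdominal pain", "etiology_1", "syndrome_2", "yang"]),
]
top_two = screen_paths(candidates, k=2)
print(top_two)
```

Using `heapq.nlargest` avoids sorting the full candidate list when only a few paths are kept.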

The embodiment of the system for acquiring the knowledge path corresponds to the method for acquiring the knowledge path, so the beneficial effects are not described again.

The embodiment of the invention provides a storage medium, which comprises a stored program, wherein when the program runs, a device on which the storage medium is positioned is controlled to execute the method.

The embodiment of the invention provides a processor, which comprises a processing program, wherein when the program runs, a device where the processor is located is controlled to execute the method.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A knowledge path acquisition method is characterized by comprising the following steps:

acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information;

the method for acquiring the initial node of the knowledge path to be searched comprises the following steps:

if the initial node is judged to be consistent with a preset standard word, the initial node is used as a starting point of a path to be searched, wherein the preset standard word is a standardized word in symptom information and/or patient basic information;

and under the condition of judging that the initial node is inconsistent with a preset standard word, the method comprises the following steps:

calculating the similarity between the preset standard words and the initial nodes;

searching a preset standard word with the similarity to the initial node exceeding a threshold value;

taking a preset standard word with the similarity exceeding a threshold value as a starting point of a knowledge path to be searched;

determining an end point of a path to be searched, wherein the end point is a concept layer characteristic yin or yang obtained by searching the path according to symptom information and/or patient basic information;

carrying out routing between the initial node and the end point through a greedy algorithm to obtain a plurality of knowledge paths;

screening a plurality of knowledge paths through feature optimization to obtain a predetermined number of knowledge paths to be searched; the method comprises the following steps:

calculating a score of each knowledge path in the plurality of knowledge paths;

prioritizing the knowledge paths according to the scores;

taking a knowledge path with high priority as a knowledge path to be searched, wherein the priority is determined by the score calculated for each knowledge path in the plurality of knowledge paths;

calculating the score according to the following calculation formula (1):

C_p is calculated according to equation (3):

C_p represents the importance of the path P formed by the nodes e and e'; c_e represents the degree of importance of a node, which is calculated according to equation (4):

2. The method of claim 1, wherein obtaining a plurality of knowledge paths by routing between the initial node and the end point using a greedy algorithm comprises:

and performing path searching between the initial node and the end point by combining a path acquisition function and a greedy algorithm to obtain a plurality of knowledge paths, wherein the path acquisition function is used for increasing the path length between the starting point and the end point by a preset step length to acquire the path, and the step length is the path length between the starting point and the end point which is increased each time in the path searching process.

3. The method of claim 2, wherein obtaining a plurality of knowledge paths by routing between the initial node and the end point via the greedy algorithm and the path acquisition function comprises:

acquiring preset intermediate nodes between the initial nodes and the end points, wherein the preset intermediate nodes are words of preset types, the preset types are etiology, pathogenesis, disease nature, syndrome and meridian points respectively, the initial nodes, the end points and the preset intermediate nodes form paths, and the number of the preset intermediate nodes is the length of the preset path minus one;

judging that the path contains all preset intermediate nodes of preset types;

and taking the path containing all the preset intermediate nodes of the preset types as a knowledge path.

4. The method according to claim 3, wherein in case that no preset intermediate nodes of all preset types are included in the path, the method comprises:

continuing to increase the path length between the starting point and the end point by a preset step length until the preset path length is reached, wherein the preset path length is obtained by adding one to the number of preset intermediate nodes;

acquiring a preset intermediate node between the initial node and the terminal;

and taking a path containing a preset intermediate node between the initial node and the end point as a knowledge path.

5. A knowledge path acquisition system, comprising:

an acquisition unit, which is used for acquiring an initial node of a knowledge path to be searched, wherein the initial node is symptom information and/or patient basic information, the knowledge path is composed of a plurality of nodes, and the nodes are concept layer features associated with the symptom information and/or the patient basic information;

the obtaining unit is specifically configured to:

the determining unit is used for determining an end point of a path to be searched, wherein the end point is a concept layer feature yin or yang obtained by path finding according to symptom information and/or patient basic information;

the searching unit is used for searching paths between the initial node and the terminal point to obtain a plurality of knowledge paths;

the screening unit is used for screening the plurality of knowledge paths to obtain a predetermined number of to-be-searched knowledge paths;

the screening unit is specifically configured to:

calculating a score of each knowledge path in the plurality of knowledge paths;

prioritizing the knowledge paths according to the scores;

calculating the score according to the following calculation formula (1):

C_p is calculated according to equation (3):

6. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 4.

7. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 4.