CN116484017A - Knowledge graph retrieval method and system based on rule reasoning - Google Patents


Publication number
CN116484017A
Authority
CN
China
Prior art keywords
rule
knowledge
condition
entity
attribute
Prior art date
Legal status
Pending
Application number
CN202310336741.6A
Other languages
Chinese (zh)
Inventor
陈细平
邓荣平
郑圳洺
魏倩
姚家渭
Current Assignee
Hangzhou Half Cloud Technology Co ltd
Original Assignee
Hangzhou Half Cloud Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Half Cloud Technology Co ltd
Priority to CN202310336741.6A
Publication of CN116484017A
Legal status: Pending (current)


Classifications

    • G06F16/367 Ontology
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/279 Recognition of textual entities
    • G06N3/08 Learning methods
    • G06N5/04 Inference or reasoning models
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a knowledge graph retrieval method and system based on rule reasoning. It belongs to the technical field of knowledge graphs and addresses the low accuracy of existing retrieval results. The method comprises: acquiring a user question, extracting its keywords, and obtaining the subclass corresponding to the question with a text classification model; and, based on the relations of the knowledge content in the knowledge graph, retrieving the related knowledge content according to the keywords and the subclass to obtain a search result. The relations of the knowledge content in the knowledge graph are updated periodically by: extracting hidden keywords of the knowledge content with a keyword prediction model and establishing relations between the knowledge content and those keywords; and/or constructing inference tasks, configuring condition rules and a result rule for each inference task from condition rule templates and result rule templates, running the inference tasks that pass verification, and establishing or updating the relations between the knowledge content and the keywords and/or subclasses. This improves the accuracy of search results and user satisfaction.

Description

Knowledge graph retrieval method and system based on rule reasoning
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a knowledge graph retrieval method and system based on rule reasoning.
Background
With the rapid development of science and technology, every industry has generated enormous amounts of data, most of it stored in relational databases. However, complex relational queries and relational analysis hit a bottleneck: single-table and multi-table join queries over large data volumes markedly reduce query efficiency. Graph database storage technology emerged to store and analyze such relational data better and to make full use of unstructured data, and knowledge graphs built on graph databases are increasingly widely applied.
Knowledge graph retrieval aims to find accurate results in an existing knowledge graph quickly. Existing search focuses on matching keywords against the knowledge graph and simply outputs the matching entries of the knowledge base. Keywords, however, often fail to reflect the intent of the user's question, and such search neither really understands what result the user needs nor checks whether the returned results meet that need. Moreover, the knowledge graph is updated only when the business data changes, so it lacks deeper or multi-dimensional information: for example, some knowledge content does not mention a keyword explicitly but is related to it, so it cannot be found through association, and the search results are inaccurate.
Therefore, the prior art lacks accurate matching between search conditions and search results, ignores the hidden entity and relation information in user questions, misses a large amount of valuable data, and does not present search results in the order closest to the user's semantic intent.
Disclosure of Invention
In view of the above analysis, embodiments of the invention aim to provide a knowledge graph retrieval method and system based on rule reasoning that solve the problem of low retrieval accuracy caused by the lack of deeper information in the prior art.
In one aspect, an embodiment of the invention provides a knowledge graph retrieval method based on rule reasoning, comprising the following steps:
acquiring a user question, extracting the keywords in the user question, and obtaining the subclass corresponding to the user question with a text classification model; and, based on the relations of the knowledge content in the knowledge graph, retrieving the related knowledge content according to the keywords and the subclass to obtain a search result;
periodically updating the relations of the knowledge content in the knowledge graph, comprising:
extracting hidden keywords of the knowledge content with a keyword prediction model, and establishing relations between the knowledge content and the keywords;
and/or constructing inference tasks, configuring condition rules and a result rule for each inference task with the condition rule templates and the result rule templates, running the inference tasks that pass verification, and establishing or updating the relations between the knowledge content and the keywords and/or the subclasses.
As a further improvement of the above method, obtaining the subclass corresponding to the user question with the text classification model comprises:
constructing a first sample set from historical user questions and their corresponding subclasses, and training a TextCNN text classification model with the first sample set; then segmenting and vectorizing the acquired user question and feeding it into the TextCNN text classification model to obtain the corresponding subclass.
As a further improvement of the above method, retrieving the related knowledge content according to the keywords and the subclass based on the relations of the knowledge content in the knowledge graph, and sorting it to obtain the search result, comprises:
based on the relations between the knowledge content and the keywords and subclasses in the knowledge graph, retrieving, according to the keywords, the associated knowledge content, its relation weights and the subclasses it belongs to, to obtain a search result; and sorting the search results by subclass, keyword frequency and relation weight, in that order.
As a further improvement of the above method, the keyword prediction model is obtained as follows: hidden keywords are manually labeled on the knowledge content in the knowledge graph as sample labels, and a second sample set is constructed after word segmentation of the knowledge content; a feature sequence model is then built and trained on the second sample set to obtain the keyword prediction model.
As a further improvement of the above method, the condition rule templates comprise: entity A has relation C with entity B; attribute B of entity A satisfies condition C; attribute B of relation A satisfies condition C; the attribute of entity A is defined as variable B; the attribute of relation A is defined as variable B; and variable A and variable B satisfy condition C. The result rule templates comprise: add relation C between entity A and entity B; set attribute B of entity A to C; and set attribute B of relation A to C.
As a further improvement of the above method, configuring condition rules and a result rule for each inference task based on the knowledge graph with the condition rule templates and the result rule templates comprises: selecting entities, relations, entity attributes and relation attributes in the knowledge graph, and configuring one or more condition rules and exactly one result rule; within one inference task, each selected entity, relation, entity attribute and relation attribute is identified by a unique variable name; the conditions in the condition rule templates consist of an operator and a condition value; and multiple condition rules are combined with logical AND.
As a further improvement of the above method, an inference task passes verification when the variable names in its condition rules and result rule are valid, the condition values match the attribute types, and the attribute values are valid.
As a further improvement of the above method, valid variable names means: no variable name contains Chinese characters or symbols; each variable name corresponds to only one entity, relation or attribute; and every variable name defined by a condition rule configured from the template "define the attribute of entity A as variable B" and/or "define the attribute of relation A as variable B" appears in a condition rule configured from the template "variable A and variable B satisfy condition C";
matching the condition value with the attribute type means: the condition value is not NULL or an empty string, and it matches the type of the attribute of the corresponding entity or relation;
valid attribute values means: the attribute value is not empty and exists on the corresponding entity or relation.
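The verification rules above can be sketched as a function that returns a list of error messages. This is a minimal illustration under assumptions: the schema, the attribute names and the reading of "no Chinese and no symbols" as "ASCII letters and digits only" are hypothetical, and the sketch checks the template-6 rule in both directions (every defined attribute variable is compared, as stated above, and, as an added safeguard, every compared variable is defined).

```python
import re

VAR_NAME = re.compile(r"[A-Za-z0-9]+")  # assumption: "no Chinese, no symbols" = ASCII letters/digits

# Hypothetical schema: entity or relation name -> {attribute name: Python type}
SCHEMA = {
    "drug": {"price": float},
    "keyword": {"available": bool},
    "contain": {"weighted_value": int},
}


def type_ok(value, expected):
    if isinstance(value, bool) or expected is bool:  # bool is a subclass of int, so treat it apart
        return isinstance(value, bool) and expected is bool
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_task(bindings, attr_conditions, comparisons):
    """bindings: (variable, (element, attribute or None)) pairs configured in the task;
    attr_conditions: (variable, attribute, operator, value) from templates 2 and 3;
    comparisons: (variable A, operator, variable B) from template 6. Returns error messages."""
    errors, targets = [], {}
    for name, target in bindings:
        if not VAR_NAME.fullmatch(name):
            errors.append(f"invalid variable name {name!r}")
        if targets.setdefault(name, target) != target:
            errors.append(f"variable {name!r} refers to more than one element")
        element, attr = target
        if attr is not None and attr not in SCHEMA.get(element, {}):
            errors.append(f"{element!r} has no attribute {attr!r}")
    for name, attr, _op, value in attr_conditions:
        element = targets.get(name, (None, None))[0]
        expected = SCHEMA.get(element, {}).get(attr)
        if expected is None:
            errors.append(f"attribute {attr!r} does not exist on {element!r}")
        elif value is None or value == "":
            errors.append(f"empty condition value for {name}.{attr}")
        elif not type_ok(value, expected):
            errors.append(f"value {value!r} does not match the type of {name}.{attr}")
    compared = {name for left, _op, right in comparisons for name in (left, right)}
    for name in compared:  # safeguard: compared variables must come from template 4 or 5
        if targets.get(name, (None, None))[1] is None:
            errors.append(f"{name!r} is not defined by template 4 or 5")
    for name, (_element, attr) in targets.items():  # rule from the text: defined variables are compared
        if attr is not None and name not in compared:
            errors.append(f"attribute variable {name!r} is not used by template 6")
    return errors
```

An empty list means the task may run; collecting all messages rather than stopping at the first lets a configuration page show every problem at once.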
As a further improvement of the above method, running the inference tasks that pass verification to establish or update the relations between the knowledge content and the keywords and/or the subclasses comprises:
converting the condition rules of the inference task into a query statement of the graph database;
converting the result rule of the inference task into a result operation statement of the graph database;
executing the query statement to obtain the query results, and executing the result operation statement on the query results to establish or update the corresponding relations.
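The query-then-write order of this step can be illustrated on a tiny in-memory edge set instead of a real graph database. This is a sketch only: it supports just template-1 conditions and the "add relation" result template, the `is_a` relation between keywords is a hypothetical example, and leaving already existing relations untouched is an assumption.

```python
def match(edges, patterns, binding=None):
    """Yield every variable binding that satisfies all (var_a, relation, var_b) patterns (AND)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    var_a, relation, var_b = patterns[0]
    for src, rel, dst in edges:
        if rel != relation or binding.get(var_a, src) != src or binding.get(var_b, dst) != dst:
            continue
        if var_a == var_b and src != dst:  # one variable cannot bind two different nodes
            continue
        yield from match(edges, patterns[1:], {**binding, var_a: src, var_b: dst})


def run_add_relation_task(edges, patterns, result):
    """Run the query first, then apply an 'add relation C between A and B' result to every row."""
    var_a, relation, var_b, attrs = result
    rows = list(match(list(edges), patterns))  # materialize the query result before writing
    created = 0
    for row in rows:
        key = (row[var_a], relation, row[var_b])
        if key not in edges:  # assumption: existing relations are left untouched
            edges[key] = dict(attrs)
            created += 1
    return created


# edges: (source, relation, target) -> relation attributes
edges = {
    ("c1", "contain", "chronic gastritis"): {"weighted_value": 0},
    ("chronic gastritis", "is_a", "gastritis"): {},  # hypothetical keyword-to-keyword relation
}
patterns = [("nr", "contain", "gjc"), ("gjc", "is_a", "gjc2")]
created = run_add_relation_task(edges, patterns, ("nr", "contain", "gjc2", {"weighted_value": 1}))
```

Materializing the rows before writing keeps newly created relations from feeding back into the same run.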
In another aspect, an embodiment of the invention provides a knowledge graph retrieval system based on rule reasoning, comprising:
a question retrieval module, configured to acquire a user question, extract the keywords in the user question, and obtain the subclass corresponding to the user question with a text classification model; and, based on the relations of the knowledge content in the knowledge graph, retrieve the related knowledge content according to the keywords and the subclass to obtain a search result;
a graph updating module, configured to periodically update the relations of the knowledge content in the knowledge graph by: extracting hidden keywords of the knowledge content with a keyword prediction model and establishing relations between the knowledge content and the keywords; and/or constructing inference tasks, configuring condition rules and a result rule for each inference task with the condition rule templates and the result rule templates, running the inference tasks that pass verification, and establishing or updating the relations between the knowledge content and the keywords and/or the subclasses.
Compared with the prior art, the invention has at least one of the following beneficial effects: the question category is located by deep learning and keywords are identified by NLP, while machine learning and configurable inference tasks mine potential knowledge relations that supplement and update the knowledge graph, improving the accuracy of search results; processing question samples with deep learning subdivides the questions and better identifies the user's intent; a machine learning model predicts the associations between knowledge content and hidden keywords; rule templates allow condition rules and result rules to be combined quickly into inference tasks that mine implicit entity attributes and relations, which is convenient, fast and easy to understand, because the operator only needs business knowledge and neither has to write complex graph database statements nor apply specialist knowledge such as graph algorithms for data analysis; and supplementing the relations of the knowledge base data and recognizing question intent make the knowledge richer and more comprehensive, improving the accuracy of search results and user satisfaction.
In the invention, the above technical solutions can be combined with one another to obtain further preferred combinations. Additional features and advantages of the invention are set forth in the description that follows; some will be apparent from the description or may be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and the drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a flowchart of the knowledge graph retrieval method based on rule reasoning in Example 1 of the present invention;
FIG. 2 is a flowchart of knowledge graph searching and updating in Example 1 of the present invention;
FIG. 3 is a schematic diagram of converting condition rules into operation statements in Example 1 of the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, which form part of this application and, together with the embodiments, serve to explain the principles of the invention; they are not intended to limit its scope.
Example 1
This example discloses a knowledge graph retrieval method based on rule reasoning which, as shown in FIG. 1 and FIG. 2, comprises the following steps:
S11: acquire a user question, extract the keywords in the user question, and obtain the subclass corresponding to the user question with a text classification model; then, based on the relations of the knowledge content in the knowledge graph, retrieve the related knowledge content according to the keywords and the subclass to obtain a search result;
S12: periodically update the relations of the knowledge content in the knowledge graph by: extracting hidden keywords of the knowledge content with a keyword prediction model and establishing relations between the knowledge content and those keywords; and/or constructing inference tasks, configuring condition rules and a result rule for each inference task with the condition rule templates and the result rule templates, running the inference tasks that pass verification, and establishing or updating the relations between the knowledge content and the keywords and/or the subclasses.
In implementation, the question category is located through deep learning, keywords are identified through NLP (Natural Language Processing), the associations between knowledge content and hidden keywords are predicted with a machine learning model, and hidden entity attributes and relations are mined by configuring and executing inference tasks. Together these supplement and update the knowledge graph, make the knowledge richer and more comprehensive, and improve the accuracy of search results and user satisfaction.
It should be noted that the knowledge graph maps the collected structured and unstructured business data into a graph database with a constructed entity model, forming a set of points and edges: the "points" represent entities, the "edges" represent relations between entities, and both entities and relations carry attributes. Furthermore, the knowledge graphs of different teams or projects are isolated from one another by graph spaces.
The entities involved in retrieval in the knowledge graph of this example are: major classes, subclasses, knowledge content and keywords. A major class is divided into several subclasses; a subclass covers many pieces of knowledge content, and a piece of knowledge content may belong to several subclasses; a piece of knowledge content contains several keywords, and a keyword may correspond to many pieces of knowledge content. The keywords are extracted from the knowledge content with the jieba NLP segmentation tool; the relation weight between a directly extracted keyword and its knowledge content is 0, which denotes the highest level.
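The entity model above can be illustrated with a minimal in-memory sketch in Python. It is not the patent's storage layer, which is a graph database; the class `RetrievalGraph`, its method names and the rule that a smaller (stronger) existing weight is never overwritten are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

DIRECT_WEIGHT = 0    # keyword extracted directly from the content (highest level)
INFERRED_WEIGHT = 1  # relation added later by the prediction model or an inference task


@dataclass
class RetrievalGraph:
    """Hypothetical in-memory stand-in for the retrieval entities of the knowledge graph."""
    subclasses_of: dict = field(default_factory=lambda: defaultdict(set))       # major class -> subclasses
    content_subclasses: dict = field(default_factory=lambda: defaultdict(set))  # content -> subclasses
    contains: dict = field(default_factory=lambda: defaultdict(dict))           # content -> {keyword: weight}

    def add_subclass(self, major, sub):
        self.subclasses_of[major].add(sub)

    def add_content(self, content, subclasses, extracted_keywords):
        # many-to-many: a content can belong to several subclasses and contain several keywords
        self.content_subclasses[content].update(subclasses)
        for kw in extracted_keywords:
            self.contains[content][kw] = DIRECT_WEIGHT

    def link_keyword(self, content, keyword, weight=INFERRED_WEIGHT):
        # assumption: never downgrade an existing relation to a larger (weaker) weight
        current = self.contains[content].get(keyword)
        if current is None or weight < current:
            self.contains[content][keyword] = weight

    def contents_with_keyword(self, keyword):
        return {c for c, kws in self.contains.items() if keyword in kws}


g = RetrievalGraph()
g.add_subclass("disease class", "stomach diseases")
g.add_subclass("disease class", "chronic diseases")
text = ("The clinical manifestations of chronic gastritis lack specific symptoms, and the severity "
        "of the symptoms is not consistent with the degree of pathological change of the gastric mucosa.")
g.add_content(text, {"stomach diseases", "chronic diseases"},
              ["gastritis", "chronic gastritis", "gastric mucosa"])
g.link_keyword(text, "gastritis")  # weight 1 does not override the direct weight 0
```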
For example, in a medical knowledge graph, the major classes built from users' retrieval questions include disease consultation, medicine consultation, diseases, disciplines, examinations, medical insurance, hospitals and the like. The disease class is further divided into subclasses such as stomach diseases, chronic diseases, diabetes and skin diseases; the subclasses stomach diseases and chronic diseases both cover the knowledge content "the clinical manifestations of chronic gastritis lack specific symptoms, and the severity of the symptoms is not consistent with the degree of pathological change of the gastric mucosa", which contains the keywords gastritis, chronic gastritis and gastric mucosa.
Further, to better distinguish the intent of user questions, this example constructs a first sample set from historical user questions and their corresponding subclasses, and trains a TextCNN text classification model on it to identify the subclass of a user question.
Specifically, historical user questions are extracted, questions of no more than 5 words are discarded, the corresponding subclasses are labeled manually, and the question number, question text and subclass are taken as the raw data;
punctuation is removed from the question texts in the raw data, the texts are segmented with the jieba NLP segmenter, and stop words are removed to obtain the segmentation result of each question text, whose word frequencies are then counted;
a vocabulary is constructed from the segmentation results, containing the keywords with their occurrence frequencies and the subclasses with their occurrence frequencies;
the segmentation result of each question text and its corresponding subclass are vectorized according to the vocabulary to construct the first sample set.
It should be noted that, because questions differ in length, their segmentation results contain different numbers of words, so they must be truncated or padded as appropriate. The first sample set is further divided into a training set, a test set and a validation set in the conventional way, which is not elaborated here.
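The preprocessing steps above can be sketched as follows, assuming the raw data is a list of (question number, question text, subclass) tuples. `segment` is a hypothetical stand-in for jieba segmentation (it splits on whitespace so the sketch runs without extra packages), the vocabulary maps words to ids ordered by frequency, and the fixed length `max_len` is an assumed parameter for the truncation and padding mentioned above.

```python
import re
from collections import Counter

MIN_WORDS = 5      # questions with 5 words or fewer are discarded
PAD_ID, UNK_ID = 0, 1


def segment(text):
    # stand-in for a Chinese segmenter such as jieba.lcut(text); here: whitespace split
    return text.split()


def pad_or_truncate(ids, max_len):
    # cut long questions, pad short ones with PAD_ID so every sample has max_len ids
    return ids[:max_len] + [PAD_ID] * (max_len - len(ids))


def build_first_sample_set(raw, stop_words, max_len):
    """raw: iterable of (question_id, question_text, subclass) tuples labelled by hand."""
    rows = []
    for _qid, text, subclass in raw:
        words = segment(re.sub(r"[^\w\s]", " ", text))  # strip punctuation, then segment
        if len(words) <= MIN_WORDS:
            continue
        rows.append(([w for w in words if w not in stop_words], subclass))

    word_freq = Counter(w for words, _ in rows for w in words)
    label_freq = Counter(label for _, label in rows)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for word, _count in word_freq.most_common():  # frequent words get small ids
        vocab[word] = len(vocab)
    labels = {label: i for i, (label, _count) in enumerate(label_freq.most_common())}

    X = [pad_or_truncate([vocab.get(w, UNK_ID) for w in words], max_len) for words, _ in rows]
    y = [labels[label] for _, label in rows]
    return X, y, vocab, labels


raw = [
    ("q1", "What are the typical symptoms of chronic gastritis?", "stomach diseases"),
    ("q2", "Is this drug covered by medical insurance here?", "medical insurance"),
    ("q3", "Gastritis symptoms?", "stomach diseases"),  # only 2 words: discarded
]
stop_words = {"What", "are", "the", "of", "Is", "this", "by", "here"}
X, y, vocab, labels = build_first_sample_set(raw, stop_words, max_len=6)
```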
For example, the TextCNN model is built and trained with the TensorFlow framework.
The acquired user question is segmented and vectorized and then fed into the trained TextCNN text classification model to obtain the corresponding subclass, so that the intent of the user's question is identified more clearly.
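The text states only that a TextCNN is built and trained with TensorFlow. To show what such a classifier computes, the following pure-Python sketch runs a TextCNN forward pass: word embeddings, parallel 1-D convolutions with several window sizes, ReLU, max-over-time pooling and a softmax output layer. The class name, sizes and random untrained weights are assumptions; training, dropout and the TensorFlow implementation itself are omitted.

```python
import math
import random


def conv_max_pool(emb, kernel, bias):
    """One convolution filter over consecutive word vectors, ReLU, then max-over-time pooling."""
    k = len(kernel)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(emb) - k + 1):
        s = bias + sum(w * x for i in range(k) for w, x in zip(kernel[i], emb[start + i]))
        best = max(best, s)
    return best


class TinyTextCNN:
    """Hypothetical TextCNN forward pass with untrained random weights."""

    def __init__(self, vocab_size, num_classes, dim=8, kernel_sizes=(2, 3, 4), filters_per_size=4, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        self.embedding = [[rand() for _ in range(dim)] for _ in range(vocab_size)]
        self.filters = [([[rand() for _ in range(dim)] for _ in range(k)], rand())
                        for k in kernel_sizes for _ in range(filters_per_size)]
        n_features = len(self.filters)
        self.out_w = [[rand() for _ in range(n_features)] for _ in range(num_classes)]
        self.out_b = [0.0] * num_classes

    def predict_proba(self, token_ids):
        emb = [self.embedding[t] for t in token_ids]
        features = [conv_max_pool(emb, kernel, b) for kernel, b in self.filters]
        logits = [b + sum(w * f for w, f in zip(row, features)) for row, b in zip(self.out_w, self.out_b)]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, token_ids):
        probs = self.predict_proba(token_ids)
        return max(range(len(probs)), key=probs.__getitem__)  # index of the predicted subclass


model = TinyTextCNN(vocab_size=10, num_classes=3)
probs = model.predict_proba([2, 3, 4, 5, 0, 0])  # a padded question of 6 token ids
```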
Then, based on the relations of the knowledge content in the knowledge graph, the associated knowledge content is retrieved according to the keywords and the subclass and sorted to obtain the search result, as follows:
based on the relations between the knowledge content and the keywords and subclasses in the knowledge graph, the associated knowledge content, its relation weights and the subclasses it belongs to are retrieved according to the keywords to obtain a search result; and the search results are sorted by subclass, keyword frequency and relation weight, in that order.
For example, the user enters the question "What are the symptoms of gastritis?". The extracted keywords are "gastritis" and "symptoms", and the text classification model assigns the question to the "disease" category. Knowledge content containing the keyword "gastritis" and/or "symptoms" is retrieved together with the relation weights between that content and the keywords, and the subclasses of each retrieved piece of content are obtained from the relations between knowledge content and subclasses. The results are first sorted by the frequency of the keywords in the knowledge content; for knowledge content with equal keyword frequency, the relation weights of its keywords are summed, and content with a smaller accumulated weight ranks higher. Finally, the knowledge content belonging to the "disease" category is moved to the front, keeping its keyword-frequency and relation-weight order, which yields the ranked retrieval result.
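The sorting rule can be sketched as a single composite sort key. This is a minimal sketch under stated assumptions: keyword frequency is taken to be the number of occurrences of the question keywords in the content text (a crude case-insensitive substring count), higher frequency is assumed to rank first, and the subclass label "stomach diseases" and the sample texts are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    content: str
    subclasses: set
    weights: dict = field(default_factory=dict)  # related keyword -> relation weight (0 direct, 1 inferred)


def rank(hits, query_keywords, query_subclass):
    """Order: question's subclass first, then keyword frequency (high first), then summed weight (low first)."""
    def sort_key(hit):
        text = hit.content.lower()
        frequency = sum(text.count(kw.lower()) for kw in query_keywords)  # assumption: substring count
        return (query_subclass not in hit.subclasses, -frequency, sum(hit.weights.values()))
    return sorted(hits, key=sort_key)


a = Hit("Skin symptoms and gastritis symptoms in diabetes.", {"diabetes"}, {"symptoms": 0})
b = Hit("Symptoms of gastritis include pain.", {"stomach diseases"}, {"gastritis": 1, "symptoms": 1})
c = Hit("Chronic gastritis lacks specific symptoms; gastritis severity varies.",
        {"stomach diseases", "chronic diseases"}, {"gastritis": 0, "symptoms": 0})
d = Hit("Gastritis symptoms can be mild.", {"stomach diseases"}, {"gastritis": 0, "symptoms": 0})
ranked = rank([a, b, c, d], ["gastritis", "symptoms"], "stomach diseases")
```

Because Python tuples compare element by element, one sort with the key `(not in subclass, -frequency, summed weight)` applies all three criteria at once: `c` (frequency 3) comes first, `d` beats `b` on weight at equal frequency, and `a` is last despite its high frequency because it lies outside the question's subclass.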
In step S12, the relations of the knowledge content in the knowledge graph are updated periodically by the two methods S121 and S122, and every relation created by these updates has a weight of 1.
S121: extract the hidden keywords of the knowledge content with the keyword prediction model, and establish relations between the knowledge content and those keywords.
It should be noted that the keyword prediction model is obtained as follows: hidden keywords are manually labeled on the knowledge content in the knowledge graph as sample labels, and a second sample set is constructed after word segmentation of the knowledge content; a feature sequence model is then built and trained on the second sample set to obtain the keyword prediction model.
The feature sequence model is built with TensorFlow and trained and tested as a convolutional neural network. After training, the keyword prediction model is called to obtain the hidden keywords of the knowledge content, a relation is established between the knowledge content and each hidden keyword, and its relation weight is set to 1. A hidden keyword relation therefore ranks below a keyword directly contained in the knowledge content, whose relation weight is 0.
For example, for the knowledge content "gastroenteritis is usually caused by microbial infection, but can also be caused by chemical toxins or drugs", the keyword prediction model predicts the hidden keyword "gastritis"; a relation between this knowledge content and the keyword "gastritis" is established with a relation weight of 1.
Preferably, a scheduled task periodically fetches the knowledge content and feeds it into the keyword prediction model; whenever an output hidden keyword has no relation yet with the input knowledge content, the input and output are recorded, and after all knowledge content has been checked, the relations are created in batches from the recorded information.
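The scheduled batch job can be sketched as follows. `predict` stands for a hypothetical wrapper around the trained keyword prediction model, `existing` for the keyword relations already in the graph, and `batch_size` is an assumed parameter; writing each batch to the graph database is left out.

```python
HIDDEN_KEYWORD_WEIGHT = 1


def collect_hidden_keyword_relations(contents, predict, existing, batch_size=500):
    """Check every knowledge content first, then return the missing relations in batches.

    contents: {content_id: text}
    predict:  callable text -> iterable of hidden keywords (hypothetical model wrapper)
    existing: {content_id: set of keywords already related to that content}
    """
    pending = []  # recorded (input content, output keyword) pairs with no relation yet
    for content_id, text in contents.items():
        linked = existing.get(content_id, set())
        for keyword in dict.fromkeys(predict(text)):  # de-duplicate, keep order
            if keyword not in linked:
                pending.append((content_id, keyword, HIDDEN_KEYWORD_WEIGHT))
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


def fake_predict(text):
    # hypothetical stand-in for the trained keyword prediction model
    return ["gastritis"] if "gastr" in text.lower() else []


contents = {
    "c1": "Gastroenteritis is usually caused by microbial infection.",
    "c2": "The clinical manifestations of chronic gastritis lack specific symptoms.",
    "c3": "Medical insurance covers outpatient fees.",
}
existing = {"c2": {"gastritis", "chronic gastritis", "gastric mucosa"}}
batches = collect_hidden_keyword_relations(contents, fake_predict, existing)
```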
S122: construct inference tasks, configure condition rules and a result rule for each inference task with the condition rule templates and the result rule templates, run the inference tasks that pass verification, and establish or update the relations between the knowledge content and the keywords and/or the subclasses.
When an inference task is configured, all entity and relation information of the medical knowledge graph is read and cached in the front-end page, so that during rule configuration entities, relations, entity attributes and relation attributes can be selected from drop-down boxes. An inference task contains one or more condition rules but only one result rule, meaning that one inferred result can be mined from the configured condition rules.
Specifically, the condition rule templates comprise: entity A has relation C with entity B; attribute B of entity A satisfies condition C; attribute B of relation A satisfies condition C; the attribute of entity A is defined as variable B; the attribute of relation A is defined as variable B; and variable A and variable B satisfy condition C. The result rule templates comprise: add relation C between entity A and entity B; set attribute B of entity A to C; and set attribute B of relation A to C.
It should be noted that A, B and C only distinguish the elements within a single rule template; they do not correspond across templates. When condition rules are configured, one or more condition rule templates can be selected, and one template can be used to configure one or more condition rules; the "condition" in a condition rule template consists of an operator and a condition value; and multiple condition rules are combined with logical AND.
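One way to represent the templates and an inference task in code is sketched below. The enum and class names, the `params` slot dictionary and the operator codes are hypothetical; only the constraints stated in the text are modeled (A/B/C local to each rule, one or more condition rules combined with AND, exactly one result rule).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ConditionTemplate(Enum):
    HAS_RELATION = "entity A and entity B have relation C"
    ENTITY_ATTR_SATISFIES = "attribute B of entity A satisfies condition C"
    RELATION_ATTR_SATISFIES = "attribute B of relation A satisfies condition C"
    DEFINE_ENTITY_ATTR = "define the attribute of entity A as variable B"
    DEFINE_RELATION_ATTR = "define the attribute of relation A as variable B"
    COMPARE_VARIABLES = "variable A and variable B satisfy condition C"


class ResultTemplate(Enum):
    ADD_RELATION = "add relation C between entity A and entity B"
    SET_ENTITY_ATTR = "set attribute B of entity A to C"
    SET_RELATION_ATTR = "set attribute B of relation A to C"


@dataclass
class Condition:
    template: ConditionTemplate
    params: dict                     # A/B/C slots, local to this one rule
    operator: Optional[str] = None   # only for templates that carry a condition
    value: object = None             # condition value typed into the input box


@dataclass
class InferenceTask:
    conditions: list                 # one or more Condition objects, combined with AND
    result_template: ResultTemplate  # exactly one result rule
    result_params: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.conditions:
            raise ValueError("an inference task needs at least one condition rule")


task = InferenceTask(
    conditions=[
        Condition(ConditionTemplate.HAS_RELATION, {"A": "nr", "B": "gjc", "C": "bh"}),
        Condition(ConditionTemplate.RELATION_ATTR_SATISFIES, {"A": "bh", "B": "weighted_value"}, "gt", 0),
    ],
    result_template=ResultTemplate.SET_RELATION_ATTR,
    result_params={"A": "bh", "B": "weighted_value", "C": 1},
)
```

Keeping the result rule as a single field, rather than a list, makes the "exactly one result rule" constraint structural instead of something that has to be checked.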
To identify entities, relations and attributes correctly during knowledge mining, and to avoid ambiguous rule evaluation when the same rule template is used several times, variable names are introduced in place of the names of the original entities, relations and attributes: within one inference task, each selected entity, relation, entity attribute and relation attribute is identified by a unique variable name.
The six condition rule templates are described below.
1) Entity A has relation C with entity B
This rule template determines whether entity A and entity B in the graph database have relation C; entity A and entity B may be the same entity or different entities. When the rule is configured, variable names are set for the two entities and the one relation. In the medical graph, for example, the entity name of the keyword is keyword, with variable name gjc; the entity name of the knowledge content is content, with variable name nr; and the relation name of the keyword inclusion relation is contain, with variable name bh.
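The patent does not name the graph database or its query language. Assuming an openCypher-style dialect such as Neo4j Cypher, template 1 could be rendered as a path pattern as sketched below; the edge direction from content to keyword and the helper names are assumptions.

```python
def relation_pattern(var_a, entity_a, rel_var, relation, var_b, entity_b):
    """Template 1, 'entity A has relation C with entity B', as one openCypher path pattern."""
    return f"({var_a}:{entity_a})-[{rel_var}:{relation}]->({var_b}:{entity_b})"


def match_clause(rules):
    # several template-1 rules are ANDed; in Cypher that is a comma-separated pattern list
    return "MATCH " + ", ".join(relation_pattern(*rule) for rule in rules)


print(match_clause([("nr", "content", "bh", "contain", "gjc", "keyword")]))
# MATCH (nr:content)-[bh:contain]->(gjc:keyword)
```

Because the variable names (nr, bh, gjc) are reused as query variables, later conditions can refer to the same nodes and edges without repeating their labels.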
2) Attribute B of entity A satisfies condition C
This rule template determines whether attribute B of entity A in the graph database satisfies condition C, where the operators of the condition include, but are not limited to, greater than, less than, equal to, not equal to and contains. For example, in the medical graph: the price (attribute name price) of a medicine (entity name drug, variable name yp) is greater than 100 yuan (condition C), where the operator "greater than" is selected from a drop-down box and the condition value "100" is entered in an input box (the back end interprets it in yuan).
3) Attribute B of relationship a satisfies condition C
This rule template determines whether attribute B of relationship A in the graph database satisfies condition C. The condition operators include, but are not limited to: greater than, greater than or equal to, less than, less than or equal to, equal to, not equal to, and contains. For example, in the medical graph: the weight (attribute name weighted value) of the keyword containment relationship (relationship name contain) is greater than 0 (condition C). For the condition "greater than 0", the operator "greater than" is chosen from the drop-down box and the condition value "0" is entered in the input box.
4) Define the attribute of entity A as variable B
This rule template assigns variable B to an attribute of entity A in the graph database. Variable B represents the value of that attribute and is used in variable comparisons configured with template 6), "Variable A and variable B satisfy condition C". For example, in the medical graph, the availability state attribute of a keyword (entity name keyword, variable name gjc) is assigned the variable name sfky.
5) Define the attribute of relationship A as variable B
This rule template assigns variable B to an attribute of relationship A in the graph database; variable B is likewise used in variable comparisons configured with template 6), "Variable A and variable B satisfy condition C". For example, in the medical graph, the weight (attribute name weighted value) of the keyword containment relationship between knowledge content and keywords is assigned the variable name qzz.
6) Variable A and variable B satisfy condition C
This rule template compares the entity attribute variables and relationship attribute variables defined with templates 4) and 5) above against each other. The condition operators include, but are not limited to: greater than, greater than or equal to, less than, less than or equal to, equal to, not equal to, and contains.
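The six condition rule templates can be held in a small data structure before they are verified and converted. The Python below is a minimal sketch under stated assumptions, not the embodiment's implementation: the enum member names, the `slots` dictionary layout, the operator spellings and the attribute name `weight` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class CondTemplate(Enum):
    """The six condition rule templates (member names are hypothetical)."""
    HAS_RELATION = 1         # 1) Entity A has relationship C with entity B
    ENTITY_ATTR_COND = 2     # 2) Attribute B of entity A satisfies condition C
    RELATION_ATTR_COND = 3   # 3) Attribute B of relationship A satisfies condition C
    DEFINE_ENTITY_VAR = 4    # 4) Define the attribute of entity A as variable B
    DEFINE_RELATION_VAR = 5  # 5) Define the attribute of relationship A as variable B
    VAR_COND = 6             # 6) Variable A and variable B satisfy condition C


# Assumed spellings for the operators that the text lists in words.
OPERATORS = {">", ">=", "<", "<=", "==", "!=", "CONTAINS"}
# Templates whose "condition" is an operator plus a condition value (or variable).
COMPARISONS = {CondTemplate.ENTITY_ATTR_COND, CondTemplate.RELATION_ATTR_COND,
               CondTemplate.VAR_COND}


@dataclass
class ConditionRule:
    template: CondTemplate
    slots: dict = field(default_factory=dict)  # template-specific fields

    def __post_init__(self):
        # Reject comparison rules whose operator is not a supported one.
        if self.template in COMPARISONS and self.slots.get("op") not in OPERATORS:
            raise ValueError(f"unsupported operator: {self.slots.get('op')!r}")


# Example rules taken from the medical-graph descriptions (not one coherent task).
rules = [
    ConditionRule(CondTemplate.HAS_RELATION,
                  {"a": ("nr", "content"), "rel": ("bh", "contain"), "b": ("gjc", "keyword")}),
    ConditionRule(CondTemplate.ENTITY_ATTR_COND,
                  {"entity": ("yp", "drug"), "attr": "price", "op": ">", "value": 100}),
    ConditionRule(CondTemplate.RELATION_ATTR_COND,
                  {"rel": ("bh", "contain"), "attr": "weight", "op": ">", "value": 0}),
]
```

Keeping the slots in a dictionary lets one class cover all six templates; a production system might prefer one typed class per template.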
The 3 result rule templates are described below.
1) Entity A and entity B add relationship C
This rule template adds relationship C between entity A and entity B in the graph database and updates the graph database. For example, in the medical graph, for the data set that satisfies the condition rules, a keyword containment relationship (relationship name contain) is added between the knowledge content (entity name content, variable name nr) and the keyword (entity name keyword, variable name gjc).
2) Attribute B of entity A is set to value C
This rule template sets attribute B of entity A in the graph database to value C and updates the graph database. For example, in the medical graph, the availability state of each keyword (entity name keyword, variable name gjc) that satisfies the condition rules is set to "false".
3) Attribute B of relationship A is set to value C
This rule template sets attribute B of relationship A in the graph database to value C and updates the graph database. For example, in the medical graph, the weight (attribute name weighted value) of each keyword containment relationship (variable name bh) in the data set that satisfies the condition rules is set to "0".
Compared with the prior art, this embodiment reduces knowledge discovery and reasoning in the graph database to 6 condition rule templates, and graph updates to 3 result rule templates. Through a visual interface, data that meets flexibly configured condition rules is extracted automatically, and implicit knowledge is added according to the result rules, which enriches the knowledge graph and improves the accuracy of search results.
Further, the condition rules and result rules of each inference task are verified.
Specifically, the condition rules and the result rule configured at the front end are sent to the back end. Each condition rule is mapped to a condition rule object, forming a condition rule list, and the single result rule is mapped to a result rule object. After verification, the information in the condition rule objects and the result rule object is stored in a relational database so that it can be displayed in the interface.
Rule verification checks the configured condition rules and result rule as a whole; an inference task can run only after verification passes and cannot run if it fails. Verification checks, in order, whether the variable names are valid, whether each condition value matches the attribute type, and whether each attribute value is valid. If any check fails, an error prompt is shown and the task status is set to "verification failed"; the status changes to "normal" only after all checks pass. That is, the condition rules and result rule pass verification when the variable names are valid, the condition values match the attribute types, and the attribute values are valid.
Specifically, when the variable names are checked: a variable name that contains symbols or Chinese characters is invalid; a variable name defined for different entities, relationships or attributes is invalid; a variable name defined for an attribute in an earlier rule but not used in any later rule is invalid; and a variable name used in a variable comparison but not defined in any rule is invalid. Any of these cases causes verification to fail. Thus, valid variable names satisfy all of the following: no variable name contains Chinese characters or symbols; each variable name refers to only one entity, relationship or attribute; and every variable name defined by a rule configured from "Define the attribute of entity A as variable B" and/or "Define the attribute of relationship A as variable B" is used in a rule configured from "Variable A and variable B satisfy condition C", and vice versa.
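The variable-name checks can be sketched in Python as follows. This is a minimal sketch, not the embodiment's code: the input shapes (a list of `(name, target)` bindings plus the names defined by templates 4) and 5) and referenced by template 6)) are hypothetical, and treating a valid name as ASCII letters and digits starting with a letter is an assumed reading of "no Chinese characters or symbols".

```python
import re

# Assumed reading of "no Chinese characters or symbols": ASCII letters and digits only.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def check_variable_names(bindings, defined_attr_vars, compared_vars):
    """Return error messages for one inference task; an empty list means valid.

    bindings: (variable_name, target) pairs collected from all rules, where target
        identifies the entity, relationship or attribute the name stands for.
    defined_attr_vars: names defined with templates 4) and 5).
    compared_vars: names referenced by template 6).
    """
    errors = []
    seen = {}
    for name, target in bindings:
        if not VALID_NAME.fullmatch(name):
            errors.append(f"{name}: contains symbols or non-ASCII characters")
        elif name in seen and seen[name] != target:
            errors.append(f"{name}: bound to both {seen[name]} and {target}")
        seen.setdefault(name, target)
    defined, compared = set(defined_attr_vars), set(compared_vars)
    for name in sorted(defined - compared):
        errors.append(f"{name}: defined but never used in a comparison")
    for name in sorted(compared - defined):
        errors.append(f"{name}: used in a comparison but never defined")
    return errors
```

Reusing one name for the same target in several rules (for example `gjc` in two rules) is allowed, since the same entity keeps one variable name throughout a task.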
When checking whether a condition value matches the attribute type: if the attribute of the entity or relationship is numeric but the condition value contains Chinese or English characters or symbols, the rule is invalid; if the condition value is set to null or an empty string, the rule is invalid. Either case causes verification to fail. Thus, a condition value matches the attribute type when it is neither null nor an empty string and it matches the type of the corresponding entity or relationship attribute.
When checking whether an attribute value is valid: if an entity or relationship attribute must be selected during configuration but none is selected, or the selected attribute is edited manually afterwards, the attribute value is invalid and verification fails. Thus, a valid attribute value is non-null and exists on the corresponding entity or relationship.
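The condition-value and attribute-value checks can be sketched as two small predicates. This is an illustrative sketch only: the attribute type names (`int`, `double`, `bool`, `string`) and the `schema` dictionary standing in for the graph schema are hypothetical.

```python
import re

# Hypothetical numeric formats: optional sign, digits, optional decimal part.
_INT = re.compile(r"[+-]?\d+", re.ASCII)
_DOUBLE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)", re.ASCII)


def condition_value_matches(value, attr_type):
    """True if a condition value is non-empty and fits the attribute's type."""
    if value is None or str(value).strip() == "":
        return False  # null, empty or whitespace-only values are rejected
    text = str(value).strip()
    if attr_type == "int":
        return _INT.fullmatch(text) is not None
    if attr_type == "double":
        return _DOUBLE.fullmatch(text) is not None
    if attr_type == "bool":
        return text.lower() in ("true", "false")
    return True  # string attributes accept any non-empty value


def attribute_is_valid(attr, element, schema):
    """True if an attribute was selected and exists on the entity or relationship."""
    return bool(attr) and attr in schema.get(element, set())
```

A manually edited attribute name such as `availabel` fails the second check because it no longer exists on the selected element.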
Preferably, if the condition rules include several rules configured from the template "Entity A has relationship C with entity B", the method further checks whether those entities can be chained in series according to the direction in which entities and relationships are connected in the knowledge graph.
For example, suppose the rules configured from "Entity A has relationship C with entity B" are: detail has a cover relationship with knowledge content, and knowledge content has a contain relationship with keyword. Following the connection direction, they chain as (detail)-[cover]->(knowledge content)-[contain]->(keyword), so the rules are valid and verification passes. If instead the rules are: knowledge content has a contain relationship with keyword, (knowledge content)-[contain]->(keyword), and major class has a divide relationship with minor class, (major class)-[divide]->(minor class), the two rules share no entity and cannot be chained, so the rules are invalid and verification fails. If the rules are: knowledge content has a contain relationship with keyword, (knowledge content)-[contain]->(keyword), and knowledge content has a belongs-to relationship with subclass, (knowledge content)-[belongs to]->(subclass), the relationship directions do not allow the entities to be chained into one path, so the rules are invalid and verification fails.
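The chaining check can be sketched by keying each relationship on its source entity and walking from the single entity that is never a target. This is a minimal Python sketch, assuming each rule is a `(source, relationship, target)` triple and that a valid chain is one directed path with no branches; the function name is hypothetical.

```python
def chain_relations(triples):
    """Order (source, relationship, target) triples into one directed path.

    Returns the ordered list, or None if the triples cannot be chained. A sketch
    assuming each entity is the source of at most one relationship in the chain.
    """
    if not triples:
        return []
    by_source = {}
    for triple in triples:
        if triple[0] in by_source:
            return None  # two relationships leave the same entity: the path would branch
        by_source[triple[0]] = triple
    targets = {triple[2] for triple in triples}
    heads = [triple for triple in triples if triple[0] not in targets]
    if len(heads) != 1:
        return None  # disconnected pieces (several heads) or a pure cycle (no head)
    path, current = [], heads[0]
    while current is not None:
        path.append(current)
        if len(path) > len(triples):
            return None  # revisited a triple, so the chain loops
        current = by_source.get(current[2])
    return path if len(path) == len(triples) else None
```

Applied to the three examples above, the first returns the ordered path and the other two return `None`.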
Preferably, the method also checks whether the entity or relationship corresponding to the task keyword of the inference task appears among the entities or relationships selected in the condition rules and the result rule. If it does not, a prompt is displayed and the user decides whether to ignore it: if the user ignores it, verification passes; otherwise verification fails.
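The task-keyword check is a simple membership test with a user confirmation as fallback. A minimal Python sketch, in which the `confirm_ignore` callback is a hypothetical stand-in for the interface prompt:

```python
from typing import Callable, Iterable


def check_task_keyword(task_keyword: str, selected: Iterable[str],
                       confirm_ignore: Callable[[str], bool]) -> bool:
    """Pass if the task keyword's entity/relationship is used by some rule;
    otherwise show a prompt and let the user decide whether to ignore it."""
    if task_keyword in set(selected):
        return True
    message = f"'{task_keyword}' is not used by any condition or result rule. Ignore?"
    return bool(confirm_ignore(message))
```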
Further, after an inference task passes verification, running it to establish or update the relationships between knowledge content and keywords and/or subclasses comprises the following steps:
converting the condition rules of the inference task into query operation statements of the graph database;
converting the result rule of the inference task into a result operation statement of the graph database;
executing the query operation statement to obtain a query result, then executing the result operation statement on the query result to establish or update the corresponding relationships.
Note that the graph database operation statements follow the query language specification of the graph database used for the knowledge graph. Mainstream graph databases include Neo4j, Dgraph, JanusGraph, HugeGraph and NebulaGraph; this embodiment uses NebulaGraph, so the rules are converted into statements in its nGQL language.
Specifically, converting the condition rules of the inference task into query operation statements of the graph database comprises:
(1) Converting the condition rules of the inference task into an entity relationship statement and conditional statements according to the condition rule templates.
Condition rule template 1) configures entity relationships; templates 2), 3) and 6) configure conditions; and templates 4) and 5) define variables for use in template 6). In this embodiment, the rules of each inference task are parsed and converted in the order: variables first, then conditions, and entity relationships last.
For example, as shown in Fig. 3, the condition rules are processed in the following three steps to obtain the entity relationship statement and the conditional statements:
a) Identify and convert attribute variable names
That is, when the condition rules include rules configured from "Define the attribute of entity A as variable B" and "Define the attribute of relationship A as variable B", the variable name mappings are identified and stored.
Specifically, a rule from "Define the attribute of entity A as variable B" configures entity A with its variable name, the entity attribute, and the attribute variable name B; a rule from "Define the attribute of relationship A as variable B" configures relationship A with its variable name, the relationship attribute, and the attribute variable name B. If a later rule uses the same entity, relationship or attribute, its variable name can be looked up directly.
b) Identify and convert conditional comparisons
That is, when the condition rules include rules configured from "Attribute B of entity A satisfies condition C", "Attribute B of relationship A satisfies condition C" and "Variable A and variable B satisfy condition C", the variable name mappings are combined to obtain the conditional statement for each entity or relationship.
Rules from "Attribute B of entity A satisfies condition C" and "Attribute B of relationship A satisfies condition C" are converted into statements of the form "<entity/relationship variable name>.<attribute name> <operator> <condition value>". For example, for the condition that a keyword (entity name keyword, variable name gjc) is available, the converted conditional statement is: gjc.available == true.
For a rule from "Variable A and variable B satisfy condition C", the variable name mappings obtained in step a) are first used to find the entity or relationship variable name behind each attribute variable, and the statement is then built in the form "<entity/relationship variable name>.<attribute>".
For example, the availability state of the keyword is defined as variable gjczt, and the availability state of the containment relationship between knowledge content and keyword is defined as variable bhzt. To require the two states to be equal, the condition rule is configured as gjczt == bhzt. During conversion, the corresponding entity and relationship are found through the variable name mappings, and the converted conditional statement is: gjc.gjczt == bh.bhzt.
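Steps a) and b) can be sketched as a variable map plus two string builders. This Python sketch rests on assumptions: the tuple layout of the definition rules is hypothetical, and it resolves each attribute variable to the underlying attribute name, so the comparison example comes out as `gjc.available == bh.available`. It follows the `variable.attribute` form used in the text; depending on the NebulaGraph version, vertex properties in `MATCH` may need the tag-qualified form (for example `gjc.keyword.available`).

```python
def build_variable_map(define_rules):
    """Step a): attribute variable -> (entity/relationship variable, attribute name).

    define_rules: (element_var, attribute, attribute_var) tuples from rules built
    with templates 4) and 5); this tuple layout is hypothetical.
    """
    return {attr_var: (elem_var, attr) for elem_var, attr, attr_var in define_rules}


def literal(value):
    """Render a condition value as an nGQL-style literal (strings are quoted)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def attribute_condition(elem_var, attr, op, value):
    """Step b) for templates 2) and 3): <variable>.<attribute> <op> <value>."""
    return f"{elem_var}.{attr} {op} {literal(value)}"


def variable_condition(var_map, var_a, op, var_b):
    """Step b) for template 6): resolve both attribute variables via step a)."""
    elem_a, attr_a = var_map[var_a]
    elem_b, attr_b = var_map[var_b]
    return f"{elem_a}.{attr_a} {op} {elem_b}.{attr_b}"
```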
c) Identify and convert entity relationships
That is, the entities in rules configured from "Entity A has relationship C with entity B" are chained through their relationships to obtain the entity relationship statement.
If there are several rules of the form "Entity A has relationship C with entity B", a linked list can be created first, the rules sorted by relationship direction and stored in it, and the entities then chained in the form "(entity variable name 1:entity name 1)-[relationship variable name 1:relationship name 1]->(entity variable name 2:entity name 2)" to obtain the entity relationship statement.
For example, the subclass (entity name category, variable name lb) is linked to knowledge content (entity name content, variable name nr) by the owncategory relationship (variable name sslb), and knowledge content has a containment relationship (relationship name contain, variable name bh) with keyword (entity name keyword, variable name gjc). The converted entity relationship statement is: (lb:category)-[sslb:owncategory]->(nr:content)-[bh:contain]->(gjc:keyword).
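Step c) can be sketched with a dictionary playing the role of the linked list: each rule is keyed by its source variable, and the pattern is built by walking from the head. A minimal Python sketch, assuming the rules have already passed the chaining check so they form exactly one directed path; the tuple layout is hypothetical.

```python
def entity_relation_statement(rules):
    """Build a MATCH path pattern from "Entity A has relationship C with entity B" rules.

    Each rule is ((var, entity), (var, relationship), (var, entity)). The rules are
    assumed to be non-empty and already verified to form one directed path.
    """
    by_source = {rule[0][0]: rule for rule in rules}  # linked list keyed by source variable
    targets = {rule[2][0] for rule in rules}
    current = next(rule for rule in rules if rule[0][0] not in targets)  # head of the chain
    (src_var, src_name), _, _ = current
    parts = [f"({src_var}:{src_name})"]
    while current is not None:
        _, (rel_var, rel_name), (dst_var, dst_name) = current
        parts.append(f"-[{rel_var}:{rel_name}]->({dst_var}:{dst_name})")
        current = by_source.get(dst_var)
    return "".join(parts)
```

The rules may arrive in any order; the walk from the head restores the connection direction.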
(2) Obtaining the corresponding operation objects from the result rule of the inference task according to the result rule template.
When the result rule corresponds to "Entity A and entity B add relationship C", the operation objects are entity A and entity B; when it corresponds to "Attribute B of entity A is set to C", the operation object is entity A; when it corresponds to "Attribute B of relationship A is set to C", the operation object is relationship A.
(3) Splicing the entity relationship statement, the conditional statements and the operation objects according to the syntax of the graph database operation statements to obtain the query operation statement.
The query operation statement obtained from the condition rules retrieves the data that meets the conditions. Its format is: MATCH <entity relationship statement> WHERE 1 == 1 AND <conditional statements> RETURN <operation objects>. When there are 2 operation objects, they are joined with a comma.
For example, the spliced query operation statement begins: MATCH (lb:category)-[sslb:owncategory]->(nr:content)-[bh:contain]->(gjc:keyword) WHERE 1 == 1 AND lb.
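Steps (2) and (3) can be sketched as follows. In this Python sketch the result rule dictionary and its template keys are hypothetical; the splicing follows the format given above, with multiple conditional statements joined by AND because the condition rules are combined with a logical AND.

```python
def operation_objects(result_rule):
    """Step (2): variables the query must RETURN for a result rule."""
    template = result_rule["template"]
    if template == "add_relation":  # Entity A and entity B add relationship C
        return [result_rule["a"], result_rule["b"]]
    if template in ("set_entity_attr", "set_relation_attr"):  # Attribute B of A is set to C
        return [result_rule["a"]]
    raise ValueError(f"unknown result template: {template}")


def query_statement(pattern, conditions, objects):
    """Step (3): MATCH <pattern> WHERE 1 == 1 AND <conditions> RETURN <objects>."""
    where = " AND ".join(["1 == 1", *conditions])
    return f"MATCH {pattern} WHERE {where} RETURN {', '.join(objects)}"
```

The leading `1 == 1` keeps the splicing uniform: every condition can be appended with `AND`, even when there is none.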
Specifically, converting the result rule of the inference task into a result operation statement of the graph database comprises:
When the result rule corresponds to "Entity A and entity B add relationship C", it is converted into an insert-edge statement of the form: INSERT EDGE C () VALUES "<primary key of A>" -> "<primary key of B>":().
When the result rule corresponds to "Attribute B of entity A is set to C", it is converted into an update-vertex statement of the form: UPDATE VERTEX ON A "<primary key of A>" SET B = C.
When the result rule corresponds to "Attribute B of relationship A is set to C", it is converted into an update-edge statement of the form: UPDATE EDGE ON A "<start vertex ID of A>" -> "<end vertex ID of A>"@0 SET B = C.
The statements obtained from these three result rule templates serve as the result operation statements.
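The three result statement forms can be produced by small string builders. This Python sketch assumes string vertex IDs and edge rank 0; the `VALUES` keyword, the `ON <type>` clause and the `src -> dst@rank` form follow the documented nGQL syntax for `INSERT EDGE`, `UPDATE VERTEX` and `UPDATE EDGE`, while the helper names are hypothetical.

```python
def _literal(value):
    """Render a value as an nGQL-style literal (strings are quoted)."""
    if isinstance(value, bool):  # bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def insert_edge(edge_type, src_vid, dst_vid):
    """Entity A and entity B add relationship C (no edge properties)."""
    return f'INSERT EDGE {edge_type}() VALUES "{src_vid}"->"{dst_vid}":()'


def update_vertex(tag, vid, attr, value):
    """Attribute B of entity A is set to C."""
    return f'UPDATE VERTEX ON {tag} "{vid}" SET {attr} = {_literal(value)}'


def update_edge(edge_type, src_vid, dst_vid, attr, value, rank=0):
    """Attribute B of relationship A is set to C."""
    return (f'UPDATE EDGE ON {edge_type} "{src_vid}"->"{dst_vid}"@{rank} '
            f'SET {attr} = {_literal(value)}')
```

For example, `update_vertex("keyword", "k1", "available", False)` yields `UPDATE VERTEX ON keyword "k1" SET available = false`, where `k1` is a made-up vertex ID.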
As an example, the knowledge content "Gastroscopy is a method for diagnosing gastritis, especially chronic gastritis" contains the keywords "gastritis", "examination" and "gastroscope", from which it can be inferred that this knowledge content is related to the subclass "digestive system examination" of the examination category. The condition rules are configured as: knowledge content (entity name content, variable name nr) has a containment relationship (relationship name contain, variable name bh) with keyword (entity name keyword, variable name gjc), and the name attribute of the keyword is "gastritis", "examination" or "gastroscope". The result rule is configured as: add a relationship (relationship name owncategory) from the knowledge content entity (entity name content, variable name nr) to the subclass (entity name category, variable name lb) named "digestive system examination". The complete graph operation statements are:
MATCH (nr:content)-[bh:contain]->(gjc:keyword) WHERE 1 == 1 AND gjc.name IN ["gastritis", "examination", "gastroscope"] RETURN nr;
INSERT EDGE owncategory () VALUES "nr.id" -> "lb.id":(), where lb.name == "digestive system examination".
Next, the query operation statement is executed to obtain the query result, and the result operation statement is executed on that result to establish or update the corresponding relationships. That is, the query operation statement converted from the condition rules is executed first and the matching data is held temporarily in memory; the held data is then retrieved, and the result operation statement converted from the result rule is executed by the corresponding executor to insert or update that data.
Note that, through steps S121 and/or S122 described above, the relationships of knowledge content in the knowledge graph are updated periodically. Every inserted or updated record is stored in a result data buffer; when each update task finishes, the data in the buffer is tallied as result data, the statistics are stored in the task result table of the business database, and the total count and the detailed records are kept.
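The execute-then-apply flow and the result buffer can be sketched without a live database by passing in the statement executor. In this Python sketch, `execute` is a hypothetical callable (for example a thin wrapper around a NebulaGraph client session) that returns result rows as dicts, and the statistics are returned rather than written to a task result table.

```python
from collections import Counter
from typing import Callable, Iterable, List


def run_inference_task(execute: Callable[[str], List[dict]], query: str,
                       result_statements: Callable[[dict], Iterable[str]]) -> dict:
    """Run one verified inference task and tally what it changed.

    execute: hypothetical callable that sends one nGQL statement to the graph
        database and returns the result rows as dicts.
    result_statements: builds the INSERT/UPDATE statements for one query row.
    """
    rows = list(execute(query))  # matching data, held temporarily in memory
    buffer = []                  # result data buffer: one entry per insert/update
    for row in rows:
        for statement in result_statements(row):
            execute(statement)
            buffer.append({"operation": statement.split()[0].upper(), "row": row})
    counts = Counter(entry["operation"] for entry in buffer)
    # In the embodiment these statistics go to a task result table; here they are returned.
    return {"total": len(buffer), "by_operation": dict(counts), "details": buffer}
```

Separating the query phase from the update phase means the update statements never modify data while the MATCH is still being read.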
Compared with the prior art, the knowledge graph retrieval method based on rule reasoning of this embodiment locates the category of a question through deep learning, identifies keywords through NLP, and mines latent knowledge relationships through machine learning and configured inference tasks to supplement and update the knowledge graph, improving the accuracy of retrieval results. Specifically: deep learning is used to process question samples and subdivide questions, so that the intent behind a user's question is identified better; a machine learning model predicts associations between knowledge content and hidden keywords; and rule templates allow condition rules and result rules to be combined quickly into inference tasks that mine implicit entity attributes and association relationships. This approach is convenient and easy to understand: the operating user needs only business knowledge, without writing complex graph database statements or applying specialist knowledge such as graph algorithms to analyze data. Supplementing relationships in the knowledge base and recognizing question intent make the knowledge richer and more complete, improving both search accuracy and user satisfaction.
Example 2
Another embodiment of the invention discloses a knowledge graph retrieval system based on rule reasoning, which implements the knowledge graph retrieval method based on rule reasoning of Example 1. For the specific implementation of each module, refer to the corresponding description in Example 1. The system comprises:
a question retrieval module, configured to acquire a user question, extract keywords from it, and obtain the subclass corresponding to the question using a text classification model; and, based on the relationships of knowledge content in the knowledge graph, retrieve related knowledge content according to the keywords and the subclass to obtain a search result;
a graph updating module, configured to periodically update the relationships of knowledge content in the knowledge graph, including: extracting hidden keywords of knowledge content using a keyword prediction model and establishing relationships between the knowledge content and those keywords; and constructing inference tasks, configuring condition rules and a result rule for each inference task using the condition rule templates and result rule templates, running the inference tasks that pass verification, and establishing or updating relationships between knowledge content and keywords and/or subclasses.
Because the knowledge graph retrieval system and method based on rule reasoning of these embodiments can refer to each other, repeated descriptions are omitted here. The system embodiment works on the same principle as the method embodiment and therefore achieves the corresponding technical effects.
Those skilled in the art will understand that all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware, and the program may be stored on a computer-readable storage medium, such as a magnetic disk, an optical disc, a read-only memory or a random access memory.
The present invention is not limited to the above embodiments; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims (10)

1. A knowledge graph retrieval method based on rule reasoning, characterized by comprising the following steps:
acquiring a user question, extracting keywords from the user question, and obtaining the subclass corresponding to the user question according to a text classification model; and, based on the relationships of knowledge content in a knowledge graph, retrieving related knowledge content according to the keywords and the subclass to obtain a search result;
periodically updating the relationships of the knowledge content in the knowledge graph, comprising:
extracting hidden keywords of the knowledge content according to a keyword prediction model, and establishing relationships between the knowledge content and the keywords;
and/or constructing inference tasks, configuring condition rules and a result rule for each inference task using condition rule templates and result rule templates, running the inference tasks that pass verification, and establishing or updating relationships between the knowledge content and the keywords and/or subclasses.
2. The knowledge graph retrieval method based on rule reasoning according to claim 1, wherein obtaining the subclass corresponding to the user question according to the text classification model comprises:
constructing a first sample set from historical user questions and their corresponding subclasses, and training a TextCNN text classification model with the first sample set; and, after word segmentation and vectorization of an acquired user question, feeding it into the TextCNN text classification model to obtain the corresponding subclass.
3. The knowledge graph retrieval method based on rule reasoning according to claim 1, wherein retrieving related knowledge content based on the relationships of knowledge content in the knowledge graph and ranking it to obtain the search result comprises:
based on the relationships between the knowledge content and the keywords and subclasses in the knowledge graph, retrieving, according to the keywords, the associated knowledge content, the relationship weights and the subclasses to which the content belongs, to obtain the search result; and sorting the search results by detail, then keyword frequency, then relationship weight.
4. The knowledge graph retrieval method based on rule reasoning according to claim 1, wherein the keyword prediction model is obtained by: manually labeling hidden keywords of knowledge content in the knowledge graph as sample labels, and constructing a second sample set after word segmentation of the knowledge content; and constructing a feature sequence model and training it with the second sample set to obtain the keyword prediction model.
5. The knowledge graph retrieval method based on rule reasoning according to claim 1, wherein the condition rule templates comprise: entity A has relationship C with entity B; attribute B of entity A satisfies condition C; attribute B of relationship A satisfies condition C; define the attribute of entity A as variable B; define the attribute of relationship A as variable B; and variable A and variable B satisfy condition C; and the result rule templates comprise: entity A and entity B add relationship C; attribute B of entity A is set to C; and attribute B of relationship A is set to C.
6. The knowledge graph retrieval method based on rule reasoning according to claim 5, wherein configuring the condition rules and the result rule for each inference task using the condition rule templates and the result rule templates comprises: selecting entities, relationships, entity attributes and relationship attributes in the knowledge graph, and configuring 1 or more condition rules and 1 result rule; wherein, within the same inference task, each entity, relationship, entity attribute and relationship attribute is identified by a unique variable name; the condition in a condition rule template comprises an operator and a condition value; and multiple condition rules are combined with a logical AND.
7. The knowledge graph retrieval method based on rule reasoning according to claim 6, wherein an inference task that passes verification is one in which the variable names of the condition rules and the result rule are valid, the condition values match the attribute types, and the attribute values are valid.
8. The knowledge graph retrieval method based on rule reasoning according to claim 7, wherein valid variable names satisfy: no variable name contains Chinese characters or symbols; each variable name corresponds to only one entity, relationship or attribute; and each variable name defined in a condition rule configured from the template "define the attribute of entity A as variable B" and/or "define the attribute of relationship A as variable B" appears in a condition rule configured from the template "variable A and variable B satisfy condition C";
the condition value matching the attribute type comprises: the condition value is neither null nor an empty string, and the condition value matches the type of the attribute of the corresponding entity or relationship;
a valid attribute value means: the attribute value is non-null and exists on the corresponding entity or relationship.
9. The knowledge graph retrieval method based on rule reasoning according to claim 1, wherein running the inference tasks that pass verification to establish or update relationships between knowledge content and keywords and/or subclasses comprises:
converting the condition rules of the inference task into query operation statements of the graph database;
converting the result rule of the inference task into a result operation statement of the graph database;
executing the query operation statement to obtain a query result, then executing the result operation statement on the query result to establish or update the corresponding relationships.
10. A knowledge graph retrieval system based on rule reasoning, comprising:
a question retrieval module, configured to acquire a user question, extract keywords from the user question, and obtain the subclass corresponding to the user question according to a text classification model; and, based on the relationships of knowledge content in a knowledge graph, retrieve related knowledge content according to the keywords and the subclass to obtain a search result;
a graph updating module, configured to periodically update the relationships of the knowledge content in the knowledge graph, comprising: extracting hidden keywords of the knowledge content according to a keyword prediction model, and establishing relationships between the knowledge content and the keywords; and/or constructing inference tasks, configuring condition rules and a result rule for each inference task using condition rule templates and result rule templates, running the inference tasks that pass verification, and establishing or updating relationships between the knowledge content and the keywords and/or subclasses.
CN202310336741.6A 2023-03-31 2023-03-31 Knowledge graph retrieval method and system based on rule reasoning Pending CN116484017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310336741.6A CN116484017A (en) 2023-03-31 2023-03-31 Knowledge graph retrieval method and system based on rule reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310336741.6A CN116484017A (en) 2023-03-31 2023-03-31 Knowledge graph retrieval method and system based on rule reasoning

Publications (1)

Publication Number Publication Date
CN116484017A (en) 2023-07-25

Family

ID=87211172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310336741.6A Pending CN116484017A (en) 2023-03-31 2023-03-31 Knowledge graph retrieval method and system based on rule reasoning

Country Status (1)

Country Link
CN (1) CN116484017A (en)

Similar Documents

Publication Publication Date Title
US20210382878A1 (en) Systems and methods for generating a contextually and conversationally correct response to a query
US10678816B2 (en) Single-entity-single-relation question answering systems, and methods
US11308143B2 (en) Discrepancy curator for documents in a corpus of a cognitive computing system
US7467079B2 (en) Cross lingual text classification apparatus and method
US11074286B2 (en) Automated curation of documents in a corpus for a cognitive computing system
US10146858B2 (en) Discrepancy handler for document ingestion into a corpus for a cognitive computing system
CN112035595B (en) Method and device for constructing auditing rule engine in medical field and computer equipment
US12032565B2 (en) Systems and methods for advanced query generation
CN112000802A (en) Software defect positioning method based on similarity integration
CN110209743B (en) Knowledge management system and method
CN113762100A (en) Name extraction and standardization method and device in medical bill, computing equipment and storage medium
US11494560B1 (en) System and methodology for computer-facilitated development of reading comprehension test items through passage mapping
CN117573797A (en) Test question retrieval method based on large language model
WO2022180989A1 (en) Model generation device and model generation method
CN116484017A (en) Knowledge graph retrieval method and system based on rule reasoning
CN114676258A (en) Disease classification intelligent service method based on patient symptom description text
Braunschweig Recovering the semantics of tabular web data
DeVille et al. Text as Data: Computational Methods of Understanding Written Expression Using SAS
Malak Text Preprocessing: A Tool of Information Visualization and Digital Humanities
Koci Layout inference and table detection in spreadsheet document
US20240054290A1 (en) Deep technology innovation management by cross-pollinating innovations dataset
Wu et al. Recommending Relevant Tutorial Fragments for API-Related Natural Language Questions
Qamar et al. Text Classification
CN118377853A (en) Paper topic selecting auxiliary method, system, medium and equipment based on large language model
Nagaraj et al. Automatic Correction of Text Using Probabilistic Error Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination