CN111414465B - Knowledge graph-based processing method and device in question-answering system - Google Patents


Publication number
CN111414465B
Authority
CN
China
Prior art keywords
candidate
main entity
main
question
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010182500.7A
Other languages
Chinese (zh)
Other versions
CN111414465A (en
Inventor
张文剑
牟小峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010182500.7A priority Critical patent/CN111414465B/en
Publication of CN111414465A publication Critical patent/CN111414465A/en
Application granted granted Critical
Publication of CN111414465B publication Critical patent/CN111414465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the application disclose a processing method and a processing device in a knowledge-graph-based question-answering system. The method comprises: after receiving a question, acquiring the main entities corresponding to the question in a preset knowledge base; selecting at least two candidate main entities from the main entities; taking each candidate main entity as a root node, searching a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, then searching from those adjacent nodes for the adjacent edges and adjacent nodes of the next layer, and so on until the nodes of the last layer have been searched, to obtain the paths corresponding to each candidate main entity; calculating the similarity between the text information corresponding to each path of each candidate main entity and the text information of the question; selecting the paths whose similarity meets a preset highest-similarity judgment condition as candidate paths of the candidate main entity; obtaining a final selected path of the main entity from the candidate paths of the candidate main entities; and determining the text information corresponding to the final selected path as the answer to the question.

Description

Knowledge graph-based processing method and device in question-answering system
Technical Field
The embodiment of the application relates to the field of information processing, in particular to a processing method and a processing device in a question-answering system based on a knowledge graph.
Background
The question-answering system is an advanced form of information retrieval system. Knowledge-graph-based question answering (Knowledge Base Question Answering, KBQA, hereinafter "knowledge question answering") helps people acquire knowledge from a knowledge base through natural language dialogue. Knowledge question answering relies on a large knowledge base (such as a knowledge graph or a structured database): the user's natural language question is converted into a structured query statement (such as SPARQL or SQL), and the answer the user needs is obtained directly from the knowledge base.
The knowledge base stores knowledge in RDF (Resource Description Framework) format. Each piece of knowledge is represented as a triple consisting of a subject (Subject), a predicate (Predicate), and an object (Object). The subject and the object are usually entities, although the object is sometimes an attribute value; the predicate describes the relationship between the subject and the object. All such triples together constitute a semantic network, that is, a knowledge graph. Viewed as a graph, the knowledge graph consists of nodes and edges: for any triple, the subject and the object are nodes, and the predicate is the edge connecting the two nodes.
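As a minimal sketch of the graph view described above, the following Python snippet turns a list of triples into an adjacency map in which subjects are nodes and predicates are edges. The sample triples and the function name `build_graph` are illustrative assumptions and do not come from the application.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("Zhang San", "address", "pond area"),
    ("Zhang San", "occupation", "actor"),
    ("Liu Si", "address", "pond area"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(TRIPLES)
```

In this sketch only subjects appear as keys, so a node such as "pond area" that never occurs as a subject has no outgoing edges.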
With the growing development and application of knowledge graphs, knowledge question answering has become particularly important. It is mainly applied in intelligent dialogue systems, intelligent customer service, intelligent assistants and the like, helps people acquire knowledge quickly and accurately, and is a natural form of human-computer interaction. The knowledge-graph-based question-answering methods in the related art fall roughly into two categories: methods based on semantic parsing and methods based on information extraction.
The related art faces two difficulties. One is that existing natural language understanding techniques remain weak at handling the ambiguity and complexity of natural language; for example, the system may understand a sentence phrased one way but fail to understand it when it is phrased another way. The other is that a knowledge question-answering system needs a great deal of domain knowledge to understand natural language questions, which incurs a high labor cost.
Disclosure of Invention
In order to solve any of the above technical problems, the embodiments of the application provide a processing method and a processing device in a knowledge-graph-based question-answering system.
To achieve the purpose of the embodiments of the application, the embodiments of the application provide a processing method in a knowledge-graph-based question-answering system, which comprises the following steps:
after receiving a question, acquiring the main entities corresponding to the question in a preset knowledge base;
selecting at least two candidate main entities from the main entities;
taking each candidate main entity as a root node, searching a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, then searching from those adjacent nodes for the adjacent edges and adjacent nodes of the next layer, and so on until the nodes of the last layer have been searched, to obtain the paths corresponding to each candidate main entity;
calculating the similarity between the text information corresponding to each path of each candidate main entity and the text information of the question;
selecting the paths whose similarity meets a preset highest-similarity judgment condition as candidate paths of the candidate main entity;
obtaining a final selected path of the main entity from the candidate paths of the candidate main entities;
and determining the text information corresponding to the final selected path as the answer to the question.
A processing device in a knowledge-graph-based question-answering system, comprising:
the first acquisition module is used for acquiring a main entity corresponding to a question in a preset knowledge base after receiving the question;
the first selection module is used for selecting at least two candidate main entities from the main entities;
the searching module is used for taking each candidate main entity as a root node, searching a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, then searching from those adjacent nodes for the adjacent edges and adjacent nodes of the next layer, and so on until the nodes of the last layer have been searched, so as to obtain the paths corresponding to each candidate main entity;
the computing module is used for computing the similarity between the text information corresponding to the path of each candidate main entity and the text information of the question;
the second selection module is used for selecting a path with the similarity meeting a preset highest similarity judgment condition as a candidate path of the candidate main entity;
the second acquisition module is used for obtaining a final selection path of the main entity from the candidate paths of the candidate main entity;
and the determining module is used for determining text information corresponding to the final selection path and taking the text information as an answer of the question.
According to the scheme provided by the embodiments of the application, after a question is received, the main entities corresponding to the question in a preset knowledge base are acquired, and at least two candidate main entities are selected from them. Taking each candidate main entity as a root node, the adjacent edges and adjacent nodes of the root node are searched in a pre-stored knowledge graph, the adjacent edges and adjacent nodes of the next layer are searched from those adjacent nodes, and so on until the nodes of the last layer have been searched, yielding the paths corresponding to each candidate main entity. The similarity between the text information corresponding to each path and the text information of the question is calculated, and the paths whose similarity meets a preset highest-similarity judgment condition are selected as candidate paths of the candidate main entity. The final selected path of the main entity is obtained from the candidate paths, and the text information corresponding to the final selected path is determined as the answer to the question. This improves the answering of questions and reduces manual maintenance cost.
Additional features and advantages of embodiments of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of embodiments of the application. The objectives and other advantages of embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings provide a further understanding of the technical solution of the embodiments of the present application and are incorporated in and constitute a part of this specification. They illustrate and explain the technical solution of the embodiments of the present application and do not limit it.
Fig. 1 is a flowchart of a processing method in a knowledge-graph-based question-answering system provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a knowledge-graph-based question-answering method according to an embodiment of the present application;
Fig. 3 is a block diagram of a processing device in a knowledge-graph-based question-answering system according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be arbitrarily combined with each other.
In the process of implementing the scheme of the application, the inventor found that the following problems exist in the related art. The specific analysis is as follows:
The semantic-parsing-based knowledge base question-answering method in the related art parses the natural language question to convert it into a logical expression, then converts the logical expression into a knowledge base query language by using the semantic information of the knowledge base, and finally obtains the result by querying the knowledge base. The inventor's analysis shows that the key to the semantic-parsing-based method is converting the natural language question into a logical expression, and this conversion requires training a parser model by supervised learning. Training a parser requires a large amount of annotated data. Moreover, because Chinese expresses the same meaning in many different ways, after conversion to the logical expression a large number of words representing the relations in the knowledge base must be extracted from text, which results in high labor cost and significant limitations.
The information-extraction-based knowledge base question-answering method in the related art simulates human thinking. It first identifies the subject word of the natural language question, then finds the main entity corresponding to the subject word in the knowledge base, takes the main entity as a node, and searches the knowledge base for candidate answers through the adjacent edges of the node; each candidate answer corresponds to a candidate path. The similarity between every candidate path and the question is calculated, and the candidate path with the greatest similarity is used as the path that finally yields the answer to the question. The inventor's analysis shows that the information-extraction-based method uses a deep learning model both to identify the main entity in the question and to calculate the similarity between the question and the paths, so it depends heavily on the training corpus. As the knowledge base grows, the deep learning model easily misses main entities, and a limited training set makes it difficult to screen out the candidate path that best fits the question, so the answer the system returns does not meet the user's actual need.
Based on the above analysis, the embodiments of the application provide a knowledge-graph-based question-answering method that combines a deep learning model with rules. The method takes the information extraction approach as its main flow: it performs no syntactic analysis of the question and instead feeds the question directly into the deep learning model for feature extraction and calculation, which avoids expensive labor cost. Answers are additionally screened by rules at several stages of the flow. This rule-assisted screening makes the answer finally returned to the user as reasonable as possible and reduces the interference that the huge number of triples in the knowledge base causes to the deep learning model.
Fig. 1 is a flowchart of a processing method in a knowledge-graph-based question-answering system according to an embodiment of the present application. As shown in Fig. 1, the method includes:
Step 101: after receiving a question, acquiring the main entities corresponding to the question in a preset knowledge base;
In one exemplary embodiment, the triples in the knowledge base include a subject, a predicate, and an object; for example, if the question is "Where does Zhang San live?", the main entity is Zhang San, the predicate is "lives in", and the object is "where" (representing address information).
Step 102: selecting at least two candidate main entities from the main entities;
In one exemplary embodiment, the candidate main entities that may be determined are the actor Zhang San, the doctor Zhang San, or the teacher Zhang San.
Step 103: taking each candidate main entity as a root node, searching a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, then searching from those adjacent nodes for the adjacent edges and adjacent nodes of the next layer, and so on until the nodes of the last layer have been searched, to obtain the paths corresponding to each candidate main entity;
Step 104: calculating the similarity between the text information corresponding to each path of each candidate main entity and the text information of the question;
In one exemplary embodiment, the similarity may be calculated using a pre-trained text matching model.
Step 105: selecting the paths whose similarity meets a preset highest-similarity judgment condition as candidate paths;
Step 106: obtaining the final selected path of the main entity from the candidate paths of the candidate main entities;
Step 107: determining the text information corresponding to the final selected path as the answer to the question.
According to the method provided by the embodiments of the application, after a question is received, the main entities corresponding to the question in a preset knowledge base are acquired, and at least two candidate main entities are selected from them. Taking each candidate main entity as a root node, the adjacent edges and adjacent nodes of the root node are searched in a pre-stored knowledge graph, the adjacent edges and adjacent nodes of the next layer are searched from those adjacent nodes, and so on until the nodes of the last layer have been searched, yielding the paths corresponding to each candidate main entity. The similarity between the text information corresponding to each path and the text information of the question is calculated, and the paths whose similarity meets a preset highest-similarity judgment condition are selected as candidate paths of the candidate main entity. The final selected path of the main entity is obtained from the candidate paths, and the text information corresponding to the final selected path is determined as the answer to the question. This improves the answering of questions and reduces manual maintenance cost.
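The layer-by-layer search from a root node can be sketched as a bounded breadth-first expansion. This is only an illustration under stated assumptions: the adjacency-map representation, the `max_depth` parameter standing in for "the last layer", the sample graph, and the name `enumerate_paths` are all hypothetical.

```python
# Hypothetical adjacency map: node -> list of (edge label, neighbor node).
GRAPH = {
    "Zhang San": [("address", "pond area")],
    "pond area": [("part of", "city X")],
}


def enumerate_paths(graph, root, max_depth=2):
    """Expand layer by layer and return every path of 1..max_depth edges.

    A path is a flat list: [root, edge, node, edge, node, ...].
    """
    paths = []
    frontier = [(root, [root])]  # (current node, path that reached it)
    for _ in range(max_depth):
        next_frontier = []
        for node, path in frontier:
            for edge, neighbor in graph.get(node, []):
                new_path = path + [edge, neighbor]
                paths.append(new_path)
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return paths


paths = enumerate_paths(GRAPH, "Zhang San", max_depth=2)
```

Bounding the depth keeps the number of paths finite even when the graph contains cycles; the text of each path could then be passed to whatever text matching model is used for the similarity step.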
The following describes the method provided by the embodiment of the application:
In an exemplary embodiment, acquiring the main entities corresponding to the question in the preset knowledge base includes:
identifying the main entity mentions in the question by using a pre-acquired main entity dictionary to obtain a first recognition result, wherein the main entity dictionary comprises all subjects and objects in the knowledge base; and identifying the main entity mentions in the question by using a pre-acquired deep learning model for subject-word recognition to obtain a second recognition result;
combining the first recognition result and the second recognition result into a final recognition result;
and looking up the correspondence, stored in a preset link dictionary, between main entity mentions and main entities in the knowledge base, so as to map each main entity mention in the final recognition result to the corresponding main entity in the knowledge base.
The deep learning model for subject-word recognition may be a BERT-CRF model (Bidirectional Encoder Representations from Transformers combined with a conditional random field).
The main entity dictionary ensures as far as possible that no subject word in the question is missed, while the deep learning model captures subject words more accurately and further reduces omissions. Identifying main entity mentions with both the main entity dictionary and the deep learning model extracts every fragment of the question that might be a subject word; all identified mentions are then converted into main entities in the knowledge base through a link dictionary. Combining the two recognition modes reduces missed main entity mentions and improves recognition accuracy.
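The merging of the two recognition results and the lookup in the link dictionary can be sketched as follows. The deep learning model is replaced here by a precomputed list of mentions, and every name and data value (`recognize_mentions`, `link_mentions`, the sample dictionaries) is a hypothetical stand-in rather than the application's implementation.

```python
def recognize_mentions(question, entity_dictionary, model_mentions):
    """Union of dictionary matches and model output, keeping first-seen order.

    model_mentions stands in for the output of a subject-word recognition model.
    """
    dictionary_hits = [m for m in entity_dictionary if m in question]
    merged = []
    for mention in dictionary_hits + list(model_mentions):
        if mention not in merged:
            merged.append(mention)
    return merged


def link_mentions(mentions, link_dictionary):
    """Map each mention to every knowledge-base entity it may refer to."""
    return {m: link_dictionary.get(m, []) for m in mentions}


question = "Where do Zhang San and Liu Si live?"
mentions = recognize_mentions(question, ["Zhang San"], ["Zhang San", "Liu Si"])
linked = link_mentions(
    mentions, {"Zhang San": ["Zhang San (actor)", "Zhang San (doctor)"]}
)
```

In this example the dictionary alone would miss "Liu Si" and the stubbed model supplies it, which is the complementary behavior the combination is meant to provide.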
In an exemplary embodiment, selecting at least two candidate main entities from the main entities includes:
determining at least two pieces of feature information for each main entity;
obtaining score information for each main entity from the feature information of that main entity;
and selecting at least two candidate main entities according to the score information of each main entity.
The feature information of a main entity comprises at least one of: the length of the main entity mention; the number of overlapping words and the semantic similarity between the main entity and the question; the number of overlapping words between all adjacent edges of the main entity and the question; and the number of times the main entity occurs in the knowledge base.
Analyzing the feature information of the main entities allows the main entity information to be identified accurately, provides a basis for screening the main entities, and improves the accuracy of the selection.
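A minimal sketch of feature-based scoring and candidate selection is given below. It assumes a plain weighted sum as the scorer purely for illustration (the application elsewhere mentions a multi-layer perceptron), uses only three of the listed features, and all names, weights, and counts are hypothetical.

```python
def overlap_words(a, b):
    """Number of distinct whitespace-separated words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def entity_features(mention, entity_label, question, kb_count):
    """Feature vector: mention length, word overlap with question, KB frequency."""
    return [len(mention), overlap_words(entity_label, question), kb_count]


def score(features, weights):
    """Weighted sum used here as a stand-in for a trained scoring model."""
    return sum(f * w for f, w in zip(features, weights))


def select_candidates(entities, weights, k=2):
    """Return the names of the k highest-scoring entities."""
    ranked = sorted(entities, key=lambda e: score(e[1], weights), reverse=True)
    return [name for name, _ in ranked[:k]]


question = "where does Zhang San live"
entities = [
    ("Zhang San (actor)", entity_features("Zhang San", "Zhang San actor", question, 12)),
    ("Zhang San (doctor)", entity_features("Zhang San", "Zhang San doctor", question, 3)),
    ("Zhang San (teacher)", entity_features("Zhang San", "Zhang San teacher", question, 7)),
]
candidates = select_candidates(entities, weights=[0.1, 1.0, 0.5], k=2)
```

With these made-up weights the knowledge-base frequency decides the ranking, since the other two features are identical across the three entities.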
In an exemplary embodiment, obtaining the final selected path of the main entity from the candidate paths of the candidate main entities includes:
judging whether the difference between the similarities of the candidate paths is greater than or equal to a preset threshold;
if the difference is smaller than the threshold, acquiring the number of words that the text information corresponding to each candidate path shares with the question, and selecting the path with the largest number of overlapping words as the final selected path;
and if the difference is greater than or equal to the threshold, selecting the candidate path with the highest similarity as the final selected path.
The similarity difference thus determines how the final selected path is chosen. A difference greater than or equal to the threshold indicates that the candidate paths differ substantially in text semantics, so the path with the highest similarity is adopted directly. A difference smaller than the threshold indicates that the candidate paths differ little in text semantics, so the number of words shared with the question is used to screen the answer.
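The threshold rule can be sketched as below. The text does not say which pair of similarities the difference is taken over, so this sketch assumes the gap between the two highest-scoring candidate paths; that choice, the token-list representation, and the name `select_final_path` are assumptions.

```python
def select_final_path(candidates, question_tokens, threshold):
    """Pick the final path from (path_tokens, similarity) pairs.

    Assumption: the "difference" is the gap between the two highest similarities.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= threshold:
        # Clear semantic winner: take the most similar path.
        return ranked[0][0]
    # Similarities are close: fall back to word overlap with the question.
    question_set = set(question_tokens)
    return max(ranked, key=lambda c: len(set(c[0]) & question_set))[0]


question_tokens = ["what", "is", "zhang", "san", "address"]
candidates = [
    (["zhang", "san", "birthplace"], 0.90),
    (["zhang", "san", "address"], 0.88),
]
close_call = select_final_path(candidates, question_tokens, threshold=0.05)
clear_win = select_final_path(candidates, question_tokens, threshold=0.01)
```

When the overlap counts tie, `max` returns the first element of the similarity-ranked list, so the more similar path still wins.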
In an exemplary embodiment, when the question includes n main entities, one of the n main entities is selected as a target main entity, and after the candidate main entities of the target main entity are determined, the target candidate main entities of the target main entity are determined;
the candidate paths of a target candidate main entity are obtained as follows:
in the process of searching adjacent edges and adjacent nodes with each target candidate main entity as a root node, when the adjacent edges and adjacent nodes of the next layer are searched from the adjacent nodes, the paths are filtered by using the remaining (n-1) of the n main entities to obtain the candidate paths of the target candidate main entity, wherein n is an integer greater than or equal to 2.
For example, if the question is "Where do Zhang San and Liu Si live?", the main entities include Zhang San and Liu Si. After the path (Zhang San, address, pond area) is obtained for Zhang San, the triple (Liu Si, address, pond area) can be found through the adjacent node "pond area", thereby bridging to a dual-main-entity path.
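The bridging example can be sketched as a search for a node shared by triples of both main entities. The triple list, the extra "occupation" triple, and the name `bridge_paths` are hypothetical and only illustrate the idea.

```python
def bridge_paths(triples, first_entity, other_entity):
    """Find dual-entity paths where both entities point at the same node.

    Returns pairs of triples: (first_entity, p1, node) and (other_entity, p2, node).
    """
    bridged = []
    for s1, p1, o1 in triples:
        if s1 != first_entity:
            continue
        for s2, p2, o2 in triples:
            # Keep only triples of the other entity that reach the same node.
            if s2 == other_entity and o2 == o1:
                bridged.append(((s1, p1, o1), (s2, p2, o2)))
    return bridged


TRIPLES = [
    ("Zhang San", "address", "pond area"),
    ("Liu Si", "address", "pond area"),
    ("Liu Si", "occupation", "teacher"),
]
dual_paths = bridge_paths(TRIPLES, "Zhang San", "Liu Si")
```

The nested loop is quadratic in the number of triples, which is acceptable for a toy list; a real knowledge base would more plausibly index triples by object.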
In summary, the application provides a knowledge-graph-based question-answering method that comprises main entity mention recognition, main entity linking, candidate path screening, main entity bridging, and answer generation. The main entities in the question are identified, the features of each main entity are calculated and fed into a multi-layer perceptron model to obtain a score for each main entity, and the at least two main entities with the highest scores are selected as candidate main entities. Candidate path screening is then performed: taking each candidate main entity as a root node, adjacent edges and adjacent nodes are searched in the knowledge graph, and deeper edges are searched through the adjacent nodes, generating paths formed by the root node and the edges. These paths are fed together with the question into a BERT (Bidirectional Encoder Representations from Transformers) text matching model to obtain their similarity, and the candidate paths most likely to match the answer to the question are screened out in combination with other features and some rules. Answer generation finds the answer to the question in the knowledge base through the finally screened path.
In addition, main entity bridging mainly handles the case where the question contains several main entities: one candidate main entity is bridged to another candidate main entity along a deeper path, and the final path is obtained by comparing the features of the bridged paths with those of the paths screened in the previous step.
Fig. 2 is a schematic diagram of a knowledge-graph-based question-answering method according to an embodiment of the present application. As shown in Fig. 2, the method includes:
Step 1: For the natural language question input by the user, all main entity mentions in the question are identified through a main entity dictionary and a BERT-CRF model (Bidirectional Encoder Representations from Transformers combined with a conditional random field).
The main entity dictionary is a dictionary created by aggregating all subjects (Subject) and objects (Object) in the knowledge base. The question is sliced into fragments, the fragments that appear in the main entity dictionary are extracted, and any fragment contained in a longer extracted fragment is filtered out; the remaining fragments are the main entity mentions extracted through the main entity dictionary. The main entity dictionary ensures as far as possible that no subject word in the question is missed, while the BERT-CRF deep learning model captures subject words more accurately and further reduces omissions. Finally, the main entity mentions extracted in the two ways are merged.
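The dictionary-based extraction with filtering of shorter fragments can be sketched as follows. The sketch treats "contained in a longer fragment" as plain substring containment and ignores positions in the question, which is a simplifying assumption; the name `dictionary_mentions` and the sample dictionary are hypothetical.

```python
def dictionary_mentions(question, entity_dictionary):
    """Extract dictionary fragments of the question, keeping only maximal ones.

    Every substring of the question is tried; a matched fragment is dropped
    when it is a substring of another, longer matched fragment.
    """
    n = len(question)
    found = {
        question[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if question[i:j] in entity_dictionary
    }
    return sorted(
        f for f in found if not any(f != g and f in g for g in found)
    )


ENTITY_DICTIONARY = {"Zhang", "San", "Zhang San", "Li Si"}
mentions = dictionary_mentions("Where does Zhang San live", ENTITY_DICTIONARY)
```

Trying all substrings is quadratic in the question length, which is fine for short questions; a trie or Aho-Corasick matcher would be a natural replacement for large dictionaries.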
Step 2: All the main entity mentions obtained in step 1 are first converted into main entities in the knowledge base through a link dictionary. The features of each main entity are then determined and fed into a multi-layer perceptron model trained on a training corpus to obtain a score for each main entity, and the 5 main entities with the highest scores are selected as candidate main entities.
The features of a main entity include the length of the main entity mention, the number of overlapping words and the semantic similarity between the main entity and the question, the number of overlapping words between all adjacent edges of the main entity and the question, the number of times the main entity appears in the knowledge base, and the like.
The link dictionary is generally collected or created manually. It aligns the subject words in the natural language question with main entities in the knowledge base. The same main entity mention may correspond to several main entities in the knowledge base; for example, the mention "Zhang San" might correspond to an actor in a film, a doctor at a certain hospital, and so on.
Step 3: Taking the 5 candidate main entities obtained in step 2 as root nodes, adjacent edges and adjacent nodes are searched in the knowledge graph, and deeper edges are searched through the adjacent nodes, generating paths formed by the root node and the edges. The paths and the question are fed into a BERT (Bidirectional Encoder Representations from Transformers) text matching model trained on a training corpus to obtain the similarity of each path, and the 3 paths with the highest similarity are selected as candidate paths. These 3 candidate paths are further filtered by overlapping words: when the similarity score between the candidate path and the question does not exceed a threshold a, the path with the largest number of words overlapping the question is taken as the final selected path of this step; otherwise, the path with the highest similarity is taken. The threshold a is obtained by manual parameter tuning.
Step 4: Taking the candidate main entities obtained in step 2 as root nodes, adjacent edges are searched to obtain paths formed by the root nodes and the adjacent edges. These paths and the question are fed into the BERT text matching model used in step 3 to calculate the similarity, and the 20 paths with the highest similarity are taken. Deeper triples are searched through the nodes of these 20 paths, and if a deeper triple contains another candidate main entity, the path is bridged to a dual-main-entity path.
For example, the question "Which TV series do Zhang San and Li Si co-star in?" includes the main entities "Zhang San" and "Li Si". By searching one layer deeper from the object in the path (Zhang San, starring, object), a dual-main-entity path can be obtained: Zhang San -> starring -> ? <- starring <- Li Si.
All bridged dual-main-entity paths are filtered according to rules such as: a main entity cannot be the same as a relation; the relations must not include special relations such as alias or Chinese name; and the mentions corresponding to the two main entities in the question cannot overlap. Features such as the number of words overlapping with the question and the BERT text matching model similarity score are then calculated for the filtered dual-main-entity paths and for the paths finally screened in the step 4, and the path that best matches the answer to the question is selected. Rule filtering of the dual-main-entity paths largely avoids interference from such paths, while the final screening with the text matching model incorporates semantic features.
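The three filtering rules named above can be sketched as a single predicate. The path representation (a dict with `entities`, `relations`, and `mention_spans`), the contents of `SPECIAL_RELATIONS`, and the function names are assumptions made for illustration only.

```python
# Hypothetical set of relations that a dual-entity path must not use.
SPECIAL_RELATIONS = {"alias", "Chinese name"}


def spans_overlap(a, b):
    """True when two half-open (start, end) character spans intersect."""
    return a[0] < b[1] and b[0] < a[1]


def passes_rules(path):
    """Apply the three rules to one dual-entity path."""
    entities, relations = path["entities"], path["relations"]
    # Rule 1: a main entity must not be identical to a relation.
    if any(e in relations for e in entities):
        return False
    # Rule 2: no special relations such as alias or Chinese name.
    if any(r in SPECIAL_RELATIONS for r in relations):
        return False
    # Rule 3: the two mentions must not overlap in the question.
    span_a, span_b = path["mention_spans"]
    return not spans_overlap(span_a, span_b)


good = {"entities": ["Zhang San", "Li Si"], "relations": ["starring", "starring"],
        "mention_spans": [(0, 9), (14, 19)]}
uses_alias = {"entities": ["Zhang San", "Li Si"], "relations": ["alias", "starring"],
              "mention_spans": [(0, 9), (14, 19)]}
overlapping = {"entities": ["Zhang San", "San"], "relations": ["starring", "starring"],
               "mention_spans": [(0, 9), (6, 9)]}
kept = [p for p in (good, uses_alias, overlapping) if passes_rules(p)]
```

Paths that survive such a filter would then be compared on overlap counts and text matching scores, as the surrounding text describes.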
Step 5: According to the final path obtained in step 4, search the knowledge base for the answer to the question and return it to the user.
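Answer retrieval in step 5 amounts to following the relations of the final path from its root entity. A minimal sketch, assuming the knowledge base is available as a list of (subject, relation, object) triples; the function name and data layout are hypothetical:

```python
def follow_path(triples, root, relations):
    """Return the set of nodes reached from root by following the relations in order."""
    current = {root}
    for rel in relations:
        current = {obj for subj, r, obj in triples if subj in current and r == rel}
    return current
```

For a path such as (Zhang San, starring, director), calling `follow_path(triples, "Zhang San", ["starring", "director"])` would return every director of a drama starring Zhang San that the triples contain, or an empty set when the path has no match.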
The knowledge-graph-based question-answering algorithm of the application takes the information extraction approach as its main flow and applies rule-based filtering at several stages of that flow. It retains the generalization advantage of the information extraction approach in semantic computation and its lower labor cost, uses rules to avoid the uncertainty introduced by deep learning models in that approach, and greatly improves the ability to answer natural language questions.
The deep learning model used in step 2 of the technical solution is not limited to BERT; other deep learning models such as RNN (recurrent neural network) and CNN (convolutional neural network) can be used to obtain similar technical effects. Likewise, the rule filtering set in each of the above steps can be formulated from prior knowledge and data experiments, and is not limited to the rule filtering methods described above.
The key point of the application is that, while using the information-extraction style of knowledge question answering, question answers are selected and filtered by adding more rules; these rules are usually formulated for a specific knowledge base and application scenario, drawing on human expertise.
Fig. 3 is a block diagram of a processing device in a knowledge-graph-based question-answering system according to an embodiment of the present application. As shown in Fig. 3, the device includes:
a first acquisition module, configured to acquire, after receiving a question, a main entity corresponding to the question in a preset knowledge base;
a first selection module, configured to select at least two candidate main entities from the main entities;
a searching module, configured to take each candidate main entity as a root node, search a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, search through those adjacent nodes for their adjacent edges and adjacent nodes in the next layer, and so on until the nodes of the last layer have been searched, so as to obtain a path corresponding to each candidate main entity;
a computing module, configured to calculate the similarity between the text information corresponding to the path of each candidate main entity and the text information of the question;
a second selection module, configured to select a path whose similarity meets a preset highest-similarity judgment condition as a candidate path of the candidate main entity;
a second acquisition module, configured to obtain a final selected path of the main entity from the candidate paths of the candidate main entity;
and a determining module, configured to determine the text information corresponding to the final selected path and take it as the answer to the question.
In an exemplary embodiment, the first acquisition module includes:
a recognition unit, configured to identify main entity mentions in the question using a pre-acquired main entity dictionary to obtain a first recognition result, wherein the main entity dictionary comprises all subjects and objects in the knowledge base; and to identify main entity mentions in the question using a pre-acquired deep learning model for subject-term recognition to obtain a second recognition result;
a merging unit, configured to merge the first recognition result and the second recognition result into a final recognition result;
and a searching unit, configured to look up the correspondence, stored in a preset link dictionary, between main entity mentions and main entities in the knowledge base, so that each main entity mention in the recognition result is mapped to its corresponding main entity in the knowledge base.
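The three units above can be sketched as follows. This is a hedged illustration: substring matching stands in for dictionary recognition, `model_fn` is a hypothetical callable standing in for the deep learning recognizer, merging is assumed to be a set union, and the link dictionary is assumed to map each mention to a list of knowledge-base entities.

```python
def dictionary_mentions(question, entity_dictionary):
    """First recognition result: dictionary entries that occur verbatim in the question."""
    return {name for name in entity_dictionary if name in question}


def recognise_mentions(question, entity_dictionary, model_fn):
    """Merge the dictionary result with the model result (assumed here to be a union)."""
    first = dictionary_mentions(question, entity_dictionary)
    second = set(model_fn(question))
    return first | second


def link_mentions(mentions, link_dictionary):
    """Map each mention to the knowledge-base entities stored for it, if any."""
    return {m: link_dictionary.get(m, []) for m in sorted(mentions)}
```

Combining the two recognizers lets the dictionary catch exact names while the model can propose mentions, such as nicknames, that the dictionary does not contain; the link dictionary then resolves each mention to one or more entities.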
In an exemplary embodiment, the first selection module includes:
a determining unit, configured to determine at least two pieces of feature information of each main entity;
a processing unit, configured to process the feature information of the same main entity to obtain score information of each main entity;
and a selection unit, configured to select at least two candidate main entities according to the score information of each main entity.
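One way to turn several pieces of feature information into a score is a weighted sum, sketched below. The source does not specify the features or how they are combined, so the feature names, the weights, and the weighted-sum rule are all assumptions made for illustration.

```python
def score_entity(features, weights):
    """Weighted sum of an entity's features; unknown features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def select_candidates(entity_features, weights, k=5):
    """Return the k highest-scoring entities (k must be at least 2)."""
    if k < 2:
        raise ValueError("at least two candidate main entities are required")
    ranked = sorted(
        entity_features,
        key=lambda entity: score_entity(entity_features[entity], weights),
        reverse=True,
    )
    return ranked[:k]
```

The default of 5 mirrors the 5 candidate main entities mentioned in step 3; the `k < 2` check reflects the requirement that at least two candidates be selected.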
In an exemplary embodiment, the second selection module includes:
a judging unit, configured to judge whether the difference between the similarities of the candidate paths is greater than or equal to a preset threshold;
and an obtaining unit, configured to select the path with the highest similarity among the candidate paths as the final selected path if the difference is greater than or equal to the threshold; and, if the difference is smaller than the threshold, to acquire the number of words that the text information corresponding to each candidate path shares with the question, and select the path with the largest number of overlapping words as the final selected path.
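The judging and obtaining units can be sketched as follows. The source does not say which similarities the difference is taken between; this sketch assumes it is the gap between the two highest-scoring candidates, and it counts overlapping words on whitespace tokens. Both choices are assumptions.

```python
def overlap_count(text, question):
    """Number of distinct whitespace tokens shared by a path text and the question."""
    return len(set(text.lower().split()) & set(question.lower().split()))


def final_path(candidates, question, threshold):
    """candidates: list of (path_text, similarity) pairs; returns the chosen path_text."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # A clear similarity lead (or a single candidate) is trusted directly.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= threshold:
        return ranked[0][0]
    # Otherwise the similarities are too close to call; use word overlap instead.
    return max(ranked, key=lambda c: overlap_count(c[0], question))[0]
```

The intent of the rule is that a surface-level signal only overrides the model when the model itself cannot separate the candidates.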
In an exemplary embodiment, the searching module is configured to, when n main entities are included in the question, select one of the n main entities as a target main entity and, after determining the candidate main entities of the target main entity, determine a target candidate main entity of the target main entity;
the candidate path of the target candidate main entity is obtained as follows:
in the process of searching adjacent edges and adjacent nodes with each target candidate main entity as a root node, when searching the adjacent edges and adjacent nodes of the adjacent nodes in the next layer, the paths are screened using the remaining (n-1) main entities of the n main entities to obtain the candidate paths of the target candidate main entity, wherein n is an integer greater than or equal to 2.
According to the device provided by the embodiment of the application, after a question is received, a main entity corresponding to the question in a preset knowledge base is acquired, and at least two candidate main entities are selected from the main entities. Taking each candidate main entity as a root node, the adjacent edges and adjacent nodes of the root node are searched in a pre-stored knowledge graph, the adjacent edges and adjacent nodes of those adjacent nodes in the next layer are searched through them, and so on until the nodes of the last layer have been searched, yielding a path corresponding to each candidate main entity. The similarity between the text information corresponding to the path of each candidate main entity and the text information of the question is calculated, and a path whose similarity meets a preset highest-similarity judgment condition is selected as a candidate path of the candidate main entity. The final selected path of the main entity is then obtained from the candidate paths, and the text information corresponding to the final selected path is determined and taken as the answer to the question. This improves the answering of questions and reduces manual maintenance cost.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims (8)

1. A processing method in a knowledge-graph-based question-answering system, characterized by comprising:
after receiving a question, acquiring a main entity corresponding to the question in a preset knowledge base;
selecting at least two candidate main entities from the main entities;
taking each candidate main entity as a root node, searching a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, searching through those adjacent nodes for their adjacent edges and adjacent nodes in the next layer, and so on until the nodes of the last layer have been searched, to obtain a path corresponding to each candidate main entity;
calculating the similarity between the text information corresponding to the path of each candidate main entity and the text information of the question;
selecting a path whose similarity meets a preset highest-similarity judgment condition as a candidate path of the candidate main entity;
obtaining a final selected path of the main entity from the candidate paths of the candidate main entity;
determining the text information corresponding to the final selected path as the answer to the question;
wherein obtaining the final selected path of the main entity from the candidate paths of the candidate main entity comprises:
judging whether the difference between the similarities of the candidate paths is greater than or equal to a preset threshold;
if the difference is greater than or equal to the threshold, selecting the path with the highest similarity among the candidate paths as the final selected path;
and if the difference is smaller than the threshold, acquiring the number of words that the text information corresponding to each candidate path shares with the question, and selecting the path with the largest number of overlapping words as the final selected path.
2. The method of claim 1, wherein acquiring the main entity corresponding to the question in the preset knowledge base comprises:
identifying main entity mentions in the question using a pre-acquired main entity dictionary to obtain a first recognition result, wherein the main entity dictionary comprises all subjects and objects in the knowledge base; and identifying main entity mentions in the question using a pre-acquired deep learning model for subject-term recognition to obtain a second recognition result;
merging the first recognition result and the second recognition result into a final recognition result;
and looking up the correspondence, stored in a preset link dictionary, between main entity mentions and main entities in the knowledge base, so as to map each main entity mention in the recognition result to its corresponding main entity in the knowledge base.
3. The method of claim 1, wherein selecting at least two candidate main entities from the main entities comprises:
determining at least two pieces of feature information of each main entity;
obtaining score information of each main entity by processing the feature information of the same main entity;
and selecting at least two candidate main entities according to the score information of each main entity.
4. The method according to claim 1, characterized in that:
when n main entities are included in the question, one of the n main entities is selected as a target main entity, and after the candidate main entities of the target main entity are determined, a target candidate main entity of the target main entity is determined;
the candidate path of the target candidate main entity is obtained as follows:
in the process of searching adjacent edges and adjacent nodes with each target candidate main entity as a root node, when searching the adjacent edges and adjacent nodes of the adjacent nodes in the next layer, the paths are screened using the remaining (n-1) main entities of the n main entities to obtain the candidate paths of the target candidate main entity, wherein n is an integer greater than or equal to 2.
5. A processing device in a knowledge-graph-based question-answering system, comprising:
a first acquisition module, configured to acquire, after receiving a question, a main entity corresponding to the question in a preset knowledge base;
a first selection module, configured to select at least two candidate main entities from the main entities;
a searching module, configured to take each candidate main entity as a root node, search a pre-stored knowledge graph for the adjacent edges and adjacent nodes of the root node, search through those adjacent nodes for their adjacent edges and adjacent nodes in the next layer, and so on until the nodes of the last layer have been searched, so as to obtain a path corresponding to each candidate main entity;
a computing module, configured to calculate the similarity between the text information corresponding to the path of each candidate main entity and the text information of the question;
a second selection module, configured to select a path whose similarity meets a preset highest-similarity judgment condition as a candidate path of the candidate main entity;
a second acquisition module, configured to obtain a final selected path of the main entity from the candidate paths of the candidate main entity;
a determining module, configured to determine the text information corresponding to the final selected path and take it as the answer to the question;
wherein the second selection module includes:
a judging unit, configured to judge whether the difference between the similarities of the candidate paths is greater than or equal to a preset threshold;
and an obtaining unit, configured to select the path with the highest similarity among the candidate paths as the final selected path if the difference is greater than or equal to the threshold; and, if the difference is smaller than the threshold, to acquire the number of words that the text information corresponding to each candidate path shares with the question, and select the path with the largest number of overlapping words as the final selected path.
6. The apparatus of claim 5, wherein the first acquisition module comprises:
a recognition unit, configured to identify main entity mentions in the question using a pre-acquired main entity dictionary to obtain a first recognition result, wherein the main entity dictionary comprises all subjects and objects in the knowledge base; and to identify main entity mentions in the question using a pre-acquired deep learning model for subject-term recognition to obtain a second recognition result;
a merging unit, configured to merge the first recognition result and the second recognition result into a final recognition result;
and a searching unit, configured to look up the correspondence, stored in a preset link dictionary, between main entity mentions and main entities in the knowledge base, so that each main entity mention in the recognition result is mapped to its corresponding main entity in the knowledge base.
7. The apparatus of claim 5, wherein the first selection module comprises:
a determining unit, configured to determine at least two pieces of feature information of each main entity;
a processing unit, configured to process the feature information of the same main entity to obtain score information of each main entity;
and a selection unit, configured to select at least two candidate main entities according to the score information of each main entity.
8. The apparatus according to claim 5, wherein:
the searching module is configured to, when n main entities are included in the question, select one of the n main entities as a target main entity and, after determining the candidate main entities of the target main entity, determine a target candidate main entity of the target main entity;
the candidate path of the target candidate main entity is obtained as follows:
in the process of searching adjacent edges and adjacent nodes with each target candidate main entity as a root node, when searching the adjacent edges and adjacent nodes of the adjacent nodes in the next layer, the paths are screened using the remaining (n-1) main entities of the n main entities to obtain the candidate paths of the target candidate main entity, wherein n is an integer greater than or equal to 2.
CN202010182500.7A 2020-03-16 2020-03-16 Knowledge graph-based processing method and device in question-answering system Active CN111414465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010182500.7A CN111414465B (en) 2020-03-16 2020-03-16 Knowledge graph-based processing method and device in question-answering system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010182500.7A CN111414465B (en) 2020-03-16 2020-03-16 Knowledge graph-based processing method and device in question-answering system

Publications (2)

Publication Number Publication Date
CN111414465A CN111414465A (en) 2020-07-14
CN111414465B true CN111414465B (en) 2023-09-01

Family

ID=71491208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010182500.7A Active CN111414465B (en) 2020-03-16 2020-03-16 Knowledge graph-based processing method and device in question-answering system

Country Status (1)

Country Link
CN (1) CN111414465B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2400442A1 (en) * 2000-02-25 2001-08-30 Yet Mui Method for enterprise workforce planning
CN109344238A (en) * 2018-09-18 2019-02-15 阿里巴巴集团控股有限公司 The benefit word method and apparatus of user's question sentence
CN109800284A (en) * 2018-12-19 2019-05-24 中国电子科技集团公司第二十八研究所 A kind of unstructured information intelligent Answer System construction method of oriented mission
CN110837550A (en) * 2019-11-11 2020-02-25 中山大学 Knowledge graph-based question and answer method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant