CN111008272A - Knowledge graph-based question and answer method and device, computer equipment and storage medium - Google Patents

Knowledge graph-based question and answer method and device, computer equipment and storage medium

Info

Publication number
CN111008272A
Authority
CN
China
Prior art keywords
question
tested
entity
entities
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911227905.1A
Other languages
Chinese (zh)
Inventor
廖林伟
金洁
史伟国
盛学军
卢振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen New Guodu Jinfu Technology Co Ltd
Original Assignee
Shenzhen New Guodu Jinfu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen New Guodu Jinfu Technology Co Ltd filed Critical Shenzhen New Guodu Jinfu Technology Co Ltd
Priority to CN201911227905.1A priority Critical patent/CN111008272A/en
Publication of CN111008272A publication Critical patent/CN111008272A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a question and answer method and device based on a knowledge graph, computer equipment and a storage medium. The method comprises the following steps: if a question to be tested sent by a terminal is received, identifying an entity and a relation of the question to be tested; acquiring an entity of the question to be tested and a candidate list of the relation from a preset knowledge graph; selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum; and querying a prediction result in a preset graph database according to the characteristic sequence and sending the queried prediction result to the terminal, so that the answer to the question to be tested can be accurately found in the knowledge graph, improving the user experience.

Description

Knowledge graph-based question and answer method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a question answering method and device based on a knowledge graph, computer equipment and a storage medium.
Background
Intelligent question answering is becoming increasingly popular as an important way to implement artificial intelligence. Existing intelligent question answering is usually based on deep learning models, which must be trained on large amounts of data and place high demands on hardware. In addition, deep learning is an implicit knowledge representation that has difficulty producing an exact answer, resulting in a poor user experience.
Disclosure of Invention
The embodiment of the invention provides a knowledge graph-based question and answer method and device, computer equipment and a storage medium, aiming to solve the problem that existing intelligent question answering gives inaccurate answers.
In a first aspect, an embodiment of the present invention provides a question-answering method based on a knowledge graph, which includes identifying an entity and a relationship of a question to be tested if the question to be tested sent by a terminal is received;
acquiring an entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and inquiring a prediction result in a preset graph database according to the characteristic sequence, and sending the inquired prediction result to the terminal.
In a second aspect, an embodiment of the present invention further provides a knowledge-graph-based question answering device, which includes:
the identification unit is used for identifying the entity and the relation of the question to be tested if the question to be tested sent by the terminal is received;
the acquiring unit is used for acquiring the entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
the first selecting unit is used for selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and the first query unit is used for querying the prediction result in a preset graph database according to the characteristic sequence and sending the queried prediction result to the terminal.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium which stores a computer program that, when executed by a processor, implements the above method.
According to the technical scheme of the embodiment of the invention, if a question to be tested sent by a terminal is received, the entity and the relation of the question to be tested are identified; an entity of the question to be tested and a candidate list of the relation are acquired from a preset knowledge graph; an entity or a relation is selected from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum; and a prediction result is queried in a preset graph database according to the characteristic sequence and sent to the terminal, so that the answer to the question to be tested can be accurately found in the knowledge graph, improving the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a knowledge-graph-based question-answering method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Referring to fig. 1, fig. 1 is a schematic flow chart of a knowledge-graph-based question answering method according to an embodiment of the present invention. As shown, the method includes the following steps S1-S4.
And S1, if the question to be tested sent by the terminal is received, identifying the entity and the relationship of the question to be tested.
In specific implementation, if a question to be tested sent by a terminal is received, the entity and the relationship of the question to be tested are identified.
The entities and relations of the question to be tested are mainly recognized with a neural network model based on BERT and a conditional random field. BERT is a word vector training model; because the word vectors generated by BERT have strong expressive power, they are used for feature extraction. The conditional random field takes the global features of the whole question into account when classifying the current tag, which improves classification accuracy. The neural network model based on BERT and the conditional random field is composed of a BERT layer, a fully-connected layer and a conditional random field layer, in that order.
Therefore, in one embodiment, the above step S1 specifically includes the following steps S11-S12.
S11, inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field to extract the feature vector of the question to be tested.
And S12, respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field to obtain the mark corresponding to each word of the question to be tested, and extracting the entity and the relation from the mark corresponding to each word of the question to be tested.
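For illustration only (this code is not part of the original disclosure), a minimal sketch of such a BERT + conditional random field tagger is shown below, assuming the Hugging Face `transformers` package and the `pytorch-crf` package; the tag set, model name and example question are illustrative assumptions.

```python
# Illustrative sketch (not the patent's code): BERT layer -> fully-connected
# layer -> CRF layer, tagging each token of the question as part of an entity,
# part of a relation, or neither.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast
from torchcrf import CRF  # provided by the pytorch-crf package

TAGS = ["O", "B-ENT", "I-ENT", "B-REL", "I-REL"]  # assumed tag scheme

class BertCrfTagger(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", num_tags=len(TAGS)):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)             # BERT layer
        self.fc = nn.Linear(self.bert.config.hidden_size, num_tags)  # fully-connected layer
        self.crf = CRF(num_tags, batch_first=True)                   # conditional random field layer

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.fc(hidden)                # per-token tag scores
        mask = attention_mask.bool()
        if labels is not None:                     # training: negative log-likelihood loss
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag sequence per sentence

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertCrfTagger()
enc = tokenizer("珠穆朗玛峰有多高", return_tensors="pt")  # example question (assumption)
tags = model(enc["input_ids"], enc["attention_mask"])      # e.g. [[0, 1, 2, ...]] tag indices
```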
And S2, acquiring the entity of the question to be tested and a candidate list of the relationship from a preset knowledge graph.
In specific implementation, the entity of the question to be tested and the candidate list of the relationship are obtained from a preset knowledge graph. The candidate lists are generated mainly by combining an n-gram method with semantic similarity. An n-gram is a language model in which n denotes the maximum length of contiguous text that is considered. First, an inverted index is built over the entities in the knowledge graph using the n-gram method, and then the similarity between each candidate entity and the question is calculated.
Specifically, in one embodiment, the above step S2 includes the following steps S21-S22.
And S21, acquiring the entity of the question to be tested and a candidate list of the relation from the knowledge graph through a preset index server.
And S22, respectively obtaining the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
It should be noted that both the entities or relations in the candidate lists and the question to be tested are encoded into sentence vectors by the BERT algorithm, and the similarity between each entity or relation and the question to be tested is then calculated by the cosine method.
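For illustration only (not part of the original disclosure), the sketch below shows one way to implement the candidate generation described above: character n-grams of the knowledge-graph entity names are placed in an inverted index, the index is probed with n-grams of the recognized mention, and the hits are ranked by cosine similarity of sentence vectors; the `sentence_vector` encoder is an assumed callable (for example, a pooled BERT output).

```python
# Illustrative sketch: n-gram inverted index over entity names plus cosine
# ranking of candidates against the question (or mention) vector.
from collections import defaultdict
import numpy as np

def ngrams(text, n_max=3):
    """All character n-grams of text up to length n_max."""
    return {text[i:i + n] for n in range(1, n_max + 1) for i in range(len(text) - n + 1)}

def build_inverted_index(entity_names, n_max=3):
    index = defaultdict(set)
    for name in entity_names:
        for g in ngrams(name, n_max):
            index[g].add(name)
    return index

def candidates(mention, index, n_max=3):
    """Entities sharing at least one n-gram with the mention."""
    hits = set()
    for g in ngrams(mention, n_max):
        hits |= index.get(g, set())
    return hits

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_candidates(question_vec, cand_names, sentence_vector):
    # sentence_vector: assumed callable mapping text -> np.ndarray (e.g. a pooled BERT vector)
    scored = [(name, cosine(question_vec, sentence_vector(name))) for name in cand_names]
    return sorted(scored, key=lambda x: x[1], reverse=True)  # highest similarity first
```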
S3, selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a feature sequence, wherein the sum of cost functions between the entities and the relations of the feature sequence is minimum.
In specific implementation, an entity or a relation is selected from the candidate list of the entities and the relations of the question to be tested to form a feature sequence, wherein the sum of cost functions between the entities and the relations of the feature sequence is minimum.
Entity and relation linking refers to selecting, for each word predicted to be an entity or a relation, the most suitable entity or relation from its candidate list. To pick the most suitable entity or relation from each candidate list, a cost function is defined. The cost function cost(v_i, v_j) represents the distance from node v_i to node v_j. Its calculation has two parts: one is RDF2Vec, which vectorizes the entities and relations in the knowledge graph; the other is the weight of each entity or relation in its candidate list. The similarity of two entities is calculated from their vectors by the cosine method and combined with their weights to obtain the distance between the two entities. The distance ranges from zero to one, and the closer the value is to zero, the closer the two nodes are. Here v_i and v_j denote entities or relations in the i-th and j-th candidate lists, respectively.
Specifically, in one embodiment, the above step S3 includes the following steps S31-S32.
And S31, calculating the distance between any two nodes in different candidate lists.
S32, selecting a sequence that minimizes the sum of the total cost functions from the candidate lists as the feature sequence.
In specific implementation, one entity or relationship is selected from each candidate list to obtain a plurality of sequences, and a sequence with the minimum sum of total cost functions is selected from the obtained plurality of sequences as the characteristic sequence.
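For illustration only (not part of the original disclosure), the sketch below picks one candidate from each list so that the summed cost between consecutive picks is smallest; the way the RDF2Vec cosine similarity is combined with the candidate weights, and the exhaustive search over all combinations, are assumptions since the patent gives no explicit formula.

```python
# Illustrative sketch: choose one entity/relation per candidate list so that
# the total pairwise cost of the resulting sequence is minimal.
from itertools import product
import numpy as np

def cost(vi, vj, rdf2vec_vec, weight):
    """Distance between two nodes; smaller means closer (illustrative combination)."""
    a, b = rdf2vec_vec[vi], rdf2vec_vec[vj]        # assumed RDF2Vec embedding lookup
    sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    # Assumption: turn similarity into a distance and damp it by the candidates' list weights.
    return (1.0 - sim) * 0.5 * (weight[vi] + weight[vj])

def best_sequence(candidate_lists, rdf2vec_vec, weight):
    best, best_cost = None, float("inf")
    for seq in product(*candidate_lists):          # one pick from every candidate list
        total = sum(cost(seq[k], seq[k + 1], rdf2vec_vec, weight)
                    for k in range(len(seq) - 1))
        if total < best_cost:
            best, best_cost = seq, total
    return best, best_cost                         # the feature sequence and its total cost
```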
And S4, inquiring the prediction result in a preset graph database according to the characteristic sequence, and sending the inquired prediction result to the terminal.
In specific implementation, a prediction result is queried in a preset graph database according to the characteristic sequence, and the queried prediction result is sent to the terminal. A knowledge graph query statement is generated from the entities and relations of the feature sequence obtained in step S3 together with the preset query templates, the graph database is queried, and the query result is returned. Since the knowledge graph consists of <entity, relation, entity> triples, there are two query templates: template 1 is <entity, ?, entity>, and template 2 is <?, relation, entity> or <entity, relation, ?>.
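For illustration only (not part of the original disclosure), the two templates could be rendered as graph-database queries roughly as follows; the patent does not name a concrete graph database or query language, so Cypher and the node/relationship property names used here are assumptions.

```python
# Illustrative sketch: building parameterized Cypher queries for the two templates.

def template1_query(entity_a, entity_b):
    # Template 1: <entity, ?, entity> -- find the relation between two known entities.
    cypher = ("MATCH (a:Entity {name: $a})-[r]->(b:Entity {name: $b}) "
              "RETURN type(r) AS relation")
    return cypher, {"a": entity_a, "b": entity_b}

def template2_query(entity, relation, forward=True):
    # Template 2: <entity, relation, ?> or <?, relation, entity> -- find the missing entity.
    if forward:
        cypher = ("MATCH (a:Entity {name: $e})-[r {name: $rel}]->(x) "
                  "RETURN x.name AS entity")
    else:
        cypher = ("MATCH (x)-[r {name: $rel}]->(a:Entity {name: $e}) "
                  "RETURN x.name AS entity")
    return cypher, {"e": entity, "rel": relation}
```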
Specifically, in one embodiment, the above step S4 includes the following steps S41-S45.
And S41, querying the graph database for related entities according to the entities and relations of the characteristic sequence.
In specific implementation, related entities are queried from the graph database according to the entities and relations of the characteristic sequence and template 2.
And S42, querying the graph database for related relations according to the queried entities and the entities of the characteristic sequence.
In specific implementation, related relations are queried from the graph database according to the queried entities, the entities of the characteristic sequence and template 1.
And S43, judging whether a related relation is queried.
S44, if a related relation is queried, querying the graph database for related entities according to the queried relation and the entities of the characteristic sequence, and returning to the step of querying the graph database for related relations according to the queried entities and the entities of the characteristic sequence, until no related relation is queried.
And S45, if no related relation is queried, outputting the query result.
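For illustration only (not part of the original disclosure), the loop of steps S41-S45 could be sketched as below; `query_entities` and `query_relations` are hypothetical helpers wrapping template 2 and template 1, respectively.

```python
# Illustrative sketch of the S41-S45 loop: alternate between querying entities
# (template 2) and relations (template 1) until no further relation is found,
# then return the last query result as the prediction.

def predict(seq_entities, seq_relations, query_entities, query_relations):
    # S41: query related entities from the entities and relations of the sequence.
    entities = query_entities(seq_entities, seq_relations)
    while True:
        # S42/S43: query relations linking the found entities to the sequence entities.
        relations = query_relations(entities, seq_entities)
        if not relations:        # S45: no further relation -> output the current result
            return entities
        # S44: use the newly found relations to query the next hop of entities.
        entities = query_entities(seq_entities, relations)
```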
According to the technical scheme of the embodiment of the invention, if a question to be tested sent by a terminal is received, the entity and the relation of the question to be tested are identified; an entity of the question to be tested and a candidate list of the relation are acquired from a preset knowledge graph; an entity or a relation is selected from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum; and a prediction result is queried in a preset graph database according to the characteristic sequence and sent to the terminal, so that the answer to the question to be tested can be accurately found in the knowledge graph, improving the user experience.
Corresponding to the question-answering method based on the knowledge graph, the invention also provides a question-answering device based on the knowledge graph. The knowledge-graph-based question answering device comprises a unit for executing the knowledge-graph-based question answering method. Specifically, the knowledge-graph-based question answering device comprises an identification unit, an acquisition unit, a first selection unit and a first query unit.
And the identification unit is used for identifying the entity and the relation of the question to be tested if the question to be tested sent by the terminal is received.
And the acquisition unit is used for acquiring the entity of the question to be tested and the candidate list of the relationship from a preset knowledge graph.
And the first selecting unit is used for selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum.
And the first query unit is used for querying the prediction result in a preset graph database according to the characteristic sequence and sending the queried prediction result to the terminal.
In one embodiment, the identification unit includes a first input unit and a second input unit.
The first input unit is used for inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field so as to extract the feature vector of the question to be tested.
And the second input unit is used for respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field so as to obtain the mark corresponding to each word of the question to be tested, and extracting the entity and the relationship from the mark corresponding to each word of the question to be tested.
In an embodiment, the first obtaining unit includes a second obtaining unit and a sorting unit.
And the second acquisition unit is used for acquiring the entity of the question to be tested and the candidate list of the relationship from the knowledge graph through a preset index server.
And the sequencing unit is used for respectively acquiring the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
In one embodiment, the first selecting unit includes a calculating unit and a second selecting unit.
And the calculating unit is used for calculating the distance between any two nodes in different candidate lists.
And a second selecting unit configured to select, as the feature sequence, a sequence that minimizes a sum of total cost functions from among the candidate lists.
In one embodiment, the first query unit includes a second query unit, a third query unit, a judgment unit, a fourth query unit, and an output unit.
And the second query unit is used for querying the graph database for related entities according to the entities and relations of the characteristic sequence.
And the third query unit is used for querying the graph database for related relations according to the queried entities and the entities of the characteristic sequence.
And the judging unit is used for judging whether a related relation is queried.
And the fourth query unit is used for querying the graph database for related entities according to the queried relation and the entities of the characteristic sequence if a related relation is queried, and returning to the step of querying the graph database for related relations according to the queried entities and the entities of the characteristic sequence.
And the output unit is used for outputting the query result if no related relation is queried.
It should be noted that, as will be clear to those skilled in the art, the specific implementation process of the above question-answering device based on a knowledge graph and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, no further description is provided herein.
The above-described knowledge-graph-based question answering apparatus may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 2.
Referring to fig. 2, fig. 2 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal or a server, where the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 2, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform a knowledge-graph based question-answering method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute a knowledge-graph-based question-answering method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 2 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device 500 to which the present application may be applied, and that a particular computer device 500 may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
if a question to be tested sent by a terminal is received, identifying an entity and a relation of the question to be tested;
acquiring an entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and inquiring a prediction result in a preset graph database according to the characteristic sequence, and sending the inquired prediction result to the terminal.
In an embodiment, when the processor 502 implements the step of identifying the entity and the relationship of the question to be tested, the following steps are specifically implemented:
inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field to extract a feature vector of the question to be tested;
and respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field to obtain a mark corresponding to each word of the question to be tested, and extracting an entity and a relation from the mark corresponding to each word of the question to be tested.
In an embodiment, when implementing the step of obtaining the candidate list of entities and relationships of the question to be tested from the preset knowledge graph, the processor 502 specifically implements the following steps:
acquiring an entity of the question to be tested and a candidate list of the relation from the knowledge graph through a preset index server;
and respectively obtaining the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
In an embodiment, when implementing the step of selecting an entity or a relationship from the candidate lists of entities and relationships of the question to be tested to form the feature sequence, the processor 502 specifically implements the following steps:
calculating the distance between any two nodes in different candidate lists;
and selecting a sequence which minimizes the sum of the total cost functions from each candidate list as the characteristic sequence.
In an embodiment, when the processor 502 implements the step of querying a prediction result in the preset graph database according to the feature sequence, the following steps are specifically implemented:
querying the graph database for related entities according to the entities and relations of the feature sequence;
querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
judging whether a related relation is queried;
if a related relation is queried, querying the graph database for related entities according to the queried relation and the entities of the feature sequence, and returning to the step of querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
and if no related relation is queried, outputting the query result.
It should be understood that, in the embodiment of the present Application, the Processor 502 may be a Central Processing Unit (CPU), and the Processor 502 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program, when executed by a processor, causes the processor to perform the steps of:
if a question to be tested sent by a terminal is received, identifying an entity and a relation of the question to be tested;
acquiring an entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and inquiring a prediction result in a preset graph database according to the characteristic sequence, and sending the inquired prediction result to the terminal.
In an embodiment, when the processor executes the computer program to implement the step of identifying the entity and the relationship of the question to be tested, the following steps are specifically implemented:
inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field to extract a feature vector of the question to be tested;
and respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field to obtain a mark corresponding to each word of the question to be tested, and extracting an entity and a relation from the mark corresponding to each word of the question to be tested.
In an embodiment, when the processor executes the computer program to implement the step of obtaining the candidate list of entities and relationships of the question to be tested from the preset knowledge graph, the following steps are specifically implemented:
acquiring an entity of the question to be tested and a candidate list of the relation from the knowledge graph through a preset index server;
and respectively obtaining the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
In an embodiment, when the processor executes the computer program to implement the step of selecting an entity or a relationship from the candidate list of entities and relationships of the question to be tested to form a feature sequence, the following steps are specifically implemented:
calculating the distance between any two nodes in different candidate lists;
and selecting a sequence which minimizes the sum of the total cost functions from each candidate list as the characteristic sequence.
In an embodiment, when the processor executes the computer program to implement the step of querying a prediction result in a preset graph database according to the feature sequence, the following steps are specifically implemented:
querying the graph database for related entities according to the entities and relations of the feature sequence;
querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
judging whether a related relation is queried;
if a related relation is queried, querying the graph database for related entities according to the queried relation and the entities of the feature sequence, and returning to the step of querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
and if no related relation is queried, outputting the query result.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing a computer program.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the above-described embodiments, it will be understood that the invention is not limited thereto but may be embodied with various modifications and changes.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A question-answering method based on a knowledge graph is characterized by comprising the following steps:
if a question to be tested sent by a terminal is received, identifying an entity and a relation of the question to be tested;
acquiring an entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and inquiring a prediction result in a preset graph database according to the characteristic sequence, and sending the inquired prediction result to the terminal.
2. The knowledge-graph-based question answering method according to claim 1, wherein the identifying of the entities and relationships of the question to be tested comprises:
inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field to extract a feature vector of the question to be tested;
and respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field to obtain a mark corresponding to each word of the question to be tested, and extracting an entity and a relation from the mark corresponding to each word of the question to be tested.
3. The knowledge-graph-based question answering method according to claim 1, wherein the obtaining of the candidate list of entities and relationships of the question to be tested from a preset knowledge graph comprises:
acquiring an entity of the question to be tested and a candidate list of the relation from the knowledge graph through a preset index server;
and respectively obtaining the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
4. The knowledge-graph-based question-answering method according to claim 1, wherein the selecting an entity or a relationship from the candidate list of entities and relationships of the question to be tested to form a feature sequence comprises:
calculating the distance between any two nodes in different candidate lists;
and selecting a sequence which minimizes the sum of the total cost functions from each candidate list as the characteristic sequence.
5. The knowledge-graph-based question-answering method according to claim 1, wherein the querying of a prediction result in a preset graph database according to the feature sequence comprises:
querying the graph database for related entities according to the entities and relations of the feature sequence;
querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
judging whether a related relation is queried;
if a related relation is queried, querying the graph database for related entities according to the queried relation and the entities of the feature sequence, and returning to the step of querying the graph database for related relations according to the queried entities and the entities of the feature sequence;
and if no related relation is queried, outputting the query result.
6. A knowledge-graph-based question answering device, comprising:
the identification unit is used for identifying the entity and the relation of the question to be tested if the question to be tested sent by the terminal is received;
the acquiring unit is used for acquiring the entity of the question to be tested and a candidate list of the relation from a preset knowledge graph;
the first selecting unit is used for selecting an entity or a relation from the candidate list of the entities and the relations of the question to be tested to form a characteristic sequence, wherein the sum of cost functions between the entities and the relations of the characteristic sequence is minimum;
and the first query unit is used for querying the prediction result in a preset graph database according to the characteristic sequence and sending the queried prediction result to the terminal.
7. The knowledge-graph-based question answering device according to claim 6, wherein the identifying unit comprises:
the first input unit is used for inputting the question to be tested into the BERT layer of a preset neural network model based on BERT and a conditional random field so as to extract a feature vector of the question to be tested;
and the second input unit is used for respectively inputting the question to be tested into the fully-connected layer and the conditional random field layer of the neural network model based on BERT and the conditional random field so as to obtain the mark corresponding to each word of the question to be tested, and extracting the entity and the relationship from the mark corresponding to each word of the question to be tested.
8. The knowledge-graph-based question answering device according to claim 6, wherein the first obtaining unit includes:
the second acquisition unit is used for acquiring the entity of the question to be tested and a candidate list of the relation from the knowledge graph through a preset index server;
and the sequencing unit is used for respectively acquiring the similarity between the entities or the relations in each candidate list and the question to be tested, and sequencing the entities or the relations in each candidate list according to the sequence of the similarity from high to low.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1-5 when executing the computer program.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
CN201911227905.1A 2019-12-04 2019-12-04 Knowledge graph-based question and answer method and device, computer equipment and storage medium Pending CN111008272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911227905.1A CN111008272A (en) 2019-12-04 2019-12-04 Knowledge graph-based question and answer method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911227905.1A CN111008272A (en) 2019-12-04 2019-12-04 Knowledge graph-based question and answer method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111008272A true CN111008272A (en) 2020-04-14

Family

ID=70113870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911227905.1A Pending CN111008272A (en) 2019-12-04 2019-12-04 Knowledge graph-based question and answer method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111008272A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798847A (en) * 2020-06-22 2020-10-20 广州小鹏车联网科技有限公司 Voice interaction method, server and computer-readable storage medium
CN111831794A (en) * 2020-07-10 2020-10-27 杭州叙简科技股份有限公司 Knowledge map-based construction method for knowledge question-answering system in comprehensive pipe gallery industry
CN111931507A (en) * 2020-08-21 2020-11-13 贝壳技术有限公司 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session
CN112231454A (en) * 2020-10-14 2021-01-15 中国平安人寿保险股份有限公司 Question prediction and answer feedback method, device, computer equipment and storage medium
CN112417174A (en) * 2020-12-01 2021-02-26 广州橙行智动汽车科技有限公司 Data processing method and device
CN112579752A (en) * 2020-12-10 2021-03-30 上海明略人工智能(集团)有限公司 Entity relationship extraction method and device, storage medium and electronic equipment
CN112818675A (en) * 2021-02-01 2021-05-18 北京金山数字娱乐科技有限公司 Knowledge base question-answer-based entity extraction method and device
CN113392197A (en) * 2021-06-15 2021-09-14 吉林大学 Question-answer reasoning method and device, storage medium and electronic equipment
CN117235211A (en) * 2023-11-15 2023-12-15 暗物智能科技(广州)有限公司 Knowledge question-answering method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408627A (en) * 2018-11-15 2019-03-01 众安信息技术服务有限公司 A kind of answering method and system merging convolutional neural networks and Recognition with Recurrent Neural Network
CN110442689A (en) * 2019-06-25 2019-11-12 平安科技(深圳)有限公司 A kind of question and answer relationship sort method, device, computer equipment and storage medium
CN110532368A (en) * 2019-09-04 2019-12-03 深圳前海达闼云端智能科技有限公司 Question answering method, electronic equipment and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408627A (en) * 2018-11-15 2019-03-01 众安信息技术服务有限公司 A kind of answering method and system merging convolutional neural networks and Recognition with Recurrent Neural Network
CN110442689A (en) * 2019-06-25 2019-11-12 平安科技(深圳)有限公司 A kind of question and answer relationship sort method, device, computer equipment and storage medium
CN110532368A (en) * 2019-09-04 2019-12-03 深圳前海达闼云端智能科技有限公司 Question answering method, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田萍芳 (Tian Pingfang): "面向云出版的语义关键技术" [Key Semantic Technologies for Cloud-Oriented Publishing], 30 April 2015, 武汉大学出版社 (Wuhan University Press), page 15 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798847A (en) * 2020-06-22 2020-10-20 广州小鹏车联网科技有限公司 Voice interaction method, server and computer-readable storage medium
CN111831794A (en) * 2020-07-10 2020-10-27 杭州叙简科技股份有限公司 Knowledge map-based construction method for knowledge question-answering system in comprehensive pipe gallery industry
CN111931507A (en) * 2020-08-21 2020-11-13 贝壳技术有限公司 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session
CN112231454A (en) * 2020-10-14 2021-01-15 中国平安人寿保险股份有限公司 Question prediction and answer feedback method, device, computer equipment and storage medium
CN112231454B (en) * 2020-10-14 2024-02-09 中国平安人寿保险股份有限公司 Question prediction and answer feedback method, device, computer equipment and storage medium
CN112417174A (en) * 2020-12-01 2021-02-26 广州橙行智动汽车科技有限公司 Data processing method and device
CN112579752A (en) * 2020-12-10 2021-03-30 上海明略人工智能(集团)有限公司 Entity relationship extraction method and device, storage medium and electronic equipment
CN112818675A (en) * 2021-02-01 2021-05-18 北京金山数字娱乐科技有限公司 Knowledge base question-answer-based entity extraction method and device
CN113392197A (en) * 2021-06-15 2021-09-14 吉林大学 Question-answer reasoning method and device, storage medium and electronic equipment
CN113392197B (en) * 2021-06-15 2023-08-04 吉林大学 Question-answering reasoning method and device, storage medium and electronic equipment
CN117235211A (en) * 2023-11-15 2023-12-15 暗物智能科技(广州)有限公司 Knowledge question-answering method and system
CN117235211B (en) * 2023-11-15 2024-03-19 暗物智能科技(广州)有限公司 Knowledge question-answering method and system

Similar Documents

Publication Publication Date Title
CN111008272A (en) Knowledge graph-based question and answer method and device, computer equipment and storage medium
CN107609101B (en) Intelligent interaction method, equipment and storage medium
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN107797984B (en) Intelligent interaction method, equipment and storage medium
CN106874441B (en) Intelligent question-answering method and device
WO2018157805A1 (en) Automatic questioning and answering processing method and automatic questioning and answering system
CN111949787A (en) Automatic question-answering method, device, equipment and storage medium based on knowledge graph
US20140351228A1 (en) Dialog system, redundant message removal method and redundant message removal program
CN110413760B (en) Man-machine conversation method, device, storage medium and computer program product
CN109858528B (en) Recommendation system training method and device, computer equipment and storage medium
CN110928992B (en) Text searching method, device, server and storage medium
CN110795542A (en) Dialogue method and related device and equipment
CN109522397B (en) Information processing method and device
WO2018176913A1 (en) Search method and apparatus, and non-temporary computer-readable storage medium
CN112100378A (en) Text classification model training method and device, computer equipment and storage medium
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN111460117A (en) Dialog robot intention corpus generation method, device, medium and electronic equipment
CN109635004A (en) A kind of object factory providing method, device and the equipment of database
CN111125379B (en) Knowledge base expansion method and device, electronic equipment and storage medium
CN117076636A (en) Information query method, system and equipment for intelligent customer service
CN115455142A (en) Text retrieval method, computer device and storage medium
CN111221880B (en) Feature combination method, device, medium, and electronic apparatus
CN114676237A (en) Sentence similarity determining method and device, computer equipment and storage medium
WO2021098876A1 (en) Question and answer method and apparatus based on knowledge graph
CN115221316A (en) Knowledge base processing method, model training method, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination