WO2021086221A1

WO2021086221A1 - Method for maintaining dialogue consistency using a knowledge graph and checking statements for inconsistencies

Info

Publication number: WO2021086221A1
Application number: PCT/RU2019/000779
Authority: WO
Inventors: Владимир Александрович СУВОРОВ; Михаил Сергеевич БУРЦЕВ
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2021-05-06

Abstract

The invention relates to a method for checking statements for inconsistencies with the aid of a knowledge graph. The present method includes steps in which a user's statement is received and transferred to a response-generating module. The response-generating module formulates a candidate response, and the candidate response is checked for inconsistencies. An object in a statement is extracted from the candidate response. In an object-subject-predicate extraction module, a search is carried out for a subject and predicate in a phrase, and a search is carried out for subject and predicate in the knowledge graph. In a graph-checking module, an inconsistency is identified if the two subjects or two predicates differ in the phrase and in the knowledge base. A new entry is made in the knowledge graph if the subject or predicate in the phrase is included in or does not contradict the concept of the subject or predicate in the base, and the response is issued to the user. This ensures a statement is checked for inconsistencies.

Description

METHOD FOR MAINTAINING DIALOGUE CONSISTENCY BASED ON A KNOWLEDGE GRAPH AND CHECKING STATEMENTS FOR CONTRADICTIONS

FIELD OF TECHNOLOGY

The present technical solution relates to the field of computing, in particular, to a method for maintaining the consistency of a dialogue based on the knowledge graph and checking statements for contradictions.

BACKGROUND ART

Chatbots that can hold a dialogue with a user are known from the prior art, for example Mitsuku from Pandorabots, Inc. and the voice assistant Alice from Yandex.

However, a disadvantage of these bots is that they can contradict their own statements in the course of a dialogue.

Also known is the information source CN 107562863 A, which discloses the automatic generation of a chatbot response: keywords are extracted from the statement entered by the user; a search is carried out in the knowledge base, using the keywords as indexes, for a matching question-answer pair; if a matching question-answer pair is found, the answer sentence of that pair is taken as the response to the input; if no matching pair is found, the user's statement is passed to a response generation module.

The disadvantage of this solution is that the bot's own statements are not checked for contradictions. To answer similar questions consistently, it relies on a question similarity metric, which does not fully ensure consistency, since differently worded versions of the same question can be mapped to different answers by this metric.

The closest analogue is the source of information US 2019/0138595 A1, which discloses a method of argumentation in a text, where text containing fragments is accessed on a computing device. The application creates a discourse tree from text. The discourse tree includes nodes, each nonterminal node representing a rhetorical relationship between two fragments, and each leaf node of the nodes of the discourse tree is associated with one of the fragments. The application matches each fragment that has a verb with the signature of the verb, thereby creating a tree of communicative discourse. The application determines whether the tree of communicative discourse represents text that includes affective argumentation, applying a classification model trained to detect affective argumentation in a tree of communicative discourse.

The disadvantage of this solution is that its tree is built to detect argumentation and is tied to action verbs (in its nodes); it is not used for the direct task of detecting contradictions in a bot's response.

SUMMARY OF THE INVENTION

The technical problem to be solved by the claimed technical solution is the creation of a chatbot architecture that allows it to have memory in the form of a knowledge graph and to check statements for contradictions. A special feature is that the bot's knowledge base, collected in the form of dialogues, can be checked for contradictions in the bot's personality and contradictory statements can be removed (all but one). Thus, only clearly consistent statements remain in the database. The present invention is aimed at providing a computer-implemented method for maintaining the consistency of a dialogue based on the knowledge graph and checking statements for contradictions, which is characterized in the independent claim. Additional embodiments of the present invention are presented in the dependent claims.

The technical result consists in the possibility of checking the statement for contradictions.

In a preferred embodiment, a computer-implemented method is claimed for maintaining the consistency of a dialogue based on a knowledge graph and checking statements for contradictions, including the stages at which, by means of a processor: the user's statement is received and transmitted to the response generation module; the response generation module generates a candidate response; the candidate response is checked for contradictions, where: the object of the statement is extracted from the candidate response; in the object-subject-predicate extraction module, the subject and predicate are searched for in the phrase; the subject and predicate are searched for in the knowledge graph; in the module for checking against the graph, a contradiction is found if the two subjects or the two predicates in the phrase and in the base differ; a new record is made in the knowledge graph if the subject or predicate in the phrase is included in, or does not contradict, the concept of the subject or predicate in the base; and the response is given to the user.

In a particular version, the response generation module is based on ranking or generative or Open-Domain Question Answering (ODQA) models.

In another particular version, the check against the knowledge graph is implemented using MQL / SPARQL queries, by finding or not finding an answer.

DESCRIPTION OF DRAWINGS

The implementation of the invention will be described in the following in accordance with the accompanying drawings, which are presented to explain the essence of the invention and in no way limit the scope of the invention. The following drawings are attached to the application:

FIG. 1 illustrates a flow diagram of a method;

FIG. 2 illustrates a block diagram of a computing device.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description of an implementation of the invention, numerous implementation details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to those skilled in the art how the present invention can be used, with or without these implementation details. In other instances, well-known techniques, procedures, and components have not been described in detail so as not to obscure the details of the present invention.

In addition, from the above presentation it will be clear that the invention is not limited to the above implementation. Numerous possible modifications, changes, variations and substitutions, while retaining the spirit and form of the present invention, will be apparent to those skilled in the art.

A knowledge graph is a way of organizing information in a database. The knowledge graph is built from a base graph (for example, Wikidata) plus a set of reference texts and dialogues (for example, a description of the bot's personality and its attitude to objects).

In the claimed method, a statement is considered contradictory when, for the same object-subject pair, there is a different predicate, or, for the same object-predicate pair, there is a different subject. The candidate answer is checked against each of these conditions, and if it fails at least one of them, the answer is rejected.
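The two conditions can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Triple` type, the `find_contradiction` function and the sample data are hypothetical names introduced for illustration, and the exceptions described further on are deliberately left out.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """Hypothetical record for one statement stored in the knowledge graph."""
    obj: str        # what the statement is about, e.g. "apples"
    predicate: str  # the relation, e.g. "has color"
    subject: str    # the value of the relation, e.g. "red"


def find_contradiction(candidate: Triple, graph: List[Triple]) -> Optional[Triple]:
    """Return the first stored triple that contradicts the candidate, or None."""
    for stored in graph:
        if stored.obj != candidate.obj:
            continue
        # Condition 1: same object-subject pair, different predicate.
        if stored.subject == candidate.subject and stored.predicate != candidate.predicate:
            return stored
        # Condition 2: same object-predicate pair, different subject.
        if stored.predicate == candidate.predicate and stored.subject != candidate.subject:
            return stored
    return None


graph = [Triple("apples", "has color", "green")]
print(find_contradiction(Triple("apples", "has color", "red"), graph))
print(find_contradiction(Triple("apples", "has color", "green"), graph))
```

The first call returns the stored "green" triple (condition 2), the second returns `None` because the candidate repeats what is already stored.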

For each condition, exceptions are specified by a list of subject and predicate relations and a list of subject and predicate exclusions. The list of relations is also supplemented with subjects or predicates that clearly belong to the same class and contradict each other, for example, "I love" and "I hate". These lists reside in the module for checking against the knowledge graph. The list of subject or predicate relations can be implemented as an ontology, in which case lying on the same branch of the tree means that there is no contradiction. For example, the subject "apples" is part of "fruit", so there is no contradiction. The conditions for this list are:

1) the subject or predicate is included in the concept;

2) the subject or predicate contradicts the concept;

3) the subject or predicate does not contradict the concept.

The exclusion list of subjects and predicates is a list of consistent predicates or subjects, that is, unrelated predicates or subjects that do not contradict each other. For example, "color" and "size" are unrelated, so "I like big apples" and "I like yellow apples" do not contradict each other.
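One possible way to encode the relation list, the contradicting pairs and the exclusion list is sketched below in Python. The dictionaries `PARENT`, `OPPOSITES` and `UNRELATED`, their contents, and the `"unknown"` outcome for pairs that appear in no list are all assumptions made for illustration; the description does not prescribe a data structure.

```python
# Hypothetical ontology: child concept -> parent concept.
PARENT = {"apples": "fruit", "pears": "fruit", "fruit": "food"}
# Hypothetical pairs from the same class that contradict each other.
OPPOSITES = {frozenset({"love", "hate"})}
# Hypothetical exclusion list: unrelated terms that never contradict each other.
UNRELATED = {frozenset({"big", "yellow"})}


def ancestors(term):
    """Return every concept above `term` on its branch of the ontology."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def relation(term, concept):
    """Classify `term` against `concept` according to the three conditions."""
    # Same branch of the tree: one term is the other or lies above it.
    if term == concept or concept in ancestors(term) or term in ancestors(concept):
        return "included"
    pair = frozenset({term, concept})
    if pair in OPPOSITES:
        return "contradicts"
    if pair in UNRELATED:
        return "does not contradict"
    return "unknown"  # assumption: the caller decides how to treat unlisted pairs


print(relation("apples", "fruit"))
print(relation("love", "hate"))
print(relation("big", "yellow"))
```

Using `frozenset` pairs keeps the lookup symmetric, so the order in which two terms are compared does not matter.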

FIG. 1 characterizes the way of maintaining a consistent dialogue based on the knowledge graph and checking statements for contradictions.

The claimed method is performed on a computing device.

The user's phrase is delivered to the computing device through any available graphical or voice interface. The phrase then enters the response generation system, and a candidate answer is formed. The candidate answer is checked for inconsistencies against the knowledge graph. The key module for implementing the method is the "relationship check module". This module can be broken down into the following modules:

1) Entity extraction module. This module searches for entities relative to the knowledge graph. If the entity extraction module is unable to extract any entity, the check is considered successful.

2) Object-subject-predicate extraction module. This module searches for predicates, objects and subjects, relative to the knowledge graph, within the entity found by the entity extraction module. The module is based on entity extraction methods, for example, catalog lookup or various Named-Entity Recognition (NER) models (deeppavlov.ai). Catalog-based entity extraction (also called a gazetteer) parses the text and compares each text segment with the entities in the catalog. If a text segment and a catalog entry match, the module marks the segment as an entity. When NER methods are used, the module relies on neural network architectures to identify entities and their boundaries in the text (for example, http://docs.deeppavlov.ai/en/master/features/models/ner.html).

3) Module for checking against the knowledge graph. In this module, the list of available object and predicate relationships is matched, and the statement is checked for discrepancies with respect to the generated scheme.
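The catalog-based (gazetteer) variant of entity extraction can be sketched as follows. This is only an illustration under stated assumptions: the `GAZETTEER` contents, the `ent:` identifiers and the longest-match strategy are hypothetical and are not taken from the description or from any particular library.

```python
import re

# Hypothetical catalog: surface form -> entity identifier.
GAZETTEER = {
    "apples": "ent:apple",
    "red apples": "ent:red_apple",
    "pears": "ent:pear",
}
MAX_WORDS = max(len(key.split()) for key in GAZETTEER)


def extract_entities(text):
    """Return (segment, entity_id) pairs, preferring the longest catalog match."""
    words = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest segment starting at word i first, then shorter ones.
        for size in range(min(MAX_WORDS, len(words) - i), 0, -1):
            segment = " ".join(words[i:i + size])
            if segment in GAZETTEER:
                found.append((segment, GAZETTEER[segment]))
                i += size
                break
        else:
            i += 1  # no catalog entry starts at this word
    return found


print(extract_entities("I love red apples and pears"))
```

An empty result corresponds to the case in which no entity could be extracted, so the check is treated as successful.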

If there are no contradictions, the statement is recorded in the database as a new branch of the graph and the answer is given; if there is a contradiction, the next relevant answer is checked.

The data in the database can be generated relative to previous responses and according to the description of the bot's personality or in another way.

The bot's personality is a textual description of the bot's identity. It is presented as a list of natural-language sentences listing facts that characterize the individual traits of the bot (what it likes, what it dislikes, what interests it, and other affiliations).

The response generation system is a ranking model (for example, based on the similarity of embeddings) or a generative model, or an ODQA (phrase extraction from text) model. Using one of these models, a candidate response is generated for the entered phrase.
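A ranking model based on embedding similarity can be sketched as follows. The bag-of-words `embed` function is a toy stand-in for a learned embedding model and is purely an assumption, as are the function names; only the idea of ordering candidate responses by similarity to the entered phrase comes from the description.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(user_phrase, candidates):
    """Return candidates sorted from most to least similar to the user's phrase."""
    query = embed(user_phrase)
    return sorted(candidates, key=lambda c: cosine(query, embed(c)), reverse=True)


ranked = rank_candidates("Do you like apples?", ["I enjoy jazz.", "I love red apples."])
print(ranked)
```

The ranked list gives the order in which candidate responses would be passed to the contradiction check.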

The candidate answer is checked for inconsistencies in the knowledge graph.

First, a classifier of important entities is used to find the entity to which the statement relates.

The classifier of important entities is, for example, a CNN classification model pre-trained on a known dataset of important entities (descriptions of a bot's personality, attitudes towards politics, attitudes towards music, ethical values). It either assigns the statement to a certain class of important entities or reports that there are no such entities.

After the entities are extracted, the object-subject-predicate extraction module searches for the predicate in the candidate response and transforms it into a time-sensitive form. The predicate is then searched for in the database.

Next, the subject and the object are searched for, likewise in the candidate answer and in the database.

After the entities, object, subject and predicate have been extracted, the relationships are compared and checked against the knowledge graph. In the knowledge graph, the types of possible relationships are fixed in advance, for example, "year of birth", "work", "love", etc. No search is performed for relations of any other type. The graph is updated only by retrieving a pair of objects for the specified type of relationship. Search queries can be generated in the MQL / SPARQL language. The check against the knowledge graph is implemented using such queries, by finding or not finding an answer. If a contradiction is found, the candidate answer is rejected, and the system proceeds to analyze the next candidate answer until it finds a suitable one or exhausts all the answers.
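A check by "finding or not finding an answer" can be sketched as a query builder in Python, assuming that the query language meant here is SPARQL and that the graph uses simple prefixed names. The `ex:` prefix, the `ALLOWED_RELATIONS` set and the function name are hypothetical, and the sketch only builds the query text; it does not execute it against a store.

```python
# Hypothetical set of relationship types fixed in advance.
ALLOWED_RELATIONS = {"yearOfBirth", "work", "love", "hasColor"}


def build_contradiction_query(obj, predicate, subject, prefix="http://example.org/"):
    """Build an ASK query that is true when the graph already stores a
    different subject for the same object-predicate pair.

    Returns None for relation types outside the fixed list, since no
    search is performed for them.
    """
    if predicate not in ALLOWED_RELATIONS:
        return None
    return (
        f"PREFIX ex: <{prefix}>\n"
        "ASK {\n"
        f"  ex:{obj} ex:{predicate} ?stored .\n"
        f"  FILTER (?stored != ex:{subject})\n"
        "}"
    )


print(build_contradiction_query("apples", "hasColor", "red"))
```

An ASK query returns only true or false, which matches a check that depends on whether an answer is found; the arguments are assumed to be valid local names, so no escaping is attempted here.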

If no contradictions are found, then a new statement, expressed by the candidate answer, is written into the knowledge graph, after which the answer is given to the user.

Let's look at an example.

The user enters the phrase "Do you like apples?" The phrase is passed to the server, where the database contains two statements:

I (love, present tense) fruit.

I (love, present tense) apples (color) green.

The response generation system generates a candidate response: "I love red apples."

Next, the classifier of important entities identifies the important entity, which is "I".

The predicate is searched for in the candidate answer, where it is "love", and in the database, where "love" is also found. In addition, the subject and object are searched for in the phrase and in the database; one object, "apples", is found both in the candidate response and in the database.

Further processing is carried out with respect to the object "apples". The search for important entities identifies "apples" as an important entity.

Predicates are then searched for in the candidate answer and in the database; the predicate is "has a color". The subject and object are searched for in the candidate answer and in the database. The subject "red" of the predicate "has a color" is found in the candidate answer, and the subject "green" of the predicate "has a color" is found in the database.

Since the subject "red" of the candidate answer and the subject "green" of the statement in the database are different subjects, the candidate answer "I love red apples" is rejected and the system starts analyzing the next candidate answer. The analysis continues until a suitable candidate answer is found or until all possible candidate answers have been exhausted. This example illustrates the condition in which the same object-predicate pair has a different subject.

The system then generates the next candidate answer: "I love red pears."

The classifier of important entities identifies the important entity (object), which is "I".

The predicate is searched for in the candidate answer, where it is "love", and in the database, where "love" is also found. In addition, the subject and the object are searched for in the candidate answer and in the database: the subject in the candidate answer is "pears", and the subject found in the database is "fruit".

Since the subject "pears" is a subset of the subject "fruit", there is no contradiction, and the search ends.

The candidate answer "I love red pears" is written to the database as a new branch of the graph.
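The example above can be reproduced with a short Python sketch. It is a simplified illustration rather than the claimed implementation: the `Statement` class, the `PARENT` ontology, the representation of "has a color" as an attribute, and the rule applied when neither an exact match nor a broader concept is stored are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: child concept -> parent concept.
PARENT = {"apples": "fruit", "pears": "fruit"}


@dataclass
class Statement:
    obj: str        # important entity, e.g. "I"
    predicate: str  # e.g. "love"
    subject: str    # e.g. "apples"
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "green"}


def is_included(term, concept):
    """True if `concept` lies above `term` in the ontology."""
    while term in PARENT:
        term = PARENT[term]
        if term == concept:
            return True
    return False


def contradicts(candidate, base):
    """Check one candidate statement against the stored statements."""
    related = [s for s in base
               if s.obj == candidate.obj and s.predicate == candidate.predicate]
    exact = [s for s in related if s.subject == candidate.subject]
    for stored in exact:
        # Same nested predicate (e.g. "color") with a different value.
        for key, value in candidate.attributes.items():
            if key in stored.attributes and stored.attributes[key] != value:
                return True
    if exact:
        return False
    # No exact match: a stored broader concept ("fruit") covers the candidate.
    if any(is_included(candidate.subject, s.subject) for s in related):
        return False
    # Assumption: any other differing subject counts as a contradiction.
    return bool(related)


def answer(candidates, base):
    """Return the first consistent candidate and record it; None if all fail."""
    for text, statement in candidates:
        if not contradicts(statement, base):
            base.append(statement)  # new branch of the graph
            return text
    return None


base = [
    Statement("I", "love", "fruit"),
    Statement("I", "love", "apples", {"color": "green"}),
]
candidates = [
    ("I love red apples.", Statement("I", "love", "apples", {"color": "red"})),
    ("I love red pears.", Statement("I", "love", "pears", {"color": "red"})),
]
result = answer(candidates, base)
print(result)  # prints "I love red pears."
```

The first candidate is rejected because the stored color of "apples" is "green"; the second is accepted because "pears" falls under the stored concept "fruit", and it is then appended to the base.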

FIG. 2 presents a general diagram of a computing device (200) that provides the data processing necessary to implement the claimed solution.

In general, the device (200) contains such components as: one or more processors (201), at least one memory (202), a data storage means (203), input/output interfaces (204), I/O means (205), and networking means (206).

The processor (201) of the device performs the basic computational operations necessary for the operation of the device (200) or the functionality of one or more of its components. The processor (201) executes the necessary computer readable instructions contained in the main memory (202).

Memory (202), as a rule, is made in the form of RAM and contains the necessary program logic that provides the required functionality.

The data storage means (203) can be implemented in the form of HDDs, SSDs, a RAID array, network storage, flash memory, optical storage devices (CD, DVD, MD, Blu-ray discs), etc. The means (203) provides long-term storage of various types of information, for example, the aforementioned files with user data sets, a database containing records of time intervals measured for each user, user identifiers, etc.

Interfaces (204) are standard means for connecting and working with the server side, for example, USB, RS232, RJ45, LPT, COM, HDMI, PS/2, Lightning, FireWire, etc.

The choice of interfaces (204) depends on the specific implementation of the device (200), which can be a personal computer, mainframe, server cluster, thin client, smartphone, laptop, etc.

As the I/O means (205), a keyboard should be used in any embodiment of a system that implements the described method. The hardware design of the keyboard can be any known one: it can be either a built-in keyboard, as used on a laptop or netbook, or a stand-alone device connected to a desktop computer, server or other computer device. The connection can be either wired, in which case the keyboard cable is connected to the PS/2 or USB port located on the system unit of the desktop computer, or wireless, in which case the keyboard exchanges data over a wireless communication channel, for example, a radio channel, with a base station, which, in turn, is directly connected to the system unit, for example, to one of the USB ports. In addition to the keyboard, the I/O means can also include: a joystick, display (touch screen), projector, touchpad, mouse, trackball, light pen, speakers, microphone, etc.

Networking means (206) are selected from devices that provide network reception and transmission of data, for example, an Ethernet card, WLAN/Wi-Fi module, Bluetooth module, BLE module, NFC module, IrDA, RFID module, GSM modem, etc. The means (206) organize data exchange via a wired or wireless data transmission channel, for example, WAN, PAN, LAN, Intranet, Internet, WLAN, WMAN or GSM.

The components of the device (200) are interfaced through a common data bus (210).

In the present application materials, the preferred disclosure of the implementation of the claimed technical solution has been presented, which should not be used as limiting other, particular embodiments of its implementation, which do not go beyond the scope of the claimed scope of legal protection and are obvious to specialists in the relevant field of technology.

Claims


1. A computer-implemented method for checking statements for contradictions using a knowledge graph, which includes the stages at which, by means of a processor: the user's statement is received and transmitted to the response generation module; the response generation module generates a candidate response; the candidate response is checked for contradictions, where: the object of the statement is extracted from the candidate response; in the object-subject-predicate extraction module, the subject and predicate are searched for in the phrase; the subject and predicate are searched for in the knowledge graph; in the module for checking against the graph, a contradiction is found if the two subjects or the two predicates in the phrase and in the base differ; a new record is made in the knowledge graph if the subject or predicate in the phrase is included in, or does not contradict, the concept of the subject or predicate in the base; and the response is given to the user.

2. The method according to claim 1, characterized in that the response generation module is based on ranking or generative or ODQA models.

3. The method according to claim 1, characterized in that the check against the knowledge graph is implemented using MQL / SPARQL queries, by finding or not finding an answer.
