CN111400480B

CN111400480B - User intention recognition method and device for multi-round dialogue

Info

Publication number: CN111400480B
Application number: CN202010316438.6A
Authority: CN
Inventors: 张�杰; 鄢杭; 蒋亚凡; 王雅芳
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2023-05-12
Anticipated expiration: 2040-04-21
Also published as: CN111400480A

Abstract

The embodiments of this specification provide a method and a device for recognizing user intention in a multi-round dialogue. Recognition is based on a pre-established knowledge graph that associates knowledge point elements with standard questions. The method comprises: acquiring the user text of at least one round of the current multi-round dialogue; encoding the user text of the at least one round to obtain a context embedding vector; in the knowledge graph, iteratively searching for next-hop nodes starting from the root node according to the context embedding vector; selecting a target node after a predetermined number of iterations; and determining the element or standard question corresponding to the target node as the user intention recognition result. A stable recognition effect can thereby be ensured.

Description

User intention recognition method and device for multi-round dialogue

Technical Field

One or more embodiments of this specification relate to the field of computers, and in particular to a method and apparatus for user intention recognition in multi-round dialogues.

Background

Currently, in intelligent customer service, a machine conducts a dialogue with a user to answer the user's question. Because users tend to express themselves colloquially, a single round of dialogue often cannot pin down the user's request. Multiple rounds of dialogue between the machine and the user are therefore needed, and user intention recognition is performed over those rounds so that the user's request is eventually made clear.

During a multi-round dialogue between a user and a machine, when the user's description is incomplete, the standard question matching the user's intention cannot be identified directly; only the knowledge point elements matching that intention can be identified. The machine then has to ask guiding follow-up questions based on the identified knowledge point elements so that the user supplies the missing information, and the standard question matching the user's intention is finally identified. When the user's description is complete, the standard question matching the user's intention can generally be identified directly.

In prior-art user intention recognition methods for multi-round dialogues, the recognition result may be of either of two types, a standard question or a knowledge point element, and a stable recognition effect therefore cannot be ensured.

Disclosure of Invention

One or more embodiments of the present specification describe a method and apparatus for recognizing user intention for a multi-round dialogue, which can ensure a stable recognition effect.

In a first aspect, a method for recognizing user intention in a multi-round dialogue is provided. The method performs user intention recognition based on a pre-established knowledge graph. The knowledge graph includes a root node and multiple types of branch nodes, including element nodes and question nodes: the element nodes correspond to knowledge point elements in the knowledge domain to which the multi-round dialogue belongs, and the question nodes correspond to standard questions. Nodes having an association relationship are connected by directed connection edges of the corresponding type, and each node has a connection edge to itself (a self-loop). The method includes:

Acquiring user text of at least one round of current multi-round dialogue;

encoding the user text of at least one round to obtain a context embedding vector;

in the knowledge graph, determining an initial state according to the root node and the context embedding vector, searching, according to the initial state, for a first number of next-hop nodes among the associated nodes reached by the outgoing edges of the root node, and taking each of the first number of next-hop nodes as a current node;

for each current node, performing a predetermined number of iterations, each iteration comprising: determining a current state according to the current node, the root node, and the context embedding vector; searching, according to the current state, for a second number of next-hop nodes among the associated nodes reached by the outgoing edges of the current node, and determining the action probability of each next-hop node; and selecting one next-hop node according to the action probabilities and updating it as the current node;

after the predetermined number of iterations, selecting the current node with the maximum action probability as a target node;

and determining the element or standard question corresponding to the target node as a user intention recognition result.

In one possible implementation, the determining the current state according to the current node, the root node, and the context embedding vector includes:

determining the search path formed by the nodes and connection edges between the root node and the current node;

determining a path embedding vector for the search path according to the node embedding vectors of the nodes and the edge embedding vectors of the connection edges in the search path;

and determining the current state according to the path embedding vector of the search path, the node embedding vector of the current node, and the context embedding vector.

Further, the determining the path embedding vector corresponding to the search path according to the node embedding vector of each node and the edge embedding vector of the connecting edge in the search path includes:

taking each connection edge in the search path together with the node it points to as a path element, and determining the output vectors of the path elements one by one in the order in which they appear in the search path, where the output vector of the current path element is determined from the output vector of the preceding path element and the embedding vector of the current path element; and taking the output vector of the last path element as the path embedding vector of the search path.

In a possible implementation, searching, according to the current state, for a second number of next-hop nodes among the associated nodes reached by the outgoing edges of the current node, and determining the action probability of each next-hop node, includes:

determining a selectable action set from the associated nodes reached by the outgoing edges of the current node; using a reinforcement learning model to output, according to the current state, a second number of target actions from the selectable action set together with the action probability of each target action; and taking each target action as a next-hop node, with the action probability of each target action as the action probability of the corresponding next-hop node.

Further, each node and connection edge between the root node and the current node form a search path, and the method further includes:

after a predetermined number of iterations, training the reinforcement learning model based on rewards corresponding to each search path.

Further, the rewards corresponding to the search paths include:

if the tail node of the search path reaches the target node confirmed by the user, the reward is positive; otherwise, the reward is negative; and/or

if the search path passes through a key node confirmed by the user, the reward is positive.
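As a minimal sketch of the reward rule above (not the patented implementation), the function below scores one search path. The function name `path_reward` and the magnitudes `positive`, `negative`, and `key_bonus` are hypothetical, and combining the two rules additively is only one reading of "and/or".

```python
from typing import Iterable, Sequence


def path_reward(path_nodes: Sequence[str], confirmed_target: str,
                confirmed_key_nodes: Iterable[str] = (),
                positive: float = 1.0, negative: float = -1.0,
                key_bonus: float = 0.5) -> float:
    """Reward for one search path (hypothetical magnitudes).

    Rule 1: `positive` if the tail node is the target the user confirmed, else `negative`.
    Rule 2: plus `key_bonus` for every user-confirmed key node the path passes through.
    """
    reward = positive if path_nodes[-1] == confirmed_target else negative
    passed = sum(1 for node in set(confirmed_key_nodes) if node in path_nodes)
    return reward + key_bonus * passed


# The path reaches the confirmed standard question and passes a confirmed key node.
print(path_reward(["root", "insurance", "q_refund"], "q_refund", ["insurance"]))  # 1.5
```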

In a possible implementation, selecting one next-hop node according to the action probabilities and updating it as the current node includes:

selecting, according to the action probabilities, the next-hop node with the maximum action probability and updating it as the current node; or

according to the action probabilities, in a first proportion of cases selecting the next-hop node with the maximum action probability, and in a second proportion of cases selecting a next-hop node at random, and updating the selected node as the current node.

In a possible implementation, the target node is an element node, and determining the element or standard question corresponding to the target node as the user intention recognition result includes:

determining the element corresponding to the element node as the user intention recognition result;

the method further comprises:

outputting a reply sentence according to a preset reply template corresponding to the element, so as to respond to the user in the current multi-round dialogue.

In a possible implementation, the target node is a question node, and determining the element or standard question corresponding to the target node as the user intention recognition result includes:

determining the standard question corresponding to the question node as the user intention recognition result;

the method further comprises:

outputting a reply sentence according to a preset reply template corresponding to the standard question, so as to respond to the user in the current multi-round dialogue.

Further, after the standard question corresponding to the question node is determined as the user intention recognition result, the method further includes:

outputting the search path formed by the nodes and connection edges between the root node and the question node, so that the user intention recognition result can be explained by the search path.

In one possible implementation, the knowledge point elements include business elements and/or appeal elements.

In a second aspect, an apparatus for recognizing user intention in a multi-round dialogue is provided. The apparatus performs user intention recognition based on a pre-established knowledge graph that includes a root node and multiple types of branch nodes, including element nodes and question nodes: the element nodes correspond to knowledge point elements in the knowledge domain to which the multi-round dialogue belongs, and the question nodes correspond to standard questions. Nodes having an association relationship are connected by directed connection edges of the corresponding type, and each node has a connection edge to itself. The apparatus comprises:

an acquisition unit, configured to acquire the user text of at least one round of the current multi-round dialogue;

an embedding unit, configured to encode the user text of the at least one round acquired by the acquisition unit to obtain a context embedding vector;

a first search unit, configured to determine, in the knowledge graph, an initial state according to the root node and the context embedding vector obtained by the embedding unit, search, according to the initial state, for a first number of next-hop nodes among the associated nodes reached by the outgoing edges of the root node, and take each of the first number of next-hop nodes as a current node;

a second search unit, configured to perform a predetermined number of iterations for each current node obtained by the first search unit, each iteration comprising determining a current state according to the current node, the root node, and the context embedding vector, searching, according to the current state, for a second number of next-hop nodes among the associated nodes reached by the outgoing edges of the current node, determining the action probability of each next-hop node, and selecting one next-hop node according to the action probabilities and updating it as the current node;

a selection unit, configured to select, after the predetermined number of iterations of the second search unit, the current node with the maximum action probability as the target node; and

an identification unit, configured to determine the element or standard question corresponding to the target node selected by the selection unit as the user intention recognition result.

In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

In a fourth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first aspect.

With the method and device provided by the embodiments of this specification, user intention is recognized based on a pre-established knowledge graph that associates knowledge point elements with standard questions. First, the user text of at least one round of the current multi-round dialogue is acquired; the user text is then encoded to obtain a context embedding vector; next, an iterative search for next-hop nodes is started from the root node of the knowledge graph, and a target node is selected after a predetermined number of iterations; finally, the element or standard question corresponding to the target node is determined as the user intention recognition result. Because the embodiments search the knowledge graph, according to the user text of at least one round of the current multi-round dialogue, for the target node that represents the recognized intention, a stable recognition effect can be ensured, and the result is interpretable.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an implementation scenario of an embodiment disclosed herein;

FIG. 2 illustrates a user intent recognition method flowchart for a multi-round dialog, in accordance with one embodiment;

FIG. 3 illustrates a decision diagram of a reinforcement learning model, according to one embodiment;

FIG. 4 illustrates a search path schematic according to one embodiment;

FIG. 5 shows a schematic block diagram of a user intent recognition device for multiple rounds of conversations, according to one embodiment.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The scenario involves user intention recognition for a multi-round dialogue, where the recognition result may be an element or a standard question. The multi-round dialogue may be a dialogue between a user and a machine in intelligent customer service; the machine may also be referred to as an agent. User intention is recognized based on a pre-established knowledge graph. In library and information science, a knowledge graph is also called knowledge domain visualization or a knowledge domain mapping map: a family of graphs that display the development process and structural relationships of knowledge, using visualization techniques to describe knowledge resources and their carriers and to mine, analyze, construct, draw, and display knowledge and the interconnections among its elements.

In this embodiment, the knowledge graph includes a root node and multiple types of branch nodes, including element nodes and question nodes; the element nodes correspond to knowledge point elements in the knowledge domain to which the multi-round dialogue belongs, and the question nodes correspond to standard questions. Nodes having an association relationship are connected by directed connection edges of the corresponding type, and each node has a connection edge to itself; for brevity, these self-loop edges are not shown in Fig. 1.

Referring to Fig. 1, during a multi-round dialogue between a user and the agent, the search starts from a given initial node in the knowledge graph, for example node er, and iteratively searches for next-hop nodes according to the user text: first a node e1 connected to er by edge r1, then a node e2 connected to e1 by edge r2, then a node e3 connected to e2 by edge r3, and finally a node e4 connected to e3 by edge r4, at which point the predetermined number of hops has been reached and the search ends. The traversed nodes and connection edges, starting from the initial node, form a search path. The element or standard question corresponding to the tail node of this path is returned to the agent as the user intention recognition result for the multi-round dialogue, and the agent can then ask the user about this result to obtain the user's confirmation or denial.

In the embodiments of this specification, multiple search paths can be obtained; the most suitable one is selected among them, and the element or standard question corresponding to its tail node is returned to the agent as the user intention recognition result for the multi-round dialogue.

Fig. 2 shows a flowchart of a method for recognizing user intention in a multi-round dialogue according to one embodiment; the method may be based on the implementation scenario shown in Fig. 1. The method performs user intention recognition based on a pre-established knowledge graph that includes a root node and multiple types of branch nodes, including element nodes and question nodes; the element nodes correspond to knowledge point elements in the knowledge domain to which the multi-round dialogue belongs, the question nodes correspond to standard questions, nodes having an association relationship are connected by directed connection edges of the corresponding type, and each node has a connection edge to itself. As shown in Fig. 2, the method includes the following steps:

Step 21: acquiring the user text of at least one round of the current multi-round dialogue;

Step 22: encoding the user text of the at least one round to obtain a context embedding vector;

Step 23: in the knowledge graph, determining an initial state according to the root node and the context embedding vector, searching, according to the initial state, for a first number of next-hop nodes among the associated nodes reached by the outgoing edges of the root node, and taking each of them as a current node;

Step 24: for each current node, performing a predetermined number of iterations, each iteration comprising determining a current state according to the current node, the root node, and the context embedding vector, searching, according to the current state, for a second number of next-hop nodes among the associated nodes reached by the outgoing edges of the current node, determining the action probability of each next-hop node, and selecting one next-hop node according to the action probabilities and updating it as the current node;

Step 25: after the predetermined number of iterations, selecting the current node with the highest action probability as the target node;

Step 26: determining the element or standard question corresponding to the target node as the user intention recognition result.

Specific implementations of these steps are described below.
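Steps 23 to 25 can be sketched roughly as follows. This is an illustration under assumptions, not the method's actual implementation: the graph format (a dict from node to `(edge_label, next_node)` pairs), the `policy` callable standing in for the reinforcement learning model, the scorer `toy_policy`, and the name `recognize_intent` are all hypothetical. The first hops are sampled with replacement, so duplicates can occur, and every later hop uses the greedy selection rule.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical graph format: node -> list of (edge_label, next_node).
# Every node is expected to list a ("self", node) self-loop.
Graph = Dict[str, List[Tuple[str, str]]]
# Hypothetical policy: (path, context, candidate actions) -> one probability per action.
Policy = Callable[[List[str], List[float], List[Tuple[str, str]]], List[float]]


def recognize_intent(graph: Graph, root: str, context: List[float], policy: Policy,
                     first_number: int, second_number: int, num_iterations: int,
                     rng: random.Random) -> Tuple[List[str], float]:
    """Return the best search path [e_r, r_1, e_1, ..., r_t, e_t] and its tail's action probability."""
    # Step 23: initial state = root path + context; sample first_number hops with replacement.
    actions = graph[root]
    probs = policy([root], context, actions)
    picks = rng.choices(range(len(actions)), weights=probs, k=first_number)
    beams = [([root, actions[i][0], actions[i][1]], probs[i]) for i in picks]

    # Step 24: advance every current node for num_iterations hops.
    for _ in range(num_iterations):
        advanced = []
        for path, _ in beams:
            actions = graph[path[-1]]
            probs = policy(path, context, actions)
            # Keep the second_number most probable actions as candidate next hops ...
            ranked = sorted(range(len(actions)), key=lambda i: probs[i], reverse=True)
            candidates = ranked[:second_number]
            # ... and move greedily to the most probable candidate.
            best = candidates[0]
            edge, node = actions[best]
            advanced.append((path + [edge, node], probs[best]))
        beams = advanced

    # Step 25: the current node with the largest action probability is the target node.
    return max(beams, key=lambda beam: beam[1])


graph = {
    "root": [("self", "root"), ("to_business", "insurance"), ("to_appeal", "refund")],
    "insurance": [("self", "insurance"), ("to_question", "q_refund")],
    "refund": [("self", "refund"), ("to_question", "q_refund")],
    "q_refund": [("self", "q_refund")],
}


def toy_policy(path, context, actions):
    # Hypothetical scorer: prefer question nodes, discourage self-loops.
    scores = [2.0 if node.startswith("q_") else 0.5 if edge == "self" else 1.0
              for edge, node in actions]
    total = sum(scores)
    return [s / total for s in scores]


path, prob = recognize_intent(graph, "root", [0.0], toy_policy, 3, 2, 2, random.Random(0))
print(path[-1], prob)  # q_refund 1.0
```

Swapping the greedy `candidates[0]` for another selection rule would give the mixed greedy/random variant described later.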

First, in step 21, the user text of at least one round of the current multi-round dialogue is acquired. The acquired user text may consist only of the current round's user text, or it may also include user text from previous rounds.

For example, if the current multi-round dialog has just proceeded to the first round of dialog, the user text of the first round of dialog, such as user text 1 in fig. 1, may be obtained; if the current multi-round dialogue proceeds to the second round dialogue, the user text of the first round dialogue and the user text of the second round dialogue, such as the user text 1 and the user text 2 in fig. 1, can be acquired; if the current multi-round dialog is going to the third round of dialog, the user text of the first round of dialog, the user text of the second round of dialog, and the user text of the third round of dialog, such as user text 1, user text 2, and user text 3 in fig. 1, may be obtained.

Then, in step 22, the user text of the at least one round is encoded to obtain a context embedding vector. Any existing encoding method may be used for this, and it is not described further here.

In one example, in the i-th round of a multi-round dialogue, the system encodes the current round's user text Ui together with the historical user text U1 to Ui-1 and outputs the context embedding vector ci.

Then, in step 23, in the knowledge graph, an initial state is determined according to the root node and the context embedding vector, a first number of next-hop nodes are searched for, according to the initial state, among the associated nodes reached by the outgoing edges of the root node, and each of the first number of next-hop nodes is taken as a current node. Two nodes joined by a connection edge are regarded as associated nodes of each other; because every node in this embodiment has a connection edge to itself, every node is also its own associated node.

The first number may be preset, for example, 3, 5, or 20.

In one example, the first number of next-hop nodes may contain duplicates. For example, if the outgoing edges of the root node lead to 3 nodes and the first number is 20, each of the 20 next-hop nodes is selected at random from those 3 nodes, so the 20 next-hop nodes necessarily contain duplicates.
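A small sketch of drawing the first number of next-hop nodes with replacement, assuming Python's `random.Random.choices` as the sampler; the helper name `sample_first_hops` and the node labels are illustrative only. With 20 draws from 3 candidates, repeats are guaranteed by the pigeonhole principle.

```python
import random
from typing import List, Optional, Sequence


def sample_first_hops(candidates: Sequence[str], first_number: int, rng: random.Random,
                      probs: Optional[Sequence[float]] = None) -> List[str]:
    """Draw first_number next hops with replacement (uniformly if probs is None)."""
    return rng.choices(candidates, weights=probs, k=first_number)


hops = sample_first_hops(["node 1", "node 4", "node 7"], 20, random.Random(42))
# 20 draws from 3 candidates: by the pigeonhole principle some node must repeat.
print(len(hops), len(set(hops)) < len(hops))  # 20 True
```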

And in step 24, for each current node, performing a predetermined number of iterations, where each iteration includes determining a current state according to the current node, the root node and the context embedding vector, searching a second number of next-hop nodes from associated nodes connected by each outgoing edge of the current node according to the current state, determining each action probability corresponding to each next-hop node, and selecting one next-hop node according to each action probability to update the next-hop node as the current node. It will be appreciated that the second number may be the same as the first number described above, e.g., both the first number and the second number are 20; the second number may also be different from the first number described above, e.g. the first number is 20 and the second number is 10.

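In one example, determining the current state according to the current node, the root node, and the context embedding vector includes: determining the search path formed by the nodes and connection edges between the root node and the current node; determining a path embedding vector for the search path according to the node embedding vectors of the nodes and the edge embedding vectors of the connection edges in the search path; and determining the current state according to the path embedding vector of the search path, the node embedding vector of the current node, and the context embedding vector.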

For example, if the root node is er, the current node is et, and the search path from the root node passes in sequence through connection edge r1, node e1, ..., connection edge rt to the current node, the search path can be written as (er, r1, e1, ..., rt, et) and encoded into the corresponding path embedding vector ht using a long short-term memory (LSTM) network.
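The LSTM path encoding can be sketched in plain Python as below. This is a minimal illustration, assuming randomly initialised, untrained weights, hypothetical 4-dimensional embeddings for every node and edge label, and the flat sequence (er, r1, e1, ..., rt, et) as LSTM input; `TinyLSTM` and `encode_path` are hypothetical names, and a real system would use a trained LSTM from a deep learning library.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _matvec(m: Sequence[Vector], v: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM cell with small random, untrained weights (illustration only)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        self.gates = {name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
                      for name in ("i", "f", "o", "g")}
        self.hidden_dim = hidden_dim

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        # Pre-activations W x + U h + b for every gate.
        pre = {name: [wx + uh + b for wx, uh, b in zip(_matvec(w, x), _matvec(u, h), bias)]
               for name, (w, u, bias) in self.gates.items()}
        i = [_sigmoid(v) for v in pre["i"]]
        f = [_sigmoid(v) for v in pre["f"]]
        o = [_sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode_path(path: Sequence[str], embeddings: Dict[str, Vector], lstm: TinyLSTM) -> Vector:
    """Feed (e_r, r_1, e_1, ..., r_t, e_t) through the LSTM; the last hidden state is h_t."""
    h = [0.0] * lstm.hidden_dim
    c = [0.0] * lstm.hidden_dim
    for element in path:
        h, c = lstm.step(embeddings[element], h, c)
    return h


rng = random.Random(1)
path = ["e_r", "r1", "e1", "r2", "e2"]
embeddings = {name: [rng.uniform(-1.0, 1.0) for _ in range(4)] for name in path}
h_t = encode_path(path, embeddings, TinyLSTM(input_dim=4, hidden_dim=3))
print(len(h_t))  # 3
```

The claimed variant, which pairs each edge with the node it points to, would simply feed one combined embedding per (edge, node) pair instead of one per element.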

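In one example, searching, according to the current state, for a second number of next-hop nodes among the associated nodes reached by the outgoing edges of the current node, and determining the action probability of each next-hop node, includes: determining a selectable action set from the associated nodes reached by the outgoing edges of the current node; using a reinforcement learning model to output, according to the current state, a second number of target actions from the selectable action set and the action probability of each target action; and taking each target action as a next-hop node, with the action probability of the target action as the action probability of that next-hop node.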

Fig. 3 illustrates a decision diagram of a reinforcement learning model according to one embodiment. Referring to Fig. 3, the reinforcement learning model, which may also be called a decision network, takes as input the concatenation of ht, et, and ci, where ht is the path embedding vector of the search path, et is the node embedding vector of the current node, ci is the context embedding vector, and together they represent the current state. The decision network combines this state with the action space At and outputs a policy πθ giving the probability of each action.
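One possible form of the decision network is sketched below, assuming a single linear projection of the concatenated state followed by a dot-product score against each action embedding and a softmax over At. The source does not specify the network architecture, so `weight`, `action_embeddings`, and the scoring rule are hypothetical; `top_k_actions` returns the second number of most probable actions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(h_t: Vector, e_t: Vector, c_i: Vector,
           weight: Sequence[Vector], action_embeddings: Sequence[Vector]) -> Vector:
    """pi_theta over the action space A_t, with state s_t = [h_t; e_t; c_i]."""
    state = h_t + e_t + c_i  # list concatenation = vector concatenation
    projected = [sum(w * s for w, s in zip(row, state)) for row in weight]
    # Hypothetical scoring: dot product of the projected state with each action embedding.
    scores = [sum(p * a for p, a in zip(projected, action)) for action in action_embeddings]
    return softmax(scores)


def top_k_actions(probs: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """The k most probable (action_index, probability) pairs: the second number of next hops."""
    return sorted(enumerate(probs), key=lambda item: item[1], reverse=True)[:k]


probs = policy([1.0], [0.0], [1.0], weight=[[1.0, 0.0, 1.0]],
               action_embeddings=[[0.0], [1.0], [2.0]])
print([index for index, _ in top_k_actions(probs, 2)])  # [2, 1]
```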

Reinforcement learning (RL), also known as evaluative learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent that, while interacting with an environment, learns a policy to maximize its return or achieve a specific goal. A common model for reinforcement learning is the standard Markov decision process (MDP). Depending on the given conditions, reinforcement learning can be classified into model-based RL and model-free RL, and into active RL and passive RL. Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning, and reinforcement learning for partially observable systems. Algorithms for solving reinforcement learning problems fall into two categories: policy search algorithms and value function algorithms. Deep learning models can be used within reinforcement learning, giving deep reinforcement learning.

Further, each node and connection edge between the root node and the current node form a search path, and the method further includes: after a predetermined number of iterations, training the reinforcement learning model based on rewards corresponding to each search path.
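The source does not name the training algorithm. As one common option, the sketch below swaps in REINFORCE (a policy-gradient method) on a hypothetical tabular softmax policy: each (state, action) pair on a rewarded search path has its log-probability pushed up in proportion to the reward. In the described method the probabilities come from the decision network, so the same gradient would instead be back-propagated into its parameters θ; `reinforce_update` and the state labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: Dict[str, List[float]], episode: Sequence[Tuple[str, int]],
                     reward: float, lr: float = 0.1) -> None:
    """One REINFORCE step on a tabular softmax policy.

    For each (state, action) on a search path, d log pi(a|s) / d logits = onehot(a) - pi(.|s),
    so the logits move by lr * reward * (onehot(a) - pi).
    """
    for state, action in episode:
        probs = softmax(logits[state])
        for j, p in enumerate(probs):
            logits[state][j] += lr * reward * ((1.0 if j == action else 0.0) - p)


logits = {"root": [0.0, 0.0, 0.0], "node 7": [0.0, 0.0]}
episode = [("root", 2), ("node 7", 1)]  # hypothetical path: root -> action 2 -> action 1
reinforce_update(logits, episode, reward=1.0)
print(softmax(logits["root"])[2] > 1 / 3)  # True
```

A negative reward moves the same logits in the opposite direction, making the path less likely in future searches.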

Fig. 4 illustrates a schematic diagram of search paths according to one embodiment. Referring to Fig. 4, a number of search paths are obtained by iteratively searching for next-hop nodes starting from the root node. For example, three next-hop nodes of the root node are found, node 1, node 4, and node 7, and each of them is then taken as a current node from which the search continues. With node 1 as the current node, several next-hop nodes are found and node 2 is selected among them as the new current node; the search continues from node 2, and node 3 is found as its next-hop node, yielding the search path root node → node 1 → node 2 → node 3. Likewise, the search paths root node → node 4 → node 5 → node 6 and root node → node 7 → node 8 → node 9 are obtained. Each branch node in a search path has a corresponding action probability; the figure shows only the action probabilities of the tail nodes: 0.5642 for node 3, 0.8264 for node 6, and 0.9853 for node 9.
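Step 25 applied to the Fig. 4 example reduces to taking the maximum over the tail-node action probabilities. The sketch below uses only the probabilities stated above; the function name `select_target` is hypothetical, and the joined path illustrates how a search path could be output to explain the recognition result.

```python
from typing import List, Sequence, Tuple


def select_target(paths: Sequence[Tuple[List[str], float]]) -> Tuple[str, List[str]]:
    """Pick the path whose tail node has the largest action probability (step 25)."""
    best_path, _ = max(paths, key=lambda item: item[1])
    return best_path[-1], best_path


# Tail-node probabilities from the Fig. 4 example.
paths = [
    (["root", "node 1", "node 2", "node 3"], 0.5642),
    (["root", "node 4", "node 5", "node 6"], 0.8264),
    (["root", "node 7", "node 8", "node 9"], 0.9853),
]
target, best_path = select_target(paths)
print(target)                  # node 9
print(" -> ".join(best_path))  # root -> node 7 -> node 8 -> node 9
```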

Further, the rewards corresponding to the search paths include: a positive reward if the tail node of the search path reaches the target node confirmed by the user, and a negative reward otherwise; and/or a positive reward if the search path passes through a key node confirmed by the user.

The reward of a search path may be regarded as the reward of its tail node. Once the reward of a search path has been determined, rewards for the intermediate nodes of the path can be derived from it, for example by multiplying the path reward by a decay coefficient. After the predetermined number of iterations, the reinforcement learning model may also be trained based on the rewards of the intermediate nodes.
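A sketch of deriving intermediate-node rewards from a path reward, assuming one interpretation of the decay coefficient: geometric decay with distance from the tail node. The function name `node_rewards` and the default `decay=0.9` are hypothetical.

```python
from typing import List


def node_rewards(path_reward: float, num_nodes: int, decay: float = 0.9) -> List[float]:
    """Rewards for the nodes of one search path, ordered from first hop to tail.

    The tail node receives the path reward; each step back toward the root
    multiplies it by the decay coefficient once more (hypothetical schedule).
    """
    return [path_reward * decay ** (num_nodes - 1 - k) for k in range(num_nodes)]


print(node_rewards(1.0, 3, decay=0.5))  # [0.25, 0.5, 1.0]
```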

In one example, selecting one next-hop node according to the action probabilities and updating it as the current node includes: selecting, according to the action probabilities, the next-hop node with the maximum action probability and updating it as the current node.

In another example, selecting one next-hop node according to the action probabilities and updating it as the current node includes:

according to the action probabilities, in a first proportion of cases the next-hop node with the maximum action probability is selected and updated as the current node, and in a second proportion of cases a next-hop node is selected at random and updated as the current node. For example, the first proportion may be preset to 80% and the second proportion to 20%.
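The mixed selection rule can be sketched as below, assuming the proportions are applied per decision (which makes it resemble epsilon-greedy exploration with epsilon = 0.2); they could instead be applied over a batch of selections. `select_next_hop` and the candidate labels are hypothetical.

```python
import random
from typing import Sequence


def select_next_hop(candidates: Sequence[str], probs: Sequence[float],
                    rng: random.Random, greedy_ratio: float = 0.8) -> str:
    """With probability greedy_ratio take the most probable hop; otherwise pick uniformly at random."""
    if rng.random() < greedy_ratio:
        best = max(range(len(candidates)), key=lambda i: probs[i])
        return candidates[best]
    return rng.choice(candidates)


rng = random.Random(0)
picks = [select_next_hop(["a", "b", "c"], [0.2, 0.5, 0.3], rng) for _ in range(10_000)]
# Expected share of "b": 0.8 (greedy) + 0.2 / 3 (random hits), roughly 0.867.
print(round(picks.count("b") / len(picks), 2))
```

A random pick can also land on the most probable node, so its overall selection rate is a little above the 80% greedy share.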

Then, in step 25, after a predetermined number of iterations, the current node with the highest action probability is selected as the target node. It will be appreciated that after a predetermined number of iterations, the search for the next hop node is not continued, the current node being the tail node of the search path.

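In one example, the target node is an element node, and determining the element or standard question corresponding to the target node as the user intention recognition result includes: determining the element corresponding to the element node as the user intention recognition result.

The method further comprises: outputting a reply sentence according to a preset reply template corresponding to the element, so as to respond to the user in the current multi-round dialogue.

In another example, the target node is a question node, and determining the element or standard question corresponding to the target node as the user intention recognition result includes: determining the standard question corresponding to the question node as the user intention recognition result.

The method further comprises: outputting a reply sentence according to a preset reply template corresponding to the standard question, so as to respond to the user in the current multi-round dialogue.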

It can be understood that different reply templates corresponding to the elements and the standard questions can be preset, so that reply sentences are flexible and changeable, and better experience is provided for users.

Finally, in step 26, it is determined that the element or standard question corresponding to the target node is the user intention recognition result. It will be appreciated that the element is specifically a knowledge point element in the knowledge domain to which the multi-turn dialog belongs.

In one example, the knowledge point elements include business elements and/or appeal elements.

It will be appreciated that after step 26, when the multi-round dialogue proceeds to the next round, user intention recognition needs to be performed again. Steps 21 to 26 may then be repeated as they are, or repeated with a slight adjustment: the root node in step 23 is replaced by the previous round's target node. That is, when searching for the node corresponding to the current round's recognition result, the path search continues from the previous round's target node, so that the start node of each round's search is the tail node of the previous round's search.
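The round-by-round loop can be sketched as follows. `run_dialogue` and the stand-in `fake_recognize` are hypothetical; the `recognize` callable represents steps 22 to 26 and is assumed to take the dialogue history and a start node and return the target node.

```python
from typing import Callable, List


def run_dialogue(user_turns: List[str], root: str,
                 recognize: Callable[[List[str], str], str]) -> List[str]:
    """Re-run recognition after every user turn, starting each search at the previous target."""
    targets: List[str] = []
    start = root
    for i in range(len(user_turns)):
        history = user_turns[: i + 1]        # step 21: all user text so far
        target = recognize(history, start)   # steps 22-26, searching from `start`
        targets.append(target)
        start = target                       # adjusted step 23: the next round starts here
    return targets


def fake_recognize(history: List[str], start: str) -> str:
    # Stand-in recognizer that just records where each search started.
    return f"{start}>{len(history)}"


print(run_dialogue(["text 1", "text 2", "text 3"], "root", fake_recognize))
# ['root>1', 'root>1>2', 'root>1>2>3']
```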

In the embodiments of this specification, the knowledge graph may be constructed as follows. A root node is created in the knowledge graph to represent a broad business category, such as insurance. All standard questions under that category are collected and manually annotated, that is, each standard question is labeled with its corresponding business element and appeal element. A business element is the business category of the sub-business to which the standard question belongs, such as mutual insurance; an appeal element is the user's requirement or intention, such as a refund. Branch nodes are then added to the knowledge graph: element nodes corresponding to elements and standard question nodes corresponding to standard questions, where the element nodes are further divided into business nodes corresponding to business elements and appeal nodes corresponding to appeal elements. Connection edges are created between associated nodes, for example, an edge from the root node to a business node, an edge from the root node to an appeal node, a self-loop on each business node, an edge from a business node to an appeal node, a self-loop on each appeal node, and an edge from an appeal node to a business node. In this knowledge graph, the structure under the business nodes mirrors the structure under the appeal nodes, so that when the user provides incomplete information, the path search can stop on an element node.
Specifically, if the user's utterance contains a business element but no appeal element, the path search is expected to follow the branch under the business nodes and stop on the specific business node, and the system then asks back about that business node. If the utterance contains an appeal element but no business element, the path search is expected to follow the branch under the appeal nodes and stop on the specific appeal node, and the system then asks back about that appeal node. If the utterance contains both an appeal element and a business element, the path search may follow either the branch under the appeal nodes or the branch under the business nodes.
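A minimal sketch of this construction, assuming each standard question has been annotated with exactly one business element and one appeal element. The node-id prefixes and edge-type labels are hypothetical. The text lists the root, business, appeal, and self-loop edges but does not spell out how question nodes are attached; the sketch assumes, as a labeled assumption, that each element node points to the standard questions annotated with it.

```python
from typing import Dict, List, Set, Tuple


def build_graph(annotations: Dict[str, Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Build typed, directed out-edges from manually annotated standard questions.

    annotations maps a standard-question id to its (business element, appeal element).
    Returns node -> sorted list of (edge_type, target_node).
    """
    edges: Dict[str, Set[Tuple[str, str]]] = {"root": set()}

    def add(src: str, edge_type: str, dst: str) -> None:
        edges.setdefault(src, set()).add((edge_type, dst))
        edges.setdefault(dst, set())  # register the target even if it has no out-edges yet

    for qid, (business, appeal) in annotations.items():
        b, a, q = f"biz:{business}", f"appeal:{appeal}", f"q:{qid}"
        add("root", "to_business", b)
        add("root", "to_appeal", a)
        add(b, "to_appeal", a)
        add(a, "to_business", b)
        # Assumption: element nodes point to the standard questions they annotate.
        add(b, "to_question", q)
        add(a, "to_question", q)

    # Every node gets a self-loop so the search can stay on an element node.
    for node in edges:
        edges[node].add(("self", node))
    return {node: sorted(out) for node, out in edges.items()}
```

Storing edges as `(edge_type, target)` pairs keeps the edge type available for the edge embeddings used later in the path search.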

According to the method provided by the embodiments of this specification, user intention recognition is performed based on a pre-established knowledge graph that associates each knowledge point element with each standard question. First, user text of at least one round of the current multi-round dialogue is acquired; the user text of the at least one round is then encoded to obtain a context embedding vector; next, an iterative search for next-hop nodes is performed in the knowledge graph starting from the root node, and a target node is selected after a predetermined number of iterations; finally, the element or standard question corresponding to the target node is determined as the user intention recognition result. In this way, the embodiments of this specification search the knowledge graph for the target node representing the user intention according to the user text of at least one round of the current multi-round dialogue, which helps ensure a stable recognition effect and makes the recognition result interpretable.

According to another aspect of the present invention, a user intention recognition device for a multi-round dialogue is further provided. The device is configured to perform the user intention recognition method for a multi-round dialogue provided in the embodiments of the present invention, and performs user intention recognition based on a pre-established knowledge graph. The knowledge graph includes a root node and a plurality of types of branch nodes, including element nodes and question nodes, where the element nodes correspond to knowledge point elements in the knowledge domain to which the multi-round dialogue belongs and the question nodes correspond to standard questions; nodes with an association relationship are connected by directed connection edges of the corresponding type, and each node has a connection edge connected to itself. FIG. 5 shows a schematic block diagram of a user intention recognition device for a multi-round dialogue according to one embodiment. As shown in FIG. 5, the apparatus 500 includes:

An obtaining unit 51, configured to obtain a user text of at least one round of a current multi-round dialogue;

an embedding unit 52, configured to encode the user text of the at least one round obtained by the obtaining unit 51 to obtain a context embedding vector;

a first search unit 53, configured to determine an initial state according to the root node in the knowledge graph and the context embedding vector obtained by the embedding unit 52, search, according to the initial state, a first number of next-hop nodes from the associated nodes connected by the outgoing edges of the root node, and update each of the first number of next-hop nodes as a current node;

a second search unit 54, configured to perform a predetermined number of iterations for each current node obtained by the first search unit 53, where each iteration includes: determining a current state according to the current node, the root node, and the context embedding vector; searching, according to the current state, a second number of next-hop nodes from the associated nodes connected by the outgoing edges of the current node, and determining the action probability corresponding to each next-hop node; and selecting one next-hop node according to the action probabilities and updating it as the current node;

a selection unit 55, configured to select, after the second search unit 54 has performed the predetermined number of iterations, the current node with the largest action probability as the target node;

and a recognition unit 56, configured to determine the element or standard question corresponding to the target node selected by the selection unit 55 as a user intention recognition result.
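The cooperation of the first search unit, the second search unit, and the selection unit can be sketched in Python as follows. For brevity, the policy here is a hypothetical stand-in that scores each successor by a dot product between the context vector and the successor's embedding and normalizes with a softmax; the path-aware state of the actual embodiment is omitted. Next-hop selection in each iteration is greedy, which corresponds to the variant that picks the next-hop node with the largest action probability.

```python
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[str]]             # node -> successors (self-loop included)
Embeddings = Dict[str, Sequence[float]]  # hypothetical node embeddings


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(node: str, graph: Graph, emb: Embeddings,
                 context: Sequence[float]) -> List[Tuple[str, float]]:
    """Hypothetical policy: score each successor by a dot product with the context."""
    candidates = graph[node]
    scores = [sum(c * e for c, e in zip(context, emb[n])) for n in candidates]
    return list(zip(candidates, softmax(scores)))


def search(graph: Graph, emb: Embeddings, context: Sequence[float],
           first_number: int = 2, iterations: int = 2) -> Tuple[str, float]:
    # First search: keep the first_number most probable next hops of the root.
    ranked = sorted(action_probs("root", graph, emb, context),
                    key=lambda p: p[1], reverse=True)
    current = ranked[:first_number]
    # Iterations: each current node greedily moves to its most probable next hop.
    for _ in range(iterations):
        current = [max(action_probs(node, graph, emb, context), key=lambda p: p[1])
                   for node, _ in current]
    # Target node: the current node with the largest action probability.
    return max(current, key=lambda p: p[1])
```

With `first_number=2`, two search paths are kept after the first hop, and the target node is the tail node whose final hop had the largest probability.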

Optionally, as an embodiment, the second search unit 54 specifically includes:

a path determination subunit, configured to determine the search path formed by the nodes and connection edges between the root node and the current node;

a vector determination subunit, configured to determine the path embedding vector of the search path according to the node embedding vector of each node and the edge embedding vector of each connection edge in the search path determined by the path determination subunit;

and a state determination subunit, configured to determine the current state according to the path embedding vector of the search path determined by the vector determination subunit, the node embedding vector of the current node, and the context embedding vector obtained by the embedding unit.

Further, the vector determination subunit is specifically configured to: take each connection edge in the search path, and the node that edge points to, in turn as path elements, and determine an output vector for each path element in the order in which the path elements appear in the search path, where the output vector of the current path element is determined according to the output vector of the previous path element and the embedding vector of the current path element; and determine the output vector of the last path element as the path embedding vector of the search path.
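The recurrence in the vector determination subunit, together with the state assembled by the state determination subunit, might look like the sketch below. The text only states that each output depends on the previous output and the current element's embedding; the simple tanh recurrent cell, the zero initial output, and concatenation as the way to combine the three vectors are assumptions (an LSTM or GRU cell would fit the same description).

```python
import math
from typing import List, Sequence

Vec = List[float]
Mat = Sequence[Sequence[float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def path_embedding(elements: List[Vec], w_h: Mat, w_x: Mat) -> Vec:
    """elements: embeddings of edge, node, edge, node, ... in path order.

    Assumed cell: h_t = tanh(W_h h_{t-1} + W_x x_t), with h_0 = 0.
    The output of the last element is the path embedding.
    """
    h = [0.0] * len(w_h)
    for x in elements:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_h, h), matvec(w_x, x))]
    return h


def current_state(path_elements: List[Vec], current_node_emb: Sequence[float],
                  context: Sequence[float], w_h: Mat, w_x: Mat) -> Vec:
    """Assumed combination: concatenate path, current-node, and context vectors."""
    return path_embedding(path_elements, w_h, w_x) + list(current_node_emb) + list(context)
```

An empty path (the current node is still the root) yields a zero path embedding, so the same function can also produce the initial state.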

Optionally, as an embodiment, the second search unit 54 is specifically configured to determine a selectable action set according to the associated nodes connected by the outgoing edges of the current node; use a reinforcement learning model to output, according to the current state, a second number of target actions in the selectable action set and the action probability of each target action; take each target action as a next-hop node; and take the action probability of each target action as the action probability of the corresponding next-hop node.
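A sketch of how a policy might score the selectable action set and return the second number of target actions. The single ReLU layer, the dot-product scoring, and the per-action embeddings (for example, derived from the edge and the node it points to) are hypothetical choices; the text does not specify the reinforcement learning model's architecture.

```python
import math
from typing import Dict, List, Sequence, Tuple

Action = Tuple[str, str]  # (edge_type, next_node), one per outgoing edge
Vec = Sequence[float]


def policy_hidden(w: Sequence[Vec], state: Vec) -> List[float]:
    """One hypothetical ReLU layer mapping the current state to a query vector."""
    return [max(0.0, sum(a * b for a, b in zip(row, state))) for row in w]


def top_k_actions(state: Vec, actions: List[Action], action_emb: Dict[Action, Vec],
                  w: Sequence[Vec], k: int) -> List[Tuple[Action, float]]:
    hidden = policy_hidden(w, state)
    logits = [sum(h * e for h, e in zip(hidden, action_emb[a])) for a in actions]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    # Probabilities are normalized over the whole selectable action set,
    # then only the k most probable actions are returned.
    ranked = sorted(zip(actions, (e / z for e in exps)), key=lambda p: p[1], reverse=True)
    return ranked[:k]
```

Normalizing over the whole action set before truncating to `k` keeps the returned probabilities comparable across current nodes, which matters when the selection unit later compares them.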

Further, the nodes and connection edges between the root node and the current node form a search path, and the apparatus further includes:

and a training unit, configured to train the reinforcement learning model based on the reward corresponding to each search path after the second search unit 54 has performed the predetermined number of iterations.
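The rewards themselves are not reproduced in this text, so the following REINFORCE-style sketch assumes a simple terminal reward: 1 when the tail node of a sampled search path equals a labeled target node, 0 otherwise. It uses a tabular softmax policy (one logit per outgoing edge) on a hypothetical toy graph instead of a neural model, and omits a baseline; both are simplifications.

```python
import math
import random
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]
Theta = Dict[str, List[float]]  # per-node logits over that node's successors

# Hypothetical toy graph; only root -> a -> q reaches the target in two hops.
TOY_GRAPH: Graph = {
    "root": ["root", "a", "b"],
    "a": ["a", "q"],
    "b": ["b"],
    "q": ["q"],
}


def probs(theta: Theta, node: str) -> List[float]:
    m = max(theta[node])
    exps = [math.exp(x - m) for x in theta[node]]
    z = sum(exps)
    return [e / z for e in exps]


def rollout(graph: Graph, theta: Theta, start: str, steps: int,
            rng: random.Random) -> Tuple[List[Tuple[str, int, List[float]]], str]:
    """Sample a search path; record (node, chosen index, probabilities) per hop."""
    trace, node = [], start
    for _ in range(steps):
        p = probs(theta, node)
        i = rng.choices(range(len(p)), weights=p)[0]
        trace.append((node, i, p))
        node = graph[node][i]
    return trace, node


def reinforce_step(graph: Graph, theta: Theta, target: str, steps: int,
                   lr: float, rng: random.Random) -> float:
    trace, tail = rollout(graph, theta, "root", steps, rng)
    reward = 1.0 if tail == target else 0.0  # assumed terminal reward
    for node, i, p in trace:
        # Gradient of log-softmax w.r.t. the logits: one_hot(i) - p.
        for j in range(len(p)):
            theta[node][j] += lr * reward * ((1.0 if j == i else 0.0) - p[j])
    return reward


def train(graph: Graph, target: str, episodes: int = 300, steps: int = 2,
          lr: float = 0.5, seed: int = 0) -> Theta:
    rng = random.Random(seed)
    theta = {n: [0.0] * len(succ) for n, succ in graph.items()}
    for _ in range(episodes):
        reinforce_step(graph, theta, target, steps, lr, rng)
    return theta
```

After training, the probability of the path `root -> a -> q` rises well above its initial value of 1/6, because only successful paths receive a non-zero update.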

Further, the rewards corresponding to the search paths include:

Optionally, as an embodiment, the second search unit 54 is specifically configured to select, according to each action probability, a next-hop node with the largest action probability to update as the current node.

Optionally, as an embodiment, the second search unit 54 is specifically configured to, according to the action probabilities, select the next-hop node with the largest action probability as the current node in a first proportion of cases, and randomly select a next-hop node as the current node in a second proportion of cases.
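Assuming the first and second proportions sum to 1 and are applied independently at each selection step, this variant behaves like epsilon-greedy exploration. A minimal sketch (the function name and candidate format are hypothetical):

```python
import random
from typing import List, Tuple


def select_next_hop(candidates: List[Tuple[str, float]], first_proportion: float,
                    rng: random.Random) -> str:
    """candidates: (next_hop_node, action_probability) pairs.

    With probability first_proportion, pick the most probable next hop;
    otherwise (the second proportion, 1 - first_proportion) pick one uniformly at random.
    """
    if rng.random() < first_proportion:
        return max(candidates, key=lambda c: c[1])[0]
    return rng.choice(candidates)[0]
```

Setting `first_proportion` to 1.0 recovers the purely greedy variant; a smaller value lets training explore other search paths.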

Optionally, as an embodiment, the target node is an element node, and the recognition unit 56 is specifically configured to determine the element corresponding to the element node as the user intention recognition result;

the apparatus further comprises:

and a first output unit, configured to output a reply sentence according to a reply template preset for the element, so as to respond to the user in the current multi-round dialogue.

Optionally, as an embodiment, the target node is a question node, and the recognition unit 56 is specifically configured to determine the standard question corresponding to the question node as the user intention recognition result;

the apparatus further comprises:

and a second output unit, configured to output a reply sentence according to a reply template preset for the standard question, so as to respond to the user in the current multi-round dialogue.

Further, the apparatus further comprises:

and a third output unit, configured to output, after the recognition unit 56 determines the standard question corresponding to the question node as the user intention recognition result, the search path formed by the nodes and connection edges between the root node and the question node, so that the user intention recognition result can be explained according to the search path.
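Because the search path is an explicit sequence of typed edges and nodes, the explanation can be as simple as printing it. A sketch, assuming hops are stored as `(edge_type, node)` pairs and that self-loops are labeled `"self"` (both hypothetical conventions):

```python
from typing import List, Tuple


def explain(path: List[Tuple[str, str]], start: str = "root") -> str:
    """Render (edge_type, node) hops from the root as a readable trace.

    Self-loop hops are skipped because they do not change the node.
    """
    parts = [start]
    for edge_type, node in path:
        if edge_type == "self":
            continue
        parts.append(f"--{edge_type}--> {node}")
    return " ".join(parts)
```

Skipping self-loops keeps the explanation focused on the hops that actually narrowed down the intention.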

Optionally, as an embodiment, the knowledge point elements include business elements and/or appeal elements.

With the apparatus provided in the embodiments of this specification, user intention recognition is performed based on a pre-established knowledge graph that associates each knowledge point element with each standard question. First, the obtaining unit 51 obtains user text of at least one round of the current multi-round dialogue; then the embedding unit 52 encodes the user text of the at least one round to obtain a context embedding vector; next, the first search unit 53 and the second search unit 54 search for next-hop nodes in the knowledge graph in sequence, starting from the root node, and the selection unit 55 selects the target node after a predetermined number of iterations; finally, the recognition unit 56 determines the element or standard question corresponding to the target node as the user intention recognition result. In this way, the embodiments of this specification search the knowledge graph for the target node representing the user intention according to the user text of at least one round of the current multi-round dialogue, which helps ensure a stable recognition effect and makes the recognition result interpretable.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 2.

Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A user intention recognition method for a multi-round dialogue, the method performing user intention recognition based on a pre-established knowledge graph, the knowledge graph including a root node and a plurality of types of branch nodes, the plurality of types of branch nodes including element nodes and question nodes, wherein the element nodes correspond to knowledge point elements in a knowledge domain to which the multi-round dialogue belongs, the question nodes correspond to standard questions, the nodes having an association relationship are connected by a corresponding type of directional connection edge, each node has a connection edge connected to itself, the method comprising:

acquiring user text of at least one round of the current multi-round dialogue;

2. The method of claim 1, wherein the determining the current state from the current node, the root node, and the context embedding vector comprises:

3. The method of claim 2, wherein the determining the path embedding vector corresponding to the search path according to the node embedding vector of each node and the edge embedding vector of the connecting edge in the search path comprises:

4. The method of claim 1, wherein the searching, according to the current state, for a second number of next-hop nodes from the associated nodes connected by the outgoing edges of the current node, and determining the action probability corresponding to each next-hop node, comprises:

5. The method of claim 4, wherein the nodes and connection edges between the root node and the current node form a search path, the method further comprising:

6. The method of claim 5, wherein the rewards corresponding to each search path include:

7. The method of claim 1, wherein the selecting a next-hop node based on the respective action probabilities for updating as the current node comprises:

8. The method of claim 1, wherein the selecting a next-hop node based on the respective action probabilities for updating as the current node comprises:

9. The method of claim 1, wherein the target node is an element node, and the determining that the element or the standard question corresponding to the target node is a user intention recognition result includes:

the method further comprises the steps of:

10. The method of claim 1, wherein the target node is a question node, and the determining that the element or the standard question corresponding to the target node is a user intention recognition result comprises:

the method further comprises the steps of:

11. The method of claim 10, wherein after the determining that the standard question corresponding to the question node is a user intention recognition result, the method further comprises:

12. The method of claim 1, wherein the knowledge point elements comprise business elements and/or appeal elements.

13. A user intention recognition device for a multi-round dialog, the device performing user intention recognition based on a pre-established knowledge graph, the knowledge graph including a root node and a plurality of types of branch nodes, the plurality of types of branch nodes including element nodes and question nodes, wherein the element nodes correspond to knowledge point elements in a knowledge domain to which the multi-round dialog belongs, the question nodes correspond to standard questions, the nodes having an association relationship are connected by a corresponding type of directional connection edge, each node has a connection edge connected to itself, the device comprising:

14. The apparatus of claim 13, wherein the second search unit specifically comprises:

15. The apparatus of claim 14, wherein the vector determination subunit is specifically configured to: take each connection edge in the search path, and the node that edge points to, in turn as path elements, and determine an output vector for each path element in the order in which the path elements appear in the search path, wherein the output vector of the current path element is determined according to the output vector of the previous path element and the embedding vector of the current path element; and determine the output vector of the last path element as the path embedding vector of the search path.

16. The apparatus of claim 13, wherein the second search unit is specifically configured to: determine a selectable action set according to the associated nodes connected by the outgoing edges of the current node; use a reinforcement learning model to output, according to the current state, a second number of target actions in the selectable action set and the action probability of each target action; take each target action as a next-hop node; and take the action probability of each target action as the action probability of the corresponding next-hop node.

17. The apparatus of claim 16, wherein the nodes and connection edges between the root node and the current node form a search path, the apparatus further comprising:

a training unit, configured to train the reinforcement learning model based on the reward corresponding to each search path after the second search unit has performed the predetermined number of iterations.

18. The apparatus of claim 17, wherein the rewards for each search path comprise:

19. The apparatus of claim 13, wherein the second search unit is specifically configured to select, according to each action probability, a next-hop node with the largest action probability to update as the current node.

20. The apparatus of claim 13, wherein the second search unit is specifically configured to, according to the action probabilities, select the next-hop node with the largest action probability as the current node in a first proportion of cases, and randomly select a next-hop node as the current node in a second proportion of cases.

21. The apparatus of claim 13, wherein the target node is an element node, and the recognition unit is specifically configured to determine the element corresponding to the element node as the user intention recognition result;

the apparatus further comprises:

22. The apparatus of claim 13, wherein the target node is a question node, and the recognition unit is specifically configured to determine the standard question corresponding to the question node as the user intention recognition result;

the apparatus further comprises:

23. The apparatus of claim 22, wherein the apparatus further comprises:

a third output unit, configured to output, after the recognition unit determines the standard question corresponding to the question node as the user intention recognition result, the search path formed by the nodes and connection edges between the root node and the question node, so that the user intention recognition result can be explained according to the search path.

24. The apparatus of claim 13, wherein the knowledge point elements comprise business elements and/or appeal elements.

25. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-12.

26. A computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of any of claims 1-12.