CN115905483A - User intention determining method and device, storage medium and electronic equipment

Info

Publication number: CN115905483A
Application number: CN202211394327.2A
Authority: CN (China)
Prior art keywords: nodes, node, determining, graph, content
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 许慧楠, 邹波, 宋双永, 刘丹
Current assignee: Jingdong Technology Information Technology Co Ltd
Original assignee: Jingdong Technology Information Technology Co Ltd
Application filed by Jingdong Technology Information Technology Co Ltd
Classification: Machine Translation (AREA)

Abstract

The present disclosure provides a user intention determination method and apparatus, a storage medium, and an electronic device, and relates to the technical field of information processing. The method includes the following steps: acquiring a user session text and determining an initial feature vector corresponding to the user session text; constructing a multi-factor dialog graph based on the user session text and the initial feature vector, the multi-factor dialog graph including content nodes, content group nodes and object nodes; performing graph coding on each type of node according to the multi-factor dialog graph to obtain a vector representation of each type of node; and performing user intention identification based on the vector representations of each type of node to determine the target user intention. The method and apparatus can alleviate the problem in the related art that considering only text information leads to low accuracy of the recognition result.

Description

User intention determining method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and an apparatus for determining a user intention, a storage medium, and an electronic device.
Background
With the continuous development and application of internet technology, users and service parties increasingly communicate through online customer service. Accurately identifying the user intention during the interactive session between customer service and the user is an important factor influencing the service effect.
In the related art, the user intention is determined by mining the relationship between a dialogue sentence and related historical sentences. However, only the text information is considered, while other factors in the dialogue sentences are ignored, so the accuracy of the recognition result is low.
It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure aims to provide a user intention determining method and device, a storage medium and electronic equipment, and further solves the problem of low accuracy of a recognition result caused by only considering text information in the related art to a certain extent.
According to a first aspect of the present disclosure, there is provided a user intent determination method including: acquiring a user session text, and determining an initial feature vector corresponding to the user session text; constructing a multi-factor dialog graph based on the user session text and the initial feature vector; the multi-factor dialog graph comprises content nodes, content group nodes and object nodes; carrying out graph coding on each type of nodes according to the multi-factor dialog graph to obtain vector representation of each type of nodes; and performing user intention identification based on the vector representation of each type of node, and determining the target user intention.
Optionally, the user session text includes speaker information, and the constructing of the multi-factor dialog graph includes the following steps: determining the content nodes based on single sentences in the user session text; performing sentence division on the user session text based on the speaker information, and determining the content group nodes based on each divided group of sentences; determining the object nodes based on the speaker information; and adding corresponding relationship edges among the content nodes, the content group nodes and the object nodes to construct the multi-factor dialog graph.
Optionally, the relationship edges include inclusion relationship edges, sequential relationship edges and object attribute edges, and the adding of corresponding relationship edges among the content nodes, the content group nodes and the object nodes includes the following steps: adding an inclusion relationship edge between each content node and its corresponding content group node; adding object attribute edges between each content group node and its corresponding object node and between content group nodes of the same object, respectively; and adding sequential relationship edges between adjacent content group nodes and between adjacent content nodes in the same group, respectively.
Optionally, the multi-factor dialog graph further includes a keyword node, and the method further includes: determining keyword information in the user session text according to a domain keyword knowledge graph; determining a keyword node corresponding to the user session text based on the keyword information; and constructing a multi-factor dialog graph based on the keyword nodes.
Optionally, the constructing of the multi-factor dialog graph based on the keyword nodes includes: adding an inclusion relationship edge between each keyword node and the corresponding content node to construct the multi-factor dialog graph.
Optionally, the performing graph coding of each type of node according to the multi-factor dialog graph includes: determining an initial vector representation of each type of node in the multi-factor dialog graph based on the initial feature vector; determining an adjacency matrix of the multi-factor dialog graph; and inputting the initial vector representation and the adjacency matrix of each type of node into a trained graph convolutional neural network model to update the vector representation of each type of node.
Optionally, the determining an initial vector representation of each type of node in the multi-factor dialog graph based on the initial feature vector includes: determining the initial vector representation of the content node according to the initial feature vector corresponding to the content node; determining an initial vector representation of the content group node based on the initial vector representation of the content node; and determining an initial vector representation of the object node based on the initial vector representation of the content group node.
Optionally, the identifying the user intention based on the vector representation of each type of nodes includes: splicing the vector representations of each type of nodes to obtain a multi-level feature vector; and classifying the multi-level feature vectors by adopting the trained first neural network model to determine the intention of the target user.
Optionally, the method further comprises: and respectively extracting the significant features of the vector representation of each type of nodes to obtain the significant feature vector of each type of nodes.
Optionally, the determining an initial feature vector corresponding to the user session text includes: performing word vectorization on each single sentence of the user session text to obtain a single sentence vector; extracting the features of the single sentence vectors to obtain single sentence feature vectors; and determining an initial feature vector corresponding to the user session text based on the single sentence feature vector.
Optionally, the determining an initial feature vector corresponding to the user session text based on the single sentence feature vector includes: and inputting the single sentence feature vector into a trained second neural network model according to the conversation sequence for extracting the context feature so as to obtain an initial feature vector corresponding to the user conversation text.
According to a second aspect of the present disclosure, there is provided a user intent determination apparatus, the apparatus comprising: the system comprises a characteristic determining module, a graph constructing module, a graph coding module and an intention identifying module, wherein the characteristic determining module is used for acquiring a user session text and determining an initial characteristic vector corresponding to the user session text; the graph building module is used for building a multi-factor dialog graph based on the user session text and the initial feature vector; the multi-factor dialog graph comprises content nodes, content group nodes and object nodes; the graph coding module is used for carrying out graph coding on each type of nodes according to the multi-factor dialog graph so as to obtain vector representation of each type of nodes; and the intention identification module is used for carrying out user intention identification based on the vector representation of each type of nodes and determining the intention of the target user.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the method of any of the above embodiments.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus comprising: one or more processors; and storage for one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method of any of the embodiments described above.
Exemplary embodiments of the present disclosure may have some or all of the following benefits:
in the user intention determining method provided by the exemplary embodiments of the present disclosure, on one hand, a multi-factor dialog graph including content nodes, content group nodes and object nodes may be constructed based on the user session text and its corresponding initial feature vectors; three types of session-related information are introduced through the content nodes, the content group nodes and the object nodes, and related factors of different aspects are merged into the multi-factor dialog graph, so that the user intention can be identified more accurately. On the other hand, through the processes of feature determination, multi-factor graph construction, graph coding and user intention identification, the related information of the user session text can be fully mined and fused, intention identification of smaller granularity and finer categories can be realized, and the identification accuracy and precision are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically illustrates an exemplary application scenario architecture diagram of a user intent determination method and apparatus according to one embodiment of the present disclosure.
Fig. 2 schematically shows a flow chart of a user intent determination method in one embodiment according to the present disclosure.
FIG. 3 schematically illustrates a flow diagram for constructing a multi-factor dialog graph in accordance with one embodiment of the present disclosure.
FIG. 4 schematically shows a first schematic diagram of constructing a multi-factor dialog graph based on a user session text in an embodiment in accordance with the present disclosure.
FIG. 5 schematically shows a second schematic diagram of constructing a multi-factor dialog graph based on a user session text in an embodiment in accordance with the present disclosure.
FIG. 6 schematically shows a process flow diagram of a user intent determination method according to one embodiment of the present disclosure.
Fig. 7 schematically shows a block diagram of a user intention determination apparatus according to an embodiment of the present disclosure.
FIG. 8 illustrates a block diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
As shown in fig. 1, an exemplary system diagram of an application scenario of a user intent determination method and apparatus is provided, and the system includes a user terminal device and a customer service server. The embodiment is illustrated by applying the method to the server, and it can be understood that the method can also be applied to the terminal device, and can also be applied to a system including the terminal device and the server, and is implemented by interaction between the terminal device and the server. The customer service server can be an independent physical server, can also be a server cluster or distributed system formed by a plurality of physical servers, can also be a cloud server for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (content delivery network), big data and artificial intelligence platforms and the like, and can also be a node in a block chain. For example, in the process of man-machine conversation, the server can be a customer service robot, and the user carries out conversation with the customer service robot through the terminal device. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted device, and the like. When the user intention determining method provided in this embodiment is implemented through interaction between a terminal and a server, the terminal and the server may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited herein.
The user intention determining method provided by the embodiments of the present disclosure may be executed in a server, and accordingly, the user intention determining apparatus is generally arranged in the customer service server. The user intention determining method provided by the embodiments of the present disclosure may also be executed in the terminal device, and accordingly, the user intention determining apparatus is then arranged in the terminal device.
It will be appreciated that the training process and the use process of the intent recognition model for determining user intent of the present disclosure are similar; the primary difference is that the input data of the intent recognition model during use is the user session text whose intent is to be determined, whereas the input data during training is the training text, where the training text may include historical user session text labeled with intention labels, and the labels may be determined by manual labeling or by a pre-constructed model. Accordingly, the embodiments of the present disclosure mainly describe the usage process of the intention recognition model, and the training process is not discussed repeatedly. The user intention determining method disclosed in the embodiments of the present specification is described below with reference to specific embodiments.
Referring to fig. 2, a user intention determining method of an example embodiment provided by the present disclosure may include the following steps.
Step S210, obtaining a user session text, and determining an initial feature vector corresponding to the user session text.
In this example embodiment, the user session text may be a complete dialog between the user and the customer service, or may be a partial dialog of the complete dialog, which is not limited in this example. The initial feature vector corresponding to the user session text may be determined by existing vectorization means. For example, each single sentence in the user session text may be converted into an initial feature vector by looking up a pre-trained word vector table, and a feature matrix may also be composed of these initial feature vectors.
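As a minimal sketch of this lookup (the vocabulary, vectors and averaging scheme below are illustrative assumptions, not the embodiment's actual word vector table), each single sentence is mapped to an initial feature vector and the vectors are stacked into a feature matrix:

```python
import numpy as np

# Hypothetical pre-trained word vector table: token -> 4-dimensional vector.
word_vectors = {
    "order":   np.array([0.1, 0.3, -0.2, 0.5]),
    "shipped": np.array([0.4, -0.1, 0.2, 0.0]),
    "when":    np.array([0.0, 0.2, 0.1, -0.3]),
}
UNK = np.zeros(4)  # fallback for out-of-vocabulary tokens

def sentence_vector(tokens):
    """Average the word vectors of one single sentence (one simple lookup scheme)."""
    vecs = [word_vectors.get(t, UNK) for t in tokens]
    return np.mean(vecs, axis=0)

session = [["when", "shipped"], ["order", "shipped", "when"]]
feature_matrix = np.stack([sentence_vector(s) for s in session])
print(feature_matrix.shape)  # (num_sentences, embedding_dim) -> (2, 4)
```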
Step S220, constructing a multi-factor dialog graph based on the user session text and the initial feature vector, wherein the multi-factor dialog graph comprises content nodes, content group nodes and object nodes.
In the present exemplary embodiment, a content node refers to a node related to the text content; for example, each single sentence may be regarded as one content node. A content group node refers to a plurality of content nodes that are connected to each other by a certain rule, or an independent content node that is not connected to other content nodes by that rule. An object node refers to a node divided according to objects with different attributes; the objects with different attributes may include the speaking subject or the party spoken to, and may also include the object the conversation talks about, which is not limited in this example. In this example, the multi-factor dialog graph may be composed of the different types of nodes and the different relationships between the nodes.
And step S230, carrying out graph coding on each type of nodes according to the multi-factor dialog graph so as to obtain vector representation of each type of nodes.
In the present example embodiment, the graph encoding of each type of node may be performed by a graph convolutional neural network (GCN). The model may also be R-GCN, a relational graph convolutional neural network optimized on the basis of GCN, which can process multi-relational data features in the network.
And step S240, identifying the user intention based on the vector representation of each type of node, and determining the target user intention.
In this example embodiment, various neural network models may be used for user intention recognition, and the neural network model may be a multi-class artificial neural network (such as a multi-layer perceptron) or a convolutional neural network, and the example is not limited thereto. The vector representation of each type of node may be subjected to fusion processing before being input into the neural network model, or may be subjected to fusion processing after being input separately, where the fusion processing may include one or more of stitching, data averaging, or weighted averaging, and this example is not limited in this respect.
In the present exemplary embodiment, the categories of user intention may number in the dozens or even hundreds (e.g., 50). Exemplarily, the categories of user intention may include: returned goods are not free of freight charges, forgetting to close automatic renewal, being unable to activate membership, consulting the color/style of goods, goods cannot be used normally, and the like.
In the method for determining a user intention provided by the present exemplary embodiment, on one hand, a multi-factor dialog graph including content nodes, content group nodes, and object nodes may be constructed based on a user session text and its corresponding initial feature vector; three types of session related information are introduced through the content nodes, the content group nodes and the object nodes, and related factors in different aspects are merged into the multi-factor dialog diagram, so that the user intention can be identified more accurately. On the other hand, the method and the device can fully mine and fuse the related information of the user session text through the processes of feature determination, multi-factor graph construction, graph coding and user intention identification, can realize intention identification with smaller granularity and finer categories, and simultaneously improve identification accuracy and accuracy.
The various steps of the present disclosure are described in more detail below.
In some embodiments, referring to FIG. 3, constructing the multi-factor dialog graph includes the following steps S310 to S340.
Step S310, content nodes are determined based on the single sentence in the user session text.
In this example embodiment, each single sentence or several single sentences in the user session text may be used as one content node, or a single sentence may be preprocessed before being used as a content node; for example, the single sentence text obtained after removing non-substantive words (such as filler words) from the single sentence may be used as a content node, which is not limited in this example.
Step S320, performing sentence division on the user conversation text based on the speaker information, and determining a content group node based on each divided sentence group.
In this example embodiment, the user session text may include speaker information. The speaker information may include the speaker's role information, the speaker's speaking time information, and may also include other speaker-related information, such as speaker-related order information or basic information (e.g., name, contact address, etc.), which is not limited in this example. In this example, the sentence division may be performed on the user conversation text based on the speaker information, for example, the continuous sentences of the same role in the conversation may be divided into a group as a content group node based on the role information (customer service or user) of the speaker. A content group node may include one single sentence or a plurality of consecutive single sentences, which is not limited in this example.
In step S330, the object node is determined based on the speaker information.
In the present exemplary embodiment, the speaker information may include the speaker's role information, the speaker's speaking time information, and may also include other speaker-related information, such as order information or basic information (e.g. name, contact address, etc.) related to the speaker, which is not limited in this example.
In the present exemplary embodiment, the object node may be determined based on one kind of speaker information or several kinds of speaker information. For example, the object nodes (a user node and a customer service node) may be determined based on the speaker's role information. As another example, the object nodes may be determined based on both the speaker role information and the speaker order information, with the role information (user node and customer service node) and the order information (order A node and order B node) each corresponding to one object node.
And step S340, adding corresponding relation edges among the content nodes, the content group nodes and the object nodes to construct the multi-factor dialog graph.
In this example embodiment, relationship edges between each two of the content node, the content group node, and the object node may be the same or partially the same, or may all be different, and this example does not limit this. After adding relationship edges between different nodes, a multi-factor dialog graph is formed.
In some embodiments, the relationship edges may include an inclusion relationship edge, a sequential relationship edge, and an object attribute edge, and adding corresponding relationship edges between the content nodes, the content group nodes, and the object nodes may include the following steps: an inclusion relationship edge is added between the content node and the corresponding content group node. In this example embodiment, an inclusion relationship edge means that the two connected nodes are in an inclusion relationship, and the inclusion relationship edges may include one or more kinds of relationship edges. For example, the content node is included in the corresponding content group node.
Object attribute edges are respectively added between the content group nodes and the corresponding object nodes and between the content group nodes of the same object.
In this example embodiment, an object attribute edge means that the two connected nodes share a certain object attribute, and the object attribute edges may include one or more kinds of relationship edges. For example, a content group node may be connected to the object node corresponding to its speaker via one object attribute edge, and content group nodes of the same speaker may be connected to each other through another object attribute edge.
And adding sequential relation edges between adjacent content group nodes and between adjacent content nodes in the same group respectively.
In this example embodiment, a sequential relationship edge means that the two connected nodes have a certain sequential relationship, and the sequential relationship edges include one or more kinds of relationship edges. For example, one sequential relationship edge can be added between adjacent content group nodes according to the speaking order, and another sequential relationship edge can be added between adjacent content nodes in the same content group according to the speaking order.
For example, a user session text formed from a man-machine service session is shown in the upper part of fig. 4. Each single sentence can be taken as a content node, such as U1 to U6 in fig. 4. A plurality of consecutive single sentences uttered by the same speaker may be taken as a content group node, such as L1 to L4 in fig. 4, where the node L4 is composed of nodes U3 to U5. Considering that a man-machine service conversation involves two speaker roles, i.e., machine service and client, each having a different speaking tendency, two object nodes can be defined based on the speaker role information, namely node C (client) and node S (machine service).
In this example, different relationship edges may be defined between different nodes. As shown in fig. 4, a local edge (inclusion relationship edge) is added between a content node and the content group node to which it belongs; a conversation edge (object attribute edge) is added between a content group node and the speaker node corresponding to it; round sequential edges (sequential relationship edges) are added between adjacent content nodes in the same content group node; local sequential edges (sequential relationship edges) are added between adjacent content group nodes; and speaker edges (object attribute edges) are added between content group nodes of the same speaker. The resulting multi-factor dialog graph is shown in the lower part of fig. 4.
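The construction just described can be sketched as follows; the session content, speaker roles and grouping are illustrative stand-ins for the dialog of fig. 4, and the edge labels mirror the edge types above:

```python
# Illustrative session: (speaker_role, sentence) pairs in speaking order.
session = [
    ("user",    "U1"),
    ("service", "U2"),
    ("user",    "U3"),
    ("user",    "U4"),
    ("user",    "U5"),
    ("service", "U6"),
]

# Content nodes: one per single sentence.
content_nodes = [sent for _, sent in session]

# Content group nodes: consecutive sentences by the same speaker form one group.
groups, edges = [], []
for role, sent in session:
    if groups and groups[-1]["role"] == role:
        groups[-1]["sentences"].append(sent)
    else:
        groups.append({"id": f"L{len(groups) + 1}", "role": role, "sentences": [sent]})

# Object nodes: one per speaker role.
object_nodes = {"user": "C", "service": "S"}

for g in groups:
    # Inclusion edges between each content node and its content group node.
    for sent in g["sentences"]:
        edges.append((sent, g["id"], "inclusion"))
    # Object attribute edge between the group and its speaker node.
    edges.append((g["id"], object_nodes[g["role"]], "object_attribute"))
    # Sequential edges between adjacent content nodes inside the group.
    for a, b in zip(g["sentences"], g["sentences"][1:]):
        edges.append((a, b, "sequential"))

# Sequential edges between adjacent content group nodes.
for a, b in zip(groups, groups[1:]):
    edges.append((a["id"], b["id"], "sequential"))

# Object attribute edges between content group nodes of the same speaker.
for a in groups:
    for b in groups:
        if a["id"] < b["id"] and a["role"] == b["role"]:
            edges.append((a["id"], b["id"], "object_attribute"))

print(len(content_nodes), len(groups), len(edges))  # 6 4 17
```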
The method can extract the speaking information of the text sentences through the content nodes, then aggregate the local speaking information continuously expressed by the speaker through the content group nodes, and finally aggregate the personal speaking tendency information of the speaker in the multi-turn conversation context through the object nodes, thereby realizing the fusion of the multi-factor related sentence information in the user conversation text and improving the accuracy of the user intention identification.
In some embodiments, the multi-factor dialog diagram further includes a keyword node, and the method further includes the following steps.
And determining the keyword information in the user session text according to the domain keyword knowledge graph.
In the present exemplary embodiment, the domain keyword knowledge graph may be a previously constructed knowledge graph of related domains, and may be composed of keywords of related domains and relationships between the keywords. The related fields may be fields related to the user session text. The related domain information may be a domain information base established by the service party according to historical service information (e.g., historical session information between the customer service and the user), keywords are extracted according to the domain information base, the extracted domain keywords are used as entity nodes, and a domain keyword knowledge graph is constructed based on the entity nodes and the relationship thereof. For example, the relationship between entities may include equivalence, containment, instance, attributes, and the like, such as the relationship between entity nodes "airline tickets" and "airline tickets" may be an equivalence relationship, the relationship between entity nodes "after sale" and "return", "change", "repair", "pay", may be a containment relationship, the relationship between entity nodes "fees" and "pay online", "pay by date" may be an attribute relationship, and the relationship between entity nodes "pay methods" and "pay online", "pay by date" may be an instance relationship.
In the present example embodiment, the keyword information of the user session text is determined by search matching the domain keyword knowledge-graph with the user session text.
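A simple way to realize this search matching is to match knowledge-graph entities against the session sentences; the triples and the matching rule below are illustrative assumptions rather than the actual domain graph or matcher:

```python
# Illustrative domain keyword knowledge graph as (head, relation, tail) triples.
domain_kg = [
    ("after sale", "containment", "return"),
    ("after sale", "containment", "exchange"),
    ("fees", "attribute", "pay online"),
    ("payment method", "instance", "pay online"),
]

# Entity vocabulary derived from the triples.
entities = {h for h, _, _ in domain_kg} | {t for _, _, t in domain_kg}

def match_keywords(sentence: str) -> list[str]:
    """Return the knowledge-graph entities that appear in the sentence."""
    text = sentence.lower()
    return [e for e in entities if e in text]

print(match_keywords("I want a return because the fees were charged twice"))
# e.g. ['return', 'fees'] (set iteration order may vary)
```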
And determining a keyword node corresponding to the user session text based on the keyword information.
In this example embodiment, each keyword in the keyword information may be used as a keyword node, or each keyword may be used as a keyword node after the keyword information is subjected to a filtering process, which is not limited in this example. As shown in fig. 5, the keyword nodes are "consult", "order", "goods", "delivery", "urging", "out of stock", "replenishment", respectively.
And constructing the multi-factor dialog graph based on the keyword nodes.
In this example embodiment, an inclusion relationship edge may be added between a keyword node and the corresponding content node to construct the multi-factor dialog graph. Illustratively, an inclusion relationship edge may be added between a keyword node and the content node it belongs to. As shown in fig. 5, inclusion edges (entity edges) may be added between the keyword nodes "delivery", "out of stock", "replenishment" and node U5.
In some embodiments, the graph coding of each type of node is performed according to a multi-factor dialog graph, including the following steps.
Based on the initial feature vectors, an initial vector representation of each type of node in the multi-factor dialog graph is determined.
In this exemplary embodiment, a matrix composed of the initial feature vectors corresponding to the content nodes may be used as the feature matrix of this type of node. An initial vector representation of a content group node may be determined based on the initial feature vectors of its content nodes: the initial feature vectors of all content nodes in the content group node may be subjected to a first correlation operation to determine the initial vector representation of the content group node, where the first correlation operation may be averaging or weighted averaging, which is not limited in this example. An initial vector representation of an object node may be determined based on the initial vector representations of the content group nodes: for example, the initial vector representations of the content group nodes uttered by the object may be subjected to a second correlation operation to determine the initial vector representation of the object node, where the second correlation operation may be averaging or weighted averaging, which is not limited in this example. The first correlation operation and the second correlation operation in this example may be the same or different, and this example does not limit this.
In the present example embodiment, for a keyword node, entities and relationships in the domain keyword knowledge graph may be mapped to low-dimensional vectors as initial feature vectors of the keyword node using a knowledge graph embedding technique such as TransE, transH, transR, or the like.
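A sketch of this initialization, reusing the illustrative grouping from the construction sketch above; the first and second correlation operations are taken as plain averaging, and the TransE-style keyword embeddings are stubbed with random vectors (all values here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Assumed inputs: content-node features (one row per single sentence U1..U6) and
# the grouping/ownership structure from the graph-construction step.
content_feats = {f"U{i}": rng.normal(size=dim) for i in range(1, 7)}
group_members = {"L1": ["U1"], "L2": ["U2"], "L3": ["U3", "U4", "U5"], "L4": ["U6"]}
object_groups = {"C": ["L1", "L3"], "S": ["L2", "L4"]}

# Content group node = mean of its content nodes (first correlation operation).
group_feats = {g: np.mean([content_feats[u] for u in us], axis=0)
               for g, us in group_members.items()}

# Object node = mean of the groups it uttered (second correlation operation).
object_feats = {o: np.mean([group_feats[g] for g in gs], axis=0)
                for o, gs in object_groups.items()}

# Keyword nodes would be initialized from a knowledge-graph embedding such as
# TransE; random stand-ins are used here.
keyword_feats = {"delivery": rng.normal(size=dim), "out of stock": rng.normal(size=dim)}
```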
An adjacency matrix of the multi-factor dialog graph is determined.
In the present example embodiment, the adjacency matrix may be determined from the multi-factor dialog graph. If the multi-factor dialog graph has n nodes, where n is a natural number, the adjacency matrix is an n × n square matrix defined as:

$$a_{ij} = \begin{cases} 1, & (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$

where $a_{ij}$ denotes the element in row i and column j of the adjacency matrix, $v_i$ denotes the i-th node, $v_j$ denotes the j-th node, and E denotes the set of relationship edges between the nodes.
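Building the n × n adjacency matrix from a node list and a relationship-edge set could look like this (the node and edge lists are illustrative, and edges are treated as undirected, which is one possible convention the text does not fix):

```python
import numpy as np

nodes = ["U1", "U2", "L1", "L2", "C", "S"]          # illustrative node list
edges = [("U1", "L1"), ("U2", "L2"), ("L1", "C"),   # illustrative relationship edges
         ("L2", "S"), ("L1", "L2")]

index = {v: i for i, v in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)), dtype=int)
for u, v in edges:
    A[index[u], index[v]] = 1   # a_ij = 1 when (v_i, v_j) is a relationship edge
    A[index[v], index[u]] = 1   # symmetric entry for an undirected reading
```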
The initial vector representation and adjacency matrix of each type of node are input into the trained graph convolution neural network model to update the vector representation of each type of node.
In the present example embodiment, the graph convolutional neural network model may be a relational graph convolutional neural network model such as R-GCN, which can handle multi-relational data features in the network. The initial feature vector of each type of node is taken as the initial vector representation of that type of node, and the vector representation of each node is updated through the adjacency matrix and the trained R-GCN model to obtain the updated node vector representations.
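The update step can be illustrated with a single plain GCN propagation, H' = ReLU(D̂⁻¹ Â H W) with Â = A + I; the embodiment's trained R-GCN additionally keeps a separate weight matrix per relation type, so the following is only a structural sketch with random stand-in weights:

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution step: mean-aggregate neighbours (plus self-loop),
    project with W, apply ReLU. An R-GCN would use one W per relation type."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row normalisation
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

rng = np.random.default_rng(1)
n, d = 6, 4                        # 6 nodes, 4-dimensional initial vectors
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T     # random symmetric adjacency for illustration
H0 = rng.normal(size=(n, d))       # initial vector representations of all nodes
W1 = rng.normal(size=(d, d))       # "trained" weights, random stand-ins here

H1 = gcn_layer(A, H0, W1)          # updated vector representation of each node
```

An R-GCN layer would compute one such propagation per relation type, each with its own adjacency slice and weight matrix, and sum the results before the nonlinearity.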
In some embodiments, the user intent recognition is performed based on a vector representation of each type of node, including: and splicing the vector representations of each type of nodes to obtain the multi-level feature vector.
In this example embodiment, the vector representations of the different types of nodes may be spliced, and the splicing may be performed by direct concatenation in order; for example, if the vector representation of the content nodes is A, the vector representation of the content group nodes is B, and the vector representation of the object nodes is C, the multi-level feature vector is [A B C]. The splicing may be performed in any order, which is not limited in this example.
And classifying the multi-level feature vectors by adopting the trained first neural network model to determine the intention of the target user.
In this example embodiment, the first neural network model may be an artificial neural network, such as a multi-layer perceptron, and the number of layers of the multi-layer perceptron may be determined according to actual situations, which is not limited in this example. For example, a three-layer perceptron may be set to classify the multi-level feature vectors, and output category probability vectors, and an intention category corresponding to a maximum value in the category probability vectors may be used as the target user intention, or a plurality of intention categories with larger probability values may be selected in the category probability vectors as the target user intention, or a tendency analysis may be performed on the selected plurality of intention categories, and the target user intention is determined according to an analysis result, which is not limited in this example.
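A sketch of the splicing and classification, with randomly initialized weights standing in for the trained first neural network model (the dimensions and the number of intent classes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
content_vec, group_vec, object_vec = (rng.normal(size=d) for _ in range(3))

# Splice the per-type representations into one multi-level feature vector [A B C].
multi_level = np.concatenate([content_vec, group_vec, object_vec])   # shape (12,)

def mlp(x, weights):
    """Minimal multi-layer perceptron with softmax output (untrained stand-in)."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

n_intents = 50                                   # e.g. dozens of fine-grained intents
layers = [(rng.normal(size=(12, 16)), np.zeros(16)),
          (rng.normal(size=(16, n_intents)), np.zeros(n_intents))]
probs = mlp(multi_level, layers)                 # category probability vector
target_intent = int(np.argmax(probs))            # index of the predicted intent class
```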
In some embodiments, the method further comprises: and respectively extracting the significant features of the vector representation of each type of nodes to obtain the significant feature vector of each type of nodes.
In the present exemplary embodiment, the significant feature extraction may be performed by using a pooling operation, such as maximum pooling, average pooling, and the like, which is not limited by the present example. After the significant features are extracted, the vector representations of various nodes can be spliced and classified so as to remove redundant information and reduce data processing amount and hardware consumption.
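The salient-feature extraction can be illustrated by pooling over the stacked, updated vectors of one node type before splicing (shapes are illustrative); the same pooling would be applied to each node type:

```python
import numpy as np

rng = np.random.default_rng(3)
content_reprs = rng.normal(size=(6, 4))   # updated vectors of the 6 content nodes

salient_max = content_reprs.max(axis=0)   # max pooling  -> one 4-d salient vector
salient_avg = content_reprs.mean(axis=0)  # average pooling alternative
```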
In some embodiments, determining an initial feature vector corresponding to the user session text comprises: and performing word vectorization on each single sentence of the user conversation text to obtain a single sentence vector.
In this example embodiment, word vectorization may be performed by looking up a pre-trained word vector table or other vectorization means, for example, words of each single sentence may be sequentially encoded by a Bert encoder.
And performing feature extraction on the single-sentence vectors to obtain the single-sentence feature vectors.
In this exemplary embodiment, the convolutional neural network may be used to perform feature extraction on the single sentence vectors, and each single sentence vector may correspond to one convolutional neural network, so as to speed up the feature extraction process and reduce the complexity of a single network.
Based on the single sentence feature vector, an initial feature vector corresponding to the user conversation text is determined.
In this example embodiment, the single sentence feature vectors may be input into the trained second neural network model in conversational order for contextual feature extraction to obtain the initial feature vectors corresponding to the user session text. The second neural network model may be a recurrent neural network, such as a bidirectional long short-term memory network; the contextual relationships between preceding and following single sentences of the user session text are extracted through the trained second neural network model to obtain a context-dependent vector representation of each single-sentence utterance, that is, the initial feature vectors corresponding to the user session text.
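A structural sketch of this sequential encoding chain in PyTorch, with a plain embedding table standing in for the Bert encoder and all sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    """Sketch of the sequential encoding module: word embedding (standing in for a
    Bert encoder), a per-sentence 1-D CNN for local features, and a bidirectional
    LSTM over the sentence sequence for context features. Sizes are illustrative."""
    def __init__(self, vocab=1000, emb=32, conv=16, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, conv, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv, hidden, batch_first=True, bidirectional=True)

    def forward(self, sentences):              # sentences: (num_sents, max_tokens)
        x = self.embed(sentences)              # (num_sents, max_tokens, emb)
        x = self.conv(x.transpose(1, 2))       # (num_sents, conv, max_tokens)
        sent_vecs = x.max(dim=2).values        # max over tokens -> single-sentence features
        ctx, _ = self.bilstm(sent_vecs.unsqueeze(0))  # treat the dialog as one sequence
        return ctx.squeeze(0)                  # (num_sents, 2*hidden) initial feature vectors

encoder = SequenceEncoder()
token_ids = torch.randint(0, 1000, (6, 12))    # 6 single sentences, 12 tokens each
initial_feats = encoder(token_ids)             # one context-aware vector per sentence
print(initial_feats.shape)                     # torch.Size([6, 32])
```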
For example, the implementation process of the user intention determination method of the present disclosure is shown in fig. 6 and may be implemented on a customer service robot. The intention is determined through the trained intention recognition model; if the trained intention recognition model is deployed on the customer service robot, automatic recognition of the user's intention can be realized. The intent recognition model may include a sequential encoding module, a multi-factor dialog graph encoding module and a classification module, where the sequential encoding module may include a word embedding model, a third neural network model and a second neural network model, the multi-factor dialog graph encoding module may include a graph convolutional neural network model and a knowledge graph embedding module, and the classification module includes a first neural network model. Specifically, the following steps can be performed.
The first step is to obtain a user session text.
In this example, the user session text to be analyzed may be obtained from various clients of the user terminal.
And secondly, respectively inputting a word embedding model into each single sentence in the user conversation text for word vectorization to obtain a single sentence vector.
In this example, the single sentence may be word coded using Bert to achieve vectorization.
And thirdly, inputting each single sentence vector into a third neural network model to extract sentence characteristics so as to obtain single sentence characteristic vectors.
In this example, the third neural network model may be a convolutional neural network, with each single sentence vector input to the convolutional neural network model for local feature extraction.
And fourthly, inputting all the single sentence feature vectors into the second neural network model according to the conversation sequence to extract the context feature so as to obtain the initial feature vector corresponding to the user conversation text.
In this example, the second neural network model may be a bidirectional long short-term memory network.
And fifthly, determining keyword information in the user session text according to the domain keyword knowledge graph, and further determining corresponding keyword nodes.
In this example, the domain keyword knowledge graph may be a knowledge graph that is pre-established based on historical information of the related domain.
And sixthly, constructing a multi-factor dialog diagram.
In this example, the content nodes, content group nodes, and object nodes in the user session text may be determined first, and then the relationship edges between the different nodes may be determined; based on the content nodes, content group nodes, object nodes and keyword nodes and the relationship edges between them, a multi-factor dialog graph is constructed, as shown in fig. 5.
And seventhly, determining initial vector representation of each type of nodes in the multi-factor dialog graph based on the initial feature vector corresponding to the user session text.
In this example, the initial feature vector corresponding to a content node may be used as the initial vector representation of that node in the multi-factor dialog graph. The average of the initial vector representations of all content nodes within a content group node may be taken as the initial vector representation of the content group node. The average of the initial vector representations of all content group nodes uttered by the same object may be taken as the initial vector representation of the object node.
And eighthly, mapping entities and relations corresponding to the keyword nodes in the domain keyword knowledge graph into initial vector representations corresponding to the nodes by using a knowledge graph embedding technology.
In this example, feature vectors of the knowledge-graph, i.e., the initial vector representation, may be obtained using TransE, transH, transR, etc.
And step nine, inputting the adjacent matrix of the multi-factor dialog graph and the initial vector representation of each node into a graph convolution neural network so as to update the vector representation of each type of node.
And step ten, extracting the significant features of the updated vector representation of each type of nodes to obtain the significant feature vector of each type of nodes.
In this example, significant feature extraction may be performed using maximum pooling or average pooling.
And step eleven, splicing the significant feature vectors of each type of nodes to obtain a multi-level feature vector.
In this example, the concatenation may be in the order of content nodes, content group nodes, and object nodes.
Step twelve, inputting the multi-level feature vector into the first neural network model to obtain a prediction probability vector, and determining the target user intention according to the prediction probability vector.
In this example, the first neural network model may be a multi-layer perceptron, and the multi-layer perceptron is used to classify the multi-level feature vector to obtain the prediction probability corresponding to each intention category.
The user intention recognition is performed for a conversation between a customer service and a user to improve the service quality of the customer service. Illustratively, for the customer service robot, the user intention needs to be recognized first, and then the user session is responded correspondingly according to the recognized intention, so that accurate intention recognition directly affects the effect of downstream components of the customer service robot, and is very important in the whole user service process. The method disclosed by the invention forms different levels of corresponding nodes by carrying out multi-level information extraction on the user session information, constructs a multi-factor dialogue graph based on different levels of nodes, and digs out complex interaction relations among content nodes (single sentence information), content group nodes (local sentence information), object nodes (speaker information) and keyword nodes (keyword information) in the user session text, thereby improving the accuracy of user intention identification.
According to the method, the context information of the text is extracted through a bidirectional long short-term memory network, the multi-factor dialog graph is graph-encoded to complete the information fusion of the multi-level nodes, and the probability of each intention category is obtained through feature splicing and classification of the multi-level nodes, so that the target user intention is determined; multi-level information in the conversation text can thus be mined, improving recognition accuracy. In addition, the domain keyword information from the domain knowledge graph is also merged in, which can further improve identification precision. The method can realize identification of finer-grained, finer-category user intentions; for example, the following categories can be directly identified: returned goods are not free of freight charges, forgetting to close automatic renewal, being unable to activate membership, consulting the colors/styles of goods, goods cannot be used normally, and the like, instead of stopping at identification of major categories (consultation, returns and the like), thereby greatly improving the accuracy of intention identification.
The method and apparatus can also be applied to conversations between human customer service agents and users, for recommending user intentions to the human agent or for alleviating problems such as information omission or repeated questioning during human customer service handover.
Further, in the present exemplary embodiment, a user intention determination apparatus 700 is also provided. The user intention determining apparatus 700 may be applied to a robot customer service server. Referring to fig. 7, the user intention determining apparatus 700 may include: the system comprises a feature determination module 710, a graph construction module 720, a graph coding module 730 and an intention identification module 740, wherein the feature determination module 710 is used for acquiring a user session text and determining an initial feature vector corresponding to the user session text; a graph construction module 720, configured to construct a multi-factor dialog graph based on the user session text and the initial feature vector; the multi-factor dialog graph comprises content nodes, content group nodes and object nodes; the graph coding module 730 is used for carrying out graph coding on each type of nodes according to the multi-factor dialog graph so as to obtain vector representation of each type of nodes; and the intention identification module 740 is used for performing user intention identification based on the vector representation of each type of node and determining the target user intention.
In an exemplary embodiment of the present disclosure, the user session text includes speaker information; the graph building module 720 may include: the first node determining module, the second node determining module, the third node determining module and the first graph constructing sub-module; the first node determining module may be configured to determine the content node based on a single sentence in the user session text; the second node determining module may be configured to perform sentence division on the user session text based on the speaker information, and determine the content group node based on each divided group of sentences; the third node determining module may be configured to determine the object node based on the speaker information; the first graph building sub-module may be configured to add corresponding relationship edges between the content nodes, the content group nodes, and the object nodes to build the multi-factor conversation graph.
In an exemplary embodiment of the present disclosure, the relationship edges include inclusion relationship edges, sequential relationship edges and object attribute edges, and the first graph building submodule may further be configured to: add an inclusion relationship edge between the content node and the corresponding content group node; add object attribute edges between the content group nodes and the corresponding object nodes and between the content group nodes of the same object, respectively; and add sequential relationship edges between adjacent content group nodes and between adjacent content nodes in the same group, respectively.
In an exemplary embodiment of the present disclosure, the multi-factor dialog further includes a keyword node, and the apparatus 700 further includes: the first determining module, the second determining module and the second graph constructing sub-module; the first determining module may be configured to determine keyword information in the user session text according to a domain keyword knowledge graph; the second determining module can be used for determining a keyword node corresponding to the user session text based on the keyword information; a second graph construction sub-module may be used to construct the multi-factor dialog graph based on the keyword nodes.
In an exemplary embodiment of the disclosure, the second graph construction sub-module may be further configured to add an inclusion relation edge between the keyword node and the corresponding content node to construct the multi-factor dialog graph.
In an exemplary embodiment of the present disclosure, the graph encoding module 730 includes a first determining submodule, a second determining submodule, and an updating submodule; a first determining submodule may be configured to determine an initial vector representation for each type of node in the multi-factor dialog based on the initial feature vector; a second determination submodule may be configured to determine an adjacency matrix for the multi-factor dialog diagram; an update submodule may be configured to input the initial vector representation and the adjacency matrix for each type of node into a trained graph convolution neural network model to update a vector representation for each type of node.
In an exemplary embodiment of the disclosure, the first determining sub-module may be further configured to: determining initial vector representation of the content node according to the initial feature vector corresponding to the content node; determining an initial vector representation of the content group node based on the initial vector representation of the content node; determining an initial vector representation of the object node based on the initial vector representation of the content group node.
In an exemplary embodiment of the present disclosure, the intent recognition module 740 includes a stitching sub-module and a classification sub-module; the splicing submodule can be used for splicing the vector representation of each type of nodes to obtain a multi-level feature vector; the classification submodule can be used for classifying the multi-level feature vectors by adopting the trained first neural network model to determine the target user intention.
In an exemplary embodiment of the present disclosure, the apparatus 700 further includes an extraction module, which may be configured to perform salient feature extraction on the vector representations of each type of nodes, respectively, to obtain salient feature vectors of each type of nodes.
In an exemplary embodiment of the present disclosure, the feature determination module 710 may include: a vectorization submodule, an extraction submodule and a third determination submodule; the vectorization submodule can be used for carrying out word vectorization on each single sentence of the user session text to obtain a single sentence vector; the extraction submodule can be used for extracting the features of the single sentence vectors to obtain single sentence feature vectors; a third determination submodule may be configured to determine an initial feature vector corresponding to the user session text based on the single sentence feature vector.
In an exemplary embodiment of the disclosure, the third determining sub-module may be further configured to input the single sentence feature vector into a trained second neural network model for context feature extraction in a conversational order to obtain an initial feature vector corresponding to the user conversational text.
The specific details of each module or unit in the user intention determining device have been described in detail in the corresponding user intention determining method, and therefore are not described herein again.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods in the above embodiments. For example, the electronic device may implement the various steps shown in fig. 2 to 6, and the like.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
An electronic device 800 according to such an embodiment of the disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 that couples various system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
The storage unit 820 stores program code that may be executed by the processing unit 810, so that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the exemplary method section above in this specification.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 870 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken into multiple steps; all such variations are considered part of this disclosure.
It should be understood that the disclosure disclosed and defined in this specification extends to all alternative combinations of two or more of the individual features mentioned or evident from the text and/or drawings. All of these different combinations constitute various alternative aspects of the present disclosure. The embodiments of this specification illustrate the best mode known for carrying out the disclosure and will enable those skilled in the art to utilize the disclosure.

Claims (14)

1. A method for determining user intent, comprising:
acquiring a user session text, and determining an initial feature vector corresponding to the user session text;
constructing a multi-factor dialog graph based on the user session text and the initial feature vector; the multi-factor dialog graph comprises content nodes, content group nodes and object nodes;
carrying out graph coding on each type of node according to the multi-factor dialog graph to obtain a vector representation of each type of node;
and performing user intention identification based on the vector representation of each type of node, and determining a target user intention.
2. The method of claim 1, wherein the user session text includes speaker information; the constructing of the multi-factor dialog graph comprises the following steps:
determining the content node based on a single sentence in the user session text;
dividing the user session text into groups of sentences based on the speaker information, and determining the content group node based on each divided group of sentences;
determining the object node based on the speaker information;
and adding corresponding relation edges among the content nodes, the content group nodes and the object nodes to construct a multi-factor dialog graph.
3. The method of claim 2, wherein the relation edges comprise inclusion relation edges, sequence relation edges and object attribute edges, and wherein the adding of the corresponding relation edges among the content nodes, the content group nodes and the object nodes comprises the following steps:
adding an inclusion relation edge between the content node and the corresponding content group node;
adding object attribute edges between the content group nodes and the corresponding object nodes and between the content group nodes of the same object respectively;
and adding sequence relation edges between adjacent content group nodes and between adjacent content nodes in the same group respectively.
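As a concrete, non-limiting illustration of claims 2 and 3, the sketch below builds content, content group and object nodes from a list of (speaker, sentence) pairs and wires them with inclusion, sequence and object attribute edges using networkx. The input layout, the node identifiers and the choice of linking consecutive same-speaker groups are assumptions made only for this example; the initial feature vectors of claim 1 would typically be attached to these nodes as attributes.

```python
import networkx as nx

def build_dialog_graph(dialog):
    """dialog: list of (speaker, sentence) pairs in conversation order.
    Returns a graph with content, content_group and object nodes plus
    inclusion, sequence and object_attribute edges (hypothetical layout)."""
    g = nx.Graph()

    # Group consecutive sentences spoken by the same speaker.
    groups = []
    for speaker, sentence in dialog:
        if not groups or groups[-1]["speaker"] != speaker:
            groups.append({"speaker": speaker, "sentences": []})
        groups[-1]["sentences"].append(sentence)

    prev_group = None
    for gi, group in enumerate(groups):
        group_id, obj_id = f"group:{gi}", f"object:{group['speaker']}"
        g.add_node(group_id, ntype="content_group")
        g.add_node(obj_id, ntype="object")
        g.add_edge(group_id, obj_id, etype="object_attribute")          # group <-> its speaker
        if prev_group is not None:
            g.add_edge(prev_group, group_id, etype="sequence")          # adjacent content group nodes
        prev_group = group_id

        prev_content = None
        for si, sentence in enumerate(group["sentences"]):
            content_id = f"content:{gi}:{si}"
            g.add_node(content_id, ntype="content", text=sentence)
            g.add_edge(content_id, group_id, etype="inclusion")         # sentence contained in its group
            if prev_content is not None:
                g.add_edge(prev_content, content_id, etype="sequence")  # adjacent sentences in one group
            prev_content = content_id

    # Object attribute edges between content group nodes of the same speaker
    # (linking consecutive groups here; the claim leaves the exact pairing open).
    by_speaker = {}
    for gi, group in enumerate(groups):
        by_speaker.setdefault(group["speaker"], []).append(f"group:{gi}")
    for ids in by_speaker.values():
        for a, b in zip(ids, ids[1:]):
            g.add_edge(a, b, etype="object_attribute")
    return g


dialog = [("user", "my order has not arrived"), ("user", "order number 123"),
          ("agent", "let me check the logistics"), ("user", "please hurry")]
graph = build_dialog_graph(dialog)
print(graph.number_of_nodes(), graph.number_of_edges())
```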
4. The method of claim 1, wherein the multi-factor dialog graph further comprises keyword nodes, the method further comprising:
determining keyword information in the user session text according to a domain keyword knowledge graph;
determining a keyword node corresponding to the user session text based on the keyword information;
and constructing a multi-factor dialog graph based on the keyword nodes.
5. The method of claim 4, wherein constructing a multi-factor dialog graph based on the keyword nodes comprises:
and adding an inclusion relation edge between the keyword node and the corresponding content node to construct a multi-factor dialog graph.
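A minimal sketch of claims 4 and 5, assuming the domain keyword knowledge graph has already been flattened into a set of keyword strings and that keyword hits are found by simple substring matching; both simplifications, and the identifiers used, are assumptions made only for this example.

```python
def add_keyword_nodes(g, domain_keywords):
    """Attach a keyword node to every content node whose sentence mentions the keyword,
    linked by an inclusion relation edge (hypothetical matching strategy)."""
    for node_id, data in list(g.nodes(data=True)):   # list(): nodes are added while iterating
        if data.get("ntype") != "content":
            continue
        for keyword in domain_keywords:
            if keyword in data.get("text", ""):
                kw_id = f"keyword:{keyword}"
                g.add_node(kw_id, ntype="keyword")
                g.add_edge(kw_id, node_id, etype="inclusion")
    return g

# e.g. add_keyword_nodes(graph, {"order", "logistics"}) on the graph built in the sketch above.
```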
6. The method according to claim 1, wherein the graph coding of each type of node according to the multi-factor dialog graph comprises:
determining an initial vector representation for each type of node in the multi-factor dialog graph based on the initial feature vector;
determining an adjacency matrix for the multi-factor dialog graph;
and inputting the initial vector representation of each type of node and the adjacency matrix into a trained graph convolutional neural network model to update the vector representation of each type of node.
7. The method of claim 6, wherein the determining an initial vector representation for each type of node in the multi-factor dialog graph based on the initial feature vector comprises:
determining an initial vector representation of the content node according to the initial feature vector of the content node;
determining an initial vector representation of the content group node based on the initial vector representation of the content node;
determining an initial vector representation of the object node based on the initial vector representation of the content group node.
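To make claims 6 and 7 concrete, the sketch below shows one plausible shape of the graph coding step: stack all initial node representations into a matrix, normalize the adjacency matrix, and run a two-layer graph convolution. The two-layer depth, the symmetric normalization and the plain-PyTorch implementation are assumptions; the claims only require a trained graph convolutional neural network model. Following claim 7, the rows belonging to content group and object nodes could be initialised as the mean of their child content or group rows before the first layer.

```python
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """Two-layer graph convolution over the whole multi-factor dialog graph
    (all node types stacked into one feature matrix). Illustrative only."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) initial vector representations; adj: (N, N) adjacency with self-loops.
        deg = adj.sum(dim=1).clamp(min=1.0)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        a_hat = d_inv_sqrt @ adj @ d_inv_sqrt    # D^{-1/2} (A + I) D^{-1/2}
        h = torch.relu(self.w1(a_hat @ x))       # first propagation over the graph
        return self.w2(a_hat @ h)                # updated vector representation per node


# Example with 5 nodes and 256-dimensional initial features.
n, dim = 5, 256
adj = torch.eye(n)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0   # a small chain plus self-loops
encoder = GraphEncoder(dim, 128)
updated = encoder(torch.randn(n, dim), adj)
print(updated.shape)   # torch.Size([5, 128])
```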
8. The method according to claim 1, wherein the identifying the user intention based on the vector representation of each type of node comprises:
concatenating the vector representations of each type of node to obtain a multi-level feature vector;
and classifying the multi-level feature vector by using a trained first neural network model to determine the target user intention.
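A sketch of claim 8 under the assumption that the vector representations of each node type are first pooled to a single vector per type before being concatenated into the multi-level feature vector; the feed-forward design of the "first neural network model" and the mean pooling are illustrative choices only.

```python
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    """Hypothetical 'first neural network model': pools each node type, concatenates the
    pooled vectors into a multi-level feature vector and maps it to intent logits."""

    def __init__(self, dim_per_type: int, num_types: int, num_intents: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim_per_type * num_types, dim_per_type),
            nn.ReLU(),
            nn.Linear(dim_per_type, num_intents),
        )

    def forward(self, per_type_vectors):
        # per_type_vectors: list of (num_nodes_of_type, dim) tensors, one entry per node type.
        pooled = [v.mean(dim=0) for v in per_type_vectors]   # one vector per node type (assumption)
        multi_level = torch.cat(pooled, dim=-1)              # the multi-level feature vector
        return self.mlp(multi_level)                         # logits over candidate user intentions


# Example: content, content group and object node vectors of dimension 128, 10 intent classes.
clf = IntentClassifier(dim_per_type=128, num_types=3, num_intents=10)
logits = clf([torch.randn(4, 128), torch.randn(2, 128), torch.randn(2, 128)])
print(logits.shape)   # torch.Size([10])
```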
9. The method of claim 1, further comprising: respectively extracting significant features from the vector representation of each type of node to obtain a significant feature vector of each type of node.
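One possible reading of the significant-feature extraction in claim 9 is a max pooling over the nodes of each type, which keeps the strongest activation per dimension; this interpretation and the function name are assumptions made only for illustration.

```python
import torch

def significant_feature_vector(node_vectors: torch.Tensor) -> torch.Tensor:
    """Hypothetical significant-feature extraction: max-pool the vector representations
    of all nodes of one type into a single significant feature vector."""
    return node_vectors.max(dim=0).values   # (num_nodes, dim) -> (dim,)


print(significant_feature_vector(torch.randn(4, 128)).shape)   # torch.Size([128])
```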
10. The method of any one of claims 1-9, wherein the determining an initial feature vector corresponding to the user session text comprises:
performing word vectorization on each single sentence of the user session text to obtain a single sentence vector;
extracting the features of the single sentence vectors to obtain single sentence feature vectors;
and determining an initial feature vector corresponding to the user session text based on the single sentence feature vector.
11. The method of claim 10, wherein the determining an initial feature vector corresponding to the user session text based on the single sentence feature vector comprises:
and inputting the single sentence feature vectors, in conversation order, into a trained second neural network model for context feature extraction, so as to obtain the initial feature vector corresponding to the user session text.
12. An apparatus for determining user intent, the apparatus comprising:
a feature determination module, configured to acquire a user session text and determine an initial feature vector corresponding to the user session text;
a graph building module, configured to construct a multi-factor dialog graph based on the user session text and the initial feature vector, wherein the multi-factor dialog graph comprises content nodes, content group nodes and object nodes;
a graph coding module, configured to perform graph coding on each type of node according to the multi-factor dialog graph so as to obtain a vector representation of each type of node;
and an intention identification module, configured to perform user intention identification based on the vector representation of each type of node and determine a target user intention.
13. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1-11.
14. An electronic device, comprising: one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any of claims 1-11.
CN202211394327.2A 2022-11-08 2022-11-08 User intention determining method and device, storage medium and electronic equipment Pending CN115905483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211394327.2A CN115905483A (en) 2022-11-08 2022-11-08 User intention determining method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211394327.2A CN115905483A (en) 2022-11-08 2022-11-08 User intention determining method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115905483A (en) 2023-04-04

Family

ID=86479726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211394327.2A Pending CN115905483A (en) 2022-11-08 2022-11-08 User intention determining method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115905483A (en)

Similar Documents

Publication Publication Date Title
WO2021196920A1 (en) Intelligent question answering method, apparatus and device, and computer-readable storage medium
CN110413746A (en) The method and device of intention assessment is carried out to customer problem
CN111368548A (en) Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN110838288A (en) Voice interaction method and system and dialogue equipment
CN111831813B (en) Dialog generation method, dialog generation device, electronic equipment and medium
US20220358292A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN112434492B (en) Text labeling method and device and electronic equipment
CN111414561B (en) Method and device for presenting information
CN112270182A (en) Analysis idea derivation method, system and storage medium based on multi-turn dialogue question-answer mode
US11947920B2 (en) Man-machine dialogue method and system, computer device and medium
CN114090755A (en) Reply sentence determination method and device based on knowledge graph and electronic equipment
CN114330354A (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN116578688A (en) Text processing method, device, equipment and storage medium based on multiple rounds of questions and answers
CN114625842A (en) False comment identification model based on structure attention enhancement mechanism
CN116821372A (en) Knowledge graph-based data processing method and device, electronic equipment and medium
CN114386426B (en) Gold medal speaking skill recommendation method and device based on multivariate semantic fusion
CN113779225A (en) Entity link model training method, entity link method and device
CN114330285B (en) Corpus processing method and device, electronic equipment and computer readable storage medium
CN115905483A (en) User intention determining method and device, storage medium and electronic equipment
CN114519094A (en) Method and device for conversational recommendation based on random state and electronic equipment
CN114138954A (en) User consultation problem recommendation method, system, computer equipment and storage medium
CN113869068A (en) Scene service recommendation method, device, equipment and storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113705194A (en) Extraction method and electronic equipment for short
CN115712706B (en) Method and device for determining action decision based on session

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination