WO2022141875A1 - User intention recognition method and apparatus, device, and computer-readable storage medium

Info

Publication number: WO2022141875A1
Application number: PCT/CN2021/084250
Authority: WO
Inventors: 李志韬; 王健宗; 程宁; 吴天博
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-12-30
Filing date: 2021-03-31
Publication date: 2022-07-07
Also published as: CN112732882A

Abstract

The present invention relates to the technical field of intelligent decision making. Provided are a user intention recognition method and apparatus, a device, and a computer-readable storage medium. The method comprises: obtaining text information corresponding to speech data input by a user, and inputting the text information into a preset intention classification model to obtain output probabilities of a plurality of preset intention labels for representing speech intentions (S101); determining a preset number of candidate intention labels from the plurality of preset intention labels according to the output probability of each preset intention label (S102); determining a dialogue success rate of each intention node in a preset intention knowledge graph (S103); determining a dialogue success rate of each candidate intention label according to the dialogue success rate of each intention node (S104); and determining the candidate intention label having the highest dialogue success rate as an intention label of the speech data input by the user (S105). A target intention label of a user is determined by combining the preset intention classification model with the dialogue success rate of each intention node in the preset intention knowledge graph.

Description

User intent recognition method, apparatus, device, and computer-readable storage medium

This application claims priority to the Chinese patent application No. 202011631344.4, entitled "User Intent Recognition Method, Apparatus, Equipment and Computer-readable Storage Medium", filed with the China Patent Office on December 30, 2020, the entire contents of which are incorporated herein by reference.

Technical field

The present application belongs to the technical field of intelligent decision-making, and in particular, relates to a method, apparatus, device and computer-readable storage medium for identifying user intent.

Background art

A dialogue system is a human-computer interaction system based on natural language, and intent recognition is an important part of it. Intent recognition converts the content of the user's dialogue into a form that the computer can understand. The recognized intent directly affects whether the robot's next sentence is relevant to what the user expressed, and therefore whether the customer is satisfied. Intent recognition mainly includes two parts: intent detection and semantic slot extraction. Traditional intent recognition methods range from the hidden Markov model (HMM), conditional random fields (CRF), and the support vector machine (SVM) to the convolutional neural networks and recurrent neural networks that have become popular in the past decade, both of which have achieved good experimental results. However, the inventors realized that these models only achieve good results with little contextual input and a large-scale training corpus. Moreover, traditional intent recognition uses a classifier to select the intent with the highest probability as the final intent, so in practical application scenarios an occasional speech recognition error may cause an intent recognition error. Therefore, how to accurately determine the user's target intent from the user's voice data is an urgent problem to be solved.

Technical problem

One of the purposes of the embodiments of the present application is to provide a method, apparatus, device, and computer-readable storage medium for identifying user intent, aiming to solve the technical problem of how to accurately determine the user's intent from the voice data input by the user.

Technical solutions

In order to solve the above-mentioned technical problems, the technical solutions adopted in the embodiments of the present application are:

A first aspect of the embodiments of the present application provides a method for identifying user intent, the method comprising:

acquiring text information corresponding to the voice data input by the user, and inputting the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intent;

determining a preset number of candidate intent labels from the multiple preset intent labels according to the output probability of each preset intent label;

determining the dialog success rate of each intent node in a preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

determining the dialog success rate of each candidate intent label according to the dialog success rate of each intent node; and

determining the candidate intent label with the highest dialog success rate as the intent label of the voice data input by the user.

A second aspect of the embodiments of the present application provides a user intent identification device, the user intent identification device includes an acquisition module, a generation module, a screening module, a first determination module, a second determination module, and a third determination module, wherein:

the acquisition module is configured to obtain text information corresponding to the voice data input by the user;

the generation module is configured to input the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intent;

the screening module is configured to determine a preset number of candidate intent labels from the multiple preset intent labels according to the output probability of each preset intent label;

the first determination module is configured to determine the dialog success rate of each intent node in a preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

the second determination module is configured to determine the dialog success rate of each candidate intent label according to the dialog success rate of each intent node; and

the third determination module is configured to determine the candidate intent label with the highest dialog success rate as the intent label of the voice data input by the user.

A third aspect of the embodiments of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the first aspect.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of the first aspect.

A fifth aspect of the embodiments of the present application further provides a computer program product which, when run on a computer device, causes the computer device to implement the steps of the method of the first aspect.

Beneficial effects

Compared with the prior art, the embodiments of the present application include the following advantages:

In this embodiment of the present application, the text information corresponding to the voice data input by the user is acquired and input into the preset intent classification model to obtain the output probabilities of multiple preset intent labels used to represent the voice intent; a preset number of candidate intent labels is then determined from the multiple preset intent labels according to the output probability of each preset intent label; the dialog success rate of each intent node in the preset intent knowledge graph is determined; the dialog success rate of each candidate intent label is determined according to the dialog success rate of each intent node; and the candidate intent label with the highest dialog success rate is determined as the intent label of the voice data input by the user. The method obtains the output probabilities of multiple preset intent labels through the preset intent classification model and, by combining these output probabilities with the dialog success rate of each intent node in the preset intent knowledge graph, can accurately identify the user's target intent label.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings that are used in the description of the embodiments or exemplary technologies. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without any creative effort.

FIG. 1 is a schematic flowchart of steps of a method for identifying user intent provided by an embodiment of the present application;

FIG. 2 is a schematic block diagram of a preset intent classification model provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of sub-steps of the method for identifying user intent in FIG. 1;

FIG. 4 is a schematic diagram of a scenario of a knowledge graph of a preset intent provided by an embodiment of the present application;

FIG. 5 is a schematic block diagram of an apparatus for identifying user intent according to an embodiment of the present application;

FIG. 6 is a schematic block diagram of sub-modules of the device for identifying user intent in FIG. 5;

FIG. 7 is a schematic structural block diagram of a computer device according to an embodiment of the present application.

The realization of the objectives, the functional characteristics, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Embodiments of the present invention

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

The flowcharts shown in the figures are for illustration only, and do not necessarily include all contents and operations/steps, nor do they have to be performed in the order described. For example, some operations/steps can also be decomposed, combined or partially combined, so the actual execution order may be changed according to the actual situation.

Embodiments of the present application provide a method, apparatus, device, and computer-readable storage medium for identifying user intent. Wherein, the method for identifying the user intent can be applied to a terminal device, and the terminal device can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and features in the embodiments may be combined with each other without conflict.

Please refer to FIG. 1, which is a schematic flowchart of steps of a method for identifying user intent provided by an embodiment of the present application.

As shown in FIG. 1, the method for identifying user intent includes steps S101 to S105.

Step S101: Acquire text information corresponding to the voice data input by the user, and input the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels representing the voice intent.

The preset intent classification model is a pre-trained model that includes a plurality of neural network layers. The neural network layers include at least one of the following: a vector extraction layer, a delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a softmax layer.

Specifically, as shown in FIG. 2, the text information corresponding to the voice data input by the user is obtained and input into the vector extraction layer to obtain multiple word vectors. The word vectors are input into the delay neural network layer to extract multiple word vector features, and the word vector features are input into the ReLU layer, which processes them to reduce vanishing gradients and yields semantic label vectors. The word vectors are input to the summation layer to obtain multiple preliminary intent label vectors, the preliminary intent label vectors are input to the recurrent neural network layer to obtain multiple candidate intent label vectors, and the candidate intent label vectors are input to the dropout layer to obtain multiple preset intent label vectors. Finally, the preset intent label vectors are input to the softmax layer to obtain the output probabilities of the preset intent labels.

It should be noted that the vector extraction layer can be selected according to the actual situation; for example, the vector extraction layer may be a Word2Vec model. The delay neural network layer and the ReLU layer are further combined with a residual network layer, which makes the parameter processing of the delay neural network layer and the ReLU layer more accurate. The dropout layer can prevent overfitting of the candidate intent label vectors and improve the accuracy of the output intent label vectors.
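As an illustration of the last layer only, the following is a minimal sketch of how a softmax layer turns a vector of intent scores into output probabilities. The function name `softmax` and the three example scores are assumptions made for illustration; they do not come from the application, and the other layers of the model are not reproduced here.

```python
import math


def softmax(scores):
    """Convert raw intent scores into output probabilities that sum to 1."""
    # Subtract the largest score first so that math.exp cannot overflow.
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three preset intent labels (illustrative values only).
scores = [2.0, 1.0, 0.1]
probabilities = softmax(scores)
```

The label with the largest score receives the largest output probability, and the probabilities over all preset intent labels sum to 1.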

The training method of the preset intent classification model may be: acquiring sample text information, labeling the sample text information according to the category identifier corresponding to the output probability of the preset intent label so as to construct sample data, and iteratively training a neural network model on the sample data until the neural network model converges, thereby obtaining the preset intent classification model. The above-mentioned neural network model may be a convolutional neural network model, a recurrent neural network model, or a recurrent convolutional neural network model. Of course, other network models can also be trained to obtain the preset intent classification model, which is not specifically limited in this application.

In one embodiment, the text information corresponding to the voice data input by the user is acquired, and the text information is input into the preset intent classification model to obtain the output probabilities of multiple preset intent labels. Through the preset intent classification model, the output probability of multiple preset intent labels can be accurately and quickly determined, which greatly improves the user experience.

In one embodiment, the text information corresponding to the voice data input by the user is acquired as follows: the voice input by the user is acquired and input into a preset speech recognition model to obtain the text information. The preset speech recognition model is a pre-trained neural network model, which is not specifically limited in this application. In other embodiments, the text information corresponding to the voice data input by the user is obtained by receiving the text information transmitted by another device. It can be understood that there are other ways of acquiring the text information corresponding to the user's voice data, which are not specifically limited in this application.

Step S102: According to the output probability of each of the preset intent labels, determine a preset number of candidate intent labels from the plurality of preset intent labels.

A candidate intent label is an intent label that is relatively close to the user's intent.

In one embodiment, the plurality of preset intent labels are sorted in descending order of output probability to obtain an intent label queue, and preset intent labels are selected from the intent label queue in sequence until a preset number of candidate intent labels is obtained. The preset number may be set according to the actual situation, which is not specifically limited in this application; for example, the preset number may be set to 5. Arranging the preset intent labels into an intent label queue and then selecting candidate intent labels by probability improves the accuracy and efficiency of selecting candidate intent labels.

Exemplarily, the output probability of preset intent label 1 is 10%, that of preset intent label 2 is 20%, that of preset intent label 3 is 5%, that of preset intent label 4 is 12%, that of preset intent label 5 is 7%, that of preset intent label 6 is 18%, that of preset intent label 7 is 25%, that of preset intent label 8 is 14%, that of preset intent label 9 is 4%, and that of preset intent label 10 is 28%. The 10 preset intent labels are sorted in descending order of probability to obtain the intent label queue [preset intent label 10, preset intent label 7, preset intent label 2, preset intent label 6, preset intent label 8, preset intent label 4, preset intent label 1, preset intent label 5, preset intent label 3, preset intent label 9]. If the preset number is 5, the first 5 labels are selected from the intent label queue, and the candidate intent labels are preset intent label 10, preset intent label 7, preset intent label 2, preset intent label 6, and preset intent label 8. By sorting the preset intent labels, candidate intent labels can be quickly selected.
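The selection in this example can be sketched as a sort followed by a slice. The function name `select_candidates` and the dictionary layout are illustrative assumptions; the probabilities are the ones given in the example.

```python
# Output probabilities from the example, keyed by preset intent label number.
output_probabilities = {
    1: 0.10, 2: 0.20, 3: 0.05, 4: 0.12, 5: 0.07,
    6: 0.18, 7: 0.25, 8: 0.14, 9: 0.04, 10: 0.28,
}


def select_candidates(probabilities, preset_number):
    """Return the preset_number labels with the highest output probability."""
    # Intent label queue: labels sorted in descending order of output probability.
    intent_label_queue = sorted(probabilities, key=probabilities.get, reverse=True)
    # Take labels from the front of the queue until the preset number is reached.
    return intent_label_queue[:preset_number]


candidates = select_candidates(output_probabilities, 5)
print(candidates)  # [10, 7, 2, 6, 8]
```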

Step S103: Determine the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data.

Specifically, to generate the preset intent knowledge graph, all historical dialog data is collected and classified, and related dialog data is linked together to obtain the preset intent knowledge graph.

In one embodiment, the preset intent knowledge graph is obtained, wherein the preset intent knowledge graph is generated according to historical dialog data, and the success rate recorded for each intent node in the preset intent knowledge graph is taken as the dialog success rate of that intent node. The dialog success rate of each intent node can be accurately determined from the preset intent knowledge graph.

In one embodiment, as shown in FIG. 3, step S103 includes sub-steps S1031 to S1034.

Sub-step S1031: Acquire multiple flow paths of each intention node from the preset intention knowledge graph, and count the number of the multiple flow paths, wherein each flow path includes multiple intention nodes.

Exemplarily, as shown in FIG. 4, the preset intent knowledge graph includes intent nodes a, b, c, d, e, f, and g. The flow paths of intent node a are a→b→c, a→b→e→f, a→b→e→a, a→b→e→g, and a→d→g. The flow paths of intent node b are b→c, b→e→f, b→e→g, and b→e→a. The flow paths of intent node e are e→a, e→f, and e→g. The flow path of intent node d is d→g. Intent node c and intent node f have no flow paths.

Exemplarily, as shown in FIG. 4, the number of flow paths of intent node a is 5, that of intent node b is 4, that of intent node c is 0, that of intent node d is 1, that of intent node e is 3, that of intent node f is 0, and that of intent node g is 0.
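The path counts of the FIG. 4 example can be reproduced with the sketch below. It rests on an assumption that the application does not state: the flow paths of a node are taken to be the distinct remainders of the flow paths of intent node a, starting at the node's first occurrence and containing at least two nodes. The names `ROOT_PATHS` and `flow_paths_by_node` are illustrative.

```python
# Flow paths of intent node a in the FIG. 4 example; each character is one intent node.
ROOT_PATHS = ["abc", "abef", "abea", "abeg", "adg"]


def flow_paths_by_node(root_paths):
    """Collect, for every intent node, the distinct flow paths that start at it."""
    paths = {node: set() for path in root_paths for node in path}
    for path in root_paths:
        for node in set(path):
            # Assumption: a node's flow path is the remainder of a recorded path
            # from the node's first occurrence, and it needs at least two nodes.
            suffix = path[path.index(node):]
            if len(suffix) >= 2:
                paths[node].add(suffix)
    return paths


paths_by_node = flow_paths_by_node(ROOT_PATHS)
path_counts = {node: len(paths_by_node[node]) for node in sorted(paths_by_node)}
print(path_counts)  # {'a': 5, 'b': 4, 'c': 0, 'd': 1, 'e': 3, 'f': 0, 'g': 0}
```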

Sub-step S1032: Determine each flow path whose last intent node has the preset attribute identifier to be a successful flow path.

The attribute identifier of the intent node is a keyword set according to the actual situation, for example, keywords such as time, place, and event.

In one embodiment, the attribute identifier of the last intent node in each flow path is checked, and a flow path whose last intent node has the preset attribute identifier is determined to be a successful flow path. For example, if the preset attribute identifier is time and the attribute identifier of the last intent node in a flow path is time, that flow path is a successful flow path.

Sub-step S1033: Count the number of successful flow paths among the multiple flow paths of each intention node.

Specifically, the number of successful flow paths among the multiple flow paths of each intent node is determined according to the preset intent knowledge graph.

Exemplarily, as shown in FIG. 4, when the preset attribute identifier is g, the number of flow paths whose last intent node has the attribute identifier g is queried from the preset intent knowledge graph for each intent node. Among the multiple flow paths of intent node a, the number of flow paths whose last intent node has the attribute identifier g is 2; for intent node b the number is 1; for intent node c, 0; for intent node d, 1; for intent node e, 1; for intent node f, 0; and for intent node g, 1. Therefore, the number of successful flows of intent node a is 2, that of intent node b is 1, that of intent node c is 0, that of intent node d is 1, that of intent node e is 1, that of intent node f is 0, and that of intent node g is 1.
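Counting successful flow paths amounts to checking the last node of each path. The sketch below uses the flow paths of intent nodes a and e from the FIG. 4 example and, as in that example, treats the node name as its attribute identifier; the function name `count_successful` is an illustrative assumption.

```python
def count_successful(flow_paths, preset_attribute):
    """Count the flow paths whose last intent node has the preset attribute identifier."""
    return sum(1 for path in flow_paths if path[-1] == preset_attribute)


# Flow paths of intent nodes a and e in the FIG. 4 example, one list of nodes per path.
paths_of_a = [
    ["a", "b", "c"],
    ["a", "b", "e", "f"],
    ["a", "b", "e", "a"],
    ["a", "b", "e", "g"],
    ["a", "d", "g"],
]
paths_of_e = [["e", "a"], ["e", "f"], ["e", "g"]]

successes_of_a = count_successful(paths_of_a, "g")  # 2
successes_of_e = count_successful(paths_of_e, "g")  # 1
```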

Sub-step S1034: Calculate, for each intent node, the percentage of successful flow paths among all of its flow paths, and use the calculated percentage as the dialog success rate of that intent node.

In one embodiment, the percentage of the number of successful flows of each intent node relative to the corresponding number of paths is determined, and this percentage is taken as the dialog success rate of that intent node in the preset intent knowledge graph.

Exemplarily, the numbers of successful flows of intent nodes a, b, c, d, e, f, and g are 2, 1, 0, 1, 1, 0, and 1, respectively, and the numbers of flow paths of intent nodes a, b, c, d, e, f, and g are 5, 4, 0, 1, 3, 0, and 0, respectively. The number of successful flows accounts for 40% of the corresponding number of paths for intent node a, 25% for intent node b, 0% for intent node c, 100% for intent node d, 33.3% for intent node e, 0% for intent node f, and 100% for intent node g. Accordingly, the dialog success rate of intent node a is determined to be 40%, that of intent node b 25%, that of intent node c 0%, that of intent node d 100%, that of intent node e 33.3%, that of intent node f 0%, and that of intent node g 100%.
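The percentages above can be checked with a short calculation. The application does not say how a node with no flow paths is handled, so the sketch below makes an explicit assumption: such a node gets 100% if it is itself counted as a successful flow (intent node g, counted as 1 in sub-step S1033) and 0% otherwise (intent nodes c and f). The function name `dialog_success_rate` is illustrative.

```python
# Counts from the FIG. 4 example.
path_counts = {"a": 5, "b": 4, "c": 0, "d": 1, "e": 3, "f": 0, "g": 0}
success_counts = {"a": 2, "b": 1, "c": 0, "d": 1, "e": 1, "f": 0, "g": 1}


def dialog_success_rate(successes, paths):
    """Share of successful flow paths among all flow paths of one intent node."""
    if paths == 0:
        # Assumption (not stated in the text): a node without flow paths gets
        # 100% if it is itself counted as a success, and 0% otherwise.
        return 1.0 if successes > 0 else 0.0
    return successes / paths


rates = {
    node: dialog_success_rate(success_counts[node], path_counts[node])
    for node in path_counts
}
for node, rate in rates.items():
    print(f"{node}: {rate:.1%}")
```

Under this assumption the printed rates match the example: 40%, 25%, 0%, 100%, 33.3%, 0%, and 100% for intent nodes a to g.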

Step S104: Determine the dialog success rate of each candidate intent tag according to the dialog success rate of each intent node.

The dialog success rate of a candidate intent label lies between 0 and 100%; the larger the value, the more likely a dialog based on that candidate intent label is to succeed.

In one embodiment, the dialog success rate of each intent node is mapped to the preset intent label corresponding to that intent node to obtain the dialog success rate of each preset intent label; each candidate intent label is then matched with its preset intent label, and the dialog success rate of the matched preset intent label is used as the dialog success rate of the candidate intent label.
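The two mappings can be sketched with plain dictionaries. The node success rates are those of the FIG. 4 example; the correspondence between intent nodes and preset intent labels (`node_to_label`) and the label names are hypothetical, because the application does not give one. Candidates without a corresponding intent node are simply skipped here, which is also an assumption.

```python
# Dialog success rates of the intent nodes from the FIG. 4 example.
node_success_rates = {"a": 0.40, "b": 0.25, "c": 0.0, "d": 1.0, "e": 1 / 3, "f": 0.0, "g": 1.0}

# Hypothetical correspondence between intent nodes and preset intent labels.
node_to_label = {"a": "label_10", "b": "label_7", "d": "label_2", "e": "label_6", "g": "label_8"}


def candidate_success_rates(candidates, node_rates, node_label_map):
    """Look up the dialog success rate of each candidate intent label."""
    # First mapping: intent node -> preset intent label.
    label_rates = {
        node_label_map[node]: rate
        for node, rate in node_rates.items()
        if node in node_label_map
    }
    # Second mapping: candidate intent label -> matching preset intent label.
    return {label: label_rates[label] for label in candidates if label in label_rates}


rates_by_candidate = candidate_success_rates(
    ["label_10", "label_7", "label_2"], node_success_rates, node_to_label
)
```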

Step S105: Determine the candidate intent tag with the highest dialogue success rate as the intent tag of the voice data input by the user.

The intent label determined in this step is the target intent label, that is, the intent label closest to the user's intent.

In one embodiment, the candidate intent labels are sorted according to the dialog success rate of each candidate intent label to obtain a candidate intent label queue, and the candidate intent label with the highest dialog success rate is selected from the candidate intent label queue as the user's target intent label. Sorting the candidate intent labels by success rate and selecting the one with the highest success rate as the user's target intent label improves the accuracy of determining the user's intent.

Exemplarily, the dialog success rate of candidate intent label 1 is 50%, that of candidate intent label 2 is 25%, that of candidate intent label 3 is 15%, that of candidate intent label 4 is 60%, and that of candidate intent label 5 is 40%. Candidate intent labels 1 to 5 are sorted according to their dialog success rates to obtain the candidate intent label queue [candidate intent label 4, candidate intent label 1, candidate intent label 5, candidate intent label 2, candidate intent label 3], and candidate intent label 4, which has the highest dialog success rate, is selected from the candidate intent label queue as the user's target intent label.
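The final selection in this example is again a descending sort, this time on dialog success rate. The variable names below are illustrative; the rates are the ones given in the example.

```python
# Dialog success rates from the example, keyed by candidate intent label number.
candidate_success_rates = {1: 0.50, 2: 0.25, 3: 0.15, 4: 0.60, 5: 0.40}

# Candidate intent label queue: labels sorted in descending order of success rate.
candidate_queue = sorted(
    candidate_success_rates, key=candidate_success_rates.get, reverse=True
)
# The head of the queue is the target intent label.
target_intent_label = candidate_queue[0]

print(candidate_queue)      # [4, 1, 5, 2, 3]
print(target_intent_label)  # 4
```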

In the user intent recognition method provided by the above embodiment, the text information corresponding to the voice data input by the user is acquired and input into the preset intent classification model to obtain the output probabilities of multiple preset intent labels used to represent the voice intent; a preset number of candidate intent labels is then determined from the multiple preset intent labels according to the output probability of each preset intent label; the dialog success rate of each intent node in the preset intent knowledge graph is determined; the dialog success rate of each candidate intent label is determined according to the dialog success rate of each intent node; and the candidate intent label with the highest dialog success rate is determined as the intent label of the voice data input by the user. The method obtains the output probabilities of multiple preset intent labels through the preset intent classification model and, by combining these output probabilities with the dialog success rate of each intent node in the preset intent knowledge graph, can accurately identify the user's target intent label.
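Steps S102 to S105 can be put together in a few lines. The sketch below assumes that the output probabilities (step S101) and the per-label dialog success rates (steps S103 and S104) have already been computed. The function name `recognize_intent`, the intent label names, and all numbers are hypothetical, and treating a label with no known success rate as 0% is an assumption, not something the application specifies.

```python
def recognize_intent(output_probabilities, label_success_rates, preset_number):
    """Pick the candidate intent label with the highest dialog success rate."""
    # S102: keep the preset number of labels with the highest output probability.
    queue = sorted(output_probabilities, key=output_probabilities.get, reverse=True)
    candidates = queue[:preset_number]
    # S104: look up each candidate's dialog success rate
    # (assumption: a label with no known rate is treated as 0%).
    rates = {label: label_success_rates.get(label, 0.0) for label in candidates}
    # S105: the candidate with the highest dialog success rate is the target intent.
    return max(candidates, key=rates.get)


# Hypothetical inputs for illustration only.
probabilities = {"check_balance": 0.45, "transfer": 0.35, "cancel_card": 0.15, "other": 0.05}
success_rates = {"check_balance": 0.40, "transfer": 0.75, "cancel_card": 0.90}

intent = recognize_intent(probabilities, success_rates, 2)
print(intent)  # transfer
```

In this illustration the label with the highest output probability ("check_balance") is not the one selected, because among the two candidates "transfer" has the higher dialog success rate. The label "cancel_card" has an even higher success rate but is never considered, since it is not among the candidates.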

Please refer to FIG. 5. FIG. 5 is a schematic block diagram of an apparatus for recognizing user intent according to an embodiment of the present application.

As shown in FIG. 5, the user intent recognition apparatus 200 includes an acquisition module 210, a generation module 220, a screening module 230, a first determination module 240, a second determination module 250, and a third determination module 260, wherein:

The acquisition module 210 is configured to obtain text information corresponding to the voice data input by the user.

The generating module 220 is configured to input the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent the voice intent.

The screening module 230 is configured to determine a preset number of candidate intent tags from a plurality of the preset intent tags according to the output probability of each preset intent tag.

The first determining module 240 is configured to determine the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data.

The second determination module 250 is configured to determine the dialogue success rate of each of the candidate intent tags according to the dialogue success rate of each of the intent nodes.

The third determining module 260 is configured to determine the candidate intent label with the highest dialogue success rate as the intent label of the voice data input by the user.
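The cooperation of the modules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the present application: the function `recognize_intent`, the stand-in `toy_classifier`, the label names, and the success-rate table are all hypothetical, and treating a label with no recorded success rate as 0.0 is an assumption.

```python
import heapq
from typing import Callable, Dict


def recognize_intent(
    text: str,
    classify: Callable[[str], Dict[str, float]],
    label_success_rate: Dict[str, float],
    num_candidates: int = 3,
) -> str:
    """Return the candidate intent label with the highest dialog success rate."""
    # Generation module: output probability for every preset intent label.
    probabilities = classify(text)
    # Screening module: keep the labels with the highest output probabilities.
    candidates = heapq.nlargest(num_candidates, probabilities, key=probabilities.get)
    # Third determination module: pick the candidate with the best success rate.
    # Labels missing from the table are treated as 0.0 (an assumption).
    return max(candidates, key=lambda label: label_success_rate.get(label, 0.0))


def toy_classifier(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for the preset intent classification model.
    return {"check_balance": 0.50, "report_loss": 0.30, "transfer": 0.15, "chitchat": 0.05}


success = {"check_balance": 0.62, "report_loss": 0.81, "transfer": 0.40, "chitchat": 0.95}
result = recognize_intent("I lost my card", toy_classifier, success, num_candidates=2)
```

With two candidates, "chitchat" is screened out by its low output probability even though its success rate is the highest in the table, so the result is chosen only among "check_balance" and "report_loss".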

In one embodiment, the screening module 230 further includes the following sub-modules:

The sorting sub-module is used for sorting a plurality of the preset intent tags in descending order of output probability to obtain an intent tag queue.

A selection sub-module, configured to sequentially select the preset intent tags from the intent tag queue until a preset number of the candidate intent tags are obtained.
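The sorting and selection sub-modules can be illustrated with a queue. The sketch below is a minimal example under assumptions: `select_candidates` and the sample probabilities are hypothetical names and values, and the loop also stops early if the queue holds fewer labels than the preset number.

```python
from collections import deque
from typing import Dict, List


def select_candidates(probabilities: Dict[str, float], preset_number: int) -> List[str]:
    # Sorting sub-module: order labels by descending output probability.
    queue = deque(sorted(probabilities, key=probabilities.get, reverse=True))
    # Selection sub-module: take labels from the head of the queue in turn
    # until the preset number of candidates is reached (or the queue is empty).
    candidates: List[str] = []
    while queue and len(candidates) < preset_number:
        candidates.append(queue.popleft())
    return candidates


probs = {"transfer": 0.15, "check_balance": 0.50, "chitchat": 0.05, "report_loss": 0.30}
top_two = select_candidates(probs, 2)
```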

In one embodiment, the first determining module 240 further includes the following sub-modules:

The knowledge graph acquisition sub-module is used to acquire the preset intent knowledge graph.

A setting sub-module, configured to take the intent node success rate corresponding to each intent node in the preset intent knowledge graph as the dialog success rate of that intent node.

In one embodiment, as shown in FIG. 6, the first determination module 240 includes an acquisition sub-module 241, a statistics module 242, a determination sub-module 243 and a calculation module 244, wherein:

The acquisition sub-module 241 is configured to acquire multiple flow paths of each intent node from the preset intent knowledge graph.

The statistics module 242 is configured to count the number of the multiple flow paths, wherein each flow path includes multiple intent nodes.

The determining sub-module 243 is configured to determine each flow path in which the attribute identifier of the last intent node is the preset attribute identifier as a successful flow path.

The statistics module 242 is further configured to count the number of successful flow paths among the multiple flow paths of each intent node.

The calculation module 244 is configured to calculate the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and to use the calculated percentage as the dialog success rate of each intent node.
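The path-based calculation can be sketched as follows. The data layout is an assumption made for illustration: each flow path is a list of `IntentNode` objects, the flow paths "of" a node are taken to be the paths that pass through it, and the attribute value `"success"` stands in for the preset attribute identifier. None of these names come from the present application.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class IntentNode:
    name: str
    attribute: str  # hypothetical attribute identifier, e.g. "success" or "fail"


def node_success_rates(
    paths: Sequence[Sequence[IntentNode]], success_attribute: str = "success"
) -> Dict[str, float]:
    total: Dict[str, int] = {}
    successful: Dict[str, int] = {}
    for path in paths:
        if not path:
            continue
        # A path is successful when its last node carries the preset attribute.
        is_success = path[-1].attribute == success_attribute
        # Count each node once per path, even if the dialog revisits it.
        for name in {node.name for node in path}:
            total[name] = total.get(name, 0) + 1
            if is_success:
                successful[name] = successful.get(name, 0) + 1
    # Success rate as a percentage of all flow paths through the node.
    return {name: 100.0 * successful.get(name, 0) / total[name] for name in total}


greet = IntentNode("greet", "normal")
ask = IntentNode("ask_balance", "normal")
done = IntentNode("resolved", "success")
hang = IntentNode("hang_up", "fail")
paths = [[greet, ask, done], [greet, ask, hang], [greet, done], [greet, hang]]
rates = node_success_rates(paths)
```

In this toy graph, "greet" lies on four paths of which two end in a success node, so its dialog success rate is 50 percent.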

In one embodiment, the second determining module 250 further includes the following sub-modules:

The first mapping sub-module is configured to map the dialogue success rate of each of the intent nodes and the preset intent labels corresponding to each of the intent nodes to obtain the dialog success rate of each preset intent label.

The second mapping submodule is configured to map the candidate intent tag with the preset intent tag, and use the mapped dialog success rate of the preset intent tag as the dialog success rate of the candidate intent tag.
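The two mapping steps amount to two dictionary lookups. The sketch below assumes, purely for illustration, that every intent node corresponds to exactly one preset intent label; `candidate_success_rates`, the node identifiers, and the rates are hypothetical.

```python
from typing import Dict, Iterable


def candidate_success_rates(
    node_rates: Dict[str, float],
    node_to_label: Dict[str, str],
    candidates: Iterable[str],
) -> Dict[str, float]:
    # First mapping: carry each node's success rate over to its preset label.
    label_rates = {
        node_to_label[node]: rate
        for node, rate in node_rates.items()
        if node in node_to_label
    }
    # Second mapping: look up each candidate label among the preset labels.
    return {label: label_rates[label] for label in candidates if label in label_rates}


node_rates = {"n_balance": 62.0, "n_loss": 81.0, "n_transfer": 40.0}
node_to_label = {
    "n_balance": "check_balance",
    "n_loss": "report_loss",
    "n_transfer": "transfer",
}
rates = candidate_success_rates(node_rates, node_to_label, ["check_balance", "report_loss"])
best = max(rates, key=rates.get)
```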

In one embodiment, the preset intent classification model includes a vector extraction layer, a time-delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a Softmax layer; the generation module 220 further includes the following sub-modules:

The first input sub-module is used for inputting the text information to the vector extraction layer to obtain a plurality of word vectors.

The second input sub-module is used for inputting a plurality of the word vectors to the time delay neural network layer, and extracting a plurality of word vector features.

The third input sub-module is used for inputting a plurality of the word vector features to the ReLU layer to obtain a semantic label vector.

The fourth input sub-module is configured to input the semantic label vector and a plurality of the word vectors into the summation layer to obtain a plurality of preliminary intent label vectors.

The fifth input sub-module is used for inputting a plurality of the preliminary intent label vectors to the recurrent neural network layer to obtain a plurality of candidate intent label vectors.

The seventh input sub-module is used for inputting a plurality of the candidate intent label vectors to the dropout layer to obtain a plurality of preset intent label vectors.

The eighth input sub-module is used for inputting a plurality of the intent label vectors to the Softmax layer to obtain the output probability of the preset intent label.
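The final step, turning per-label scores into output probabilities, can be illustrated with the standard softmax function. This sketch covers only that last layer; the scores and label names are made-up inputs, and the earlier layers of the model are not reproduced here.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(scores.values())
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    # Normalize so that the probabilities over all labels sum to 1.
    return {label: value / total for label, value in exps.items()}


probabilities = softmax({"check_balance": 2.0, "report_loss": 1.0, "transfer": 0.1})
```

The resulting probabilities preserve the ordering of the input scores and sum to 1, which is what allows them to be ranked when candidate intent labels are selected.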

The apparatuses provided by the above embodiments may be implemented in the form of a computer program, and the computer program may be executed on the computer device shown in FIG. 7.

Please refer to FIG. 7, which is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.

As shown in FIG. 7, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.

The nonvolatile storage medium can store operating systems and computer programs. The computer program includes program instructions that, when executed, can cause the processor to execute any method for identifying user intent.

The network interface is used for communication. Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

It should be understood that the bus is, for example, an I2C (Inter-Integrated Circuit) bus; the memory may be a Flash chip, a read-only memory (ROM), a magnetic disk, an optical disk, a USB flash drive, a removable hard disk, or the like; and the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

In one embodiment, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the steps of the user intent recognition method described in the foregoing embodiments.

In one embodiment, the processor, when executing the computer program, implements:

Sorting a plurality of the preset intent tags according to the output probability in descending order to obtain an intent tag queue;

The preset intent tags are sequentially selected from the intent tag queue until a preset number of the candidate intent tags are obtained.

obtaining the preset intent knowledge graph;

The success rate of the intent node corresponding to each intent node in the preset intent knowledge graph is taken as the dialog success rate of each intent node.

Obtaining multiple flow paths of each intent node from the preset intent knowledge graph, and counting the number of the multiple flow paths, wherein each flow path includes multiple intent nodes;

Determining each flow path in which the attribute identifier of the last intent node is the preset attribute identifier as a successful flow path;

Counting the number of successful flow paths among the multiple flow paths of each intent node;

Calculating the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and using the calculated percentage as the dialog success rate of each intent node.

Mapping the dialogue success rate of each of the intent nodes and the preset intent labels corresponding to each of the intent nodes to obtain the dialog success rate of each preset intent label;

The candidate intent tag is mapped with the preset intent tag, and the dialog success rate of the mapped preset intent tag is used as the dialog success rate of the candidate intent tag.

In one embodiment, the preset intent classification model includes a vector extraction layer, a time-delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a Softmax layer; the processor, when executing the computer program, implements:

Inputting the text information to the vector extraction layer to obtain a plurality of word vectors;

Inputting a plurality of the word vectors into the time-delay neural network layer, and extracting a plurality of word vector features;

Inputting a plurality of the word vector features to the ReLU layer to obtain a semantic label vector;

Inputting the semantic label vector and a plurality of the word vectors to the summation layer to obtain a plurality of preliminary intent label vectors;

inputting a plurality of the preliminary intent label vectors into the recurrent neural network layer to obtain a plurality of candidate intent label vectors;

Inputting a plurality of the candidate intent label vectors to the dropout layer to obtain a plurality of preset intent label vectors;

A plurality of the intent label vectors are input to the Softmax layer to obtain the output probability of the preset intent label.

It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the computer device described above, reference may be made to the corresponding process in the foregoing embodiment of the method for identifying user intent, which is not repeated here.

Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, the computer program including program instructions; for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the user intent recognition method of the present application. Specifically:

A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements:

In one embodiment, the computer program, when executed by the processor, further implements:

obtaining the preset intent knowledge graph;

In one embodiment, the preset intent classification model includes a vector extraction layer, a time-delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a Softmax layer; the computer program, when executed by the processor, further implements:

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like.

It should be understood that the terms used in the specification of the present application herein are for the purpose of describing particular embodiments only and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural unless the context clearly dictates otherwise.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, herein, the terms "comprising", "including" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or system comprising a series of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or system that includes the element.

The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments. The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A method for identifying user intent, wherein the method comprises:

acquiring text information corresponding to the voice data input by the user, and inputting the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intent;

According to the output probability of each of the preset intent tags, determine a preset number of candidate intent tags from a plurality of the preset intent tags;

determining the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

Determine the dialog success rate of each of the candidate intent labels according to the dialog success rate of each of the intent nodes;

The candidate intent label with the highest dialogue success rate is determined as the intent label of the voice data input by the user.
The method for identifying user intent according to claim 1, wherein determining a preset number of candidate intent tags from a plurality of preset intent tags according to an output probability of each preset intent tag, comprising:

Sorting a plurality of the preset intent tags according to the output probability in descending order to obtain an intent tag queue;

The preset intent tags are sequentially selected from the intent tag queue until a preset number of the candidate intent tags are obtained.
The method for identifying user intent according to claim 1, wherein the determining the dialog success rate of each intent node in the preset intent knowledge graph comprises:

obtaining the preset intent knowledge graph;

The success rate of the intent node corresponding to each intent node in the preset intent knowledge graph is taken as the dialog success rate of each intent node.
The method for identifying user intent according to claim 1, wherein the determining the dialog success rate of each intent node in the preset intent knowledge graph comprises:

obtaining multiple flow paths of each intent node from the preset intent knowledge graph, and counting the number of the multiple flow paths, wherein each flow path includes multiple intent nodes;

determining each flow path in which the attribute identifier of the last intent node is the preset attribute identifier as a successful flow path;

counting the number of successful flow paths among the multiple flow paths of each intent node;

calculating the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and using the calculated percentage as the dialog success rate of each intent node.
The method for identifying user intent according to any one of claims 1-4, wherein the determining the dialog success rate of each of the candidate intent labels according to the dialog success rate of each of the intent nodes comprises:

Mapping the dialogue success rate of each of the intent nodes and the preset intent labels corresponding to each of the intent nodes to obtain the dialog success rate of each preset intent label;

The candidate intent tag is mapped with the preset intent tag, and the dialog success rate of the mapped preset intent tag is used as the dialog success rate of the candidate intent tag.
The user intent identification method according to any one of claims 1-4, wherein the preset intent classification model comprises a vector extraction layer, a time-delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a Softmax layer; and inputting the text information into the preset intent classification model to obtain the output probabilities of the multiple preset intent labels comprises:

Inputting the text information to the vector extraction layer to obtain a plurality of word vectors;

inputting a plurality of the word vectors into the time-delay neural network layer, and extracting a plurality of word vector features;

Inputting a plurality of the word vector features to the ReLU layer to obtain a semantic label vector;

Inputting the semantic label vector and a plurality of the word vectors to the summation layer to obtain a plurality of preliminary intent label vectors;

Inputting a plurality of the preliminary intent label vectors to the recurrent neural network layer to obtain a plurality of candidate intent label vectors;

Inputting a plurality of the candidate intent label vectors to the dropout layer to obtain a plurality of preset intent label vectors;

A plurality of the intent label vectors are input to the Softmax layer to obtain the output probability of the preset intent label.
A user intention identification device, wherein the user intention identification device includes an acquisition module, a generation module, a screening module, a first determination module, a second determination module and a third determination module, wherein:

The obtaining module is used to obtain text information corresponding to the voice data input by the user;

The generating module is configured to input the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intents;

the screening module, configured to determine a preset number of candidate intent tags from a plurality of the preset intent tags according to the output probability of each of the preset intent tags;

The first determining module is configured to determine the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

the second determining module, configured to determine the dialogue success rate of each of the candidate intent tags according to the dialogue success rate of each of the intent nodes;

The third determining module is configured to determine the candidate intent label with the highest dialogue success rate as the intent label of the voice data input by the user.
The device for identifying user intentions as claimed in claim 7, wherein the screening module further comprises the following sub-modules:

a sorting sub-module, configured to sort a plurality of the preset intent tags according to the descending output probability to obtain an intent tag queue;

A selection sub-module, configured to sequentially select the preset intent tags from the intent tag queue until a preset number of the candidate intent tags are obtained.
The device for identifying user intent according to claim 7, wherein the first determining module further comprises the following sub-modules:

a knowledge graph acquisition sub-module for acquiring the preset intent knowledge graph;

a setting sub-module, configured to take the intent node success rate corresponding to each intent node in the preset intent knowledge graph as the dialog success rate of that intent node.
The user intention identification device according to claim 7, wherein the first determination module comprises an acquisition sub-module, a statistics module, a determination sub-module and a calculation module, wherein:

The obtaining submodule is used to obtain multiple flow paths of each intention node from the preset intention knowledge graph;

the statistics module, configured to count the number of the multiple flow paths, wherein each flow path includes multiple intent nodes;

The determining submodule is used to determine the flow path whose attribute identifier of the last intent node is the preset attribute identifier as the successful flow path;

the statistics module is further configured to count the number of successful flow paths among the multiple flow paths of each intent node;

the calculation module is configured to calculate the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and to use the calculated percentage as the dialog success rate of each intent node.
The user intention identification device according to any one of claims 7-11, wherein the second determining module further comprises the following sub-modules:

a first mapping submodule, configured to map the dialogue success rate of each of the intent nodes and the preset intent labels corresponding to each of the intent nodes to obtain the dialog success rate of each preset intent label;

The second mapping submodule is configured to map the candidate intent tag with the preset intent tag, and use the mapped dialog success rate of the preset intent tag as the dialog success rate of the candidate intent tag.
The user intent recognition device according to any one of claims 7-11, wherein the preset intent classification model comprises a vector extraction layer, a time-delay neural network layer, a ReLU layer, a residual network layer, a summation layer, a recurrent neural network layer, a dropout layer, and a Softmax layer; the generation module further comprises the following sub-modules:

a first input submodule, for inputting the text information into the vector extraction layer to obtain a plurality of word vectors;

The second input sub-module is used for inputting a plurality of the word vectors to the time delay neural network layer, and extracting a plurality of word vector features;

The third input sub-module is used to input a plurality of the word vector features to the ReLU layer to obtain a semantic label vector;

a fourth input sub-module, configured to input the semantic label vector and a plurality of the word vectors into the summation layer to obtain a plurality of preliminary intent label vectors;

a fifth input sub-module, configured to input a plurality of the preliminary intent label vectors into the recurrent neural network layer to obtain a plurality of candidate intent label vectors;

a seventh input sub-module, configured to input a plurality of the candidate intent label vectors to the dropout layer to obtain a plurality of preset intent label vectors;

The eighth input sub-module is used for inputting a plurality of the intent label vectors to the Softmax layer to obtain the output probability of the preset intent label.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the computer program to achieve:

acquiring text information corresponding to the voice data input by the user, and inputting the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intent;

According to the output probability of each of the preset intent tags, determine a preset number of candidate intent tags from a plurality of the preset intent tags;

determining the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

Determine the dialog success rate of each of the candidate intent labels according to the dialog success rate of each of the intent nodes;

The candidate intent label with the highest dialogue success rate is determined as the intent label of the voice data input by the user.
The computer device of claim 13, wherein the processor, when executing the computer program, further implements:

Sorting a plurality of the preset intent tags according to the output probability in descending order to obtain an intent tag queue;

The preset intent tags are sequentially selected from the intent tag queue until a preset number of the candidate intent tags are obtained.
The computer device of claim 13, wherein the processor, when executing the computer program, further implements:

obtaining the preset intent knowledge graph;

The success rate of the intent node corresponding to each intent node in the preset intent knowledge graph is taken as the dialog success rate of each intent node.
The computer device of claim 13, wherein the processor, when executing the computer program, further implements:

obtaining multiple flow paths of each intent node from the preset intent knowledge graph, and counting the number of the multiple flow paths, wherein each flow path includes multiple intent nodes;

determining each flow path in which the attribute identifier of the last intent node is the preset attribute identifier as a successful flow path;

counting the number of successful flow paths among the multiple flow paths of each intent node;

calculating the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and using the calculated percentage as the dialog success rate of each intent node.
A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to realize:

acquiring text information corresponding to the voice data input by the user, and inputting the text information into a preset intent classification model to obtain output probabilities of multiple preset intent labels used to represent voice intent;

According to the output probability of each of the preset intent tags, determine a preset number of candidate intent tags from a plurality of the preset intent tags;

determining the dialog success rate of each intent node in the preset intent knowledge graph, wherein the preset intent knowledge graph is generated according to historical dialog data;

Determine the dialog success rate of each of the candidate intent labels according to the dialog success rate of each of the intent nodes;

The candidate intent label with the highest dialogue success rate is determined as the intent label of the voice data input by the user.
The computer-readable storage medium of claim 17, wherein the computer program, when executed by the processor, further implements:

Sorting a plurality of the preset intent tags according to the output probability in descending order to obtain an intent tag queue;

The preset intent tags are sequentially selected from the intent tag queue until a preset number of the candidate intent tags are obtained.
The computer-readable storage medium of claim 17, wherein the computer program, when executed by the processor, further implements:

obtaining the preset intent knowledge graph;

The success rate of the intent node corresponding to each intent node in the preset intent knowledge graph is taken as the dialog success rate of each intent node.
The computer-readable storage medium of claim 17, wherein the computer program, when executed by the processor, further implements:

obtaining multiple flow paths of each intent node from the preset intent knowledge graph, and counting the number of the multiple flow paths, wherein each flow path includes multiple intent nodes;

determining each flow path in which the attribute identifier of the last intent node is the preset attribute identifier as a successful flow path;

counting the number of successful flow paths among the multiple flow paths of each intent node;

calculating the ratio, as a percentage, of the number of successful flow paths among the multiple flow paths of each intent node to the number of all flow paths, and using the calculated percentage as the dialog success rate of each intent node.