WO2021042902A1

WO2021042902A1 - User intention identification method in multi-round dialogue and related device

Info

Publication number: WO2021042902A1
Application number: PCT/CN2020/103922
Authority: WO
Inventors: 陈涛; 张毅
Original assignee: 深圳Tcl数字技术有限公司
Priority date: 2019-09-04
Filing date: 2020-07-24
Publication date: 2021-03-11
Also published as: CN112445902A

Abstract

Disclosed in the present disclosure are a user intention identification method in multi-round dialogue and a related device. Said method comprises: acquiring a first dialogue state of previous information and a second dialogue state of following information in former and latter rounds of dialogues in multi-round dialogue; and calculating a first correlation between the first dialogue state and the second dialogue state, and determining, according to the magnitude of the first correlation, whether to perform user intention identification of a single-round dialogue. As the correlation of information between former and latter rounds of dialogues is fully considered in the present embodiment, if information of the two is greatly different, a user intention analysis is performed by using the latter round of dialogue as an independent piece of information, so that a more accurate analysis result can be obtained, thereby providing a basis for performing accurate feedback on information sent by a user.

Description

Method and related equipment for recognizing user's intention in multiple rounds of dialogue

Priority

This disclosure claims priority to Chinese patent application No. 2019108334041, filed with the Chinese Patent Office on September 4, 2019 and entitled "A method for identifying user intentions in multiple rounds of dialogue and related equipment", the entire contents of which are incorporated into the present disclosure by reference.

Technical field

The present disclosure relates to the field of voice interaction technology, and in particular to a method and related equipment for recognizing user intentions in multiple rounds of dialogue.

Background technique

The natural language analysis technology in the prior art is generally data-driven and based on machine learning. The dialogue technology based on natural language analysis is divided into single-round dialogue and multi-round dialogue.

Multi-round dialogue is a way in which the user's intention is initially clarified in a human-machine dialogue, and then necessary information is obtained to finally obtain a clear user instruction. Multiple rounds of dialogue correspond to the handling of one thing. The multi-round dialogue system has modules such as language understanding, language generation, dialogue management, and knowledge base. Dialogue management also includes state tracking and action selection sub-modules. It can be considered that a multi-round dialogue system is an extension of a single-round dialogue based on analysis. In each round of dialogue, the semantics of the speech is understood and internal representations are generated. Dialogue management uses a finite state machine, which represents the entire process of obtaining information in a dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs tasks.

However, multi-round dialogue in the prior art always performs the query matching of the next round on top of the search results of the previous round. For example, when the user's previous round of dialogue is "Little Pig Peggy" and the next round is "horror", the system first searches for "Little Pig Peggy" and then searches for information related to "horror" within the "Little Pig Peggy" search results. Because there is no correlation between these two inputs, the final analysis of the user's intention is inaccurate. The inaccurate analysis leads to inaccurate system behavior, which makes communication between the user and the dialogue device unsmooth or causes the dialogue device to execute instructions incorrectly, directly inconveniencing the user.

Therefore, the existing technology needs to be further improved.

Summary

In view of the above-mentioned shortcomings in the prior art, the present disclosure provides a method and related equipment for recognizing user intentions in multiple rounds of dialogue. It overcomes the defect that the prior art does not consider the relationship between the previous and subsequent rounds in a multi-round dialogue and always performs the query matching of the next round on the basis of the search of the previous round, which lowers the accuracy of query matching in the subsequent round.

In the first aspect, this embodiment discloses a method for recognizing user intentions in multiple rounds of dialogue, which includes the following steps:

Acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue;

If the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold, the user intention is recognized according to the following information, and the user intention recognition result is obtained.

In one embodiment, before the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue, the method further includes:

Determining whether the voice information corresponding to the above information and the voice information corresponding to the following information are the same;

If yes, re-acquiring the dialogue information of the round after the following information, and replacing the following information with the re-acquired dialogue information.

In an embodiment, the step of separately acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogues includes:

Acquiring the above information and the following information in the multiple rounds of dialogue;

Performing voice recognition and language analysis on the above information and the following information, respectively, to obtain the first dialogue state and the second dialogue state.

In one embodiment, the step of recognizing the user intention according to the following information and obtaining the user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold includes:

If the correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold, acquiring system feedback information corresponding to the first dialogue state;

If the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the user's intention is identified according to the following information, and the user's intention identification result is obtained.

In one embodiment, before the step of recognizing the user intention according to the following information and obtaining the user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, the method further includes:

Acquiring the first slot information of the first dialog state;

Acquiring the second slot information of the second dialogue state;

Calculating the first correlation according to the first slot information and the second slot information;

It is determined whether the first correlation is less than a preset first threshold.

In an embodiment, the step of calculating the first correlation according to the first slot information and the second slot information includes:

Acquiring the character strings contained in each slot in the first slot information, and merging the character strings into the first character string information;

Acquiring the character strings contained in each slot in the second slot information, and merging the character strings into the second character string information;

Calculating an edit distance between each character string in the first character string information and the second character string information;

The first correlation is calculated according to the size of the edit distance.
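The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names `edit_distance`, `merge_slots` and `correlation` are hypothetical, the slot values are merged in slot-name order, the two merged strings are compared as a whole, and the mapping from edit distance to a correlation in [0, 1] (`1 - d / max(len)`) is an assumption, since the text only states that the correlation is calculated "according to the size of the edit distance".

```python
from __future__ import annotations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t, computed with one rolling row."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, s_char in enumerate(s, start=1):
        curr = [i]  # converting s[:i] to the empty string takes i deletions
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(prev[j] + 1,          # delete s_char
                            curr[j - 1] + 1,      # insert t_char
                            prev[j - 1] + cost))  # replace, or keep if equal
        prev = curr
    return prev[-1]


def merge_slots(slots: dict[str, str]) -> str:
    """Merge the slot strings into one string (slot-name order is an assumption)."""
    return "".join(slots[name] for name in sorted(slots))


def correlation(first_slots: dict[str, str], second_slots: dict[str, str]) -> float:
    """Map the edit distance of the merged strings to [0, 1].

    ASSUMPTION: the normalization 1 - d / max(len) is not taken from the source.
    """
    first, second = merge_slots(first_slots), merge_slots(second_slots)
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0  # two empty slot sets are treated as identical
    return 1.0 - edit_distance(first, second) / longest
```

A value near 1 then indicates nearly identical slot content, and a value near 0 indicates unrelated content; the result would be compared with the preset first threshold.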

In one embodiment, before the step of recognizing the user intention according to the following information and obtaining the user intention recognition result if the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, the method further includes:

Acquiring the third slot information of the system feedback information;

Acquiring the second slot information of the second dialogue state;

Calculating the second correlation according to the third slot information and the second slot information;

It is determined whether the second correlation is less than a preset second threshold.

In an embodiment, the step of calculating the second correlation according to the third slot information and the second slot information includes:

Acquiring the character strings contained in each slot in the third slot information, and merging the character strings into the third character string information;

Calculating the edit distance between each character string in the third character string information and the second character string information;

The second correlation is calculated according to the size of the edit distance.

In an embodiment, after the step of obtaining the first slot information and the second slot information of the first dialogue state and the second dialogue state, respectively, the method further includes:

Judging whether the slots contained in the first dialogue state and/or the second dialogue state are completely filled;

If they are not completely filled, acquiring the keyword information missing from the incompletely filled slots in the first dialogue state and/or the second dialogue state, and completely filling the slots contained in the first dialogue state and/or the second dialogue state according to the acquired keyword information.

In an embodiment, after the step of separately acquiring the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further includes:

Judging whether the slots contained in the system feedback information and/or the second dialog state are completely filled;

If they are not completely filled, acquiring the keyword information missing from the incompletely filled slots in the system feedback information and/or the second dialogue state, and completely filling the slots contained in the system feedback information and/or the second dialogue state according to the acquired keyword information.

In an embodiment, the step of recognizing the user's intention based on the following information to obtain the user's intention recognition result includes:

Acquiring character information of the information contained in the second dialogue state, and extracting a second keyword set;

Determining second user instruction information corresponding to the second keyword set;

Obtain the user intention recognition result according to the second user instruction information.

In an embodiment, the identification method further includes:

If the first correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, recognizing the user intention by combining the first dialogue state, the system feedback information and the following information, and obtaining the user intention recognition result.

In an embodiment, the identification method further includes:

If the correlation between the system feedback information and the second dialogue state is greater than or equal to the preset second threshold, recognizing the user intention by combining the first dialogue state, the system feedback information and the following information, and obtaining the user intention recognition result.
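Taken together, the threshold embodiments describe a two-stage decision. The sketch below is one possible reading, assuming both correlations have already been computed as numbers; the names `Mode` and `choose_mode` are hypothetical, and the source does not give concrete threshold values.

```python
from enum import Enum


class Mode(Enum):
    SINGLE_ROUND = "single_round"  # recognize intent from the following information only
    MULTI_ROUND = "multi_round"    # combine first state, system feedback and following information


def choose_mode(first_corr: float, second_corr: float,
                first_threshold: float, second_threshold: float) -> Mode:
    """Select the recognition mode from the two correlations (hypothetical helper)."""
    # Stage 1: the two user utterances are only weakly related.
    if first_corr < first_threshold:
        return Mode.SINGLE_ROUND
    # Stage 2: the following information is only weakly related to the system feedback.
    if second_corr < second_threshold:
        return Mode.SINGLE_ROUND
    # Both correlations reach their thresholds: use the dialogue context.
    return Mode.MULTI_ROUND
```

For example, with both thresholds set to an assumed 0.5, `choose_mode(0.8, 0.2, 0.5, 0.5)` selects single-round recognition because the second check fails even though the first passes.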

In one embodiment, the step of combining the first dialogue state, the system feedback information and the following information to identify the user's intention, and obtaining the user's intention recognition result includes:

Acquiring the first dialogue state and the character information of the information contained in the system feedback information, and extracting the first keyword set;

Searching for first user instruction information corresponding to the first keyword set;

Searching for second user instruction information corresponding to the second keyword set in the first user instruction information;

In the second aspect, this embodiment also discloses a computer device, including a memory and a processor, the memory stores a computer program, wherein the steps of the method are implemented when the processor executes the computer program.

In the third aspect, this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to implement the steps of the method.

Compared with the prior art, the embodiments of the present disclosure have the following advantages:

According to the method provided by the embodiments of the present disclosure, the first dialogue state of the above information and the second dialogue state of the following information are acquired from the preceding and following rounds of a multi-round dialogue; the first correlation between the first dialogue state and the second dialogue state is calculated and compared with a preset first threshold; and if the first correlation is less than the preset first threshold, the user intention is recognized based on the following information alone to obtain the user intention recognition result. Because this embodiment fully considers the relevance of the interactive information between the preceding and following rounds, when the information of the two differs greatly, the latter round is analyzed as an independent piece of information, so that a more accurate analysis result can be obtained, which provides a basis for accurate feedback on the information sent by the user.

Description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is a flowchart of steps of a method for recognizing user intentions in multiple rounds of dialogue in an embodiment of the present disclosure;

Figure 2 is a schematic diagram of the information flow of the multi-round dialogue system;

Figure 3 is a schematic diagram of the principle structure of a multi-round dialogue system;

Fig. 4 is a schematic diagram of a framework of an exemplary application scenario in an embodiment of the present disclosure;

Fig. 5 is a block diagram of the principle structure of a computer device in an embodiment of the present disclosure.

Detailed description

In order to enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, but not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.

Multi-round dialogue is a process of human-computer interaction in which, after the user's intention is initially clarified, the necessary information is obtained to finally arrive at a definite user instruction. A multi-round dialogue corresponds to the processing of one thing and can be expressed as multiple interactions between human and machine. If the user's instruction can be clearly determined in one dialogue, the multi-round dialogue reduces to a single dialogue interaction between human and machine.

Research shows that current multi-round dialogue ignores the randomness between utterances. The content of successive rounds may be unrelated, so whether to make a decision based on the previous dialogue context should be judged from the relevance of the content before and after. If the content of the two rounds is not relevant but the system feedback behavior for the next round is still retrieved on the basis of the previous round, the feedback behavior generated for the subsequent round may be inaccurate.

This embodiment discloses a method for identifying user intentions in multiple rounds of dialogue. By analyzing the relevance of the information in the preceding and following rounds, it determines whether the information of the next round needs to be processed on the basis of the query results of the previous round. For example, when the user says "Little Pig Peggy" and then "horror", the relevance of the two is judged first. If they are not relevant, single-round searches are performed, that is, "Little Pig Peggy" is searched on its own and "horror" is searched on its own. If they are relevant, "Little Pig Peggy" is searched first, and "horror" is then searched within the search results of "Little Pig Peggy". Since there is no correlation between "Little Pig Peggy" and "horror", the method disclosed in this embodiment yields more accurate results.
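The difference between the two search behaviours in this example can be sketched as below. Everything here is hypothetical and for illustration only: the `CATALOG` entries, the tag-based `search` function and the `answer` helper are assumptions, and the `related` flag stands in for the outcome of the correlation check.

```python
from __future__ import annotations

# Hypothetical catalogue; titles and tags are invented for illustration.
CATALOG = [
    {"title": "Little Pig Peggy, Season 1", "tags": {"little pig peggy", "cartoon"}},
    {"title": "Haunted House", "tags": {"horror", "film"}},
    {"title": "Night Terrors", "tags": {"horror", "series"}},
]


def search(keyword: str, items: list[dict]) -> list[dict]:
    """Return the items tagged with the keyword."""
    return [item for item in items if keyword in item["tags"]]


def answer(previous: str, current: str, related: bool,
           catalog: list[dict] = CATALOG) -> list[dict]:
    """Search for the current round, using the previous round only if related."""
    if related:
        # Multi-round: search the current keyword within the previous results.
        return search(current, search(previous, catalog))
    # Single-round: treat the current round as independent information.
    return search(current, catalog)
```

With this toy data, forcing the multi-round path for "little pig peggy" followed by "horror" yields no results, while the single-round path returns the two horror titles, which is the failure mode the embodiment aims to avoid.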

Exemplary method

Referring to Fig. 1, there is shown a method for recognizing user intentions in multiple rounds of dialogue in an embodiment of the present disclosure. In this embodiment, the method may include the following steps, for example:

Step S101: Acquire the first dialogue state of the above information and the second dialogue state of the following information in multiple rounds of dialogue.

In multiple rounds of dialogue, the first dialogue state corresponding to the above information and the second dialogue state corresponding to the following information are obtained respectively. The following information is the voice information sent by the user in the later of two successive human-computer interactions in the multi-round dialogue, and the above information is the voice information sent by the user in the earlier one. Both belong to natural voice dialogue: the user can conduct multiple rounds of conversation with the dialogue system in Chinese, English or another natural language. After the dialogue system receives the voice information sent by the user, it helps the user complete a task, which is usually a task of accessing information.

The dialogue state includes the text converted from the voice information sent by the user and the information related to that text obtained by analyzing it. After the above information and the following information are obtained, voice recognition and semantic recognition are performed on each of them to obtain the first dialogue state of the above information and the second dialogue state of the following information.

As shown in FIG. 2, the multiple rounds of dialogue include: voice understanding, voice generation, dialogue management, knowledge base search and other steps. Dialogue management also includes steps such as dialogue state tracking and action selection. It can be considered that multiple rounds of dialogue are the expansion of a single round of dialogue based on analysis. In each round of dialogue, the semantics of the speech is understood and internal representations are generated. Dialogue management uses a finite state machine, which represents the entire process of obtaining information in a dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs tasks such as flight information query.

With reference to Figure 3, the above information sent by the user is first obtained, and voice recognition is performed on it to generate the voice recognition result, which is the text information corresponding to the above information. The semantic analysis module maps this text information to the user dialogue state, which is the first dialogue state. Similarly, voice recognition is performed on the following information sent by the user to obtain the voice recognition result, and the voice recognition result is mapped to the user dialogue state to obtain the second dialogue state.

In this step, to obtain the first dialogue state of the above information, voice recognition is first performed on the above information to identify the text information it contains, and semantic analysis is then performed on the recognized text information to obtain the information contained in the text. Semantic analysis generally uses one of two processing methods: retrieving information corresponding to the text information, or generating information corresponding to the text information with a generative method. The retrieval approach generally requires a storage database in which a large amount of dialogue data is stored and an index is established between the dialogue data and dialogue keywords. After the keywords contained in the text information are identified, the corresponding dialogue data in the database is output according to the keywords; this is the analyzed first dialogue state corresponding to the text information. In the generative approach, the semantic analysis module uses a large amount of data to construct a voice analysis model. After a piece of text information is input, the voice analysis model outputs an analysis result corresponding to the text information. The voice analysis model is a deep learning neural network built on a large amount of dialogue data. The result output by the voice analysis model is the dialogue state corresponding to the above information or the following information.

The same voice recognition and semantic analysis methods are applied to the following information, so that the first dialogue state corresponding to the above information and the second dialogue state corresponding to the following information are obtained.

Step S102: If the first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, the user intention is recognized according to the following information, and the user intention recognition result is obtained.

After the first dialogue state and the second dialogue state are obtained in the above steps, the first correlation between the first dialogue state and the second dialogue state is calculated, and it is determined whether the first correlation is less than the preset first threshold. If it is, the correlation between the above information and the following information is determined to be small, and the user's intention can be identified directly from the following information to obtain the user intention recognition result.

Specifically, the correlation between dialogue states is calculated as the correlation between the corresponding slot information contained in the dialogue states. The step of calculating the first correlation between the first dialogue state and the second dialogue state therefore includes:

Acquiring the first slot information of the first dialog state;

Acquiring the second slot information of the second dialogue state;

The first correlation is calculated according to the first slot information and the second slot information.

Slot information is the information needed to turn the user's preliminary intention into a clear user instruction over multiple rounds of dialogue; one slot corresponds to one type of information that needs to be acquired in the processing of one thing. Slot information is divided into required slot information and non-required slot information, so not every slot needs to be filled explicitly during the dialogue. Because non-required slot information can be obtained from context information, it can exist in the form of a default value.

For example, consider "How is the weather today?". Because the weather differs from region to region, the weather must be searched based on geographic location. However, since the user's location is known to be Beijing, the query corresponding to this dialogue can default to the weather in Beijing, so the system can directly give the feedback: search for the weather in Beijing.
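A minimal sketch of required and non-required slots with context defaults follows. The `Slot` class, the `fill_slots` function and the slot names `date` and `location` are hypothetical; treating the location as a non-required slot filled from context is an assumption drawn from the weather example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    name: str
    required: bool
    value: Optional[str] = None


def fill_slots(slots: list[Slot], context: dict[str, str]) -> list[str]:
    """Fill empty non-required slots from context; return the names of
    required slots that are still empty and must be asked of the user."""
    missing = []
    for slot in slots:
        if slot.value is not None:
            continue  # already filled by the user's utterance
        if not slot.required and slot.name in context:
            slot.value = context[slot.name]  # default value taken from context
        elif slot.required:
            missing.append(slot.name)
    return missing


# "How is the weather today?" with the user's location known from context.
weather_slots = [Slot("date", required=True, value="today"),
                 Slot("location", required=False)]
still_missing = fill_slots(weather_slots, {"location": "Beijing"})
```

Here `still_missing` is empty and the location slot holds "Beijing", so the query can proceed without a follow-up question.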

Specifically, the step of calculating the first correlation according to the first slot information and the second slot information includes:

Acquiring the first character string information and the second character string information of the information contained in each slot in the first slot information and the second slot information, respectively;

Calculating the edit distance between the first character string information and the second character string information;

The first correlation is calculated according to the size of the edit distance.

For example, in one round of dialogue the following information is: "What is the exchange rate of the RMB against the US dollar". The slot information contained in the following information is then "query (slot 1=RMB, slot 2=USD)", and this form is used as the input of the dialogue management module. The state tracking module combines the information of the previous round with this input to determine the query state of the current round; here the user state in the previous round is determined to be: currency information query. The edit distance is then calculated between the character strings "RMB" and "USD" corresponding to the two slots of the following information and the slot information string corresponding to the system feedback information of the previous round, "currency information query". This gives the minimum number of editing operations required to convert the string information of the following information into the corresponding string of the above information, from which the correlation between the two is obtained.

Specifically, the algorithmic process of calculating the edit distance between character strings includes:

Let d[i,j] (which can be stored in a two-dimensional array) denote the minimum number of operations required to convert the string s[1...i] into the string t[1...j]. In the most basic case, when i equals 0, the string s is empty, so d[0,j] = j: adding j characters converts s into t. When j equals 0, the string t is empty, so d[i,0] = i: deleting i characters converts s into t.

Next consider the general case, which is solved by dynamic programming. If s[1..i] is to be converted into t[1..j] with the fewest additions, deletions, or replacements, then some shorter conversion must already have been completed with the fewest operations, such that at most one more operation on s and t completes the conversion from s[1..i] to t[1..j]. This "before" state falls into the following three situations:

1) We can convert s[1...i] to t[1...j-1] in k operations;

2) We can convert s[1..i-1] to t[1..j] in k operations;

3) We can convert s[1...i-1] to t[1...j-1] in k operations.

For the first case, we only need to append t[j] to the end of s[1..i] to complete the matching, so a total of k+1 operations are required.

For the second case, we only need to remove s[i] from the end and then perform these k operations, so a total of k+1 operations are required.

For the third case, we only need to replace s[i] with t[j] at the end, so that s[1..i] == t[1..j], which requires a total of k+1 operations. If, in the third case, s[i] is exactly equal to t[j], the process can be completed using only k operations.

Finally, to ensure that the number of operations obtained is always the minimum, we choose the least expensive of the above three cases as the minimum number of operations required to convert s[1..i] to t[1..j].
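The base cases and the three situations above can be summarized as a single recurrence. The notation d[i,j], s and t follows the text; the symbol cost(i,j) is introduced here for the 0-or-1 replacement cost, and the `aligned` and `cases` environments assume the amsmath package.

```latex
\begin{aligned}
d[i,0] &= i, \qquad d[0,j] = j, \\
d[i,j] &= \min\bigl(\, d[i,j-1] + 1,\;\; d[i-1,j] + 1,\;\; d[i-1,j-1] + \mathrm{cost}(i,j) \,\bigr), \\
\mathrm{cost}(i,j) &= \begin{cases} 0, & s[i] = t[j] \\ 1, & s[i] \neq t[j] \end{cases}
\end{aligned}
```

The three arguments of the minimum correspond, in order, to situations 1), 2) and 3): appending t[j], removing s[i], and replacing s[i] with t[j] (or keeping it when the characters already match).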

The basic steps of the algorithm:

(1) Construct a matrix with n+1 rows and m+1 columns to store the number of operations needed to complete each conversion; the number of operations needed to convert the string s[1..n] to the string t[1..m] is the value of matrix[n][m];

(2) Initialize the first row of the matrix to 0 to m and the first column to 0 to n.

matrix[0][j] is the value in row 1 and column j+1. It represents the number of operations required to convert the empty string s[1...0] to t[1..j]. Obviously, converting an empty string to a string of length j requires only j add operations, so the value of matrix[0][j] is j; the other initial values follow by analogy.

(3) Scan each character s[i], with i from 1 to n;

(4) Scan each character t[j], with j from 1 to m;

(5) Compare each character of the string s and the string t in pairs, if they are equal, let the cost be 0, if they are not equal, let the cost be 1;

(6) a. If we can convert s[1..i-1] to t[1..j] in k operations, that is, d[i-1,j]=k, then we can remove s[i] and then perform these k operations, so a total of k+1 operations are required.

(6) b. If we can convert s[1...i] to t[1...j-1] in k operations, that is, d[i,j-1]=k, then we can append t[j] to s[1..i], so a total of k+1 operations are required.

(6) c. If we can convert s[1...i-1] to t[1...j-1] in k operations, that is, d[i-1,j-1]=k, then we can replace s[i] with t[j] so that s[1..i] == t[1..j], which requires k + cost operations. (The cost is added here because if s[i] is exactly equal to t[j], no replacement is needed and k operations suffice; if they are not equal, one replacement is needed, for a total of k+1 operations.)

Because we want the minimum number of operations, we compare the number of operations in these three cases and take the minimum value as the value of d[i,j];

Then repeat (3), (4), (5), (6); the final result is in d[n,m].
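The matrix procedure can be written out directly. The sketch below uses the hypothetical function name `edit_distance_matrix`, takes s to have length n and t to have length m, and keeps one row per prefix of s and one column per prefix of t so that the result sits at d[n][m].

```python
def edit_distance_matrix(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and replacements turning s into t."""
    n, m = len(s), len(t)
    # (1) d[i][j] holds the operations needed to convert s[:i] into t[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    # (2) Base cases: delete i characters, or add j characters.
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # (3), (4) Visit every pair of prefixes.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # (5) Python strings are 0-indexed, so s[i] in the text is s[i - 1] here.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # (6) Take the cheapest of the three predecessor cases.
            d[i][j] = min(d[i - 1][j] + 1,         # a. remove s[i]
                          d[i][j - 1] + 1,         # b. append t[j]
                          d[i - 1][j - 1] + cost)  # c. replace s[i] with t[j], or keep it
    return d[n][m]
```

The full matrix costs O(n·m) time and memory; it is kept here to mirror the steps, although only the previous row is needed to compute the next one.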

According to the magnitude between the calculated first correlation and the preset first threshold, it is determined whether to perform a single round of dialogue or multiple rounds of dialogue to identify the user's intention. If the first correlation is greater than the preset first threshold, then multiple A round of dialogue recognizes the user's intention, and if the first correlation is less than a preset first threshold, a single round of dialogue is performed to recognize the user's intention.

In an implementation of this embodiment, in order to obtain a more accurate correlation determination result, the step of identifying the user's intention according to the following information and obtaining the user intention identification result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold includes:

Step S103: If the correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold, obtain system feedback information corresponding to the first dialogue state;

In the first dialogue state, the machine system that talks to the user automatically feeds back a reply message. This reply message is the system feedback information, which is produced by the dialogue management module. As shown in Figure 3, the dialogue management module selects the system feedback behavior to be performed, that is, the system feedback information, according to the first dialogue state and the second dialogue state. If the system feedback information requires interaction with the user, the language generation module is triggered to generate natural language or system speech; finally, the speech synthesis module reads the generated language aloud to the user.

The main tasks of dialogue management are dialogue state maintenance and system decision making. Dialogue state maintenance means maintaining and updating the dialogue state. For example, the dialogue state $S_{t+1}$ at time t+1 depends on the state $S_t$ at the previous time t, the system behavior $a_t$ at the previous time t, and the user behavior $a_{t+1}$ at the current time t+1. This can be written as $S_{t+1} \leftarrow S_t + a_t + a_{t+1}$. System decision making means generating the system feedback behavior according to the dialogue state obtained from dialogue state tracking, in order to decide what to do next. The system feedback behavior is the feedback that the system makes based on the dialogue state derived from the user's input. The input of the dialogue management module is therefore the user's voice information and the current dialogue state obtained by analyzing that voice information, and its output is the next system feedback behavior and the updated dialogue state. Consequently, the more semantic information the input carries, the more accurate the information fed back by the dialogue management module.

For example, when the content of the above information is "I want to watch Peppa Pig", the corresponding dialogue state includes: film and television; the actors are animals; the genres are comedy and family drama; and other information related to Peppa Pig. The system feedback information corresponding to this dialogue state is: video search. If the following information is "the third episode of the first season", speech recognition and semantic analysis of the following information yield the second dialogue state: TV series or cartoon, episode 3, multiple seasons, and so on. Calculating the similarity between the first dialogue state and the second dialogue state shows that their correlation is greater than the preset first threshold, so the system feedback information for the first dialogue state must be obtained, and the system feedback is: search for the third episode of the first season of Peppa Pig.

The dialogue management module described above controls the course of the human-machine dialogue and decides how to respond to the user at each moment based on the dialogue history. The most common kind of multi-round dialogue is task-driven: the user has a clear purpose, such as ordering food or booking tickets. Because the user's needs are relatively complex and carry many restrictions, they may have to be stated over multiple rounds. On the one hand, the user can continuously modify or refine those needs during the dialogue. On the other hand, when the needs stated by the user are not specific or clear enough, the machine can help the user find a satisfactory result by asking, clarifying, or confirming. The dialogue process is shown in Figure 4: the user and the system exchange information through questions and answers. The user issues the voice command "Hi, I want to order a meal". When the system (which may be a voice robot or any other device that can recognize the user's voice information) receives the voice command, it analyzes the command and identifies the key information it contains: restaurant. The system then feeds back the inquiry "What type of food do you like?", with the feedback dialogue behavior: food. After receiving this message, the user issues another voice message: "I like to eat Gongbao chicken". The system extracts the keyword contained in this message, Gongbao chicken, and feeds back a confirmation to the user according to the received information, so that a satisfactory order is finally obtained.

Step S104: If the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, the user's intention is identified according to the following information, and the user's intention identification result is obtained.

If the correlation between the second dialogue state and the system feedback information corresponding to the first dialogue state, acquired in step S103 above, is less than the preset second threshold, the dialogue information of the previous round is determined to have low relevance to that of the next round, and only the following information is used to identify the user's intention. Otherwise, the dialogue information of the previous round is determined to be highly relevant to that of the next round, and the following information is combined with the relevant content of the above information to identify the user's intention.

Specifically, before the step in S104 of identifying the user's intention according to the following information and obtaining the user intention identification result if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method further includes:

Acquiring the third slot information of the system feedback information;

Acquiring the second slot information of the second dialogue state;

In the same way as the first slot information of the first dialogue state and the second slot information of the second dialogue state were obtained above, the third slot information of the system feedback information and the second slot information of the second dialogue state are obtained, and the second correlation between the system feedback information and the second dialogue state is then calculated from the obtained slot information.

Specifically, the step of calculating the second correlation according to the third slot information and the second slot information includes:

The correlation between the slot information is calculated as the correlation between the character strings contained in the slot information, and the correlation between character strings is reflected by the edit distance between them. The calculation principle is the same as that used in the above steps for the correlation between the first slot information and the second slot information.
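The slot-based correlation can be sketched as follows. The disclosure does not state how an edit distance is mapped to a correlation value, so the normalization `1 - distance / max_length` used here is an assumption, as are the dictionary representation of slots, the sorted-name merge order, and the helper names `merge_slots` and `slot_correlation`.

```python
from typing import Mapping


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, keeping only two rows of the matrix."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def merge_slots(slots: Mapping[str, str]) -> str:
    # Assumption: slot values are joined in sorted slot-name order.
    return " ".join(slots[name] for name in sorted(slots) if slots[name])


def slot_correlation(slots_a: Mapping[str, str], slots_b: Mapping[str, str]) -> float:
    """Hypothetical correlation in [0, 1]; 1.0 means identical merged strings."""
    a, b = merge_slots(slots_a), merge_slots(slots_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty slot sets are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

The same function serves for both the first correlation (first and second slot information) and the second correlation (third and second slot information); only the inputs change.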

In an implementation manner, before the step of calculating the first correlation between the first dialogue state and the second dialogue state, the method further includes:

Determine whether the first slot information and the second slot information are complete;

If the slots corresponding to the first dialogue state and/or the second dialogue state are not completely filled, the calculated correlation may be inaccurate. Therefore, in the above step, it is judged whether the slots contained in the first dialogue state and the second dialogue state are completely filled. If they are not, they are filled completely, and the correlation between the two is then calculated.

Similarly, before the step of calculating the correlation between the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further includes:

Determine whether the third slot information and the second slot information are complete;

If the slots corresponding to the system feedback information and/or the second dialogue state are not completely filled, the calculated correlation may be inaccurate. Therefore, in the above step, it is judged whether the slots contained in the system feedback information and the second dialogue state are completely filled. If they are not, they are filled completely, and the correlation between the two is then calculated.
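The completeness check and filling can be sketched as below. The disclosure does not specify where the missing keyword information comes from, so the `lookup` callable is a hypothetical stand-in for that source, and treating `None` or an empty string as "unfilled" is likewise an assumption.

```python
from typing import Callable, Dict, List, Optional

Slots = Dict[str, Optional[str]]


def unfilled_slots(slots: Slots) -> List[str]:
    """Names of slots whose value is missing (None or empty)."""
    return [name for name, value in slots.items() if not value]


def fill_slots(slots: Slots, lookup: Callable[[str], Optional[str]]) -> Slots:
    """Return a copy of slots with missing values filled from lookup where possible."""
    filled = dict(slots)
    for name in unfilled_slots(slots):
        value = lookup(name)  # hypothetical source of missing keyword information
        if value:
            filled[name] = value
    return filled
```

A caller would run `fill_slots` on both sets of slots before computing the correlation, for example `fill_slots({"title": "Peppa Pig", "season": None}, {"season": "1"}.get)`.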

When the correlation between the previous round of dialogue and the next round of dialogue in the multiple rounds of dialogue is high, that is, when the first correlation is higher than the preset first threshold, or when the first correlation is less than or equal to the preset first threshold but the second correlation is greater than or equal to the preset second threshold, multiple rounds of dialogue are used to identify the user's intention; that is, the first dialogue state, the system feedback information, and the following information are combined to identify the user's intention and obtain the user intention recognition result.

Specifically, the step of identifying the user's intention by combining the first dialogue state, the system feedback information and the following information, and obtaining the user's intention recognition result includes:

Searching for second user instruction information corresponding to the second keyword set under the searched first user instruction information;

When the correlation between the first dialogue state and/or the system feedback information and the second dialogue state meets the preset threshold condition, the user's intention is first recognized from the character information contained in the first dialogue state and the system feedback information, which yields the search results for the above information. The following information is then searched within those results, so that the user instruction and the search results corresponding to both the above information and the following information sent by the user are fed back.

When the relevance between the previous round of dialogue and the next round of dialogue in the multiple rounds of dialogue is low, that is, when the first correlation is less than the preset first threshold, a single round of dialogue is used to identify the user's intention; that is, only the following information is used to identify the user's intention and obtain the user intention identification result.

When the correlations among the first dialogue state, the system feedback information, and the second dialogue state do not meet the preset threshold condition, a single round of dialogue is executed and the user's intention is identified from the following information only. Specifically, the step of recognizing the user's intention according to the following information and obtaining the user intention recognition result includes:

Searching for second user instruction information corresponding to the second keyword set;

In the single round of dialogue described above, the user intention recognition result is obtained from the following information only. Because this search is not restricted to content related to the above information, search results that better match the user's intention can be obtained.
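The difference between the two search modes can be sketched as below: in the multi-round case the second keyword set is searched within the results for the first keyword set, while in the single-round case it is searched over everything. The catalog of candidate instructions, the case-insensitive substring matching rule, and the names `search` and `recognize` are all illustrative assumptions.

```python
from typing import Iterable, List, Optional


def search(candidates: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep the candidates that contain every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [c for c in candidates if all(k in c.lower() for k in wanted)]


def recognize(
    catalog: List[str],
    second_keywords: List[str],
    first_keywords: Optional[List[str]] = None,
) -> List[str]:
    if first_keywords is None:
        # Single round: only the following information constrains the search.
        return search(catalog, second_keywords)
    # Multi-round: search the second keyword set within the first result set.
    first_hits = search(catalog, first_keywords)
    return search(first_hits, second_keywords)
```

With a catalog containing "Peppa Pig season 1 episode 3" and an unrelated cartoon's "season 1 episode 3", passing `first_keywords=["Peppa Pig"]` narrows the result to the Peppa Pig entry, whereas omitting it returns both.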

In the method disclosed in this embodiment, the correlation between the dialogue states and the system feedback information corresponding to adjacent context information in the multiple rounds of dialogue is calculated. This avoids the situation in which, although the current sentence and the next sentence of the dialogue are unrelated, the search results for the next sentence are still generated on the basis of the search results for the previous sentence. Instead, a new search is performed according to the content of the next sentence, thereby improving the accuracy of user intention recognition.

In one embodiment, in order to prevent the dialogue system from repeatedly calculating the correlation for the same voice information when the user sends the same voice information twice in succession, which would increase the system's information-processing load, before the step of acquiring the first dialogue state of the above information, the system feedback information corresponding to the first dialogue state, and the second dialogue state of the following information, the method further includes:

Determine whether the above information and the following information are the same;

If yes, ignore the following information, re-acquire the dialogue information of the round after the following information, and replace the following information with the re-acquired dialogue information.

In the above steps, after the above information and the following information are obtained, they are first compared. If they are exactly the same, it is determined that the user has repeated the same voice information; the received following information is ignored and the information of the round after it is received instead, which avoids unnecessary information processing by the system. For example, when the user says "I want to book a ticket" twice in succession, the second "I want to book a ticket" can simply be ignored, without semantic analysis, user intention recognition, or other processing. The system directly acquires the voice message "From Xi'an to Beijing" that the user sends after the second "I want to book a ticket", and uses it as the following information for the similarity judgment.
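A minimal sketch of this repeated-utterance check is shown below. It assumes the comparison is made on the recognized text of each utterance and that "the same" means exact string equality; the generator name `context_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def context_pairs(utterances: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (above, following) pairs, skipping a following utterance that repeats the above one."""
    previous: Optional[str] = None
    for text in utterances:
        if previous is None:
            previous = text  # first utterance has no above information yet
            continue
        if text == previous:
            continue  # repeated utterance: ignore it and wait for the next round
        yield previous, text
        previous = text
```

For the booking example, the three utterances produce a single pair: ("I want to book a ticket", "From Xi'an to Beijing").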

In an application embodiment of this embodiment, the following steps can be used to identify user intentions:

Step H1: Determine whether the above information is the same as the following information; if they are the same, perform step H2, otherwise perform step H3;

Step H2: Re-acquire the following information;

Step H3: Acquire the first dialogue state of the above information and the second dialogue state of the following information;

Step H4: Calculate the first correlation between the first dialogue state and the second dialogue state, and determine whether the first correlation is greater than a preset first threshold; if it is less than the preset first threshold, go to step H5, otherwise go to step H6;

Step H5: Obtain the system feedback information corresponding to the first dialogue state, calculate the second correlation between the system feedback information and the second dialogue state, and determine whether the second correlation is lower than the preset second threshold; if yes, execute step H7, if not, execute step H8;

Step H6: It is determined that the two rounds of dialogue are correlated; multi-round dialogue is entered, the system feedback information corresponding to the first dialogue state is obtained, and the user's intention is recognized by combining the first dialogue state, the system feedback information, and the second dialogue state.

Step H7: This dialogue is treated as a single-round dialogue to identify the user's intention.

Step H8: This dialogue is treated as a multi-round dialogue, and the user's intention is identified by combining the first dialogue state, the system feedback information, and the second dialogue state.
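The branching in steps H1 to H8 can be sketched as a single decision function. The two correlations are passed in as precomputed numbers to keep the sketch short; the names `Mode` and `choose_mode`, and the treatment of a first correlation exactly equal to the threshold as the "otherwise" branch of step H4, are assumptions of this illustration.

```python
from enum import Enum


class Mode(Enum):
    REACQUIRE = "re-acquire following information"  # step H2
    SINGLE_ROUND = "single-round recognition"       # step H7
    MULTI_ROUND = "multi-round recognition"         # steps H6 and H8


def choose_mode(
    above_text: str,
    following_text: str,
    first_correlation: float,
    second_correlation: float,
    first_threshold: float,
    second_threshold: float,
) -> Mode:
    if above_text == following_text:           # H1: repeated utterance
        return Mode.REACQUIRE                  # H2
    if first_correlation >= first_threshold:   # H4: not less than the threshold
        return Mode.MULTI_ROUND                # H6
    if second_correlation < second_threshold:  # H5
        return Mode.SINGLE_ROUND               # H7
    return Mode.MULTI_ROUND                    # H8
```

In a real system the second correlation would only be computed when step H5 is reached; a lazily evaluated callable could replace the numeric argument for that purpose.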

For example, when the content of the above information is "I want to watch Peppa Pig", the corresponding dialogue state includes: film and television; the actors are animals; the genres are comedy and family drama; and other information related to Peppa Pig. The system feedback information corresponding to this dialogue state is: video search. If the following information is "What's the weather today", speech recognition and semantic analysis of the following information yield the second dialogue state: geographic location, today, temperature, rain, and so on. Calculating the similarity between the first dialogue state and the second dialogue state shows that their correlation is lower than the preset first threshold, so the following information is judged to be unrelated to the above information. The system feedback information for the following information therefore cannot be based on the results of the above information. A new search must be performed for the second dialogue state, and the system feedback information made for it is: search for today's weather at the user's current location.

For example, when the content of the above information is "I want to watch a comedy", the corresponding dialogue state includes: movies; the genres are comedy and family drama. The system feedback information corresponding to this dialogue state is: movie search. If the following information is "Peppa Pig", the corresponding dialogue state includes: film and television; the actors are animals; the genres are comedy and family drama; and other information related to Peppa Pig. The system feedback information corresponding to this dialogue state is: comedy movie search. Speech recognition and semantic analysis of the following information show that the correlation between the first dialogue state and the second dialogue state is higher than the preset first threshold, so the following information is judged to be related to the above information. The system feedback information for the following information must therefore be based on the results of the above information: the second dialogue state is searched within those results, and the system feedback information made for it is: search for video information related to Peppa Pig.

Exemplary equipment

On the basis of the above method, this embodiment also discloses a computer device, as shown in FIG. 5, including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method when executing the computer program.

On the basis of the above method, this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the steps of the method are implemented when the computer program is executed by a processor.

In an exemplary embodiment, the computer device can be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to perform the above method.

According to the method provided by the embodiments of the present disclosure, the voice information of the preceding and following rounds in a multi-round dialogue is acquired, and the correlation between the dialogue states corresponding to that voice information is calculated to determine whether it exceeds a certain threshold. If it does not, the user's intention is recognized from the dialogue state of the following round only. If it does, it is further judged whether the correlation between the system behavior of the preceding round and the dialogue state of the following round exceeds a certain threshold: if not, the user's intention is identified from the dialogue state of the following round alone; if so, the user's intention is identified by combining the dialogue state of the preceding round with the second dialogue state. Because this embodiment fully considers the relevance of the information between the preceding and following rounds of dialogue, when the information in the two rounds differs greatly, the following round is analyzed as an independent piece of information, so that a more accurate analysis result is obtained, which provides a basis for accurate feedback on the information sent by the user.

Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional technical means in the technical field that are not disclosed herein. The description and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for recognizing user intentions in multiple rounds of dialogue, comprising:

Acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue;

If the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold, the user intention is recognized according to the following information, and the user intention recognition result is obtained.
2. The method for recognizing user intentions in multiple rounds of dialogue according to claim 1, wherein before the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue, the method further comprises:

Determining whether the voice information corresponding to the above information and to the following information is the same;

If yes, re-acquiring the dialogue information of the round after the following information, and replacing the following information with the re-acquired dialogue information.
3. The method for recognizing user intentions in multiple rounds of dialogue according to claim 1, wherein the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue comprises:

Obtaining the above information and the following information in the multiple rounds of dialogue;

Performing voice recognition and language analysis on the above information and the following information, respectively, to obtain the first dialogue state and the second dialogue state.
4. The method for recognizing user intentions in multiple rounds of dialogue according to claim 1, wherein the step of recognizing the user's intention according to the following information and obtaining the user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold comprises:

If the correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold, acquiring system feedback information corresponding to the first dialogue state;

If the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the user's intention is identified according to the following information, and the user's intention identification result is obtained.
5. The method for recognizing user intentions in multiple rounds of dialogue according to any one of claims 1 to 4, wherein before the step of recognizing the user's intention according to the following information and obtaining the user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold, the method further comprises:

Acquiring the first slot information of the first dialogue state;

Acquiring the second slot information of the second dialogue state;

Calculating the first correlation according to the first slot information and the second slot information;

It is determined whether the first correlation is less than a preset first threshold.
6. The method for recognizing user intentions in multiple rounds of dialogue according to claim 5, wherein the step of calculating the first correlation based on the first slot information and the second slot information comprises:

Acquiring the character strings contained in each slot in the first slot information, and merging the character strings into the first character string information;

Acquiring the character strings contained in each slot in the second slot information, and merging the character strings into the second character string information;

Calculating an edit distance between each character string in the first character string information and the second character string information;

The first correlation is calculated according to the size of the edit distance.
7. The method for recognizing user intentions in multiple rounds of dialogue according to claim 4, wherein before the step of identifying the user's intention according to the following information and obtaining the user intention identification result if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method further comprises:

Acquiring the third slot information of the system feedback information;

Acquiring the second slot information of the second dialogue state;

Calculating the second correlation according to the third slot information and the second slot information;

It is determined whether the second correlation is less than a preset second threshold.
8. The method for recognizing user intentions in multiple rounds of conversations according to claim 7, wherein the step of calculating the second correlation based on the third slot information and the second slot information comprises:

Acquiring the character strings contained in each slot in the third slot information, and merging the character strings into the third character string information;

Acquiring the character strings contained in each slot in the second slot information, and merging the character strings into the second character string information;

Calculating the edit distance between each character string in the third character string information and the second character string information;

The second correlation is calculated according to the size of the edit distance.
9. The method for recognizing user intentions in multiple rounds of dialogue according to claim 5, wherein after the step of respectively obtaining the first slot information of the first dialogue state and the second slot information of the second dialogue state, the method further comprises:

Judging whether the slots contained in the first dialogue state and/or the second dialogue state are completely filled;

If not, obtaining the keyword information missing from the incompletely filled slots in the first dialogue state and/or the second dialogue state, and completely filling the slots contained in the first dialogue state and/or the second dialogue state according to the obtained keyword information.
10. The method for recognizing user intentions in multiple rounds of dialogue according to claim 7, wherein after the step of respectively obtaining the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further comprises:

Judging whether the slots contained in the system feedback information and/or the second dialogue state are completely filled;

If not, obtaining the keyword information missing from the incompletely filled slots in the system feedback information and/or the second dialogue state, and completely filling the slots contained in the system feedback information and/or the second dialogue state according to the obtained keyword information.
11. The method for recognizing user intentions in multiple rounds of dialogue according to any one of claims 1-4 and 6-10, wherein the step of recognizing the user's intention according to the following information to obtain the user intention recognition result comprises:

Acquiring character information of the information contained in the second dialogue state, and extracting a second keyword set;

Determining second user instruction information corresponding to the second keyword set;

Obtain the user intention recognition result according to the second user instruction information.
12. The method for recognizing user intentions in multiple rounds of dialogue according to claim 1, wherein the method further comprises:

If the first correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, identifying the user's intention by combining the first dialogue state, the system feedback information, and the following information, and obtaining the user intention recognition result.
13. The method for recognizing user intentions in multiple rounds of dialogue according to claim 4, wherein the method further comprises:

If the correlation between the system feedback information and the second dialogue state is greater than or equal to the preset second threshold, identifying the user's intention by combining the first dialogue state, the system feedback information, and the following information, and obtaining the user intention recognition result.
14. The method for recognizing user intentions in multiple rounds of dialogue according to claim 12 or 13, wherein the step of identifying the user's intention by combining the first dialogue state, the system feedback information, and the following information, and obtaining the user intention recognition result comprises:

Acquiring the first dialogue state and the character information of the information contained in the system feedback information, and extracting the first keyword set;

Acquiring character information of the information contained in the second dialogue state, and extracting a second keyword set;

Searching for first user instruction information corresponding to the first keyword set;

Searching for second user instruction information corresponding to the second keyword set in the first user instruction information;

Obtain the user intention recognition result according to the second user instruction information.
15. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 14 when executing the computer program.
16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program implements the steps of the method according to any one of claims 1 to 14 when executed by a processor.