WO2021042902A1 - User intention identification method in multi-round dialogue and related device

Publication number
WO2021042902A1
WO2021042902A1 PCT/CN2020/103922 CN2020103922W WO2021042902A1 WO 2021042902 A1 WO2021042902 A1 WO 2021042902A1 CN 2020103922 W CN2020103922 W CN 2020103922W WO 2021042902 A1 WO2021042902 A1 WO 2021042902A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
dialogue
user
dialogue state
state
Prior art date
Application number
PCT/CN2020/103922
Other languages
French (fr)
Chinese (zh)
Inventor
陈涛
张毅
Original Assignee
深圳Tcl数字技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl数字技术有限公司
Publication of WO2021042902A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Definitions

  • the present disclosure relates to the field of voice interaction technology, and in particular to a method and related equipment for recognizing user intentions in multiple rounds of dialogue.
  • the natural language analysis technology in the prior art is generally data-driven and based on machine learning.
  • the dialogue technology based on natural language analysis is divided into single-round dialogue and multi-round dialogue.
  • Multi-round dialogue is a way in which the user's intention is initially clarified in a human-machine dialogue, and then necessary information is obtained to finally obtain a clear user instruction. Multiple rounds of dialogue correspond to the handling of one thing.
  • the multi-round dialogue system has modules such as language understanding, language generation, dialogue management, and knowledge base.
  • Dialogue management also includes state tracking and action selection sub-modules. It can be considered that a multi-round dialogue system is an extension of a single-round dialogue based on analysis. In each round of dialogue, the semantics of the speech is understood and internal representations are generated. Dialogue management uses a finite state machine, which represents the entire process of obtaining information in a dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs tasks.
  • In the existing multi-round dialogue of the prior art, query matching for the next round of dialogue is performed on the search results of the previous round of dialogue.
  • For example, suppose the user's previous round of dialogue is "Peppa Pig"
  • and the next round of dialogue is "horror".
  • The system then searches for information related to "horror" within the search results for "Peppa Pig", even though the two terms are unrelated.
  • As a result, the final analysis of the user's intention is inaccurate, and the system behavior returned on that basis is inaccurate as well, which makes communication between the user and the dialogue device unsmooth
  • or causes the dialogue device to execute the wrong instruction, directly inconveniencing the user of the dialogue device.
  • The present disclosure provides a method and related equipment for recognizing user intentions in multiple rounds of dialogue, which overcomes the defect that the prior art does not consider the relationship between the preceding and following dialogues in multiple rounds of dialogue.
  • In the prior art, the query matching of the next round of dialogue is always performed on the basis of the previous round's search results, so the accuracy of query matching in the subsequent round is low.
  • this embodiment discloses a method for recognizing user intentions in multiple rounds of dialogue, which includes the following steps:
  • the user intention is recognized according to the following information, and the user intention recognition result is obtained.
  • before the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue, the method further includes:
  • the step of separately acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multiple rounds of dialogue includes:
  • the step of identifying the user's intention according to the following information to obtain the user intention recognition result includes:
  • if the correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold, acquiring system feedback information corresponding to the first dialogue state;
  • the user's intention is identified according to the following information, and the user intention recognition result is obtained.
  • before the step of identifying the user's intention according to the following information to obtain the user intention recognition result, the method further includes:
  • the step of calculating the first correlation according to the first slot information and the second slot information includes:
  • the first correlation is calculated according to the size of the edit distance.
  • before the step of identifying the user's intention according to the following information to obtain the user intention recognition result, the method further includes:
  • the step of calculating the second correlation according to the third slot information and the second slot information includes:
  • the second correlation is calculated according to the size of the edit distance.
  • the method further includes:
  • the method further includes:
  • the step of recognizing the user's intention based on the following information to obtain the user's intention recognition result includes:
  • the identification method further includes:
  • if the first correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, the first dialogue state, the system feedback information and the following information are combined to recognize the user's intention and obtain the user intention recognition result.
  • the identification method further includes:
  • if the correlation between the system feedback information and the second dialogue state is greater than or equal to the preset second threshold, the first dialogue state, the system feedback information and the following information are combined to recognize the user's intention and obtain the user intention recognition result.
  • the step of combining the first dialogue state, the system feedback information and the following information to identify the user's intention, and obtaining the user's intention recognition result includes:
  • this embodiment also discloses a computer device, including a memory and a processor, the memory stores a computer program, wherein the steps of the method are implemented when the processor executes the computer program.
  • this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to implement the steps of the method.
  • The first correlation between the first dialogue state and the second dialogue state is calculated and compared with a preset first threshold. If the first correlation is less than the preset first threshold, the user's intention is identified based only on the following information to obtain the user intention recognition result. Because this embodiment considers the relevance of the interactive information between the preceding and following dialogues, when the two differ greatly the latter dialogue is analyzed on its own for user intent. This provides a basis for more accurate analysis results and more accurate feedback to the user.
  • FIG. 1 is a flowchart of the steps of a method for recognizing user intentions in multiple rounds of dialogue in an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the information flow of the multi-round dialogue system.
  • FIG. 3 is a schematic diagram of the principle structure of a multi-round dialogue system.
  • FIG. 4 is a schematic diagram of a framework of an exemplary application scenario in an embodiment of the present disclosure.
  • FIG. 5 is a block diagram of the principle structure of a computer device in an embodiment of the present disclosure.
  • Multi-round dialogue is the process of human-computer interaction. After the user’s intention is initially clarified, the necessary information is obtained to finally obtain a definite user instruction. Multi-round dialogue corresponds to the processing of one thing, which can be expressed as a multitude of interactions between humans and machines. If the user's instructions can be clearly defined in one dialogue, then multiple rounds of dialogue can be expressed as one dialogue interaction between man and machine.
  • Current multi-round dialogue ignores the randomness between utterances. The content before and after in a dialogue may be irrelevant, so whether a decision should be based on the previous dialogue context has to be judged from the relevance of the content before and after. If the content is not relevant but retrieval for the next round is still performed on the basis of the previous round's content when the system feedback behavior is generated, the feedback behavior for the next round may be inaccurate.
  • This embodiment discloses a method for identifying user intentions in multiple rounds of dialogue. By analyzing the relevance of the information between the preceding and following rounds of dialogue, it determines whether the next round of dialogue information needs to be processed on the basis of the query results of the previous round. For example, when the user says "Peppa Pig" and then "horror", the relevance of the two is judged first. If they are not relevant, single-round searches are performed, that is, "Peppa Pig" and "horror" are searched separately. If they are relevant, "Peppa Pig" is searched first, and "horror" is then searched within the search results for "Peppa Pig". Since there is no correlation between "Peppa Pig" and "horror", better and more accurate results can be obtained by using the method disclosed in this embodiment.
  • the method may include the following steps, for example:
  • Step S101 Acquire the first dialogue state of the above information and the second dialogue state of the following information in multiple rounds of dialogue.
  • the first dialogue state corresponding to the above information and the second dialogue state of the following information are obtained respectively.
  • The following information is the voice information sent by the user in the next human-computer interaction of the multiple rounds of dialogue.
  • The above information is the voice information sent by the user in the previous human-computer interaction of the multiple rounds of dialogue,
  • that is, the user's previous voice message.
  • The above information and the following information are consecutive pieces of voice information sent by the user, and both belong to natural-language dialogue. The user can conduct multiple rounds of conversation with the dialogue system in Chinese, English or another natural language.
  • After the dialogue system receives the voice information sent by the user, it helps the user complete a task, which is usually a task of accessing information.
  • the dialogue state includes the text converted from the voice information sent by the user and the information related to the text information analyzed according to the text. After obtaining the above information and the following information, the above information and the following information are respectively subjected to voice recognition and semantic recognition to obtain the first dialogue state of the above information and the second dialogue state of the following information.
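  • A dialogue state as described above bundles the recognized text with the information derived from it. The following is a minimal sketch only; the class name `DialogueState`, the field names `text`, `intent` and `slots`, and the helper `build_state` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    """Hypothetical container for the dialogue state of one round."""
    text: str                     # text converted from the user's voice information
    intent: Optional[str] = None  # assumed label produced by semantic analysis
    slots: Dict[str, str] = field(default_factory=dict)  # information related to the text


def build_state(text: str, analysis: Dict[str, str]) -> DialogueState:
    """Wrap recognized text and the semantic-analysis output into a dialogue state."""
    slots = {key: value for key, value in analysis.items() if key != "intent"}
    return DialogueState(text=text, intent=analysis.get("intent"), slots=slots)


# Example: the first dialogue state for the above information "Peppa Pig".
first_state = build_state("Peppa Pig", {"intent": "video search", "genre": "comedy"})
```

  • Keeping the text and the derived information in one object makes it straightforward to compare two rounds later on, whichever correlation measure is chosen.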
  • the multiple rounds of dialogue include: voice understanding, voice generation, dialogue management, knowledge base search and other steps.
  • Dialogue management also includes steps such as dialogue state tracking and action selection. It can be considered that multiple rounds of dialogue are the expansion of a single round of dialogue based on analysis. In each round of dialogue, the semantics of the speech is understood and internal representations are generated. Dialogue management uses a finite state machine, which represents the entire process of obtaining information in a dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs tasks such as flight information query.
  • The above information sent by the user is obtained first, and voice recognition of it generates the voice recognition result, which is the text information corresponding to the above information. The semantic analysis module then maps the text information to the user's
  • dialogue state, which is the first dialogue state. Similarly, voice recognition is performed on the following information sent by the user to obtain the voice recognition result, and the voice recognition result is mapped to the user's dialogue state to obtain the second dialogue state.
  • In this step, in order to obtain the first dialogue state of the above information, it is first necessary to perform voice recognition on the above information to identify the text information it contains, and then to perform semantic analysis on the recognized text information to obtain the information contained in the text.
  • For semantic analysis, there are generally two processing methods. One is to retrieve information corresponding to the text information, and the other is to generate information corresponding to the text information with a generation-based method.
  • the way to retrieve the information corresponding to the text information generally requires the establishment of a storage database, in which a large amount of dialogue data is stored, and an index is established between the dialogue data and the dialogue keywords. After the contained keywords are identified, the corresponding dialogue data in the database is output according to the keywords, that is, the analyzed first dialogue state corresponding to the text information.
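  • The retrieval-based approach can be pictured as a keyword index over stored dialogue data. The sketch below is illustrative only: the in-memory dictionary `DIALOGUE_INDEX`, its contents, and the naive substring matching are assumptions standing in for the storage database and keyword identification that the text mentions.

```python
from typing import Dict, List

# Hypothetical "storage database": dialogue keyword -> stored dialogue data.
DIALOGUE_INDEX: Dict[str, Dict[str, str]] = {
    "peppa pig": {"domain": "film and television", "genre": "comedy"},
    "weather": {"domain": "weather query", "slot": "geographic location"},
}


def extract_keywords(text: str) -> List[str]:
    """Return every indexed keyword that occurs in the text (naive substring match)."""
    lowered = text.lower()
    return [keyword for keyword in DIALOGUE_INDEX if keyword in lowered]


def retrieve_state(text: str) -> Dict[str, str]:
    """Merge the dialogue data of all matched keywords into one dialogue state."""
    state: Dict[str, str] = {}
    for keyword in extract_keywords(text):
        state.update(DIALOGUE_INDEX[keyword])
    return state
```

  • A real system would likely use a proper search index and tokenization rather than substring matching; the point here is only the keyword-to-dialogue-data lookup.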
  • In the generation-based method, the semantic analysis processing module uses a large amount of data to construct a voice analysis model.
  • After the user inputs a piece of text information, the voice analysis model outputs an analysis result corresponding to the text information.
  • the voice analysis model is constructed on the basis of a large amount of dialogue data based on a deep learning neural network.
  • the result of the analysis of the voice analysis model is the dialogue state corresponding to the above information or the following information.
  • Step S102 If the first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, the user intention is recognized according to the following information, and the user intention recognition result is obtained.
  • The first correlation between the first dialogue state and the second dialogue state is calculated, and it is determined whether the first correlation is less than the preset first threshold. If it is, the correlation between the above information and the following information is judged to be small, and the user's intention can be identified directly from the following information to obtain the user intention recognition result.
  • The step of calculating the first correlation includes:
  • the first correlation is calculated according to the first slot information and the second slot information.
  • the slot information is the information needed to transform the preliminary user's intentions into clear user instructions in the process of multiple rounds of dialogue, and one slot corresponds to a type of information that needs to be acquired in the processing of one thing.
  • Slot information is information that needs to be obtained, but not every slot has to be filled explicitly during the multiple rounds of dialogue. Slot information is divided into required slot information and non-required slot information. Since non-required slot information can be obtained from the context information, it can exist in the form of a default value.
  • For example, weather conditions must be searched based on a geographic location.
  • If the user's location is known from the context to be Beijing,
  • the query corresponding to the dialogue can default to the weather in Beijing, so the system can directly give the feedback: search for the weather in Beijing.
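  • The weather example can be sketched as slot filling with context defaults. Everything named here is hypothetical: the schema `WEATHER_SLOTS`, the slot names `location` and `date`, and the choice of which slot is required are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical slot schema for a weather query: slot name -> is the slot required?
WEATHER_SLOTS: Dict[str, bool] = {"location": True, "date": False}


def fill_slots(parsed: Dict[str, str], context: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill each slot from the utterance first, then fall back to a context default."""
    filled: Dict[str, Optional[str]] = {}
    for name in WEATHER_SLOTS:
        filled[name] = parsed.get(name) or context.get(name)
    return filled


def missing_required(filled: Dict[str, Optional[str]]) -> List[str]:
    """List the required slots that are still empty and would have to be asked for."""
    return [name for name, required in WEATHER_SLOTS.items()
            if required and not filled.get(name)]


# The user named no location; it is taken from the context, as in the Beijing example.
slots = fill_slots({}, {"location": "Beijing", "date": "today"})
```

  • When `missing_required` returns an empty list, the system can answer directly; otherwise another round of dialogue is needed to obtain the missing slot.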
  • the step of calculating the first correlation according to the first slot information and the second slot information includes:
  • the first correlation is calculated according to the size of the edit distance.
  • For example, the previous round's information is combined with the current input to determine the query state of the current round. Suppose the user state in the previous round is determined to be: currency information query. The edit distance is then calculated between the character strings "RMB" and "USD", which correspond to the two slots of the following information, and the slot information string of the previous round's system feedback information, "currency information query". The edit distance is the minimum number of editing operations required to convert the string of the following information into the string of the above information, and the correlation between the two is obtained from it.
  • the algorithmic process of calculating the edit distance between character strings includes:
  • Let d[i,j] (a two-dimensional array can be used to store these values) denote the minimum number of steps required to convert the string s[1...i] to the string t[1...j].
  • The first row of the matrix is initialized to 0 to n, and the first column to 0 to m, where m and n are the lengths of s and t.
  • Matrix[0][j] is the value in row 1 and column j+1. It represents the number of operations required to convert the empty string s[1...0] to t[1...j]. Converting an empty string to a string of length j requires exactly j add operations, so the value of matrix[0][j] is j, and the other values can be derived in the same way.
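  • The matrix procedure described above corresponds to the standard dynamic-programming (Levenshtein) edit distance, sketched below. The `similarity` helper is an assumption: the text says only that the correlation is calculated "according to the size of the edit distance", so normalizing by the longer string length is one possible choice, not the disclosed formula.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insert, delete and substitute steps to turn s into t."""
    m, n = len(s), len(t)
    # d[i][j] = steps needed to convert s[:i] into t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # first column: delete i characters
    for j in range(n + 1):
        d[0][j] = j  # first row: add j characters to an empty string
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character from s
                d[i][j - 1] + 1,         # add a character to s
                d[i - 1][j - 1] + cost,  # substitute, or keep a matching character
            )
    return d[m][n]


def similarity(s: str, t: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest
```

  • With this normalization the result can be compared directly against a threshold between 0 and 1, but any monotone mapping from edit distance to correlation would serve the same purpose.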
  • By comparing the calculated first correlation with the preset first threshold, it is determined whether the user's intention is identified through a single round of dialogue or through multiple rounds of dialogue. If the first correlation is greater than or equal to the preset first threshold, multiple rounds of dialogue are used to recognize the user's intention; if the first correlation is less than the preset first threshold, a single round of dialogue is used.
  • the user's intention is identified according to the following information, and the result of user's intention identification is obtained, including:
  • Step S103 If the correlation between the first dialogue state and the second dialogue state is greater than or equal to a preset first threshold, obtain system feedback information corresponding to the first dialogue state;
  • In the first dialogue state, the machine system that talks to the user automatically feeds back a reply message.
  • The reply message is the system feedback information.
  • The system feedback information is produced by the dialogue management module.
  • The dialogue management module selects the system feedback behavior that needs to be performed, that is, the system feedback information, according to the first dialogue state and the second dialogue state. If the system feedback information requires interaction with the user, the language generation
  • module is triggered to generate natural language or system speech; finally, the generated language is read aloud to the user by the speech synthesis module.
  • the main tasks of dialogue management include: dialogue state maintenance and system decision making.
  • the dialogue state maintenance includes maintaining and updating the dialogue state.
  • The dialogue state at time t+1, $S_{t+1}$, depends on the state $S_t$ at the previous time t, the system behavior $a_t$ at the previous time t, and the user behavior $a_{t+1}$ at the current time t+1. This can be written as $S_{t+1} \leftarrow S_t + a_t + a_{t+1}$.
  • System decision making generates the system feedback behavior from the dialogue state in order to decide what to do next.
  • The system feedback behavior is the feedback the system makes based on the dialogue state derived from the user's input.
  • the input information of the dialogue management model is the user's voice information and the current dialogue state obtained by analyzing the user's voice information, and its output is the next system feedback behavior and updated dialogue status. Therefore, the more semantic information carried in the input information, the more accurate the information fed back by the dialogue management module.
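  • The two tasks of dialogue management, state maintenance and system decision making, can be sketched as a pair of functions. This is a toy illustration under stated assumptions: dialogue states are modeled as plain dictionaries, and the key `last_system_action`, the `episode` slot and the rule in `select_action` are all hypothetical.

```python
from typing import Dict, Tuple

State = Dict[str, str]


def update_state(prev_state: State, system_action: str, user_action: State) -> State:
    """Derive the new state from the previous state, the last system behavior,
    and the current user behavior."""
    new_state = dict(prev_state)                     # start from the previous state
    new_state["last_system_action"] = system_action  # record what the system did
    new_state.update(user_action)                    # overlay what the user just said
    return new_state


def select_action(state: State) -> str:
    """Toy decision rule: ask for the episode until it is known, then search."""
    return "search" if "episode" in state else "ask_episode"


def step(prev_state: State, system_action: str, user_action: State) -> Tuple[State, str]:
    """One round of dialogue management: update the state, then choose the next behavior."""
    state = update_state(prev_state, system_action, user_action)
    return state, select_action(state)
```

  • For instance, `step({"title": "Peppa Pig"}, "ask_episode", {"episode": "3"})` yields a state containing both the title and the episode, and the next behavior is a search.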
  • the corresponding dialogue state includes: film and television, the actors are animals, the genres are comedy and family drama, and other information related to Peppa Pig.
  • The system feedback information corresponding to the dialogue state is: video search. If the following information is "the third episode of the first season", speech recognition and semantic analysis of the following information give the second dialogue state: TV series or cartoon, episode 3, multi-season plot, etc. By calculating the similarity between the first dialogue state and the second dialogue state, it can be obtained that the correlation between them is greater than the preset first threshold, so the system feedback information corresponding to the first dialogue state needs to be obtained.
  • The system feedback is: search for the third episode of the first season of Peppa Pig.
  • the above-mentioned dialogue management module controls the process of man-machine dialogue, and determines the reaction to the user at the moment based on the dialogue history information.
  • the most common multi-round dialogue is task-driven.
  • the user has a clear purpose such as ordering food and ticketing. User needs are more complex and have many restrictions. Therefore, consultation responses with relatively complex content need to be presented in multiple rounds.
  • users can continuously modify or improve their own needs.
  • the machine can also help users find satisfactory results by asking, clarifying, or confirming.
  • the dialogue process is as shown in Figure 4. The user and the system realize information communication through question and answer.
  • The user sends out a voice message, "Hi, I want to order a meal", which transmits a voice command to the system.
  • The system that receives the voice command can be a voice robot or another device that can recognize the user's voice information.
  • The system analyzes the voice command and identifies the key information it contains: restaurant. The system then feeds back the inquiry "What type of food do you like?" together with the feedback dialogue behavior: food.
  • The user sends out another voice message: "I like to eat Gongbao chicken".
  • The system extracts the keyword contained in the voice message, Gongbao chicken, and feeds back a confirmation to the user according to the received information, so that a satisfactory order is finally obtained.
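  • The ordering dialogue can be modeled with a small finite state machine of the kind dialogue management is said to use. The state names, event names and most reply strings below are hypothetical; only the inquiry about the type of food comes from the example.

```python
from typing import Dict, List, Tuple

# Hypothetical transition table: (state, event) -> (next state, system reply).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("start", "order_meal"): ("ask_food", "What type of food do you like?"),
    ("ask_food", "food_named"): ("confirm", "Shall I place the order?"),
    ("confirm", "yes"): ("done", "Your order has been placed."),
}


def advance(state: str, event: str) -> Tuple[str, str]:
    """Follow one transition; stay in place and re-prompt on an unexpected event."""
    return TRANSITIONS.get((state, event), (state, "Sorry, could you repeat that?"))


def run(events: List[str]) -> Tuple[str, List[str]]:
    """Feed a sequence of user events through the machine and collect the replies."""
    state, replies = "start", []
    for event in events:
        state, reply = advance(state, event)
        replies.append(reply)
    return state, replies
```

  • Each user utterance is assumed to have been mapped to an event by speech recognition and semantic analysis before it reaches the state machine.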
  • Step S104 If the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, the user's intention is identified according to the following information, and the user's intention identification result is obtained.
  • If the correlation between the second dialogue state and the system feedback information corresponding to the first dialogue state acquired in step S103 is less than the preset second threshold, it is determined that the previous round of dialogue information has low relevance to the next round of dialogue information, and only the following information is used to identify the user's intention. Otherwise, it is determined that the previous round of dialogue information is highly relevant to the next round, and the following information is combined with the relevant content of the above information to recognize the user's intention.
  • Before step S104, in which the user's intention is identified according to the following information and the user intention recognition result is obtained if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method also includes:
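  • Steps S102 to S104 amount to a two-stage threshold check, sketched below. The function name `choose_mode`, the return labels and the default threshold values of 0.5 are placeholders; the disclosure does not give concrete threshold values.

```python
def choose_mode(first_corr: float, second_corr: float,
                first_threshold: float = 0.5, second_threshold: float = 0.5) -> str:
    """Return "single" or "multi" following steps S102 to S104.

    first_corr:  correlation between the first and second dialogue states.
    second_corr: correlation between the system feedback information and the
                 second dialogue state (only consulted when the first check passes).
    """
    if first_corr < first_threshold:
        return "single"  # S102: use only the following information
    if second_corr < second_threshold:
        return "single"  # S103 + S104: feedback is unrelated to the following information
    return "multi"       # both correlations meet their thresholds: combine the rounds
```

  • In a real system the second correlation would be computed only after the first check passes, since it requires obtaining the system feedback information first; it is passed in eagerly here to keep the sketch short.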
  • the second correlation between the system feedback information and the second dialogue state is calculated according to the obtained slot information.
  • the step of calculating the second correlation according to the third slot information and the second slot information includes:
  • the second correlation is calculated according to the size of the edit distance.
  • the calculation principle is the same as the calculation principle of the correlation between the first slot information and the second slot information in the above steps.
  • Before the step of calculating the first correlation between the first dialogue state and the second dialogue state, the method further includes:
  • judging whether the slots contained in the first dialogue state and the second dialogue state are completely filled. If they are not, they are filled completely, and then the correlation between the two is calculated.
  • the method further includes:
  • If the slots corresponding to the system feedback information and/or the second dialogue state are not completely filled, the calculation accuracy of the correlation may be low. Therefore, in the above steps, it is judged whether the slots contained in the system feedback information and the second dialogue state are completely filled. If they are not, they are filled completely, and then the correlation between the two is calculated.
  • the step of identifying the user's intention by combining the first dialogue state, the system feedback information and the following information, and obtaining the user's intention recognition result includes:
  • When the correlation between the first dialogue state and the second dialogue state and/or the correlation between the system feedback information and the second dialogue state meets the preset threshold condition,
  • the user's intention is first recognized according to the first dialogue state and the system feedback information corresponding to the first dialogue state, and the search results for the above information are obtained.
  • The following information is then searched within the search results of the above information, so as to feed back the user instruction and the search results corresponding to the above information and the following information sent by the user.
  • When a single round of dialogue is used to identify the user's intention, only the following
  • information is used to identify the user's intention, and the user intention recognition result is obtained.
  • the step of recognizing the user's intention according to the following information and obtaining the result of the user's intention recognition includes:
  • If the user sends out the same voice information twice in a row, the dialogue system would repeatedly calculate the correlation for the same voice information, which increases the system's information processing load. To prevent this,
  • before the steps of acquiring the first dialogue state of the above information, the system feedback information corresponding to the first dialogue state, and the second dialogue state of the following information, the method further includes:
  • Step H1 first determine whether the above information is the same as the following information, if they are the same, perform step H2, otherwise, perform step H3;
  • Step H2 reacquire the following information.
  • Step H3 acquiring the first dialogue state of the above information and the second dialogue state of the following information
  • Step H4 Calculate the first correlation between the first dialogue state and the second dialogue state; determine whether the first correlation is less than a preset first threshold; if it is less, go to step H5, otherwise go to step H6;
  • Step H5 Obtain the system feedback information corresponding to the first dialogue state, calculate the second correlation between the system feedback information and the second dialogue state, and determine whether the second correlation is lower than the preset second threshold; if yes, execute step H7; if not, execute step H8;
  • Step H6 It is determined that there is a correlation between the two rounds of dialogue. Enter multi-round dialogue, obtain the system feedback information corresponding to the first dialogue state, and recognize the user's intention by combining the first dialogue state, the system feedback information, and the second dialogue state.
  • Step H7 This conversation can be used as a single-round conversation to identify the user's intention.
  • Step H8 This dialogue performs multiple rounds of dialogue, and the user's intention needs to be identified in combination with the first dialogue state, system feedback information, and second dialogue state.
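  • Steps H1 to H8 can be read as the control flow sketched below. The function name `recognize_mode`, the return labels, the default thresholds and the use of callables for the two correlations are assumptions; the callables stand in for step H3 and for the correlation calculations, so that the second correlation is computed only when step H5 is reached.

```python
from typing import Callable, Optional


def recognize_mode(above: str, following: str,
                   first_corr: Callable[[], float],
                   second_corr: Callable[[], float],
                   first_threshold: float = 0.5,
                   second_threshold: float = 0.5) -> Optional[str]:
    """Walk through steps H1 to H8 and report how the user's intention is recognized.

    Returns None when the following information must be reacquired (H2),
    otherwise "single" (H7) or "multi" (H6 or H8).
    """
    if above == following:               # H1: identical voice information
        return None                      # H2: reacquire the following information
    # H3 (acquiring both dialogue states) is assumed to happen inside the callables.
    if first_corr() >= first_threshold:  # H4: the two rounds are correlated
        return "multi"                   # H6
    if second_corr() < second_threshold: # H5: compare feedback with the second state
        return "single"                  # H7
    return "multi"                       # H8
```

  • In this flow, as the steps are listed, multi-round recognition is chosen when either correlation reaches its threshold, and a single round is used only when both fall below.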
  • the corresponding dialogue state includes: film and television, the actors are animals, the genres are comedy and family drama, and other information related to Peppa Pig.
  • The system feedback information corresponding to the dialogue state is: video search. If the following information is "what's the weather today", speech recognition and semantic analysis of the following information give the second dialogue state: geographic location, today, temperature, rain, etc.
  • By calculating the similarity between the first dialogue state and the second dialogue state, it can be obtained that the correlation between them is lower than the preset first threshold, so it is judged that the following information is not related to the above information. The system feedback for the following information therefore cannot be based on the results of the above information. A new search must be made according to the second dialogue state, and the system feedback information for the second dialogue state is: search for today's weather at the user's current location.
  • the corresponding dialogue state content includes: movies, with the genres comedy and family drama, and the system feedback information corresponding to the dialogue state is: movie search.
  • the following information is: Peppa Pig; the corresponding dialogue state content includes: film and television, the actors are animals, the genres are comedy and family drama, and other information related to Peppa Pig; the system feedback information corresponding to this dialogue state is: comedy movie search.
  • this embodiment also discloses a computer device, as shown in FIG. 5, including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method when executing the computer program.
  • this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the steps of the method are implemented when the computer program is executed by a processor.
  • a computer device can be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to perform the above methods.
  • the correlation between the dialogue states corresponding to the voice information of the former and latter rounds is calculated, and it is determined whether the correlation exceeds a certain threshold. If it does not, the user's intention is identified from the dialogue state of the latter round alone. If it does, it is further determined whether the correlation between the system behavior of the former round and the dialogue state of the latter round exceeds a certain threshold. If it does not, the user's intention is identified from the dialogue state of the latter round alone; if it does, the user's intention is identified by combining the dialogue state of the former round with the second dialogue state.
  • because this embodiment fully considers the correlation of information between the former and latter rounds of dialogue, when the information in the two differs greatly, the latter round is analyzed as an independent piece of information, so that a more accurate analysis result can be obtained, providing a basis for accurate feedback on information sent by users.

Abstract

Disclosed in the present disclosure are a user intention identification method in multi-round dialogue and a related device. Said method comprises: acquiring a first dialogue state of previous information and a second dialogue state of following information in former and latter rounds of dialogues in multi-round dialogue; and calculating a first correlation between the first dialogue state and the second dialogue state, and determining, according to the magnitude of the first correlation, whether to perform user intention identification of a single-round dialogue. As the correlation of information between former and latter rounds of dialogues is fully considered in the present embodiment, if information of the two is greatly different, a user intention analysis is performed by using the latter round of dialogue as an independent piece of information, so that a more accurate analysis result can be obtained, thereby providing a basis for performing accurate feedback on information sent by a user.

Description

Method and related device for identifying user intention in multi-round dialogue
Priority
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on September 4, 2019, with application number 2019108334041 and titled "Method and related device for identifying user intention in multi-round dialogue", the entire contents of which are incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of voice interaction technology, and in particular to a method and related device for identifying user intention in multi-round dialogue.
Background
Natural language parsing in the prior art is generally data-driven and based on machine learning. Dialogue technology based on natural language parsing is divided into single-round dialogue and multi-round dialogue.
Multi-round dialogue is a way, in human-machine dialogue, of obtaining the necessary information after the user's intention has been preliminarily identified, so as to finally arrive at a clear user instruction. A multi-round dialogue corresponds to the handling of one matter. A multi-round dialogue system has modules such as language understanding, language generation, dialogue management, and a knowledge base. Dialogue management in turn includes state tracking and action selection sub-modules. A multi-round dialogue system can be regarded as an extension of analysis-based single-round dialogue: in each round, the utterance is semantically interpreted and an internal representation is produced. Dialogue management uses a finite state machine to represent the whole process of obtaining information in the dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs the task.
However, existing multi-round dialogue in the prior art always performs query matching for the latter round on the basis of the search for the former round. For example, when the user's former round is "Peppa Pig" and the latter round is "horror", the system first searches for "Peppa Pig" and then searches the Peppa Pig results for information related to "horror". Because the two terms are unrelated, the user intention finally derived is inaccurate. This in turn makes the system's returned behavior inaccurate, which leads to poor communication between the user and the dialogue device or to incorrectly executed instructions, and directly inconveniences the user.
Therefore, the prior art needs further improvement.
Summary of the disclosure
In view of the above deficiencies in the prior art, the present disclosure provides a method and related device for identifying user intention in multi-round dialogue, overcoming the defect that prior-art multi-round dialogue does not consider the relationship between the former and latter rounds and always performs query matching for the latter round on the basis of the search for the former round, which results in low accuracy of query matching for the latter round.
In a first aspect, this embodiment discloses a method for identifying user intention in multi-round dialogue, including the following steps:
acquiring a first dialogue state of the previous information and a second dialogue state of the following information in a multi-round dialogue;
if a first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, identifying the user intention according to the following information to obtain a user intention identification result.
In one embodiment, before the step of acquiring the first dialogue state of the previous information and the second dialogue state of the following information in the multi-round dialogue, the method further includes:
determining whether the voice information corresponding to the previous information and to the following information is the same;
if so, acquiring the dialogue information of the round after the following information, and using the newly acquired dialogue information as the following information.
In one embodiment, the step of acquiring the first dialogue state of the previous information and the second dialogue state of the following information in the multi-round dialogue includes:
acquiring the previous information and the following information in the multi-round dialogue;
performing speech recognition and language analysis on the previous information and the following information respectively, to obtain the first dialogue state and the second dialogue state.
In one embodiment, the step of identifying the user intention according to the following information to obtain the user intention identification result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold includes:
if the correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, acquiring the system feedback information corresponding to the first dialogue state;
if the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, identifying the user intention according to the following information to obtain the user intention identification result.
In one embodiment, before the step of identifying the user intention according to the following information to obtain the user intention identification result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold, the method further includes:
acquiring first slot information of the first dialogue state;
acquiring second slot information of the second dialogue state;
calculating the first correlation according to the first slot information and the second slot information;
determining whether the first correlation is less than the preset first threshold.
In one embodiment, the step of calculating the first correlation according to the first slot information and the second slot information includes:
acquiring the character strings contained in each slot of the first slot information, and merging them into first character string information;
acquiring the character strings contained in each slot of the second slot information, and merging them into second character string information;
calculating the edit distance between the character strings in the first character string information and the second character string information;
calculating the first correlation according to the magnitude of the edit distance.
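The slot-merging and edit-distance steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how the edit distance is turned into a correlation, so the normalization `1 - distance / max_length` is an assumption, as are the function names and the choice to join slot values in slot-name order.

```python
from typing import Mapping


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def merge_slots(slots: Mapping[str, str]) -> str:
    """Merge the strings of all slots into one string (order is an assumption)."""
    return " ".join(slots[name] for name in sorted(slots))


def correlation(first_slots: Mapping[str, str],
                second_slots: Mapping[str, str]) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical merged strings.

    The normalization is an assumption made for this sketch.
    """
    s1, s2 = merge_slots(first_slots), merge_slots(second_slots)
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s1, s2) / longest
```

A caller would compare the returned value with the preset first threshold; the same function could be reused for the second correlation by passing the slot information of the system feedback instead of the first dialogue state.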
In one embodiment, before the step of identifying the user intention according to the following information to obtain the user intention identification result if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method further includes:
acquiring third slot information of the system feedback information;
acquiring the second slot information of the second dialogue state;
calculating the second correlation according to the third slot information and the second slot information;
determining whether the second correlation is less than the preset second threshold.
In one embodiment, the step of calculating the second correlation according to the third slot information and the second slot information includes:
acquiring the character strings contained in each slot of the third slot information, and merging them into third character string information;
acquiring the character strings contained in each slot of the second slot information, and merging them into second character string information;
calculating the edit distance between the character strings in the third character string information and the second character string information;
calculating the second correlation according to the magnitude of the edit distance.
In one embodiment, after the step of acquiring the first slot information of the first dialogue state and the second slot information of the second dialogue state, the method further includes:
determining whether the slots contained in the first dialogue state and/or the second dialogue state are completely filled;
if not, acquiring the keyword information missing from the incompletely filled slots in the first dialogue state and/or the second dialogue state, and completely filling the slots contained in the first dialogue state and/or the second dialogue state according to the acquired keyword information.
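The completeness check and filling step above might look like the sketch below. The text does not say where the missing keyword information comes from, so the `lookup` callable is a hypothetical stand-in for that source, and the function names are illustrative only.

```python
from typing import Callable, Dict, List, Optional


def unfilled(state: Dict[str, str], slot_names: List[str]) -> List[str]:
    """Names of slots that are absent or empty in the dialogue state."""
    return [name for name in slot_names if not state.get(name)]


def complete(state: Dict[str, str],
             slot_names: List[str],
             lookup: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Return a copy of the state with missing slots filled where possible.

    `lookup` is a hypothetical source of keyword information for a slot name;
    slots it cannot supply are left unfilled.
    """
    completed = dict(state)
    for name in unfilled(state, slot_names):
        keyword = lookup(name)
        if keyword:
            completed[name] = keyword
    return completed
```

Returning a copy rather than mutating the input keeps the original dialogue state available for later comparison.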
In one embodiment, after the step of acquiring the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further includes:
determining whether the slots contained in the system feedback information and/or the second dialogue state are completely filled;
if not, acquiring the keyword information missing from the incompletely filled slots in the system feedback information and/or the second dialogue state, and completely filling the slots contained in the system feedback information and/or the second dialogue state according to the acquired keyword information.
In one embodiment, the step of identifying the user intention according to the following information to obtain the user intention identification result includes:
acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
determining second user instruction information corresponding to the second keyword set;
obtaining the user intention identification result according to the second user instruction information.
In one embodiment, the identification method further includes:
if the first correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, identifying the user intention by combining the first dialogue state, the system feedback information, and the following information, to obtain the user intention identification result.
In one embodiment, the identification method further includes:
if the correlation between the system feedback information and the second dialogue state is greater than or equal to the preset second threshold, identifying the user intention by combining the first dialogue state, the system feedback information, and the following information, to obtain the user intention identification result.
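Taken together, the embodiments above describe a two-stage threshold check. The sketch below follows the variant in which the second check is made only when the first correlation reaches the first threshold. The similarity function (a toy overlap of slot values), the threshold values, and all names are assumptions chosen only to make the example runnable; the method itself derives the correlations from slot information and edit distance.

```python
from typing import Callable, Dict

State = Dict[str, str]


def value_overlap(a: State, b: State) -> float:
    """Toy similarity: Jaccard overlap of slot values (an assumption)."""
    va, vb = set(a.values()), set(b.values())
    union = va | vb
    return len(va & vb) / len(union) if union else 1.0


def choose_mode(first_state: State,
                feedback: State,
                second_state: State,
                similarity: Callable[[State, State], float] = value_overlap,
                first_threshold: float = 0.3,
                second_threshold: float = 0.3) -> str:
    """Return "single" or "multi" for the latter round of dialogue."""
    # Stage 1: former-round state versus latter-round state.
    if similarity(first_state, second_state) < first_threshold:
        return "single"
    # Stage 2: system feedback for the former round versus latter-round state.
    if similarity(feedback, second_state) < second_threshold:
        return "single"
    # Both correlations reach their thresholds: combine the context.
    return "multi"
```

With "single", only the following information is used to identify the intention; with "multi", the first dialogue state, the system feedback information, and the following information are combined.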
In one embodiment, the step of identifying the user intention by combining the first dialogue state, the system feedback information, and the following information to obtain the user intention identification result includes:
acquiring the character information of the information contained in the first dialogue state and the system feedback information, and extracting a first keyword set;
acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
searching for first user instruction information corresponding to the first keyword set;
searching, within the first user instruction information, for second user instruction information corresponding to the second keyword set;
obtaining the user intention identification result according to the second user instruction information.
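The combined search above, where the second keyword set is matched only within the results for the first keyword set, can be illustrated with a small in-memory filter. The catalogue entries, field names, and matching rule (every keyword must equal the title or a tag, ignoring case) are hypothetical and serve only to show the narrowing behavior.

```python
from typing import Any, Dict, Iterable, List

Item = Dict[str, Any]


def matches(item: Item, keywords: Iterable[str]) -> bool:
    """True if every keyword equals the item's title or one of its tags."""
    terms = {item["title"].lower()} | {tag.lower() for tag in item["tags"]}
    return all(keyword.lower() in terms for keyword in keywords)


def search(items: List[Item], keywords: Iterable[str]) -> List[Item]:
    keywords = list(keywords)
    return [item for item in items if matches(item, keywords)]


def combined_search(catalog: List[Item],
                    first_keywords: Iterable[str],
                    second_keywords: Iterable[str]) -> List[Item]:
    """Search by the first keyword set, then refine within those hits."""
    first_hits = search(catalog, first_keywords)
    return search(first_hits, second_keywords)


# Hypothetical catalogue used only for illustration.
CATALOG: List[Item] = [
    {"title": "Peppa Pig", "tags": ["comedy", "family drama", "movie"]},
    {"title": "Night Shift", "tags": ["horror", "movie"]},
]
```

Searching first for `["comedy", "movie"]` and then for `["Peppa Pig"]` narrows the hits to the one matching entry, whereas an unrelated second keyword set such as `["horror"]` leaves nothing, which is the failure mode the correlation check is meant to avoid.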
In a second aspect, this embodiment also discloses a computer device, including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method when executing the computer program.
In a third aspect, this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the steps of the method are implemented when the computer program is executed by a processor.
Compared with the prior art, the embodiments of the present disclosure have the following advantages:
According to the method provided by the embodiments of the present disclosure, the first dialogue state of the previous information and the second dialogue state of the following information in the former and latter rounds of a multi-round dialogue are acquired; the first correlation between the first dialogue state and the second dialogue state is calculated and compared with the preset first threshold; and if the first correlation is less than the preset first threshold, the user intention is identified according to the following information alone to obtain the user intention identification result. Because this embodiment fully considers the correlation of the interaction information between the former and latter rounds, when that information differs greatly the latter round is analyzed as an independent piece of information, so that a more accurate analysis result can be obtained, providing a basis for accurate feedback on the information sent by the user.
Description of the drawings
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative work.
FIG. 1 is a flowchart of the steps of a method for identifying user intention in multi-round dialogue in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the information flow of a multi-round dialogue system;
FIG. 3 is a schematic diagram of the principle structure of a multi-round dialogue system;
FIG. 4 is a schematic framework diagram of an exemplary application scenario in an embodiment of the present disclosure;
FIG. 5 is a block diagram of the principle structure of a computer device in an embodiment of the present disclosure.
Detailed description
To help those skilled in the art better understand the solutions of the present disclosure, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present disclosure, without creative work, fall within the protection scope of the present disclosure.
Multi-round dialogue is a way, in human-machine interaction, of obtaining the necessary information after the user's intention has been preliminarily identified, so as to finally arrive at a definite user instruction. A multi-round dialogue corresponds to the handling of one matter and may appear as several dialogue exchanges between human and machine; if a single exchange is enough to determine the user instruction, the multi-round dialogue may appear as one exchange.
Research shows that current multi-round dialogue ignores the randomness between utterances. Whether a decision should be made on the basis of the earlier dialogue context needs to be judged from the correlation between the earlier and later content, because the earlier and later content of a multi-round dialogue may be unrelated. If they are unrelated and the system feedback for the latter round is still produced by searching on the basis of the former round's content, the feedback for the latter round may be inaccurate.
This embodiment discloses a method for identifying user intention in multi-round dialogue. By analyzing the correlation of the information between the former and latter rounds, it determines whether the query for the latter round needs to be performed on the basis of the query results of the former round. For example, when the user first says "Peppa Pig" and then says "horror", the correlation between the two is judged first. If they are unrelated, single-round searches are performed, that is, "Peppa Pig" and "horror" are searched separately; if they are related, "Peppa Pig" is searched first and "horror" is then searched within the "Peppa Pig" results. Since "Peppa Pig" and "horror" are unrelated, the method disclosed in this embodiment yields a more accurate result.
Exemplary method
Referring to FIG. 1, a method for identifying user intention in multi-round dialogue in an embodiment of the present disclosure is shown. In this embodiment, the method may include, for example, the following steps:
Step S101: Acquire the first dialogue state of the previous information and the second dialogue state of the following information in a multi-round dialogue.
In a multi-round dialogue, the first dialogue state corresponding to the previous information sent by the user and the second dialogue state of the following information are acquired respectively. The following information is the voice information sent by the user in the later human-machine interaction of the multi-round dialogue, and the previous information is the voice information sent by the user in the interaction immediately before it. The previous information and the following information are the user's previous and next utterances, and both are natural speech: the user may converse with the dialogue system over multiple rounds in Chinese, English, or another natural language. After receiving the voice information sent by the user, the dialogue system helps the user complete a task, which is usually an information-access task.
The dialogue state contains the text converted from the voice information sent by the user, together with information related to that text obtained by analyzing it. After the previous information and the following information are acquired, speech recognition and semantic recognition are performed on each to obtain the first dialogue state of the previous information and the second dialogue state of the following information.
As shown in FIG. 2, the multi-round dialogue includes steps such as speech understanding, speech generation, dialogue management, and knowledge base search. Dialogue management in turn includes steps such as dialogue state tracking and action selection. Multi-round dialogue can be regarded as an extension of analysis-based single-round dialogue: in each round, the utterance is semantically interpreted and an internal representation is produced. Dialogue management uses a finite state machine to represent the whole process of obtaining information in the dialogue. After several rounds of dialogue, the system gradually obtains the required information and performs a task, such as a flight information query.
With reference to FIG. 3, the previous information sent by the user is first acquired, and speech recognition is performed on it to produce a speech recognition result, that is, the text information corresponding to the previous information; the semantic parsing module maps the text information to a user dialogue state, namely the first dialogue state. Likewise, speech recognition is performed on the following information sent by the user to obtain a speech recognition result, which is then mapped to a user dialogue state to obtain the second dialogue state.
In this step, to obtain the first dialogue state of the previous information, speech recognition is first performed on the previous information to identify the text it contains, and semantic analysis is then performed on the recognized text to obtain the information it carries. Semantic analysis can generally be handled in two ways: retrieval-based, in which information corresponding to the text is retrieved, and generation-based, in which information corresponding to the text is generated. The retrieval-based approach generally requires a storage database holding a large amount of dialogue data, with an index between the dialogue data and dialogue keywords; after the keywords in the text are identified, the corresponding dialogue data in the database is output according to the keywords, and this is the first dialogue state corresponding to the text. In the generation-based approach, the semantic analysis module builds a speech analysis model from a large amount of data; when a piece of text is input, the model outputs the corresponding analysis result. The speech analysis model is built by a deep-learning neural network on the basis of a large amount of dialogue data. The result produced by the model is the dialogue state corresponding to the previous information or the following information.
The same speech recognition and semantic analysis methods are applied to the following information, yielding the first dialogue state corresponding to the previous information and the second dialogue state corresponding to the following information.
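The retrieval-based route described above (keywords indexed against stored dialogue data) can be sketched with an in-memory dictionary standing in for the storage database. The index contents, the substring matching rule, and the function name are assumptions for illustration; a real system would use a proper keyword extractor and database, and speech recognition is assumed to have already produced the text.

```python
from typing import Dict

# Hypothetical keyword index: keyword -> stored information for that keyword.
INDEX: Dict[str, Dict[str, str]] = {
    "peppa pig": {"domain": "film and television",
                  "genre": "comedy, family drama"},
    "weather": {"domain": "weather", "date": "today"},
}


def to_dialogue_state(text: str,
                      index: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Build a dialogue state from recognized text by keyword lookup.

    The state keeps the text itself plus whatever the index stores for
    each keyword found in the text (case-insensitive substring match).
    """
    state = {"text": text}
    lowered = text.lower()
    for keyword, info in index.items():
        if keyword in lowered:
            state.update(info)
    return state
```

Applying the same function to the previous and the following utterance gives the two dialogue states whose correlation is then evaluated in step S102.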
Step S102: If the first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, the user intention is recognized according to the following information to obtain a user intention recognition result.
After the first dialogue state and the second dialogue state are obtained in the above step, the first correlation between them is calculated and compared with the preset first threshold. If the first correlation is less than the threshold, the correlation between the above information and the following information is judged to be small, and the user intention can be recognized directly from the following information to obtain the user intention recognition result.
Specifically, since the correlation between dialogue states is calculated as the correlation between the corresponding slot information contained in those dialogue states, the step of calculating the first correlation between the first dialogue state and the second dialogue state includes:
Acquiring first slot information of the first dialogue state;
Acquiring second slot information of the second dialogue state;
Calculating the first correlation according to the first slot information and the second slot information.
The slot information is the information needed to turn a preliminary user intention into an explicit user instruction over the course of a multi-round dialogue; one slot corresponds to one type of information that must be acquired to handle a task. Although slot information is information that must be obtained, not every slot has to be filled by the user during the multi-round dialogue: slot information is divided into required slot information and non-required slot information. Because non-required slot information can be derived from context information, it can exist in the form of a default value.
For example, consider the utterance "How is the weather today?" Weather differs from region to region, so a weather search requires a geographic location. However, since the user's location is known to be Beijing, the dialogue can be treated by default as a query about the weather in Beijing, and the system can respond directly with the feedback "search for the weather in Beijing".
Specifically, the step of calculating the first correlation according to the first slot information and the second slot information includes:
Acquiring first character string information and second character string information from the information contained in each slot of the first slot information and the second slot information, respectively;
Calculating the edit distance between each character string in the first character string information and each character string in the second character string information;
Calculating the first correlation according to the size of the edit distance.
For example, in a multi-round dialogue, the following information is "What is the exchange rate of RMB against the US dollar?" The slot information contained in the following information is "query (slot 1 = RMB, slot 2 = USD)", and this form is used as the input to the dialogue management module. The state tracking module then combines this input with the information from the previous round to determine the query state of the current round. If the user state from the previous round is determined to be "currency information query", the edit distance is calculated between the character strings "RMB" and "USD" corresponding to the two slots of the following information and the slot information character string "currency information query" corresponding to the previous round's system feedback information. This gives the minimum number of edit operations needed to convert the character string information of the following information into the character string of the above information, from which the correlation between the two is obtained.
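The slot-based calculation in this example can be sketched in Python as follows. The text does not state how an edit distance is mapped to a correlation value, so the normalization 1 - distance / max(length) and the averaging over all pairs of slot strings are assumptions made only for illustration, as are the names `edit_distance` and `slot_correlation`.

```python
from itertools import product


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t, keeping only one previous row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # replace, or keep if equal
        prev = curr
    return prev[-1]


def slot_correlation(first_slots: dict, second_slots: dict) -> float:
    """Mean normalized similarity over every pair of slot strings.

    Assumption: similarity = 1 - distance / length of the longer string.
    """
    pairs = list(product(first_slots.values(), second_slots.values()))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b))
        total += 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
    return total / len(pairs)


# Slot strings from the exchange-rate example (English renderings).
previous_round = {"state": "currency information query"}
following_round = {"slot1": "RMB", "slot2": "USD"}
first_correlation = slot_correlation(previous_round, following_round)
```

The resulting value lies between 0 and 1 and would then be compared with the preset first threshold.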
Specifically, the algorithm for calculating the edit distance between character strings is as follows.
Let d[i,j] (which can be stored in a two-dimensional array) denote the minimum number of steps required to convert the string s[1..i] into the string t[1..j]. In the base cases, when i equals 0 the string s is empty, so d[0,j] = j, because j characters must be added to turn s into t; when j equals 0 the string t is empty, so d[i,0] = i, because i characters must be removed to turn s into t.
For the general case we use dynamic programming. To convert s[1..i] into t[1..j] with the minimum number of insert, delete, or replace operations, there must be an earlier state, itself reached with the minimum number of such operations, from which at most one further operation completes the conversion from s[1..i] to t[1..j]. That earlier state is one of the following three cases:
1) s[1..i] can be converted into t[1..j-1] in k operations;
2) s[1..i-1] can be converted into t[1..j] in k operations;
3) s[1..i-1] can be converted into t[1..j-1] in k operations.
In case 1, we only need to append t[j] at the end to complete the match, so k+1 operations are required in total.
In case 2, we only need to remove s[i] and then perform the k operations, so k+1 operations are required in total.
In case 3, we only need to replace s[i] with t[j] so that s[1..i] == t[1..j], which also requires k+1 operations in total. If s[i] happens to equal t[j], the process completes in only k operations.
Finally, to ensure that the number of operations is always minimal, we take the cheapest of the three cases as the minimum number of operations required to convert s[1..i] into t[1..j].
The basic steps of the algorithm:
(1) Construct a matrix with n+1 rows and m+1 columns to store the number of operations needed for each conversion; the number of operations needed to convert the string s[1..n] into the string t[1..m] is the value of matrix[n][m];
(2) Initialize the first row of the matrix to 0 through m and the first column to 0 through n.
matrix[0][j] is the value in row 1, column j+1. It is the number of operations needed to convert the empty string s[1..0] into t[1..j]. Converting an empty string into a string of length j clearly takes j add operations, so matrix[0][j] is j; the other boundary values follow in the same way.
(3) Iterate over each character s[i], for i from 1 to n;
(4) Iterate over each character t[j], for j from 1 to m;
(5) Compare s[i] with t[j]: if they are equal, set cost to 0; otherwise set cost to 1;
(6) a. If s[1..i-1] can be converted into t[1..j] in k operations, that is, d[i-1,j] = k, then s[i] can be removed and those k operations performed, for k+1 operations in total.
(6) b. If s[1..i] can be converted into t[1..j-1] in k operations, that is, d[i,j-1] = k, then t[j] can be appended, for k+1 operations in total.
(6) c. If s[1..i-1] can be converted into t[1..j-1] in k operations, that is, d[i-1,j-1] = k, then s[i] can be converted into t[j] so that s[1..i] == t[1..j], for k+cost operations in total. (The cost term is used because if s[i] already equals t[j], no replacement is needed and k operations suffice; if they differ, one more replacement is needed, giving k+1 operations.)
Since we want the minimum number of operations, the operation counts of these three cases are compared and the smallest is taken as the value of d[i,j], that is, matrix[i][j];
Steps (3), (4), (5) and (6) are repeated, and the final result is d[n,m].
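The steps above correspond to the standard Levenshtein dynamic program. The following Python sketch fills the full matrix so that each numbered step is visible; the function name `edit_distance_matrix` is chosen here for illustration, and Python's 0-based indexing means that s[i] in the text is `s[i - 1]` in the code.

```python
def edit_distance_matrix(s: str, t: str) -> int:
    """Minimum number of insert, delete and replace operations turning s into t."""
    n, m = len(s), len(t)

    # (1) matrix with n+1 rows and m+1 columns; d[i][j] is the cost of
    #     converting the first i characters of s into the first j of t.
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # (2) boundaries: delete i characters, or add j characters.
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j

    # (3), (4) visit every pair of characters.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # (5) replacement is free when the characters already match.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # (6) cheapest of: remove s[i], append t[j], replace s[i] by t[j].
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)

    return d[n][m]
```

Time and memory are both proportional to (n+1)(m+1); when only the distance is needed, keeping just the previous row reduces the memory to m+1 entries.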
The first correlation is compared with the preset first threshold to decide whether the user intention is recognized by single-round or multi-round dialogue: if the first correlation is greater than the preset first threshold, multi-round dialogue is used to recognize the user intention; if the first correlation is less than the preset first threshold, single-round dialogue is used.
In an implementation of this embodiment, in order to obtain a more accurate correlation judgment, the step of recognizing the user intention according to the following information to obtain a user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold includes:
Step S103: If the correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, acquire the system feedback information corresponding to the first dialogue state;
For the first dialogue state, the machine system in dialogue with the user automatically returns a reply. This reply is the system feedback information, which is produced by the dialogue management module. As shown in Figure 3, the dialogue management module selects the system feedback behavior to execute, that is, the system feedback information, according to the first dialogue state and the second dialogue state. If the system feedback information requires interaction with the user, the language generation module is triggered to generate natural language, also called a system utterance. Finally, the generated language is read aloud to the user by the speech synthesis module.
The tasks of dialogue management mainly include dialogue state maintenance and system decision generation. Dialogue state maintenance consists of maintaining and updating the dialogue state. For example, the dialogue state $S_{t+1}$ at time t+1 depends on the state $S_t$ at the previous time t, the system behavior $a_t$ at time t, and the user behavior $a_{t+1}$ at the current time t+1, which can be written as $S_{t+1} \leftarrow S_t + a_t + a_{t+1}$. System decision generation produces the system feedback behavior from the dialogue state maintained by dialogue state tracking and decides what to do next; the system feedback behavior is the response the system makes to the dialogue state derived from the user's input. The input to the dialogue management model is therefore the user's voice information together with the current dialogue state obtained by analyzing it, and its output is the next system feedback behavior and the updated dialogue state. Accordingly, the more semantic information the input carries, the more accurate the information fed back by the dialogue management module.
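One possible reading of the state update, in which the next state is built from the previous state, the previous system behavior and the current user behavior, is sketched below in Python. The text does not define the "+" operation, so treating it as a merge of slot values is an assumption; the `DialogueState` class, the `update_state` function and the dictionary layout of the actions are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)    # slot name -> value
    history: list = field(default_factory=list)  # actions seen so far


def update_state(state: DialogueState, system_action: dict,
                 user_action: dict) -> DialogueState:
    """Return the state for time t+1 without modifying the state for time t.

    Assumption: '+' means merging slots, with later actions overriding
    earlier values for the same slot.
    """
    slots = dict(state.slots)
    slots.update(system_action.get("slots", {}))
    slots.update(user_action.get("slots", {}))
    history = state.history + [system_action, user_action]
    return DialogueState(slots=slots, history=history)


s_t = DialogueState(slots={"domain": "video"})
a_t = {"act": "search", "slots": {"title": "Peppa Pig"}}
a_t1 = {"act": "inform", "slots": {"season": "1", "episode": "3"}}
s_t1 = update_state(s_t, a_t, a_t1)
```

Returning a new object keeps the earlier state available for the correlation checks between adjacent rounds.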
For example, when the above information is "I want to watch Peppa Pig", the corresponding dialogue state includes: film and television, animal characters, the genres comedy and family, and other information related to Peppa Pig; the system feedback information corresponding to this dialogue state is "video search". If the following information is "season 1, episode 3", voice recognition and semantic analysis give a second dialogue state of: TV series or cartoon, episode 3, multi-season series, and so on. Calculating the similarity between the first dialogue state and the second dialogue state shows that their correlation is greater than the preset first threshold, so the system feedback information for the first dialogue state must be acquired: "search for Peppa Pig season 1, episode 3".
The dialogue management module controls the course of the human-machine dialogue and decides how to respond to the user at each moment based on the dialogue history. The most common case is the task-driven multi-round dialogue, in which the user has a clear goal such as ordering food or booking tickets. The user's needs are relatively complex and subject to many constraints, so the request has to be stated over multiple rounds. On the one hand, the user can keep revising or refining the request during the dialogue; on the other hand, when the stated request is not specific or clear enough, the machine can help the user reach a satisfactory result by asking, clarifying, or confirming. The dialogue process is shown in Figure 4. The user and the system exchange information through questions and answers. The user issues a voice command by saying "Hi, I want to order a meal". When the system (which may be a voice robot or any other device that can recognize the user's voice information) receives the command, it parses it and identifies the key information it contains: restaurant. The system then returns the question "What type of food do you like?" together with the dialogue act: food. After receiving this, the user speaks again: "I like Kung Pao chicken". The system extracts the keyword "Kung Pao chicken" from the voice information, confirms with the user based on the information received, and a satisfactory order is finally placed.
Step S104: If the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, the user intention is recognized according to the following information to obtain a user intention recognition result.
If the correlation between the system feedback information corresponding to the first dialogue state, acquired in step S103, and the second dialogue state is less than the preset second threshold, the previous round of dialogue information is judged to have low correlation with the next round, and the user intention is recognized from the following information alone. Otherwise, the previous round is judged to be highly correlated with the next round, and the following information is combined with the relevant content of the above information to recognize the user intention.
Specifically, before step S104, in which the user intention is recognized according to the following information to obtain a user intention recognition result if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method further includes:
Acquiring third slot information of the system feedback information;
Acquiring the second slot information of the second dialogue state;
Calculating the second correlation according to the third slot information and the second slot information;
Determining whether the second correlation is less than the preset second threshold.
In the same way that the first slot information of the first dialogue state and the second slot information of the second dialogue state are acquired above, the third slot information of the system feedback information and the second slot information of the second dialogue state are acquired, and the second correlation between the system feedback information and the second dialogue state is then calculated from the acquired slot information.
Specifically, the step of calculating the second correlation according to the third slot information and the second slot information includes:
Acquiring the character strings contained in each slot of the third slot information and merging them into third character string information;
Acquiring the character strings contained in each slot of the second slot information and merging them into second character string information;
Calculating the edit distance between each character string in the third character string information and each character string in the second character string information;
Calculating the second correlation according to the size of the edit distance.
The correlation between slot information is calculated as the correlation between the character string information of the slots' contents, and the correlation between character string information is expressed through the edit distance between the strings. The calculation principle is the same as that used above for the correlation between the first slot information and the second slot information.
In one implementation, before the step of calculating the first correlation between the first dialogue state and the second dialogue state, the method further includes:
Determining whether the slots contained in the first dialogue state and/or the second dialogue state are completely filled;
If they are not completely filled, acquiring the keyword information missing from the incompletely filled slots of the first dialogue state and/or the second dialogue state, and filling the slots of the first dialogue state and/or the second dialogue state according to the acquired keyword information.
If the slots corresponding to the first dialogue state and/or the second dialogue state are not completely filled, the correlation may be calculated with low accuracy. The above steps therefore check whether the slots of the first dialogue state and the second dialogue state are completely filled, fill any that are not, and only then calculate the correlation between the two.
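The completeness check and filling step can be illustrated with a short Python sketch based on the weather example given earlier. The slot schema `REQUIRED_SLOTS`, the intent name `weather_query` and the `context` dictionary holding default values such as the user's location are hypothetical; the text does not specify where the missing keyword information is obtained.

```python
# Hypothetical schema: which slots each intent needs before correlation is computed.
REQUIRED_SLOTS = {"weather_query": ["date", "location"]}


def is_complete(intent: str, slots: dict) -> bool:
    """True when every slot listed for the intent has a non-empty value."""
    return all(slots.get(name) for name in REQUIRED_SLOTS.get(intent, []))


def fill_missing_slots(intent: str, slots: dict, context: dict) -> dict:
    """Return a copy of slots with empty entries filled from context defaults."""
    filled = dict(slots)
    for name in REQUIRED_SLOTS.get(intent, []):
        if not filled.get(name) and name in context:
            filled[name] = context[name]
    return filled


# "How is the weather today?" names no place; the known user location fills the gap.
slots = {"date": "today"}
context = {"location": "Beijing"}
if not is_complete("weather_query", slots):
    completed = fill_missing_slots("weather_query", slots, context)
else:
    completed = slots
```

Only after this step would the slot strings be passed on to the edit-distance calculation.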
Similarly, before the step of calculating the correlation between the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further includes:
Determining whether the third slot information and the second slot information are complete;
If they are not completely filled, acquiring the keyword information missing from the incompletely filled slots of the system feedback information and/or the second dialogue state, and filling the slots of the system feedback information and/or the second dialogue state according to the acquired keyword information.
If the slots corresponding to the system feedback information and/or the second dialogue state are not completely filled, the correlation may be calculated with low accuracy. The above steps therefore check whether the slots of the system feedback information and the second dialogue state are completely filled, fill any that are not, and only then calculate the correlation between the two.
When the previous round and the next round of a multi-round dialogue are highly correlated, that is, when the first correlation is higher than the preset first threshold, or when the first correlation is less than or equal to the preset first threshold but the second correlation is less than the preset second threshold, the user intention is recognized by multi-round dialogue; that is, the first dialogue state, the system feedback information and the following information are combined to recognize the user intention and obtain the user intention recognition result.
Specifically, the step of recognizing the user intention by combining the first dialogue state, the system feedback information and the following information to obtain the user intention recognition result includes:
Acquiring the character information of the information contained in the first dialogue state and the system feedback information, and extracting a first keyword set;
Acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
Searching for first user instruction information corresponding to the first keyword set;
Searching, within the first user instruction information found, for second user instruction information corresponding to the second keyword set;
Obtaining the user intention recognition result according to the second user instruction information.
When the correlation between the first dialogue state and/or the system feedback information and the second dialogue state satisfies the preset threshold condition, the user intention is first recognized from the character information contained in the first dialogue state and the system feedback information, giving search results for the above information. The following information is then searched within those results, so that the user instruction and search results corresponding to both the above information and the following information issued by the user are returned.
When the previous round and the next round of a multi-round dialogue have low correlation, that is, when the first correlation is less than the preset first threshold, the user intention is recognized by single-round dialogue; that is, the user intention is recognized from the following information alone to obtain the user intention recognition result.
When the correlation between the first dialogue state, the system feedback information and the second dialogue state does not satisfy the preset threshold condition, a single-round dialogue is executed and the user intention is recognized from the following information alone. Specifically, the step of recognizing the user intention according to the following information to obtain the user intention recognition result includes:
Acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
Searching for second user instruction information corresponding to the second keyword set;
Obtaining the user intention recognition result according to the second user instruction information.
In the single-round dialogue above, the user intention recognition result is obtained from the following information alone. Because this search is not restricted to content related to the above information, search information that better matches the user intention can be obtained.
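The difference between the multi-round search (search with the first keyword set, then search within those results using the second keyword set) and the single-round search (search with the second keyword set only) can be sketched as follows. Representing the searchable content as a list of strings and matching keywords by case-insensitive substring are assumptions made for illustration; the text does not describe the search backend.

```python
def search(items: list, keywords: list) -> list:
    """Keep the items that contain every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [item for item in items if all(k in item.lower() for k in wanted)]


def multi_round_search(catalog: list, first_keywords: list,
                       second_keywords: list) -> list:
    """Search with the first keyword set, then narrow with the second."""
    first_results = search(catalog, first_keywords)
    return search(first_results, second_keywords)


def single_round_search(catalog: list, second_keywords: list) -> list:
    """Search the whole catalog with the second keyword set only."""
    return search(catalog, second_keywords)


catalog = [
    "Peppa Pig season 1 episode 3",
    "Peppa Pig season 2 episode 3",
    "Other Show season 1 episode 3",
]
narrowed = multi_round_search(catalog, ["Peppa Pig"], ["season 1", "episode 3"])
unrestricted = single_round_search(catalog, ["season 1", "episode 3"])
```

Here `narrowed` contains only the Peppa Pig entry, while `unrestricted` also includes the unrelated show, which is the intended behavior when the two rounds are judged unrelated.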
In the method disclosed in this embodiment, correlation is calculated between the dialogue states and system feedback information corresponding to adjacent context information in a multi-round dialogue. This avoids generating the search result for the later utterance on top of the search result for the earlier utterance when the two utterances are unrelated; instead, a fresh search is performed on the content of the later utterance, which improves the accuracy of user intention recognition.
In one implementation, to prevent the dialogue system from repeatedly calculating the correlation of identical speech when the user issues the same voice information twice in a row, which would increase the system's information processing load, before the step of acquiring the first dialogue state of the above information in the multi-round dialogue, the system feedback information corresponding to the first dialogue state, and the second dialogue state of the following information, the method further includes:
Determining whether the voice information corresponding to the above information and the following information is the same;
If so, ignoring the following information, acquiring the dialogue information of the round after the following information, and using the newly acquired dialogue information as the following information.
In the above steps, once the above information and the following information have been obtained, they are first compared. If they are identical, the user is judged to have repeated the same voice information; the received following information is ignored and the information of the round after it is received instead, which avoids unnecessary processing by the system. For example, if the user says "I want to book a flight" twice in the dialogue, the second "I want to book a flight" can be ignored outright, with no semantic analysis or user intention recognition performed on it. Instead, the voice information "from Xi'an to Beijing" that the user issues after the second "I want to book a flight" is acquired and used as the following information for the similarity judgment.
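A minimal Python sketch of this duplicate check is given below. It assumes the comparison is made on the recognized text after trimming surrounding whitespace; the text only says that the voice information must be the same, so the exact comparison and the function name `next_following_utterance` are assumptions.

```python
from typing import Iterable, Optional


def next_following_utterance(previous: str,
                             utterances: Iterable[str]) -> Optional[str]:
    """Return the first later utterance that differs from `previous`.

    Repeats of `previous` are skipped without further processing;
    None is returned when every later utterance is a repeat.
    """
    for utterance in utterances:
        if utterance.strip() != previous.strip():
            return utterance
    return None


following = next_following_utterance(
    "I want to book a flight",
    ["I want to book a flight", "From Xi'an to Beijing"],
)
```

In this example the repeated request is skipped and "From Xi'an to Beijing" becomes the following information.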
In one application example of this embodiment, user intention can be recognized with the following steps:
Step H1: determine whether the above information is the same as the following information; if so, perform step H2, otherwise perform step H3.
Step H2: reacquire the following information.
Step H3: acquire the first dialogue state of the above information and the second dialogue state of the following information.
Step H4: calculate the first correlation between the first dialogue state and the second dialogue state, and determine whether the first correlation is greater than the preset first threshold; if it is less, perform step H5, otherwise perform step H6.
Step H5: acquire the system feedback information corresponding to the first dialogue state, calculate the second correlation between the system feedback information and the second dialogue state, and determine whether the second correlation is below the preset second threshold; if so, perform step H7, otherwise perform step H8.
Step H6: the two rounds of dialogue are judged to be correlated, so the multi-round dialogue is entered: the system feedback information corresponding to the first dialogue state is acquired, and the first dialogue state, the system feedback information, and the second dialogue state are combined to recognize the user intention.
Step H7: the current dialogue is treated as a single-round dialogue for recognizing the user intention.
Step H8: the current dialogue is handled as a multi-round dialogue, and the user intention is recognized by combining the first dialogue state, the system feedback information, and the second dialogue state.
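The branching in steps H4 to H8 can be summarized in a short sketch. The names `Mode` and `choose_mode` are hypothetical, the correlation function is passed in as a parameter because its exact form is not fixed by these steps, and the word-overlap measure in the usage lines is only a stand-in for illustration.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    SINGLE_ROUND = "single"  # step H7: recognize from the following information alone
    MULTI_ROUND = "multi"    # steps H6 and H8: combine both states and the system feedback


def choose_mode(
    first_state: str,
    system_feedback: str,
    second_state: str,
    correlation: Callable[[str, str], float],
    first_threshold: float,
    second_threshold: float,
) -> Mode:
    # H4: compare the two dialogue states; "otherwise" (not less) leads to H6.
    if correlation(first_state, second_state) >= first_threshold:
        return Mode.MULTI_ROUND  # H6
    # H5: fall back to comparing the system feedback with the second state.
    if correlation(system_feedback, second_state) < second_threshold:
        return Mode.SINGLE_ROUND  # H7
    return Mode.MULTI_ROUND  # H8


def word_overlap(a: str, b: str) -> float:
    """Stand-in correlation (Jaccard overlap of words); an assumption for this demo."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


print(choose_mode("film comedy", "film search", "weather today", word_overlap, 0.3, 0.3))
print(choose_mode("film comedy family", "comedy film search", "film comedy animal", word_overlap, 0.3, 0.3))
```

Passing the correlation function in keeps the decision logic separate from the similarity measure, so either can be changed without touching the other.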
For example, suppose the above information is "I want to watch Peppa Pig". The corresponding dialogue state includes: film and television, actors that are animals, the genres comedy and family drama, and other information related to Peppa Pig; the system feedback information corresponding to this dialogue state is: film and television search. If the following information is "What is the weather like today", speech recognition and semantic analysis of the following information yield a second dialogue state such as: geographic location, today, temperature, rain. Computing the similarity between the first and second dialogue states shows that their correlation is below the preset first threshold, so the following information is judged to be unrelated to the above information. The system feedback for the following information therefore cannot build on the result for the above information; instead, a new search is performed for the second dialogue state and the system feedback is produced for that state alone: search for today's weather in the user's current location.
As another example, suppose the above information is "I want to watch a comedy". The corresponding dialogue state includes: film and television, and the genres comedy and family drama; the corresponding system feedback information is: film and television search. If the following information is "Peppa Pig", the corresponding dialogue state includes: film and television, actors that are animals, the genres comedy and family drama, and other information related to Peppa Pig; the corresponding system feedback information is: comedy film and television search. Speech recognition and semantic analysis of the following information show that the correlation between the first and second dialogue states is above the preset first threshold, so the following information is judged to be related to the above information. The system feedback for the following information therefore builds on the result for the above information: the second dialogue state is searched again on that basis, and the system feedback is produced for the second dialogue state: search for film and television material related to Peppa Pig.
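The claims state that the correlation is computed from the edit distance between the strings held in the slots of the two dialogue states. A minimal sketch of that idea follows. Several details are assumptions made for illustration: slot values are merged in sorted slot-name order, a single distance is taken between the two merged strings, and the distance is mapped to a correlation as 1 minus the distance divided by the longer length. The slot names and values are hypothetical.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def merge_slots(slots: Dict[str, str]) -> str:
    """Merge slot values into one string; the sorted-name order is an assumption."""
    return "|".join(slots[name] for name in sorted(slots))


def slot_correlation(slots_a: Dict[str, str], slots_b: Dict[str, str]) -> float:
    """Map edit distance to [0, 1]; this normalization is an assumption, not from the source."""
    a, b = merge_slots(slots_a), merge_slots(slots_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty states are treated as identical
    return 1.0 - levenshtein(a, b) / longest


comedy = {"domain": "film", "genre": "comedy"}
weather = {"domain": "weather", "date": "today"}
print(slot_correlation(comedy, comedy))
print(slot_correlation(comedy, weather))
```

Character-level edit distance is sensitive to slot order and wording, so the thresholds would need to be tuned against whatever slot vocabulary a real system uses.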
Exemplary device
On the basis of the above method, this embodiment also discloses a computer device, as shown in FIG. 5, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
On the basis of the above method, this embodiment also discloses a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method.
In an exemplary embodiment, the computer device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
Compared with the prior art, the embodiments of the present disclosure have the following advantages:
In the method provided by the embodiments of the present disclosure, the voice information of the preceding and following rounds of a multi-round dialogue is acquired, and the correlation between the dialogue states corresponding to that voice information is calculated and compared with a threshold. If the correlation does not exceed the threshold, the user intention is recognized from the dialogue state of the later round alone. If it does exceed the threshold, the correlation between the system behavior of the earlier round and the dialogue state of the later round is compared with a second threshold: if it does not exceed that threshold, the user intention is recognized from the dialogue state of the later round alone; if it does, the dialogue state of the earlier round is combined with the second dialogue state to recognize the user intention. Because this embodiment takes the correlation between consecutive rounds into account, a later round that differs substantially from the earlier one is analyzed as a separate message, which yields a more accurate analysis and provides a basis for giving precise feedback on the information the user sends.
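The combined recognition mentioned above is detailed in the claims as a two-stage keyword search: candidates matching the first keyword set are found first, and the second keyword set is then searched within those candidates. The sketch below illustrates that narrowing under stated assumptions: the catalogue, its entries, and the rule that a candidate matches when it contains every keyword are all hypothetical and chosen only for the demonstration.

```python
from typing import Dict, List, Set

# Hypothetical catalogue: user instruction -> descriptive keywords.
CATALOG: Dict[str, Set[str]] = {
    "play:Peppa Pig": {"film", "comedy", "family", "animal", "peppa pig"},
    "play:other comedy": {"film", "comedy"},
    "weather:today": {"weather", "today"},
}


def search(candidates: Dict[str, Set[str]], keywords: Set[str]) -> Dict[str, Set[str]]:
    """Keep the candidates whose keyword set contains every requested keyword."""
    return {name: tags for name, tags in candidates.items() if keywords <= tags}


def recognize_combined(first_keywords: Set[str], second_keywords: Set[str],
                       catalog: Dict[str, Set[str]] = CATALOG) -> List[str]:
    first_hits = search(catalog, first_keywords)       # first user instruction information
    second_hits = search(first_hits, second_keywords)  # searched only within the first hits
    return sorted(second_hits)


print(recognize_combined({"film", "comedy"}, {"peppa pig"}))
```

Searching within the first result set is what lets the earlier round constrain the later one, in contrast to the single-round case where the second keyword set is looked up on its own.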
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations that follow its general principles, including common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

  1. A method for recognizing user intention in a multi-round dialogue, comprising:
    acquiring a first dialogue state of above information and a second dialogue state of following information in the multi-round dialogue;
    if a first correlation between the first dialogue state and the second dialogue state is less than a preset first threshold, recognizing the user intention according to the following information to obtain a user intention recognition result.
  2. The method for recognizing user intention in a multi-round dialogue according to claim 1, wherein before the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multi-round dialogue, the method further comprises:
    determining whether the voice information corresponding to the above information and the voice information corresponding to the following information are the same;
    if so, acquiring the dialogue information of the round after the following information, and using the newly acquired dialogue information as the following information in its place.
  3. The method for recognizing user intention in a multi-round dialogue according to claim 1, wherein the step of acquiring the first dialogue state of the above information and the second dialogue state of the following information in the multi-round dialogue comprises:
    acquiring the above information and the following information in the multi-round dialogue;
    performing speech recognition and language analysis on the above information and the following information respectively to obtain the first dialogue state and the second dialogue state.
  4. The method for recognizing user intention in a multi-round dialogue according to claim 1, wherein the step of recognizing the user intention according to the following information to obtain a user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold comprises:
    if the correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, acquiring system feedback information corresponding to the first dialogue state;
    if the correlation between the system feedback information and the second dialogue state is less than a preset second threshold, recognizing the user intention according to the following information to obtain a user intention recognition result.
  5. The method for recognizing user intention in a multi-round dialogue according to any one of claims 1-4, wherein before the step of recognizing the user intention according to the following information to obtain a user intention recognition result if the first correlation between the first dialogue state and the second dialogue state is less than the preset first threshold, the method further comprises:
    acquiring first slot information of the first dialogue state;
    acquiring second slot information of the second dialogue state;
    calculating the first correlation according to the first slot information and the second slot information;
    determining whether the first correlation is less than the preset first threshold.
  6. The method for recognizing user intention in a multi-round dialogue according to claim 5, wherein the step of calculating the first correlation according to the first slot information and the second slot information comprises:
    acquiring the character strings contained in each slot of the first slot information, and merging the character strings into first character string information;
    acquiring the character strings contained in each slot of the second slot information, and merging the character strings into second character string information;
    calculating the edit distance between the character strings in the first character string information and the second character string information;
    calculating the first correlation according to the magnitude of the edit distance.
  7. The method for recognizing user intention in a multi-round dialogue according to claim 4, wherein before the step of recognizing the user intention according to the following information to obtain a user intention recognition result if the correlation between the system feedback information and the second dialogue state is less than the preset second threshold, the method further comprises:
    acquiring third slot information of the system feedback information;
    acquiring second slot information of the second dialogue state;
    calculating the second correlation according to the third slot information and the second slot information;
    determining whether the second correlation is less than the preset second threshold.
  8. The method for recognizing user intention in a multi-round dialogue according to claim 7, wherein the step of calculating the second correlation according to the third slot information and the second slot information comprises:
    acquiring the character strings contained in each slot of the third slot information, and merging the character strings into third character string information;
    acquiring the character strings contained in each slot of the second slot information, and merging the character strings into second character string information;
    calculating the edit distance between the character strings in the third character string information and the second character string information;
    calculating the second correlation according to the magnitude of the edit distance.
  9. The method for recognizing user intention in a multi-round dialogue according to claim 5, wherein after the step of respectively acquiring the first slot information of the first dialogue state and the second slot information of the second dialogue state, the method further comprises:
    determining whether the slots contained in the first dialogue state and/or the second dialogue state are completely filled;
    if not, acquiring the keyword information missing from the incompletely filled slots of the first dialogue state and/or the second dialogue state, and completing the slots contained in the first dialogue state and/or the second dialogue state according to the acquired keyword information.
  10. The method for recognizing user intention in a multi-round dialogue according to claim 7, wherein after the step of respectively acquiring the third slot information of the system feedback information and the second slot information of the second dialogue state, the method further comprises:
    determining whether the slots contained in the system feedback information and/or the second dialogue state are completely filled;
    if not, acquiring the keyword information missing from the incompletely filled slots of the system feedback information and/or the second dialogue state, and completing the slots contained in the system feedback information and/or the second dialogue state according to the acquired keyword information.
  11. The method for recognizing user intention in a multi-round dialogue according to any one of claims 1-4 and 6-10, wherein the step of recognizing the user intention according to the following information to obtain a user intention recognition result comprises:
    acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
    determining second user instruction information corresponding to the second keyword set;
    obtaining the user intention recognition result according to the second user instruction information.
  12. The method for recognizing user intention in a multi-round dialogue according to claim 1, wherein the method further comprises:
    if the first correlation between the first dialogue state and the second dialogue state is greater than or equal to the preset first threshold, combining the first dialogue state, the system feedback information, and the following information to recognize the user intention and obtain a user intention recognition result.
  13. The method for recognizing user intention in a multi-round dialogue according to claim 4, wherein the method further comprises:
    if the correlation between the system feedback information and the second dialogue state is greater than or equal to the preset second threshold, combining the first dialogue state, the system feedback information, and the following information to recognize the user intention and obtain a user intention recognition result.
  14. The method for recognizing user intention in a multi-round dialogue according to claim 12 or 13, wherein the step of combining the first dialogue state, the system feedback information, and the following information to recognize the user intention and obtain a user intention recognition result comprises:
    acquiring the character information of the information contained in the first dialogue state and the system feedback information, and extracting a first keyword set;
    acquiring the character information of the information contained in the second dialogue state, and extracting a second keyword set;
    searching for first user instruction information corresponding to the first keyword set;
    searching, within the first user instruction information, for second user instruction information corresponding to the second keyword set;
    obtaining the user intention recognition result according to the second user instruction information.
  15. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 14.
  16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 14.
PCT/CN2020/103922 2019-09-04 2020-07-24 User intention identification method in multi-round dialogue and related device WO2021042902A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910833404.1 2019-09-04
CN201910833404.1A CN112445902A (en) 2019-09-04 2019-09-04 Method for identifying user intention in multi-turn conversation and related equipment

Publications (1)

Publication Number Publication Date
WO2021042902A1 true WO2021042902A1 (en) 2021-03-11

Family

ID=74734481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103922 WO2021042902A1 (en) 2019-09-04 2020-07-24 User intention identification method in multi-round dialogue and related device

Country Status (2)

Country Link
CN (1) CN112445902A (en)
WO (1) WO2021042902A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076408A (en) * 2021-03-19 2021-07-06 联想(北京)有限公司 Session information processing method and device
CN113806503A (en) * 2021-08-25 2021-12-17 北京库睿科技有限公司 Dialog fusion method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8868548B2 (en) * 2010-07-22 2014-10-21 Google Inc. Determining user intent from query patterns
CN105704013A (en) * 2016-03-18 2016-06-22 北京光年无限科技有限公司 Context-based topic updating data processing method and apparatus
CN106649739A (en) * 2016-12-23 2017-05-10 深圳市空谷幽兰人工智能科技有限公司 Multi-round interactive information inheritance recognition method, apparatus and interactive system
CN107133345A (en) * 2017-05-22 2017-09-05 北京百度网讯科技有限公司 Exchange method and device based on artificial intelligence
CN107193978A (en) * 2017-05-26 2017-09-22 武汉泰迪智慧科技有限公司 A kind of many wheel automatic chatting dialogue methods and system based on deep learning
CN110136705A (en) * 2019-04-10 2019-08-16 华为技术有限公司 A kind of method and electronic equipment of human-computer interaction

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726831B2 (en) * 2014-05-20 2020-07-28 Amazon Technologies, Inc. Context interpretation in natural language processing using previous dialog acts
CN107369443B (en) * 2017-06-29 2020-09-25 北京百度网讯科技有限公司 Dialog management method and device based on artificial intelligence
CN109446306A (en) * 2018-10-16 2019-03-08 浪潮软件股份有限公司 A kind of intelligent answer method of more wheels dialogue of task based access control driving
CN109616108B (en) * 2018-11-29 2022-05-31 出门问问创新科技有限公司 Multi-turn dialogue interaction processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU, JIALI: "Technology Researches of Query Refinement Based on User Intent", CHINESE MASTER'S THESES FULL-TEXT DATABASE, 1 December 2015 (2015-12-01), pages 1 - 64, XP055790034, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN112445902A (en) 2021-03-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20861031

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 20/07/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20861031

Country of ref document: EP

Kind code of ref document: A1