WO2021077834A1 - Method and device for asking rhetorical questions about user questions based on a dialogue system - Google Patents

Info

Publication number
WO2021077834A1
WO2021077834A1 PCT/CN2020/105063 CN2020105063W WO2021077834A1 WO 2021077834 A1 WO2021077834 A1 WO 2021077834A1 CN 2020105063 W CN2020105063 W CN 2020105063W WO 2021077834 A1 WO2021077834 A1 WO 2021077834A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
rhetorical
user
clause
questions
Prior art date
Application number
PCT/CN2020/105063
Other languages
English (en)
French (fr)
Inventor
姚开盛
张家兴
李小龙
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021077834A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems

Definitions

  • the embodiments of this specification relate to the technical field of dialogue systems, and more specifically, to a method and device for asking rhetorical questions to users based on a dialogue system.
  • A dialogue system realizes communication between humans and machines through computer algorithms, and includes three types: question-and-answer, task-oriented and chit-chat.
  • In all of these types of dialogue systems, the user usually asks a question and the dialogue system responds. In some cases the user's question is vague, and it is difficult for the dialogue system to directly find a matching standard question in the standard question library and respond with the answer to that standard question. In this case, the dialogue system needs to ask the user a rhetorical question to clarify the user's question, so that a matching standard question is easier to find.
  • In the prior art, the model used to ask rhetorical questions about user questions is usually a supervised learning model, such as an RNN model.
  • To train such a model, the user's input questions are labeled so as to structure them.
  • For example, each user question is labeled with several parts, such as scene, intent and key information, and the rhetorical question model is then trained on the labeled samples.
  • The embodiments of the present specification aim to provide a more effective solution for asking rhetorical questions about a user's question based on a dialogue system, so as to overcome the deficiencies in the prior art.
  • One aspect of this specification provides a method for asking rhetorical questions about user questions based on a dialogue system.
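  • The dialogue system is preset with M rhetorical question modules corresponding to N standard questions, where M ≥ N. Each rhetorical question module includes a first clause and a second clause that are split from the corresponding standard question. The method includes: obtaining a first question of a first user; for each of the M rhetorical question modules, determining whether the first question matches the first clause and the second clause therein; and
  • in the case where the first question matches the first clause in a rhetorical question module and does not match the second clause in that module, acquiring a rhetorical question for the first question based on the second clause, so as to acquire a plurality of rhetorical questions for the first question based on the M rhetorical question modules.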
  • each of the rhetorical question modules includes two clauses respectively related to business and user intentions.
  • Each of the rhetorical question modules is also preset with a second rhetorical question corresponding to the second clause, wherein acquiring a rhetorical question for the first question based on the second clause in the rhetorical question module includes: obtaining the second rhetorical question from the rhetorical question module as a rhetorical question for the first question.
  • Each of the rhetorical question modules is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein determining whether the first question matches the first clause and the second clause includes determining whether the first question matches the first group of keywords and the second group of keywords, respectively.
  • The N standard questions are standard questions corresponding to a first domain, and the method further includes, after obtaining the first question of the first user, determining the domain to which the first question belongs, wherein determining, for each of the M rhetorical question modules, whether the first question matches the first clause and the second clause therein includes: in the case where it is determined that the first question corresponds to the first domain, determining, for each of the M rhetorical question modules, whether the first question matches the first clause and the second clause therein.
  • The dialogue system includes a reinforcement learning model, and the method further includes, after acquiring the plurality of rhetorical questions for the first question based on the M rhetorical question modules, inputting the plurality of rhetorical questions into the reinforcement learning model, and executing the t-th cycle of a first round based on the plurality of rhetorical questions through the reinforcement learning model, wherein the t-th cycle includes the following steps:
  • obtaining the t-th state of the first round, where the t-th state includes the first question and the rhetorical questions for the first question that the reinforcement learning model has already output in the first round; inputting the t-th state into the reinforcement learning model; and determining, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, so as to output them to the first user.
  • The first round includes T cycles, and the method further includes, after the predetermined number of rhetorical questions for the first question are determined from the plurality of rhetorical questions through the reinforcement learning model and output to the first user, obtaining the feedback of the first user on the output of the reinforcement learning model in each of the T cycles.
  • The method further includes, after obtaining the feedback of the first user in each of the T cycles, training the reinforcement learning model based on the t-th state, the predetermined number of rhetorical questions, and the feedback of the first user in each of the T cycles.
  • The method further includes, after obtaining the feedback of the first user in each of the T cycles: in the case where it is determined, based on that feedback, that none of the T outputs of the reinforcement learning model includes a rhetorical question conforming to the intention of the first user, receiving the intention of the first user;
  • obtaining, from the N standard questions, a first standard question corresponding to the intention of the first user; configuring a first rhetorical question module corresponding to the first standard question based on the intention of the first user; and
  • adding the first rhetorical question module to the dialogue system.
  • Another aspect of this specification provides a device for asking rhetorical questions about user questions based on a dialogue system.
  • The dialogue system is preset with M rhetorical question modules corresponding to N standard questions, where M ≥ N, and each rhetorical question module includes a first clause and a second clause split from the corresponding standard question. The device includes:
  • the first obtaining unit is configured to obtain the first question of the first user
  • the first determining unit is configured to, for each of the M rhetorical question modules, respectively determine whether the first question sentence matches the first clause and the second clause therein;
  • the second acquiring unit is configured to: in the case where the first question matches the first clause in a rhetorical question module and does not match the second clause in that module, acquire a rhetorical question for the first question based on the second clause in the module, so as to acquire a plurality of rhetorical questions for the first question based on the M rhetorical question modules.
  • Each of the rhetorical question modules is also preset with a second rhetorical question corresponding to the second clause, wherein the second acquiring unit is further configured to acquire the second rhetorical question from the rhetorical question module as a rhetorical question for the first question.
  • Each of the rhetorical question modules is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein the first determining unit is further configured to determine whether the first question matches the first group of keywords and the second group of keywords, respectively.
  • The N standard questions are standard questions corresponding to a first domain, and the device further includes a second determining unit configured to determine, after the first question of the first user is obtained, the domain to which the first question belongs, wherein the first determining unit is further configured to, in the case where it is determined that the first question corresponds to the first domain, determine, for each of the M rhetorical question modules, whether the first question matches the first clause and the second clause therein.
  • The dialogue system includes a reinforcement learning model, and the device further includes an input unit configured to input the plurality of rhetorical questions into the reinforcement learning model after the plurality of rhetorical questions for the first question are acquired based on the M rhetorical question modules;
  • the execution unit is configured to execute the t-th cycle of a first round based on the plurality of rhetorical questions through the reinforcement learning model, wherein the execution unit includes:
  • the acquiring subunit is configured to acquire the t-th state of the first round, where the t-th state includes the first question and the rhetorical questions for the first question that the reinforcement learning model has already output in the first round;
  • An input subunit configured to input the t-th state into the reinforcement learning model
  • the determining subunit is configured to determine a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions through the reinforcement learning model, and output them to the first user.
  • The first round includes T cycles, and the device further includes a third acquiring unit configured to, after the predetermined number of rhetorical questions for the first question are determined from the plurality of rhetorical questions through the reinforcement learning model and output to the first user, obtain the feedback of the first user on the output of the reinforcement learning model in each of the T cycles.
  • The device further includes a training unit configured to, after the feedback of the first user in each of the T cycles is obtained, train the reinforcement learning model based on the t-th state, the predetermined number of rhetorical questions, and the feedback of the first user in each of the T cycles.
  • The device further includes a receiving unit configured to, after the feedback of the first user in each of the T cycles is obtained, receive the intention of the first user in the case where it is determined, based on that feedback, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention;
  • the fourth obtaining unit is configured to obtain the first standard question corresponding to the intention of the first user from the N standard questions;
  • a configuration unit configured to configure a first rhetorical module corresponding to the first standard question based on the intention of the first user
  • the adding unit is configured to add the first rhetorical module to the dialogue system.
  • Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed in a computer, the computer is caused to execute any of the above methods.
  • Another aspect of this specification provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, any one of the above methods is implemented.
  • In the solution for asking rhetorical questions according to the embodiments of this specification, the rhetorical question modules are obtained from standard questions, instead of using a lot of manpower to label training samples.
  • When obtaining rhetorical question modules from standard questions, only the standard questions themselves need attention, without complex operations such as clustering and structuring, which is convenient for the technicians involved.
  • In addition, rhetorical question modules can be added to the framework at any time in a plug-and-play manner, which is convenient for the business personnel involved.
  • In addition, the model can be updated online based on user feedback.
  • Fig. 1 shows a schematic diagram of a dialogue system according to an embodiment of the present specification
  • Figure 2 shows a flowchart of a method for asking rhetorical questions about user questions based on a dialogue system according to an embodiment of the present specification
  • Fig. 3 shows a schematic diagram of a rhetorical questioning module according to an embodiment of the present specification
  • Figure 4 shows a flowchart of a method for selecting through a reinforcement learning model
  • FIG. 5 shows a schematic process of outputting rhetorical questions through a reinforcement learning model according to an embodiment of the present specification
  • Figure 6 shows a method of adding a rhetorical module to the dialogue system
  • Fig. 7 shows a device for asking rhetorical questions about user questions based on a dialogue system according to an embodiment of the present specification.
  • Fig. 1 shows a schematic diagram of a dialogue system according to an embodiment of the present specification.
  • the dialogue system includes a reinforcement learning model 11.
  • The dialogue system is preset with M rhetorical question modules corresponding to N standard questions: rhetorical question module 1, rhetorical question module 2, ..., rhetorical question module M. Each rhetorical question module includes two parts, a first part and a second part.
  • The first part includes the business clause split from the corresponding standard question,
  • and the second part includes the intention clause split from the corresponding standard question.
  • The dialogue system executes the three steps shown in the figure based on the business clause and the intention clause in each rhetorical question module: matching clauses, detecting missing clauses, and obtaining rhetorical questions.
  • The multiple rhetorical questions obtained are input into the reinforcement learning model 11 as multiple candidate actions b_1, b_2, ..., b_P. The user's question and the rhetorical questions that the model has already output in this round are input into the reinforcement learning model 11 as the state s_t, so that the model determines, based on the state s_t, a predetermined number of rhetorical questions (a_t1, a_t2, a_t3) from the plurality of rhetorical questions for output to the user.
  • After the rhetorical questions are output, the user's feedback can be obtained; for example, the feedback is whether or not the user clicks a rhetorical question output by the model.
  • The user can then ask a new question, so that the dialogue system performs the next push of rhetorical questions.
  • For that next push, the user's new question and the rhetorical questions already pushed to the user are input into the reinforcement learning model as the corresponding state, so as to output the rhetorical questions to be pushed to the user.
  • the reinforcement learning model can be optimized based on the user's feedback on the rhetorical question raised by the system, so that the prediction of the reinforcement learning model is more accurate.
  • When the rhetorical questions output by the model do not match the user's intent, the intent can be obtained directly; for example, the user inputs the intent directly, or inputs it after being prompted by the dialogue system. Based on the intent input by the user, the corresponding standard question can then be split again to generate a new rhetorical question module, which is added to the dialogue system.
  • Fig. 2 shows a flowchart of a method for asking rhetorical questions about user questions based on a dialogue system according to an embodiment of the present specification.
  • The dialogue system is preset with M rhetorical question modules corresponding to N standard questions, where M ≥ N.
  • Each rhetorical question module includes a first clause and a second clause split from the corresponding standard question, and the method includes:
  • Step S202 Obtain the first question of the first user;
  • Step S204 For each of the M rhetorical question modules, determine whether the first question matches the first clause and the second clause therein;
  • Step S206 In the case where the first question matches the first clause in a rhetorical question module and does not match the second clause in that module, acquire a rhetorical question for the first question based on the second clause in the module, so as to acquire a plurality of rhetorical questions for the first question based on the M rhetorical question modules.
  • Each standard question can be split to obtain the first clause and the second clause corresponding to it.
  • For example, a first clause corresponding to the business and a second clause corresponding to the user's intention (appeal) may be obtained from the business and the appeal in the standard question, respectively.
  • For example, from the standard question "How long does the review take to open Huabei to receive money",
  • the two clauses "open Huabei to receive money" and "how long does the review take" can be obtained.
  • Fig. 3 shows a schematic diagram of a rhetorical questioning module according to an embodiment of the present specification.
  • the rhetorical question module includes a module identification, such as "11384" in the figure.
  • the module identification may correspond to a standard question number, for example, to indicate that the rhetorical question module corresponds to a corresponding standard question.
  • the rhetorical question module includes a first clause unit 31 and a second clause unit 32.
  • the first clause unit 31 includes, for example, clause 1 corresponding to the business: "open Huabei to receive money";
  • the second clause unit 32 includes, for example: clause 2 corresponding to the user's intention: "How long is the review"; the keywords corresponding to this clause: *review* (how long
  • and the corresponding rhetorical question 2: "How long does it take to review?".
  • the use of keywords and rhetorical questions in the rhetorical question module will be described in detail below.
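  • As an illustration of the module layout in Fig. 3, the following is a minimal sketch of how one rhetorical question module might be represented in code. The class and field names (`ClauseUnit`, `RhetoricalModule`, `keywords`) are hypothetical, and the keyword tuple for clause 1 is an assumption, since the source keyword pattern for that clause is incomplete.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClauseUnit:
    """One clause unit of a module (unit 31 or unit 32 in Fig. 3)."""
    clause: str                  # clause split from the standard question
    keywords: Tuple[str, ...]    # hypothetical flat keyword set for the clause
    rhetorical_question: str     # preset question asked when this clause is missing


@dataclass(frozen=True)
class RhetoricalModule:
    """A rhetorical question module tied to one standard question."""
    module_id: str               # may correspond to the standard question number
    first: ClauseUnit            # business clause unit
    second: ClauseUnit           # intention clause unit


module_11384 = RhetoricalModule(
    module_id="11384",
    first=ClauseUnit(
        clause="open Huabei to receive money",
        keywords=("open", "huabei"),          # assumed, not from the source
        rhetorical_question="Open Huabei to receive money?",
    ),
    second=ClauseUnit(
        clause="How long is the review",
        keywords=("review", "how long"),
        rhetorical_question="How long does it take to review?",
    ),
)
```

  • Keeping the preset rhetorical question next to the clause it belongs to means that a missing clause can be turned into a question by a simple field lookup.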
  • step S202 the first question of the first user is obtained.
  • the dialogue system usually includes a questioning interface, and the first user can ask questions to the dialogue system in the form of text or voice on the questioning interface.
  • For example, the first question is "How long does it take to review?"
  • This question does not name the relevant business, so it is a vague question for the dialogue system. Therefore, to clarify the business corresponding to the question, the dialogue system can ask a rhetorical question through the method shown in Figure 2, making the question clearer.
  • Step S204 for each of the M rhetorical questioning modules, respectively determine whether the first question sentence matches the first clause and the second clause therein.
  • For example, for the rhetorical question module identified as "11384" shown in Figure 3, it is determined separately whether the first question of the first user, "how long does it take to review", matches clause 1 in the first clause unit, and whether it matches clause 2, "how long is the review", in the second clause unit.
  • Each clause unit of the rhetorical question module is preset with a set of keywords corresponding to its clause. For example, as described above,
  • the set of keywords corresponding to clause 1 includes *(open
  • To determine whether the first question matches clause 2, it is checked whether the first question includes one of the keyword sets preset for clause 2, such as {review, how long}.
  • The first question "how long does it take to review" includes the keyword set {review, how long}, so it can be determined that the first question matches clause 2. In the same way, it can be determined that the first question does not match clause 1 in the module.
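  • The keyword check described above can be sketched as follows. This is a simplified illustration that treats a keyword set as a list of substrings that must all appear in the question; the function name `matches_clause` and the keyword set assumed for clause 1 are hypothetical, and the wildcard syntax of the source patterns is not reproduced.

```python
from typing import List


def matches_clause(question: str, keyword_sets: List[List[str]]) -> bool:
    """Return True if the question contains every keyword of at least one set."""
    text = question.lower()
    return any(all(kw.lower() in text for kw in kws) for kws in keyword_sets)


# Keyword sets per clause; the set for clause 1 is an assumption.
clause1_keywords = [["open", "huabei"]]
clause2_keywords = [["review", "how long"]]

question = "How long does it take to review"
matches_clause2 = matches_clause(question, clause2_keywords)  # clause 2 is matched
matches_clause1 = matches_clause(question, clause1_keywords)  # clause 1 is missing
```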
  • the method used to determine whether the first question matches the clauses in the rhetorical module is not limited to the above keyword matching method.
  • For example, the embedding vector of the first question and the embedding vector of each clause can be obtained from their word embedding vectors, and whether the first question matches each clause can then be determined by comparing the similarity between the embedding vector of the first question and the embedding vector of that clause.
  • Alternatively, a matching model can be trained for each rhetorical question module on training samples obtained from that module. By inputting the first question into the matching model corresponding to a rhetorical question module, the model directly outputs whether the first question matches each of the two clauses in that module.
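  • The embedding-based alternative can be sketched as below, assuming sentence embeddings are formed by averaging word vectors and compared with cosine similarity. The toy vectors, the `DIM` constant, and the 0.8 threshold are all illustrative assumptions; the source does not specify how embeddings are combined or which threshold is used.

```python
import math
from typing import Dict, List

DIM = 3
# Toy word vectors, purely illustrative; a real system would load trained embeddings.
WORD_VECTORS: Dict[str, List[float]] = {
    "review": [1.0, 0.0, 0.0],
    "long": [0.8, 0.2, 0.0],
    "open": [0.0, 1.0, 0.0],
    "huabei": [0.0, 0.9, 0.3],
}


def sentence_embedding(sentence: str) -> List[float]:
    """Average the vectors of the known words; zero vector if none are known."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def clause_matches(question: str, clause: str, threshold: float = 0.8) -> bool:
    """Match when the similarity reaches the (assumed) threshold."""
    return cosine(sentence_embedding(question), sentence_embedding(clause)) >= threshold
```

  • With these toy vectors, "how long does it take to review" is close to "how long is the review" and far from "open huabei to receive money".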
  • In step S206, in the case where the first question matches the first clause in a rhetorical question module and does not match the second clause in that module, a rhetorical question for the first question is acquired based on the second clause in the module, so as to acquire a plurality of rhetorical questions for the first question based on the M rhetorical question modules.
  • As described above, the first question matches clause 2 in module 11384 shown in Figure 3, but does not match clause 1 in it.
  • That is, in this case the first clause is clause 2 and the second clause is clause 1, so the rhetorical question for the first question is obtained based on clause 1.
  • Clause 1 itself can be used as the rhetorical question for the first question.
  • For example, the first user may be asked the rhetorical question "Open Huabei to receive money?".
  • Alternatively, the standard question corresponding to clause 1 may be used as the rhetorical question for the first question.
  • For example, the first user may be asked the rhetorical question "How long does the review take to open Huabei to receive money?".
  • Alternatively, a corresponding rhetorical question can be preset in each clause unit of the rhetorical question module.
  • For example, in the first clause unit corresponding to clause 1, the rhetorical question "Open Huabei to receive money?" can be preset. After determining that the first question matches clause 2 and does not match clause 1, the rhetorical question "Open Huabei to receive money?" can be obtained directly from the first clause unit corresponding to clause 1 and asked of the first user.
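  • Steps S204 and S206 over all M modules can be sketched as a single loop: a module contributes a candidate only when exactly one of its two clauses is matched, and the candidate is the preset rhetorical question of the missing clause. The types, the lowercase keyword tuples, and the second module "20001" are hypothetical and serve only to show several candidates being collected.

```python
from typing import List, NamedTuple, Tuple


class ClauseUnit(NamedTuple):
    keywords: Tuple[str, ...]     # lowercase keywords; all must appear for a match
    rhetorical_question: str      # preset question asked when this clause is missing


class Module(NamedTuple):
    module_id: str
    business: ClauseUnit
    intention: ClauseUnit


def matches(question: str, unit: ClauseUnit) -> bool:
    text = question.lower()
    return all(kw in text for kw in unit.keywords)


def candidate_rhetorical_questions(question: str, modules: List[Module]) -> List[str]:
    """Collect one candidate from each module that has exactly one matched clause."""
    candidates = []
    for module in modules:
        hit_business = matches(question, module.business)
        hit_intention = matches(question, module.intention)
        if hit_intention and not hit_business:
            candidates.append(module.business.rhetorical_question)
        elif hit_business and not hit_intention:
            candidates.append(module.intention.rhetorical_question)
    return candidates


MODULES = [
    Module("11384",
           ClauseUnit(("open", "huabei"), "Open Huabei to receive money?"),
           ClauseUnit(("review", "how long"), "How long does it take to review?")),
    # Hypothetical second module, not taken from the source.
    Module("20001",
           ClauseUnit(("credit", "limit"), "Raise the credit limit?"),
           ClauseUnit(("review", "how long"), "How long does it take to review?")),
]

candidates = candidate_rhetorical_questions("How long does it take to review", MODULES)
```

  • A question that matches both clauses of a module, or neither, yields no candidate from that module in this sketch.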
  • The standard questions in the dialogue system may be classified by domain, so that the rhetorical question modules corresponding to the standard questions are classified by domain as well.
  • For example, the N standard questions are standard questions corresponding to the Huabei domain, that is, the M rhetorical question modules are the rhetorical question modules corresponding to the Huabei domain.
  • In this case, after the first question is obtained, the domain to which it belongs is determined. For example, by setting a set of keywords for each domain and matching the first question against the keywords of each domain, the domain of the first question can be determined. For example, if the first question is "Open Huabei to receive money", keyword matching determines that the first question belongs to the Huabei domain.
  • the above steps S204 and S206 can be performed based on the M rhetorical question modules corresponding to the Huabei domain.
  • Alternatively, the N standard questions are the standard questions of all the domains included in the dialogue system, so that the M rhetorical question modules correspond to all of those domains.
  • For example, if the first question is "How long does it take to review", the corresponding domain cannot be determined through keyword matching, so the above steps S204 and S206 need to be performed based on the M rhetorical question modules of all domains.
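  • The two cases above (domain identified, domain unknown) can be sketched as a small routing step that decides which modules steps S204 and S206 run over. The domain keyword table, the "taobao" domain entry and the module ids other than "11384" are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical per-domain keyword sets.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "huabei": ["huabei"],
    "taobao": ["taobao"],
}


def detect_domain(question: str) -> Optional[str]:
    """Return the first domain whose keywords appear in the question, else None."""
    text = question.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return domain
    return None


def modules_to_check(question: str, modules_by_domain: Dict[str, List[str]]) -> List[str]:
    """Pick the modules of the detected domain, or all modules if no domain is found."""
    domain = detect_domain(question)
    if domain is not None:
        return list(modules_by_domain.get(domain, []))
    return [m for modules in modules_by_domain.values() for m in modules]


modules_by_domain = {"huabei": ["11384"], "taobao": ["30001"]}
```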
  • FIG. 4 shows a flowchart of a method for selecting rhetorical questions through a reinforcement learning model. The reinforcement learning model obtains in advance, from the dialogue system, the plurality of rhetorical questions acquired by the method shown in FIG. 2, and the method corresponds to one cycle of a round based on the reinforcement learning model.
  • The method includes:
  • Step S402 Obtain the t-th state of the round, where the t-th state includes the first question and the rhetorical question for the first question that has been output by the reinforcement learning model in this round;
  • Step S404 input the t-th state into the reinforcement learning model
  • Step S406 Determine a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions through the reinforcement learning model, and output them to the first user.
  • A round (episode) based on the reinforcement learning model includes, for example, T cycles, so t can be any natural number from 1 to T.
  • A round is a continuous multi-turn dialogue between the user and the dialogue system, in which each cycle corresponds to one output of the reinforcement learning model.
  • Two consecutive cycles among the T cycles can correspond to the same question;
  • that is, the reinforcement learning model asks rhetorical questions several times in succession about the same user question. Alternatively,
  • two consecutive cycles among the T cycles may correspond to different questions that are topic-related and reflect the user's consistent intention.
  • After the plurality of rhetorical questions for the first question are obtained by the method shown in FIG. 2, they can be input into the reinforcement learning model so that rhetorical questions are pushed for the first question to clarify the first user's intention.
  • For example, P rhetorical questions b_1, b_2, ..., b_P for the first question can be obtained through the method shown in FIG. 2,
  • and these rhetorical questions can be input into the reinforcement learning model as the candidates from which the output rhetorical questions are selected.
  • the method shown in Figure 4 is one push in multiple pushes (that is, one cycle in the round). The round ends, for example, after the first user indicates the end of the conversation, or ends when the first user does not reply within a predetermined period of time.
  • In step S402, the t-th state of the round is obtained; the t-th state includes the first question and the rhetorical questions for the first question that the reinforcement learning model has already output in this round.
  • In other words, the t-th state s_t, which is input to the model in the t-th cycle of the round, includes the first question and the rhetorical questions that the reinforcement learning model has already output in this round.
  • For example, in the first cycle, s_1 includes only the first question asked by the user, and in the second cycle, s_2 includes the user's second question and the predetermined number (for example, one or more) of rhetorical questions that the reinforcement learning model output to the user in the first cycle.
  • Fig. 5 shows a schematic process of outputting rhetorical questions through a reinforcement learning model according to an embodiment of this specification.
  • Fig. 5 schematically shows the first to third cycles in one round, and it can be understood that the 3 cycles are only illustrative, and the round is not limited to include 3 cycles.
  • In the first cycle, the corresponding state s_1 includes only the first question asked by the user (shown in a white box in the figure). For example, the user inputs "Taobao" to the dialogue system, and in response the dialogue system outputs three rhetorical questions: a_11 ("Do you want to open Taobao?"), a_12 ("How to close Taobao?") and a_13 ("What is Taobao?").
  • In the second cycle, in addition to the second question asked by the user, the corresponding state s_2 also includes the rhetorical questions that the model has already output in the current round (shown in the gray box in the figure).
  • Here, the rhetorical questions that the model has already output in this round are a_11, a_12 and a_13.
  • In the third cycle, the corresponding state s_3 similarly includes the third question asked by the user and the rhetorical questions already output by the model (shown in the gray box in the figure),
  • which are a_11, a_12, a_13, a_21, a_22 and a_23.
  • Alternatively, the t-th state s_t input to the model in the t-th cycle of the round includes the question and only the rhetorical questions that the reinforcement learning model output in the (t-1)-th cycle of the round.
  • In that case, the gray box of the corresponding state s_3 may include only a_21, a_22 and a_23.
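  • The two ways of building the state described above (full output history of the round, or only the previous cycle) can be sketched as follows. The `State` type and the `last_cycle_only` flag are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class State(NamedTuple):
    question: str               # the question the user asked in this cycle
    shown: Tuple[str, ...]      # rhetorical questions already output in the round


def build_state(question: str,
                outputs_so_far: List[List[str]],
                last_cycle_only: bool = False) -> State:
    """Build s_t from the current question and the outputs of earlier cycles."""
    history = outputs_so_far[-1:] if last_cycle_only else outputs_so_far
    shown = tuple(q for cycle in history for q in cycle)
    return State(question, shown)


outputs = [["a_11", "a_12", "a_13"], ["a_21", "a_22", "a_23"]]
s_1 = build_state("first question", [])
s_3_full = build_state("third question", outputs)
s_3_recent = build_state("third question", outputs, last_cycle_only=True)
```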
  • step S404 the t-th state is input to the reinforcement learning model.
  • step S406 a predetermined number of rhetorical questions for the first question are determined from the plurality of rhetorical questions through the reinforcement learning model, so as to be output to the first user.
  • the reinforcement learning model is, for example, a model based on a policy gradient algorithm.
  • The model includes a policy function π(a|s, θ), where θ denotes the model parameters and
  • π(a|s, θ) is the probability of taking action a in state s.
  • As described above, a plurality of rhetorical questions b_1, b_2, ..., b_P for the first question can be obtained by the method shown in FIG. 2 and serve as the candidate actions from which the output actions are determined.
  • The policy function of the model calculates the probability of each candidate action b_i based on the state s_t, and the predetermined number (e.g., three) of candidate actions with the highest probabilities are determined as the output actions a_t1, a_t2 and a_t3 of the model and output to the first user.
  • For example, as shown in FIG. 5, in the first cycle the model outputs three rhetorical questions a_11, a_12 and a_13;
  • in the second cycle the model outputs three rhetorical questions a_21, a_22 and a_23;
  • and in the third cycle the model outputs three rhetorical questions a_31, a_32 and a_33.
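  • The selection step can be sketched as a softmax policy over the candidate actions followed by a top-k pick. The linear scoring, the two hand-made features (word overlap with the question, and whether the candidate was already shown) and the parameter values are assumptions for illustration only; the source does not describe the model architecture.

```python
import math
from typing import List, Sequence


def features(question: str, shown: Sequence[str], candidate: str) -> List[float]:
    """Hypothetical features of a (state, candidate action) pair."""
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().rstrip("?").split())
    overlap = len(q_words & c_words) / max(len(c_words), 1)
    already_shown = 1.0 if candidate in shown else 0.0
    return [overlap, already_shown]


def policy(question: str, shown: Sequence[str],
           candidates: Sequence[str], theta: Sequence[float]) -> List[float]:
    """Softmax probabilities pi(b_i | s_t, theta) over the candidate actions."""
    if not candidates:
        return []
    scores = [sum(t * f for t, f in zip(theta, features(question, shown, c)))
              for c in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(question: str, shown: Sequence[str],
                 candidates: Sequence[str], theta: Sequence[float], k: int = 3) -> List[str]:
    """Return the k candidates with the highest policy probability."""
    probs = policy(question, shown, candidates, theta)
    ranked = sorted(range(len(candidates)), key=lambda i: probs[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]


CANDIDATES = ["Do you want to open Taobao?", "How to close Taobao?",
              "What is Taobao?", "How to repay Huabei?"]
THETA = [2.0, -1.0]   # assumed parameter values
top3 = select_top_k("Taobao", (), CANDIDATES, THETA)
```

  • With these assumed parameters, the three Taobao-related candidates are selected and the unrelated candidate is left out.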
  • the output rhetorical question is output to (displayed to) the first user, so that the corresponding reward value can be obtained based on the user's feedback.
  • For each cycle, the reward values corresponding to each output action, such as r_11, r_12 and r_13, can be obtained. For example, if the first user does not click on the rhetorical question a_11, the reward value r_11 corresponding to a_11 is 0, and if the first user clicks on the rhetorical question a_32, the reward value r_32 corresponding to a_32 is 1.
  • the reinforcement learning model is not limited to using the policy gradient algorithm, but can use other algorithms, such as Q learning algorithm, actor-critic algorithm, etc., which will not be described in detail here.
  • After a round of the model ends, the model can be trained on the input, output and feedback data of that round.
  • For example, the first user clicked the rhetorical question a_32 in the third cycle and did not click any rhetorical question output by the model in the first and second cycles. The reward value r_32 corresponding to a_32 is therefore 1, and the reward values corresponding to a_11, a_12, a_13, a_21, a_22, a_23, a_31 and a_33 are all zero.
  • The model parameters can then be updated by formula (1) of the description (an image in the original publication, not reproduced in this text).
  • Figure 6 shows a method for adding a rhetorical module to the dialogue system, including:
  • Step S602: where it is determined, based on the feedback of the first user in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention, receive the intention of the first user;
  • Step S604: obtain a first standard question corresponding to the intention of the first user from the N standard questions;
  • Step S606: configure a first rhetorical module corresponding to the first standard question based on the intention of the first user;
  • Step S608: add the first rhetorical module to the dialogue system.
  • In step S602, where it is determined, based on the feedback of the first user in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the intention of the first user, the intention of the first user is received.
  • For example, the first user may not have clicked the output of any cycle of the reinforcement learning model, which means that none of the T outputs of the model includes a rhetorical question that meets the first user's intention.
  • In this case the first user may actively input the intention into the dialogue system, so that the dialogue system receives it; or the dialogue system may ask the first user, prompting the first user to input the intention; or a business person may make a manual judgment afterwards and input the first user's intention into the dialogue system.
  • For example, the question input by the first user is "Huabei automatic repayment". Based on the existing rhetorical module (Huabei, automatic repayment) corresponding to the standard question "Huabei automatic repayment deduction order", the dialogue system cannot obtain a rhetorical question related to "deduction order", so the reinforcement learning model cannot output one. The first user may therefore not click any rhetorical question output by the model. In this case, the information that the first user's intention in inputting "Huabei automatic repayment" is "deduction order" can be received from outside (from the first user or a business person).
  • In step S604, a first standard question corresponding to the combination of the first question and the intention is obtained from the N standard questions.
  • For example, by matching "deduction order" (the user intention) against the set of keywords corresponding to each standard question, the corresponding first standard question "Huabei automatic repayment deduction order" can be obtained from the N standard questions.
  • Step S606 Configure a first rhetorical module corresponding to the first standard question based on the intention of the first user.
  • step S608 the first rhetorical module is added to the dialogue system.
  • That is, if the dialogue system initially includes the above-mentioned M rhetorical modules, adding the first rhetorical module brings the total to M+1 rhetorical modules.
  • After the first rhetorical module has been added, the M+1 rhetorical modules can be used immediately to perform the methods shown in FIGS. 2 and 4 whenever the dialogue system is next used to obtain rhetorical questions for a user question.
  • In other words, the rhetorical module architecture in the dialogue system according to the embodiment of the present specification can be easily expanded with user feedback, and is plug-and-play after expansion.
  • The expansion of the rhetorical modules is not limited to the manner described above.
  • For example, when the business areas increase, or when the topics popular with users change, the number of standard questions in the dialogue system may increase.
  • In that case, additional rhetorical modules can be obtained from the added standard questions, thereby expanding the rhetorical module framework.
  • FIG. 7 shows an apparatus 700 for asking rhetorical questions in response to user questions based on a dialogue system according to an embodiment of the present specification.
  • The dialogue system is preset with M rhetorical modules corresponding to N standard questions, where M ≥ N.
  • Each rhetorical module includes a first clause and a second clause that are split from the corresponding standard question, and the device includes:
  • the first obtaining unit 701 is configured to obtain the first question of the first user
  • the first determining unit 702 is configured to, for each of the M rhetorical question modules, respectively determine whether the first question sentence matches the first clause and the second clause therein;
  • the second acquiring unit 703 is configured to: when the first question matches the first clause in the rhetorical module, and the first question does not match the second clause in the rhetorical module
  • a rhetorical question for the first question is acquired based on the second clause in the rhetorical question module, so as to acquire a plurality of rhetorical questions for the first question based on the M rhetorical modules.
  • In one embodiment, each rhetorical module is also preset with a first rhetorical question corresponding to the first clause and a second rhetorical question corresponding to the second clause.
  • In this embodiment the second acquiring unit 703 is further configured to acquire the second rhetorical question from the rhetorical module as a rhetorical question for the first question.
  • each of the rhetorical question modules is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein The first determining unit 702 is further configured to separately determine whether the first question sentence matches the first set of keywords and the second set of keywords.
  • In one embodiment, the N standard questions are standard questions corresponding to a first domain.
  • The device further includes a second determining unit 704 configured to determine, after the first question of the first user is obtained, the domain to which the first question belongs. The first determining unit is further configured to, where it is determined that the first question corresponds to the first domain, determine for each of the M rhetorical modules whether the first question matches the first clause and the second clause therein.
  • In one embodiment, the dialogue system includes a reinforcement learning model.
  • The device further includes an input unit 705 configured to input the plurality of rhetorical questions into the reinforcement learning model after they have been obtained for the first question based on the M rhetorical modules;
  • and an execution unit 706 configured to execute, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle in a first round.
  • The execution unit 706 includes:
  • an obtaining sub-unit 7061 configured to obtain the t-th state of the first round, where the t-th state includes the first question and the rhetorical questions for the first question that the reinforcement learning model has already output in the first round;
  • an input sub-unit 7062 configured to input the t-th state into the reinforcement learning model;
  • a determining sub-unit 7063 configured to determine, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, and output them to the first user.
  • In one embodiment, the first round includes a total of T cycles.
  • The device further includes a third acquiring unit 707 configured to obtain, after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model and output to the first user, the feedback of the first user on the output of the reinforcement learning model in each of cycles t through T.
  • The device further includes a training unit 708 configured to train the reinforcement learning model, after the feedback of the first user in each of cycles t through T is obtained, based on the t-th state, the predetermined number of rhetorical questions, and the feedback of the first user in each of cycles t through T.
  • the device further includes:
  • the receiving unit 709 is configured to, after the feedback of the first user in each of cycles t through T is obtained, receive the intention of the first user where it is determined, based on the feedback of the first user in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the intention of the first user;
  • the fourth obtaining unit 710 is configured to obtain a first standard question corresponding to the intention of the first user from the N standard questions;
  • the configuration unit 711 is configured to configure a first rhetorical module corresponding to the first standard question based on the intention of the first user;
  • the adding unit 712 is configured to add the first rhetorical module to the dialogue system.
  • Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to execute any of the above methods.
  • Another aspect of this specification provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, any one of the above methods is implemented.
  • With the dialogue system solution according to the embodiments of this specification, rhetorical modules only need to be obtained from standard questions, without a large amount of manual labeling of training samples.
  • When obtaining a rhetorical module from a standard question, only that standard question needs attention, with no need for complex operations such as clustering and structuring, which is very convenient for the technical staff concerned.
  • With the rhetorical module framework proposed in the embodiments, rhetorical modules can be extended within the framework at any time and are plug-and-play, which is very convenient for the business staff concerned.
  • By training a reinforcement learning model based on this rhetorical module framework, the model can be updated online based on user feedback.
  • the steps of the method or algorithm described in the embodiments disclosed herein can be implemented by hardware, a software module executed by a processor, or a combination of the two.
  • The software module can be placed in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

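A method and apparatus for asking a rhetorical question in response to a user question based on a dialogue system, relating to the technical field of dialogue systems. M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, and each rhetorical module includes a first clause and a second clause split from the corresponding standard question. The method includes: obtaining a first question of a first user (S202); for each of the M rhetorical modules, determining whether the first question matches the first clause and the second clause therein (S204); and, where the first question matches the first clause in a rhetorical module and does not match the second clause in that module, obtaining a rhetorical question for the first question based on the second clause in that module (S206), so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.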

Description

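Method and apparatus for asking a rhetorical question in response to a user question based on a dialogue system
Technical Field
The embodiments of this specification relate to the technical field of dialogue systems, and more specifically to a method and apparatus for asking a rhetorical question in response to a user question based on a dialogue system.
Background
Recently, increasing attention has been paid to how machine learning can be used to build better dialogue systems. A dialogue system enables communication between humans and machines through computer algorithms and comes in three types: question-answering, task-oriented and chit-chat. In all of these types, the user usually asks a question and the dialogue system replies. In some cases the user's question is vague, and it is difficult for the dialogue system to find a matching standard question directly in the standard question library and reply with the answer to that standard question. The dialogue system then needs to ask the user a rhetorical question to clarify the user's question, so that a matching standard question can be found more easily. In the prior art, the model used to ask such rhetorical questions is usually a supervised learning model, such as an RNN model. To train it, many user input questions usually have to be clustered and then labeled according to the clustering result, so that user questions are represented in structured form, for example by labeling a question as several parts (scenario, intention, key information, and so on); the rhetorical question model is then trained on the labeled samples.
Therefore, a more effective solution for asking rhetorical questions in response to user questions based on a dialogue system is needed.
Summary
The embodiments of this specification aim to provide a more effective solution for asking rhetorical questions in response to user questions based on a dialogue system, so as to address the deficiencies of the prior art.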
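To achieve the above objective, one aspect of this specification provides a method for asking a rhetorical question in response to a user question based on a dialogue system, wherein M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the method includes:
obtaining a first question of a first user;
for each of the M rhetorical modules, determining whether the first question matches the first clause and the second clause therein, respectively;
where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtaining a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.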
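In one embodiment, each rhetorical module includes two clauses related to a business and a user intention, respectively.
In one embodiment, each rhetorical module is also preset with a second rhetorical question corresponding to the second clause, wherein obtaining a rhetorical question for the first question based on the second clause in the rhetorical module includes obtaining the second rhetorical question from the rhetorical module as the rhetorical question for the first question.
In one embodiment, each rhetorical module is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein determining whether the first question matches the first clause and the second clause includes determining whether the first question matches the first group of keywords and the second group of keywords, respectively.
In one embodiment, the N standard questions are standard questions corresponding to a first domain, and the method further includes, after obtaining the first question of the first user, determining the domain to which the first question belongs, wherein determining, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein includes: where it is determined that the first question corresponds to the first domain, determining, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein.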
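In one embodiment, the dialogue system includes a reinforcement learning model, and the method further includes: after obtaining the plurality of rhetorical questions for the first question based on the M rhetorical modules, inputting the plurality of rhetorical questions into the reinforcement learning model; and executing, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle of a first round, wherein the t-th cycle includes the following steps:
obtaining the t-th state of the first round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the first round;
inputting the t-th state into the reinforcement learning model;
determining, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, for output to the first user.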
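In one embodiment, the first round includes T cycles, and the method further includes: after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model for output to the first user, obtaining the first user's feedback on the output of the reinforcement learning model in each of cycles t through T.
In one embodiment, the method further includes: after obtaining the first user's feedback in each of cycles t through T, training the reinforcement learning model based on the t-th state, the predetermined number of rhetorical questions, and the first user's feedback in each of cycles t through T.
In one embodiment, the method further includes: after obtaining the first user's feedback in each of cycles t through T, where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention, receiving the intention of the first user;
obtaining a first standard question corresponding to the intention of the first user from the N standard questions;
configuring, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
adding the first rhetorical module to the dialogue system.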
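Another aspect of this specification provides an apparatus for asking a rhetorical question in response to a user question based on a dialogue system, wherein M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the apparatus includes:
a first obtaining unit configured to obtain a first question of a first user;
a first determining unit configured to determine, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein, respectively;
a second obtaining unit configured to, where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtain a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.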
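In one embodiment, each rhetorical module is also preset with a second rhetorical question corresponding to the second clause, wherein the second obtaining unit is further configured to obtain the second rhetorical question from the rhetorical module as the rhetorical question for the first question.
In one embodiment, each rhetorical module is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein the first determining unit is further configured to determine whether the first question matches the first group of keywords and the second group of keywords, respectively.
In one embodiment, the N standard questions are standard questions corresponding to a first domain, and the apparatus further includes a second determining unit configured to determine, after the first question of the first user is obtained, the domain to which the first question belongs, wherein the first determining unit is further configured to, where it is determined that the first question corresponds to the first domain, determine for each of the M rhetorical modules whether the first question matches the first clause and the second clause therein.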
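In one embodiment, the dialogue system includes a reinforcement learning model, and the apparatus further includes: an input unit configured to input the plurality of rhetorical questions into the reinforcement learning model after they have been obtained for the first question based on the M rhetorical modules; and an execution unit configured to execute, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle in a first round, wherein the execution unit includes:
an obtaining sub-unit configured to obtain the t-th state of the first round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the first round;
an input sub-unit configured to input the t-th state into the reinforcement learning model;
a determining sub-unit configured to determine, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, for output to the first user.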
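In one embodiment, the first round includes T cycles, and the apparatus further includes a third obtaining unit configured to obtain, after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model for output to the first user, the first user's feedback on the output of the reinforcement learning model in each of cycles t through T.
In one embodiment, the apparatus further includes a training unit configured to train the reinforcement learning model, after the first user's feedback in each of cycles t through T is obtained, based on the t-th state, the predetermined number of rhetorical questions, and the first user's feedback in each of cycles t through T.
In one embodiment, the apparatus further includes a receiving unit configured to, after the first user's feedback in each of cycles t through T is obtained, receive the intention of the first user where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention;
a fourth obtaining unit configured to obtain a first standard question corresponding to the intention of the first user from the N standard questions;
a configuration unit configured to configure, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
an adding unit configured to add the first rhetorical module to the dialogue system.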
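Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute any of the above methods.
Another aspect of this specification provides a computing device including a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements any of the above methods.
With the dialogue system solution according to the embodiments of this specification, rhetorical modules only need to be obtained from standard questions, without a large amount of manual labeling of training samples. When obtaining a rhetorical module from a standard question, only that standard question needs attention, with no need for complex operations such as clustering and structuring, which is very convenient for the technical staff concerned. In addition, with the rhetorical module framework proposed in the embodiments of this specification, rhetorical modules can be extended within the framework at any time and are plug-and-play, which is very convenient for the business staff concerned. In addition, by training a reinforcement learning model based on this rhetorical module framework, the model can be updated online based on user feedback.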
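Brief Description of the Drawings
The embodiments of this specification can be made clearer by describing them with reference to the accompanying drawings:
FIG. 1 shows a schematic diagram of a dialogue system according to an embodiment of this specification;
FIG. 2 shows a flowchart of a method for asking a rhetorical question in response to a user question based on a dialogue system according to an embodiment of this specification;
FIG. 3 shows a schematic diagram of a rhetorical module according to an embodiment of this specification;
FIG. 4 shows a flowchart of a method for making a selection through a reinforcement learning model;
FIG. 5 shows a schematic process of outputting rhetorical questions through a reinforcement learning model according to an embodiment of this specification;
FIG. 6 shows a method for adding a rhetorical module to the dialogue system;
FIG. 7 shows an apparatus for asking a rhetorical question in response to a user question based on a dialogue system according to an embodiment of this specification.
Detailed Description
The embodiments of this specification are described below with reference to the accompanying drawings.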
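FIG. 1 shows a schematic diagram of a dialogue system according to an embodiment of this specification. As shown in FIG. 1, the dialogue system includes a reinforcement learning model 11. M rhetorical modules corresponding to N standard questions are preset in the dialogue system: rhetorical module 1, rhetorical module 2, …, rhetorical module M. Each rhetorical module has a first part and a second part, where the first part includes a business clause split from the corresponding standard question and the second part includes an intention clause split from the corresponding standard question. After a user inputs a question into the dialogue system, the dialogue system performs the three steps shown in the figure based on the business clause and the intention clause in each rhetorical module: matching against the clauses, detecting the missing clause, and obtaining a rhetorical question. After multiple (e.g., P) rhetorical questions have been obtained from the M rhetorical modules, they are input into the reinforcement learning model 11 as candidate actions b_1, b_2, …, b_P, and the user's question together with the rhetorical questions the model has already output in the current round is input into the reinforcement learning model 11 as state s_t. Based on state s_t, the model determines a predetermined number of rhetorical questions (a_t1, a_t2, a_t3) from the candidates for output to the user. After this output, the user's feedback can be obtained; for example, the feedback is whether or not the user clicks a rhetorical question output by the model. After giving this feedback, the user may ask a new question, and the dialogue system can then perform the next push of rhetorical questions. In that next push, the user's question and the rhetorical questions already pushed to the user in the previous push are input into the reinforcement learning model as the state corresponding to that push, so as to output the rhetorical questions to be pushed to the user.
The reinforcement learning model can be optimized based on the user's feedback on the rhetorical questions asked by the system, making the model's predictions more accurate.
If, after a predetermined number of pushes, the user's feedback shows that no rhetorical question has matched the user's intention, the user's intention can be obtained directly. The intention is, for example, entered by the user directly, or entered by the user after the dialogue system prompts for it. Based on the intention entered by the user, the corresponding standard question can be split again to generate a new rhetorical module, which is added to the dialogue system.
It will be understood that the above description with reference to FIG. 1 is illustrative rather than limiting. The method for asking rhetorical questions in response to user questions is described in detail below.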
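FIG. 2 shows a flowchart of a method for asking a rhetorical question in response to a user question based on a dialogue system according to an embodiment of this specification. M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the method includes:
Step S202: obtaining a first question of a first user;
Step S204: for each of the M rhetorical modules, determining whether the first question matches the first clause and the second clause therein, respectively;
Step S206: where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtaining a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.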
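In the prior art, multiple standard questions are usually preset in a dialogue system. In the embodiments of this specification, a standard question can be split to obtain a first clause and a second clause corresponding to it; for example, based on the business and the request in the standard question, a first clause corresponding to the business and a second clause corresponding to the user intention (request) can be obtained. For example, from the standard question "开通花呗收钱需要审核多久" ("how long does review take to enable Huabei money collection"), the two clauses "开通花呗收钱" ("enable Huabei money collection") and "审核多久" ("how long does review take") can be obtained, where "开通花呗收钱" relates to the business and "审核多久" relates to the intention (request) behind the user's question. It will be understood that a standard question need not be split in only one way, nor only by business and user intention. For example, for the standard question "花呗自动还款扣款顺序" ("Huabei automatic repayment deduction order"), the clauses "花呗" ("Huabei") and "自动还款" ("automatic repayment") can be obtained, or the clauses "花呗自动还款" ("Huabei automatic repayment") and "扣款顺序" ("deduction order"), and so on; for example, the standard question can be split according to the way users ask questions. In this way M rhetorical modules can be obtained from the N standard questions, where M ≥ N.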
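FIG. 3 shows a schematic diagram of a rhetorical module according to an embodiment of this specification. As shown in FIG. 3, the rhetorical module includes a module identifier, such as "11384" in the figure; the module identifier may, for example, correspond to a standard question number, indicating that the rhetorical module corresponds to that standard question. The rhetorical module also includes a first clause unit 31 and a second clause unit 32. The first clause unit 31 includes, for example: clause 1, corresponding to the business: "开通花呗收钱" ("enable Huabei money collection"); the keywords corresponding to clause 1: *(开通|申请)*花呗收钱*; and rhetorical question 1 corresponding to this clause: "开通花呗收钱?" ("Enable Huabei money collection?"). Similarly, the second clause unit 32 includes, for example: clause 2, corresponding to the user intention: "审核多久" ("how long does review take"); the keywords corresponding to this clause: *审核*(多久|多长时间)*; and rhetorical question 2 corresponding to this clause: "需要审核多久?" ("How long does review take?"). The use of the keywords and rhetorical questions in the rhetorical module is described in detail below.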
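The module layout of FIG. 3 can be pictured as a small record type. The following is a minimal sketch, assuming a plain Python representation; the class and field names (`ClauseUnit`, `RhetoricalModule`, `keyword_pattern`, and so on) are illustrative and do not come from the specification, while the string values are the ones quoted for module 11384.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseUnit:
    """One clause unit of a rhetorical module (hypothetical field names)."""
    clause: str           # clause split from the standard question
    keyword_pattern: str  # keyword pattern, written as in the FIG. 3 example
    rhetorical: str       # preset rhetorical question for this clause


@dataclass(frozen=True)
class RhetoricalModule:
    """A rhetorical module with its two clause units."""
    module_id: str          # e.g. the standard question number
    standard_question: str  # the standard question the clauses were split from
    first: ClauseUnit       # clause unit 31 (business)
    second: ClauseUnit      # clause unit 32 (user intention)


# Values quoted in the description of FIG. 3.
MODULE_11384 = RhetoricalModule(
    module_id="11384",
    standard_question="开通花呗收钱需要审核多久",
    first=ClauseUnit("开通花呗收钱", "*(开通|申请)*花呗收钱*", "开通花呗收钱?"),
    second=ClauseUnit("审核多久", "*审核*(多久|多长时间)*", "需要审核多久?"),
)
```

Keeping both clause units in one immutable record mirrors the idea that a module is derived from exactly one standard question and can be added or removed as a unit.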
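First, in step S202, a first question of a first user is obtained.
The dialogue system usually includes a question interface through which the first user can put a question to the dialogue system in text, speech or another form. For example, the first question is "需要审核多久" ("how long does review take"). For the dialogue system this question lacks the relevant business and is therefore a vague question. To determine the business the question refers to, the dialogue system can ask rhetorical questions by the method shown in FIG. 2 so that the question becomes clearer.
In step S204, for each of the M rhetorical modules, it is determined whether the first question matches the first clause and the second clause therein, respectively.
For example, for the rhetorical module with identifier "11384" shown in FIG. 3, it is determined whether the first user's first question "需要审核多久" matches clause 1 "开通花呗收钱" in the first clause unit, and whether it matches clause 2 "审核多久" in the second clause unit.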
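In one embodiment, as shown for the rhetorical module in FIG. 3, each clause unit of the module is preset with a group of keywords corresponding to its clause. For example, as described above, the group of keywords corresponding to clause 1 is *(开通|申请)*花呗收钱*, and the group of keywords corresponding to clause 2 is *审核*(多久|多长时间)*. For each clause, whether the first question matches the clause is determined by checking whether the first question contains the keyword between every pair of adjacent * symbols in the corresponding group, where "|" means that either of the keywords on its two sides may be used. For example, for clause 2, whether the first question matches is determined by checking whether it contains the keyword set {审核, 多久} or the keyword set {审核, 多长时间}. The first question "需要审核多久" clearly contains the keyword set {审核, 多久}, so it matches clause 2. In the same way it can be determined that the first question does not match clause 1 of the module.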
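The keyword check described above can be sketched in a few lines. This is a minimal illustration, assuming that a pattern is split on `*` into keyword groups and that a question matches when it contains at least one alternative from every group; the function names `parse_pattern` and `matches` are hypothetical.

```python
from typing import List


def parse_pattern(pattern: str) -> List[List[str]]:
    """Split a pattern such as '*审核*(多久|多长时间)*' into keyword groups.

    Each group lists the alternatives separated by '|'.
    """
    groups = []
    for part in pattern.split("*"):
        part = part.strip().strip("()")  # drop the optional parentheses
        if part:
            groups.append([alt for alt in part.split("|") if alt])
    return groups


def matches(question: str, pattern: str) -> bool:
    """True when the question contains one alternative from every group."""
    return all(
        any(alt in question for alt in group)
        for group in parse_pattern(pattern)
    )
```

With the strings from the example, `matches("需要审核多久", "*审核*(多久|多长时间)*")` holds, while the same question does not match `"*(开通|申请)*花呗收钱*"`. The sketch checks set membership only, as in the description, and does not enforce the order of the keywords.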
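It will be understood that determining whether the first question matches a clause in a rhetorical module is not limited to the keyword matching described above. In one embodiment, an embedding vector of the first question and an embedding vector of each clause can be obtained from word embedding vectors, and whether the first question matches each clause can be determined by comparing the similarity between the embedding vector of the first question and that of the clause. In one embodiment, training samples can be obtained from each rhetorical module to train a corresponding matching model, so that inputting the first question into the matching model of a rhetorical module directly outputs whether the first question matches each of the two clauses in that module.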
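The embedding-based alternative can be illustrated as follows. This is a sketch under stated assumptions: sentence vectors are taken as the mean of word vectors, similarity is cosine similarity, and a match is declared above a threshold. The specification does not fix any of these choices, and the function names and the 0.8 threshold are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def clause_matches(question_tokens, clause_tokens, embeddings, threshold=0.8):
    """Declare a match when the sentence vectors are similar enough."""
    q = sentence_vector(question_tokens, embeddings)
    c = sentence_vector(clause_tokens, embeddings)
    if q is None or c is None:
        return False
    return cosine(q, c) >= threshold
```

In practice the word vectors would come from a pretrained embedding table and the tokens from a word segmenter; both are outside the scope of this sketch.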
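In step S206, where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, a rhetorical question for the first question is obtained based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.
For example, as described above, the first question matches clause 2 in module 11384 shown in FIG. 3 and does not match clause 1. Taking clause 2 as the first clause and clause 1 as the second clause, a rhetorical question for the first question is obtained based on clause 1. In one embodiment, clause 1 itself can be used as the rhetorical question; for example, the first user can be asked "开通花呗收钱?" ("Enable Huabei money collection?"). In one embodiment, the standard question corresponding to clause 1 can be used as the rhetorical question; for example, the first user can be asked "开通花呗收钱需要审核多久?" ("How long does review take to enable Huabei money collection?"). In one embodiment, as shown in FIG. 3, a corresponding rhetorical question can be preset in each clause unit of the rhetorical module; for example, the rhetorical question "开通花呗收钱?" can be preset in the first clause unit. Then, after it is determined that the first question matches clause 2 and does not match clause 1, the rhetorical question "开通花呗收钱?" can be taken directly from the first clause unit, which corresponds to clause 1, and put to the first user.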
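Steps S204 and S206 together amount to a loop over the modules that collects the preset rhetorical question of whichever clause is missing. The sketch below assumes the preset-question embodiment and a simplified dictionary layout in which the keyword groups are already parsed; the second module and all function names are hypothetical and serve only as illustration.

```python
def unit_matches(question, keyword_groups):
    """True when the question contains one alternative from every group."""
    return all(any(alt in question for alt in group) for group in keyword_groups)


def candidate_questions(question, modules):
    """For every module in which exactly one clause matches, collect the
    preset rhetorical question of the clause that did not match."""
    candidates = []
    for module in modules:
        hits = [unit_matches(question, unit["keywords"]) for unit in module["units"]]
        if hits.count(True) == 1:
            missing_unit = module["units"][hits.index(False)]
            candidates.append(missing_unit["rhetorical"])
    return candidates


MODULES = [
    {"id": "11384", "units": [
        {"keywords": [["开通", "申请"], ["花呗收钱"]], "rhetorical": "开通花呗收钱?"},
        {"keywords": [["审核"], ["多久", "多长时间"]], "rhetorical": "需要审核多久?"},
    ]},
    # Hypothetical second module, added only to show multiple candidates.
    {"id": "hypothetical-1", "units": [
        {"keywords": [["实名认证"]], "rhetorical": "实名认证?"},
        {"keywords": [["审核"], ["多久", "多长时间"]], "rhetorical": "需要审核多久?"},
    ]},
]
```

For the question "需要审核多久", both modules match on the intention clause only, so the call `candidate_questions("需要审核多久", MODULES)` collects the two business-side questions. A module in which both clauses or neither clause matches contributes nothing.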
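In one embodiment, the standard questions in the dialogue system are classified by domain, and the rhetorical modules corresponding to them are classified by domain accordingly. For example, the N standard questions are the standard questions of the Huabei domain, i.e., the M rhetorical modules are the rhetorical modules of the Huabei domain. After the first question of the first user is obtained, the domain to which it belongs is determined. For example, a group of keywords can be set for each domain, and the domain of the first question can be determined by matching the first question against the keywords of each domain. For example, if the first question is "开通花呗收钱" ("enable Huabei money collection"), keyword matching shows that it belongs to the Huabei domain. Once the first question is determined to belong to the Huabei domain, steps S204 and S206 can be performed with the M rhetorical modules of the Huabei domain.
In one embodiment, the N standard questions are the standard questions of all the domains included in the dialogue system, so the M rhetorical modules correspond to all the domains. As described above, if the first question is "需要审核多久" ("how long does review take"), keyword matching cannot determine its domain, so steps S204 and S206 need to be performed with the M rhetorical modules of all the domains.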
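The two embodiments above can be combined into a simple routing step: use the modules of one domain when the domain can be identified, and fall back to the modules of all domains otherwise. The following sketch assumes substring keyword matching per domain; the function name, the data layout and the second domain are hypothetical.

```python
def select_modules(question, domain_keywords, modules_by_domain):
    """Return (domain, modules) for the first domain whose keywords occur in
    the question, or (None, all modules) when no domain can be determined."""
    for domain, keywords in domain_keywords.items():
        if any(keyword in question for keyword in keywords):
            return domain, modules_by_domain.get(domain, [])
    all_modules = [m for modules in modules_by_domain.values() for m in modules]
    return None, all_modules


# Illustrative data: module identifiers stand in for full modules.
DOMAIN_KEYWORDS = {"花呗": ["花呗"], "借呗": ["借呗"]}
MODULES_BY_DOMAIN = {"花呗": ["11384"], "借呗": ["hypothetical-2"]}
```

A question that names the product is routed to that domain's modules, whereas the vague question "需要审核多久" is checked against every module.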
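It will be understood that multiple rhetorical questions for the first question can be obtained from the M rhetorical modules in the dialogue system. For example, for the first question "需要审核多久" ("how long does review take"), the above steps can also yield the following rhetorical questions from other rhetorical modules: "实名认证?" ("Real-name authentication?"), "大病保险理赔?" ("Critical illness insurance claim?"), "开通借呗?" ("Enable Jiebei?"), and so on. In this case, to select a predetermined number (e.g., three) of rhetorical questions from these for output to the first user, the selection can be made by the reinforcement learning model shown in FIG. 1.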
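FIG. 4 shows a flowchart of a method for making the selection through a reinforcement learning model. The reinforcement learning model has obtained in advance, from the dialogue system, the multiple rhetorical questions obtained by the method shown in FIG. 2. The method is the t-th cycle in one round of the reinforcement learning model, and includes:
Step S402: obtaining the t-th state of the round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the current round;
Step S404: inputting the t-th state into the reinforcement learning model;
Step S406: determining, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the multiple rhetorical questions, for output to the first user.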
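One round (episode) of the reinforcement learning model includes, for example, T cycles, so t can be any natural number from 1 to T. A round is a series of consecutive dialogue turns between the user and the dialogue system, in which each cycle corresponds to one output of the reinforcement learning model. Two consecutive cycles among the T cycles may correspond to the same question, for example when the reinforcement learning model asks several turns of rhetorical questions about the same user question; or they may correspond to different questions that are related in topic and reflect a consistent user intention. After the first user inputs the first question and the dialogue system obtains the multiple rhetorical questions by the method shown in FIG. 2, these rhetorical questions can be input into the reinforcement learning model, which pushes rhetorical questions for the first question to clarify the first user's intention. For example, referring to FIG. 1, based on the first question and the M rhetorical modules in the figure, P rhetorical questions b_1, b_2, …, b_P for the first question can be obtained by the method shown in FIG. 2 and input into the reinforcement learning model as the candidates from which rhetorical questions are selected. The method shown in FIG. 4 is one push among multiple pushes (i.e., one cycle in the round). The round ends, for example, after the first user indicates that the dialogue is over, or when the first user has not replied within a predetermined period.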
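In step S402, the t-th state of the round is obtained; the t-th state includes the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the current round.
Referring to FIG. 1, the t-th state s_t input to the model in the t-th cycle of the round includes two items: the first question, and the rhetorical questions already output by the reinforcement learning model in the current round. For example, in the first cycle of the round the model has not yet produced any output, so s_1 includes only the first question asked by the user. In the second cycle, s_2 includes the second question asked by the user and the predetermined number (e.g., one or more) of rhetorical questions that the model output to the user in the first cycle.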
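FIG. 5 shows a schematic process of outputting rhetorical questions through a reinforcement learning model according to an embodiment of this specification. FIG. 5 schematically shows cycles 1 to 3 of one round; the three cycles are only illustrative, and a round is not limited to three cycles. As shown in FIG. 5, in the first cycle of the round the corresponding state s_1 includes only the first question asked by the user (shown as a white box in the figure). For example, the user inputs "Taobao" into the dialogue system, and in response the dialogue system outputs, for example, three rhetorical questions: a_11 ("Do you want to open Taobao?"), a_12 ("How do I close Taobao?") and a_13 ("What is Taobao"). In the second cycle the user asks, for example, "I'd like to ask how to sell things on Taobao?", so the corresponding state s_2 includes, in addition to the user's second question, the rhetorical questions the model has already output in the current round (shown as gray boxes in the figure), here a_11, a_12 and a_13. In the third cycle the corresponding state s_3 similarly includes the user's third question and the rhetorical questions the model has already output (gray boxes), here a_11, a_12, a_13, a_21, a_22 and a_23. In one embodiment, the t-th state s_t input to the model in the t-th cycle of the round includes the first question and only the rhetorical questions output by the reinforcement learning model in cycle t-1 of the current round. For example, in the third cycle shown in FIG. 5, the gray boxes of the corresponding state s_3 may include only a_21, a_22 and a_23.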
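The two variants of the state described for FIG. 5 differ only in how much of the output history is kept. A minimal sketch, assuming the state is represented as a dictionary and the history as one list of outputs per cycle; the function name `build_state` and the flag `last_cycle_only` are hypothetical.

```python
def build_state(question, outputs_per_cycle, last_cycle_only=False):
    """Assemble the t-th state from the current question and the rhetorical
    questions already output in this round.

    outputs_per_cycle holds one list per completed cycle, oldest first.
    With last_cycle_only=True only the outputs of cycle t-1 are kept.
    """
    previous = outputs_per_cycle[-1:] if last_cycle_only else outputs_per_cycle
    asked = [q for cycle in previous for q in cycle]
    return {"question": question, "asked": asked}


history = [["a11", "a12", "a13"], ["a21", "a22", "a23"]]
s1 = build_state("question 1", [])
s3_full = build_state("question 3", history)
s3_last = build_state("question 3", history, last_cycle_only=True)
```

Here `s1` carries no previous outputs, `s3_full` carries all six earlier outputs, and `s3_last` carries only a21, a22 and a23, matching the two embodiments.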
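In step S404, the t-th state is input into the reinforcement learning model. In step S406, a predetermined number of rhetorical questions for the first question are determined from the multiple rhetorical questions through the reinforcement learning model, for output to the first user.
The reinforcement learning model is, for example, a model based on a policy gradient algorithm. In this case the model includes a policy function π(a|s,θ) over state s and action a, where θ denotes the model parameters of the reinforcement learning model and π(a|s,θ) is the probability of taking action a in state s. In the embodiments of this specification, multiple rhetorical questions b_1, b_2, …, b_P for the first question can be obtained by the method shown in FIG. 2 and serve as the candidate actions from which the output actions are determined. For the t-th cycle, the policy function of the model computes the probability of each candidate action b_i from state s_t, and the predetermined number (e.g., three) of candidate actions with the highest probabilities are determined as the model's output actions a_t1, a_t2, a_t3 and output to the first user. As shown in FIG. 5, in this round the model outputs three rhetorical questions a_11, a_12, a_13 in the first cycle, a_21, a_22, a_23 in the second cycle, and a_31, a_32, a_33 in the third cycle. Each time the model outputs rhetorical questions, they are output to (displayed to) the first user, so that the corresponding reward values can be obtained from the user's feedback. For example, in the first cycle the reward values r_11, r_12, r_13 corresponding to the respective output actions can be obtained from the first user's feedback. If the first user does not click the rhetorical question a_11, the reward value corresponding to a_11 is 0; if the first user clicks the rhetorical question a_32, the reward value r_32 corresponding to a_32 is 1.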
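The selection in step S406 can be sketched as scoring every candidate, turning the scores into probabilities, and keeping the most probable ones. The specification does not say how π(a|s,θ) is parameterized; the sketch below assumes a linear score over a feature vector for each (state, candidate) pair followed by a softmax, and the feature vectors themselves are placeholders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(theta, feature_rows):
    """pi(b_i | s_t, theta) for every candidate, assuming a linear score
    theta . phi(s_t, b_i); feature_rows[i] plays the role of phi(s_t, b_i)."""
    scores = [sum(t * f for t, f in zip(theta, row)) for row in feature_rows]
    return softmax(scores)


def select_top_k(candidates, probs, k=3):
    """Return the k candidates with the highest probabilities."""
    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:k]]
```

For example, with `theta = [1.0, 0.5]` and four candidates whose feature rows are `[2, 0]`, `[0, 1]`, `[1, 1]` and `[0, 0]`, the scores are 2.0, 0.5, 1.5 and 0.0, so the first, third and second candidates are output in that order.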
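It will be understood that the reinforcement learning model is not limited to the policy gradient algorithm and may use other algorithms, such as Q-learning or an actor-critic algorithm, which are not described in detail here.
As described above, after a round of the model ends, the model can be trained on the input, output and feedback data of that round. For example, as described above, the first user clicked the rhetorical question a_32 in the third cycle and did not click any rhetorical question output by the model in the first and second cycles, so the reward value r_32 corresponding to a_32 equals 1, and the reward values corresponding to a_11, a_12, a_13, a_21, a_22, a_23, a_31 and a_33 are all zero. The model parameters can then be updated by the following formula (1):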
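[Formula (1): Figure PCTCN2020105063-appb-000001]
where the symbol shown in Figure PCTCN2020105063-appb-000002 denotes the expected value. For example, suppose the t-th state is state s_2 in FIG. 5. For any output action in the second cycle, such as a_21, the term of formula (1) shown in Figure PCTCN2020105063-appb-000003 can be computed by the following formula (2):
[Formula (2): Figure PCTCN2020105063-appb-000004]
The term shown in Figure PCTCN2020105063-appb-000005 is thus computed from r_32 as in formula (2), and the model parameters θ are then updated as in formula (1) based on s_2, a_21 and the term shown in Figure PCTCN2020105063-appb-000006.
Similarly, suppose the t-th state is state s_3 in FIG. 5. For the action a_32 in the third cycle, the term of formula (1) shown in Figure PCTCN2020105063-appb-000007 can be computed by the following formula (3):
[Formula (3): Figure PCTCN2020105063-appb-000008]
The term shown in Figure PCTCN2020105063-appb-000009 is thus computed from r_32 as in formula (3), and the model parameters θ are then updated as in formula (1) based on s_3, a_32 and the term shown in Figure PCTCN2020105063-appb-000010.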
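Formulas (1) to (3) are images in the original publication and are not reproduced in this text. For orientation only, the block below writes out the standard REINFORCE form of a policy gradient update, which agrees with the surrounding description (an expectation, a term computed from later rewards such as r_32, and an update of θ from s_t and the chosen action). It is not a reconstruction of the patent's formulas: the learning rate α, the discount factor γ and the return symbol v_t are assumptions.

```latex
% Standard REINFORCE-style update (assumed form, not the patent's formula (1)):
\theta \leftarrow \theta + \alpha\,
  \mathbb{E}\!\left[\, v_t \,\nabla_\theta \log \pi(a_{tj} \mid s_t, \theta) \right]

% Assumed discounted return from cycle t to the end of a round of T cycles:
v_t = \sum_{k=t}^{T} \gamma^{\,k-t} \sum_{j} r_{kj}

% With the rewards of the FIG. 5 example (only r_{32} = 1, T = 3):
v_2 = (r_{21} + r_{22} + r_{23}) + \gamma\,(r_{31} + r_{32} + r_{33}) = \gamma\, r_{32} = \gamma
v_3 = r_{31} + r_{32} + r_{33} = r_{32} = 1
```

Under these assumptions the action a_21 receives credit γ for the click that followed one cycle later, and the clicked action a_32 receives credit 1, consistent with the statement that both terms are computed from r_32.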
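For the three cycles in one round of the reinforcement learning model shown in FIG. 5, if the first user does not click any rhetorical question output by the model in that round, i.e., the reward value for every rhetorical question output by the model is 0, then according to formula (1) the data of that round cannot be used to train the model.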
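Why an all-zero round carries no training signal can be seen in a small numerical sketch. It assumes a REINFORCE-style update with a softmax policy over linear scores and a discounted return per cycle; these are illustrative choices rather than the update given in the specification, and the function names, `gamma` and `lr` are hypothetical.

```python
import math


def cycle_returns(rewards_per_cycle, gamma=0.9):
    """Discounted return of each cycle: the cycle's own rewards plus gamma
    times the return of the following cycle."""
    returns = []
    running = 0.0
    for rewards in reversed(rewards_per_cycle):
        running = sum(rewards) + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_step(theta, feature_rows, chosen, ret, lr=0.1):
    """One update theta + lr * ret * grad log pi(chosen | s, theta) for a
    softmax policy with linear scores theta . phi."""
    scores = [sum(t * f for t, f in zip(theta, row)) for row in feature_rows]
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    probs = [e / sum(exps) for e in exps]
    dim = len(theta)
    # For a softmax policy, grad log pi = phi(chosen) - sum_i pi_i * phi_i.
    expected = [sum(p * row[i] for p, row in zip(probs, feature_rows)) for i in range(dim)]
    grad_log = [feature_rows[chosen][i] - expected[i] for i in range(dim)]
    return [t + lr * ret * g for t, g in zip(theta, grad_log)]
```

With the rewards of the FIG. 5 example, `cycle_returns([[0, 0, 0], [0, 0, 0], [0, 1, 0]], gamma=0.5)` gives a positive return for every cycle, so each cycle contributes to the update. With all rewards zero, every return is zero and `reinforce_step` leaves `theta` unchanged, which is the situation addressed by the method of FIG. 6.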
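To handle this situation, FIG. 6 shows a method for adding a rhetorical module to the dialogue system, including:
Step S602: where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention, receiving the intention of the first user;
Step S604: obtaining a first standard question corresponding to the intention of the first user from the N standard questions;
Step S606: configuring, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
Step S608: adding the first rhetorical module to the dialogue system.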
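First, in step S602, where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention, the intention of the first user is received.
For example, as described above, when the first user has not clicked the output of any cycle of the reinforcement learning model, none of the T outputs of the model includes a rhetorical question that meets the first user's intention. In this case the first user may actively input the intention into the dialogue system, so that the dialogue system receives it; or the dialogue system may ask the first user, prompting the first user to input the intention; or a business person may make a manual judgment afterwards and input the first user's intention into the dialogue system.
For example, the question input by the first user is "花呗自动还款" ("Huabei automatic repayment"). Based on the existing rhetorical module (花呗, 自动还款) corresponding to the standard question "花呗自动还款扣款顺序" ("Huabei automatic repayment deduction order"), the dialogue system cannot obtain a rhetorical question related to "扣款顺序" ("deduction order"), so the reinforcement learning model cannot output one. The first user may therefore not click any rhetorical question output by the model. In this case, the information that the first user's intention in inputting "花呗自动还款" is "扣款顺序" can be received from outside (from the first user or a business person).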
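In step S604, a first standard question corresponding to the combination of the first question and the intention is obtained from the N standard questions.
For example, based on "扣款顺序" ("deduction order", the user intention), for instance by matching "扣款顺序" against the group of keywords corresponding to each standard question, the corresponding first standard question "花呗自动还款扣款顺序" ("Huabei automatic repayment deduction order") can be obtained from the N standard questions.
In step S606, based on the intention of the first user, a first rhetorical module corresponding to the first standard question is configured.
For example, for the first standard question "花呗自动还款扣款顺序", the two clauses "花呗自动还款" ("Huabei automatic repayment") and "扣款顺序" ("deduction order") can be obtained based on the first user's intention, and a first rhetorical module corresponding to the first standard question can be configured so that its first clause unit corresponds to "花呗自动还款" and its second clause unit corresponds to "扣款顺序".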
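In step S608, the first rhetorical module is added to the dialogue system.
That is, if the dialogue system initially includes the above M rhetorical modules, adding the first rhetorical module brings the total to M+1 rhetorical modules. After the first rhetorical module has been added, the M+1 rhetorical modules can be used immediately to perform the methods shown in FIG. 2 and FIG. 4 whenever the dialogue system is next used to obtain rhetorical questions for a user question. In other words, the rhetorical module architecture in the dialogue system according to the embodiments of this specification can be easily extended with user feedback and is plug-and-play after extension.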
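Steps S604 to S608 can be sketched as three small functions. This is a simplified illustration: it looks up the standard question by substring containment instead of per-question keyword groups, and it splits the standard question by removing the intention text. All function names and the dictionary layout are hypothetical.

```python
def find_standard_question(first_question, intent, standard_questions):
    """S604 (simplified): first standard question containing both the user's
    question and the received intention."""
    for standard_question in standard_questions:
        if first_question in standard_question and intent in standard_question:
            return standard_question
    return None


def build_module(module_id, standard_question, intent):
    """S606 (simplified): split the standard question into a business clause
    and the intention clause, and preset one rhetorical question for each."""
    if intent not in standard_question:
        raise ValueError("intent does not occur in the standard question")
    business = standard_question.replace(intent, "", 1)
    return {
        "id": module_id,
        "standard_question": standard_question,
        "clauses": (business, intent),
        "rhetorical": (business + "?", intent + "?"),
    }


def add_module(modules, module):
    """S608: register the new module and return the new module count."""
    modules.append(module)
    return len(modules)
```

With the standard question "花呗自动还款扣款顺序" and the intention "扣款顺序", `build_module` yields the clauses "花呗自动还款" and "扣款顺序". Because the module list is consulted on every request, the added module takes effect on the next question without retraining.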
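It will be understood that the rhetorical modules are not limited to being extended in the manner described above. For example, when the business areas increase, or when the topics popular with users change, the number of standard questions in the dialogue system may increase. In that case, additional rhetorical modules can be obtained from the added standard questions, thereby extending the rhetorical module framework.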
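FIG. 7 shows an apparatus 700 for asking a rhetorical question in response to a user question based on a dialogue system according to an embodiment of this specification. M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the apparatus includes:
a first obtaining unit 701 configured to obtain a first question of a first user;
a first determining unit 702 configured to determine, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein, respectively;
a second obtaining unit 703 configured to, where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtain a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.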
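In one embodiment, each rhetorical module is also preset with a first rhetorical question corresponding to the first clause and a second rhetorical question corresponding to the second clause, wherein the second obtaining unit 703 is further configured to obtain the second rhetorical question from the rhetorical module as the rhetorical question for the first question.
In one embodiment, each rhetorical module is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, wherein the first determining unit 702 is further configured to determine whether the first question matches the first group of keywords and the second group of keywords, respectively.
In one embodiment, the N standard questions are standard questions corresponding to a first domain, and the apparatus further includes a second determining unit 704 configured to determine, after the first question of the first user is obtained, the domain to which the first question belongs, wherein the first determining unit is further configured to, where it is determined that the first question corresponds to the first domain, determine for each of the M rhetorical modules whether the first question matches the first clause and the second clause therein.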
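In one embodiment, the dialogue system includes a reinforcement learning model, and the apparatus further includes: an input unit 705 configured to input the plurality of rhetorical questions into the reinforcement learning model after they have been obtained for the first question based on the M rhetorical modules; and an execution unit 706 configured to execute, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle in a first round, wherein the execution unit 706 includes:
an obtaining sub-unit 7061 configured to obtain the t-th state of the first round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the first round;
an input sub-unit 7062 configured to input the t-th state into the reinforcement learning model;
a determining sub-unit 7063 configured to determine, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, for output to the first user.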
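In one embodiment, the first round includes a total of T cycles, and the apparatus further includes a third obtaining unit 707 configured to obtain, after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model for output to the first user, the first user's feedback on the output of the reinforcement learning model in each of cycles t through T.
In one embodiment, the apparatus further includes a training unit 708 configured to train the reinforcement learning model, after the first user's feedback in each of cycles t through T is obtained, based on the t-th state, the predetermined number of rhetorical questions, and the first user's feedback in each of cycles t through T.
In one embodiment, the apparatus further includes:
a receiving unit 709 configured to, after the first user's feedback in each of cycles t through T is obtained, receive the intention of the first user where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention;
a fourth obtaining unit 710 configured to obtain a first standard question corresponding to the intention of the first user from the N standard questions;
a configuration unit 711 configured to configure, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
an adding unit 712 configured to add the first rhetorical module to the dialogue system.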
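Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute any of the above methods.
Another aspect of this specification provides a computing device including a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements any of the above methods.
With the dialogue system solution according to the embodiments of this specification, rhetorical modules only need to be obtained from standard questions, without a large amount of manual labeling of training samples. When obtaining a rhetorical module from a standard question, only that standard question needs attention, with no need for complex operations such as clustering and structuring, which is very convenient for the technical staff concerned. In addition, with the rhetorical module framework proposed in the embodiments of this specification, rhetorical modules can be extended within the framework at any time and are plug-and-play, which is very convenient for the business staff concerned. In addition, by training a reinforcement learning model based on this rhetorical module framework, the model can be updated online based on user feedback.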
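It should be understood that descriptions such as "first" and "second" in this document are used only to distinguish similar concepts for simplicity of description and have no other limiting effect.
The embodiments in this specification are described in a progressive manner; for the parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are essentially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module can be placed in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.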

Claims (20)

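  1. A method for asking a rhetorical question in response to a user question based on a dialogue system, wherein M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the method comprises:
    obtaining a first question of a first user;
    for each of the M rhetorical modules, determining whether the first question matches the first clause and the second clause therein, respectively;
    where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtaining a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.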
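  2. The method according to claim 1, wherein each rhetorical module includes two clauses related to a business and a user intention, respectively.
  3. The method according to claim 1, wherein each rhetorical module is also preset with a second rhetorical question corresponding to the second clause, and wherein obtaining a rhetorical question for the first question based on the second clause in the rhetorical module comprises obtaining the second rhetorical question from the rhetorical module as the rhetorical question for the first question.
  4. The method according to claim 1, wherein each rhetorical module is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, and wherein determining whether the first question matches the first clause and the second clause comprises determining whether the first question matches the first group of keywords and the second group of keywords, respectively.
  5. The method according to claim 1, wherein the N standard questions are standard questions corresponding to a first domain, the method further comprising, after obtaining the first question of the first user, determining the domain to which the first question belongs, wherein determining, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein comprises: where it is determined that the first question corresponds to the first domain, determining, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein.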
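  6. The method according to claim 1, wherein the dialogue system includes a reinforcement learning model, the method further comprising: after obtaining the plurality of rhetorical questions for the first question based on the M rhetorical modules, inputting the plurality of rhetorical questions into the reinforcement learning model; and executing, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle of a first round, wherein the t-th cycle comprises the following steps:
    obtaining the t-th state of the first round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the first round;
    inputting the t-th state into the reinforcement learning model;
    determining, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, for output to the first user.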
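  7. The method according to claim 6, wherein the first round includes T cycles, the method further comprising: after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model for output to the first user, obtaining the first user's feedback on the output of the reinforcement learning model in each of cycles t through T.
  8. The method according to claim 7, further comprising: after obtaining the first user's feedback in each of cycles t through T, training the reinforcement learning model based on the t-th state, the predetermined number of rhetorical questions, and the first user's feedback in each of cycles t through T.
  9. The method according to claim 7, further comprising:
    after obtaining the first user's feedback in each of cycles t through T, where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention, receiving the intention of the first user;
    obtaining a first standard question corresponding to the intention of the first user from the N standard questions;
    configuring, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
    adding the first rhetorical module to the dialogue system.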
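  10. An apparatus for asking a rhetorical question in response to a user question based on a dialogue system, wherein M rhetorical modules corresponding to N standard questions are preset in the dialogue system, where M ≥ N, each rhetorical module includes a first clause and a second clause split from the corresponding standard question, and the apparatus comprises:
    a first obtaining unit configured to obtain a first question of a first user;
    a first determining unit configured to determine, for each of the M rhetorical modules, whether the first question matches the first clause and the second clause therein, respectively;
    a second obtaining unit configured to, where the first question matches the first clause in the rhetorical module and does not match the second clause in the rhetorical module, obtain a rhetorical question for the first question based on the second clause in the rhetorical module, so as to obtain a plurality of rhetorical questions for the first question based on the M rhetorical modules.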
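  11. The apparatus according to claim 10, wherein each rhetorical module includes two clauses related to a business and a user intention, respectively.
  12. The apparatus according to claim 10, wherein each rhetorical module is also preset with a second rhetorical question corresponding to the second clause, and wherein the second obtaining unit is further configured to obtain the second rhetorical question from the rhetorical module as the rhetorical question for the first question.
  13. The apparatus according to claim 10, wherein each rhetorical module is also preset with a first group of keywords corresponding to the first clause and a second group of keywords corresponding to the second clause, and wherein the first determining unit is further configured to determine whether the first question matches the first group of keywords and the second group of keywords, respectively.
  14. The apparatus according to claim 10, wherein the N standard questions are standard questions corresponding to a first domain, the apparatus further comprising a second determining unit configured to determine, after the first question of the first user is obtained, the domain to which the first question belongs, wherein the first determining unit is further configured to, where it is determined that the first question corresponds to the first domain, determine for each of the M rhetorical modules whether the first question matches the first clause and the second clause therein.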
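  15. The apparatus according to claim 10, wherein the dialogue system includes a reinforcement learning model, the apparatus further comprising: an input unit configured to input the plurality of rhetorical questions into the reinforcement learning model after they have been obtained for the first question based on the M rhetorical modules; and an execution unit configured to execute, through the reinforcement learning model and based on the plurality of rhetorical questions, the t-th cycle in a first round, wherein the execution unit comprises:
    an obtaining sub-unit configured to obtain the t-th state of the first round, the t-th state including the first question and the rhetorical questions for the first question already output by the reinforcement learning model in the first round;
    an input sub-unit configured to input the t-th state into the reinforcement learning model;
    a determining sub-unit configured to determine, through the reinforcement learning model, a predetermined number of rhetorical questions for the first question from the plurality of rhetorical questions, for output to the first user.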
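  16. The apparatus according to claim 15, wherein the first round includes T cycles, the apparatus further comprising a third obtaining unit configured to obtain, after the predetermined number of rhetorical questions for the first question have been determined from the plurality of rhetorical questions through the reinforcement learning model for output to the first user, the first user's feedback on the output of the reinforcement learning model in each of cycles t through T.
  17. The apparatus according to claim 15, further comprising a training unit configured to train the reinforcement learning model, after the first user's feedback in each of cycles t through T is obtained, based on the t-th state, the predetermined number of rhetorical questions, and the first user's feedback in each of cycles t through T.
  18. The apparatus according to claim 15, further comprising:
    a receiving unit configured to, after the first user's feedback in each of cycles t through T is obtained, receive the intention of the first user where it is determined, based on the first user's feedback in each of the T cycles, that none of the T outputs of the reinforcement learning model includes a rhetorical question that meets the first user's intention;
    a fourth obtaining unit configured to obtain a first standard question corresponding to the intention of the first user from the N standard questions;
    a configuration unit configured to configure, based on the intention of the first user, a first rhetorical module corresponding to the first standard question;
    an adding unit configured to add the first rhetorical module to the dialogue system.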
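  19. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1-9.
  20. A computing device comprising a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements the method according to any one of claims 1-9.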
PCT/CN2020/105063 2019-10-23 2020-07-28 一种基于对话系统对用户问句提出反问的方法和装置 WO2021077834A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911013294.0A CN110727783B (zh) 2019-10-23 2019-10-23 一种基于对话系统对用户问句提出反问的方法和装置
CN201911013294.0 2019-10-23

Publications (1)

Publication Number Publication Date
WO2021077834A1 true WO2021077834A1 (zh) 2021-04-29

Family

ID=69221861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105063 WO2021077834A1 (zh) 2019-10-23 2020-07-28 一种基于对话系统对用户问句提出反问的方法和装置

Country Status (2)

Country Link
CN (1) CN110727783B (zh)
WO (1) WO2021077834A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727783B (zh) * 2019-10-23 2021-03-02 支付宝(杭州)信息技术有限公司 一种基于对话系统对用户问句提出反问的方法和装置
CN111414746B (zh) * 2020-04-10 2023-11-07 建信金融科技有限责任公司 一种匹配语句确定方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
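CN106601237A (zh) * 2016-12-29 2017-04-26 上海智臻智能网络科技股份有限公司 Interactive voice response system and speech recognition method thereof
EP3179384A1 (en) * 2014-09-29 2017-06-14 Huawei Technologies Co., Ltd. Method and device for parsing interrogative sentence in knowledge base
CN106897263A (zh) * 2016-12-29 2017-06-27 北京光年无限科技有限公司 Robot dialogue interaction method and device based on deep learning
CN107862005A (zh) * 2017-10-25 2018-03-30 阿里巴巴集团控股有限公司 User intention recognition method and device
CN110188180A (zh) * 2019-05-31 2019-08-30 三角兽(北京)科技有限公司 Method and apparatus for determining similar questions, electronic device, and readable storage medium
CN110727783A (zh) * 2019-10-23 2020-01-24 支付宝(杭州)信息技术有限公司 Method and device for asking rhetorical questions in response to user questions based on a dialogue system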

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030101170A1 (en) * 2001-05-25 2003-05-29 Joseph Edelstein Data query and location through a central ontology model
US8423631B1 (en) * 2009-02-13 2013-04-16 Aerohive Networks, Inc. Intelligent sorting for N-way secure split tunnel
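CN107885844A (zh) * 2017-11-10 2018-04-06 南京大学 Automatic question answering method and system based on classification retrieval
CN108363690A (zh) * 2018-02-08 2018-08-03 北京十三科技有限公司 Neural-network-based dialogue semantic intent prediction method and learning and training method
CN108446322B (zh) * 2018-02-10 2022-03-25 灯塔财经信息有限公司 Method and device for implementing an intelligent question answering system
CN109002434A (zh) * 2018-05-31 2018-12-14 青岛理工大学 Customer service question-answer matching method, server, and storage medium
CN109446306A (zh) * 2018-10-16 2019-03-08 浪潮软件股份有限公司 Intelligent question answering method based on task-driven multi-turn dialogue
CN109857841A (zh) * 2018-12-05 2019-06-07 厦门快商通信息技术有限公司 FAQ question text similarity calculation method and system
CN110096580B (zh) * 2019-04-24 2022-05-24 北京百度网讯科技有限公司 FAQ dialogue method, apparatus, and electronic device
CN110209790B (zh) * 2019-06-06 2023-08-25 创新先进技术有限公司 Question-answer matching method and device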

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3179384A1 (en) * 2014-09-29 2017-06-14 Huawei Technologies Co., Ltd. Method and device for parsing interrogative sentence in knowledge base
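CN106601237A (zh) * 2016-12-29 2017-04-26 上海智臻智能网络科技股份有限公司 Interactive voice response system and speech recognition method thereof
CN106897263A (zh) * 2016-12-29 2017-06-27 北京光年无限科技有限公司 Robot dialogue interaction method and device based on deep learning
CN107862005A (zh) * 2017-10-25 2018-03-30 阿里巴巴集团控股有限公司 User intention recognition method and device
CN110188180A (zh) * 2019-05-31 2019-08-30 三角兽(北京)科技有限公司 Method and apparatus for determining similar questions, electronic device, and readable storage medium
CN110727783A (zh) * 2019-10-23 2020-01-24 支付宝(杭州)信息技术有限公司 Method and device for asking rhetorical questions in response to user questions based on a dialogue system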

Also Published As

Publication number Publication date
CN110727783B (zh) 2021-03-02
CN110727783A (zh) 2020-01-24

Similar Documents

Publication Publication Date Title
JP6972265B2 (ja) Pointer sentinel mixture architecture
US20200152174A1 (en) Method, Apparatus, and System for Conflict Detection and Resolution for Competing Intent Classifiers in Modular Conversation System
US11005786B2 (en) Knowledge-driven dialog support conversation system
US10997258B2 (en) Bot networks
US10346782B2 (en) Adaptive augmented decision engine
RU2708941C1 (ru) Method and device for recognizing segmented sentences for a human-machine intelligent question answering system
CN108021934B (zh) Method and device for multi-element recognition
US10580176B2 (en) Visualization of user intent in virtual agent interaction
US10326863B2 (en) Speed and accuracy of computers when resolving client queries by using graph database model
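WO2022134421A1 (zh) Intelligent reply method and apparatus based on multiple knowledge graphs, computer device, and storage medium
WO2021077834A1 (zh) Method and device for asking rhetorical questions in response to user questions based on a dialogue system
JPWO2007138875A1 (ja) System, method, and program for creating a word dictionary and language model for speech recognition, and speech recognition system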
Windiatmoko et al. Developing facebook chatbot based on deep learning using rasa framework for university enquiries
JP7194233B2 (ja) Object recommendation method, neural network and training method therefor, device, and medium
Windiatmoko et al. Developing FB chatbot based on deep learning using RASA framework for university enquiries
CN110377733A (zh) Text-based emotion recognition method, terminal device, and medium
CN115914148A (zh) Conversational agent with two-sided modeling
Galitsky et al. Learning communicative actions of conflicting human agents
CN117575008A (zh) Training sample generation method, model training method, knowledge question answering method, and device
Currie The mystery of the Triceratops’s mother: how to be a realist about the species category
WO2021147405A1 (zh) Customer service utterance quality inspection method and related device
US20140195298A1 (en) Tracking of near conversions in user engagements
CN116955646A (zh) Knowledge graph generation method and apparatus, storage medium, and electronic device
Pragst et al. Comparative study of sentence embeddings for contextual paraphrasing
US11355118B2 (en) Virtual assistants harmonization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20879731

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20879731

Country of ref document: EP

Kind code of ref document: A1