CN114579728A - Dialogue generation method, device, equipment and medium applied to multi-turn dialogue system - Google Patents

Dialogue generation method, device, equipment and medium applied to multi-turn dialogue system Download PDF

Info

Publication number
CN114579728A
CN114579728A (application CN202210253749.1A; granted as CN114579728B)
Authority
CN
China
Prior art keywords
round, dialog, dialogue, network, semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210253749.1A
Other languages
Chinese (zh)
Other versions
CN114579728B (en)
Inventor
徐万珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd filed Critical Sichuan XW Bank Co Ltd
Priority to CN202210253749.1A priority Critical patent/CN114579728B/en
Priority claimed from CN202210253749.1A external-priority patent/CN114579728B/en
Publication of CN114579728A publication Critical patent/CN114579728A/en
Application granted granted Critical
Publication of CN114579728B publication Critical patent/CN114579728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue generation method, device, equipment and medium applied to a multi-turn dialogue system. The method comprises: acquiring a user's first round of dialogue and extracting features from it with a semantic understanding model to obtain a semantic feature representation of the first-round dialogue content; extracting deeper semantic information from the semantic feature representation with an encoder to generate a first-round intermediate variable; feeding the first-round intermediate variable to an intention feature calculation network to obtain an intention feature representation; decoding the intention feature representation and the first-round intermediate variable separately and inputting the decoding results into a generation network, which completes the response, generates the response text of the first-round dialogue content, and returns it, completing the first round of dialogue generation; and waiting for and acquiring the next round of dialogue and repeating these steps until the user triggers an end condition, whereupon the multi-turn dialogue ends. The method suits multi-turn interactive dialogue scenarios: response generation is based on both the current dialogue content and the historical dialogue record, completing multi-turn interaction.

Description

Dialogue generation method, device, equipment and medium applied to multi-turn dialogue system
Technical Field
The invention relates to the field of dialog systems, belongs to natural language processing technology, and in particular relates to a dialog generation method, device, equipment and medium applied to a multi-turn dialog system.
Background
Existing dialog systems mainly fall into the following two categories:
The first category is matching- or retrieval-based dialog systems (e.g., patent applications CN113806509A and CN113901188A). These methods rely on a nearly fully closed-domain knowledge base: after the user's dialog intent is recognized, candidate sentences are matched in the knowledge base through a specific mechanism, and the reply corresponding to the candidate with the highest similarity is output after similarity calculation. Although such methods are cheap to implement and highly operable, in practice they are constrained by the knowledge base, and their usability drops sharply when out-of-domain knowledge is encountered.
The second category is generation-based dialog systems (e.g., patent application CN113792126A). These methods generate the response text directly with generative models such as generative adversarial networks and autoencoders. Compared with matching- and retrieval-based methods, generative models are more complex and harder to control, but they are more flexible in application and better suited to open knowledge domains. However, owing to limitations such as model size, the related art is mostly applied only to single-turn dialog; multi-turn dialog, which is needed more often, sees little real-world use.
In summary, existing matching- or retrieval-based methods depend heavily on a domain knowledge base, which is usually closed-domain. This brings limitations: when out-of-domain knowledge is encountered, the quality of the system's responses drops sharply; the response base is relatively fixed and inflexible, cannot guide the conversation, and struggles to complete task-oriented dialog, so such methods are mostly used only in single-turn dialog scenarios. Generation-based methods avoid the strong dependence on a knowledge base, but training models for the open knowledge domain typically requires huge datasets, and the implementation cost and latency are much higher than those of a retrieval system.
Disclosure of Invention
The invention aims to provide a dialog generation method, device, equipment and medium applied to a multi-turn dialog system. Combined with a specific domain background, the method can serve as a generative task-oriented dialog guide or question-answering system, avoiding existing systems' strong dependence on a domain knowledge base.
The invention is realized by the following technical scheme:
in a first aspect, the present invention provides a dialog generation method applied to a multi-turn dialog system, the method including:
acquiring a user's first round of dialog, and performing feature extraction on it with a semantic understanding model to obtain a semantic feature representation of the first-round dialog content;
extracting deeper semantic information from the semantic feature representation with an encoder to generate a first-round intermediate variable;
feeding the first-round intermediate variable to an intention feature calculation network to obtain an intention feature representation;
decoding the intention feature representation and the first-round intermediate variable separately, inputting the decoding results into a generation network, which completes the response, generates the response text of the first-round dialog content, and returns it; at this point the first round of dialog generation is complete;
waiting for and acquiring the next round of dialog, and repeating the above steps until the user triggers an end condition, whereupon the multi-turn dialog ends.
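The acquire-understand-generate-respond loop described in these steps can be sketched as follows. The four component functions are hypothetical stand-ins for the trained semantic understanding model, encoder, intention feature calculation network, and generation network; they are NOT the patent's actual networks, only placeholders that show how the pieces chain together across rounds.

```python
def understand(text):          # semantic understanding model (stub)
    return [float(len(tok)) for tok in text.split()]

def encode(features):          # encoder producing an intermediate variable (stub)
    return sum(features)

def intent_net(history):       # intention feature calculation network (stub)
    return history[-1] if history else 0.0

def generate(intent, latent):  # generation network (stub)
    return f"reply({intent:.1f},{latent:.1f})"

def run_dialog(turns):
    """Run the acquire -> understand -> encode -> intent -> generate loop."""
    intents, replies = [], []
    for text in turns:                      # each user turn until the end condition
        latent = encode(understand(text))   # this round's intermediate variable
        intents.append(intent_net(intents + [latent]))
        replies.append(generate(intents[-1], latent))
    return replies

replies = run_dialog(["hello there", "how are you"])
```

Note how `intents` accumulates across rounds: later rounds condition on the intention features of all earlier rounds, which is the multi-turn behavior the steps above describe.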
The working principle is as follows:
Existing dialog systems generally comprise two modules: an understanding module for the user's input text and a response module for the reply text. However, in the prior art (for example, the published application CN112256825A, "Multi-turn dialog intelligent question-answering processing method, apparatus, and computer equipment," and the granted CN111897929B, "Multi-turn question processing method, apparatus, storage medium, and electronic equipment"), the work focuses mainly on the understanding module for the user's input text: the medical question-answering system recognizes and extracts the main information in a question via named entities and normalizes it into a standard question form, while the multi-turn question processing method addresses how to understand the user's real intent when several questions are entered at once. Such work is still essentially matching- and retrieval-based: the input question is processed and normalized, matched against standard questions in a knowledge base, and then answered. The flow is: acquire - understand - match - respond.
Unlike matching- and retrieval-based methods, the present method belongs to generative dialog systems. On one hand, it has no strong dependence on a domain-specific knowledge base: even when out-of-domain dialog content is encountered, it can still generate high-quality responses based on the semantic understanding module combined with historical dialog information. On the other hand, it records and processes historical dialog data, combines it with current intent information, and uses a generation network to ensure the fluency of generated text and the logical coherence of multi-turn dialog.
For the input acquired during the dialog, the invention uses an algorithm to generate, rather than match, the response text. The emphasis is on generation conditioned on the dialog content, so that the dialog is fluent and logical and better matches a human conversation. The flow is: acquire - understand - generate - respond.
Further, waiting for and acquiring the next round of dialog, and repeating the steps until the user triggers an end condition and the current multi-turn dialog ends, comprises:
when executing the N-th round of dialog, decoding the set of intention feature representations of the historical user dialog content and the (N-1)-th-round intermediate variable separately, inputting the decoding results into the generation network, which completes the response, generates the response text of the N-th-round dialog content, and returns it; at this point the N-th round of dialog generation is complete;
wherein the set of intention feature representations of the historical user dialog content is the set of intention feature representations of the first N-1 rounds: the first-round intention feature representation, the second-round intention feature representation, ..., the (N-1)-th-round intention feature representation.
Further, acquiring the user's first round of dialog and performing feature extraction on it with the semantic understanding model to obtain the semantic feature representation of the first-round dialog content specifically comprises:
acquiring the user's first-round dialog input text;
performing word segmentation on the user's first-round dialog input text with an existing word segmentation algorithm to obtain a segmentation sequence S1 = (s_1, s_2, ..., s_t, ..., s_n), where:
s_t = onehot(x_t)    (1)
where s_t is the numeric vector obtained by one-hot encoding the input text token x_t, and t indexes the t-th word in the segmentation sequence;
inputting the segmentation sequence S1 into a preset semantic understanding model, performing feature extraction on it, and computing and outputting the semantic vector of the first round of dialog, W1 = (w_1, w_2, ..., w_t, ..., w_n) (i.e., the semantic feature representation, expressed as a vector), where w_t is an element of the semantic vector W1 of the first-round dialog;
specifically, the semantic understanding model is a bi-LSTM network. With x_t passed as input into the bi-LSTM network, the weights of the forget gate, input gate, and output gate are computed respectively, where W and b are parameters already trained in the preset semantic understanding model, and tanh(·) and σ(·) are the activation functions used in the model:
i_t = σ(W_ii·x_t + b_ii + W_hi·w_(t-1) + b_hi)    (2)
f_t = σ(W_if·x_t + b_if + W_hf·w_(t-1) + b_hf)    (3)
o_t = σ(W_io·x_t + b_io + W_ho·w_(t-1) + b_ho)    (4)
The cell state of the network at the current moment is then updated:
g_t = tanh(W_ig·x_t + b_ig + W_hg·w_(t-1) + b_hg)    (5)
c_t = f_t * c_(t-1) + i_t * g_t    (6)
and the semantic representation vector is computed, where w_t is a 512-dimensional word vector:
w_t = o_t * tanh(c_t)    (7)
In practice, compared with a unidirectional network, a bidirectional language model learns the context features of a sequence from both the forward and backward directions, so the network used here for sequence-based semantic vector conversion is dynamic and represents the text's semantic information more fully: the same word in different contexts carries different semantics. For example, the word "apple" in the two sentences "I want to eat an apple" and "Apple released a new phone today" has different contexts and thus naturally different semantics. The actual semantic understanding model adopts a bi-LSTM structure, whose computation is identical in the two sequence directions; finally, the bi-LSTM network computes the semantic vector of the first round of dialog as:
w_t = MLP([w_t^→ ; w_t^←])    (8)
where w_t is an element of the semantic vector W1 of the first-round dialog; w_t^→ and w_t^← are the semantic vectors computed in the two sequence directions according to the formula set above (formulas (1) to (7)); and MLP(·) is a single fully connected layer.
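As a concrete illustration of gate formulas (2)-(7) and the bidirectional combination of formula (8), here is a toy scalar bi-LSTM in Python. All weight values are made up, the state is one-dimensional rather than 512-dimensional, and the MLP of formula (8) is replaced by a simple average, so this is a sketch of the arithmetic only, not the trained model.

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, w_prev, c_prev, p):
    """One scalar LSTM step; p maps parameter names to illustrative values."""
    i_t = sigma(p["W_ii"] * x_t + p["b_ii"] + p["W_hi"] * w_prev + p["b_hi"])      # (2)
    f_t = sigma(p["W_if"] * x_t + p["b_if"] + p["W_hf"] * w_prev + p["b_hf"])      # (3)
    o_t = sigma(p["W_io"] * x_t + p["b_io"] + p["W_ho"] * w_prev + p["b_ho"])      # (4)
    g_t = math.tanh(p["W_ig"] * x_t + p["b_ig"] + p["W_hg"] * w_prev + p["b_hg"])  # (5)
    c_t = f_t * c_prev + i_t * g_t                                                 # (6)
    w_t = o_t * math.tanh(c_t)                                                     # (7)
    return w_t, c_t

params = {k: 0.5 for k in
          ("W_ii b_ii W_hi b_hi W_if b_if W_hf b_hf "
           "W_io b_io W_ho b_ho W_ig b_ig W_hg b_hg").split()}

def run_lstm(xs, p):
    w, c, outs = 0.0, 0.0, []
    for x in xs:
        w, c = lstm_step(x, w, c, p)
        outs.append(w)
    return outs

# bi-LSTM: run both directions, then combine per (8) (MLP replaced by a mean)
forward = run_lstm([1.0, 2.0, 3.0], params)
backward = list(reversed(run_lstm([3.0, 2.0, 1.0], params)))
semantic = [(f + b) / 2 for f, b in zip(forward, backward)]
```

Because w_t = o_t · tanh(c_t) with o_t in (0, 1), every element of the resulting semantic vector stays bounded in (-1, 1).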
Further, extracting deeper semantic information from the semantic feature representation with the encoder to generate the first-round intermediate variable specifically comprises:
constructing an encoder with bi-GRU as its backbone, used to process the dialog content and further abstract its semantic features;
inputting each element w_t of the semantic vector W1 of the first round of dialog into the constructed encoder, and computing the update-gate weight z_t and the reset-gate weight r_t:
z_t = σ(W_z·[h_(t-1), w_t])    (9)
r_t = σ(W_r·[h_(t-1), w_t])    (10)
where W denotes parameters already trained in the encoder, W_z and W_r are respectively the encoder's update-gate and reset-gate parameters, σ(·) is the activation function, h_(t-1) is the output computed with w_(t-1) as input, and h_0 is a vector randomly generated by the network;
according to the update-gate weight z_t and the reset-gate weight r_t, the candidate intermediate variable is computed, and from it the final intermediate variable c1 = h_n, where n is the input sequence length:
h̃_t = tanh(W·[r_t * h_(t-1), w_t])    (11)
h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t    (12)
where tanh(·) is the activation function, W denotes parameters already trained in the network, and h_n is the network output at t = n, with n the sequence length.
In particular, the encoder abstracts and represents semantic features over the sequence, so this module outputs only the last sequence state, i.e., the intermediate variable c1 = h_n; here c1 is also a 512-dimensional vector.
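A toy scalar sketch of the encoder formulas (9)-(12). The concatenation [h_(t-1), w_t] multiplied by a weight matrix is approximated here by a scalar weight applied to the sum, and all weight values are illustrative rather than trained; as in the description, only the last hidden state h_n is kept as the intermediate variable c1.

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(w_t, h_prev, Wz, Wr, Wh):
    z_t = sigma(Wz * (h_prev + w_t))               # update gate, formula (9)
    r_t = sigma(Wr * (h_prev + w_t))               # reset gate, formula (10)
    h_cand = math.tanh(Wh * (r_t * h_prev + w_t))  # candidate state, formula (11)
    return (1 - z_t) * h_prev + z_t * h_cand       # formula (12)

def encode(ws, h0=0.1, Wz=0.4, Wr=0.3, Wh=0.6):
    """Run the GRU over the semantic vector and keep only the final state."""
    h = h0
    for w in ws:
        h = gru_step(w, h, Wz, Wr, Wh)
    return h  # c1 = h_n: only the last sequence state is the intermediate variable

c1 = encode([0.2, -0.5, 0.9])
```

Since each h_t is a convex combination of h_(t-1) and a tanh output, the intermediate variable c1 stays bounded in (-1, 1).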
Further, feeding the first-round intermediate variable to the intention feature calculation network to obtain the intention feature representation specifically comprises:
first, building an intention feature calculation network with an RNN + attention mechanism as its backbone, used as the dialog intent extraction network to record and extract feature information of the historical dialog content during the conversation. Choosing RNN as the backbone keeps the parameter count low with little loss of performance, effectively improving computational efficiency and avoiding unnecessary resource consumption.
Next, the intermediate variable c computed after each round of dialog is used as input: the input in the first round is (a, a, a, a, c1), the input in the second round is (a, a, a, c1, c2), and so on, where a is a random variable. The specific computation of this step is as follows:
the sequence of historical dialog records is fed into the RNN, where W and b are parameters of the trained network and x_t is the t-th element of the input sequence:
h_t = tanh(W_i·x_t + W_j·h_(t-1) + b)    (13)
then the attention mechanism is applied: the attention weights determine the current topic-feature tendency over the continuous conversation history:
u_t = tanh(W_s·h_t + b_t)    (14)
a_t = exp(u_t) / Σ_t exp(u_t)    (15)
v = Σ_t a_t·h_t    (16)
where a_t are the weights computed by the attention mechanism and v is the final network output, i.e., the topic feature vector. By combining RNN and attention, this module records historical topic feature information throughout the dialog while the attention mechanism emphasizes the most salient topic information, setting the semantic keynote for the sentence to be generated next and letting the topic record fulfill its role in the network.
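The RNN recurrence of formula (13) and the attention pooling of formulas (14)-(16) can be sketched as a scalar toy in Python. The weights are illustrative, not trained, and the random pad value a from the description is fixed here for reproducibility.

```python
import math

def rnn_attention(history, Wi=0.5, Wj=0.3, Ws=0.7, b=0.1, bt=0.0):
    # formula (13): recurrent encoding of the fixed-length history window
    h, hs = 0.0, []
    for x in history:
        h = math.tanh(Wi * x + Wj * h + b)
        hs.append(h)
    # formulas (14)-(15): attention scores, normalized with a softmax
    us = [math.tanh(Ws * h_t + bt) for h_t in hs]
    exps = [math.exp(u) for u in us]
    total = sum(exps)
    a = [e / total for e in exps]
    # formula (16): weighted sum -> topic feature vector v
    return sum(a_t * h_t for a_t, h_t in zip(a, hs))

a0 = 0.01                                   # stand-in for the random pad value a
v = rnn_attention([a0, a0, a0, a0, 0.8])    # first-round input (a, a, a, a, c1)
```

Since v is a convex combination of tanh outputs, the topic feature stays bounded in (-1, 1); with a single-element history the attention weight is 1 and v reduces to that element's hidden state.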
Further, decoding the intention feature representation and the first-round intermediate variable separately, inputting the decoding results into the generation network, which completes the response, generates the response text of the first-round dialog content, and returns it (at which point the first round of dialog generation is complete), specifically comprises:
constructing a two-layer unidirectional GRU as the generation network;
decoding the intention feature representation and the first-round intermediate variable separately, and inputting the decoding results into the generation network to generate text sequences, yielding candidate sequences;
using the beam search algorithm as the search strategy to find the optimal sequence A1 among the candidates, i.e., the response text for the dialog content, and returning A1 to the dialog.
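Beam search itself is standard; the sketch below shows, on a contrived toy distribution (the `step_probs` table is invented purely for illustration), why keeping a beam of partial sequences can find a higher-probability sequence than greedy decoding.

```python
def step_probs(prefix):
    # contrived next-token distribution: greedy picks "a" first,
    # but the sequence starting with "b" has higher total probability
    table = {(): {"a": 0.6, "b": 0.4},
             ("a",): {"x": 0.3, "y": 0.3},
             ("b",): {"x": 0.9, "y": 0.1}}
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=2):
    beams = [((), 1.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in step_probs(prefix).items():
                candidates.append((prefix + (tok,), score * p))
        # keep only the top `beam_width` partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[1])[0]

def greedy(max_len=2):
    prefix = ()
    for _ in range(max_len):
        probs = step_probs(prefix)
        prefix += (max(probs, key=probs.get),)
    return prefix

best = beam_search()   # ("b", "x"), total probability 0.36
worse = greedy()       # ("a", "x"), total probability only 0.18
```

Greedy commits to "a" (the locally best first token) and ends with probability 0.18, while the width-2 beam keeps "b" alive and recovers the globally better sequence with probability 0.36, which is the local-optimum trap the method avoids.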
In a second aspect, the invention further provides a dialog generation device applied to a multi-turn dialog system, supporting the above dialog generation method; the device comprises:
an acquisition unit, for acquiring the user's dialog input text in each round of dialog;
a semantic vector representation unit, for performing feature extraction on the user's N-th-round dialog with the semantic understanding model to obtain the semantic vector representation of the N-th-round dialog content;
an intermediate variable generation unit, for extracting deeper semantic information from the semantic vector representation with the encoder to generate the intermediate variable;
an intention feature representation calculation unit, for feeding the intermediate variable to the intention feature calculation network to obtain the intention feature representation;
a decoding unit, for decoding the intention feature representation and the intermediate variable separately to obtain the decoding results;
a response text generation unit, for inputting the decoding results into the generation network, which completes the response, generates the response text of the dialog content, and returns it, completing one round of dialog generation; the device then waits for and acquires the next round of dialog, repeating the process until the user triggers an end condition and the current multi-turn dialog ends.
In a third aspect, the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the dialog generation method applied to the multi-turn dialog system when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, which stores a computer program, which when executed by a processor implements the dialog generation method applied to the multi-turn dialog system.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention is built on a semantic understanding model trained on a dataset, so a task-oriented intelligent dialog system applied in a specific field retains good comprehension ability; even when out-of-domain knowledge is encountered, the semantic understanding model and the generation network can still produce high-quality dialog responses.
2. The invention records historical dialog information with the intention feature calculation network, extracts the current dialog's intention features through that network, and uses them as one of the conditions controlling response generation, ensuring fluency and logical coherence in the interaction. The method suits multi-turn interactive dialog scenarios: response generation is based on the current dialog content and the historical dialog record, completing multi-turn interaction.
3. The invention replaces greedy search with the beam search algorithm for sequence search, so that the generated response sequence is globally optimal within a certain range, avoiding the trap of local optima.
4. The invention overcomes the strong knowledge-base dependence of retrieval systems and the single-turn limitation of prior generation systems, providing a novel dialog generation method applied to multi-turn dialog systems. It is more flexible and controllable in practice, and can serve either as a chit-chat intelligent dialog system on open-domain data or as a task-oriented question-answering or guided-dialog system for a specific domain background.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a flow chart of a dialog generation method applied to a multi-turn dialog system according to the present invention.
Fig. 2 is a detailed flowchart of a dialog generation method applied to a multi-turn dialog system according to the present invention.
Fig. 3 is a schematic structural diagram of a dialog generating device applied to a multi-turn dialog system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.
Example 1
As shown in fig. 1 and fig. 2, a dialog generation method applied to a multi-turn dialog system according to the present invention comprises:
acquiring a user's first round of dialog, and performing feature extraction on it with the semantic understanding model to obtain the semantic feature representation of the first-round dialog content;
extracting deeper semantic information from the semantic feature representation with the encoder to generate the first-round intermediate variable;
feeding the first-round intermediate variable to the intention feature calculation network to obtain the intention feature representation;
decoding the intention feature representation and the first-round intermediate variable separately, inputting the decoding results into the generation network, which completes the response, generates the response text of the first-round dialog content, and returns it; at this point the first round of dialog generation is complete;
waiting for and acquiring the next round of dialog, and repeating the above steps until the user triggers an end condition, whereupon the multi-turn dialog ends.
In this embodiment, waiting for and acquiring the next round of dialog and repeating the steps until the user triggers an end condition and the current multi-turn dialog ends comprises:
when executing the N-th round of dialog, decoding the set of intention feature representations of the historical user dialog content and the (N-1)-th-round intermediate variable separately, inputting the decoding results into the generation network, which completes the response, generates the response text of the N-th-round dialog content, and returns it; at this point the N-th round of dialog generation is complete;
wherein the set of intention feature representations of the historical user dialog content is the set of intention feature representations of the first N-1 rounds: the first-round intention feature representation, the second-round intention feature representation, ..., the (N-1)-th-round intention feature representation.
Through this technical scheme, the intention feature calculation network records historical dialog information and extracts the current dialog's intention features, which serve as one of the conditions controlling response generation, ensuring fluency and logical coherence in the interaction. The method suits multi-turn interactive dialog scenarios: response generation is based on the current dialog content and the historical dialog record, completing multi-turn interaction.
In this embodiment, acquiring the user's first round of dialog and performing feature extraction on it with the semantic understanding model to obtain the semantic feature representation of the first-round dialog content specifically comprises:
according to the user's first round of dialog, acquiring the first-round dialog input text, for example, "The weather is really nice today";
performing word segmentation on the user's first-round dialog input text with an existing word segmentation algorithm to obtain the segmentation sequence (today, weather, really, nice);
converting the segmented text data into numeric vectors, i.e., S1 = (s_1, s_2, ..., s_t, ..., s_n), where:
s_t = onehot(x_t)    (1)
where s_t is the numeric vector obtained by one-hot encoding the input text token x_t, and t indexes the t-th word in the segmentation sequence;
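The segmentation-plus-one-hot step of this worked example can be made concrete as follows; the English token list and four-word vocabulary are illustrative stand-ins for the output of the Chinese word segmenter and the real (much larger) vocabulary.

```python
vocab = ["today", "weather", "really", "nice"]  # illustrative toy vocabulary

def onehot(token):
    """Formula (1): one-hot encode a token against the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab.index(token)] = 1
    return vec

tokens = ["today", "weather", "really", "nice"]  # segmented input text
S1 = [onehot(t) for t in tokens]                 # S1 = (s_1, ..., s_n)
```

Each s_t has exactly one nonzero entry, marking the token's position in the vocabulary; S1 is then ready to feed into the semantic understanding model.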
inputting the segmentation sequence S1 into the preset semantic understanding model, performing feature extraction on it, and computing and outputting the semantic vector of the first round of dialog, W1 = (w_1, w_2, ..., w_t, ..., w_n) (i.e., the semantic feature representation, expressed as a vector), where w_t is an element of the semantic vector W1 of the first-round dialog;
Specifically, the semantic understanding model is a bi-LSTM network. The input vectors are passed into the bi-LSTM network, and the forget gate, input gate and output gate weights are computed, where W and b are the trained parameters of the preset semantic understanding model and tanh(·) and σ(·) are the activation functions used in the model:
i_t = σ(W_ii·x_t + b_ii + W_hi·w_(t-1) + b_hi)    (2)

f_t = σ(W_if·x_t + b_if + W_hf·w_(t-1) + b_hf)    (3)

o_t = σ(W_io·x_t + b_io + W_ho·w_(t-1) + b_ho)    (4)
The cell state of the network at the current time step is then updated:

g_t = tanh(W_ig·x_t + b_ig + W_hg·w_(t-1) + b_hg)    (5)

c_t = f_t * c_(t-1) + i_t * g_t    (6)

and the semantic representation vector is computed, where w_t is a 512-dimensional word vector:

w_t = o_t * tanh(c_t)    (7)
In practice, compared with a unidirectional network, a bidirectional language model learns the context features of a sequence from both the forward and backward directions. The network used in the invention for sequence-based semantic vector conversion is a dynamic conversion network and can represent the semantic information of text more fully: the same word in different sequences carries different semantic information. For example, the word "apple" in the two sentences "I want to eat an apple" and "Apple released a new phone today" has different contexts and naturally different semantics. The semantic understanding model therefore adopts a bi-LSTM structure, whose computation is identical in the two sequence directions; finally, the bi-LSTM network computes the semantic vector of the first round of dialogue as:
w_t = MLP([w_t→ ; w_t←])    (8)

where w_t is an element of the semantic vector W1 of the first round of dialogue, w_t→ and w_t← are the semantic vectors computed along the two sequence directions according to formulas (1) to (7), and MLP(·) is a single-layer fully connected network.
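The combination step of formula (8) can be sketched as follows; the random weights are placeholders for the trained MLP parameters, and the tanh activation is an assumption, with the dimensionality following the 512-dimensional word vectors mentioned above:

```python
import numpy as np

# Sketch: the forward and backward bi-LSTM states for a token are concatenated
# and passed through one fully connected layer. Random weights are placeholders
# for the trained parameters; the tanh activation is an assumption.
rng = np.random.default_rng(0)
d = 512  # per-direction hidden size; the patent uses 512-dimensional word vectors

W_mlp = rng.standard_normal((d, 2 * d)) * 0.01  # assumed MLP weight
b_mlp = np.zeros(d)                              # assumed MLP bias

def combine(w_fwd, w_bwd):
    # w_t = MLP([w_t(forward) ; w_t(backward)])
    return np.tanh(W_mlp @ np.concatenate([w_fwd, w_bwd]) + b_mlp)

w_t = combine(rng.standard_normal(d), rng.standard_normal(d))
print(w_t.shape)  # (512,)
```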
In this embodiment, the encoder is used to extract deeper semantic information from the semantic feature representation according to the semantic feature representation to generate a first round of intermediate variables; the method specifically comprises the following steps:
constructing an encoder with bi-GRU as a framework, wherein the encoder is used for processing and calculating conversation content and further abstracting semantic features;
according to the semantic vector W1 of the first round of dialogue, each element w_t of W1 is input into the constructed encoder, and the update gate weight z_t and reset gate weight r_t are computed:

z_t = σ(W_z·[h_(t-1), w_t])    (9)

r_t = σ(W_r·[h_(t-1), w_t])    (10)

where W denotes a parameter already trained in the preset encoder, W_z and W_r are the update gate and reset gate parameters of the encoder respectively, σ(·) is the activation function, h_(t-1) is the output computed with w_(t-1) as input, and h_0 is a vector randomly generated by the network;
according to the update gate weight z_t and reset gate weight r_t, the candidate intermediate variable is computed, and from it the final intermediate variable c1 = h_n is obtained, where n is the input sequence length:

h̃_t = tanh(W·[r_t * h_(t-1), w_t])    (11)

h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t    (12)

where tanh(·) denotes the activation function, W is a trained parameter of the network, and h_n is the network output at t = n, the sequence length.
In particular, the encoder abstracts and represents semantic features over the sequence, so the output of this module is only the last sequence state, i.e., the intermediate variable c1 = h_n; here c1 is likewise a 512-dimensional vector.
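A sketch of the encoder step per formulas (9)-(12), run over a toy sequence and keeping only the last hidden state as c1; the small dimensionality and the random weights are illustrative assumptions standing in for the trained 512-dimensional model:

```python
import numpy as np

# Sketch of a GRU-style encoder step (formulas (9)-(12)). Hidden size and
# weights are illustrative; the patent uses 512 dims and trained parameters.
d = 8
rng = np.random.default_rng(1)
Wz, Wr, Wh = (rng.standard_normal((d, 2 * d)) * 0.1 for _ in range(3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, w_t):
    zr_in = np.concatenate([h_prev, w_t])
    z = sigmoid(Wz @ zr_in)                                    # update gate, (9)
    r = sigmoid(Wr @ zr_in)                                    # reset gate, (10)
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, w_t]))   # candidate, (11)
    return (1 - z) * h_prev + z * h_cand                       # new state, (12)

h = rng.standard_normal(d)                 # h_0: randomly generated, as in the text
for w_t in rng.standard_normal((5, d)):    # elements of the semantic vector W1
    h = gru_step(h, w_t)
c1 = h  # the encoder output keeps only the last sequence state
print(c1.shape)  # (8,)
```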
In this embodiment, the intention feature representation is obtained by extracting the first-round intermediate variables by using an intention feature calculation network according to the first-round intermediate variables; the method specifically comprises the following steps:
First, an intention feature calculation network framed on an RNN + Attention mechanism is built as the dialogue intention extraction network, to record and extract feature information of the historical dialogue content during the conversation. RNN is chosen as the framework because, with little difference in performance, reducing the number of parameters effectively improves computational efficiency and avoids unnecessary resource consumption.
Next, the intermediate variable c computed after each round of dialogue is used as input: (a, a, a, c1) is input in the first round, (a, a, a, c1, c2) in the second round, and so on, where a is a random padding variable. The specific computation of this step is as follows:
The sequence of historical dialogue records is taken as the RNN input, where W and b are parameters of the trained network and x_t is the t-th element of the input sequence, computed as:

h_t = tanh(W_i·x_t + W_j·h_(t-1) + b)    (13)
Then the Attention mechanism is applied: the current topic feature tendency is determined through the attention weights over the accumulated dialogue history:

u_t = tanh(W_s·h_t + b_t)    (14)

a_t = exp(u_t) / Σ_t exp(u_t)    (15)

v = Σ_t a_t·h_t    (16)

where a_t are the weights computed by the attention mechanism and v is the final network output, i.e., the topic feature vector. By combining RNN and attention, this module records the historical topic feature information of the conversation and, through the attention mechanism, outputs the topic information with the most salient features, laying the semantic groundwork for the next sentence to be generated.
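Formulas (13)-(16) can be sketched as follows; how u_t is reduced to a scalar attention score is not specified in the text, so the summation used here is an assumption, as are the random weights and the zero initial state:

```python
import numpy as np

# Sketch of the RNN + Attention topic network (formulas (13)-(16)). Random
# weights, the zero initial state, and the scalar reduction of u_t are
# illustrative assumptions.
d = 8
rng = np.random.default_rng(2)
Wi, Wj, Ws = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
b, bt = np.zeros(d), np.zeros(d)

def topic_vector(history):
    h, hs = np.zeros(d), []
    for x_t in history:                      # RNN over (c1, c2, ...), (13)
        h = np.tanh(Wi @ x_t + Wj @ h + b)
        hs.append(h)
    u = [np.tanh(Ws @ h_t + bt) for h_t in hs]        # (14)
    scores = np.array([u_t.sum() for u_t in u])       # assumed scalar reduction
    a = np.exp(scores) / np.exp(scores).sum()         # softmax weights, (15)
    return sum(a_t * h_t for a_t, h_t in zip(a, hs))  # weighted sum v, (16)

history = rng.standard_normal((3, d))  # e.g. (c1, c2, c3) in the third round
v = topic_vector(history)
print(v.shape)  # (8,)
```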
In this embodiment, the intention feature representation and the first-round intermediate variable are decoded respectively, and the decoding result is input into a generation network, and the generation network completes a response to generate a response text of the first-round dialog content and returns the response text, so that the first-round dialog generation is completed; the method specifically comprises the following steps:
constructing a double-layer unidirectional GRU as a generation network;
respectively decoding the intention feature representation and the first round intermediate variables, and inputting decoding results into the generated network to realize text sequence generation to obtain candidate sequences;
and, using the beam search algorithm as the search strategy, searching the candidate sequences for the optimal sequence A1, i.e., the response text for the dialogue content, and returning A1 to the dialogue, as shown in fig. 2.
Assume a candidate corpus {a, b} generating a sequence over 3 time steps, with the probability of each candidate at each time step shown in the table below. The most direct search method selects the maximum probability value at each time step, also called the greedy strategy: in the example, the three time steps select a (0.6), a (0.6) and b (0.6) respectively, the generated sequence is aab, and the corresponding probability is 0.6 × 0.6 × 0.6 = 0.216. However, the probability of the sequence aba is 0.6 × 0.5 × 0.9 = 0.270, which is clearly better than aab.
Beam search therefore retains a set of possible sequences at each time step, i.e., the candidate sequences with the top n probability values. At time T = 0, assume the candidate set is {a (0.7), b (0.3)}; at T = 1, the probabilities of the candidate sequences {aa, ab, ba, bb} are compared and {aa (0.36), ab (0.30)} is retained; repeating this, at T = 2 the final candidates {aba, aab} are retained from {aaa, aab, aba, abb}, reducing the probability of missing the best sequence.
(Table: probability of each candidate token at each time step; not reproduced here.)
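The greedy-versus-beam comparison above can be sketched with a hypothetical conditional probability table (the patent's own table is not reproduced, so these numbers are invented purely to show the effect described):

```python
# Greedy vs. beam search over a HYPOTHETICAL table P(next token | prefix);
# the numbers are invented so that greedy misses the globally better sequence.
probs = {
    "":   {"a": 0.6, "b": 0.4},
    "a":  {"a": 0.55, "b": 0.45}, "b":  {"a": 0.5, "b": 0.5},
    "aa": {"a": 0.5, "b": 0.5},   "ab": {"a": 0.9, "b": 0.1},
    "ba": {"a": 0.5, "b": 0.5},   "bb": {"a": 0.5, "b": 0.5},
}

def greedy(steps=3):
    seq, p = "", 1.0
    for _ in range(steps):                 # pick the single best token each step
        tok, q = max(probs[seq].items(), key=lambda kv: kv[1])
        seq, p = seq + tok, p * q
    return seq, p

def beam_search(steps=3, width=2):
    beam = [("", 1.0)]
    for _ in range(steps):                 # keep the top-`width` partial sequences
        cand = [(s + t, p * q) for s, p in beam for t, q in probs[s].items()]
        beam = sorted(cand, key=lambda c: -c[1])[:width]
    return beam[0]

print(greedy())       # ('aaa', ~0.165): greedy locks onto locally best tokens
print(beam_search())  # ('aba', ~0.243): beam width 2 finds the better sequence
```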
In implementation:
Step 1, the conversation starts, and the first-round dialogue content (input text) of the user is acquired, e.g., "The weather is really nice today";
Step 2, the first-round dialogue content is input into the preset semantic understanding model to obtain the semantic vector (w1, w2, ..., wn) of the first-round dialogue content;
Step 3, the first-round semantic vector is input into the encoder, abstracting deeper semantic information, i.e., generating the first-round intermediate variable c1 for the response content;
Step 4, the first-round intermediate variable c1 is input into the intention feature calculation network to obtain the first-round response intention feature representation a1;
Step 5, the first-round response intention feature representation a1 and the first-round intermediate variable c1 are input into the generation network, which completes the response and returns the response text, e.g., "A good day to go out for a stroll";
Step 6, the second round of dialogue begins and the user's second-round dialogue content is acquired, e.g., "Where should we go?";
Step 7, steps 2 and 3 are repeated to obtain the second-round semantic vector (w1, w2, ..., wn) and the second-round intermediate variable c2;
Step 8, c1 and c2 are fed as input into the intention feature network to obtain the second-round response intention feature representation a2 (and so on: the network input in the third round is c1, c2 and c3);
Step 9, step 5 is repeated with current inputs c2 and a2, and the generation network completes the second-round response text, e.g., "Let's go to a nearby park and enjoy the sunshine";
Step 10, the above steps are repeated until the user's input triggers an ending condition or the user does not reply for a long time, and the conversation ends.
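Steps 1-10 above can be sketched as a schematic driver loop; encode, intent and generate are stubs standing in for the trained components (semantic model plus encoder, intention network, generation network), so the strings only trace the data flow:

```python
# Schematic driver for the multi-round dialogue flow; the three stubs are
# placeholders for trained networks, not real implementations.
def encode(text):
    return f"c({text})"                  # steps 2-3: text -> intermediate variable

def intent(history):
    return f"a({len(history)})"          # steps 4/8: (c1, ..., cN) -> aN

def generate(c_n, a_n):
    return f"reply[{c_n}|{a_n}]"         # steps 5/9: decode and generate response

def dialogue(turns):
    history, replies = [], []
    for text in turns:                   # repeat until the end condition (step 10)
        history.append(encode(text))     # c1, c2, ...
        a_n = intent(history)            # intention features over the full history
        replies.append(generate(history[-1], a_n))
    return replies

print(dialogue(["The weather is really nice today", "Where should we go?"]))
```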
The working principle is as follows:
Existing dialogue systems generally include two modules: an understanding module for the user's input text and a response module for the response text. However, prior-art work (for example, the published application CN112256825A, "Method and apparatus for processing multi-round dialogue intelligent question answering, and computer equipment", and the granted patent CN111897929B, "Method and apparatus for processing multiple rounds of question sentences, storage medium, and electronic equipment") focuses more on the understanding module for the user's input text: the medical question-answering work identifies and extracts the main information in a question through named entities and processes it into a standard question form, and the multi-round question processing method addresses how to understand the user's real intention when several questions are input at once. Such work is essentially based on matching and retrieval: the input question is processed and normalized, matched to a standard question in a knowledge base, and then answered. The flow is: acquire - understand - match - respond.
Unlike methods based on matching and retrieval, the present method belongs to the generative dialogue systems. On the one hand, it has no strong dependency on a domain-specific knowledge base: even when dialogue content outside the domain is encountered, high-quality response generation can be completed based on the semantic understanding module and the historical dialogue information. On the other hand, historical dialogue data is recorded and processed and combined with the current intention information, and the generation network ensures the fluency of the generated text and the logical coherence of the multi-round dialogue.
In the present invention, for the input acquired during the conversation, an algorithm completes the generation of the response text rather than matching it. The key point is generation, and generation based on the dialogue content, so that the dialogue is fluent and logical and closer to a human conversation. The flow is: acquire - understand - generate - respond.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The semantic understanding model of the invention is trained on a data set, so a task-oriented intelligent dialogue system applied in a specific field has good comprehension capability; even when encountering knowledge outside the field, it can realize higher-quality dialogue responses through the semantic understanding model and the generation network.
2. The invention records historical dialogue information through the intention feature calculation network, extracts the intention features of the current dialogue by network computation, and uses the intention features as one of the conditions controlling response generation, ensuring fluency and logical coherence in the dialogue interaction. The method suits multi-round interactive dialogue scenarios: response generation is realized from the current dialogue content and the historical dialogue record, completing multi-round interaction.
3. The invention adopts the beam search algorithm in place of greedy search for sequence search, so that the generated response sequence is globally optimal within a certain range, avoiding the trap of local optima.
4. The invention overcomes the strong knowledge-base dependence of retrieval-based systems and the single-round limitation of generative systems in the prior art, and provides a novel dialogue generation method applied to a multi-round dialogue system. The method is flexible and controllable in practical application: it can serve as a chit-chat intelligent dialogue system based on open-domain data, or as a task-oriented question-answering or guided dialogue system with a specific domain background.
Example 2
As shown in fig. 3, the present embodiment is different from embodiment 1 in that the present embodiment provides a dialog generating device applied to a multi-turn dialog system, which supports the dialog generating method applied to the multi-turn dialog system described in embodiment 1; the device includes:
the acquisition unit is used for acquiring the user's dialogue input text for each round according to each round of the user's dialogue;
the semantic vector representation unit is used for performing feature extraction on the Nth round of dialogue with the semantic understanding model according to the acquired Nth round of the user's dialogue, obtaining the semantic vector representation of the Nth-round dialogue content;
the intermediate variable generation unit is used for extracting deeper semantic information from the semantic vector representation by using an encoder according to the semantic vector representation to generate an intermediate variable;
the intention characteristic representation calculating unit is used for extracting the intermediate variables by using an intention characteristic calculating network according to the intermediate variables to obtain intention characteristic representations;
a decoding unit, configured to decode the intention feature representation and the intermediate variable respectively to obtain a decoding result;
the response text generation unit is used for inputting the decoding result into the generation network, which completes the response, generates the response text of the dialogue content and returns it, thereby completing one round of dialogue generation; the unit then waits for and acquires the next round of dialogue, repeating the process until the user triggers an ending condition, whereupon the current multi-round dialogue ends.
The execution process of each unit is executed according to the flow steps of the dialog generation method applied to the multi-turn dialog system described in embodiment 1, and details are not repeated in this embodiment.
The dialogue generating device of the invention, applied to a multi-round dialogue system, is not limited by a knowledge base: even for input sentences the models have never seen, a response can be completed based on the semantic understanding model, the intention understanding model and the generation model. Owing to the limitations of the generation algorithm, however, this work is better suited to a chit-chat system over an open data domain or a guided question-answering system, such as pre-sales customer-service guidance, where the next operation is guided according to the user's input.
Meanwhile, the invention also provides a computer device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the dialog generation method applied to the multi-turn dialog system when executing the computer program.
Meanwhile, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the dialog generation method applied to the multi-turn dialog system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A dialog generation method for use in a multi-turn dialog system, the method comprising:
acquiring a first round of dialogue of a user, and performing feature extraction on the first round of dialogue by using a semantic understanding model to obtain semantic feature representation of the first round of dialogue content;
extracting deep semantic information from the semantic feature representation by using an encoder according to the semantic feature representation to generate a first round of intermediate variables;
extracting the intermediate variables of the first round by using an intention characteristic calculation network according to the intermediate variables of the first round to obtain intention characteristic representation;
respectively decoding the intention feature representation and the first-round intermediate variables, inputting the decoding results into a generation network, the generation network completing the response, generating the response text of the first-round dialogue content and returning it, thereby completing the first-round dialogue generation;
and waiting for and acquiring the next round of conversation, and repeating the steps until the user triggers an ending condition, and ending the multiple rounds of conversation.
2. The dialog generation method applied to a multi-turn dialog system according to claim 1, wherein the dialog generation method waits for and acquires a next turn of dialog, and repeats the steps until a user triggers an end condition and the current multi-turn dialog is ended; the method comprises the following steps:
when executing the Nth round of dialogue, decoding an intention characteristic representation group of historical user dialogue contents and the (N-1) th round intermediate variable respectively, inputting a decoding result into a generating network, completing a response by the generating network, generating a response text of the Nth round of dialogue contents and returning the response text, wherein the Nth round of dialogue generation is completed;
wherein the group of intent feature representations of the historical user dialog content is the set of the first N-1 rounds of intent feature representations, including the first round of intent feature representations, the second round of intent feature representations, ..., and the (N-1)-th round of intent feature representations.
3. The dialog generation method applied to the multi-turn dialog system according to claim 1, wherein the obtaining of the first turn of dialog of the user performs feature extraction on the first turn of dialog by using a semantic understanding model to obtain a semantic feature representation of the content of the first turn of dialog; the method specifically comprises the following steps:
acquiring a first round of dialogue input text of a user according to the first round of dialogue of the user;
performing word segmentation on the user's first-round dialogue input text with a word segmentation algorithm to obtain the word segmentation sequence S1 = (s_1, s_2, ..., s_t, ..., s_n); wherein s_t = onehot(x_t), s_t being the numerical vector obtained by converting the input text data x_t with one-hot encoding, and t indexing the t-th word in the word segmentation sequence;
inputting the word segmentation sequence S1 into a preset semantic understanding model, performing feature extraction on S1, and computing and outputting the semantic vector of the first round of dialogue W1 = (w_1, w_2, ..., w_t, ..., w_n), where w_t is an element of the semantic vector W1 of the first round of dialogue.
4. The dialog generation method according to claim 3, wherein the semantic understanding model is a bi-LSTM network, and the semantic vector of the first dialog is calculated by using the bi-LSTM network, and the calculation formula is:
w_t = MLP([w_t→ ; w_t←])
in the formula, w_t is an element of the semantic vector W1 of the first round of dialogue, w_t→ and w_t← are the semantic vectors computed along the two sequence directions, and MLP(·) is a single-layer fully connected network.
5. The dialog generation method applied to the multi-turn dialog system of claim 3, wherein the semantic feature representation is used to extract deeper semantic information from the semantic feature representation by an encoder to generate a first-turn intermediate variable; the method specifically comprises the following steps:
constructing an encoder with bi-GRU as a framework;
according to the semantic vector W1 of the first round of dialogue, each element w_t of W1 is input into the constructed encoder, and the update gate weight z_t and reset gate weight r_t are computed:
z_t = σ(W_z·[h_(t-1), w_t])
r_t = σ(W_r·[h_(t-1), w_t])
where W denotes a trained parameter in the preset encoder, W_z and W_r are the update gate and reset gate parameters of the encoder respectively, σ(·) is the activation function, h_(t-1) is the output computed with w_(t-1) as input, and h_0 is a vector randomly generated by the network;
according to the update gate weight z_t and reset gate weight r_t, the candidate intermediate variable is computed, and from it the final intermediate variable c1 = h_n is obtained, where n is the input sequence length:
h̃_t = tanh(W·[r_t * h_(t-1), w_t])
h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t
where tanh(·) denotes the activation function, W is a trained parameter of the network, and h_n is the network output at t = n, the sequence length.
6. The dialogue generating method as claimed in claim 1, wherein the intention feature calculation network is framed on an RNN + Attention mechanism; the historical dialogue record sequence is input into the RNN and then processed by the Attention mechanism, and the current topic feature tendency over the accumulated dialogue history is determined through the attention weights.
7. The dialog generation method according to claim 1, wherein the intention feature representation and the first-round intermediate variable are decoded respectively, and the decoded result is input into a generation network, and the generation network completes the response, generates the response text of the first-round dialog content and returns, so that the completion is the first-round dialog generation; the method specifically comprises the following steps:
constructing a double-layer unidirectional GRU as a generation network;
respectively decoding the intention feature representation and the first round intermediate variables, and inputting decoding results into the generated network to realize text sequence generation to obtain candidate sequences;
and, using the beam search algorithm as the search strategy, searching the candidate sequences for the optimal sequence A1, i.e., the response text of the dialogue content, and returning A1 to the dialogue.
8. A dialog generating device for a multi-turn dialog system, characterized in that the device supports a dialog generating method as claimed in any of claims 1 to 7 for use in a multi-turn dialog system; the device comprises:
the acquisition unit is used for acquiring the user's dialogue input text for each round according to each round of the user's dialogue;
the semantic vector representation unit is used for performing feature extraction on the Nth round of dialogue with the semantic understanding model according to the acquired Nth round of the user's dialogue, obtaining the semantic vector representation of the Nth-round dialogue content;
the intermediate variable generating unit is used for extracting deep semantic information from the semantic vector representation by using an encoder according to the semantic vector representation to generate an intermediate variable;
the intention characteristic representation calculating unit is used for extracting the intermediate variables by using an intention characteristic calculating network according to the intermediate variables to obtain intention characteristic representations;
a decoding unit, configured to decode the intention feature representation and the intermediate variable respectively to obtain a decoding result;
the response text generation unit is used for inputting the decoding result into the generation network, which completes the response, generates the response text of the dialogue content and returns it, thereby completing one round of dialogue generation; the unit then waits for and acquires the next round of dialogue, repeating the process until the user triggers an ending condition, whereupon the current multi-round dialogue ends.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements a dialog generation method applied to a multi-turn dialog system according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements a dialog generation method applied to a multi-turn dialog system according to any one of claims 1 to 7.
CN202210253749.1A 2022-03-15 Dialogue generation method, device, equipment and medium applied to multi-round dialogue system Active CN114579728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210253749.1A CN114579728B (en) 2022-03-15 Dialogue generation method, device, equipment and medium applied to multi-round dialogue system

Publications (2)

Publication Number Publication Date
CN114579728A true CN114579728A (en) 2022-06-03
CN114579728B CN114579728B (en) 2024-07-12

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577084A (en) * 2022-10-10 2023-01-06 中电金信软件(上海)有限公司 Conversation strategy prediction method and prediction device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468312A (en) * 2021-07-21 2021-10-01 四川启睿克科技有限公司 Reply generation method and device based on multi-turn dialogue knowledge transfer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐聪: "基于深度学习和强化学习的对话模型研究", 《中国博士学位论文全文数据库 信息科技辑》, no. 6, 15 June 2020 (2020-06-15), pages 138 - 95 *

Similar Documents

Publication Publication Date Title
CN110956959B (en) Speech recognition error correction method, related device and readable storage medium
CN107885756B (en) Deep learning-based dialogue method, device and equipment
CN108829719B (en) Non-fact question-answer selection method and system
CN107844469B (en) Text simplification method based on word vector query model
CN110418210B (en) Video description generation method based on bidirectional cyclic neural network and depth output
CN110288665B (en) Image description method based on convolutional neural network, computer-readable storage medium and electronic device
CN110516253B (en) Chinese spoken language semantic understanding method and system
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN110457661B (en) Natural language generation method, device, equipment and storage medium
CN109977207A (en) Talk with generation method, dialogue generating means, electronic equipment and storage medium
CN109582767A (en) Conversational system processing method, device, equipment and readable storage medium storing program for executing
CN111079418B (en) Named entity recognition method, device, electronic equipment and storage medium
CN111914552A (en) Training method and device of data enhancement model
CN112364148B (en) Deep learning method-based generative chat robot
CN113095431B (en) Image description method, system and device based on attention mechanism
CN110992943B (en) Semantic understanding method and system based on word confusion network
CN111597341A (en) Document level relation extraction method, device, equipment and storage medium
CN114238652A (en) Industrial fault knowledge map establishing method for end-to-end scene
CN111046674B (en) Semantic understanding method and device, electronic equipment and storage medium
CN114611527A (en) User personality perception task-oriented dialogue strategy learning method
CN117094365A (en) Training method and device for image-text generation model, electronic equipment and medium
CN116955579A (en) Chat reply generation method and device based on keyword knowledge retrieval
CN114579728A (en) Dialogue generation method, device, equipment and medium applied to multi-turn dialogue system
CN114579728B (en) Dialogue generation method, device, equipment and medium applied to multi-round dialogue system
CN112466282B (en) Speech recognition system and method oriented to aerospace professional field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant