CN113050787B - Training method of man-machine conversation model and man-machine conversation method - Google Patents


Info

Publication number
CN113050787B
CN113050787B
Authority
CN
China
Prior art keywords
sequence
current round
network
dialogue
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911367881.XA
Other languages
Chinese (zh)
Other versions
CN113050787A (en)
Inventor
霍沛
沈大框
陈成才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xiaoi Robot Technology Co Ltd filed Critical Shanghai Xiaoi Robot Technology Co Ltd
Priority to CN201911367881.XA
Publication of CN113050787A
Application granted
Publication of CN113050787B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method for a man-machine conversation model and a man-machine conversation method. The man-machine conversation model integrates the sub-modules of a pipeline dialogue system into a single end-to-end framework: the current round of user dialogue data and the previous round of historical data are obtained and encoded into a vector sequence, which then passes sequentially through four sub-modules, natural language understanding, dialogue state tracking, dialogue policy learning and natural language generation, to produce the current round of system reply. The invention also provides a man-machine conversation method that uses this model for human-machine interaction. During training, the man-machine conversation model can flexibly select the structure of its sub-modules according to the types of supervision labels contained in the training data, and all sub-modules can be optimized simultaneously, which avoids the continuous accumulation and propagation of errors, improves the accuracy of system replies, and greatly reduces the number of training samples required.

Description

Training method of man-machine conversation model and man-machine conversation method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a training method of a man-machine conversation model and a man-machine conversation method.
Background
A task-oriented dialogue system is a man-machine dialogue system that can provide information or services to users with clear goals under specific conditions, and can be applied to fields such as airline ticket booking, restaurant reservation, and fee or address inquiry.
Traditional task-oriented dialogue system models can be divided into the end-to-end type and the pipeline type; each has its own advantages, but also obvious drawbacks. An end-to-end dialogue model usually needs a large amount of training corpora, and corpus collection is time-consuming, labor-intensive and costly. A pipeline model usually consists of several independent modules that can be updated and optimized separately, but each module can only run after the result of the previous module is obtained, following the flow of the pipeline, which easily causes errors to accumulate and propagate between modules.
Disclosure of Invention
In order to solve the above problems, the present invention provides a training method for a man-machine conversation model, characterized in that:
acquiring a training sample, wherein the training sample comprises input data and supervision tag sequences corresponding to different decoding networks, the input data comprises user conversation data of the current round and system reply of the previous round, and the supervision tag sequences at least comprise a conversation state accumulated tag sequence of the current round and a system reply tag sequence of the current round;
encoding the input data, and acquiring a user conversation sequence of the current round according to the user conversation data of the current round;
and selecting different training processing modes to train the model according to the type of the supervision label sequence:
if the supervision label sequence comprises a current round of dialog understanding label sequence and a current round of system action label sequence, training the man-machine dialog model according to a first training processing mode;
if the supervision label sequence comprises the current round of dialog understanding label sequence and does not comprise the current round of system action label sequence, training the man-machine dialog model according to a second training processing mode;
if the supervision label sequence does not comprise the current round of dialog understanding label sequence and comprises the current round of system action label sequence, training the man-machine dialog model according to a third training processing mode;
if the supervision label sequence does not comprise the current round of dialog understanding label sequence and the current round of system action label sequence, training the man-machine dialog model according to a fourth training processing mode;
judging the error between the output sequence of each decoding network and the corresponding supervision label sequence, if the error does not meet the training requirement, repeating the steps from the encoding of the input data; ending the training until the errors of all the decoding networks meet the training requirement to obtain the man-machine conversation model;
according to the different training processing modes, the decoding network comprises different combinations of a natural language understanding network, a dialogue state tracking network, a dialogue policy learning network and a natural language generation network, which correspond respectively to the current round of dialogue understanding label sequence, the current round of dialogue state accumulation label sequence, the current round of system action label sequence and the current round of system reply label sequence, and respectively obtain the current round of dialogue understanding sequence, the current round of dialogue state accumulation sequence, the current round of system action sequence and the current round of system reply sequence; the natural language generation network also obtains the current round of system reply data;
in each training processing mode, training each decoding network according to the input data, the corresponding supervision label sequence and the output result of the previous network, wherein the dialogue strategy learning network and the natural language generation network are also trained by utilizing a model database;
in the first training processing mode, the input data further includes the previous round of dialogue history accumulation, which includes the previous round of dialogue state accumulation and the previous round of system action;
the decoding network sequentially comprises a natural language understanding network, a dialogue state tracking network, a dialogue policy learning network and a natural language generation network;
in the second training processing mode, the input data further includes the previous round of dialogue state accumulation;
the decoding network sequentially comprises a natural language understanding network, a dialogue state tracking network and a natural language generation network;
in the third training processing mode, the input data further includes the previous round of dialogue history accumulation, which includes the previous round of dialogue state accumulation and the previous round of system action;
the decoding network sequentially comprises a dialogue state tracking network, a dialogue policy learning network and a natural language generation network;
in the fourth training processing mode, the input data further includes the previous round of dialogue state accumulation;
the decoding network sequentially comprises a dialogue state tracking network and a natural language generation network.
In addition, the invention also provides a man-machine conversation method, which comprises the following steps:
acquiring user dialogue data of the current round;
inputting the user dialogue data of the current round into a man-machine dialogue model trained by the training method to obtain the reply data of the system of the current round;
and sending the reply data of the current round of system to a user.
The invention also provides a man-machine conversation model which is obtained by training with the above training method and can be used in the man-machine conversation method. The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps as described above when executing the program. The invention also provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the aforementioned method steps.
The man-machine conversation model is based on the traditional end-to-end model and incorporates the modular structure of a pipeline. During training, specific decoding networks can be flexibly selected according to the supervision labels contained in the training sample: even if the supervision label corresponding to a certain network is missing, that decoding network can be removed without greatly affecting the overall effect, and supervision information can be added for a certain decoding network to improve the performance of the whole model. Meanwhile, the whole model has an end-to-end structure and can be trained as a whole, which avoids the error accumulation commonly caused by a pure pipeline structure.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a flowchart illustrating a method for training a human-machine interaction model according to an embodiment of the present invention;
FIG. 2(a) is a decoding network structure of a first training processing method according to an embodiment of the present invention;
fig. 2(b) is a decoding network structure component of a second training processing method according to a first embodiment of the present invention;
fig. 2(c) is a decoding network structure component of the third training processing method according to the first embodiment of the present invention;
fig. 2(d) is a decoding network structure component of a fourth training processing method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a man-machine interaction method according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a human-machine interaction model according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
In this embodiment, a training method for a man-machine conversation model is provided. As shown in fig. 1, the method comprises the following steps:
step S110: obtaining a training sample;
wherein the training sample comprises input data and supervision label sequences corresponding to different decoding networks; the input data comprises the current round of user dialogue data U_t and the previous round of system reply R_(t-1), and the supervision label sequences at least comprise the current round of dialogue state accumulation label sequence S_t_label and the current round of system reply label sequence R_t_label, where t denotes the index of the current round;
that is, the data set used for training at least includes user dialogue data (i.e. the dialogue source) and supervision labels for verifying the training results of the decoding networks.
Step S120: encoding the input data;
For the current round of user dialogue data U_t, the encoding in this step converts it into a vector sequence, obtaining the current round of user dialogue sequence U_t.
Preferably, the input data is encoded using a bidirectional gated recurrent neural network. All decoding networks share the same encoder.
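A minimal sketch of such a shared bidirectional gated recurrent encoder, written in Python with PyTorch and illustrative hyperparameters (the class name and dimensions are assumptions, not the patent's implementation):

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    # Bidirectional GRU encoder shared by all decoding networks (sketch).
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len), the concatenated input data of the current round
        embedded = self.embedding(token_ids)
        outputs, hidden = self.gru(embedded)   # outputs: (batch, seq_len, 2*hidden_dim)
        encoder_hidden = torch.cat([hidden[0], hidden[1]], dim=-1)  # merge the two directions
        return outputs, encoder_hidden

Each decoding network would consume the encoder outputs (and, in the second embodiment below, the encoder hidden state) produced by this single shared encoder.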
In the training sample data obtained in step S110, in addition to the necessary label sequences S_t_label and R_t_label, other training labels are needed to train the remaining decoding networks and meet different training requirements. Therefore, in this embodiment, different decoding structures are selected according to the label types contained in the obtained training sample data, and different training processing modes are adopted accordingly.
Step S130: and selecting different training processing modes to train the model according to the type of the supervision label sequence:
A. if the supervision label sequence comprises the current round of dialogue understanding label sequence M_t_label and the current round of system action label sequence A_t_label, the man-machine conversation model is trained according to the first training processing mode;
B. if the supervision label sequence comprises the current round of dialogue understanding label sequence M_t_label but does not comprise the current round of system action label sequence A_t_label, the man-machine conversation model is trained according to the second training processing mode;
C. if the supervision label sequence does not comprise the current round of dialogue understanding label sequence M_t_label but comprises the current round of system action label sequence A_t_label, the man-machine conversation model is trained according to the third training processing mode;
D. if the supervision label sequence comprises neither the current round of dialogue understanding label sequence M_t_label nor the current round of system action label sequence A_t_label, the man-machine conversation model is trained according to the fourth training processing mode.
The types contained in the supervision label sequence not only greatly influence the structure of the decoding network in the man-machine conversation model, but also affect the composition of the model input data. In practical applications, the acquired training data does not necessarily contain labels corresponding to all decoding networks, so during training the network structures other than the necessary decoding networks can be flexibly added to or removed from the man-machine conversation model.
According to the different training processing modes, the decoding network comprises different combinations of a Natural Language Understanding network (NLU network), a Dialogue State Tracking network (DST network), a Dialogue Policy Learning network (DPL network) and a Natural Language Generation network (NLG network), which correspond respectively to the current round of dialogue understanding label sequence M_t_label, the current round of dialogue state accumulation label sequence S_t_label, the current round of system action label sequence A_t_label and the current round of system reply label sequence R_t_label, and respectively obtain the current round of dialogue understanding sequence M_t, the current round of dialogue state accumulation sequence S_t, the current round of system action sequence A_t and the current round of system reply sequence R_t; the natural language generation network also obtains the current round of system reply data R_t.
In each training processing mode, each decoding network is trained according to the input data, the corresponding supervision label sequence and the output result of the previous network, wherein the DPL network and the NLG network are also trained by using a model database.
Finally, step S140 is performed: judging the error between the output sequence of each decoding network and the corresponding supervision label sequence, and if the error does not meet the training requirement, starting from step S120, repeating the steps; and ending the training until the errors of all the decoding networks meet the training requirement to obtain the man-machine conversation model.
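Steps S120 to S140 can be summarized in a hedged training-loop sketch; the decoder call signature and batch layout below are assumptions used only to illustrate the joint optimization of all selected decoding networks:

import torch.nn as nn

def train_step(encoder, decoders, batch, optimizer):
    # decoders: ordered mapping such as {'NLU': ..., 'DST': ..., 'DPL': ..., 'NLG': ...}
    # batch['labels']: one supervision label sequence per decoding network
    criterion = nn.CrossEntropyLoss(ignore_index=0)          # 0 assumed to be the padding id
    enc_outputs, enc_hidden = encoder(batch['input_ids'])    # step S120: encode the input data

    total_loss = 0.0
    prev_output = None                                       # output of the upstream decoder
    for name, decoder in decoders.items():                   # e.g. NLU -> DST -> DPL -> NLG
        logits = decoder(enc_outputs, enc_hidden, prev_output)   # (batch, seq_len, vocab)
        labels = batch['labels'][name]                       # M_t_label, S_t_label, A_t_label, R_t_label
        total_loss = total_loss + criterion(logits.transpose(1, 2), labels)
        prev_output = logits.argmax(dim=-1)                  # feed the decoded sequence downstream

    optimizer.zero_grad()
    total_loss.backward()                                    # step S140: all decoders updated together
    optimizer.step()
    return total_loss.item()

In practice this update would be repeated over the training samples until the error of every decoding network meets the training requirement.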
Each training processing method in step S130 will be described in detail below.
A. In the first training processing mode, the input data further includes the previous round of dialogue history accumulation B_(t-1), which includes the previous round of dialogue state accumulation S_(t-1) and the previous round of system action A_(t-1).
Alternatively, R_(t-1) and B_(t-1) may be ordinary scalar data instead of vector sequences. In step S120, when encoding the input data, it can be determined whether each item of input data is already a vector sequence; if not, U_t is encoded into the sequence U_t, and the other input data are encoded at the same time, i.e. the previous round of system reply data R_(t-1) is encoded to obtain the previous round of system reply sequence R_(t-1), and/or the previous round of dialogue history accumulation data B_(t-1) is encoded to obtain the previous round of dialogue history accumulation sequence B_(t-1).
In the first training mode, as shown in fig. 2(a), the decoding network sequentially includes an NLU network, a DST network, a DPL network, and an NLG network. Each decoding network is trained separately in the following way:
According to U_t, R_(t-1), B_(t-1) and M_t_label, the NLU network is trained; the NLU network is adapted to obtain the current round of dialogue understanding sequence M_t, as shown in the following formula:
M_t = Decoder_NLU(B_(t-1), R_(t-1), U_t)
where Decoder_NLU denotes the NLU decoding network, which is continuously trained according to the M_t output in each training iteration and M_t_label. The task of the NLU network is to perform intent detection and word slot filling on the data input by the user. Compared with a traditional natural language understanding module, in this embodiment intent detection and word slot filling are treated together as a single sequence generation problem, rather than treating intent detection as a semantic classification problem and word slot filling as a sequence labeling task, so the multi-intent problem can be handled well.
According to U_t, R_(t-1), B_(t-1), M_t and S_t_label, the DST network is trained; the DST network is adapted to obtain the current round of dialogue state accumulation sequence S_t, as shown in the following formula:
S_t = Decoder_DST(B_(t-1), R_(t-1), U_t, M_t)
where Decoder_DST denotes the DST decoding network, which is continuously trained according to the S_t output in each training iteration and S_t_label. The DST network differs from the NLU network: the NLU network only understands the current round of user input, while the DST network tracks the states of the previous rounds of dialogue. In practical use, the user dialogue may contain words that do not appear in the training data, and the DST network of this embodiment solves this problem through a copy mechanism.
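The copy mechanism mentioned above can be illustrated with a pointer-generator style sketch; this is only one common formulation and is not asserted to be the patent's exact design (tensor names are assumptions):

import torch
import torch.nn.functional as F

def copy_augmented_distribution(vocab_logits, attn_scores, src_token_ids, p_gen, vocab_size):
    # vocab_logits:  (batch, vocab_size) scores over the output vocabulary
    # attn_scores:   (batch, src_len)    attention over the encoded source tokens
    # src_token_ids: (batch, src_len)    ids of the source tokens (e.g. the user dialogue)
    # p_gen:         (batch, 1)          probability of generating instead of copying
    p_vocab = F.softmax(vocab_logits, dim=-1)               # generation distribution
    p_copy = F.softmax(attn_scores, dim=-1)                 # copy distribution over the source

    # Scatter copy probability back onto the vocabulary positions of the source tokens,
    # so tokens seen in the input receive extra probability mass even if rarely generated.
    copy_dist = torch.zeros(vocab_logits.size(0), vocab_size, device=vocab_logits.device)
    copy_dist = copy_dist.scatter_add(1, src_token_ids, p_copy)

    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist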
According to U_t, R_(t-1), S_t and A_t_label, and using the model database, the DPL network is trained; the DPL network is adapted to obtain the current round of system action sequence A_t, as shown in the following formula:
A_t = Decoder_DPL(R_(t-1), U_t, S_t)
where Decoder_DPL denotes the DPL decoding network, which is continuously trained according to the A_t output in each training iteration and A_t_label. The purpose of the DPL network is to take the current dialogue state S_t into account and, combined with the query result from the model database, predict the action A_t that the system should take next. Because a sequence-to-sequence generation approach is adopted, the DPL network can generate multiple system actions. For example, in a scenario where a user queries for a restaurant, the DPL network may give different actions such as informing the restaurant's name, the restaurant's price and the restaurant's address.
According to U_t, R_(t-1), A_t, the model database and R_t_label, the NLG network is trained; the NLG network is adapted to obtain the current round of system reply sequence R_t and to give the current round of system reply data R_t, as shown in the following formula:
R_t, R_t = Decoder_NLG(R_(t-1), U_t, A_t)
where Decoder_NLG denotes the NLG decoding network, which is continuously trained according to the R_t output in each training iteration and R_t_label. The task of the NLG network is to convert the system actions A_t into the current round of system reply R_t, and here the model database also needs to be queried.
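For readability, the chained decoding of the first training processing mode can be condensed into a short sketch; the decoder and database interfaces are assumed for illustration only:

def first_mode_forward(encoder, nlu, dst, dpl, nlg, B_prev, R_prev, U_t, kb):
    # Mirrors the flow M_t -> S_t -> A_t -> R_t described above; kb stands for
    # the model database queried by the DPL and NLG networks.
    enc, _ = encoder([B_prev, R_prev, U_t])

    M_t = nlu(enc)                           # current round dialogue understanding
    S_t = dst(enc, M_t)                      # current round dialogue state accumulation
    A_t = dpl(enc, S_t, kb.query(S_t))       # current round system actions
    R_t = nlg(enc, A_t, kb.query(S_t))       # current round system reply
    return M_t, S_t, A_t, R_t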
B. In the second training processing mode, the current round of system action label sequence A_t_label is missing, so B_(t-1), which contains A_(t-1), is not used; the input data only includes the previous round of dialogue state accumulation S_(t-1). As shown in fig. 2(b), the decoding network sequentially includes an NLU network, a DST network and an NLG network, without a DPL network. The main role of each network is the same as in the first training processing mode.
Likewise, S_(t-1) may be ordinary scalar data instead of a vector sequence. In step S120, when encoding the input data, it can be determined whether each item is a vector sequence; if not, U_t is encoded into the sequence U_t, and the other input data are encoded at the same time, i.e. the previous round of dialogue state accumulation S_(t-1) is encoded to obtain the previous round of dialogue state accumulation sequence S_(t-1).
In the second training processing mode, according to U_t, R_(t-1), S_(t-1) and M_t_label, the NLU network is trained to obtain the current round of dialogue understanding sequence M_t, as shown in the following formula:
M_t = Decoder_NLU(S_(t-1), R_(t-1), U_t)
According to U t 、R t-1 、S t-1 、M t And S t_label Training the DST network, the DST network being adapted to obtain a current round of dialog state accumulation sequence S t As shown in the following formula:
S t =Decoder DST (B t-1 ,R t-1 ,U t ,M t )
According to U_t, R_(t-1), S_t, the model database and R_t_label, the NLG network is trained; the NLG network is adapted to obtain the current round of system reply sequence R_t and gives the current round of system reply data R_t, as shown in the following formula:
R_t, R_t = Decoder_NLG(R_(t-1), U_t, S_t)
C. In the third training processing mode, the input data further includes the previous round of dialogue history accumulation B_(t-1), which includes the previous round of dialogue state accumulation S_(t-1) and the previous round of system action A_(t-1). As shown in fig. 2(c), the decoding network sequentially includes a DST network, a DPL network and an NLG network; since the current round of dialogue understanding label sequence M_t_label is missing, no NLU network is included. The main role of each network is the same as in the first training processing mode.
Likewise, B_(t-1) may be ordinary scalar data instead of a vector sequence. In step S120, when encoding the input data, it can be determined whether each item is a vector sequence; if not, U_t is encoded into the sequence U_t, and the previous round of dialogue history accumulation data B_(t-1) is also encoded to obtain the previous round of dialogue history accumulation sequence B_(t-1).
According to U_t, R_(t-1), B_(t-1) and S_t_label, the DST network is trained; the DST network is adapted to obtain the current round of dialogue state accumulation sequence S_t, as shown in the following formula:
S_t = Decoder_DST(B_(t-1), R_(t-1), U_t)
According to U_t, R_(t-1), S_t and A_t_label, and using the model database, the DPL network is trained; the DPL network is adapted to obtain the current round of system action sequence A_t, as shown in the following formula:
A_t = Decoder_DPL(R_(t-1), U_t, S_t)
According to U_t, R_(t-1), A_t, the model database and R_t_label, the NLG network is trained; the NLG network is adapted to obtain the current round of system reply sequence R_t and gives the current round of system reply data R_t, as shown in the following formula:
R_t, R_t = Decoder_NLG(R_(t-1), U_t, A_t)
D. In the fourth training processing mode, the current round of system action label sequence A_t_label is missing, so B_(t-1), which contains A_(t-1), is not used; the input data only includes the previous round of dialogue state accumulation S_(t-1). As shown in fig. 2(d), the decoding network sequentially includes a DST network and an NLG network, without an NLU network or a DPL network. The main role of each network is the same as in the first training processing mode.
According to U_t, R_(t-1), S_(t-1) and S_t_label, the DST network is trained; the DST network is adapted to obtain the current round of dialogue state accumulation sequence S_t, as shown in the following formula:
S_t = Decoder_DST(S_(t-1), R_(t-1), U_t)
According to U_t, R_(t-1), S_t, the model database and R_t_label, the NLG network is trained; the NLG network is adapted to obtain the current round of system reply sequence R_t and gives the current round of system reply data R_t, as shown in the following formula:
R_t, R_t = Decoder_NLG(R_(t-1), U_t, S_t)
Preferably, in this embodiment, the natural language understanding network (NLU network), while obtaining the current round of dialogue understanding sequence M_t, also outputs the current round of dialogue understanding data M_t; and/or
the dialogue state tracking network (DST network), while obtaining the current round of dialogue state accumulation sequence S_t, also outputs the current round of dialogue state accumulation data S_t; and/or
the dialogue policy learning network (DPL network), while obtaining the current round of system action sequence A_t, also outputs the current round of system action data A_t.
During training, each network can output not only the corresponding vector sequence but also the corresponding scalar data, so that the output of each decoding network can be inspected at any time, the position where an error occurs can be located, and adjustments can be made promptly.
The man-machine conversation model in this embodiment adopts a modular structure and fuses the supervision of the intermediate decoding networks into an end-to-end training framework. The decoding networks share the same encoder and each has an independent decoder. This design allows all decoding networks to be optimized jointly or adjusted independently in a targeted manner, and effectively avoids cumulative error propagation.
Example two
The training method of the man-machine conversation model provided in this embodiment is based on the first embodiment and adds hidden-state connections between the encoder and each decoder of the model, instead of establishing the relationship only through output text symbols. Knowledge transfer between the networks is realized by sharing hidden states, which assists the initialization of the networks and makes the connections between the networks tighter.
Optionally, in one implementation of this embodiment, in step S120 of the first embodiment, when the encoder encodes the input data, in addition to converting the scalar data in the input data into vector sequences, the encoder also obtains an encoder hidden state, which is used to assist each decoding network in performing its current-round initialization.
In this embodiment, the role of the encoder hidden state is not limited to a particular training processing mode: it is used to initialize each decoding network included in the decoding model under the current training processing mode, so that each decoding network obtains the implicit information produced during encoding and adapts to the input of each training sample.
Optionally, in another implementation of this embodiment, the encoder hidden state is also obtained in step S120, and it is used to assist the initialization of the first decoding network in the decoding model. For example, in the first and second training processing modes the encoder hidden state is used to assist the current-round initialization of the NLU network, while in the third and fourth training processing modes it is used to assist the DST network.
In addition, in this embodiment, each decoding network also passes its own network hidden state to the next network in the order of the flow. For example, in the first training processing mode, the NLU network, while obtaining the current round of dialogue understanding sequence M_t, also obtains a first network hidden state, which is used to assist the current-round initialization of the DST network; similarly, the DST network, while obtaining the current round of dialogue state accumulation sequence S_t, obtains a second network hidden state used to assist the current-round initialization of the DPL network; and the DPL network, while obtaining the current round of system action sequence A_t, obtains a third network hidden state used to assist the current-round initialization of the NLG network.
By analogy, in the second training processing mode, the encoder hidden state is used to assist the initialization of the NLU network; the NLU network, while obtaining the current round of dialogue understanding sequence M_t, obtains a first network hidden state used to assist the current-round initialization of the DST network; and the DST network, while obtaining the current round of dialogue state accumulation sequence S_t, obtains a second network hidden state used to assist the current-round initialization of the NLG network.
In the third training processing mode, the encoder hidden state is used to assist the current-round initialization of the DST network; the DST network, while obtaining the current round of dialogue state accumulation sequence S_t, obtains a second network hidden state used to assist the current-round initialization of the DPL network; and the DPL network, while obtaining the current round of system action sequence A_t, obtains a third network hidden state used to assist the current-round initialization of the NLG network.
In the fourth training processing mode, the encoder hidden state is used to assist the initialization of the DST network; the DST network, while obtaining the current round of dialogue state accumulation sequence S_t, obtains a second network hidden state used to assist the current-round initialization of the NLG network.
The introduction of hidden states makes information transfer and knowledge sharing between the networks more effective, and closer connections can be established between the networks without relying only on the output of plain text symbols.
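The hidden-state handoff of this embodiment can be sketched as follows; the interfaces are assumptions used only to illustrate how each network's final hidden state initializes the next one:

def decode_with_hidden_chain(encoder, decoders, inputs):
    # decoders: the ordered decoder chain of the current training processing mode
    enc_outputs, hidden = encoder(inputs)    # the encoder hidden state initializes the first decoder
    outputs = []
    for decoder in decoders:
        sequence, hidden = decoder(enc_outputs, init_hidden=hidden)
        outputs.append(sequence)             # M_t, S_t, A_t or R_t depending on the chain
    return outputs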
The inventor tested the man-machine conversation model of this embodiment on CamRest676, a data set commonly used in the field, evaluating the following three metrics: (1) entity match rate: whether the system can generate all accurate constraints to search for candidate entities in the dialogue state phase; (2) Success F1: an F1 value measuring the requested word slots (taking both recall and precision into account); (3) BLEU: used to measure the quality of the generated replies.
Test results show that the model trained in this embodiment can surpass other commonly used models with only 60% of the training data, and the entity match rate of the model using all decoding networks reaches 94.7%. If all training data are used, the model with all decoding networks achieves the best results on all three metrics: the entity match rate reaches 95.1%, the Success F1 value is 0.860, and the BLEU value reaches 0.259.
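For reference, the three metrics could be computed roughly as in the following sketch (BLEU via NLTK; the entity match rate and Success F1 definitions here are simplified assumptions, not the exact evaluation scripts behind the reported numbers):

from nltk.translate.bleu_score import corpus_bleu

def entity_match_rate(pred_constraints, gold_constraints):
    # Fraction of dialogues whose predicted dialogue-state constraints exactly match the gold ones.
    hits = sum(p == g for p, g in zip(pred_constraints, gold_constraints))
    return hits / len(gold_constraints)

def success_f1(pred_slots, gold_slots):
    # F1 over requested word slots, combining precision and recall.
    pred, gold = set(pred_slots), set(gold_slots)
    if not pred or not gold:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

def reply_bleu(pred_replies, gold_replies):
    # Corpus BLEU of generated replies against single references (whitespace tokenization).
    references = [[r.split()] for r in gold_replies]
    hypotheses = [p.split() for p in pred_replies]
    return corpus_bleu(references, hypotheses)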
Example three
In the third embodiment, a man-machine interaction method is disclosed, as shown in fig. 3, including the following steps:
Step S310: acquiring the current round of user dialogue data U_t.
Step S320: inputting the current round of user dialogue data U_t into the man-machine conversation model trained by the training method of the first or second embodiment to obtain the current round of system reply data R_t.
Step S330: sending the current round of system reply data R_t to the user.
Taking a man-machine conversation model including an NLU network, a DST network, a DPL network and an NLG network as an example, when carrying out multiple rounds of man-machine dialogue, first, in step S310, the dialogue data input by the user in the current round is acquired, for example: I want to find a high-end restaurant in the south of town.
After receiving the user data U_1, step S320 is performed, and the current round of system reply is obtained from the trained man-machine conversation model. First, U_1 is encoded into the vector sequence U_1; because this is the first round of dialogue, the input data does not include previous round system reply data or previous round dialogue history accumulation data.
The NLU network analyzes the intent of the user sentence according to the current round of user dialogue sequence U_1 and fills the corresponding word slots to obtain the current round of user dialogue understanding sequence M_1. Because a sequence-to-sequence generation approach is adopted in this embodiment, multiple intents contained in the current round of user dialogue can be obtained, with the user requirements filled into the corresponding word slots. For example, after the above user dialogue is input, the analysis yields the following user intents:
Intent 1: inform; word slot [price]: expensive;
Intent 2: inform; word slot [area]: south of town.
Then, M_1 and U_1 are input into the DST network, which stores the dialogue content as history information and obtains the dialogue state accumulation sequence S_1.
S_1 and U_1 are input into the DPL network, and the current round of system action sequence A_1 is obtained by querying the model database. In this round of dialogue, the system obtains, by querying the model database, a series of restaurants that meet the current user requirements, but the user still needs to provide further information for screening, so the current round of system action is: continue to ask for cuisine information.
According to A_1, U_1 and the model database, the NLG network gives the current round of system reply sequence R_1 and converts it into the current round of system reply data R_1 in natural-language form: There are several expensive restaurants in the south of town. What kind of food would you like to eat?
In the next step S330, the system reply data of the current round is sent to the user, and a next round of man-machine conversation is prepared.
After receiving the reply given by the system, the user can continue to input dialogue according to his or her own intent as the current round of user dialogue data U_2, for example: I have no particular preference for cuisine.
U_2 is encoded to obtain the current round of user dialogue sequence U_2; S_1 and A_1 form B_1; and U_2, B_1 and R_1 are input into the NLU network as input data to obtain the current round of user dialogue understanding sequence M_2. Combined with the previous round of historical data, the current round of user intent and the corresponding word slot filling are obtained:
Intent 3: inform; word slot [food]: no preference.
M_2, together with U_2, B_1 and R_1, is input into the DST network. The DST network tracks the dialogue states of the previous rounds, so its output includes the first round's "expensive" and "south of town" states, yielding the current round of dialogue state accumulation sequence S_2.
S_2, together with U_2 and R_1, is input into the DPL network as input data, and the current round of system action sequence A_2 is obtained by querying the model database. Since the user has no further requirement on the cuisine served by the restaurant, according to the query result from the model database the current round action is: recommend the entry with the highest priority.
According to A_2, U_2, R_1 and the model database, the NLG network gives the current round of system reply sequence R_2; according to the result of the model database query, the entry with the highest priority is obtained and converted into the current round of system reply data R_2 in natural-language form: There is a Chicago restaurant in the south of town that mainly serves Mexican dishes.
By analogy, the system can complete multiple rounds of dialogue with the user until the user is satisfied with a reply and the dialogue ends. For example, in the third round of dialogue the user enters: Where is the restaurant located? Following the above process, the NLU network obtains the current round user intent and the corresponding filled word slot:
Intent: request; word slot [address];
By integrating the dialogue states of the previous rounds, the system searches the model database and obtains the content of the current round of system action and reply: the Chicago restaurant's address is No. 2 Cherry Hinton Road, Cambridge Leisure Park.
According to the steps, the user and the system carry out multiple rounds of man-machine conversation so as to meet the requirements of the user. It should be noted that the man-machine interaction model used in this embodiment is not limited to the form in the foregoing example, and the man-machine interaction model obtained by the training method in any of the first embodiment and the second embodiment can be used in the man-machine interaction method in this embodiment.
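The multi-round interaction of steps S310 to S330 can be summarized in an inference-loop sketch; the model interface names below are assumptions for illustration:

def run_dialogue(model, get_user_input, send_reply, max_rounds=20):
    history = None                       # B_(t-1) / S_(t-1): empty in the first round
    prev_reply = None                    # R_(t-1)
    for _ in range(max_rounds):
        user_utterance = get_user_input()            # U_t, step S310
        if user_utterance is None:                   # the user ends the conversation
            break
        reply, history = model.respond(user_utterance,
                                       prev_reply=prev_reply,
                                       history=history)   # step S320
        send_reply(reply)                             # R_t back to the user, step S330
        prev_reply = reply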
Example four
In this embodiment, a man-machine conversation model for a man-machine dialogue system is provided. The model is based on an end-to-end trainable encoder-decoder framework, takes a modular supervised neural network as its main structure, and is suitable for various application scenarios such as task-oriented man-machine dialogue.
As shown in fig. 4, the man-machine conversation model in the present embodiment includes the following components:
a data acquisition module 10, configured to acquire input data, the input data including the current round of user dialogue data U_t; if the current round of dialogue is not the first round, the data acquisition module 10 further acquires previous round data from the decoding module 30 according to the composition of the sub-modules included in the decoding module 30, where the previous round data at least includes the previous round of system reply R_(t-1);
an encoding module 20, configured to encode the input data to obtain the corresponding input sequence; in this model, the encoding module 20 is shared by all the sub-modules of the decoding module 30;
a decoding module 30, configured to obtain the current round of system reply data according to the input sequence, where the decoding module includes a plurality of different sub-modules, and at least includes a dialog state tracking sub-module (DST sub-module 32) and a natural language generation sub-module (NLG sub-module 34), and the decoding module further includes a model database 35;
Depending on the sample conditions during training, the decoding module 30 may further include other sub-modules, such as a natural language understanding sub-module (NLU sub-module 31) and/or a dialogue policy learning sub-module (DPL sub-module 33). When the sub-module composition of the decoding module 30 differs, the previous round data acquired by the data acquisition module 10 from the decoding module 30 also differs, covering the following four cases:
A. the decoding module 30 includes an NLU sub-module 31, a DST sub-module 32, a DPL sub-module 33, and an NLG sub-module 34, which are sequentially arranged from upstream to downstream.
In this case, the previous round data further includes the previous round of dialogue history accumulation B_(t-1), which includes the previous round of dialogue state accumulation S_(t-1) and the previous round of system action A_(t-1).
The NLU sub-module 31 is configured to obtain the current round of dialogue understanding sequence M_t according to the input sequence, as shown in the following formula:
M_t = Decoder_NLU(B_(t-1), R_(t-1), U_t)
where Decoder_NLU denotes the NLU decoding network.
The DST sub-module 32 is configured to obtain the current round of dialogue state accumulation sequence S_t according to the input sequence and M_t, as shown in the following formula:
S_t = Decoder_DST(B_(t-1), R_(t-1), U_t, M_t)
where Decoder_DST denotes the DST decoding network.
The DPL sub-module 33 is configured to obtain the current round of system action sequence A_t according to the input sequence, the current round of dialogue state accumulation sequence S_t and the model database 35, as shown in the following formula:
A_t = Decoder_DPL(R_(t-1), U_t, S_t)
where Decoder_DPL denotes the DPL decoding network.
The NLG sub-module 34 is configured to obtain the current round of system reply sequence R_t according to the input sequence, the current round of system action sequence A_t and the model database 35, and to give the corresponding current round of system reply data R_t, as shown in the following formula:
R_t, R_t = Decoder_NLG(R_(t-1), U_t, A_t)
where Decoder_NLG denotes the NLG decoding network.
B. The decoding module 30 includes an NLU sub-module 31, a DST sub-module 32, and an NLG sub-module 34, which are arranged in sequence from upstream to downstream.
In this case, the previous round data further includes the previous round of dialogue state accumulation S_(t-1).
The NLU sub-module 31 is configured to obtain the current round of dialogue understanding sequence M_t according to the input sequence, as shown in the following formula:
M_t = Decoder_NLU(S_(t-1), R_(t-1), U_t)
the DST submodule 32 is used for summing M according to the input sequence t Acquiring the conversation state accumulation sequence S of the current round t As shown in the following formula:
S t =Decoder DST (S t-1 ,R t-1 ,U t ,M t )
the NLG submodule 34 is used for accumulating a sequence S according to the input sequence and the current round of dialog state t And the model database 35 acquires the reply sequence R of the system in the current round t And provides the corresponding reply data R of the current round of system t As shown in the following formula:
R t ,R t =Decoder NLG (R t-1 ,U t ,S t )
C. the decoding module 30 includes a DST sub-module 32, a DPL sub-module 33, and an NLG sub-module 34, which are sequentially arranged from upstream to downstream.
In this case, the previous round data further includes the previous round of dialogue history accumulation B_(t-1), which includes the previous round of dialogue state accumulation S_(t-1) and the previous round of system action A_(t-1).
The DST sub-module 32 is configured to obtain the current round of dialogue state accumulation sequence S_t according to the input sequence, as shown in the following formula:
S_t = Decoder_DST(B_(t-1), R_(t-1), U_t)
the DPL sub-module 33 is used for accumulating the sequence S according to the input sequence and the current dialog state t And the model database 35 acquires the action sequence A of the system in the current round t As shown in the following formula:
A t =Decoder DPL (R t-1 ,U t ,S t )
the NLG submodule 34 is used for performing the current round of system action sequence A according to the input sequence t And the model database 35 acquires the reply sequence R of the system in the current round t And provides the corresponding reply data R of the current round of system t As shown in the following formula:
R t ,R t =Decoder NLG (R t-1 ,U t ,A t )
D. the decoding module 30 includes a DST sub-module 32 and an NLG sub-module 34, which are arranged sequentially from upstream to downstream.
In this case, the previous round data further includes the previous round of dialogue state accumulation S_(t-1).
The DST sub-module 32 is configured to obtain the current round of dialogue state accumulation sequence S_t according to the input sequence, as shown in the following formula:
S_t = Decoder_DST(S_(t-1), R_(t-1), U_t)
the NLG submodule 34 is used for accumulating a sequence S according to the input sequence and the current round of dialog state t And the model database 35 acquires the reply sequence R of the system in the current round t And provides the corresponding reply data R of the current round of system t As shown in the following formula:
R t ,R t =Decoder NLG (R t-1 ,U t ,S t )
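The plug-and-play assembly of cases A to D above can be sketched as follows; the function and argument names are assumptions, not the patent's implementation:

def build_decoding_module(dst, nlg, nlu=None, dpl=None):
    # The DST and NLG sub-modules are always present; the NLU and DPL sub-modules are
    # plugged in only when the corresponding supervision labels were available in training.
    chain = []
    if nlu is not None:
        chain.append(nlu)        # NLU sub-module 31 (cases A and B)
    chain.append(dst)            # DST sub-module 32
    if dpl is not None:
        chain.append(dpl)        # DPL sub-module 33 (cases A and C)
    chain.append(nlg)            # NLG sub-module 34
    return chain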
In an optional implementation of this embodiment, the natural language understanding network (NLU network), while obtaining the current round of dialogue understanding sequence M_t, also outputs the current round of dialogue understanding data M_t; and/or
the dialogue state tracking network (DST network), while obtaining the current round of dialogue state accumulation sequence S_t, also outputs the current round of dialogue state accumulation data S_t; and/or
the dialogue policy learning network (DPL network), while obtaining the current round of system action sequence A_t, also outputs the current round of system action data A_t.
The system can check the operation of the decoding networks according to the data output of each decoding network.
Preferably, in this embodiment, when acquiring previous round data from the decoding module 30, the data acquisition module 10 can directly acquire vector sequences such as R_(t-1), S_(t-1) or B_(t-1); it can also acquire scalar data and have it converted into the vector sequences R_(t-1), S_(t-1) and B_(t-1) by the encoding module 20. Preferably, a bidirectional gated recurrent neural network can be used in the encoding module 20 to encode the input data.
Optionally, the encoding module 20 and the decoding module 30 are connected through hidden states. When the encoding module 20 encodes the input data, it simultaneously obtains the current round encoder hidden state to assist the initialization of each sub-module.
Further, the current round encoder hidden state obtained by the encoding module 20 is used to assist the initialization of the first sub-module in the decoding module, and each sub-module, while obtaining its corresponding vector sequence, also obtains a decoder hidden state to assist the initialization of the next sub-module.
Specifically:
A. When the decoding module 30 comprises the NLU sub-module 31, the DST sub-module 32, the DPL sub-module 33 and the NLG sub-module 34, the encoder hidden state is used to assist the current-round initialization of the NLU sub-module 31; the NLU sub-module 31 obtains a first network hidden state used to assist the current-round initialization of the DST sub-module 32; the DST sub-module 32 obtains a second network hidden state used to assist the current-round initialization of the DPL sub-module 33; and the DPL sub-module 33 obtains a third network hidden state used to assist the current-round initialization of the NLG sub-module 34;
B. When the decoding module 30 comprises the NLU sub-module 31, the DST sub-module 32 and the NLG sub-module 34, the encoder hidden state is used to assist the current-round initialization of the NLU sub-module 31; the NLU sub-module 31 obtains a first network hidden state used to assist the current-round initialization of the DST sub-module 32; and the DST sub-module 32 obtains a second network hidden state used to assist the current-round initialization of the NLG sub-module 34;
C. When the decoding module 30 comprises the DST sub-module 32, the DPL sub-module 33 and the NLG sub-module 34, the encoder hidden state is used to assist the current-round initialization of the DST sub-module 32; the DST sub-module 32 obtains a second network hidden state used to assist the current-round initialization of the DPL sub-module 33; and the DPL sub-module 33 obtains a third network hidden state used to assist the current-round initialization of the NLG sub-module 34;
D. When the decoding module 30 comprises the DST sub-module 32 and the NLG sub-module 34, the encoder hidden state is used to assist the current-round initialization of the DST sub-module 32, and the DST sub-module 32 obtains a second network hidden state used to assist the current-round initialization of the NLG sub-module 34.
The man-machine conversation model in this embodiment introduces a flexible modular supervision mechanism on the basis of the encoder-decoder architecture. On one hand, all the sub-modules of a pipeline dialogue system are integrated into a single end-to-end model framework, so that each sub-module in the decoding module can be used in a plug-and-play manner and flexibly selected according to the supervision labels carried by the samples during training, without seriously affecting the effect of the model; on the other hand, the whole model has an end-to-end structure, all decoding sub-modules can be optimized as a whole, and the error accumulation that easily arises when only one sub-module is optimized is avoided.
Example five
It should be noted that the man-machine conversation model according to the embodiments of the present application may be integrated into the electronic device 90 as a software module and/or a hardware module; in other words, the electronic device 90 may integrate the man-machine conversation model of the above embodiments. For example, the man-machine conversation model may be a software module in the operating system of the electronic device 90, or an application developed for it; of course, the man-machine conversation model may also be integrated into one of the hardware modules of the electronic device 90.
In another embodiment of the present application, the carrier integrated with the human machine conversation model and the electronic device 90 may be separate devices (e.g., a server), and the carrier integrated with the human machine conversation model may be connected to the electronic device 90 through a wired and/or wireless network and transmit the interactive information according to an agreed data format.
Fig. 5 is a schematic structural diagram of an electronic device 90 according to an embodiment of the present application. As shown in fig. 5, the electronic device 90 includes: one or more processors 91 and memory 92; and computer program instructions stored in the memory 92 which, when executed by the processor 91, cause the processor 91 to execute a human-machine dialogue model as in any of the embodiments described above.
The processor 91 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 90 to perform desired functions.
Memory 92 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 91 to implement the steps in the human-machine dialogue model of the various embodiments of the application described above and/or other desired functions. Information such as light intensity, compensation light intensity, position of the filter, etc. may also be stored in the computer readable storage medium.
In one example, the electronic device 90 may further include: an input device 93 and an output device 94, which are interconnected by a bus system and/or other form of connection mechanism (not shown in fig. 5).
The output device 94 may output various information to the outside, and may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 90 relevant to the present application are shown in fig. 5, and components such as buses, input devices/output interfaces, and the like are omitted. In addition, the electronic device 90 may include any other suitable components, depending on the particular application.
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the human-machine dialogue model of any of the above-described embodiments.
The computer program product may write program code for carrying out operations for embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the human-machine dialogue model according to the various embodiments of the present application described above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that in the apparatus and devices of the present application, the components may be disassembled and/or reassembled. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A training method of a man-machine conversation model is characterized by comprising the following steps:
acquiring a training sample, wherein the training sample comprises input data and supervision label sequences corresponding to different decoding networks, the input data comprises user conversation data of the current round and a previous round of system reply, and the supervision label sequences at least comprise a current round of conversation state accumulation label sequence and a current round of system reply label sequence;
encoding the input data, and acquiring a user conversation sequence of the current round according to the user conversation data of the current round;
and selecting different training processing modes to train the model according to the type of the supervision label sequence:
if the supervision label sequence comprises a current round of dialog understanding label sequence and a current round of system action label sequence, training the man-machine dialog model according to a first training processing mode;
if the supervision label sequence comprises the current round of dialog understanding label sequence and does not comprise the current round of system action label sequence, training the man-machine dialog model according to a second training processing mode;
if the supervision label sequence does not comprise the current round of dialog understanding label sequence and comprises the current round of system action label sequence, training the man-machine dialog model according to a third training processing mode;
if the supervision label sequence does not comprise the current round of dialog understanding label sequence and the current round of system action label sequence, training the man-machine dialog model according to a fourth training processing mode;
judging the error between the output sequence of each decoding network and the corresponding supervision label sequence, and if the error does not meet the training requirement, repeating the steps from the encoding of the input data onward; ending the training once the errors of all the decoding networks meet the training requirement, to obtain the man-machine conversation model;
according to the different training processing modes, the decoding networks comprise combinations of a natural language understanding network, a dialogue state tracking network, a dialogue strategy learning network and a natural language generating network; these networks respectively correspond to the current round of dialogue understanding label sequence, the current round of dialogue state accumulation label sequence, the current round of system action label sequence and the current round of system reply label sequence, and respectively obtain the current round of dialogue understanding sequence, the current round of dialogue state accumulation sequence, the current round of system action sequence and the current round of system reply sequence, and the natural language generating network also obtains the current round of system reply data;
in each training processing mode, training each decoding network according to the input data, the corresponding supervision label sequence and the output result of the previous network, wherein the dialogue strategy learning network and the natural language generation network are trained by utilizing a model database;
in the first training processing mode, the input data further comprises a previous round of dialogue history accumulation, and the previous round of dialogue history accumulation comprises a previous round of dialogue state accumulation and a previous round of system action;
the decoding network sequentially comprises a natural language understanding network, a conversation state tracking network, a conversation strategy learning network and a natural language generating network;
in the second training processing mode, the input data further comprises the previous round of dialogue state accumulation;
the decoding network sequentially comprises a natural language understanding network, a conversation state tracking network and a natural language generating network;
in the third training processing mode, the input data further comprises a previous round of dialogue history accumulation sequence, and the previous round of dialogue history accumulation sequence comprises a previous round of dialogue state accumulation and a previous round of system action;
the decoding network sequentially comprises a conversation state tracking network, a conversation strategy learning network and a natural language generating network;
in the fourth training processing mode, the input data further comprises the previous round of dialogue state accumulation;
the decoding network sequentially comprises a dialogue state tracking network and a natural language generating network.
2. A training method for a human-machine dialogue model according to claim 1, wherein the natural language understanding network outputs the current-round dialogue understanding data while acquiring the current-round dialogue understanding sequence; and/or
The dialog state tracking network acquires the dialog state accumulation sequence of the current round and simultaneously outputs the dialog state accumulation data of the current round; and/or
And the dialogue strategy learning network acquires the action sequence of the current round of system and simultaneously outputs the action data of the current round of system.
3. The method for training a human-computer interaction model according to claim 1, wherein when encoding the input data, it is determined whether the previous round of system reply is a vector sequence, and if not, the previous round of system reply is encoded to obtain a previous round of system reply sequence; and/or
In the first training processing mode and the third training processing mode, when the input data is encoded, it is judged whether the previous round of dialogue history accumulation is a vector sequence, and if not, the previous round of dialogue history accumulation is encoded to obtain a previous round of dialogue history accumulation sequence;
in the second training processing mode and the fourth training processing mode, when the input data is encoded, it is judged whether the previous round of dialogue state accumulation is a vector sequence, and if not, the previous round of dialogue state accumulation is encoded to obtain a previous round of dialogue state accumulation sequence.
4. The method for training a human-machine dialogue model of claim 1, wherein an encoder hidden state is obtained while obtaining the user dialogue sequence of the current round according to the user dialogue data of the current round, and the encoder hidden state is used to assist initialization of each decoding network of the current round.
5. The method for training a human-computer dialogue model according to claim 1, wherein an encoder hidden state is obtained while a current-round user dialogue sequence is obtained according to the current-round user dialogue data;
in the first training processing mode, the encoder hidden state is used for assisting the natural language understanding network to initialize in the current round;
acquiring a first network hidden state while acquiring the current round of dialog understanding sequence, wherein the first network hidden state is used for assisting the current round of dialog state tracking network to initialize;
acquiring a second network hidden state while acquiring the current round of conversation state accumulation sequence, wherein the second network hidden state is used for assisting the current round of conversation strategy learning network to initialize;
acquiring a third network hidden state while acquiring the action sequence of the system in the current round, wherein the third network hidden state is used for assisting the natural language generation network in the current round to initialize;
in the second training processing mode, the encoder hidden state is used for assisting the natural language understanding network to initialize in the current round;
acquiring a first network hidden state while acquiring the dialog understanding sequence of the current round, wherein the first network hidden state is used for assisting the dialog state tracking network of the current round to initialize;
acquiring a second network hidden state while acquiring the current round of dialog state accumulation sequence, wherein the second network hidden state is used for assisting the current round of natural language generation network to initialize;
in the third training processing mode, the encoder hidden state is used for assisting the dialog state tracking network in the current round to initialize;
acquiring a second network hidden state while acquiring the current round of conversation state accumulation sequence, wherein the second network hidden state is used for assisting the current round of conversation strategy learning network to initialize;
acquiring a third network hidden state while acquiring the action sequence of the system in the current round, wherein the third network hidden state is used for assisting the natural language generation network in the current round to initialize;
in the fourth training processing mode, the encoder hidden state is used for assisting the dialog state tracking network in the current round to initialize;
and acquiring a second network hidden state while acquiring the current round of dialog state accumulation sequence, wherein the second network hidden state is used for assisting the current round of natural language generation network to initialize.
6. A method for training a human-machine interaction model according to claim 1, wherein the input data is encoded using a bi-directional gated recurrent neural network.
7. A method of human-computer interaction, comprising the steps of:
acquiring user dialogue data of the current round;
inputting the user dialogue data of the current round into a man-machine dialogue model trained by the training method according to any one of claims 1-6 to obtain the reply data of the system of the current round;
and sending the reply data of the current round of system to a user.
8. A human-computer dialogue model is characterized by comprising a data acquisition module, an encoding module and a decoding module, wherein:
the data acquisition module is used for acquiring input data, wherein the input data comprises user conversation data of the current round, and if the current round of conversation is not the first round of conversation, the data acquisition module acquires the previous round of data from the decoding module according to the composition of sub-modules included by the decoding module, and the previous round of data at least comprises a previous round of system reply;
the encoding module is used for encoding the input data and acquiring a corresponding input sequence;
the decoding module is used for acquiring the reply data of the current round of system according to the input sequence, the decoding module comprises a plurality of different sub-modules, the decoding module at least comprises a dialogue state tracking sub-module and a natural language generation sub-module, and the decoding module also comprises a model database;
if the decoding module further comprises a natural language understanding sub-module and a dialogue strategy learning sub-module, the previous round of data further comprises a previous round of dialogue history accumulation, and the previous round of dialogue history accumulation comprises a previous round of dialogue state accumulation and a previous round of system action;
the natural language understanding submodule is used for acquiring a current round of dialogue understanding sequence according to the input sequence;
the dialogue state tracking submodule is positioned at the downstream of the natural language understanding submodule and is used for acquiring a current round of dialogue state accumulation sequence according to the input sequence and the current round of dialogue understanding sequence;
the dialogue strategy learning submodule is positioned at the downstream of the dialogue state tracking submodule and is used for acquiring a system action sequence of the current round according to the input sequence, the dialogue state accumulation sequence of the current round and the model database;
the natural language generation submodule is positioned at the downstream of the dialogue strategy learning submodule and used for acquiring a current round of system reply sequence according to the input sequence, the current round of system action sequence and the model database and providing corresponding current round of system reply data;
if the decoding module further comprises a natural language understanding sub-module and does not comprise a dialogue strategy learning sub-module, the previous round of data further comprises the previous round of dialogue state accumulation;
the natural language understanding submodule is used for acquiring a current round of dialogue understanding sequence according to the input sequence;
the dialogue state tracking submodule is positioned at the downstream of the natural language understanding submodule and is used for acquiring a current round of dialogue state accumulation sequence according to the input sequence and the current round of dialogue understanding sequence;
the natural language generation submodule is positioned at the downstream of the dialogue state tracking submodule and used for acquiring a current round of system reply sequence according to the input sequence, the current round of dialogue state accumulation sequence and the model database and providing corresponding current round of system reply data;
if the decoding module further comprises a dialogue strategy learning submodule and does not comprise a natural language understanding submodule, the previous round of data further comprises a previous round of dialogue history accumulation, and the previous round of dialogue history accumulation comprises a previous round of dialogue state accumulation and a previous round of system action;
the dialogue state tracking submodule is used for acquiring a current round of dialogue state accumulation sequence according to the input sequence;
the dialogue strategy learning submodule is positioned at the downstream of the dialogue state tracking submodule and is used for acquiring a system action sequence of the current round according to the input sequence, the dialogue state accumulation sequence of the current round and the model database;
the natural language generation submodule is positioned at the downstream of the dialogue strategy learning submodule and used for acquiring a current round of system reply sequence according to the input sequence, the current round of system action sequence and the model database and providing corresponding current round of system reply data;
if the decoding module does not comprise a dialogue strategy learning submodule and a natural language understanding submodule, the previous round of data also comprises a previous round of dialogue state accumulation;
the dialogue state tracking submodule is used for acquiring a current round of dialogue state accumulation sequence according to the input sequence;
and the natural language generation submodule is positioned at the downstream of the dialogue state tracking submodule and is used for acquiring a current round of system reply sequence according to the input sequence, the current round of dialogue state accumulation sequence and the model database and providing corresponding current round of system reply data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method steps of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method steps of any one of claims 1 to 7.
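As a non-authoritative reading aid to claim 1, the choice among the four training processing modes, driven purely by which supervision label sequences a training sample carries, can be summarized by the short sketch below; the label names used as set members are assumptions made for illustration and are not terms fixed by the claims.

```python
def select_training_mode(label_types):
    """Return (mode number, decoding networks) based on the supervision labels present.

    label_types is assumed to be a set of strings; the current round of dialogue
    state accumulation and system reply label sequences are always required by
    claim 1 and therefore are not tested here.
    """
    has_understanding = "current_round_dialog_understanding" in label_types
    has_system_action = "current_round_system_action" in label_types

    if has_understanding and has_system_action:   # first training processing mode
        return 1, ["NLU", "DST", "DPL", "NLG"]
    if has_understanding:                         # second training processing mode
        return 2, ["NLU", "DST", "NLG"]
    if has_system_action:                         # third training processing mode
        return 3, ["DST", "DPL", "NLG"]
    return 4, ["DST", "NLG"]                      # fourth training processing mode


# Example: a sample carrying only the dialogue understanding label sequence
mode, decoders = select_training_mode({"current_round_dialog_understanding"})
assert mode == 2 and decoders == ["NLU", "DST", "NLG"]
```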
CN201911367881.XA 2019-12-26 2019-12-26 Training method of man-machine conversation model and man-machine conversation method Active CN113050787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911367881.XA CN113050787B (en) 2019-12-26 2019-12-26 Training method of man-machine conversation model and man-machine conversation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911367881.XA CN113050787B (en) 2019-12-26 2019-12-26 Training method of man-machine conversation model and man-machine conversation method

Publications (2)

Publication Number Publication Date
CN113050787A CN113050787A (en) 2021-06-29
CN113050787B true CN113050787B (en) 2022-08-05

Family

ID=76505603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911367881.XA Active CN113050787B (en) 2019-12-26 2019-12-26 Training method of man-machine conversation model and man-machine conversation method

Country Status (1)

Country Link
CN (1) CN113050787B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443828B (en) * 2022-02-09 2023-07-28 北京百度网讯科技有限公司 Training method and device for universal dialogue model, electronic equipment and medium
CN117668184A (en) * 2023-11-30 2024-03-08 中南大学 End-to-end task type dialogue method, system, terminal and medium based on knowledge tracking

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7716056B2 (en) * 2004-09-27 2010-05-11 Robert Bosch Corporation Method and system for interactive conversational dialogue for cognitively overloaded device users
CN107766559B (en) * 2017-11-06 2019-12-13 第四范式(北京)技术有限公司 training method, training device, dialogue method and dialogue system for dialogue model
CN110222162A (en) * 2019-05-10 2019-09-10 天津中科智能识别产业技术研究院有限公司 A kind of intelligent answer method based on natural language processing and knowledge mapping
CN110196894B (en) * 2019-05-30 2021-06-08 北京百度网讯科技有限公司 Language model training method and language model prediction method

Also Published As

Publication number Publication date
CN113050787A (en) 2021-06-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant