CN114490968A - Dialog state tracking method, model training method and device and electronic equipment - Google Patents

Dialog state tracking method, model training method and device and electronic equipment

Info

Publication number
CN114490968A
Authority
CN
China
Prior art keywords
dialogue
state
model
word
dialog
Prior art date
Legal status
Granted
Application number
CN202111647412.0A
Other languages
Chinese (zh)
Other versions
CN114490968B (en)
Inventor
苑浩
胡江鹭
孙辉丰
孙叔琦
常月
李婷婷
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111647412.0A
Publication of CN114490968A
Application granted
Publication of CN114490968B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a dialogue state tracking method, a model training method, an apparatus and an electronic device, and relates to the field of computer technology, in particular to artificial intelligence fields such as deep learning and natural language processing. The specific implementation scheme is as follows: obtaining the dialogue sentence of the current turn; obtaining dialogue history data, the dialogue history data comprising context information of the dialogue sentence and a historical dialogue state; acquiring dialogue state operation marking data of each word in the dialogue sentence according to the dialogue sentence and the context information; and determining the dialogue state of the current turn according to the dialogue state operation marking data and the historical dialogue state. Because the dialogue state of the current turn is determined based on the dialogue state operation marking data and the historical dialogue state, the dialogue state can be tracked without relying on scene-specific information. The method is thus decoupled from particular scenes, generalizes well, and can be used universally across various scenes.

Description

Dialog state tracking method, model training method and device and electronic equipment
Technical Field
The application relates to the field of computer technology, in particular to a dialog state tracking method, a model training method, an apparatus and an electronic device, and further relates to artificial intelligence fields such as deep learning and natural language processing.
Background
With the continuous development of intelligent dialog systems, application scenarios are becoming more and more numerous, such as human-computer interaction scenarios for information query, telephone service reply, and intelligent device control. In a task-oriented dialog scene, the dialog state needs to be tracked so that relevant information such as the user intention can be obtained from the dialog state in time and a corresponding reply sentence can be produced.
Disclosure of Invention
The application provides a dialogue state tracking method, a model training method, a device and electronic equipment.
According to a first aspect of the present application, there is provided a dialog state tracking method, including:
obtaining conversation sentences of the current turn;
obtaining conversation historical data; the dialogue history data comprises context information and historical dialogue states of the dialogue sentences;
acquiring dialogue state operation marking data of each word in the dialogue sentences according to the dialogue sentences and the context information;
and determining the conversation state of the current turn according to the conversation state operation marking data and the historical conversation state.
According to a second aspect of the present application, there is provided a dialog state tracking model training method, the dialog state tracking model being used in a dialog state tracking task, the method comprising:
obtaining a plurality of turns of dialogue statement samples and operation label real data of each word in each dialogue statement sample;
obtaining a current turn of dialogue statement samples and context information samples of the current turn of dialogue statement samples from the multiple turns of dialogue statement samples;
inputting the current turn of dialogue statement samples and the context information samples into a pre-training model to obtain dialogue state operation labeling data of each word in the current turn of dialogue statement samples;
determining the operation label real data of each word in the current turn of conversation statement samples from the operation label real data of each word in each conversation statement sample;
and training the pre-training model according to the dialogue state operation labeling data of each word in the dialogue statement sample of the current round and the operation label real data of each word in the dialogue statement sample of the current round, acquiring model parameters, and generating the dialogue state tracking model according to the model parameters.
According to a third aspect of the present application, there is provided a dialog state tracking apparatus comprising:
the first acquisition module is used for acquiring the conversation sentences of the current turn;
the second acquisition module is used for acquiring conversation historical data; the dialogue history data comprises context information and historical dialogue states of the dialogue sentences;
a third obtaining module, configured to obtain, according to the dialog statement and the context information, dialog state operation tagging data of each word in the dialog statement;
and the determining module is used for determining the conversation state of the current turn according to the conversation state operation marking data and the historical conversation state.
According to a fourth aspect of the present application, there is provided a dialogue state tracking model training apparatus, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring multiple turns of dialogue statement samples and operation tag real data of each word in each dialogue statement sample;
a second obtaining module, configured to obtain, from the multiple rounds of dialogue statement samples, a current round of dialogue statement samples and a context information sample of the current round of dialogue statement samples;
a third obtaining module, configured to input the dialog statement sample of the current round and the context information sample into a pre-training model, and obtain dialog state operation labeling data of each word in the dialog statement sample of the current round;
a determining module, configured to determine, from the operation tag real data of each word in each dialog statement sample, the operation tag real data of each word in the dialog statement sample of the current round;
and the training module is used for training the pre-training model according to the dialogue state operation labeling data of each word in the current turn of dialogue statement samples and the operation label real data of each word in the current turn of dialogue statement samples, acquiring model parameters and generating the dialogue state tracking model according to the model parameters.
According to a fifth aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dialog state tracking method of the first aspect or to perform the dialog state tracking model training method of the second aspect.
According to a sixth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to execute the dialog state tracking method of the aforementioned first aspect or the dialog state tracking model training method of the aforementioned second aspect.
According to a seventh aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of the aforementioned first aspect, or performs the steps of the method of the aforementioned second aspect.
According to the technical scheme of the application, the operations on the dialogue state are represented systematically through the dialogue state operation marking data of each word in the dialogue statement. The dialog state of the current turn is determined based on the dialog state operation marking data and the historical dialog state, so the dialog state can be tracked without relying on scene-specific information. This yields a dialogue state tracking method that is decoupled from the scene, generalizes well, and can be used universally in various scenes.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a dialog state tracking method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a dialog provided by an embodiment of the present application;
FIG. 3 is a flow chart illustrating another dialog state tracking method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a dialog provided by an embodiment of the present application;
FIG. 5 is a diagram of a dialog state tracking model provided by an embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for training a dialog state tracking model according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating another dialog state tracking model training method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of knowledge distillation provided by an embodiment of the present application;
FIG. 9 is a block diagram of a dialog state tracking device according to an embodiment of the present application;
FIG. 10 is a block diagram of a dialog state tracking model training apparatus according to an embodiment of the present application;
FIG. 11 is a block diagram of another dialog state tracking model training apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of an electronic device to implement a dialog state tracking method or a dialog state tracking model training method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It should be noted that, in the technical solution of the present application, the acquisition, storage, application, and the like of the personal information of the related users all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
In the related art, model-based dialog state tracking methods are often coupled to a specific scene or domain, and the dialog state is predicted based on that scene or domain. A model that is not tied to a specific scene or domain tends to perform poorly on the dialog state tracking task.
Therefore, the application provides a dialogue state tracking method, a model training method, a device and electronic equipment. Specifically, a dialogue state tracking method, a model training method, an apparatus, and an electronic device according to an embodiment of the present application are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a dialog state tracking method according to an embodiment of the present application. It should be noted that the dialog state tracking method according to the embodiment of the present application can be applied to the dialog state tracking apparatus according to the embodiment of the present application, and the dialog state tracking apparatus can be configured on an electronic device.
As shown in fig. 1, the dialog state tracking method may include the steps of:
step 101, obtaining the dialog sentences of the current turn.
Step 102, obtaining conversation history data. The dialogue history data includes context information of the dialogue sentences and historical dialogue states.
In some embodiments of the present application, the dialog state may include a user intent and word slot values. As an example, fig. 2 is a schematic diagram of a dialog provided in an embodiment of the present application. As shown in fig. 2, the dialog statement 205 is the dialog statement of the current turn. The context information of the dialog statement in the dialog history data comprises the dialog statements 201-204, and the historical dialog state 212 is: the user intent (intent) is "check the weather", and the word slot value of the place word slot (slots) is "city A".
Step 103, acquiring dialogue state operation marking data of each word in the dialogue sentence according to the dialogue sentence and the context information.
It should be noted that the dialog state operation marking data is used to represent different dialog state operations, such as an add operation, a delete operation and an update operation. When the word slot value corresponding to a word slot appears in the dialog state for the first time, the dialog state operation marking data of each word in that word slot value is marked as add operation marking data, which can be denoted add; when a word slot value is no longer used or is discarded in the dialog state, the dialog state operation marking data of each word in that word slot value is marked as delete operation marking data, denoted delete; for a word slot that has already appeared in the dialog state, when its word slot value changes, the dialog state operation marking data of each word in the new word slot value is marked as update operation marking data, denoted update; each word that is unrelated to the dialog state is marked with padding operation marking data, denoted padnone; and for a word slot that has already appeared in the dialog state whose word slot value has not changed, the dialog state operation marking data of each word in that word slot value is marked as carryover operation marking data, denoted carryover.
Taking the embodiment shown in fig. 2 as an example, according to the dialogue statement 205 and its context information (dialogue statements 201-204), the dialogue state operation marking data of each word of "tomorrow" in the dialogue statement 205 is marked as add operation marking data (add), and the other words in the dialogue statement 205 are marked as padding operation marking data (padnone).
Note that the user intention in the dialogue sentence may also be labeled by the above method.
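For illustration, the labeling scheme described above can be written out as a small Python sketch. The label ids for delete, update and padnone follow the example in fig. 5; the remaining ids, the variable names and the word segmentation are illustrative assumptions rather than part of the embodiment.

```python
# Dialog state operation labels assigned to each word of a dialogue sentence.
# Ids 0, 1 and 3 follow the example in Fig. 5; ids 2 and 4 are assumed here.
OPERATION_LABELS = {
    "delete":    0,  # word slot value discarded from the dialog state
    "update":    1,  # word slot value of an existing word slot has changed
    "add":       2,  # word slot value appears in the dialog state for the first time
    "padnone":   3,  # word unrelated to the dialog state (padding)
    "carryover": 4,  # word slot value of an existing word slot is unchanged
}

# Fig. 2 style example: in the current-turn sentence only the words of "tomorrow"
# carry new dialog state information, so they are labeled "add" and the rest "padnone".
words  = ["tomorrow", "?"]        # hypothetical word segmentation of dialogue sentence 205
labels = ["add", "padnone"]       # one dialog state operation label per word
```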
Alternatively, a dialogue state tracking model may be trained in advance so that it learns, in combination with historical context information samples, to predict the dialogue state operation for each word in the current-turn dialogue statement samples; the dialogue statement and the context information of the current turn are then input to the dialogue state tracking model to obtain the dialogue state operation annotation data of each word in the dialogue statement. The dialog state tracking model may be trained based on ERNIE (Enhanced Representation through Knowledge Integration), a knowledge-enhanced pre-trained language model.
Step 104, determining the dialog state of the current turn according to the dialog state operation marking data and the historical dialog state.
In some embodiments of the present application, the operation annotation data of the word slot values in the dialogue statement may be determined from the dialogue state operation annotation data, and the historical dialog state may then be updated based on each word slot value, the word slot it belongs to, and its operation annotation data, to obtain the dialog state of the current turn. Taking the embodiment shown in fig. 2 as an example, the dialog state 213 of the current turn is determined from the historical dialog state 212 and the dialog state operation marking data acquired in step 103: the user intent (intent) is "check the weather", the word slot value of the place word slot (slots) is "city A", and the word slot value of the time word slot (slots) is "tomorrow".
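To make this update step concrete, the following is a minimal Python sketch of applying per-slot operations derived from the word-level labels to the historical dialog state. The dictionary-based state representation and the function name are assumptions made for illustration only.

```python
def apply_operations(history_state: dict, slot_operations: dict) -> dict:
    """Derive the current-turn dialog state from the historical state and the
    per-slot operations extracted from the word-level operation labels.

    history_state   e.g. {"intent": "check the weather", "slots": {"place": "city A"}}
    slot_operations e.g. {"time": ("add", "tomorrow")}   # slot -> (operation, value)
    """
    state = {"intent": history_state.get("intent"),
             "slots": dict(history_state.get("slots", {}))}
    for slot, (operation, value) in slot_operations.items():
        if operation in ("add", "update"):
            state["slots"][slot] = value       # new or changed word slot value
        elif operation == "delete":
            state["slots"].pop(slot, None)     # word slot value discarded from the state
        # "carryover" and "padnone" leave the historical value untouched
    return state


# Fig. 2 style usage: the historical state 212 holds place = "city A"; the current
# turn adds time = "tomorrow", yielding the dialog state 213 of the current turn.
state_213 = apply_operations(
    {"intent": "check the weather", "slots": {"place": "city A"}},
    {"time": ("add", "tomorrow")},
)
```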
According to the dialogue state tracking method of the embodiments of the present application, the operations on the dialogue state are represented systematically through the dialogue state operation marking data of each word in the dialogue statement. The dialog state of the current turn is determined based on the dialog state operation marking data and the historical dialog state, so the dialog state can be tracked without relying on scene-specific information. The dialogue state tracking method is decoupled from the scenes, generalizes well, and can be used universally in various scenes.
Fig. 3 is a flowchart illustrating another dialog state tracking method according to an embodiment of the present application. As shown in fig. 3, the dialog state tracking method may include the steps of:
step 301, obtaining the dialog sentences of the current turn.
Step 302, session history data is obtained. The dialog history data includes context information and historical dialog states for the dialog statements.
Step 303, inputting the dialogue sentences and the context information into a preset dialogue state tracking model. The dialogue state tracking model has previously learned the predictive ability of the dialogue state operation on each word in the current turn of dialogue statement samples in combination with the historical context information samples.
As an example, the dialog state tracking model may be constructed based on a pre-trained model, such as ERNIE (Enhanced Representation through Knowledge Integration).
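A minimal sketch of such a model is shown below: a generic pre-trained encoder followed by a per-token classification head that outputs one dialog state operation label per word. The encoder is passed in as an abstract module assumed to return per-token hidden states; the hidden size, the number of labels and the class name are illustrative assumptions and do not reflect the exact architecture of the embodiment.

```python
import torch
import torch.nn as nn


class DialogStateOperationTagger(nn.Module):
    """Predicts a dialog state operation label for each token of the current-turn
    dialogue sentence, conditioned on the concatenated context information."""

    def __init__(self, encoder: nn.Module, hidden_size: int = 768, num_labels: int = 5):
        super().__init__()
        self.encoder = encoder                      # e.g. a pre-trained ERNIE-style encoder
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Per-token representations: (batch, seq_len, hidden_size).
        hidden = self.encoder(input_ids, attention_mask)
        # One operation score vector per token: (batch, seq_len, num_labels).
        return self.classifier(hidden)

    @torch.no_grad()
    def predict(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # One operation label id per token of "context information + current-turn sentence".
        return self.forward(input_ids, attention_mask).argmax(dim=-1)
```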
In an optional implementation, the dialogue state tracking model may be a model obtained by performing knowledge distillation on a trained pre-training model, so as to meet the requirements of online application: while the performance of the dialogue state tracking model is preserved, the amount of computation is reduced and the efficiency of the dialogue state tracking model is improved.
Step 304, obtaining dialogue state operation marking data of each word in the dialogue statement output by the dialogue state tracking model.
As an example, fig. 4 is a schematic diagram of a dialog provided in an embodiment of the present application. As shown in fig. 4, assume that the dialog sentence 405 is the dialog sentence of the current turn and the historical dialog state before the dialog sentence 405 is the dialog state 412, in which the user intent (intent) is "check the weather" and the word slot value of the place word slot (slots) is "city A". As shown in FIG. 5, the dialogue statement 405 "Don't look up city A, look up city B instead" and the context information of the dialogue statement 405 (dialogue statements 401-404) are input into the preset dialogue state tracking model. The dialog state tracking model outputs dialog state operation annotation data 501 for each word in the dialogue statement 405. "City A" is a word slot value discarded from the place word slot in the dialog state, so delete operation labeling data (i.e., the label 0 shown in fig. 5) is output for the characters of "city A"; "city B" is the updated word slot value after the place word slot changes, so update operation labeling data (i.e., the label 1 shown in fig. 5) is output for the characters of "city B". In addition, padding operation annotation data (i.e., the label 3 shown in fig. 5) is output for the words in the dialogue sentence 405 that are unrelated to the dialog state.
Step 305, determining the operation marking data of the word slot values in the dialogue statement according to the dialogue state operation marking data.
As an example, taking the embodiment shown in fig. 4 and fig. 5, the dialog sentence 405 "Don't look up city A, look up city B instead" contains "city A" and "city B". According to the dialog state operation labeling data 501, the operation labeling data of the word slot value "city A" in the dialog sentence is determined to be delete operation labeling data (i.e., the label 0 shown in fig. 5), and the operation labeling data of "city B" is update operation labeling data (i.e., the label 1 shown in fig. 5).
Step 306, updating the historical dialog state based on the word slot in which each word slot value is located and the operation marking data of the word slot value in the dialogue sentence, to obtain the dialog state of the current turn.
Taking the embodiment shown in fig. 4 and fig. 5 as an example, the historical dialog state 412 is updated based on the operation labeling data of the word slot values "city A" and "city B" and the word slot in which they are located, and the dialog state 413 of the current turn is obtained, in which the user intent (intent) is "check the weather" and the word slot value of the place word slot (slots) is updated from "city A" to "city B".
In the embodiment of the present application, step 301 and step 302 may be implemented by using any one of the manners in the embodiments of the present application, and this application is not specifically limited and will not be described again.
According to the dialogue state tracking method of the embodiments of the present application, the dialogue state operation labeling data of each word in the dialogue sentence is obtained through the dialogue state tracking model, and the dialog state of the current turn is determined based on the word slot in which each word slot value is located and the operation labeling data of the word slot value in the dialogue sentence. The dialogue state can thus be tracked without relying on scene-specific information, yielding a dialogue state tracking method that is decoupled from the scene, further strengthening its generalization so that it can be used universally in various scenes.
Fig. 6 is a flowchart illustrating a method for training a dialog state tracking model according to an embodiment of the present application. It should be noted that the dialog state tracking model training method according to the embodiment of the present application may be applied to the dialog state tracking model training apparatus according to the embodiment of the present application, and the dialog state tracking model training apparatus may be configured on an electronic device.
As shown in fig. 6, the dialog state tracking model training method may include the following steps:
step 601, obtaining a plurality of turns of dialogue statement samples and operation label real data of each word in each dialogue statement sample.
Step 602, obtaining the dialog statement samples of the current round and the context information samples of the dialog statement samples of the current round from the dialog statement samples of the multiple rounds.
Step 603, inputting the dialog statement samples and the context information samples of the current round into the pre-training model, and obtaining dialog state operation labeling data of each word in the dialog statement samples of the current round.
In some embodiments of the present application, the pre-training model may be ERNIE (Enhanced Representation through Knowledge Integration), a knowledge-enhanced pre-trained language model.
Step 604, determining the operation label real data of each word in the current turn of dialogue statement samples from the operation label real data of each word in each dialogue statement sample.
Step 605, training a pre-training model according to the dialogue state operation labeling data of each word in the dialogue statement sample of the current round and the operation label real data of each word in the dialogue statement sample of the current round, acquiring model parameters, and generating a dialogue state tracking model according to the model parameters.
In some embodiments of the present application, the pre-training model may be trained by computing a loss function over the dialogue state operation labeling data of each word in the current-turn dialogue statement samples and the real operation label data of each word in the current-turn dialogue statement samples; the model parameters are then obtained, and the dialogue state tracking model is generated according to the model parameters.
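As one possible realization of this training step, the sketch below computes a per-token cross-entropy loss between the predicted operation logits and the real operation labels of each word. The ignore index for positions excluded from the loss (for example, context tokens) and the optimizer handling are assumptions for illustration.

```python
import torch.nn as nn


def train_step(model, optimizer, input_ids, attention_mask, label_ids):
    """One optimization step on a batch of current-turn dialogue statement samples.

    label_ids holds the real operation label id of each word in the sample;
    positions excluded from the loss (e.g. context tokens) are set to -100.
    """
    model.train()
    logits = model(input_ids, attention_mask)           # (batch, seq_len, num_labels)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), label_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```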
According to this dialogue state tracking model training method, the pre-training model is trained on the dialogue state operation labeling data of each word in the current-turn dialogue statement samples and the real operation label data of each word in those samples to generate the dialogue state tracking model. The dialogue state tracking model thereby learns to predict the dialogue state operation for each word in the current-turn dialogue statement samples in combination with historical context information samples, improving performance on the dialogue state tracking task.
The dialogue state tracking model generated from the pre-training model has a large number of parameters and requires a large amount of computation in application. To optimize its performance in deployment, knowledge distillation can be applied to the dialogue state tracking model, which reduces the amount of computation and improves efficiency while preserving the model's performance. As an example, fig. 7 is a schematic flowchart of another training method for a dialog state tracking model provided in an embodiment of the present application. As shown in fig. 7, the dialog state tracking model training method may include the following steps:
step 701, obtaining multiple turns of dialogue statement samples and operation tag real data of each word in each dialogue statement sample.
Step 702, obtaining the dialog statement samples of the current round and the context information samples of the dialog statement samples of the current round from the dialog statement samples of the multiple rounds.
Step 703, inputting the dialog statement samples of the current round and the context information samples into the pre-training model, and obtaining dialog state operation labeling data of each word in the dialog statement samples of the current round.
Step 704, determining the operation label real data of each word in the current turn of dialogue statement samples from the operation label real data of each word in each dialogue statement sample.
Step 705, training a pre-training model according to the dialogue state operation labeling data of each word in the current turn of dialogue statement samples and the operation label real data of each word in the current turn of dialogue statement samples, acquiring model parameters, and generating a dialogue state tracking model according to the model parameters.
Step 706, constructing a bidirectional long short-term memory network (BiLSTM) model.
Step 707, performing knowledge distillation on the BiLSTM model by using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, and using the knowledge-distilled BiLSTM model in the dialogue state tracking task.
As an example, as shown in FIG. 8, knowledge distillation is performed on the BiLSTM model by using the dialogue state tracking model, and the knowledge-distilled BiLSTM model is obtained.
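A minimal sketch of one common distillation setup is given below: the trained dialog state tracking model acts as the teacher and a lighter BiLSTM tagger as the student, trained on a mixture of soft targets (the teacher's per-token logits) and the real operation labels. The temperature, the mixing weight and the network sizes are assumptions; the embodiment does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMTagger(nn.Module):
    """Lightweight student: embeddings + bidirectional LSTM + per-token classifier."""

    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 256, num_labels: int = 5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.bilstm(self.embedding(input_ids))
        return self.classifier(hidden)          # (batch, seq_len, num_labels)


def distillation_loss(student_logits, teacher_logits, label_ids,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Soft-target KL term against the teacher plus hard-target cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        label_ids.reshape(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1 - alpha) * hard
```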
According to the training method of the dialogue state tracking model, a pre-training model is trained according to the dialogue state operation labeling data of each word in the current-turn dialogue statement samples and the real operation label data of each word in those samples, and the dialogue state tracking model is generated. Knowledge distillation is then performed on the BiLSTM model using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, which is used in the dialogue state tracking task; this preserves the performance of the dialogue state tracking model while reducing the amount of model computation, further improving efficiency in the dialogue state tracking task.
Fig. 9 is a block diagram of a dialog state tracking device according to an embodiment of the present application. As shown in fig. 9, the apparatus may include a first obtaining module 901, a second obtaining module 902, a third obtaining module 903, and a determining module 904.
Specifically, the first obtaining module 901 is configured to obtain the dialog statements of the current turn.
A second obtaining module 902, configured to obtain session history data; the dialogue history data includes context information of dialogue statements and historical dialogue states.
A third obtaining module 903, configured to obtain, according to the dialog statement and the context information, dialog state operation tagging data of each word in the dialog statement.
And the determining module 904 is configured to determine a current turn of the dialog state according to the dialog state operation marking data and the historical dialog state.
In some embodiments of the present application, the third obtaining module 903 is further configured to: inputting a conversation statement and context information into a preset conversation state tracking model; the conversation state tracking model learns the prediction capability of carrying out conversation state operation on each word in the current turn of conversation statement samples by combining historical context information samples in advance; and acquiring dialogue state operation marking data of each word in the dialogue sentences output by the dialogue state tracking model.
In some embodiments of the present application, the dialog state tracking model is constructed based on a pre-trained model.
In some embodiments of the present application, the dialogue state tracking model is a model obtained by knowledge distillation of a trained pre-training model.
In some embodiments of the present application, the determining module 904 is further configured to: according to the dialogue state operation marking data, determining operation marking data of a word slot value in a dialogue sentence; and updating the historical conversation state based on the word slot in which the word slot value is positioned and the operation marking data of the word slot value in the conversation sentence to obtain the conversation state of the current turn.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the dialogue state tracking device, the dialogue state operation labeling data of each word in the dialogue sentence is obtained through the dialogue state tracking model, the historical dialogue state is updated based on the word slot in which each word slot value is located and the operation labeling data of the word slot value in the dialogue sentence, and the dialogue state of the current turn is determined. The dialogue state can thus be tracked without relying on scene-specific information, yielding a dialogue state tracking approach that is decoupled from the scene, enhancing the generalization of the device so that it can be used universally in various scenes.
Fig. 10 is a block diagram of a dialog state tracking model training apparatus according to an embodiment of the present application. As shown in fig. 10, the apparatus may include: a first obtaining module 1001, a second obtaining module 1002, a third obtaining module 1003, a determining module 1004, and a training module 1005.
Specifically, the first obtaining module 1001 is configured to obtain multiple turns of dialogue statement samples and operation tag real data of each word in each dialogue statement sample.
The second obtaining module 1002 is configured to obtain a current turn of dialogue statement samples and a current turn of context information samples of dialogue statement samples from the current turn of dialogue statement samples.
A third obtaining module 1003, configured to input the dialog statement sample and the context information sample of the current round to the pre-training model, and obtain dialog state operation labeling data of each word in the dialog statement sample of the current round.
A determining module 1004, configured to determine, from the operation tag real data of each word in each dialogue statement sample, the operation tag real data of each word in the current round of dialogue statement sample.
The training module 1005 is configured to train a pre-training model according to the dialogue state operation labeling data of each word in the current turn of dialogue statement samples and the operation label real data of each word in the current turn of dialogue statement samples, acquire model parameters, and generate a dialogue state tracking model according to the model parameters.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the dialogue state tracking model training device, the pre-training model is trained on the dialogue state operation labeling data of each word in the current-turn dialogue statement samples and the real operation label data of each word in those samples to generate the dialogue state tracking model, so that the model learns to predict dialogue state operations and performance on the dialogue state tracking task is improved.
Fig. 11 is a block diagram of another dialog state tracking model training apparatus according to an embodiment of the present application. As shown in fig. 11, on the basis of the above embodiment, the dialog state tracking device may further include a construction module 1106 and a distillation module 1107.
Specifically, the building module 1106 is used for building a bidirectional long short-term memory network (BiLSTM) model.
The distillation module 1107 is configured to perform knowledge distillation on the BiLSTM model by using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, and to use the knowledge-distilled BiLSTM model in the dialogue state tracking task.
The modules 1101-1105 in FIG. 11 have the same functions and structures as the modules 1001-1005 in FIG. 10.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the dialogue state tracking model training device, a pre-training model is trained according to the dialogue state operation labeling data of each word in the current-turn dialogue statement samples and the real operation label data of each word in those samples, and a dialogue state tracking model is generated. Knowledge distillation is performed on the BiLSTM model using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, which is used in the dialogue state tracking task, so that the performance of the dialogue state tracking model is preserved, the amount of model computation is reduced, and the efficiency in the dialogue state tracking task is further improved.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
As shown in fig. 12, fig. 12 is a block diagram of an electronic device for implementing a dialog state tracking method or a dialog state tracking model training method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 12, the electronic apparatus includes: one or more processors 1201, a memory 1202, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 12 illustrates an example with one processor 1201.
Memory 1202 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform a dialog state tracking method or a dialog state tracking model training method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform a dialog state tracking method or a dialog state tracking model training method provided by the present application.
The memory 1202 is a non-transitory computer readable storage medium, and can be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the dialog state tracking method or the dialog state tracking model training method in the embodiments of the present application (for example, the first obtaining module 901, the second obtaining module 902, the third obtaining module 903, and the determining module 904 shown in fig. 9, and the first obtaining module 1101, the second obtaining module 1102, the third obtaining module 1103, the determining module 1104, the training module 1105, the constructing module 1106, and the distilling module 1107 shown in fig. 11). The processor 1201 executes various functional applications of the server and data processing, i.e., implementing the dialog state tracking method or the dialog state tracking model training method in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory 1202.
The memory 1202 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the electronic device of the dialogue state tracking method or the dialogue state tracking model training method, or the like. Further, the memory 1202 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1202 may optionally include memory located remotely from the processor 1201 which may be connected via a network to an electronic device for implementing a dialog state tracking method or a dialog state tracking model training method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the dialog state tracking method or the dialog state tracking model training method may further include: an input device 1203 and an output device 1204. The processor 1201, the memory 1202, the input device 1203, and the output device 1204 may be connected by a bus or other means, and the bus connection is exemplified in fig. 12.
The input device 1203 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the dialog state tracking method or the dialog state tracking model training method, such as an input device like a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 1204 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that, when executed by a processor, implement the dialog state tracking method or the dialog state tracking model training method described in the above embodiments. The one or more computer programs are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain. It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

1. A dialog state tracking method, comprising:
obtaining conversation sentences of the current turn;
obtaining conversation historical data; the dialogue history data comprises context information and historical dialogue states of the dialogue sentences;
acquiring dialogue state operation marking data of each word in the dialogue sentences according to the dialogue sentences and the context information;
and determining the conversation state of the current turn according to the conversation state operation marking data and the historical conversation state.
2. The method of claim 1, wherein the obtaining dialog state operation annotation data for each word in the dialog statement according to the dialog statement and the context information comprises:
inputting the dialogue statement and the context information into a preset dialogue state tracking model; the conversation state tracking model learns the prediction capability of carrying out conversation state operation on each word in the current turn of conversation statement samples by combining historical context information samples in advance;
and acquiring dialogue state operation marking data of each word in the dialogue statement output by the dialogue state tracking model.
3. The method of claim 2, wherein the dialog state tracking model is constructed based on a pre-trained model.
4. The method of claim 3, wherein the dialogue state tracking model is a model obtained by performing knowledge distillation on the trained pre-training model.
5. The method of claim 1, wherein the determining the dialog state for the current turn based on the dialog state operational annotation data and the historical dialog state comprises:
determining operation labeling data of a word slot value in the dialogue sentence according to the dialogue state operation labeling data;
and updating the historical conversation state based on the word slot in which the word slot value is positioned and the operation marking data of the word slot value in the conversation sentence to obtain the conversation state of the current turn.
6. A method of training a dialog state tracking model for use in a dialog state tracking task, the method comprising:
obtaining a plurality of turns of dialogue statement samples and operation label real data of each word in each dialogue statement sample;
obtaining a dialog statement sample of the current turn and a context information sample of the dialog statement sample of the current turn from the dialog statement samples of the multiple turns;
inputting the current turn of dialogue statement samples and the context information samples into a pre-training model to obtain dialogue state operation labeling data of each word in the current turn of dialogue statement samples;
determining the operation label real data of each word in the current turn of conversation statement samples from the operation label real data of each word in each conversation statement sample;
and training the pre-training model according to the dialogue state operation labeling data of each word in the dialogue statement sample of the current round and the operation label real data of each word in the dialogue statement sample of the current round, acquiring model parameters, and generating the dialogue state tracking model according to the model parameters.
7. The method of claim 6, further comprising:
constructing a bidirectional long short-term memory network (BiLSTM) model;
and performing knowledge distillation on the BiLSTM model by using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, and using the knowledge-distilled BiLSTM model in a dialogue state tracking task.
8. A dialog state tracking device, comprising:
the first acquisition module is used for acquiring the conversation sentences of the current turn;
the second acquisition module is used for acquiring conversation historical data; the dialogue history data comprises context information and historical dialogue states of the dialogue sentences;
a third obtaining module, configured to obtain, according to the dialog statement and the context information, dialog state operation tagging data of each word in the dialog statement;
and the determining module is used for determining the conversation state of the current turn according to the conversation state operation marking data and the historical conversation state.
9. The apparatus of claim 8, wherein the third obtaining module is further configured to:
input the dialogue statement and the context information into a preset dialogue state tracking model, wherein the dialogue state tracking model has learned in advance, by combining historical context information samples, to predict a dialogue state operation for each word in a current-turn dialogue statement sample; and
acquire the dialogue state operation annotation data of each word in the dialogue statement output by the dialogue state tracking model.
10. The apparatus of claim 9, wherein the dialogue state tracking model is constructed based on a pre-trained model.
11. The apparatus of claim 10, wherein the dialogue state tracking model is a knowledge distillation model of the trained pre-trained model.
12. The apparatus of claim 8, wherein the determining module is further configured to:
determine operation annotation data of a word slot value in the dialogue statement according to the dialogue state operation annotation data; and
update the historical dialogue state based on the word slot in which the word slot value is located and the operation annotation data of the word slot value in the dialogue statement, to obtain the dialogue state of the current turn.
13. A dialogue state tracking model training apparatus, comprising:
a first obtaining module, configured to obtain multiple turns of dialogue statement samples and ground-truth operation label data of each word in each dialogue statement sample;
a second obtaining module, configured to obtain, from the multiple turns of dialogue statement samples, a current-turn dialogue statement sample and a context information sample of the current-turn dialogue statement sample;
a third obtaining module, configured to input the current-turn dialogue statement sample and the context information sample into a pre-trained model to obtain dialogue state operation annotation data of each word in the current-turn dialogue statement sample;
a determining module, configured to determine, from the ground-truth operation label data of each word in each dialogue statement sample, the ground-truth operation label data of each word in the current-turn dialogue statement sample; and
a training module, configured to train the pre-trained model according to the dialogue state operation annotation data of each word in the current-turn dialogue statement sample and the ground-truth operation label data of each word in the current-turn dialogue statement sample, acquire model parameters, and generate the dialogue state tracking model according to the model parameters.
14. The apparatus of claim 13, further comprising:
a construction module, configured to construct a bidirectional long short-term memory network (BiLSTM) model; and
a distillation module, configured to perform knowledge distillation on the BiLSTM model using the dialogue state tracking model to obtain a knowledge-distilled BiLSTM model, and to use the knowledge-distilled BiLSTM model in a dialogue state tracking task.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5 or to perform the method of any one of claims 6 to 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 5 or the method of any one of claims 6 to 7.
17. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 5 or carries out the steps of the method of any one of claims 6 to 7.
CN202111647412.0A 2021-12-29 2021-12-29 Dialogue state tracking method, model training method and device and electronic equipment Active CN114490968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111647412.0A CN114490968B (en) 2021-12-29 2021-12-29 Dialogue state tracking method, model training method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111647412.0A CN114490968B (en) 2021-12-29 2021-12-29 Dialogue state tracking method, model training method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114490968A true CN114490968A (en) 2022-05-13
CN114490968B CN114490968B (en) 2022-11-25

Family

ID=81497402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111647412.0A Active CN114490968B (en) 2021-12-29 2021-12-29 Dialogue state tracking method, model training method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114490968B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU5054500A (en) * 1999-06-11 2001-01-02 Telstra Corporation Limited A method of developing an interactive system
CN110555095A (en) * 2018-05-31 2019-12-10 北京京东尚科信息技术有限公司 Man-machine conversation method and device
CN110609618A (en) * 2019-08-26 2019-12-24 杭州城市大数据运营有限公司 Man-machine conversation method and device, computer equipment and storage medium
WO2020073248A1 (en) * 2018-10-10 2020-04-16 华为技术有限公司 Human-computer interaction method and electronic device
CN111353035A (en) * 2020-03-11 2020-06-30 镁佳(北京)科技有限公司 Man-machine conversation method and device, readable storage medium and electronic equipment
CN112905749A (en) * 2021-03-12 2021-06-04 电子科技大学 Task-type multi-turn dialogue method based on intention-slot value rule tree
CN113282736A (en) * 2021-07-08 2021-08-20 北京百度网讯科技有限公司 Dialogue understanding and model training method, device, equipment and storage medium
CN113297364A (en) * 2021-06-07 2021-08-24 吉林大学 Natural language understanding method and device for dialog system

Also Published As

Publication number Publication date
CN114490968B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN112560912B (en) Classification model training method and device, electronic equipment and storage medium
CN111507104B (en) Method and device for establishing label labeling model, electronic equipment and readable storage medium
CN111061868B (en) Reading method prediction model acquisition and reading method prediction method, device and storage medium
CN112036509A (en) Method and apparatus for training image recognition models
CN111783451A (en) Method and apparatus for enhancing text samples
CN111325020A (en) Event argument extraction method and device and electronic equipment
CN111079945B (en) End-to-end model training method and device
KR102630243B1 (en) method and device for predicting punctuation
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
CN111339759A (en) Method and device for training field element recognition model and electronic equipment
CN111950291A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN111680517A (en) Method, apparatus, device and storage medium for training a model
EP3896595A1 (en) Text key information extracting method, apparatus, electronic device, storage medium, and computer program product
CN112560499B (en) Pre-training method and device for semantic representation model, electronic equipment and storage medium
CN111681647A (en) Method, apparatus, device and storage medium for recognizing word slot
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111177339A (en) Dialog generation method and device, electronic equipment and storage medium
CN111090991A (en) Scene error correction method and device, electronic equipment and storage medium
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN110909136A (en) Satisfaction degree estimation model training method and device, electronic equipment and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN115688802B (en) Text risk detection method and device
CN114490968B (en) Dialogue state tracking method, model training method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant