CN111090728B

CN111090728B - Dialogue state tracking method and device and computing equipment

Info

Publication number: CN111090728B
Application number: CN201911284102.XA
Authority: CN
Inventors: 石智中; 朱峰; 翟羽佳
Original assignee: Chezhi Interconnection Beijing Technology Co ltd
Current assignee: Chezhi Interconnection Beijing Technology Co ltd
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2023-05-26
Anticipated expiration: 2039-12-13
Also published as: CN111090728A

Abstract

The invention discloses a dialogue state tracking method, a dialogue state tracking device and a computing device. The computing device is connected with a data storage device that stores mode information of at least one dialogue service mode; each mode information comprises a service name, a service description, at least one intent and at least one slot, and each intent comprises an intent name, an intent description and at least one request slot. The method comprises the following steps: identifying a current intent based on the dialogue sentence of the current round and all stored intents; identifying a current request slot based on the dialogue sentence of the current round and all stored request slots; identifying a slot value of a currently activated slot based at least on the dialogue sentence of the current round and all stored slots; and determining the current intent, the current request slot and the slot value of the currently activated slot as a current dialogue state, so as to respond to the user sentence based on the current dialogue state.

Description

Dialogue state tracking method and device and computing equipment

Technical Field

The present invention relates to the field of natural language processing, and in particular, to a method and apparatus for tracking a dialogue state in a man-machine dialogue, and a computing device.

Background

Multi-round dialogue technology is one of the core technologies of man-machine dialogue for realizing intelligent man-machine interaction systems. It has a wide range of application scenarios and can be used directly in specific business processing, such as hotel reservation and train ticket reservation.

How to track the dialog state in a multi-round dialog, so that the dialog can be smoothly performed, is one of the key problems. The existing dialog state tracking technology generally updates dialog states based on rules, has poor generalization and is difficult to apply to complex dialog contexts.

Therefore, how to efficiently and accurately track the dialogue state so as to further generate a reasonable system response is a technical problem to be solved in the current man-machine dialogue system design.

Disclosure of Invention

The present invention has been made in view of the above problems, and it is an object of the present invention to provide a dialog state tracking method, apparatus and computing device that overcomes or at least partially solves the above problems.

According to one aspect of the present invention, there is provided a dialogue state tracking method, executed in a computing device connected to a data storage device that stores pattern information of at least one dialogue service pattern, each pattern information including a service name, a service description, at least one intent, and at least one slot, each intent including an intent name, an intent description, and at least one request slot, the method comprising: identifying a current intent based on the dialogue sentence of the current round and all stored intents; identifying a current request slot based on the dialogue sentence of the current round and all stored request slots; identifying a slot value of a currently activated slot based at least on the dialogue sentence of the current round and all stored slots; and determining the current intent, the current request slot, and the slot value of the currently activated slot as a current dialogue state, so as to respond to the user sentence based on the current dialogue state.

Optionally, in the dialog state tracking method according to the present invention, the identifying the current intention based on the dialog sentence of the current round and all the stored intents includes: acquiring a first semantic representation of a dialogue sentence of a current round; acquiring semantic representations of each intention, and splicing the semantic representations of all the intents to obtain a second semantic representation; after the first semantic representation and the second semantic representation are spliced, the first semantic representation and the second semantic representation are input into a preset intention classifier for processing, and the current intention is obtained.

Optionally, in the dialogue state tracking method according to the present invention, the identifying the current request slot based on the dialogue statement of the current round and all the stored request slots includes: acquiring a first semantic representation of a dialogue sentence of a current round; acquiring semantic representations of each request slot, and splicing the semantic representations of all the request slots to obtain a third semantic representation; and after the first semantic representation and the third semantic representation are spliced, inputting the first semantic representation and the third semantic representation into a preset request slot classifier for processing, and obtaining the current request slot.

Optionally, in the dialogue state tracking method according to the present invention, the identifying the slot value of the currently activated slot at least based on the dialogue statement of the current round and all the stored slots includes: acquiring a first semantic representation of a dialogue sentence of a current round; acquiring semantic representations of each classification slot, and splicing the semantic representations of all classification slots to obtain a fourth semantic representation; after the first semantic representation and the fourth semantic representation are spliced, inputting a preset first state classifier for processing to obtain an activated classification slot; splicing the semantic representation of the activated classified slots and the semantic representations of all possible slot values of the activated classified slots to obtain a fifth semantic representation; and after the first semantic representation and the fifth semantic representation are spliced, inputting a preset slot value classifier for processing to obtain the slot value of the currently activated classification slot.

Optionally, in the dialog state tracking method according to the present invention, the identifying the slot value of the currently active slot based on at least the dialog sentence of the current round and all the stored slots further includes: acquiring sixth semantic representations of dialogue sentences of all rounds; acquiring semantic representations of each non-classified slot, and splicing the semantic representations of all the non-classified slots to obtain a seventh semantic representation; after the sixth semantic representation and the seventh semantic representation are spliced, inputting a preset second state classifier for processing to obtain an activated non-classification slot; after splicing the sixth semantic representation and the activated semantic representation of the non-classified slot, inputting the sixth semantic representation and the activated semantic representation of the non-classified slot into a preset position predictor for processing to obtain a starting position and an ending position of the slot in a dialogue sentence; and acquiring the slot position value of the currently activated non-classified slot position based on the starting position and the ending position.
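The last two steps above, turning a predicted starting position and ending position into a slot value, can be illustrated with a minimal sketch. The function name `extract_span`, the whitespace tokenization, the example scores, and the rule that the ending position may not precede the starting position are all assumptions made for illustration; the source does not specify how the position predictor's output is decoded.

```python
import numpy as np


def extract_span(tokens, start_scores, end_scores):
    """Return the text between the most likely start and end positions.

    Assumption: the end position is searched only at or after the start
    position, so the extracted span is never empty.
    """
    start = int(np.argmax(start_scores))
    end = start + int(np.argmax(end_scores[start:]))
    return " ".join(tokens[start:end + 1])


# Hypothetical position-predictor scores for an illustrative user sentence.
tokens = "I want to fly from Beijing to Shanghai".split()
start_scores = np.array([0.0, 0.1, 0.0, 0.2, 0.1, 3.0, 0.1, 0.5])
end_scores = np.array([0.0, 0.0, 0.1, 0.0, 0.2, 2.5, 0.1, 0.4])

value = extract_span(tokens, start_scores, end_scores)
print(value)
```

With these scores, both positions point at the same token, so the extracted slot value is the single word at that position.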

Optionally, in the dialog state tracking method according to the present invention, the dialog sentences of the current round include a response sentence of a previous round and a user sentence of the current round.

Optionally, the dialog state tracking method according to the present invention further includes pre-generating a semantic representation of slots, request slots, and intent.

Optionally, in the dialog state tracking method according to the present invention, the semantic representation of the slot is generated as follows: for each slot, based on the dialogue service mode to which the slot belongs, the service description of the dialogue service mode, the slot name of the slot and the slot description of the slot are each processed through a preset neural network language model to obtain respective semantic representations, and all of these semantic representations are spliced to obtain the semantic representation of the slot.

Optionally, in the dialog state tracking method according to the present invention, the semantic representation of the request slot is generated as follows: for each request slot, based on the dialogue service mode to which the request slot belongs, the service description of the dialogue service mode, the slot name of the request slot and the slot description of the request slot are each processed through a preset neural network language model to obtain respective semantic representations, and all of these semantic representations are spliced to obtain the semantic representation of the request slot.

Optionally, in the dialog state tracking method according to the invention, the semantic representation of the intent is generated as follows: for each intention, based on a dialogue service mode to which the intention belongs, processing a service description of the dialogue service mode, an intention name of the intention and an intention description of the intention through a preset neural network language model respectively to obtain respective semantic representations, and splicing all the semantic representations to obtain the semantic representation of the intention.

Optionally, in the dialog state tracking method according to the present invention, the neural network language model is a BERT model.

According to another aspect of the present invention there is provided a dialog state tracking device residing in a computing apparatus connected to a data storage device having stored therein pattern information for at least one dialog service pattern, each pattern information including a service name, a service description, at least one intent, and at least one slot, each intent including an intent name, an intent description, and at least one request slot, the device comprising: an intention processing unit adapted to identify a current intention based on the dialogue sentence of the current round and the stored all intents; the request slot processing unit is suitable for identifying the current request slot based on the dialogue statement of the current round and all the stored request slots; the slot value processing unit is suitable for identifying the slot value of the current activated slot at least based on the dialogue statement of the current round and all stored slots; and the dialogue state generation unit is suitable for determining the current intention, the current request slot and the slot value of the current activation slot as the current dialogue state so as to respond to the user statement based on the current dialogue state.

According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the above-described method.

According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the above-described method.

Unlike the prior art, in which dialogue state tracking is performed based on rules, the present invention combines the mode information of the dialogue service with the dialogue data (the dialogue sentences of the current round and/or historical dialogue sentences) to automatically identify the intent, the request slot and the slot value of the activated slot, thereby improving the accuracy of dialogue state identification.

Furthermore, the method integrates the extraction and management of dialogue state information by identifying the activation state of the slot, can be flexibly applied to various dialogue scenes, and has better generalization.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set forth below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 shows a schematic diagram of a dialog state tracking system 100, according to one embodiment of the invention;

FIG. 2 shows a schematic diagram of a computing device 200 according to one embodiment of the invention;

FIG. 3 illustrates a flow chart of a dialog state tracking method 300, according to one embodiment of the invention;

FIG. 4 shows a schematic diagram of a dialog state tracking device 400 according to one embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The terms involved in the embodiments of the present invention are explained first as follows:

Intent: the purpose expressed by a user utterance during a man-machine dialogue, that is, what the user wants to obtain in the dialogue scenario. For example, when the user says "help me set an alarm clock" to an artificial intelligence assistant, the user's intent is to set an alarm clock. Accurately understanding the user's intent is a basic requirement for a smooth man-machine dialogue.

Slot: an element required by a predefined dialogue service in a man-machine dialogue system. For example, an alarm clock dialogue service requires slots such as the alarm time and whether the alarm repeats.

In the embodiments of the present invention, slots are further divided into two types: classified slots and non-classified slots. A classified slot is a slot that can only take one of a fixed number of values; for example, whether the alarm clock repeats can only be one of 'ring once', 'workday repetition' or 'daily repetition', so it is a classified slot. A non-classified slot is any slot that is not a classified slot; for example, the alarm time may be any time of day, so it is a non-classified slot.

Request slot: in the man-machine conversation process, the user requests slot information from the man-machine conversation system, for example, in the restaurant booking process, the user may request the system to return slot information such as specific location, business hours, etc. of the restaurant.

Dialogue state: the state of the system after one round of dialogue in a man-machine dialogue. It mainly consists of the current user intent, the current slot filling status (for both classified and non-classified slots), and whether the user is requesting information about certain slots (request slots).

Dialogue service mode (schema): mode information that defines a dialogue service, including a service name, a service description, at least one intent, and at least one slot, where each intent includes an intent name, an intent description, and at least one request slot. A man-machine dialogue system may include one or more dialogue service modes, and one dialogue service mode may include one or more intents.
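As a minimal sketch, the three components of the dialogue state can be held in a small record type. The class name `DialogueState`, its field names, and the example alarm time are hypothetical and chosen only for illustration; the source does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    """Dialogue state after one round (all names here are illustrative)."""
    intent: Optional[str] = None          # current user intent
    requested_slot: Optional[str] = None  # slot whose information the user requests, if any
    slot_values: Dict[str, str] = field(default_factory=dict)  # activated slot -> slot value


# Illustrative state after the user asks the assistant to set an alarm.
state = DialogueState(
    intent="set an alarm clock",
    slot_values={"alarm time": "7:00"},
)
print(state)
```

Keeping the requested slot optional reflects that a user does not request slot information in every round.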

An example of a schema for a flight service is as follows.

Service name: flight

Service description: find one-way or round-trip flights to a destination

Intents:

Intent 1, find one-way flight intent

Intent name: find a one-way flight

Intent description: search for information on one-way flights

Request slots: departure city, arrival city, journey time

Intent 2, find round-trip flight intent

……

Slots:

Slot 1, passengers slot

Slot name: passengers

Slot description: number of passengers

Classified slot: yes

Possible values: 1, 2, 3, 4

Slot 2, departure city slot

Slot name: departure city

Slot description: city where the journey starts

Classified slot: no

Slot 3, arrival city slot

……
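The flight schema above could be represented in code roughly as follows. This is a sketch under stated assumptions: the class names `Intent`, `Slot` and `ServiceSchema` and their field names are hypothetical, and only the entries spelled out in the example are included; the entries elided in the example are left out rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    name: str
    description: str
    request_slots: List[str]


@dataclass
class Slot:
    name: str
    description: str
    is_classified: bool  # True: only a fixed set of values is allowed
    possible_values: List[str] = field(default_factory=list)  # used by classified slots only


@dataclass
class ServiceSchema:
    service_name: str
    service_description: str
    intents: List[Intent]
    slots: List[Slot]


flight_schema = ServiceSchema(
    service_name="flight",
    service_description="find one-way or round-trip flights to a destination",
    intents=[
        Intent(
            name="find a one-way flight",
            description="search for information on one-way flights",
            request_slots=["departure city", "arrival city", "journey time"],
        ),
    ],
    slots=[
        Slot("passengers", "number of passengers", True, ["1", "2", "3", "4"]),
        Slot("departure city", "city where the journey starts", False),
    ],
)

classified = [s.name for s in flight_schema.slots if s.is_classified]
print(classified)
```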

FIG. 1 shows a schematic diagram of a dialog state tracking system 100, according to one embodiment of the invention. As shown in fig. 1, the dialog state tracking system 100 includes a user terminal 110 and a computing device 200.

The user terminal 110, that is, the terminal device used by a user, may be a personal computer such as a desktop computer or a notebook computer, or a mobile phone, a tablet computer, a multimedia device, a smart speaker, a smart wearable device, or the like, but is not limited thereto. The computing device 200 provides services to the user terminal 110 and may be implemented as a server, for example an application server or a Web server; it may also be implemented as a desktop computer, a notebook computer, a processor chip, a tablet computer, or the like, but is not limited thereto.

According to one embodiment, the computing device 200 may provide a man-machine dialogue service, and the user terminal 110 may establish a connection with the computing device 200 via the Internet, so that a user can conduct a man-machine dialogue with the computing device 200 through the user terminal 110. The user opens a browser or a man-machine dialogue application (APP), such as an artificial intelligence assistant, on the user terminal 110 and enters a user sentence as text, which the user terminal 110 sends to the computing device 200. After receiving the user sentence, the computing device 200 performs semantic recognition on it and returns an appropriate response sentence to the user terminal 110 according to the recognition result, thereby realizing the man-machine dialogue.

In one implementation, the user terminal 110 may also collect voice data of the user and perform speech recognition on it to obtain the user sentence; alternatively, the user terminal 110 may send the voice data to the computing device 200, which performs speech recognition on the voice data to obtain the user sentence.

The process of human-machine conversation typically has multiple rounds, and the computing device 200 needs to track the conversation state of the human-machine conversation in order to accurately respond to the user statement according to the current conversation state, so that the conversation can be smoothly performed.

In one embodiment, the dialog state tracking system 100 also includes a data store 120. The data storage 120 may be a relational database such as MySQL, ACCESS, etc., or a non-relational database such as NoSQL, etc.; the data storage device 120 may be a local database residing in the computing device 200, or may be a distributed database, such as HBase, disposed at a plurality of geographic locations, and in any case, the data storage device 120 is used to store data, and the specific deployment and configuration of the data storage device 120 is not limited by the present invention. The computing device 200 may connect with the data storage 120 and retrieve data stored in the data storage 120. For example, the computing device 200 may directly read the data in the data storage device 120 (when the data storage device 120 is a local database of the computing device 200), or may access the internet through a wired or wireless manner, and obtain the data in the data storage device 120 through a data interface.

In an embodiment of the invention, the data storage device 120 is adapted to store mode information of one or more dialogue service modes (schemas), each dialogue service mode corresponding to one dialogue service, for example a hotel reservation service, a flight reservation service, a train ticket reservation service, or an alarm clock service. In this way, the computing device 200 can provide a plurality of dialogue services based on the stored mode information of the plurality of dialogue service modes. The mode information of each dialogue service mode may include a service name, a service description, at least one intent, and at least one slot, each intent including an intent name, an intent description, and at least one request slot.

In addition, the data storage device 120 may also temporarily or permanently store the dialogue sentences of the man-machine dialogue system, including user sentences and the response sentences of the man-machine dialogue system to the user sentences. For a multi-round dialogue, in the embodiment of the invention, the dialogue sentence of the current round includes the response sentence of the previous round and the user sentence of the current round; the dialogue sentence of the first round therefore includes only the user sentence of the first round.
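The composition of the dialogue sentence of a round can be sketched as a small helper. The function name `build_round_input` and the plain-space separator are assumptions; a real language model input would typically use that model's own separator convention.

```python
from typing import Optional


def build_round_input(user_sentence: str,
                      previous_response: Optional[str] = None,
                      sep: str = " ") -> str:
    """Splice the previous response sentence with the current user sentence.

    In the first round there is no previous response, so the result is the
    user sentence alone.
    """
    if previous_response is None:
        return user_sentence
    return previous_response + sep + user_sentence


first_round = build_round_input("I want to book a flight")
later_round = build_round_input("Beijing", previous_response="Which city are you leaving from?")
print(first_round)
print(later_round)
```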

In one implementation, the computing device 200 may also pre-process pattern information based on all dialog service patterns (schemas) stored in the data store 120 to generate a semantic representation of each slot, a semantic representation of each request slot, and a semantic representation of each intent, respectively. Accordingly, the data storage 120 may also store semantic representations of these slots, request slots, and intent for dialog state tracking based on these semantic representations at the application stage.
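The pre-generation step can be sketched as follows for a single slot: the service description, slot name and slot description are encoded separately and the results are spliced. The function `toy_encode` is a deterministic hash-based stand-in used only so that the example runs without a trained model; it is not BERT and carries no semantic information. The embedding size `DIM` and all function names are likewise assumptions.

```python
import hashlib

import numpy as np

DIM = 8  # toy embedding size (assumption)


def toy_encode(text: str, dim: int = DIM) -> np.ndarray:
    """Stand-in for a neural network language model: a pseudo-embedding seeded by a hash."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)


def slot_representation(service_description: str,
                        slot_name: str,
                        slot_description: str) -> np.ndarray:
    """Encode the three texts separately, then splice (concatenate) the results."""
    parts = [
        toy_encode(service_description),
        toy_encode(slot_name),
        toy_encode(slot_description),
    ]
    return np.concatenate(parts)


rep = slot_representation(
    "find one-way or round-trip flights to a destination",
    "passengers",
    "number of passengers",
)
print(rep.shape)
```

Because the representations depend only on the stored schema text, they can be computed once and stored, which is what allows them to be reused at the application stage.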

The dialog state tracking method of the present invention may be implemented in computing device 200. FIG. 2 illustrates a block diagram of a computing device 200 according to one embodiment of the invention. As shown in FIG. 2, in a basic configuration 202, computing device 200 typically includes a system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processor 204 and the system memory 206.

Depending on the desired configuration, the processor 204 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 204 may include one or more levels of cache, such as a first level cache 210 and a second level cache 212, a processor core 214, and registers 216. The example processor core 214 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 218 may be used with the processor 204, or in some implementations, the memory controller 218 may be an internal part of the processor 204.

Depending on the desired configuration, system memory 206 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 206 may include an operating system 220, one or more applications 222, and program data 224. The application 222 is in effect a plurality of program instructions for instructing the processor 204 to perform a corresponding operation. In some implementations, the application 222 can be arranged to cause the processor 204 to operate with the program data 224 on an operating system.

Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to basic configuration 202 via bus/interface controller 230. The example output device 242 includes a graphics processing unit 248 and an audio processing unit 250. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 252. The example peripheral interface 244 may include a serial interface controller 254 and a parallel interface controller 256, which may be configured to facilitate communication via one or more I/O ports 258 and external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.). The example communication device 246 may include a network controller 260 that may be arranged to facilitate communication with one or more other computing devices 262 over a network communication link via one or more communication ports 264.

The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media in a modulated data signal, such as a carrier wave or other transport mechanism. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media.

In a computing device 200 according to the invention, an application 222 comprises a dialog state tracking device 400, the device 400 comprising a plurality of program instructions that may instruct the processor 204 to perform the dialog state tracking method 300.

Fig. 3 shows a flow chart of a dialog state tracking method 300 according to an embodiment of the invention. The method 300 is suitable for execution in a computing device, such as the computing device 200 described previously.

As shown in FIG. 3, the method 300 begins at step S310. In step S310, the current intent is identified based on the dialogue sentence of the current round and all intents stored in the data storage device. The dialogue sentence of the current round includes the response sentence of the previous round and the user sentence of the current round; the dialogue sentence of the first round includes only the user sentence of the first round.

In one implementation, the current intent recognition process is as follows:

First, a semantic representation of the dialogue sentence of the current round is acquired, referred to as the first semantic representation;

For example, the response sentence of the previous round and the user sentence of the current round may be spliced, and the resulting text is input to a preset neural network language model, whose output is the first semantic representation. The neural network language model may be a BERT model or another type of language model; the invention does not limit the specific language model, and a person skilled in the art can select one as needed. BERT (Bidirectional Encoder Representations from Transformers) is a language model that learns semantic representations by pre-training on a large-scale corpus, and it can produce high-quality semantic representation features for phrases and sentences.

Then, semantic representations of each intention are obtained, and semantic representations of all the intents are spliced to obtain a second semantic representation;

As previously described, the data storage device 120 stores pattern information for a plurality of dialogue service patterns, each pattern information including one or more intents. Therefore, the semantic representations corresponding to all intents under all pattern information are spliced, and the resulting semantic representation is the second semantic representation.

Finally, the first semantic representation and the second semantic representation are spliced and input into a preset intent classifier to obtain the current intent. Here, the intent classifier is a classifier trained in advance on training data, and its structure may be the well-known fully connected layer + softmax layer structure. The intent classifier outputs the intent with the highest probability among all intents, and in the invention this output is taken as the current intent.
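The three steps of intent recognition can be sketched as a single fully connected layer followed by softmax over the spliced representations. Everything concrete here is an assumption for illustration: the vector size, the random stand-in representations, the untrained random weights, and the placeholder intent names `intent_a` and `intent_b`. A real classifier would use weights learned from training data.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()


def classify_intent(first_rep, intent_reps, weights, bias, intent_names):
    """Splice the representations, apply one fully connected layer, then softmax."""
    second_rep = np.concatenate(intent_reps)            # second semantic representation
    features = np.concatenate([first_rep, second_rep])  # spliced classifier input
    probs = softmax(weights @ features + bias)
    return intent_names[int(np.argmax(probs))], probs


rng = np.random.default_rng(0)
dim = 4                                   # toy representation size (assumption)
intent_names = ["intent_a", "intent_b"]   # placeholder intent names
first_rep = rng.standard_normal(dim)      # stand-in for the encoded dialogue sentence
intent_reps = [rng.standard_normal(dim) for _ in intent_names]
weights = rng.standard_normal((len(intent_names), dim * (1 + len(intent_names))))
bias = np.zeros(len(intent_names))

current_intent, probs = classify_intent(first_rep, intent_reps, weights, bias, intent_names)
print(current_intent, probs)
```

Note that in this layout the classifier's input width grows with the number of stored intents, so adding an intent to the schema changes the shape of the weight matrix.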

In step S320, the current request slot is identified based on the dialogue statement of the current round and all the request slots stored.

In one implementation, the identification process of the current request slot is as follows:

First, the first semantic representation of the dialogue sentence of the current round is acquired;

Then, semantic representations of each request slot are obtained, and the semantic representations of all the request slots are spliced to obtain a third semantic representation;

As previously described, the data storage device 120 stores pattern information for a plurality of dialogue service patterns, each pattern information including one or more intents, and each intent including one or more request slots. Therefore, the semantic representations corresponding to all request slots of all intents under all pattern information are spliced, and the resulting semantic representation is the third semantic representation.

Finally, the first semantic representation and the third semantic representation are spliced and input into a preset request slot classifier to obtain the current request slot. Here, the request slot classifier is a classifier trained in advance on training data, and its structure may be the well-known fully connected layer + softmax layer structure. The request slot classifier outputs the request slot with the highest probability among all request slots.

In step S330, a slot value of the currently activated slot is identified based at least on the dialogue sentence of the current round and all stored slots. Here, "based at least on the dialogue sentence of the current round" covers two cases: 1. based on the dialogue sentence of the current round; 2. based on the dialogue sentences of all rounds (from the first user sentence of the dialogue to the user sentence of the current round). In addition, whether a slot is currently activated refers to whether the dialogue so far has mentioned information about that slot; if so, the slot is an activated slot, otherwise it is an inactive (closed) slot.
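Because the third semantic representation is spliced from the request slots of all intents under all pattern information, the classifier's output index has to be mapped back to a specific request slot. One way to keep that mapping stable is to enumerate the request slots in a fixed order, as in the sketch below. The nested dictionary layout, the function name `enumerate_request_slots`, and the example index are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical nested layout: service name -> intent name -> request slots.
schemas: Dict[str, Dict[str, List[str]]] = {
    "flight": {
        "find a one-way flight": ["departure city", "arrival city", "journey time"],
    },
}


def enumerate_request_slots(schemas: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str, str]]:
    """List every (service, intent, request slot) triple in a fixed order."""
    labels = []
    for service, intents in schemas.items():
        for intent, request_slots in intents.items():
            for slot in request_slots:
                labels.append((service, intent, slot))
    return labels


labels = enumerate_request_slots(schemas)
predicted_index = 1  # stand-in for the index of the highest classifier probability
print(labels[predicted_index])
```

The same order would be used both when splicing the representations and when reading off the classifier output.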

As described above, in the embodiment of the present invention, slots are divided into two types: a classified slot and a non-classified slot. The classified slots refer to slots which can only take a fixed number of values, and the non-classified slots refer to slots which are not classified slots.

In one implementation, for a classified slot, the identification process of the slot value of the currently activated classified slot is as follows:

First, the semantic representation of each classified slot is obtained, and the semantic representations of all the classified slots are spliced to obtain a fourth semantic representation;

As described above, the data storage device 120 stores pattern information of a plurality of dialogue service patterns, where each pattern information includes one or more slots. Therefore, the semantic representations corresponding to all the classified slots in all the pattern information are spliced, and the resulting semantic representation is the fourth semantic representation.

Secondly, the first semantic representation and the fourth semantic representation are spliced, and the spliced result is input into a preset first state classifier for processing to obtain the currently activated classified slot;

Here, the first state classifier is a classifier trained in advance on training data, and it may have a well-known structure (fully connected layer + softmax layer). The output of the first state classifier is the activated slot among all the classified slots.

Thirdly, splicing the semantic representation of the activated classified slot and the semantic representations of all possible slot values of the activated classified slot to obtain a fifth semantic representation;

Specifically, the semantic representation of the activated classified slot is obtained, and the semantic representation of each possible value of that classified slot is obtained respectively. All of these semantic representations are spliced, and the resulting semantic representation is the fifth semantic representation.

Finally, the first semantic representation and the fifth semantic representation are spliced, and the spliced result is input into a preset slot value classifier for processing to obtain the slot value of the currently activated classified slot.

Here, the slot value classifier is a classifier trained in advance on training data, and it may have a well-known structure (fully connected layer + softmax layer). The output of the slot value classifier is the slot value of the classified slot.
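The two-stage processing for classified slots (first find the activated slot, then pick its value) can be sketched as follows. This is only a sketch under stated assumptions: the schema, the vectors and the weight matrices are random toy data, the helper names are hypothetical, and the sketch always selects the top-scoring slot, whereas the description does not specify how the case of no activated slot is handled.

```python
import math
import random


def dense_softmax(features, weights):
    """Fully connected layer (bias omitted for brevity) followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def flatten(vectors):
    """Splice an iterable of vectors into one vector."""
    return [x for vec in vectors for x in vec]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
DIM = 3
# Hypothetical classified slots and their possible values.
slots = {"star_rating": ["3", "4", "5"], "price_range": ["cheap", "expensive"]}
slot_names = list(slots)
slot_reprs = {n: [rng.uniform(-1, 1) for _ in range(DIM)] for n in slot_names}
value_reprs = {n: [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in vals]
               for n, vals in slots.items()}
first_repr = [rng.uniform(-1, 1) for _ in range(DIM)]

# Stage 1: first state classifier on (first repr + fourth repr).
fourth_repr = flatten(slot_reprs[n] for n in slot_names)
state_weights = random_matrix(rng, len(slot_names), DIM + len(fourth_repr))
state_probs = dense_softmax(first_repr + fourth_repr, state_weights)
active_slot = slot_names[argmax(state_probs)]

# Stage 2: slot value classifier on (first repr + fifth repr).
fifth_repr = slot_reprs[active_slot] + flatten(value_reprs[active_slot])
value_weights = random_matrix(rng, len(slots[active_slot]),
                              DIM + len(fifth_repr))
value_probs = dense_softmax(first_repr + fifth_repr, value_weights)
slot_value = slots[active_slot][argmax(value_probs)]
```

Note that the length of the fifth representation depends on how many values the activated slot can take, so this sketch sizes the value classifier weights per slot; that sizing is a choice of the sketch, not something stated in the description.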

In one implementation, for a non-classified slot, the identification process of the slot value of the currently activated non-classified slot is as follows:

firstly, obtaining sixth semantic representations of dialogue sentences of all rounds;

For example, the response sentences and the user sentences of all rounds, including the current round, are spliced; the spliced text is input into a preset neural network language model for processing, and the output is the sixth semantic representation. The neural network language model may be a BERT model or another type of language model. The invention does not limit the specific language model, and those skilled in the art can select one as needed.

Then, semantic representations of each non-classified slot are obtained, and the semantic representations of all the non-classified slots are spliced to obtain a seventh semantic representation;

as described above, the data storage device 120 stores pattern information of a plurality of dialogue service patterns, where each pattern information includes one or more slots, so that semantic representations corresponding to all non-classified slots in all pattern information are spliced, and the obtained semantic representation is a seventh semantic representation.

Secondly, the sixth semantic representation and the seventh semantic representation are spliced, and the spliced result is input into a preset second state classifier for processing to obtain the activated non-classified slot;

Here, the second state classifier is a classifier trained in advance on training data, and it may have a well-known structure (fully connected layer + softmax layer). The output of the second state classifier is the activated slot among all the non-classified slots.

Thirdly, the sixth semantic representation and the semantic representation of the activated non-classified slot are spliced, and the spliced result is input into a preset position predictor for processing to obtain the starting position and the ending position of the slot in the dialogue sentence;

Here, the position predictor is a machine learning model trained in advance on training data, and it may have a well-known structure (fully connected layer + softmax layer). The output of the position predictor is the starting position and the ending position of the slot in the dialogue sentence.

Finally, the slot value of the currently activated non-classified slot is acquired based on the starting position and the ending position. Specifically, in the dialogue sentence in which the slot is located, the text between the starting position and the ending position is the slot value of the currently activated non-classified slot.
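The last step, turning a predicted starting position and ending position into a slot value, can be sketched as follows. The sketch assumes, hypothetically, that the position predictor yields one start score and one end score per token and that both positions are inclusive. The scores below are hand-written toy values, and `best_span` is an illustrative helper, not part of the invention.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Return the (start, end) pair with start <= end and the highest
    combined score, limited to spans of at most max_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy tokenized user sentence and toy per-token scores.
tokens = ["I", "want", "to", "book", "a", "restaurant", "at", "8", "pm"]
start_scores = [0.1, 0.0, 0.0, 0.2, 0.0, 0.3, 0.1, 2.5, 0.4]
end_scores = [0.0, 0.1, 0.0, 0.0, 0.1, 0.2, 0.0, 0.6, 2.8]

start, end = best_span(start_scores, end_scores)
# The slot value is the text between the two positions (inclusive).
slot_value = " ".join(tokens[start:end + 1])
print(start, end, slot_value)  # prints: 7 8 8 pm
```

Searching over pairs with `start <= end` avoids returning an empty or reversed span when the two positions are predicted independently.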

It should be noted that the execution sequence of step S310, step S320 and step S330 is not limited in the present invention.

In step S340, the current intention, the current request slot, and the slot value of the currently activated slot are determined as the current dialogue state, so as to respond to the current user sentence based on the current dialogue state. It should be noted that, in a man-machine dialogue scenario, how to respond to the user sentence once the current dialogue state has been determined can be designed by those skilled in the art according to specific requirements; the invention does not limit the specific response method.
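One possible way to hold the resulting dialogue state is sketched below. The `DialogueState` class, the `build_state` helper and the sample values are hypothetical and serve only to illustrate how the three outputs of steps S310 to S330 might be combined.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueState:
    """Current dialogue state: intention, request slot, activated slot values."""
    intent: Optional[str] = None
    requested_slot: Optional[str] = None
    slot_values: Dict[str, str] = field(default_factory=dict)


def build_state(intent, requested_slot, classified_values, non_classified_values):
    # Merge the slot values found for classified and non-classified slots.
    merged = dict(classified_values)
    merged.update(non_classified_values)
    return DialogueState(intent, requested_slot, merged)


# Hypothetical outputs of steps S310, S320 and S330.
state = build_state(
    intent="ReserveRestaurant",
    requested_slot="phone_number",
    classified_values={},
    non_classified_values={"time": "8 pm", "area": "downtown"},
)
```

A response module could then read `state.requested_slot` and `state.slot_values` to decide what to say next.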

As previously described, the computing device 200 may also perform preprocessing based on the pattern information of all dialogue service patterns (schemas) stored in the data storage device 120 to generate a semantic representation of each slot, a semantic representation of each request slot, and a semantic representation of each intention, respectively. Accordingly, the data storage device 120 may also store the semantic representations of these slots, request slots, and intentions, so that dialogue state tracking can be performed based on these semantic representations at the application stage. Specific implementations of how the semantic representations of slots, request slots, and intentions are generated are presented below.

In one implementation, a semantic representation of a slot is generated as follows:

and for each slot, processing the service description of the dialogue service mode, the slot name of the slot and the slot description of the slot through a preset neural network language model respectively based on the dialogue service mode to which the slot belongs to obtain respective semantic representations, and splicing all the semantic representations to obtain the semantic representation of the slot.

In one implementation, a semantic representation of a request slot is generated as follows:

and for each request slot, processing the service description of the dialogue service mode, the slot name of the request slot and the slot description of the request slot through a preset neural network language model respectively based on the dialogue service mode to which the request slot belongs to obtain respective semantic representations, and splicing all the semantic representations to obtain the semantic representation of the request slot.

In one implementation, a semantic representation of intent is generated as follows:

for each intention, based on a dialogue service mode to which the intention belongs, processing a service description of the dialogue service mode, an intention name of the intention and an intention description of the intention through a preset neural network language model respectively to obtain respective semantic representations, and splicing all the semantic representations to obtain the semantic representation of the intention.

The neural network language model may be a BERT model or another type of language model. The invention does not limit the specific language model, and those skilled in the art can select one as needed.
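The generation of a semantic representation for a slot, request slot, or intention can be sketched as follows. To keep the sketch self-contained, `toy_encode` is a hash-based stand-in for the neural network language model (for example, BERT); it is not a real language model, and all names and sample texts here are hypothetical.

```python
import hashlib


def toy_encode(text, dim=4):
    """Stand-in for a language model: maps text to a fixed-length vector.
    A real system would use a model such as BERT here."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def element_representation(service_description, name, description,
                           encode=toy_encode):
    """Encode the three texts separately, then splice the results."""
    return encode(service_description) + encode(name) + encode(description)


slot_repr = element_representation(
    "A service for reserving restaurants",   # service description
    "phone_number",                           # slot name
    "Phone number of the restaurant",         # slot description
)
```

Because the schema texts do not change between dialogues, such representations can be computed once in advance and stored, which matches the preprocessing described above.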

In addition, each possible value of a classified slot may also be preprocessed in the computing device 200. For example, each value is processed separately through a preset neural network language model, such as a BERT model, to obtain its semantic representation, and these semantic representations are stored in the data storage device 120.

It should be noted that, the above-mentioned machine learning models, such as the intention classifier, the request slot classifier, the first state classifier, the slot value classifier, the second state classifier, and the position predictor, may all be neural network models, and may be separately trained based on the corresponding corpus. With the inputs and outputs of each model known, it is within the ability of those skilled in the art to train the models individually based on corpus.

In addition, the embodiment of the invention also provides a method for jointly training the models (each model is called a sub-module), which comprises the following steps:

The labeled dialogue data are split by round into the data needed by each sub-module and fed into the neural network. The loss of each sub-module is obtained from the difference between that sub-module's output and the labeled output, and the loss of the whole model is obtained by averaging the sub-module losses. The gradient of each model parameter is then computed from the overall loss, and the model parameters are optimized by gradient descent.
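The averaging of sub-module losses can be sketched as follows. The sketch assumes cross-entropy as the measure of the difference between each sub-module's output and the labeled output, which the description does not specify. The probabilities and labels are toy values, and the gradient step itself is omitted because it would require an automatic differentiation framework.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the labeled class."""
    return -math.log(probs[target_index])


def joint_loss(submodule_outputs):
    """Per-sub-module losses and their unweighted average."""
    losses = {name: cross_entropy(probs, target)
              for name, (probs, target) in submodule_outputs.items()}
    overall = sum(losses.values()) / len(losses)
    return overall, losses


# Toy (predicted probabilities, labeled index) pairs for one round.
outputs = {
    "intent_classifier": ([0.7, 0.2, 0.1], 0),
    "request_slot_classifier": ([0.1, 0.9], 1),
    "slot_value_classifier": ([0.25, 0.5, 0.25], 1),
}

overall, per_module = joint_loss(outputs)
```

In a real training loop, `overall` would be the quantity whose gradient is propagated back through all sub-modules at once.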

An example of application of the present invention is given below.

A dialogue with a restaurant reservation service is as follows:

User: I want to book a restaurant at 8 pm, for two people.

System: Which area would you like to eat in?

User: Chaoyang District.

System: OK, the restaurant "Little Pear Soup" would work.

User: What is their phone number?

For example, at the fifth sentence, the dialogue sentence of the current round (round 3) is "OK, the restaurant 'Little Pear Soup' would work. What is their phone number?". The user's intention (here, reserving a restaurant), the request slot (here, the telephone number), and the classified slots (if any) are all calculated from the dialogue sentence of the current round. Non-classified slots, such as the meal time (8 pm) and the dining area (Chaoyang District), are calculated from the whole dialogue history so far. A slot value is calculated only when the slot is activated; for example, the restaurant star rating is not activated here, so its slot value is not calculated. Finally, this information is integrated as the dialogue state of this round.

Fig. 4 shows a schematic diagram of a dialog state tracking device 400 according to an embodiment of the invention. Referring to fig. 4, the apparatus 400 includes:

an intention processing unit 410 adapted to identify a current intention based on the dialogue sentence of the current round and all the stored intents;

a request slot processing unit 420 adapted to identify a current request slot based on the dialogue sentence of the current round and all the stored request slots;

a slot value processing unit 430 adapted to identify a slot value of a currently activated slot based at least on the dialogue sentence of the current round and all the stored slots;

a dialogue state generation unit 440 adapted to determine the current intention, the current request slot and the slot value of the currently activated slot as a current dialogue state, so as to respond to the user sentence based on the current dialogue state.

Specific processes performed by the intention processing unit 410, the request slot processing unit 420, the slot value processing unit 430, and the dialogue state generation unit 440 may refer to the above-mentioned steps S310, S320, S330, and S340, and will not be described here.

The invention also discloses:

A8. The method of A7, wherein the semantic representation of the slot is generated according to the following steps:

A9. The method of A7, wherein the semantic representation of the request slot is generated according to the following steps:

A10. The method of A7, wherein the semantic representation of the intention is generated according to the following steps:

Wherein the neural network language model is a BERT model.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Claims

1. A dialog state tracking method performed in a computing device coupled to a data store having stored therein pattern information for at least one dialog service pattern, each pattern information including a service name, a service description, at least one intent, and at least one slot, each intent including an intent name, an intent description, and at least one request slot, the method comprising:

identifying a current intention based on the dialogue statement of the current round and all the stored intents;

identifying the current request slot based on the dialogue statement of the current round and all the stored request slots;

identifying a slot value of a currently activated slot based at least on the dialogue statement of the current round and all the stored slots;

determining the current intention, the current request slot and the slot value of the current activation slot as the current dialogue state so as to respond to the user statement based on the current dialogue state;

wherein the identifying the current intention based on the dialogue statement of the current round and all the stored intents comprises:

acquiring a first semantic representation of a dialogue sentence of a current round;

acquiring semantic representations of each intention, and splicing the semantic representations of all the intents to obtain a second semantic representation;

after the first semantic representation and the second semantic representation are spliced, inputting the first semantic representation and the second semantic representation into a preset intention classifier for processing to obtain a current intention;

the identifying the current request slot based on the dialogue statement of the current round and all the stored request slots comprises the following steps:

acquiring semantic representations of each request slot, and splicing the semantic representations of all the request slots to obtain a third semantic representation;

after the first semantic representation and the third semantic representation are spliced, inputting the first semantic representation and the third semantic representation into a preset request slot classifier for processing to obtain a current request slot;

wherein the identifying the slot value of the currently activated slot based at least on the dialogue statement of the current round and all the stored slots includes:

acquiring semantic representations of each classification slot, and splicing the semantic representations of all classification slots to obtain a fourth semantic representation;

after the first semantic representation and the fourth semantic representation are spliced, inputting a preset first state classifier for processing to obtain an activated classification slot;

splicing the semantic representation of the activated classified slots and the semantic representations of all possible slot values of the activated classified slots to obtain a fifth semantic representation;

and after the first semantic representation and the fifth semantic representation are spliced, inputting a preset slot value classifier for processing to obtain the slot value of the currently activated classification slot.

2. The method of claim 1, wherein the identifying the slot value of the currently active slot based at least on the dialog statement of the current round and all slots stored further comprises:

acquiring sixth semantic representations of dialogue sentences of all rounds;

acquiring semantic representations of each non-classified slot, and splicing the semantic representations of all the non-classified slots to obtain a seventh semantic representation;

after the sixth semantic representation and the seventh semantic representation are spliced, inputting a preset second state classifier for processing to obtain an activated non-classification slot;

after splicing the sixth semantic representation and the semantic representation of the activated non-classified slot, inputting the spliced result into a preset position predictor for processing to obtain a starting position and an ending position of the slot in a dialogue sentence;

and acquiring the slot position value of the currently activated non-classified slot position based on the starting position and the ending position.

3. The method of claim 1, wherein the dialogue statement of the current round includes a response statement of a previous round and a user statement of the current round.

4. The method of claim 1, further comprising pre-generating semantic representations of slots, request slots, and intent.

5. The method of claim 4, wherein the semantic representation of the slot is generated as follows:

6. The method of claim 4, wherein the semantic representation of the request slot is generated as follows:

7. The method of claim 4, wherein the semantic representation of intent is generated as follows:

8. The method of any of claims 5 to 7, wherein the neural network language model is a BERT model.

9. A dialog state tracking device residing in a computing device coupled to a data storage device having stored therein pattern information for at least one dialog service pattern, each pattern information including a service name, a service description, at least one intent, and at least one slot, each intent including an intent name, an intent description, and at least one request slot, the device comprising:

an intention processing unit adapted to identify a current intention based on the dialogue sentence of the current round and all the stored intents;

a request slot processing unit adapted to identify the current request slot based on the dialogue statement of the current round and all the stored request slots;

a slot value processing unit adapted to identify the slot value of the currently activated slot based at least on the dialogue statement of the current round and all the stored slots;

a dialogue state generation unit adapted to determine the current intention, the current request slot and the slot value of the current activation slot as a current dialogue state so as to respond to a user sentence based on the current dialogue state;

10. A computing device, comprising:

at least one processor; and

a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-8.

11. A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-8.