CN109460450B - Dialog state tracking method and device, computer equipment and storage medium - Google Patents

Dialog state tracking method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN109460450B
CN109460450B CN201811131847.8A
Authority
CN
China
Prior art keywords
label
text
dialog
slot
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811131847.8A
Other languages
Chinese (zh)
Other versions
CN109460450A (en)
Inventor
欧智坚
戴音培
张毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201811131847.8A priority Critical patent/CN109460450B/en
Publication of CN109460450A publication Critical patent/CN109460450A/en
Application granted granted Critical
Publication of CN109460450B publication Critical patent/CN109460450B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a dialog state tracking method and apparatus, a computer device, and a storage medium. With the method, the robustness of the dialog can be improved, one slot can take multiple values, and the user's preferences over those values can be expressed.

Description

Dialog state tracking method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of dialog technologies, and in particular, to a dialog state tracking method and apparatus, a computer device, and a storage medium.
Background
With the development of dialog technology, dialog state tracking techniques have emerged that track the dialog state based on system rules and extract the information contained in user utterances with recurrent neural networks. In the slot-value pairs that existing dialog state systems use to represent the dialog state, each slot can take only one value, and that value cannot express the user's preference.
However, current dialog state tracking methods suffer from low robustness.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a dialog state tracking method, apparatus, computer device, and storage medium that address the above technical problems.
A dialog state tracking method, the method comprising:
acquiring the current round of dialog text;
determining the current round of dialog semantics according to the dialog text and a rich dialog state tracking rule;
and updating the current round of dialog state according to the dialog semantics and the previous round of dialog state.
In one embodiment, the determining the current round of dialog semantics comprises:
parsing the dialog text according to the rich dialog state tracking rule to obtain a domain label of the dialog text.
In one embodiment, the parsing the dialog text according to the rich dialog state tracking rule to obtain the domain label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each domain according to the dialog text and the current system behavior;
and selecting the label with the highest probability value in the probability distribution as the domain label.
In one embodiment, after the selecting the label with the highest probability value in the probability distribution as the domain label, the method comprises:
judging whether the domain label is a preset label,
and if so, parsing the dialog text according to the rich dialog state tracking rule to obtain a slot label of the dialog text.
In one embodiment, the parsing the dialog text according to the rich dialog state tracking rule to obtain the slot label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each informable slot in the domain according to the dialog text and the current system behavior;
and selecting the label with the highest probability value in the probability distribution as the slot label.
In one embodiment, after the selecting the label with the highest probability value in the probability distribution as the slot label, the method comprises:
judging whether the slot label is a preset label,
and if so, parsing the dialog text according to the rich dialog state tracking rule to obtain a value label of the dialog text.
In one embodiment, the parsing the dialog text according to the rich dialog state tracking rule to obtain the value label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each value in the slot according to the dialog text and the current system behavior;
and selecting the label with the highest probability value in the probability distribution as the value label.
In one embodiment, after the selecting the label with the highest probability value in the probability distribution as the value label, the method comprises:
determining the current round of dialog semantics according to the domain, the slot, the value, the domain label, the slot label, and the value label.
A dialog state tracking apparatus, the apparatus comprising:
a text acquisition module, configured to acquire the current round of dialog text;
a text processing module, configured to determine the current round of dialog semantics according to the dialog text and the rich dialog state tracking rule;
and a state update module, configured to update the current round of dialog state according to the dialog semantics and the previous round of dialog state.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above method.
According to the dialog state tracking method and apparatus, the computer device, and the storage medium, the current round of dialog semantics is determined according to the dialog text and the rich dialog state tracking rule, and the current round of dialog state is updated according to the dialog semantics and the previous round of dialog state, so that the robustness of the dialog can be improved, one slot can take multiple values, and the user's preferences over those values can be expressed.
Drawings
FIG. 1 is a diagram of an application environment for a dialog state tracking method in one embodiment;
FIG. 2 is a flow diagram illustrating a dialog state tracking method in one embodiment;
FIG. 3 is a schematic diagram illustrating the structure of the rich dialog state tracking rule in a dialog state tracking method in one embodiment;
FIG. 4 is a flowchart illustrating step S21 in one embodiment;
FIG. 5 is a flow diagram illustrating a method for obtaining a domain label in one embodiment;
FIG. 6 is a schematic diagram of the structure of a convolutional neural network model in one embodiment;
FIG. 7 is a flowchart illustrating step S22 in one embodiment;
FIG. 8 is a flowchart illustrating step S23 in one embodiment;
FIG. 9 is a block diagram of a dialog state tracking device in one embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The dialog state tracking method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network.
The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a dialog state tracking method is provided, which is described by taking its application in the environment of fig. 1 as an example and includes the following steps:
step S1: and acquiring the current wheel conversation text.
In step S1, the current wheel dialog text refers to the text input by the terminal 102, which includes the voice text and the document text. The server 104 receives the current wheel-to-speech text sent by the terminal 102 through the network.
Step S2: determining the current round of dialog semantics according to the dialog text and the rich dialog state tracking rule.
In step S2, the rich dialog state tracking rule is expressed as follows, with reference to fig. 3:
d denotes a domain entity, s denotes a slot entity, and v denotes a value entity.
D represents the set of domains, S_d represents the set of slots in domain d, and V_s represents the set of values in slot s.
For each domain d, μ_d is the tracking variable of the domain entity in the current round of the dialog; its possible labels are "mentioned" and "not mentioned" (default "not mentioned"). μ represents the set {μ_d : d ∈ D}. Slots are labeled similarly: ξ_s is the tracking variable of the slot entity in the current round of the dialog; its possible labels are "mentioned", "not mentioned", and "don't care" (default "not mentioned"). ξ_d represents the set {ξ_s : s ∈ S_d}, and ξ represents the set {ξ_d : d ∈ D}. Similarly, η_v is the tracking variable of value entity v in the current round of the dialog; its possible labels are "like" and "dislike" (default "dislike"). η_s represents the set {η_v : v ∈ V_s}, η_d represents the set {η_s : s ∈ S_d}, and η represents the set {η_d : d ∈ D}. The dialog state is denoted (μ, ξ, η), and the rule that maintains it is called the "rich dialog state tracking rule". A dialog state in this form solves the problems that one slot of a traditional dialog state cannot take multiple values and cannot reflect user preferences, and it contains richer information than the traditional dialog state, so it is called a "rich dialog state".
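The rich dialog state (μ, ξ, η) described above can be sketched as nested label dictionaries; the domain, slot, and value inventory below is hypothetical and purely for illustration.

```python
# A minimal sketch of the rich dialog state (mu, xi, eta): every domain, slot,
# and value carries its own tracking label, so one slot can hold several
# values and each value records the user's preference.
DOMAINS = {"movie": {"actor": ["Ge You", "Huang Bo"], "genre": ["comedy"]}}

def empty_rich_state(domains):
    """Default state: every domain and slot 'not mentioned', every value 'dislike'."""
    mu = {d: "not mentioned" for d in domains}                            # mu_d
    xi = {d: {s: "not mentioned" for s in domains[d]} for d in domains}   # xi_s
    eta = {d: {s: {v: "dislike" for v in domains[d][s]}
               for s in domains[d]} for d in domains}                     # eta_v
    return mu, xi, eta

mu, xi, eta = empty_rich_state(DOMAINS)
```

Because the three layers are kept separate, traversing them later (domains, then slots, then values) mirrors the hierarchical labeling the rule performs.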
After receiving the current round of dialog text, the server 104 processes it with the rich dialog state tracking rule to determine the current round of dialog semantics.
Step S3: updating the current round of dialog state according to the dialog semantics and the previous round of dialog state.
In step S3, the dialog text is parsed by the rich dialog state tracking rule and combined with the previous round of dialog state to form a new dialog state, which summarizes the dialog with the terminal 102 up to the current round, so that the user's intention is tracked across multiple rounds of dialog.
According to the above dialog state tracking method, the current round of dialog text is obtained, the current round of dialog semantics is determined according to the dialog text and the rich dialog state tracking rule, and the current round of dialog state is then updated according to the dialog semantics and the previous round of dialog state, so that the robustness of the dialog can be improved, one slot can take multiple values, and the user's preferences over those values can be expressed.
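Under the assumption that the current round's semantics arrive as partial label dictionaries of the same shape as the rich state, the update of step S3 can be sketched as an overlay of this round's labels onto the previous state; all names here are illustrative.

```python
def update_rich_state(prev, turn):
    """Overlay this round's parsed labels onto the previous rich state (mu, xi, eta)."""
    prev_mu, prev_xi, prev_eta = prev
    mu = {d: turn.get("mu", {}).get(d, lbl) for d, lbl in prev_mu.items()}
    xi = {d: {s: turn.get("xi", {}).get(d, {}).get(s, lbl)
              for s, lbl in slots.items()} for d, slots in prev_xi.items()}
    eta = {d: {s: {v: turn.get("eta", {}).get(d, {}).get(s, {}).get(v, lbl)
                   for v, lbl in vals.items()}
               for s, vals in slots.items()} for d, slots in prev_eta.items()}
    return mu, xi, eta

prev = ({"movie": "not mentioned"},
        {"movie": {"actor": "not mentioned"}},
        {"movie": {"actor": {"Ge You": "dislike", "Huang Bo": "dislike"}}})
turn = {"mu": {"movie": "mentioned"},
        "xi": {"movie": {"actor": "mentioned"}},
        "eta": {"movie": {"actor": {"Ge You": "like"}}}}
mu, xi, eta = update_rich_state(prev, turn)
```

Labels not mentioned this round keep their previous values, which is what lets the state accumulate user intentions across rounds.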
In one embodiment, step S2 includes:
step S21: and analyzing the dialog text according to the rich dialog state tracking rule to obtain a field label of the dialog text.
In step S21, the user text of the current round is denoted u, d denotes a domain entity, and D denotes the set of domains. For each domain d, μ_d is the tracking variable of the domain entity in the current round; its possible labels are "mentioned" and "not mentioned". All domain entities d in the domain set D are traversed, and the label of each d is computed. Suppose the current round of user text u is "I want to watch a movie with Ge You and Huang Bo" and the domain set D includes movie, music, time, weather, and so on; then the movie domain label is "mentioned" and the other domain labels are "not mentioned".
In one embodiment, in conjunction with fig. 4, the step S21 includes:
step S211: and acquiring the probability distribution of the corresponding label of each field according to the dialog text and the current system behavior.
In step S211, with reference to fig. 5, the probability distribution of the corresponding label in each domain is obtained according to the user text u of the current round and the current system behavior a. The implementation process can be broken down into the following steps.
(I) Convert the current round of user text u into a domain-specific embedding matrix f_1(d, u).
First, the user text is segmented into words. Suppose the current round of user text contains k_u words u_1, u_2, …, u_{k_u}, and the word embedding of each word has dimension d_ms. X ∈ R^{k_u × d_ms} denotes the word embedding matrix of the current round of user text u. A word embedding is a vector used to represent an individual word. Compared with traditional one-hot coding, word embeddings have lower dimensionality, and the word vectors of words with similar meanings lie relatively close together, which helps reduce overfitting and capture relations between words. Before each discriminator is trained, a dictionary of commonly used words is established, and the word embedding of each word is trained.
x_str(d, u) ∈ {0, 1}^{k_u} indicates whether each word in the current round of user text u is related to domain d: a semantic dictionary containing common domain-specific words is established for each domain, and each word of u is matched against its keywords. Concatenating X with x_str(d, u) gives the domain-specific embedding matrix f_1(d, u) = [X, x_str(d, u)] ∈ R^{k_u × (d_ms + 1)}.
(II) Convert the current round of system behavior a into a domain-specific action vector f_2(d, a). f_2(d, a) is a 7-dimensional vector whose i-th element equals 1 when the i-th of the following conditions on a and d holds, and 0 otherwise: (1) a asks the user about an informable slot in d; (2) a informs the user about a requestable slot in d; (3) a confirms an informable slot in d with the user; (4) a informs the user that no information related to d was found; (5) a informs the user that one piece of information related to d was found; (6) a informs the user that several pieces of information related to d were found and asks the user which one to select; (7) none of the first six conditions holds.
(III) Extract a domain-specific embedding vector from the domain-specific embedding matrix f_1(d, u) with a convolutional neural network. L convolution filters each with window sizes of 1, 2, and 3 are used to convolve f_1(d, u); after a ReLU (Rectified Linear Unit) activation function and max pooling, 3 vectors are obtained, which can be regarded as feature representations of a unigram, bigram, and trigram model, respectively. Finally, the three vectors are concatenated to form the domain-specific embedding vector.
In fig. 6, only 3 convolution filters are drawn per n-gram model in order to save space.
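The convolution-and-max-pooling step can be sketched with a single hypothetical scalar filter per n-gram window; a real model applies L learned filters over embedding rows, but the pooling logic is the same.

```python
def relu(x):
    return max(0.0, x)

def ngram_max_pool(scores, window):
    """Slide a window over per-word filter responses, sum each window,
    apply ReLU, then take the max over time (max pooling)."""
    pooled = [relu(sum(scores[i:i + window]))
              for i in range(len(scores) - window + 1)]
    return max(pooled)

scores = [0.2, -0.5, 1.0, 0.3]  # one toy filter response per word
# unigram / bigram / trigram features, later concatenated
feature = [ngram_max_pool(scores, w) for w in (1, 2, 3)]
```

Each window size captures a different phrase length, which is why the three pooled outputs are read as unigram, bigram, and trigram features before concatenation.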
(IV) Using a gate mechanism, constrain the embedding vector with the action vector to obtain the semantic feature vector h, computed as:
h_i = f_2(d, a)[i] · CNN(f_1(d, u)), i = 1, …, 7, h = [h_1, h_2, …, h_7]
This operation can be understood as using the action vector to control where the embedding vector appears within the semantic feature vector.
(V) Feed the semantic feature vector into a fully connected network and compute the probability distribution p(μ_d) ∈ R^2 of the domain-specific label. The fully connected network is denoted FC_{μ_d}; its hidden layer has the same number of nodes as its input layer, its output activation function is softmax, and its output is a 2-dimensional vector representing the probabilities that the domain label is "mentioned" and "not mentioned":
p(μ_d) = FC_{μ_d}(h)
Step S212: selecting the label with the highest probability value in the probability distribution as the domain label.
In step S212, after the probability distribution of the label corresponding to each domain is obtained, the label with the highest probability value in the distribution is selected as the domain label. The output probability for a domain label is a two-dimensional vector summing to 1 that represents the probabilities of "mentioned" and "not mentioned" respectively, for example (0.8, 0.2); the label with the highest probability is selected as the output, in this case "mentioned". All domains are traversed during testing, so each domain independently decides whether it is "mentioned" or "not mentioned" in the current round; any number of domains, or none at all, may be mentioned.
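Steps (IV) and (V) can be sketched as follows, assuming the CNN output is already a feature vector and using illustrative, untrained weights for the fully connected layer.

```python
import math

def domain_label_probs(cnn_feat, action_vec, W, b):
    """Gate the CNN feature by each of the 7 action conditions, concatenate,
    then softmax over ('mentioned', 'not mentioned')."""
    h = [a * f for a in action_vec for f in cnn_feat]  # h_i = f2[i] * CNN(f1)
    logits = [sum(w * x for w, x in zip(row, h)) + bi
              for row, bi in zip(W, b)]
    m = max(logits)                                    # stabilize softmax
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

cnn_feat = [0.5, -0.2]              # toy 2-dim CNN output
action_vec = [1, 0, 0, 0, 0, 0, 0]  # only action condition (1) holds
W = [[1.0] * 14, [-1.0] * 14]       # 2 x (7*2) illustrative weights
b = [0.0, 0.0]
probs = domain_label_probs(cnn_feat, action_vec, W, b)
```

Taking the argmax over `probs` then yields the domain label, here "mentioned", matching the selection rule of step S212.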
In one embodiment, said step S212 is followed by:
step S213: and judging whether the field label is a preset label or not.
In step S213, the label with the highest probability value in the probability distribution is used as the domain label, and there are two cases of the domain label, namely mentioned and not mentioned. In the application, a preset label is set as 'mentioned', and when a field label is mentioned, namely the field label is the same as the preset label, a slot label of the dialog text is continuously acquired; and when the field label is not mentioned, namely the field label is different from the preset label, the rich conversation state tracking rule stops continuously analyzing the conversation text.
Step S22: if so, parsing the dialog text according to the rich dialog state tracking rule to obtain the slot label of the dialog text.
The user text of the current round is denoted u, s denotes a slot entity, and S_d denotes the set of slots in domain d. For each slot s, ξ_s is the tracking variable of the slot entity in the current round; its possible labels are "mentioned", "not mentioned", and "don't care" (default "not mentioned").
According to the current round of user text, the rich dialog state tracking rule traverses the slot set S_d in domain d and acquires the slot labels of the dialog text. For example, for "I want to watch a movie with Ge You and Huang Bo", d is movie and S_d includes actor, genre, age, and so on; the actor slot is labeled "mentioned" and the other slots are labeled "not mentioned".
In one embodiment, in conjunction with fig. 7, the step S22 includes:
step S221: and acquiring the probability distribution of the corresponding label of each informing slot in the field according to the dialog text and the current system behavior.
In step S221, the probability distribution of the corresponding label of each notification slot is obtained according to the user text u of the current round and the current system behavior a. The implementation process can be broken down into the following steps.
(I) Convert the current round of user text u into a slot-specific embedding matrix f_1(s, u).
First, the text is segmented into words. Suppose the current round of user text contains k_u words u_1, u_2, …, u_{k_u}, and the word embedding of each word has dimension d_ms. X ∈ R^{k_u × d_ms} denotes the word embedding matrix of the current round of user text u. A word embedding is a vector used to represent an individual word. Compared with traditional one-hot coding, word embeddings have lower dimensionality, and the word vectors of words with similar meanings lie relatively close together, which helps reduce overfitting and capture relations between words. Before each discriminator is trained, a dictionary of commonly used words is established, and the word embedding of each word is trained.
x_str(s, u) ∈ {0, 1}^{k_u} indicates whether each word in the current round of user text u is related to slot s: a semantic dictionary containing common slot-specific words is established for each slot, and each word of u is matched against its keywords. Concatenating X with x_str(s, u) gives the slot-specific embedding matrix f_1(s, u) = [X, x_str(s, u)] ∈ R^{k_u × (d_ms + 1)}.
(II) Convert the current round of system behavior a into a slot-specific action vector f_2(s, a). f_2(s, a) is a 7-dimensional vector whose i-th element equals 1 when the i-th of the following conditions on a and s holds, and 0 otherwise: (1) a asks the user about a value in s; (2) a informs the user of a value in s; (3) a confirms a value in s with the user; (4) a informs the user that no information related to s was found; (5) a informs the user that one piece of information related to s was found; (6) a informs the user that several pieces of information related to s were found and asks the user which one to select; (7) none of the first six conditions holds.
(III) Extract a slot-specific embedding vector from the slot-specific embedding matrix f_1(s, u) with a convolutional neural network. L convolution filters each with window sizes of 1, 2, and 3 are used to convolve f_1(s, u); after a ReLU (Rectified Linear Unit) activation function and max pooling, 3 vectors are obtained, which can be regarded as feature representations of a unigram, bigram, and trigram model, respectively. Finally, the three vectors are concatenated to form the slot-specific embedding vector.
As in fig. 6, only 3 convolution filters are drawn per n-gram model to save space.
(IV) Using a gate mechanism, constrain the embedding vector with the action vector to obtain the semantic feature vector h, computed as:
h_i = f_2(s, a)[i] · CNN(f_1(s, u)), i = 1, …, 7, h = [h_1, h_2, …, h_7]
This operation can be understood as using the action vector to control where the embedding vector appears within the semantic feature vector.
(V) Finally, feed the semantic feature vector into a fully connected network and compute the probability distribution p(ξ_s) ∈ R^3 of the slot-specific label. The fully connected network is denoted FC_{ξ_s}; its hidden layer has the same number of nodes as its input layer, its output activation function is softmax, and its output is a 3-dimensional vector representing the probabilities that the slot label is "mentioned", "not mentioned", and "don't care":
p(ξ_s) = FC_{ξ_s}(h)
Step S222: selecting the label with the highest probability value in the probability distribution as the slot label.
In step S222, after the probability distribution of the label corresponding to each informable slot is obtained, the label with the highest probability value in the distribution is selected as the slot label. The output probability for a slot label is a three-dimensional vector summing to 1 that represents the probabilities of "mentioned", "not mentioned", and "don't care" respectively, for example (0.8, 0.1, 0.1); the label with the highest probability is selected as the slot's label at test time, in this case "mentioned". The informable slots in all domains are traversed during testing, so each informable slot independently decides whether it is "mentioned", "not mentioned", or "don't care" in the current round. By traversing all informable slots, we obtain which slots are mentioned, which are not mentioned, and which slots may take any of their values in the current dialog.
In one embodiment, said step S222 is followed by:
step S223: judging whether the slot label is a preset label or not,
in step S223, the label with the highest probability value in the probability distribution is used as the slot label, and there are three cases of "mention", "not mention" and "not intention". In the application, a preset label is set as 'mention', and when the slot label is 'mention', namely the slot label is the same as the preset label, the value label of the dialog text is continuously acquired; when the slot label is "not mentioned" or "not intended", i.e., the slot label is different from the preset label, the rich dialog state tracking rule stops parsing the dialog text continuously.
Step S23: if so, parsing the dialog text according to the rich dialog state tracking rule to obtain the value label of the dialog text.
The user text of the current round is denoted u, v denotes a value entity, and V_s denotes the set of values in slot s. For each value v, η_v is the tracking variable of the value entity in the current round; its possible labels are "like" and "dislike".
According to the current round of user text, the rich dialog state tracking rule traverses the domain set and the slot set of the text, then traverses the value set V_s to obtain the value labels. For example, for "I want to watch movies with Ge You and Huang Bo", where the domain label is [movie = mentioned] and the slot label is [actor = mentioned], the value labels are [Ge You = like] and [Huang Bo = like].
In one embodiment, in conjunction with fig. 8, the step S23 includes:
step S231: and acquiring the probability distribution of the label corresponding to each dereferencing value in the slot according to the dialog text and the current system behavior.
In step S231, the probability distribution of the label corresponding to each value is obtained according to the user text u and the current system behavior a of the current round. The implementation process can be broken down into the following steps.
(I) Convert the current round of user text u into a value-specific embedding matrix f_1(v, u).
First, the text is segmented into words. Suppose the current round of user text contains k_u words u_1, u_2, …, u_{k_u}, and the word embedding of each word has dimension d_ms. X ∈ R^{k_u × d_ms} denotes the word embedding matrix of the current round of user text u. A word embedding is a vector used to represent an individual word. Compared with traditional one-hot coding, word embeddings have lower dimensionality, and the word vectors of words with similar meanings lie relatively close together, which helps reduce overfitting and capture relations between words. Before each discriminator is trained, a dictionary of commonly used words is established, and the word embedding of each word is trained. To locate v in the sentence, the word vector of v in f_1(v, u) is replaced with an all-ones vector.
x_str(v, u) indicates whether each word in the current round of user text u is related to value v: a semantic dictionary containing words that express positive and negative sentiment is established, and each word of u is matched against its keywords. If the i-th word expresses positive sentiment, x_str(v, u)_i = 1; if it expresses negative sentiment, x_str(v, u)_i = -1; otherwise x_str(v, u)_i = 0. Concatenating X with x_str(v, u) gives the value-specific embedding matrix f_1(v, u) = [X, x_str(v, u)] ∈ R^{k_u × (d_ms + 1)}.
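The sentiment feature x_str(v, u) can be sketched with two tiny illustrative word lists standing in for the semantic dictionary of positive and negative sentiment words described above.

```python
POSITIVE = {"want", "like", "love"}      # illustrative sentiment dictionaries,
NEGATIVE = {"hate", "dislike", "avoid"}  # not the patent's actual word lists

def sentiment_feature(words):
    """x_str(v, u)_i: +1 for a positive-sentiment word, -1 for negative, else 0."""
    return [1 if w in POSITIVE else -1 if w in NEGATIVE else 0 for w in words]

feats = sentiment_feature(["i", "want", "Ge You", "movies"])
```

Appended as an extra column of the embedding matrix, this signed feature gives the value discriminator a direct cue for deciding between the "like" and "dislike" labels.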
(II) Extract a value-specific embedding vector from the value-specific embedding matrix f_1(v, u) with a convolutional neural network. L convolution filters each with window sizes of 1, 2, and 3 are used to convolve f_1(v, u); after a ReLU (Rectified Linear Unit) activation function and max pooling, 3 vectors are obtained, which can be regarded as feature representations of a unigram, bigram, and trigram model, respectively. Finally, the three vectors are concatenated to form the value-specific embedding vector.
As in fig. 6, only 3 convolution filters are drawn per n-gram model to save space.
(III) Obtain the semantic feature vector h from the embedding vector; for value labels no action vector is applied, and the computation is:
h = CNN(f_1(v, u)) (3)
(IV) Finally, feed the semantic feature vector into a fully connected network and compute the probability distribution p(η_v) ∈ R^2 of the value-specific label. The fully connected network is denoted FC_{η_v}; its hidden layer has the same number of nodes as its input layer, its output activation function is softmax, and its output is a 2-dimensional vector representing the probabilities that the value label is "like" and "dislike":
p(η_v) = FC_{η_v}(h)
Step S232: selecting the label with the highest probability in the probability distribution as the value label.
In step S232, the label with the highest probability value in the probability distribution is used as the value label, and the value label has two possible cases: "like" and "dislike".
In one embodiment, said step S232 is followed by:
step S24: and determining the current round of dialogue semantics according to the field, the slot, the value, the field label, the slot label and the value label.
In step S24, after the current round of text is parsed by the rich dialog state tracking rule, a domain label corresponding to the domain, a slot label corresponding to the slot, and a value label corresponding to the value are obtained. And combining the acquired field, slot, value, field label, slot label and value label to acquire the current round of dialogue semantics. If the current round text is "i want to see movies of gooey and bohai", the corresponding dialog state may be expressed as [ movie ═ mention, actor ═ mention, gooey ═ like, and bohai ═ like ], or "i do not want to see movies of gooey" may be expressed as [ movie ═ mention, actor ═ mention, gooey ═ dislike ] or simply expressed as [ movie (actor (gooey ═ dislike) ].
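The combination performed in step S24 can be sketched as follows; the input label dictionaries are an assumed representation matching the example above, not the patent's internal format.

```python
def turn_semantics(domain_labels, slot_labels, value_labels):
    """Keep only 'mentioned' domains and slots, with their value preferences,
    e.g. [movie = mentioned, actor = mentioned, Ge You = like]."""
    sem = {}
    for d, dl in domain_labels.items():
        if dl != "mentioned":
            continue
        sem[d] = {s: dict(value_labels.get(d, {}).get(s, {}))
                  for s, sl in slot_labels.get(d, {}).items()
                  if sl == "mentioned"}
    return sem

sem = turn_semantics(
    {"movie": "mentioned", "music": "not mentioned"},
    {"movie": {"actor": "mentioned", "genre": "not mentioned"}},
    {"movie": {"actor": {"Ge You": "like", "Huang Bo": "like"}}},
)
```

The pruning mirrors the hierarchical early-stopping of steps S213 and S223: unmentioned domains and slots contribute nothing to the round's semantics.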
It should be understood that although the steps in the flowchart of fig. 2 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, a dialog state tracking apparatus is provided, comprising an acquisition module, a processing module, and an updating module, wherein:
the text acquisition module 1 is used for acquiring a current-round dialog text;
the text processing module 2 is used for determining the current round of dialog semantics according to the dialog text and the rich dialog state tracking rule;
and the state updating module 3 is used for updating the current-round dialog state according to the dialog semantics and the previous-round dialog state.
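The three-module pipeline above (acquire text, derive per-turn semantics, update the cumulative state) could be sketched as follows. The class, its method names, and the trivial stand-in parse function are illustrative assumptions, not the patent's implementation.

```python
class DialogStateTracker:
    """Minimal sketch: text processing produces per-turn semantics,
    then the state update merges them into the previous-round state."""

    def __init__(self, parse_fn):
        self.parse_fn = parse_fn   # stands in for the rich DST rules
        self.state = {}            # previous-round dialog state

    def track(self, utterance, system_act):
        semantics = self.parse_fn(utterance, system_act)  # text processing
        self.state.update(semantics)                      # state update
        return self.state

# Hypothetical parse function for demonstration only.
tracker = DialogStateTracker(lambda text, act: {"actor": "mentioned"})
result = tracker.track("I want to see a movie", "request(actor)")
print(result)
```

Because `update` merges new semantics over the old state, slots mentioned in earlier rounds persist unless the current round overrides them, which matches the cumulative nature of dialog state tracking.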
In one embodiment, the text processing module 2 includes:
and the first processing module 21 is configured to parse the dialog text according to the rich dialog state tracking rule and obtain a domain label of the dialog text.
In one embodiment, the first processing module 21 includes:
the second processing module 211 is configured to acquire the probability distribution of the label corresponding to each domain according to the dialog text and the current system behavior;
a first selecting module 212, configured to select a label with a maximum probability value in the probability distribution as a domain label.
In one embodiment, after the first selecting module 212, the apparatus further comprises:
a first judging module 213, configured to judge whether the domain label is a preset label,
and the third processing module 22 is configured to, if the judgment is yes, parse the dialog text according to the rich dialog state tracking rule and obtain a slot label of the dialog text.
In one embodiment, the third processing module 22 includes:
a fourth processing module 221, configured to acquire, according to the dialog text and the current system behavior, the probability distribution of the label corresponding to each informable slot in the domain;
a second selecting module 222, configured to select a label with a highest probability value in the probability distribution as the slot label.
In one embodiment, after the second selecting module 222, the apparatus further comprises:
a second judging module 223 for judging whether the slot label is a preset label,
and the fifth processing module 23 is configured to, if the judgment is yes, parse the dialog text according to the rich dialog state tracking rule and obtain a value label of the dialog text.
In one embodiment, the fifth processing module 23 includes:
a sixth processing module 231, configured to acquire, according to the dialog text and the current system behavior, the probability distribution of the label corresponding to each possible value in the slot;
a third selecting module 232, configured to select a label with a highest probability in the probability distribution as the value label.
In one embodiment, after the third selecting module 232, the apparatus further comprises:
and the seventh processing module 24 is configured to determine the current-round dialog semantics according to the domain, the slot, the value, the domain label, the slot label, and the value label.
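The cascaded decision implied by modules 21 through 24 — a domain's slots are examined only when the domain label equals a preset label, and a slot's values only when the slot label does — might look like this sketch. All function names, the dictionary layouts, and the use of "mentioned" as the preset trigger are assumptions built on the labels named in the text.

```python
def argmax_label(dist):
    # Pick the label with the highest probability in the distribution.
    return max(dist, key=dist.get)

def cascaded_parse(domain_dists, slot_dists, value_dists, preset="mentioned"):
    """Hypothetical cascade: descend from a domain to its slots, and from
    a slot to its values, only when the enclosing label is the preset."""
    semantics = {}
    for domain, ddist in domain_dists.items():
        dlabel = argmax_label(ddist)
        semantics[domain] = {"label": dlabel}
        if dlabel != preset:
            continue                       # domain not mentioned: skip slots
        for slot, sdist in slot_dists.get(domain, {}).items():
            slabel = argmax_label(sdist)
            semantics[domain][slot] = {"label": slabel}
            if slabel != preset:
                continue                   # slot not mentioned: skip values
            for value, vdist in value_dists.get((domain, slot), {}).items():
                semantics[domain][slot][value] = argmax_label(vdist)
    return semantics

sem = cascaded_parse(
    {"movie": {"mentioned": 0.9, "not mentioned": 0.1}},
    {"movie": {"actor": {"mentioned": 0.8, "not mentioned": 0.2}}},
    {("movie", "actor"): {"gooey": {"like": 0.7, "dislike": 0.3}}},
)
print(sem)
```

The early `continue` statements mirror the "judge whether the label is a preset label" checks: classification work for slots and values is done only inside branches that were actually mentioned.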
For the specific definition of the dialog state tracking apparatus, reference may be made to the definition of the dialog state tracking method above, which is not repeated here. Each module in the dialog state tracking apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing dialogue state processing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for tracking dialog states.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory in which a computer program is stored and a processor which, when executing the computer program, realizes the steps of the method as described above.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method as described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A dialog state tracking method, the method comprising:
acquiring a current-round dialog text;
determining the current-round dialog semantics according to the dialog text and a rich dialog state tracking rule; wherein the rich dialog state tracking rule comprises: defining tracking variables for the domain entities, slot entities, and value entities in the current round of dialog to obtain a domain label for each domain, a slot label for each slot, and a value label for each value; the slot entities are included in the domain entities, and the domain label comprises: "mentioned", "not mentioned"; wherein determining the domain label comprises: converting the dialog text into a domain-specific embedding matrix; converting the current-round system behavior into a domain-specific action vector; extracting a domain-specific embedding vector from the domain-specific embedding matrix using a convolutional neural network; constraining the domain-specific embedding vector with the domain-specific action vector through a gate mechanism to obtain a semantic feature vector; and determining the domain label according to the semantic feature vector;
and updating the current-round dialog state according to the dialog semantics and a previous-round dialog state.
2. The method of claim 1, wherein determining the current-round dialog semantics according to the dialog text and the rich dialog state tracking rule comprises:
parsing the dialog text according to the rich dialog state tracking rule to obtain a domain label of the dialog text.
3. The method of claim 2, wherein parsing the dialog text according to the rich dialog state tracking rule to obtain a domain label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each domain according to the dialog text and the current system behavior;
and selecting the label with the maximum probability value in the probability distribution as the domain label.
4. The method of claim 3, wherein selecting the label with the highest probability value in the probability distribution as the domain label comprises:
judging whether the domain label is a preset label or not,
and if so, parsing the dialog text according to the rich dialog state tracking rule to obtain a slot label of the dialog text.
5. The method of claim 4, wherein parsing the dialog text according to the rich dialog state tracking rule to obtain a slot label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each informable slot in the domain according to the dialog text and the current system behavior;
and selecting the label with the highest probability value in the probability distribution as the slot label.
6. The method of claim 5, wherein the selecting the tag with the highest probability value in the probability distribution as the slot tag comprises:
judging whether the slot label is a preset label or not,
and if so, parsing the dialog text according to the rich dialog state tracking rule to obtain a value label of the dialog text.
7. The method of claim 6, wherein parsing the dialog text according to the rich dialog state tracking rule to obtain a value label of the dialog text comprises:
acquiring the probability distribution of the label corresponding to each possible value in the slot according to the dialog text and the current system behavior;
and selecting the label with the highest probability in the probability distribution as the value label.
8. The method of claim 7, wherein selecting the label with the highest probability in the probability distribution as the value label comprises:
determining the current-round dialog semantics according to the domain, the slot, the value, the domain label, the slot label, and the value label.
9. A dialog state tracking apparatus, the apparatus comprising:
the text acquisition module is used for acquiring a current-round dialog text;
the text processing module is used for determining the current-round dialog semantics according to the dialog text and a rich dialog state tracking rule; wherein the rich dialog state tracking rule comprises: defining tracking variables for the domain entities, slot entities, and value entities in the current round of dialog to obtain a domain label for each domain, a slot label for each slot, and a value label for each value; the slot entities are included in the domain entities, and the domain label comprises: "mentioned", "not mentioned"; wherein determining the domain label comprises: converting the dialog text into a domain-specific embedding matrix; converting the current-round system behavior into a domain-specific action vector; extracting a domain-specific embedding vector from the domain-specific embedding matrix using a convolutional neural network; constraining the domain-specific embedding vector with the domain-specific action vector through a gate mechanism to obtain a semantic feature vector; and determining the domain label according to the semantic feature vector;
and the state updating module is used for updating the current-round dialog state according to the dialog semantics and the previous-round dialog state.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN201811131847.8A 2018-09-27 2018-09-27 Dialog state tracking method and device, computer equipment and storage medium Active CN109460450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811131847.8A CN109460450B (en) 2018-09-27 2018-09-27 Dialog state tracking method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109460450A CN109460450A (en) 2019-03-12
CN109460450B true CN109460450B (en) 2021-07-09

Family

ID=65607017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811131847.8A Active CN109460450B (en) 2018-09-27 2018-09-27 Dialog state tracking method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109460450B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526674B2 (en) * 2019-03-01 2022-12-13 Rakuten Group, Inc. Sentence extraction system, sentence extraction method, and information storage medium
CN110096516B (en) * 2019-03-25 2022-01-28 北京邮电大学 User-defined database interaction dialog generation method and system
CN110245221B (en) * 2019-05-13 2023-05-23 华为技术有限公司 Method and computer device for training dialogue state tracking classifier
CN111475616B (en) * 2020-03-13 2023-08-22 平安科技(深圳)有限公司 Multi-round dialogue method and device based on dialogue state prediction and computer equipment
CN112380875A (en) * 2020-11-18 2021-02-19 杭州大搜车汽车服务有限公司 Conversation label tracking method, device, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411563B (en) * 2010-09-26 2015-06-17 阿里巴巴集团控股有限公司 Method, device and system for identifying target words
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tracking of Enriched Dialog States for Flexible Conversational Information Access; Yinpei Dai et al.; 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2018-09-13; pp. 6139-6143 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant