WO2020228732A1

WO2020228732A1 - Method for training dialog state tracker, and computer device

Info

Publication number: WO2020228732A1
Application number: PCT/CN2020/089988
Authority: WO
Inventors: Yin Yichun; Shang Lifeng; Jiang Xin; Chen Xiao
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2019-05-13
Filing date: 2020-05-13
Publication date: 2020-11-19
Also published as: CN110245221A; CN110245221B

Abstract

Disclosed are a method for training a dialog state tracker, and a computer device, which relate to the field of artificial intelligence. The method comprises expanding texts in a training text database to obtain an enhanced database; and training a dialog state tracker by using texts in the enhanced database. The number of training texts for training the dialog state tracker can be increased, such that the performance of the dialog state tracker can be improved.

Description

Method and computer device for training a dialog state tracking classifier

This application claims priority to Chinese Patent Application No. 201910395608.1, filed with the Chinese Patent Office on May 13, 2019 and entitled "Method and Computer Equipment for Training Dialogue State Tracking Classifier", which is incorporated herein by reference in its entirety.

Technical field

This application relates to the field of artificial intelligence, and more specifically, to a method and computer device for training a dialog state tracking classifier.

Background

Artificial intelligence (AI) is the theory, methods, technology and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and to produce new kinds of intelligent machines that react in ways similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Natural language processing is an important branch of artificial intelligence, and dialogue systems are one application direction of natural language processing. Common dialogue systems include automatic chatbots and voice assistants. Unlike in traditional retrieval, the text a user enters into a dialogue system is usually a complete, colloquial sentence. The dialogue system therefore needs to understand and track the user's needs from the entered text and determine the reply content according to those needs.

The dialogue state tracker (DST) is responsible for understanding and tracking the user's needs during a dialogue and for determining and outputting the dialogue state. The dialogue state output by the DST represents the user's needs, and the dialogue system can determine the reply content according to this dialogue state.

Machine learning is now a common way to build a DST. However, machine learning requires high-quality training text, which is difficult to collect; in other words, only a small number of high-quality training texts can currently be collected. Besides being few in number, the high-quality training texts that can be collected cover only a few scenarios, so the diversity of the training samples is also poor. Because the training texts used for machine learning are few and lack diversity, the performance of a DST obtained through machine learning is limited.

Summary of the invention

This application provides a method and computer device for training a dialog state tracking classifier, to improve the performance of the dialog state tracking classifier.

In a first aspect, an embodiment of this application provides a method for training a dialog state tracking classifier. The method includes: obtaining a first text, where the first text is a text in a training text database and includes at least two phrases; determining at least one target phrase from the first text; determining P second texts based on the at least one target phrase, where each of the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1; and training a dialog state tracking classifier through machine learning according to the first text and the P second texts, where the dialog state tracking classifier is used to track the state of a dialogue based on the acquired dialogue of a user. The above technical solution increases the number of training text samples used to train the dialog state tracking classifier and improves the performance of the trained classifier, so that the classifier can more accurately determine the slot-slot values in the user's expression, and the accuracy of the intent it determines and of its determination of slots whose slot values are unfilled is improved.

With reference to the first aspect, in a possible implementation of the first aspect, the determining P second texts based on the at least one target phrase includes: determining K₁ first phrase sets corresponding to K₁ slots, where the K₁ slots are the slots of K₁ target phrases in the at least one target phrase, and K₁ is a positive integer greater than or equal to 1; and determining P₁ second texts, where the extended phrase included in each of the P₁ second texts belongs to the K₁ first phrase sets, the P second texts include the P₁ second texts, and P₁ is a positive integer greater than or equal to 1. In the above technical solution, the number of training texts used to train the dialog state tracking classifier is increased by replacing a slot value with other slot values of the same slot.

With reference to the first aspect, in a possible implementation of the first aspect, the determining P second texts based on the at least one target phrase includes: determining K₂ second phrase sets corresponding to K₂ word meanings, where the K₂ word meanings are the word meanings of K₂ target phrases in the at least one target phrase, and K₂ is a positive integer greater than or equal to 1; and determining P₂ second texts, where the extended phrase included in each of the P₂ second texts belongs to the K₂ second phrase sets, the P second texts include the P₂ second texts, and P₂ is a positive integer greater than or equal to 1. The above technical solution increases the number of training texts used to train the dialog state tracking classifier based on the meanings of phrases.

With reference to the first aspect, in a possible implementation of the first aspect, the training a dialog state tracking classifier through machine learning according to the first text and the P second texts includes: determining at least one second text from the P second texts according to a policy network model; and using the first text and the at least one second text as training texts for the machine learning to train the dialog state tracking classifier. The above technical solution screens the second texts and filters out those that are unsuitable for training the dialog state tracking classifier. This improves the quality of the texts used for training and thereby the performance of the trained classifier.

With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: determining T second texts from the P second texts according to a reference policy network model, where T is a positive integer greater than or equal to 1; determining an evaluation result according to an initial dialog state tracking classifier and the T second texts; and training the reference policy network model according to the evaluation result to obtain the policy network model.

With reference to the first aspect, in a possible implementation of the first aspect, the determining an evaluation result according to the initial dialog state tracking classifier and the T second texts includes: predicting the state of each of the T second texts by using the initial dialog state tracking classifier to obtain T prediction results, and determining T first reward values according to the T prediction results; or training the initial dialog state tracking classifier by using the T second texts, and determining T second reward values according to the trained initial dialog state tracking classifier.

With reference to the first aspect, in a possible implementation of the first aspect, the determining an evaluation result according to the initial dialog state tracking classifier and the T second texts includes: predicting the state of each of the T second texts by using the initial dialog state tracking classifier to obtain T prediction results, and determining T first reward values according to the T prediction results; and training the initial dialog state tracking classifier by using the T second texts, and determining T second reward values according to the trained initial dialog state tracking classifier.

In a second aspect, an embodiment of this application provides a method for determining the state of a dialogue. The method includes: acquiring a dialogue of a user; and tracking the state of the dialogue by using a dialog state tracking classifier, where the dialog state tracking classifier is determined according to the method in the first aspect or any possible implementation of the first aspect.

In a third aspect, an embodiment of this application provides a computer device, which includes units for performing the method in the first aspect or any possible implementation of the first aspect.

Optionally, the computer device of the third aspect may be a computer device, or may be a component (such as a chip or a circuit, etc.) that can be used in a computer device.

In a fourth aspect, an embodiment of the present application provides a computer device, which includes a unit for executing the method described in the second aspect.

Optionally, the computer device of the fourth aspect may be a computer device, or may be a component (for example, a chip or a circuit, etc.) used in a computer device.

In a fifth aspect, an embodiment of this application provides a computer device that includes a memory and a processor. The memory stores instructions, and the processor invokes the instructions in the memory to perform the method in the first aspect or any possible implementation of the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer device including a memory and a processor, the memory stores instructions, and the processor invokes the instructions in the memory to execute the method described in the second aspect.

In a seventh aspect, an embodiment of this application provides a computer-readable storage medium that stores instructions for implementing the method in the first aspect or any possible implementation of the first aspect.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium that stores instructions for implementing the method described in the second aspect.

In a ninth aspect, this application provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is enabled to perform the method in the first aspect or any possible implementation of the first aspect.

In a tenth aspect, this application provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is enabled to perform the method in the second aspect.

Description of the drawings

Figure 1 is a schematic diagram of a common dialogue system.

Figure 2 is a schematic diagram of how a DST works.

Fig. 3 is a schematic flowchart of training DST provided according to an embodiment of the present application.

Fig. 4 is a schematic flowchart of training a policy network model provided according to an embodiment of this application.

Fig. 5 is a schematic flowchart of a method for training the policy network model by using the P second texts.

Fig. 6 is a structural block diagram of a computer device provided according to an embodiment of the present application.

Fig. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application.

Detailed description of embodiments

The terms used in the following embodiments are merely for the purpose of describing specific embodiments and are not intended to limit this application. As used in the specification and appended claims of this application, the singular expressions "a", "an", "said", "above", "the" and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of this application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the following cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.

Reference in this specification to "one embodiment", "some embodiments", or the like means that one or more embodiments of this application include a specific feature, structure, or characteristic described with reference to that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment; they mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "comprising", "including", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.

Figure 1 is a schematic diagram of a common dialogue system. As shown in FIG. 1, the dialogue system 100 may include an automatic speech recognition (ASR) module 101, a dialogue state tracker (DST) 102, a dialogue policy learning (DPL) module 103, a natural language generation (NLG) module 104, and a text-to-speech (TTS) module 105.

(1) ASR module 101

The main function of the ASR module is to recognize the user's voice as text content. The ASR module can learn what the user is saying, but it cannot understand what the user means. The understanding of the semantics will be handled by the NLU module.

(2) DST 102

DST can be used to understand the user's intent and perform slot analysis.

For example, the user says: "My mother likes to eat Chinese food. Is there anything you can recommend?"

From this sentence, DST can analyze the content shown in Table 1.

Table 1

Intent	"Looking for a restaurant"
Slot	Food type = "Chinese food"

The example above involves two concepts: intent and slot. The two concepts are explained in detail below.

Intent

Intent recognition can be understood as a classifier that determines which type the user's sentence belongs to; the program corresponding to that type then performs dedicated analysis. In one implementation, the "program corresponding to this type" can be a bot. For example, the user says: "Play me a happy song." The DST determines that the user's intent is classified as music, so it calls the music bot, which recommends a song and plays it for the user. If the user is not satisfied and says "Play another song", the music bot continues to serve the user until the user raises a different request whose intent is no longer music, at which point the system switches to another bot to serve the user.

Slot

After the user's intent is determined, the DST needs to further understand the content of the dialogue. For simplicity, it can understand only the most essential parts and ignore the rest. These most important parts are called slots, and the content that fills a slot is called the slot value.

In the example above, whose intent is "Looking for a restaurant", the user's sentence contains one slot: the slot is "food type", and the corresponding slot value is "Chinese food".

If we fully consider what a user needs to provide when looking for a restaurant, we can certainly think of more, such as the location and price of the restaurant. For designers of dialogue systems, the starting point of the design is defining the slots; that is, the designer needs to decide which slots are required to complete the user's query.

Taking "finding a restaurant" as an example, the designer can design the following slots: location, price, request, food type. The dialogue system needs to know the slot value of the above slot to be able to provide users with appropriate query results.

In addition to determining the intent and the slot-slot values, the DST can also track the dialogue state. The dialogue state can be understood as the slot-filling status of the current task, which may include whether each slot has been filled (that is, whether it has a corresponding slot value) and the values of the filled slots. In other words, after determining the intent and the slot values, the DST can further determine which of the slots corresponding to the intent still have no slot value, and estimate the probability of each existing slot value.

For example, the user says "My mother likes Chinese food, is there anything you can recommend?". The NLU module can determine that the user's intent is "looking for a restaurant", whose corresponding slots are "location", "price", "request", and "food type". Based on these slots, the DST can determine that the user's expression provides a slot value only for "food type", and therefore that the slot values of "location", "price", and "request" are missing. The DST can also determine the probability of the slot value "Chinese food".
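A minimal sketch of this slot-filling view of the dialogue state, assuming a hypothetical `SLOT_SCHEMA` table that maps an intent to the slots the designer defined for it, and a state that maps each filled slot to its slot values and their probabilities (the probability shown is an illustrative placeholder, not a value from this application):

```python
# Hypothetical schema: intent -> slots defined by the dialogue system designer.
SLOT_SCHEMA = {
    "looking for a restaurant": ["location", "price", "request", "food type"],
}


def missing_slots(intent, state):
    """Return the slots of `intent` that have no slot value in `state`, in schema order."""
    return [slot for slot in SLOT_SCHEMA[intent] if not state.get(slot)]


# Dialogue state after the first user turn: slot -> {slot value: probability}.
# The probability 0.9 is an illustrative placeholder.
state = {"food type": {"Chinese food": 0.9}}
print(missing_slots("looking for a restaurant", state))  # ['location', 'price', 'request']
```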

The embodiment of the application provides a method of how to train DST. For the specific implementation of training DST, refer to the methods shown in FIG. 3 to FIG. 5.

(3) DPL module 103

The main function of the DPL module is to determine the follow-up processing strategy according to the dialogue state output by the DST. Continuing the example "My mother likes to eat Chinese food. Is there anything you can recommend?", the DPL module can find from the dialogue state output by the DST that the slot values of the three slots "location", "price", and "request" are missing. The DPL module can therefore trigger the "request restaurant information" action and pass this action to the NLG module.
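The DPL step in this example can be pictured as a simple rule. The sketch below assumes a hand-written policy (a learned dialogue policy would replace the `if` test), and the action name "recommend restaurant" is hypothetical:

```python
def choose_action(required_slots, state):
    """Rule-based policy sketch: request missing information, otherwise recommend."""
    missing = [slot for slot in required_slots if slot not in state]
    if missing:
        # Some slots still lack values, so ask the user for them.
        return "request restaurant information", missing
    # "recommend restaurant" is a hypothetical action name used for illustration.
    return "recommend restaurant", []


slots = ["location", "price", "request", "food type"]
print(choose_action(slots, {"food type": "Chinese food"}))
# ('request restaurant information', ['location', 'price', 'request'])
```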

(4) NLG module 104

The main function of the NLG module is to generate dialogue. For example, after the DPL module passes the action of "requesting restaurant information" to the NLG module, the NLG module can generate the following content "I found 10 Chinese food restaurants. Where do you want to eat?".

(5) TTS module 105

The main function of the TTS module is to broadcast the dialogue to the user. The TTS module can convert the text output by the NLG module into speech and broadcast the reply generated by the dialogue system to the user through an output device.

It can be understood that the dialogue system 100 shown in FIG. 1 is merely one common dialogue system to which the technical solutions provided in this application can be applied; other dialogue systems can also apply these technical solutions. For example, in some embodiments, the user converses with the dialogue system through text, in which case the dialogue system may include neither the ASR module nor the TTS module. For another example, in other embodiments, the dialogue system may include the TTS module but not the ASR module; in this case, the user enters the dialogue as text and the dialogue system replies by voice.

In addition, it can be understood that the division of each module in the dialogue system 100 shown in FIG. 1 is only a possible way of division. In addition to the division as shown in Figure 1, each module in the dialogue system can also have other divisions. For example, one module of the system 100 shown in FIG. 1 can be divided into multiple modules according to functions, and different modules have different functions. For another example, two or more modules in the system 100 shown in FIG. 1 can be combined into one module.

Figure 2 is a schematic diagram of how a DST works. The DST 200 shown in FIG. 2 includes a semantic encoding module 201, a semantic encoding module 202, a semantic fusion module 203, a prediction module 204, and a status update module 205.

Assume that the dialogue flow between the user and the dialogue system is as follows:

User: My mother likes to eat Chinese food, is there anything you can recommend?

Dialogue system: I found 10 Chinese restaurants. Where would you like to eat?

User: As long as the price is cheap, the location does not matter. Can you tell me the name and location of the restaurant?

Dialogue system: Mama Zhang's Sichuan Restaurant is located at No. 100 Ande Road.

The semantic encoding module 201 may be used to determine a semantic vector from the dialogue system's reply in the previous round. The semantic encoding module 202 may be used to determine a semantic vector from the user's expression in the current round. For ease of description, the semantic vector determined by the semantic encoding module 201 is referred to as semantic vector 1, and the semantic vector determined by the semantic encoding module 202 is referred to as semantic vector 2.

The semantic fusion module 203 can be used to obtain semantic vector 1 determined by the semantic encoding module 201 and semantic vector 2 determined by the semantic encoding module 202, and to fuse semantic vector 1 and semantic vector 2 into a new fused semantic vector.

The prediction module 204 may perform probabilistic prediction on each possible slot-slot value pair according to the fused semantic vector determined by the semantic fusion module 203. The slot value with the highest predicted probability can be used as the predicted slot value.
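The application does not specify how the vectors are fused or how the pairs are scored. The sketch below assumes concatenation for fusion and a hypothetical per-pair linear scorer whose scores are normalised with a softmax over the candidate values of each slot; it only illustrates the "highest predicted probability wins" step, not the DST 200 itself:

```python
import math


def fuse(vec1, vec2):
    """Fuse semantic vector 1 and semantic vector 2 (concatenation is assumed)."""
    return list(vec1) + list(vec2)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(fused, weights):
    """weights: (slot, value) -> weight vector of a hypothetical linear scorer.

    Returns slot -> (predicted value, probability), keeping for each slot the
    value with the highest predicted probability.
    """
    by_slot = {}
    for slot, value in weights:
        by_slot.setdefault(slot, []).append(value)
    result = {}
    for slot, values in by_slot.items():
        scores = [sum(w * x for w, x in zip(weights[(slot, v)], fused)) for v in values]
        probs = softmax(scores)
        best = max(range(len(values)), key=probs.__getitem__)
        result[slot] = (values[best], probs[best])
    return result


fused = fuse([1.0, 0.0], [0.0, 1.0])
weights = {("price", "cheap"): [0.0, 0.0, 0.0, 2.0],
           ("price", "expensive"): [0.0, 0.0, 0.0, 1.0]}
print(predict(fused, weights))
```

Normalising per slot rather than over all pairs is a design choice of this sketch; it makes each slot's candidate values compete only with each other.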

The status update module 205 may determine the accumulated slot-slot values of the current round from the slot-slot values determined from the user's expression in the previous round and the slot-slot values determined from the user's expression in the current round.

As shown in Figure 2, the dialogue system's reply in the previous round, which is input into the semantic encoding module 201, is "Found 10 Chinese restaurants, where do you want to eat?", and the user's expression in the current round, which is input into the semantic encoding module 202, is "As long as the price is cheap, the location does not matter. Can you tell me the name and location of the restaurant?". The prediction results determined by the prediction module 204 are shown in FIG. 2; for brevity, FIG. 2 does not show all slot-slot value prediction results.

As also shown in FIG. 2, the slot-slot value that the status update module 205 has from the user's expression in the previous round is <food type, Chinese food>, and the slot-slot values determined from the user's expression in the current round are <price, cheap>, <location, no demand>, <request, name>, and <request, location>. The accumulated slot-slot values of the current round determined by the status update module 205 are therefore <food type, Chinese food>, <price, cheap>, <location, no demand>, <request, name>, and <request, location>.
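The accumulation in this example can be sketched as an order-preserving union of (slot, slot value) pairs. This is an assumption for illustration: a list of pairs lets the "request" slot hold several values, and the sketch does not cover a user revising an earlier value, which the text above does not describe.

```python
def update_state(previous, current):
    """Accumulate (slot, slot value) pairs across rounds, skipping duplicates."""
    accumulated = list(previous)
    for pair in current:
        if pair not in accumulated:
            accumulated.append(pair)
    return accumulated


previous = [("food type", "Chinese food")]
current = [("price", "cheap"), ("location", "no demand"),
           ("request", "name"), ("request", "location")]
print(update_state(previous, current))
```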

The embodiment of this application provides a method for training a DST. The computer device can expand the texts in the training text database to increase the training texts available for training the DST, and use the expanded training texts to train the DST. For ease of description, the following takes one text as an example to describe how the computer device expands a text and how it uses the expanded texts to train the DST.

Fig. 3 is a schematic flowchart of training a DST provided according to an embodiment of this application. The method shown in FIG. 3 can be performed by a computer device, and the embodiment of this application does not limit the specific form of the computer device. For example, the computer device may be a personal computer, a laptop computer, a tablet computer, a workstation, or a server. The DST trained by the method shown in FIG. 3 can implement the function of the DST 102 in FIG. 1 or the DST 200 shown in FIG. 2.

301. The computer device obtains a first text, where the first text is a text in a training text database, and the first text includes at least two phrases.

A phrase in the embodiments of this application may be an n-gram, where n is a positive integer greater than or equal to 1. An n-gram is a text segment composed of n consecutive words. For example, a unigram is a text segment composed of one word, a bigram is a text segment composed of two words, and a trigram is a text segment composed of three words.
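A short sketch of n-gram extraction over an already segmented word list (the helper name `ngrams` is illustrative, not from this application):

```python
def ngrams(words, n):
    """Return every run of n consecutive words, each as a tuple."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["find", "a", "cheap", "restaurant"]
print(ngrams(words, 2))  # [('find', 'a'), ('a', 'cheap'), ('cheap', 'restaurant')]
```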

Optionally, in some embodiments, the granularity of the text in the training text database may be sentences. In other words, each text in the training text database is a sentence.

Optionally, in other embodiments, the granularity of the text in the training text database may be a text fragment composed of multiple n-gram phrases, and the text fragment may not be a complete sentence.

Optionally, in other embodiments, the granularity of the text in the training text database may be a text composed of multiple sentences.

For ease of description, the following assumes that the granularity of the text in the training text database is sentences. In other words, the first text is a sentence composed of at least two phrases.

The embodiment of this application does not limit the storage location of the training text database. For example, the training text database can be stored in a storage device in the computer device. For another example, the training text database is stored in an externally connected storage device, such as a portable hard disk or a USB flash drive. For another example, the training text database can be stored in other computer equipment, such as a server or network attached storage (NAS).

302. The computer device determines at least one target phrase from the first text.

303. The computer device determines P second texts based on the at least one target phrase. Each of the P second texts includes an expanded phrase determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1.

In other words, the purpose of step 303 is to expand the first text into multiple texts (that is, the P second texts) by determining the expanded phrases corresponding to the target phrases.

As assumed above, the first text is a sentence composed of at least two phrases, but it is not necessary to determine expanded phrases for every phrase. Therefore, target phrases need to be determined from the first text, and one or more expanded phrases are determined for each target phrase. After the expanded phrases are determined, an expanded phrase replaces its corresponding target phrase in the first text to obtain a second text. All other phrases in the second text, including the non-target phrases and the target phrases that were not replaced, are the same as in the first text.
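The replacement step can be sketched as follows, assuming the first text has already been segmented into a list of phrases and that a hypothetical `expansions` mapping gives the expanded phrases of each target phrase. Each second text replaces exactly one target phrase, and every other phrase is copied from the first text:

```python
def expand_text(phrases, expansions):
    """Build second texts, each replacing one target phrase with one expanded phrase."""
    second_texts = []
    for i, phrase in enumerate(phrases):
        for expanded in expansions.get(phrase, []):
            # Copy the first text and swap only position i.
            second_texts.append(phrases[:i] + [expanded] + phrases[i + 1:])
    return second_texts


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
expansions = {"cheap": ["affordable"], "Chinese food": ["Japanese food", "French food"]}
for text in expand_text(first_text, expansions):
    print(" ".join(text))
```

Here P = 3, and the first text itself is left unchanged, so it can still be used alongside the second texts for training.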

For example, suppose the first text is "I want to find a cheap Chinese restaurant". By segmenting the first text, the computer device can determine that it includes the following phrases: "I", "want to find", "a", "cheap", "the", "Chinese food", and "restaurant". According to preset expansion rules, the two phrases "cheap" and "Chinese food" can be determined as target phrases. The expanded phrases corresponding to "cheap" may include "affordable" and "low-cost", and those corresponding to "Chinese food" may include "Japanese food" and "French food". Therefore, the second texts determined from "I want to find a cheap Chinese restaurant" can include:

Second text 1: I want to find a cheap Japanese restaurant.

Second text 2: I want to find a cheap French restaurant.

Second text 3: I want to find an affordable Chinese restaurant.

Second text 4: I want to find a low-cost Chinese restaurant.

The preset expansion rules may be of two types: an expansion rule based on slot-slot values and an expansion rule based on word meaning. For ease of description, the expansion rule based on slot-slot values is referred to as the first expansion rule, and the expansion rule based on word meaning is referred to as the second expansion rule.
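The two rule types can be sketched with two lookup tables. Both tables are hypothetical stand-ins: `SLOT_VALUES` for a slot value database grouped by slot, and `SIMILAR_MEANING` for whatever source of similar-meaning phrases an implementation uses:

```python
# Hypothetical slot value database grouped by slot.
SLOT_VALUES = {"food type": ["Chinese food", "Japanese food", "French food"]}
# Hypothetical table of phrases with a similar word meaning.
SIMILAR_MEANING = {"cheap": ["affordable"]}


def first_rule(phrase):
    """Slot-value rule: the other slot values of the slot that `phrase` fills."""
    for values in SLOT_VALUES.values():
        if phrase in values:
            return [v for v in values if v != phrase]
    return []


def second_rule(phrase):
    """Word-meaning rule: phrases whose meaning is similar to `phrase`."""
    return list(SIMILAR_MEANING.get(phrase, []))


print(first_rule("Chinese food"))  # ['Japanese food', 'French food']
print(second_rule("cheap"))        # ['affordable']
```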

The computer device can determine whether the first text includes phrases that can be expanded by using the first expansion rule. More specifically, the computer device can determine whether any phrase in the first text can serve as the slot value of a slot; if one or more phrases in the first text can serve as slot values, the computer device can determine these phrases as target phrases. To distinguish them, phrases that can serve as slot values are referred to below as first-type target phrases.

For example, the computer device may search a slot value database to determine whether a phrase in the first text can serve as the slot value of a slot. The slot value database is composed of phrases that can serve as slot values. After performing word segmentation on the first text to obtain the phrases that make up the first text, the computer device can search the slot value database to determine whether each phrase is in the database. If one or more phrases in the first text are in the slot value database, these phrases can be determined as first-type target phrases.

After determining the first-type target phrases, the computer device can determine the slot corresponding to each first-type target phrase.

For example, suppose that the computer device determines the first-type target phrases by searching the slot value database. The slot value database may also record the slot corresponding to each slot value, so when the computer device determines that a phrase is a first-type target phrase, it can also determine the slot corresponding to that phrase.

Taking the first text "I want to find a cheap Chinese restaurant" as an example, the computer device can determine that "Chinese food" is a phrase that can be used as a slot value. The computer device can also determine that the slot corresponding to "Chinese food" is "food type".
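The lookup described above can be sketched with a dictionary standing in for the slot value database, mapping each slot value to its slot (the entries are illustrative and taken from the running example):

```python
# Hypothetical slot value database: slot value -> slot.
SLOT_VALUE_DB = {
    "Chinese food": "food type",
    "Japanese food": "food type",
    "French food": "food type",
}


def first_type_targets(phrases, db):
    """Return (phrase, slot) for each segmented phrase found in the slot value database."""
    return [(phrase, db[phrase]) for phrase in phrases if phrase in db]


phrases = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
print(first_type_targets(phrases, SLOT_VALUE_DB))  # [('Chinese food', 'food type')]
```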

The computer device can also determine whether the first text includes a phrase that can be expanded using the second expansion rule. More specifically, the computer device can determine whether any phrases in the first text meet a specific rule. If one or more phrases in the first text meet the specific rule, the computer device can determine these phrases as target phrases. For ease of distinction, a phrase that meets the specific rule is referred to below as a second-type target phrase.

For example, replacing personal pronouns, articles, prepositions, particles, and similar words in a text generally contributes little to training the DST. Replacing adjectives, adverbs, and similar words contributes more. Therefore, the specific rule may be that a phrase whose part of speech is a preset part of speech is a second-type target phrase. In this case, the computer device can determine the part of speech of each phrase in the first text. If the part of speech of a phrase is a preset part of speech, the computer device can determine that the phrase is a second-type target phrase. The preset part of speech can be adjective, adverb, or both.

For another example, the computer device can use the importance of a phrase to determine whether it is a second-type target phrase. An important phrase can be a second-type target phrase, and an unimportant phrase may not be.

Optionally, in some embodiments, the importance of a phrase may be based on its frequency in the training text database. The frequency is the ratio of the number of texts that include the phrase to the total number of texts in the training text database. If the frequency of a phrase exceeds a preset frequency threshold, the computer device can determine that the phrase is a second-type target phrase.

Optionally, in other embodiments, the importance of a phrase may be determined by the number of times the phrase appears in the training text database. If that number exceeds a preset threshold, the computer device can determine that the phrase is a second-type target phrase.

Take the first text "I want to find a cheap Chinese restaurant" as an example, and assume that the computer device determines second-type target phrases by part of speech. The computer device can determine that the first text contains a phrase whose part of speech is adjective, namely "cheap". It can therefore determine that "cheap" is a second-type target phrase.
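The sketch below covers the two second-type criteria described above: part of speech and frequency in the training text database. It is a minimal illustration under stated assumptions. The tag set `{"ADJ", "ADV"}` assumes a Universal-POS-style tagger, and all function names are hypothetical. The occurrence-count variant would compare raw counts against a threshold instead of ratios.

```python
from typing import Dict, List, Tuple

# Preset parts of speech; the tag names assume a Universal-POS-style tagger.
PRESET_POS = {"ADJ", "ADV"}


def second_type_by_pos(tagged: List[Tuple[str, str]]) -> List[str]:
    """Phrases whose part of speech is one of the preset parts of speech."""
    return [phrase for phrase, pos in tagged if pos in PRESET_POS]


def document_frequency(corpus: List[List[str]]) -> Dict[str, float]:
    """Share of texts in the corpus that contain each phrase."""
    counts: Dict[str, int] = {}
    for phrases in corpus:
        for phrase in set(phrases):  # count each text at most once
            counts[phrase] = counts.get(phrase, 0) + 1
    return {phrase: c / len(corpus) for phrase, c in counts.items()}


def second_type_by_frequency(
    phrases: List[str], df: Dict[str, float], threshold: float
) -> List[str]:
    """Phrases whose frequency in the training text database exceeds the threshold."""
    return [p for p in phrases if df.get(p, 0.0) > threshold]


tagged = [("I", "PRON"), ("want to find", "VERB"), ("a", "DET"),
          ("cheap", "ADJ"), ("Chinese food", "NOUN"), ("restaurant", "NOUN")]
print(second_type_by_pos(tagged))  # ['cheap']
```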

After determining at least one target phrase, the computer device can determine at least one expanded phrase according to each target phrase.

Assume that the at least one target phrase includes K₁ first-type target phrases. The computer device may determine K₁ first phrase sets corresponding to K₁ slots. Each of the K₁ first phrase sets includes at least one phrase, and K₁ is a positive integer greater than or equal to 1.

Assume that the computer device has determined K target phrases in total. K is then a positive integer greater than or equal to 1 and greater than or equal to K₁.

The K₁ slots are respectively the slots of the K₁ first-type target phrases: the k₁-th of the K₁ slots is the slot of the k₁-th first-type target phrase. The slot of any phrase in the k₁-th first phrase set is the k₁-th slot, where k₁ = 1, ..., K₁.

For example, suppose K₁ is 10, and the slot of the 5th of the 10 first-type target phrases is "food type". Then the slot of any phrase in the first phrase set corresponding to that target phrase (the 5th first phrase set) is also "food type". Suppose the 5th first phrase set includes two phrases, "Japanese food" and "French food". In this case, the computer device can replace "Chinese food" in the first text with "Japanese food" and with "French food" respectively. This yields second text 1 above (I want to find a cheap Japanese restaurant) and second text 2 (I want to find a cheap French restaurant).

Optionally, in some embodiments, the computer device can determine the K₁ first phrase sets corresponding to the K₁ slots according to a first correspondence. The first correspondence includes correspondences between multiple slots and multiple first phrase sets. The slot of every phrase in a first phrase set is the slot corresponding to that first phrase set.
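A sketch of the first expansion rule under stated assumptions. The first correspondence is modeled as a hypothetical dictionary `FIRST_CORRESPONDENCE` mapping each slot to its first phrase set. Texts are lists of already segmented phrases. Each phrase in the target's slot, other than the target itself, yields one second text.

```python
from typing import Dict, List

# Hypothetical first correspondence: slot -> first phrase set.
FIRST_CORRESPONDENCE: Dict[str, List[str]] = {
    "food type": ["Chinese food", "Japanese food", "French food"],
}


def expand_by_slot(
    phrases: List[str], target: str, slot: str,
    correspondence: Dict[str, List[str]],
) -> List[List[str]]:
    """Build one second text per phrase of the same slot, replacing the target phrase."""
    second_texts = []
    for candidate in correspondence.get(slot, []):
        if candidate == target:
            continue  # replacing a phrase with itself adds nothing
        second_texts.append([candidate if p == target else p for p in phrases])
    return second_texts


# Assumed segmentation of "I want to find a cheap Chinese restaurant".
first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
second_texts = expand_by_slot(first_text, "Chinese food", "food type", FIRST_CORRESPONDENCE)
```

The two resulting phrase lists correspond to second text 1 and second text 2. The input list is left unchanged, because every second text is built as a new list.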

Assume that the at least one target phrase includes K₂ second-type target phrases. The computer device may determine K₂ second phrase sets corresponding to K₂ word meanings, where K₂ is a positive integer greater than or equal to 1.

Similarly, suppose that the computer device has determined K target phrases in total. K is then a positive integer greater than or equal to 1 and greater than or equal to K₂. In addition, if neither K₁ nor K₂ equals K, the sum of K₁ and K₂ is K. In other words, of the K target phrases determined by the computer device, K₁ are first-type target phrases and K₂ are second-type target phrases.

The K₂ word meanings are respectively the meanings of the K₂ second-type target phrases: the k₂-th of the K₂ word meanings is the meaning of the k₂-th second-type target phrase. The meaning of any phrase in the k₂-th second phrase set corresponds to the k₂-th word meaning, where k₂ = 1, ..., K₂.

Optionally, in some embodiments, two phrases whose meanings correspond to each other have the same meaning. That is, each phrase is a paraphrase of the other, or another way of expressing it. For example, "cheap" can be paraphrased as "affordable" or "low consumption".

Optionally, in other embodiments, two phrases whose meanings correspond to each other may either have the same meaning or be antonyms of each other. For example, the phrases corresponding to "cheap" can also include "expensive" and "high consumption".

Optionally, in some embodiments, the computer device can determine the K₂ second phrase sets corresponding to the K₂ word meanings according to a second correspondence. The second correspondence includes correspondences between multiple word meanings and multiple second phrase sets. The meaning of every phrase in a second phrase set corresponds to the word meaning associated with that second phrase set.

Optionally, in some embodiments, the computer device may determine the second phrase set corresponding to each target phrase of the second type according to the synonym database.

Optionally, in other embodiments, the computer device may determine the second phrase set corresponding to each second-type target phrase according to a synonym database or an antonym database.

Optionally, in other embodiments, the computer device may use an existing paraphrase corpus to determine the second phrase set corresponding to each second-type target phrase. For example, the Paraphrase Database (http://paraphrase.org) is a widely used paraphrase corpus. It can be used to determine a phrase set corresponding to each second-type target phrase.

However, some phrases in a phrase set determined from the paraphrase database may have meanings that are neither the same as nor opposite to the meaning of the corresponding target phrase. Take "expensive" as an example. The phrase set obtained from the paraphrase database contains synonyms such as "costly" and "pricey" and antonyms such as "cheap" and "inexpensive". It also contains words such as "onerous" and "burdensome", which are neither synonyms nor antonyms of "expensive". This follows from the way the paraphrase database is built and is not described in detail here.

Therefore, when the K₂ second phrase sets are determined using the paraphrase database, a phrase in a second phrase set and the corresponding second-type target phrase can be regarded as corresponding to each other. This holds even if their meanings are not exactly the same or opposite.

Assume that the second phrase set determined from "cheap" includes the two phrases "affordable" and "low-consumption". The computer device can replace "cheap" in the first text with "affordable" and with "low-consumption" respectively. This yields second text 3 above (I want to find an affordable Chinese restaurant) and second text 4 (I want to find a low-consumption Chinese restaurant).
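The second expansion rule can be sketched in the same way. `SYNONYMS` and `ANTONYMS` are hypothetical stand-ins for a synonym database and an antonym database; a paraphrase corpus could populate the same kind of mapping. Passing `antonyms` corresponds to the embodiments that also accept antonyms. A text whose phrase is not in either mapping yields no second texts and would be used as-is.

```python
from typing import Dict, List, Optional

# Hypothetical synonym and antonym databases keyed by second-type target phrase.
SYNONYMS: Dict[str, List[str]] = {"cheap": ["affordable", "low-consumption"]}
ANTONYMS: Dict[str, List[str]] = {"cheap": ["expensive", "high-consumption"]}


def second_phrase_set(
    target: str,
    synonyms: Dict[str, List[str]],
    antonyms: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Second phrase set for a target: synonyms, plus antonyms if that database is given."""
    phrases = list(synonyms.get(target, []))
    if antonyms is not None:  # embodiments that also treat antonyms as corresponding
        phrases.extend(antonyms.get(target, []))
    return phrases


def expand_by_meaning(
    phrases: List[str],
    target: str,
    synonyms: Dict[str, List[str]],
    antonyms: Optional[Dict[str, List[str]]] = None,
) -> List[List[str]]:
    """One second text per phrase in the target's second phrase set."""
    return [[c if p == target else p for p in phrases]
            for c in second_phrase_set(target, synonyms, antonyms)]


# Assumed segmentation of "I want to find a cheap Chinese restaurant".
first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
```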

Note that the first text "I want to find a cheap Chinese restaurant" includes both a first-type target phrase and a second-type target phrase. Some texts in the training text database may likewise include both types, while others may include only one of the two. In some embodiments, some texts in the training text database may include neither a first-type nor a second-type target phrase. The computer device may use such texts directly, without expansion.

304. The computer device trains DST through machine learning according to the first text and the P second texts.

Optionally, in some embodiments, the computer device may directly use the first text and the P second texts as training texts for machine learning to train the DST. The computer device trains the DST in the same way as existing implementations, so the details are not repeated here.

In some embodiments, the computer device may instead use only some of the first text and the P second texts as training texts for machine learning to train the DST. For example, the computer device may use the first text together with some of the P second texts. For another example, the computer device may use only the P second texts, or only some of them.

Optionally, in some embodiments, the computer device may select part of the P second texts as training texts for machine learning in a random manner.

Optionally, in other embodiments, the computer device may use the P second texts to train a policy network model. It can then use the policy network model to select at least one of the P second texts as training texts for machine learning.

Optionally, in some embodiments, the computer device may train the policy network model using a reinforcement learning algorithm or an evolutionary algorithm, for example, a contextual bandit algorithm or a genetic algorithm.

The following briefly describes how to train the policy network model, taking a contextual bandit algorithm as an example.

401. The computer device determines M texts from the training text database. M is a positive integer greater than or equal to 1, and the value of M is less than the total number of texts included in the training text database.

Optionally, in some embodiments, the computer device may randomly select the M texts from the training text database.

Optionally, in other embodiments, the computer device may select the M texts from the training text database according to certain rules.

For example, the computer device can determine the M texts according to the number of texts expanded from each training text in the training text database. Suppose one part of the texts in the training text database (hereinafter the first part of the texts) produces more expanded texts than another part (hereinafter the second part of the texts). The computer device can then select more of the M texts from the first part than from the second part. The computer device may select texts from the first part and the second part randomly or in a certain order.

For another example, suppose that for one part of the texts in the training text database (hereinafter the third part of the texts), more texts are expanded based on the first expansion rule than based on the second expansion rule. For another part (hereinafter the fourth part of the texts), more texts are expanded based on the second expansion rule than based on the first. The computer device can then select more of the M texts from the third part than from the fourth part. The computer device may select texts from the third part and the fourth part randomly or in a certain order.
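One way to realize the "select more texts from the part that expanded more" rule is weighted sampling without replacement, sketched below. Weighting each text by its number of expanded texts is an assumption about how the preference could be implemented, and `select_m_texts` is a hypothetical name. Texts that produced no expanded texts are skipped, because they would have no extended text fragment set in step 402.

```python
import random
from typing import Dict, List


def select_m_texts(expansion_counts: Dict[str, int], m: int, seed: int = 0) -> List[str]:
    """Draw M distinct training texts, favouring texts that produced more expanded texts.

    expansion_counts maps each training text to the number of texts expanded from it.
    """
    if not 1 <= m < len(expansion_counts):
        raise ValueError("M must be at least 1 and below the database size")
    pool = {text: n for text, n in expansion_counts.items() if n > 0}
    if m > len(pool):
        raise ValueError("not enough expanded texts to choose from")
    rng = random.Random(seed)
    chosen: List[str] = []
    for _ in range(m):
        texts = list(pool)
        # Weighted draw without replacement: weight = number of expanded texts.
        pick = rng.choices(texts, weights=[pool[t] for t in texts], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen
```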

402. The computer device determines M extended text fragment sets from the first enhanced database, where the M extended text fragment sets correspond one-to-one to the M texts.

For ease of description, the method of determining P second texts based on the first text in FIG. 3 is referred to as a coarse-grained data enhancement strategy below. The first enhanced database is a database composed of texts obtained after the texts in the training text database are expanded according to a coarse-grained data enhancement strategy. In other words, each text in the first enhanced database is generated based on a text in the training text database. The first enhanced database does not include the text in the training text database.

For example, suppose that the training text database includes 1,000 sentences in total. The computer device uses the coarse-grained data enhancement strategy to expand them into 20,000 sentences, which do not include the original 1,000 sentences.

These 1,000 sentences may be of three types:
- A first type of sentence includes both a first-type target phrase and a second-type target phrase.
- A second type of sentence includes only one of the two.
- A third type of sentence includes neither.

The computer device can expand each sentence of the first and second types using the method shown in FIG. 3, obtaining 20,000 sentences in total. The database composed of these 20,000 sentences is the first enhanced database, and it does not include the 1,000 sentences in the training text database.

In the foregoing embodiment, the texts in the first enhanced database have the same granularity as the texts in the training text database. For example, if the texts in the training text database are sentences, the texts in the first enhanced database are also sentences. In other embodiments, the granularities may differ. For example, if the texts in the training text database are sentences, the texts in the first enhanced database may be expanded phrases, or partial sentences that include the expanded phrases.

Take the first text "I want to find a cheap Chinese restaurant" above as an example. In some embodiments, the texts in the first enhanced database that correspond to this text may include second text 1 to second text 4 described above. In other embodiments, they may include "Japanese food", "French food", "affordable", and "low consumption". In still other embodiments, they may include "Japanese restaurant", "French restaurant", "affordable Chinese restaurant", and "low-consumption Chinese restaurant".

Optionally, in some embodiments, each text in the first enhanced database may include source indication information. This information indicates the text in the training text database from which that text was generated.

Optionally, in other embodiments, the first enhanced database may store texts as sets. Each set includes at least one text, and all texts in a set are obtained by expanding the same text in the training text database with the coarse-grained data enhancement strategy. Similarly, each set may include one piece of source indication information. This information indicates the text in the training text database from which the texts in the set were generated.

After determining the M texts from the training text database, the computer device can determine the M extended text fragment sets corresponding to the M texts according to the source indication information in the first enhanced database.
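The sketch below shows a first enhanced database that stores source indication information with each text, together with the step 402 lookup. The record type `EnhancedText`, the integer source ids, and both function names are hypothetical. It follows the per-text embodiment; the per-set embodiment would simply store the grouped lists directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EnhancedText:
    text: str
    source_id: int  # source indication information: index of the original training text


def build_first_enhanced_db(expansions: Dict[int, List[str]]) -> List[EnhancedText]:
    """Flatten coarse-grained expansions; the original training texts are not included."""
    return [EnhancedText(t, sid) for sid, texts in expansions.items() for t in texts]


def extended_fragment_sets(db: List[EnhancedText], selected_ids: List[int]) -> List[List[str]]:
    """Return one extended text fragment set per selected training text, in selection order."""
    by_source: Dict[int, List[str]] = defaultdict(list)
    for item in db:
        by_source[item.source_id].append(item.text)
    return [list(by_source.get(sid, [])) for sid in selected_ids]
```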

The correspondence between the set of expanded text fragments and the text means that the expanded text fragments included in the set of expanded text fragments are determined according to the target phrase in the corresponding text.

Optionally, in some embodiments, the extended text segment may be an extended phrase. In other embodiments, the extended text segment may be a complete text including the extended phrase. In other embodiments, the extended text segment may also be a partial text including an extended phrase.

Take the above first text "I want to find a cheap Chinese restaurant" as an example. In some embodiments, the set of extended text fragments corresponding to the text includes the aforementioned second text 1 to second text 4. In other embodiments, the set of extended text fragments corresponding to the text includes "Japanese food", "French food", "affordable" and "low consumption". In other embodiments, the set of extended text fragments corresponding to the text includes "Japanese restaurant", "French restaurant", "affordable Chinese restaurant", and "low-consumption Chinese restaurant".

Correspondingly, the text fragments in the M texts that include the target phrases may be referred to as target text fragments. Similarly, in some embodiments, a target text fragment may be a target phrase. In other embodiments, it may be the complete text including the target phrase. In still other embodiments, it may be a partial text including the target phrase.

Take the above first text "I want to find a cheap Chinese restaurant" as an example. In some embodiments, the target text segment corresponding to the text may be the first text. In other embodiments, the target text segment may include "cheap" and "Chinese food." In other embodiments, the target text segment may include "cheap" and "Chinese restaurant".

403. According to a reference policy network model, the computer device selects one extended text fragment from each of the M extended text fragment sets corresponding to the M texts. For ease of description, an extended text fragment selected in this way is referred to as a candidate text fragment.

In other words, through step 403 the computer device determines one candidate text fragment set according to the reference policy network model. The candidate text fragment set includes M candidate text fragments, each taken from a different one of the M extended text fragment sets.

The computer device may repeat step 403 T times to determine a total of T candidate text fragment sets. T is a positive integer greater than or equal to 1.
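Step 403 repeated T times can be sketched as below. The reference policy network model is abstracted as a hypothetical callable `policy(j, fragments)`. It returns one probability for each fragment in the j-th extended text fragment set; its real inputs, such as the context state, are not modeled here.

```python
import random
from typing import Callable, List, Sequence

# policy(j, fragments) -> probabilities over `fragments`; a stand-in for the
# reference policy network model, whose actual interface is not specified here.
Policy = Callable[[int, Sequence[str]], List[float]]


def sample_candidate_sets(
    fragment_sets: List[List[str]], policy: Policy, t: int, seed: int = 0
) -> List[List[str]]:
    """Repeat step 403 T times; each round picks one fragment from every extended set."""
    rng = random.Random(seed)
    rounds: List[List[str]] = []
    for _ in range(t):
        picks = []
        for j, fragments in enumerate(fragment_sets):
            probs = policy(j, fragments)
            picks.append(rng.choices(fragments, weights=probs, k=1)[0])
        rounds.append(picks)  # one candidate text fragment set of M fragments
    return rounds


def uniform_policy(j: int, fragments: Sequence[str]) -> List[float]:
    """Trivial placeholder policy: every fragment is equally likely."""
    return [1.0 / len(fragments)] * len(fragments)
```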

The values of M and T are preset. Larger values of M and T yield more candidate text fragment sets and a policy network model that selects texts better, at the cost of longer training. Smaller values yield fewer candidate text fragment sets and a policy network model that selects texts less well, but training takes correspondingly less time.

Therefore, M and T can be chosen according to the performance of the computer device, actual requirements, or both. For example, to obtain a better policy network model, larger values of M and T can be chosen. To obtain a policy network model faster, smaller values can be chosen.

In addition, computer devices with different performance may train the policy network model to different degrees in the same amount of time. With the same training algorithm, a higher-performance computer device trains a better policy network model in the same time. Therefore, larger values of M and T can be chosen for a higher-performance computer device, and smaller values for a lower-performance one.

404. The computer device evaluates the selected candidate text fragments according to the initial DST to obtain an evaluation result.

Optionally, in some embodiments, the computer device may perform single-sample evaluation on the selected candidate text fragments according to the initial DST to obtain the evaluation result.

Optionally, in other embodiments, the computer device may perform sample-set evaluation on the candidate text fragment sets according to the initial DST to obtain the evaluation result.

Optionally, in still other embodiments, the computer device may perform both single-sample evaluation and sample-set evaluation according to the initial DST to obtain the evaluation result.

Optionally, in some embodiments, the initial DST may be a DST trained in an existing manner, using the texts in the training text database as training texts for machine learning.

Optionally, in other embodiments, the initial DST may be obtained by training on some of the texts until a preset, relatively low accuracy is reached (for example, lower than 80%).

In single-sample evaluation, the computer device uses the initial DST to predict the state of each selected candidate text fragment. According to the prediction result, it then determines a first reward value for each candidate text fragment. The selected candidate text fragments total M×T (M per selection round over T rounds), so the evaluation result includes M×T first reward values.

If the prediction result for a candidate text fragment meets the preset requirement, the computer device can determine that the first reward value of that fragment is a positive incentive. If it does not meet the preset requirement, the computer device can determine that the first reward value is a negative incentive.

A first reward value that is a positive incentive is greater than one that is a negative incentive.

For example, in some embodiments, the positive-incentive first reward value may be a number greater than 0, such as 1, and the negative-incentive first reward value may be a number less than 0, such as -1.

For another example, in other embodiments, both values may be greater than 0, with the positive-incentive value larger than the negative-incentive value. For example, the positive-incentive first reward value is 10 and the negative-incentive first reward value is 1.

The preset requirement for the prediction result depends on how the expanded phrase in the candidate text fragment was determined.

Consider first a candidate text fragment whose expanded phrase was determined based on the first expansion rule, that is, from a first-type target phrase. This is referred to below as a first-type candidate text fragment. Its label is the slot of the expanded phrase in it. A predicted label that is the same as the actual label does not meet the preset requirement, and a predicted label that differs from the actual label meets it.

In other words, suppose the label that the initial DST predicts for a first-type candidate text fragment is the same as the actual label of the expanded phrase in that fragment. Then the prediction result does not meet the requirement, and the computer device may determine that the corresponding first reward value is a negative incentive. If the predicted label differs from the actual label, the prediction result meets the requirement, and the computer device may determine that the corresponding first reward value is a positive incentive.

Consider next a candidate text fragment whose expanded phrase was determined based on the second expansion rule, that is, from a second-type target phrase. This is referred to below as a second-type candidate text fragment. Its label is the meaning of the expanded phrase in it. If the label that the initial DST predicts for a second-type candidate text fragment is the same as the actual label of the expanded phrase in that fragment, the computer device may determine that the corresponding first reward value is a negative incentive. If the predicted label differs from the actual label, the computer device may determine that the corresponding first reward value is a positive incentive.
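A sketch of single-sample evaluation. The reward values 1 and -1 follow the example in the description. `predict` and `actual_label` are hypothetical callables standing in for the initial DST and for the stored label (slot or word meaning) of each candidate's expanded phrase. For both fragment types as described above, a label the initial DST already predicts correctly yields a negative incentive.

```python
from typing import Callable, List

POSITIVE_INCENTIVE = 1.0   # example value given in the description
NEGATIVE_INCENTIVE = -1.0  # example value given in the description


def first_reward(predicted: str, actual: str) -> float:
    """Negative incentive if the initial DST already predicts the actual label, else positive."""
    return NEGATIVE_INCENTIVE if predicted == actual else POSITIVE_INCENTIVE


def single_sample_rewards(
    rounds: List[List[str]],
    predict: Callable[[str], str],
    actual_label: Callable[[str], str],
) -> List[List[float]]:
    """One first reward value per candidate text fragment: M x T values in total."""
    return [[first_reward(predict(f), actual_label(f)) for f in picks] for picks in rounds]
```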

In sample-set evaluation, the computer device trains the initial DST with a candidate text fragment set to obtain a trained initial DST. The trained DST is referred to below as the reference DST. According to the reference DST, the computer device can then determine a second reward value corresponding to that candidate text fragment set. The second reward value is the sample-set evaluation result for that candidate text fragment set.

The computer device trains the initial DST with the candidate text fragments in a candidate text fragment set in the same way as the existing DST training process, so the details are not described here.

If the prediction accuracy of the initial DST is already very high, for example, higher than 90%, further training is unlikely to improve it much. Therefore, the initial DST may be chosen to have a relatively low prediction accuracy, for example, lower than 90% or even lower than 80%.

Sample-set evaluation according to the initial DST may include the following: the computer device trains the initial DST according to the M candidate text fragment sets, and then determines T second reward values according to the DSTs obtained after training.

Training the initial DST according to the M candidate text fragment sets may include training the initial DST separately with each of T DST training text sets. This yields T trained initial DSTs, that is, T reference DSTs.

The T DST training text sets are determined from the M candidate text fragment sets. Each DST training text set includes M candidate text fragments, one from each of the M candidate text fragment sets. Specifically, the i-th of the T DST training text sets includes M candidate text fragments. The j-th of these M candidate text fragments is the i-th candidate text fragment in the j-th of the M candidate text fragment sets.
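The construction of the T DST training text sets is a transposition of the M candidate text fragment sets, as in this sketch (the function name is hypothetical):

```python
from typing import List


def dst_training_sets(candidate_sets: List[List[str]]) -> List[List[str]]:
    """Turn M candidate text fragment sets (T fragments each) into T DST training text sets.

    candidate_sets[j][i] is the i-th fragment of the j-th candidate set; training set i
    collects candidate_sets[j][i] for j = 0..M-1.
    """
    if not candidate_sets:
        return []
    t = len(candidate_sets[0])
    if any(len(s) != t for s in candidate_sets):
        raise ValueError("every candidate text fragment set must hold T fragments")
    return [[s[i] for s in candidate_sets] for i in range(t)]
```

The result is the same as `[list(row) for row in zip(*candidate_sets)]`. The explicit length check guards against `zip` silently truncating uneven sets.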

The computer device can determine T second reward values according to the T reference DSTs respectively. To determine a second reward value from a reference DST, the computer device checks whether the accuracy of the labels predicted by the reference DST is higher than that of the labels predicted by the initial DST. It then sets the second reward value according to whether the accuracy has improved. If the accuracy has improved, the second reward value may be a positive incentive. If the accuracy has not improved or has decreased, the second reward value is a negative incentive.

A second reward value that is a positive incentive is greater than one that is a negative incentive.

For example, in some embodiments, the positive-incentive second reward value may be a number greater than 0, such as 1, and the negative-incentive second reward value may be a number less than 0, such as -1.

For another example, in other embodiments, both values may be greater than 0, with the positive-incentive value larger than the negative-incentive value. For example, the positive-incentive second reward value is 10 and the negative-incentive second reward value is 1.

The computer device can use the initial DST and the reference DST to perform label prediction on the same set of texts to determine whether the accuracy of the predicted label is improved. This set of texts used to measure the performance of the initial DST and the reference DST (that is, the accuracy of the predicted label) can be called a verification set. Optionally, in some embodiments, the verification set may be a candidate text segment set used to train the initial DST. Optionally, in other embodiments, the verification set may be any candidate text fragment set among the M candidate text fragment sets.
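The second-reward rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the initial DST and a reference DST have already produced one predicted label per verification-set text; the function names and the slot-value label strings are hypothetical. The default rewards follow the +1 / -1 example, and the 10 / 1 variant is obtained by passing other values.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of verification-set texts whose predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need non-empty prediction and label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def second_reward(initial_preds: Sequence[str], reference_preds: Sequence[str],
                  gold: Sequence[str], positive: float = 1.0, negative: float = -1.0) -> float:
    """Positive incentive if the reference DST is more accurate than the initial DST
    on the verification set; negative (reverse) incentive if unchanged or lower."""
    if positive <= negative:
        raise ValueError("the positive incentive must exceed the negative incentive")
    improved = accuracy(reference_preds, gold) > accuracy(initial_preds, gold)
    return positive if improved else negative


gold = ["price=cheap", "area=north", "food=thai", "area=south"]
initial = ["price=cheap", "area=east", "food=thai", "area=east"]     # 2 of 4 correct
reference = ["price=cheap", "area=north", "food=thai", "area=east"]  # 3 of 4 correct
print(second_reward(initial, reference, gold))                        # 1.0
print(second_reward(initial, reference, gold, positive=10, negative=1))  # 10
```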

If the computer device performs single sample evaluation and sample set evaluation at the same time, the evaluation result determined by the computer device includes M×T first reward values and T second reward values.

405. The computer device uses the evaluation result to train a reference strategy network model.

The policy network model can be expressed as:

$$\pi_\theta(s, p') = \frac{f(s, p')}{\sum_{\tilde{p} \in C_p} f(s, \tilde{p})} \qquad (1.1)$$

where π_θ(s, p′) is the probability assigned to the candidate text segment p′ given the context state s; s is the vector representation extracted from the triple <x, y, p> and the candidate text p′; p is the target text fragment; and f(s, p′) is computed by a fully connected network and represents the probability that p is replaced by p′. Because each target text segment can correspond to multiple extended text segments, formula 1.1 expresses the policy network model in normalized form: C_p is the set of all candidate text fragments corresponding to the target text fragment p, p̃ denotes any candidate text fragment in C_p, and the denominator is the sum of the values computed by the fully connected network for all candidates in C_p, that is, the sum of the probabilities that p is replaced by each candidate in C_p.
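Formula 1.1 can be illustrated with a tiny Python sketch. The text does not specify the architecture or output activation of the fully connected network, so this sketch assumes a single linear layer followed by an exponential, which keeps every f(s, p′) positive; the names `score` and `policy` and the 2-dimensional state vectors are hypothetical.

```python
import math
from typing import List, Sequence


def score(weights: Sequence[float], bias: float, state: Sequence[float]) -> float:
    """f(s, p'): a one-layer fully connected scorer. The exponential output keeps
    every score positive (assumed activation; the text does not specify one)."""
    return math.exp(sum(w * x for w, x in zip(weights, state)) + bias)


def policy(weights: Sequence[float], bias: float,
           candidate_states: List[Sequence[float]]) -> List[float]:
    """pi_theta(s, p') for every candidate p' in C_p (formula 1.1):
    each candidate's score divided by the sum of scores over C_p."""
    scores = [score(weights, bias, s) for s in candidate_states]
    total = sum(scores)
    return [s / total for s in scores]


# Two candidates in C_p, each described by a hypothetical 2-d state vector s.
probs = policy([1.0, 0.0], 0.0, [[1.0, 0.0], [0.0, 1.0]])
print(probs)  # roughly [0.731, 0.269]
```

With an exponential output the normalization in formula 1.1 is exactly a softmax over C_p, and the bias cancels out; any other positive activation (for example a sigmoid) would also satisfy the normalized form.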

It can be seen that a larger reward value (the first reward value or the second reward value) indicates that the prediction result better meets the requirements, that is, the candidate text segment selected by the reference strategy network model is more suitable for training the DST. Therefore, the reference strategy network model can be trained to maximize the expected reward, so as to obtain a better strategy network model.

Optionally, in some embodiments, the computer device may train the parameters of the reference strategy network model through gradient learning, that is, by following the gradient of the expected reward with respect to the parameters θ of the reference strategy network.

This gradient can be approximated as:

$$\nabla_\theta J(\theta) \approx \sum_{i=1}^{T} \sum_{j=1}^{M} \left(R_i + r_{i,j}\right) \nabla_\theta \log \pi_\theta(s'_{i,j}, p'_{i,j})$$

where ∇_θ denotes the gradient with respect to θ, the parameters of the reference strategy network π_θ; s′_{i,j} denotes the state obtained from the j-th sampling of the i-th sample of the original sample set; p′_{i,j} denotes the replacement text obtained from the j-th sampling of the i-th sample of the original sample set; R_i denotes the reward value from the evaluation of sample set i (that is, the second reward value); and r_{i,j} denotes the reward value from the evaluation of the j-th sampling in sample set i (that is, the first reward value). The original sample set refers to the set of M extended text fragments determined in step 402. The i-th sample refers to the i-th DST training text set among the T DST training text sets, and the j-th sampling of the i-th sample is the j-th candidate text fragment in the i-th DST training text set.
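The policy-gradient (REINFORCE-style) estimate described in this step can be sketched in Python. This is a minimal sketch under loudly stated assumptions: f(s, p′) is taken to be the exponential of a linear score w·s, so formula 1.1 becomes a softmax; the two reward values are combined additively as R_i + r_{i,j}; and when only one kind of evaluation is performed, the missing reward is simply set to 0. The names `softmax_probs`, `grad_log_pi`, and `policy_gradient` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax_probs(w: Sequence[float], states: Sequence[Sequence[float]]) -> List[float]:
    """pi_theta over C_p when f(s, p') = exp(w . s): a softmax of linear scores."""
    logits = [sum(wi * xi for wi, xi in zip(w, s)) for s in states]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(w: Sequence[float], states: Sequence[Sequence[float]], chosen: int) -> List[float]:
    """Gradient of log pi_theta(s', p') w.r.t. w: s_chosen - sum_k pi_k * s_k."""
    probs = softmax_probs(w, states)
    expected = [sum(p * s[d] for p, s in zip(probs, states)) for d in range(len(w))]
    return [states[chosen][d] - expected[d] for d in range(len(w))]


def policy_gradient(w: Sequence[float],
                    samples: Sequence[Sequence[Tuple[Sequence[Sequence[float]], int]]],
                    second_rewards: Sequence[float],
                    first_rewards: Sequence[Sequence[float]]) -> List[float]:
    """Sum over i (T samples) and j (M samplings) of (R_i + r_ij) * grad log pi.

    samples[i][j] = (candidate states of C_p, index of the chosen candidate).
    """
    grad = [0.0] * len(w)
    for i, row in enumerate(samples):
        for j, (states, chosen) in enumerate(row):
            reward = second_rewards[i] + first_rewards[i][j]
            g = grad_log_pi(w, states, chosen)
            grad = [a + reward * b for a, b in zip(grad, g)]
    return grad


# One sample (T = 1) with one sampling (M = 1): candidate 0 chosen, reward 1.
states = [[1.0, 0.0], [0.0, 1.0]]
print(policy_gradient([0.0, 0.0], [[(states, 0)]], [1.0], [[0.0]]))  # [0.5, -0.5]
```

A positive total reward pushes probability toward the chosen candidate, and a negative one pushes it away, which is the behaviour the reward design in this section relies on.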

Besides plain gradient learning, the computer device may also use gradient-based optimizers such as stochastic gradient descent (SGD) or adaptive moment estimation (Adam) to train the reference strategy network model.
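Because the goal is to maximize the expected reward, the optimizer steps move the parameters along the estimated gradient (equivalently, descent on the negative reward). The following is a minimal Python sketch of one SGD step and one Adam step written as gradient ascent; it uses the standard Adam update with bias correction, and the function names and default hyperparameters are illustrative assumptions rather than values from this application.

```python
import math
from typing import List, Sequence, Tuple


def sgd_ascent_step(theta: Sequence[float], grad: Sequence[float], lr: float = 1e-2) -> List[float]:
    """Plain stochastic gradient ascent on the estimated reward."""
    return [p + lr * g for p, g in zip(theta, grad)]


def adam_ascent_step(theta: Sequence[float], grad: Sequence[float],
                     m: Sequence[float], v: Sequence[float], t: int,
                     lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                     eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam step that moves theta up the estimated reward gradient.
    m and v are running first/second moment estimates; t >= 1 is the step count."""
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first moment
        vi = beta2 * vi + (1 - beta2) * g * g      # second moment
        m_hat = mi / (1 - beta1 ** t)              # bias corrections
        v_hat = vi / (1 - beta2 ** t)
        new_theta.append(p + lr * m_hat / (math.sqrt(v_hat) + eps))  # "+" for ascent
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v


theta, m, v = adam_ascent_step([0.0], [1.0], [0.0], [0.0], t=1, lr=0.1)
print(theta)  # approximately [0.1]
```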

After executing steps 401 to 405 in sequence, the computer device can execute steps 401 to 405 again. In other words, the computer device can execute the method shown in FIG. 4 cyclically in the order of step 401 to step 405. If the computer device determines that the number of cycles is greater than a preset number N, the cycle can be stopped, and the reference strategy network model trained in the N-th execution of step 405 is determined to be the strategy network model used to select at least one second text from the P second texts as training text for machine learning. The computer device can set an initial policy network model and use it to select candidate text segments in the first cycle; that is, in the first of the N cycles, the reference strategy network model used in step 403 is the initial strategy network model. As described above, the computer device can execute step 403 T times in a loop, and the same reference strategy network model is used in all T executions. In the second to N-th cycles, the reference strategy network model used in the T executions of step 403 is the one trained in the previous execution of step 405. In other words, in the t-th execution of the method shown in FIG. 4, the reference policy network model used in step 403 is the reference policy network model determined in step 405 of the (t-1)-th execution, where t is a positive integer greater than or equal to 2 and less than or equal to N.
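The cycle structure (N outer cycles, T samplings per cycle, the same reference policy within a cycle, and the updated policy carried into the next cycle) can be sketched as a small Python driver. This is a minimal sketch: the four callables `prepare_fn`, `sample_fn`, `evaluate_fn`, and `update_fn` are hypothetical stand-ins for steps 401 to 402, 403, 404, and 405, and the toy run uses a number in place of a real policy.

```python
from typing import Any, Callable, List


def train_policy(initial_policy: Any, n_cycles: int, t_samplings: int,
                 prepare_fn: Callable[[], Any],
                 sample_fn: Callable[[Any, Any], Any],
                 evaluate_fn: Callable[[List[Any]], Any],
                 update_fn: Callable[[Any, Any], Any]) -> Any:
    """Repeat steps 401 to 405 for n_cycles cycles and return the policy
    trained in the last (N-th) execution of step 405."""
    if n_cycles < 1 or t_samplings < 1:
        raise ValueError("N and T must be positive integers")
    reference = initial_policy                    # cycle 1 uses the initial policy
    for _ in range(n_cycles):
        candidates = prepare_fn()                 # steps 401-402: texts and expansions
        # Step 403, T times, all with this cycle's reference policy.
        samples = [sample_fn(reference, candidates) for _ in range(t_samplings)]
        result = evaluate_fn(samples)             # step 404
        reference = update_fn(reference, result)  # step 405: next cycle's reference
    return reference


# Toy run: the "policy" is a number; each update adds the number of samples.
seen = []


def toy_sample(ref: int, cands: Any) -> int:
    seen.append(ref)  # record which reference policy each sampling used
    return ref


final = train_policy(0, n_cycles=3, t_samplings=2, prepare_fn=lambda: None,
                     sample_fn=toy_sample, evaluate_fn=len,
                     update_fn=lambda ref, res: ref + res)
print(final, seen)  # 6 [0, 0, 2, 2, 4, 4]
```

The recorded `seen` list makes the reuse rule visible: both samplings in a cycle see the same policy, and each cycle starts from the previous cycle's update.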

As described above, in the process of determining the second texts, some extended texts determined by the computer device may not be appropriate. For example, the meanings of some phrases in a phrase set determined by using the paraphrase database may differ from, or even be opposite to, the meaning of the phrase to which the phrase set corresponds. Second texts generated from such phrases are not suitable for training the DST. The strategy network model determined by the method shown in FIG. 4 can be used to filter the second texts and remove those that are not suitable for training the DST. In this way, the quality of the text used for training the DST can be improved, thereby improving the performance of the trained DST.

The method of training the strategy network model shown in FIG. 4 is further described below, taking one extended text fragment set as an example.

Assume that the first text in the method shown in FIG. 3 is one of the M texts determined in step 401 of the method shown in FIG. 4. Assume that the extended text segment is a complete text including the extended phrase. Then, P second texts obtained by expanding the first text can be used as an expanded text fragment set.

501. The computer device uses the reference strategy network model to select a second text from the P second texts.

The computer device can execute step 501 T times; in other words, the computer device determines T second texts in total from the P second texts. P may be greater than, equal to, or less than T, and duplicate texts may appear among the T second texts. The T second texts are T candidate text segments, all of which belong to one of the M candidate text fragment sets.
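Executing step 501 T times amounts to sampling with replacement from the P second texts according to the reference policy's probabilities, which is why P may be smaller than T and duplicates can occur. The following is a minimal Python sketch using the standard-library `random.Random.choices`; the function name `sample_second_texts`, the example texts, and the probabilities are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_second_texts(second_texts: Sequence[str], probs: Sequence[float], t: int,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Execute step 501 t times: draw t texts, with replacement, from the P second
    texts, each weighted by the reference policy's selection probability."""
    if len(second_texts) != len(probs):
        raise ValueError("exactly one probability per second text is required")
    if rng is None:
        rng = random.Random()
    return rng.choices(list(second_texts), weights=list(probs), k=t)


# P = 3 hypothetical second texts, T = 5 samplings (P < T, so duplicates are guaranteed).
texts = ["book a cheap hotel", "book an inexpensive hotel", "book a budget hotel"]
picked = sample_second_texts(texts, [0.5, 0.3, 0.2], t=5, rng=random.Random(0))
print(picked)
```

Passing a seeded `random.Random` makes a run reproducible; omitting it draws from a fresh generator.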

502. The computer device evaluates the T second texts according to the initial DST, and obtains an evaluation result.

Optionally, in some embodiments, the computer device evaluating the T second texts according to the initial DST includes: the computer device performs a single-sample evaluation on the T second texts.

Optionally, in other embodiments, the computer device evaluating the T second texts according to the initial DST includes: the computer device performs a sample set evaluation according to the T second texts.

Optionally, in other embodiments, the computer device evaluating the T second texts according to the initial DST includes: the computer device performs both a single-sample evaluation on the T second texts and a sample set evaluation according to the T second texts.

The single-sample evaluation of the T second texts by the computer device may include: the computer device uses the initial DST to predict the state of each of the T second texts to obtain T prediction results, and determines T first reward values according to the T prediction results, where the T first reward values correspond one-to-one to the T second texts. In other words, the j-th first reward value among the T first reward values is determined according to the initial DST's prediction result for the j-th second text among the T second texts. For the specific implementation of single-sample evaluation, refer to the method shown in FIG. 4; details are not repeated here.

The evaluation of the sample set by the computer device according to the T second text segments may include: the computer device uses the T second texts to train the initial DST; and according to the initial DST after training, determining T second reward values.

Specifically, the computer device using the T second texts to train the initial DST may include: the computer device training the initial DST with T DST training text sets. The T second texts belong to the T DST training text sets respectively. In other words, the i-th second text in the T second texts is a text in the i-th DST training text set in the T DST training text sets.

For the specific implementation manner of the computer device determining the second reward value according to the T DST training text sets, refer to the method shown in FIG. 4, and it is unnecessary to repeat it here.

If the computer device only performs single-sample evaluation, the evaluation result includes T first reward values.

If the computer device only performs sample set evaluation, the evaluation result includes T second reward values.

If the computer device performs single-sample evaluation and sample-set evaluation, there are T first reward values and T second reward values in the evaluation result.

503. The computer device may train the reference strategy network model according to the evaluation result.

It can be understood that the evaluation result used in step 503 is either the same as, or a subset of, the evaluation result determined in step 404 in FIG. 4.

Specifically, if only the sample set evaluation is performed in step 404, then only the sample set evaluation is performed in step 502, and the evaluation result used in step 503 is the same as the evaluation result determined in step 404.

If single-sample evaluation is performed in step 404, then single-sample evaluation is also performed in step 502, and the evaluation result determined in step 404 includes the evaluation result used in step 503. As described above, when single-sample evaluation is performed, the evaluation result determined in step 404 includes M×T first reward values, while the evaluation result used in step 503 includes T first reward values, each of which is one of the corresponding M×T first reward values determined in step 404.

After the policy network model has been determined according to the method shown in FIG. 4, the computer device can use it to select part of the texts in the first enhanced database to form a second enhanced database, and use the second enhanced database to train the DST. For ease of description, the method in FIG. 3 of determining P second texts based on the first text is referred to as a coarse-grained data enhancement strategy, and the method of selecting texts using the policy network model determined in FIG. 4 is referred to as a fine-grained data enhancement strategy.

Take a training text database of 1,000 sentences and a first enhanced database of 20,000 sentences as an example. Based on the coarse-grained data enhancement strategy, the computer device expands the 1,000 sentences in the training text database into the 20,000 sentences of the first enhanced database. After that, based on the fine-grained data enhancement strategy, the computer device can select part of the texts in the first enhanced database to form a second enhanced database; in other words, the computer device can use the strategy network model to select part of the texts in the first enhanced database to form the second enhanced database. Assume that the computer device uses the fine-grained data enhancement strategy to select 12,000 of the 20,000 sentences in the first enhanced database. These 12,000 sentences form the second enhanced database. After the second enhanced database is determined, the computer device can use all the sentences in the second enhanced database and all the sentences in the training text database as training text for machine learning, and train to obtain the DST. The DST can implement the function of DST 102 in the dialog system 100 shown in FIG. 1 and the function of the DST shown in FIG. 2.
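The two-stage pipeline in this example (coarse-grained expansion, fine-grained selection, then merging with the original texts) can be sketched in Python to check the sentence counts. This is a minimal sketch: `toy_expand` is a hypothetical stand-in for the expansion of FIG. 3 and `toy_keep` is a hypothetical stand-in for the trained policy network's selection, chosen only so that 20 expansions per sentence and 12 kept per sentence reproduce the numbers in the text.

```python
from typing import Callable, List, Sequence


def build_training_texts(training_db: Sequence[str],
                         expand: Callable[[str], List[str]],
                         keep: Callable[[str], bool]) -> List[str]:
    """Coarse-grained expansion (first enhanced database), fine-grained selection
    (second enhanced database), then merge with the original training texts."""
    first_enhanced = [t for text in training_db for t in expand(text)]
    second_enhanced = [t for t in first_enhanced if keep(t)]
    return list(training_db) + second_enhanced


def toy_expand(sentence: str) -> List[str]:
    """Stand-in for FIG. 3: 20 hypothetical variants per sentence."""
    return [f"{sentence}-v{k}" for k in range(20)]


def toy_keep(text: str) -> bool:
    """Stand-in for the policy network: keep 12 of every 20 variants."""
    return int(text.rsplit("-v", 1)[1]) < 12


db = [f"s{i}" for i in range(1000)]
print(len(build_training_texts(db, toy_expand, toy_keep)))  # 13000
```

The result is 1,000 original sentences plus 12,000 selected sentences, that is, 13,000 training texts for the DST.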

It can be seen that, compared with existing DST training solutions, the method of this application can expand the training text used for training the DST from 1,000 sentences to 13,000 sentences (the 1,000 sentences of the training text database plus the 12,000 sentences of the second enhanced database). Increasing the number of training samples used for training the DST can improve the performance of the trained DST, so that the DST can more accurately determine the slots and slot values in the user's input, improving both the accuracy of the intent determined by the DST and the accuracy of determining the slot values of unfilled slots.

Fig. 6 is a structural block diagram of a computer device provided according to an embodiment of the present application. The computer device 600 shown in FIG. 6 includes: an acquiring unit 601 and a processing unit 602.

The acquiring unit 601 is configured to acquire a first text, the first text is a text in a training text database, and the first text includes at least two phrases.

The processing unit 602 is configured to determine at least one target phrase from the first text.

The processing unit 602 is further configured to determine P second texts according to the at least one target phrase, where each of the P second texts includes an extended phrase determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1.

The processing unit 602 is further configured to train, through machine learning, a dialog state tracking classifier based on the first text and the P second texts, where the dialog state tracking classifier is used to track the current state of the dialog according to the acquired dialog of the user.

The acquiring unit 601 may be implemented by a transceiver, and the processing unit 602 may be implemented by a processor. For the specific functions and beneficial effects of the acquiring unit 601 and the processing unit 602, refer to the methods shown in FIG. 3 to FIG. 5; details are not repeated here.

Fig. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application. The computer device 700 shown in FIG. 7 includes a processor 701, a memory 702, and a transceiver 703.

The processor 701, the memory 702, and the transceiver 703 communicate with each other through an internal connection path to transfer control and/or data signals.

The method disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 701. The processor 701 may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the foregoing method can be completed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 702, and the processor 701 reads the instructions in the memory 702 and completes the steps of the foregoing method in combination with its hardware.

Optionally, in some embodiments, the memory 702 may store instructions for performing the steps performed by the computer device in the methods shown in FIG. 3 to FIG. 5. The processor 701 can execute the instructions stored in the memory 702 in combination with other hardware (for example, the transceiver 703) to complete the steps of the computer device in the methods shown in FIG. 3 to FIG. 5. For the specific working process and beneficial effects, see the descriptions in the embodiments shown in FIG. 3 to FIG. 5.

An embodiment of the present application also provides a chip, which includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip. The chip can execute the method of the computer device in the above method embodiment.

The embodiment of the present application also provides a computer-readable storage medium on which an instruction is stored, and the method of the computer device in the foregoing method embodiment is executed when the instruction is executed.

The embodiment of the present application also provides a computer program product containing instructions that, when executed, execute the method of the computer device in the foregoing method embodiment.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other manners. For example, the described device embodiments are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A method for training a dialog state tracking classifier, characterized in that the method includes:

Acquiring a first text, where the first text is a text in a training text database, and the first text includes at least two phrases;

Determine at least one target phrase from the first text;

According to the at least one target phrase, determine P second texts, each of the P second texts includes an expanded phrase, the expanded phrase is determined based on one of the at least one target phrase , P is a positive integer greater than or equal to 1;

According to the first text and the P second texts, a dialogue state tracking classifier is trained through machine learning, and the dialogue state tracking classifier is used to track the state of the dialogue according to the acquired dialogue of the user.
The method according to claim 1, wherein the determining P second texts according to the at least one target phrase comprises:

Determine K1 first phrase sets corresponding to K1 slots, wherein the K1 slots are respectively the slots of K1 target phrases in the at least one target phrase, and K1 is a positive integer greater than or equal to 1;

Determine P1 second texts, where the extended phrases included in the P1 second texts belong to the K1 first phrase sets, the P second texts include the P1 second texts, and P1 is a positive integer greater than or equal to 1.
The method according to claim 1 or 2, wherein the determining P second texts according to the at least one target phrase comprises:

Determine K2 second phrase sets corresponding to K2 word meanings, wherein the K2 word meanings are respectively the word meanings of K2 target phrases, and K2 is a positive integer greater than or equal to 1;

Determine P2 second texts, where the expanded phrases included in the P2 second texts belong to the K2 second phrase sets, the P second texts include the P2 second texts, and P2 is a positive integer greater than or equal to 1.
The method according to any one of claims 1 to 3, wherein the training a dialogue state tracking classifier according to the first text and the P second texts through machine learning comprises:

Determine at least one second text from the P second texts according to the policy network model;

Using the first text and the at least one second text as the training text of the machine learning to train the dialogue state tracking classifier.
The method according to claim 4, wherein the method further comprises:

According to the reference strategy network model, determine T second texts from the P second texts, where T is a positive integer greater than or equal to 1;

Determine the evaluation result according to the initial dialog state tracking classifier and the T second texts;

According to the evaluation result, the reference strategy network model is trained to obtain the strategy network model.
The method according to claim 5, wherein the determining the evaluation result according to the initial dialog state tracking classifier and the T second texts comprises:

Use the initial dialogue state tracking classifier to predict the state of each second text in the T second texts to obtain T prediction results, and determine T first reward values according to the T prediction results; or

Use the T second texts to train the initial dialogue state tracking classifier; and determine T second reward values according to the trained initial dialogue state tracking classifier.
A computer device, characterized in that the computer device includes:

An obtaining unit, configured to obtain a first text, where the first text is a text in a training text database, and the first text includes at least two phrases;

A processing unit, configured to determine at least one target phrase from the first text;

The processing unit is further configured to determine P second texts according to the at least one target phrase, where each second text in the P second texts includes an extended phrase, the extended phrase is determined based on the at least one target phrase, and P is a positive integer greater than or equal to 1;

The processing unit is further configured to train a dialogue state tracking classifier through machine learning according to the first text and the P second texts, and the dialogue state tracking classifier is used to track the state of the conversation according to the acquired conversation of the user.
The computer device according to claim 7, wherein the processing unit is specifically configured to determine K1 first phrase sets corresponding to K1 slots, wherein the K1 slots are respectively the slots of K1 target phrases in the at least one target phrase, and K1 is a positive integer greater than or equal to 1;

Determine P1 second texts, where the extended phrases included in the P1 second texts belong to the K1 first phrase sets, the P second texts include the P1 second texts, and P1 is a positive integer greater than or equal to 1.
The computer device according to claim 7 or 8, wherein the processing unit is specifically configured to determine K2 second phrase sets corresponding to K2 word meanings, wherein the K2 word meanings are respectively the word meanings of K2 target phrases, and K2 is a positive integer greater than or equal to 1;

Determine P2 second texts, where the expanded phrases included in the P2 second texts belong to the K2 second phrase sets, the P second texts include the P2 second texts, and P2 is a positive integer greater than or equal to 1.
The computer device according to any one of claims 7 to 9, wherein the processing unit is specifically configured to determine at least one second text from the P second texts according to a policy network model;

Using the first text and the at least one second text as the training text of the machine learning to train the dialogue state tracking classifier.
The computer device according to claim 10, wherein the processing unit is further configured to determine T second texts from the P second texts according to the reference strategy network model, where T is a positive integer greater than or equal to 1;

Determine the evaluation result according to the initial dialogue state tracking classifier and the T second texts;

According to the evaluation result, the reference strategy network model is trained to obtain the strategy network model.
The computer device according to claim 11, wherein the processing unit is specifically configured to use the initial dialog state tracking classifier to predict the state of each second text in the T second texts to obtain T prediction results, and determine T first reward values according to the T prediction results; or

Use the T second texts to train the initial dialogue state tracking classifier; and determine T second reward values according to the trained initial dialogue state tracking classifier.
A computer device, characterized in that the computer device includes a memory and a processor, the memory stores instructions, and the processor is used to call the instructions in the memory to execute the method according to any one of claims 1 to 6.
A computer-readable storage medium, wherein the computer-readable storage medium stores instructions for implementing the method according to any one of claims 1 to 6.