WO2020228732A1 - Method for training dialog state tracker, and computer device - Google Patents

Method for training dialog state tracker, and computer device

Info

Publication number
WO2020228732A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
texts
phrase
computer device
training
Prior art date
Application number
PCT/CN2020/089988
Other languages
French (fr)
Chinese (zh)
Inventor
尹伊淳
尚利峰
蒋欣
陈晓
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2020228732A1 publication Critical patent/WO2020228732A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Definitions

  • This application relates to the field of artificial intelligence, and more specifically, to a method and computer equipment for training a conversation state tracking classifier.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Natural language processing is an important branch in the field of artificial intelligence.
  • Dialogue system is an application direction of natural language processing.
  • Common dialogue systems include automatic dialogue robots and voice assistants. Different from traditional retrieval, the text input by the user in the dialogue system is usually a complete sentence, and the text input by the user is usually a colloquial sentence. Therefore, the dialogue system needs to understand and track the user's needs according to the text input by the user, and determine the reply content according to the user's needs.
  • The dialogue state tracker (DST) is responsible for understanding and tracking the needs of users during the dialogue, and for determining and outputting the dialogue state.
  • The dialogue state output by DST represents the user's needs.
  • The dialogue system can determine the reply content according to the dialogue state output by DST.
  • Machine learning is now a common way to obtain a DST.
  • the machine learning process requires high-quality training text.
  • high-quality training text is difficult to collect.
  • the number of high-quality training texts that can be collected currently is small.
  • the current high-quality training texts that can be collected involve fewer scenes. Therefore, the diversity of training samples is also poor. Due to the small number and poor diversity of training texts used for machine learning, the performance of DST obtained through machine learning will not be particularly high.
  • The present application provides a method and computer equipment for training a dialogue state tracking classifier, so as to improve the performance of the dialogue state tracking classifier.
  • An embodiment of the present application provides a method for training a dialogue state tracking classifier, the method including: obtaining a first text, where the first text is a text in a training text database and includes at least two phrases; determining at least one target phrase from the first text; determining P second texts based on the at least one target phrase, where each of the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1; and training a dialogue state tracking classifier through machine learning according to the first text and the P second texts, where the dialogue state tracking classifier is used to track the state of a dialogue based on an acquired dialogue of a user.
  • The above technical solution can increase the number of training text samples used to train the dialogue state tracking classifier and improve the performance of the trained classifier, so that the classifier can more accurately determine the slot-slot value pairs in the user's expression content. It also improves the accuracy of the intent determined by the classifier and the accuracy of determining slots with unfilled slot values.
  • Determining the P second texts based on the at least one target phrase includes: determining K1 first phrase sets corresponding to K1 slots, where the K1 slots are the slots of K1 target phrases in the at least one target phrase, and K1 is a positive integer greater than or equal to 1; and determining P1 second texts, where the extended phrases included in the P1 second texts belong to the K1 first phrase sets, the P second texts include the P1 second texts, and P1 is a positive integer greater than or equal to 1.
  • the number of training texts used for training the dialog state tracking classifier is increased by changing the slot value of the same slot.
  • Determining the P second texts based on the at least one target phrase includes: determining K2 second phrase sets corresponding to K2 word meanings, where the K2 word meanings are the word meanings of K2 target phrases, and K2 is a positive integer greater than or equal to 1; and determining P2 second texts, where the extended phrases included in the P2 second texts belong to the K2 second phrase sets, the P second texts include the P2 second texts, and P2 is a positive integer greater than or equal to 1.
  • the above technical solution is based on the meaning of the phrase to increase the number of training texts used to train the dialogue state tracking classifier.
  • Training the dialogue state tracking classifier through machine learning according to the first text and the P second texts includes: determining at least one second text from the P second texts according to a policy network model; and using the first text and the at least one second text as the training text for the machine learning to train the dialogue state tracking classifier.
  • the above technical solution can filter the second text and filter out the second text that is not suitable for training the dialogue state tracking classifier. In this way, the quality of the text used for training the dialogue state tracking classifier can be improved, thereby improving the performance of the trained dialogue state tracking classifier.
  • The method further includes: determining T second texts from the P second texts according to a reference policy network model, where T is a positive integer greater than or equal to 1; determining an evaluation result according to an initial dialogue state tracking classifier and the T second texts; and training the reference policy network model according to the evaluation result to obtain the policy network model.
  • Determining the evaluation result according to the initial dialogue state tracking classifier and the T second texts includes: using the initial dialogue state tracking classifier to predict the state of each of the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results; or training the initial dialogue state tracking classifier using the T second texts, and determining T second reward values according to the trained initial dialogue state tracking classifier.
  • Alternatively, determining the evaluation result according to the initial dialogue state tracking classifier and the T second texts includes: using the initial dialogue state tracking classifier to predict the state of each of the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results; and training the initial dialogue state tracking classifier using the T second texts, and determining T second reward values according to the trained initial dialogue state tracking classifier.
  • An embodiment of the present application provides a method for determining the state of a dialogue, the method including: acquiring a user's dialogue; and using a dialogue state tracking classifier to track the state of the dialogue, where the dialogue state tracking classifier is determined based on the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer device, which includes a unit for executing the method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • the computer device of the third aspect may be a computer device, or may be a component (such as a chip or a circuit, etc.) that can be used in a computer device.
  • an embodiment of the present application provides a computer device, which includes a unit for executing the method described in the second aspect.
  • the computer device of the fourth aspect may be a computer device, or may be a component (for example, a chip or a circuit, etc.) used in a computer device.
  • An embodiment of the present application provides a computer device that includes a memory and a processor, where the memory stores instructions, and the processor invokes the instructions in the memory to execute the method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • an embodiment of the present application provides a computer device including a memory and a processor, the memory stores instructions, and the processor invokes the instructions in the memory to execute the method described in the second aspect.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions for implementing the method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores instructions for implementing the method described in the second aspect.
  • This application provides a computer program product containing instructions which, when the computer program product runs on a computer, cause the computer to execute the method described in the first aspect or any one of the possible implementations of the first aspect.
  • this application provides a computer program product containing instructions, which when the computer program product runs on a computer, causes the computer to execute the method described in the second aspect.
  • Figure 1 is a schematic diagram of a common dialogue system.
  • Figure 2 is a schematic diagram of the work of DST.
  • Fig. 3 is a schematic flowchart of training DST provided according to an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of training a policy network model provided according to an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a method for training the policy network model by using the P second texts.
  • Fig. 6 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • Fig. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • References in this specification to "one embodiment" or "some embodiments" mean that one or more embodiments of the present application include a specific feature, structure, or characteristic described in combination with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • The terms "including", "comprising", "having" and their variations all mean "including but not limited to" unless otherwise specifically emphasized.
  • Figure 1 is a schematic diagram of a common dialogue system.
  • The dialogue system 100 may include an automatic speech recognition (ASR) module 101, a dialogue state tracker (DST) 102, a dialogue policy learning (DPL) module 103, a natural language generation (NLG) module 104, and a text-to-speech (TTS) module 105.
  • the main function of the ASR module is to recognize the user's voice as text content.
  • the ASR module can learn what the user is saying, but it cannot understand what the user means.
  • the understanding of the semantics will be handled by the NLU module.
  • DST can be used to understand the user's intent and perform slot analysis.
  • DST can analyze the content shown in Table 1.
  • Intent can be understood as a classifier that determines which type of sentence the user expresses, and then the program corresponding to this type will do a special analysis.
  • the "program corresponding to this type” can be a bot.
  • For example, the user says: "Play me a happy song."
  • DST judges that the user's intent category is music, so it calls the music bot.
  • The bot recommends a song and plays it for the user. If the user feels the song is not right and says "change to another song", the music bot continues to serve the user until the user raises a different request and the intent is no longer music, at which point the system switches to another bot to serve the user.
  • DST needs to further understand the content of the dialogue. For simplicity, it can focus on the most essential parts and ignore the rest. These most important parts can be called slots.
  • The content of a slot can be referred to as the slot value.
  • a slot is included in the sentence "Looking for a restaurant”.
  • the slot is "Type of food” and the corresponding slot value is "Chinese food”.
  • the starting point of the design is to define slots. In other words, the designer needs to design which slots are required to complete the content of the user query.
  • the designer can design the following slots: location, price, request, food type.
  • the dialogue system needs to know the slot value of the above slot to be able to provide users with appropriate query results.
  • DST can also be used to track conversation status.
  • the dialogue state can be understood as the slot filling of the current task.
  • the filling status of the slot may include whether the slot has been filled (that is, whether there is a corresponding slot value), and the filled slot value.
  • The DST can then determine which of the slots corresponding to the intent have no corresponding slot value, and predict the probability of each existing slot value.
  • the NLU module can determine that the user's intention is “looking for a restaurant”.
  • the slots corresponding to the intent are "location”, “price”, “request”, and "food type”.
  • DST can determine that there is only one slot value of "food type” in the user's expression based on the slot corresponding to the intention of "finding a restaurant”. In this case, DST can determine the missing slot value of the following slots: "location”, "price”, "request”. DST can also determine the probability of "Chinese food.”
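  • The slot-filling check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `INTENT_SLOTS` table and the `missing_slots` function are hypothetical names, and the slot list is copied from the restaurant example.

```python
from typing import Dict, List

# Hypothetical slot schema; the slots are the ones named in the restaurant example.
INTENT_SLOTS: Dict[str, List[str]] = {
    "find a restaurant": ["location", "price", "request", "food type"],
}


def missing_slots(intent: str, filled: Dict[str, str]) -> List[str]:
    """Return the slots of the intent that have no slot value yet."""
    return [slot for slot in INTENT_SLOTS[intent] if slot not in filled]


# The user's utterance only supplies a value for the "food type" slot.
filled = {"food type": "Chinese food"}
print(missing_slots("find a restaurant", filled))
```

  • The returned list is what a downstream module such as the DPL module could use to decide which information to ask the user for.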
  • the embodiment of the application provides a method of how to train DST.
  • The main function of the DPL module is to determine the follow-up processing strategy according to the dialogue state output by the DST. Again take the user utterance "My mother likes to eat Chinese food. Is there anything you can recommend?" as an example. According to the dialogue state output by DST, the DPL module can find that the slot values of the three slots "location", "price" and "request" are missing. Therefore, the DPL module can trigger the "request restaurant information" action and pass this action to the NLG module.
  • The main function of the NLG module is to generate dialogue. For example, after the DPL module passes the "request restaurant information" action to the NLG module, the NLG module can generate the following content: "I found 10 Chinese food restaurants. Where do you want to eat?".
  • the main function of the TTS module is to broadcast conversations to users.
  • The TTS module can convert the content output by the NLG module into speech, and broadcast the dialogue generated by the dialogue system to the user through the output device.
  • the dialogue system 100 shown in FIG. 1 is just a common dialogue system that can be applied to the technical solutions provided by this application.
  • other dialog systems can also apply the technical solutions provided in this application.
  • the user can talk to the dialogue system through text.
  • the dialogue system may not include the ASR module and the TTS module.
  • the dialogue system may not include the ASR module but include the TTS module.
  • the user can enter the dialogue by text, and the dialogue system can reply by voice.
  • each module in the dialogue system 100 shown in FIG. 1 is only a possible way of division.
  • each module in the dialogue system can also have other divisions.
  • one module of the system 100 shown in FIG. 1 can be divided into multiple modules according to functions, and different modules have different functions.
  • two or more modules in the system 100 shown in FIG. 1 can be combined into one module.
  • FIG. 2 is a schematic diagram of the work of DST.
  • the DST 200 as shown in FIG. 2 includes a semantic encoding module 201, a semantic encoding module 202, a semantic fusion module 203, a prediction module 204, and a status update module 205.
  • Dialogue system I found 10 Chinese restaurants. Where would you like to eat?
  • Dialogue system Mama Zhang's Sichuan Flavor Museum is located at No. 100 Ande Road.
  • the semantic encoding module 201 may be used to determine the semantic vector according to the reply content of the last round of dialogue system.
  • the semantic encoding module 202 may be used to determine the semantic vector according to the content expressed by the current round of users.
  • the semantic vector determined by the semantic encoding module 201 is referred to as semantic vector 1
  • the semantic vector determined by the semantic encoding module 202 is referred to as semantic vector 2.
  • the semantic fusion module 203 can be used to obtain the semantic vector 1 determined by the semantic encoding module 201 and the semantic vector 2 determined by the semantic encoding module 202; to merge the semantic vector 1 and the semantic vector 2 to determine a new fused semantic vector.
  • the prediction module 204 may perform probabilistic prediction on the possible slot-slot value two-tuple according to the fused semantic vector determined by the semantic fusion module 203.
  • the slot value with the highest predicted probability can be used as the predicted slot value.
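  • Selecting the slot value with the highest predicted probability can be sketched as below. The function name `predict_slot_values` and the probabilities are illustrative assumptions; the numbers are not taken from Fig. 2, and the sketch keeps exactly one value per slot.

```python
from typing import Dict, Tuple


def predict_slot_values(probs: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """For each slot, keep the slot value with the highest predicted probability."""
    best: Dict[str, Tuple[str, float]] = {}
    for (slot, value), p in probs.items():
        if slot not in best or p > best[slot][1]:
            best[slot] = (value, p)
    return {slot: value for slot, (value, _) in best.items()}


# Illustrative probabilities for slot-slot value two-tuples.
example = {
    ("price", "cheap"): 0.8,
    ("price", "expensive"): 0.1,
    ("location", "no demand"): 0.7,
    ("location", "city centre"): 0.2,
}
print(predict_slot_values(example))
```

  • A slot that can hold several values at once, such as "request" in the example, would need different handling (for instance a threshold per value), which this sketch does not cover.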
  • the status update module 205 may determine the accumulated slot-slot value of the current round according to the slot-slot value determined according to the user's expression content in the previous round and the slot-slot value determined by the user's expression content of the current round.
  • For example, the reply content of the dialogue system in the previous round, which is input into the semantic encoding module 201, is "Found 10 Chinese restaurants, where do you want to eat?", and the content expressed by the user in the current round, which is input into the semantic encoding module 202, is "As long as the price is cheap, the location does not matter. Can you tell me the name and location of the restaurant?".
  • the prediction result determined by the prediction module 204 is shown in FIG. 2. For the sake of brevity, Fig. 2 does not show all slot-slot value prediction results.
  • The status update module 205 combines the slot-slot values determined from the user's expression content in the previous round with the slot-slot values determined from the content expressed by the user in the current round.
  • The slot-slot values determined in the current round are <price, cheap>, <location, no demand>, <request, name>, and <request, location>.
  • The accumulated slot-slot values in the current round determined by the status update module 205 are: <food type, Chinese food>, <price, cheap>, <location, no demand>, <request, name>, and <request, location>.
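  • The accumulation performed by the status update module can be sketched as a set union. This is an assumption made for illustration: the text does not say how a conflicting value for the same slot would be resolved, so the hypothetical `update_state` function below simply merges the two rounds.

```python
from typing import Set, Tuple

SlotValue = Tuple[str, str]


def update_state(previous: Set[SlotValue], current: Set[SlotValue]) -> Set[SlotValue]:
    """Accumulate slot-slot value pairs from the previous and current rounds."""
    return previous | current


previous_round = {("food type", "Chinese food")}
current_round = {
    ("price", "cheap"),
    ("location", "no demand"),
    ("request", "name"),
    ("request", "location"),
}
accumulated = update_state(previous_round, current_round)
print(sorted(accumulated))
```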
  • the embodiment of the present application provides a method for training DST.
  • the computer equipment can expand the text in the training text database, increase the training text that can be used to train the DST, and use the expanded training text to train the DST.
  • the following takes a text as an example to introduce how computer equipment expands the text, and how to use the expanded text to train DST.
  • Fig. 3 is a schematic flowchart of training DST provided according to an embodiment of the present application.
  • the method shown in FIG. 3 can be executed by a computer device.
  • the embodiment of the present application does not limit the specific form of the computer device.
  • the computer device may be a personal computer, a laptop computer (laptop), a tablet computer, a workstation, or a server.
  • the DST trained by the method shown in FIG. 3 can implement the function of DST 102 in FIG. 1 or the DST shown in FIG. 2.
  • the computer device obtains a first text, where the first text is a text in a training text database, and the first text includes at least two phrases.
  • the phrase referred to in the embodiments of the present application may be an n-gram, where n is a positive integer greater than or equal to 1.
  • the N-gram phrase represents a text segment composed of n consecutive words.
  • a unary phrase is a text fragment composed of one word
  • a binary phrase is a text fragment composed of two words
  • a triple phrase is a text fragment composed of three words.
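  • The n-gram definition above can be illustrated with a short sketch. The function name `ngrams` and the whitespace-joined output are choices made here for illustration only.

```python
from typing import List


def ngrams(words: List[str], n: int) -> List[str]:
    """Return every text segment made of n consecutive words."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["find", "a", "cheap", "restaurant"]
print(ngrams(words, 1))  # segments of one word
print(ngrams(words, 2))  # segments of two consecutive words
print(ngrams(words, 3))  # segments of three consecutive words
```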
  • the granularity of the text in the training text database may be sentences.
  • each text in the training text database is a sentence.
  • the granularity of the text in the training text database may be a text fragment composed of multiple n-gram phrases, and the text fragment may not be a complete sentence.
  • the granularity of the text in the training text database may be a text composed of multiple sentences.
  • the granularity of the text in the training text database is sentences.
  • the first text is a sentence composed of at least two phrases.
  • the training text database can be stored in a storage device in the computer device.
  • the training text database is stored in an externally connected storage device, such as a mobile hard disk, U disk, etc.
  • the training text database can be stored in other computer equipment, such as a server or a network attached storage (Network Attached Storage, NAS).
  • The computer device determines at least one target phrase from the first text.
  • the computer device determines P second texts based on the at least one target phrase.
  • Each second text in the P second texts includes an expanded phrase based on one of the at least one target phrase.
  • P is a positive integer greater than or equal to 1.
  • The purpose of step 303 is to expand the first text into multiple texts (i.e., P second texts) by determining the expanded phrase corresponding to the target phrase.
  • the first text is a sentence composed of at least two phrases, but it is not necessary to determine one or more corresponding extended phrases for all phrases. Therefore, it is necessary to determine the target phrase from the first text, and determine one or more expanded phrases corresponding to the target phrase. After one or more expanded phrases are determined, an expanded phrase is used to replace the target phrase corresponding to the expanded phrase in the first text to obtain a second text.
  • the non-target phrase in the second text and the target phrase that does not correspond to the expanded phrase are the same as the first text.
  • The computer device can determine that the first text includes the following phrases: "I", "want to find", "a", "cheap", "the", "Chinese food", "restaurant".
  • the two phrases “cheap” and “Chinese food” can be determined as target phrases.
  • the expanded phrase corresponding to "cheap” may include “affordable” and “low consumption”.
  • the expanded phrase corresponding to "Chinese food” may include "Japanese food” and "French food”. Therefore, the second text determined based on "I want to find a cheap Chinese restaurant” can include:
  • Second text 1: I want to find a cheap Japanese restaurant.
  • Second text 2: I want to find a cheap French restaurant.
  • Second text 3: I want to find an affordable Chinese restaurant.
  • Second text 4: I want to find a Chinese restaurant with low consumption.
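  • The replacement step that produces the second texts can be sketched as follows. The function `expand_text` is a hypothetical name, and the texts are kept as phrase lists rather than fluent sentences, so the output is only an approximation of the four second texts listed above.

```python
from typing import Dict, List


def expand_text(phrases: List[str], expansions: Dict[str, List[str]]) -> List[List[str]]:
    """Replace one target phrase at a time with each of its expanded phrases."""
    second_texts = []
    for i, phrase in enumerate(phrases):
        for expanded in expansions.get(phrase, []):
            # Phrases other than the replaced target phrase stay unchanged.
            second_texts.append(phrases[:i] + [expanded] + phrases[i + 1:])
    return second_texts


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
expansions = {
    "cheap": ["affordable", "low consumption"],
    "Chinese food": ["Japanese food", "French food"],
}
for text in expand_text(first_text, expansions):
    print(text)
```

  • Each second text differs from the first text in exactly one target phrase, which matches the description that the remaining phrases are the same as in the first text.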
  • the preset extension rules may include two types: one preset extension rule may be an extension rule based on slot-slot value; the other preset extension rule may be an extension rule based on word meaning.
  • the expansion rule based on slot-slot value is referred to as the first expansion rule
  • the word meaning-based expansion rule is referred to as the second expansion rule.
  • The computer device can determine whether the phrases in the first text include a phrase that can be expanded using the first expansion rule. More specifically, the computer device can determine whether there is a phrase in the first text that can be used as the slot value of a slot; if there are one or more such phrases in the first text, the computer device can determine these phrases as target phrases. To facilitate the distinction, a phrase that can be used as a slot value is referred to below as a first-type target phrase.
  • the computer device may search the slot value database to determine whether the phrase in the first text is a phrase that can be used as the slot value of the slot.
  • The slot value database is composed of phrases that can be used as slot values. After the computer device performs word segmentation on the first text to obtain the phrases that make up the first text, it can search the slot value database to determine whether each phrase in the first text is in the slot value database. If one or more phrases in the first text are in the slot value database, it can be determined that the one or more phrases are first-type target phrases.
  • After the computer device has determined the first-type target phrases, it can determine the slot corresponding to each first-type target phrase.
  • the slot value database may also include the slot corresponding to each slot value. Therefore, the computer device can also determine the slot corresponding to the phrase when it determines that a phrase is the first type of target phrase.
  • the computer device can determine that "Chinese food” is a phrase that can be used as a slot value.
  • the computer device can also determine that the slot corresponding to "Chinese food” is "food type”.
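  • The slot value database lookup can be sketched as a dictionary that maps each slot value to its slot. The contents of `SLOT_VALUE_DB` and the function name `first_type_targets` are illustrative assumptions; a real database would be far larger.

```python
from typing import Dict, List, Tuple

# Hypothetical slot value database: slot value -> slot.
SLOT_VALUE_DB: Dict[str, str] = {
    "Chinese food": "food type",
    "Japanese food": "food type",
    "French food": "food type",
}


def first_type_targets(phrases: List[str]) -> List[Tuple[str, str]]:
    """Return (phrase, slot) for every phrase found in the slot value database."""
    return [(p, SLOT_VALUE_DB[p]) for p in phrases if p in SLOT_VALUE_DB]


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
print(first_type_targets(first_text))
```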
  • the computer device can also determine whether the phrase in the first text includes a phrase that can be expanded using the second expansion rule. More specifically, the computer device can determine whether there are some phrases in the first text that meet specific rules; if there are one or more phrases in the first text that meet the specific rules, the computer device can Identify these phrases as target phrases. In order to facilitate the distinction, the phrase that meets the specific rule is referred to as the second type of target phrase below.
  • the specific rule may be that a phrase whose part of speech is a preset part of speech is the second type of target phrase.
  • the computer device can determine the part of speech of each phrase in the first text. If the part of speech of the phrase belongs to the preset part of speech, it can be determined that the phrase is the second type of target phrase.
  • the preset part of speech can be at least one of an adjective and an adverb.
  • the computer device can determine the importance of the phrase to determine whether the phrase is the second type of target phrase. If the phrase is an important phrase, the phrase can be the second type of target phrase. If the phrase is not an important phrase, then the phrase may not be the second type of target phrase.
  • the importance of the phrase may be based on the frequency of the phrase in the training text database. The frequency of occurrence of the phrase can be determined by the ratio of the number of texts including the phrase to the total number of texts included in the training text database. If the frequency of a phrase in the training text database exceeds a preset frequency threshold, it can be determined that the phrase is a second-type target phrase.
  • the importance of the phrase may be determined by the number of times the phrase appears in the training text database. If the number of occurrences of a phrase in the training text database exceeds a preset threshold, it can be determined that the phrase is a second-type target phrase.
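  • The frequency-based importance test can be sketched as below, using the ratio of texts containing a phrase to the total number of texts. The function names, the toy training texts, and the threshold value are assumptions made for illustration.

```python
from typing import List, Set


def document_frequency(phrase: str, texts: List[List[str]]) -> float:
    """Ratio of texts that contain the phrase to the total number of texts."""
    if not texts:
        raise ValueError("the training text database is empty")
    containing = sum(1 for phrases in texts if phrase in phrases)
    return containing / len(texts)


def important_phrases(texts: List[List[str]], threshold: float) -> Set[str]:
    """Phrases whose frequency exceeds the preset frequency threshold."""
    vocabulary = {p for phrases in texts for p in phrases}
    return {p for p in vocabulary if document_frequency(p, texts) > threshold}


# Toy training text database; each text is already split into phrases.
texts = [
    ["cheap", "restaurant"],
    ["cheap", "hotel"],
    ["nice", "restaurant"],
    ["quiet", "cafe"],
]
print(sorted(important_phrases(texts, threshold=0.4)))
```

  • The count-based variant described above would compare the number of occurrences with a preset threshold instead of dividing by the number of texts.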
  • Suppose the computer device determines the second type of target phrase by part of speech. The computer device can then determine that the first text contains a phrase whose part of speech is an adjective, namely "cheap". In this case, the computer device can determine that "cheap" is a second-type target phrase.
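  • The part-of-speech rule can be sketched as below. The `POS_LEXICON` table is a hypothetical stand-in for a real part-of-speech tagger, which the text does not name; only the preset parts of speech (adjective and adverb) come from the description.

```python
from typing import Dict, List, Set

# Toy part-of-speech lexicon standing in for a real tagger (an assumption).
POS_LEXICON: Dict[str, str] = {
    "I": "pronoun",
    "want to find": "verb",
    "a": "determiner",
    "cheap": "adjective",
    "Chinese food": "noun",
    "restaurant": "noun",
}

# Preset parts of speech named in the description.
PRESET_POS: Set[str] = {"adjective", "adverb"}


def second_type_targets(phrases: List[str]) -> List[str]:
    """Return the phrases whose part of speech belongs to the preset set."""
    return [p for p in phrases if POS_LEXICON.get(p) in PRESET_POS]


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
print(second_type_targets(first_text))
```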
  • the computer device can determine at least one expanded phrase according to each target phrase.
  • The computer device may determine K1 first phrase sets corresponding to K1 slots, where each of the K1 first phrase sets includes at least one phrase, and K1 is a positive integer greater than or equal to 1.
  • K is a positive integer greater than or equal to 1 and greater than or equal to K1.
  • The K1 slots are respectively the slots of the K1 first-type target phrases.
  • The K1_n-th slot among the K1 slots is the slot of the K1_n-th first-type target phrase among the K1 first-type target phrases. The slot of any phrase in the K1_n-th first phrase set among the K1 first phrase sets is the K1_n-th slot.
  • K1_n = 1, ..., K1.
  • For example, suppose there are 10 first phrase sets, and the slot corresponding to every phrase in the first phrase set that matches the first-type target phrase "Chinese food" (assumed to be the fifth first phrase set) is "food type".
  • the fifth first phrase set includes two phrases, namely "Japanese food” and "French food”.
  • the computer device can replace "Chinese food" in the first text with "Japanese food" and "French food" respectively, thereby obtaining the second text 1 above (that is, "I want to find a cheap Japanese restaurant") and the second text 2 (that is, "I want to find a cheap French restaurant").
  • the computer device can determine the K 1 first phrase sets corresponding to the K 1 slots according to a first correspondence.
  • the first correspondence includes correspondences between multiple slots and multiple first phrase sets.
  • the slot of any phrase in each first phrase set is the same as the slot corresponding to the first phrase set.
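  • A minimal sketch of slot-based expansion using a first correspondence is given below. The dictionary `FIRST_CORRESPONDENCE`, the function name `expand_by_slot`, and the slot contents are hypothetical. The phrases are shortened to "Chinese", "Japanese" and "French" so that literal string replacement works on the English rendering of the example sentence; this shortening is an assumption of the sketch.

```python
from typing import Dict, List

# Hypothetical first correspondence: slot -> first phrase set.
FIRST_CORRESPONDENCE: Dict[str, List[str]] = {
    "food type": ["Chinese", "Japanese", "French"],
    "area": ["north", "south"],
}


def expand_by_slot(text: str, target: str, slot: str) -> List[str]:
    """Replace `target` in `text` with every other phrase that shares its slot."""
    if target not in text:
        return []
    alternatives = [p for p in FIRST_CORRESPONDENCE.get(slot, []) if p != target]
    return [text.replace(target, alt) for alt in alternatives]


first_text = "I want to find a cheap Chinese restaurant"
second_texts = expand_by_slot(first_text, "Chinese", "food type")
```

  • With the sample data above, `second_texts` holds the two sentences corresponding to the second text 1 and the second text 2.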
  • the computer device may determine K 2 second phrase sets corresponding to K 2 word meanings, where K 2 is a positive integer greater than or equal to 1.
  • K is a positive integer greater than or equal to 1 and greater than or equal to K 2 .
  • the computer device has determined a total of K target phrases, of which K 1 are target phrases of the first type and K 2 are target phrases of the second type.
  • the K 2 word meanings are respectively the word meanings of the K 2 target phrases of the second type.
  • the K 2 _n-th word meaning of the K 2 word meanings is the word meaning of the K 2 _n-th second-type target phrase of the K 2 second-type target phrases.
  • K 2 _n is equal to 1,..., K 2 .
  • the word meanings of two phrases corresponding to each other may mean that the two phrases have the same meaning. In other words, either phrase is a paraphrase of the other, that is, one phrase is another way of expressing the other. For example, the phrases corresponding to "cheap" can be "affordable" and "low consumption".
  • the word meanings of two phrases corresponding to each other can mean either that the two phrases have the same meaning or that the two phrases are antonyms of each other.
  • For example, the antonym phrases corresponding to "cheap" can be "expensive" and "high consumption".
  • the computer device can determine the K 2 second phrase sets corresponding to the K 2 word meanings according to a second correspondence.
  • the second correspondence includes correspondences between multiple word meanings and multiple second phrase sets.
  • the word meaning of any phrase in each second phrase set is the same as the word meaning corresponding to the second phrase set.
  • the computer device may determine the second phrase set corresponding to each target phrase of the second type according to the synonym database.
  • the computer device may determine the second phrase set corresponding to each target phrase of the second type according to the synonym database or the antonym database.
  • the computer device may use an existing paraphrase corpus to determine the second set of phrases corresponding to each target phrase of the second type.
  • For example, the Paraphrase Database (http://paraphrase.org) is a widely used paraphrase corpus.
  • a set of phrases corresponding to each target phrase of the second type can be determined.
  • the word meanings of some phrases in a phrase set determined by using the paraphrase database may not be exactly the same as, or exactly opposite to, the word meaning of the target phrase corresponding to the phrase set.
  • For example, the phrase set obtained by using this paraphrase database also includes phrases such as "onerous".
  • the computer device can replace "cheap" in the first text with "affordable" and "low consumption" respectively, so as to obtain the above-mentioned second text 3 (that is, "I want to find an affordable Chinese restaurant") and the second text 4 (that is, "I want to find a low-consumption Chinese restaurant").
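  • Meaning-based expansion with a second correspondence can be sketched in the same spirit. The dictionary `SECOND_CORRESPONDENCE` and the function `expand_by_meaning` are hypothetical names, and the article repair ("a" to "an" before a vowel letter) is a crude English-only heuristic added for this example; it is not part of the described method.

```python
import re
from typing import Dict, List

# Hypothetical second correspondence: target phrase -> second phrase set.
SECOND_CORRESPONDENCE: Dict[str, List[str]] = {
    "cheap": ["affordable", "low-consumption"],
}


def expand_by_meaning(text: str, target: str) -> List[str]:
    """Replace `target` with each phrase of its second phrase set."""
    results = []
    pattern = rf"\b{re.escape(target)}\b"
    for alt in SECOND_CORRESPONDENCE.get(target, []):
        new_text, count = re.subn(pattern, lambda _m: alt, text)
        if count == 0:
            continue  # target phrase not present, nothing to expand
        if alt[0].lower() in "aeiou":
            # Crude article repair, e.g. "a affordable" -> "an affordable".
            new_text = re.sub(rf"\ba (?={re.escape(alt)})", "an ", new_text)
        results.append(new_text)
    return results


first_text = "I want to find a cheap Chinese restaurant"
paraphrased_texts = expand_by_meaning(first_text, "cheap")
```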
  • the above-mentioned first text "I want to find a cheap Chinese restaurant" includes the first type of target phrase and the second type of target phrase.
  • Some texts in the training text database may include the first type target phrase and the second type target phrase, and some texts in the training text database may only include one of the first type target phrase and the second type target phrase. In some embodiments, some texts in the training text database may not include any one of the first type target phrase and the second type target phrase.
  • the computer device may directly use the text without expansion.
  • the computer device trains DST through machine learning according to the first text and the P second texts.
  • the computer device may directly use the first text and P second texts as training texts for machine learning to train the DST.
  • the specific implementation manner of the computer equipment training DST is the same as the existing implementation manner. For the sake of brevity, it is not necessary to repeat it here.
  • the computer device may also use part of the first text and the P second texts as training texts for machine learning to train the DST.
  • the computer device may use part of the first text and the P second texts to train the DST.
  • the computer device may use the P second texts, or some of the P second texts, to train the DST.
  • the computer device may select part of the P second texts as training texts for machine learning in a random manner.
  • the computer device may use the P second texts to train a policy network model, and use the policy network model to select at least one second text from the P second texts as machine learning Training text.
  • the computer device may use a reinforcement learning algorithm or an evolutionary algorithm to train the strategy network model. More specifically, the computer device may use contextual bandit algorithms, genetic algorithms, etc. to train the strategy network model.
  • Fig. 4 is a schematic flowchart of a method for training a strategy network model according to an embodiment of the present application.
  • the computer device determines M texts from the training text database.
  • M is a positive integer greater than or equal to 1, and the value of M is less than the total number of texts included in the training text database.
  • the computer device may randomly select the M texts from the training text database.
  • the computer device may select the M texts from the training text database according to certain rules.
  • the computer device can determine the M texts according to the number of texts into which each training text in the training text database is expanded. If some texts in the training text database (hereinafter referred to as the first part of the text) are expanded into more texts than other texts (hereinafter referred to as the second part of the text), the computer device can determine that, among the M texts, more texts belong to the first part of the text than to the second part of the text.
  • the manner in which the computer device can select the text belonging to the M texts from the first part of the text and the second part of the text may be random or in a certain order.
  • the computer device can determine that, among the M texts, more texts belong to the third part of the text than to the fourth part of the text.
  • the way that the computer device can select the text belonging to the M texts from the third part of the text and the fourth part of the text may be random or in a certain order.
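  • One possible rule for choosing the M texts so that texts with more expansions are favored is weighted sampling without replacement. The sketch below is only one way to realize such a rule; the function name `select_texts`, the text identifiers, and the `+ 1` smoothing of the weights are assumptions made for illustration.

```python
import random
from typing import Dict, List


def select_texts(expansion_counts: Dict[str, int], m: int, seed: int = 0) -> List[str]:
    """Pick up to m distinct texts, favoring texts that were expanded into more texts."""
    rng = random.Random(seed)
    pool = dict(expansion_counts)
    chosen: List[str] = []
    for _ in range(min(m, len(pool))):
        ids = list(pool)
        # +1 keeps texts with zero expansions selectable (assumption of this sketch).
        weights = [pool[text_id] + 1 for text_id in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


# Hypothetical number of expanded texts per training text.
counts = {"text_a": 20, "text_b": 1, "text_c": 0, "text_d": 5}
selected = select_texts(counts, m=3, seed=1)
```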
  • the computer device determines M extended text fragment sets from the first enhanced database, where the M extended text fragment sets correspond to the M texts in a one-to-one correspondence.
  • the method of determining P second texts based on the first text in FIG. 3 is referred to as a coarse-grained data enhancement strategy below.
  • the first enhanced database is a database composed of texts obtained after the texts in the training text database are expanded according to a coarse-grained data enhancement strategy. In other words, each text in the first enhanced database is generated based on a text in the training text database. The first enhanced database does not include the text in the training text database.
  • the training text database includes a total of 1000 sentences.
  • the computer device can use the coarse-grained data enhancement strategy to expand the 1,000 sentences into 20,000 sentences, and these 20,000 sentences do not include the 1,000 sentences in the training text database. It is understandable that there may be three types of sentences among these 1,000 sentences: the first type of sentence includes both the first type of target phrase and the second type of target phrase; the second type of sentence includes only one of the first type of target phrase and the second type of target phrase; the third type of sentence includes neither the first type of target phrase nor the second type of target phrase.
  • the computer device can use the method shown in FIG. 3 to expand to obtain 20,000 sentences.
  • the database composed of 20,000 sentences is the first enhanced database.
  • the first enhanced database does not include 1000 sentences in the training text database.
  • the granularity of the text included in the first enhanced database is the same as the granularity of the text in the training text database.
  • For example, the granularity of the text in the training text database is a sentence, and the granularity of the text in the first enhanced database is also a sentence.
  • the granularity of the text included in the first enhanced database may be different from the granularity of the text in the training text database.
  • For example, the granularity of the text in the training text database is a sentence, whereas the granularity of the text in the first enhanced database is an extended phrase or a partial sentence including the extended phrase.
  • Taking the first text as an example, the texts in the first enhanced database that correspond to the first text may include the aforementioned second text 1 to second text 4.
  • Alternatively, the texts in the first enhanced database that correspond to the first text may include "Japanese food", "French food", "affordable" and "low consumption".
  • Alternatively, the texts in the first enhanced database that correspond to the first text may include "Japanese restaurant", "French restaurant", "affordable Chinese restaurant" and "low-consumption Chinese restaurant".
  • each text in the first enhanced database may include source indication information, and the source indication information may be used to indicate a text in the training text database.
  • the text indicated by the source indication information is the text from which the text carrying that source indication information was generated.
  • the first enhanced database may store texts in the form of a collection.
  • Each set includes at least one text, and the at least one text is obtained by performing coarse-grained enhancement strategy expansion on the same text in the training text database.
  • each set may include one source indication information, and the source indication information may be used to indicate a text in the training text database. The text indicated by the indication information is the text used to generate the text in the set.
  • the computer device can determine the M extended text fragment sets corresponding to the M texts according to the source indication information in the first enhanced database.
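  • The lookup of extended text fragment sets through source indication information can be sketched as follows. The representation of the first enhanced database as a list of `(source_id, text)` pairs, the integer source identifiers, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical first enhanced database: each entry carries source indication
# information (here an integer id of the originating training text).
first_enhanced_db: List[Tuple[int, str]] = [
    (0, "I want to find a cheap Japanese restaurant"),
    (0, "I want to find a cheap French restaurant"),
    (0, "I want to find an affordable Chinese restaurant"),
    (1, "Is there an affordable hotel nearby"),
]


def group_by_source(db: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group expanded texts into sets keyed by their source indication information."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for source_id, text in db:
        grouped[source_id].append(text)
    return dict(grouped)


def extended_sets_for(selected_ids: List[int], db: List[Tuple[int, str]]) -> List[List[str]]:
    """Return one extended text fragment set per selected training text, in order."""
    grouped = group_by_source(db)
    return [grouped.get(source_id, []) for source_id in selected_ids]


extended_sets = extended_sets_for([1, 0], first_enhanced_db)
```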
  • the correspondence between the set of expanded text fragments and the text means that the expanded text fragments included in the set of expanded text fragments are determined according to the target phrase in the corresponding text.
  • the extended text segment may be an extended phrase.
  • the extended text segment may be a complete text including the extended phrase.
  • the extended text segment may also be a partial text including an extended phrase.
  • the set of extended text fragments corresponding to the text includes the aforementioned second text 1 to second text 4.
  • the set of extended text fragments corresponding to the text includes "Japanese food”, “French food”, “affordable” and “low consumption”.
  • the set of extended text fragments corresponding to the text includes "Japanese restaurant”, “French restaurant”, “affordable Chinese restaurant”, and "low-consumption Chinese restaurant”.
  • the text fragments including the target phrase in the M texts corresponding to the M extended text fragments may be referred to as target text fragments.
  • the target text segment may be a target phrase.
  • the target text segment may be a complete text including the target phrase.
  • the target text segment may also be a partial text including the target phrase.
  • the target text segment corresponding to the text may be the first text.
  • the target text segment may include "cheap” and “Chinese food.”
  • the target text segment may include "cheap” and "Chinese restaurant”.
  • the computer device selects one extended text segment from each extended text segment set in the M extended text segment sets corresponding to the M training texts according to the reference strategy network model.
  • the extended text fragments selected from the set of extended text fragments according to the reference strategy network model can be referred to as candidate text fragments.
  • the computer device can determine a set of candidate text fragments according to the reference strategy network model.
  • the candidate text fragment set includes M candidate text fragments, which come from the M extended text fragment sets respectively.
  • the computer device may repeat step 403 T times to determine a total of T candidate text fragment sets.
  • T is a positive integer greater than or equal to 1.
  • the values of M and T are preset. It is understandable that the larger the values of M and T, the more candidate text fragments the computer device determines and the better the trained strategy network model is at selecting texts, but the longer the training time; conversely, the smaller the values of M and T, the fewer candidate text fragments the computer device determines and the poorer the trained strategy network model is at selecting texts, but the shorter the training time. Therefore, the values of M and T can be selected according to the performance of the computer device and/or actual requirements. For example, to obtain a better strategy network model, larger values of M and T can be chosen.
  • For another example, to determine a strategy network model faster, smaller values of M and T can be chosen.
  • computer devices with different performance may achieve different training effects for the strategy network model in the same amount of time. For example, with the same training algorithm, the better the performance of the computer device, the better the strategy network model it trains in the same amount of time. Therefore, if the performance of the computer device is better, larger values can be selected; if the performance of the computer device is poor, smaller values can be selected.
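  • Repeating the selection step T times over M extended text fragment sets can be sketched as below. The names `sample_candidates` and `uniform_policy` are hypothetical, and the uniform policy merely stands in for a reference strategy network model; it is assumed that every extended text fragment set is non-empty.

```python
import random
from typing import Callable, List, Sequence

# A policy maps an extended text fragment set to selection probabilities (assumption).
Policy = Callable[[Sequence[str]], List[float]]


def uniform_policy(fragments: Sequence[str]) -> List[float]:
    """Placeholder policy: every extended text fragment is equally likely."""
    return [1.0 / len(fragments)] * len(fragments)


def sample_candidates(
    extended_sets: List[List[str]], policy: Policy, t: int, seed: int = 0
) -> List[List[str]]:
    """Run the selection step t times; each run picks one fragment per set."""
    rng = random.Random(seed)
    rounds: List[List[str]] = []
    for _ in range(t):
        picks = []
        for fragments in extended_sets:
            weights = policy(fragments)
            picks.append(rng.choices(list(fragments), weights=weights, k=1)[0])
        rounds.append(picks)
    return rounds


# M = 2 extended text fragment sets, T = 3 repetitions.
sets_example = [["Japanese food", "French food"], ["affordable", "low consumption"]]
rounds_example = sample_candidates(sets_example, uniform_policy, t=3)
```

  • The result contains T lists of M candidate text fragments each, that is, M × T candidate text fragments in total, which is why larger M and T increase both the amount of evaluation data and the training time.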
  • the computer device evaluates the selected set of M candidate text fragments according to the initial DST, and obtains the evaluation result.
  • the computer device may perform a single-sample evaluation on the set of M candidate text segments according to the initial DST to obtain an evaluation result.
  • the computer device may perform a sample set evaluation on the M candidate text fragment sets according to the initial DST to obtain an evaluation result.
  • the computer device may perform single-sample evaluation and sample-set evaluation on the M candidate texts according to the initial DST to obtain the evaluation result.
  • the initial DST may be a DST obtained by training using the text in the training text database as the training text of machine learning according to an existing training DST manner.
  • the initial DST may also be obtained by training on only some of the texts, so that its accuracy rate stays at a preset lower level (for example, 80% or lower).
  • the single-sample evaluation performed by the computer device may include: the computer device uses the initial DST to predict the state of each candidate text fragment in the M candidate text fragment sets, and determines, according to the prediction result, the first reward value corresponding to each candidate text fragment.
  • the M candidate text fragment sets include a total of M × T candidate text fragments, and correspondingly, the evaluation result includes a total of M × T first reward values.
  • if the prediction result of a candidate text fragment meets the preset requirements, the computer device can determine that the first reward value of the candidate text fragment is a positive incentive; if the prediction result of a candidate text fragment does not meet the preset requirements, the computer device can determine that the first reward value of the candidate text fragment is a reverse incentive.
  • the first reward value of the forward incentive is greater than the first reward value of the reverse incentive.
  • the first reward value of forward incentives may be a number greater than 0, such as 1, and the first reward value of reverse incentives may be a number less than 0, such as -1.
  • both the first reward value of the forward incentive and the first reward value of the reverse incentive may be greater than 0, but the first reward value of the forward incentive is greater than the first reward value of the reverse incentive .
  • the first reward value of forward incentives is 10, and the first reward value of reverse incentives is 1.
  • the preset requirements for the prediction result differ depending on the expansion rule used to obtain the candidate text fragment.
  • Consider first a candidate text fragment whose extended phrase is determined based on the first expansion rule (that is, the extended phrase in the candidate text fragment is determined according to a first-type target phrase; for ease of description, such a candidate text fragment is hereinafter referred to as a first-type candidate text fragment).
  • the label of the first type of candidate text segment is the slot of the extended phrase in the candidate text segment.
  • for a first-type candidate text fragment, a predicted label that is the same as the actual label does not meet the preset requirements, and a predicted label that is different from the actual label meets the preset requirements. In other words, if the label that the initial DST predicts for a first-type candidate text fragment is the same as the actual label of the extended phrase in the first-type candidate text fragment, the prediction result of the first-type candidate text fragment does not meet the requirements.
  • In this case, the computer device may determine that the first reward value corresponding to the first-type candidate text fragment is a reverse incentive. If the label that the initial DST predicts for a first-type candidate text fragment is different from the actual label of the extended phrase in the first-type candidate text fragment, the prediction result of the first-type candidate text fragment meets the requirements. In this case, the computer device may determine that the first reward value corresponding to the first-type candidate text fragment is a positive incentive.
  • Consider next a candidate text fragment whose extended phrase is determined based on the second expansion rule (that is, the extended phrase in the candidate text fragment is determined according to a second-type target phrase; for ease of description, such a candidate text fragment is hereinafter referred to as a second-type candidate text fragment).
  • the label of the second type of candidate text segment is the meaning of the extended phrase in the candidate text segment.
  • for a second-type candidate text fragment, a predicted label that is the same as the actual label meets the preset requirements, and a predicted label that is different from the actual label does not meet the preset requirements. In other words, if the label that the initial DST predicts for a second-type candidate text fragment is the same as the actual label of the extended phrase in the second-type candidate text fragment, the prediction result of the second-type candidate text fragment meets the requirements.
  • In this case, the computer device may determine that the first reward value corresponding to the second-type candidate text fragment is a positive incentive. If the label that the initial DST predicts for a second-type candidate text fragment is different from the actual label of the extended phrase in the second-type candidate text fragment, the prediction result of the second-type candidate text fragment does not meet the requirements. In this case, the computer device may determine that the first reward value corresponding to the second-type candidate text fragment is a reverse incentive.
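  • The single-sample reward rule can be sketched as a small function. The sketch assumes the general rule stated earlier (a prediction that meets the preset requirements yields a positive incentive, otherwise a reverse incentive), with the requirement being "labels differ" for first-type candidates and "labels match" for second-type candidates. The function name, the string encoding of the candidate type, and the default reward values of 1 and -1 are illustrative assumptions.

```python
def first_reward(
    candidate_type: str,
    predicted_label: str,
    actual_label: str,
    positive: float = 1.0,
    negative: float = -1.0,
) -> float:
    """Return the first reward value for one candidate text fragment."""
    same = predicted_label == actual_label
    if candidate_type == "first":
        # First-type (slot-based) candidates: a differing prediction meets the requirement.
        meets_requirement = not same
    elif candidate_type == "second":
        # Second-type (meaning-based) candidates: a matching prediction meets the requirement.
        meets_requirement = same
    else:
        raise ValueError(f"unknown candidate type: {candidate_type!r}")
    return positive if meets_requirement else negative
```

  • The positive and reverse incentive values can also both be positive, for example `first_reward("second", "cheap", "cheap", positive=10.0, negative=1.0)`, as long as the positive incentive is the larger of the two.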
  • the evaluation of the sample set by the computer device means that the computer device uses a set of candidate text fragments to train the initial DST to obtain the initial DST after training.
  • the DST after training is referred to as the reference DST below.
  • the computer device can determine the second reward value corresponding to the set of candidate text segments according to the reference DST.
  • the second reward value is the evaluation result of the sample set of the corresponding candidate text segment.
  • the computer device uses the T candidate text fragments included in a candidate text fragment set to train the initial DST.
  • the process is the same as the existing training DST process. For brevity, it is unnecessary to describe in detail here.
  • the selected initial DST may be a DST with a lower prediction accuracy.
  • the accuracy of the initial DST prediction may be lower than 90%, or even lower than 80%.
  • the evaluation of the sample set by the computer device according to the initial DST may include: the computer device trains the initial DST according to the set of M candidate text fragments; and determines T second reward values according to the DST obtained after the training.
  • the computer device training the initial DST according to the M candidate text fragment sets may include: the computer device uses T DST training text sets to train the initial DST respectively. In this case, the computer device can obtain T initial DSTs after training. For ease of description, the initial DST after training is referred to as the reference DST below.
  • the T DST training text sets are determined according to the M candidate text fragment sets. Each training text set in the T DST training text sets includes M candidate text fragments, and the M candidate text fragments are respectively from the M candidate text fragment sets.
  • the i-th DST training text set in the T DST training text sets includes M candidate text fragments, and the j-th of these M candidate text fragments is a candidate text fragment in the j-th of the M candidate text fragment sets.
  • the computer device can determine T second reward values according to the T reference DSTs respectively.
  • the computer device determining the second reward value according to the reference DST may include: the computer device determines whether the accuracy rate of the labels predicted by the reference DST is higher than the accuracy rate of the labels predicted by the initial DST, and determines the second reward value according to whether the accuracy rate of the predicted labels has improved. If the accuracy of the predicted labels is improved, the second reward value may be a positive incentive; if the accuracy of the predicted labels is unchanged or decreased, the second reward value is a reverse incentive.
  • the second reward value of the forward incentive is greater than the second reward value of the reverse incentive.
  • the second reward value of the forward incentive may be a number greater than 0, such as 1, and the second reward value of the reverse incentive may be a number less than 0, such as -1.
  • the second reward value of the forward incentive and the second reward value of the reverse incentive may both be greater than 0, but the second reward value of the forward incentive is greater than the second reward value of the reverse incentive .
  • the second reward value of forward incentives is 10, and the second reward value of reverse incentives is 1.
  • the computer device can use the initial DST and the reference DST to perform label prediction on the same set of texts to determine whether the accuracy of the predicted label is improved.
  • This set of texts used to measure the performance of the initial DST and the reference DST (that is, the accuracy of the predicted label) can be called a verification set.
  • the verification set may be a candidate text segment set used to train the initial DST.
  • the verification set may be any candidate text fragment set among the M candidate text fragment sets.
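  • The sample set evaluation can be sketched as an accuracy comparison on a verification set. In this sketch a tracker is modeled as a plain function from text to predicted label; the type alias `Tracker`, the function names, and the default reward values are assumptions made for illustration, not the actual DST interface.

```python
from typing import Callable, List, Tuple

# Hypothetical tracker interface: maps a text to a predicted label.
Tracker = Callable[[str], str]


def accuracy(tracker: Tracker, verification_set: List[Tuple[str, str]]) -> float:
    """Fraction of (text, label) pairs for which the tracker predicts the label."""
    if not verification_set:
        return 0.0
    correct = sum(tracker(text) == label for text, label in verification_set)
    return correct / len(verification_set)


def second_reward(
    initial_dst: Tracker,
    reference_dst: Tracker,
    verification_set: List[Tuple[str, str]],
    positive: float = 1.0,
    negative: float = -1.0,
) -> float:
    """Positive incentive only if the reference DST is more accurate than the initial DST."""
    improved = accuracy(reference_dst, verification_set) > accuracy(
        initial_dst, verification_set
    )
    return positive if improved else negative
```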
  • the evaluation result determined by the computer device includes M × T first reward values and T second reward values.
  • the computer device uses the evaluation result to train a reference strategy network model.
  • the policy network model can be expressed as:
  • $$\pi_\theta(s, p') = \frac{f(s, p')}{\sum_{\tilde{p} \in C_p} f(s, \tilde{p})} \qquad (1.1)$$
  • $\pi_\theta(s, p')$ represents the probability prediction of the candidate text segment $p'$ based on the context state $s$.
  • $s$ is the vector representation extracted from the triple $\langle x, y, p \rangle$ and the candidate text $p'$.
  • $p$ represents the target text fragment.
  • $f(s, p')$ is calculated using a fully connected network and represents the probability of $p$ being replaced by $p'$. Because each target text segment can correspond to multiple extended text segments, formula 1.1 represents the policy network model in a normalized way.
  • $C_p$ in formula 1.1 represents the set of all candidate text fragments corresponding to a target text fragment, $\tilde{p}$ represents any candidate text segment in $C_p$, and $\sum_{\tilde{p} \in C_p} f(s, \tilde{p})$ represents the sum of the values calculated by the fully connected network for all candidate text segments in $C_p$, that is, the sum of the probabilities that $p$ is replaced by each candidate text segment in $C_p$.
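  • The normalization described for formula 1.1 can be illustrated with a few lines of code. The sketch assumes plain normalization of non-negative scores, as the description of the sum suggests; the scores below are made-up numbers standing in for the outputs of the fully connected network, and the function name is hypothetical.

```python
from typing import Dict


def policy_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize the scores of all candidates of one target text fragment."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {candidate: value / total for candidate, value in scores.items()}


# Hypothetical network outputs for the candidates of the target phrase "Chinese food".
example_scores = {"Japanese food": 3.0, "French food": 1.0}
example_probs = policy_probabilities(example_scores)
```

  • After normalization the probabilities of all candidates of one target text fragment sum to 1, so they can be used directly as sampling weights when selecting a candidate text fragment.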
  • the larger the reward value (including the first reward value and the second reward value), it means that the predicted result is more in line with the requirements, and the candidate text segment selected by the reference strategy network model is more suitable for training DST . Therefore, it can be expected to maximize the reward signal to train the reference strategy network model to obtain a better strategy network model.
  • the computer device may train the parameters in the reference strategy network model through gradient learning.
  • the expected reward signal can be equal to the gradient of the reference strategy network.
  • the gradient of the reference strategy network can be approximated as:
  • In this approximation, the terms denote, respectively, the replacement text obtained from the j-th sampling of the i-th sample, the reward value of the sample set evaluation of sample i (that is, the second reward value), and the evaluation reward value of the j-th sampling in sample i (that is, the first reward value).
  • the original sample set refers to the set of M extended text fragments determined in step 402.
  • the i-th sample refers to the i-th DST training text set in the T DST training text sets.
  • the j-th sampling of the i-th sample is the j-th candidate text segment in the i-th DST training text set.
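  • As a hedged sketch only, a REINFORCE-style gradient estimate consistent with the term descriptions above could take the following form. The symbols $J(\theta)$, $s_{ij}$, $p'_{ij}$, $r_i$ and $r'_{ij}$, the plain sum used to combine the second reward value $r_i$ with the first reward value $r'_{ij}$, and the absence of a normalizing constant are all assumptions of this sketch and are not taken from the source.

```latex
\nabla_\theta J(\theta) \approx
  \sum_{i=1}^{T} \sum_{j=1}^{M}
  \left( r_i + r'_{ij} \right)
  \nabla_\theta \log \pi_\theta\!\left( s_{ij},\, p'_{ij} \right)
```

  • Here $i$ ranges over the T DST training text sets and $j$ over the M candidate text fragments in each set, so a candidate text fragment that earns large rewards has the log-probability of its selection pushed up most strongly.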
  • the computer device can also use other methods to train the reference strategy network model.
  • the computer device may use methods such as stochastic gradient descent (SGD) or adaptive moment estimation (Adam) to train the reference strategy network model.
  • the computer device can execute steps 401 to 405 again after sequentially executing steps 401 to 405. In other words, the computer device can execute the method shown in FIG. 4 cyclically in the order of step 401 to step 405. If the computer device determines that the number of cycles has reached a preset number N, the cycle can be stopped, and the reference strategy network model trained when step 405 is executed for the N-th time is determined to be the strategy network model for selecting at least one second text from the P second texts as the training text for machine learning.
  • the computer device can set an initial strategy network model and use it to select candidate text segments in the first cycle. In other words, when the computer device executes the method shown in FIG. 4 for the first time, the reference strategy network model used in step 403 is the initial strategy network model.
  • the computer device can execute step 403 in a loop T times.
  • within one cycle, the reference strategy network model used by the computer device in each of these T executions of step 403 is the same.
  • the reference strategy network model used when step 403 is executed in a loop T times is the reference strategy network model trained when step 405 is executed last time.
  • when the method shown in FIG. 4 is executed for the t-th time, the reference strategy network model in step 403 is the reference strategy network model determined when step 405 of the method shown in FIG. 4 is executed for the (t-1)-th time, where t is a positive integer greater than or equal to 2 and less than or equal to N.
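  • The cyclic execution of steps 401 to 405 can be sketched as a loop skeleton. All callables passed to `train_policy` are hypothetical stand-ins for the steps of FIG. 4; the sketch only illustrates the control flow, namely that the same reference model is used for all T selections within a cycle and that the model updated in one cycle is used in the next.

```python
from typing import Any, Callable, List


def train_policy(
    initial_policy: Any,
    n_cycles: int,
    t_repeats: int,
    select_texts: Callable[[], List[Any]],
    lookup_sets: Callable[[List[Any]], List[Any]],
    sample: Callable[[Any, List[Any]], Any],
    evaluate: Callable[[List[Any]], Any],
    update: Callable[[Any, Any], Any],
) -> Any:
    """Run the five steps for n_cycles cycles and return the final model."""
    policy = initial_policy  # used in step 403 of the first cycle
    for _ in range(n_cycles):
        texts = select_texts()                   # step 401
        extended_sets = lookup_sets(texts)       # step 402
        candidates = [
            sample(policy, extended_sets)        # step 403, same policy T times
            for _ in range(t_repeats)
        ]
        result = evaluate(candidates)            # step 404
        policy = update(policy, result)          # step 405
    return policy


# Toy run: the "policy" is just a counter that step 405 increments.
seen_policies: List[int] = []
final_policy = train_policy(
    initial_policy=0,
    n_cycles=3,
    t_repeats=2,
    select_texts=lambda: ["text"],
    lookup_sets=lambda texts: [["a", "b"] for _ in texts],
    sample=lambda policy, sets: seen_policies.append(policy) or ["a"],
    evaluate=lambda candidates: len(candidates),
    update=lambda policy, result: policy + 1,
)
```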
  • some extended texts determined by the computer device may not be appropriate.
  • the meanings of some phrases in a phrase set determined by using the paraphrase database and the meanings of the phrases corresponding to the phrase set may not be completely the same or opposite.
  • the second text generated from these phrases is not suitable for training DST.
  • by using the strategy network model determined by the method shown in FIG. 4, the second texts can be filtered, and the second texts that are not suitable for training the DST can be filtered out. In this way, the quality of the text used for training the DST can be improved, thereby improving the performance of the trained DST.
  • the first text in the method shown in FIG. 3 is one of the M texts determined in step 401 of the method shown in FIG. 4.
  • the extended text segment is a complete text including the extended phrase. Then, P second texts obtained by expanding the first text can be used as an expanded text fragment set.
  • Fig. 5 is a schematic flowchart of a method for training the policy network model by using the P second texts.
  • the computer device uses the reference strategy network model to select a second text from the P second texts.
  • the computer device can execute step 501 T times.
  • the computer device has determined T second texts in total from the P second texts.
  • the value of P can be greater than or less than T.
  • Duplicate text may appear in the T second text.
  • the T second texts are T candidate text segments.
  • the T second texts respectively belong to M candidate text fragment sets.
  • the computer device evaluates the T second texts according to the initial DST, and obtains an evaluation result.
  • the computer device evaluating the T second texts according to the initial DST includes: the computer device may perform a single-sample evaluation on the T second texts.
  • the computer device evaluating the T second texts according to the initial DST includes: the computer device performing a sample-set evaluation according to the T second texts.
  • the computer device evaluating the T second texts according to the initial DST includes: the computer device performing a single-sample evaluation on the T second texts and performing a sample-set evaluation according to the T second texts.
  • the computer device performing a single-sample evaluation on the T second texts may include: the computer device using the initial DST to predict the state of each second text in the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results, where the T first reward values are in one-to-one correspondence with the T second texts.
  • the j-th first reward value in the T first reward values is determined according to the initial DST prediction result of the i-th second text in the T second texts.
  • for the specific implementation of the single-sample evaluation, refer to the method shown in FIG. 4; details are not repeated here.
  • the computer device performing a sample-set evaluation according to the T second texts may include: the computer device using the T second texts to train the initial DST, and determining T second reward values according to the trained initial DST.
  • the computer device using the T second texts to train the initial DST may include: the computer device training the initial DST with T DST training text sets.
  • the T second texts belong to the T DST training text sets respectively.
  • the i-th second text in the T second texts is a text in the i-th DST training text set in the T DST training text sets.
  • the evaluation result includes T first reward values.
  • the evaluation result includes T second reward values.
  • the computer device may train the reference strategy network model according to the evaluation result.
  • the evaluation result determined in step 503 is a subset of, or the same as, the evaluation result determined in step 404 in FIG. 4.
  • if only the sample-set evaluation is performed in step 404, then only the sample-set evaluation is performed in step 502 as well. In this case, the evaluation result determined in step 503 is the same as the evaluation result determined in step 404.
  • the evaluation result determined in step 404 includes the evaluation result determined in step 503.
  • the evaluation result determined in step 404 includes the M × T first predicted values.
  • the evaluation result determined in step 503 includes T first prediction values, and the T first prediction values in the evaluation result determined in step 503 belong to the corresponding M × T first prediction values in the evaluation result determined in step 404.
  • the computer device can use the policy network model to select part of the texts in the first enhanced database to form a second enhanced database, and use the second enhanced database to train the DST.
  • the method of determining P second texts based on the first text in FIG. 3 is referred to as a coarse-grained data enhancement strategy.
  • the method of selecting texts by using the policy network model is referred to as a fine-grained data enhancement strategy.
  • the computer device expands the 1,000 sentences in the training text database into 20,000 sentences in the first enhanced database.
  • the computer device can also select part of the text in the first enhanced database to form a second enhanced database based on a fine-grained data enhancement strategy.
  • the computer device can use the strategy network model to select part of the text in the first enhanced database to form a second enhanced database.
  • the computer device uses the fine-grained data enhancement strategy to select 12,000 sentences from the 20,000 sentences in the first enhanced database. These 12,000 sentences are the sentences included in the second enhanced database.
  • the computer device can use all the sentences in the second enhanced database and all the sentences in the training text database as the training text for machine learning, and train to obtain the DST.
  • the DST can realize the function of DST 102 in the dialogue system 100 shown in FIG. 1 and the function of DST shown in FIG. 2.
  • the method of the present application can expand the training text used for training the DST from 1000 to 12100. Increasing the number of training text samples can improve the performance of the trained DST, so that the DST can more accurately determine the slot-slot value pairs in the user's utterance, determine the intent more accurately, and more accurately determine the slots whose slot values are unfilled.
  • Fig. 6 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • the computer device 600 shown in FIG. 6 includes: an acquiring unit 601 and a processing unit 602.
  • the acquiring unit 601 is configured to acquire a first text, the first text is a text in a training text database, and the first text includes at least two phrases.
  • the processing unit 602 is configured to determine at least one target phrase from the first text.
  • the processing unit 602 is further configured to determine P second texts according to the at least one target phrase.
  • each second text in the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1.
  • the processing unit 602 is further configured to train a dialog state tracking classifier based on the first text and the P second texts through machine learning.
  • the dialog state tracking classifier is used to predict the current state of the conversation based on the acquired dialog of the user.
  • the acquiring unit 601 may be implemented by a transceiver, and the processing unit 602 may be implemented by a processor.
  • for the specific functions and beneficial effects of the acquiring unit 601 and the processing unit 602, refer to the methods shown in FIG. 3 to FIG. 5; details are not repeated here.
  • Fig. 7 is a structural block diagram of a computer device provided according to an embodiment of the present application.
  • the computer device 700 shown in FIG. 7 includes a processor 701, a memory 702, and a transceiver 703.
  • the processor 701, the memory 702, and the transceiver 703 communicate with each other through an internal connection path to transfer control and/or data signals.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 701 or implemented by the processor 701.
  • the processor 701 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method can be completed by hardware integrated logic circuits in the processor 701 or instructions in the form of software.
  • the aforementioned processor 701 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium that is mature in the field, such as random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory, electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 702, and the processor 701 reads instructions in the memory 702, and completes the steps of the foregoing method in combination with its hardware.
  • the memory 702 may store instructions for executing the method executed by the computer device in the method shown in FIGS. 3 to 5.
  • the processor 701 can execute the instructions stored in the memory 702 in combination with other hardware (for example, the transceiver 703) to complete the steps of the computer device in the method shown in FIGS. 3 to 5.
  • for the specific working process and beneficial effects, refer to the description of the embodiments shown in FIGS. 3 to 5.
  • An embodiment of the present application also provides a chip, which includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input/output circuit or a communication interface
  • the processing unit is a processor or microprocessor or integrated circuit integrated on the chip.
  • the chip can execute the method of the computer device in the above method embodiment.
  • the embodiment of the present application also provides a computer-readable storage medium on which an instruction is stored, and the method of the computer device in the foregoing method embodiment is executed when the instruction is executed.
  • the embodiment of the present application also provides a computer program product containing instructions that, when executed, execute the method of the computer device in the foregoing method embodiment.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are a method for training a dialog state tracker, and a computer device, which relate to the field of artificial intelligence. The method comprises expanding texts in a training text database to obtain an enhanced database; and training a dialog state tracker by using texts in the enhanced database. The number of training texts for training the dialog state tracker can be increased, such that the performance of the dialog state tracker can be improved.

Description

Method and computer device for training a dialog state tracking classifier
This application claims priority to Chinese patent application No. 201910395608.1, filed with the Chinese Patent Office on May 13, 2019 and entitled "Method and computer device for training a dialog state tracking classifier", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and more specifically, to a method and a computer device for training a dialog state tracking classifier.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Natural language processing is an important branch of artificial intelligence, and dialogue systems are one of its applications. Common dialogue systems include automatic dialogue robots and voice assistants. Unlike traditional retrieval, the text that a user enters into a dialogue system is usually a complete sentence, and usually a colloquial one. Therefore, the dialogue system needs to understand and track the user's needs based on the text entered by the user, and determine the reply content according to those needs.
The dialog state tracking classifier (dialog state tracker, DST) is responsible for understanding and tracking the user's needs during the dialogue, and for determining and outputting the conversation state. The conversation state output by the DST represents the user's needs. The dialogue system can determine the reply content according to the conversation state output by the DST.
Machine learning is currently a common way to determine a DST. However, machine learning requires high-quality training texts, which are difficult to collect. In other words, the number of high-quality training texts that can currently be collected is small. In addition, the high-quality training texts that can currently be collected cover few scenarios, so the diversity of the training samples is also poor. Because the training texts used for machine learning are few and lack diversity, the performance of a DST obtained through machine learning will not be particularly high.
Summary
This application provides a method and a computer device for training a dialog state tracking classifier, to improve the performance of the dialog state tracking classifier.
According to a first aspect, an embodiment of this application provides a method for training a dialog state tracking classifier. The method includes: obtaining a first text, where the first text is a text in a training text database and includes at least two phrases; determining at least one target phrase from the first text; determining P second texts according to the at least one target phrase, where each of the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1; and training, through machine learning, a dialog state tracking classifier according to the first text and the P second texts, where the dialog state tracking classifier is used to track the state of a dialog based on the acquired dialog of a user. This technical solution increases the number of training text samples used to train the dialog state tracking classifier, which can improve the performance of the trained classifier, so that it can more accurately determine the slot-slot value pairs in the user's utterance, determine the intent more accurately, and more accurately determine the slots whose slot values are unfilled.
With reference to the first aspect, in a possible implementation of the first aspect, the determining P second texts according to the at least one target phrase includes: determining K1 first phrase sets corresponding to K1 slots, where the K1 slots are respectively the slots of K1 target phrases in the at least one target phrase, and K1 is a positive integer greater than or equal to 1; and determining P1 second texts, where the extended phrases included in the P1 second texts belong to the K1 first phrase sets, the P second texts include the P1 second texts, and P1 is a positive integer greater than or equal to 1. This technical solution increases the number of training texts used to train the dialog state tracking classifier by changing the slot value of the same slot.
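As a rough illustration of the slot-based expansion in this implementation, the following Python sketch replaces a target phrase with other values of the same slot. The slot ontology `SLOT_VALUES`, the function name `expand_by_slot`, and the slot name `food_type` are assumptions made for this example and are not defined in this application.

```python
from typing import Dict, List, Tuple

# Hypothetical slot ontology: slot name -> candidate slot values (a "first phrase set").
SLOT_VALUES: Dict[str, List[str]] = {
    "food_type": ["Chinese food", "Italian food", "Japanese food"],
}


def expand_by_slot(text: str, targets: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Build second texts by swapping a target phrase for another value of its slot.

    `targets` lists (target phrase, slot name) pairs that occur in `text`.
    Each result is (second text, slot name, new slot value).
    """
    results = []
    for phrase, slot in targets:
        if phrase not in text:
            continue  # skip targets that do not occur in the first text
        for value in SLOT_VALUES.get(slot, []):
            if value != phrase:
                results.append((text.replace(phrase, value, 1), slot, value))
    return results


first_text = "My mother likes Chinese food. Is there anything you can recommend?"
second_texts = expand_by_slot(first_text, [("Chinese food", "food_type")])
for new_text, slot, value in second_texts:
    print(f"{slot}={value!r}: {new_text}")
```

Because the slot value changes, each generated text is returned together with its new slot value, so that the training label can be updated to match.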
With reference to the first aspect, in a possible implementation of the first aspect, the determining P second texts according to the at least one target phrase includes: determining K2 second phrase sets corresponding to K2 word meanings, where the K2 word meanings are respectively the meanings of K2 target phrases, and K2 is a positive integer greater than or equal to 1; and determining P2 second texts, where the extended phrases included in the P2 second texts belong to the K2 second phrase sets, the P second texts include the P2 second texts, and P2 is a positive integer greater than or equal to 1. This technical solution increases the number of training texts used to train the dialog state tracking classifier based on the meanings of phrases.
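The meaning-based expansion can be sketched in the same spirit. In the following Python example, the small `PARAPHRASES` table merely stands in for a paraphrase database; its contents and the function name `expand_by_meaning` are hypothetical.

```python
from typing import Dict, List

# Hypothetical paraphrase table: phrase -> phrases with the same or a similar
# meaning (a "second phrase set").
PARAPHRASES: Dict[str, List[str]] = {
    "recommend": ["suggest", "propose"],
    "likes": ["enjoys"],
}


def expand_by_meaning(text: str, target_phrases: List[str]) -> List[str]:
    """Build second texts by replacing each target phrase with a paraphrase."""
    second_texts = []
    for phrase in target_phrases:
        if phrase not in text:
            continue  # skip targets that do not occur in the first text
        for alternative in PARAPHRASES.get(phrase, []):
            second_texts.append(text.replace(phrase, alternative, 1))
    return second_texts


first_text = "My mother likes Chinese food. Can you recommend a restaurant?"
second_texts = expand_by_meaning(first_text, ["likes", "recommend"])
for new_text in second_texts:
    print(new_text)
```

Here the slot values are untouched, so the label of the first text can be reused for every generated text, provided the paraphrase really preserves the meaning.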
With reference to the first aspect, in a possible implementation of the first aspect, the training a dialog state tracking classifier according to the first text and the P second texts through machine learning includes: determining at least one second text from the P second texts according to a policy network model; and training the dialog state tracking classifier by using the first text and the at least one second text as training texts for the machine learning. This technical solution filters the second texts and removes those that are not suitable for training the dialog state tracking classifier. In this way, the quality of the texts used for training the dialog state tracking classifier can be improved, thereby improving the performance of the trained classifier.
With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: determining T second texts from the P second texts according to a reference policy network model, where T is a positive integer greater than or equal to 1; determining an evaluation result according to an initial dialog state tracking classifier and the T second texts; and training the reference policy network model according to the evaluation result to obtain the policy network model.
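The policy training in this implementation can be pictured with the following minimal Python sketch. It assumes, purely for illustration, that the reference policy network model is a vector with one logit per second text and that it is updated with a REINFORCE-style rule; the model structure and update rule are not specified here, and the names `train_policy`, `reward_fn`, `rounds`, and `lr` are hypothetical.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into selection probabilities over the P second texts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(logits: List[float], reward_fn: Callable[[int], float],
                 T: int, rounds: int = 200, lr: float = 0.5, seed: int = 0) -> List[float]:
    """Assumed REINFORCE-style training of a toy policy (one logit per second text)."""
    rng = random.Random(seed)
    logits = list(logits)
    P = len(logits)
    for _ in range(rounds):
        probs = softmax(logits)
        # Select T second texts from the P second texts (with replacement).
        picks = rng.choices(range(P), weights=probs, k=T)
        # Evaluation result: one reward per selected text.
        rewards = [reward_fn(i) for i in picks]
        baseline = sum(rewards) / T
        grads = [0.0] * P
        for i, r in zip(picks, rewards):
            advantage = r - baseline
            for j in range(P):
                # d log p_i / d logit_j = 1[i == j] - p_j
                grads[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
        logits = [l + lr * g / T for l, g in zip(logits, grads)]
    return logits


# Toy reward: the text at index 2 is treated as unsuitable for training.
reward = lambda i: 0.0 if i == 2 else 1.0
trained = train_policy([0.0, 0.0, 0.0], reward, T=4)
print([round(p, 3) for p in softmax(trained)])
```

Sampling uses replacement, so the same second text can be selected more than once among the T picks.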
With reference to the first aspect, in a possible implementation of the first aspect, the determining an evaluation result according to the initial dialog state tracking classifier and the T second texts includes: using the initial dialog state tracking classifier to predict the state of each of the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results; or training the initial dialog state tracking classifier by using the T second texts, and determining T second reward values according to the trained initial dialog state tracking classifier.
With reference to the first aspect, in a possible implementation of the first aspect, the determining an evaluation result according to the initial dialog state tracking classifier and the T second texts includes: using the initial dialog state tracking classifier to predict the state of each of the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results; and training the initial dialog state tracking classifier by using the T second texts, and determining T second reward values according to the trained initial dialog state tracking classifier.
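The two kinds of reward values can be illustrated with a short Python sketch. The scoring rules below are assumptions chosen only to make the example concrete, since the formulas are not given in this passage: the first reward is 1.0 when the initial DST's prediction matches the label of a second text, the second reward is the change in held-out accuracy after training on the T second texts, and the two are mixed with a hypothetical `weight`.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, Dict[str, str]]  # (second text, labelled slot values)


def first_rewards(predict: Callable[[str], Dict[str, str]],
                  texts: List[Example]) -> List[float]:
    """Single-sample evaluation: one reward per second text.

    Assumed rule: 1.0 if the initial DST's prediction equals the label, else 0.0.
    """
    return [1.0 if predict(text) == label else 0.0 for text, label in texts]


def second_rewards(accuracy_before: float, accuracy_after: float, T: int) -> List[float]:
    """Sample-set evaluation: the T second texts share one reward.

    Assumed rule: the change in held-out accuracy after training on the T texts.
    """
    return [accuracy_after - accuracy_before] * T


def evaluation_result(first: List[float], second: List[float],
                      weight: float = 0.5) -> List[float]:
    """Combine both reward values per text (the mixing weight is an assumption)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


def toy_predict(text: str) -> Dict[str, str]:
    # Hypothetical stand-in for the initial DST.
    return {"food_type": "Italian food"} if "Italian" in text else {}


texts = [
    ("I want Italian food", {"food_type": "Italian food"}),
    ("I want tasty food", {"food_type": "Thai food"}),
]
first = first_rewards(toy_predict, texts)
second = second_rewards(accuracy_before=0.70, accuracy_after=0.80, T=len(texts))
print(evaluation_result(first, second))
```

Using only `first` or only `second` corresponds to the "or" variant, and combining them corresponds to the variant in which both evaluations are performed.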
According to a second aspect, an embodiment of this application provides a method for determining a dialog state. The method includes: acquiring a dialog of a user; and tracking the state of the dialog by using a dialog state tracking classifier, where the dialog state tracking classifier is determined according to the first aspect or any possible implementation of the first aspect.
According to a third aspect, an embodiment of this application provides a computer device, where the computer device includes units for performing the method according to the first aspect or any possible implementation of the first aspect.
Optionally, the computer device of the third aspect may be a computer device, or may be a component (for example, a chip or a circuit) that can be used in a computer device.
According to a fourth aspect, an embodiment of this application provides a computer device, where the computer device includes units for performing the method according to the second aspect.
Optionally, the computer device of the fourth aspect may be a computer device, or may be a component (for example, a chip or a circuit) that can be used in a computer device.
According to a fifth aspect, an embodiment of this application provides a computer device, where the computer device includes a memory and a processor, the memory stores instructions, and the processor invokes the instructions in the memory to perform the method according to the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, an embodiment of this application provides a computer device, where the computer device includes a memory and a processor, the memory stores instructions, and the processor invokes the instructions in the memory to perform the method according to the second aspect.
According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions for implementing the method according to the first aspect or any possible implementation of the first aspect.
According to an eighth aspect, an embodiment of this application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions for implementing the method according to the second aspect.
According to a ninth aspect, this application provides a computer program product containing instructions, where when the computer program product runs on a computer, the computer is caused to perform the method according to the first aspect or any possible implementation of the first aspect.
According to a tenth aspect, this application provides a computer program product containing instructions, where when the computer program product runs on a computer, the computer is caused to perform the method according to the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a common dialogue system.
FIG. 2 is a schematic diagram of the operation of a DST.
FIG. 3 is a schematic flowchart of training a DST according to an embodiment of this application.
FIG. 4 is a schematic flowchart of training a policy network model according to an embodiment of this application.
FIG. 5 is a schematic flowchart of a method for training the policy network model by using the P second texts.
FIG. 6 is a structural block diagram of a computer device according to an embodiment of this application.
FIG. 7 is a structural block diagram of a computer device according to an embodiment of this application.
Detailed Description
The terms used in the following embodiments are only for the purpose of describing specific embodiments and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular forms "a", "an", "said", "the above", "the", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of this application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference to "one embodiment" or "some embodiments" in this specification means that a specific feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of this application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", and the like that appear in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "include", "comprise", "have", and their variations all mean "including but not limited to", unless specifically emphasized otherwise.
FIG. 1 is a schematic diagram of a common dialogue system. As shown in FIG. 1, the dialogue system 100 may include an automatic speech recognition (ASR) module 101, a dialog state tracking classifier (dialog state tracker, DST) 102, a dialogue policy learning (DPL) module 103, a natural language generation (NLG) module 104, and a text-to-speech (TTS) module 105.
(1)ASR模块101(1) ASR module 101
ASR模块的主要作用是将用户的语音识别为文字内容。ASR模块可以获知用户在说什么,但其无法理解用户的意思,对语义的理解会交由NLU模块来处理。The main function of the ASR module is to recognize the user's voice as text content. The ASR module can learn what the user is saying, but it cannot understand what the user means. The understanding of the semantics will be handled by the NLU module.
(2) DST 102
The DST can be used to understand the user's intent and to parse slots.
For example, the user says: "My mother likes Chinese food. Is there anything you can recommend?"
From this sentence, the DST can parse out the content shown in Table 1.
Table 1
Intent    "find a restaurant"
Slot      food type = "Chinese food"
The example above mentions two concepts, intent and slot, which are explained in detail below.
Intent
An intent can be understood as a classification of the user's utterance: it identifies which type the utterance belongs to, and the program corresponding to that type then performs specialized parsing. In one implementation, the "program corresponding to that type" may be a bot. For example, when the user says "Play me a happy song", the DST determines that the user's intent class is music and therefore invokes a music bot to recommend and play a song. If the user does not like the song and says "Play another one", the same music bot continues to serve the user. Only when the user raises a different question and the intent is no longer music does the system switch to another bot.
Slot
After the user's intent is determined, the DST needs to further understand the content of the dialogue. For simplicity, only the core parts need to be understood and the rest may be ignored. These most important parts may be referred to as slots, and the content of a slot may be referred to as a slot value.
The utterance in the "find a restaurant" example includes one slot, "food type", and the corresponding slot value is "Chinese food".
A user looking for a restaurant could of course supply more information, such as the restaurant's location and price. For the designer of a dialogue system, the starting point of the design is to define slots. In other words, the designer needs to decide which slots are required to fulfill the user's query.
Still taking "find a restaurant" as an example, the designer may define the following slots: location, price, request, and food type. The dialogue system needs to know the slot values of these slots before it can provide the user with suitable query results.
In addition to determining the intent and the slot-slot value pairs, the DST can also be used to track the dialogue state. The dialogue state can be understood as the slot-filling status of the current task. The slot-filling status may include whether a slot has been filled (that is, whether it has a corresponding slot value) and the filled slot value. In other words, after determining the intent and the slot values, the DST can go on to determine which slots corresponding to the intent do not yet have slot values, and to determine the probabilities of the slot values that are already present.
For example, the user says "My mother likes Chinese food. Is there anything you can recommend?". The NLU module can determine that the user's intent is "find a restaurant". The slots corresponding to this intent are "location", "price", "request", and "food type". Based on the slots corresponding to the "find a restaurant" intent, the DST can determine that the user's utterance contains a slot value for only one slot, "food type". In this case, the DST can determine that slot values are missing for the following slots: "location", "price", and "request". The DST can also determine the probability of "Chinese food".
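The slot-filling check described above can be illustrated with a minimal Python sketch. The dictionary `INTENT_SLOTS` and the function `missing_slots` are hypothetical names introduced only for this illustration; the specification does not prescribe any particular data structure.

```python
# Hypothetical slot schema: the slots a designer defined for each intent.
INTENT_SLOTS = {
    "find a restaurant": ["location", "price", "request", "food type"],
}


def missing_slots(intent, filled):
    """Return the slots of `intent` that do not yet have a slot value."""
    return [slot for slot in INTENT_SLOTS[intent] if slot not in filled]


# Slot values parsed from the example utterance.
filled = {"food type": "Chinese food"}
print(missing_slots("find a restaurant", filled))
```

For the example utterance, the sketch reports "location", "price", and "request" as unfilled, which matches the slots the DST is described as finding missing.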
The embodiments of this application provide a method for training a DST. For specific implementations, refer to the methods shown in FIG. 3 to FIG. 5.
(3) DPL module 103
The main function of the DPL module is to determine the follow-up processing policy based on the dialogue state output by the DST. Take again the utterance "My mother likes Chinese food. Is there anything you can recommend?". Based on the dialogue state output by the DST, the DPL module can find that the slot values of the three slots "location", "price", and "request" are missing. The DPL module can therefore trigger an "ask back for restaurant information" action and pass this action to the NLG module.
(4) NLG module 104
The main function of the NLG module is to generate dialogue. For example, after the DPL module passes the "ask back for restaurant information" action to the NLG module, the NLG module can generate the following content: "Found 10 Chinese restaurants. Where would you like to eat?".
(5) TTS module 105
The main function of the TTS module is to read the dialogue out to the user. The TTS module can perform text-to-speech conversion on the content output by the NLG module and play the dialogue generated by the dialogue system to the user through an output device.
It can be understood that the dialogue system 100 shown in FIG. 1 is merely one common dialogue system to which the technical solutions provided in this application can be applied. Other dialogue systems can also apply these technical solutions. For example, in some embodiments, the user converses with the dialogue system through text, in which case the dialogue system may omit the ASR module and the TTS module. As another example, in other embodiments, the dialogue system may omit the ASR module but include the TTS module, in which case the user enters the dialogue as text and the dialogue system replies by voice.
In addition, it can be understood that the division of modules in the dialogue system 100 shown in FIG. 1 is only one possible division. The modules in a dialogue system may be divided in other ways. For example, one module of the system 100 shown in FIG. 1 may be split by function into multiple modules with different functions. As another example, two or more modules in the system 100 shown in FIG. 1 may be combined into one module.
FIG. 2 is a schematic diagram of how a DST works. The DST 200 shown in FIG. 2 includes a semantic encoding module 201, a semantic encoding module 202, a semantic fusion module 203, a prediction module 204, and a state update module 205.
Assume that the dialogue between the user and the dialogue system proceeds as follows:
User: My mother likes Chinese food. Is there anything you can recommend?
Dialogue system: Found 10 Chinese restaurants. Where would you like to eat?
User: As long as the price is cheap, the location does not matter. Can you tell me the name and location of the restaurant?
Dialogue system: Zhang Mama Sichuan Restaurant is located at No. 100 Ande Road.
The semantic encoding module 201 can be used to determine a semantic vector from the dialogue system's reply in the previous round. The semantic encoding module 202 can be used to determine a semantic vector from the user's utterance in the current round. For ease of description, the semantic vector determined by the semantic encoding module 201 is referred to below as semantic vector 1, and the semantic vector determined by the semantic encoding module 202 is referred to as semantic vector 2.
The semantic fusion module 203 can be used to obtain semantic vector 1 determined by the semantic encoding module 201 and semantic vector 2 determined by the semantic encoding module 202, and to fuse semantic vector 1 and semantic vector 2 into a new fused semantic vector.
The prediction module 204 can predict, based on the fused semantic vector determined by the semantic fusion module 203, a probability for each possible slot-slot value pair. The slot value with the highest predicted probability can be taken as the predicted slot value.
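Selecting the slot value with the highest predicted probability can be sketched as follows. The probabilities are made-up illustrative numbers, `predict_slot_values` is a hypothetical name, and the sketch assumes one predicted value per slot, which is a simplification (a slot such as "request" may carry several values in the example dialogue).

```python
def predict_slot_values(pair_probs):
    """For each slot, keep the candidate slot value with the highest probability.

    `pair_probs` maps (slot, slot value) pairs to predicted probabilities.
    """
    best = {}
    for (slot, value), prob in pair_probs.items():
        if slot not in best or prob > best[slot][1]:
            best[slot] = (value, prob)
    return {slot: value for slot, (value, _) in best.items()}


# Illustrative probabilities only; not taken from FIG. 2.
pair_probs = {
    ("price", "cheap"): 0.90,
    ("price", "expensive"): 0.05,
    ("location", "no requirement"): 0.80,
    ("location", "city centre"): 0.10,
}
print(predict_slot_values(pair_probs))
```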
The state update module 205 can determine the slot-slot value pairs accumulated up to the current round from the slot-slot value pairs determined from the user's utterance in the previous round and the slot-slot value pairs determined from the user's utterance in the current round.
As shown in FIG. 2, the dialogue system's reply in the previous round that is input to the semantic encoding module 201 is "Found 10 Chinese restaurants. Where would you like to eat?", and the user's utterance in the current round that is input to the semantic encoding module 202 is "As long as the price is cheap, the location does not matter. Can you tell me the name and location of the restaurant?". The prediction result determined by the prediction module 204 is shown in FIG. 2. For brevity, FIG. 2 does not show the prediction results for all slot-slot value pairs.
As also shown in FIG. 2, the slot-slot value pair determined from the user's utterance in the previous round is <food type, Chinese food>, and the slot-slot value pairs determined from the user's utterance in the current round are <price, cheap>, <location, no requirement>, <request, name>, and <request, location>. The slot-slot value pairs accumulated up to the current round, as determined by the state update module 205, are: <food type, Chinese food>, <price, cheap>, <location, no requirement>, <request, name>, and <request, location>.
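The accumulation performed by the state update module can be sketched as a set union of slot-slot value pairs. This is only an illustration under the assumption that pairs are simply accumulated; how a changed or retracted slot value would be handled is not covered here, and `update_state` is a hypothetical name.

```python
def update_state(previous_pairs, current_pairs):
    """Accumulate the slot-slot value pairs of the previous and current rounds."""
    return set(previous_pairs) | set(current_pairs)


previous_pairs = {("food type", "Chinese food")}
current_pairs = {
    ("price", "cheap"),
    ("location", "no requirement"),
    ("request", "name"),
    ("request", "location"),
}
accumulated = update_state(previous_pairs, current_pairs)
for slot, value in sorted(accumulated):
    print(f"<{slot}, {value}>")
```

Representing the state as a set of pairs, rather than a slot-to-value dictionary, lets one slot such as "request" carry several values, as in the example.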
An embodiment of this application provides a method for training a DST. A computer device can expand the texts in a training text database to increase the number of training texts available for training the DST, and then train the DST with the expanded training texts. For ease of description, the following uses one text as an example to describe how the computer device expands a text and how it uses the expanded texts to train the DST.
FIG. 3 is a schematic flowchart of training a DST according to an embodiment of this application. The method shown in FIG. 3 may be performed by a computer device. The embodiments of this application do not limit the specific form of the computer device; for example, it may be a personal computer, a laptop, a tablet computer, a workstation, or a server. The DST trained by the method shown in FIG. 3 can implement the functions of the DST 102 in FIG. 1 or of the DST shown in FIG. 2.
301. The computer device obtains a first text. The first text is a text in a training text database and includes at least two phrases.
A phrase in the embodiments of this application may be an n-gram, where n is a positive integer greater than or equal to 1. An n-gram is a text fragment consisting of n consecutive words. For example, a unigram is a text fragment consisting of one word, a bigram is a text fragment consisting of two words, and a trigram is a text fragment consisting of three words.
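As an illustration of the n-gram definition above, the following sketch enumerates the n-grams of a sequence of words. The function name `ngrams` and the sample words are chosen only for this example.

```python
def ngrams(words, n):
    """Return all n-grams (tuples of n consecutive words) in `words`."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = ["cheap", "Chinese", "restaurant"]
print(ngrams(words, 1))  # unigrams
print(ngrams(words, 2))  # bigrams
print(ngrams(words, 3))  # trigrams
```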
Optionally, in some embodiments, the granularity of a text in the training text database may be a sentence. In other words, each text in the training text database is a sentence.
Optionally, in other embodiments, the granularity of a text in the training text database may be a text fragment composed of multiple n-grams, and the text fragment is not necessarily a complete sentence.
Optionally, in still other embodiments, the granularity of a text in the training text database may be a text composed of multiple sentences.
For ease of description, the following assumes that the granularity of a text in the training text database is a sentence. In other words, the first text is a sentence composed of at least two phrases.
The embodiments of this application do not limit where the training text database is stored. For example, the training text database may be stored in a storage device inside the computer device. As another example, the training text database may be stored in an externally connected direct-attached storage device, such as a portable hard disk or a USB flash drive. As yet another example, the training text database may be stored in another computer device, such as a server or a network attached storage (NAS) device.
302. The computer device determines at least one target phrase from the first text.
303. The computer device determines P second texts based on the at least one target phrase. Each of the P second texts includes an expanded phrase, the expanded phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1.
In other words, the purpose of step 303 is to expand the first text into multiple texts (that is, the P second texts) by determining expanded phrases corresponding to the target phrases.
As assumed above, the first text is a sentence composed of at least two phrases, but it is not necessary to determine one or more corresponding expanded phrases for every phrase. Therefore, target phrases need to be determined from the first text, and one or more expanded phrases need to be determined for each target phrase. Once one or more expanded phrases have been determined, a second text is obtained by replacing the target phrase in the first text with an expanded phrase that corresponds to it. In the second text, the non-target phrases and the target phrases that do not correspond to that expanded phrase are the same as in the first text.
For example, assume that the first text is "I want to find a cheap Chinese food restaurant". By performing word segmentation on the first text, the computer device can determine that the first text includes the following phrases: "I", "want to find", "a", "cheap", "的" (a structural particle), "Chinese food", and "restaurant". According to the preset expansion rules, the two phrases "cheap" and "Chinese food" can be determined as target phrases. The expanded phrases corresponding to "cheap" may include "affordable" and "low-priced". The expanded phrases corresponding to "Chinese food" may include "Japanese food" and "French food". Therefore, the second texts determined from "I want to find a cheap Chinese food restaurant" may include:
Second text 1: I want to find a cheap Japanese food restaurant.
Second text 2: I want to find a cheap French food restaurant.
Second text 3: I want to find an affordable Chinese food restaurant.
Second text 4: I want to find a low-priced Chinese food restaurant.
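The replacement step that produces the second texts listed above can be sketched in Python as follows. The function `expand_text` and the `expansions` dictionary are hypothetical; the sketch assumes the first text has already been segmented into phrases and that each second text replaces exactly one target phrase.

```python
def expand_text(phrases, expansions):
    """Build second texts by replacing one target phrase at a time.

    `phrases` is the segmented first text; `expansions` maps each target
    phrase to its expanded phrases. Non-target phrases are left unchanged.
    """
    second_texts = []
    for i, phrase in enumerate(phrases):
        for expanded in expansions.get(phrase, []):
            second_texts.append(phrases[:i] + [expanded] + phrases[i + 1:])
    return second_texts


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
expansions = {
    "cheap": ["affordable", "low-priced"],
    "Chinese food": ["Japanese food", "French food"],
}
second_texts = expand_text(first_text, expansions)
for text in second_texts:
    print(text)
```

With two target phrases and two expanded phrases each, the sketch yields P = 4 second texts, the same four texts as in the example, though in a different order.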
The preset expansion rules may be of two kinds: one is an expansion rule based on slot-slot value pairs, and the other is an expansion rule based on word meaning. For ease of description, the expansion rule based on slot-slot value pairs is referred to below as the first expansion rule, and the expansion rule based on word meaning is referred to as the second expansion rule.
The computer device can determine whether the phrases in the first text include a phrase that can be expanded using the first expansion rule. More specifically, the computer device can determine whether any phrase in the first text can serve as the slot value of a slot. If one or more phrases in the first text can serve as slot values, the computer device can determine these phrases as target phrases. For ease of distinction, a phrase that can serve as a slot value is referred to below as a first-type target phrase.
For example, the computer device can search a slot value database to determine whether a phrase in the first text can serve as the slot value of a slot. The slot value database consists of phrases that can serve as slot values. After performing word segmentation on the first text to obtain the phrases that make up the first text, the computer device can search the slot value database to determine whether each phrase in the first text is in the database. If one or more phrases in the first text are in the slot value database, those phrases can be determined as first-type target phrases.
Once the first-type target phrases have been determined, the computer device can determine the slot corresponding to each first-type target phrase.
For example, assume that the computer device determines first-type target phrases by searching the slot value database. The slot value database may also include the slot corresponding to each slot value. Therefore, when the computer device determines that a phrase is a first-type target phrase, it can also determine the slot corresponding to that phrase.
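The slot value database lookup described above can be sketched as a dictionary from slot values to slots. The database contents and the function name `first_type_targets` are assumptions made for illustration; an actual slot value database could be stored and queried differently.

```python
# Hypothetical slot value database: each slot value maps to its slot.
SLOT_VALUE_DB = {
    "Chinese food": "food type",
    "Japanese food": "food type",
    "French food": "food type",
}


def first_type_targets(phrases, slot_value_db):
    """Return each phrase found in the database together with its slot."""
    return {p: slot_value_db[p] for p in phrases if p in slot_value_db}


first_text = ["I", "want to find", "a", "cheap", "Chinese food", "restaurant"]
print(first_type_targets(first_text, SLOT_VALUE_DB))
```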
Take again the first text "I want to find a cheap Chinese food restaurant". The computer device can determine that "Chinese food" is a phrase that can serve as a slot value. The computer device can also determine that the slot corresponding to "Chinese food" is "food type".
The computer device can also determine whether the phrases in the first text include a phrase that can be expanded using the second expansion rule. More specifically, the computer device can determine whether any phrase in the first text satisfies a specific rule. If one or more phrases in the first text satisfy the specific rule, the computer device can determine these phrases as target phrases. For ease of distinction, a phrase that satisfies the specific rule is referred to below as a second-type target phrase.
For example, replacing phrases whose part of speech is personal pronoun, article, preposition, particle, or the like usually does little to help train the DST, whereas replacing phrases whose part of speech is adjective, adverb, or the like helps more. Therefore, the specific rule may be that a phrase whose part of speech is a preset part of speech is a second-type target phrase. In this case, the computer device can determine the part of speech of each phrase in the first text. If the part of speech of a phrase is a preset part of speech, the phrase can be determined as a second-type target phrase. The preset part of speech may be at least one of adjective and adverb.
As another example, the computer device can determine whether a phrase is a second-type target phrase based on the importance of the phrase. If the phrase is an important phrase, it can be a second-type target phrase; otherwise, it need not be. Optionally, in some embodiments, the importance of a phrase can be determined by the frequency with which the phrase appears in the training text database. This frequency can be determined as the ratio of the number of texts that include the phrase to the total number of texts in the training text database. If the frequency with which a phrase appears in the training text database exceeds a preset frequency threshold, the phrase can be determined as a second-type target phrase. Optionally, in other embodiments, the importance of a phrase can be determined by the number of times the phrase appears in the training text database. If the number of times a phrase appears in the training text database exceeds a preset count threshold, the phrase can be determined as a second-type target phrase.
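The frequency-based importance test can be sketched as follows, using the ratio of texts containing the phrase to the total number of texts. The toy database, the threshold value, and the function names are illustrative assumptions; the specification does not fix any threshold.

```python
def phrase_frequency(phrase, texts):
    """Ratio of texts that include `phrase` to the total number of texts."""
    containing = sum(1 for text in texts if phrase in text)
    return containing / len(texts)


def is_second_type_target(phrase, texts, threshold):
    """A phrase is treated as important if its frequency exceeds `threshold`."""
    return phrase_frequency(phrase, texts) > threshold


# Toy training text database; each text is a list of segmented phrases.
texts = [
    ["I", "want", "cheap", "food"],
    ["cheap", "restaurant"],
    ["I", "like", "noodles"],
    ["find", "cheap", "hotel"],
]
print(phrase_frequency("cheap", texts))
print(is_second_type_target("cheap", texts, threshold=0.6))
```

The count-based variant mentioned in the text would compare the raw number of occurrences against a count threshold instead of the ratio.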
Take again the first text "I want to find a cheap Chinese food restaurant". Assume that the computer device determines second-type target phrases by part of speech. The computer device can then determine that the first text contains one phrase whose part of speech is adjective, namely "cheap". In this case, the computer device can determine that "cheap" is a second-type target phrase.
After determining the at least one target phrase, the computer device can determine at least one expanded phrase for each target phrase.
Assume that the at least one target phrase includes K1 first-type target phrases. The computer device can determine K1 first phrase sets corresponding to K1 slots, where each of the K1 first phrase sets includes at least one phrase and K1 is a positive integer greater than or equal to 1. Assume that the computer device has determined K target phrases in total. It can be understood that K is a positive integer greater than or equal to 1 and greater than or equal to K1. The K1 slots are the slots of the K1 first-type target phrases, respectively. In other words, the K1_n-th of the K1 slots is the slot of the K1_n-th of the K1 first-type target phrases. The slot of any phrase in the K1_n-th of the K1 first phrase sets is the K1_n-th slot, where K1_n = 1, ..., K1.
For example, assume that K1 is 10 and that the slot of the fifth of the 10 first-type target phrases is "food type". Then the slot corresponding to any phrase in the first phrase set that corresponds to this first-type target phrase (assumed to be the fifth of the 10 first phrase sets) is "food type". Assume that the fifth first phrase set includes two phrases, "Japanese food" and "French food". In this case, the computer device can replace "Chinese food" in the first text with "Japanese food" and "French food" respectively, obtaining second text 1 above (I want to find a cheap Japanese food restaurant) and second text 2 above (I want to find a cheap French food restaurant).
Optionally, in some embodiments, the computer device can determine the K1 first phrase sets corresponding to the K1 slots according to a first correspondence. The first correspondence includes correspondences between multiple slots and multiple first phrase sets. The slot of any phrase in each first phrase set is the same as the slot to which that first phrase set corresponds.
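One way to picture the first correspondence is as a mapping from each slot to its first phrase set. In the sketch below, the mapping contents and the function `expanded_phrases_for` are hypothetical, and excluding the target phrase itself from the returned list is an assumption of this sketch rather than something the text states.

```python
# Hypothetical first correspondence: slot -> first phrase set.
FIRST_CORRESPONDENCE = {
    "food type": ["Chinese food", "Japanese food", "French food"],
}


def expanded_phrases_for(target, slot, correspondence):
    """Return the phrases of the slot's first phrase set, excluding `target`."""
    return [p for p in correspondence.get(slot, []) if p != target]


print(expanded_phrases_for("Chinese food", "food type", FIRST_CORRESPONDENCE))
```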
Assume that the at least one target phrase includes K2 second-type target phrases. The computer device can determine K2 second phrase sets corresponding to K2 word meanings, where K2 is a positive integer greater than or equal to 1. Similarly, assume that the computer device has determined K target phrases in total. It can be understood that K is a positive integer greater than or equal to 1 and greater than or equal to K2. In addition, if neither K1 nor K2 equals K, the sum of K1 and K2 is K. In other words, the computer device has determined K target phrases in total, of which K1 are first-type target phrases and K2 are second-type target phrases. The K2 word meanings are the meanings of the K2 second-type target phrases, respectively. In other words, the K2_n-th of the K2 word meanings is the meaning of the K2_n-th of the K2 second-type target phrases. The meaning of any phrase in the K2_n-th of the K2 second phrase sets corresponds to the K2_n-th word meaning, where K2_n = 1, ..., K2.
Optionally, in some embodiments, two phrases having corresponding meanings may mean that the two phrases have the same meaning. Either phrase can then be called a paraphrase of the other, that is, another way of expressing it. For example, paraphrases of "cheap" may be "affordable" and "low-priced".
Optionally, in other embodiments, two phrases having corresponding meanings may mean not only that they have the same meaning but also that they are antonyms of each other. For example, phrases corresponding to "cheap" may be "expensive" and "high-priced".
Optionally, in some embodiments, the computer device can determine the K2 second phrase sets corresponding to the K2 word meanings according to a second correspondence. The second correspondence includes correspondences between multiple word meanings and multiple second phrase sets. The meaning of any phrase in each second phrase set is the same as the word meaning to which that second phrase set corresponds.
Optionally, in some embodiments, the computer device may determine the second phrase set corresponding to each second-type target phrase based on a synonym database.
Optionally, in other embodiments, the computer device may determine the second phrase set corresponding to each second-type target phrase based on a synonym database or an antonym database.
Optionally, in other embodiments, the computer device may use an existing paraphrase corpus to determine the second phrase set corresponding to each second-type target phrase. For example, the Paraphrase Database (http://paraphrase.org) is a paraphrase corpus that is currently widely used. Using the Paraphrase Database, a phrase set corresponding to each second-type target phrase can be determined. The meanings of some phrases in a phrase set determined this way may be neither exactly the same as nor opposite to the meaning of the phrase to which the set corresponds. Taking "expensive" as an example, the phrase set obtained from the Paraphrase Database includes not only synonyms such as "costly" and "pricey" and antonyms such as "cheap" and "inexpensive", but also phrases such as "onerous" and "burdensome" that are neither synonyms nor antonyms of "expensive". This results from the way the Paraphrase Database was built, which is not described in detail here. Therefore, if the K2 second phrase sets are determined using the Paraphrase Database, a phrase in a second phrase set and the second-type target phrase corresponding to that set can still be said to correspond even if their meanings are neither exactly the same nor opposite.
Assuming that the second phrase set determined based on "cheap" includes the two phrases "affordable" and "low-cost", the computer device may replace "cheap" in the first text with "affordable" and with "low-cost" respectively, thereby obtaining the above second text 3 (that is, "I want to find an affordable Chinese restaurant") and second text 4 (that is, "I want to find a low-cost Chinese restaurant").
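The substitution described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `expand_text` is hypothetical, plain string replacement stands in for whatever phrase matching the computer device actually uses, and the strings are the Chinese example sentences from this description.

```python
def expand_text(text: str, target: str, replacements: list[str]) -> list[str]:
    """Return one expanded text per replacement phrase (original text excluded)."""
    if target not in text:
        # No target phrase: nothing to expand, the text can be used as-is.
        return []
    # Replace only the first occurrence of the target phrase.
    return [text.replace(target, phrase, 1) for phrase in replacements]


first_text = "我想找一个便宜的中餐餐馆"  # "I want to find a cheap Chinese restaurant"
second_set = ["实惠", "消费低"]          # "affordable", "low-cost"
second_texts = expand_text(first_text, "便宜", second_set)
```

Here `second_texts` holds the counterparts of second text 3 and second text 4; a text without the target phrase yields an empty list and would be used without expansion.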
It can be understood that the above first text, "I want to find a cheap Chinese restaurant", includes both a first-type target phrase and a second-type target phrase. Some texts in the training text database may include both a first-type target phrase and a second-type target phrase, and some may include only one of the two. In some embodiments, some texts in the training text database may include neither a first-type target phrase nor a second-type target phrase. For such a text, the computer device may use the text directly without expanding it.
304. The computer device trains the DST through machine learning according to the first text and the P second texts.
Optionally, in some embodiments, the computer device may directly use the first text and the P second texts as training texts for machine learning to train the DST. The specific way in which the computer device trains the DST is the same as in existing implementations and, for brevity, is not repeated here.
It can be understood that, in some embodiments, the computer device may instead use only part of the first text and the P second texts as training texts for machine learning to train the DST. For example, the computer device may use the first text and some of the P second texts to train the DST. As another example, the computer device may use the P second texts, or some of the P second texts, to train the DST.
Optionally, in some embodiments, the computer device may select at random the subset of the P second texts to be used as training texts for machine learning.
Optionally, in other embodiments, the computer device may use the P second texts to train a policy network model, and use the policy network model to select at least one second text from the P second texts as a training text for machine learning.
Optionally, in some embodiments, the computer device may train the policy network model using a reinforcement learning algorithm or an evolutionary algorithm. More specifically, the computer device may train the policy network model using a contextual bandit algorithm, a genetic algorithm, or the like.
The following takes the contextual bandit algorithm as an example to briefly describe how the policy network model is trained.
Fig. 4 is a schematic flowchart of training a policy network model according to an embodiment of this application.
401. The computer device determines M texts from the training text database. M is a positive integer greater than or equal to 1, and M is less than the total number of texts in the training text database.
Optionally, in some embodiments, the computer device may pick the M texts from the training text database at random.
Optionally, in other embodiments, the computer device may pick the M texts from the training text database according to certain rules.
For example, the computer device may determine the M texts according to the number of texts obtained by expanding each training text in the training text database. If some texts in the training text database (hereinafter, the first part of the texts) yield more expanded texts than other texts (hereinafter, the second part of the texts), the M texts determined by the computer device may contain more texts from the first part than from the second part. The computer device may pick the texts belonging to the M texts from the first part and the second part either at random or in a certain order.
As another example, suppose that for some texts in the training text database (hereinafter, the third part of the texts) the number of texts expanded under the first expansion rule is greater than the number expanded under the second expansion rule, while for other texts (hereinafter, the fourth part of the texts) the number of texts expanded under the second expansion rule is greater than the number expanded under the first expansion rule. In that case, the M texts determined by the computer device may contain more texts from the third part than from the fourth part. The computer device may pick the texts belonging to the M texts from the third part and the fourth part either at random or in a certain order.
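One possible reading of the first rule, under which texts with more expansions are more likely to be among the M texts, is weighted sampling without replacement. The sketch below is only an assumption about how such a rule could be realised; the function `pick_m_texts`, the text identifiers and the counts are all hypothetical.

```python
import random


def pick_m_texts(expansion_counts: dict[str, int], m: int, seed: int = 0) -> list[str]:
    """Pick m distinct texts, favouring texts that have more expanded texts."""
    rng = random.Random(seed)
    # Only texts with at least one expansion can contribute an extended set.
    pool = {text: n for text, n in expansion_counts.items() if n > 0}
    if m > len(pool):
        raise ValueError("m exceeds the number of expandable texts")
    chosen = []
    for _ in range(m):
        texts = list(pool)
        weights = [pool[t] for t in texts]
        pick = rng.choices(texts, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


counts = {"text_a": 20, "text_b": 5, "text_c": 1, "text_d": 0}
m_texts = pick_m_texts(counts, m=2)
```

Selecting in a fixed order instead of at random, which the description also allows, would replace the weighted draw with a sort on the expansion count.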
402. The computer device determines M extended text fragment sets from a first enhanced database, where the M extended text fragment sets are in one-to-one correspondence with the M texts.
For ease of description, the way of determining the P second texts based on the first text in Fig. 3 is referred to below as the coarse-grained data enhancement strategy. The first enhanced database is a database consisting of the texts obtained by expanding the texts in the training text database according to the coarse-grained data enhancement strategy. In other words, each text in the first enhanced database is generated from one text in the training text database. The first enhanced database does not include the texts in the training text database.
For example, suppose the training text database contains 1,000 sentences in total. The computer device may use the coarse-grained enhancement strategy to expand these 1,000 sentences into 20,000 sentences, which do not include the 1,000 sentences in the training text database. It can be understood that the 1,000 sentences may fall into three classes: a first-class sentence includes both a first-type target phrase and a second-type target phrase; a second-class sentence includes only one of a first-type target phrase and a second-type target phrase; a third-class sentence includes neither a first-type target phrase nor a second-type target phrase. The computer device may expand each first-class and second-class sentence among the 1,000 sentences using the method shown in Fig. 3, obtaining the 20,000 sentences. The database made up of these 20,000 sentences is the first enhanced database. The first enhanced database does not include the 1,000 sentences in the training text database.
In the foregoing embodiment, the granularity of the texts in the first enhanced database is the same as the granularity of the texts in the training text database. For example, if the granularity of the training text database is the sentence, the granularity of the texts in the first enhanced database is also the sentence. In other embodiments, the granularity of the texts in the first enhanced database may differ from that of the training text database. For example, if the granularity of the training text database is the sentence, the granularity of the texts in the first enhanced database may be the extended phrase, or a partial sentence containing the extended phrase.
Again take the above first text, "I want to find a cheap Chinese restaurant", as an example. In some embodiments, the texts in the first enhanced database that correspond to this text may include the above second text 1 to second text 4. In other embodiments, the texts in the first enhanced database that correspond to this text may include "Japanese food", "French food", "affordable" and "low-cost". In still other embodiments, the texts in the first enhanced database that correspond to this text may include "Japanese restaurant", "French restaurant", "affordable Chinese restaurant" and "low-cost Chinese restaurant".
Optionally, in some embodiments, each text in the first enhanced database may include source indication information, which indicates one text in the training text database. The text indicated by the source indication information is the text from which the text carrying that source indication information was generated.
Optionally, in other embodiments, the first enhanced database may store texts in the form of sets. Each set includes at least one text, and the at least one text is obtained by expanding the same text in the training text database according to the coarse-grained enhancement strategy. Similarly, each set may include source indication information, which indicates one text in the training text database. The text indicated by the source indication information is the text from which the texts in the set were generated.
After determining the M texts from the training text database, the computer device may determine the M extended text fragment sets corresponding to the M texts according to the source indication information in the first enhanced database.
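The lookup by source indication information might look like the sketch below. It assumes, purely for illustration, that the first enhanced database is a list of `(source_id, fragment)` pairs in which `source_id` plays the role of the source indication information; the name `build_fragment_sets` and the identifiers are hypothetical.

```python
from collections import defaultdict


def build_fragment_sets(enhanced_db, m_texts):
    """Return one extended text fragment set per selected text, in the same order."""
    by_source = defaultdict(list)
    for source_id, fragment in enhanced_db:
        # Group every expanded fragment under the text it was generated from.
        by_source[source_id].append(fragment)
    return [by_source[text_id] for text_id in m_texts]


enhanced_db = [
    ("t1", "Japanese food"), ("t1", "French food"),
    ("t1", "affordable"), ("t1", "low-cost"),
    ("t2", "north"), ("t2", "south"),
]
fragment_sets = build_fragment_sets(enhanced_db, ["t2", "t1"])
```

The result keeps the one-to-one correspondence required in step 402: the k-th fragment set belongs to the k-th selected text.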
An extended text fragment set corresponding to a text means that the extended text fragments in the set are determined according to the target phrases in that text.
Optionally, in some embodiments, an extended text fragment may be an extended phrase. In other embodiments, an extended text fragment may be a complete text containing the extended phrase. In still other embodiments, an extended text fragment may be a partial text containing the extended phrase.
Again take the above first text, "I want to find a cheap Chinese restaurant", as an example. In some embodiments, the extended text fragment set corresponding to this text includes the above second text 1 to second text 4. In other embodiments, the extended text fragment set corresponding to this text includes "Japanese food", "French food", "affordable" and "low-cost". In still other embodiments, the extended text fragment set corresponding to this text includes "Japanese restaurant", "French restaurant", "affordable Chinese restaurant" and "low-cost Chinese restaurant".
Correspondingly, in the M texts corresponding to the M extended text fragment sets, a text fragment containing a target phrase may be called a target text fragment. Similarly, in some embodiments, the target text fragment may be the target phrase. In other embodiments, the target text fragment may be a complete text containing the target phrase. In still other embodiments, the target text fragment may be a partial text containing the target phrase.
Again take the above first text, "I want to find a cheap Chinese restaurant", as an example. In some embodiments, the target text fragment corresponding to this text may be the first text itself. In other embodiments, the target text fragments may include "cheap" and "Chinese food". In still other embodiments, the target text fragments may include "cheap" and "Chinese restaurant".
403. According to a reference policy network model, the computer device picks one extended text fragment from each of the M extended text fragment sets corresponding to the M training texts. For ease of description, an extended text fragment picked from an extended text fragment set according to the reference policy network model is called a candidate text fragment.
In other words, through step 403 the computer device determines one candidate text fragment set according to the reference policy network model. The candidate text fragment set includes M candidate text fragments, which come from the M extended text fragment sets respectively.
The computer device may repeat step 403 T times, determining T candidate text fragment sets in total. T is a positive integer greater than or equal to 1.
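Repeating step 403 T times can be sketched as below. The sketch assumes that the reference policy network model has already produced, for every extended text fragment set, one selection weight per fragment (`policy_probs`); how those weights are computed is outside this example, and all names and values are hypothetical.

```python
import random


def sample_candidate_sets(fragment_sets, policy_probs, t, seed=0):
    """Run step 403 t times; each run picks one fragment from every extended set."""
    rng = random.Random(seed)
    candidate_sets = []
    for _ in range(t):
        picked = [
            rng.choices(fragments, weights=probs, k=1)[0]
            for fragments, probs in zip(fragment_sets, policy_probs)
        ]
        candidate_sets.append(picked)  # M candidate text fragments per run
    return candidate_sets


fragment_sets = [["Japanese food", "French food"], ["affordable", "low-cost"]]
policy_probs = [[0.7, 0.3], [0.4, 0.6]]  # assumed output of the reference policy
candidate_sets = sample_candidate_sets(fragment_sets, policy_probs, t=3)
```

With M = 2 and T = 3 this yields three candidate text fragment sets of two fragments each.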
The values of M and T are preset. It can be understood that the larger M and T are, the more candidate text fragment sets the computer device determines and the better the trained policy network model is at selecting texts, but the longer training takes. Conversely, the smaller M and T are, the fewer candidate text fragment sets the computer device determines and the worse the trained policy network model is at selecting texts, but the less time training takes. The values of M and T can therefore be chosen according to the performance of the computer device and/or actual requirements. For example, larger values of M and T can be chosen if a better policy network model is desired. As another example, smaller values of M and T can be chosen if a policy network model needs to be determined more quickly. In addition, computer devices of different performance may achieve different results when training the policy network model for the same amount of time. For example, with the same training algorithm and the same training time, a computer device with better performance obtains a better policy network model. Therefore, larger values of M and T can be chosen if the computer device has better performance, and a smaller value of M can be chosen if the computer device has poorer performance.
404. The computer device evaluates the selected M candidate text fragment sets according to an initial DST to obtain an evaluation result.
Optionally, in some embodiments, the computer device may perform a single-sample evaluation on the M candidate text fragment sets according to the initial DST to obtain the evaluation result.
Optionally, in other embodiments, the computer device may perform a sample-set evaluation on the M candidate text fragment sets according to the initial DST to obtain the evaluation result.
Optionally, in other embodiments, the computer device may perform both the single-sample evaluation and the sample-set evaluation on the M candidate texts according to the initial DST to obtain the evaluation result.
Optionally, in some embodiments, the initial DST may be a DST trained in the existing manner, using the texts in the training text database as the training texts for machine learning.
Optionally, in other embodiments, the initial DST may be trained on some texts up to a preset, relatively low accuracy (for example, below 80% or lower).
The single-sample evaluation performed by the computer device may include: the computer device uses the initial DST to predict the state of each candidate text fragment in the M candidate text fragment sets and, according to the prediction result, determines a first reward value corresponding to each candidate text fragment. The M candidate text fragment sets contain M×T candidate text fragments in total; correspondingly, the evaluation result includes M×T first reward values.
If the prediction result for a candidate text fragment meets a preset requirement, the computer device may determine that the first reward value of the candidate text fragment is a positive incentive; if the prediction result for a candidate text fragment does not meet the preset requirement, the computer device may determine that the first reward value of the candidate text fragment is a negative incentive.
The first reward value of a positive incentive is greater than the first reward value of a negative incentive.
For example, in some embodiments, the first reward value of a positive incentive may be a number greater than 0, such as 1, and the first reward value of a negative incentive may be a number less than 0, such as -1.
As another example, in other embodiments, the first reward values of both the positive incentive and the negative incentive may be greater than 0, provided that the first reward value of the positive incentive is greater than that of the negative incentive. For example, the first reward value of the positive incentive is 10 and the first reward value of the negative incentive is 1.
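The mapping from a prediction result to a first reward value can be sketched as a small function. The name `first_reward` and its parameters are hypothetical; the default values 1 and -1 and the alternative pair 10 and 1 are the examples given above.

```python
def first_reward(meets_requirement: bool,
                 positive: float = 1.0,
                 negative: float = -1.0) -> float:
    """Return the first reward value for one candidate text fragment."""
    if positive <= negative:
        # The positive incentive must be strictly greater than the negative one.
        raise ValueError("positive incentive must exceed negative incentive")
    return positive if meets_requirement else negative


# Example reward scheme with signed values (1 and -1).
signed = [first_reward(True), first_reward(False)]
# Example reward scheme in which both values are greater than 0 (10 and 1).
both_positive = [first_reward(True, 10, 1), first_reward(False, 10, 1)]
```

Whether a prediction "meets the requirement" depends on how the extended phrase was determined, so that decision is left to the caller here.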
The preset requirement on the prediction result differs according to how the extended phrase in the candidate text fragment was determined.
Consider an extended phrase in a candidate text fragment that was determined based on the first expansion rule (that is, the extended phrase was determined according to a first-type target phrase; for ease of description, such a candidate text fragment is referred to below as a first-type candidate text fragment). The label of a first-type candidate text fragment is the slot of the extended phrase in that fragment. A predicted label that is the same as the actual label does not meet the preset requirement, and a predicted label that differs from the actual label meets the preset requirement. In other words, if the label that the initial DST predicts for a first-type candidate text fragment is the same as the actual label of the extended phrase in that fragment, the prediction result for the fragment does not meet the requirement. In this case, the computer device may determine that the first reward value corresponding to the first-type candidate text fragment is a negative incentive. If the label that the initial DST predicts for a first-type candidate text fragment differs from the actual label of the extended phrase in that fragment, the prediction result for the fragment meets the requirement. In this case, the computer device may determine that the first reward value corresponding to the first-type candidate text fragment is a positive incentive.
Consider an extended phrase in a candidate text fragment that was determined based on the second expansion rule (that is, the extended phrase was determined according to a second-type target phrase; for ease of description, such a candidate text fragment is referred to below as a second-type candidate text fragment). The label of a second-type candidate text fragment is the meaning of the extended phrase in that fragment. A predicted label that is the same as the actual label meets the preset requirement, and a predicted label that differs from the actual label does not meet the preset requirement. In other words, if the label that the initial DST predicts for a second-type candidate text fragment is the same as the actual label of the extended phrase in that fragment, the prediction result for the fragment meets the requirement. In this case, the computer device may determine that the first reward value corresponding to the second-type candidate text fragment is a negative incentive. If the label that the initial DST predicts for a second-type candidate text fragment differs from the actual label of the extended phrase in that fragment, the prediction result for the fragment does not meet the requirement. In this case, the computer device may determine that the first reward value corresponding to the second-type candidate text fragment is a positive incentive.
The sample-set evaluation performed by the computer device means that the computer device trains the initial DST with one candidate text fragment set to obtain a trained initial DST. For ease of description, the trained DST is referred to below as the reference DST. After obtaining the reference DST, the computer device may determine, according to the reference DST, a second reward value corresponding to that candidate text fragment set. The second reward value is the sample-set evaluation result of the corresponding candidate text fragment set.
The process in which the computer device trains the initial DST with the T candidate text fragments in one candidate text fragment set is the same as the existing process of training a DST and, for brevity, is not described in detail here.
If the prediction accuracy of the initial DST is already very high, for example above 90%, it is difficult to raise the accuracy further by training the initial DST. The initial DST chosen may therefore be a DST with relatively low prediction accuracy. For example, the prediction accuracy of the initial DST may be below 90%, or even below 80%.
It can be understood that the sample-set evaluation performed by the computer device according to the initial DST may include: the computer device trains the initial DST according to the M candidate text fragment sets, and determines T second reward values according to the DSTs obtained after training.
The training of the initial DST according to the M candidate text fragment sets may include: the computer device trains the initial DST separately with each of T DST training text sets. In this case, the computer device obtains T trained initial DSTs. For ease of description, a trained initial DST is referred to below as a reference DST. The T DST training text sets are determined from the M candidate text fragment sets. Each of the T DST training text sets includes M candidate text fragments, which come from the M candidate text fragment sets respectively. Specifically, the i-th DST training text set includes M candidate text fragments, and the j-th of these M candidate text fragments is the i-th candidate text fragment in the j-th of the M candidate text fragment sets.
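The index rule above, where the j-th fragment of the i-th DST training text set is the i-th fragment of the j-th candidate text fragment set, amounts to a transposition. A minimal sketch, with the hypothetical name `build_training_sets` and placeholder fragment names:

```python
def build_training_sets(candidate_sets):
    """Turn M candidate sets of T fragments into T training sets of M fragments."""
    t = len(candidate_sets[0])
    if any(len(s) != t for s in candidate_sets):
        raise ValueError("every candidate set must contain T fragments")
    # zip(*...) pairs up the i-th fragment of every candidate set.
    return [list(column) for column in zip(*candidate_sets)]


# M = 2 candidate text fragment sets, T = 3 fragments in each.
candidate_sets = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
training_sets = build_training_sets(candidate_sets)
```

Each of the three resulting training sets would then be used to train its own copy of the initial DST, giving T reference DSTs.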
The computer device may determine T second reward values according to the T reference DSTs respectively. Determining a second reward value according to a reference DST may include: the computer device determines whether the label prediction accuracy of the reference DST is higher than that of the initial DST, and determines the second reward value according to whether the label prediction accuracy has improved. If the label prediction accuracy has improved, the second reward value may be a positive incentive; if the label prediction accuracy has not improved or has decreased, the second reward value is a negative incentive.
The second reward value of a positive incentive is greater than the second reward value of a negative incentive.
For example, in some embodiments, the second reward value of a positive incentive may be a number greater than 0, such as 1, and the second reward value of a negative incentive may be a number less than 0, such as -1.
As another example, in other embodiments, the second reward values of both the positive incentive and the negative incentive may be greater than 0, provided that the second reward value of the positive incentive is greater than that of the negative incentive. For example, the second reward value of the positive incentive is 10 and the second reward value of the negative incentive is 1.
The computer device may use the initial DST and the reference DST to predict labels for the same group of texts in order to judge whether the label prediction accuracy has improved. This group of texts, used to measure the performance (that is, the label prediction accuracy) of the initial DST and the reference DST, may be called a validation set. Optionally, in some embodiments, the validation set may be the candidate text fragment set used to train the initial DST. Optionally, in other embodiments, the validation set may be any one of the M candidate text fragment sets.
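The comparison on the validation set can be sketched as follows. Both trackers are modelled here as plain functions from a text to a predicted label, which is an assumption made only for this example; `accuracy`, `second_reward` and the toy data are hypothetical.

```python
def accuracy(predict, validation_set):
    """Fraction of validation texts whose predicted label equals the actual label."""
    correct = sum(1 for text, label in validation_set if predict(text) == label)
    return correct / len(validation_set)


def second_reward(initial_predict, reference_predict, validation_set,
                  positive=1.0, negative=-1.0):
    """Positive incentive only if the reference DST beats the initial DST."""
    improved = accuracy(reference_predict, validation_set) > accuracy(
        initial_predict, validation_set)
    return positive if improved else negative


validation_set = [("t1", "food"), ("t2", "price"), ("t3", "area")]
initial = lambda text: "food"                 # toy tracker: 1 of 3 correct
reference = {"t1": "food", "t2": "price", "t3": "food"}.get  # 2 of 3 correct
reward = second_reward(initial, reference, validation_set)
```

An unchanged or lower accuracy yields the negative incentive, matching the rule stated above.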
If the computer device performs both the single-sample evaluation and the sample-set evaluation, the evaluation result determined by the computer device includes M×T first reward values and T second reward values.
405. The computer device trains the reference policy network model using the evaluation result.
The policy network model can be expressed as:
Figure PCTCN2020089988-appb-000001
where π_θ(s, p′) denotes the probability predicted for the candidate text segment p′ given the context state s. s is the vector representation extracted from the triple <x, y, p> and the candidate text segment p′, and p denotes the target text segment. f(s, p′) is computed by a fully connected network and represents the probability that p is replaced by p′. Because each target text segment can correspond to multiple extended text segments, formula 1.1 expresses the policy network model in normalized form. C_p in formula 1.1 denotes the set of all candidate text segments corresponding to one target text segment. The symbol shown in Figure PCTCN2020089988-appb-000002 denotes any candidate text segment in C_p. The expression shown in Figure PCTCN2020089988-appb-000003 denotes the sum of the values computed by the fully connected network over all p′ in C_p, that is, the sum of the probabilities that p is replaced by each p′ in C_p.
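The normalization over C_p can be sketched as follows. The exponential (softmax) form is an assumption: the text states only that the value for p′ is normalized by a sum over all candidates in C_p, and the formula itself is an image that is not reproduced here. The candidate phrases and scores in the usage note are hypothetical.

```python
import math


def policy_probabilities(scores):
    """Normalize the scores f(s, p') over every candidate p' in C_p.

    `scores` maps each candidate text segment to the value produced by the
    fully connected network. Assumption: a softmax is used for normalization.
    """
    if not scores:
        raise ValueError("C_p must contain at least one candidate")
    largest = max(scores.values())  # subtract the maximum for numerical stability
    exps = {cand: math.exp(value - largest) for cand, value in scores.items()}
    total = sum(exps.values())
    return {cand: e / total for cand, e in exps.items()}
```

For instance, `policy_probabilities({"cheap": 2.0, "inexpensive": 1.0, "expensive": -3.0})` returns a distribution that sums to 1 and assigns the largest probability to the candidate with the largest score.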
As can be seen, the larger a reward value (whether a first reward value or a second reward value), the better the predicted result meets the requirements, and the more suitable the candidate text segments selected by the reference policy network model are as training texts for the DST. The reference policy network model can therefore be trained by maximizing the expected reward signal, yielding a better policy network model.
Optionally, in some embodiments, the computer device may train the parameters of the reference policy network model through gradient learning. The expected reward signal may be equal to the gradient of the reference policy network.
The gradient of the reference policy network can be approximated as:
Figure PCTCN2020089988-appb-000004
Figure PCTCN2020089988-appb-000005
where the symbol shown in Figure PCTCN2020089988-appb-000006 denotes the gradient parameter, π_θ denotes the reference policy network with parameters θ, s′_{i,j} denotes the state obtained in the j-th sampling of the i-th sample of the original sample set, p′_{i,j} denotes the replacement text obtained in the j-th sampling of the i-th sample of the original sample set, the symbol shown in Figure PCTCN2020089988-appb-000007 denotes the reward value of the evaluation of sample set i (that is, the second reward value), and the symbol shown in Figure PCTCN2020089988-appb-000008 denotes the reward value of the evaluation of the j-th sampling in sample set i (that is, the first reward value). The original sample set refers to the M extended text segment sets determined in step 402. The i-th sample refers to the i-th of the T DST training text sets. The j-th sampling of the i-th sample is the j-th candidate text segment in the i-th DST training text set.
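A reward-weighted gradient estimate of this kind can be sketched for a small tabular policy. This is a generic REINFORCE-style estimate, not the formula of the figures above, which are not reproduced here. Two points are assumptions for illustration: the policy is a softmax over one logit per candidate, and the first and second reward values are combined by simple addition.

```python
import math


def softmax(logits):
    """Convert logits into a probability distribution."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, chosen):
    """Gradient of log pi(chosen) with respect to the logits of a softmax policy."""
    probs = softmax(logits)
    return [(1.0 if k == chosen else 0.0) - p for k, p in enumerate(probs)]


def policy_gradient(logits, samples):
    """Average reward-weighted gradient over all samplings.

    `samples` is a list of (chosen_index, first_reward, second_reward) tuples.
    Assumption: the two rewards are added to form the weight of each sampling.
    """
    if not samples:
        raise ValueError("at least one sampling is required")
    grad = [0.0] * len(logits)
    for chosen, first_reward, second_reward in samples:
        weight = first_reward + second_reward
        for k, g in enumerate(grad_log_prob(logits, chosen)):
            grad[k] += weight * g
    return [g / len(samples) for g in grad]
```

A gradient-ascent step adds a small multiple of this estimate to the logits, which raises the probability of candidates that earned larger rewards.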
Besides gradient learning, the computer device may train the reference policy network model in other ways. For example, the computer device may use methods such as stochastic gradient descent (SGD) or adaptive moment estimation (Adam) to train the reference policy network model.
After executing steps 401 to 405 in sequence, the computer device may execute steps 401 to 405 again. In other words, the computer device may execute the method shown in FIG. 4 cyclically, in the order of step 401 to step 405. If the computer device determines that the number of cycles is greater than a preset number N, it may stop the cycle. The reference policy network model trained when step 405 is executed for the N-th time is determined to be the policy network model used to select at least one second text from the P second texts as training text for machine learning. The computer device may set an initial policy network model and use it to select candidate text segments in the first cycle. In other words, when the computer device executes the method shown in FIG. 4 for the first of the N times, the reference policy network model used in step 403 is the initial policy network model. As described above, the computer device may execute step 403 T times in a loop, and the reference policy network model it uses is the same across those T executions. When the computer device executes the method shown in FIG. 4 for the 2nd to N-th times, the reference policy network model used in the T executions of step 403 is the one trained in the previous execution of step 405. In other words, when the method shown in FIG. 4 is executed for the t-th time, the reference policy network model in step 403 is the one determined in step 405 of the (t-1)-th execution of the method, where t is a positive integer greater than or equal to 2 and less than or equal to N.
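The cyclic structure described above can be sketched as a short driver loop. All four callables are hypothetical placeholders supplied by the caller, and steps 401 and 402 are folded into the sampling callable for brevity; the sketch shows only which policy is used in which cycle.

```python
def train_policy(initial_policy, sample_candidates, evaluate, update,
                 n_rounds, t_samples):
    """Run the cycle N times and return the policy trained in the last cycle.

    sample_candidates(policy) stands for one execution of step 403,
    evaluate(batches)         stands for step 404, and
    update(policy, batches, rewards) stands for step 405.
    """
    policy = initial_policy  # the first cycle uses the initial policy
    for _ in range(n_rounds):
        # Step 403 is executed T times with the same reference policy.
        batches = [sample_candidates(policy) for _ in range(t_samples)]
        rewards = evaluate(batches)
        # The updated policy becomes the reference policy of the next cycle.
        policy = update(policy, batches, rewards)
    return policy
```

Because the policy is replaced only after all T samplings of a cycle, every sampling within one cycle sees the same reference policy.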
As described above, some of the extended texts that the computer device determines while determining the second texts may be unsuitable. For example, in a phrase set determined using a paraphrase database, some phrases may have a meaning that is not exactly the same as, or is even opposite to, the meaning of the phrase to which the set corresponds. Second texts generated from such phrases are not suitable for training the DST. The policy network model determined by the method shown in FIG. 4 can screen the second texts and filter out those unsuitable for training the DST. This improves the quality of the texts used to train the DST and thus the performance of the trained DST.
The method of training the policy network model shown in FIG. 4 is further described below, taking one extended text segment set as an example.
Assume that the first text in the method shown in FIG. 3 is one of the M texts determined in step 401 of the method shown in FIG. 4, and that an extended text segment is a complete text that includes an extended phrase. The P second texts obtained by extending the first text can then serve as one extended text segment set.
FIG. 5 is a schematic flowchart of a method for training the policy network model using the P second texts.
501. The computer device uses the reference policy network model to select one second text from the P second texts.
The computer device may execute step 501 T times. In other words, the computer device determines a total of T second texts from the P second texts. P may be greater than T or less than T. The T second texts may contain duplicates. The T second texts are T candidate text segments, which respectively belong to M candidate text segment sets.
502. The computer device evaluates the T second texts according to the initial DST to obtain an evaluation result.
Optionally, in some embodiments, evaluating the T second texts according to the initial DST includes: the computer device performs single-sample evaluation on the T second texts.
Optionally, in other embodiments, evaluating the T second texts according to the initial DST includes: the computer device performs sample-set evaluation according to the T second texts.
Optionally, in still other embodiments, evaluating the T second texts according to the initial DST includes: the computer device performs single-sample evaluation on the T second texts and performs sample-set evaluation according to the T second texts.
Performing single-sample evaluation on the T second texts may include: the computer device uses the initial DST to predict the state of each of the T second texts, obtaining T prediction results, and determines T first reward values according to the T prediction results, where the T first reward values correspond one-to-one to the T second texts. In other words, the i-th of the T first reward values is determined according to the prediction result of the initial DST for the i-th of the T second texts. For the specific implementation of single-sample evaluation, refer to the method shown in FIG. 4; details are not repeated here.
Performing sample-set evaluation according to the T second texts may include: the computer device trains the initial DST using the T second texts, and determines T second reward values according to the trained initial DST.
Specifically, training the initial DST using the T second texts may include: the computer device trains the initial DST using T DST training text sets. The T second texts respectively belong to the T DST training text sets. In other words, the i-th of the T second texts is one text in the i-th of the T DST training text sets.
For how the computer device determines the second reward values according to the T DST training text sets, refer to the method shown in FIG. 4; details are not repeated here.
If the computer device performs only single-sample evaluation, the evaluation result includes T first reward values.
If the computer device performs only sample-set evaluation, the evaluation result includes T second reward values.
If the computer device performs both single-sample evaluation and sample-set evaluation, the evaluation result includes T first reward values and T second reward values.
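The three cases above can be summarized in a small helper. The function name and the dictionary keys are hypothetical; the sketch only records how many reward values of each kind the evaluation result contains for a given combination of evaluation modes.

```python
def evaluation_result_sizes(t, single_sample, sample_set):
    """Number of first and second reward values in the evaluation result.

    t             -- number of second texts selected in step 501
    single_sample -- whether single-sample evaluation is performed
    sample_set    -- whether sample-set evaluation is performed
    """
    if not (single_sample or sample_set):
        raise ValueError("at least one evaluation mode must be enabled")
    return {
        "first_reward_values": t if single_sample else 0,
        "second_reward_values": t if sample_set else 0,
    }
```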
503. The computer device may train the reference policy network model according to the evaluation result.
It can be understood that the evaluation result determined in step 502 is a subset of, or the same as, the evaluation result determined in step 404 in FIG. 4.
Specifically, if only sample-set evaluation is performed in step 404, then only sample-set evaluation is performed in step 502 as well. In this case the evaluation result determined in step 502 is the same as the evaluation result determined in step 404.
If single-sample evaluation is performed in step 404, then single-sample evaluation is also performed in step 502. In this case the evaluation result determined in step 404 includes the evaluation result determined in step 502. As described above, when single-sample evaluation is performed, the evaluation result determined in step 404 includes M×T first reward values in total. The evaluation result determined in step 502 includes T first reward values, and these T first reward values are among the M×T first reward values in the evaluation result determined in step 404.
Once the policy network model has been determined by the method shown in FIG. 4, the computer device may use it to select some of the texts in the first enhanced database to form a second enhanced database, and use the second enhanced database to train the DST. For ease of description, the way of determining P second texts from the first text in FIG. 3 is referred to below as the coarse-grained data enhancement strategy, and the way of selecting at least one second text from the P second texts using the policy network model determined in FIG. 4 is referred to as the fine-grained data enhancement strategy.
Again take as an example a training text database of 1000 sentences and a first enhanced database of 20000 sentences. Based on the coarse-grained data enhancement strategy, the computer device extends the 1000 sentences in the training text database to the 20000 sentences in the first enhanced database. After that, the computer device may, based on the fine-grained data enhancement strategy, select some of the texts in the first enhanced database to form a second enhanced database. In other words, the computer device may use the policy network model to select some of the texts in the first enhanced database to form the second enhanced database. Assume that, using the fine-grained data enhancement strategy, the computer device selects 12000 sentences from the 20000 sentences in the first enhanced database. These 12000 sentences are the sentences included in the second enhanced database. After the second enhanced database is determined, the computer device may use all the sentences in the second enhanced database and all the sentences in the training text database as training texts for machine learning, and train to obtain the DST. This DST can implement the function of the DST 102 in the dialog system 100 shown in FIG. 1 and the function of the DST shown in FIG. 2.
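The two-stage flow in this example can be sketched as follows. The predicate `keep_text` is a hypothetical stand-in for the decision of the trained policy network model, and both function names are assumptions introduced for illustration.

```python
def select_second_enhanced_db(first_enhanced_db, keep_text):
    """Fine-grained stage: keep only the texts the policy accepts.

    `keep_text` is a placeholder for the trained policy network model's
    decision about a single text.
    """
    return [text for text in first_enhanced_db if keep_text(text)]


def build_dst_training_texts(training_db, second_enhanced_db):
    """Training texts are the original sentences plus the selected extended ones."""
    return list(training_db) + list(second_enhanced_db)
```

The final training set is the concatenation of the original training text database and the second enhanced database, so its size is the sum of the two.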
As can be seen, compared with existing solutions for training a DST, the method of this application can expand the training texts used to train the DST from 1000 to 13000 (the 1000 original sentences plus the 12000 selected sentences). Increasing the number of training text samples can improve the performance of the trained DST, so that the DST determines slot and slot-value pairs in the user's utterances more accurately, determines intents more accurately, and determines more accurately which slots have not yet been filled with a slot value.
FIG. 6 is a structural block diagram of a computer device according to an embodiment of this application. The computer device 600 shown in FIG. 6 includes an acquiring unit 601 and a processing unit 602.
The acquiring unit 601 is configured to acquire a first text, where the first text is a text in a training text database and includes at least two phrases.
The processing unit 602 is configured to determine at least one target phrase from the first text.
The processing unit 602 is further configured to determine P second texts according to the at least one target phrase, where each of the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1.
The processing unit 602 is further configured to train a dialog state tracking classifier through machine learning according to the first text and the P second texts, where the dialog state tracking classifier is used to predict the current state of a dialog according to the acquired dialog of a user.
The acquiring unit 601 may be implemented by a transceiver, and the processing unit 602 may be implemented by a processor. For the specific functions and beneficial effects of the acquiring unit 601 and the processing unit 602, refer to the methods shown in FIG. 3 to FIG. 5; details are not repeated here.
FIG. 7 is a structural block diagram of a computer device according to an embodiment of this application. The computer device 700 shown in FIG. 7 includes a processor 701, a memory 702, and a transceiver 703.
The processor 701, the memory 702, and the transceiver 703 communicate with one another through an internal connection path to transfer control and/or data signals.
The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 701. The processor 701 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 702; the processor 701 reads the instructions in the memory 702 and completes the steps of the foregoing methods in combination with its hardware.
Optionally, in some embodiments, the memory 702 may store instructions for executing the methods performed by the computer device in the methods shown in FIG. 3 to FIG. 5. The processor 701 may execute the instructions stored in the memory 702 and, in combination with other hardware (for example, the transceiver 703), complete the steps of the computer device in the methods shown in FIG. 3 to FIG. 5. For the specific working process and beneficial effects, refer to the description of the embodiments shown in FIG. 3 to FIG. 5.
An embodiment of this application further provides a chip that includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip. The chip can execute the method of the computer device in the foregoing method embodiments.
An embodiment of this application further provides a computer-readable storage medium storing instructions that, when executed, perform the method of the computer device in the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing instructions that, when executed, perform the method of the computer device in the foregoing method embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. A method for training a dialog state tracking classifier, characterized in that the method comprises:
    acquiring a first text, wherein the first text is a text in a training text database and comprises at least two phrases;
    determining at least one target phrase from the first text;
    determining P second texts according to the at least one target phrase, wherein each of the P second texts comprises an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1;
    training a dialog state tracking classifier through machine learning according to the first text and the P second texts, wherein the dialog state tracking classifier is used to track the state of a dialog according to the acquired dialog of a user.
  2. The method according to claim 1, characterized in that the determining P second texts according to the at least one target phrase comprises:
    determining K_1 first phrase sets corresponding to K_1 slots, wherein the K_1 slots are respectively the slots of K_1 target phrases among the at least one target phrase, and K_1 is a positive integer greater than or equal to 1;
    determining P_1 second texts, wherein the extended phrases comprised in the P_1 second texts belong to the K_1 first phrase sets, the P second texts comprise the P_1 second texts, and P_1 is a positive integer greater than or equal to 1.
  3. The method according to claim 1 or 2, characterized in that the determining P second texts according to the at least one target phrase comprises:
    determining K_2 second phrase sets corresponding to K_2 word meanings, wherein the K_2 word meanings are respectively the word meanings of K_2 target phrases, and K_2 is a positive integer greater than or equal to 1;
    determining P_2 second texts, wherein the extended phrases comprised in the P_2 second texts belong to the K_2 second phrase sets, the P second texts comprise the P_2 second texts, and P_2 is a positive integer greater than or equal to 1.
  4. The method according to any one of claims 1 to 3, characterized in that the training a dialog state tracking classifier through machine learning according to the first text and the P second texts comprises:
    determining at least one second text from the P second texts according to a policy network model;
    training the dialog state tracking classifier using the first text and the at least one second text as training texts for the machine learning.
  5. The method according to claim 4, characterized in that the method further comprises:
    determining T second texts from the P second texts according to a reference policy network model, wherein T is a positive integer greater than or equal to 1;
    determining an evaluation result according to an initial dialog state tracking classifier and the T second texts;
    training the reference policy network model according to the evaluation result to obtain the policy network model.
  6. The method according to claim 5, wherein the determining of the evaluation result according to the initial dialogue state tracking classifier and the T second texts comprises:
    predicting, by using the initial dialogue state tracking classifier, a state of each of the T second texts to obtain T prediction results, and determining T first reward values according to the T prediction results; or
    training the initial dialogue state tracking classifier by using the T second texts, and determining T second reward values according to the trained initial dialogue state tracking classifier.
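The first alternative above, together with the policy training of claim 5, can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the reward is assumed to be 1.0 when the initial classifier's prediction matches the state label inherited from the first text and 0.0 otherwise, and the reference policy is assumed to be a logistic scorer updated with a REINFORCE-style rule using the mean reward as baseline. The names `first_rewards`, `reinforce_step`, and `toy_predict` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def first_rewards(predict: Callable[[str], str], texts: Sequence[str], labels: Sequence[str]) -> List[float]:
    """Assumed reward: 1.0 if the classifier's prediction matches the label, else 0.0."""
    return [1.0 if predict(text) == label else 0.0 for text, label in zip(texts, labels)]


def reinforce_step(
    weights: Sequence[float],
    features: Sequence[Sequence[float]],
    actions: Sequence[int],
    rewards: Sequence[float],
    lr: float = 0.1,
) -> List[float]:
    """One REINFORCE update for a Bernoulli (keep/drop) logistic policy."""
    baseline = sum(rewards) / len(rewards)  # mean reward reduces gradient variance
    updated = list(weights)
    for x, a, r in zip(features, actions, rewards):
        z = sum(w * xj for w, xj in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))  # probability of keeping this text
        for j, xj in enumerate(x):
            # Gradient of log pi(a | x) for a logistic policy is (a - p) * x.
            updated[j] += lr * (r - baseline) * (a - p) * xj
    return updated


def toy_predict(text: str) -> str:
    """Stand-in for the initial dialogue state tracking classifier."""
    return "price=cheap" if "inexpensive" in text else "price=expensive"


texts = ["book an inexpensive hotel", "book a pricey hotel"]
labels = ["price=cheap", "price=cheap"]
rewards = first_rewards(toy_predict, texts, labels)
new_weights = reinforce_step([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1, 1], rewards)
```

In this toy run the first text earns a higher reward than the second, so the weight tied to the first text's feature increases and the weight tied to the second decreases. The second alternative (rewards measured after retraining the classifier on the T texts) would replace `first_rewards` with an evaluation of the retrained classifier.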
  7. A computer device, wherein the computer device comprises:
    an obtaining unit, configured to obtain a first text, wherein the first text is a text in a training text database and includes at least two phrases; and
    a processing unit, configured to determine at least one target phrase from the first text;
    wherein the processing unit is further configured to determine P second texts according to the at least one target phrase, each of the P second texts includes an extended phrase, the extended phrase is determined based on one of the at least one target phrase, and P is a positive integer greater than or equal to 1; and
    the processing unit is further configured to train a dialogue state tracking classifier through machine learning according to the first text and the P second texts, wherein the dialogue state tracking classifier is used to track, according to an obtained dialogue of a user, a state of the dialogue.
  8. The computer device according to claim 7, wherein the processing unit is specifically configured to: determine K1 first phrase sets corresponding to K1 slots, wherein the K1 slots are respectively the slots of K1 target phrases among the at least one target phrase, and K1 is a positive integer greater than or equal to 1; and
    determine P1 second texts, wherein the extended phrases included in the P1 second texts belong to the K1 first phrase sets, the P second texts include the P1 second texts, and P1 is a positive integer greater than or equal to 1.
  9. The computer device according to claim 7 or 8, wherein the processing unit is specifically configured to: determine K2 second phrase sets corresponding to K2 word senses, wherein the K2 word senses are respectively the word senses of K2 target phrases, and K2 is a positive integer greater than or equal to 1; and
    determine P2 second texts, wherein the extended phrases included in the P2 second texts belong to the K2 second phrase sets, the P second texts include the P2 second texts, and P2 is a positive integer greater than or equal to 1.
  10. The computer device according to any one of claims 7 to 9, wherein the processing unit is specifically configured to: determine at least one second text from the P second texts according to a policy network model; and
    train the dialogue state tracking classifier by using the first text and the at least one second text as training texts for the machine learning.
  11. The computer device according to claim 10, wherein the processing unit is further configured to: determine T second texts from the P second texts according to a reference policy network model, wherein T is a positive integer greater than or equal to 1;
    determine an evaluation result according to an initial dialogue state tracking classifier and the T second texts; and
    train the reference policy network model according to the evaluation result to obtain the policy network model.
  12. The computer device according to claim 11, wherein the processing unit is specifically configured to: predict, by using the initial dialogue state tracking classifier, a state of each of the T second texts to obtain T prediction results, and determine T first reward values according to the T prediction results; or
    train the initial dialogue state tracking classifier by using the T second texts, and determine T second reward values according to the trained initial dialogue state tracking classifier.
  13. A computer device, wherein the computer device comprises a memory and a processor, the memory stores instructions, and the processor is configured to invoke the instructions in the memory to perform the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions for implementing the method according to any one of claims 1 to 6.
PCT/CN2020/089988 2019-05-13 2020-05-13 Method for training dialog state tracker, and computer device WO2020228732A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910395608.1 2019-05-13
CN201910395608.1A CN110245221B (en) 2019-05-13 2019-05-13 Method and computer device for training dialogue state tracking classifier

Publications (1)

Publication Number Publication Date
WO2020228732A1 (en) 2020-11-19

Family

ID=67884552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089988 WO2020228732A1 (en) 2019-05-13 2020-05-13 Method for training dialog state tracker, and computer device

Country Status (2)

Country Link
CN (1) CN110245221B (en)
WO (1) WO2020228732A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245221B (en) * 2019-05-13 2023-05-23 华为技术有限公司 Method and computer device for training dialogue state tracking classifier
CN110888968A (en) * 2019-10-15 2020-03-17 浙江省北大信息技术高等研究院 Customer service dialogue intention classification method and device, electronic equipment and medium
CN110738996B (en) * 2019-10-24 2022-05-03 深圳小蛙出海科技有限公司 Method for controlling printer printing through voice and printing terminal
CN110766086B (en) * 2019-10-28 2022-07-22 支付宝(杭州)信息技术有限公司 Method and device for fusing multiple classification models based on reinforcement learning model
CN111061850B (en) * 2019-12-12 2023-04-28 中国科学院自动化研究所 Dialogue state tracking method, system and device based on information enhancement
CN111143514B (en) * 2019-12-27 2023-03-21 北京百度网讯科技有限公司 Method and apparatus for generating information
CN111324747B (en) * 2020-02-28 2023-06-06 北京百度网讯科技有限公司 Triplet generation method and device and electronic equipment
CN111611365A (en) * 2020-05-19 2020-09-01 上海鸿翼软件技术股份有限公司 Flow control method, device, equipment and storage medium of dialog system
CN112182171A (en) * 2020-09-18 2021-01-05 国网湖南省电力有限公司 Method and device for constructing operation assistant based on human-computer conversation dispatching robot
CN112215328B (en) * 2020-10-29 2024-04-05 腾讯科技(深圳)有限公司 Training of intelligent agent, action control method and device based on intelligent agent
CN112820295B (en) * 2020-12-29 2022-12-23 华人运通(上海)云计算科技有限公司 Voice processing device and system, cloud server and vehicle

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8515736B1 (en) * 2010-09-30 2013-08-20 Nuance Communications, Inc. Training call routing applications by reusing semantically-labeled data collected for prior applications
CN106202177A (en) * 2016-06-27 2016-12-07 腾讯科技(深圳)有限公司 A kind of file classification method and device
CN109299231A (en) * 2018-09-14 2019-02-01 苏州思必驰信息科技有限公司 Dialogue state tracking, system, electronic equipment and storage medium
CN109460450A (en) * 2018-09-27 2019-03-12 清华大学 Dialogue state tracking, device, computer equipment and storage medium
CN110245221A (en) * 2019-05-13 2019-09-17 华为技术有限公司 The method and computer equipment of training dialogue state tracking classifier

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055403B2 (en) * 2016-02-05 2018-08-21 Adobe Systems Incorporated Rule-based dialog state tracking
US9977778B1 (en) * 2016-11-03 2018-05-22 Conduent Business Services, Llc Probabilistic matching for dialog state tracking with limited training data
CN107357838B (en) * 2017-06-23 2020-09-01 上海交大知识产权管理有限公司 On-line implementation method of conversation strategy based on multi-task learning
CN107342078B (en) * 2017-06-23 2020-05-05 上海交通大学 Conversation strategy optimized cold start system and method
CN108460015A (en) * 2018-02-08 2018-08-28 合肥工业大学 Text emotion grouped data enhances analysis method
CN108962221B (en) * 2018-07-12 2020-08-04 苏州思必驰信息科技有限公司 Optimization method and system of online dialog state tracking model
CN108959271B (en) * 2018-08-10 2020-06-16 广州太平洋电脑信息咨询有限公司 Description text generation method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
CN110245221A (en) 2019-09-17
CN110245221B (en) 2023-05-23

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20805641; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20805641; Country of ref document: EP; Kind code of ref document: A1