JPWO2021055247A5

Info

Publication number: JPWO2021055247A5
Application number: JP2022516740A
Authority: JP
Publication date: 2023-08-25

Claims

1. A method comprising:
receiving, in a data processing system, a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances;
augmenting, by the data processing system, the training set of utterances with stopwords to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent, wherein the augmenting comprises:
selecting one or more utterances from the training set of utterances; and
for each selected utterance, preserving existing stopwords in the utterance and replacing at least one non-stopword in the utterance with a stopword or stopword phrase selected from a list of stopwords, to generate an out-of-domain utterance; and
training, by the data processing system, the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances.
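
The augmentation and training-data assembly recited in this claim might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the stopword list, the `unresolvedIntent` label, the intent names, and the helper names `make_out_of_domain` and `build_training_data` are all hypothetical, and tokenization is a plain whitespace split.

```python
import random

# Hypothetical stopword list and label name; the claims do not specify either.
STOPWORDS = ["a", "an", "the", "is", "to", "of", "in", "my", "i", "what", "how", "do"]
UNRESOLVED = "unresolvedIntent"


def make_out_of_domain(utterance, rng):
    """Keep existing stopwords; replace at least one non-stopword with a stopword."""
    tokens = utterance.lower().split()
    candidates = [i for i, tok in enumerate(tokens) if tok not in STOPWORDS]
    if not candidates:
        return None  # nothing to replace, so no new utterance is produced
    count = rng.randint(1, len(candidates))  # at least one replacement
    for i in rng.sample(candidates, count):
        tokens[i] = rng.choice(STOPWORDS)
    return " ".join(tokens)


def build_training_data(labeled_utterances, rng):
    """Return the original (utterance, intent) pairs plus generated unresolved pairs."""
    generated = []
    for text, _intent in labeled_utterances:
        new_text = make_out_of_domain(text, rng)
        if new_text is not None:
            generated.append((new_text, UNRESOLVED))
    return list(labeled_utterances) + generated


rng = random.Random(0)
data = build_training_data(
    [("what is my account balance", "CheckBalance"),
     ("how do i order a pizza", "OrderPizza")],
    rng,
)
```

The combined list would then be passed to whatever training routine the intent classifier uses; the claim does not tie the method to a particular model.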

2. The method of claim 1, wherein, for each selected utterance, the existing stopwords in the utterance are preserved and all of the non-stopwords in the utterance are replaced with the stopword or stopword phrase selected from the list of stopwords.

3. The method of claim 1, wherein, for each selected utterance, the existing stopwords in the utterance are preserved and at least one of the non-stopwords in the utterance is randomly replaced with the stopword or stopword phrase selected from the list of stopwords.
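
The two replacement policies in the preceding dependent claims (replace every non-stopword, or replace a random subset of at least one) can be contrasted in a short sketch. The word lists and the function name `replace_non_stopwords` are assumptions made for illustration; the replacement list mixes single stopwords and stopword phrases.

```python
import random

# Hypothetical lists for illustration only.
STOPWORDS = {"a", "the", "is", "to", "i", "my", "what", "do"}
REPLACEMENTS = ["the", "to", "is", "what is", "do i", "to the"]  # words and phrases


def replace_non_stopwords(utterance, rng, replace_all):
    """Preserve stopwords; replace all non-stopwords or a random non-empty subset."""
    tokens = utterance.lower().split()
    targets = [i for i, tok in enumerate(tokens) if tok not in STOPWORDS]
    if not replace_all and targets:
        # Random variant: choose between one and all of the non-stopwords.
        targets = rng.sample(targets, rng.randint(1, len(targets)))
    for i in targets:
        tokens[i] = rng.choice(REPLACEMENTS)
    return " ".join(tokens)


rng = random.Random(7)
all_replaced = replace_non_stopwords("what is my account balance", rng, replace_all=True)
some_replaced = replace_non_stopwords("what is my account balance", rng, replace_all=False)
```

With `replace_all=True` the result consists only of stopwords; with `replace_all=False` some original content words may survive.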

4. The method, wherein selecting the utterance comprises: searching for consecutive stopwords at the beginning of the utterances in the training set of utterances; and selecting an utterance that begins with consecutive stopwords.

5. The method, wherein selecting the utterance comprises: searching for n consecutive stopwords anywhere in the utterances in the training set of utterances; and selecting an utterance that has the n consecutive stopwords.
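
The two selection criteria above (consecutive stopwords at the beginning of an utterance, or n consecutive stopwords anywhere in it) might look like this in code. The stopword set and function names are hypothetical. Applying the same threshold `n` to the leading run is an assumption of this sketch, since the claim text does not state a threshold for that case; `n` is taken to be a positive integer.

```python
# Hypothetical stopword set for illustration only.
STOPWORDS = {"a", "the", "is", "to", "i", "my", "what", "do"}


def leading_stopword_run(tokens):
    """Count the consecutive stopwords at the start of the token list."""
    count = 0
    for tok in tokens:
        if tok not in STOPWORDS:
            break
        count += 1
    return count


def has_consecutive_stopwords(tokens, n):
    """Return True if some run of n consecutive stopwords occurs anywhere."""
    run = 0
    for tok in tokens:
        run = run + 1 if tok in STOPWORDS else 0
        if run >= n:
            return True
    return False


def select_utterances(utterances, n, leading_only):
    """Select utterances by leading stopword run or by any run of length n."""
    selected = []
    for text in utterances:
        tokens = text.lower().split()
        if leading_only:
            keep = leading_stopword_run(tokens) >= n
        else:
            keep = has_consecutive_stopwords(tokens, n)
        if keep:
            selected.append(text)
    return selected


corpus = ["what is my account balance", "order pizza to my house", "cancel order"]
by_prefix = select_utterances(corpus, n=2, leading_only=True)
anywhere = select_utterances(corpus, n=2, leading_only=False)
```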

6. The method of any of claims 1-5, wherein augmenting the training set of utterances with the stopwords further comprises: (i) repeatedly selecting the one or more utterances from the training set of utterances and processing each utterance once to generate a corresponding out-of-domain utterance based on a predefined expansion ratio; (ii) selecting the one or more utterances from the training set of utterances and processing each utterance multiple times to generate a plurality of out-of-domain utterances from each utterance based on the predefined expansion ratio; or (iii) any combination thereof.

7. The method of claim 6, wherein the predefined expansion ratio, expressed as original utterances to augmented utterances, is between 1:0.05 and 1:1.
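
One way to read the 1:0.05 to 1:1 range is as the number of generated utterances per original utterance. Under that reading, the sketch below turns a ratio into a generation plan: how many utterances to select and how many out-of-domain variants to produce from each. The function name `generation_plan` and the parameter `per_utterance` are hypothetical; `per_utterance=1` corresponds to processing each selected utterance once, and larger values to processing each one multiple times.

```python
import math


def generation_plan(num_original, ratio, per_utterance=1):
    """Return a list with the number of variants to generate per selected utterance."""
    if not 0.05 <= ratio <= 1.0:
        raise ValueError("ratio must lie between 0.05 and 1.0")
    if per_utterance < 1:
        raise ValueError("per_utterance must be at least 1")
    target = round(num_original * ratio)  # total generated utterances wanted
    num_selected = math.ceil(target / per_utterance)
    counts = [per_utterance] * num_selected
    if counts:
        counts[-1] -= sum(counts) - target  # trim the last entry to hit the target
    return counts


once_each = generation_plan(200, 0.05)                    # ten utterances, one variant each
several_each = generation_plan(20, 0.5, per_utterance=3)  # four utterances, ten variants
```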

8. A system comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform actions comprising:
receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances;
augmenting the training set of utterances with stopwords to generate an augmented training set of out-of-domain utterances for an unresolved intent category corresponding to an unresolved intent, wherein the augmenting comprises:
selecting one or more utterances from the training set of utterances; and
for each selected utterance, preserving existing stopwords in the utterance and replacing at least one non-stopword in the utterance with a stopword or stopword phrase selected from a list of stopwords, to generate an out-of-domain utterance; and
training the intent classifier using the training set of utterances and the augmented training set of out-of-domain utterances.

9. The system of claim 8, wherein, for each selected utterance, the existing stopwords in the utterance are preserved and all of the non-stopwords in the utterance are replaced with the stopword or stopword phrase selected from the list of stopwords.

10. The system of claim 8, wherein, for each selected utterance, the existing stopwords in the utterance are preserved and at least one of the non-stopwords in the utterance is randomly replaced with the stopword or stopword phrase selected from the list of stopwords.

11. The system, wherein selecting the utterance comprises: searching for consecutive stopwords at the beginning of the utterances in the training set of utterances; and selecting an utterance that begins with consecutive stopwords.

12. The system, wherein selecting the utterance comprises: searching for n consecutive stopwords anywhere in the utterances in the training set of utterances; and selecting an utterance that has the n consecutive stopwords.

13. The system of claim 8, wherein augmenting the training set of utterances with the stopwords further comprises: (i) repeatedly selecting the one or more utterances from the training set of utterances and processing each utterance once to generate a corresponding out-of-domain utterance based on a predefined expansion ratio; (ii) selecting the one or more utterances from the training set of utterances and processing each utterance multiple times to generate a plurality of out-of-domain utterances from each utterance based on the predefined expansion ratio; or (iii) any combination thereof.

14. The system of claim 13, wherein the predefined expansion ratio, expressed as original utterances to augmented utterances, is between 1:0.05 and 1:1.

15. A method for determining a resolved intent or an unresolved intent from an utterance, the method comprising:
receiving, by a chatbot system, an utterance generated by a user interacting with the chatbot system;
classifying, using an intent classifier deployed within the chatbot system, the utterance into a resolved intent category corresponding to a resolved intent or an unresolved intent category corresponding to an unresolved intent, wherein the intent classifier includes a plurality of model parameters identified using training data, the training data comprising:
a training set of utterances for training the intent classifier to identify one or more resolved intents for one or more utterances; and
an augmented training set of out-of-domain utterances for training the intent classifier to identify one or more unresolved intents for one or more utterances, wherein the augmented training set of out-of-domain utterances is artificially generated to include utterances from the training set of utterances in which existing stopword patterns within the utterances are preserved and at least one non-stopword in each of the utterances is randomly replaced with a stopword,
wherein the plurality of model parameters are identified using the training data based on minimizing a loss function; and
outputting, using the intent classifier, the resolved intent or the unresolved intent based on the classifying.
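
The classification step and the loss being minimized can be illustrated with a small sketch. The claim names only "a loss function"; softmax cross-entropy is used here purely as an assumed example, and the intent labels and function names are hypothetical. The unresolved intent is modeled as one more output class alongside the resolved intents.

```python
import math

# Hypothetical label set: two resolved intents plus the unresolved category.
INTENTS = ["CheckBalance", "OrderPizza", "unresolvedIntent"]


def softmax(scores):
    """Convert raw per-intent scores into probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_index):
    """Loss for one training example; lower means a better fit to the label."""
    return -math.log(softmax(scores)[true_index])


def classify(scores):
    """Return the intent with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENTS[best]


# Scores such as a trained model might emit for a stopword-heavy utterance.
label = classify([0.1, 0.2, 3.0])
```

During training, model parameters would be adjusted to reduce the average of `cross_entropy` over both the original utterances and the generated out-of-domain utterances.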

16. A computer program for causing one or more processors to perform the method of any of claims 1-7 and 15.