JPWO2021055247A5 - Stop word data augmentation for natural language processing - Google Patents
- Publication number
- JPWO2021055247A5
- Authority
- JP
- Japan
- Prior art keywords
- utterances
- utterance
- stopwords
- training set
- stopword
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Claims (16)
A method comprising:
receiving, at a data processing system, a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances;
augmenting, by the data processing system, the training set of utterances with stopwords to generate an augmented training set of out-domain utterances for an unresolved intent category corresponding to an unresolved intent, wherein the augmenting comprises:
selecting one or more utterances from the training set of utterances; and
for each selected utterance, preserving existing stopwords in the utterance and replacing at least one non-stopword in the utterance with a stopword or stopword phrase selected from a list of stopwords to generate an out-domain utterance; and
training, by the data processing system, the intent classifier using the training set of utterances and the augmented training set of out-domain utterances.
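The augmentation step in this claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the stopword list, the whitespace tokenizer, the `replace_prob` parameter, and the function names `make_out_domain` and `augment` are all assumptions made for illustration.

```python
import random

# Hypothetical stopword list; the claim only requires "a list of stopwords".
STOPWORDS = ["a", "an", "the", "to", "of", "in", "on", "is", "it", "i", "my", "what", "do"]
STOPWORD_SET = set(STOPWORDS)


def make_out_domain(utterance, rng, replace_prob=1.0):
    """Keep existing stopwords in place; replace non-stopwords with listed stopwords."""
    tokens = utterance.lower().split()  # naive whitespace tokenization (assumption)
    content_idx = [i for i, tok in enumerate(tokens) if tok not in STOPWORD_SET]
    if not content_idx:
        return " ".join(tokens)  # nothing to replace
    # Force at least one replacement, matching "at least one non-stopword".
    forced = rng.choice(content_idx)
    out = list(tokens)
    for i in content_idx:
        if i == forced or rng.random() < replace_prob:
            out[i] = rng.choice(STOPWORDS)
    return " ".join(out)


def augment(training_set, num_selected, seed=0):
    """Select utterances from the training set and turn each into an out-domain utterance."""
    rng = random.Random(seed)
    selected = rng.sample(training_set, min(num_selected, len(training_set)))
    return [make_out_domain(utterance, rng) for utterance in selected]
```

With `replace_prob=1.0` every non-stopword is replaced, so the output keeps only the stopword skeleton of the original utterance. A lower value leaves some content words in place while still replacing at least one.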
A system comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform actions comprising:
receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances;
augmenting the training set of utterances with stopwords to generate an augmented training set of out-domain utterances for an unresolved intent category corresponding to an unresolved intent, wherein the augmenting comprises:
selecting one or more utterances from the training set of utterances; and
for each selected utterance, preserving existing stopwords in the utterance and replacing at least one non-stopword in the utterance with a stopword or stopword phrase selected from a list of stopwords to generate an out-domain utterance; and
training the intent classifier using the training set of utterances and the augmented training set of out-domain utterances.
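The final action trains on both sets together. One plausible way to assemble that combined input is sketched below in Python. The label name `unresolvedIntent`, the function name `build_training_data`, and the tuple layout are hypothetical, and the claim does not prescribe shuffling.

```python
import random

UNRESOLVED = "unresolvedIntent"  # hypothetical label for the unresolved intent category


def build_training_data(in_domain, out_domain, seed=0):
    """Merge labeled in-domain utterances with out-domain utterances.

    in_domain:  list of (utterance, intent_label) pairs
    out_domain: list of out-domain utterance strings
    """
    if any(label == UNRESOLVED for _, label in in_domain):
        raise ValueError("in-domain data must not use the unresolved label")
    examples = list(in_domain) + [(utterance, UNRESOLVED) for utterance in out_domain]
    random.Random(seed).shuffle(examples)  # mix the two sources (an assumption)
    return examples
```

Labeling every out-domain utterance with a single unresolved category lets an ordinary multi-class classifier learn that category alongside the resolved intents.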
A method for determining a resolved intent or an unresolved intent from an utterance, comprising:
receiving, by a chatbot system, an utterance generated by a user interacting with the chatbot system;
classifying, using an intent classifier deployed within the chatbot system, the utterance into a resolved intent category corresponding to a resolved intent or an unresolved intent category corresponding to an unresolved intent, wherein the intent classifier comprises a plurality of model parameters identified using training data, the training data comprising:
a training set of utterances for training the intent classifier to identify one or more resolved intents for one or more utterances; and
an augmented training set of out-domain utterances for training the intent classifier to identify one or more unresolved intents for one or more utterances, wherein the augmented training set of out-domain utterances is artificially generated to include utterances from the training set of utterances in which existing stopword patterns within the utterances are preserved and at least one non-stopword in each of the utterances is randomly replaced with a stopword,
wherein the plurality of model parameters are identified using the training data based on minimizing a loss function; and
outputting, using the intent classifier, the resolved intent or the unresolved intent based on the classifying.
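The claim says only that the model parameters are identified by minimizing a loss function. The Python sketch below picks one concrete instance as an assumption: bag-of-words softmax regression trained by stochastic gradient descent on cross-entropy loss. The intent names, the toy data, and the hyperparameters are hypothetical.

```python
import math

LABELS = ["check_balance", "transfer_money", "unresolvedIntent"]  # hypothetical intents
DATA = [  # toy training data; the last two rows mimic stopword-augmented utterances
    ("what is my account balance", "check_balance"),
    ("show my balance", "check_balance"),
    ("transfer money to savings", "transfer_money"),
    ("send money to my friend", "transfer_money"),
    ("what is my the it", "unresolvedIntent"),
    ("of a to it", "unresolvedIntent"),
]


def featurize(text, vocab):
    """Bag-of-words counts plus a trailing bias term."""
    vec = [0.0] * (len(vocab) + 1)
    vec[-1] = 1.0
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(text, vocab, weights):
    x = featurize(text, vocab)
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(examples, labels, epochs=300, lr=0.1):
    """Fit the weights by gradient descent on cross-entropy loss."""
    vocab = {}
    for text, _ in examples:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    weights = [[0.0] * (len(vocab) + 1) for _ in labels]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for text, label in examples:
            x = featurize(text, vocab)
            probs = predict_probs(text, vocab, weights)
            target = labels.index(label)
            total += -math.log(probs[target])
            for k, row in enumerate(weights):
                grad = probs[k] - (1.0 if k == target else 0.0)
                for j, xj in enumerate(x):
                    row[j] -= lr * grad * xj
        losses.append(total / len(examples))
    return vocab, weights, losses


def classify(text, vocab, weights, labels):
    """Return the resolved intent or the unresolved intent with the highest probability."""
    probs = predict_probs(text, vocab, weights)
    return labels[max(range(len(labels)), key=probs.__getitem__)]


vocab, weights, losses = train(DATA, LABELS)
```

After training, `classify(utterance, vocab, weights, LABELS)` returns either one of the resolved intent labels or the unresolved label, which corresponds to the output step of the claim.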
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62/901,203 (US201962901203P) | 2019-09-16 | 2019-09-16 | |
PCT/US2020/050407 WO2021055247A1 (en) | 2019-09-16 | 2020-09-11 | Stop word data augmentation for natural language processing |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2022547631A | 2022-11-14 |
JPWO2021055247A5 | 2023-08-25 |
Family
ID=72659345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2022516740A (pending; published as JP2022547631A) | Stopword data augmentation for natural language processing | 2019-09-16 | 2020-09-11 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11651768B2 (en) |
EP (1) | EP4032004A1 (en) |
JP (1) | JP2022547631A (en) |
CN (1) | CN114424185A (en) |
WO (1) | WO2021055247A1 (en) |
2020
- 2020-09-09: US application US17/016,122, patent US11651768B2, status Active
- 2020-09-11: WO application PCT/US2020/050407, publication WO2021055247A1, status Unknown
- 2020-09-11: JP application JP2022516740A, publication JP2022547631A, status Pending
- 2020-09-11: EP application EP20780842.9A, publication EP4032004A1, status Pending
- 2020-09-11: CN application CN202080064541.4A, publication CN114424185A, status Pending