JP2022076439A

JP2022076439A - Dialogue management

Info

Publication number: JP2022076439A
Application number: JP2021042260A
Authority: JP
Inventors: ストヤンシェヴスベトラーナ; Stoyanchev Svetlana; カイゼルサイモン; Keizer Simon; サナンドドディパトララマ; Sanand Doddipatla Rama
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2020-11-09
Filing date: 2021-03-16
Publication date: 2022-05-19
Anticipated expiration: 2041-03-16
Also published as: GB2604317B; US20220147719A1; GB202017663D0; GB2604317A; JP7279099B2

Abstract

PROBLEM TO BE SOLVED: To provide a module for updating a dialogue state for use in a dialogue system for conducting a dialogue with a user, a method for training a classifier, and a method for updating a dialogue state for the user.

SOLUTION: A dialogue system updates a dialogue state, which is stored in a memory and has a data structure that stores information exchanged between a user and the dialogue system, by comparing natural language input from the user with a plurality of actions indicating possible requests of the user and using information from the action that matches the natural language input. The dialogue system then generates a response to the natural language input using the updated dialogue state.

SELECTED DRAWING: Figure 3

Description

The embodiments described here relate to dialogue management.

Dialogue systems, for example task-oriented dialogue systems, are natural language interfaces to tasks such as information retrieval, customer support, e-commerce, physical environment control, and human-robot interaction. Natural language is a universal communication interface that does not require the user to learn a set of task-specific commands. A voice interface lets the user communicate by speaking, and a chat interface lets the user communicate by typing. Correct interpretation of user input can be a difficult problem for automated dialogue systems, which lack the grammatical and common-sense knowledge that allows a person to interpret a wide range of natural inputs effortlessly.

Embodiments will be described with reference to the following drawings.
FIG. 1A is a schematic diagram of a mobile device using a dialogue system according to an embodiment.
FIG. 1B is a schematic diagram of a mobile device using a dialogue system according to an embodiment.
FIG. 2A is a schematic diagram of a system according to an embodiment.
FIG. 2B is a schematic diagram of the application shown in FIG. 2A.
FIG. 3 is a flowchart showing a method according to an embodiment.
FIG. 4 is a schematic diagram of an exemplary dialogue state.
FIG. 5 is a schematic diagram of a system according to an embodiment.

In one embodiment, a module is provided for updating a dialogue state for use in a dialogue system for conducting a dialogue with a user, the module comprising:
a user input;
a processor; and
a memory,
wherein the processor is adapted to update the dialogue state in response to natural language input from the user, the dialogue state being stored in the memory,
the dialogue state comprises a data structure that stores information exchanged between the user and the dialogue system, and
the processor is configured to update the dialogue state by comparing the natural language input from the user with a plurality of possible actions, the actions indicating possible requests of the user, and to update the state using information from the action that matches the natural language input.

In a state-based dialogue system, the dialogue state is used to track the information exchanged between the user and the system as the dialogue progresses. The challenge for a state-based dialogue system is to update the state as more information is received from the user. When the user first speaks to the dialogue system, the dialogue state is generally empty and the dialogue begins. The system then responds, and the user replies with further information with which the dialogue state should be updated. The system and the user then take turns providing utterances.

The disclosed module improves computer functionality by enabling a computer to perform a function not previously performed by a computer running a dialogue system that uses a statistical model taking the text of the user's utterance as input. Specifically, the disclosed system provides a dialogue system that can output an appropriate response when the user refers to information provided in an earlier turn of the dialogue. It provides this improvement through a three-stage approach in which, in an embodiment, the system:
1) infers candidate actions from the dialogue state;
2) computes a relevance score in [0, 1] for each candidate action; and
3) updates the state with the most likely action.
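The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout of the state and actions, the names `infer_candidates`, `toy_score`, and `update_state`, and the 0.5 threshold are all hypothetical, and `toy_score` is a keyword-overlap stand-in for the trained classifier described later.

```python
from typing import Callable, Dict, List, Optional

Action = Dict[str, str]  # hypothetical action representation


def infer_candidates(state: dict, requestable: List[str]) -> List[Action]:
    """Stage 1: one request action per (history item, requestable slot)."""
    return [{"type": "request", "item": item["name"], "slot": slot}
            for item in state["history"] for slot in requestable]


def toy_score(utterance: str, action: Action) -> float:
    """Stage 2 stand-in: keyword overlap in [0, 1], NOT a trained model."""
    words = utterance.lower().replace("?", "").split()
    hits = (action["slot"] in words) + (action["item"].lower() in words)
    return hits / 2


def update_state(state: dict, utterance: str, requestable: List[str],
                 score: Callable[[str, Action], float],
                 threshold: float = 0.5) -> Optional[Action]:
    """Stage 3: apply the highest-scoring action if it clears the threshold."""
    scored = [(score(utterance, a), a)
              for a in infer_candidates(state, requestable)]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score < threshold:
        return None
    state.setdefault("requested", []).append((best["item"], best["slot"]))
    return best


state = {"history": [{"name": "Zizzi"}, {"name": "Nando"}]}
best = update_state(state, "What is the address of Zizzi?",
                    ["address", "phone"], toy_score)
print(best)
```

Passing the scorer in as a parameter keeps the update logic independent of whichever model produces the relevance scores.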

The above system enables this extended functionality without implementing a domain-specific natural language understanding component. Moreover, there is no need to design an annotation scheme or to annotate intents and entities.

In an embodiment, the dialogue state comprises a data structure containing the items mentioned during the dialogue. In some embodiments, the dialogue state stores information by providing slots. In others, a decision tree data structure is provided. In still other embodiments, some free-text portion of the structure may be provided.

In an embodiment, the plurality of possible actions includes actions relating to a plurality of items mentioned during the dialogue. In some embodiments, every item mentioned in the dialogue can be included in the possible actions. This allows the user's latest utterance to be compared with items referred to earlier in the dialogue. In other embodiments, the possible actions are based on the last few turns rather than on the whole dialogue.

The plurality of possible actions is inferred from the state and a domain definition. The domain definition is a description of the data structure. For example, in a restaurant search domain, the domain definition includes a set of informable and requestable slots. In a catalogue ordering domain, it is the item types and their attributes (colour, size, and so on). In food ordering, it is a structure representing the restaurant's menu.

The domain definition can also include domain-specific rules. For example, in a hotel booking system, the user can specify either an arrival date and a departure date, or an arrival date and a length of stay. The domain definition, together with the current dialogue state, is used to generate the list of candidate actions.

The dialogue system can be adapted for many uses. One possible use is information retrieval. However, other uses are possible, for example information gathering, troubleshooting, customer support, e-commerce, physical environment control, and human-robot interaction. The dialogue state comprises information exchanged between the user and the system. Where the dialogue system is configured to retrieve information, the dialogue state comprises a user goal and a history: the user goal indicates the information the user is requesting, and the history defines the items previously retrieved in response to the user goal. The user goal may be, for example, the type of food the user wants or a physical area of interest.

In a further embodiment, the processor is configured to compare the natural language input from the user with the plurality of possible actions by using a binary classifier to indicate which actions match and which do not. The binary classifier is configured to output a score, and the score is compared with a threshold to determine whether the action matches.

In one embodiment, the processor is configured to compare the natural language input from the user with the plurality of possible actions by generating a plurality of model inputs, one for each action, each model input comprising the natural language input from the user and an action. The processor is further configured to feed the model inputs into the binary classifier, implemented as a trained machine learning model, to output the score.

The trained machine learning model may be a transformer model. A transformer model uses a self-attention mechanism, which captures dependencies regardless of their distance. The transformer model may use an encoder-decoder framework, and the trained machine learning model may be a bidirectionally trained machine learning model such as BERT.

In an embodiment, the model input further comprises a previous response from the dialogue system. For example, the last system utterance may be used, or a representation of the previous system utterance, such as the lexicalised dialogue act corresponding to the system utterance, may be used.

In an embodiment, the actions may be selected from candidate actions and state update actions, where a candidate action indicates a question asked by the user about a previous response from the system, and a state update action indicates a request from the user that is not linked to a previous response from the system. A state update may represent a "goal change".

The module input for an action may comprise a representation of the system's previous response, the user input, an item description of an item in the dialogue state history, and a proposed question relating to the item referred to in the item description. The module input for a state update action comprises a representation of the system's previous response, the user input, and a proposed question relating to a possible user query.

The above module may form part of a dialogue system. Thus, in a further embodiment, a dialogue system comprises:
a user input;
a processor; and
a memory,
wherein the processor is adapted to update the dialogue state in response to natural language input from the user, the dialogue state being stored in the memory,
the dialogue state comprises a data structure that stores information exchanged between the user and the dialogue system,
the processor is configured to update the dialogue state by comparing the natural language input from the user with a plurality of possible actions, the actions indicating possible requests of the user, and to update the state using information from the action that matches the natural language input, and
the processor is configured to generate a response to the natural language input using the updated state.

In a further embodiment, a computer-implemented method is provided for updating a dialogue state for a user in a dialogue system for conducting a dialogue with the user, the method comprising:
receiving natural language input from the user;
using a processor to update the dialogue state in response to the natural language input from the user, the dialogue state being stored in a memory and comprising a data structure that stores information exchanged between the user and the dialogue system; and
updating the dialogue state by comparing the natural language input from the user with a plurality of possible actions, the actions indicating possible requests of the user, the state being updated using information from the action that matches the natural language input.

In a further embodiment, a method of training a classifier for updating a state in a dialogue system is provided, the method comprising:
providing a classifier capable of comparing natural language input from a user with possible actions, such that the classifier outputs a score indicating a match when the natural language input matches a possible action; and
training the classifier using a dataset comprising natural language inputs and possible actions, the dataset comprising positive combinations, in which the natural language input and the possible action match, and distractors, in which the natural language input and the possible action do not match.
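One way to assemble such a dataset is to pair each annotated utterance with its matching action as a positive example and with randomly sampled non-matching actions as distractors. The sketch below assumes this sampling scheme; the function name `build_examples`, the string form of the actions, and the number of distractors are hypothetical choices, not details given in the source.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str, int]  # (utterance, action, label)


def build_examples(utterance: str, gold_action: str, all_actions: List[str],
                   n_distractors: int, rng: random.Random) -> List[Example]:
    """Return one positive pair plus up to n_distractors negative pairs."""
    examples: List[Example] = [(utterance, gold_action, 1)]
    negatives = [a for a in all_actions if a != gold_action]
    # Sample without replacement so no distractor is repeated.
    for action in rng.sample(negatives, min(n_distractors, len(negatives))):
        examples.append((utterance, action, 0))
    return examples


actions = ["request address zizzi", "request phone zizzi",
           "request area nando"]
examples = build_examples("where is the Italian place?", actions[0],
                          actions, n_distractors=2, rng=random.Random(0))
for utterance, action, label in examples:
    print(label, utterance, "|", action)
```

Seeding the random generator makes the sampled distractors reproducible across training runs.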

In the above method, the possible actions are selected from candidate actions and state update actions, where a candidate action indicates a question asked by the user about a previous response from the system, and a state update action indicates a request from the user that is not linked to a previous response from the system.

Training of the classifier may be performed together with, or separately from, training of a policy model.

The above methods may be carried out using a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to perform the above methods.

User input in a dialogue system can be understood using a combination of natural language understanding (NLU) and dialogue state tracking (DST) components. The NLU identifies domain-specific intents and entities in the user input, and the DST updates the dialogue state.

FIGS. 1A and 1B are schematic diagrams of a smartphone, illustrating the use of a method according to an embodiment. In FIG. 1A, the user enters question 1, "I am looking for a cheap Italian restaurant", into the phone 3. In FIG. 1B, the phone 5 responds with "Zizzi Cambridge is a nice place to eat in the centre".

FIGS. 1A and 1B show one example of a task-oriented dialogue system, relating to restaurant search in Cambridge, which will be used throughout this description. However, the method can be applied to any task-oriented dialogue system that receives natural language input from a user, such as information retrieval, customer support, e-commerce, physical environment control, and human-robot interaction. The user input may be received via a microphone as speech that is processed by speech recognition, or it may be text input.

Although a smartphone is shown, the method can be implemented on any device that has a processor, for example a standard computer, any voice-controlled automation, or a server configured to handle user queries in a shop, a bank, a transport provider, and so on.

The conversation is shown below.

The user enters queries in turns 1, 3, and 5, and the system responds in turns 2, 4, and 6 respectively.

In the fifth turn of the above dialogue, immediately after another restaurant (Nando) has been presented, the user asks for the address of a restaurant (Zizzi) that the system presented three turns earlier. The user identifies the target restaurant with the referring expression "the Italian place". This type of exchange is particularly problematic for dialogue systems.

The dialogue shown above is achieved using the system described with reference to FIGS. 2A and 2B and the flowchart of FIG. 3.

FIG. 2A is a schematic diagram of hardware that can be used to implement a method according to an embodiment. It should be noted that this is only one example and that other configurations can be used.

The hardware comprises a computing section 700. In this particular example, the components of this section are described together. However, it will be appreciated that they are not necessarily co-located.

The components of the computing section 700 may include, but are not limited to, a processing unit 713 (such as a central processing unit, CPU), a system memory 701, and a system bus 711 that couples various system components, including the system memory 701, to the processing unit 713. The system bus 711 may be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus, using any of a variety of bus architectures. The computing section 700 also includes an external memory 715 connected to the bus 711.

The system memory 701 includes computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory. A basic input/output system (BIOS) 703, containing the routines that help transfer information between elements within the computer, such as during start-up, is typically stored in the system memory 701. In addition, the system memory contains an operating system 705, application programs 707, and program data 709 used by the CPU 713.

An interface 725 is also connected to the bus 711. The interface may be a network interface through which the computer system receives information from further devices. The interface may also be a user interface that allows the user to respond to certain commands and the like.

In this example, a video interface 717 is provided. The video interface 717 comprises a graphics processing unit 719 connected to a graphics processing memory 721.

The graphics processing unit (GPU) 719 is particularly well suited to training the classifier because it is adapted to data-parallel operations such as neural network training. Thus, in an embodiment, the processing for training the classifier may be divided between the CPU 713 and the GPU 719.

It should be noted that, in some embodiments, different hardware may be used for training the classifier and for performing state updates. For example, training of the classifier may take place on one or more local desktop or workstation computers, or on devices of a cloud computing system. These may include one or more discrete desktop or workstation GPUs, one or more discrete desktop or workstation CPUs, for example processors with a PC-oriented architecture, and a substantial amount of volatile system memory, for example 16 GB or more. Conducting the dialogue, by contrast, may use mobile or embedded hardware. This may include a mobile GPU as part of a system on a chip (SoC), or no GPU at all; one or more mobile or embedded CPUs, for example processors with a mobile-oriented or microcontroller-oriented architecture; and a smaller amount of volatile memory, for example less than 1 GB. For example, the hardware that conducts the dialogue may be a voice assistance system 120, such as a smart speaker or a mobile phone that includes a virtual assistant.

The hardware used to train the classifier may have significantly more computational power, for example being able to perform more operations per second and having more memory, than the hardware used to perform tasks using the agent. Hardware with fewer resources can be used because performing speech recognition, for example by performing inference using one or more neural networks, requires substantially fewer computational resources than training a speech recognition system, for example by training one or more neural networks. Furthermore, techniques can be employed to reduce the computational resources used to perform speech recognition, for example to perform inference using one or more neural networks. Examples of such techniques include model distillation and, for neural networks, neural network compression techniques such as pruning and quantization.

For conducting the dialogue, the application program 707 of FIG. 2A has the three main modules shown in FIG. 2B. These are 1) an action state update component 751, 2) a system move selection component 753, and 3) a template-based natural language generator 755.

The dialogue system operates using a dialogue state. An example of a dialogue state is shown in FIG. 4. In an embodiment, the dialogue state stores the system's beliefs about the dialogue history and the user goal, including previously discussed items. After each utterance or user input, the state is updated by the action state update component 751. The updated state is passed to the system move selection component 753, which receives the updated state and applies a system move selection policy to determine the reply. Because many modules exist that are configured to provide a response on receiving an updated state, there are many possible options for the system move selection component, or "policy component". In an embodiment, a statistically learned policy is used. However, other systems, such as those using a rule-based approach, can also be used. As an example, the following method can be used: Jost Schatzmann et al., "Agenda-based user simulation for bootstrapping a POMDP dialogue system", in Human Language Technologies 2007, Association for Computational Linguistics, pp. 149-152, April 2007.

The output of the system move selection component 753 is then converted into a natural language response by the template-based natural language generator 755.

FIG. 4 shows an example of a state. The state comprises a goal. In this particular example, the goal is represented by three slots: food, area, and price range. At the start of the dialogue each slot is empty, and the slots are filled as more information is gathered from the user.

The dialogue state also comprises a dialogue history. In this example the dialogue history contains three items, but it should be noted that the number of items is not fixed and will grow as more items are added during the dialogue. The system of this embodiment defines the history in terms of a slot-filling system which, in this example, allows the user to find a restaurant that matches a particular area, price range, or food type. These are the informable slots in the domain definition of this example, and they are set in the dialogue history for each item (a restaurant, in this case). In addition to the informable slots, requestable slots are also defined. In this example, the requestable slots are phone number, address, postcode, area, price range, and food type. The slots are defined by the domain.
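A slot-based dialogue state of this kind could be represented as follows. This is a minimal sketch under assumptions: the class name `DialogueState`, the slot identifiers, and the use of a dataclass are illustrative choices rather than the structure shown in FIG. 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Slot names follow the restaurant example; the identifiers are hypothetical.
INFORMABLE = ("food", "area", "pricerange")
REQUESTABLE = ("phone", "address", "postcode", "area", "pricerange", "food")


@dataclass
class DialogueState:
    # User goal: one entry per informable slot, empty at the start.
    goal: Dict[str, Optional[str]] = field(
        default_factory=lambda: {slot: None for slot in INFORMABLE})
    # History: items (restaurants) presented so far; grows during the dialogue.
    history: List[Dict[str, str]] = field(default_factory=list)


state = DialogueState()
state.goal["food"] = "italian"
state.history.append(
    {"name": "zizzi", "area": "centre", "pricerange": "cheap",
     "food": "italian"})
print(state)
```

Using `default_factory` gives every new state its own goal dictionary and history list, so separate dialogues do not share mutable data.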

In an embodiment, a state update is viewed as a set of operations, or actions. Each action changes a value in the dialogue state. For example, the state update action for the utterance "I am interested in Italian food" updates the user goal with food = Italian. The state update action for the utterance "Which area is the Italian restaurant in?" switches on the request bit for the area field of the entity matching the attribute food = Italian. Action detection is the task of identifying which state-changing action the user intends in a given context. In our approach, actions, which are instructions for changing the state, are detected without semantic parsing of the utterance.
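The two example actions can be made concrete with a small sketch. The action dictionaries, the `apply_action` function, and the use of a set of requested slot names to model "request bits" are assumptions made for illustration; the source does not specify a representation.

```python
from typing import Any, Dict


def apply_action(state: Dict[str, Any], action: Dict[str, Any]) -> None:
    """Apply one state-changing action to the dialogue state in place."""
    if action["type"] == "inform":
        # Goal change: set an informable slot in the user goal.
        state["goal"][action["slot"]] = action["value"]
    elif action["type"] == "request":
        # Switch on the request bit for every item matching the attributes.
        for item in state["history"]:
            if all(item.get(k) == v for k, v in action["match"].items()):
                item.setdefault("requested", set()).add(action["slot"])


state = {
    "goal": {"food": None, "area": None, "pricerange": None},
    "history": [
        {"name": "zizzi", "food": "italian", "area": "centre"},
        {"name": "gandhi", "food": "indian", "area": "centre"},
    ],
}
# "I am interested in Italian food"
apply_action(state, {"type": "inform", "slot": "food", "value": "italian"})
# "Which area is the Italian restaurant in?"
apply_action(state, {"type": "request", "slot": "area",
                     "match": {"food": "italian"}})
print(state["goal"], state["history"][0]["requested"])
```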

The overall process is described with reference to the flowchart of FIG. 3. In step S101, user input is received; this is natural language input.

In step S103, a plurality of input actions are generated, which may be candidate request actions and goal change actions. A candidate request action is generated for each requestable slot of each item stored in the dialogue history. For example, if the dialogue history contains three restaurants, 18 candidate request actions are generated (6 requestable slots × 3 items). Changing the user goal, by contrast, is a context-independent action. Given the domain ontology, the model classifies the same number of goal change actions at each turn, corresponding to the (informable) slot-value pairs. For example, the Cambridge restaurant domain has 102 values across the food type, area, and price range slots.
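The counts in this step follow directly from enumerating slot and item combinations, as the sketch below shows. The function names and the tiny ontology are hypothetical; the real Cambridge restaurant ontology has 102 informable values rather than the six used here.

```python
from itertools import product
from typing import Dict, List, Tuple

REQUESTABLE = ["phone", "address", "postcode", "area", "pricerange", "food"]


def request_candidates(history: List[dict]) -> List[Tuple[str, str]]:
    """One candidate request action per (item, requestable slot) pair."""
    return [(item["name"], slot)
            for item, slot in product(history, REQUESTABLE)]


def goal_change_candidates(
        ontology: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """One goal change action per informable slot-value pair."""
    return [(slot, value)
            for slot, values in ontology.items() for value in values]


history = [{"name": "zizi"}, {"name": "Gandhi"}, {"name": "Nando"}]
# Toy ontology for illustration only.
toy_ontology = {"food": ["italian", "indian"],
                "area": ["centre", "north"],
                "pricerange": ["cheap", "moderate"]}

print(len(request_candidates(history)))           # 6 slots x 3 items
print(len(goal_change_candidates(toy_ontology)))  # fixed per turn
```

The number of request candidates grows with the history, whereas the number of goal change candidates depends only on the ontology and is therefore the same at every turn.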

These are then converted into inputs to the model. In this embodiment, the input to the model is a word sequence consisting of: 1) a word sequence derived from the last system utterance, which may be the system utterance as it appeared or the system utterance in the form of lexicalized dialogue acts; 2) the user utterance from step S101; 3) an item description; and 4) a template-generated action sentence. The item description is a string generated from the action. For item-independent actions (goal change), the item description is empty; for item-dependent actions (information request), it corresponds to the description of the requested item. For the state of FIG. 4, the description corresponding to the action "request address of the first item" is "name zizi area central price cheap food Italian".

To illustrate, for this example the system generates 18 inputs for the request actions:

Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP phone number?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP address?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP postcode?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP area?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP price range?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name zizi area central price cheap food Italian SEP food type?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP phone number?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP address?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP postcode?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP area?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP price range?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Gandhi area central price moderate food Indian SEP food type?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP phone number?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP address?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP postcode?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP area?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP price range?
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP name Hotpot area north price expensive food Chinese SEP food type?

The 102 inputs for the goal-change actions are of the form:
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP SEP food Italian
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP SEP food Chinese
Nando is a good restaurant in the north SEP What is the price range of the Italian restaurant? SEP SEP area central

In the above, SEP indicates the separator between sentences.
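The four-part input format can be sketched as a small string-building routine. This is a minimal example under stated assumptions: the function names, the action dictionaries, and the REQUEST_TEMPLATES table are hypothetical, and the literal word SEP stands in for whatever separator token the encoder's tokenizer actually uses.

```python
# Hypothetical templates mapping a requestable slot to an action sentence.
REQUEST_TEMPLATES = {
    "phone": "phone number?",
    "address": "address?",
    "pricerange": "price range?",
}


def describe_item(item):
    """Flatten an item into a 'slot value slot value ...' description string."""
    return " ".join(f"{slot} {value}" for slot, value in item.items())


def build_model_input(system_utt, user_utt, action, item=None):
    """Join system utterance, user utterance, item description and action sentence."""
    if action["type"] == "request":
        # Item-dependent action: describe the item the request refers to.
        item_descr = describe_item(item)
        action_sent = REQUEST_TEMPLATES[action["slot"]]
    else:
        # Item-independent action (goal change): the item description is empty.
        item_descr = ""
        action_sent = f"{action['slot']} {action['value']}"
    joined = " SEP ".join([system_utt, user_utt, item_descr, action_sent])
    # Collapse the double space left behind by an empty item description.
    return " ".join(joined.split())


zizi = {"name": "zizi", "area": "central", "price": "cheap", "food": "Italian"}
example_input = build_model_input(
    "Nando is a good restaurant in the north",
    "What is the price range of the Italian restaurant?",
    {"type": "request", "slot": "phone"},
    zizi,
)
```

For a goal-change action the two separators end up adjacent, which corresponds to the "SEP SEP food Italian" form of the goal-change inputs listed above.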

In step S105, the inputs are scored. In an embodiment, this is done by feeding each input into a trained model that is a bidirectional transformer, as shown schematically in FIG. 5. The input, comprising 1) the system utterance, 2) the user utterance, 3) the item description, and 4) the action sentence, is shown being fed as a sequence into a bidirectional encoder (BERT in this case). A classification flag, CLS, is generated for the whole input and is then passed through a linear layer to produce a score. Because the model input includes the item description, the attention mechanism of the transformer model learns to detect whether an action can be inferred from the user utterance in the given context. The presence of the item description, the dynamic generation of candidate actions, and the method of data generation allow the model to interpret referring expressions.
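The final scoring stage can be sketched without committing to any particular library. The following assumes that an encoder has already produced a CLS vector for one model input, and that the linear layer is followed by a sigmoid so that the score lies between 0 and 1. The sigmoid, the function name score_from_cls, and the toy weights are assumptions for illustration, not details given in the source.

```python
import math


def score_from_cls(cls_vector, weights, bias):
    """Map an encoder's CLS vector to a score in (0, 1).

    A single linear unit (dot product plus bias) followed by a sigmoid.
    """
    logit = sum(w * x for w, x in zip(weights, cls_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 4-dimensional values (assumed); a BERT-base CLS vector has 768 dimensions.
cls_vec = [0.5, -1.0, 0.25, 2.0]
layer_weights = [1.0, 0.5, -2.0, 0.75]
layer_bias = -0.5
toy_score = score_from_cls(cls_vec, layer_weights, layer_bias)
```

In a trained system the weights and bias would be learned jointly with the fine-tuned encoder; here they are fixed by hand purely to show the computation.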

Providing the input in separate parts, as above, has the potential advantage of making use of the meaning encoded during pre-training.

The "action sentence" used as input above, e.g. "price range?", contrasts with simply using the word "price range". It is, however, also possible to use only the word "price range". A sentence is generated because "request price range" is not natural language, whereas BERT is optimized to operate on natural language.

In step S107, the inputs with a score above a threshold are selected; the threshold in this case is 0.5. Then, in step S109, these inputs are used to update the dialogue state, either by changing the goal (slot values) or by setting a request bit for one of the items in the dialogue history. During the update, the following heuristics are applied: 1) if multiple actions are predicted for a slot, the one with the highest score is used; 2) if multiple request actions receive a score greater than 0.5, only the request bit for the most recently mentioned item is used. As explained above, the dialogue state stores the dialogue history in order of most recent mention, so the most recently mentioned item is easily determined. Once a request bit is set, this information is passed to the policy module, which decides how to handle it alongside other state update information, for example where a request bit is set while the goal is also updated. In an embodiment, the policy model is a classifier that selects a template for the system response. Alternatively, it may be a rule-based response selection in which rules are triggered by the setting of request bits.
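Steps S107 and S109 can be sketched as a filter followed by a deterministic update. The code below is one possible reading, not the patented implementation: the action and state dictionaries are hypothetical, heuristic 1 is applied to goal-change actions only, and heuristic 2 is interpreted as keeping the requests that refer to the most recently mentioned item among those that passed the threshold.

```python
THRESHOLD = 0.5


def select_actions(scored_actions, history_order):
    """scored_actions: list of (action, score) pairs.
    history_order: item names, most recently mentioned first.
    """
    accepted = [(a, s) for a, s in scored_actions if s > THRESHOLD]

    # Heuristic 1: keep only the highest-scoring goal-change action per slot.
    best_goal = {}
    for action, score in accepted:
        if action["type"] == "goal_change":
            slot = action["slot"]
            if slot not in best_goal or score > best_goal[slot][1]:
                best_goal[slot] = (action, score)

    # Heuristic 2: keep request actions only for the most recently mentioned item.
    requests = [a for a, _ in accepted if a["type"] == "request"]
    if requests:
        rank = {name: i for i, name in enumerate(history_order)}
        latest = min(requests, key=lambda a: rank[a["item"]])["item"]
        requests = [a for a in requests if a["item"] == latest]

    return [a for a, _ in best_goal.values()] + requests


def apply_actions(state, actions):
    """Deterministically update the goal and the request bits in place."""
    for a in actions:
        if a["type"] == "goal_change":
            state["goal"][a["slot"]] = a["value"]
        else:
            state["request_bits"].setdefault(a["item"], set()).add(a["slot"])
    return state


scored = [
    ({"type": "goal_change", "slot": "food", "value": "Italian"}, 0.9),
    ({"type": "goal_change", "slot": "food", "value": "Indian"}, 0.6),
    ({"type": "goal_change", "slot": "area", "value": "north"}, 0.2),
    ({"type": "request", "item": "zizi", "slot": "phone"}, 0.8),
    ({"type": "request", "item": "Hotpot", "slot": "phone"}, 0.7),
]
chosen = select_actions(scored, ["Hotpot", "Gandhi", "zizi"])
new_state = apply_actions({"goal": {}, "request_bits": {}}, chosen)
```

Because the history is stored in order of most recent mention, resolving "the most recently mentioned item" reduces to a rank lookup, which is why the list order is passed in explicitly.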

In step S111, the updated dialogue state is received by the policy model, which is used to provide a system response in step S113. A natural language response can be generated using a natural language generation component to provide the output in S115. The system response is then provided to the user, and the user's reply is awaited. Once user input is received, the process returns to S101 and starts again. This time, however, the system response from step S115 is used in generating the plurality of inputs.

In the above embodiment, a set of candidate actions is generated from the dialogue state. The context is stored in the dialogue state, and a statistical method is used to update the dialogue state. Binary classification is used to detect the actions intended by the user. These actions then update the state deterministically.

The proposed "action detector" model is trained to identify the actions intended by a user utterance from a list of candidate actions. The candidate actions in a task-oriented dialogue system are generated dynamically, based on the current dialogue state and the domain ontology. The above embodiment takes as input the words of the user utterance, either as typed text in a text-based chat or as the output of a speech recognizer in a spoken dialogue system.

In the above embodiment, a state update is viewed as a set of operations, or actions. Each action changes a value in the dialogue state, which stores the system's belief about the user goal and the dialogue history, including previously discussed items. For example, the state update for the utterance "I am interested in Italian food" updates the user goal with food=Italian. The state update action for the utterance "Which area is the Italian restaurant in?" switches on the request bit for the area field of the entity that matches the attribute food=Italian.

FIG. 5 shows a schematic of the process and model, which can be understood from the above description of FIG. 3. In FIG. 5, $slot is one of price range, area, and food type, and $value is one of the corresponding values stored in the database (cheap/moderate/expensive, north/south/..., Indian/Italian/...).

In an embodiment, the state update module described above performs the following three basic steps:
1) infer the candidate actions from the dialogue state;
2) compute a relevance score for each candidate action;
3) update the state with the most likely actions.

The first step of the algorithm, generating the set of candidate actions for the current dialogue state, is deterministic: the actions can be inferred from the current state. The last step, updating the state given a set of actions, is also deterministic. The second step of the algorithm scores each candidate action with the probability that it is intended by the user.

In the above embodiment, a BERT encoder and a linear layer with binary output are used. The input to the model is a word sequence consisting of: 1) a sequence of lexicalized dialogue acts; 2) the user utterance; 3) an item description; and 4) a template-generated action sentence. The item description is a string generated from the action. For item-independent actions (goal change), the item description is empty; for item-dependent actions (information request), it corresponds to the description of the requested item. The model outputs the probability that the action was intended by the user.

Next, training of the classifier is described. The classifier is trained using positive and negative examples of the form:
<sys, usr, action → (itemdescr, actionsent)>: 0/1

The term "sys" is the previous system response, "usr" is the user utterance, and "action" is the action intended by the user. Consistent with the example above, "action" is broken down into an item description and an action sentence, as described above.
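A labelled example of this form might be represented as follows. This is a sketch only: the TrainingExample class, its field names, the action identifiers, and the make_examples helper are assumptions, and the candidate set passed in is assumed to have been chosen already (for instance by the sampling described for each dataset).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    sys: str          # previous system response
    usr: str          # user utterance
    item_descr: str   # item description (empty for goal-change actions)
    action_sent: str  # template-generated action sentence
    label: int        # 1 = intended by the user, 0 = not intended


def make_examples(sys, usr, candidates, intended):
    """candidates: dict action_id -> (item_descr, action_sent).
    intended: set of action ids the user actually intended.
    """
    return [
        TrainingExample(sys, usr, descr, sent, int(action_id in intended))
        for action_id, (descr, sent) in candidates.items()
    ]


zizi_descr = "name zizi area central price cheap food Italian"
examples = make_examples(
    "Nando is a good restaurant in the north",
    "What is the price range of the Italian restaurant?",
    {
        "request_pricerange_zizi": (zizi_descr, "price range?"),
        "request_phone_zizi": (zizi_descr, "phone number?"),
        "goal_food_Italian": ("", "food Italian"),
    },
    intended={"request_pricerange_zizi"},
)
```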

To create the training set, the action is intended by the user in positive examples (labelled 1) and is not intended in negative examples (labelled 0). Because an action is an instruction relative to the current state, for example "request the price range of the first item", the item description and action sentence inputs to the model are inferred from the action and the state. The three datasets used to train the classifier are summarized in Table 2 below.

The baseline dataset is generated from the training partition of the DSTC2 corpus. For each turn, a positive example is generated for each action intended by the user. The intended actions are inferred from the manual NL annotation; for example, 'I want italian/FOOD_TYPE food'/REQUEST_FOOD corresponds to the action request_italian. One option for generating negative examples (distractors) is to use all valid unintended actions (slot-value pairs). However, this produces a highly skewed dataset when the number of actions is large. Instead, for each positive example, unintended actions are sampled using frequency and similarity heuristics to select more relevant distractors. By design of the task, the DSTC2 dataset does not contain referring expressions in user turns. All user requests are generic and refer to the last presented item (e.g., "What is the phone number?"). A model trained on the baseline dataset can therefore only understand references to the last presented item.

extH extends the baseline dataset with automatically generated utterances containing referring expressions. The user may ask a question about any of the requestable slots and refer to any of the informable slots. To do this, 10K/3K requests with referring expressions are generated for the training/development datasets, covering all combinations of requestable and informable slots. Each is produced by randomly sampling, from the DSTC2 dataset, a request utterance for the requested slot that has no referring expression, and concatenating it with a template-generated referring expression for the referenced slot (see Table 3).

As shown in Table 2, a further dataset is generated using active learning. The key idea is that the algorithm can choose its own training samples. The extA dataset in Table 2 is generated by automatically selecting the most challenging distractors from simulated dialogues.

The training set can be extended to cover searching for multiple venues by repeatedly changing the goal constraints, and requesting slots for venues offered earlier in the dialogue. In addition, templates are created to generate utterances with referring expressions for this new behaviour, resulting in a hybrid retrieval/template-based model for generating simulated user utterances.

As a test, a first simulation is run for 5000 dialogues with an ASU module that uses the classifier trained on the baseline dataset. In simulation, a separate system is used to simulate the user in place of a real user. In this particular example, a rule-based simulated user is used that receives a randomly selected goal and generates utterances resembling those of human-computer dialogues. From the simulated user intent, the "intended" user actions are inferred and the new training examples are labelled automatically. Each "intended" action for which the baseline model predicted a relevance score below T1 is used as a positive example. Up to M "unintended" actions with the highest relevance scores above T2 are used as negative examples. In this test, T1 = 0.99, T2 = 0.5, and M = 2. All generated utterances with referring expressions are also used as positive examples, even if they are classified correctly by the model trained on the baseline dataset.
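The selection rule with T1, T2, and M can be written down directly. The sketch below handles one simulated turn; the function name, the action identifiers, and the score dictionary are assumptions, and it does not include the additional rule that all generated utterances with referring expressions are kept as positive examples.

```python
T1, T2, M = 0.99, 0.5, 2


def select_active_examples(scores, intended_ids):
    """scores: dict action_id -> relevance score from the baseline model.
    intended_ids: set of action ids inferred from the simulated user intent.
    Returns (positive_ids, negative_ids) to add to the training set.
    """
    # Positives: intended actions the baseline model was not confident about.
    positives = [a for a in sorted(intended_ids) if scores[a] < T1]

    # Negatives: up to M unintended actions with the highest scores above T2.
    unintended = [
        (a, s) for a, s in scores.items() if a not in intended_ids and s > T2
    ]
    unintended.sort(key=lambda pair: pair[1], reverse=True)
    negatives = [a for a, _ in unintended[:M]]
    return positives, negatives


turn_scores = {
    "request_pricerange_zizi": 0.40,    # intended, but scored low
    "request_pricerange_hotpot": 0.995, # intended, already confident
    "request_phone_zizi": 0.95,         # unintended, confidently wrong
    "request_address_zizi": 0.70,       # unintended, above T2
    "goal_food_Italian": 0.60,          # unintended, above T2 but not in top M
    "goal_area_north": 0.10,            # unintended, below T2
}
pos_ids, neg_ids = select_active_examples(
    turn_scores, {"request_pricerange_zizi", "request_pricerange_hotpot"}
)
```

The effect is that training effort concentrates on the cases the baseline model gets wrong or is unsure about, which is the sense in which the distractors are the "most challenging" ones.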

To demonstrate the above, the ASU approach is trained with the baseline model on the test subset of the DSTC2 corpus, that is, without referring expressions. Using manual transcripts of the user input, the model correctly identifies 96% of user informs and 99% of user requests (average goal and request accuracy as computed by the official DSTC2 evaluation script).

Next, the proposed approach is evaluated on simulated dialogues with referring expressions in the user requests. Simulations are run with the proposed action state update component for the baseline, expH, and expA datasets. The results are shown in Table 4.

As an upper bound (GOLD) condition, the simulation is run with the correct actions inferred from the simulated dialogue acts. The policy model is trained with an agenda-based simulation that uses dialogue acts (DA) as input and a 25% dialogue act confusion rate. For the models trained on expH and expA, a policy model is also trained with simulated user utterances, rather than dialogue act hypotheses, as input. In this condition, the policy may learn to overcome state update errors made by the ASU model.

5000 dialogues are simulated for each experimental condition, and statistics are computed over dialogues and over individual turns. The dialogue success rate is the proportion of simulated dialogues in which the venue offered by the system matches the simulated user's goal constraints (possibly after a number of goal changes) and the system provides the additional information requested by the simulated user. State update accuracy is computed as the average accuracy over a) all turns, b) turns annotated as inform only, and c) turns annotated as request only.
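The two kinds of statistic can be illustrated with a short sketch. The record layout (a "type" field of "inform", "request", or "both", and boolean "correct" and "success" fields) is an assumption for this example; the source does not describe how the simulation logs are stored.

```python
def average_accuracy(turns, turn_type=None):
    """Average state update accuracy over turns, optionally restricted to
    turns annotated with exactly one type ('inform' or 'request')."""
    selected = [t for t in turns if turn_type is None or t["type"] == turn_type]
    if not selected:
        return None
    return sum(t["correct"] for t in selected) / len(selected)


def success_rate(dialogues):
    """Proportion of dialogues flagged as successful."""
    return sum(d["success"] for d in dialogues) / len(dialogues)


turn_log = [
    {"type": "inform", "correct": True},
    {"type": "inform", "correct": False},
    {"type": "request", "correct": True},
    {"type": "request", "correct": True},
    {"type": "both", "correct": False},
]
dialogue_log = [{"success": True}, {"success": True}, {"success": False}, {"success": True}]

overall_acc = average_accuracy(turn_log)
inform_acc = average_accuracy(turn_log, "inform")
request_acc = average_accuracy(turn_log, "request")
dialogue_success = success_rate(dialogue_log)
```

Turns annotated with both an inform and a request count towards the overall figure only, in line with the "inform only" and "request only" breakdown.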

The behaviour of the simulated user is affected by the state update model. The average length of the simulated dialogues ranges from 7.93 for the GOLD condition to 10.06 for the baseline. Lower state update accuracy leads to longer dialogues because, when the system fails to respond correctly, the simulated user repeats or rephrases the request, which increases the dialogue length. The baseline condition achieves only 43.9% dialogue success and 50% state update accuracy over all user turns. In the expH DA condition, dialogue success and overall accuracy rise to 91.1% and 75.1%, with an accuracy of 79% on informs and only 50.0% on requests. With the active learning approach (expA DA), dialogue success and overall accuracy rise to 99.5% and 98.1%, with an accuracy of 98.8% on informs and 94.0% on requests. Using a matched policy model affects the performance of both the expH and expA models, increasing accuracy on requests by 4.3 and 1.4 absolute percentage points. However, with the policy trained with the expH model, the accuracy on user inform acts drops by 3.1 points and the dialogue length increases. The results indicate that the action state update approach is effective in combination with active learning.

A preliminary user study was carried out to test the proposed action detection model with real users. The text-based system consisted of the proposed dialogue state tracker using the expA action detection model, a dialogue policy trained with a text-based user simulator, and a template-based natural language generator. Participants were recruited and asked to carry out five tasks involving restaurant information navigation. In each task, the participant was given an initial set of constraints (e.g., food type: Chinese, price range: cheap) and asked to obtain a suitable recommendation from the system. They then continued the conversation to obtain two alternative recommendations by changing the constraints, giving three recommended venues in total. Finally, they were asked to obtain additional information, such as the phone number or address, for two of these venues. Participants were also asked to indicate when a system response was incorrect by entering <error>. After completing all five tasks, participants filled in a questionnaire consisting of five statements to be scored on a six-point Likert scale ranging from "strongly disagree" to "strongly agree", and a question asking how many tasks were completed successfully (see Table 5).

Each user entered 60.9 turns on average and marked 15% of them as errors. The questionnaire results indicate that users felt the system understood their references to venues (mean score 4.8). Half of the users completed all five tasks, and only one user felt that the system did not understand them well. The high standard deviation across users indicates high variability in user experience and possibly in expectations of the system. The human evaluation shows that the above model can be used in an interactive dialogue system.

The embodiments described here provide a novel approach to updating the dialogue state that can successfully interpret user utterances, including requests with referring expressions. The experimental models are trained by extending the original Cambridge restaurant dataset with simulated requests containing referring expressions and with sampled distractors. The model trained on the dataset in which distractors are sampled using the active learning approach achieves the best performance despite the smaller size of its training set. The human evaluation of this model indicates that the approach can be used in a dialogue system with real users.

While certain embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the invention. Indeed, the novel devices and methods described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions, and changes in the form of the devices, methods, and products described herein may be made without departing from the scope and spirit of the invention. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the invention.

Claims

A module for updating a dialogue state for use in a dialogue system for conducting a dialogue with a user, the module comprising:
a user input;
a processor; and
a memory,
wherein the processor is adapted to update the dialogue state in response to a natural language input from the user, the dialogue state being stored in the memory,
the dialogue state comprises a data structure that stores information exchanged between the user and the dialogue system, and
the processor is configured to update the dialogue state by comparing the natural language input from the user with a plurality of possible actions, the actions indicating possible requests of the user, and to update the state using information from an action that matches the natural language input.

The module of claim 1, wherein the dialogue state comprises a data structure comprising items mentioned during the dialogue.

The module of claim 2, wherein the plurality of possible actions include actions relating to the plurality of items referred to during the dialogue.

The module according to claim 1, wherein the dialogue system is configured for information retrieval, the dialogue state comprises a user goal and a history, the user goal indicates information requested by the user, and the history defines items previously retrieved in response to the user goal.

The processor is configured to compare the natural language input from the user with a plurality of possible actions by using a binary classifier to indicate matching and non-matching actions. The module described in.

The module of claim 5, wherein the binary classifier is configured to output a score, the score being compared to a threshold value to determine if the actions match.

The module according to claim 6, wherein the processor is configured to compare the natural language input from the user with the plurality of possible actions by generating a plurality of model inputs for each action, each model input comprising the action and the natural language input from the user, and wherein the processor is further configured to input the model inputs into the binary classifier, implemented as a trained machine learning model, to output the score.

The module according to claim 7, wherein the trained machine learning model is a transformer-based trained machine learning model.

The module according to claim 7, wherein the trained machine learning model is a bidirectionally trained machine learning model.

The module of claim 7, wherein the model input further comprises a previous response from the dialogue system.

The module according to claim 7, wherein the action is selected from a candidate action and a state update action, the candidate action indicating a question asked by the user that relates to a previous response from the system, and the state update action indicating a request from the user that is not linked to a previous response from the system.

The module according to claim 11, wherein the model input for the candidate action comprises a representation of the previous response of the system, the user input, an item description of an item in the dialogue state history, and a proposed question related to the item referenced in the item description.

The module of claim 11, wherein the model input for the state update action comprises a representation of the previous response of the system, the user input, and a proposed question related to a possible user query.

The module of claim 12, configured to set a request bit when the model input for the candidate action matches.

The module of claim 13, configured to update the state when the model input for the state update action matches.
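
The composition of the classifier inputs for candidate actions and state update actions, and the effect of a match, might be sketched as below. This is an illustrative sketch only: the `[SEP]` separator, the field layout of `DialogueState`, and every function name are assumptions and are not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

SEP = " [SEP] "  # assumed separator between the parts of one classifier input


def candidate_model_input(
    prev_response: str, user_input: str, item_description: str, question: str
) -> str:
    """Input for a candidate action: it also carries the item description."""
    return SEP.join([prev_response, user_input, item_description, question])


def state_update_model_input(
    prev_response: str, user_input: str, question: str
) -> str:
    """Input for a state update action: no item description is included."""
    return SEP.join([prev_response, user_input, question])


@dataclass
class DialogueState:
    """Hypothetical state: the user purpose plus per-item request bits."""
    user_purpose: Dict[str, str] = field(default_factory=dict)
    requested: Dict[str, Set[str]] = field(default_factory=dict)


def set_request_bit(state: DialogueState, item: str, attribute: str) -> None:
    # Called when a candidate action matches: mark the attribute as requested.
    state.requested.setdefault(item, set()).add(attribute)


def apply_state_update(state: DialogueState, slot: str, value: str) -> None:
    # Called when a state update action matches: change the user purpose.
    state.user_purpose[slot] = value


state = DialogueState()
set_request_bit(state, "Golden Dragon", "price")
apply_state_update(state, "food", "italian")
```

Keeping the request bits separate from the user purpose lets a response generator answer a question about a listed item without disturbing the search constraints.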

A method of training a classifier to update a state in a dialogue system, the method comprising:
providing a classifier, the classifier being capable of receiving a natural language input from a user and comparing it with a possible action, such that the classifier outputs a score indicating a match when the natural language input matches the possible action; and
training the classifier using a dataset comprising natural language inputs and possible actions, the dataset comprising positive combinations, in which the natural language input and the possible action match, and incorrect answer options, in which the natural language input and the possible action do not match.
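
One way such a dataset of matching and non-matching combinations could be assembled is sketched below. It is a hedged example: the function name `build_training_pairs`, the 1/0 labels, and the choice to pair each input with every non-matching action are assumptions, and the training step itself is omitted.

```python
from typing import List, Tuple


def build_training_pairs(
    examples: List[Tuple[str, str]], all_actions: List[str]
) -> List[Tuple[str, str, int]]:
    """Pair each input with every action; label 1 for a match, 0 otherwise."""
    pairs = []
    for user_input, matching_action in examples:
        for action in all_actions:
            label = 1 if action == matching_action else 0
            pairs.append((user_input, action, label))
    return pairs


actions = ["request price", "request address", "request phone"]
examples = [
    ("how much does it cost", "request price"),
    ("where is it", "request address"),
]
pairs = build_training_pairs(examples, actions)
positives = [p for p in pairs if p[2] == 1]
negatives = [p for p in pairs if p[2] == 0]
```

With many possible actions this yields far more non-matching than matching pairs, so in practice the non-matching pairs might be subsampled; that choice is outside what the claim specifies.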

The method of claim 16, wherein the possible action is selected from a candidate action and a state update action, the candidate action indicating a question asked by the user that relates to a previous response from the system, and the state update action indicating a request from the user that is not linked to a previous response from the system.

A dialogue system comprising:
a user input;
a processor; and
a memory,
wherein the processor is adapted to update a dialogue state in response to a natural language input from a user, the dialogue state being stored in the memory,
wherein the dialogue state comprises a data structure that stores information exchanged between the user and the dialogue system,
wherein the processor is configured to update the dialogue state by comparing the natural language input from the user with a plurality of possible actions, each action indicating a possible request of the user, and by updating the state using information from an action that matches the natural language input, and
wherein the processor is configured to use the updated state to generate a response to the natural language input.

A computer-implemented method for updating a dialogue state with a user in a dialogue system for conducting a dialogue with the user, the method comprising:
receiving a natural language input from the user; and
using a processor to update the dialogue state in response to the natural language input from the user, wherein the dialogue state is stored in a memory and comprises a data structure that stores information exchanged between the user and the dialogue system,
wherein updating the dialogue state comprises comparing the natural language input from the user with a plurality of possible actions, each action indicating a possible request of the user, and updating the state using information from an action that matches the natural language input.

A computer-readable medium comprising instructions that, when executed by a computer, cause the computer to perform the method of claim 19.