JP6997046B2

JP6997046B2 - Annotation support device

Info

Publication number: JP6997046B2
Application number: JP2018131051A
Authority: JP
Inventors: 和史池田; 啓一郎帆足
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-07-10
Filing date: 2018-07-10
Publication date: 2022-01-17
Anticipated expiration: 2038-07-10
Also published as: JP2020009264A

Description

The present invention relates to an annotation support device that makes annotation, the assignment of meaning to data, more efficient, and in particular to an annotation support device that makes more efficient the annotation of attribute information onto the utterance texts of a dialogue scenario.

For a dialogue agent system to conduct sophisticated dialogue, it needs large-scale text data (a dialogue scenario corpus) describing the expected exchanges (dialogue scenarios). For the system to respond more accurately, the dialogue scenarios also need attribute information such as dialogue acts (greeting, question, reply, etc.), context (time, place, surrounding situation, etc.), and affective indices (emotions such as joy, anger, sorrow, and pleasure; interest level; etc.).

Patent Document 1 discloses a method, aimed at chat-oriented dialogue systems, of selecting an appropriate response by estimating attributes of the user utterance (question type, dialogue act, topic category, etc.).

Patent Document 2 discloses a method of learning utterance patterns with a time-series model and estimating dialogue acts from them, in order to estimate dialogue acts with high accuracy.

Non-Patent Document 1 proposes a system that uses crowdsourcing to create scenarios from dialogue entered in chat form by two workers. The system provides a function that allows annotation, such as of dialogue acts, to be carried out at the same time as scenario creation.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-027234
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-025747

Non-Patent Document 1: Hiroshi Tsukahara, Kei Utsumi, "A Method for Building a Dialogue Corpus Using an Open Platform and Crowdsourcing," Annual Meeting of the Association for Natural Language Processing (March 2015)

Patent Document 1 and Patent Document 2 propose techniques that estimate attribute information, such as dialogue acts and question types, by analyzing the text of an utterance. This makes it possible to estimate the attributes of a user utterance and to select a response matching those attributes from the dialogue scenario.

To estimate the attributes of user utterances, one approach is to use machine learning to learn the features of utterance texts that already carry attribute information. Applying the same approach to the dialogue scenario makes it possible to assign attribute information to the scenario automatically.

However, estimation by machine learning is not always accurate, so when the estimation method is applied to both the dialogue scenario and the user utterance, the errors compound and the accuracy falls below a practical level. Specifically, if the estimation accuracy is 80%, the response accuracy drops sharply: 80% (attribute information of the dialogue scenario) × 80% (attribute information of the user utterance) = 64%.
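The compounding of the two estimation steps can be checked with a few lines of Python. This is only an illustrative sketch: the function name `combined_accuracy` is hypothetical, and the multiplication assumes, as the 64% figure above implicitly does, that the two estimates err independently.

```python
def combined_accuracy(scenario_accuracy: float, utterance_accuracy: float) -> float:
    """Chance that both estimates are correct, assuming independent errors."""
    return scenario_accuracy * utterance_accuracy


# 80% accuracy on the scenario side and 80% on the user-utterance side.
result = combined_accuracy(0.8, 0.8)
print(f"{result:.2f}")  # prints 0.64
```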

One way to avoid this problem is to annotate the dialogue scenario with attribute information manually in advance, which makes highly accurate responses possible.

Non-Patent Document 1 proposes a system that uses crowdsourcing to have a large number of workers create and annotate scenarios. However, for attribute information such as dialogue acts, context, and affective indices, the correct attribute value may have to be chosen from several dozen or more categories (attribute values), which demands considerable effort and relatively high expertise. Annotating a large volume of dialogue scenarios therefore incurs an enormous cost.

The first object of the present invention is to solve the above technical problems and to provide an annotation support device for dialogue scenarios that estimates the attribute information of a dialogue scenario using machine learning and narrows the attribute value candidates in advance, so that workers need not consider a large number of attribute values and work efficiency improves.

The second object of the present invention is to provide an annotation support device for dialogue scenarios that keeps the workers' effort to the necessary minimum by controlling the required redundancy according to the reliability of the estimation and the workers' annotation results.

To achieve the above objects, the present invention is an annotation support device that supports the annotation of dialogue scenarios, characterized by the following configuration.

(1) The device comprises means for estimating the attributes of each utterance text of a dialogue scenario, means for determining task conditions for an annotation task in which at least one attribute is selected from the plurality of estimated attributes, means for providing workers with an annotation task created according to the task conditions, and means for aggregating the annotation results from the workers who carried out the annotation task.

(2) The estimating means includes means for extracting features from each utterance text of the dialogue scenario, and an estimation model that estimates attributes from the extracted features and outputs estimated likelihoods.

(3) The estimating means further includes means for updating the estimation model by machine learning of the relationship between the annotation results and the extracted features.

(4) The estimation model estimates attribute value candidates for each attribute type from the extracted features and outputs their estimated likelihoods.

(5) The means for determining the task conditions determines the number of displayed utterances L, used when the utterance text for which annotation is requested is displayed in chronological order together with other utterance texts, based on the number of most recent utterances that influence the judgment of the attribute value of that utterance text.

(6) The means for determining the task conditions determines, based on the estimated likelihoods, the number M of attribute value candidates to be displayed in descending order of estimated likelihood.

(7) The means for determining the task conditions determines, based on the estimated likelihoods, the number of workers N to whom the annotation task is requested for each utterance text.

According to the present invention, the following effects are achieved.

(1) Because the attribute information of a dialogue scenario is estimated using machine learning and the attribute value candidates can be narrowed in advance, workers no longer need to consider a large number of attribute values, and work efficiency improves.

(2) Redundancy, namely the number of attribute value candidates presented, the number of utterances presented, and the number of workers asked, is controlled according to the reliability of the estimation and the workers' annotation results. Annotation can therefore be completed by the minimum number of workers checking the minimum amount of information, and work efficiency improves.

(3) Because the machine learning model is retrained on the annotated results, annotation estimation becomes more accurate, and the number of attribute values to be annotated and the number of workers can be narrowed further.

FIG. 1 is a block diagram showing the configuration of an annotation support system for dialogue scenarios according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a dialogue scenario.
FIG. 3 is a diagram showing an example of the fields for storing attribute information.
FIG. 4 is a diagram showing an example of the annotation task request screen.
FIG. 5 is a diagram showing, for each utterance text shown in FIG. 3, the attribute value candidates of the attribute type "dialogue act" in descending order of estimated likelihood.
FIG. 6 is a diagram showing an example of the attribute value candidates estimated for each utterance text and their estimated likelihoods.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an annotation support system for dialogue scenarios according to an embodiment of the present invention.

The annotation support system mainly comprises a scenario DB 2, which stores a large number of dialogue scenarios collected by crowdsourcing or similar means, and an annotation support device 1, which annotates these dialogue scenarios with the help of crowd workers (workers) W.

The annotation support device 1 mainly comprises an annotation estimation unit 11, a task condition determination unit 12, an annotation task generation unit 13, and an annotation update unit 14. Such an annotation support device 1 can be built by installing applications (programs) that implement each function on a general-purpose computer or server. Alternatively, it can be built as a dedicated or single-purpose machine in which part of the application is implemented in hardware or ROM.

As shown in FIG. 2, a dialogue scenario contains the text (utterance text) of each expected user utterance and system response, together with link information to the next utterance, and may be structured to include branches.

As shown in FIG. 3, the scenario DB 2 further has, in addition to the utterance text, fields for storing attribute information in which an attribute value is registered for each attribute type, such as "dialogue act", "topic category", "affective index", and "context".

For example, for the utterance "I play tennis in a club.", "response, declarative" is registered as the value of the attribute type "dialogue act", "sports/tennis" as the value of the attribute type "topic category", and "acceptance" as the value of the attribute type "affective index".
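One way to picture a record of the scenario DB 2 is the following Python sketch. The class name `Utterance`, its field names, and the numeric ids are assumptions made for illustration; the specification only states that each utterance text carries link information to the next utterance and one field per attribute type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Attribute types named in the description of FIG. 3.
ATTRIBUTE_TYPES = ("dialogue act", "topic category", "affective index", "context")


@dataclass
class Utterance:
    """Hypothetical record layout for one utterance text in the scenario DB."""
    utterance_id: int
    speaker: str                      # "user" or "system"
    text: str
    next_ids: List[int] = field(default_factory=list)   # links; several ids = a branch
    attributes: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in ATTRIBUTE_TYPES}
    )


# The example from the text; the speaker and ids are assumed for illustration.
record = Utterance(utterance_id=2, speaker="user", text="I play tennis in a club.", next_ids=[3])
record.attributes["dialogue act"] = "response, declarative"
record.attributes["topic category"] = "sports/tennis"
record.attributes["affective index"] = "acceptance"
```

An attribute left as `None` here stands for a field that has not yet been annotated.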

In the annotation support device 1, the annotation estimation unit 11 includes a feature extraction unit 111, a feature learning unit 112, and an estimated likelihood calculation unit 113. For each utterance text of a dialogue scenario stored in the scenario DB 2, it estimates annotation candidates from the features of the text and an estimation model trained in advance by machine learning. In this embodiment, a plurality of attribute value candidates is estimated for each attribute type associated with each utterance text.

The feature extraction unit 111 extracts from the dialogue scenario the features that are useful for annotation estimation. One specific method is to apply morphological analysis or the like to each utterance text and use the resulting words as features. Another is to use distributed representations, in which words are converted to numerical values, based on their occurrence tendencies, such that closely related words receive similar values.
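The word-based variant can be sketched as a simple bag-of-words count. The sketch assumes the utterances have already been split into words by a morphological analyzer, which is not shown; the function names and the English sample tokens are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(tokenized_utterances: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance."""
    vocabulary: Dict[str, int] = {}
    for tokens in tokenized_utterances:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def to_count_vector(tokens: Sequence[str], vocabulary: Dict[str, int]) -> List[int]:
    """Word-frequency feature vector; words outside the vocabulary are ignored."""
    counts = Counter(tokens)
    # Dict iteration follows insertion order, which matches the assigned indices.
    return [counts.get(word, 0) for word in vocabulary]


# Pre-tokenized sample utterances (illustrative only).
corpus = [["I", "play", "tennis"], ["tennis", "is", "fun"]]
vocabulary = build_vocabulary(corpus)
vector = to_count_vector(["tennis", "tennis", "fun"], vocabulary)
print(vector)  # prints [0, 0, 2, 0, 1]
```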

The feature learning unit 112 uses machine learning to learn the occurrence patterns of features from scenarios already annotated with the attribute values to be estimated. When each utterance is learned on its own, an SVM or the like is used. When utterances are learned as a time series and the estimate also takes the preceding and following utterance context into account, an HMM, RNN, or the like is used. The input to machine learning can be the word occurrence frequencies or distributed representations that characterize an utterance, together with the attribute value of that utterance.

The estimated likelihood calculation unit 113 uses the estimation model trained by the feature learning unit 112 to estimate each attribute value candidate for dialogue scenarios that have not yet been annotated. Many machine learning tools compute estimated likelihoods internally and can output the estimated likelihood of each attribute value.

The task condition determination unit 12 includes a displayed-utterance count determination unit 121, an attribute value candidate count determination unit 122, and a task request count determination unit 123. As the conditions (task conditions) under which annotation is requested from the workers W, it determines the number of displayed utterances L, the number of attribute value candidates M, and the number of task requests N, each described in detail below.

FIG. 4 shows an example of the annotation task request screen presented to each worker W when the annotation support device 1 requests an annotation task. The example shown is a request to annotate the system response "You're good at it, aren't you."

The request screen shows the L most recent utterance texts (here, five), including the annotation target "You're good at it, aren't you.", with user utterances and system responses alternating in a staggered layout that reflects their chronological order.

A larger number of displayed utterances L allows more accurate annotation, but the worker W needs more time to read the utterances. A smaller L saves time, but annotation accuracy tends to fall.

To lighten the workers' annotation task, M attribute value candidates are listed for each attribute type. The worker W annotates by checking the radio button of one attribute value candidate for each attribute type.

In this embodiment, three attribute value candidates (M = 3) are provided for the attribute type "dialogue act": "communication, non-past, creation", "question, Yes/No", and "request". Four attribute value candidates (M = 4) are provided for the attribute type "topic category": "education/university", "sports/tennis", "art/painting", and "art/music". Referring to the utterance history, the worker W uses the radio buttons to select the attribute value that fits the annotation target "You're good at it, aren't you."

A larger number of displayed attribute value candidates M can be expected to give more accurate annotation, but the worker W needs more time to decide. A smaller M saves time, but annotation accuracy tends to fall.

Furthermore, a single worker W will not always select the correct attribute value, so it is desirable to request annotation of each utterance text from N workers W and to determine the attribute value more accurately by majority vote or a similar method.

A larger number of annotation requests N per utterance text allows more accurate annotation but increases cost. A smaller N reduces cost, but annotation accuracy tends to fall.

In this embodiment, the task condition determination unit 12 takes these trade-offs into account and sets the number of displayed utterances L, the number of attribute value candidates M, and the number of task requests N to optimal values.

The displayed-utterance count determination unit 121 determines the number of displayed utterances L by analyzing how far back the preceding utterances influence the judgment of the attribute value of the utterance text to be annotated.

Specifically, an utterance sequence (length L = LA) is input to the machine learning device used for annotation estimation, such as an HMM (Hidden Markov Model) or RNN (Recurrent Neural Network), to compute the estimated likelihood of each attribute value. The estimated likelihoods are likewise output as the length of the input sequence is shortened step by step to L = LA-1, LA-2, LA-3, ..., and the smallest L for which the difference in the estimated likelihood of each attribute value remains below a threshold is selected.

FIG. 5 shows, for the utterance "You must be good at it." shown in FIG. 3 and the four utterance texts immediately preceding it, the attribute value candidates of the attribute type "dialogue act" in descending order of estimated likelihood.

In the illustrated example, for the utterance "You must be good at it.", the attribute value candidate "communication, non-past, creation" has the highest estimated likelihood, 0.35, and the candidate "question, Yes/No" has the second highest, 0.30. Similarly, for the utterance "Since university, so nearly 10 years.", the candidate "response, declarative" has the highest estimated likelihood, 0.60, and the candidate "communication, non-past, creation" has the second highest, 0.35.

Consider now the case where the utterance "You must be good at it." is the annotation target and the attribute value candidates to present to the worker W are to be chosen from among several candidates.

Referring to FIG. 5, the differences in estimated likelihood among L = 5, 4, and 3 are sufficiently small, so presenting only the three most recent utterances (L = 1 to 3) is expected to be enough for the worker W to annotate the dialogue act. When annotation of several attributes is requested at once, the largest of the L values computed for the individual attributes is used.

Thus, in this embodiment, the number of displayed utterances L, used when the utterance text to be annotated is displayed in chronological order together with other utterance texts, is determined from the differences in the estimated likelihoods of the attribute value candidates of each attribute type for the utterance text to be annotated and at least one of its most recent preceding utterance texts.
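The shortening procedure for L might be sketched as follows. Several points are assumptions, since the specification leaves them open: the estimator is represented by a hypothetical callable `estimate` that maps a list of utterances to per-attribute-value likelihoods, each shortened input is compared against the full-length output, the gap is taken as the largest absolute difference over attribute values, and shortening stops at the first length that reaches the threshold. The toy estimator and its numbers are invented for the demonstration.

```python
from typing import Callable, Dict, Sequence

Likelihoods = Dict[str, float]


def max_likelihood_gap(a: Likelihoods, b: Likelihoods) -> float:
    """Largest absolute difference in estimated likelihood over all attribute values."""
    keys = set(a) | set(b)
    return max((abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys), default=0.0)


def choose_display_count(
    utterances: Sequence[str],
    estimate: Callable[[Sequence[str]], Likelihoods],
    threshold: float,
) -> int:
    """Smallest history length whose estimate stays within threshold of the full one."""
    full = estimate(utterances)
    best = len(utterances)
    for length in range(len(utterances) - 1, 0, -1):
        shortened = estimate(utterances[-length:])   # keep the most recent utterances
        if max_likelihood_gap(full, shortened) >= threshold:
            break
        best = length
    return best


def toy_estimate(history: Sequence[str]) -> Likelihoods:
    """Toy stand-in for an HMM/RNN: its output changes once fewer than 3 utterances are given."""
    if len(history) >= 3:
        return {"communication": 0.35, "question": 0.30}
    return {"communication": 0.20, "question": 0.45}


history = ["u1", "u2", "u3", "u4", "u5"]
display_count = choose_display_count(history, toy_estimate, threshold=0.05)
print(display_count)  # prints 3

# When several attribute types are annotated at once, the largest L is applied.
combined = max([display_count, 2])
```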

The attribute value candidate count determination unit 122 sets the display count M so that the sum of the estimated likelihoods of the top M attribute value candidates reaches the threshold θ_M. Referring to FIG. 6, assume here too that the attribute value candidates for each utterance text are registered in descending order of estimated likelihood.

With θ_M = 0.9, for the attribute type "dialogue act" of the utterance "What do you do on your days off?", the estimated likelihood 0.90 of the attribute value candidate "question, what" already reaches the threshold θ_M. This candidate is therefore taken as correct and no annotation is requested.

For the utterance "I play tennis in a club.", the sum of the estimated likelihood 0.75 of the candidate "response, declarative" and the estimated likelihood 0.15 of the candidate "communication, non-past, actual" for the attribute type "dialogue act" is 0.90, which reaches the threshold θ_M. These top two attribute value candidates are therefore presented to the worker W for annotation.

For the utterance "Since university, so nearly 10 years.", the sum of the estimated likelihoods of the top two attribute value candidates is 0.85, which falls short of the threshold θ_M, so three or more of the top attribute value candidates are presented to the worker.

The lower the threshold θ_M, the fewer candidates are presented, which shortens annotation time and reduces cost. However, the chance that the correct attribute value is missing from the candidates rises, so annotation quality falls. The threshold θ_M is set according to the annotation quality required.

Thus, in this embodiment, the display count M is determined, for each utterance text to be annotated, from the running sum obtained by adding the estimated likelihoods of the attribute value candidates of each attribute type in descending order.
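The running-sum rule for M can be sketched in a few lines. The function name and the small tolerance `eps`, which guards against floating-point rounding when the sum lands exactly on the threshold, are assumptions of this sketch. The likelihoods 0.90, 0.75, and 0.15 come from the examples above; the remaining values are invented so that the lists are complete.

```python
from typing import Sequence


def choose_candidate_count(likelihoods: Sequence[float], threshold: float, eps: float = 1e-9) -> int:
    """Number of top candidates whose summed likelihood first reaches the threshold."""
    ranked = sorted(likelihoods, reverse=True)
    total = 0.0
    for count, value in enumerate(ranked, start=1):
        total += value
        if total + eps >= threshold:
            return count
    return len(ranked)   # threshold never reached: present every candidate


THETA_M = 0.9
print(choose_candidate_count([0.90, 0.05], THETA_M))        # prints 1
print(choose_candidate_count([0.75, 0.15, 0.05], THETA_M))  # prints 2
print(choose_candidate_count([0.60, 0.25, 0.10], THETA_M))  # prints 3
```

A result of 1 corresponds to the case in the text where the top candidate alone reaches θ_M and is accepted without asking any worker.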

For each utterance text to be annotated, the task request count determination unit 123 determines the number of workers N to ask for annotation, based on the estimated likelihoods of its attribute values and the number of attribute value candidates M.

For example, when the estimated likelihood of the most likely attribute value candidate exceeds the threshold θ_N, the annotation result for that attribute can be fixed to the most likely attribute value without any annotation being carried out (N = 0).

When the estimated likelihood of the most likely attribute value candidate is below the threshold θ_N, then for the M attribute values presented to the workers W, let Pi be the estimated likelihood of the attribute value with the i-th highest estimated likelihood (i = 1 to M), and let Li be the number of times workers W have selected the i-th attribute value. Whether a further worker needs to be asked is decided by the following expression (1).

(largest value of Pi × Li) - (second-largest value of Pi × Li) < threshold θ_N   ... (1)

In this embodiment, if expression (1) holds, one additional worker W is asked to annotate, and the value of Li is updated according to that worker's annotation result. If expression (1) does not hold, no additional annotation is requested.

In the example of FIG. 6, when annotation is requested for the utterance "About how many years have you been doing it?", the number of displayed attribute values is M = 2, with P1 = 0.85 < θ_N (= 0.90) and P2 = 0.10. Since L1 = L2 = 0, the largest and second-largest values of Pi × Li are both 0, so expression (1) holds and annotation is requested from one worker W.

If this worker W selects the attribute value for i = 1 (here, estimated likelihood 1), whose estimated likelihood is 0.85, the "largest value of Pi × Li" becomes 0.85. The "second-largest value of Pi × Li" is still 0, so the difference is 0.85 and expression (1) holds. One more worker W is therefore asked.

If the next worker W also selects the attribute value for i = 1, the "largest value of Pi × Li" becomes 1.70 while the "second-largest value of Pi × Li" is still 0, so the difference is 1.70 and expression (1) no longer holds. Annotation of the utterance "About how many years have you been doing it?" therefore ends.

Thus, in this embodiment, for each utterance and based on its annotation results, the product of the estimated likelihood of each annotated attribute value and the number of times it was annotated is computed for each attribute value, and the number of workers N is determined from the difference between these products.
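The request loop driven by expression (1) can be sketched as below. The function names, the hypothetical `ask_worker` callable (which returns the 0-based index of the candidate a worker selected), and the `max_workers` cap are assumptions of this sketch; the cap is not in the specification and is added only so the loop always terminates. The likelihoods 0.85 and 0.10 and θ_N = 0.90 are those of the worked example above.

```python
from typing import Callable, List, Sequence


def needs_another_worker(likelihoods: Sequence[float], counts: Sequence[int], threshold: float) -> bool:
    """Expression (1): largest Pi*Li minus second-largest Pi*Li is below the threshold."""
    scores = sorted((p * c for p, c in zip(likelihoods, counts)), reverse=True)
    largest = scores[0]
    second = scores[1] if len(scores) > 1 else 0.0
    return largest - second < threshold


def collect_votes(
    likelihoods: Sequence[float],      # Pi, non-empty, one per presented candidate
    threshold: float,                  # theta_N
    ask_worker: Callable[[], int],
    max_workers: int = 10,             # safety cap (assumption, not in the source)
) -> List[int]:
    """Return Li, the number of votes per candidate; sum(Li) is the worker count N."""
    counts = [0] * len(likelihoods)
    if max(likelihoods) > threshold:
        return counts                  # N = 0: the top candidate is accepted as is
    while sum(counts) < max_workers and needs_another_worker(likelihoods, counts, threshold):
        counts[ask_worker()] += 1
    return counts


# Worked example: both workers pick candidate 1 (index 0).
answers = iter([0, 0, 0])
votes = collect_votes([0.85, 0.10], 0.90, lambda: next(answers))
print(votes, sum(votes))  # prints [2, 0] 2
```

With these inputs the loop asks exactly two workers, matching the sequence 0, 0.85, 1.70 traced in the text.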

上記の方法でアノテーションを依頼することで、推定尤度が高い属性値は、少数のワーカが当該属性値にアノテーションを実施した段階で結果を確定でき効率化が図れる。また、推定尤度が低い属性値は多数のワーカのアノテーションを得て結果が確定するため、信頼性を確保できる。 By requesting annotation by the above method, the result of an attribute value with high estimated likelihood can be confirmed at the stage when a small number of workers annotate the attribute value, and efficiency can be improved. In addition, since the attribute value with low estimated likelihood is annotated by many workers and the result is confirmed, reliability can be ensured.

なお、上記は実装方法の一例であり、閾値θ_Nは求めるアノテーションの品質に応じて調整する。また、初期値Piは利用する機械学習装置によって特性が異なるため、調整するための関数を利用してもよい。 The above is one example of an implementation, and the threshold θ_N is adjusted according to the required annotation quality. Because the characteristics of the initial value Pi differ depending on the machine learning device used, a function that adjusts it may be applied.

アノテーションタスク生成部１３は、前記各タスク条件L，Mの値に基づいて、前記図４を参照して説明したアノテーション用のタスクを生成し、前記タスク依頼要否判定に基づいて、必要なワーカ数分の依頼を行う。 The annotation task generation unit 13 generates the annotation task described with reference to FIG. 4 based on the values of the task conditions L and M, and issues requests to the required number of workers based on the task request necessity determination.

アノテーション更新部１４は、ワーカWのアノテーション結果を受領し、Pi*Liを最大値とする属性値をアノテーション結果として前記シナリオDB２に登録する。また、一定量のアノテーション結果がシナリオDB２に追加された際に、アノテーション推定部１１の特徴学習部１１２で機械学習される推定モデルを更新する。 The annotation update unit 14 receives the annotation results from the workers W and registers the attribute value with the largest Pi*Li in the scenario DB 2 as the annotation result. When a certain amount of annotation results has been added to the scenario DB 2, it updates the estimation model that is machine-learned by the feature learning unit 112 of the annotation estimation unit 11.
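The behavior of the annotation update unit 14 can be illustrated with a short Python sketch. It is a rough illustration under stated assumptions, not the device's actual design: the class name `AnnotationUpdater`, the in-memory dictionary standing in for the scenario DB 2, the `retrain` callback standing in for the feature learning unit 112, and the `batch_size` value used for "a certain amount" are all hypothetical.

```python
def select_result(likelihoods, counts):
    """Return the attribute index i whose product Pi*Li is largest.

    Ties resolve to the first such index in dictionary order (an assumption;
    the source does not say how ties are handled).
    """
    return max(likelihoods, key=lambda i: likelihoods[i] * counts.get(i, 0))


class AnnotationUpdater:
    """Hypothetical stand-in for the annotation update unit 14."""

    def __init__(self, retrain, batch_size=100):
        self.scenario_db = {}         # utterance -> registered attribute index
        self.pending = []             # results added since the last model update
        self.retrain = retrain        # hypothetical hook for the learning unit
        self.batch_size = batch_size  # assumed meaning of "a certain amount"

    def register(self, utterance, likelihoods, counts):
        """Store the winning attribute and trigger a model update when due."""
        label = select_result(likelihoods, counts)
        self.scenario_db[utterance] = label
        self.pending.append((utterance, label))
        if len(self.pending) >= self.batch_size:
            self.retrain(list(self.pending))
            self.pending.clear()
        return label


# Usage with the FIG. 6 numbers: two workers chose attribute value 1.
batches = []
updater = AnnotationUpdater(retrain=batches.append, batch_size=2)
updater.register("utterance A", {1: 0.85, 2: 0.10}, {1: 2})
updater.register("utterance B", {1: 0.40, 2: 0.35}, {1: 1, 2: 3})
print(updater.scenario_db, len(batches))
```

Batching the model update behind a counter keeps retraining infrequent; how much data should trigger an update is left open by the source.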

１…アノテーション支援装置，２…シナリオDB，１１…アノテーション推定部，１２…タスク条件決定部，１３…アノテーションタスク生成部，１４…アノテーション更新部，１１１…特徴抽出部，１１２…特徴学習部，１１３…推定尤度算出部，１２１…発話表示数決定部，１２２…属性値候補数決定部，１２３…タスク依頼数決定部 1 ... Annotation support device, 2 ... Scenario DB, 11 ... Annotation estimation unit, 12 ... Task condition determination unit, 13 ... Annotation task generation unit, 14 ... Annotation update unit, 111 ... Feature extraction unit, 112 ... Feature learning unit, 113 ... Estimated likelihood calculation unit, 121 ... Utterance display number determination unit, 122 ... Attribute value candidate number determination unit, 123 ... Task request number determination unit

Claims

An annotation support device that supports annotation of dialogue scenarios, comprising:
means for estimating an attribute of each utterance text in a dialogue scenario;
means for determining task conditions of an annotation task that causes at least one attribute to be selected from a plurality of estimated attributes;
means for providing a worker with an annotation task created based on the task conditions; and
means for aggregating the annotation results from each worker who has executed the annotation task,
wherein the estimating means includes:
means for extracting features from each utterance text in the dialogue scenario; and
an estimation model that estimates an attribute from the extracted features and outputs an estimated likelihood,
wherein the estimation model estimates attribute value candidates for each attribute type from the extracted features and outputs an estimated likelihood, and
wherein the means for determining the task conditions determines the utterance display number L, used when the utterance text for which annotation is requested is displayed together with other utterance texts in chronological order, based on the number of utterances that affect the determination of the attribute value of the utterance text for which annotation is requested.

An annotation support device that supports annotation of dialogue scenarios, comprising:
means for estimating an attribute of each utterance text in a dialogue scenario;
means for determining task conditions of an annotation task that causes at least one attribute to be selected from a plurality of estimated attributes;
means for providing a worker with an annotation task created based on the task conditions; and
means for aggregating the annotation results from each worker who has executed the annotation task,
wherein the estimating means includes:
means for extracting features from each utterance text in the dialogue scenario; and
an estimation model that estimates an attribute from the extracted features and outputs an estimated likelihood,
wherein the estimation model estimates attribute value candidates for each attribute type from the extracted features and outputs an estimated likelihood, and
wherein the means for determining the task conditions determines, based on the estimated likelihood, the display number M of attribute value candidates to be displayed in descending order of estimated likelihood.

An annotation support device that supports annotation of dialogue scenarios, comprising:
means for estimating an attribute of each utterance text in a dialogue scenario;
means for determining task conditions of an annotation task that causes at least one attribute to be selected from a plurality of estimated attributes;
means for providing a worker with an annotation task created based on the task conditions; and
means for aggregating the annotation results from each worker who has executed the annotation task,
wherein the estimating means includes:
means for extracting features from each utterance text in the dialogue scenario; and
an estimation model that estimates an attribute from the extracted features and outputs an estimated likelihood,
wherein the estimation model estimates attribute value candidates for each attribute type from the extracted features and outputs an estimated likelihood, and
wherein the means for determining the task conditions determines, based on the estimated likelihood, the number N of workers to whom the annotation task is requested for each utterance text.

The annotation support device according to claim 1, wherein the estimating means further comprises means for updating the estimation model by machine learning the relationship between the annotation results and the extracted features.

The annotation support device according to claim 2 or 3, wherein the means for determining the task conditions determines the utterance display number L, used when the utterance text for which annotation is requested is displayed together with other utterance texts in chronological order, based on the number of utterances that affect the determination of the attribute value of the utterance text for which annotation is requested.

The annotation support device according to claim 1, wherein the means for determining the task conditions determines the utterance display number L based on the difference in the estimated likelihood of each attribute value candidate, for each attribute type, of the utterance text for which annotation is requested and of at least one utterance text immediately preceding it.

The annotation support device according to claim 2, wherein the means for determining the task conditions sets the display number M, for each utterance text for which annotation is requested, based on the cumulative value obtained by successively adding the estimated likelihoods of the attribute value candidates of each attribute type.

The annotation support device according to claim 3, wherein the means for determining the task conditions obtains, for each attribute value, the product of the annotated attribute value and its number of annotations based on the annotation results for each utterance text, and determines the number of workers N based on the difference between the products.