JP2019036171A - System for assisting in creation of interaction scenario corpus

Info

Publication number: JP2019036171A
Application number: JP2017157641A
Authority: JP
Inventors: Kazufumi Ikeda; Keiichiro Hoashi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-08-17
Filing date: 2017-08-17
Publication date: 2019-03-07
Anticipated expiration: 2037-08-17
Also published as: JP6853752B2

Abstract

To provide a system for assisting in creation of an interaction scenario corpus which does not need to constrain operators by allowing the same scenario to be created by a plurality of persons and can improve work efficiency. SOLUTION: At a time t1, attribute information of a worker in charge of an inputting operation of an utterance is registered to a scenario creation assistance device 1. At a time t2, the scenario creation assistance device 1 selects and extracts an incomplete scenario for requesting an input of an utterance to workers W1 and W2 from a scenario DB2. At a time t3, the incomplete scenario is provided to each of the workers W1 and W2, and the input of utterances is requested. At times t4 and t5, the workers W1 and W2 input utterances to be added into the incomplete scenario to an utterance input field 31. At times t6 and t7, when the workers W1 and W2 click a transmission button 32, the utterance input by each of the workers W1 and W2 is notified to the scenario creation assistance device 1, and is additionally registered to the scenario DB2 as the utterance of each incomplete scenario. SELECTED DRAWING: Figure 3

Description

The present invention relates to a dialogue scenario corpus creation support system, and in particular to such a system that completes a dialogue scenario by repeatedly presenting an incomplete scenario, in which the number of registered utterances is insufficient, to a worker responsible for utterance input, having the worker enter an utterance from the standpoint of the next speaker, and registering the update.

For a dialogue agent system to conduct sophisticated dialogue, large-scale text data (a dialogue scenario corpus) describing the expected exchanges (dialogue scenarios) must be created. A dialogue scenario corpus can be built from existing chat or SNS data or entered by hand; in general, a manually created corpus tends to be of higher quality and better suited to its intended use.

Patent Document 1 discloses a method of generating a dialogue corpus by applying natural language processing to sentences entered in multi-user text communication such as chat, and acquiring and accumulating information on the relations between the sentences.

Non-Patent Document 1 discloses a system that uses crowdsourcing to recruit two workers who conduct a text dialogue, and that supports both dialogue input in chat form and annotation work in which roles and the like are assigned to the entered sentences.

Non-Patent Document 2 discloses a method of realizing a dialogue system by collecting communication on an SNS such as Twitter as a corpus, finding an SNS post that is highly similar to the user's utterance, and using the reply to that post as the dialogue agent's reply. When no post similar to the user's utterance exists, Non-Patent Document 2 asks a crowdsourcing worker to write a reply.

Patent Document 1: JP 2008-299754 A

Non-Patent Document 1: Hiroshi Tsukahara, Kei Utsumi, "Method for constructing a dialogue corpus using an open platform and crowdsourcing," Annual Meeting of the Association for Natural Language Processing (March 2015)
Non-Patent Document 2: Fumihiro Bessho, Tatsuya Harada, Yasuo Kuniyoshi, "Dialogue system using real-time crowdsourcing and a large-scale Twitter corpus," IPSJ SIG Technical Report (May 2012)

Patent Document 1 and Non-Patent Document 2 assume that text communication data such as chat or SNS data already exists, and build a corpus by acquiring the relations within that data. However, existing communication data is hard to use as is, because a wide variety of users post from their own standpoints. For example, posts by men and posts by women are mixed, and a corpus in which they are mixed breaks down the personality of the dialogue agent. Improving the quality of a dialogue agent requires manually creating a corpus that takes personality and similar factors into account.

Non-Patent Document 1 proposes a method of gathering multiple workers through crowdsourcing and having pairs of two workers chat. Because this method requires the two workers to be assembled at the same time to chat, the time constraint is large. In addition, while one worker is typing, the other is left waiting, so much time is wasted and efficiency suffers. A further problem is that conversation content is duplicated across worker pairs and goes to waste.

An object of the present invention is to solve the above technical problems and to provide a dialogue scenario corpus creation support system that allows a single scenario to be created by multiple people, thereby eliminating the need to tie up workers and improving work efficiency.

To achieve the above object, the present invention provides a dialogue scenario corpus creation support system that completes a dialogue scenario by repeating utterance input from the standpoint of each speaker in the dialogue, characterized by the following configurations.

(1) The system comprises a database that stores incomplete scenarios in which the number of registered utterances is insufficient, means for selecting an incomplete scenario for which a worker responsible for utterance input is asked to enter an utterance, means for presenting the selected incomplete scenario to the worker, and means for registering in the database the utterance entered for the presented incomplete scenario. The selection and presentation of an incomplete scenario and the registration of the entered utterance are repeated.

(2) The system comprises means for storing attribute information of each speaker and means for acquiring attribute information of each worker, and the selecting means preferentially selects an incomplete scenario whose speaker has attribute information similar to that of the worker.

(3) The selecting means preferentially selects an incomplete scenario for which the worker has a history of utterance input.

(4) The system comprises means for determining the topic of an incomplete scenario and means for calculating the relevance between the topic of the incomplete scenario and the worker's identification information, and the selecting means preferentially selects an incomplete scenario whose topic is more relevant to the worker's identification information.

(5) The presenting means further comprises means for presenting the speaker's attribute information to the worker.

(6) The system further comprises means for having workers report an incomplete scenario that contains an inappropriate utterance, and an incomplete scenario whose number of reports exceeds a reference value is deleted.

(7) The words and phrases contained in the entered utterances are clustered. Whether a dialogue scenario is complete is judged by whether its number of utterances has reached a predetermined number, and that predetermined number is made variable according to the number of clusters.

(8) The database has a tree structure made up of multiple dialogue scenarios, each leading from a single root utterance through at least one node utterance to a leaf utterance.

(9) The system comprises means for aggregating, layer by layer, nodes whose utterance content is similar.

(10) The system comprises means for prohibiting input of an utterance containing a given word or phrase when the frequency with which that word or phrase appears in the same layer as the input utterance exceeds a predetermined value.

(11) The presenting means presents to the worker an incomplete scenario in which the utterances up to the n-th and the utterances from the (n+2)-th onward are registered, and asks the worker to fill in the (n+1)-th utterance.
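Configuration (10) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: utterances are tokenized on whitespace (Japanese text would need morphological analysis), "frequency" is taken to mean the number of utterances in the layer that contain the word, and the function names and sample utterances are hypothetical.

```python
from collections import Counter


def banned_words(layer_utterances, max_count):
    """Return the words that occur in more than max_count utterances of one layer.

    Assumption: whitespace tokenization and per-utterance counting.
    """
    counts = Counter(
        word for utterance in layer_utterances for word in set(utterance.lower().split())
    )
    return {word for word, count in counts.items() if count > max_count}


def is_input_allowed(utterance, layer_utterances, max_count):
    """Reject an utterance that contains any word already overused in its layer."""
    overused = banned_words(layer_utterances, max_count)
    return not (set(utterance.lower().split()) & overused)


# Hypothetical utterances already registered in one layer of the tree.
layer = ["tennis at the club", "tennis on weekends", "tennis with friends", "reading books"]

blocked = is_input_allowed("playing tennis again", layer, max_count=2)
accepted = is_input_allowed("watching movies", layer, max_count=2)
```

With these sample values, "tennis" appears in three utterances and exceeds the limit of two, so the first input is rejected and the second is accepted.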

The present invention achieves the following effects.

(1) Multiple workers can independently create multiple scenarios in parallel, so scenarios can be built efficiently, and scenarios covering diverse topics can be built easily.

(2) Because the utterances of the speakers are entered alternately and repeatedly, a structured scenario can be built, which raises the quality of conversation in the dialogue system.

(3) Depending on the incomplete scenario for which input is requested, a worker can enter an utterance as either one speaker or the other, and the next incomplete scenario is presented as soon as one utterance has been entered. Workers therefore neither wait for a dialogue partner's input nor are bound to a particular time, so scenarios can be created efficiently.

(4) Because the speaker's virtual attribute information is presented to the worker together with the incomplete scenario, the consistency of utterance content can be maintained even when different workers enter utterances of the same speaker, or when the same worker enters utterances of different speakers.

(5) Because utterances with similar content are merged into one, an explosive increase in the number of branches can be suppressed while diverse utterances are retained.

(6) Because the completion condition of a scenario is made variable according to the number of clusters obtained by clustering the words and phrases contained in the entered utterances, a scenario with many layers can be built for a topic-rich scenario, whereas the number of layers can be reduced for a topic-poor scenario. Scenarios can therefore be built efficiently with little waste.

(7) Because workers report scenarios containing inappropriate utterances and scenarios with many reports are deleted from the database, the waste of continuing to build an inappropriate scenario is eliminated and scenario quality can be maintained.

(8) Because an incomplete scenario in which the utterances up to the n-th and from the (n+2)-th onward are registered is presented to the worker, who is asked to fill in the (n+1)-th utterance, an existing tree structure can be extended and multiple consistent scenarios can be added easily.
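The variable completion condition of effect (6) might look like the following Python sketch. The text states only that the required number of utterances varies with the number of clusters; the linear rule and the constants `base`, `per_cluster`, and `cap` are hypothetical choices made here for illustration.

```python
def required_utterances(n_clusters, base=4, per_cluster=2, cap=20):
    """Hypothetical rule: more word clusters (a richer topic) means a longer scenario."""
    return min(cap, base + per_cluster * n_clusters)


def is_complete(n_utterances, n_clusters):
    """A scenario is complete once it has reached the cluster-dependent length."""
    return n_utterances >= required_utterances(n_clusters)


# A topic-poor scenario (1 cluster) finishes at 6 utterances,
# while a topic-rich one (5 clusters) needs 14.
poor_done = is_complete(6, n_clusters=1)
rich_done = is_complete(6, n_clusters=5)
```

Any monotonically increasing rule would serve the same purpose; the cap simply keeps a very rich topic from growing without bound.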

FIG. 1 is a block diagram of a first embodiment of a dialogue scenario corpus creation support system to which the present invention is applied.
FIG. 2 is a diagram showing an example of dialogue scenarios organized as a tree structure.
FIG. 3 is a sequence flow showing the procedure by which multiple workers W1 and W2 alternately enter the utterances of each speaker to complete a scenario.
FIG. 4 is a diagram showing an example of utterance input by each worker (part 1).
FIG. 5 is a diagram showing an example of utterance input by each worker (part 2).
FIG. 6 is a diagram showing an example of utterance input by each worker (part 3).
FIG. 7 is a functional block diagram of a dialogue scenario corpus creation support system according to a second embodiment of the present invention.
FIG. 8 is a diagram showing an example of presenting the virtual attribute information of speaker P2.
FIG. 9 is a functional block diagram of a dialogue scenario corpus creation support system according to a third embodiment of the present invention.
FIG. 10 is a diagram showing an example of the utterance input screen in the third embodiment.
FIG. 11 is a functional block diagram of a dialogue scenario corpus creation support system according to a fourth embodiment of the present invention.
FIG. 12 is a diagram showing an example in which the content of entered utterances is duplicated.
FIG. 13 is a functional block diagram of a dialogue scenario corpus creation support system according to a fifth embodiment of the present invention.
FIG. 14 is a diagram showing an example in which an input-prohibited word is set.
FIG. 15 is a functional block diagram (part 1) of a dialogue scenario corpus creation support system according to a sixth embodiment of the present invention.
FIG. 16 is a functional block diagram (part 2) of a dialogue scenario corpus creation support system according to the sixth embodiment of the present invention.
FIG. 17 is a functional block diagram of a dialogue scenario corpus creation support system according to a seventh embodiment of the present invention.
FIG. 18 is a diagram showing how the tree structure is extended by filling in utterances.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a dialogue scenario corpus creation support system to which the present invention is applied, in which a scenario creation support device 1, a scenario database (DB) 2, and worker terminals 3 are interconnected via a network. The creation support system of the present invention completes a dialogue scenario by repeating utterance input from the standpoint of each speaker in the dialogue.

The scenario DB 2 holds a large number of incomplete scenarios in which the number of registered utterances is insufficient. FIG. 2 shows an example of the scenario structure in this embodiment. Multiple dialogue scenarios are registered as a tree that branches repeatedly from a first utterance, corresponding to the root, to multiple utterances, corresponding to nodes, and reaches multiple final utterances, corresponding to leaves. An utterance is registered at each node.
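The tree organization described above can be sketched as a small Python data structure. This is an illustrative model only, not the storage format of the scenario DB 2: the class and function names are hypothetical, and the second branch ("I mostly stay home and read.") is a made-up utterance added to show branching.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceNode:
    """One utterance in the tree; the root holds the first utterance of a theme."""
    speaker: str
    text: str
    children: list = field(default_factory=list)  # list of UtteranceNode

    def add_child(self, speaker, text):
        child = UtteranceNode(speaker, text)
        self.children.append(child)
        return child


def scenarios(node, prefix=()):
    """Yield every root-to-leaf path, i.e. every dialogue scenario in the tree."""
    path = prefix + ((node.speaker, node.text),)
    if not node.children:
        yield path
    for child in node.children:
        yield from scenarios(child, path)


root = UtteranceNode("P1", "What do you do on your holidays?")
tennis = root.add_child("P2", "I play tennis in a club.")
root.add_child("P2", "I mostly stay home and read.")  # hypothetical second branch
tennis.add_child("P1", "About how many years have you been doing it?")

all_paths = list(scenarios(root))
```

Each root-to-leaf path is one scenario, so the two branches under the root give two scenarios that share the same first utterance.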

In the scenario creation support device 1, a worker assignment unit 101 accepts, for example through crowdsourcing, participation registrations from workers Wi who will carry out utterance input, and assigns a scenario creation task to each registered worker Wi. Each worker Wi may be assigned an utterance input task for a specific speaker, or an utterance input task that does not specify a speaker.

A scenario selection unit 102 selects, for each worker Wi, an incomplete scenario for which utterance input is to be requested. In this embodiment the scenario selection unit 102 includes an attribute evaluation unit 102a and an utterance evaluation unit 102b. The attribute evaluation unit 102a compares the pre-registered virtual attribute information of the speakers P1 and P2 with the attribute information of each worker Wi, and preferentially selects an incomplete scenario whose speaker has attributes similar to those of the worker Wi.

The utterance evaluation unit 102b compares the topic of each incomplete scenario with the attribute information of each worker Wi, and preferentially selects an incomplete scenario whose topic is more relevant to the attribute information of the worker Wi. The topic of an incomplete scenario can be estimated by training a classifier on documents whose topics are known, used as machine-learning training data, and then applying the classifier to the current scenario.

Alternatively, an incomplete scenario for which the worker Wi has a history of past utterance input may be selected preferentially. For example, if higher priority is given to incomplete scenarios to which the worker Wi has contributed more utterances in the past, the worker can engage with a dialogue consistently, which reduces contradictions and breakdowns in the dialogue.
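The selection criteria above can be sketched in Python as a scoring function. The text presents attribute similarity and input history as alternative criteria; combining them in a weighted sum, the weights `w_attr` and `w_hist`, the dictionary layout, and the exact-match similarity measure are all assumptions made here for illustration.

```python
def attribute_similarity(worker_attrs, speaker_attrs):
    """Fraction of the speaker's attributes that the worker shares exactly."""
    if not speaker_attrs:
        return 0.0
    matches = sum(1 for key, value in speaker_attrs.items() if worker_attrs.get(key) == value)
    return matches / len(speaker_attrs)


def select_scenario(worker, candidates, w_attr=1.0, w_hist=0.5):
    """Pick the incomplete scenario with the highest priority for this worker."""
    def score(scenario):
        similarity = attribute_similarity(worker["attrs"], scenario["next_speaker_attrs"])
        history = scenario["input_counts"].get(worker["id"], 0)  # past inputs by this worker
        return w_attr * similarity + w_hist * history
    return max(candidates, key=score)


# Hypothetical worker and scenarios.
worker = {"id": "W1", "attrs": {"gender": "female", "age_group": "20s"}}
candidates = [
    {"id": "S1", "next_speaker_attrs": {"gender": "male", "age_group": "40s"}, "input_counts": {}},
    {"id": "S2", "next_speaker_attrs": {"gender": "female", "age_group": "20s"}, "input_counts": {"W1": 1}},
]

chosen = select_scenario(worker, candidates)
```

Here S2 scores 1.5 (full attribute match plus one past input) against 0 for S1, so S2 is selected. A topic-relevance term from the utterance evaluation unit 102b could be added to the score in the same way.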

A scenario presentation unit 103 presents the selected incomplete scenario on the worker terminal 3 of each worker Wi and requests utterance input. A scenario registration unit 104 additionally registers the utterance that the worker entered at the worker terminal 3 at the corresponding node of the corresponding incomplete scenario in the scenario DB 2.

In this embodiment, the scenario selection unit 102 selects from the scenario DB 2 an incomplete scenario registered up to the n-th utterance, and the scenario presentation unit 103 presents it to each worker Wi, thereby requesting input of the (n+1)-th utterance for that incomplete scenario. Each worker Wi operates the worker terminal 3 to enter the (n+1)-th utterance into the incomplete scenario. The scenario registration unit 104 additionally registers the entered (n+1)-th utterance at the corresponding node of the incomplete scenario. This selection of an incomplete scenario, presentation to each worker Wi, utterance input by each worker Wi, and utterance registration are repeated until the final utterance has been entered and the dialogue scenario is complete.
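The select, present, input, register cycle can be sketched as a short Python simulation. It is a simplified model under stated assumptions: each scenario is a flat list rather than a tree, completion is a fixed `final_length` (a hypothetical parameter), scenarios are handed out in list order rather than by priority, and `scripted_ask` stands in for a worker typing at the worker terminal 3.

```python
SPEAKERS = ("P1", "P2")


def next_speaker(n_registered):
    """Speaker of utterance n+1 when n utterances are registered (P1 speaks first)."""
    return SPEAKERS[n_registered % 2]


def run_until_complete(scenario_db, workers, ask, final_length):
    """Repeat select -> present -> input -> register until every scenario is complete."""
    while True:
        incomplete = [s for s in scenario_db if len(s["utterances"]) < final_length]
        if not incomplete:
            return
        # Each worker receives one incomplete scenario per round.
        for worker, scenario in zip(workers, incomplete):
            speaker = next_speaker(len(scenario["utterances"]))
            text = ask(worker, scenario, speaker)                     # present and input
            scenario["utterances"].append((speaker, worker, text))    # register


def scripted_ask(worker, scenario, speaker):
    """Stand-in for a human worker; returns a placeholder utterance."""
    return f"({speaker} utterance entered by {worker})"


first = ("P1", None, "What do you do on your holidays?")
db = [
    {"theme": "How to spend holidays", "utterances": [first]},
    {"theme": "How to spend holidays", "utterances": [first]},
]
run_until_complete(db, ["W1", "W2"], scripted_ask, final_length=4)
speaker_order = [speaker for speaker, _, _ in db[0]["utterances"]]
```

Because a worker is handed whichever scenario is incomplete, the same worker may speak as P1 in one round and as P2 in the next, and never waits for a partner.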

FIG. 3 is a sequence flow showing the procedure by which multiple workers Wi alternately enter the utterances of speakers P1 and P2 to complete a scenario, and FIG. 4 shows an example of utterance input by each worker Wi. The description here takes as an example the case where two workers W1 and W2 are asked to enter utterances.

At time t1, workers for scenario creation are recruited, for example through crowdsourcing, and a profile or the like is registered in the scenario creation support device 1 as each applicant's attribute information. At time t2, the scenario selection unit 102 of the scenario creation support device 1 selects and extracts from the scenario DB 2 the incomplete scenarios to be requested of workers W1 and W2, respectively. At time t3, the scenario presentation unit 103 presents to each of workers W1 and W2 an incomplete scenario in which only the theme and the first utterance are registered, and requests utterance input.

This embodiment assumes support for creating dialogue scenarios in which a virtual dialogue between two speakers P1 and P2 alternates for each theme, such as "How to spend holidays" or "Eating and drinking". In its initial state, an incomplete scenario has only the theme and the first utterance of one speaker P1 on that theme registered. In this embodiment, as shown in FIG. 4(a), an incomplete scenario in which only the theme "How to spend holidays" and the first utterance of speaker P1 on that theme, "What do you do on your holidays?", are registered is initially presented to each of workers W1 and W2 and displayed on the display of each worker terminal 3.

At times t4 and t5, workers W1 and W2 each enter, from the standpoint of the other speaker P2, the next utterance in response to the immediately preceding utterance of the presented incomplete scenario. Here, since the incomplete scenario is registered up to the first utterance of speaker P1, each of workers W1 and W2 enters the second utterance from the standpoint of speaker P2 in an utterance input field 31 by any suitable means such as key input or voice input, as shown in FIG. 4(b).

This embodiment shows an example in which "I play tennis in a club." is entered. After the utterance is entered, when workers W1 and W2 click a send button 32 at times t6 and t7, the second utterance entered by each of workers W1 and W2 is sent to the scenario creation support device 1 and registered in the scenario DB 2 as the second utterance of each incomplete scenario, as shown in FIG. 4(c).

At time t8, an incomplete scenario to be requested of worker W1 and an incomplete scenario to be requested of worker W2 are again selected and extracted from the scenario DB 2. At time t9, the extracted incomplete scenarios are presented to workers W1 and W2, and scenario creation is requested.

At times t10 and t11, workers W1 and W2 each enter the next utterance into the presented incomplete scenario. Here, as shown in FIG. 5(a), each incomplete scenario is registered up to the second utterance by speaker P2, so each of workers W1 and W2 enters the third utterance, "About how many years have you been doing it?", from the standpoint of speaker P1, as shown in FIG. 5(b). At times t12 and t13, the third utterance entered by workers W1 and W2 is registered in the scenario DB 2, as shown in FIG. 5(c).

At time t14, an incomplete scenario to be requested of worker W1 and an incomplete scenario to be requested of worker W2 are again selected and extracted from the scenario DB 2. At time t15, the extracted incomplete scenarios are presented to workers W1 and W2, and scenario creation is requested.

At times t16 and t17, workers W1 and W2 each enter the next utterance into the presented incomplete scenario. Here, as shown in FIG. 6(a), each incomplete scenario is registered up to the third utterance by speaker P1, so each of workers W1 and W2 enters the fourth utterance, "Since university, so it has been 10 years.", from the standpoint of speaker P2, as shown in FIG. 6(b). At times t18 and t19, the fourth utterance entered by workers W1 and W2 is registered in the scenario DB 2, as shown in FIG. 6(c). Thereafter, in the same manner, workers W1 and W2 repeatedly enter the (n+1)-th utterance whenever the presented incomplete scenario is registered up to the n-th utterance.

According to this embodiment, each worker W can enter an utterance as either one speaker P1 or the other speaker P2, depending on the incomplete scenario for which input is requested, and the next incomplete scenario is presented as soon as one utterance has been entered. Scenarios can therefore be created efficiently without waiting for the other party's input and without time constraints.

FIG. 7 is a functional block diagram showing the configuration of a dialogue scenario corpus creation support system according to a second embodiment of the present invention. Reference numerals identical to those above denote identical or equivalent parts, and their description is omitted. This embodiment is characterized in that the scenario presentation unit 103 includes a persona storage unit 103a that stores a persona as the virtual attribute information of each speaker, and a persona presentation unit 103b that presents the speaker's persona together with the incomplete scenario when the latter is presented to the worker Wi.

When presenting an incomplete scenario to the worker Wi and requesting utterance input, the scenario presentation unit 103 also presents the speaker's virtual persona to the worker Wi, as shown in FIG. 8, so that the worker Wi can enter an utterance that takes the persona into account. In this embodiment, virtual personas of speakers P1 and P2 are registered in advance, and the name, nickname, age, gender, date of birth, and so on are presented.
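A persona record and its presentation alongside an incomplete scenario could be modeled as in the following Python sketch. The field list follows the items named above; the class name, the rendering format, and all persona values are placeholders invented for illustration and do not reproduce FIG. 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Virtual attribute information of one speaker (values below are placeholders)."""
    name: str
    nickname: str
    age: int
    gender: str
    date_of_birth: str


def render_request(persona, speaker, utterances):
    """Build the text shown to a worker: persona first, then the dialogue so far."""
    lines = [
        f"Enter the next utterance as {speaker}: {persona.name} ({persona.nickname}), "
        f"age {persona.age}, {persona.gender}, born {persona.date_of_birth}"
    ]
    lines += [f"{who}: {text}" for who, text in utterances]
    lines.append(f"{speaker}: ____")
    return "\n".join(lines)


p2 = Persona(name="Example Name", nickname="Ex", age=30, gender="female", date_of_birth="YYYY-MM-DD")
screen = render_request(p2, "P2", [("P1", "What do you do on your holidays?")])
```

Showing the same persona to every worker who speaks as P2 is what lets different workers keep that speaker's utterances consistent.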

According to this embodiment, because the speaker's virtual persona is presented together with the incomplete scenario, the consistency of utterance content can be maintained even when different workers Wi enter utterances of the same speaker, or when the same worker Wi enters utterances of different speakers.

FIG. 9 is a functional block diagram showing the configuration of a dialogue scenario corpus creation support system according to a third embodiment of the present invention. Reference numerals identical to those above denote identical or equivalent parts. This embodiment is characterized in that the scenario presentation unit 103 includes an inappropriate-report request unit 103c that has the worker Wi report an incomplete scenario containing an inappropriate utterance, and the scenario registration unit 104 includes a scenario deletion unit 104a that deletes from the scenario DB 2 any incomplete scenario whose number of inappropriate-utterance reports exceeds a reference value.

When an incomplete scenario is presented to the worker Wi and utterance input is requested, the inappropriate-report request unit 103c displays, in addition to the utterance input field 31 and the send button 32, an inappropriate-report button 33 for reporting to the system that an already registered utterance contains inappropriate wording or content, as shown in FIG. 10.

In the illustrated example, the previous utterance, "Nice to meet you. Hello.", is unnatural given the course of the dialogue so far, so the worker Wi clicks the inappropriate-report button 33 without entering an utterance, thereby notifying the scenario creation support device 1 that the presented incomplete scenario contains an inappropriate utterance. In the scenario registration unit 104, the scenario deletion unit 104a counts the number of inappropriate reports for each incomplete scenario and deletes from the scenario DB 2 any incomplete scenario that has received more than a predetermined number of inappropriate reports.
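The counting and deletion performed by the scenario deletion unit 104a can be sketched in Python as follows. The class name, the dictionary used as a stand-in for the scenario DB 2, and the threshold value are assumptions for illustration.

```python
from collections import Counter


class ReportTracker:
    """Counts inappropriate reports per scenario and deletes over-reported scenarios."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.reports = Counter()

    def report(self, scenario_db, scenario_id):
        """Record one report; return True if the scenario was deleted as a result."""
        self.reports[scenario_id] += 1
        if self.reports[scenario_id] > self.threshold and scenario_id in scenario_db:
            del scenario_db[scenario_id]
            return True
        return False


# Hypothetical DB with two incomplete scenarios, keyed by scenario id.
scenario_db = {"S1": ["...", "Nice to meet you. Hello."], "S2": ["What do you do on your holidays?"]}
tracker = ReportTracker(threshold=2)
outcomes = [tracker.report(scenario_db, "S1") for _ in range(3)]
```

With a threshold of 2, the first two reports only increment the count and the third report removes S1, while S2 is untouched.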

本実施形態によれば、不適切な発話を含む未完成シナリオを排除できるので、不適切なシナリオの作成が継続されてしまう無駄が排除され、シナリオの品質を高く維持できるようになる。 According to the present embodiment, since an incomplete scenario including an inappropriate utterance can be eliminated, waste that continues to create an inappropriate scenario is eliminated, and the quality of the scenario can be maintained high.
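
The report counting attributed to the scenario deletion unit 104a can be illustrated with a minimal Python sketch. The class name `ScenarioStore`, its methods, and the limit of 2 are hypothetical and chosen only for illustration; the source specifies only that reports are counted per incomplete scenario and that a scenario is deleted once the count exceeds a predetermined number.

```python
from collections import defaultdict


class ScenarioStore:
    """Toy stand-in for scenario DB 2 plus the counting done by deletion unit 104a.

    All names and the data layout are assumptions made for this sketch.
    """

    def __init__(self, report_limit):
        self.report_limit = report_limit   # hypothetical "predetermined number"
        self.scenarios = {}                # scenario id -> list of utterances
        self.reports = defaultdict(int)    # scenario id -> inappropriate-report count

    def add(self, scenario_id, utterances):
        self.scenarios[scenario_id] = list(utterances)

    def report_inappropriate(self, scenario_id):
        """Record one click of report button 33.

        Returns True if this report caused the scenario to be deleted.
        """
        if scenario_id not in self.scenarios:
            return False
        self.reports[scenario_id] += 1
        if self.reports[scenario_id] > self.report_limit:
            # The count now exceeds the limit, so the scenario is removed.
            del self.scenarios[scenario_id]
            del self.reports[scenario_id]
            return True
        return False


store = ScenarioStore(report_limit=2)
store.add("s1", ["What do you do on your days off?", "Nice to meet you. Hello."])
results = [store.report_inappropriate("s1") for _ in range(3)]
```

With a limit of 2, the first two reports leave the scenario in place and the third removes it, because deletion is triggered only when the count exceeds the limit rather than when it equals it.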

図１１は、本発明の第４実施形態に係る対話シナリオコーパス作成支援システムの構成を示した機能ブロック図であり、前記と同一の符号は同一又は同等部分を表している。本実施形態は、テーマが同一の複数の未完成シナリオを対象に、同一階層ごとに発話内容に基づくクラスタリングを実行し、同一クラスタに分類された発話を一の代表発話に置き換えることでシナリオを集約するシナリオ集約部１０５を具備した点に特徴がある。 FIG. 11 is a functional block diagram showing a configuration of a dialogue scenario corpus creation support system according to the fourth exemplary embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts. In this embodiment, for a plurality of unfinished scenarios with the same theme, clustering based on utterance contents is performed for each layer, and scenarios are aggregated by replacing utterances classified into the same cluster with one representative utterance. This is characterized in that a scenario aggregation unit 105 is provided.

対話シナリオが木構造であると、子ノードの数だけシナリオが分岐するため、対話が進むにつれてシナリオパターンが爆発的に増加する。例えば、一つの発話に対して10個の発話を入力させて木構造を構築すると、10段目は10億通りとなり、膨大な入力が必要となってしまう。シナリオ集約部１０５は、内容が類似する発話をk-means又はX-meansなどのクラスタリング手法を用いてグルーピングし、k-means手法であればクラスタの中心に近い発話、X-meansであれが最適発話のみに対して発話を継続する。 If the dialogue scenarios form a tree structure, the scenario branches once for every child node, so the number of scenario patterns grows explosively as the dialogue proceeds. For example, if the tree is built by having 10 utterances entered in response to each utterance, the 10th level contains one billion patterns, which would require an enormous amount of input. The scenario aggregation unit 105 therefore groups utterances with similar content using a clustering method such as k-means or X-means, and continues the dialogue only from one utterance per cluster: the utterance closest to the cluster center in the case of k-means, or the optimal utterance in the case of X-means.
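
The one-billion figure follows from a simple count. The symbols below are introduced here for illustration only: b is the number of utterances collected per parent utterance (assumed uniform), d is the level, with the root utterance counted as level 1, and N(d) is the number of utterances at level d.

```latex
N(d) = b^{\,d-1}, \qquad N(10)\Big|_{b=10} = 10^{9} = 1{,}000{,}000{,}000
```

If aggregation kept only k representative utterances per parent, the same count would become k^{d-1}. For a purely hypothetical k = 3, that gives 3^9 = 19,683 paths at level 10.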

図１２の例であれば、「ビールが好きです。」、「夏はビールをよく飲みます。」、「地ビールが好きで、旅行の楽しみの一つです。」が同一クラスタに分類され、「ビールが好きです。」が代表発話に選定されている。したがって、「ビールが好きです。」と同一クラスタに属する発話は全て「ビールが好きです。」のノードに集約される。本実施形態によれば、多様な発話を残しつつ、分岐数の爆発的な増加を抑えられるようになる。 In the example of FIG. 12, "I like beer.", "I often drink beer in summer.", and "I like local beer; it is one of the pleasures of traveling." are classified into the same cluster, and "I like beer." is selected as the representative utterance. All utterances belonging to the same cluster as "I like beer." are therefore aggregated into the "I like beer." node. According to this embodiment, the explosive growth in the number of branches can be suppressed while diverse utterances are retained.
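
The k-means variant of representative selection (the utterance closest to the cluster center) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: cluster membership is taken as already computed, utterances are represented by character-bigram counts (a feature chosen here only to avoid a tokenizer), and all function names are hypothetical.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Bag of character bigrams; a crude, tokenizer-free feature (assumption)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def centroid(vectors):
    """Component-wise mean of sparse count vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {key: count / len(vectors) for key, count in total.items()}


def distance(vec, center):
    """Euclidean distance between a sparse vector and the centroid."""
    keys = set(vec) | set(center)
    return math.sqrt(sum((vec.get(k, 0) - center.get(k, 0)) ** 2 for k in keys))


def representative(cluster):
    """Return the utterance of one cluster that lies closest to its centroid."""
    center = centroid([bigram_vector(u) for u in cluster])
    return min(cluster, key=lambda u: distance(bigram_vector(u), center))


beer_cluster = [
    "I like beer.",
    "I often drink beer in summer.",
    "I like local beer; it is one of the pleasures of traveling.",
]
rep = representative(beer_cluster)
# Every member of the cluster is folded into the representative's node.
merged = {utterance: rep for utterance in beer_cluster}
```

Which utterance is selected depends entirely on the feature representation, so this sketch will not necessarily reproduce the choice shown in FIG. 12.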

図１３は、本発明の第５実施形態に係る対話シナリオコーパス作成支援システムの構成を示した機能ブロック図であり、前記と同一の符号は同一又は同等部分を表している。本実施形態では、シナリオ登録部１０４が入力禁止語処理部１０４ｂを具備した点に特徴がある。 FIG. 13 is a functional block diagram showing a configuration of a dialogue scenario corpus creation support system according to the fifth exemplary embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts. The present embodiment is characterized in that the scenario registration unit 104 includes an input prohibited word processing unit 104b.

入力禁止語処理部１０４ｂは、タイトルが同一の未完成シナリオを対象に発話内での語句の出現頻度を計算し、所定の頻度を超えて出現する頻出語を入力禁止語に設定する。そして、入力された発話に入力禁止語が含まれていると、ワーカWiに対して入力禁止語が含まれる旨を通知して他の発話入力を促すようにしている。 The input-prohibited word processing unit 104b calculates the frequency of appearance of a phrase within an utterance for an incomplete scenario with the same title, and sets frequent words that appear beyond a predetermined frequency as input-prohibited words. If an input prohibited word is included in the input utterance, the worker Wi is notified that the input prohibited word is included, and is prompted to input another utterance.

図１４に示した例では、直前発話「好きなお酒は何ですか」に対して、他のワーカWiが既に「ビールが好きです。」「発明はビールをよく飲みます。」、「地ビールが好きで、旅行の楽しみの一つです。」「家で少しボビールを飲みます」など、ビール関連の話題を多数登録しているので「ビール」が入力禁止語に設定されている。 In the example shown in FIG. 14, in response to the immediately preceding utterance "What is your favorite alcoholic drink?", other workers Wi have already registered many beer-related utterances, such as "I like beer.", "I often drink beer in summer.", "I like local beer; it is one of the pleasures of traveling.", and "I drink a little beer at home.", so "beer" has been set as an input-prohibited word.

これにより、「ビール」を含む発話が入力されると、入力禁止語処理部１０４ｂは送信ボタン３２をグレーアウトさせると共に「ビール」が入力禁止語である旨のメッセージおよび「入力禁止語が含まれている」旨のメッセージを提示して他の発話入力を促す。本実施形態によれば、多様な発話を残しつつ、分岐数の爆発的な増加を抑えられるようになる。 Consequently, when an utterance containing "beer" is entered, the input-prohibited word processing unit 104b grays out the send button 32 and presents a message stating that "beer" is an input-prohibited word, together with a message stating that the utterance contains an input-prohibited word, prompting the worker to enter a different utterance. According to this embodiment, the explosive growth in the number of branches can be suppressed while diverse utterances are retained.
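
The frequency check performed by the input-prohibited word processing unit 104b might look like the following sketch. It assumes that utterances arrive already split into words (the source does not say how words are extracted), that each word is counted once per utterance, and that the threshold `max_count` is a hypothetical stand-in for the "predetermined frequency". The function names are illustrative.

```python
from collections import Counter


def prohibited_words(tokenized_utterances, max_count):
    """Words that appear in more than max_count registered utterances."""
    counts = Counter()
    for tokens in tokenized_utterances:
        counts.update(set(tokens))  # count each word once per utterance (assumption)
    return {word for word, count in counts.items() if count > max_count}


def check_input(tokens, banned):
    """Prohibited words found in a new utterance; an empty set means it may be sent."""
    return set(tokens) & banned


registered = [
    ["I", "like", "beer"],
    ["I", "often", "drink", "beer", "in", "summer"],
    ["local", "beer", "is", "a", "pleasure", "of", "traveling"],
    ["a", "little", "beer", "at", "home"],
]
banned = prohibited_words(registered, max_count=3)
blocked = check_input(["beer", "with", "dinner"], banned)
allowed = check_input(["wine", "with", "dinner"], banned)
```

Here `banned` contains only "beer", since it is the one word present in all four utterances. A user interface could gray out the send button whenever `check_input` returns a non-empty set.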

図１５，１６は、本発明の第６実施形態に係る対話シナリオコーパス作成支援システムの構成を示した機能ブロック図であり、前記と同一の符号は同一又は同等部分を表している。本実施形態では、発話の登録数が所定数に達した対話シナリオを完成と評価するにあたり、発話に含まれる語句の出現頻度や発話入力に要する時間に基づいて前記所定値を動的に変更する完成条件設定部１０６を具備した点に特徴がある。 FIGS. 15 and 16 are functional block diagrams showing the configuration of a dialogue scenario corpus creation support system according to the sixth embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts. In this embodiment, a dialogue scenario is judged complete once its number of registered utterances reaches a predetermined number. The embodiment is characterized by a completion condition setting unit 106 that dynamically changes this predetermined number based on, for example, the appearance frequency of words and phrases in the utterances and the time required for utterance input.

図１５の例では、前記完成条件設定部１０６がクラスタリング部１０６ａを含み、未完成シナリオごとに各発話に含まれる語句を抽出してクラスタリングを実施する。そして、クラスタ数が多い場合は更に話題が拡がる可能性が高いと判断して所定数を大きな値に設定する一方、クラスタ数が少ない場合は類似の語句の出現頻度が高く、更に話題が拡がる可能性は低いと判断して所定数を小さな値に設定する。 In the example of FIG. 15, the completion condition setting unit 106 includes a clustering unit 106a, which extracts the words and phrases contained in the utterances of each incomplete scenario and clusters them. When the number of clusters is large, the topic is judged likely to expand further and the predetermined number is set to a large value. When the number of clusters is small, similar words and phrases appear frequently, so the topic is judged unlikely to expand further and the predetermined number is set to a small value.

図１６の例では、前記完成条件設定部１０６が入力時間計時部１０６ｂを含み、例えばワーカWiに入力禁止語を提示した以降の発話入力に要する時間を測定する。そして、短時間で発話が入力される場合は話題が豊富と判断して所定数を大きな値に設定する一方、発話の入力が短時間で行われなくなると話題が欠乏したと判断して所定数を小さな値に設定する。 In the example of FIG. 16, the completion condition setting unit 106 includes an input time counting unit 106b, which measures, for example, the time the worker Wi takes to enter an utterance after an input-prohibited word has been presented. When utterances are entered quickly, the topic is judged to be rich and the predetermined number is set to a large value. When utterances are no longer entered quickly, the topic is judged to be exhausted and the predetermined number is set to a small value.

本実施形態によれば、話題の豊富なシナリオについては階層数の多いシナリオを構築できる一方、話題に乏しいシナリオについては階層数を少なくできるので、話題に応じて効率的なシナリオ構築が可能にはり、発話入力の効率化が可能になる。 According to this embodiment, scenarios with many layers can be built for rich topics, while the number of layers can be kept small for poor topics. Scenarios can therefore be built efficiently according to the topic, and utterance input becomes more efficient.
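
The two ways of adjusting the completion threshold can be summarized in a short sketch. The source states only the direction of each adjustment (more clusters or faster input gives a larger threshold, fewer clusters or slower input gives a smaller one). Every number, parameter name, and function name below is a hypothetical choice made for illustration.

```python
def threshold_from_clusters(cluster_count, base=6, step=2, pivot=3, lo=4, hi=12):
    """More word clusters -> larger required utterance count, clamped to [lo, hi]."""
    return max(lo, min(hi, base + step * (cluster_count - pivot)))


def threshold_from_input_time(seconds, base=6, fast=10.0, slow=30.0, delta=2):
    """Fast input -> topic judged rich (raise); slow input -> exhausted (lower)."""
    if seconds <= fast:
        return base + delta
    if seconds >= slow:
        return base - delta
    return base


def is_complete(utterance_count, threshold):
    """A scenario counts as complete once its utterance count reaches the threshold."""
    return utterance_count >= threshold


rich = threshold_from_clusters(5)    # 6 + 2 * (5 - 3) = 10
poor = threshold_from_clusters(1)    # 6 + 2 * (1 - 3) = 2, clamped up to 4
quick = threshold_from_input_time(5.0)
stalled = threshold_from_input_time(40.0)
```

With these illustrative settings, a scenario holding 8 utterances is still incomplete for the rich topic (threshold 10) but already complete for the poor one (threshold 4).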

図１７は、本発明の第７実施形態に係る対話シナリオコーパス作成支援システムの構成を示した機能ブロック図であり、前記と同一の符号は同一又は同等部分を表している。本実施形態では、シナリオ作成支援装置1がシナリオ拡張部１０７を具備した点に特徴がある。 FIG. 17 is a functional block diagram showing a configuration of a dialogue scenario corpus creation support system according to the seventh exemplary embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts. The present embodiment is characterized in that the scenario creation support apparatus 1 includes a scenario expansion unit 107.

上記の各実施形態では、ワーカWiに対して未完成シナリオを提示して発話の入力を依頼する際に、それまでの発話履歴を提示し、直前発話との関連で次の発話の入力を依頼していた。これに対して、本実施形態ではシナリオ拡張部１０７がn番目までの発話とn+2番目以降の発話とが登録された未完成シナリオをワーカWiへ提示し、n+1番目の穴埋め的な発話入力を依頼するようにしている。 In each of the embodiments above, when an incomplete scenario is presented to the worker Wi with a request for utterance input, the utterance history so far is presented and the worker is asked to enter the next utterance in relation to the immediately preceding one. In this embodiment, by contrast, the scenario expansion unit 107 presents to the worker Wi an incomplete scenario in which the utterances up to the n-th and the utterances from the (n+2)-th onward are registered, and requests input of the (n+1)-th utterance to fill the gap.

図１８(a)に示した入力例では、第1発話として「休日は何をしていますか」が登録され、第３発話として「何年位続けていますか」、第４発話として「大学からなので１０年近くです。」がそれぞれ登録されている未完成シナリオがワーカへ提示されている。この場合、第２発話として「ゴルフです。」「草野球です。」「ドライブです。」などの発話入力が可能である。本実施形態によれば、一つの発話を穴埋め的に入力させることで、図１８(b)に示したように既存の木構造を拡張することができ、一貫性のある複数のシナリオを簡単に追加できるようになる。 In the input example shown in FIG. 18(a), the worker is presented with an incomplete scenario in which "What do you do on your days off?" is registered as the first utterance, "How many years have you been doing it?" as the third utterance, and "Since university, so nearly 10 years." as the fourth utterance. In this case, utterances such as "Golf.", "Sandlot baseball.", or "Driving." can be entered as the second utterance. According to this embodiment, having a single utterance entered to fill a gap extends the existing tree structure as shown in FIG. 18(b), so multiple consistent scenarios can be added easily.
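
The gap-filling expansion of FIG. 18 can be sketched in a few lines. The representation is an assumption made here: a scenario is a list of utterances with `None` marking the single missing (n+1)-th slot, and the function names are hypothetical.

```python
def find_gap(utterances):
    """Index of the single missing utterance (None marks the gap by convention)."""
    gaps = [i for i, u in enumerate(utterances) if u is None]
    if len(gaps) != 1:
        raise ValueError("expected exactly one missing utterance")
    return gaps[0]


def expand(utterances, candidates):
    """Return one complete scenario per candidate utterance entered for the gap."""
    gap = find_gap(utterances)
    return [utterances[:gap] + [c] + utterances[gap + 1:] for c in candidates]


incomplete = [
    "What do you do on your days off?",
    None,  # the second utterance is requested from the worker
    "How many years have you been doing it?",
    "Since university, so nearly 10 years.",
]
scenarios = expand(incomplete, ["Golf.", "Sandlot baseball.", "Driving."])
```

Each candidate yields a full four-utterance scenario that shares the surrounding context, which is why one round of gap filling adds several mutually consistent branches.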

なお、上記の各実施形態では対話シナリオが木構造である場合を例にして説明したが、本発明はこれのみに限定されるものではなく、各対話シナリオが相互に独立していても良い。 In each of the above embodiments, the case where the dialogue scenario has a tree structure has been described as an example. However, the present invention is not limited to this, and each dialogue scenario may be independent of each other.

１…シナリオ作成支援装置，２…シナリオデータベース，３…作業者端末，３１…発話入力欄，３２…送信ボタン，３３…不適切報告ボタン，１０１…ワーカ割当部，１０２…シナリオ選択部，１０２ａ…属性評価部，１０２ｂ…発話評価部，１０３…シナリオ提示部，１０３ａ…ペルソナ記憶部，１０３ｂ…ペルソナ提示部，１０３ｃ…不適切報告要求部，１０４…シナリオ登録部，１０４ａ…シナリオ削除部，１０４ｂ…入力禁止語処理部，１０５…シナリオ集約部，１０６…完成条件設定部，１０６ａ…クラスタリング部，１０６ｂ…入力時間計時部，１０７…シナリオ拡張部 DESCRIPTION OF SYMBOLS 1 ... Scenario creation assistance apparatus, 2 ... Scenario database, 3 ... Worker terminal, 31 ... Speech input column, 32 ... Send button, 33 ... Inappropriate report button, 101 ... Worker allocation part, 102 ... Scenario selection part, 102a ... Attribute evaluation unit, 102b ... utterance evaluation unit, 103 ... scenario presentation unit, 103a ... persona storage unit, 103b ... persona presentation unit, 103c ... inappropriate report request unit, 104 ... scenario registration unit, 104a ... scenario deletion unit, 104b ... Input prohibition word processing unit, 105 ... scenario aggregation unit, 106 ... completion condition setting unit, 106a ... clustering unit, 106b ... input time counting unit, 107 ... scenario expansion unit

Claims

1. A dialogue scenario corpus creation support system that completes a dialogue scenario by repeating utterance input from the standpoint of each speaker in the dialogue, the system comprising:
a database that stores incomplete scenarios whose number of registered utterances is insufficient;
means for selecting an incomplete scenario for which utterance input is to be requested of a worker in charge of utterance input work;
means for presenting the selected incomplete scenario to the worker; and
means for registering, in the database, an utterance input for the presented incomplete scenario,
wherein the selection and presentation of an incomplete scenario and the registration of the input utterance are repeated.

2. The dialogue scenario corpus creation support system according to claim 1, further comprising:
means for storing attribute information of each speaker; and
means for obtaining attribute information of each worker,
wherein the selecting means preferentially selects an incomplete scenario whose speaker attribute information is similar to that of the worker.

3. The dialogue scenario corpus creation support system according to claim 1, wherein the selecting means preferentially selects an incomplete scenario for which the worker has an utterance input history.

4. The dialogue scenario corpus creation support system according to claim 1, further comprising:
means for determining the topic of an incomplete scenario; and
means for calculating the relevance between the topic of the incomplete scenario and identification information of the worker,
wherein the selecting means preferentially selects an incomplete scenario whose topic has a higher relevance to the identification information of the worker.

5. The dialogue scenario corpus creation support system according to claim 1, wherein the presenting means further comprises means for presenting speaker attribute information to a worker.

6. The dialogue scenario corpus creation support system according to claim 1, further comprising means for allowing workers to report incomplete scenarios containing inappropriate utterances,
wherein an incomplete scenario whose number of reports exceeds a reference value is deleted.

7. The dialogue scenario corpus creation support system according to any one of claims 1 to 6, wherein the words and phrases included in the input utterances are clustered, and the predetermined number used to judge whether a dialogue scenario is complete, based on whether its number of utterances has reached that predetermined number, is varied according to the number of clusters.

8. The dialogue scenario corpus creation support system according to any one of claims 1 to 7, wherein the database has a tree structure containing a plurality of dialogue scenarios, each leading from one root utterance through at least one node utterance to a leaf utterance.

9. The dialogue scenario corpus creation support system according to claim 8, further comprising means for aggregating nodes having similar utterance contents for each hierarchy.

10. The dialogue scenario corpus creation support system according to claim 8, further comprising means for prohibiting input of an utterance that includes a word or phrase when the appearance frequency of that word or phrase, in the same hierarchy as the input utterance, exceeds a predetermined value.

11. The dialogue scenario corpus creation support system according to claim 8, wherein the presenting means presents to a worker an incomplete scenario in which the utterances up to the n-th and the utterances from the (n+2)-th onward are registered, and requests input of the (n+1)-th utterance.