JP2023173036A

JP2023173036A - Dialogue information extraction device and dialogue information extraction method

Info

Publication number: JP2023173036A
Application number: JP2022085015A
Authority: JP
Inventors: 泰弘十河; Yasuhiro Sogawa
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2023-12-07

Abstract

PROBLEM TO BE SOLVED: To extract, from a dialogue among a plurality of speakers, information on dialogue having a viewpoint of interest, and to enable the dialogue situation to be checked efficiently. SOLUTION: A dialogue information extraction device 100 includes: an utterance information input unit 111 that inputs, for each utterance constituting a dialogue, utterance information including a transcribed text of the utterance; a dialogue structure estimation unit 112 that estimates the dialogue structure of the dialogue from the utterance information of a plurality of utterances in the dialogue, using an estimation model that has learned dialogue structures; and a viewpoint-based dialogue extraction unit 113 that determines whether a predetermined viewpoint of interest is present in an utterance group that is generated based on the dialogue structure and composed of utterances of a plurality of speakers, thereby extracting utterance groups having the viewpoint. SELECTED DRAWING: Figure 1

Description

The present invention relates to a dialogue information extraction device and a dialogue information extraction method, and is suitably applied to a device and a method that analyze dialogue among a plurality of speakers.

In recent years, various efforts to transcribe spoken dialogue have been under way with the aim of expanding the value of digital dialogue services.

For example, in call center or counter service, when an employee of a bank or similar institution explains matters to a customer and proceeds with a procedure, the dialogue must be advanced while its status is checked, with particular emphasis on the customer's consent. An exchange conducted without the customer's clear consent carries compliance risks, such as a violation of the Financial Instruments Sales Act. It is therefore very important to check what was said, and whether it was consented to, either during the dialogue or in a subsequent review.

As prior art related to the above background, Patent Document 1, for example, discloses a call center conversation content display system that displays each speaker's responses in a chat format.

Patent Document 1: International Publication No. 2019/003395

However, the system disclosed in Patent Document 1 simply displays each speaker's responses in a chat format. To check the content of an exchange with respect to a predetermined viewpoint of interest, the transcribed text of the utterances had to be checked manually. This not only made the cost of manual work enormous but also carried the risk of items being overlooked.

The present invention has been made in view of the above points, and proposes a dialogue information extraction device and a dialogue information extraction method that extract, from a dialogue among a plurality of speakers, information on dialogue having a viewpoint of interest, and thereby enable the dialogue situation to be checked efficiently.

To solve this problem, the present invention provides a dialogue information extraction device that analyzes a dialogue among a plurality of speakers based on a viewpoint of interest, the device comprising: an utterance information input unit that inputs, for each utterance constituting the dialogue, utterance information including a transcribed text of the utterance; a dialogue structure estimation unit that estimates the dialogue structure of the dialogue from the utterance information of a plurality of utterances in the dialogue, using an estimation model that has learned dialogue structures; and a viewpoint-based dialogue extraction unit that extracts utterance groups having the viewpoint by determining the presence or absence of the viewpoint for utterance groups that are generated based on the dialogue structure and consist of utterances of a plurality of speakers.

To solve this problem, the present invention also provides a dialogue information extraction method performed by a dialogue information extraction device that analyzes a dialogue among a plurality of speakers based on a viewpoint of interest, the method comprising: an utterance information input step in which the device inputs, for each utterance constituting the dialogue, utterance information including a transcribed text of the utterance; a dialogue structure estimation step in which the device estimates the dialogue structure of the dialogue from the utterance information of a plurality of utterances in the dialogue input in the utterance information input step, using an estimation model that has learned dialogue structures; and a viewpoint-based dialogue extraction step in which the device extracts utterance groups having the viewpoint by determining the presence or absence of the viewpoint for utterance groups that are generated based on the estimation result of the dialogue structure estimation step and consist of utterances of a plurality of speakers.

According to the present invention, information on dialogue having a viewpoint of interest can be extracted from a dialogue among a plurality of speakers, and the dialogue situation can be checked efficiently.

FIG. 1 is a block diagram showing a configuration example of a dialogue information extraction device 100 according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the procedure of the dialogue information extraction process.
FIG. 3 is a diagram showing an example of utterance information.
FIG. 4 is a diagram showing an example of a past utterance information group.
FIG. 5 is a diagram showing an example of a dialogue structure estimation result.
FIG. 6 is a diagram showing an example of dialogue information on an utterance set having the viewpoint.
FIG. 7 is a diagram showing a display example of a dialogue information extraction result.
FIG. 8 is a block diagram showing a configuration example of a dialogue information extraction device 101 according to a second embodiment of the present invention.
FIG. 9 is a flowchart showing an example of the procedure of the dialogue information extraction process in the second embodiment.
FIG. 10 is a block diagram showing a hardware configuration example of the dialogue information extraction devices 100 and 101.

Embodiments of the present invention are described in detail below with reference to the drawings.

Note that the following description and drawings are examples for explaining the present invention, and omissions and simplifications have been made as appropriate for clarity. Not all combinations of the features described in the embodiments are essential to the solution provided by the invention. The present invention is not limited to the embodiments, and every application consistent with the idea of the invention falls within its technical scope. Those skilled in the art can make various additions and modifications within the scope of the invention. The invention can also be implemented in various other forms. Unless otherwise specified, each component may be singular or plural.

In the following description, various kinds of information may be described using expressions such as "table", "list", and "queue", but the information may be expressed in data structures other than these. To indicate independence from the data structure, an "XX table", "XX list", and the like may be referred to as "XX information". Expressions such as "identification information", "identifier", "name", "ID", and "number" are used when describing the content of each piece of information, and these are interchangeable.

Also, in the following description, when elements of the same type are described without being distinguished, a reference numeral or the common part of the reference numerals is used; when elements of the same type are described separately, the reference numeral of each element, or an ID assigned to the element in place of the reference numeral, may be used.

Also, the following description may describe processing that is performed by executing a program. A program, when executed by at least one processor (for example, a CPU), performs the defined processing while using storage resources (for example, memory) and/or interface devices (for example, communication ports) as appropriate, so the subject of the processing may be regarded as the processor. Similarly, the subject of processing performed by executing a program may be a controller, device, system, computer, node, storage system, storage device, server, management computer, client, or host that has a processor. The subject of processing performed by executing a program (for example, a processor) may include a hardware circuit that performs part or all of the processing. For example, it may include a hardware circuit that performs encryption and decryption, or compression and decompression. By operating according to a program, a processor operates as a functional unit that implements a predetermined function. Devices and systems that include a processor are devices and systems that include these functional units.

A program may be installed from a program source onto a device such as a computer. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the server includes a processor (for example, a CPU) and storage resources, and the storage resources may store a distribution program and the program to be distributed. The processor of the program distribution server may then execute the distribution program and thereby distribute the target program to other computers. In the following description, two or more programs may be implemented as one program, and one program may be implemented as two or more programs.

In the following description, a "viewpoint" is the aspect of an utterance that is of interest (the viewpoint of interest); a concrete example is agreement or disagreement with an utterance. When, through the exchange of dialogue (in other words, within the transcribed text of multiple utterances), something meaningful is expressed with respect to that viewpoint, that is, in the above example, when it can be judged that there is agreement or disagreement with an utterance, the dialogue is said to "have the viewpoint". The viewpoint can be set arbitrarily; for example, it may be specified by a user or determined by a program. In this description, a series of exchanges consisting of utterances by a plurality of speakers is treated as a "dialogue scenario", and the dialogue information extraction device according to the present invention analyzes the dialogue for each dialogue scenario based on the viewpoint of interest.

(1) First Embodiment
FIG. 1 is a block diagram showing a configuration example of the dialogue information extraction device 100 according to the first embodiment of the present invention. The dialogue information extraction device 100 shown in FIG. 1 includes, as processing units having predetermined processing functions, an utterance information input unit 111, a dialogue structure estimation unit 112, a viewpoint-based dialogue extraction unit 113 that has an utterance set generation unit 114 and a viewpoint determination unit 115, and an extraction result display unit 116. As storage units that store data, it includes a dialogue structure estimation model storage unit 121, a viewpoint determination model storage unit 122, a past utterance information storage unit 123, a dialogue structure storage unit 124, and an extraction result storage unit 125.

The utterance information input unit 111 has a function of inputting to the dialogue information extraction device 100, for each utterance, utterance information including a transcribed text of the utterance. The utterance information input unit 111 may acquire the text from outside the dialogue information extraction device 100 (for example, from a dialogue recording device that records dialogue content as text data), or may itself transcribe the text from the audio data of the utterance.

The dialogue structure estimation unit 112 has a function of estimating the dialogue structure of the current dialogue scenario using an estimation model trained by a statistical method.

The viewpoint-based dialogue extraction unit 113 has a function of extracting, based on the estimated dialogue structure of a dialogue scenario, information on dialogue in that scenario that has a predetermined viewpoint. Within the viewpoint-based dialogue extraction unit 113, the utterance set generation unit 114 has a function of generating, based on the dialogue structure estimation result, utterance sets that each represent a combination of utterances with a strong dialogue relationship. The viewpoint determination unit 115 has a function of determining the presence or absence of the viewpoint for each utterance set generated by the utterance set generation unit 114.

The extraction result display unit 116 has a function of outputting dialogue information on those utterance sets that, among the utterance sets having the predetermined viewpoint extracted by the viewpoint-based dialogue extraction unit 113, match conditions specified by the user.

The dialogue structure estimation model storage unit 121 stores the estimation model that the dialogue structure estimation unit 112 uses to estimate a dialogue structure.

The viewpoint determination model storage unit 122 stores the determination model that the viewpoint determination unit 115 uses to determine the presence or absence of the viewpoint in an utterance set.

The past utterance information storage unit 123 stores, for each dialogue scenario, a past utterance information group that collects the utterance information (past utterance information) of the series of exchanges in that scenario. FIG. 4, described later, shows a concrete example of a past utterance information group.

The dialogue structure storage unit 124 stores information indicating the estimated dialogue structure of a dialogue scenario. The dialogue structure is estimated by the dialogue structure estimation unit 112; FIG. 5, described later, shows a concrete example of an estimation result.

The extraction result storage unit 125 stores the dialogue information that the viewpoint-based dialogue extraction unit 113 (viewpoint determination unit 115) has determined to be an extraction target.

FIG. 2 is a flowchart showing an example of the procedure of the dialogue information extraction process. The process shown in FIG. 2 is executed by the units of the dialogue information extraction device 100. It may be executed online and incrementally while a dialogue between two or more speakers is in progress, or after the dialogue has ended. Below, the procedure of the process shown in FIG. 2 is described, with reference to other drawings as appropriate, taking incremental execution as the example. The viewpoint of interest for the utterances is specified, for example by the user, before the process of FIG. 2 starts.

As shown in FIG. 2, the utterance information input unit 111 first inputs text data (utterance information) transcribed from the latest utterance (step S101).

FIG. 3 is a diagram showing an example of utterance information. The utterance information 210 shown in FIG. 3 has the items utterance ID 211, utterance time 212, speaking user ID 213, and text 214. Utterance ID 211 is an identifier (utterance ID) assigned to each utterance. The utterance ID may be an identifier assigned in utterance order regardless of dialogue scenario, or an identifier assigned in utterance order within each dialogue scenario. Utterance time 212 is the time at which the utterance occurred. Speaking user ID 213 is the identifier (user ID) of the speaker of the utterance. Text 214 is the transcribed content of the utterance.
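As a minimal sketch, the four items of the utterance information 210 could be held in a record type such as the following. The class name, the field names, and the choice of seconds as the time unit are assumptions made for illustration; the source only names the items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtteranceInfo:
    """One utterance record; field names are hypothetical."""
    utterance_id: int  # corresponds to utterance ID 211
    time: float        # utterance time 212 (seconds, an assumed unit)
    user_id: str       # speaking user ID 213
    text: str          # transcribed text 214


sample = UtteranceInfo(utterance_id=1, time=0.0, user_id="agent",
                       text="May I explain the fee?")
```

A frozen dataclass is used here only so that a stored record cannot be modified by accident; any other record representation would serve equally well.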

Following step S101, the utterance information input unit 111 refers to the past utterance information groups stored in the past utterance information storage unit 123 and checks whether utterance information on past utterances in the current dialogue scenario (past utterance information) exists (step S102).

FIG. 4 is a diagram showing an example of a past utterance information group. The past utterance information group 220 shown in FIG. 4 has the items utterance ID 221, utterance time 222, user ID 223, and text 224. Basically, the items of the past utterance information group 220 correspond to items 211 to 214 of the utterance information 210 illustrated in FIG. 3. Utterance ID 221 is an identifier from which the order of utterances within the dialogue scenario corresponding to the past utterance information group 220 can be determined. If utterance ID 211 of the utterance information 210 is an identifier assigned in utterance order regardless of dialogue scenario, then when the utterance information 210 is registered in the past utterance information group 220, the value of utterance ID 211 can be converted to a value corresponding to the utterance order within the dialogue scenario before being registered as utterance ID 221.
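The ID conversion described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that the scenario-independent IDs increase with utterance order, so that sorting them recovers the order within the scenario.

```python
def to_scenario_ids(global_ids):
    """Map scenario-independent utterance IDs to 1-based positions
    within a single dialogue scenario (hypothetical helper)."""
    ordered = sorted(global_ids)
    return {gid: pos for pos, gid in enumerate(ordered, start=1)}


# Three utterances of one scenario, with made-up global IDs.
mapping = to_scenario_ids([1042, 1040, 1045])
```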

If no past utterance information exists in step S102 (NO in step S102), the utterance information input in step S101 represents the first utterance in the current dialogue scenario. In this case, no dialogue has yet formed in the scenario, so there is nothing from which to extract dialogue information. The utterance information input unit 111 therefore stores the utterance information input in step S101 in the past utterance information storage unit 123 as the past utterance information group of a new dialogue scenario (step S110), and ends the dialogue information extraction process.

If, on the other hand, past utterance information exists in step S102 (YES in step S102), the utterance information input unit 111 inputs to the dialogue structure estimation unit 112 both the utterance information input in step S101 and the past utterance information group of the current dialogue scenario stored in the past utterance information storage unit 123 (step S103).
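The branch at step S102 can be sketched as below. All names are hypothetical: a dictionary stands in for the past utterance information storage unit 123, and `estimate_structure` stands in for the dialogue structure estimation unit 112. Appending the new utterance to the history on the YES path is an assumption of this sketch; the passage above does not state when that registration happens.

```python
past_store = {}  # scenario ID -> list of utterances (stand-in for unit 123)


def on_utterance(scenario_id, utterance, estimate_structure):
    """Handle one newly input utterance (step S101)."""
    history = past_store.get(scenario_id)
    if history is None:                          # step S102: NO
        past_store[scenario_id] = [utterance]    # step S110
        return None                              # nothing to extract yet
    # Step S103: pass the past utterances plus the new one to the estimator.
    result = estimate_structure(history + [utterance])
    history.append(utterance)  # assumption: keep the history up to date
    return result


# A trivial stand-in estimator that just counts its input utterances.
first = on_utterance("s1", "hello", len)
second = on_utterance("s1", "hi", len)
```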

The dialogue structure estimation unit 112 then estimates the dialogue structure of the current dialogue scenario based on the information input in step S103 (step S104), and stores the estimation result in the dialogue structure storage unit 124.

The estimation of the dialogue structure by the dialogue structure estimation unit 112 in step S104 is now described in detail.

Using an estimation model trained by a statistical method, the dialogue structure estimation unit 112 calculates the probability that two utterances by different speakers are related, and estimates the dialogue structure by estimating, from the calculated probabilities, which combinations of utterances have a dialogue relationship. The estimation model is stored in the dialogue structure estimation model storage unit 121.

The dialogue information extraction device 100 (for example, the dialogue structure estimation unit 112) can train the estimation model stored in the dialogue structure estimation model storage unit 121 using learning data annotated with dialogue structures. Specifically, the following inputs are prepared, for example, when training the estimation model. The utterance text (text 224) is vectorized with Word2Vec, GloVe, a pre-trained language model, or the like. The utterance time (utterance time 222) is used as a vector with its numerical value unchanged. The speaker (user ID 223) is vectorized as a one-hot vector.
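The three inputs listed above could be prepared roughly as follows. This is a sketch under assumptions: the text vector is taken as already computed by some embedding method, and concatenating the three parts into one vector is an illustrative choice that the source does not specify. The function names are hypothetical.

```python
def one_hot(user_id, speakers):
    """One-hot encoding of the speaker over a fixed speaker list."""
    return [1.0 if user_id == s else 0.0 for s in speakers]


def encode_utterance(text_vec, time, user_id, speakers):
    """Concatenate text vector, raw time value, and speaker one-hot.
    The concatenation itself is an assumption made for illustration."""
    return list(text_vec) + [float(time)] + one_hot(user_id, speakers)


speakers = ["agent", "customer"]
# text_vec would come from Word2Vec, GloVe, or a pre-trained language model;
# a two-dimensional placeholder is used here.
vec = encode_utterance([0.2, -0.1], 12.5, "customer", speakers)
```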

When the estimation model is trained with learning data such as the above, the feature vector that is input during training of, or estimation with, the model carries the following features: "the difference in utterance time between the two utterances subject to relation estimation", "the number of utterances by each speaker between the two utterances subject to relation estimation", and "the similarity between the utterance texts of the two utterances subject to relation estimation". The "two utterances subject to relation estimation" are a combination of one utterance and one utterance earlier than it; how far back past utterances are paired can be set arbitrarily. The "similarity of utterance texts" indicates, for example, whether semantically similar words or phrases appear in the utterance texts.
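The three pairwise features can be illustrated with hand-computed stand-ins. This sketch does not reproduce the trained model: the word-overlap (Jaccard) score is a simple substitute for semantic similarity, counting only the utterances strictly between the pair is an assumed reading of "between", and the function and field names are hypothetical.

```python
def pair_features(utterances, i, j, speakers):
    """Features for the pair (earlier utterance i, later utterance j), i < j."""
    earlier, later = utterances[i], utterances[j]
    # Difference in utterance time between the two utterances.
    time_gap = later["time"] - earlier["time"]
    # Number of utterances by each speaker strictly between the two.
    between = utterances[i + 1:j]
    counts = [sum(1 for u in between if u["user"] == s) for s in speakers]
    # Word-overlap similarity as a stand-in for semantic similarity.
    a = set(earlier["text"].lower().split())
    b = set(later["text"].lower().split())
    overlap = len(a & b) / len(a | b) if a | b else 0.0
    return [time_gap] + counts + [overlap]


dialogue = [
    {"time": 0.0, "user": "agent", "text": "the fee is two percent"},
    {"time": 4.0, "user": "agent", "text": "is that acceptable"},
    {"time": 6.0, "user": "customer", "text": "two percent is fine"},
]
features = pair_features(dialogue, 0, 2, ["agent", "customer"])
```

Here the pair shares three of six distinct words, one agent utterance lies between the two, and the time gap is six seconds.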

FIG. 5 is a diagram showing an example of a dialogue structure estimation result. The dialogue structure data 230 shown in FIG. 5 is an example of a dialogue structure estimated by the dialogue structure estimation unit 112, in which each combination of utterances having a dialogue relationship is represented by the utterance IDs of the utterances. The dialogue structure data 230 has one record per dialogue structure, and each record has the items utterance ID 231 of utterance A and utterance ID 232 of utterance B. The values of utterance IDs 231 and 232 correspond to values of utterance ID 221 in the past utterance information group 220 of FIG. 4.

In this description, when one utterance in a dialogue relates to another utterance, the former is called "utterance A" and the latter "utterance B". That is, utterance B occurred earlier than utterance A, and the speakers of utterance A and utterance B are different people.

In the dialogue structure estimation result, utterance A and utterance B need not be paired one-to-one. For example, in FIG. 5, utterance B with utterance ID "4" has a dialogue structure (that is, a dialogue relationship) with both utterance A with utterance ID "5" and utterance A with utterance ID "6".
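The one-to-many case above can be made concrete by holding the dialogue structure data as (utterance A ID, utterance B ID) records and grouping them by utterance B. The tuple representation and the function name are assumptions for illustration; only the IDs 4, 5, and 6 come from the example in the text.

```python
from collections import defaultdict

# (utterance A ID, utterance B ID) records, as in the FIG. 5 example.
pairs = [(5, 4), (6, 4)]


def replies_by_target(records):
    """Group the later utterances (A) under the earlier utterance (B)
    they relate to."""
    grouped = defaultdict(list)
    for a_id, b_id in records:
        grouped[b_id].append(a_id)
    return dict(grouped)


grouped = replies_by_target(pairs)
```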

After the dialogue structure estimation unit 112 has estimated the dialogue structure in step S104 as described above, the utterance set generation unit 114 of the viewpoint-based dialogue extraction unit 113 generates, based on the estimation result, utterance sets whose utterances have a strong dialogue relationship (step S105).

An example of how an utterance set is generated is as follows. For one utterance, the utterance set generation unit 114 extracts, from among the past utterances related to it with a probability at or above a chosen threshold, one utterance that satisfies a predetermined condition (for example, the closest utterance time), and treats the group formed by combining these utterances as an utterance set. As a rule of thumb, the predetermined condition selects the past utterance closest in utterance time, but it is not limited to this. The threshold is set by the user in the range 0 to 1 according to the purpose of extraction, with 0.5, for example, as a baseline. To raise recall (that is, to prevent missed extractions), the threshold can be set lower. Recall can also be raised by widening the range of utterances eligible for extraction into an utterance set.
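The generation rule described above (filter by threshold, then take the past utterance closest in time) can be sketched as follows. The function name, the dictionary fields, and the sample probabilities are hypothetical; the relation probabilities are assumed to have been produced already by the estimation model.

```python
def make_utterance_set(current, candidates, threshold=0.5):
    """candidates: list of (past_utterance, relation_probability).
    Returns [nearest related past utterance, current], or None."""
    related = [u for u, p in candidates if p >= threshold]
    if not related:
        return None
    # Predetermined condition used here: closest utterance time.
    nearest = min(related, key=lambda u: current["time"] - u["time"])
    return [nearest, current]


current = {"id": 4, "time": 10.0}
candidates = [
    ({"id": 1, "time": 2.0}, 0.9),
    ({"id": 2, "time": 7.0}, 0.6),
    ({"id": 3, "time": 9.0}, 0.3),
]
default_set = make_utterance_set(current, candidates)
low_threshold_set = make_utterance_set(current, candidates, threshold=0.2)
```

With the baseline threshold of 0.5, utterance 3 is excluded and utterance 2 is paired with the current utterance; lowering the threshold to 0.2 admits utterance 3, which is closer in time, illustrating how a lower threshold widens what can be extracted.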

As a modification of the utterance set generation method, the utterance set generation unit 114 may extract, for one utterance, a plurality of past utterances that are close in utterance time, so that the size of the utterance set is 2 or more. Alternatively, the plurality of utterances close in utterance time may be extracted from among the past utterances related with a probability at or above an arbitrary threshold.
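The generation rule described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Relation` type, the function name, the `max_past` parameter, and all probabilities and times are hypothetical, and "closest utterance time" is assumed to be the only selection condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical estimated relation: utterance A answers past utterance B."""
    a_id: int
    b_id: int
    probability: float


def generate_utterance_sets(relations, times, threshold=0.5, max_past=1):
    """For each utterance A, keep up to `max_past` related past utterances
    whose probability is at or above `threshold`, closest in time first."""
    by_a = {}
    for r in relations:
        # Only past utterances at or above the threshold are candidates.
        if r.probability >= threshold and times[r.b_id] < times[r.a_id]:
            by_a.setdefault(r.a_id, []).append(r)
    sets = []
    for a_id, cands in sorted(by_a.items()):
        cands.sort(key=lambda r: times[a_id] - times[r.b_id])
        chosen = [r.b_id for r in cands[:max_past]]
        # Each utterance set lists its utterance IDs in utterance-time order.
        sets.append(tuple(sorted(chosen + [a_id], key=lambda i: times[i])))
    return sets


# Illustrative values only: utterance 4 is related to utterances 5 and 6.
times = {4: 40.0, 5: 52.0, 6: 61.0}
relations = [Relation(5, 4, 0.9), Relation(6, 4, 0.7), Relation(6, 5, 0.3)]
sets_default = generate_utterance_sets(relations, times)
sets_loose = generate_utterance_sets(relations, times, threshold=0.2)
```

With `max_past=2`, the same function would cover the modification in which several past utterances are combined into one utterance set.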

Returning to FIG. 2, after generating the utterance sets in step S105, the utterance set generation unit 114 inputs the generated utterance sets to the viewpoint determination unit 115 (step S106).

Next, the viewpoint determination unit 115 determines the presence or absence of the viewpoint in each utterance set using a determination model trained by a statistical method (step S107). This determination model is stored in the viewpoint determination model storage unit 122.

An example of how the viewpoint determination unit 115 determines the presence or absence of the viewpoint in step S107 is described in detail. First, a pre-trained language model is used as the determination model. The viewpoint determination unit 115 treats the utterances of an utterance set, arranged in order of utterance time, together with their label (presence or absence of the viewpoint) as one sample, and trains the determination model by fine-tuning the pre-trained language model on the set of such samples. For example, when an utterance set consists of two utterances, the task can be treated as a sentence pair classification problem. When an utterance set consists of three or more utterances, it can likewise be treated as a classification problem.
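The construction of one training sample can be sketched as follows. The field names, the label values, and the example texts are assumptions made for illustration; the fine-tuning step itself depends on the chosen pre-trained language model and is not shown.

```python
def build_sample(utterance_set, label):
    """Order the utterances by utterance time and attach the viewpoint label."""
    ordered = sorted(utterance_set, key=lambda u: u["time"])
    return {"texts": [u["text"] for u in ordered], "label": label}


# Hypothetical two-utterance set, i.e. a sentence pair classification sample.
utterance_set = [
    {"time": 52.0, "text": "Yes, that is fine."},
    {"time": 40.0, "text": "May I proceed with the contract?"},
]
sample = build_sample(utterance_set, label="agree")
```

A set of three or more utterances yields a longer `texts` list with the same label field, so it can be fed to a classifier in the same way.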

Next, the viewpoint-based dialogue extraction unit 113 (for example, the viewpoint determination unit 115) decides, based on the determination result of step S107, that the utterance sets having the viewpoint are the extraction targets (step S108). The viewpoint-based dialogue extraction unit 113 then stores the dialogue information on the extraction target utterance sets in the extraction result storage unit 125.

As described above, in this embodiment "having a viewpoint" means that something meaningful is expressed with respect to that viewpoint. Therefore, when the viewpoint of interest is the customer's agreement or disagreement, not only utterance sets that include a customer utterance of agreement but also utterance sets that include a customer utterance of disagreement are determined to be utterance sets having the viewpoint.

FIG. 6 is a diagram showing an example of dialogue information on utterance sets having a viewpoint. The dialogue information 240 shown in FIG. 6 has one record for each utterance set determined to have a viewpoint, and each record consists of the following items: user ID 241 of utterance A, utterance ID 242 of utterance A, utterance A 243, user ID 244 of utterance B, utterance ID 245 of utterance B, utterance B 246, and viewpoint 247.

In the dialogue information 240, the user ID 241 of utterance A, the utterance ID 242 of utterance A, and the utterance A 243 are utterance information on "utterance A", the answering side of the utterance set. The utterance ID 242 of utterance A can be obtained from the utterance ID 231 of the dialogue structure data 230 (see FIG. 5). By referring to the past utterance information group 220 with this utterance ID as a key (see FIG. 4), the user ID 241 can be obtained from the user ID 223 and the utterance A 243 from the text 224. Likewise, the user ID 244 of utterance B, the utterance ID 245 of utterance B, and the utterance B 246 are utterance information on "utterance B", the questioning side of the utterance set. In the same way as for utterance A, the utterance ID 245 of utterance B can be obtained from the utterance ID 232 of the dialogue structure data 230, the user ID 244 from the user ID 223 of the past utterance information group 220, and the utterance B 246 from the text 224 of the past utterance information group 220. The viewpoint 247 indicates the content of the viewpoint that the utterance set has. The content of a viewpoint is, for example, "agree" or "disagree", and it is discriminated when the viewpoint determination unit 115 determines the presence or absence of the viewpoint.
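The lookup described above can be sketched as a join keyed on the utterance ID. The dictionary layout, the field names, and the example texts are hypothetical; only the items of the record follow the description of FIG. 6.

```python
# Hypothetical past utterance information, keyed by utterance ID.
past_utterances = {
    4: {"user_id": "User-A", "text": "May I send you the contract?"},
    5: {"user_id": "User-B", "text": "No, please do not send the contract yet."},
}


def build_record(a_id, b_id, viewpoint, past):
    """Assemble one dialogue information record for an utterance set.

    a_id is the answering-side utterance A, b_id the questioning-side
    utterance B; user IDs and texts are looked up by utterance ID.
    """
    a, b = past[a_id], past[b_id]
    return {
        "a_user_id": a["user_id"], "a_utterance_id": a_id, "a_text": a["text"],
        "b_user_id": b["user_id"], "b_utterance_id": b_id, "b_text": b["text"],
        "viewpoint": viewpoint,
    }


record = build_record(5, 4, "disagree", past_utterances)
```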

After step S108, the extraction result display unit 116 filters the extraction target dialogue information stored in the extraction result storage unit 125 using the filter conditions specified by the user, and outputs the dialogue information on the utterance sets that match the filter conditions as the extraction result (step S109).

FIG. 7 is a diagram showing a display example of the dialogue information extraction result. The display screen 250 shown in FIG. 7 is an example of a screen displayed on a terminal that the user can operate through a GUI or the like (it may be a display of the dialogue information extraction device 100), and it consists of an input screen 251 and an output screen 252. The input screen 251 accepts the filter conditions specified by the user and displays them. The user can freely specify filter conditions by keyword, speaking user, viewpoint, and so on, but the condition items are not limited to these. The output screen 252 displays the dialogue information extracted according to the filter conditions.

To explain concretely with reference to FIG. 7, on the input screen 251 "contract" is specified for the keyword search, "User-B" for the user search, and "disagree" for the extraction viewpoint. In this case, the extraction result display unit 116 searches the dialogue information 240 stored in the extraction result storage unit 125 (see FIG. 6) for records in which utterance A 243 or utterance B 246 contains the word "contract", user ID 241 or user ID 244 is "User-B", and viewpoint 247 is "disagree". The dialogue information extraction device 100 may accept the user's input on the input screen 251 at any time before step S109 in FIG. 2. Then, in step S109 of FIG. 2, the extraction result display unit 116 obtains, from the extraction target dialogue information stored in the extraction result storage unit 125, the contents of the records that match these search conditions (filter conditions) and displays them on the output screen 252.
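The search in this example can be sketched as a conjunction of three optional conditions. The record layout and the sample records are hypothetical; treating an unspecified condition as "match all" is an assumption of this sketch.

```python
def filter_records(records, keyword=None, user=None, viewpoint=None):
    """Return the records matching every filter condition that is specified."""
    hits = []
    for r in records:
        # Keyword: must appear in the text of utterance A or utterance B.
        if keyword is not None and keyword not in r["a_text"] and keyword not in r["b_text"]:
            continue
        # User: must be the speaker of utterance A or utterance B.
        if user is not None and user not in (r["a_user_id"], r["b_user_id"]):
            continue
        if viewpoint is not None and r["viewpoint"] != viewpoint:
            continue
        hits.append(r)
    return hits


# Illustrative records only.
records = [
    {"a_user_id": "User-B", "a_text": "No, not yet.",
     "b_user_id": "User-A", "b_text": "May I send the contract?",
     "viewpoint": "disagree"},
    {"a_user_id": "User-B", "a_text": "Yes, please.",
     "b_user_id": "User-A", "b_text": "Shall I explain the contract?",
     "viewpoint": "agree"},
]
hits = filter_records(records, keyword="contract", user="User-B", viewpoint="disagree")
```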

The output screen 252 produced in this way shows the dialogue content extracted according to the viewpoint the user focuses on, so the user can check the dialogue status efficiently by looking at the display screen 250.

The form in which the extraction result display unit 116 outputs the extraction result is not limited to the screen configuration of the display screen 250 in FIG. 7. For example, the text data of the utterances in the dialogue scenario may also be displayed in one area of the display screen 250 in the form of a real-time chat. This makes it easier to see in what kind of exchange the dialogue having the viewpoint of interest occurred.

After step S109, the utterance information input unit 111 adds the utterance information input in step S101 to the past utterance information group of the current dialogue scenario stored in the past utterance information storage unit 123 (step S110), and the dialogue information extraction process ends. Step S110, which registers the utterance information, may be executed at any time after step S101.

By executing the dialogue information extraction process as described above, the dialogue information extraction device 100 estimates the dialogue structure and makes the viewpoint determination on an utterance sequence from a plurality of speakers, extracts the utterance sets that include utterances having the viewpoint, and outputs the dialogue information on those utterance sets. The dialogue information extraction device 100 according to this embodiment can thus extract, from a dialogue among a plurality of speakers, information on the dialogue having the viewpoint to be focused on, which makes it possible to check the dialogue status efficiently.

In addition, by making the viewpoint determination in units of utterance sets, the dialogue information extraction device 100 can accurately discriminate not only the presence or absence of the viewpoint but also its content (for example, agreement or disagreement), which improves the granularity of the extracted dialogue information.

Furthermore, because the dialogue information extraction device 100 filters the extracted dialogue information with the filter conditions specified by the user before outputting it, the user can check the dialogue status more efficiently.

The dialogue information extraction device 100 can also operate online by executing the dialogue information extraction process shown in FIG. 2 each time an utterance occurs (each time text utterance information is input). If the screen display of the extraction result is updated each time an utterance occurs, the user can check the dialogue status (for example, the status of a viewpoint such as agreement or disagreement) in real time. Being able to extract information on a predetermined viewpoint while continuously monitoring the utterances online is useful for reducing risks such as failing to confirm consent.

(2) Second Embodiment
FIG. 8 is a block diagram showing a configuration example of a dialogue information extraction device 101 according to a second embodiment of the present invention. The dialogue information extraction device 101 is the dialogue information extraction device 100 of the first embodiment shown in FIG. 1 with a determination target selection unit 117 added. Description of the components shared with the dialogue information extraction device 100 is omitted.

The determination target selection unit 117 has a function of selecting, from the series of exchanges (a plurality of utterances) in a dialogue scenario, the utterances to be subjected to dialogue information extraction (determination target utterances), based on a preset determination condition (which may be specified by the user at any time).

As the determination condition, for example, a "start keyword utterance" for determining the start of the dialogue information extraction process and an "end keyword utterance" for determining its end are set. In this case, the determination target selection unit 117 searches the text data (utterance information) of the utterances input to the dialogue information extraction device 101 as the dialogue scenario progresses for the "start keyword utterance" and the "end keyword utterance", and selects as determination target utterances the utterances from the first utterance containing the start keyword utterance through the first utterance containing the end keyword utterance. As another determination condition, for example, when the system is configured so that one of the speakers (an operator or the like) presses a predetermined switch at the start of the dialogue section to be determined, this switch operation can be used as the determination condition.
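The keyword-based selection can be sketched as follows. The function name and the example dialogue are hypothetical, and two behaviors are assumptions of this sketch rather than statements of the embodiment: the end keyword is searched for only from the start utterance onward, and when no end keyword has appeared yet, all utterances from the start utterance are selected.

```python
def select_target_utterances(texts, start_keyword, end_keyword):
    """Select the utterances from the first one containing start_keyword
    through the first one (at or after it) containing end_keyword."""
    start = next((i for i, t in enumerate(texts) if start_keyword in t), None)
    if start is None:
        return []  # The start keyword utterance has not occurred.
    end = next(
        (i for i in range(start, len(texts)) if end_keyword in texts[i]),
        len(texts) - 1,  # Assumption: still inside the section if no end yet.
    )
    return texts[start:end + 1]


# Illustrative dialogue only.
dialogue = [
    "Hello.",
    "Let me explain the contract.",
    "I understand.",
    "That concludes the explanation.",
    "Goodbye.",
]
selected = select_target_utterances(
    dialogue, "explain the contract", "concludes the explanation"
)
```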

FIG. 9 is a flowchart showing an example of the dialogue information extraction process in the second embodiment. In FIG. 9, steps that are the same as in the dialogue information extraction process of the first embodiment shown in FIG. 2 are given the same reference signs. The process shown in FIG. 9 is described below, focusing on the differences from the process of FIG. 2.

According to FIG. 9, the utterance information input unit 111 first inputs utterance information (step S101). Next, the determination target selection unit 117 determines, based on the predetermined determination condition, whether the utterance information input in step S101 is a determination target utterance (step S201).

If the utterance is determined in step S201 not to be a determination target utterance (NO in step S201), no dialogue information needs to be extracted, so the dialogue information extraction process ends without any further processing.

On the other hand, if the utterance is determined in step S201 to be a determination target utterance (YES in step S201), the utterance information input unit 111 refers to the past utterance information group of the current dialogue scenario stored in the past utterance information storage unit 123 and checks whether past utterance information satisfying the determination condition for determination target utterances exists (step S202). In the second embodiment, the past utterance information storage unit 123 stores only past utterance information that satisfies that determination condition. In practice, therefore, in step S202 the utterance information input unit 111 only needs to check whether any past utterance information exists in the past utterance information group of the current dialogue scenario.

If past utterance information satisfying the determination condition exists in step S202 (YES in step S202), steps S103 to S108 are performed, whereby the utterance sets including utterances having the viewpoint are decided as the extraction targets and the dialogue information on those utterance sets is stored in the extraction result storage unit 125. Next, the extraction result display unit 116 outputs, as the extraction result, the dialogue information that matches the filter conditions specified by the user from among the dialogue information stored in the extraction result storage unit 125 (step S109). Finally, the utterance information input unit 111 adds the utterance information input in step S101 to the past utterance information group of the current dialogue scenario stored in the past utterance information storage unit 123 (step S110), and the dialogue information extraction process ends.

On the other hand, if no past utterance information satisfying the determination condition exists in step S202 (NO in step S202), no dialogue from which dialogue information could be extracted has yet been formed in the dialogue scenario. The utterance information input unit 111 therefore stores the utterance information input in step S101 in the past utterance information storage unit 123 as part of the past utterance information group of the corresponding dialogue scenario (step S110), and the dialogue information extraction process ends. If the utterance information input in step S101 relates to the first utterance of a new dialogue scenario, the utterance information input unit 111 stores it in the past utterance information storage unit 123 as the past utterance information group of the new dialogue scenario.
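The branching of FIG. 9 for a single input utterance can be sketched as follows. The function and parameter names are hypothetical, and `extract` stands in for steps S103 to S109 as a caller-supplied function; this is an illustration of the control flow only.

```python
def process_utterance(utterance, is_target, history, extract):
    """One pass of the second-embodiment flow for a newly input utterance (S101)."""
    if not is_target(utterance):   # S201: NO, nothing to extract
        return None
    result = None
    if history:                    # S202: YES, past target utterances exist
        result = extract(history, utterance)  # stands in for S103 to S109
    history.append(utterance)      # S110: register as past utterance information
    return result


# Illustrative run: utterances starting with "#" are treated as non-targets,
# and the stand-in extractor reports how many past utterances it saw.
history = []
outputs = [
    process_utterance(u, lambda u: not u.startswith("#"), history,
                      lambda h, u: (len(h), u))
    for u in ["#greeting", "question", "answer"]
]
```

Because only determination target utterances reach the `history.append` call, the stored history stays limited to them, which matches the statement that the past utterance information storage unit 123 holds only such utterances.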

By executing the dialogue information extraction process as described above, the dialogue information extraction device 101 according to the second embodiment extracts dialogue information having the viewpoint after limiting the determination target utterances. Compared with the first embodiment, this reduces the processing load on the processing units (for example, the dialogue structure estimation unit 112 and the viewpoint-based dialogue extraction unit 113) and the capacity consumed in the storage units (for example, the past utterance information storage unit 123 and the dialogue structure storage unit 124). In addition, appropriately setting determination conditions such as the "start keyword utterance" and "end keyword utterance" excludes utterances unrelated to the viewpoint, which further improves the granularity of the extracted dialogue information.

(3) Hardware Configuration
Finally, an example of the hardware configuration of the dialogue information extraction devices 100 and 101 according to the first and second embodiments is described.

FIG. 10 is a block diagram showing an example of the hardware configuration of the dialogue information extraction devices 100 and 101. As shown in FIG. 10, the dialogue information extraction devices 100 and 101 each include a CPU 11, a ROM 12, a RAM 13, an input/output interface 14, a storage device 15, a drive device 16, and a communication interface 17. The storage device 15 stores the various data and the program 20 used by the dialogue information extraction devices 100 and 101. The CPU 11 reads the program 20 into the RAM 13 and executes it, thereby realizing the processing units of the dialogue information extraction devices 100 and 101 (utterance information input unit 111, dialogue structure estimation unit 112, viewpoint-based dialogue extraction unit 113, extraction result display unit 116, and determination target selection unit 117).

11 CPU
12 ROM
13 RAM
14 Input/output interface
15 Storage device
16 Drive device
17 Communication interface
20 Program
100, 101 Dialogue information extraction device
111 Utterance information input unit
112 Dialogue structure estimation unit
113 Viewpoint-based dialogue extraction unit
114 Utterance set generation unit
115 Viewpoint determination unit
116 Extraction result display unit
117 Determination target selection unit
121 Dialogue structure estimation model storage unit
122 Viewpoint determination model storage unit
123 Past utterance information storage unit
124 Dialogue structure storage unit
125 Extraction result storage unit
210 Utterance information
220 Past utterance information group
230 Dialogue structure data
240 Dialogue information
250 Display screen

Claims

A dialogue information extraction device that analyzes a dialogue among a plurality of speakers based on a viewpoint to be focused on, the device comprising:
an utterance information input unit that inputs, for each utterance constituting the dialogue, utterance information including text transcribed from the utterance;
a dialogue structure estimation unit that estimates a dialogue structure of the dialogue from the utterance information of a plurality of utterances in the dialogue, using an estimation model that has learned dialogue structures; and
a viewpoint-based dialogue extraction unit that extracts an utterance group having the viewpoint by determining the presence or absence of the viewpoint in utterance groups that are generated based on the dialogue structure and that each consist of utterances of a plurality of speakers.

The dialogue information extraction device according to claim 1, further comprising an extraction result display unit that outputs, for the utterance group extracted by the viewpoint-based dialogue extraction unit, dialogue information including the text of each utterance constituting the utterance group and the content of the viewpoint that the utterance group has.

The dialogue information extraction device according to claim 1, wherein the dialogue structure estimation unit estimates the dialogue structure by using the estimation model to calculate, for the plurality of utterances in the dialogue, a degree of relationship between two utterances by different speakers, and by estimating, based on the calculation result, combinations of utterances that have a dialogue relationship.

The dialogue information extraction device according to claim 3, wherein the viewpoint-based dialogue extraction unit comprises:
an utterance set generation unit that generates utterance sets, as the utterance groups consisting of utterances of the plurality of speakers, by narrowing down the combinations of utterances estimated by the dialogue structure estimation unit under a predetermined condition; and
a viewpoint determination unit that determines, for each of the utterance sets, the presence or absence of the viewpoint in the utterance set using a previously trained determination model, and extracts the utterance sets having the viewpoint based on the determination result.

The dialogue information extraction device according to claim 4, wherein the utterance set generation unit generates, as the utterance sets, those combinations of utterances estimated by the dialogue structure estimation unit that have a dialogue relationship at or above an arbitrarily specified probability value.

The dialogue information extraction device according to claim 4, wherein the utterance set generation unit generates, as the utterance sets, a predetermined number of combinations of an answering-side utterance and a questioning-side utterance made before it, selected from among the combinations of utterances estimated by the dialogue structure estimation unit in ascending order of time difference.

The dialogue information extraction device according to claim 2, wherein the extraction result display unit extracts, from among the utterance groups having the viewpoint extracted by the viewpoint-based dialogue extraction unit, the utterance groups that match a filter condition specified by a user, and outputs the dialogue information on the extracted utterance groups.

The dialogue information extraction device according to claim 2, wherein each time an utterance is made as the dialogue progresses, the utterance information input unit inputs the utterance information of that utterance, and
in response to the input of the utterance information, the dialogue structure estimation unit and the viewpoint-based dialogue extraction unit perform their processing, and the extraction result display unit updates the output of the dialogue information based on the results of the processing.

The dialogue information extraction device according to claim 1, further comprising a determination target selection unit that determines, based on a predetermined determination condition, whether utterance information to be input by the utterance information input unit is to be processed by the dialogue structure estimation unit and the viewpoint-based dialogue extraction unit.

The dialogue information extraction device according to claim 1, wherein the utterance of the answering side indicating agreement or disagreement with the utterance of the questioner is regarded as having the viewpoint.

A dialogue information extraction method performed by a dialogue information extraction device that analyzes a dialogue among a plurality of speakers based on a viewpoint to be focused on, the method comprising:
an utterance information input step in which the dialogue information extraction device inputs, for each utterance constituting the dialogue, utterance information including text transcribed from the utterance;
a dialogue structure estimation step in which the dialogue information extraction device estimates a dialogue structure of the dialogue from the utterance information of a plurality of utterances in the dialogue input in the utterance information input step, using an estimation model that has learned dialogue structures; and
a viewpoint-based dialogue extraction step in which the dialogue information extraction device extracts an utterance group having the viewpoint by determining the presence or absence of the viewpoint in utterance groups that are generated based on the estimation result of the dialogue structure estimation step and that each consist of utterances of a plurality of speakers.