JP2016139299A

JP2016139299A - Information processing system, information processing method, and program

Info

Publication number: JP2016139299A
Application number: JP2015014212A
Authority: JP
Inventors: 貴士大西; Takashi Onishi; 康高山本; Yasutaka Yamamoto
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2015-01-28
Filing date: 2015-01-28
Publication date: 2016-08-04
Anticipated expiration: 2035-01-28
Also published as: JP6492698B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of implication clustering for conversation texts. SOLUTION: A representative candidate extraction part 22 of a clustering system 1 extracts, as a representative candidate, a section that is highly likely to state prescribed contents explicitly, from among the sections related to the prescribed contents in each of one or more conversation texts. A member candidate extraction part 23 extracts, as a member candidate, a section that contains a section related to the prescribed contents and is larger than that section in each of the one or more conversation texts. A partial text output part 24 outputs the extracted representative candidate and member candidate as partial texts from which an implication relation should be extracted. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing system, an information processing method, and a program.

Call centers receive customer opinions, such as complaints and requests, about various products and services. It is important for companies to improve their services and inform product development based on these opinions. Tallying customer opinions by listening to the audio recorded at the call center is costly. Having operators summarize and enter customer opinions increases their workload and may make the entered content inconsistent or incomplete. It is therefore desirable to extract, summarize, and tally opinions from conversation text generated by speech recognition of the recorded audio.

As a technique that can be used to tally opinions contained in text, Non-Patent Document 1, for example, discloses an implication clustering technique that extracts implication relationships between texts and classifies texts having an implication relationship into the same group. An implication relationship is a semantic relationship between texts: when the content of a second text can be read from the content of a first text, the first text is defined to imply the second text. In the implication clustering technique, a text that the texts in a group commonly imply is extracted as the representative sentence. Using such a technique makes it possible to extract the viewpoints of the topics contained in the text exhaustively and clearly.

Japanese Patent No. 5387870

"NEC develops technology to automatically group large amounts of document data by meaning", [online], NEC Corporation, [retrieved January 8, 2015], Internet <URL:http://jpn.nec.com/press/201411/20141118_02.html>

Conversation text is not structured as a document and contains redundant parts such as greetings, filler words, and content other than the opinions to be clustered. Unless such redundant parts are removed, implication relationships may not be extracted correctly from conversation text.

In addition, conversation text is divided, for example, at the silent sections detected by speech recognition, so sentences are generated in units that differ from grammatical sentences. Consequently, if implication clustering is performed sentence by sentence, groups form around sentences that are too short, and opinions may not be tallied meaningfully.

Furthermore, in conversation text the subject and predicate can become separated in the flow of the conversation, so an opinion may not be expressed accurately in a single sentence. In this case, if only the part of the sentences in which the opinion is accurately expressed is extracted as the target of implication clustering, opinions may be missed in the tally. Conversely, if a large section spanning several sentences is extracted as the target, the redundant parts described above are included, and implication relationships may fail to be extracted.

Thus, applying the implication clustering technique to conversation text has the technical problem that clustering accuracy decreases.

An object of the present invention is to provide an information processing system, an information processing method, and a program that solve the above technical problem and can improve the accuracy of implication clustering for conversation text.

As a technical means for solving the above technical problem, the information processing system of the present invention comprises: implied candidate extraction means for extracting, from each of one or more texts, an implied candidate text, which is a partial text highly likely to be implied by another partial text; implying candidate extraction means for extracting, from each of the one or more texts, an implying candidate text, which is a partial text highly likely to imply another partial text; and output means for outputting the extracted implied candidate text and implying candidate text as a plurality of partial texts from which an implication relationship is to be extracted.

The information processing method of the present invention extracts, in each of one or more texts, a section that is highly likely to state predetermined content explicitly, from among the sections related to the predetermined content, as an implied candidate text; extracts, in each of the one or more texts, a section that contains a section related to the predetermined content and is larger than that section, as an implying candidate text; and outputs the extracted implied candidate text and implying candidate text as partial texts from which an implication relationship is to be extracted.

The program of the present invention causes a computer to execute processing of: extracting, in each of one or more texts, a section that is highly likely to state predetermined content explicitly, from among the sections related to the predetermined content, as an implied candidate text; extracting, in each of the one or more texts, a section that contains a section related to the predetermined content and is larger than that section, as an implying candidate text; and outputting the extracted implied candidate text and implying candidate text as partial texts from which an implication relationship is to be extracted.

The technical effect of the present invention is that the accuracy of implication clustering for conversation text can be improved.

FIG. 1 is a block diagram showing the basic configuration of the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the clustering system 1 in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the clustering system 1 realized by a computer in the first embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the clustering system 1 in the first embodiment of the present invention.
FIG. 5 is a diagram showing an example of the conversation text 81 in the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of partial text extraction in the first embodiment of the present invention.
FIG. 7 is a diagram showing the pairs of partial texts subjected to the implication determination process and the extraction results in the first embodiment of the present invention.
FIG. 8 is a diagram showing the generation result of the groups 84 in the first embodiment of the present invention.
FIG. 9 is a diagram showing the pairs of partial texts subjected to the implication determination process and the extraction results in the second embodiment of the present invention.
FIG. 10 is a diagram showing another example of the pairs of partial texts subjected to the implication determination process and the extraction results in the second embodiment of the present invention.

(First embodiment)
A first embodiment of the present invention will be described.

The first embodiment of the present invention is described using an example in which implication clustering of defects that occurred in a product is performed based on conversation text 81 from a call center.

In the first embodiment of the present invention, the implication relationship is defined in the same way as in Patent Document 1. That is, when the content of a second text can be read from the content of a first text, the first text is defined to imply the second text. Alternatively, the first text may be defined to imply the second text when the content of the second text is true whenever the content of the first text is true.

First, the configuration of the first embodiment of the present invention will be described.

FIG. 2 is a block diagram showing the configuration of the clustering system 1 in the first embodiment of the present invention.

Referring to FIG. 2, the clustering system 1 in the first embodiment of the present invention includes a conversation text storage unit 10, a partial text extraction unit 20, a partial text storage unit 30, an implication relationship extraction unit 40, and a group generation unit 50. The clustering system 1 is one embodiment of the information processing system of the present invention.

The conversation text storage unit 10 stores one or more conversation texts 81 (or simply texts).

The partial text extraction unit 20 extracts, from the conversation text 81, a plurality of partial texts that are the targets of clustering (extraction of implication relationships and generation of groups). As these partial texts, the partial text extraction unit 20 extracts representative candidates 82 (or implied candidate texts) and member candidates 83 (or implying candidate texts), which are described later.

The partial text storage unit 30 stores the partial texts (representative candidates 82 and member candidates 83) extracted by the partial text extraction unit 20.

The implication relationship extraction unit 40 extracts implication relationships between the partial texts stored in the partial text storage unit 30.

Based on the implication relationships between the partial texts extracted by the implication relationship extraction unit 40, the group generation unit 50 generates a group in which one of the partial texts is the representative text and the other partial texts that imply it are the members. The representative text is a text that represents the group, that is, one from which an overview of the group can be grasped.
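The grouping described above can be sketched as follows. This is only a minimal illustration under assumptions: the function name `build_groups` and the input format (a list of `(implying, implied)` identifier pairs) are hypothetical, and the sketch simply treats every implied text as a representative, since this passage does not state how representatives are selected among candidates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_groups(implications: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group partial texts by the text they imply.

    Each input pair is (implying_id, implied_id). The implied text is
    treated as the representative text and the implying text as a member.
    (Assumption: every implied text becomes a representative.)
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for implying, implied in implications:
        groups[implied].append(implying)
    return dict(groups)
```

For example, `build_groups([("m1", "r1"), ("m2", "r1")])` yields one group whose representative is `r1` and whose members are `m1` and `m2`.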

The partial text extraction unit 20 described above includes an utterance section extraction unit 21, a representative candidate extraction unit 22 (or implied candidate extraction unit), a member candidate extraction unit 23 (or implying candidate extraction unit), and a partial text output unit 24 (or simply output unit).

The utterance section extraction unit 21 divides each conversation text 81 into a plurality of utterance sections 91 (or simply sections) and extracts target sections 92 from them. A target section 92 is an utterance section 91 that contains at least part of the topic or content to be clustered (the predetermined content), that is, a section related to the predetermined content. When call center conversation text 81 about a product is clustered, the predetermined content is, for example, a phenomenon such as a defect that occurred in the product, its cause or countermeasure, or an opinion about the product such as a request, complaint, or evaluation.

From among the target sections 92 extracted by the utterance section extraction unit 21, the representative candidate extraction unit 22 extracts, as representative candidates 82, the target sections 92 that are highly likely to state the content to be clustered explicitly (accurately and concisely).

The member candidate extraction unit 23 extracts, as a member candidate 83, a section that contains a target section 92 extracted by the utterance section extraction unit 21 and is larger than that target section 92.

The partial text output unit 24 outputs the extracted representative candidates 82 and member candidates 83 as partial texts, which are the units on which clustering is performed.

As described above, a representative candidate 82 is a target section 92 that is highly likely to state the content to be clustered explicitly. A member candidate 83, on the other hand, is a section that contains a target section 92, which includes at least part of the content to be clustered, and is larger than that target section 92. A representative candidate 82 is therefore likely to express the content to be clustered accurately and concisely in fewer words than a member candidate 83. A member candidate 83, although it contains redundant parts that are not the content to be clustered, is likely to express that content. Consequently, when implication clustering is performed with the representative candidates 82 and member candidates 83 as the clustering units, it is likely to generate groups in which a representative candidate 82 is the representative text and the member candidates 83 that imply it are the members. In other words, implication clustering can generate appropriate groups, each consisting of a representative text that states the content to be clustered explicitly and members that imply that representative text.

The clustering system 1 may be a computer that includes a CPU (Central Processing Unit) and a storage medium storing a program, and that operates under control based on the program.

FIG. 3 is a block diagram showing the configuration of the clustering system 1 realized by a computer in the first embodiment of the present invention.

The clustering system 1 includes a CPU 2, a storage device (storage medium) 3 such as a hard disk or memory, a communication device 4 that communicates with other devices, an input device 5 such as a mouse or keyboard, and an output device 6 such as a display.

The CPU 2 executes a computer program for realizing the functions of the partial text extraction unit 20, the implication relationship extraction unit 40, and the group generation unit 50. The storage device 3 stores the data of the conversation text storage unit 10 and the partial text storage unit 30. The input device 5 receives input of the conversation text 81 from a user or the like. The output device 6 outputs the extracted partial texts and implication relationships and the generated groups to the user or the like. The communication device 4 may also receive the conversation text 81 from another device and transmit the partial texts, implication relationships, and groups to another device.

The clustering system 1 may also be configured by distributing the components shown in FIG. 2 across a plurality of physical devices connected by wire or wirelessly.

Next, the operation of the first embodiment of the present invention will be described.

FIG. 5 is a diagram showing an example of the conversation text 81 in the first embodiment of the present invention. The conversation text 81 is generated, for example, by speech recognition of audio data recorded at a call center. In FIG. 5, the labels "CU:" and "OP:" attached to conversation texts 81a, 81b, ... indicate utterances by the customer and the operator, respectively.

Here, it is assumed that conversation texts 81 such as those in FIG. 5 are stored in the conversation text storage unit 10. The conversation text storage unit 10 may store the conversation text 81 of each conversation in association with the audio data of that conversation.

FIG. 4 is a flowchart showing the operation of the clustering system 1 in the first embodiment of the present invention.

First, the utterance section extraction unit 21 of the partial text extraction unit 20 divides each conversation text 81 stored in the conversation text storage unit 10 into a plurality of utterance sections 91 (step S101).

Here, the utterance section extraction unit 21 divides the conversation text 81, for example, at each change of speaker. In this case, the utterance section extraction unit 21 may detect a change of speaker based on the sound detected by a microphone provided for each speaker. It may also detect a change of speaker by recognizing the speaker from the audio data.

The utterance section extraction unit 21 may also divide the conversation text 81 at silent sections of at least a predetermined length.

The utterance section extraction unit 21 may also divide the conversation text 81 into grammatical sentences according to the language.

The utterance section extraction unit 21 may also divide the conversation text 81 based on preset division rules. In this case, the division rules specify, for example, expressions or words that appear at the beginning or end of an utterance, and the utterance section extraction unit 21 divides the conversation text 81 at those expressions or words. The division rules may be rules learned by machine learning.

The utterance section extraction unit 21 may exclude from the extracted utterance sections 91 any utterance section 91 that contains only greetings or filler words.

FIG. 6 is a diagram showing an example of partial text extraction processing in the first embodiment of the present invention.

For example, as shown in FIG. 6, the utterance section extraction unit 21 divides the conversation text 81a of FIG. 5 into utterance sections 91a1, 91a2, ... according to the speaker (operator or customer). Similarly, the utterance section extraction unit 21 divides the conversation text 81b into utterance sections 91b1, 91b2, ....
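Division at each change of speaker (step S101) could be sketched as below. This is an illustrative sketch only: it assumes the transcript is a single string in which every turn begins with the "CU:" or "OP:" label mentioned for FIG. 5, and the names `Utterance` and `split_by_speaker` are hypothetical. Detecting speaker changes from microphones or audio data, as the text also allows, is outside its scope.

```python
import re
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str  # "CU" (customer) or "OP" (operator)
    text: str


# Assumed format: each turn starts with a "CU:" or "OP:" label.
_TURN_LABEL = re.compile(r"\b(CU|OP):")


def split_by_speaker(conversation: str) -> List[Utterance]:
    """Split a labelled transcript into one utterance section per turn."""
    # With a capturing group, re.split returns
    # [prefix, label1, text1, label2, text2, ...].
    parts = _TURN_LABEL.split(conversation)
    sections = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # drop empty turns
            sections.append(Utterance(speaker, text))
    return sections
```

The example transcript used to exercise this sketch is invented for illustration and is not the content of FIG. 5.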

The utterance section extraction unit 21 extracts target sections 92 from the utterance sections 91 (step S102).

Here, the utterance section extraction unit 21 may extract the target sections 92 based on, for example, preset target extraction rules. In this case, the target extraction rules specify, for example, expressions or words used to express the predetermined content to be clustered, and the utterance section extraction unit 21 extracts the utterance sections 91 that contain those expressions or words as target sections 92. The target extraction rules may be rules learned by machine learning.

For example, assume that the content to be clustered is product defects and that words expressing defects, such as "does not move" and "freeze", are set in the target extraction rules. In this case, as shown in FIG. 6, the utterance section extraction unit 21 extracts, for conversation text 81a, the utterance section 91a5 containing the words "does not move" as target section 92a1. Similarly, for conversation text 81b, it extracts the utterance sections 91b4 and 91b6 containing the word "freeze" as target sections 92b1 and 92b2.
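A keyword-based target extraction rule of the kind described for step S102 might look like the following sketch. The function name `extract_target_indices`, the case-insensitive substring matching, and the English keywords standing in for the Japanese rule words are all assumptions made for illustration.

```python
from typing import List, Sequence


def extract_target_indices(sections: Sequence[str],
                           keywords: Sequence[str]) -> List[int]:
    """Return the indices of utterance sections containing any rule keyword.

    Matching is a plain case-insensitive substring test (an assumption;
    the source only says the sections "contain" the expressions or words).
    """
    lowered = [k.lower() for k in keywords]
    return [
        i for i, section in enumerate(sections)
        if any(k in section.lower() for k in lowered)
    ]
```

With hypothetical sections such as `["Thank you for calling.", "The PC will freeze when I open mail.", "I see.", "It does not move at all."]` and the keywords `["does not move", "freeze"]`, the second and fourth sections are selected as target sections.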

The representative candidate extraction unit 22 extracts representative candidates 82 from the target sections 92 extracted by the utterance section extraction unit 21 (step S103).

Here, the representative candidate extraction unit 22 extracts, as representative candidates 82, the target sections 92 that are highly likely to state the predetermined content to be clustered explicitly.

As a target section 92 that is highly likely to state the predetermined content explicitly, the representative candidate extraction unit 22 extracts, for example, a target section 92 that contains both a subject and a predicate. In this case, the representative candidate extraction unit 22 may exclude from the extracted representative candidates 82 any candidate whose subject or predicate contains a demonstrative. The representative candidate extraction unit 22 may also extract, as a representative candidate 82, a target section 92 that contains at least a predetermined number of nominals and declinable words.

The representative candidate extraction unit 22 may also extract representative candidates 82 based on preset representative candidate extraction rules. In this case, the representative candidate extraction rules specify, for example, sentences or expressions that clearly express the predetermined content to be clustered, and the representative candidate extraction unit 22 extracts the target sections 92 that contain those sentences or expressions as representative candidates 82. The representative candidate extraction rules may be rules learned by machine learning.

For example, when target sections 92 containing a subject and a predicate are extracted as representative candidates 82, the target section 92a1 of conversation text 81a contains no subject, as shown in FIG. 6. The representative candidate extraction unit 22 therefore extracts no representative candidate 82 for conversation text 81a. The target section 92b1 of conversation text 81b, on the other hand, contains the subject "PC" and the predicate "freeze". The representative candidate extraction unit 22 therefore extracts target section 92b1 as representative candidate 82_1. Similarly, it extracts target section 92b2 as representative candidate 82_2.
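The subject-and-predicate criterion of step S103 can be illustrated with the sketch below. It assumes that some upstream syntactic analysis (not specified in the source) has already annotated each target section with its subject and predicate; the `TargetSection` type, the `DEMONSTRATIVES` word list, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetSection:
    text: str
    subject: Optional[str]    # None when no subject was found (assumed parser output)
    predicate: Optional[str]  # None when no predicate was found


# Illustrative English stand-ins for demonstratives.
DEMONSTRATIVES = {"it", "this", "that", "these", "those"}


def extract_representatives(targets: List[TargetSection]) -> List[TargetSection]:
    """Keep target sections that have both a subject and a predicate,
    excluding those whose subject or predicate is a demonstrative."""
    representatives = []
    for t in targets:
        if t.subject is None or t.predicate is None:
            continue  # the content is unlikely to be stated explicitly
        if (t.subject.lower() in DEMONSTRATIVES
                or t.predicate.lower() in DEMONSTRATIVES):
            continue  # optional exclusion described in the text
        representatives.append(t)
    return representatives
```

Under this sketch, a section with no subject (like target section 92a1 in the example above) is dropped, while a section with the subject "PC" and a predicate meaning "freeze" is kept.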

The member candidate extraction unit 23 extracts member candidates 83 from the utterance sections 91 extracted by the utterance section extraction unit 21 (step S104).

Here, the member candidate extraction unit 23 extracts, for example, a plurality of utterance sections 91 that include a target section 92 as a member candidate 83. In this case, it may extract the range from the utterance section 91 a predetermined number of sections before the target section 92 to the utterance section 91 a predetermined number of sections after it. Alternatively, it may extract the range from the utterance section 91 a predetermined time before the target section 92 to the utterance section 91 a predetermined time after it. When two different member candidates 83 overlap or are contiguous, the member candidate extraction unit 23 may merge them.

The member candidate extraction unit 23 may also extract member candidates 83 based on preset member candidate extraction rules. In this case, the member candidate extraction rules specify sentences, expressions, or words that appear at the beginning or end of a topic in a conversation, and the member candidate extraction unit 23 extracts, as a member candidate 83, a series of utterance sections 91 that includes the target section 92 and is delimited by those sentences, expressions, or words. The member candidate extraction rules may be rules learned by machine learning.

For example, when the range from the utterance section 91 immediately before a target section 92 to the one immediately after it is extracted as a member candidate 83, the member candidate extraction unit 23 extracts utterance sections 91a4 to 91a6, which include target section 92a1, as member candidate 83_1, as shown in FIG. 6. It also extracts utterance sections 91b3 to 91b7, which include target sections 92b1 and 92b2, as member candidate 83_2.
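The window-and-merge variant of step S104 can be sketched as follows. The function name `extract_member_spans` and the zero-based, inclusive `(start, end)` index representation are assumptions; the window sizes default to one section before and after, matching the example above.

```python
from typing import List, Sequence, Tuple


def extract_member_spans(num_sections: int,
                         target_indices: Sequence[int],
                         before: int = 1,
                         after: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index spans of member candidates.

    Each target section is widened by `before`/`after` sections, clipped
    to the conversation, and spans that overlap or are contiguous are merged.
    """
    spans: List[Tuple[int, int]] = []
    for i in sorted(target_indices):
        start = max(0, i - before)
        end = min(num_sections - 1, i + after)
        if spans and start <= spans[-1][1] + 1:
            # Overlapping or contiguous with the previous span: merge.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans
```

With zero-based indices, target sections 91b4 and 91b6 correspond to indices 3 and 5; their windows (2, 4) and (4, 6) overlap and merge into (2, 6), that is, sections 91b3 to 91b7, in line with member candidate 83_2 in the example. The total number of sections passed in is a hypothetical value.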

The partial text output unit 24 outputs the extracted representative candidates 82 and member candidates 83 as partial texts, which are the units on which clustering is performed, and stores them in the partial text storage unit 30 (step S105).

For example, the partial text output unit 24 stores the representative candidates 82_1 and 82_2 and the member candidates 83_1 and 83_2, extracted from the conversation texts 81a and 81b, in the partial text storage unit 30 as partial texts.

Next, the implication relationship extraction unit 40 extracts the implication relationships between the partial texts stored in the partial text storage unit 30 (step S106). Here, the implication relationship extraction unit 40 extracts the implication relationships between the partial texts by performing, for example, the same determination process as in Patent Document 1. That is, the implication relationship extraction unit 40 determines the presence or absence of an implication relationship by comparing the content words included in the partial texts and calculating a coverage rate. For every pair of partial texts stored in the partial text storage unit 30, the implication relationship extraction unit 40 performs the determination process in both directions: the direction in which one partial text implies the other, and the direction in which the other implies the first. Note that the implication relationship extraction unit 40 may use a determination process different from that of Patent Document 1, as long as it can extract the implication relationships between the partial texts.
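A coverage-based determination over all ordered pairs might look like the following sketch. It does not reproduce the process of Patent Document 1, whose details are not given here. The coverage formula (the fraction of the implied text's content words that also occur in the implying text), the threshold of 0.8, and the pre-extracted content-word sets are all assumptions made for illustration.

```python
from itertools import permutations


def coverage(implying_words, implied_words):
    """Fraction of the implied text's content words found in the implying text."""
    implied = set(implied_words)
    if not implied:
        return 0.0
    return len(implied & set(implying_words)) / len(implied)


def extract_implications(partial_texts, threshold=0.8):
    """Judge every ordered pair (a, b) and keep those where a implies b."""
    relations = set()
    for a, b in permutations(partial_texts, 2):
        if coverage(partial_texts[a], partial_texts[b]) >= threshold:
            relations.add((a, b))
    return relations


# Hypothetical content words for three partial texts.
partial_texts = {
    "82_1": {"screen", "black"},
    "83_1": {"screen", "black", "startup", "after"},
    "82_2": {"power", "off"},
}
relations = extract_implications(partial_texts)
```

Under these assumptions a longer section tends to cover a shorter one but not the reverse, which is why the member candidate 83_1 is judged to imply the representative candidate 82_1 in this toy data.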

FIG. 7 is a diagram showing the pairs of partial texts subjected to the implication relationship determination process, and the extraction results, in the first embodiment of the present invention. In FIG. 7, the arrows (thick and thin lines) indicate the pairs of partial texts and the directions for which the determination process is performed. The determination process is performed for the direction in which the partial text at the tail of an arrow implies the partial text at its head. A thick line indicates that the determination process found an implication relationship; a thin line indicates that it found none.

For example, as shown in FIG. 7, for the pair of the representative candidate 82_1 and the member candidate 83_1, the implication relationship extraction unit 40 performs the determination process for the direction in which the member candidate 83_1 implies the representative candidate 82_1 and for the direction in which the representative candidate 82_1 implies the member candidate 83_1. The implication relationship extraction unit 40 then determines that there is an implication relationship in the direction in which the member candidate 83_1 implies the representative candidate 82_1. The determination process is performed in the same way for the other pairs (representative candidate 82_1 and member candidate 83_2, representative candidate 82_2 and member candidate 83_1, representative candidate 82_2 and member candidate 83_2, representative candidate 82_1 and representative candidate 82_2, and member candidate 83_1 and member candidate 83_2). As a result, the implication relationship extraction unit 40 extracts the implication relationships shown in FIG. 7.

Based on the implication relationships between the partial texts extracted by the implication relationship extraction unit 40, the group generation unit 50 generates a group 84 in which a given partial text is the representative text and the other partial texts that imply it are members (step S107).

FIG. 8 is a diagram showing the result of generating the groups 84 in the first embodiment of the present invention. For example, based on the implication relationships of FIG. 7, the group generation unit 50 generates, as shown in FIG. 8, a group 84_1 whose representative text is the representative candidate 82_1 and whose members are the representative candidate 82_2 and the member candidates 83_1 and 83_2, all of which imply the representative candidate 82_1. Similarly, the group generation unit 50 generates a group 84_2 whose representative text is the representative candidate 82_2 and whose member is the member candidate 83_2, which implies the representative candidate 82_2.
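The group generation step can be sketched as follows, using the implication relationships stated in the example above as input. The function name and the representation of a relationship as an (implying, implied) tuple are assumptions for illustration.

```python
from collections import defaultdict


def generate_groups(relations):
    """Map each implied partial text (representative) to the texts implying it."""
    groups = defaultdict(set)
    for implying, implied in relations:
        groups[implied].add(implying)
    return dict(groups)


# Relationships from the example: (implying text, implied text).
relations = {
    ("82_2", "82_1"),
    ("83_1", "82_1"),
    ("83_2", "82_1"),
    ("83_2", "82_2"),
}
groups = generate_groups(relations)
```

Here `groups["82_1"]` corresponds to the group 84_1 and `groups["82_2"]` to the group 84_2.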

Note that the group generation unit 50 may further merge two different groups into one group based on the degree to which their members overlap.
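The specification does not define how the degree of overlap is measured. As one possible illustration, the sketch below uses the Jaccard index of the two member sets and a threshold of 0.5. Both choices, and the function names, are assumptions made here.

```python
def overlap_degree(members_a, members_b):
    """Jaccard index of two member sets (an assumed overlap measure)."""
    union = members_a | members_b
    if not union:
        return 0.0
    return len(members_a & members_b) / len(union)


def merge_members(members_a, members_b, threshold=0.5):
    """Return the merged member set, or None if the overlap is too small."""
    if overlap_degree(members_a, members_b) >= threshold:
        return members_a | members_b
    return None


merged = merge_members({"83_1", "83_2", "83_3"}, {"83_2", "83_3"})
not_merged = merge_members({"83_1"}, {"83_2"})
```

How the representative text of a merged group is chosen is left open here, as the passage above does not address it.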

This completes the operation of the first embodiment of the present invention.

The first embodiment of the present invention has been described using, as an example, the case where the text to be clustered is the conversation text 81 generated from speech data of a conversation among multiple speakers and the content to be clustered is a defect that has occurred in a product.

However, the present invention is not limited to this. Text generated from text-format message data such as chat, e-mail, or electronic bulletin boards may be used as the text to be clustered. Text generated from a speech by a single speaker may also be used. As the content (topic) to be clustered, various phenomena and events other than defects, as well as their causes and countermeasures, may be used. Phenomena and events in various categories such as weather, disasters, the economy, and society may also be used, as may speakers' opinions in various categories, such as requests, complaints, and evaluations.

Next, the basic configuration of the first embodiment of the present invention will be described.

FIG. 1 is a block diagram showing the basic configuration of the first embodiment of the present invention. Referring to FIG. 1, the clustering system 1 (information processing system) includes a representative candidate extraction unit 22 (implied candidate extraction unit), a member candidate extraction unit 23 (implying candidate extraction unit), and a partial text output unit 24 (output unit).

In each of one or more conversation texts 81 (texts), the representative candidate extraction unit 22 extracts, from among the sections related to predetermined content, a section in which the predetermined content is likely to be stated explicitly, as a representative candidate 82 (implied candidate text). In each of the one or more conversation texts 81, the member candidate extraction unit 23 extracts, as a member candidate 83 (implying candidate text), a section that contains a section related to the predetermined content and is larger than that section. The partial text output unit 24 outputs the extracted representative candidates 82 and member candidates 83 as partial texts from which implication relationships are to be extracted.

Next, the effects of the first embodiment of the present invention will be described.

According to the first embodiment of the present invention, the accuracy of implication clustering of conversation texts can be improved. This is because, in each of the conversation texts 81, a section in which the content to be clustered is likely to be stated explicitly, from among the sections related to that content, is extracted as the representative candidate 82, and a larger section containing a section related to that content is extracted as the member candidate 83.

(Second Embodiment)
Next, a second embodiment of the present invention will be described.

In the first embodiment of the present invention, the implication relationship extraction unit 40 performed the determination process for every pair of partial texts, in both the direction in which one partial text implies the other and the direction in which the other implies the first. However, when the number of partial texts is large, the number of pairs to be judged becomes enormous, and the processing time for implication relationship extraction becomes long.

Among the pairs of partial texts, an implication relationship is considered unlikely to exist between member candidates 83. Similarly, an implication relationship in the direction in which a representative candidate 82 implies a member candidate 83 is also considered unlikely to exist.

Therefore, in the second embodiment of the present invention, the implication relationship extraction unit 40 excludes these unlikely pairs and directions and performs the determination process only for implication relationships in the direction in which a member candidate 83 implies a representative candidate 82.

FIG. 9 is a diagram showing the pairs of partial texts subjected to the implication relationship determination process, and the extraction results, in the second embodiment of the present invention.

For example, as shown in FIG. 9, the implication relationship extraction unit 40 performs the determination process, in the direction in which the member candidate 83 implies the representative candidate 82, for the pair of the representative candidate 82_1 and the member candidate 83_1 and for the pair of the representative candidate 82_1 and the member candidate 83_2. It likewise performs the determination process in that direction for the pair of the representative candidate 82_2 and the member candidate 83_1 and for the pair of the representative candidate 82_2 and the member candidate 83_2. As a result, the implication relationship extraction unit 40 extracts the implication relationships shown in FIG. 9.
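The reduction in the number of determinations can be illustrated by enumerating the judged pairs in both embodiments. The function names below are assumptions. With r representative candidates and m member candidates, the first embodiment judges (r + m)(r + m - 1) directed pairs, while the restriction described above judges r × m.

```python
from itertools import permutations, product


def all_directed_pairs(representatives, members):
    """First embodiment: every ordered pair of partial texts is judged."""
    return list(permutations(representatives + members, 2))


def member_to_representative_pairs(representatives, members):
    """Second embodiment: only (member, representative) directions are judged."""
    return list(product(members, representatives))


representatives = ["82_1", "82_2"]
members = ["83_1", "83_2"]
first = all_directed_pairs(representatives, members)
second = member_to_representative_pairs(representatives, members)
```

For the two representative candidates and two member candidates of the example, `first` contains 12 directed pairs and `second` contains 4.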

Furthermore, the implication relationship extraction unit 40 may first perform the determination process between the representative candidates 82 and then perform the determination process for the direction in which a member candidate 83 implies a representative candidate 82. In this case, when the implication relationship extraction unit 40 determines that a member candidate 83 implies a representative candidate 82 and has already determined that this representative candidate 82 implies another representative candidate 82, it omits the determination process for the direction in which the member candidate 83 implies the other representative candidate 82. The implication relationship extraction unit 40 then decides, without performing the determination process, that the member candidate 83 implies the other representative candidate 82.

FIG. 10 is a diagram showing another example of the pairs of partial texts subjected to the implication relationship determination process, and the extraction results, in the second embodiment of the present invention.

For example, as shown in FIG. 10, the implication relationship extraction unit 40 performs the determination process on the pair of the representative candidate 82_1 and the representative candidate 82_2 and determines that the representative candidate 82_2 implies the representative candidate 82_1. Then, upon determining that the member candidate 83_2 implies the representative candidate 82_2, the implication relationship extraction unit 40 decides that the member candidate 83_2 implies the representative candidate 82_1 without performing the determination process on the pair of the member candidate 83_2 and the representative candidate 82_1.
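The omission of determinations via previously judged representative-to-representative relationships can be sketched as follows. The `judge` callback stands in for whatever determination process is used. The callback, the function name, and the iteration order are assumptions for illustration.

```python
from itertools import permutations


def extract_with_shortcut(representatives, members, judge):
    """Judge representative pairs first, then member-to-representative pairs,
    skipping any pair whose result already follows from an earlier judgment."""
    relations = set()
    for a, b in permutations(representatives, 2):
        if judge(a, b):
            relations.add((a, b))
    for m in members:
        for r in representatives:
            if (m, r) in relations:
                continue  # Already decided without running the judgment.
            if judge(m, r):
                relations.add((m, r))
                # m implies r, and r implies other: decide m implies other.
                for other in representatives:
                    if (r, other) in relations:
                        relations.add((m, other))
    return relations


# Hypothetical ground truth including the relationships of the FIG. 10 example.
truth = {("82_2", "82_1"), ("83_1", "82_1"), ("83_2", "82_2"), ("83_2", "82_1")}
judged = []


def judge(a, b):
    judged.append((a, b))
    return (a, b) in truth


result = extract_with_shortcut(["82_2", "82_1"], ["83_1", "83_2"], judge)
```

In this run the pair (83_2, 82_1) never reaches `judge`, yet it appears in `result`. The saving depends on the order in which representatives are visited; a pair judged before the shortcut becomes available is simply judged normally.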

Next, the effects of the second embodiment of the present invention will be described.

According to the second embodiment of the present invention, the increase in processing time for implication relationship extraction can be suppressed even when the number of partial texts is large. This is because the implication relationship extraction unit 40 performs the implication relationship determination process while excluding, from among the pairs of partial texts and the directions of implication within those pairs, the pairs and directions for which an implication relationship is unlikely to exist.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

The present invention can be applied to systems for obtaining an overview of the content of conversation texts from conversations at call centers and the like, and of conversation texts in chat, e-mail, blogs, electronic bulletin boards, and the like.

Description of Symbols
1 Clustering system
2 CPU
3 Storage device
4 Communication device
5 Input device
6 Output device
10 Conversation text storage unit
20 Partial text extraction unit
21 Utterance section extraction unit
22 Representative candidate extraction unit
23 Member candidate extraction unit
24 Partial text output unit
30 Partial text storage unit
40 Implication relationship extraction unit
50 Group generation unit
81 Conversation text
82 Representative candidate
83 Member candidate
84 Group
91 Utterance section
92 Target section

Claims

An information processing system comprising:
implied candidate extraction means for extracting, in each of one or more texts, a section in which predetermined content is likely to be stated explicitly, from among the sections related to the predetermined content, as an implied candidate text;
implying candidate extraction means for extracting, in each of the one or more texts, a section that contains a section related to the predetermined content and is larger than that section, as an implying candidate text; and
output means for outputting the extracted implied candidate text and implying candidate text as partial texts from which an implication relationship is to be extracted.

The information processing system according to claim 1,
wherein the implied candidate extraction means extracts, as the implied candidate text, a section that includes a subject and a predicate from among the sections related to the predetermined content.

The information processing system according to claim 1,
wherein the implied candidate extraction means extracts, as the implied candidate text, a section that includes a predetermined sentence or expression from among the sections related to the predetermined content.

The information processing system according to claim 1,
wherein the implying candidate extraction means extracts, as the implying candidate text, a section made up of a plurality of consecutive sections including the section related to the predetermined content.

The information processing system according to any one of claims 1 to 4,
further comprising implication relationship extraction means for extracting implication relationships between the partial texts.

The information processing system according to claim 5,
wherein the implication relationship extraction means determines the presence or absence of implication relationships between the partial texts, excluding implication relationships between implying candidate texts and implication relationships in which an implied candidate text implies an implying candidate text.

The information processing system according to claim 6,
wherein, upon determining that an implying candidate text implies an implied candidate text, the implication relationship extraction means decides that the implying candidate text implies another implied candidate text if it has already determined that the implied candidate text implies the other implied candidate text.

The information processing system according to claim 1,
further comprising group generation means for generating, based on the extracted implication relationships, a group whose members are the partial texts that imply one of the partial texts.

An information processing method comprising:
extracting, in each of one or more texts, a section in which predetermined content is likely to be stated explicitly, from among the sections related to the predetermined content, as an implied candidate text;
extracting, in each of the one or more texts, a section that contains a section related to the predetermined content and is larger than that section, as an implying candidate text; and
outputting the extracted implied candidate text and implying candidate text as partial texts from which an implication relationship is to be extracted.

A program that causes a computer to execute processing comprising:
extracting, in each of one or more texts, a section in which predetermined content is likely to be stated explicitly, from among the sections related to the predetermined content, as an implied candidate text;
extracting, in each of the one or more texts, a section that contains a section related to the predetermined content and is larger than that section, as an implying candidate text; and
outputting the extracted implied candidate text and implying candidate text as partial texts from which an implication relationship is to be extracted.