JP2013045363A

JP2013045363A - Context dependency estimation device, speech clustering device, method, and program

Info

Publication number: JP2013045363A
Application number: JP2011184054A
Authority: JP
Inventors: Ryuichiro Higashinaka; 竜一郎東中; Kugatsu Sadamitsu; 九月貞光; Yasuhiro Minami; 泰浩南; Toyomi Meguro; 豊美目黒; Koji Dosaka; 浩二堂坂; Hiroto Inagaki; 博人稲垣
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-08-25
Filing date: 2011-08-25
Publication date: 2013-03-04
Anticipated expiration: 2031-08-25
Also published as: JP5591772B2

Abstract

PROBLEM TO BE SOLVED: To estimate the degree to which dialogue data depends on context. SOLUTION: A feature quantity extraction unit 30 extracts a feature quantity for each utterance from dialogue data, which is a time series of a plurality of utterances. A CRP clustering unit 31 clusters the utterances by the CRP technique on the basis of the extracted feature quantities. An infinite HMM clustering unit 32 clusters the utterances by the infinite HMM technique on the basis of the same feature quantities. A context dependency calculation unit 23 calculates the context dependency of the utterances from the CRP clustering result and the infinite HMM clustering result.

Description

The present invention relates to a context dependency estimation apparatus, an utterance clustering apparatus, a method, and a program, and more particularly to a context dependency estimation apparatus, an utterance clustering apparatus, a method, and a program that cluster the utterances in dialogue data.

When a dialogue system is built for a specific dialogue domain (here, a domain is the content, field, or genre of dialogue that the system handles, for example flight reservation or conference room reservation), dialogue data for that domain must be collected, and researchers and developers must model the dialogue in that domain. For example, they decide what vocabulary set is appropriate and what kinds of utterances should be handled.

Particularly important in building a dialogue system is the latter phase, deciding the types of utterances (also called dialogue act types or speech act types), and there has been much research on it, as shown in Non-Patent Document 1.

In such research, however, humans decide the types of dialogue acts in advance. In general, deciding what utterances exist in a domain and how many dialogue acts are needed requires detailed analysis by experts and is costly. A technique is therefore known that clusters utterances and determines automatically from the data what groups of utterances exist and roughly how many dialogue acts are needed (Non-Patent Document 2).

The technique of Non-Patent Document 2 clusters utterances with a technique called the Chinese Restaurant Process (CRP) and estimates the optimal number of dialogue acts. It treats the utterances in a dialogue as independent of one another, performs clustering, and at the same time determines the number of clusters (that is, the number of dialogue acts).

There are several methods other than CRP that do not fix the number of clusters in advance; for example, a technique called Affinity Propagation and a technique called X-Means are known (Non-Patent Documents 3 and 4). It is also possible to find the optimal number of clusters by repeatedly applying a technique that fixes the number of clusters in advance (for example, K-Means). For example, for a given evaluation set, the number of clusters is increased little by little, and the number of clusters that gives the highest clustering accuracy is taken as optimal. Here, accuracy may be measured with purity, the F-measure, or another measure commonly used in clustering evaluation.
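As a rough illustration of the search over cluster counts described above, the sketch below computes purity and keeps the candidate count that scores highest. The function names and the `cluster_fn(items, k)` callable are hypothetical stand-ins (for example, a wrapper around a K-Means implementation); they do not come from the patent or the cited documents.

```python
from collections import Counter


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")
    per_cluster = {}
    for cid, gold in zip(cluster_ids, gold_labels):
        per_cluster.setdefault(cid, Counter())[gold] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(cluster_ids)


def best_cluster_count(candidates, cluster_fn, items, gold_labels):
    """Try each candidate count and return the one with the highest purity.

    cluster_fn(items, k) is a hypothetical callable that returns one
    cluster id per item. Ties are broken in favour of the smaller count.
    """
    scored = [(purity(cluster_fn(items, k), gold_labels), k) for k in candidates]
    return max(scored, key=lambda s: (s[0], -s[1]))[1]
```

Purity reaches 1.0 whenever every item forms its own cluster, so in practice the candidate range would need an upper bound, or a measure such as the F-measure would be used alongside it.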

Non-Patent Document 1: A. Stolcke, N. Coccaro, R. Bates, P. Taylor, C. V. Ess-Dykema, K. Ries, E. Shriberg, D. Jurafsky, R. Martin, and M. Meteer, "Dialogue act modeling for automatic tagging and recognition of conversational speech," Computational Linguistics, vol. 26, no. 3, pp. 339-373, 2000.
Non-Patent Document 2: N. Crook, R. Granell, and S. Pulman, "Unsupervised classification of dialogue acts using a Dirichlet process mixture model," in Proc. SIGDIAL, 2009, pp. 341-348.
Non-Patent Document 3: B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, pp. 972-976, 2007.
Non-Patent Document 4: D. Pelleg and A. Moore, "X-means: Extending K-means with efficient estimation of the number of clusters," in Proc. ICML, 2000.

The technique described in Non-Patent Document 2 treats the utterances in a dialogue as independent when clustering. Dialogue data, however, normally consists of consecutive utterances. The prior art does not use this context information, which is important for dialogue, so its clustering accuracy and its estimate of the number of dialogue acts are insufficient.

For example, "hai" ("yes") can be either an affirmation or a back-channel response, and which of the two it is can be judged only from context, yet the prior art treats both as the same.

In addition, knowing how strongly the utterances in a domain depend on context is useful for building a dialogue system, but the prior art treats utterances as independent and therefore cannot provide such knowledge.

The present invention has been made in view of the above circumstances. Its first object is to provide a context dependency estimation apparatus, method, and program that can estimate the degree to which dialogue data depends on context. Its second object is to provide an utterance clustering apparatus and method that can cluster the utterances in dialogue data accurately by taking context into account.

To achieve the above objects, a context dependency estimation apparatus according to the present invention includes: feature quantity extraction means for extracting a feature quantity of each utterance from dialogue data that is a time series of a plurality of utterances; first clustering means for clustering the plurality of utterances on the basis of the feature quantity of each utterance extracted by the feature quantity extraction means; second clustering means for clustering the plurality of utterances, using context information of the utterances, on the basis of the feature quantity of each utterance extracted by the feature quantity extraction means; and estimation means for estimating a degree of dependence on context on the basis of the clustering result of the first clustering means and the clustering result of the second clustering means.

A context dependency estimation method according to the present invention is performed in a context dependency estimation apparatus that includes feature quantity extraction means, first clustering means, second clustering means, and estimation means. In the method, the feature quantity extraction means extracts a feature quantity of each utterance from dialogue data that is a time series of a plurality of utterances; the first clustering means clusters the plurality of utterances on the basis of the extracted feature quantity of each utterance; the second clustering means clusters the plurality of utterances, using context information of the utterances, on the basis of the extracted feature quantity of each utterance; and the estimation means estimates a degree of dependence on context on the basis of the clustering result of the first clustering means and the clustering result of the second clustering means.

According to the present invention, the feature quantity extraction means extracts a feature quantity of each utterance from dialogue data that is a time series of a plurality of utterances. The first clustering means then clusters the plurality of utterances on the basis of the extracted feature quantity of each utterance. The second clustering means clusters the plurality of utterances, using context information of the utterances, on the basis of the same extracted feature quantities.

The estimation means then estimates the degree of dependence on context on the basis of the clustering result of the first clustering means and the clustering result of the second clustering means.

By clustering the utterances both without and with their context information in this way, the degree to which the dialogue data depends on context can be estimated.

The first clustering means according to the present invention may cluster the plurality of utterances according to a CRP (Chinese Restaurant Process), and the second clustering means may cluster the plurality of utterances according to an infinite HMM (Hidden Markov Model), using transition information between the utterances of the dialogue data.

The second clustering means according to the present invention may generate, for each utterance, an augmented feature quantity in which the feature quantity of the immediately preceding utterance is added, as context information of the utterance, to the feature quantity extracted by the feature quantity extraction means, and may cluster the plurality of utterances on the basis of the augmented feature quantity of each utterance.
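One way to picture the addition of the preceding utterance's features is the minimal sketch below. It assumes each utterance is already represented as a word-count `Counter` and that the context features are kept apart from the utterance's own features by a key prefix; the function name, the `prev:` prefix, and the handling of the first utterance (no context added) are illustrative assumptions, not details given in the text.

```python
from collections import Counter


def add_previous_utterance_features(dialogue_features, prefix="prev:"):
    """Merge each utterance's counts with those of the utterance just before it.

    dialogue_features: list of Counters, one per utterance, in dialogue order.
    The preceding utterance's words are stored under prefixed keys so that
    they stay distinct from the current utterance's own words (assumption).
    """
    augmented = []
    previous = None
    for current in dialogue_features:
        combined = Counter(current)
        if previous is not None:
            for word, count in previous.items():
                combined[prefix + word] += count
        augmented.append(combined)
        previous = current
    return augmented
```

With this representation, a "yes" that follows a question and a "yes" that follows a statement receive different feature vectors, which is what allows a context-free clusterer to separate them.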

The dialogue data may be dialogue data for a specific domain, and the estimation means may estimate the context dependency of the utterances in the specific domain according to the following equation.

Here, the cluster count C1 is the number of clusters produced by the first clustering means, and the cluster count C2 is the number of clusters produced by the second clustering means.

The dialogue data may be dialogue data for two different domains. In that case, the first clustering means clusters, for each domain, the plurality of utterances of the dialogue data for that domain; the second clustering means likewise clusters, for each domain, the plurality of utterances of the dialogue data for that domain; and the estimation means estimates the context dependency of the utterances for each domain and estimates the context dependency ratio of the domains according to the following equation.

The estimation means according to the present invention may estimate the context dependency of a cluster C produced by the first clustering means according to the following equation.

Here, C' is the set of clusters produced by the second clustering means, and c is a cluster that is an element of C'.

The estimation means that estimates the context dependency of a cluster may also estimate the context dependency of each cluster produced by the first clustering means and estimate the average context dependency of those clusters according to the following equation.

Here, c'' is a cluster that is an element of C.

An utterance clustering apparatus according to the present invention includes: input means for receiving input dialogue data that is a time series of a plurality of utterances; dialogue data storage means for storing the dialogue data received by the input means; feature quantity extraction means for extracting a feature quantity of each utterance from the dialogue data; and infinite HMM clustering means for clustering the plurality of utterances according to an infinite HMM (Hidden Markov Model), using transition information between the utterances of the dialogue data, on the basis of the feature quantity of each utterance extracted by the feature quantity extraction means.

An utterance clustering method according to the present invention is performed in an utterance clustering apparatus that includes input means, dialogue data storage means, feature quantity extraction means, and infinite HMM clustering means. In the method, the input means receives input dialogue data that is a time series of a plurality of utterances; the dialogue data received by the input means is stored in the dialogue data storage means; the feature quantity extraction means extracts a feature quantity of each utterance from the dialogue data; and the infinite HMM clustering means clusters the plurality of utterances according to an infinite HMM (Hidden Markov Model), using transition information between the utterances of the dialogue data, on the basis of the extracted feature quantity of each utterance.

An utterance clustering apparatus according to the present invention includes: input means for receiving input dialogue data that is a time series of a plurality of utterances; dialogue data storage means for storing the dialogue data received by the input means; feature quantity extraction means for extracting a feature quantity of each utterance from the dialogue data; context information addition means for generating, for each utterance, an augmented feature quantity in which the feature quantity of the immediately preceding utterance is added, as context information of the utterance, to the feature quantity extracted by the feature quantity extraction means; and CRP clustering means for clustering the plurality of utterances according to a CRP (Chinese Restaurant Process) on the basis of the augmented feature quantity of each utterance generated by the context information addition means.

An utterance clustering method according to the present invention is performed in an utterance clustering apparatus that includes input means, dialogue data storage means, feature quantity extraction means, context information addition means, and CRP clustering means. In the method, the input means receives input dialogue data that is a time series of a plurality of utterances; the dialogue data received by the input means is stored in the dialogue data storage means; the feature quantity extraction means extracts a feature quantity of each utterance from the dialogue data; the context information addition means generates, for each utterance, an augmented feature quantity in which the feature quantity of the immediately preceding utterance is added, as context information of the utterance, to the extracted feature quantity; and the CRP clustering means clusters the plurality of utterances according to a CRP (Chinese Restaurant Process) on the basis of the augmented feature quantity of each utterance.

A program according to the present invention is a program for causing a computer to function as each means of the above context dependency estimation apparatus.

As described above, the context dependency estimation apparatus, method, and program of the present invention cluster utterances both without and with their context information, and thereby make it possible to estimate the degree to which dialogue data depends on context.
The utterance clustering apparatus and method of the present invention make it possible to cluster the utterances in dialogue data accurately by taking context into account.

A schematic diagram showing the configuration of the context dependency estimation apparatus according to the first embodiment of the present invention.
A flowchart showing the context dependency estimation processing routine in the context dependency estimation apparatus according to the first embodiment of the present invention.
A diagram showing an example of dialogue data.
A diagram showing examples of dialogue acts.
A diagram showing an example of dialogue data.
A diagram showing examples of dialogue acts.
A schematic diagram showing the configuration of the context dependency estimation apparatus according to the second embodiment of the present invention.
A flowchart for estimating the degree to which dialogue data depends on context in the context dependency estimation apparatus according to the second embodiment of the present invention.
A schematic diagram showing the configuration of the context dependency estimation apparatus according to the third embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
<System configuration>
The context dependency estimation apparatus 100 according to the first embodiment of the present invention receives as input dialogue data, each item of which is a time series of a plurality of utterances relating to a specific domain, and estimates and outputs the context dependency. The apparatus is implemented by a computer that includes a CPU, a RAM, and a ROM storing a program for executing the context dependency estimation processing routine described later, and is functionally configured as follows. As shown in FIG. 1, the context dependency estimation apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 28.

The input unit 10 receives, as input, a plurality of items of dialogue data relating to a specific domain. Each item of dialogue data is a time series of a plurality of utterances. The dialogue data is, for example, dialogue between a dialogue system and a human or dialogue between humans, and takes the form of natural language data that can be processed sequentially in time order, such as natural language text or speech recognition results.

The calculation unit 20 includes a dialogue data storage unit 21, an utterance clustering unit 22, and a context dependency calculation unit 23. The context dependency calculation unit 23 is an example of the estimation means.

The dialogue data storage unit 21 stores the plurality of items of dialogue data received by the input unit 10.

The utterance clustering unit 22 includes a feature quantity extraction unit 30, a CRP clustering unit 31, and an infinite HMM clustering unit 32. The CRP clustering unit 31 is an example of the first clustering means, and the infinite HMM clustering unit 32 is an example of the second clustering means.

The feature quantity extraction unit 30 extracts a feature quantity from each utterance in the input dialogue data, for example a bag-of-words feature quantity. A bag-of-words, a feature quantity widely used in natural language processing, is a set of words with their frequencies. To obtain this set, the feature quantity extraction unit 30 applies morphological analysis (ChaSen is used in the present embodiment) and computes the bag-of-words feature quantity of each utterance. Because low-frequency words can adversely affect clustering, only words that appear 10 or more times in all the data of each domain may be used as features. The base form of each word is used for the words in the set. The frequencies of content words only, or of function words only, may also be used as the feature quantity.
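The extraction step described above might look roughly like the following sketch. It assumes a plain whitespace tokenizer in place of the ChaSen morphological analyzer and does not reduce words to their base form; the function name and parameters are hypothetical.

```python
from collections import Counter


def bag_of_words(utterances, tokenize=str.split, min_count=10):
    """Return one word-count Counter per utterance.

    Only words that occur at least min_count times over all utterances are
    kept, mirroring the low-frequency filter described in the text.
    tokenize is an assumed stand-in for a morphological analyzer.
    """
    tokenized = [tokenize(u) for u in utterances]
    totals = Counter(w for tokens in tokenized for w in tokens)
    vocab = {w for w, n in totals.items() if n >= min_count}
    return [Counter(w for w in tokens if w in vocab) for tokens in tokenized]
```

For Japanese text, the `tokenize` argument would be replaced by a call into a morphological analyzer that returns base forms, and a part-of-speech filter could restrict the output to content words or function words.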

The CRP clustering unit 31 clusters the utterances of the dialogue data by the CRP technique on the basis of the feature quantity of each utterance extracted by the feature quantity extraction unit 30.

CRP is a technique that determines the number of clusters automatically from the data, and it performs clustering by the following procedure. In CRP, a data item (here, an utterance) is called a customer and a cluster is called a table.

First, the first customer is seated at the first table. Each subsequent customer c_i then sits either at a table t_j that already has customers or at a new table t_new (where new is the index of the new table), with the probability given by Equation (1) below.

Here, n(t_j) is a function that returns the number of customers seated at t_j, and N is the number of customers seated at tables so far. α is a hyperparameter that controls how readily a customer takes a new table; the larger α is, the more clusters are produced. As a heuristic, the reciprocal of the approximate expected number of clusters is used for α (for example, 0.01 if about 100 clusters are expected). P(c_i | t_j) is the probability that c_i is generated from t_j, and it is calculated according to Equation (2) below.

Here, W is the set of features, and count(*, w) is the number of times feature w occurs in a customer or a table. β is a hyperparameter that prevents zero probabilities; any sufficiently small number will do, for example 0.00001. A uniform distribution is used for P(c_i | t_new). After all customers have been seated in order, they are reseated by a technique called Gibbs sampling. One customer is removed from their table and, by the procedure above, seated again either at another table (possibly a new one) or at the table they came from; this reseating is repeated over all customers many times until an optimal arrangement is found. When the arrangement no longer changes, or when a sufficient number of samples has been drawn (for example 1000 per data item), the process is regarded as converged, and the seating of customers at tables at that point is taken as the clustering result.
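The images for Equations (1) and (2) are not reproduced in this text. As an assumption, and not as a quotation of the patent's formulas, the standard CRP seating rule combined with an additively smoothed word likelihood is consistent with the symbols n(t_j), N, α, β, W, and count(*, w) defined above:

```latex
P(t_j \mid c_i) \propto
\begin{cases}
\dfrac{n(t_j)}{N+\alpha}\, P(c_i \mid t_j) & \text{if } t_j \text{ is an occupied table},\\[2ex]
\dfrac{\alpha}{N+\alpha}\, P(c_i \mid t_{new}) & \text{if } t_j = t_{new},
\end{cases}
\qquad
P(c_i \mid t_j) = \prod_{w \in W}
\left(
\frac{\mathrm{count}(t_j, w) + \beta}
     {\sum_{w' \in W} \mathrm{count}(t_j, w') + |W|\,\beta}
\right)^{\mathrm{count}(c_i, w)}
```

In this form |W| is the number of distinct features, and the new-table case uses the uniform P(c_i | t_new) mentioned in the text.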

In this way, the CRP clustering unit 31 clusters the utterances (customers) into a plurality of clusters (tables) by the CRP technique described above on the basis of the feature quantity of each utterance, and outputs the number of clusters together with each utterance and information on the cluster to which it belongs.
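The seating and reseating procedure can be sketched as below. This is a minimal illustration under stated assumptions rather than the patent's implementation: the likelihood is an additively smoothed multinomial over word counts, the uniform new-table likelihood is taken to be (1/|W|) per word occurrence, and a fixed number of sweeps stands in for the convergence check. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter


def log_likelihood(customer, table_counts, table_total, vocab_size, beta):
    """log P(c_i | t_j) under an additively smoothed multinomial (assumed form)."""
    denom = table_total + vocab_size * beta
    return sum(n * math.log((table_counts.get(w, 0) + beta) / denom)
               for w, n in customer.items())


def crp_cluster(customers, alpha=0.01, beta=1e-5, sweeps=50, seed=0):
    """Cluster word-count Counters with CRP and Gibbs sampling.

    Returns one table id per customer, in input order.
    """
    rng = random.Random(seed)
    vocab_size = len({w for c in customers for w in c}) or 1
    tables = {}  # table id -> [word Counter, total word count, customer count]
    assignment = [None] * len(customers)
    next_id = 0

    def seat(i):
        nonlocal next_id
        c = customers[i]
        length = sum(c.values())
        ids = list(tables)
        # Occupied tables: log n(t_j) + log P(c_i | t_j).
        # The shared factor 1 / (N + alpha) cancels after normalisation.
        scores = [math.log(tables[t][2])
                  + log_likelihood(c, tables[t][0], tables[t][1], vocab_size, beta)
                  for t in ids]
        # New table: log alpha + uniform likelihood over the vocabulary.
        scores.append(math.log(alpha) - length * math.log(vocab_size))
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        choice = rng.choices(range(len(scores)), weights=weights)[0]
        if choice == len(ids):
            t = next_id
            next_id += 1
            tables[t] = [Counter(), 0, 0]
        else:
            t = ids[choice]
        tables[t][0].update(c)
        tables[t][1] += length
        tables[t][2] += 1
        assignment[i] = t

    def unseat(i):
        t = assignment[i]
        tables[t][0].subtract(customers[i])
        tables[t][1] -= sum(customers[i].values())
        tables[t][2] -= 1
        if tables[t][2] == 0:
            del tables[t]  # an empty table disappears

    for i in range(len(customers)):   # initial sequential seating
        seat(i)
    for _ in range(sweeps):           # Gibbs sampling: reseat every customer
        for i in range(len(customers)):
            unseat(i)
            seat(i)
    return assignment
```

The number of clusters is then `len(set(assignment))`. Working in log space avoids underflow when utterances are long or β is very small.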

The infinite HMM clustering unit 32 clusters the utterances by the infinite HMM technique on the basis of the feature quantity of each utterance extracted by the feature quantity extraction unit 30.

In the present embodiment, utterances are clustered using context information by a technique called the infinite HMM, and the number of clusters is determined automatically in the process.

The infinite HMM is one of the nonparametric Bayesian techniques, which estimate parameters from data; it extends the HMM, which handles time-series data, so that it can handle an infinite number of states. Being able to handle an infinite number of states means that the number of states is not fixed in advance but is determined from the data. Details of the infinite HMM are given in the non-patent literature (Y. Teh, M. Jordan, M. Beal, and D. Blei, "Sharing clusters among related groups: Hierarchical Dirichlet processes," in Proc. NIPS, 2004).

In the present embodiment, the infinite HMM is used to cluster utterances using context information and, at the same time, to estimate the number of dialogue acts. The infinite HMM models a sequence of utterances: each state emits an utterance and then transitions to the next state. Because the model handles transitions between states (that is, transitions between sets of utterances), it can be regarded as using context information, in particular information on the immediately preceding utterance. Note, however, that because the states of an HMM are chained together, the clustering does not necessarily depend on the immediately preceding utterance alone.

ここで、無限ＨＭＭを用いたクラスタリング手法について説明する。無限ＨＭＭはＣＲＰに似た処理によってクラスタリングを行うため、ここでも、データを客と呼び、クラスタをテーブルと呼んで、説明する。 Here, a clustering method using an infinite HMM will be described. Since the infinite HMM performs clustering by a process similar to CRP, data will be referred to as a customer and the cluster will be referred to as a table.

In the infinite HMM, customer c_i sits either at a table t_j that already has customers or at a new table (t_j = new), according to the probability given by equation (3) below.

Here, t_c denotes the table at which customer c is seated. In the infinite HMM the customers are ordered, and the customers immediately before and after c_i are denoted c_{i-1} and c_{i+1}, respectively. This corresponds to the fact that the utterances in the dialogue data are ordered.

P(t_j, t_k) is the transition probability between tables and is given by equation (4) below.

Here, α is a hyperparameter that controls how readily a customer sits at a new table, and K is the number of tables that already have customers. transitions(t_j, t_k) is the number of transitions from t_j to t_k, and γ is a hyperparameter that prevents zero probabilities; any sufficiently small value will do, for example 0.00001. The probability that a customer sits at a new table is given by equation (5) below.

Here, a uniform distribution is used for P(c_i | t_new).

As with the CRP, Gibbs sampling is used to optimize the seating of the customers, and the seating finally obtained is taken as the clustering result. As described above, each customer chooses a table by looking at the table at which the preceding customer sits; in this sense the infinite HMM performs clustering using context information.
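The transition component of the seating rule can be illustrated with a short sketch. Equations (3) to (5) are not reproduced in this text, so the exact form used below is an assumption chosen only to be consistent with the descriptions of α, γ, K, and transitions(t_j, t_k) above; the function names `count_transitions` and `transition_probs` are hypothetical. The full seating probability of equation (3) would also involve the likelihood of the utterance under each table and the transition to the following customer's table, which this sketch omits.

```python
from collections import Counter


def count_transitions(assignments):
    """Count transitions between the tables of consecutive customers."""
    return Counter(zip(assignments, assignments[1:]))


def transition_probs(transitions, prev_table, tables, alpha=0.1, gamma=0.01):
    """Probability of moving from prev_table to each existing table or a new one.

    ASSUMED form (not taken from the source equations):
        P(prev -> k)   = (n(prev, k) + gamma) / Z
        P(prev -> new) = alpha / Z
        Z = sum_k n(prev, k) + K * gamma + alpha
    where K is the number of occupied tables.
    """
    k_tables = len(tables)
    total = sum(transitions[(prev_table, k)] for k in tables)
    z = total + k_tables * gamma + alpha
    probs = {k: (transitions[(prev_table, k)] + gamma) / z for k in tables}
    probs["new"] = alpha / z
    return probs


# Table assignments of seven consecutive utterances (illustrative data).
assignments = [0, 1, 0, 1, 2, 0, 1]
counts = count_transitions(assignments)
probs = transition_probs(counts, prev_table=0, tables=[0, 1, 2])
```

Under this assumed form the probabilities over the existing tables and the new table sum to one, and γ keeps transitions that have never been observed from receiving probability zero.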

In this way, the infinite HMM clustering unit 32 uses the infinite HMM technique described above to cluster the utterances (customers) into a plurality of clusters (tables) based on the feature amount of each utterance, and outputs the number of clusters together with each utterance and information on the cluster to which it belongs.

文脈依存度算出部２３は、以下に説明するように、文脈に依存している度合いを示す、発話の文脈依存度及びクラスタの文脈依存度を算出する。 As will be described below, the context dependency calculation unit 23 calculates the context dependency of the utterance and the context dependency of the cluster, which indicate the degree of dependency on the context.

ＣＲＰを用いたクラスタリング結果は文脈を見ない場合の結果であり、無限ＨＭＭを用いたクラスタリング結果は文脈を見た場合の結果である。よって、このクラスタ数（推定対話行為数）の違いを見ることで、ドメインにおいてどれほど発話が文脈に依存しているかを計算できる。具体的には、文脈依存度算出部２３は、以下の（６）式に従って、対話データのドメインに関する発話の文脈依存度を算出する。 The clustering result using CRP is the result when the context is not seen, and the clustering result using the infinite HMM is the result when the context is seen. Therefore, it is possible to calculate how much the utterance depends on the context in the domain by looking at the difference in the number of clusters (estimated number of dialogue actions). Specifically, the context dependency calculating unit 23 calculates the context dependency of the utterance related to the domain of the conversation data according to the following equation (6).

Further, by comparing the clusters obtained with the CRP against those obtained with the infinite HMM, it is possible to find out which kinds of utterances depend more strongly on context. Specifically, this is done by examining how the data (utterances) of each CRP cluster are distributed over the infinite HMM clusters. If the data of one CRP cluster are assigned to only one or a few infinite HMM clusters, the utterances of that cluster are not very context-dependent. If, however, the data of one CRP cluster are assigned to many infinite HMM clusters, that cluster depends heavily on context.

そこで、文脈依存度算出部２３は、ＣＲＰクラスタリング部３１によるクラスタリング結果の各クラスタＣの文脈依存度を、以下の（７）式に従って算出する。 Therefore, the context dependency calculation unit 23 calculates the context dependency of each cluster C of the clustering result by the CRP clustering unit 31 according to the following equation (7).

ここで、Ｃ’ は無限ＨＭＭクラスタリング部３２によるクラスタリング結果のクラスタの集合であり、ｃは、クラスタの集合Ｃ’の各要素（クラスタ）である。Ｐ（ｃ）は以下の（８）式に従って求められる。 Here, C ′ is a set of clusters resulting from clustering by the infinite HMM clustering unit 32, and c is each element (cluster) of the set of clusters C ′. P (c) is obtained according to the following equation (8).

Equation (7) has the same form as the entropy formula of information theory and takes a large value when the data (utterances) in a CRP cluster are scattered over many infinite HMM clusters. The context dependency of each cluster can thus be obtained. That is, if this value is large, the utterances in the cluster are considered highly context-dependent, and analyzing them can lead to the construction of a dialogue system that is robust to context-dependent utterances.

For example, if the utterances belonging to one CRP cluster correspond to many infinite HMM clusters, the utterances in that cluster are similar on the surface but are likely to differ in meaning depending on the context.

そのような発話のみを取り上げて集中的に分析することにより、文脈に応じてユーザ発話を高精度に理解できる対話システムの理解部につなげることができる。 By taking only such utterances and analyzing them intensively, it is possible to connect to an understanding unit of a dialog system that can understand user utterances with high accuracy according to the context.

また、全クラスタの文脈依存度の平均を取ることで、全体の文脈依存度も計算でき、分析に利用することができる。そこで、文脈依存度算出部２３は、以下の（９）式に従って、平均文脈依存度を算出する。 Also, by taking the average of the context dependency of all clusters, the overall context dependency can also be calculated and used for analysis. Therefore, the context dependency calculation unit 23 calculates the average context dependency according to the following equation (9).

ここで、ｃ’’はＣＲＰクラスタリング部３１によるクラスタリング結果におけるクラスタ集合Ｃの各要素である。 Here, c ″ is each element of the cluster set C in the clustering result by the CRP clustering unit 31.
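The cluster-level measures described above can be sketched as follows. Equations (7) to (9) are not reproduced in this text, so the sketch assumes that P(c) is the fraction of a CRP cluster's utterances that fall into infinite HMM cluster c, that the natural logarithm is used, and that the average is an unweighted mean over CRP clusters. The function names are hypothetical.

```python
import math
from collections import Counter


def cluster_context_dependency(crp_labels, hmm_labels, crp_cluster):
    """Entropy of the infinite-HMM labels among the members of one CRP cluster."""
    members = [h for c, h in zip(crp_labels, hmm_labels) if c == crp_cluster]
    n = len(members)
    counts = Counter(members)
    # Assumed P(c): share of this CRP cluster's utterances assigned to HMM cluster c.
    return -sum((m / n) * math.log(m / n) for m in counts.values())


def average_context_dependency(crp_labels, hmm_labels):
    """Unweighted mean of the per-cluster context dependency (assumed form)."""
    clusters = set(crp_labels)
    total = sum(
        cluster_context_dependency(crp_labels, hmm_labels, c) for c in clusters
    )
    return total / len(clusters)


# Illustrative labels: one entry per utterance, aligned by position.
crp = ["a", "a", "a", "a", "b", "b"]
hmm = [0, 1, 2, 3, 7, 7]
dep_a = cluster_context_dependency(crp, hmm, "a")  # spread over four HMM clusters
dep_b = cluster_context_dependency(crp, hmm, "b")  # concentrated in one HMM cluster
avg = average_context_dependency(crp, hmm)
```

In this toy data, cluster "a" is scattered evenly over four infinite HMM clusters and receives the larger value, while cluster "b" stays within a single infinite HMM cluster and receives a value of zero.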

出力部２８は、文脈依存度算出部２３によって算出された、発話の文脈依存度、クラスタの文脈依存度、及び平均文脈依存度を出力する。 The output unit 28 outputs the utterance context dependency, the cluster context dependency, and the average context dependency calculated by the context dependency calculation unit 23.

<Operation of the context dependency estimation apparatus>
Next, the operation of the context dependency estimation apparatus 100 according to the present embodiment will be described. First, when a plurality of pieces of dialogue data, each a time series of utterances relating to a certain domain, are input to the context dependency estimation apparatus 100, the apparatus stores the input dialogue data in the dialogue data storage unit 21. The context dependency estimation apparatus 100 then executes the context dependency estimation processing routine shown in FIG. 2.

まず、ステップＳ１０１において、複数の対話データの全ての発話について、ｂａｇ−ｏｆ−ｗｏｒｄｓの特徴量を抽出する。そして、ステップＳ１０２において、上記ステップＳ１０１において抽出された各発話の特徴量に基づいて、ＣＲＰの手法を用いたクラスタリングにより、各発話を複数のクラスタに分類する。 First, in step S101, bag-of-words feature amounts are extracted for all utterances of a plurality of dialogue data. In step S102, each utterance is classified into a plurality of clusters by clustering using a CRP method based on the feature amount of each utterance extracted in step S101.

次のステップＳ１０３では、上記ステップＳ１０１において抽出された各発話の特徴量に基づいて、無限ＨＭＭの手法を用いたクラスタリングにより、各発話を複数のクラスタに分類する。 In the next step S103, each utterance is classified into a plurality of clusters by clustering using the infinite HMM method based on the feature amount of each utterance extracted in step S101.

そして、ステップＳ１０４では、上記ステップＳ１０２のクラスタリング結果におけるクラスタ数と、上記ステップＳ１０３のクラスタリング結果におけるクラスタ数とに基づいて、上記（６）式に従って、当該ドメインに関する発話の文脈依存度を算出する。 In step S104, based on the number of clusters in the clustering result in step S102 and the number of clusters in the clustering result in step S103, the context dependency of the utterance related to the domain is calculated according to the equation (6).

ステップＳ１０５では、上記ステップＳ１０２のクラスタリング結果における各クラスタに属するデータ（発話）と、上記ステップＳ１０３のクラスタリング結果における各クラスタに属するデータ（発話）とに基づいて、上記（７）式に従って、ＣＲＰの手法を用いたクラスタリングによる各クラスタＣの文脈依存度を算出する。また、算出した各クラスタの文脈依存度に基づいて、上記（９）式に従って、ＣＲＰの手法を用いたクラスタリングによる各クラスタの平均文脈依存度を算出する。 In step S105, based on the data (utterance) belonging to each cluster in the clustering result in step S102 and the data (utterance) belonging to each cluster in the clustering result in step S103, according to the above equation (7), the CRP The context dependency of each cluster C is calculated by clustering using the technique. Further, based on the calculated context dependency of each cluster, the average context dependency of each cluster by clustering using the CRP technique is calculated according to the above equation (9).

そして、ステップＳ１０６において、上記ステップＳ１０４、１０５の算出結果を出力して、文脈依存度算出処理ルーチンを終了する。 In step S106, the calculation results of steps S104 and S105 are output, and the context dependency calculation processing routine is terminated.

<Example>
An example is described below, in which dialogue data between a dialogue system and a human and dialogue data between humans are clustered. The data used here were collected through a chat interface and are text dialogues.

対話システムと人間との対話データは、対話システムと人間とが会話したデータであり、全部で１０００個の対話データである。対話の中で、システムと人間は動物の好き嫌いについて議論している。 The dialogue data between the dialogue system and the human is data in which the dialogue system and the human are talking, and a total of 1000 pieces of dialogue data. In the dialogue, the system and humans discuss animal likes and dislikes.

このドメインをＡｎｉｍａｌＤｉｓｃｕｓｓｉｏｎ（ＡＤ）ドメインと呼ぶこととする。対話例を図３に示す。上記図３では、Ｕがユーザ発話を表わしＳがシステム発話を表わしている。括弧内は本ドメインにおける対話行為タイプである。本ドメインでは、図４に示すような２９の対話行為が人手によって定義されている。 This domain will be referred to as an Animal Discussion (AD) domain. An example of interaction is shown in FIG. In FIG. 3, U represents a user utterance and S represents a system utterance. In parentheses are dialogue action types in this domain. In this domain, 29 interactive actions as shown in FIG. 4 are defined manually.

Details of each dialogue act are given in the non-patent literature (Ryuichiro Higashinaka, Koji Dosaka, Hideki Isozaki, "Effects of empathy and self-disclosure in dialogue systems," Proceedings of the 15th Annual Meeting of the Association for Natural Language Processing, pp. 446-449, 2009).

また、人間同士の対話データは、聞き役対話を集めたものである。このドメインをＡｔｔｅｎｔｉｖｅＬｉｓｔｅｎｉｎｇ（ＡＬ）ドメインと呼ぶこととする。聞き役対話とは、二者が聞き役と話し役に分かれて、一方が聞き役となって話し役の話を聞くという対話である。人間同士の対話データとして、このような対話データを、１２６０個収集した。対話例を図５に示す。上記図５では、Ｓは話し役を表わし、Ｌは聞き役を表わす。括弧内は本ドメインにおける対話行為タイプであり、図６に示すような３８の対話行為が人手によって定義されている。 In addition, the dialogue data between humans is a collection of dialogues of listeners. This domain is referred to as an Attentive Listening (AL) domain. The listener dialogue is a dialogue in which two people are divided into a listener and a speaker, and one of them becomes a listener and listens to the story of the speaker. As human interaction data, 1260 such interaction data were collected. An example of interaction is shown in FIG. In FIG. 5 above, S represents a talking role, and L represents a listening role. In the parentheses are interactive action types in this domain, and 38 interactive actions as shown in FIG. 6 are manually defined.

Details of each dialogue act are given in the non-patent literature (T. Meguro, R. Higashinaka, Y. Minami, and K. Dohsaka, "Controlling listening-oriented dialogue using partially observable Markov decision processes," in Proc. COLING, 2010, pp. 761-769).

For comparison, the clustering method known as K-means was used. K-means is a representative clustering method for cases where the number of clusters is known in advance. It first creates clusters at random and then, within the framework of the EM algorithm, updates them until a locally optimal solution is reached.

ＡＤドメインとＡＬドメインのデータに対し、Ｋ−ｍｅａｎｓ、ＣＲＰ、及び無限ＨＭＭの各々の手法を用いてクラスタリングを行い、対話行為数を推定する実験を行った。ここで、Ｋ−ｍｅａｎｓは対話行為数を推定できない手法であるため、直接的な比較はできない。そこで、Ｋ−ｍｅａｎｓについては、人手で正解の対話行為数を与え、発話のクラスタリングを行った。対話行為数が予め分かっている状態で、クラスタリングを行うため、非常に強力なベースラインと見なせる。 An experiment was conducted to estimate the number of interactive actions by clustering AD domain and AL domain data using the K-means, CRP, and infinite HMM methods. Here, since K-means is a technique that cannot estimate the number of interactive actions, a direct comparison cannot be made. Therefore, for K-means, the number of correct interactive actions was given manually, and utterances were clustered. Since clustering is performed in a state where the number of dialogue actions is known in advance, it can be regarded as a very powerful baseline.

ギブスサンプリングの計算コストが比較的高いため、実験に際しては、各ドメインからランダムに抽出した５０個の対話データずつを対象とした。ＡＤドメインの対話データは２８９４個の発話データであり、ＡＬドメインの対話データは、２４７０個の発話のデータであった。人手で付与した対話行為によれば、これらのサブセットの中には、それぞれ、２７種類の対話行為、３３種類の対話行為が含まれていた。 Since the calculation cost of Gibbs sampling is relatively high, 50 dialogue data extracted at random from each domain were used for the experiment. The AD domain dialogue data was 2894 utterance data, and the AL domain dialogue data was 2470 utterance data. According to the interactive actions given manually, these subsets included 27 kinds of interactive actions and 33 kinds of interactive actions, respectively.

Before clustering, each utterance was morphologically analyzed (ChaSen was used in this experiment) and its bag-of-words feature amount was obtained. Because low-frequency words may adversely affect clustering, only words appearing 10 or more times in the entire data of each domain were used as features, and words were represented by their base forms.
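The feature extraction described above can be sketched as follows, assuming the utterances have already been split into base-form tokens by a morphological analyzer (the ChaSen step itself is not shown). Whether the original features are counts or binary indicators is not stated, so the use of counts here is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokenized_utterances, min_count=10):
    """Keep only words that occur at least min_count times in the whole data set."""
    freq = Counter(tok for utt in tokenized_utterances for tok in utt)
    return sorted(word for word, n in freq.items() if n >= min_count)


def bag_of_words(tokens, vocabulary):
    """Count vector of one utterance over the retained vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Toy data; min_count is lowered to 2 so that the small example keeps some words.
utterances = [["like", "dog"], ["like", "cat"], ["dog", "be", "cute"]]
vocab = build_vocabulary(utterances, min_count=2)
vectors = [bag_of_words(u, vocab) for u in utterances]
```

With the threshold of 10 used in the experiment, `min_count=10` would be passed instead; words below the threshold simply do not appear as dimensions of the feature vector.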

クラスタリングの評価は、データに人手で付与された正解の対話行為ラベルと対照することで行った。 Clustering was evaluated by contrasting the correct dialogue action labels manually assigned to the data.

Purity and F-measure were used as evaluation measures. Both are common indices for evaluating clustering. Purity expresses the extent to which a single cluster contains utterances of the same dialogue act. F-measure looks at pairs of data items and quantifies how correctly pairs that should be in the same cluster are actually placed in the same cluster. Purity is calculated by equation (10) below.

Here, C = {c_1, ..., c_K} is the set of clusters, D = {d_1, ..., d_N} is the set of dialogue acts, and N is the number of data items (utterances).
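Equation (10) is not reproduced in this text. The sketch below assumes the standard definition of purity, in which each cluster is credited with the count of its most frequent dialogue act and the sum is divided by the number of utterances; the function name `purity` and the sample labels are illustrative.

```python
from collections import Counter


def purity(cluster_labels, act_labels):
    """Standard clustering purity (assumed to match equation (10))."""
    n = len(cluster_labels)
    acts_by_cluster = {}
    for cluster, act in zip(cluster_labels, act_labels):
        acts_by_cluster.setdefault(cluster, Counter())[act] += 1
    # Each cluster contributes the size of its majority dialogue act.
    majority_total = sum(max(c.values()) for c in acts_by_cluster.values())
    return majority_total / n


clusters = [0, 0, 0, 1, 1, 1]
acts = ["greet", "greet", "question", "question", "question", "greet"]
score = purity(clusters, acts)
```

Here each of the two clusters has a majority act covering two of its three utterances, so four of the six utterances count toward purity.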

Ｆ−ｍｅａｓｕｒｅは、以下の（１１）式に従って算出される。 F-measure is calculated according to the following equation (11).

ここで、ＴＰ、ＦＰ、ＦＮは、それぞれｔｒｕｅｐｏｓｉｔｉｖｅ、ｆａｌｓｅｐｏｓｉｔｉｖｅ、ｆａｌｓｅｎｅｇａｔｉｖｅを表す。ｔｒｕｅｐｏｓｉｔｉｖｅは、同じ対話行為である発話のペアが同じクラスタに入っている回数であり、ｆａｌｓｅｐｏｓｉｔｉｖｅは異なる対話行為である発話のペアが同じクラスタに入っている回数であり、ｆａｌｓｅｎｅｇａｔｉｖｅは同じ対話行為である発話のペアが異なったクラスタに入っている回数である。 Here, TP, FP, and FN represent true positive, false positive, and false negative, respectively. true positive is the number of times a pair of utterances with the same dialogue act is in the same cluster, false positive is the number of times a pair of utterances with a different dialogue act is in the same cluster, and false negative is the same dialogue This is the number of times that a pair of utterances that are actions is in different clusters.
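The pair counting described above can be sketched as follows. Equation (11) is not reproduced in this text, so the sketch assumes the balanced F-measure, that is, the harmonic mean of pairwise precision TP / (TP + FP) and pairwise recall TP / (TP + FN); the function name is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(cluster_labels, act_labels):
    """Balanced F-measure over utterance pairs (assumed form of equation (11))."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(cluster_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_act = act_labels[i] == act_labels[j]
        if same_cluster and same_act:
            tp += 1  # same dialogue act, same cluster
        elif same_cluster:
            fp += 1  # different dialogue acts, same cluster
        elif same_act:
            fn += 1  # same dialogue act, different clusters
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


clusters = [0, 0, 0, 1, 1, 1]
acts = ["greet", "greet", "question", "question", "question", "greet"]
f_score = pairwise_f_measure(clusters, acts)
```

The loop visits every unordered pair once, so its cost grows quadratically with the number of utterances; for a few thousand utterances, as in the experiment, this remains manageable.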

Ｋ−ｍｅａｎｓがランダムな初期値に依存すること、ＣＲＰと無限ＨＭＭが確率的に動作することなどから、本実験ではそれぞれのクラスタリング手法で１００回クラスタリングを行い、その平均値を求めた。ＣＲＰと無限ＨＭＭについては、αには０．１を、βとγには０．０１を用いた。ギブスサンプリングのイタレーション数は１００とした。つまり、すべての客は１００回ずつ再配置された。 Since K-means depends on a random initial value and CRP and infinite HMM operate stochastically, in this experiment, clustering was performed 100 times with each clustering method, and the average value was obtained. For CRP and infinite HMM, 0.1 was used for α and 0.01 for β and γ. The number of iterations of Gibbs sampling was 100. In other words, all customers were relocated 100 times.

以下の表１にＡＤドメインの発話のクラスタリング結果を示す。 Table 1 below shows the results of clustering AD domain utterances.

* indicates a difference from K-means that is significant at the 1% level by t-test. + indicates a difference from the CRP that is significant at the 1% level by t-test.
Table 2 below shows the clustering results for the utterances of the AL domain.

上記の結果から分かるように、無限ＨＭＭは、他の手法よりもクラスタリング性能が良い。すなわち、発話のクラスタリングに文脈情報を利用することが有用であることが分かった。 As can be seen from the above results, the infinite HMM has better clustering performance than other methods. That is, it was found useful to use context information for utterance clustering.

また、無限ＨＭＭのクラスタ数は、ＡＤドメインで約１４３個、ＡＬドメインで３８個となっており、これが、自動的に推定された対話行為数である。 The number of infinite HMM clusters is about 143 in the AD domain and 38 in the AL domain. This is the automatically estimated number of interactive actions.

ＣＲＰで推定された対話行為数の方が人手で与えた個数に近い。このことから、人間が対話行為を付与するという行為は、発話を独立のものと見なしてなされていると推測できる。しかしながら、本実験の結果によれば、人手による正解の対話行為数より、文脈を考慮した場合の対話行為数の方が多い。このことは、文脈を鑑みれば、人手による対話行為数が少なすぎる可能性を示唆していると考えられる。つまり、対話システムの設計者からすれば、文脈をより考慮した対話行為を加えるなど、対話行為の再設計の指針としてとらえることができ、その指針に沿って対話行為を設計し直すことで、より適切にユーザ発話を処理できる対話システムにつなげることが可能となると考えられる。 The number of interactive actions estimated by CRP is closer to the number given manually. From this, it can be inferred that the act of giving a dialogue act by a human being is regarded as an independent utterance. However, according to the result of this experiment, the number of dialogue actions when the context is considered is larger than the number of correct dialogue actions manually. This seems to suggest that the number of human interaction acts may be too small in view of the context. In other words, a dialog system designer can take it as a guideline for redesigning a dialog act, such as adding a dialog act that takes context into consideration, and by redesigning the dialog act according to that guideline, It will be possible to connect to a dialogue system that can properly handle user utterances.

Calculating the context dependency of the utterances gives 143.62 / 35.03 = 4.099 for the AD domain and 38.00 / 29.28 = 1.298 for the AL domain. From this it can be judged that utterances in the AD domain are more context-dependent. Further, comparing the context dependencies of the two domains by calculating the context dependency ratio described later shows, as an objective figure, that the AD domain has 4.099 / 1.298 = 3.158 times as many context-dependent dialogue acts as the AL domain.
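The arithmetic above can be reproduced with a few lines of code. The figures quoted in the text indicate that the context dependency is the number of clusters found with context (infinite HMM) divided by the number found without context (CRP), and that the context dependency ratio is the quotient of two such values; the function name `context_dependency` is hypothetical.

```python
def context_dependency(n_clusters_with_context, n_clusters_without_context):
    """Ratio of cluster counts: infinite HMM result over CRP result."""
    return n_clusters_with_context / n_clusters_without_context


# Average cluster counts reported for the two domains.
ad = context_dependency(143.62, 35.03)  # AD domain
al = context_dependency(38.00, 29.28)   # AL domain
ratio = ad / al                         # context dependency ratio, AD vs. AL
```

Because the text divides values that were already shortened to three decimals, results computed at full precision can differ from the quoted figures in the last digit.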

以上説明したように、本実施の形態に係る文脈依存性推定装置によれば、発話の文脈情報を考慮しないＣＲＰの手法を用いて、対話データの発話をクラスタリングすると共に、発話の文脈情報を考慮した無限ＨＭＭの手法を用いた、対話データの発話をクラスタリングし、発話のクラスタリング結果を比較することにより、あるドメインの対話データについて、文脈依存度を推定することができる。 As described above, according to the context dependency estimation apparatus according to the present embodiment, the utterances of conversation data are clustered and the context information of the utterance is considered using the CRP technique that does not consider the context information of the utterance. By clustering utterances of dialogue data using the infinite HMM method, and comparing the utterance clustering results, it is possible to estimate the context dependency of dialogue data of a certain domain.

また、発話のクラスタリングにおいて発話の文脈情報を考慮することにより、発話のクラスタリングの性能が向上するため、どのような発話がドメインに存在するかが一目で分かるようになり、対話システム構築が容易になる。さらに、ドメイン中の発話の文脈依存度を数値として算出できるため、対話データのドメインの深い理解につながる。たとえば、文脈依存度が高い発話が多いドメインだということが分かれば、システムの理解部において文脈情報をより多く持つといった改善が可能となる。 In addition, by considering the context information of utterances in utterance clustering, the performance of utterance clustering is improved, so it becomes possible to see at a glance what kind of utterances exist in the domain, making it easy to construct a dialogue system. Become. Furthermore, since the context dependency of utterances in a domain can be calculated as a numerical value, this leads to a deep understanding of the domain of dialogue data. For example, if it is known that the domain has a lot of utterances with a high degree of context dependency, it is possible to improve the system so that the understanding unit has more context information.

[Second Embodiment]
Next, a second embodiment will be described. Parts having the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

第２の実施の形態では、文脈情報を付加した発話の特徴量に基づいて、ＣＲＰの手法を用いたクラスタリングを行っている点が、第１の実施の形態と異なっている。 The second embodiment is different from the first embodiment in that clustering using a CRP technique is performed based on the feature amount of an utterance to which context information is added.

図７に示すように、第２の実施の形態に係る文脈依存性推定装置２００の発話クラスタリング部２２２は、特徴量抽出部３０、ＣＲＰクラスタリング部３１、文脈情報付加部２３１、及びＣＲＰクラスタリング部２３２を備えている。なお、ＣＲＰクラスタリング部３１が、第１クラスタリング手段の一例であり、ＣＲＰクラスタリング部２３２が、第２クラスタリング手段の一例である。 As shown in FIG. 7, the utterance clustering unit 222 of the context dependency estimation apparatus 200 according to the second embodiment includes a feature amount extraction unit 30, a CRP clustering unit 31, a context information addition unit 231, and a CRP clustering unit 232. It has. The CRP clustering unit 31 is an example of a first clustering unit, and the CRP clustering unit 232 is an example of a second clustering unit.

文脈情報付加部２３１は、特徴量抽出部３０によって抽出された各発話の特徴量に対して、文脈情報として、直前の発話の特徴量を付加して、付加特徴量を各々生成する。例えば、発話１、・・・、発話Ｎがあり、それぞれの特徴量を、特徴量１、・・・、特徴量Ｎとすると、各発話の特徴量に、前発話の特徴量を付加したもの、つまり、｛開始記号、特徴量１｝、｛特徴量１、特徴量２｝、・・・、｛特徴量Ｎ−１、特徴量Ｎ｝を、各発話の付加特徴量として生成する。これによって、特徴量（ベクトル）の次元が２倍となる。 The context information adding unit 231 adds the feature amount of the immediately previous utterance as context information to the feature amount of each utterance extracted by the feature amount extracting unit 30 to generate each additional feature amount. For example, if there are utterances 1,..., Utterance N, and the feature amounts are feature amounts 1,..., Feature amount N, the feature amount of the previous utterance is added to the feature amount of each utterance. That is, {start symbol, feature amount 1}, {feature amount 1, feature amount 2},..., {Feature amount N-1, feature amount N} are generated as additional feature amounts for each utterance. As a result, the dimension of the feature quantity (vector) is doubled.
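The construction of the additional feature amounts can be sketched as follows for one dialogue. The source does not say how the start symbol is encoded, so representing it as an all-zero vector is an assumption, as is the function name `add_context`.

```python
def add_context(features, start_vector=None):
    """Prepend the previous utterance's feature vector to each utterance's own.

    features: list of equal-length feature vectors for one dialogue, in order.
    start_vector: stand-in for the start symbol (ASSUMED to be all zeros).
    """
    if not features:
        return []
    dim = len(features[0])
    prev = list(start_vector) if start_vector is not None else [0] * dim
    augmented = []
    for vec in features:
        augmented.append(prev + list(vec))  # {previous features, own features}
        prev = list(vec)
    return augmented


# Three utterances with three-dimensional bag-of-words vectors (toy data).
dialogue = [[1, 0, 2], [0, 1, 0], [3, 0, 0]]
augmented = add_context(dialogue)
```

Each resulting vector has twice the original dimension, as stated above. When several dialogues are processed, the function would be applied to each dialogue separately so that the first utterance of a dialogue is not paired with the last utterance of the preceding one.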

ＣＲＰクラスタリング部２３２は、文脈情報付加部２３１によって生成された各発話の付加特徴量に基づいて、ＣＲＰクラスタリング部３１と同様に、ＣＲＰを用いて、発話をクラスタリングする。 Similar to the CRP clustering unit 31, the CRP clustering unit 232 clusters the utterances using the CRP based on the additional feature amount of each utterance generated by the context information adding unit 231.

The context dependency calculation unit 23 calculates the context dependency according to an expression similar to equation (6) above, based on the number of clusters in the clustering result of the CRP clustering unit 31 and the number of clusters in the clustering result of the CRP clustering unit 232.

The context dependency calculation unit 23 also calculates the context dependency of each cluster C in the clustering result of the CRP clustering unit 31 according to an expression similar to equation (7) above, based on the data of each cluster in the clustering result of the CRP clustering unit 31 and the data of each cluster in the clustering result of the CRP clustering unit 232. In addition, the context dependency calculation unit 23 calculates the average context dependency of the clusters in the clustering result of the CRP clustering unit 31 according to an expression similar to equation (9) above.

Next, the context dependency estimation processing routine of the second embodiment will be described with reference to FIG. 8. Processes identical to those of the first embodiment are given the same reference numerals, and their detailed description is omitted.

まず、ステップＳ１０１において、複数の対話データの全ての発話について、特徴量を抽出する。そして、ステップＳ１０２において、上記ステップＳ１０１において抽出された各発話の特徴量に基づいて、ＣＲＰの手法を用いたクラスタリングにより、各発話を複数のクラスタに分類する。 First, in step S101, feature amounts are extracted for all utterances of a plurality of dialogue data. In step S102, each utterance is classified into a plurality of clusters by clustering using a CRP method based on the feature amount of each utterance extracted in step S101.

次のステップＳ２０１では、上記ステップＳ１０１において抽出された各発話の特徴量に対して、それぞれ直前の発話の特徴量を付加して、付加特徴量を各発話について生成する。 In the next step S201, the feature amount of the immediately preceding utterance is added to the feature amount of each utterance extracted in step S101, and an additional feature amount is generated for each utterance.

そして、ステップＳ２０２において、上記ステップＳ２０１において生成された各発話の付加特徴量に基づいて、ＣＲＰの手法を用いたクラスタリングにより、各発話を複数のクラスタに分類する。 In step S202, each utterance is classified into a plurality of clusters by clustering using the CRP method based on the additional feature amount of each utterance generated in step S201.

そして、ステップＳ１０４では、上記ステップＳ１０２のクラスタリング結果におけるクラスタ数と、上記ステップＳ２０２のクラスタリング結果におけるクラスタ数とに基づいて、上記（６）式と同様の式に従って、当該ドメインに関する発話の文脈依存度を算出する。 In step S104, based on the number of clusters in the clustering result in step S102 and the number of clusters in the clustering result in step S202, the context dependency of the utterance related to the domain according to the same expression as the above expression (6) Is calculated.

ステップＳ１０５では、上記ステップＳ１０２のクラスタリング結果における各クラスタに属するデータ（発話）と、上記ステップＳ２０２のクラスタリング結果における各クラスタに属するデータ（発話）とに基づいて、上記（７）式と同様の式に従って、上記ステップＳ１０２でのクラスタリングによる各クラスタＣの文脈依存度を算出する。また、算出した各クラスタの文脈依存度に基づいて、上記（９）式と同様の式に従って、上記ステップＳ１０２でのクラスタリングによる各クラスタの平均文脈依存度を算出する。 In step S105, based on the data (utterance) belonging to each cluster in the clustering result in step S102 and the data (utterance) belonging to each cluster in the clustering result in step S202, an expression similar to the above expression (7) Accordingly, the context dependency of each cluster C by the clustering in step S102 is calculated. Further, based on the calculated context dependency of each cluster, the average context dependency of each cluster by the clustering in step S102 is calculated according to the same expression as the above expression (9).

以上説明したように、本実施の形態に係る文脈依存性推定装置によれば、発話の文脈情報を考慮しないＣＲＰの手法を用いて、対話データの発話をクラスタリングすると共に、文脈情報として直前の発話の特徴量を付加した付加特徴量を用いて、対話データの発話をクラスタリングし、発話のクラスタリング結果を比較することにより、あるドメインの対話データについて、文脈依存度を推定することができる。 As described above, according to the context dependency estimation apparatus according to the present embodiment, the conversation data utterances are clustered using the CRP technique that does not consider the utterance context information, and the immediately preceding utterance is used as the context information. By using the additional feature amount to which the feature amount is added, the dialogue data utterances are clustered, and the utterance clustering results are compared, whereby the context dependency of the dialogue data of a certain domain can be estimated.

[Third Embodiment]
Next, a third embodiment will be described.

The third embodiment differs from the first and second embodiments in that utterances are clustered for the dialogue data of each of a plurality of domains and a context dependency ratio between the domains is calculated. The following describes, as an example, the case where the context dependency of each domain is calculated by the same method as in the first embodiment; the same method as in the second embodiment may be used instead.

As shown in FIG. 9, the context dependency estimation apparatus 300 according to the third embodiment includes input units 10A and 10B, a calculation unit 20, and an output unit 28.

入力部１０Ａは、入力された対話データとして、ドメインＡに関連する複数の対話データを受け付ける。入力部１０Ｂは、入力された対話データとして、ドメインＡとは異なるドメインＢに関連する複数の対話データを受け付ける。 The input unit 10A receives a plurality of dialogue data related to the domain A as the inputted dialogue data. The input unit 10B accepts a plurality of pieces of dialogue data related to a domain B different from the domain A as inputted dialogue data.

演算部２０は、対話データ記憶部２１Ａ、２１Ｂ、発話クラスタリング部２２Ａ、２２Ｂ、文脈依存度算出部２３Ａ、２３Ｂ、文脈依存比算出部３２３を備えている。 The calculation unit 20 includes dialogue data storage units 21A and 21B, utterance clustering units 22A and 22B, context dependency calculation units 23A and 23B, and a context dependency ratio calculation unit 323.

対話データ記憶部２１Ａは、入力部１０Ａにより受け付けた複数の対話データを記憶する。対話データ記憶部２１Ｂは、入力部１０Ｂにより受け付けた複数の対話データを記憶する。 The dialogue data storage unit 21A stores a plurality of dialogue data received by the input unit 10A. The dialogue data storage unit 21B stores a plurality of dialogue data received by the input unit 10B.

発話クラスタリング部２２ＡのＣＲＰクラスタリング部３１は、特徴量抽出部３０によって抽出されたドメインＡの対話データの各発話の特徴量に基づいて、ＣＲＰの手法を用いて、ドメインＡについて、発話をクラスタリングする。発話クラスタリング部２２ＢのＣＲＰクラスタリング部３１は、特徴量抽出部３０によって抽出されたドメインＢの各発話の特徴量に基づいて、ＣＲＰの手法を用いて、ドメインＢについて、発話をクラスタリングする。 The CRP clustering unit 31 of the utterance clustering unit 22A clusters the utterances for the domain A using the CRP technique based on the feature amount of each utterance of the domain A dialogue data extracted by the feature amount extraction unit 30. . The CRP clustering unit 31 of the utterance clustering unit 22B clusters utterances for the domain B using the CRP technique based on the feature amount of each utterance of the domain B extracted by the feature amount extraction unit 30.

The infinite HMM clustering unit 32 of the utterance clustering unit 22A clusters the utterances of domain A using the infinite HMM method, based on the feature amount of each utterance in the domain A dialogue data extracted by the feature amount extraction unit 30. The infinite HMM clustering unit 32 of the utterance clustering unit 22B clusters the utterances of domain B using the infinite HMM method, based on the feature amount of each utterance in the domain B dialogue data extracted by the feature amount extraction unit 30.

The context dependency calculation unit 23A calculates the context dependency of utterances for domain A according to equation (6) above. The context dependency calculation unit 23B calculates the context dependency of utterances for domain B according to equation (6) above.

Based on the calculated context dependency of utterances for domain A and the calculated context dependency of utterances for domain B, the context dependency ratio calculation unit 323 calculates the context dependency ratio between domain A and domain B according to the following equation (12).

In addition, for domain A, the context dependency calculation unit 23A calculates, according to equation (7) above, the context dependency of each cluster C in the clustering result produced by the CRP clustering unit 31 of the utterance clustering unit 22A. For domain B, the context dependency calculation unit 23B calculates the context dependency of each cluster C in the clustering result produced by the CRP clustering unit 31 of the utterance clustering unit 22B.

The context dependency calculation unit 23A calculates the average context dependency for domain A according to equation (9) above. The context dependency calculation unit 23B calculates the average context dependency for domain B according to equation (9) above.

The other configuration and operation of the context dependency estimation apparatus according to the third embodiment are the same as in the first embodiment, and their description is therefore omitted.

In this way, by calculating the context dependency of utterances for each domain and comparing the results, the context dependency ratio between the domains can be calculated.
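The comparison step can be sketched as follows. Equation (12) is not reproduced in this text, so the sketch assumes that the "context dependency ratio" is the simple quotient of the two per-domain context dependencies; the function name, parameter names, and sample values are hypothetical and are not taken from the specification.

```python
def context_dependency_ratio(dependency_a: float, dependency_b: float) -> float:
    """Assumed reading of equation (12): domain A's degree divided by domain B's.

    dependency_a and dependency_b stand for the values that the context
    dependency calculation units 23A and 23B would output.
    """
    if dependency_b == 0:
        raise ValueError("context dependency of domain B must be non-zero")
    return dependency_a / dependency_b


# Hypothetical per-domain degrees, chosen only for illustration.
ratio = context_dependency_ratio(0.6, 0.3)
```

Under this assumption, a ratio above 1 would indicate that utterances in domain A depend on context more strongly than those in domain B.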

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the CRP method has been described as an example of a clustering method that does not consider the context of the entire dialogue, but any clustering method that does not consider that context may be used instead. For example, utterances may be clustered using a method such as K-Means, Affinity Propagation, or X-Means.
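As background for the CRP mentioned above, the following minimal sketch computes the standard Chinese Restaurant Process prior for the next item: an existing cluster is chosen in proportion to its size, and a new cluster is opened in proportion to a concentration parameter. The function name and the parameter `alpha` are illustrative assumptions; an actual clustering unit would also weigh each option by how well the utterance's feature amount fits the cluster, which is not shown here.

```python
from typing import List


def crp_seating_probabilities(cluster_sizes: List[int], alpha: float) -> List[float]:
    """Prior probabilities for the next item under a CRP.

    Returns one probability per existing cluster, followed by the
    probability of opening a new cluster.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(cluster_sizes) + alpha
    probs = [size / total for size in cluster_sizes]
    probs.append(alpha / total)  # last entry: a brand-new cluster
    return probs


# Two existing clusters holding 3 and 1 utterances, concentration 1.0.
probs = crp_seating_probabilities([3, 1], alpha=1.0)
```

Because a new cluster always has non-zero probability, the number of clusters does not have to be fixed in advance, and nothing in this prior refers to the order of utterances within the dialogue.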

Note that a method that automatically determines the number of clusters and clusters utterances using context information can take the relationships between utterances into account, and therefore has the advantage of higher clustering accuracy than conventional clustering methods that do not consider the context of the entire dialogue.

The input unit 10, the dialogue data storage unit 21, the feature amount extraction unit 30, and the infinite HMM clustering unit 32 described in the first embodiment can be taken out and made to function as an utterance clustering apparatus. Similarly, the input unit 10, the dialogue data storage unit 21, the feature amount extraction unit 30, the context information addition unit 231, and the CRP clustering unit 232 described in the second embodiment can be taken out and made to function as an utterance clustering apparatus. Automatically determining the number of clusters and clustering utterances using context information has the advantage of enabling highly accurate clustering.
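The role of the context information addition unit 231 can be sketched as follows. The sketch assumes that "adding" the preceding utterance's feature amount means concatenating it to the current utterance's feature vector, and that the first utterance, which has no predecessor, is padded with zeros; both choices, as well as the function name and the sample data, are assumptions made for illustration.

```python
from typing import List


def add_context_features(features: List[List[float]]) -> List[List[float]]:
    """Append the previous utterance's feature vector to each utterance's own.

    The first utterance has no predecessor, so it is padded with zeros
    (an assumption of this sketch).
    """
    if not features:
        return []
    dim = len(features[0])
    previous = [0.0] * dim
    augmented = []
    for current in features:
        if len(current) != dim:
            raise ValueError("all feature vectors must have the same length")
        augmented.append(list(current) + previous)
        previous = list(current)
    return augmented


# Three utterances in time order, each with a two-dimensional feature vector.
dialogue = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = add_context_features(dialogue)
```

The augmented vectors could then be passed to a context-free clusterer such as the CRP clustering unit 232, so that two utterances with identical features but different predecessors can end up in different clusters.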

Although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

10, 10A, 10B Input unit
20 Calculation unit
21, 21A, 21B Dialogue data storage unit
22, 22A, 22B, 222 Utterance clustering unit
23, 23A, 23B Context dependency calculation unit
30 Feature amount extraction unit
31, 232 CRP clustering unit
32 Infinite HMM clustering unit
100, 200, 300 Context dependency estimation apparatus
231 Context information addition unit
323 Context dependency ratio calculation unit

Claims

A context dependency estimation apparatus comprising:
feature amount extraction means for extracting a feature amount of each utterance from dialogue data that is a time series of a plurality of utterances;
first clustering means for clustering the plurality of utterances based on the feature amount of each utterance extracted by the feature amount extraction means;
second clustering means for clustering the plurality of utterances, using context information of the utterances, based on the feature amount of each utterance extracted by the feature amount extraction means; and
estimation means for estimating a degree of dependence on context based on the clustering result of the first clustering means and the clustering result of the second clustering means.

The context dependency estimation apparatus according to claim 1, wherein the first clustering means clusters the plurality of utterances according to a CRP (Chinese Restaurant Process), and
the second clustering means clusters the plurality of utterances, using transition information between utterances of the dialogue data, according to an infinite HMM (Hidden Markov Model).

The context dependency estimation apparatus according to claim 1, wherein the second clustering means generates, for each utterance, an additional feature amount obtained by adding the feature amount of the immediately preceding utterance, as context information of the utterance, to the feature amount of the utterance extracted by the feature amount extraction means, and clusters the plurality of utterances based on the generated additional feature amount of each utterance.

The context dependency estimation apparatus according to claim 1, wherein the dialogue data is dialogue data regarding a specific domain, and
the estimation means estimates the context dependency of utterances in the specific domain according to the following expression.

Here, the cluster number C1 is the number of clusters produced by the first clustering means, and the cluster number C2 is the number of clusters produced by the second clustering means.

The context dependency estimation apparatus according to claim 4, wherein the dialogue data is dialogue data regarding two different domains,
the first clustering means clusters, for each domain, a plurality of utterances of the dialogue data related to that domain,
the second clustering means clusters, for each domain, a plurality of utterances of the dialogue data related to that domain, and
the estimation means estimates the context dependency of utterances for each domain and estimates the context dependency ratio between the domains according to the following equation.

The context dependency estimation apparatus according to any one of claims 1 to 3, wherein the estimation means estimates the context dependency of a cluster C produced by the first clustering means according to the following expression.

Here, C′ is the set of clusters produced by the second clustering means, and c is a cluster that is an element of C′.

The context dependency estimation apparatus according to claim 6, wherein the estimation means estimates the context dependency of each cluster produced by the first clustering means, and estimates the average context dependency of the clusters produced by the first clustering means according to the following formula.

Here, c″ is a cluster that is an element of C.

A context dependency estimation method in a context dependency estimation apparatus including feature amount extraction means, first clustering means, second clustering means, and estimation means, the method comprising:
extracting, by the feature amount extraction means, a feature amount of each utterance from dialogue data that is a time series of a plurality of utterances;
clustering, by the first clustering means, the plurality of utterances based on the feature amount of each utterance extracted by the feature amount extraction means;
clustering, by the second clustering means, the plurality of utterances, using context information of the utterances, based on the feature amount of each utterance extracted by the feature amount extraction means; and
estimating, by the estimation means, a degree of dependence on context based on the clustering result of the first clustering means and the clustering result of the second clustering means.

An utterance clustering apparatus comprising:
input means for receiving input dialogue data that is a time series of a plurality of utterances;
dialogue data storage means for storing the dialogue data received by the input means;
feature amount extraction means for extracting a feature amount of each utterance from the dialogue data; and
infinite HMM clustering means for clustering the plurality of utterances, using transition information between utterances of the dialogue data, according to an infinite HMM (Hidden Markov Model), based on the feature amount of each utterance extracted by the feature amount extraction means.

An utterance clustering apparatus comprising:
input means for receiving input dialogue data that is a time series of a plurality of utterances;
dialogue data storage means for storing the dialogue data received by the input means;
feature amount extraction means for extracting a feature amount of each utterance from the dialogue data;
context information adding means for generating, for each utterance, an additional feature amount obtained by adding the feature amount of the immediately preceding utterance, as context information of the utterance, to the feature amount of the utterance extracted by the feature amount extraction means; and
CRP clustering means for clustering the plurality of utterances according to a CRP (Chinese Restaurant Process) based on the additional feature amount of each utterance generated by the context information adding means.

An utterance clustering method in an utterance clustering apparatus including input means, dialogue data storage means, feature amount extraction means, and infinite HMM clustering means, the method comprising:
receiving, by the input means, input dialogue data that is a time series of a plurality of utterances;
storing the dialogue data received by the input means in the dialogue data storage means;
extracting, by the feature amount extraction means, a feature amount of each utterance from the dialogue data; and
clustering, by the infinite HMM clustering means, the plurality of utterances, using transition information between utterances of the dialogue data, according to an infinite HMM (Hidden Markov Model), based on the feature amount of each utterance extracted by the feature amount extraction means.

An utterance clustering method in an utterance clustering apparatus including input means, dialogue data storage means, feature amount extraction means, context information adding means, and CRP clustering means, the method comprising:
receiving, by the input means, input dialogue data that is a time series of a plurality of utterances;
storing the dialogue data received by the input means in the dialogue data storage means;
extracting, by the feature amount extraction means, a feature amount of each utterance from the dialogue data;
generating, by the context information adding means, for each utterance, an additional feature amount obtained by adding the feature amount of the immediately preceding utterance, as context information of the utterance, to the feature amount of the utterance extracted by the feature amount extraction means; and
clustering, by the CRP clustering means, the plurality of utterances according to a CRP (Chinese Restaurant Process) based on the additional feature amount of each utterance generated by the context information adding means.

A program for causing a computer to function as each means of the context dependency estimation apparatus according to any one of claims 1 to 7.