JP6630304B2

JP6630304B2 - Dialogue destruction feature extraction device, dialogue destruction feature extraction method, program

Info

Publication number: JP6630304B2
Application number: JP2017042710A
Authority: JP
Inventors: Hiroaki Sugiyama (杉山弘晃)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-03-07
Filing date: 2017-03-07
Publication date: 2020-01-15
Anticipated expiration: 2037-03-07
Also published as: JP2018147288A

Description

The present invention relates to chat dialogue technology, and more particularly to a dialogue destructive force estimation technique that estimates how strongly a given utterance is inappropriate enough to break down a dialogue.

In recent years, there has been a growing need for chat dialogue systems that conduct open-domain chat not aimed at any specific task. Chat dialogue systems use, for example, a method that produces utterances from manually constructed dialogue knowledge (Non-Patent Document 1) and a method that constructs utterances from information on the Web (Non-Patent Document 2).

Non-Patent Document 1: Toyomi Meguro, Hiroaki Sugiyama, Ryuichiro Higashinaka, "Construction of a Dialogue System Based on the Fusion of Rule-Based Utterance Generation and Statistical Utterance Generation," Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, pp. 1-4, 2014.
Non-Patent Document 2: Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, "Generating Response Sentences Using Dependencies and Examples for User Utterances with Arbitrary Topics," Transactions of the Japanese Society for Artificial Intelligence, vol. 30, no. 1, pp. 183-194, 2015.

A chat dialogue system must respond to the very wide range of topics that can appear in user utterances. It is therefore difficult to keep producing appropriate responses, and current chat dialogue systems often generate utterances that break down the dialogue (Reference Non-Patent Document 1).
(Reference Non-Patent Document 1: Ryuichiro Higashinaka, Kotaro Funakoshi, "Collection of Chat Dialogue Data and Dialogue Breakdown Annotation in the Project Next NLP Dialogue Task," 72nd Meeting of the Japanese Society for Artificial Intelligence Special Interest Group on Spoken Language Understanding and Dialogue Processing, pp. 45-50, 2014.)
If an utterance that may break down the dialogue could be detected before it is actually uttered, and its output suppressed, continuing a dialogue in a chat dialogue system would become easier.

Accordingly, an object of the present invention is to provide a technique for extracting a dialogue destruction feature that can be used to estimate the dialogue destructive force, which indicates the degree to which an utterance breaks down a dialogue.

According to one aspect of the present invention, a dialogue destruction feature extraction device includes a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of j-th type features (1 ≤ j ≤ J), where J is an integer of 1 or more and each j-th type feature is a feature indicating whether or not a dialogue has broken down. The dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a j-th type feature calculation unit that calculates the j-th type feature from the dialogue. Here, the frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue at or above a predetermined frequency; the inter-utterance similarity feature is a feature representing how similar the last utterance in the dialogue is to the other utterances; the sentence length feature is a feature representing the word length and character length of the last utterance in the dialogue; the turn count feature is a feature representing the number of turns elapsed since the start of the dialogue; the word vector feature is a feature expressed as a vector generated from vectors representing the word N-grams of each utterance included in the dialogue; the question type feature is a feature representing the question type estimated when the utterance immediately preceding the last utterance in the dialogue is a question; the question class feature is a feature consisting of one or more of the following, obtained when the utterance immediately preceding the last utterance in the dialogue is a question: a vector representing the word classes that the question is estimated to require of the answer, a vector representing the word classes contained in the last utterance, the difference vector between them, and a Boolean value indicating whether the word classes represented by those two vectors match; and the topic repetition count feature is a feature representing the number of times a topic is repeated in the dialogue. The dialogue destruction feature is a combination that includes one or more of the frequent word string feature, the inter-utterance similarity feature, the sentence length feature, the turn count feature, the word vector feature, the question type feature, the question class feature, and the topic repetition count feature.

According to one aspect of the present invention, a dialogue destruction feature extraction device includes a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of j-th type features (1 ≤ j ≤ J), where J is an integer of 1 or more and each j-th type feature is a feature indicating whether or not a dialogue has broken down. The dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a j-th type feature calculation unit that calculates the j-th type feature from the dialogue. Here, the dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in the dialogue; the inter-utterance similarity feature is a feature representing how similar the last utterance in the dialogue is to the other utterances; the word feature is a feature expressed as a Bag-of-words vector in which the word N-grams of each utterance included in the dialogue are arranged; the word class feature is a feature expressed as a Bag-of-classes vector in which the word classes corresponding to the words of each utterance included in the dialogue are arranged; the word combination feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance in the dialogue and the other utterances, or within the last utterance; and the frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue at or above a predetermined frequency. The dialogue destruction feature is a combination that includes the dialogue act feature and the inter-utterance similarity feature and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.

According to one aspect of the present invention, a dialogue destruction feature extraction device includes a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of j-th type features (1 ≤ j ≤ J), where J is an integer of 1 or more and each j-th type feature is a feature indicating whether or not a dialogue has broken down. The dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a j-th type feature calculation unit that calculates the j-th type feature from the dialogue. Here, the inter-utterance similarity feature is a feature representing how similar the last utterance in the dialogue is to the other utterances; the frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue at or above a predetermined frequency; the dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in the dialogue; the sentence length feature is a feature representing the word length and character length of the last utterance in the dialogue; and the word combination feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance in the dialogue and the other utterances, or within the last utterance. The dialogue destruction feature is a combination that includes the inter-utterance similarity feature, the frequent word string feature, the dialogue act feature, the sentence length feature, and the word combination feature.

According to one aspect of the present invention, a dialogue destruction feature extraction device includes a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of j-th type features (1 ≤ j ≤ J), where J is an integer of 1 or more and each j-th type feature is a feature indicating whether or not a dialogue has broken down. The dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a j-th type feature calculation unit that calculates the j-th type feature from the dialogue. Here, the dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in the dialogue; the inter-utterance similarity feature is a feature representing how similar the last utterance in the dialogue is to the other utterances; the word feature is a feature expressed as a Bag-of-words vector in which the word N-grams of each utterance included in the dialogue are arranged; and the word class feature is a feature expressed as a Bag-of-classes vector in which the word classes corresponding to the words of each utterance included in the dialogue are arranged. The dialogue destruction feature is a combination that includes the dialogue act feature and the inter-utterance similarity feature and further includes one or more of the word feature and the word class feature.

According to the present invention, it is possible to extract a dialogue destruction feature that can be used to estimate the dialogue destructive force, which indicates the degree to which an utterance breaks down a dialogue.

A diagram showing an example of the configuration of the dialogue destruction model learning device 100.
A diagram showing an example of the operation of the dialogue destruction model learning device 100.
A diagram showing an example of the configuration of the dialogue destructive force estimation device 200.
A diagram showing an example of the operation of the dialogue destructive force estimation device 200.
A diagram showing an example of the input and output of the dialogue destructive force estimation device 200.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate descriptions are omitted.

<Terms>
First, the terms used in each embodiment are briefly explained.
A dialogue is a series of the past N utterances (N is an integer of 1 or more). In other words, a dialogue is a time series of utterances made by the user and the system.

The dialogue destructive force is the degree to which an utterance breaks down a dialogue. More precisely, for a series of the past N utterances (a dialogue), it is the degree to which the last utterance breaks down that dialogue. The last utterance may be one that has not yet actually been uttered. The dialogue destructive force may be expressed as a label indicating one of "broken down", "not broken down", or "neither", or as a probability distribution over those three outcomes. It may also be expressed as a real value indicating the degree of breakdown; in that case, the state of the dialogue (broken down, not broken down, or neither) can be determined by comparing the value with thresholds.
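The threshold comparison mentioned above can be sketched as follows. This is only an illustration: the function name, the two threshold values, and the convention that a larger value means a stronger breakdown are all assumptions, not details given in this description.

```python
def label_from_score(score: float, low: float = 0.33, high: float = 0.66) -> str:
    """Map a real-valued dialogue destructive force to one of three states.

    The thresholds `low` and `high` are hypothetical; larger scores are
    assumed to mean a stronger breakdown.
    """
    if score >= high:
        return "broken down"
    if score <= low:
        return "not broken down"
    return "neither"


labels = [label_from_score(s) for s in (0.9, 0.5, 0.1)]
```

With these assumed thresholds, the three sample scores fall into the three different states.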

<Features>
Many factors influence dialogue breakdown, and the features associated with each factor are equally varied. The features that indicate whether or not a dialogue has broken down are described below. In addition to defining each feature, the description explains from what viewpoint each feature affects the dialogue destructive force. In general, a feature is extracted from several utterances included in the dialogue and is expressed as a vector.

Concrete examples are given in the description of each feature below, but they are examples only, and other vector representations may be used.

[Topic cohesion]
Current dialogue systems sometimes generate an utterance on a topic unrelated to the user's utterance and thereby break down the dialogue. Conversely, they sometimes break down the dialogue by fixating on one topic and repeating utterances with the same content or topic over and over.

Measuring the frequency of topic transition patterns, the closeness between the topic of the user's utterance and the topic of the system's utterance, the number of topic repetitions, and so on therefore makes it possible to estimate the dialogue destructive force with topic cohesion taken into account. Three features are described below: the word combination feature, the inter-utterance similarity feature, and the topic repetition count feature.

(1-1) Word combination feature
The word combination feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures (hereinafter, word combinations) that co-occur between the last utterance in the dialogue and the other utterances, or within the last utterance.

Word N-grams, word class N-grams, word sets, and predicate-argument structures can be obtained by splitting the utterances in the dialogue into words through morphological analysis. The utterances subjected to morphological analysis are the most recent M utterances, including the last utterance whose dialogue destructive force is to be estimated. M is preferably about 1 to 4.

N for word N-grams and word class N-grams is preferably about 1 to 4. A word class may be a set of word vectors obtained by clustering the word vector representations produced by word2vec (described in detail later), or an abstracted representation of a word as assigned in a dictionary such as the Japanese lexicon 日本語語彙大系 (Nihongo Goi-Taikei). For example, the word class of "automobile" is expressed as "vehicle", "artifact", and so on. Unlike an N-gram, which takes order into account, a word set may consist of words that are not adjacent within the text (for example, the most recent M utterances).

The number of word combinations obtained in this way is usually enormous, which hinders efficient model learning. The word combinations used to define the word combination feature should therefore be limited to the top K (K being from several tens to several tens of thousands), ranked by, for example, the number of occurrences or the TF-IDF value in the corpus of dialogues used for learning. The range of possible word combinations may also be limited to those with a dependency relation within each utterance, or to co-occurrences within a predicate-argument structure. Furthermore, the words considered may be limited to content words, excluding words unrelated to the topic such as particles and punctuation marks. Alternatively, the words considered may be limited to predicates, which have fewer varieties than nouns; this makes the estimation robust to the diversity of nouns.
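The restriction to the top K word combinations can be sketched as follows, ranking by the number of dialogues in which each combination occurs. The function name and the toy corpus are hypothetical, and ranking by TF-IDF instead of raw counts would follow the same pattern.

```python
from collections import Counter
from itertools import product


def top_k_combinations(dialogues, k):
    """Return the k word combinations that occur in the most dialogues.

    `dialogues` is a list of (context_words, last_utterance_words) pairs;
    this input layout is an assumption made for the sketch.
    """
    counts = Counter()
    for context, last in dialogues:
        # Count each (context word, last-utterance word) pair once per dialogue.
        counts.update(set(product(context, last)))
    return [combo for combo, _ in counts.most_common(k)]


corpus = [
    (["旅行", "好き"], ["京都", "行き"]),
    (["旅行"], ["京都"]),
    (["天気"], ["京都"]),
]
best = top_k_combinations(corpus, 1)
```

In this toy corpus the pair ("旅行", "京都") occurs in two dialogues and every other pair in one, so it is the single combination kept when K = 1.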

Using the word combination feature in this way makes it possible to capture breakdown patterns specific to a particular dialogue system and, conversely, to capture topic transition patterns that are generally plausible.

An example of the word combination feature is described using Dialogue Example 1 below.
(Dialogue Example 1)
1 User: こんにちは/。/旅行/は/好き/です/か/？ (Hello. Do you like traveling?)
2 System: はい/。/先日/京都/に/行き/まし/た/。 (Yes. I went to Kyoto the other day.)
Here, the symbol "/" marks a word boundary.
For example, suppose that in Dialogue Example 1 the eight words 「こんにちは」「。」「旅行」「は」「好き」「です」「か」「？」 are obtained from the user utterance and the nine words 「はい」「。」「先日」「京都」「に」「行き」「まし」「た」「。」 are obtained from the system utterance. Then, as combinations of word N-grams (N = 1), 9 × 8 = 72 combinations such as 「はい-こんにちは」, 「はい-。」, 「はい-旅行」, ..., 「。-？」 are obtained between the user utterance and the system utterance, and ₉C₂ = 9 × 8 / 2 = 36 combinations are obtained within the system utterance. That is, there are 72 combinations of word N-grams co-occurring between the last utterance in the dialogue and the other utterances, and 36 combinations of word N-grams co-occurring within the last utterance.

In this way, all combinations that appear in the training data used for learning are enumerated, and a vector is constructed in which each combination corresponds to the element of one dimension. The element of a dimension is set to 1 if the corresponding combination appears and to 0 if it does not. For example, if the vector constructed as above has 5 dimensions and only the combination corresponding to the second element appears, the resulting word combination feature is (0,1,0,0,0).
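The counting in Dialogue Example 1 and the construction of the 0/1 vector can be sketched as follows. The five-entry vocabulary is hypothetical and chosen only so that the second entry is the one that occurs; a real vocabulary would be enumerated from the training data. Words are counted by position, so the two occurrences of 「。」 in the system utterance are treated as separate items, which is how the counts 72 and 36 arise.

```python
from itertools import combinations, product

user = ["こんにちは", "。", "旅行", "は", "好き", "です", "か", "？"]
system = ["はい", "。", "先日", "京都", "に", "行き", "まし", "た", "。"]

# Unigram combinations between the last (system) utterance and the user utterance.
between = list(product(system, user))
# Unigram combinations inside the last utterance, counted by position.
within = list(combinations(system, 2))


def binary_vector(observed, vocabulary):
    """Return a 0/1 vector with one dimension per vocabulary combination."""
    observed = set(observed)
    return [1 if combo in observed else 0 for combo in vocabulary]


# Hypothetical vocabulary of combinations enumerated from training data.
vocabulary = [
    ("晴れ", "天気"),
    ("京都", "旅行"),
    ("雨", "天気"),
    ("映画", "好き"),
    ("犬", "猫"),
]
vector = binary_vector(between + within, vocabulary)
```

Here `len(between)` is 72 and `len(within)` is 36, matching the counts in the example, and `vector` has a 1 only in its second dimension.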

(1-2) Inter-utterance similarity feature
The inter-utterance similarity feature is a feature representing how similar the last utterance in the dialogue is to the other utterances. It is a vector whose elements are one or more of the similarities described below. Each similarity used here measures the degree of similarity between two utterances.

Unlike the word combination feature, which captures specific transition patterns, the inter-utterance similarity feature can evaluate the relatedness of utterances on the basis of similarity even for combinations that do not appear in the dialogues.

When the inter-utterance similarity feature is calculated between the immediately preceding user utterance and the last utterance, discontinuous topic transitions can be detected. When it is calculated between the immediately preceding system utterance and the last utterance, fixation on a particular topic can be detected.

The similarity can be calculated using the word cosine distance, the distance between averaged word2vec vectors (Reference Non-Patent Document 2), or the Word Mover's Distance (Reference Non-Patent Document 3). Measures that take word combinations into account (word N-grams in the case of the BLEU score and the ROUGE score), such as the BLEU score and the ROUGE score (Reference Non-Patent Document 4), can also be used. In other words, the similarity is a distance between utterances calculated from distances between words or from co-occurrence relations between words.

word2vec is a method for converting words into semantic vectors. The semantic vector for each word is computed so that words whose co-occurring words in the corpus (here, dialogues) are similar end up close to each other in vector distance. As a result, the distance between averaged word2vec vectors is more robust than the word cosine distance to orthographic variation and small differences, such as ネコ versus 猫 (two spellings of "cat") or ネコ ("cat") versus 子猫 ("kitten").
(Reference Non-Patent Document 2: T. Mikolov, K. Chen, G. Corrado, J. Dean, "Efficient Estimation of Word Representations in Vector Space", arXiv preprint arXiv:1301.3781, 2013.)

The Word Mover's Distance is obtained as follows: for each word w in a sentence S, examine the distance d(w,v) to each word v in another sentence S', compute the smallest such distance d'(w), and take the sum Σd'(w) over all words in S. The Word Mover's Distance can evaluate the similarity of individual words in more detail than the distance between averaged word2vec vectors.
(Reference Non-Patent Document 3: M. J. Kusner, Y. Sun, N. I. Kolkin, K. Q. Weinberger, "From Word Embeddings To Document Distances", Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 957-966, 2015.)
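The sum of nearest-word distances described above can be sketched as follows. This follows the simplified, one-directional wording of this description rather than the full optimal-transport formulation in the cited paper. The function name, the use of Euclidean distance, and the two-dimensional toy vectors are assumptions made for illustration.

```python
import math


def nearest_word_distance_sum(sentence_s, sentence_t, vectors):
    """Sum, over words w in S, of the distance to the closest word v in S'."""
    total = 0.0
    for w in sentence_s:
        # d'(w): smallest Euclidean distance from w to any word of the other sentence.
        total += min(math.dist(vectors[w], vectors[v]) for v in sentence_t)
    return total


# Hypothetical two-dimensional word vectors.
toy_vectors = {"猫": (0.0, 0.0), "子猫": (0.0, 1.0), "犬": (3.0, 5.0)}
distance = nearest_word_distance_sum(["猫", "犬"], ["子猫"], toy_vectors)
```

With these toy vectors the two nearest-word distances are 1 and 5, so the sum is 6. Because every word contributes its own nearest match, a single unrelated word raises the total even when the averaged vectors of the two sentences are close.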

The BLEU score and the ROUGE score, which are used in machine translation and similar fields, calculate the distance between two sentences using the match rate of word N-grams.
(Reference Non-Patent Document 4: Tsutomu Hirao, Hideki Isozaki, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata, "Automatic Evaluation Method for Machine Translation Based on Word Order Correlation", Journal of Natural Language Processing, vol. 21, no. 3, pp. 421-444, 2014.)

Instead of calculating the similarity on the words as they are, the words may first be abstracted into word classes using a dictionary such as 日本語語彙大系 (Nihongo Goi-Taikei; Reference Non-Patent Document 5), and the similarity calculated on those classes.
(Reference Non-Patent Document 5: Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, Yoshihiko Hayashi, "日本語語彙大系" (Nihongo Goi-Taikei), Iwanami Shoten, 1997.)

Furthermore, the words considered when calculating the similarity may be limited to content words.

An example of the inter-utterance similarity feature is described using Dialogue Example 2 below.
(Dialogue Example 2)
1 User: こんにちは/。/旅行/は/好き/です/か/？ (Hello. Do you like traveling?)
2 System: はい/。/先日/京都/に/行き/まし/た/。 (Yes. I went to Kyoto the other day.)
For example, consider the distance between averaged word2vec vectors in Dialogue Example 2, limited to content words (nouns, verbs, adjectives, independent words, and so on). The three content words 「こんにちは」「旅行」「好き」 are obtained from the user utterance, and the four content words 「はい」「先日」「京都」「行き」 are obtained from the system utterance. The obtained words are converted into vectors using word2vec. Suppose, for example, that the following three-dimensional vectors are obtained: 「こんにちは」 is (0.1,0.7,0.2), 「旅行」 is (0.8,0.1,0.1), 「好き」 is (0.3,0.4,0.3), 「はい」 is (0.2,0.6,0.2), 「先日」 is (0.1,0.1,0.8), 「京都」 is (0.6,0.3,0.1), and 「行き」 is (0.7,0.2,0.1). Then the average vector of the user utterance is ((0.1,0.7,0.2)+(0.8,0.1,0.1)+(0.3,0.4,0.3))/3 = (0.4,0.4,0.2), and the average vector of the system utterance is ((0.2,0.6,0.2)+(0.1,0.1,0.8)+(0.6,0.3,0.1)+(0.7,0.2,0.1))/4 = (0.4,0.3,0.3). Calculating, for example, the cosine similarity of these two vectors (≈ 0.97) gives the similarity between user utterance 1 and system utterance 2. A vector (0.97, ...) in which one or more similarities obtained in this way are arranged is the inter-utterance similarity feature.
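The calculation in Dialogue Example 2 can be reproduced with the short sketch below. The word vectors are the illustrative values given in the example, not real word2vec output, and the helper names are hypothetical.

```python
import math

# Illustrative three-dimensional word vectors taken from Dialogue Example 2.
vectors = {
    "こんにちは": (0.1, 0.7, 0.2),
    "旅行": (0.8, 0.1, 0.1),
    "好き": (0.3, 0.4, 0.3),
    "はい": (0.2, 0.6, 0.2),
    "先日": (0.1, 0.1, 0.8),
    "京都": (0.6, 0.3, 0.1),
    "行き": (0.7, 0.2, 0.1),
}


def mean_vector(words):
    """Average the vectors of the given content words, dimension by dimension."""
    columns = zip(*(vectors[w] for w in words))
    return [sum(col) / len(words) for col in columns]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


user_mean = mean_vector(["こんにちは", "旅行", "好き"])
system_mean = mean_vector(["はい", "先日", "京都", "行き"])
similarity = cosine_similarity(user_mean, system_mean)
```

The two averages come out as approximately (0.4, 0.4, 0.2) and (0.4, 0.3, 0.3), and `similarity` rounds to 0.97, in agreement with the example.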

(1-3) Topic repetition count feature
The topic repetition count feature is a feature representing the number of times a topic is repeated in the dialogue. Here, a topic is a word in focus or a predicate-argument structure in focus.

Uttering one particular topic again and again is generally unnatural behavior; it makes the user feel uncomfortable or reduces the user's willingness to continue, and as a result the dialogue often breaks down. A dialogue breakdown can therefore be detected by examining the number of times a topic is repeated.

An example of the topic repetition count feature is described using Dialogue Example 3 below.
(Dialogue Example 3)
1 System: Hello. Watch out for heat stroke.
2 User: Yes. Thank you. You take care too.
3 System: Aren't you going to watch out for heat stroke?
4 User: I am being careful, drinking water frequently and so on.
5 System: You want to watch out for heat stroke, don't you?
In Dialogue Example 3, the system repeatedly utters the word "heat stroke" and the predicate-argument structure "watch out for heat stroke". In this case, the topic repetition count feature for system utterance 5, the last utterance, is calculated as 3.
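As a rough illustration, the count of 3 can be reproduced by counting the utterances that mention the topic. This sketch assumes the topic is already known as a keyword and uses plain substring matching; the description does not specify how the focused word or predicate-argument structure is extracted, so `topic_repetition_count` is a hypothetical helper.

```python
def topic_repetition_count(utterances, topic):
    """Count utterances that mention the topic (case-insensitive substring match)."""
    topic = topic.lower()
    return sum(1 for _speaker, text in utterances if topic in text.lower())

# Dialogue Example 3 as (speaker, text) pairs.
dialogue = [
    ("system", "Hello. Watch out for heat stroke."),
    ("user", "Yes. Thank you. You take care too."),
    ("system", "Aren't you going to watch out for heat stroke?"),
    ("user", "I am being careful, drinking water frequently and so on."),
    ("system", "You want to watch out for heat stroke, don't you?"),
]

count = topic_repetition_count(dialogue, "heat stroke")
print(count)  # 3
```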

[Connection of dialogue acts]
A dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in a dialogue. Here, a dialogue act is the utterance intention of a user or other speaker, such as a question, greeting, self-disclosure, praise, or apology (Reference Non-Patent Document 6). A dialogue act can be represented as a Bag-of-words vector, as described later.
(Reference Non-Patent Document 6: T. Meguro, Y. Minami, R. Higashinaka, K. Dohsaka, "Controlling listening-oriented dialogue using partially observable Markov decision processes", Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics (COLING 10), pp.761-769, 2010.)

The dialogue act features include the dialogue act sequence feature and the predicted dialogue act feature described below.

(2-1) Dialogue act sequence feature
The dialogue act sequence feature is a feature expressed as a vector whose elements are the results of estimating the dialogue act represented by each utterance included in the dialogue (hereinafter, estimated dialogue acts). An estimation result (estimated dialogue act) can be represented as a Bag-of-words vector. Specifically, the Bag-of-words vector for the estimated dialogue act of each utterance can be a 1-of-K vector in which the 1-best dialogue act has the value 1 and all other dialogue acts have the value 0, or a probability distribution (probability distribution vector) representing the estimated likelihood of each dialogue act. The utterances whose dialogue acts are estimated are the M most recent utterances, including the last utterance. M is preferably about 1 to 4.

For example, suppose the dialogue acts to be estimated are the five acts question, greeting, self-disclosure, praise, and apology, and the dialogue act sequence is generated from the four most recent utterances including the last utterance. If the 1-best dialogue acts of the four utterances are "greeting ⇒ greeting ⇒ self-disclosure ⇒ praise", the dialogue act sequence feature, which is a vector of Bag-of-words vectors, is ((0,1,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0)).
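The 1-of-K encoding in this example can be sketched as follows. The dialogue acts are assumed to have been estimated already; `one_of_k` and `dialogue_act_sequence_feature` are illustrative names, and the act order follows the listing in the example.

```python
DIALOGUE_ACTS = ["question", "greeting", "self-disclosure", "praise", "apology"]

def one_of_k(act, acts=DIALOGUE_ACTS):
    """1-of-K vector with 1 at the position of the given act."""
    return tuple(1 if candidate == act else 0 for candidate in acts)

def dialogue_act_sequence_feature(best_acts, m=4):
    """Encode the 1-best acts of the M most recent utterances."""
    return tuple(one_of_k(act) for act in best_acts[-m:])

feature = dialogue_act_sequence_feature(
    ["greeting", "greeting", "self-disclosure", "praise"])
print(feature)
# ((0, 1, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0))
```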

The Bag-of-words vector representing the estimated dialogue act of each utterance is generated using an SVM (Support Vector Machine) with words as features. The SVM must be trained in advance on a dialogue database in which people have annotated each utterance with its dialogue act.

(2-2) Predicted dialogue act feature
The predicted dialogue act feature is a feature consisting of one or more of the following: a prediction result vector representing the result of predicting, from the utterances included in the dialogue, the dialogue act that the last utterance should have (hereinafter, predicted dialogue act); a vector arranging the prediction result vector and an estimation result vector representing the result of estimating the dialogue act that the last utterance represents; the difference vector between the prediction result vector and the estimation result vector; and a truth value indicating whether the 1-best of the prediction result vector and the 1-best of the estimation result vector match. The 1-bests match when the dimension of the largest element is the same in both vectors.

Like the estimation result in (2-1), the prediction result (predicted dialogue act) can be represented as a Bag-of-words vector. Specifically, the Bag-of-words vector for the predicted dialogue act of the last utterance can be a 1-of-K vector in which the 1-best dialogue act has the value 1 and all other dialogue acts have the value 0, or a probability distribution (probability distribution vector) representing the predicted likelihood of each dialogue act.

The Bag-of-words vector representing the predicted dialogue act of the last utterance (the prediction result vector) is generated using an SVM or a POMDP (Partially Observable Markov Decision Process) whose features are words and the dialogue acts of the preceding utterances (Reference Non-Patent Document 6). The SVM or POMDP must be trained in advance on a dialogue database in which people have annotated each utterance with its dialogue act.

For example, suppose the dialogue acts are the five acts question, greeting, self-disclosure, praise, and apology, and "question" is predicted as the dialogue act of the last utterance. The vector arranging the Bag-of-words vector of the prediction result and the Bag-of-words vector of the estimation result for the dialogue act that the last utterance represents is then ((1,0,0,0,0),(0,0,0,1,0)). Their difference vector is (1,0,0,-1,0), and the truth value indicating whether they match is false (0). If, for example, the vector obtained by concatenating these vectors is used as the predicted dialogue act feature, the vector (1,0,0,0,0,0,0,0,1,0,1,0,0,-1,0,0) is obtained as the predicted dialogue act feature.
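The 16-element vector in this example can be reproduced with a short sketch. It assumes the prediction and estimation result vectors are already available; `predicted_act_feature` is an illustrative name, and it builds the variant that concatenates all four components.

```python
def predicted_act_feature(predicted, estimated):
    """Concatenate prediction, estimation, their difference, and a 1-best match flag."""
    difference = [p - e for p, e in zip(predicted, estimated)]
    # The 1-bests match when the largest element sits in the same dimension.
    same_best = predicted.index(max(predicted)) == estimated.index(max(estimated))
    return list(predicted) + list(estimated) + difference + [1 if same_best else 0]

predicted = [1, 0, 0, 0, 0]  # "question" predicted for the last utterance
estimated = [0, 0, 0, 1, 0]  # "praise" estimated from the last utterance
feature = predicted_act_feature(predicted, estimated)
print(feature)
# [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, -1, 0, 0]
```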

(2-3) Character string co-occurrence feature
The character string co-occurrence feature is a feature expressed as a Bag-of-words vector whose elements indicate the occurrence of combinations of character N-grams (where N is an integer of 3 or more) that co-occur between the last utterance in the dialogue and the other utterances.
Because the character string at the end of an utterance often expresses a dialogue act, looking at these co-occurrences captures the co-occurrence relationship between dialogue acts.

An example of the character string co-occurrence feature is described using Dialogue Example 4 below.
(Dialogue Example 4)
1 User: どこから来たんですか？ (Where did you come from?)
2 System: フォレストアドベンチャーと竹田城跡なら、どちらに関心がありますか？ (Which are you interested in, Forest Adventure or Takeda Castle Ruins?)
For example, extracting character N-grams with N=3 yields "どこか", "こから", "から来", ..., "すか？" from the user utterance and "フォレ", "ォレス", ..., "すか？" from the system utterance. Taking co-occurrences with particular attention to the utterance endings gives the combination "すか？-すか？".

In this way, all combinations appearing in the training data used for learning are enumerated, and a vector is constructed in which each combination corresponds to one dimension. The element for a dimension is 1 if the corresponding combination appears and 0 if it does not. For example, if the vector constructed above has 5 dimensions and only the combination corresponding to the second dimension appears, the resulting character string co-occurrence feature is (0,1,0,0,0).
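The extraction and vectorization steps can be sketched as below. The function names are illustrative, and the five-entry `vocabulary` is a hypothetical stand-in for the list of combinations that would be enumerated from real training data.

```python
def char_ngrams(text, n=3):
    """All character n-grams of text, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def cooccurrence_pairs(other_utterance, last_utterance, n=3):
    """Every (n-gram of the other utterance, n-gram of the last utterance) pair."""
    return {(a, b)
            for a in char_ngrams(other_utterance, n)
            for b in char_ngrams(last_utterance, n)}

def cooccurrence_feature(pairs, vocabulary):
    """Binary vector: 1 where the vocabulary combination occurs, else 0."""
    return [1 if combination in pairs else 0 for combination in vocabulary]

user = "どこから来たんですか？"
system = "フォレストアドベンチャーと竹田城跡なら、どちらに関心がありますか？"

# Hypothetical combinations; only the second occurs in Dialogue Example 4.
vocabulary = [("ですね", "ですね"), ("すか？", "すか？"), ("ました", "ました"),
              ("ですよ", "すか？"), ("すか？", "です。")]

pairs = cooccurrence_pairs(user, system)
feature = cooccurrence_feature(pairs, vocabulary)
print(feature)  # [0, 1, 0, 0, 0]
```

This sketch pairs all n-grams; restricting the pairs to utterance-final n-grams, as the example emphasizes, would be a straightforward variation.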

[Logical connection]
(3-1) Question type feature
The question type feature is a feature representing the estimated question type when the utterance immediately before the last utterance in the dialogue is a question. Examples of question types include personality questions, which ask about the speaker's specific preferences and experiences; factoid questions, which ask about specific things; and questions asking the 5W1H of an event (such as a news item). A question type may also be defined in a form tied to a topic, such as "restaurant location".

The question type is estimated using an SVM with words as features. The SVM must be trained in advance on a database in which people have classified questions by type.

Some dialogue systems cannot respond to (or are poor at responding to) some of the above question types, so using the question type feature makes it possible to estimate the dialogue destruction force in a way that reflects such system characteristics.

For example, a dialogue system that provides weather information can answer questions about the weather at a particular place but often cannot answer questions about tourist information for that place or about the system's own personality. The question types are therefore defined as the two types "questions about the weather" and "other questions", the type of the user's question is estimated, and the result is vectorized with a 1-of-K representation such as (1,0) to obtain the question type feature.

(3-2) Question class feature
The question class feature is used when the utterance immediately before the last utterance in the dialogue is a question. It is a feature consisting of one or more of the following: a vector representing the word class that the question is estimated to require in its answer; a vector representing the word classes included in the answer (the last utterance); the difference vector of these two vectors; and a truth value indicating whether the word classes represented by the two vectors match. The vector representing the estimated word class and the vector representing the word classes included in the answer can each be expressed as a probability distribution vector or a 1-of-K vector.

By using ENE (Extended Named Entity) extraction to check whether the estimated word class (that is, the word class represented by the question class feature) is included in the last utterance, the correspondence between a question and its answer can be examined.

An example of the question class feature is described using Dialogue Example 5 below.
(Dialogue Example 5)
1 User: どこから来たんですか？ (Where did you come from?)
2 System: 京都から来ました (I came from Kyoto)
For example, suppose that in Dialogue Example 5 the word class that the user utterance requires in its answer is estimated to be "place", and the system utterance is estimated to include the word class "place". If the set of word classes is unique object, place, and quantity, the 1-of-K vector obtained from the user utterance (that is, the vector representing the word class that the question is estimated to require in its answer) is (0,1,0), and the 1-of-K vector obtained from the system utterance (that is, the vector representing the word classes included in the answer, the last utterance) is (0,1,0). If, for example, the vector obtained by concatenating these vectors is used as the question class feature, the vector (0,1,0,0,1,0) is obtained as the question class feature.

A technique called question classification is often used to estimate the word class from the user utterance, and a technique called ENE extraction is often used to estimate the word classes from the system utterance.

[Appropriateness of the utterance itself]
(4-1) Perplexity feature
The perplexity feature is a feature representing the perplexity calculated with a language model for each utterance included in the dialogue. Perplexity expresses how natural a sequence of words is, and can be used to detect grammatically unnatural utterances. Known language models include those based on word N-grams or character N-grams (N is often about 1 to 7) and those based on a Recurrent Neural Network. Any language model may be used as long as perplexity can be calculated with it. The perplexity feature may use the perplexity value itself directly, or may be a 1-of-K vector obtained by quantizing the value into a suitable number of bins.

To emphasize the fluency of a sentence independently of the occurrence probabilities of the words themselves, a value obtained by normalizing the perplexity by the occurrence probability of each word (the perplexity divided by the word occurrence probabilities; hereinafter, normalized perplexity) may be used in place of the perplexity calculated as above.

For example, the system utterance "どこのご出身ですか？" (Where are you from?) and the system utterance "の出身ですどこごか？" (the same words in scrambled order) use identical words, but "どこのご出身ですか？" is the more fluent expression and is expected to have the lower perplexity.

(4-2) Word feature
The word feature is a feature expressed as a Bag-of-words vector arranging the word N-grams (N is about 1 to 5) of each utterance included in the dialogue.
Using the word feature makes it possible to capture error patterns that a particular dialogue system tends to output.

The words used for the word feature may be limited to the top N, ranked by the number of occurrences in the dialogue or by TF-IDF value. The words considered may also be limited to content words, excluding words unrelated to the topic such as particles. Alternatively, the words considered may be limited to predicates, which have fewer types than nouns. Doing so makes the estimation robust to the diversity of nouns.

An example of the word feature is described using Dialogue Example 6 below.
(Dialogue Example 6)
1 User: こんにちは/。/旅行/は/好き/です/か/？ (Hello. Do you like travel?)
2 System: はい/。/先日/京都/に/行き/まし/た/。 (Yes. I went to Kyoto the other day.)
For example, suppose that in Dialogue Example 6 the user utterance yields the eight words "こんにちは", "。", "旅行", "は", "好き", "です", "か", "？" and the system utterance yields the nine words "はい", "。", "先日", "京都", "に", "行き", "まし", "た", "。". With the Bag-of-words vector of word N-grams (N=1), the word feature obtained from the user utterance is a vector in which the elements of the dimensions corresponding to "こんにちは", "。", "旅行", "は", "好き", "です", "か", "？" are 1 and all other elements are 0. Likewise, the word feature obtained from the system utterance is a vector in which the elements of the dimensions corresponding to "はい", "。", "先日", "京都", "に", "行き", "まし", "た", "。" are 1 and all other elements are 0.
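A minimal sketch of the unigram Bag-of-words vectors for Dialogue Example 6 follows. The vocabulary here is built from the two utterances alone, which is an assumption made to keep the example small; in practice it would come from the training data. `bag_of_words` is an illustrative name.

```python
def bag_of_words(tokens, vocabulary):
    """Binary vector: 1 for each vocabulary word present in tokens, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

# Pre-segmented utterances from Dialogue Example 6.
user_tokens = "こんにちは/。/旅行/は/好き/です/か/？".split("/")
system_tokens = "はい/。/先日/京都/に/行き/まし/た/。".split("/")

# Toy vocabulary: every distinct word in the two utterances.
vocabulary = sorted(set(user_tokens) | set(system_tokens))

user_feature = bag_of_words(user_tokens, vocabulary)
system_feature = bag_of_words(system_tokens, vocabulary)

print(len(vocabulary))      # 15 distinct words ("。" is shared)
print(sum(user_feature))    # 8
print(sum(system_feature))  # 8, because "。" occurs twice among the 9 tokens
```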

(4-3) Word class feature
The word class feature is a feature expressed as a Bag-of-classes vector arranging the word classes corresponding to the words of each utterance included in the dialogue. A word class represents the rough meaning of a word.

Methods for constructing word classes include dictionary-based methods that use the class information assigned in dictionaries such as WordNet or the Nihongo Goi-Taikei (a Japanese lexicon), and methods that cluster Word2vec vectors with K-means to generate sets of words.

The word classes used for the word class feature may be limited to the top N, ranked by the number of occurrences in the dialogue or by TF-IDF value.

An example of the word class feature is described using Dialogue Example 7 below.
(Dialogue Example 7)
1 User: どこから来たんですか？ (Where did you come from?)
2 System: 京都から来ました (I came from Kyoto)
For example, consider the case in Dialogue Example 7 where the word classes are limited to person name, place, and amount of money. The user utterance contains no word that can be converted to a word class, so (0,0,0) is obtained as its word class feature. The system utterance, on the other hand, is estimated to include the word class "place", so (0,1,0) is obtained as its word class feature.
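The two vectors in this example can be sketched with a dictionary lookup. The one-entry `CLASS_LEXICON` and the word segmentation of both utterances are assumptions for illustration; a real system would use a dictionary such as those mentioned above, or clustered word vectors.

```python
WORD_CLASSES = ["person name", "place", "amount of money"]

# Hypothetical lexicon mapping words to word classes.
CLASS_LEXICON = {"京都": "place"}

def word_class_feature(tokens):
    """Bag-of-classes vector: 1 for each word class found among the tokens."""
    found = {CLASS_LEXICON[token] for token in tokens if token in CLASS_LEXICON}
    return tuple(1 if word_class in found else 0 for word_class in WORD_CLASSES)

# Assumed segmentation of Dialogue Example 7.
user_tokens = ["どこ", "から", "来", "た", "ん", "です", "か", "？"]
system_tokens = ["京都", "から", "来", "まし", "た"]

user_feature = word_class_feature(user_tokens)
system_feature = word_class_feature(system_tokens)
print(user_feature)    # (0, 0, 0)
print(system_feature)  # (0, 1, 0)
```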

(4-4) Word vector feature
The word vector feature is a feature expressed as a vector generated from the vectors representing the word N-grams (N is about 1 to 5) of each utterance included in the dialogue. It can be generated, for example, by a weighted average or by element-wise multiplication.

When a weighted average is used, the word vector feature is the vector formed as the weighted average of the vectors representing the word N-grams of each utterance included in the dialogue.

The vectors representing the word N-grams may be extracted with, for example, Word2vec. The weights of the weighted average may be TF-IDF values, or may all be equal.

The vectors representing word N-grams that are used to calculate the weighted average may also be cut off by TF-IDF value so that only the top M are used.

When element-wise multiplication is used, the word vector feature is the vector formed by multiplying together, element by element, the vectors representing the word N-grams of each utterance included in the dialogue.

An example of the word vector feature is described using Dialogue Example 8 below.
(Dialogue Example 8)
1 User: こんにちは/。/旅行/は/好き/です/か/？ (Hello. Do you like travel?)
2 System: はい/。/先日/京都/に/行き/まし/た/。 (Yes. I went to Kyoto the other day.)
For example, in Dialogue Example 8, consider using as the word vector feature the word2vec average vector computed over content words only (nouns, verbs, adjectives, independent words, and so on). The user utterance yields three content words, "こんにちは" (hello), "旅行" (travel), and "好き" (like), and the system utterance yields four content words, "はい" (yes), "先日" (the other day), "京都" (Kyoto), and "行き" (go). Each word is converted to a vector using word2vec. Suppose that, as three-dimensional vectors, "こんにちは" is (0.1,0.7,0.2), "旅行" is (0.8,0.1,0.1), "好き" is (0.3,0.4,0.3), "はい" is (0.2,0.6,0.2), "先日" is (0.1,0.1,0.8), "京都" is (0.6,0.3,0.1), and "行き" is (0.7,0.2,0.1). Then the average vector (word vector feature) of the user utterance is ((0.1,0.7,0.2)+(0.8,0.1,0.1)+(0.3,0.4,0.3))/3 = (0.4,0.4,0.2), and the average vector (word vector feature) of the system utterance is ((0.2,0.6,0.2)+(0.1,0.1,0.8)+(0.6,0.3,0.1)+(0.7,0.2,0.1))/4 = (0.4,0.3,0.3).
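Both ways of combining word vectors mentioned for this feature, the weighted average and the element-wise product, can be sketched as follows. `weighted_average` and `elementwise_product` are illustrative names, the vectors are the made-up values from the example, and equal weights are assumed because the example uses a plain average.

```python
import math

def weighted_average(vectors, weights=None):
    """Weighted element-wise average; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(dims)]

def elementwise_product(vectors):
    """Multiply the vectors together dimension by dimension."""
    return [math.prod(column) for column in zip(*vectors)]

# Content-word vectors of the user utterance in Dialogue Example 8.
user_vectors = [(0.1, 0.7, 0.2), (0.8, 0.1, 0.1), (0.3, 0.4, 0.3)]

average = weighted_average(user_vectors)
product = elementwise_product(user_vectors)
print([round(x, 3) for x in average])  # [0.4, 0.4, 0.2]
print([round(x, 3) for x in product])  # [0.024, 0.028, 0.006]
```

Passing TF-IDF values as `weights` would give the TF-IDF-weighted variant described above.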

(4-5) Sentence length feature
The sentence length feature is a feature representing the word length and the character length of the last utterance in the dialogue.
With current dialogue systems it is difficult to estimate, without error, the consistency between the content of the user utterance and the content of the system utterance. As a result, the longer the system utterance, the more likely it is to contain an irrelevant part. The sentence length feature can be used to reflect this in the estimation of the dialogue destruction force.
For example, if the last utterance in the dialogue is "買い物は一緒が楽しいですね" (Shopping is more fun together, isn't it), the word length is 7 and the character length is 13, so the sentence length feature is (7,13).
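The values (7,13) can be checked with a small sketch. The word segmentation below is an assumption consistent with the stated word length of 7; in practice it would come from a morphological analyzer, which is not shown here.

```python
def sentence_length_feature(tokens):
    """(word length, character length) of a pre-segmented utterance."""
    return (len(tokens), len("".join(tokens)))

# Assumed segmentation of the last utterance.
tokens = ["買い物", "は", "一緒", "が", "楽しい", "です", "ね"]
feature = sentence_length_feature(tokens)
print(feature)  # (7, 13)
```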

(4-6) Turn count feature
The turn count feature is a feature representing the number of turns that have elapsed since the start of the dialogue.
It captures the tendency that, although every dialogue system generates relatively appropriate utterances at the beginning of a dialogue, the proportion of inappropriate utterances increases as the dialogue proceeds.
For example, for the dialogue shown as the input in FIG. 5, the number of turns is 5, so the turn count feature is (5).

[Within an assumed scenario]
(5-1) Frequent word string feature
The frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue with at least a predetermined frequency T. Here, N is preferably about 4 to 7, and T is preferably 10 or more.

This feature takes into account the case where a dialogue system is built with system utterances that steer the dialogue into a scenario assumed in advance, such as a choice between uttering sentence A and sentence B. Immediately after steering into a scenario, the system can respond in a way that is relatively detached from the preceding context, so an appropriate response is easy to generate. The evaluation tendency immediately after steering into a scenario is also expected to differ from that of other parts of the dialogue. The frequent word string feature is therefore used as a feature for estimating whether the dialogue is at a point immediately after steering into a scenario.

For example, if the word 2-gram "買い物は" ("shopping" followed by the topic particle) appears in the dialogue with at least the predetermined frequency of 3, the frequent word string feature includes "買い物は" as an element and is expressed as a vector such as ("買い物は").
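Collecting the word N-grams that reach the frequency threshold can be sketched as below. The three pre-segmented utterances are hypothetical, constructed so that the 2-gram "買い物は" occurs three times; `frequent_word_ngrams` is an illustrative name.

```python
from collections import Counter

def frequent_word_ngrams(tokenized_utterances, n, threshold):
    """Strings of word n-grams occurring at least `threshold` times in the dialogue."""
    counts = Counter()
    for tokens in tokenized_utterances:
        for i in range(len(tokens) - n + 1):
            counts["".join(tokens[i:i + n])] += 1
    return [ngram for ngram, count in counts.items() if count >= threshold]

# Hypothetical dialogue, already split into words.
dialogue = [
    ["買い物", "は", "好き", "です", "か"],
    ["買い物", "は", "一緒", "が", "楽しい", "です", "ね"],
    ["買い物", "は", "どこ", "で", "し", "ます", "か"],
]

feature = frequent_word_ngrams(dialogue, n=2, threshold=3)
print(feature)  # ['買い物は']
```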

<First embodiment>
[Dialogue destruction model learning device 100]
The dialogue destruction model learning device 100 is described below with reference to FIGS. 1 and 2. As shown in FIG. 1, the dialogue destruction model learning device 100 includes a dialogue destruction feature extraction unit 110, a model generation unit 120, and a recording unit 190. The recording unit 190 is a component that records, as appropriate, the information needed for the processing of the dialogue destruction model learning device 100. For example, it records the dialogue destruction model being learned (the dialogue destruction model parameters).

The dialogue destruction feature extraction unit 110 includes a first-type feature calculation unit 110_1, ..., and a J-th-type feature calculation unit 110_J (where J is an integer of 1 or more). The j-th-type feature calculation unit 110_j (1 ≤ j ≤ J) calculates the j-th-type feature from a dialogue. The j-th-type feature is any one of the features described in <Feature amount>.
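One way to picture the J feature calculation units is as a list of functions applied to the same dialogue. In this sketch, joining their outputs by concatenation is an assumption, and `make_extractor`, the two calculators, and the five-utterance dialogue are all hypothetical.

```python
def turn_count(dialogue):
    """Number of turns since the start of the dialogue."""
    return [len(dialogue)]

def last_utterance_char_length(dialogue):
    """Character length of the last utterance."""
    return [len(dialogue[-1])]

def make_extractor(calculators):
    """Build an extractor that concatenates the outputs of J calculators."""
    def extract(dialogue):
        feature = []
        for calculate in calculators:
            feature.extend(calculate(dialogue))
        return feature
    return extract

# J = 2 in this toy configuration.
extract = make_extractor([turn_count, last_utterance_char_length])

dialogue = ["Hello.", "Hi.", "Do you like travel?", "Yes.", "I went to Kyoto."]
print(extract(dialogue))  # [5, 16]
```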

Before learning starts, multiple items of training data are prepared, each being a pair of a dialogue consisting of a series of utterances and correct answer data indicating whether that dialogue has broken down. The correct answer data may be a label indicating one of "broken down", "not broken down", or "neither", or may be a probability distribution over the random variables "broken down", "not broken down", and "other". The correct answer data may also be a real value indicating the degree of breakdown. The dialogue breakdown data in Reference Non-Patent Document 1, to which labels or probability distributions indicating dialogue breakdown have been assigned manually in advance, can be used as the training data.

The dialogue destruction model learning device 100 learns a dialogue destruction model from the training data, that is, from pairs of a dialogue and correct answer data. The dialogue destruction model is used to estimate the dialogue destruction force, which indicates the degree to which a dialogue has broken down.

The operation of the dialogue destruction model learning device 100 is described with reference to FIG. 2. The dialogue destruction feature extraction unit 110 extracts, from the input dialogue, a dialogue destruction feature that characterizes whether the dialogue has broken down because of the last utterance (S110). The dialogue destruction feature is a combination of the first-type feature, ..., and the J-th-type feature. Examples of dialogue destruction features formed as combinations of features include the following.

1) A combination that includes one or more of the frequent word string feature, the inter-utterance similarity feature, the sentence length feature, the turn number feature, the word vector feature, the question type feature, the question class feature, and the topic repetition count feature.
Here, the inter-utterance similarity feature has, as the elements of its vector, similarities other than the word cosine distance.

2) A combination that includes the dialogue act feature and the inter-utterance similarity feature, and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
This combination omits at least one of the four features: word feature, word class feature, word combination feature, and frequent word string feature. For example, a combination that includes the word feature, the word class feature, and the word combination feature in addition to the dialogue act feature and the inter-utterance similarity feature is appropriate as a dialogue destruction feature, whereas a combination that includes all four of the word feature, the word class feature, the word combination feature, and the frequent word string feature in addition to those two is not. Using this combination therefore makes it possible to keep the number of features down while effectively estimating breakdowns caused by unnatural transitions of dialogue acts and breakdowns caused by abrupt topic transitions.

3) A combination that includes the dialogue act feature, the inter-utterance similarity feature, and the sentence length feature, and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
Using this combination makes it possible to keep the number of features down while effectively estimating breakdowns caused by unnatural transitions of dialogue acts, breakdowns caused by abrupt topic transitions, and breakdowns caused by unrelated topics creeping into overly long utterances. In other words, this combination has more desirable properties than 2).

4) A combination that includes the dialogue act feature, the inter-utterance similarity feature, the sentence length feature, and the turn number feature, and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
Using this combination makes it possible to reflect the fact that breakdowns are less likely at the start of a dialogue and to keep the number of features down, while effectively estimating breakdowns caused by unnatural transitions of dialogue acts, breakdowns caused by abrupt topic transitions, and breakdowns caused by unrelated topics creeping into overly long utterances. In other words, this combination has more desirable properties than 3).

5) A combination that includes the dialogue act feature, the inter-utterance similarity feature, the sentence length feature, and the turn number feature, and that includes none of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
This combination is a more restricted form of combination 4).

6) A combination that includes the dialogue act feature, the inter-utterance similarity feature, the sentence length feature, the turn number feature, and the character string co-occurrence feature, and that includes none of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
This combination is a more restricted form of combination 5).

7) A combination that includes the dialogue act feature, the inter-utterance similarity feature, the sentence length feature, the turn number feature, and the perplexity feature, and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.
Using this combination makes it possible to reflect both the fact that breakdowns are less likely at the start of a dialogue and the syntactic naturalness of the utterance itself, and to keep the number of features down, while effectively estimating breakdowns caused by unnatural transitions of dialogue acts, breakdowns caused by abrupt topic transitions, and breakdowns caused by unrelated topics creeping into overly long utterances. In other words, this combination has more desirable properties than 4).

8) A combination that includes the inter-utterance similarity feature, the frequent word string feature, the dialogue act feature, the sentence length feature, and the word combination feature.
This combination uses the frequent word string feature to capture behavior specific to dialogue systems that operate from a scenario, and the word combination feature to capture efficient topic transitions.

9) A combination that includes the dialogue act feature and the inter-utterance similarity feature, and that further includes one or more of the word feature and the word class feature.
Because it includes at least one of the word feature and the word class feature, this combination can achieve higher performance when a sufficient amount of data is available.
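The inclusion and exclusion conditions of combinations 2) to 7) can be checked mechanically. The following is a minimal sketch, assuming each feature type is identified by a hypothetical English string name; the names and the helper `is_valid_combination` are illustrative and do not appear in the original description.

```python
from typing import AbstractSet

# Hypothetical identifiers for the four lexical features named in the text.
LEXICAL_FEATURES = frozenset(
    {"word", "word_class", "word_combination", "frequent_word_string"}
)

def is_valid_combination(
    features: AbstractSet[str],
    required: AbstractSet[str],
    max_lexical: int,
) -> bool:
    """Return True if every required feature is present and at most
    `max_lexical` of the four lexical features are used."""
    return required <= features and len(LEXICAL_FEATURES & features) <= max_lexical

# Combination 2): at least one lexical feature must be absent (at most 3 used).
REQUIRED_2 = frozenset({"dialogue_act", "utterance_similarity"})
# Combination 5): none of the lexical features may be used.
REQUIRED_5 = REQUIRED_2 | {"sentence_length", "turn_number"}

three_lexical = set(REQUIRED_2) | {"word", "word_class", "word_combination"}
all_lexical = set(REQUIRED_2) | set(LEXICAL_FEATURES)
```

With these definitions, `is_valid_combination(three_lexical, REQUIRED_2, 3)` holds while `is_valid_combination(all_lexical, REQUIRED_2, 3)` does not, which mirrors the example given for combination 2).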

The model generation unit 120 generates a dialogue destruction model using the dialogue destruction feature extracted in S110 and the input correct-answer data (S120). Any algorithm may be used as the learning algorithm for the dialogue destruction model. For example, a deep neural network (DNN), an SVM, or ExtraTreesClassifier can be used.

Dialogue destruction force estimation devices built on the dialogue destruction models obtained with these learning algorithms have the following characteristics. A DNN is strong in that it can estimate while automatically taking combinations of features into account (in fact, it has shown good results experimentally), but its behavior is unstable when the amount of training data is small. An SVM falls short of a DNN in peak estimation accuracy, but it can learn efficiently even from a small amount of training data. ExtraTreesClassifier can take combinations of features into account even when the amount of training data is small, but as the number of features grows, model training takes longer and estimation accuracy also drops.

When the correct-answer data is given as a probability distribution (that is, when the dialogue destruction force estimation device estimates dialogue breakdown as a probability distribution), a Support Vector Regressor should be used instead of the SVM, and ExtraTreesRegressor instead of ExtraTreesClassifier.

The generated dialogue destruction model is fed back and used for learning with the next item of training data.

The model generation unit 120 is the component that executes the computation based on the learning algorithm. Accordingly, before learning starts, the dialogue destruction model learning device 100 sets the initial values of the dialogue destruction model recorded in the recording unit 190 in the model generation unit 120. During learning, each time the model generation unit 120 generates a dialogue destruction model, the dialogue destruction model learning device 100 sets the generated model in the model generation unit 120.

The dialogue destruction model learning device 100 repeats the processing of S110 to S120 once for each item of training data and outputs the finally generated dialogue destruction model as the learning result.
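The repetition of S110 and S120 with the model fed back at each step can be sketched as a simple loop. This is only an illustration under stated assumptions: the description allows any learning algorithm, so a single online logistic-regression step is swapped in here as a stand-in, the correct-answer data is assumed to be a real value in [0, 1], and `toy_extract`, `logistic_step`, and `train` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias); a hypothetical model format

def toy_extract(dialogue: Sequence[str]) -> List[float]:
    """Stand-in for S110: word count and character count of the last
    utterance plus the number of utterances, with arbitrary scaling."""
    last = dialogue[-1]
    return [len(last.split()) / 10.0, len(last) / 100.0, len(dialogue) / 10.0]

def logistic_step(model: Model, x: List[float], y: float, lr: float = 0.1) -> Model:
    """Stand-in for S120: one gradient step of logistic regression."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = 1.0 / (1.0 + math.exp(-z)) - y
    return [w - lr * error * xi for w, xi in zip(weights, x)], bias - lr * error

def train(
    training_data: Sequence[Tuple[Sequence[str], float]],
    initial_model: Model,
    extract: Callable[[Sequence[str]], List[float]] = toy_extract,
    update: Callable[[Model, List[float], float], Model] = logistic_step,
) -> Model:
    model = initial_model  # initial value set before learning starts
    for dialogue, target in training_data:
        features = extract(dialogue)              # S110
        model = update(model, features, target)   # S120, result fed back
    return model  # finally generated model is the learning result
```

For instance, `train([(["hello", "a b c"], 1.0)], ([0.0, 0.0, 0.0], 0.0))` performs one S110 to S120 pass and returns the updated weights and bias.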

The dialogue destruction feature extraction unit 110, which extracts the dialogue destruction feature, can also be treated as an independent device (hereinafter, the dialogue destruction feature extraction device) rather than as part of the dialogue destruction model learning device 100. In this case, the dialogue destruction feature extraction device includes the dialogue destruction feature extraction unit 110 and the recording unit 190. It takes a dialogue as input and extracts and outputs a dialogue destruction feature, which is a combination of features that characterize whether or not the dialogue has broken down.
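A stand-alone extraction device of this kind can be pictured as a set of per-type feature calculators whose outputs are concatenated into one vector. The sketch below is an assumption-laden illustration: it implements only the sentence length and turn number features, counts words by whitespace splitting (real Japanese text would need a morphological analyzer), treats each utterance as one turn, and uses hypothetical function names throughout.

```python
from typing import Callable, List, Sequence

Dialogue = Sequence[str]
FeatureCalculator = Callable[[Dialogue], List[float]]

def sentence_length_feature(dialogue: Dialogue) -> List[float]:
    """Word length and character length of the last utterance."""
    last = dialogue[-1]
    return [float(len(last.split())), float(len(last))]

def turn_number_feature(dialogue: Dialogue) -> List[float]:
    """Number of turns so far, assuming one utterance per turn."""
    return [float(len(dialogue))]

def extract_dialogue_destruction_feature(
    dialogue: Dialogue, calculators: Sequence[FeatureCalculator]
) -> List[float]:
    """Run each type-j calculator and concatenate the results in order."""
    vector: List[float] = []
    for calculate in calculators:
        vector.extend(calculate(dialogue))
    return vector

example = extract_dialogue_destruction_feature(
    ["how are you", "fine thanks"],
    [sentence_length_feature, turn_number_feature],
)
```

Keeping the calculators as a list makes it straightforward to switch between the feature combinations listed earlier without changing the extraction routine.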

[Dialogue destruction force estimation device 200]
The dialogue destruction force estimation device 200 is described below with reference to FIGS. 3 and 4. As shown in FIG. 3, the dialogue destruction force estimation device 200 includes the dialogue destruction feature extraction unit 110 and a dialogue destruction force calculation unit 220.

The dialogue destruction force estimation device 200 is also connected to a learning result recording unit 290. The learning result recording unit 290 stores the dialogue destruction model learned by the dialogue destruction model learning device 100. The learning result recording unit 290 may instead be a component included in the dialogue destruction force estimation device 200.

The dialogue destruction force estimation device 200 estimates, from a dialogue that is a series of utterances, the dialogue destruction force, which is the degree to which the last utterance of the dialogue causes the dialogue to break down. FIG. 5 shows an example of the input and output of the dialogue destruction force estimation device 200. The input is a series of utterances (from the system utterance "Shopping is easier alone" to the system utterance "Shopping is fun together, isn't it"). The system utterance "Shopping is fun together, isn't it" is the last utterance and is the target of the dialogue destruction force estimation. The dialogue destruction force is estimated as a probability distribution over not broken down (○), broken down (×), and neither (△). Here, because × has the largest probability value (that is, the 1-best is ×), the dialogue is considered to have broken down at this last utterance.
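Reading the 1-best out of such a probability distribution amounts to taking the outcome with the largest probability. A minimal sketch follows; the probability values are made up for illustration and are not the values shown in FIG. 5, and `one_best` is a hypothetical helper name.

```python
from typing import Dict

def one_best(distribution: Dict[str, float]) -> str:
    """Return the outcome with the highest probability."""
    return max(distribution, key=distribution.get)

# Illustrative values only: ○ = not broken down, × = broken down, △ = neither.
estimated = {"○": 0.2, "×": 0.5, "△": 0.3}
verdict = one_best(estimated)
```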

Before estimation starts, the dialogue destruction force estimation device 200 sets the dialogue destruction model recorded in the learning result recording unit 290 in the dialogue destruction force calculation unit 220.

The operation of the dialogue destruction force estimation device 200 is described with reference to FIG. 4. The dialogue destruction feature extraction unit 110 extracts, from the input dialogue, a dialogue destruction feature that characterizes whether or not the last utterance has caused the dialogue to break down (S110). The last utterance may be either a user utterance or a system utterance.

The dialogue destruction force calculation unit 220 calculates, from the dialogue destruction feature extracted in S110, the dialogue destruction force, which is the degree to which the last utterance causes the dialogue to break down (S220). In doing so, it uses the dialogue destruction model learned by the dialogue destruction model learning device 100. The dialogue destruction force obtained as the estimation result is calculated as a label, a probability distribution, or a real value, depending on the dialogue destruction model that was generated.

According to the invention of this embodiment, the dialogue destruction feature can be calculated as a combination of features that takes into account the various factors that cause a dialogue to break down. Furthermore, by forming the dialogue destruction feature with the characteristics of each constituent feature in mind, a dialogue destruction model can be learned from a small amount of training data.

By estimating the dialogue destruction force of system utterances, the output of utterances with a high dialogue destruction force can be suppressed. As a result, the dialogue becomes easier to continue.
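One way to picture this suppression is to score every candidate system utterance before output and discard those above a threshold. The sketch below assumes the dialogue destruction force is available as a real value through a hypothetical callable `estimate_force`; the threshold of 0.5 and the function name `select_utterance` are likewise illustrative choices, not part of the original description.

```python
from typing import Callable, List, Optional, Sequence

def select_utterance(
    context: List[str],
    candidates: Sequence[str],
    estimate_force: Callable[[List[str]], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the candidate with the lowest estimated force, or None when
    every candidate exceeds the threshold."""
    scored = [(estimate_force(context + [c]), c) for c in candidates]
    acceptable = [(force, c) for force, c in scored if force <= threshold]
    if not acceptable:
        return None
    return min(acceptable, key=lambda pair: pair[0])[1]

# Toy estimator for demonstration only: repeating an earlier utterance
# verbatim is treated as highly destructive.
def toy_force(dialogue: List[str]) -> float:
    return 0.9 if dialogue[-1] in dialogue[:-1] else 0.1

chosen = select_utterance(
    ["Shopping is easier alone"],
    ["Shopping is easier alone", "What do you usually buy?"],
    toy_force,
)
```

Returning `None` when no candidate passes leaves the fallback behavior (for example, generating new candidates) to the caller.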

In addition, by estimating the dialogue destruction force for dialogues actually held between a person and the system, the places where the dialogue is likely to have broken down can be found and used to improve the system.

Furthermore, in spoken dialogue it is difficult to detect speech recognition errors from the acoustic features of the user's utterance alone, so the system may generate an utterance from an utterance that contains a recognition error. In such cases, by estimating the naturalness of the conversational connection between the speech recognition result of the user utterance and the immediately preceding system utterance as the dialogue destruction force of the user utterance, recognition errors can be detected and speech recognition candidates can be reranked, which enables smooth spoken dialogue.
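Reranking recognition candidates in this way can be sketched as sorting the hypotheses by the dialogue destruction force each would have as a reply to the preceding system utterance. In the sketch below, `estimate_force` is a hypothetical callable standing in for a trained model, and the word-overlap estimator `toy_force` is a deliberately crude substitute used only so the example runs; it is not the estimation method of the description.

```python
from typing import Callable, List, Sequence

def rerank(
    system_utterance: str,
    hypotheses: Sequence[str],
    estimate_force: Callable[[List[str]], float],
) -> List[str]:
    """Order recognition hypotheses from least to most destructive."""
    return sorted(hypotheses, key=lambda h: estimate_force([system_utterance, h]))

def toy_force(dialogue: List[str]) -> float:
    """Toy stand-in: one minus the Jaccard word overlap of the last two utterances."""
    previous, last = (set(u.lower().split()) for u in dialogue[-2:])
    union = previous | last
    return 1.0 - len(previous & last) / len(union) if union else 1.0

ranked = rerank("Do you like cats", ["I like hats", "I like cats"], toy_force)
```

A hypothesis whose estimated force stays high even after reranking could additionally be flagged as a likely recognition error.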

<Modification>
Needless to say, the present invention is not limited to the embodiment described above and may be modified as appropriate without departing from its spirit. The various processes described in the above embodiment need not be executed only in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed.

<Supplementary note>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating outside the hardware entity can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, and the like), RAM and ROM as memory, an external storage device such as a hard disk, and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is one example of a physical entity equipped with such hardware resources.

The external storage device of the hardware entity stores the program needed to realize the functions described above and the data needed for the processing of that program (the program is not limited to the external storage device and may, for example, be stored in a ROM, which is a read-only storage device). Data obtained through the processing of these programs is stored in the RAM, the external storage device, or the like, as appropriate.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and the data needed for the processing of each program are read into memory as needed and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiment described above and may be modified as appropriate without departing from its spirit. The processes described in the above embodiment need not be executed only in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed.

As already stated, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiment are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on a computer, the processing functions of the hardware entity are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in the storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program successively each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and result retrieval. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be realized in hardware.

Claims

J is an integer of 1 or more, and the type-j feature (1 ≤ j ≤ J) is a feature that characterizes whether or not a dialogue has broken down.
A dialogue destruction feature extraction device comprising a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of the type-j features (1 ≤ j ≤ J), wherein
the dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a type-j feature calculation unit that calculates the type-j feature from the dialogue,
the dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in the dialogue,
the inter-utterance similarity feature is a feature indicating how similar the last utterance in the dialogue is to the other utterances,
the word feature is a feature expressed as a Bag-of-words vector in which the word N-grams of each utterance included in the dialogue are arranged,
the word class feature is a feature expressed as a Bag-of-classes vector in which the word classes corresponding to the words of each utterance included in the dialogue are arranged,
the word combination feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance and the other utterances in the dialogue or within the last utterance,
the frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue at or above a predetermined frequency,
the sentence length feature is a feature representing the word length and character length of the last utterance in the dialogue, and
the dialogue destruction feature is a combination that includes the dialogue act feature, the inter-utterance similarity feature, and the sentence length feature, and that does not include at least one of the word feature, the word class feature, the word combination feature, and the frequent word string feature.

The dialogue destruction feature extraction device according to claim 1, wherein
the turn number feature is a feature representing the number of turns elapsed since the start of the dialogue, and
the dialogue destruction feature is a combination that further includes the turn number feature.

The dialogue destruction feature extraction device according to claim 2, wherein
the dialogue destruction feature is a combination that includes none of the word feature, the word class feature, the word combination feature, and the frequent word string feature.

The dialogue destruction feature extraction device according to claim 3, wherein
the character string co-occurrence feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of character N-grams that co-occur between the last utterance in the dialogue and the other utterances, and
the dialogue destruction feature is a combination that further includes the character string co-occurrence feature.

The dialogue destruction feature extraction device according to claim 2, wherein
the perplexity feature is a feature representing the perplexity calculated with a language model for each utterance included in the dialogue, and
the dialogue destruction feature is a combination that further includes the perplexity feature.

J is an integer of 1 or more, and the type-j feature (1 ≤ j ≤ J) is a feature that characterizes whether or not a dialogue has broken down.
A dialogue destruction feature extraction device comprising a dialogue destruction feature extraction unit that extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature that is a combination of the type-j features (1 ≤ j ≤ J), wherein
the dialogue destruction feature extraction unit includes, for each j satisfying 1 ≤ j ≤ J, a type-j feature calculation unit that calculates the type-j feature from the dialogue,
the inter-utterance similarity feature is a feature indicating how similar the last utterance in the dialogue is to the other utterances,
the frequent word string feature is a feature whose elements are the character strings of word N-grams that appear in the dialogue at or above a predetermined frequency,
the dialogue act feature is a feature generated from the dialogue acts represented by the utterances included in the dialogue,
the sentence length feature is a feature representing the word length and character length of the last utterance in the dialogue,
the word combination feature is a feature expressed as a Bag-of-words vector whose elements are the occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance and the other utterances in the dialogue or within the last utterance, and
the dialogue destruction feature is a combination that includes the inter-utterance similarity feature, the frequent word string feature, the dialogue act feature, the sentence length feature, and the word combination feature.

The dialogue destruction feature extraction device according to any one of claims 1 to 6, wherein,
when the dialogue destruction feature includes the inter-utterance similarity feature, the inter-utterance similarity feature is a vector whose elements are at least one of a similarity calculated using the word2vec average vector distance, a similarity calculated using the Word Mover's Distance, a similarity calculated using the BLEU score, and a similarity calculated using the ROUGE score.

J is an integer of 1 or more, and a j-th feature quantity (1 ≦ j ≦ J) is a feature quantity indicating a feature indicating whether or not the dialogue has failed.
A dialogue destruction feature extraction unit that extracts a dialogue destruction feature that is a combination of the j-th type feature (1 ≦ j ≦ J) from a dialogue consisting of a series of utterances. ,
The dialog destruction feature amount extraction unit includes a j-th feature amount calculation unit that calculates the j-th feature amount from the dialog for each j satisfying 1 ≦ j ≦ J,
A dialogue act feature is defined as a feature generated from a dialogue act represented by an utterance included in the dialogue,
The similarity feature between utterances is a feature quantity indicating how similar the last utterance in the dialog is to other utterances,
A word feature is expressed as a Bag-of-words vector in which words N-grams of each utterance included in the dialog are arranged,
The word class feature is a feature expressed as a Bag-of-classes vector in which word classes corresponding to words of each utterance included in the dialog are arranged,
The dialogue destruction feature amount is a combination that includes the dialogue act feature amount and the inter-utterance similarity feature amount, and further includes one or more of the word feature amount and the word class feature amount,
A dialogue destruction feature amount extraction device, wherein, when the dialogue destruction feature amount includes the inter-utterance similarity feature amount, the inter-utterance similarity feature amount is a vector whose elements are at least one of: a similarity calculated using the average vector distance of word2vec, a similarity calculated using the Word Mover's Distance, a similarity calculated using the BLEU score, and a similarity calculated using the ROUGE score.
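
The word feature amount and word class feature amount in this claim can be pictured with a short sketch. The code below is an assumption-laden illustration: tokens are split on whitespace, the word-to-class mapping is a hypothetical hand-written dictionary (the claims do not specify how word classes are obtained), and the function names are invented for this example.

```python
from collections import Counter


def bag_of_ngrams(dialogue, vocabulary, n=1):
    """Count word N-grams over every utterance and arrange them by `vocabulary`."""
    counts = Counter()
    for utterance in dialogue:
        tokens = utterance.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [counts[g] for g in vocabulary]


def bag_of_classes(dialogue, word_to_class, classes):
    """Count the word classes of every word that has a known class."""
    counts = Counter()
    for utterance in dialogue:
        for word in utterance.split():
            if word in word_to_class:
                counts[word_to_class[word]] += 1
    return [counts[c] for c in classes]


dialogue = ["i like cats", "cats like fish"]
vocabulary = [("cats",), ("like",), ("dogs",)]
word_to_class = {"cats": "animal", "fish": "animal", "like": "emotion"}  # hypothetical
classes = ["animal", "emotion", "place"]

word_feature = bag_of_ngrams(dialogue, vocabulary)
class_feature = bag_of_classes(dialogue, word_to_class, classes)
```

Mapping words to classes shrinks the vector dimension relative to raw words, which is the usual motivation for a Bag-of-classes representation.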

The dialogue destruction feature amount extraction device according to any one of claims 1 to 6, wherein
The dialogue act sequence feature amount is a feature amount expressed as a vector whose elements are the results of estimating the dialogue act represented by each utterance included in the dialogue,
The predictive dialogue act feature amount is a feature amount consisting of one or more of: a prediction result vector representing the result of predicting, from the utterances included in the dialogue, the dialogue act that the last utterance should have; a vector in which the prediction result vector and an estimation result vector, representing the result of estimating the dialogue act represented by the last utterance, are arranged; a difference vector between the prediction result vector and the estimation result vector; and a boolean value indicating whether or not the 1-best of the prediction result vector and the 1-best of the estimation result vector match,
The dialogue act feature amount is a dialogue act sequence feature amount or a predictive dialogue act feature amount.
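
A minimal sketch of the predictive dialogue act feature amount follows. It assumes that the prediction result vector and the estimation result vector are already available as probability-like scores over the same ordered set of dialogue acts; how they are produced (the predictor and the estimator) is outside this sketch, and the function names are hypothetical.

```python
def argmax(vector):
    """Index of the largest element (the 1-best dialogue act)."""
    return max(range(len(vector)), key=vector.__getitem__)


def predictive_dialogue_act_feature(predicted, estimated):
    """Arrange the two vectors, their difference, and a 1-best match flag."""
    if len(predicted) != len(estimated):
        raise ValueError("both vectors must cover the same dialogue acts")
    arranged = list(predicted) + list(estimated)
    difference = [p - e for p, e in zip(predicted, estimated)]
    one_best_match = argmax(predicted) == argmax(estimated)
    # The boolean is stored as 1.0 / 0.0 so the result stays a numeric vector.
    return arranged + difference + [1.0 if one_best_match else 0.0]


predicted = [0.75, 0.25, 0.0]   # what the last utterance should have been
estimated = [0.25, 0.5, 0.25]   # what the last utterance actually appears to be
feature = predictive_dialogue_act_feature(predicted, estimated)
```

A large difference or a 1-best mismatch suggests that the last utterance performs a dialogue act that the preceding context did not call for.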

J is an integer of 1 or more, and the j-th type feature amount (1 ≦ j ≦ J) is a feature amount representing a characteristic that indicates whether or not the dialogue has failed,
A dialogue destruction feature amount extraction method including a dialogue destruction feature amount extraction step in which a dialogue destruction feature amount extraction device extracts, from a dialogue consisting of a series of utterances, a dialogue destruction feature amount that is a combination of the j-th type feature amounts (1 ≦ j ≦ J),
The dialog destruction feature amount extraction step includes a j-th type feature amount calculation step of calculating the j-th type feature amount from the dialog for each j satisfying 1 ≦ j ≦ J,
A dialogue act feature is defined as a feature generated from a dialogue act represented by an utterance included in the dialogue,
The similarity feature between utterances is a feature quantity indicating how similar the last utterance in the dialog is to other utterances,
A word feature is expressed as a Bag-of-words vector in which words N-grams of each utterance included in the dialog are arranged,
A word class feature is expressed as a Bag-of-classes vector in which word classes corresponding to words of each utterance included in the dialog are arranged,
The word combination feature amount is a feature amount expressed as a Bag-of-words vector whose elements are the co-occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance in the dialogue and the other utterances, or within the last utterance,
The frequent word string feature amount is a feature amount having a character string of a word Ngram appearing at a predetermined frequency or more in a dialog as an element,
The sentence length feature is a feature representing the word length and character length of the last utterance in the dialogue,
A dialogue destruction feature amount extraction method, wherein the dialogue destruction feature amount is a combination that includes the dialogue act feature amount, the inter-utterance similarity feature amount, and the sentence length feature amount, and that does not include at least one of the word feature amount, the word class feature amount, the word combination feature amount, and the frequent word string feature amount.
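
The overall extraction step, in which each of the J feature amount calculation steps is applied to the dialogue and the results are combined, might look like the sketch below. This is an assumption: the claims do not say that combination means concatenation, and the two toy calculators here are hypothetical stand-ins for the real feature amount calculators.

```python
def extract_dialogue_destruction_feature(dialogue, calculators):
    """Apply each of the J feature calculators and concatenate their outputs."""
    combined = []
    for calculate in calculators:  # j = 1, ..., J
        combined.extend(calculate(dialogue))
    return combined


def utterance_count_feature(dialogue):
    """Toy calculator (hypothetical): number of utterances in the dialogue."""
    return [len(dialogue)]


def sentence_length_feature(dialogue):
    """Word length and character length of the last utterance."""
    last = dialogue[-1]
    return [len(last.split()), len(last)]


# Leaving a calculator out of this list excludes that feature type from the
# combination, which is how a subset such as the one claimed could be selected.
calculators = [utterance_count_feature, sentence_length_feature]
feature = extract_dialogue_destruction_feature(["hello there", "hi"], calculators)
```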

  J is an integer of 1 or more, and the j-th type feature amount (1 ≦ j ≦ J) is a feature amount representing a characteristic that indicates whether or not the dialogue has failed,
  A dialogue destruction feature amount extraction method including a dialogue destruction feature amount extraction step of extracting, from a dialogue consisting of a series of utterances, a dialogue destruction feature amount that is a combination of the j-th type feature amounts (1 ≦ j ≦ J),
  The dialog destruction feature amount extraction step includes a j-th type feature amount calculation step of calculating the j-th type feature amount from the dialog for each j satisfying 1 ≦ j ≦ J,
  The similarity feature between utterances is a feature quantity indicating how similar the last utterance in the dialog is to other utterances,
  A frequently occurring word string feature amount is a feature amount having a character string of a word Ngram appearing at a predetermined frequency or more in a dialogue as an element,
  A dialogue act feature is defined as a feature generated from a dialogue act represented by an utterance included in the dialogue,
  The sentence length feature is defined as a feature representing the word length and character length of the last utterance in the dialogue,
  The word combination feature amount is a feature amount expressed as a Bag-of-words vector whose elements are the co-occurrence results of combinations of any of word N-grams, word class N-grams, word sets, and predicate-argument structures that co-occur between the last utterance in the dialogue and the other utterances, or within the last utterance,
  The dialogue destruction feature amount extraction method, wherein the dialogue destruction feature amount is a combination including an inter-utterance similarity feature amount, a frequent word string feature amount, a dialogue act feature amount, a sentence length feature amount, and a word combination feature amount.

  J is an integer of 1 or more, and the j-th type feature amount (1 ≦ j ≦ J) is a feature amount representing a characteristic that indicates whether or not the dialogue has failed,
  A dialogue destruction feature amount extraction method including a dialogue destruction feature amount extraction step of extracting, from a dialogue consisting of a series of utterances, a dialogue destruction feature amount that is a combination of the j-th type feature amounts (1 ≦ j ≦ J),
  The dialogue destruction feature amount extraction step includes, for each j satisfying 1 ≦ j ≦ J, a j-th type feature amount calculation step of calculating the j-th type feature amount from the dialogue,
  A dialogue act feature is defined as a feature generated from a dialogue act represented by an utterance included in the dialogue,
  The similarity feature between utterances is a feature quantity indicating how similar the last utterance in the dialog is to other utterances,
  A word feature is expressed as a Bag-of-words vector in which words N-grams of each utterance included in the dialog are arranged,
  The word class feature is a feature expressed as a Bag-of-classes vector in which word classes corresponding to words of each utterance included in the dialog are arranged,
  The dialogue destruction feature amount is a combination that includes the dialogue act feature amount and the inter-utterance similarity feature amount, and further includes one or more of the word feature amount and the word class feature amount,
  A dialogue destruction feature amount extraction method, wherein, when the dialogue destruction feature amount includes the inter-utterance similarity feature amount, the inter-utterance similarity feature amount is a vector whose elements are at least one of: a similarity calculated using the average vector distance of word2vec, a similarity calculated using the Word Mover's Distance, a similarity calculated using the BLEU score, and a similarity calculated using the ROUGE score.

A program for causing a computer to function as the dialogue destruction feature amount extraction device according to any one of claims 1 to 9.