JP6220762B2

JP6220762B2 - Next utterance candidate scoring device, method, and program

Info

Publication number: JP6220762B2
Application number: JP2014219533A
Authority: JP
Inventors: 別所　克人; 東中　竜一郎; 牧野　俊朗; 松尾　義博
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-10-28
Filing date: 2014-10-28
Publication date: 2017-10-25
Anticipated expiration: 2034-10-28
Also published as: JP2016085685A

Description

The present invention relates to a next utterance candidate scoring device, method, and program for use in a dialogue system that converses with a user. After a given utterance sequence has been exchanged with the user, the invention determines which candidate, among the set of next utterance candidates generated by the system, is suitable as the next utterance for the system to produce.

In a non-task-oriented dialogue system, the content of the dialogue is chat. In Non-Patent Document 1, sentences from the web, Twitter (registered trademark), and similar sources are stored in a database, and a system utterance is generated by selecting a sentence similar to the user utterance.

One approach identifies the focus (topic) of the utterance sequence exchanged so far, retrieves from a database multiple sentences that mention that focus as next utterance candidates, and computes a score for each candidate. After scoring, the system returns the highest-scoring candidate to the user. Alternatively, the system returns a candidate chosen at random from those whose score is at or above a threshold, or from those ranked within some top number of positions.

One such approach is based on a concept base: the sequence of the immediately preceding N (≥ 1) utterances is taken as the context, and for each next utterance candidate a score is computed that expresses how close the concept vector of the context utterance sequence is to the concept vector of that candidate.

Bessho, F., Harada, T., and Kuniyoshi, Y. "Dialog System Using Real-Time Crowdsourcing and Twitter Large-Scale Corpus." In Proc. SIGDIAL, pp. 227-231, 2012.

In the concept-base approach described above, a candidate scores highly whenever its content is close to that of the context utterance sequence. As a result, a candidate that is close in content but unsuitable as the next utterance may be selected, while a candidate that is somewhat more distant in content but suitable may not be. For example, if the context utterance sequence is a single question and one candidate is exactly the same question, that candidate is selected even though it is hardly suitable as the next utterance, and a more suitable candidate, such as one that answers the question, is not selected.
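The failure mode of a similarity-only score can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration and are not taken from the patent; cosine similarity is assumed as the closeness measure, which the source does not specify.

```python
import math

def normalize(v):
    """Scale a vector to length 1; return None for the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else None

def cosine(u, v):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical concept vectors, invented for this example.
context_question = normalize([0.9, 0.1, 0.0])  # the user's question
same_question = normalize([0.9, 0.1, 0.0])     # candidate repeating the question
answer = normalize([0.5, 0.2, 0.7])            # candidate answering the question

score_echo = cosine(context_question, same_question)
score_answer = cosine(context_question, answer)
# The repeated question receives the maximum similarity, so a
# similarity-only ranking prefers it over the answer.
```

Under these assumptions the echoed question always outranks the answer, which is the behavior the invention sets out to avoid.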

To solve this problem, an object of the present invention is to provide a next utterance candidate scoring device, method, and program that can select, from a set of next utterance candidates, a candidate suitable as the next utterance following the context utterance sequence.

To solve the above problem, the next utterance candidate scoring device according to the present invention comprises: a concept base, which is a set of pairs of a word and a concept vector representing the meaning of that word; utterance sequence evaluation data conversion means that takes as input a set of utterance sequence evaluation data D, each item being a combination of a context utterance sequence A, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is suitable as the next utterance of the utterance sequence A, and that, for each item D, refers to the concept base, generates a concept vector E of the utterance sequence A, generates a concept vector F of the next utterance candidate B, and generates converted utterance sequence evaluation data H consisting of the label C combined with a concept vector G obtained by concatenating the concept vector E and the concept vector F; and learning means that generates, from the set of converted utterance sequence evaluation data H, a classification model for computing a score with which an arbitrary concept vector of the same dimension as the concept vector G is classified into one value of the label C.
The above next utterance candidate scoring device may further comprise evaluation means that takes as input a context utterance sequence I and a set of next utterance candidates, refers to the concept base, generates a concept vector J of the utterance sequence I, and, for each next utterance candidate K in the set, generates a concept vector L of the candidate K, generates a concept vector M by concatenating the concept vector J and the concept vector L, and computes, by referring to the classification model generated by the learning means, a score with which the concept vector M is classified into one value of the label C.

The next utterance candidate scoring method according to the present invention is a method performed in a next utterance candidate scoring device comprising a concept base, which is a set of pairs of a word and a concept vector representing the meaning of that word, utterance sequence evaluation data conversion means, and learning means. The utterance sequence evaluation data conversion means takes as input a set of utterance sequence evaluation data D, each item being a combination of a context utterance sequence A, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is suitable as the next utterance of the utterance sequence A; for each item D, it refers to the concept base, generates a concept vector E of the utterance sequence A, generates a concept vector F of the next utterance candidate B, and generates converted utterance sequence evaluation data H consisting of the label C combined with a concept vector G obtained by concatenating the concept vector E and the concept vector F. The learning means generates, from the set of converted utterance sequence evaluation data H, a classification model for computing a score with which an arbitrary concept vector of the same dimension as the concept vector G is classified into one value of the label C.
The above next utterance candidate scoring method may further include a step in which evaluation means takes as input a context utterance sequence I and a set of next utterance candidates, refers to the concept base, generates a concept vector J of the utterance sequence I, and, for each next utterance candidate K in the set, generates a concept vector L of the candidate K, generates a concept vector M by concatenating the concept vector J and the concept vector L, and computes, by referring to the classification model generated by the learning means, a score with which the concept vector M is classified into one value of the label C.

The program of the present invention is a program for causing a computer to function as each of the means constituting the above next utterance candidate scoring device.

The present invention treats the question of whether a next utterance candidate is suitable as the next utterance of the context utterance sequence as a classification problem. Claim 1 constitutes the processing of the learning phase, and claim 2 constitutes the processing of the classification phase.

According to the next utterance candidate scoring device, method, and program of the present invention, a set of utterance sequence evaluation data D is taken as input, each item being a combination of a context utterance sequence A, a next utterance candidate B, and a label C indicating whether B is suitable as the next utterance of A. For each item D, a concept vector E of the utterance sequence A and a concept vector F of the next utterance candidate B are generated, and converted utterance sequence evaluation data H is generated, consisting of the label C combined with a concept vector G obtained by concatenating E and F. From the set of converted data H, a classification model is learned for computing a score with which an arbitrary concept vector of the same dimension as G is classified into one value of the label C. Because the system can then return an utterance that is suitable as the next utterance, interaction between the system and the user becomes smoother.

FIG. 1 is a block diagram showing the functional configuration of the next utterance candidate scoring device according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of a set of utterance sequence evaluation data D.
FIG. 3 is a diagram showing an example of the concept base.
FIG. 4 is a diagram showing an example of a set of converted utterance sequence evaluation data H.
FIG. 5 is a flowchart showing the processing routine of the utterance sequence evaluation data conversion means and the learning means of the next utterance candidate scoring device according to the embodiment of the present invention.
FIG. 6 is a block diagram showing the functional configuration of the next utterance candidate scoring device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the processing routine of the evaluation means of the next utterance candidate scoring device according to the embodiment of the present invention.
FIG. 8 is a diagram showing a set of pairs of a concept vector M and the score computed for it.

Embodiments of the present invention are described below with reference to the drawings.

<Configuration of the next utterance candidate scoring device>

Next, the configuration of the next utterance candidate scoring device according to the embodiment of the present invention is described. FIG. 1 is a configuration example of the next utterance candidate scoring device according to claim 1. As shown in FIG. 1, the next utterance candidate scoring device 100 according to the embodiment can be implemented as a computer that includes a CPU, a RAM, and a ROM storing a program for executing the processing routines described later together with various data. Functionally, the device 100 comprises input means 10 and calculation means 20, as shown in FIG. 1.

The input means 10 receives as input a set of utterance sequence evaluation data D, each item being a combination of a context utterance sequence A, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is suitable as the next utterance of the utterance sequence A. The set of utterance sequence evaluation data D is the training data for the learning phase. FIG. 2 shows an example of such a set; each row is one item of utterance sequence evaluation data D.

The context utterance sequence A is a sequence of utterances exchanged between the system and the user. The number of utterances in the utterance sequence A is arbitrary, but it may be fixed at some number N (for example, 2).

When the next utterance candidate B is suitable as the next utterance of the utterance sequence A, the label C takes the value 1; when it is not suitable, the label C takes the value 0. Data consisting of an utterance sequence A and a next utterance candidate B is a positive example if the label C is 1 and a negative example if the label C is 0.

Positive examples can also be constructed by taking a sequence of N+1 consecutive utterances from a human-to-human dialogue log, using the first N utterances as the utterance sequence A and the (N+1)-th utterance as the next utterance candidate B. Negative examples can likewise be constructed by taking a sequence of N+1 consecutive utterances from a human-to-human dialogue log, using the first N utterances as the utterance sequence A and, as the next utterance candidate B, any utterance other than the (N+1)-th utterance that is unsuitable as the next utterance.
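The construction of positive and negative examples from a dialogue log can be sketched as follows. The function name `build_examples`, the sample log, and the random sampling of negatives are assumptions made for illustration; in particular, the source requires a negative candidate to be unsuitable as the next utterance, which random sampling does not guarantee.

```python
import random

def build_examples(dialogue_log, n, rng):
    """Return (A, B, C) triples from one dialogue log.

    A is a list of n consecutive utterances, B a candidate next utterance,
    and C is 1 for a suitable candidate and 0 for an unsuitable one.
    """
    examples = []
    for start in range(len(dialogue_log) - n):
        context = dialogue_log[start:start + n]
        true_next = dialogue_log[start + n]
        examples.append((context, true_next, 1))  # positive example
        # Assumption: any utterance other than the true continuation is
        # treated as unsuitable. Real data may need negatives to be checked.
        others = [u for u in dialogue_log if u != true_next]
        if others:
            examples.append((context, rng.choice(others), 0))  # negative example
    return examples

log = [
    "Do you like ramen?",
    "Yes, I love miso ramen.",
    "Where do you usually eat it?",
    "At a shop near the station.",
]
examples = build_examples(log, n=2, rng=random.Random(0))
```

With N = 2 and four utterances, the sketch yields two positive and two negative triples.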

The calculation means 20 comprises utterance sequence evaluation data conversion means 21, a concept base 22, learning means 23, and a classification model 24.

The concept base 22 stores a set of pairs of a word and a concept vector representing the meaning of that word. The concept base 22 may be limited to content words such as nouns, verbs, and adjectives. FIG. 3 shows an example of the concept base 22. The concept base 22 is generated, for example, by the method of Non-Patent Document 2 (Non-Patent Document 2: 別所克人, 内山俊郎, 内山匡, 片岡良治, 奥雅博, "単語・意味属性間共起に基づくコーパス概念ベースの生成方式" [A method for generating a corpus concept base based on co-occurrence between words and semantic attributes], 情報処理学会論文誌 (IPSJ Journal), Vol. 49, No. 12, pp. 3997-4006, Dec. 2008).

In the concept base 22, each word is registered in its dictionary (terminal) form, and lookups in the concept base 22 are likewise made with the dictionary form of the word. The concept vector of each word is a d-dimensional vector normalized to length 1, and the concept vectors of semantically close words lie close to each other.

In the processing of the present invention, the concept vector of a given text is generated by segmenting the text into words, looking up each word in the concept base 22, summing the concept vectors obtained, and normalizing the resulting vector to length 1. Here, the concept vector of the text may be generated using only the content words among the words in the text.
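The sum-and-normalize step can be sketched as follows. The toy concept base and the whitespace tokenizer are assumptions made for illustration; Japanese text would need a morphological analyzer that returns dictionary forms, and the function name `text_concept_vector` is hypothetical.

```python
import math

# Toy concept base: word -> unit-length concept vector (values invented).
CONCEPT_BASE = {
    "ramen": [1.0, 0.0],
    "noodle": [0.8, 0.6],
    "station": [0.0, 1.0],
}

def text_concept_vector(text, concept_base):
    """Sum the concept vectors of the known words and normalize to length 1.

    Returns None when no word of the text is found in the concept base.
    """
    words = text.lower().split()  # stand-in for real word segmentation
    hits = [concept_base[w] for w in words if w in concept_base]
    if not hits:
        return None
    total = [sum(column) for column in zip(*hits)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else None

vec = text_concept_vector("ramen noodle please", CONCEPT_BASE)
unknown = text_concept_vector("hello there", CONCEPT_BASE)
```

Words missing from the concept base are simply ignored, so a text with no known word yields no vector.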

When the text is an utterance sequence consisting of multiple utterances, computing a concept vector separately for each utterance may yield a low-quality concept vector for an utterance that, for example, lacks enough content words. Even when individual utterances lack enough content words, the utterance sequence as a whole may contain enough of them, in which case the concept vector of the utterance sequence is of high quality.

That said, one may instead deliberately compute a concept vector for each utterance in the utterance sequence and use the sum of those concept vectors, normalized to length 1, as the concept vector of the utterance sequence.
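The two ways of forming the concept vector of an utterance sequence can be compared in a small sketch. The toy concept base, the helper names, and the choice to skip utterances that yield no vector are assumptions made for illustration.

```python
import math

# Toy concept base with invented unit-length vectors.
CONCEPT_BASE = {"ramen": [1.0, 0.0], "noodle": [0.8, 0.6], "station": [0.0, 1.0]}

def unit(v):
    """Normalize to length 1; return None for the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else None

def vector_of_words(words, cb):
    hits = [cb[w] for w in words if w in cb]
    return unit([sum(c) for c in zip(*hits)]) if hits else None

def pooled_sequence_vector(utterances, cb):
    """Treat the whole utterance sequence as a single text."""
    words = [w for u in utterances for w in u.lower().split()]
    return vector_of_words(words, cb)

def per_utterance_sequence_vector(utterances, cb):
    """Normalize each utterance vector first, then sum and renormalize."""
    vecs = [vector_of_words(u.lower().split(), cb) for u in utterances]
    vecs = [v for v in vecs if v is not None]  # assumption: skip empty ones
    return unit([sum(c) for c in zip(*vecs)]) if vecs else None

seq = ["ramen noodle ramen", "station"]
pooled = pooled_sequence_vector(seq, CONCEPT_BASE)
per_utt = per_utterance_sequence_vector(seq, CONCEPT_BASE)
```

In the pooled variant a long utterance contributes more words and so dominates the result, whereas the per-utterance variant gives each utterance equal weight.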

The processing of the utterance sequence evaluation data conversion means 21 is described below.

The following processing is performed on each item of utterance sequence evaluation data D.

Referring to the concept base 22, a concept vector E of the utterance sequence A and a concept vector F of the next utterance candidate B are generated.

Converted utterance sequence evaluation data H is then generated, consisting of the label C combined with a 2d-dimensional concept vector G obtained by concatenating the d-dimensional concept vector E and the d-dimensional concept vector F. The converted data H is training data consisting of a 2d-dimensional feature vector and a classification label C; the 2d-dimensional feature vector is a positive example if the label C is 1 and a negative example if the label C is 0. The converted data H expresses whether the concept vector of the context utterance sequence and the concept vector of the next utterance candidate form a suitable combination of context and next utterance, irrespective of how similar the two vectors are.

The utterance sequence evaluation data conversion means 21 outputs the set of converted utterance sequence evaluation data H generated for the items of utterance sequence evaluation data D. FIG. 4 shows an example of a set of converted utterance sequence evaluation data H; each row is one item of converted data H.
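The conversion from evaluation data D to converted data H can be sketched as follows. The toy concept base, the helper names, and the decision to skip items for which no concept vector can be generated are assumptions made for illustration.

```python
import math

# Toy concept base with invented unit-length vectors (d = 2).
CONCEPT_BASE = {"ramen": [1.0, 0.0], "noodle": [0.8, 0.6], "station": [0.0, 1.0]}

def text_vector(text, cb):
    """Sum-and-normalize concept vector of a text, or None if no word is known."""
    hits = [cb[w] for w in text.lower().split() if w in cb]
    if not hits:
        return None
    total = [sum(col) for col in zip(*hits)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else None

def convert(dataset, cb):
    """Turn (A, B, C) triples into (G, C) pairs, where G = E concatenated with F."""
    converted = []
    for context, candidate, label in dataset:
        e = text_vector(" ".join(context), cb)  # concept vector E of sequence A
        f = text_vector(candidate, cb)          # concept vector F of candidate B
        if e is None or f is None:
            continue  # assumption: drop items that yield no concept vector
        converted.append((e + f, label))        # list concatenation gives 2d dims
    return converted

D = [
    (["ramen", "noodle"], "station", 1),
    (["ramen", "noodle"], "ramen", 0),
    (["hello"], "station", 1),  # context has no known word, so it is skipped
]
H = convert(D, CONCEPT_BASE)
```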

The learning means 23 uses the set of converted utterance sequence evaluation data H as training data and, using an algorithm such as a support vector machine (SVM), generates a classification model 24 for computing a score with which an arbitrary 2d-dimensional concept vector is classified into the value 1 of the label C.

The model is generated so that, for any pair of a context utterance sequence concept vector and a next utterance candidate concept vector, and irrespective of their similarity, the concatenated concept vector receives a high score if it is, on the whole, close to the positive examples and a low score if it is close to the negative examples.
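As a minimal sketch of the learning step, the code below trains a linear SVM (hinge loss with L2 regularization) by plain subgradient descent. This is a stand-in chosen to keep the example dependency-free: the source names SVM only as one possible algorithm and does not specify a kernel, a solver, or hyperparameters. The function names, the hyperparameter values, and the toy data are all assumptions.

```python
def train_linear_svm(data, epochs=50, lr=0.1, lam=0.01):
    """Fit (w, b) on (G, label) pairs with label in {0, 1} using hinge loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for g, label in data:
            y = 1.0 if label == 1 else -1.0
            margin = y * (sum(wi * gi for wi, gi in zip(w, g)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularization step
            if margin < 1.0:                         # hinge loss is active
                w = [wi + lr * y * gi for wi, gi in zip(w, g)]
                b += lr * y
    return w, b

def score(model, g):
    """Signed score; larger means more strongly classified as label 1."""
    w, b = model
    return sum(wi * gi for wi, gi in zip(w, g)) + b

# Toy converted data H with invented 2d-dimensional vectors G (d = 2).
H = [
    ([1.0, 0.0, 1.0, 0.0], 0),
    ([1.0, 0.0, 0.0, 1.0], 1),
    ([0.0, 1.0, 1.0, 0.0], 0),
    ([0.0, 1.0, 0.0, 1.0], 1),
]
model = train_linear_svm(H)
```

Note that a linear model scores the context half and the candidate half of G additively, so capturing interactions between the two halves would call for a nonlinear kernel or another classifier.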

FIG. 5 shows an example of the processing flow of the utterance sequence evaluation data conversion means 21 and the learning means 23.

First, in step S1, for each item of utterance sequence evaluation data D, the concept base 22 is referred to, a concept vector E of the utterance sequence A and a concept vector F of the next utterance candidate B are generated, and converted utterance sequence evaluation data H is generated, consisting of the label C combined with a 2d-dimensional concept vector G obtained by concatenating the d-dimensional concept vector E and the d-dimensional concept vector F.

Then, in step S2, a classification model 24 for computing a score with which an arbitrary 2d-dimensional concept vector is classified into the value 1 of the label C is generated from the set of converted utterance sequence evaluation data H generated in step S1.

FIG. 6 is a configuration example of the next utterance candidate scoring device according to claim 2. Parts that have the same configuration as in the configuration example of the device according to claim 1 are given the same reference numerals, and their description is omitted.

As shown in FIG. 6, the next utterance candidate scoring device 200 according to the embodiment of the present invention can be implemented as a computer that includes a CPU, a RAM, and a ROM storing a program for executing the processing routines described later together with various data. Functionally, the device 200 comprises input means 10, calculation means 220, and output means 230, as shown in FIG. 6.

The input means 10 receives as input a set of utterance sequence evaluation data D, each item being a combination of a context utterance sequence A, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is suitable as the next utterance of the utterance sequence A.

The input means 10 also accepts a context utterance sequence I and a set of next utterance candidates.

Let the utterances exchanged so far between the system and the user, arranged in chronological order, be x1, x2, ..., xm. A value N (≥ 1) is fixed, and the sequence of the last N utterances (x(m-N+1), x(m-N+2), ..., xm) is input as the context utterance sequence I. Alternatively, all utterances up to this point (x1, x2, ..., xm) may be input as the context utterance sequence I.
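Choosing the context utterance sequence I amounts to taking the tail of the dialogue history. A minimal sketch follows; the function name `context_window` and the use of `None` to mean "all utterances" are assumptions made for illustration.

```python
def context_window(history, n=None):
    """Return the last n utterances of the history, or all of them if n is None."""
    if n is None:
        return list(history)
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(history[-n:])  # shorter histories are returned whole

history = ["x1", "x2", "x3", "x4"]
last_two = context_window(history, 2)
everything = context_window(history)
```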

Separately, multiple sentences that mention the focus (topic) extracted from the utterance sequence so far are retrieved from a database as next utterance candidates and input as the set of next utterance candidates.

The calculation means 220 comprises the utterance sequence evaluation data conversion means 21, the concept base 22, the learning means 23, the classification model 24, and evaluation means 221.

FIG. 7 shows an example of the processing flow of the evaluation means 221. The processing of the evaluation means 221 is described below with reference to FIG. 7.

First, in step S3, the concept base 22 is referred to and a concept vector J of the utterance sequence I is generated.

Next, the processing of steps S4 to S6 below is performed for each next utterance candidate K in the set of next utterance candidates.

In step S4, the concept base 22 is referred to and a concept vector L of the next utterance candidate K is generated.

In step S5, a 2d-dimensional concept vector M is generated by concatenating the d-dimensional concept vector J and the d-dimensional concept vector L.

In step S6, the classification model 24 is referred to and a score with which the 2d-dimensional concept vector M is classified into the value 1 of the label C is computed. The more strongly the vector is classified into the value 1, the larger this score becomes.

For a pair of a concept vector J and a concept vector L, irrespective of their similarity, the score is high if the concatenated concept vector M is, on the whole, close to the positive examples and low if it is close to the negative examples. In other words, regardless of how close the content of the context utterance sequence I is to that of the next utterance candidate K, the score is high if the correspondence between their contents is close to the correspondence between a context utterance sequence A and a next utterance candidate B in the positive examples, and low if it is close to the correspondence in the negative examples. Consequently, a candidate suitable as the next utterance of the context utterance sequence I is selected regardless of how close its content is to that of the sequence I.
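Steps S3 to S6 can be sketched as a single scoring loop. The toy concept base and the hand-picked linear model `(W, B)` are hypothetical stand-ins for the concept base 22 and the classification model 24; a trained model would be used in practice.

```python
import math

# Toy concept base with invented unit-length vectors (d = 2).
CONCEPT_BASE = {"ramen": [1.0, 0.0], "noodle": [0.8, 0.6], "station": [0.0, 1.0]}

def text_vector(text, cb):
    """Sum-and-normalize concept vector of a text, or None if no word is known."""
    hits = [cb[w] for w in text.lower().split() if w in cb]
    if not hits:
        return None
    total = [sum(col) for col in zip(*hits)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else None

def score_candidates(context, candidates, cb, w, b):
    """Return (candidate, score) pairs; the score is None when M cannot be built."""
    j = text_vector(" ".join(context), cb)              # step S3
    results = []
    for k in candidates:
        l = text_vector(k, cb)                          # step S4
        if j is None or l is None:
            results.append((k, None))
            continue
        m = j + l                                       # step S5: concatenation
        s = sum(wi * mi for wi, mi in zip(w, m)) + b    # step S6: model score
        results.append((k, s))
    return results

# Hypothetical linear model weights, chosen by hand for the example.
W, B = [0.0, 0.0, -1.0, 1.0], 0.0
results = score_candidates(["ramen noodle"], ["ramen", "station", "hello"],
                           CONCEPT_BASE, W, B)
```

With these invented weights the candidate that repeats the context word scores lower than the one that does not, even though it is more similar to the context.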

FIG. 8 shows a set of pairs, output by the evaluation means 221, of a concept vector M and the score computed for it.

Then, in step S7, the next utterance candidate with the highest score is selected and returned by the system to the user. Alternatively, the system returns to the user a candidate chosen at random from those whose score is at or above a threshold, or from those ranked within some top number of positions.

If no concept vector can be generated from the utterance sequence I or from the next utterance candidate K, the concept vector M is not generated, so no score is computed for the candidate K. Candidates without a score are ranked below candidates with a score, and candidates without a score are ordered randomly among themselves.
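The selection rules of step S7, together with the handling of unscored candidates, can be sketched as follows. The function names, the parameter names `threshold` and `top_k`, and the fallback to the top-ranked candidate when no candidate reaches the threshold are assumptions made for illustration.

```python
import random

def rank_candidates(scored, rng):
    """Order (candidate, score) pairs: scored ones by descending score,
    then unscored ones (score None) in random order."""
    with_score = sorted((p for p in scored if p[1] is not None),
                        key=lambda p: p[1], reverse=True)
    without = [p for p in scored if p[1] is None]
    rng.shuffle(without)
    return with_score + without

def select(scored, rng, threshold=None, top_k=None):
    """Pick the candidate to return to the user."""
    ranked = rank_candidates(scored, rng)
    if not ranked:
        return None
    if threshold is not None:
        pool = [p for p in ranked if p[1] is not None and p[1] >= threshold]
    elif top_k is not None:
        pool = ranked[:top_k]
    else:
        pool = ranked[:1]  # highest score only
    # Assumption: fall back to the top-ranked candidate if the pool is empty.
    return rng.choice(pool)[0] if pool else ranked[0][0]

scored = [("a", 0.2), ("b", None), ("c", 0.9), ("d", -0.5)]
rng = random.Random(0)
order = [c for c, _ in rank_candidates(scored, rng)]
best = select(scored, rng)
above = select(scored, rng, threshold=0.1)
among_top = select(scored, rng, top_k=2)
```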

The processing described above can be built as a program, and that program can be installed via a communication line or from a recording medium and executed by means such as a CPU.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

In both claim 1 (the learning phase) and claim 2 (the classification phase), the following extension is also possible: the number of utterances in the context utterance sequence is fixed at N, a concept vector is generated for each utterance in the context utterance sequence, and the d(N+1)-dimensional concept vector obtained by concatenating those N concept vectors with the concept vector generated from the next utterance candidate is processed in place of the concept vector G or the concept vector M. This corresponds to interpreting the concept vectors E and J of the context utterance sequence in claims 1 and 2 as the concatenation of the concept vectors of the N utterances in the context utterance sequence.
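The d(N+1)-dimensional extension can be sketched as follows. The toy concept base, the function names, and the choice to return `None` when any utterance yields no concept vector are assumptions made for illustration; the source does not say how such cases are handled in this variant.

```python
import math

# Toy concept base with invented unit-length vectors (d = 2).
CONCEPT_BASE = {"ramen": [1.0, 0.0], "noodle": [0.8, 0.6], "station": [0.0, 1.0]}

def text_vector(text, cb):
    """Sum-and-normalize concept vector of a text, or None if no word is known."""
    hits = [cb[w] for w in text.lower().split() if w in cb]
    if not hits:
        return None
    total = [sum(col) for col in zip(*hits)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm > 0 else None

def concat_per_utterance(context, candidate, cb, n):
    """Concatenate one vector per context utterance plus the candidate vector."""
    if len(context) != n:
        raise ValueError("the context must contain exactly n utterances")
    parts = [text_vector(u, cb) for u in context] + [text_vector(candidate, cb)]
    if any(p is None for p in parts):
        return None  # assumption: no feature vector if any part is missing
    return [x for p in parts for x in p]

g = concat_per_utterance(["ramen noodle", "station"], "ramen", CONCEPT_BASE, n=2)
missing = concat_per_utterance(["hello", "station"], "ramen", CONCEPT_BASE, n=2)
```

With d = 2 and N = 2 the resulting vector has d(N+1) = 6 dimensions, and the position of each utterance in the context is preserved.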

When the k-nearest neighbor method is adopted as the classification method, the learning means 23 performs no particular processing, and the set of converted utterance sequence evaluation data H output by the utterance sequence evaluation data conversion means 21 is used as the classification model 24 as it is. The evaluation means 221 takes from this classification model 24 the k concept vectors G closest to the concept vector M, and the value that occurs more often among the labels of those concept vectors G is taken as the score and label of the concept vector M.
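The k-nearest neighbor variant can be sketched as follows. Euclidean distance and the tie-breaking rule (label 0 on a tie) are assumptions, since the source specifies neither; the stored vectors are invented for illustration.

```python
import math

def knn_label(model, m, k):
    """Majority label among the k stored vectors G nearest to m.

    model is the set of converted data H, stored as (G, label) pairs.
    """
    nearest = sorted(model, key=lambda pair: math.dist(pair[0], m))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if ones * 2 > len(nearest) else 0  # assumption: ties give 0

# Toy converted data H acting directly as the classification model.
model = [
    ([1.0, 0.0, 1.0, 0.0], 0),
    ([1.0, 0.0, 0.0, 1.0], 1),
    ([0.0, 1.0, 1.0, 0.0], 0),
    ([0.0, 1.0, 0.0, 1.0], 1),
    ([0.6, 0.8, 0.0, 1.0], 1),
]
label_a = knn_label(model, [0.8, 0.6, 0.0, 1.0], k=3)
label_b = knn_label(model, [1.0, 0.0, 1.0, 0.0], k=3)
```

Choosing an odd k avoids ties between the two label values.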

The present invention is applicable to dialogue processing technology that realizes smooth interaction between a system and a user.

Description of Symbols

10 Input means
20, 220 Calculation means
21 Utterance string evaluation data conversion means
22 Concept base
23 Learning means
24 Classification model
100, 200 Next utterance candidate scoring device
221 Evaluation means
230 Output means

Claims

A next utterance candidate scoring device comprising:
a concept base that is a set of pairs of a word and a concept vector representing the meaning of the word;
utterance string evaluation data conversion means that receives as input a set of utterance string evaluation data D, each composed of a combination of an utterance string A serving as context, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is appropriate as the next utterance following the utterance string A, and that, for each utterance string evaluation data D, refers to the concept base, generates a concept vector E of the utterance string A, generates a concept vector F of the next utterance candidate B, and generates converted utterance string evaluation data H composed of a combination of the label C and a concept vector G obtained by combining the concept vector E and the concept vector F; and
learning means that generates, from the set of the converted utterance string evaluation data H, a classification model for calculating a score with which an arbitrary concept vector having the same dimension as the concept vector G is classified into one value of the label C.

The next utterance candidate scoring device according to claim 1, further comprising evaluation means that receives as input an utterance string I serving as context and a next utterance candidate set, refers to the concept base, generates a concept vector J of the utterance string I, and, for each next utterance candidate K in the next utterance candidate set, generates a concept vector L of the next utterance candidate K, generates a concept vector M obtained by combining the concept vector J and the concept vector L, and refers to the classification model generated by the learning means to calculate a score with which the concept vector M is classified into one value of the label C.

A next utterance candidate scoring method in a next utterance candidate scoring device that includes a concept base, which is a set of pairs of a word and a concept vector representing the meaning of the word, utterance string evaluation data conversion means, and learning means, the method comprising:
a step in which the utterance string evaluation data conversion means receives as input a set of utterance string evaluation data D, each composed of a combination of an utterance string A serving as context, a next utterance candidate B, and a label C indicating whether the next utterance candidate B is appropriate as the next utterance following the utterance string A, and, for each utterance string evaluation data D, refers to the concept base, generates a concept vector E of the utterance string A, generates a concept vector F of the next utterance candidate B, and generates converted utterance string evaluation data H composed of a combination of the label C and a concept vector G obtained by combining the concept vector E and the concept vector F; and
a step in which the learning means generates, from the set of the converted utterance string evaluation data H, a classification model for calculating a score with which an arbitrary concept vector having the same dimension as the concept vector G is classified into one value of the label C.

The next utterance candidate scoring method according to claim 3, further comprising a step in which evaluation means receives as input an utterance string I serving as context and a next utterance candidate set, refers to the concept base, generates a concept vector J of the utterance string I, and, for each next utterance candidate K in the next utterance candidate set, generates a concept vector L of the next utterance candidate K, generates a concept vector M obtained by combining the concept vector J and the concept vector L, and refers to the classification model generated by the learning means to calculate a score with which the concept vector M is classified into one value of the label C.

A program for causing a computer to function as each of the means constituting the next utterance candidate scoring device according to claim 1 or claim 2.