JP2017059205A - Subject estimation system, subject estimation method, and program

Info

Publication number: JP2017059205A
Application number: JP2016080684A
Authority: JP
Inventors: Hung-Chieh Shi; Takashi Ushio; Mitsuru Endo; Katsuyoshi Yamagami
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-09-17
Filing date: 2016-04-13
Publication date: 2017-03-23
Anticipated expiration: 2036-04-13
Also published as: JP6611053B2

Abstract

PROBLEM TO BE SOLVED: To provide a subject estimation system and the like capable of estimating the subject of a dialogue with high accuracy even when learning data is insufficient. SOLUTION: A subject estimation system comprises a convolutional neural network 10 for estimating a subject label of a dialogue. The convolutional neural network 10 includes: a convolutional layer 12 consisting of one or more topic-dependent convolutional layers, which perform a topic-dependent convolution operation on an input word string vector sequence corresponding to dialogue text transcribed from the dialogue, and one topic-independent convolutional layer, which performs a convolution operation that does not depend on the topic; a pooling layer 13 that performs pooling on the output of the convolutional layer 12; and a fully connected layer 14 that performs full connection processing on the output of the pooling layer 13. SELECTED DRAWING: Figure 3

Description

The present invention relates to a subject estimation system, a subject estimation method, and a program for estimating the subject of a dialogue.

Some systems perform pattern recognition using a convolutional neural network (see, for example, Patent Document 1). Patent Document 1 discloses a general method of pattern recognition using a convolutional neural network.

Methods of applying a convolutional neural network to natural language processing are also known (see, for example, Non-Patent Document 1). Non-Patent Document 1 discloses a method of classifying sentences using a convolutional neural network trained on a publicly known data set.

Patent Document 1: US Patent Application Publication No. 2003/0174881

Non-Patent Document 1: Yoon Kim, "Convolutional Neural Networks for Sentence Classification", [online], [retrieved March 29, 2016], Internet <URL: http://arxiv.org/abs/1408.5882>

However, the sentence classification method of the above prior art presupposes that the convolutional neural network is trained with sufficient learning data, and gives little consideration to the case where learning data is insufficient.

Consequently, even with the convolutional neural network disclosed in the above prior art, the task of estimating the subject of a dialogue cannot be performed accurately when learning data is insufficient.

The present invention addresses this problem. Its object is to provide a subject estimation system, a subject estimation method, and a program that can estimate the subject of a dialogue more accurately even when learning data is insufficient.

To solve the above problem, a subject estimation system according to one aspect of the present invention is a subject estimation system that includes a convolutional neural network and estimates a subject label of a dialogue. The convolutional neural network includes: a convolutional layer made up of one or more topic-dependent convolutional layers, which perform a topic-dependent convolution operation on an input word string vector sequence corresponding to dialogue text transcribed from the dialogue, and one topic-independent convolutional layer, which performs a convolution operation that does not depend on the topic; a pooling layer that performs pooling on the output of the convolutional layer; and a fully connected layer that performs full connection processing on the output of the pooling layer.

These general or specific aspects may be implemented as a system, a method, an integrated circuit, a computer program, or a recording medium, or as any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.

According to the present invention, it is possible to realize a subject estimation system and the like that can estimate the subject of a dialogue more accurately even when learning data is insufficient.

FIG. 1 is a diagram showing the structure of the convolutional neural network used by the subject estimation system of the comparative example.
FIG. 2 is a diagram for explaining that the convolutional neural network of the comparative example does not support multi-label output.
FIG. 3 is a diagram showing the structure of the convolutional neural network used by the subject estimation system in Embodiment 1.
FIG. 4 is a block diagram showing the functional configuration of the subject estimation system in Embodiment 1 at identification time.
FIG. 5 is a diagram showing an example of the hardware configuration of a computer needed to run the subject estimation system.
FIG. 6 is a flowchart showing the operation of the subject estimation system at identification time shown in FIG. 4.
FIG. 7 is a block diagram showing the functional configuration of the subject estimation system in Embodiment 1 at learning time.
FIG. 8 is a flowchart showing the operation of the subject estimation system at learning time shown in FIG. 7.
FIG. 9 is a diagram showing experimental verification results for the subject estimation system in Embodiment 1.
FIG. 10 is a block diagram showing the functional configuration of the subject estimation system in Embodiment 2 at additional-learning time.
FIG. 11 is a flowchart showing the operation of the subject estimation system at additional-learning time shown in FIG. 10.
FIG. 12 is a graph showing experimental verification results for the subject estimation system in Embodiment 2.

(Findings underlying the present invention)
The subject estimation task assumed in the present disclosure is to estimate the subject of a dialogue, that is, of a natural-language exchange that basically takes place between two parties, whether human or machine. In this task, the word string of the dialogue is input, and the subject of the dialogue is estimated taking as one unit each segment of the word string judged to concern a single topic.

In a subject estimation system based on a convolutional neural network, the network is trained using learning data and evaluated using evaluation data.

In the task assumed in the present disclosure, a finite number of topics is defined, as is a finite number of subjects to be obtained as output. The topic is given to the subject estimation system together with the word string in the segment.

As noted above, however, using the convolutional neural network disclosed in the prior art in a subject estimation system presupposes training it with sufficient learning data. The case where learning data is insufficient has not been examined.

For example, when transcripts of recorded human-to-human dialogues are used as learning data, some topics may have little dialogue and therefore little learning data. Likewise, some of the subjects to be obtained as output may have little learning data. Cases in which learning data is scarce and unevenly distributed have not been examined.

Consequently, even with the convolutional neural network disclosed in the above prior art, the task of estimating the subject of a dialogue cannot be performed accurately when learning data is insufficient.

To solve this problem, a subject estimation system according to one aspect of the present invention is a subject estimation system that includes a convolutional neural network and estimates a subject label of a dialogue. The convolutional neural network includes: a convolutional layer made up of one or more topic-dependent convolutional layers, which perform a topic-dependent convolution operation on an input word string vector sequence corresponding to dialogue text transcribed from the dialogue, and one topic-independent convolutional layer, which performs a convolution operation that does not depend on the topic; a pooling layer that performs pooling on the output of the convolutional layer; and a fully connected layer that performs full connection processing on the output of the pooling layer.

This configuration realizes a subject estimation system that can estimate the subject of a dialogue more accurately even when learning data is insufficient. More specifically, the one or more topic-dependent convolutional layers, which are trained per topic and are accurate when learning data is plentiful, and the one topic-independent convolutional layer, which is trained regardless of topic and is accurate when learning data is scarce, are integrated in a higher layer, so high subject estimation performance is obtained even when the amount of learning data is small.

The convolutional neural network may estimate the subject label of the dialogue for the input by treating the input as a two-class classification problem.

In the convolutional neural network, learning dialogue text may be used as learning data, the learning dialogue text being a transcription of a dialogue in which the time-series text of the dialogue has been divided in advance into segments by topic and each segment has been given in advance the label of its corresponding topic. Using this learning data, each of the one or more topic-dependent convolutional layers may be made to learn a first weight so that it performs, for its dependent topic (the topic on which it depends), a convolution operation that depends on that dependent topic, and the topic-independent convolutional layer may be made to learn a second weight so that it performs a convolution operation that does not depend on the dependent topic.

Each of the one or more topic-dependent convolutional layers may learn the first weight, so as to perform the convolution operation that depends on its dependent topic, by receiving as input those word string vector sequences, among the word string vector sequences corresponding to the learning dialogue text, that relate to the dependent topic. The topic-independent convolutional layer may learn the second weight, so as to perform the convolution operation that does not depend on the dependent topic, by receiving as input the word string vector sequences corresponding to the learning dialogue text.

To solve the above problem, a subject estimation method according to one aspect of the present invention is a subject estimation method for a subject estimation system that includes a convolutional neural network and estimates a subject label of a dialogue. The method includes: a topic-dependent convolution processing step of performing a topic-dependent convolution operation on an input word string vector sequence corresponding to dialogue text transcribed from the dialogue; a topic-independent convolution processing step of performing, on the input, a convolution operation that does not depend on the topic; a pooling processing step of performing pooling on the output of the topic-dependent convolution processing step and the output of the topic-independent convolution processing step; and a full connection processing step of performing full connection processing on the output of the pooling processing step.

This realizes a subject estimation method that can estimate the subject of a dialogue more accurately even when learning data is insufficient. More specifically, the result of the topic-dependent convolution operation and the result of the topic-independent convolution operation are integrated at a later stage, so high subject estimation performance is obtained even when the amount of learning data is small.

In the topic-dependent convolution processing step, a convolution operation may be performed between the word string vector sequence and a first weight (A) that fires on specific words indicating the dependent topic, that is, the topic on which the step depends. In the topic-independent convolution processing step, a convolution operation may be performed between the word string vector sequence and a second weight (Z) that fires on words indicating topics other than the dependent topic. In the pooling processing step, the maximum value in the time direction may be extracted from the output of the topic-dependent convolution processing step and from the output of the topic-independent convolution processing step. In the full connection processing step, full connection processing may be performed by applying weighted addition with connection weights to the output of the pooling processing step and then converting the result into a probability distribution.
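
The four processing steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation of the invention: it uses one filter per convolution step, a one-word time window, tanh as the nonlinearity, and a softmax over two output elements as the conversion into a probability distribution. The names `A`, `Z`, `W_fc` and all numeric values are made up for the example.

```python
import math

def conv1d(seq, filt):
    """Slide a filter spanning len(filt) words over the sequence (tanh is an assumed nonlinearity)."""
    w = len(filt)
    out = []
    for t in range(len(seq) - w + 1):
        s = sum(f_i * x_i
                for f_vec, x_vec in zip(filt, seq[t:t + w])
                for f_i, x_i in zip(f_vec, x_vec))
        out.append(math.tanh(s))
    return out

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def forward(seq, A, Z, W_fc):
    # Topic-dependent and topic-independent convolution processing steps.
    dep = conv1d(seq, A)
    ind = conv1d(seq, Z)
    # Pooling processing step: maximum in the time direction of each output.
    pooled = [max(dep), max(ind)]
    # Full connection processing step: weighted addition, then a probability distribution.
    sums = [sum(w * p for w, p in zip(row, pooled)) for row in W_fc]
    return softmax(sums)

# Toy input: three words, each a 2-dimensional vector (made-up values).
seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
A = [[1.0, -1.0]]                    # first weight, one-word window
Z = [[0.0, 1.0]]                     # second weight, one-word window
W_fc = [[1.0, 0.5], [-1.0, -0.5]]    # rows: "label applies", "label does not apply"
probs = forward(seq, A, Z, W_fc)
```

Because pooling keeps only the maximum over time, the vector passed to the full connection step has a fixed length regardless of how many words the dialogue segment contains.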

The method may further include an output step of estimating and outputting the subject label of the dialogue by comparing the probability distribution of the output of the pooling processing step with a threshold.
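
As a rough illustration of the output step, the comparison with a threshold might look like the following. The threshold value 0.5, the use of `>=`, the probabilities, and the label "Location" are assumptions chosen for the example; "Pricerange" and "Fee" are label names that appear in this description.

```python
def estimate_labels(probabilities, threshold=0.5):
    """Return every subject label whose probability reaches the threshold."""
    return [label for label, p in probabilities.items() if p >= threshold]

# Made-up per-label probabilities for one dialogue segment.
probs = {"Pricerange": 0.81, "Fee": 0.62, "Location": 0.07}
labels = estimate_labels(probs)
```

Comparing each label against a threshold, instead of picking the single most probable label, allows more than one subject label to be output for the same segment.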

The method may include an input step of inputting the word string vector sequence corresponding to the dialogue text. The input step may further include an acceptance step of accepting dialogue text that is a transcription of a dialogue converted into text in time series, and a vectorization step of obtaining the word string vector sequence by calculating, by a predetermined method, a vector for each word of the word string included in the dialogue text.
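
The acceptance and vectorization steps can be pictured as below. The description leaves the "predetermined method" open, so this sketch simply assumes a lookup in a small, hypothetical embedding table, with naive whitespace tokenization and zero vectors for unknown words; none of these choices come from the source.

```python
def accept(dialog_text):
    """Acceptance step: turn transcribed text into a word string (naive whitespace split)."""
    return dialog_text.lower().replace(".", " ").split()

def vectorize(words, embeddings, dim):
    """Vectorization step: map each word to a vector; unknown words become zero vectors."""
    zero = [0.0] * dim
    return [embeddings.get(w, zero) for w in words]

# Hypothetical 2-dimensional embedding table with made-up values.
embeddings = {"room": [0.9, 0.1], "rate": [0.2, 0.8], "dollars": [0.1, 0.9]}
words = accept("The room rate is twenty dollars.")
vectors = vectorize(words, embeddings, dim=2)
```

The result is one vector per word, in dialogue order, which is the shape of input the convolution steps operate on.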

The method may further include: a first step of causing the convolutional neural network to learn the first weight, using as learning data learning dialogue text that is a transcription of a dialogue in which the time-series text has been divided in advance into segments by topic and each segment has been given in advance the label of its corresponding topic, so that the topic-dependent convolution processing step performs a convolution operation that depends on its dependent topic; and a second step of causing the network to learn the second weight, using the learning dialogue text, so that the topic-independent convolution processing step performs a convolution operation that does not depend on the dependent topic.

In the first step, the first weight may be learned using those word string vector sequences, among the word string vector sequences corresponding to the learning dialogue text, that relate to the dependent topic. In the second step, the second weight may be learned using those word string vector sequences, among the word string vector sequences corresponding to the learning dialogue text, that relate to topics other than the dependent topic.

Further, when, among the word string vector sequences corresponding to the learning dialogue text, the number of sequences related to a first dependent topic is smaller than the number related to a second dependent topic, the first step and the second step may be performed using dialogue text related to the first dependent topic, obtained by searching the Web, as semi-supervised data for the learning data.
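
One way to picture the learning data described above is as a list of (topic label, segment text) pairs, grouped by topic before training. The sketch below groups the segments and flags topics with comparatively few of them, which are the candidates for supplementing with additional text. The `ratio` criterion, the function names, and two of the example sentences are assumptions for illustration; the description itself only compares the counts of two topics.

```python
from collections import defaultdict

def group_by_topic(segments):
    """Group pre-segmented, topic-labelled learning text by topic."""
    groups = defaultdict(list)
    for topic, text in segments:
        groups[topic].append(text)
    return dict(groups)

def underrepresented(groups, ratio=0.5):
    """Topics whose segment count is below `ratio` times the largest topic's count."""
    largest = max(len(v) for v in groups.values())
    return sorted(t for t, v in groups.items() if len(v) < ratio * largest)

segments = [
    ("ACCOMMODATION", "The room rate is twenty dollars."),
    ("ACCOMMODATION", "Is breakfast included?"),            # invented sentence
    ("ACCOMMODATION", "We would like a twin room."),        # invented sentence
    ("ATTRACTION", "So kids have to pay the same rate as adults."),
]
groups = group_by_topic(segments)
scarce = underrepresented(groups)
```

Here `groups[topic]` would feed the first weight of the layer that depends on that topic, and `scarce` lists the topics for which extra dialogue text would be worth collecting.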

The present invention can be implemented not only as an apparatus but also as an integrated circuit including the processing means of such an apparatus, as a method whose steps are the processing means constituting the apparatus, as a program that causes a computer to execute those steps, or as information, data, or a signal representing the program. Such programs, information, data, and signals may be distributed via a recording medium such as a CD-ROM or a communication medium such as the Internet.

Embodiments of the present invention are described in detail below with reference to the drawings. Each embodiment described below shows one specific example of the present invention. The numerical values, shapes, components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the components in the following embodiments, those not recited in the independent claims representing the broadest concept are described as optional components. The contents of all the embodiments may also be combined.

(Embodiment 1)
This embodiment describes a subject estimation system that includes a convolutional neural network and estimates a subject label of a dialogue. The structure of the convolutional neural network used by the subject estimation system of a comparative example is described first with reference to FIG. 1, followed by the structure of the convolutional neural network used by the subject estimation system of this embodiment with reference to FIG. 3.

[Structure of the subject estimation system of the comparative example]
FIG. 1 is a diagram showing the structure of a convolutional neural network 80 used by the subject estimation system of the comparative example. The convolutional neural network 80 shown in FIG. 1 includes an input feature 81, a convolutional layer 82, a pooling layer 83, and a fully connected layer 84.

The input feature 81 converts the input word string into a vector sequence by a predetermined method. The convolutional layer 82 cuts out vector sequences of one to several adjacent words and performs a convolution operation on them using a trained weight matrix. The pooling layer 83 computes the maximum value in the time direction of the output of the convolutional layer 82. For each of its output elements, the fully connected layer 84 multiplies the outputs of the pooling layer 83 by connection weights and sums them, and finally converts the result into a probability distribution using the softmax function.
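
The last two stages of the comparative network, max-over-time pooling and the fully connected layer with softmax, can be sketched as follows. The convolution outputs, the weights, and the number of subject labels are made-up values for illustration only.

```python
import math

def max_over_time(conv_out):
    """conv_out[f][t] is the output of filter f at window position t; keep the maximum per filter."""
    return [max(row) for row in conv_out]

def fully_connected_softmax(pooled, weights):
    """One weight row per output element (subject label); softmax turns the sums into probabilities."""
    sums = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    m = max(sums)
    exps = [math.exp(s - m) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]

conv_out = [[0.1, 0.9, 0.3], [0.4, 0.2, 0.0]]    # 2 filters, 3 window positions
pooled = max_over_time(conv_out)                  # one value per filter
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]    # 3 hypothetical subject labels
probs = fully_connected_softmax(pooled, weights)
```

Because softmax normalizes across all subject labels at once, the probabilities compete with each other, which matches the multi-class formulation of the comparative example.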

When the convolutional neural network 80 of the comparative example is used for the subject estimation task, training on learning data increases the weights for those parts of the dialogue's word string that are highly related to a subject, so that a specific subject can be estimated when a specific linguistic expression is present.

The boxes drawn with thick lines in the input feature 81 in FIG. 1 are time windows. FIG. 1 shows two kinds: a one-word time window and a two-word time window. Each time window is shifted along the time direction from the beginning to the end of the vector sequence obtained from the input word string, and at each position a convolution operation and nonlinear processing are performed to obtain an output value. The output elements holding these values are represented by the squares drawn with thick lines in the convolutional layer 82.

A convolution operation in the convolutional neural network 80 is also called filtering. The number of output elements of the convolutional layer 82 is the product of the total number of filters and the number of time-window shifts. The fully connected layer 84, on the other hand, has one output element for each subject label, so its number of output elements equals the total number of subject labels. The subject estimation system using the convolutional neural network 80 of the comparative example therefore solves subject estimation as a multi-class classification problem.
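
The output-element count can be checked with a small calculation. The sketch assumes the window moves one word at a time with no padding, so a window of w words over T words has T - w + 1 positions, and it sums the product of filters and shifts separately for each window size; both points are assumptions, as is the choice of three filters per window size.

```python
def num_shifts(seq_len, window):
    """Window positions when sliding one word at a time without padding (an assumption)."""
    return max(seq_len - window + 1, 0)

def conv_output_elements(seq_len, filters_per_window):
    """filters_per_window maps window size -> number of filters using that window."""
    return sum(n * num_shifts(seq_len, w) for w, n in filters_per_window.items())

# "The room rate is twenty dollars." has 6 words.
total = conv_output_elements(6, {1: 3, 2: 3})
```

With 6 words, the one-word window has 6 positions and the two-word window has 5, giving 3 × 6 + 3 × 5 = 33 output elements in this example.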

As noted above, however, when learning data is scarce, the convolutional neural network 80 of the comparative example is dragged down by the estimation accuracy of the subjects that have little learning data, and the task of estimating the subject of a dialogue cannot be performed accurately.

The convolutional neural network 80 of the comparative example also has the problem that it does not support multi-label output (the ambiguity of linguistic expressions). This is explained with reference to FIG. 2.

FIG. 2 is a diagram for explaining that the convolutional neural network 80 of the comparative example does not support multi-label output.

Parts (a) and (b) of FIG. 2 show the subjects estimated by the subject estimation system using the convolutional neural network 80 for dialogue sentences that contain the same word "rate" but belong to different topics. More specifically, (a) of FIG. 2 shows the subject label "Pricerange" estimated by the subject estimation system using the convolutional neural network 80 of the comparative example when the topic is "ACCOMMODATION" and the word string of the dialogue is "The room rate is twenty dollars." Part (b) of FIG. 2 shows the subject label "Pricerange" estimated by the same system when the topic is "ATTRACTION" and the word string of the dialogue is "So kids have to pay the same rate as adults."

In (b) of FIG. 2, the word string contains the same word "rate", but the topic differs from that of (a) of FIG. 2, so the correct subject label is "Fee". Nevertheless, the same subject label "Pricerange" as in (a) of FIG. 2 is estimated. Thus the subject estimation system using the convolutional neural network 80 of the comparative example also has the problem that it cannot cope when the same word, such as "rate", appears in the dialogue but the subject changes with the context (that is, when the linguistic expression is ambiguous).

[Structure of the subject estimation system of this embodiment]
FIG. 3 is a diagram showing the structure of the convolutional neural network used by the subject estimation system in this embodiment.

The convolutional neural network 10 shown in FIG. 3 includes an input feature 11, a convolutional layer 12, a pooling layer 13, and a fully connected layer 14. As described in detail later, the input feature 11 is connected, according to the topic, to the topic-dependent convolutional layers and the topic-independent convolutional layer that make up the convolutional layer 12. The pooling layer 13 is likewise connected, according to the topic, to the topic-dependent convolutional layers and the topic-independent convolutional layer, and the fully connected layer 14 is connected, according to the topic, to the pooling layer 13.

The input feature 11 receives a word string and converts it into a vector sequence by a predetermined method. Alternatively, the input feature 11 may receive a vector sequence already obtained by converting the target word string by a predetermined method. That is, a word string vector sequence corresponding to dialogue text transcribed from a dialogue may be input to the input feature 11. More specifically, the input feature 11 may receive the word string vector sequence obtained by calculating, by a predetermined method, a vector for each word of the word string contained in dialogue text that is a time-series transcription of the dialogue.

The input feature 11 is connected to a topic-dependent convolutional layer or the topic-independent convolutional layer of the convolutional layer 12 (described later) according to the topic of the vector sequence of one to several adjacent words cut out of the input vector sequence by the time window.

図３に示される例では、入力特徴１１に、まず、トピック＃ａに関する対話を書き起こした対話テキスト「if you take a dorm bed per…」に対応する単語列ベクトル列１１ａが入力されている。そして、その後、入力特徴１１に、トピック＃ｂに関する対話を書き起こした対話テキスト「if you want to buy a spec…」に対応する単語列ベクトル列１１ｂが入力されている。ここで、図３の入力特徴１１内の太線で示した枠は、時間窓である。図３には、図１同様に、１単語用の時間窓と２単語用の時間窓の２種類が示されている。 In the example shown in FIG. 3, first, a word string vector sequence 11 a corresponding to the dialogue text “if you take a dorm bed per... Then, after that, a word string vector string 11b corresponding to the dialogue text “if you want to buy a spec... Here, a frame indicated by a thick line in the input feature 11 in FIG. 3 is a time window. FIG. 3 shows two types of time windows for one word and two words, as in FIG.

The convolution layer 12 consists of one or more topic-dependent convolution layers, each of which performs a convolution operation dependent on a topic on the input word string vector sequence, and one topic-independent convolution layer, which performs a convolution operation independent of the topic on that input. A topic-dependent convolution layer performs a convolution operation between the word string vector sequence and a first weight that fires on specific words indicating the topic it depends on. The topic-independent convolution layer performs a convolution operation between the word string vector sequence and a second weight that fires on words indicating topics other than that topic (that is, words not tied to the dependent topic).

このように、畳み込み層１２は、トピックと関連づけたトピック依存畳み込み層とトピックと関連づけられていないトピック非依存畳み込み層とを有する。そして、入力特徴１１は、トピックに応じて、トピック依存畳み込み層またはトピック非依存畳み込み層に結合される。 Thus, the convolution layer 12 has a topic-dependent convolution layer associated with a topic and a topic-independent convolution layer not associated with the topic. The input feature 11 is then coupled to a topic-dependent or topic-independent convolution layer depending on the topic.

本実施の形態では、例えばトピックが＃ａと＃ｂの２種類であるとして説明する。なお、もちろんトピックは２種類に限らないのはいうまでもない。 In the present embodiment, for example, it is assumed that there are two types of topics, #a and #b. Of course, the topic is not limited to two types.

As shown in FIG. 3, the convolution layer 12 consists of three parts: a topic-dependent convolution layer 12a, which performs a convolution operation dependent on topic #a; a topic-dependent convolution layer 12b, which performs a convolution operation dependent on topic #b; and a topic-independent convolution layer 12z, which performs a convolution operation independent of these topics.

More specifically, the topic-dependent convolution layer 12a performs a convolution operation between the adjacent vectors of one to several words cut out of the word string vector sequence 11a and a learned weight matrix (first weight) that fires on specific words indicating topic #a. The topic-dependent convolution layer 12b performs a convolution operation between the adjacent vectors of one to several words cut out of the word string vector sequence 11b and a learned weight matrix (first weight) that fires on specific words indicating topic #b. The topic-independent convolution layer 12z performs a convolution operation between the adjacent vectors of one to several words cut out of the word string vector sequences 11a and 11b and a learned weight matrix (second weight) that fires on words indicating topics other than topic #a and topic #b.
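The routing of an input to layers 12a, 12b, and 12z can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the names `FIRST_WEIGHTS`, `SECOND_WEIGHTS`, and `convolution_layer` and all weight values are hypothetical, each filter spans a single word, and bias terms and activation functions are omitted.

```python
def convolve(windows, filters):
    """Dot each window with each filter; result[f][t] is filter f at time step t."""
    return [[sum(w * x for w, x in zip(filt, win)) for win in windows]
            for filt in filters]

# Hypothetical learned weights for one-word windows over 2-dimensional vectors.
FIRST_WEIGHTS = {"a": [[1.0, 0.0]], "b": [[0.0, 1.0]]}  # topic-dependent (12a, 12b)
SECOND_WEIGHTS = [[0.5, 0.5]]                           # topic-independent (12z)

def convolution_layer(windows, topic):
    """Apply the filters of the given topic plus the topic-independent filters."""
    dependent = convolve(windows, FIRST_WEIGHTS[topic])
    independent = convolve(windows, SECOND_WEIGHTS)
    return dependent + independent

windows = [[1.0, 0.0], [0.0, 1.0]]
out_a = convolution_layer(windows, "a")  # uses 12a and 12z
out_b = convolution_layer(windows, "b")  # uses 12b and 12z
```

The same input produces different topic-dependent rows for topics #a and #b, while the topic-independent row is shared by both.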

プーリング層１３は、畳み込み層１２の出力に対してプーリング処理を行う。より具体的には、プーリング層１３は、トピック依存畳み込み層の出力とトピック非依存畳み込み層の出力とから時間方向の最大値を取り出す演算を行う。 The pooling layer 13 performs a pooling process on the output of the convolution layer 12. More specifically, the pooling layer 13 performs an operation for extracting the maximum value in the time direction from the output of the topic-dependent convolution layer and the output of the topic-independent convolution layer.

図３に示される例では、入力特徴１１に単語列ベクトル列１１ａが入力されたときには、トピック依存畳み込み層１２ａおよびトピック非依存畳み込み層１２ｚにプーリング層１３ａが結合される。プーリング層１３ａはこれらの出力から時間方向の最大値を取り出す演算を行う。また、入力特徴１１に単語列ベクトル列１１ｂが入力されたときには、トピック依存畳み込み層１２ｂおよびトピック非依存畳み込み層１２ｚにプーリング層１３ｂが結合される。プーリング層１３ｂは、これらの出力から時間方向の最大値を取り出す演算を行う。 In the example shown in FIG. 3, when the word string vector sequence 11a is input to the input feature 11, the pooling layer 13a is coupled to the topic-dependent convolution layer 12a and the topic-independent convolution layer 12z. The pooling layer 13a performs an operation for extracting the maximum value in the time direction from these outputs. When the word string vector sequence 11b is input to the input feature 11, the pooling layer 13b is coupled to the topic-dependent convolution layer 12b and the topic-independent convolution layer 12z. The pooling layer 13b performs an operation for extracting the maximum value in the time direction from these outputs.
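Taking the maximum value in the time direction can be sketched in a few lines. The function name `max_over_time` and the activation values are hypothetical.

```python
def max_over_time(feature_maps):
    """Keep the largest activation of each filter across all time steps."""
    return [max(row) for row in feature_maps]

# Rows are filters, columns are time steps (hypothetical activations).
topic_dependent_out = [[0.1, 0.9, 0.3]]
topic_independent_out = [[0.4, 0.2, 0.6]]
pooled = max_over_time(topic_dependent_out + topic_independent_out)
```

The pooled vector has one entry per filter regardless of how many words the dialogue text contains, which is what lets the following layer use a fixed number of connection weights.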

The fully connected layer 14 performs fully connected processing on the output of the pooling layer 13. More specifically, it performs a weighted addition of the output of the pooling layer 13 using connection weights and then converts the result into a probability distribution. In the present embodiment, for each output element the fully connected layer 14 multiplies the outputs of the pooling layer 13 by the connection weights and sums them, and finally applies the softmax function to obtain a probability distribution.

In the example shown in FIG. 3, when the word string vector sequence 11a is input to the input feature 11, the pooling layer 13a is coupled to the fully connected layer 14a. The fully connected layer 14a multiplies the outputs of the pooling layer 13a by the connection weights, sums them, and finally applies the softmax function to obtain a probability distribution. When the word string vector sequence 11b is input to the input feature 11, the pooling layer 13b is coupled to the fully connected layer 14b. The fully connected layer 14b likewise multiplies the outputs of the pooling layer 13b by the connection weights, sums them, and finally applies the softmax function to obtain a probability distribution.

The fully connected layer 14 then estimates and outputs the subject label of the dialogue by comparing the probability distribution obtained from the output of the pooling layer 13 with a threshold.
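The weighted addition, softmax, and threshold comparison can be sketched as below. The function names and the weight values are hypothetical, and the threshold of 0.5 is the example value the description gives for the output unit 150.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fully_connected(pooled, weights):
    """One weighted sum of the pooled values per output element, then softmax."""
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return softmax(scores)

def estimate_label(probs, labels, threshold=0.5):
    """Return the labels whose probability exceeds the threshold."""
    return [lab for lab, p in zip(labels, probs) if p > threshold]

weights_f = [[2.0, 1.0], [-2.0, -1.0]]          # hypothetical connection weights
probs = fully_connected([0.9, 0.6], weights_f)  # pooled values are hypothetical
result = estimate_label(probs, ["pricerange", "NOT pricerange"])
```

With two output elements and a threshold of 0.5, at most one label can exceed the threshold, which matches the two-class formulation.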

このようにして、畳み込みニューラルネットワーク１０は、入力を２クラス分類問題として解くことで当該入力に対する対話の主題ラベルを推定する。 In this way, the convolutional neural network 10 estimates the subject label of the dialogue for the input by solving the input as a two-class classification problem.

In other words, the fully connected layer 84 of the comparative example described with reference to FIG. 1 has an output element for every subject label, so the problem is solved (learned) as a multi-class classification problem. By contrast, the subject estimation system using the convolutional neural network 10 of the present embodiment has topic-dependent convolution layers specialized for individual topics and a topic-independent convolution layer that is not specialized for any topic. As shown in FIG. 3, the subject label pricerange can therefore be solved (learned) as a two-class classification problem: pricerange or NOT pricerange. As a result, even when the amount of learning data is unevenly distributed across subjects, the learning result for a subject with little data is not affected by the learning results for subjects with abundant data, and performance, that is, estimation accuracy, on the sparse learning data also improves.

(Learning of the convolutional neural network 10)
The learning of the convolutional neural network 10 in the present embodiment is described next.

In the present embodiment, the learning data (training data) is a learning dialogue text, that is, a transcription of dialogues in which the time-series text has been divided in advance into segments by topic and each segment has been labeled in advance with its topic. For example, the DSTC4 (Dialog State Tracking Challenge 4) data set may be used as the learning dialogue text.

畳み込みニューラルネットワーク１０において、１以上のトピック依存畳み込み層１２ａ、１２ｂそれぞれに、依存するトピックごとに当該トピックに依存する畳み込み演算を行わせるよう第１重みを学習させ、かつ、トピック非依存畳み込み層１２ｚに、当該依存するトピックに依存しない畳み込み演算を行わせるよう第２重みを学習させる。１以上のトピック依存畳み込み層それぞれは、学習用対話テキストに対応する単語列ベクトル列のうち当該依存するトピックに関連する単語列ベクトル列が入力されて、当該依存するトピックに依存する畳み込み演算を行うよう第１重みを学習する。トピック非依存畳み込み層１２ｚは、学習用対話テキストに対応する単語列ベクトル列が入力されて、当該依存するトピックに依存しない畳み込み演算を行うよう第２重みを学習する。 In the convolutional neural network 10, the first weight is learned so that each of the one or more topic-dependent convolutional layers 12a and 12b performs a convolution operation depending on the topic for each dependent topic, and the topic-independent convolutional layer 12z. Then, the second weight is learned so that the convolution operation independent of the dependent topic is performed. Each of the one or more topic-dependent convolution layers receives a word string vector sequence related to the dependent topic among the word string vector sequences corresponding to the learning dialogue text, and performs a convolution operation depending on the dependent topic. The first weight is learned. The topic-independent convolution layer 12z receives the word string vector sequence corresponding to the learning dialogue text and learns the second weight so as to perform a convolution operation that does not depend on the dependent topic.

In the convolutional neural network 10, the convolution weights of the convolution layer 12 (first weight and second weight) and the connection weights of the fully connected layer 14 are learned from the difference (error) between the desired output and the actual output. Stochastic gradient descent (SGD) is one known learning algorithm for learning from this error. Any known learning algorithm may be used, so its description is omitted here.

This learning process is applied to each of the topic-dependent convolution layer 12a, the topic-dependent convolution layer 12b, and the topic-independent convolution layer 12z that make up the convolution layer 12. The topic-dependent convolution layers 12a and 12b thereby become tied to linguistic expressions specific to their topics, but because the learning data contains only a limited amount of dialogue for each topic, the amount of learning data available to each of them is relatively small. The topic-independent convolution layer 12z, by contrast, becomes tied to linguistic expressions that do not depend on the topic, so the amount of learning data available to it is relatively large.

As shown in FIG. 3, the two kinds of layer (the topic-dependent convolution layers 12a and 12b and the topic-independent convolution layer 12z) are joined at the subsequent stages (the pooling layer 13 and the fully connected layer 14), so learning balances them. That is, the connection weights for the topic-dependent convolution layers 12a and 12b and for the topic-independent convolution layer 12z are adjusted both for topics with abundant learning data and for topics with little learning data.

Consequently, when dialogue on a topic with abundant learning data is input, the output of the topic-dependent convolution layer is weighted more heavily, and when dialogue on a topic with little learning data is input, the output of the topic-independent convolution layer is weighted more heavily. This reduces the difference in performance caused by differences in the amount of learning data. In particular, this configuration raises the performance (estimation accuracy) for topics with little learning data.

Furthermore, because the topic-dependent convolution layers 12a and 12b and the topic-independent convolution layer 12z are joined at the upper layers (the pooling layer 13 and the fully connected layer 14) as shown in FIG. 3, the problem described with reference to FIG. 2 does not arise. That is, the subject estimation system of the present embodiment, which includes the convolutional neural network 10, can also estimate multi-domain dialogue subjects with a convolutional neural network.

[Functional configuration of the subject estimation system of the present embodiment]
Next, the subject estimation system of the present embodiment is described concretely for the learning phase and the identification phase, using functional block diagrams and flowcharts for each phase.

(During identification)
FIG. 4 is a block diagram showing the functional configuration of the subject estimation system of the present embodiment during identification. Here, the convolutional neural network 10 used by the subject estimation system is drawn as functional blocks.

The subject estimation system of the present embodiment shown in FIG. 4 includes an input unit 110, a word vector sequence control unit 111, a topic-dependent convolution layer calculation unit 121, a topic-independent convolution layer calculation unit 123, a pooling layer calculation unit 131, a fully connected layer calculation unit 141, and an output unit 150. It further includes a storage unit 122 that stores weight A (first weight), a storage unit 124 that stores weight Z (second weight), and a storage unit 142 that stores weight f. The input unit 110 and the word vector sequence control unit 111 are the functional configuration of the input feature 11 described above. The topic-dependent convolution layer calculation unit 121, the storage unit 122, the topic-independent convolution layer calculation unit 123, and the storage unit 124 are the functional configuration of the convolution layer 12. The pooling layer calculation unit 131 is the functional configuration of the pooling layer 13, and the fully connected layer calculation unit 141 and the storage unit 142 are the functional configuration of the fully connected layer 14.

Input information 50, consisting of word string information 51 and topic information 52, is input to the subject estimation system shown in FIG. 4. The input unit 110 receives a word string from the word string information 51 and, at the same time, a topic from the topic information 52. The word vector sequence control unit 111 converts the word string into a vector sequence by a predetermined method. Various methods of converting a word string into vectors have been proposed, including the method called bag-of-words and methods that compress its dimensionality, and any of these known methods may be used as the predetermined method.
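One simple reading of the bag-of-words conversion is a one-hot vector per word over a fixed vocabulary. The sketch below assumes that reading; the function names are hypothetical, the vocabulary is built from the visible prefix of the example dialogue text only, and dimensionality compression is not shown.

```python
def build_vocabulary(sentences):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def to_vector_sequence(sentence, vocab):
    """Map each known word to a one-hot vector; unknown words become all zeros."""
    sequence = []
    for word in sentence.split():
        vec = [0.0] * len(vocab)
        if word in vocab:
            vec[vocab[word]] = 1.0
        sequence.append(vec)
    return sequence

vocab = build_vocabulary(["if you take a dorm bed"])
vectors = to_vector_sequence("take a bed", vocab)
```

Each word becomes a vector whose length equals the vocabulary size, so real systems usually compress this representation before the convolution step.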

トピック依存畳み込み層計算部１２１は、単語列ベクトル列と重みＡ（第１重み）を用いて畳み込み演算を行う。この重みＡ（第１重み）は、トピックごとのフィルターの総数だけ存在する。トピック非依存畳み込み層計算部１２３は、単語列ベクトル列と重みＺ（第２重み）を用いて畳み込み演算を行う。この重みＺ（第２重み）はトピック非依存のフィルターの数だけ存在する。 The topic-dependent convolution layer calculation unit 121 performs a convolution operation using the word string vector sequence and the weight A (first weight). This weight A (first weight) exists by the total number of filters for each topic. The topic-independent convolution layer calculation unit 123 performs a convolution operation using the word string vector string and the weight Z (second weight). There are as many weights Z (second weights) as there are topic-independent filters.

プーリング層計算部１３１は、トピック依存畳み込み層計算部１２１とトピック非依存畳み込み層計算部１２３のそれぞれのフィルターに対応する出力を時間方向に見て最大値を取り出す。 The pooling layer calculation unit 131 takes out the maximum value by looking at the outputs corresponding to the filters of the topic-dependent convolutional layer calculation unit 121 and the topic-independent convolutional layer calculation unit 123 in the time direction.

For each output element, the fully connected layer calculation unit 141 multiplies the outputs corresponding to the filters by the weight f and sums them, and finally converts the result into a probability distribution with the softmax function. For example, when the output subjects are "pricerange" and "NOT pricerange" as in FIG. 3, the softmax function adjusts the outputs of the two elements so that each is 0 or greater and their sum is 1.

The output unit 150 compares the result of the fully connected layer calculation unit 141 with a threshold (for example, 0.5) and outputs the subjects whose probability exceeds the threshold.

なお、本主題推定システムは、図５に示すようなハードウェア構成のコンピュータにより実行される。図５は、主題推定システムを実行するのに必要なコンピュータのハードウェア構成の一例を示す図である。 The subject estimation system is executed by a computer having a hardware configuration as shown in FIG. FIG. 5 is a diagram illustrating an example of a hardware configuration of a computer necessary for executing the subject estimation system.

本主題推定システムを実行するコンピュータは、図５に示すように、ＣＰＵ１００１、メモリ１００２、外部記憶装置１００３、ネットワークインターフェイス１００４、出力装置１００６及び入力装置１００７を備える。これらは、バスにより接続される。 As shown in FIG. 5, the computer that executes the present subject estimation system includes a CPU 1001, a memory 1002, an external storage device 1003, a network interface 1004, an output device 1006, and an input device 1007. These are connected by a bus.

本主題推定システムのすべての演算はＣＰＵ１００１で行われ、重み等の更新が必要な値やプログラムはメモリ１００２上に記憶される。また、学習データなどの大量のデータは外部記憶装置１００３に記憶される。ネットワークインターフェイス１００４は、インターネット１００５上のデータにアクセスして外部から学習データを取り込むために用いられる。また、ユーザーインターフェイスとして、出力装置１００６と入力装置１００７も必要である。入力装置１００７は、入力ボタン、タッチパッド、タッチパネルディスプレイなどといったユーザインタフェースとなる装置であり、ユーザの操作を受け付ける。 All operations of the subject estimation system are performed by the CPU 1001, and values and programs that require updating of weights and the like are stored in the memory 1002. A large amount of data such as learning data is stored in the external storage device 1003. The network interface 1004 is used to access data on the Internet 1005 and capture learning data from the outside. Also, an output device 1006 and an input device 1007 are necessary as a user interface. The input device 1007 is a device serving as a user interface such as an input button, a touch pad, or a touch panel display, and accepts a user operation.

図６は、図４に示す識別時の主題推定システムの動作を示すフローチャートである。 FIG. 6 is a flowchart showing the operation of the subject estimation system at the time of identification shown in FIG.

First, the input unit 110 receives an input word string and a topic label (S101). Next, the word vector sequence control unit 111 computes a vector for each word of the input word string by a predetermined method to obtain a word string vector sequence (S102). Next, the topic-dependent convolution layer calculation unit 121 performs a convolution operation (topic-dependent convolution) between the vector sequence and the weight A (first weight) stored in the storage unit 122 (S103). Next, the topic-independent convolution layer calculation unit 123 performs a convolution operation (topic-independent convolution) between the vector sequence and the weight Z (second weight) stored in the storage unit 124 (S104). Next, the pooling layer calculation unit 131 performs pooling, taking the maximum value of each filter from the outputs of the topic-dependent convolution layer calculation unit 121 and the topic-independent convolution layer calculation unit 123 (S105). Next, the fully connected layer calculation unit 141 performs fully connected layer processing on the output of the pooling layer calculation unit 131 (S106). More specifically, it weights the output of the pooling layer calculation unit 131 by the weight f stored in the storage unit 142 and sums the result. After the weighted addition has been performed for all output labels, the result is converted into a probability distribution. Finally, the output unit 150 determines the output label by comparing the probability distribution over the output labels with the threshold (S107).
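Steps S101 to S107 can be strung together in one function. This is a minimal sketch under several assumptions: every name and weight value is hypothetical, filters span a single word, unknown words are skipped, at least one input word is assumed to be known, and the weight f is held per topic.

```python
import math

def forward(words, topic, embed, first_w, second_w, weight_f, labels, threshold=0.5):
    # S102: convert each word to its vector (unknown words are skipped here).
    vectors = [embed[w] for w in words if w in embed]
    # S103: topic-dependent convolution with the filters of the given topic.
    dep = [[sum(a * x for a, x in zip(f, v)) for v in vectors] for f in first_w[topic]]
    # S104: topic-independent convolution.
    ind = [[sum(z * x for z, x in zip(f, v)) for v in vectors] for f in second_w]
    # S105: maximum over time for every filter.
    pooled = [max(row) for row in dep + ind]
    # S106: weighted sum per output label, then softmax.
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in weight_f[topic]]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    probs = [e / sum(exps) for e in exps]
    # S107: keep the labels whose probability exceeds the threshold.
    return [lab for lab, p in zip(labels, probs) if p > threshold], probs

# S101: a toy input word string and topic label (all values hypothetical).
embed = {"dorm": [1.0, 0.0], "bed": [0.0, 1.0]}
first_w = {"a": [[1.0, 0.0]]}
second_w = [[0.0, 1.0]]
weight_f = {"a": [[1.0, 1.0], [-1.0, -1.0]]}
labels = ["pricerange", "NOT pricerange"]
predicted, probs = forward(["take", "a", "dorm", "bed"], "a",
                           embed, first_w, second_w, weight_f, labels)
```

Because S103 and S104 read the same vector sequence and do not depend on each other, they can also be run in parallel and merged at S105.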

When the subject estimation system shown in FIG. 4 produces multi-label output, the processes of S103 and S104 may be performed in parallel and their results merged at a later stage. The same applies during learning, described below.

(During learning)
FIG. 7 is a block diagram showing the functional configuration of the subject estimation system of the present embodiment during learning. Elements that are the same as in FIG. 4 carry the same reference numerals, and their detailed description is omitted. Like the subject estimation system during identification shown in FIG. 4, the subject estimation system during learning shown in FIG. 7 is executed by a computer with the hardware configuration shown in FIG. 5.

The functional configuration during learning shown in FIG. 7 differs from the functional configuration during identification shown in FIG. 4 in the learning data 60, the error determination unit 160, and the weight update unit 161.

During learning, word string information 61 and topic information 62 are input to the subject estimation system as learning data (training data). The learning data 60 also stores subject information 63, which is the desired output corresponding to the word string information 61 and the topic information 62 input to the system during learning.

The error determination unit 160 compares the probability distribution over subject labels output by the output unit 150 with the distribution in which the desired subject label obtained from the subject information 63 has probability 1.0 and every other label has probability 0.0, and outputs the difference between the two distributions as the error.
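The comparison made by the error determination unit 160 can be sketched as follows. The function names and the estimated probabilities are hypothetical, and an element-wise difference is only one way of expressing the difference between the two distributions.

```python
def target_distribution(labels, desired):
    """Probability 1.0 for the desired label and 0.0 for every other label."""
    return [1.0 if lab == desired else 0.0 for lab in labels]

def error(predicted, target):
    """Element-wise difference between the estimated and desired distributions."""
    return [p - t for p, t in zip(predicted, target)]

labels = ["pricerange", "NOT pricerange"]
target = target_distribution(labels, "pricerange")
err = error([0.75, 0.25], target)  # estimated probabilities are hypothetical
```

A negative entry means the system assigned too little probability to that label, and a positive entry means it assigned too much.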

Using a predetermined learning algorithm, the weight update unit 161 determines the update amount for each of weight A (first weight), weight Z (second weight), and weight f from the error value output by the error determination unit 160, and applies those updates. These weight updates are repeated over the entire learning data while the learning coefficient is varied.

図８は、図７に示す学習時の主題推定システムの動作を示すフローチャートである。なお、Ｓ２０１〜Ｓ２０７の処理は、図６に示すＳ１０１〜Ｓ１０７の処理と同様であるので説明を省略する。 FIG. 8 is a flowchart showing the operation of the subject estimation system during learning shown in FIG. Note that the processing of S201 to S207 is the same as the processing of S101 to S107 shown in FIG.

Through the processing up to S207, the subject estimation system during learning has estimated the probability distribution over subject labels from the input word string and topic. Next, the error determination unit 160 obtains the desired subject from the subject information 63, sets the desired values (probabilities) of the distribution over subject labels, and computes the difference between those values and the estimated distribution as the error (S208). Next, the weight update unit 161 uses a predetermined learning algorithm to update the weight A (first weight) used by the topic-dependent convolution layer calculation unit 121, the weight f used by the fully connected layer calculation unit 141 that relates to the current topic, and the weight Z (second weight) used by the topic-independent convolution layer calculation unit 123 (S209).
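The selective update in S209 can be sketched with a plain gradient descent step. The description leaves the learning algorithm open, so the update rule, the function names, the parameter layout, and the gradient values below are all assumptions; computing the gradients from the error is not shown.

```python
def sgd_step(weights, grads, learning_rate):
    """Return weights moved one step against the gradient."""
    return [[w - learning_rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]

def update_for_topic(params, grads, topic, learning_rate=0.5):
    """Update weight A and weight f of the current topic, plus the shared weight Z."""
    params["A"][topic] = sgd_step(params["A"][topic], grads["A"], learning_rate)
    params["f"][topic] = sgd_step(params["f"][topic], grads["f"], learning_rate)
    params["Z"] = sgd_step(params["Z"], grads["Z"], learning_rate)

params = {
    "A": {"a": [[1.0, 0.0]], "b": [[0.0, 1.0]]},  # first weights per topic
    "Z": [[0.5, 0.5]],                            # second weight, shared
    "f": {"a": [[1.0, 1.0]], "b": [[1.0, 1.0]]},  # connection weights per topic
}
grads = {"A": [[0.5, 0.0]], "Z": [[0.5, 0.5]], "f": [[1.0, 0.0]]}
update_for_topic(params, grads, "a")
```

A training example for topic #a leaves the weights of topic #b untouched, while the shared weight Z is updated by examples from every topic and therefore sees more data.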

なお、これら学習は予め設定した終了条件を満たすかどうかの終了判定を行い（Ｓ２１０）、終了条件が満たされるまで繰り返される。この終了判定には、各重みの更新を行ってもエラーが改善されないことを条件としたり、エラーが閾値以下になったことを条件としたりする。 Note that these learnings are performed to determine whether or not a preset termination condition is satisfied (S210), and are repeated until the termination condition is satisfied. This termination determination is made on the condition that the error is not improved even if each weight is updated, or on the condition that the error is equal to or less than the threshold value.
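The two termination conditions named above can be sketched as a training loop. The function `train`, its default values, and the stand-in `fake_step` are hypothetical; the epoch limit and the patience count are added assumptions that keep the loop finite.

```python
def train(step, max_epochs=100, error_threshold=0.01, patience=3):
    """Repeat `step` until the error is small enough or stops improving."""
    best = float("inf")
    stalled = 0
    for epoch in range(1, max_epochs + 1):
        err = step()
        if err <= error_threshold:
            return epoch, "error below threshold"
        if err < best:
            best, stalled = err, 0
        else:
            stalled += 1
            if stalled >= patience:
                return epoch, "error no longer improving"
    return max_epochs, "epoch limit reached"

# Stand-in for one pass of S201 to S209: the error halves on every call.
state = {"err": 1.0}

def fake_step():
    state["err"] /= 2
    return state["err"]

epochs, reason = train(fake_step)
```

In a real run, `step` would process the learning data once and return the error computed in S208.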

[Effects]
As described above, the subject estimation system using the convolutional neural network 10 of the present embodiment can estimate the subject of a dialogue with higher accuracy even when the learning data is insufficient. The system can also estimate multi-domain dialogue subjects.

More specifically, the convolution layer 12 of the convolutional neural network 10 consists of topic-dependent convolution layers, which depend on a topic, and a topic-independent convolution layer, which does not. Their outputs are merged in the pooling layer, and the fully connected layer balances the output derived from the topic-dependent convolution layers against the output derived from the topic-independent convolution layer. Consequently, when dialogue on a topic with abundant learning data is input, the output of the topic-dependent convolution layer is weighted more heavily, and when dialogue on a topic with little learning data is input, the output of the topic-independent convolution layer is weighted more heavily. This reduces the difference in performance caused by differences in the amount of learning data.

Experimental results are used below to show that the subject estimation system using the convolutional neural network 10 of the present embodiment achieves higher estimation accuracy than the subject estimation systems using the neural networks of the comparative examples.

FIG. 9 shows the experimental verification results for the subject estimation system of the present embodiment. It compares the subject estimation accuracy of the comparative examples with that of the subject estimation system of the present embodiment on the Dialog State Tracking Challenge 4 (DSTC4) dialogue corpus. The DSTC4 dialogue corpus contains dialogues from five domains (Attraction, Accommodation, Food, Shopping, Transportation). In each domain, a total of 54 kinds of subject, such as Pricerange, Preference, and Exhibit, can be estimated for a dialogue section; FIG. 9 shows the results for the dialogue subject label "Pricerange". The (42/30) in "ACCOMMODATION (42/30)" in FIG. 9 means that there are 42 learning data items and 30 test data items.

比較例のGeneral Modelは、例えば図１に示す畳み込みニューラルネットワーク８０を利用した主題推定システムであり、すべてのトピックを一つの畳み込みニューラルネットワーク８０で学習させた主題推定システムを意味する。また、比較例のTopic-specific Modelは、Attractionのドメインの対話のみ学習するニューラルネットワークを利用する主題推定システムなど、ドメイン毎にドメインに対応する主題推定システムを構成する場合を意味する。つまり、トピック毎に別の主題推定システムのニューラルネットワークに学習させた場合である。 The General Model of the comparative examples is a subject estimation system using, for example, the convolutional neural network 80 shown in FIG. 1, in which all topics are learned by a single convolutional neural network 80. The Topic-specific Model of the comparative examples refers to the case where a separate subject estimation system is built for each domain, such as a subject estimation system using a neural network that learns only dialogues of the Attraction domain. In other words, a separate neural network is trained for each topic.

一方、Multi-topic modelは、図３に示す畳み込みニューラルネットワーク１０を利用した主題推定システムであり、本実施の形態における主題推定システムを意味する。 In contrast, the Multi-topic model is the subject estimation system using the convolutional neural network 10 shown in FIG. 3, that is, the subject estimation system of the present embodiment.

図９に示すように、実験結果では、Multi-topic modelは、対話主題ラベルが「Pricerange」の場合の各ドメインのすべての正解率が比較例のものより高い。また、推定精度を示すF値（overall）に関してもMulti-topic modelは、２つの比較例より向上していることがわかる。 As shown in FIG. 9, in the experimental results the Multi-topic model achieves a higher accuracy rate than the comparative examples in every domain when the dialogue subject label is "Pricerange". The overall F-measure, which indicates estimation accuracy, is also higher for the Multi-topic model than for either comparative example.

なお、DSTC4対話コーパスを用いた全対話主題ラベルの推定精度は、Multi-topic modelが４８％、General Modelが４３％、Topic-specific Modelが４３％であったことからも、Multi-topic modelは、２つの比較例よりも推定精度が向上しているのがわかる。 In addition, the estimation accuracy over all dialogue subject labels on the DSTC4 dialogue corpus was 48% for the Multi-topic model, 43% for the General Model, and 43% for the Topic-specific Model, which also shows that the Multi-topic model is more accurate than the two comparative examples.

（実施の形態２） (Embodiment 2)
実施の形態１では、畳み込み層１２をトピックに依存するトピック依存畳み込み層とトピックに依存しないトピック非依存畳み込み層とで構成することで、主題推定の推定精度が向上することについて説明した。この畳み込み層１２の構成では、上述したように、トピック依存畳み込み層に対する学習データが少なくなる傾向がある。本実施の形態では、学習データの不足を補うために、実施の形態１で説明した畳み込みニューラルネットワーク１０を利用する主題推定システムが半教師有り学習を利用する場合について機能構成図と動作図を用いて説明する。 Embodiment 1 described how composing the convolution layer 12 of topic-dependent convolution layers and a topic-independent convolution layer improves the accuracy of subject estimation. With this configuration of the convolution layer 12, as described above, the learning data available to each topic-dependent convolution layer tends to be small. To compensate for this shortage of learning data, the present embodiment describes, with reference to a functional configuration diagram and an operation diagram, the case where the subject estimation system using the convolutional neural network 10 described in Embodiment 1 uses semi-supervised learning.

図１０は、本実施の形態における主題推定システムの追加学習時の機能構成を示すブロック図である。図７と同様の要素には同一の符号を付しており、詳細説明を省略する。図１０に示す追加学習時の主題推定システムは、図７に示す学習時の主題推定システム同様に、上述した図５に示すようなハードウェア構成のコンピュータにより実行される。 FIG. 10 is a block diagram showing a functional configuration during additional learning of the subject estimation system in the present embodiment. Elements similar to those in FIG. 7 are denoted by the same reference numerals, and detailed description thereof is omitted. The subject estimation system at the time of additional learning shown in FIG. 10 is executed by a computer having a hardware configuration as shown in FIG. 5 described above, similarly to the subject estimation system at the time of learning shown in FIG.

図１０に示す追加学習時の機能構成図は、図７に示す学習時の機能構成図と比較して、外部データ取得部１７０が追加されている。 Compared with the functional configuration diagram at the time of learning shown in FIG. 7, the external data acquisition unit 170 is added to the functional configuration diagram at the time of additional learning shown in FIG.

外部データ取得部１７０は、学習用対話テキストに対応する単語列ベクトル列のうちある依存トピックに関連する単語列ベクトル列の数が別の依存トピックに関連する単語列ベクトル列の数よりも少ない場合には、Ｗｅｂを検索して得たある依存トピックに関連する対話テキストを学習データの半教師データとして取得する。 When, among the word string vector sequences corresponding to the learning dialogue text, the number of word string vector sequences related to one dependent topic is smaller than the number related to another dependent topic, the external data acquisition unit 170 searches the Web and acquires dialogue text related to the under-represented dependent topic as semi-supervised data for the learning data.
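The condition that triggers external data acquisition can be sketched as a simple count per topic. This is only an illustration: the function name `underrepresented_topics` and the rule "fewer items than the largest topic" are assumptions, since the text does not state how large the difference must be.

```python
from collections import Counter

def underrepresented_topics(topic_labels):
    """Return topics that have fewer learning items than some other topic."""
    counts = Counter(topic_labels)
    largest = max(counts.values())
    # Any topic below the largest count has fewer items than another topic.
    return sorted(topic for topic, n in counts.items() if n < largest)

# Hypothetical topic labels of the segments in the learning dialogue text.
labels = ["FOOD"] * 3 + ["SHOPPING"] * 1 + ["ATTRACTION"] * 3
needs_more_data = underrepresented_topics(labels)
```

For the toy labels above only `SHOPPING` is returned, so only that topic would be supplemented with text retrieved from the Web.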

より具体的には、外部データ取得部１７０は、例えば対話データの内容が旅行のプランニングに関する場合、インターネットの旅行の口コミサイトから旅行に関するテキスト情報を教師なし学習データとして取得する。しかし、旅行の口コミサイトの情報には、上述した公知のデータセットを用いた学習データ（教師あり学習データ）のように主題情報の正解ラベルは付与されていない。また、旅行の口コミサイトの情報には、トピックラベルも付与されていない。 More specifically, when the dialogue data concerns travel planning, for example, the external data acquisition unit 170 acquires travel-related text from a travel review site on the Internet as unsupervised learning data. Unlike the learning data based on the known data set described above (supervised learning data), however, the information from the travel review site carries no correct label for the subject information. It carries no topic label either.

そこで、本実施の形態の主題推定システムは、このような教師なし学習データである旅行の口コミサイトの情報に擬似的に正解ラベルを付与することで教師あり学習データを増やす。これにより、量の少ないトピックに関する教師あり学習データを増やすことができる。より詳細には、図１０に示す主題推定システムは、識別時の動作（図６に示すＳ１０１〜Ｓ１０７の識別処理）を行い、主題を推定する。また、図１０に示す主題推定システムは、旅行の口コミサイトの情報のトピックについては、全てのラベルについて、順に入力する。さらに、このようにして得られた、トピックに対応した全結合層計算部１４１の出力の主題の推定確率が予め設定した閾値より大きいものに限定して、トピックのラベルと主題のラベルを付与する。 The subject estimation system of the present embodiment therefore increases the supervised learning data by assigning pseudo correct labels to such unsupervised learning data, that is, to the information from the travel review site. This makes it possible to increase the supervised learning data for topics with little data. In more detail, the subject estimation system shown in FIG. 10 performs the identification operation (the identification process of S101 to S107 shown in FIG. 6) to estimate the subject. Because the topic of the review-site information is unknown, the system inputs every topic label in turn. A topic label and a subject label are then assigned only to items for which the estimated subject probability output by the fully connected layer calculation unit 141 for that topic exceeds a preset threshold.
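The pseudo-labeling step described above can be sketched as follows. The callable `subject_prob` stands in for the identification process (S101 to S107) and is hypothetical, as are the function name and the toy probabilities. Keeping only the highest-scoring topic per text is an assumption; the text only states that every topic label is tried in turn and that items above the threshold are labeled.

```python
def pseudo_label(texts, topics, subject, subject_prob, threshold):
    """Assign a (topic, subject) pseudo label to texts scored above the threshold."""
    labeled = []
    for text in texts:
        # Try every topic label in turn and keep the most confident one (assumption).
        best_topic = max(topics, key=lambda topic: subject_prob(text, topic))
        if subject_prob(text, best_topic) > threshold:
            labeled.append((text, best_topic, subject))
    return labeled

# Hypothetical scores standing in for the output of the trained network.
toy_scores = {
    ("great hotel rates", "ACCOMMODATION"): 0.92,
    ("great hotel rates", "FOOD"): 0.30,
    ("nice weather today", "ACCOMMODATION"): 0.40,
    ("nice weather today", "FOOD"): 0.55,
}
result = pseudo_label(
    ["great hotel rates", "nice weather today"],
    ["ACCOMMODATION", "FOOD"],
    "Pricerange",
    lambda text, topic: toy_scores[(text, topic)],
    threshold=0.8,
)
```

With a threshold of 0.8, only the first review is labeled; the second stays unlabeled and can be reconsidered in a later round with a lower threshold.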

次に、図１０に示す主題推定システムは、トピックのラベルと主題のラベルを付与した旅行の口コミサイトの情報を用いて、再度、学習時の動作（図８に示すＳ２０１〜Ｓ２１０の学習処理）を行い、再度旅行の口コミサイトの情報に対する識別処理と、再度の学習処理とを繰り返す。なお、このように繰り返す学習処理（半教師あり学習処理）においては、閾値を初めは高く設定し、徐々に低くするとよい。また、口コミサイトからテキストデータを得る際に、例えば、タイトルがexhibitionであれば、主題ラベルのexhibitに関連する内容であるということが期待できるので、主題ラベルごとに関連語句を設定して、タイトルなどで制限を加えると効果的である。また、トピックラベルについてもタイトルなどから制限を加えると効果的である。つまり、外部データ取得部１７０は、旅行の口コミサイトのレビュー文などの外部の教師なしデータを取得し、取得した当該教師なし学習データから、対話主題と無関係なデータをキーワードで除外することで、有用な所定の対話主題に対応する教師あり学習データを取得することができる。 Next, the subject estimation system shown in FIG. 10 uses the travel review-site information to which topic labels and subject labels have been assigned to perform the learning operation again (the learning process of S201 to S210 shown in FIG. 8), and then repeats the identification process on the review-site information and the learning process. In this repeated learning process (semi-supervised learning process), the threshold is preferably set high at first and lowered gradually. When text data is obtained from a review site, a title such as "exhibition" suggests that the content relates to the subject label Exhibit, so it is effective to define related terms for each subject label and restrict the data by title or similar fields. Restricting by title or similar fields is also effective for topic labels. In other words, the external data acquisition unit 170 acquires external unsupervised data such as review text from a travel review site and uses keywords to exclude data unrelated to the dialogue subject, which yields useful learning data corresponding to a given dialogue subject.

図１１は、図１０に示す追加学習時の主題推定システムの動作を示すフローチャートである。 FIG. 11 is a flowchart showing the operation of the subject estimation system during additional learning shown in FIG.

まず、教師あり学習データを準備する（Ｓ３０１）。これは、例えば、人間同士の対話データを音声で収録し、人手によって書き起こす。または、キーボード入力によるチャットを行い、テキストを保存する。さらに、対話の中で、どこからどこまでがなんと言うトピックであるかを認定するというアノテーションを人手で行う。アノテーションには、クラウドソーシングを利用することができる。しかし、これらの作業にはコストがかかるため、学習用データは学習するのに不十分であることが多い。 First, supervised learning data is prepared (S301). For example, dialogues between humans are recorded as audio and transcribed manually, or a chat is conducted by keyboard input and the text is saved. The dialogue is then annotated manually to identify which span belongs to which topic. Crowdsourcing can be used for the annotation. Because this work is costly, however, the resulting learning data is often insufficient for learning.

次に、本実施の形態の主題推定システムは、得られた教師あり学習データを用いて、Ｓ２０１〜Ｓ２１０の学習処理を行う（Ｓ３０２）。 Next, the subject estimation system of the present embodiment performs the learning process of S201 to S210 using the obtained supervised learning data (S302).

次に、外部データ取得部１７０は、トピックや主題に関連する教師なし学習データを取得する（Ｓ３０３）。具体的には、外部データ取得部１７０は、上述したように、単に旅行の口コミサイトという粒度での絞り込んで取得するのではなく、トピックや主題のラベルに関連する語彙を用いて、タイトルやその他の見出しを利用して、より細かい絞込みを行ったものを教師なし学習データ（単語列）として取得する。 Next, the external data acquisition unit 170 acquires unsupervised learning data related to the topics and subjects (S303). Specifically, as described above, the external data acquisition unit 170 does not merely narrow the data down to the granularity of "a travel review site". It uses vocabulary related to the topic and subject labels, matched against titles and other headings, to narrow the data down further, and acquires the result as unsupervised learning data (word strings).
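The title-based narrowing in S303 can be sketched as a keyword filter. The document structure (dictionaries with `title` and `text` keys), the function name, and the related terms are hypothetical; case-insensitive substring matching is an assumption made for illustration.

```python
def filter_by_title(documents, related_terms):
    """Group review texts by subject label when the title contains a related term."""
    selected = {}
    for label, terms in related_terms.items():
        selected[label] = [
            doc["text"]
            for doc in documents
            if any(term.lower() in doc["title"].lower() for term in terms)
        ]
    return selected

# Hypothetical reviews fetched from a travel review site.
documents = [
    {"title": "Art Exhibition at the museum", "text": "The new gallery wing was impressive."},
    {"title": "Cheap eats downtown", "text": "Great noodles for a few dollars."},
]
related_terms = {"Exhibit": ["exhibition", "exhibit"]}
candidates = filter_by_title(documents, related_terms)
```

Only the first review survives the filter for the subject label `Exhibit`; the unrelated review is excluded before any pseudo-labeling takes place.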

次に、本実施の形態の主題推定システムは、Ｓ３０２の学習処理により学習した畳み込みニューラルネットワーク１０を用いて、教師なし学習データのトピックラベルと主題ラベルとを推定し、推定したものを当該教師なし学習データに対してトピックラベルと主題ラベルとして付与する（Ｓ３０４）。 Next, the subject estimation system of the present embodiment uses the convolutional neural network 10 trained by the learning process of S302 to estimate the topic label and the subject label of the unsupervised learning data, and assigns the estimates to that data as its topic label and subject label (S304).

次に、本実施の形態の主題推定システムは、トピックラベルと主題ラベルとが付与された当該教師なし学習データを用いて、再度、Ｓ２０１〜Ｓ２１０の学習処理を行う（Ｓ３０５）。Ｓ３０２による学習結果とＳ３０５による学習結果とでは畳み込みニューラルネットワーク１０の重み（第１重み、第２重み）が変化するため、それに応じてＳ３０４で推定するトピックラベルと主題ラベルも変化する。 Next, the subject estimation system of the present embodiment performs the learning process of S201 to S210 again using the unsupervised learning data to which the topic label and the subject label are assigned (S305). Since the weight (first weight, second weight) of the convolutional neural network 10 changes between the learning result in S302 and the learning result in S305, the topic label and the subject label estimated in S304 also change accordingly.

次に、本実施の形態の主題推定システムは、重みの更新の変化量などを基準とした終了判定を行う（Ｓ３０６）。終了条件を満たさない場合は、Ｓ３０４とＳ３０５を繰り返す。 Next, the subject estimation system according to the present embodiment performs an end determination based on the amount of change in weight update or the like (S306). If the end condition is not satisfied, S304 and S305 are repeated.
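The loop formed by S302 to S306 can be sketched as a generic self-training routine. The callables `train` and `predict`, the threshold schedule, and the tolerance `tol` are hypothetical stand-ins for the learning process (S201 to S210), the identification process, and the end condition; using the largest absolute weight change as the end criterion is an assumption.

```python
def self_training(labeled, unlabeled, train, predict, thresholds, tol):
    """Repeat pseudo-labeling (S304) and retraining (S305) until the weights settle (S306)."""
    weights = train(labeled)                       # S302: initial supervised training
    rounds = 0
    for threshold in thresholds:                   # start high, then lower gradually
        rounds += 1
        pseudo = []
        for x in unlabeled:                        # S304: label confident items only
            label, prob = predict(weights, x)
            if prob > threshold:
                pseudo.append((x, label))
        new_weights = train(labeled + pseudo)      # S305: retrain with the added data
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:                           # S306: end condition on weight change
            break
    return weights, rounds

# Toy stand-ins: the "weights" are just the number of training items.
def toy_train(data):
    return [float(len(data))]

def toy_predict(weights, x):
    return ("Exhibit", 0.9)

labeled = [("a", "Exhibit"), ("b", "Other"), ("c", "Exhibit")]
unlabeled = ["d", "e"]
weights, rounds = self_training(labeled, unlabeled, toy_train, toy_predict,
                                thresholds=[0.8, 0.7, 0.6], tol=1e-6)
```

In the toy run the first round adds both unlabeled items, the second round changes nothing, and the loop stops after two rounds. Note that a very high initial threshold can produce no pseudo labels and therefore no weight change, so a practical end condition may need to ignore such empty rounds.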

[効果等] [Effects]
以上のように、本実施の形態の畳み込みニューラルネットワーク１０を利用する主題推定システムは、学習データが十分でない場合でも、教師なし学習データを識別処理し擬似的に正解ラベルを付与して学習処理を繰り返し行うことで、教師有り学習データを十分に増やすことができる。それにより、本実施の形態の主題推定システムは、対話の主題をより高精度に推定することができる。 As described above, even when the learning data is insufficient, the subject estimation system using the convolutional neural network 10 of the present embodiment can sufficiently increase the supervised learning data by running the identification process on unsupervised learning data, assigning pseudo correct labels, and repeating the learning process. As a result, the subject estimation system of the present embodiment can estimate the subject of a dialogue with higher accuracy.

ここで、本実施の形態の畳み込みニューラルネットワーク１０を利用する主題推定システムが、上記のような学習処理を繰り返すことにより推定精度が向上することについて実験的検証結果を用いて説明する。 Next, experimental verification results are used to show that repeating the learning process as described above improves the estimation accuracy of the subject estimation system using the convolutional neural network 10 of the present embodiment.

図１２は、本実施の形態における主題推定システムの実験的検証結果を示すグラフである。図１２には、DSTC4の対話コーパスを用いたときの、本実施の形態の主題推定システムが行う半教師あり学習処理の効果が示されている。 FIG. 12 is a graph showing experimental verification results of the subject estimation system in the present embodiment. FIG. 12 shows the effect of the semi-supervised learning process performed by the subject estimation system of the present embodiment when the DSTC4 dialogue corpus is used.

本実験では、対話文がExhibit主題に属する、属さないかの２クラス分類において、本実施の形態の主題推定システムが行う半教師あり学習処理により外部から取得した教師なしデータを教師データ（教師ありデータ）として元の教師ありデータに追加する。ここで、元の教師あり学習データ数は７６２、追加した教師なし学習データ数は２０〜７５３である。 In this experiment, for the two-class classification of whether or not a dialogue sentence belongs to the Exhibit subject, unsupervised data acquired from outside is added to the original supervised data as supervised data through the semi-supervised learning process performed by the subject estimation system of the present embodiment. The original supervised learning data comprises 762 items, and between 20 and 753 unsupervised learning data items were added.

図１２に示すように、外部から取得した教師なしデータに当該半教師あり学習処理を行い教師あり学習データとして追加することにより、２値分類精度が最大３％向上したことがわかる。 As shown in FIG. 12, it can be seen that the binary classification accuracy is improved by 3% at maximum by performing the semi-supervised learning process on the unsupervised data acquired from the outside and adding it as supervised learning data.

以上、実施の形態１および実施の形態２において本発明の主題推定システムおよび主題推定方法について説明したが、各処理が実施される主体や装置に関しては特に限定しない。ローカルに配置された特定の装置内に組み込まれたプロセッサーなど（以下に説明）によって処理されてもよい。またローカルの装置と異なる場所に配置されているクラウドサーバなどによって処理されてもよい。 As described above, the subject estimation system and the subject estimation method of the present invention have been described in the first embodiment and the second embodiment, but there is no particular limitation on the subject or apparatus in which each process is performed. It may be processed by a processor or the like (described below) embedded in a specific device located locally. Further, it may be processed by a cloud server or the like arranged at a location different from the local device.

なお、本発明は、さらに、以下のような場合も含まれる。 Note that the present invention further includes the following cases.

（１）上記の装置は、具体的には、マイクロプロセッサ、ＲＯＭ、ＲＡＭ、ハードディスクユニット、ディスプレイユニット、キーボード、マウスなどから構成されるコンピュータシステムである。前記ＲＡＭまたはハードディスクユニットには、コンピュータプログラムが記憶されている。前記マイクロプロセッサが、前記コンピュータプログラムにしたがって動作することにより、各装置は、その機能を達成する。ここでコンピュータプログラムは、所定の機能を達成するために、コンピュータに対する指令を示す命令コードが複数個組み合わされて構成されたものである。 (1) Specifically, the above apparatus is a computer system including a microprocessor, ROM, RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or hard disk unit. Each device achieves its functions by the microprocessor operating according to the computer program. Here, the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.

（２）上記の装置を構成する構成要素の一部または全部は、１個のシステムＬＳＩ（Large Scale Integration：大規模集積回路）から構成されているとしてもよい。システムＬＳＩは、複数の構成部を１個のチップ上に集積して製造された超多機能ＬＳＩであり、具体的には、マイクロプロセッサ、ＲＯＭ、ＲＡＭなどを含んで構成されるコンピュータシステムである。前記ＲＡＭには、コンピュータプログラムが記憶されている。前記マイクロプロセッサが、前記コンピュータプログラムにしたがって動作することにより、システムＬＳＩは、その機能を達成する。 (2) A part or all of the constituent elements constituting the above-described apparatus may be constituted by one system LSI (Large Scale Integration). The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip, and specifically, a computer system including a microprocessor, ROM, RAM, and the like. . A computer program is stored in the RAM. The system LSI achieves its functions by the microprocessor operating according to the computer program.

（３）上記の装置を構成する構成要素の一部または全部は、各装置に脱着可能なＩＣカードまたは単体のモジュールから構成されているとしてもよい。前記ＩＣカードまたは前記モジュールは、マイクロプロセッサ、ＲＯＭ、ＲＡＭなどから構成されるコンピュータシステムである。前記ＩＣカードまたは前記モジュールは、上記の超多機能ＬＳＩを含むとしてもよい。マイクロプロセッサが、コンピュータプログラムにしたがって動作することにより、前記ＩＣカードまたは前記モジュールは、その機能を達成する。このＩＣカードまたはこのモジュールは、耐タンパ性を有するとしてもよい。 (3) A part or all of the constituent elements constituting the above-described device may be constituted by an IC card that can be attached to and detached from each device or a single module. The IC card or the module is a computer system including a microprocessor, a ROM, a RAM, and the like. The IC card or the module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.

（４）また、本発明は、上記に示す方法であるとしてもよい。また、これらの方法をコンピュータにより実現するコンピュータプログラムであるとしてもよいし、前記コンピュータプログラムからなるデジタル信号であるとしてもよい。 (4) Further, the present invention may be the method described above. Further, the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of the computer program.

（５）また、本発明は、前記コンピュータプログラムまたは前記デジタル信号をコンピュータで読み取り可能な記録媒体、例えば、フレキシブルディスク、ハードディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ、ＤＶＤ−ＲＯＭ、ＤＶＤ−ＲＡＭ、ＢＤ（Ｂｌｕ−ｒａｙ（登録商標）Ｄｉｓｃ）、半導体メモリなどに記録したものとしてもよい。また、これらの記録媒体に記録されている前記デジタル信号であるとしてもよい。 (5) The present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray (registered trademark) Disc), or semiconductor memory. It may also be the digital signal recorded on any of these recording media.

また、本発明は、前記コンピュータプログラムまたは前記デジタル信号を、電気通信回線、無線または有線通信回線、インターネットを代表とするネットワーク、データ放送等を経由して伝送するものとしてもよい。 In the present invention, the computer program or the digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.

また、本発明は、マイクロプロセッサとメモリを備えたコンピュータシステムであって、前記メモリは、上記コンピュータプログラムを記憶しており、前記マイクロプロセッサは、前記コンピュータプログラムにしたがって動作するとしてもよい。 The present invention may be a computer system including a microprocessor and a memory, wherein the memory stores the computer program, and the microprocessor operates according to the computer program.

また、前記プログラムまたは前記デジタル信号を前記記録媒体に記録して移送することにより、または前記プログラムまたは前記デジタル信号を、前記ネットワーク等を経由して移送することにより、独立した他のコンピュータシステムにより実施するとしてもよい。 The invention may also be carried out by another independent computer system, either by recording the program or the digital signal on the recording medium and transferring it, or by transferring the program or the digital signal via the network or the like.

本発明は、人間と機械とを問わず、二者間を基本とする自然言語のやりとりである対話の主題が何かを推定するというタスクを行う主題推定システムおよび主題推定方法に利用できる。 The present invention can be used for a subject estimation system and a subject estimation method that perform the task of estimating the subject of a dialogue, that is, a natural-language exchange that is basically between two parties, whether those parties are humans or machines.

１０、８０ 畳み込みニューラルネットワーク 10, 80 Convolutional neural network
１１、８１ 入力特徴 11, 81 Input features
１１ａ、１１ｂ 単語列ベクトル列 11a, 11b Word string vector sequence
１２、８２ 畳み込み層 12, 82 Convolution layer
１２ａ、１２ｂ トピック依存畳み込み層 12a, 12b Topic-dependent convolution layer
１２ｚ トピック非依存畳み込み層 12z Topic-independent convolution layer
１３、１３ａ、１３ｂ、８３ プーリング層 13, 13a, 13b, 83 Pooling layer
１４、１４ａ、１４ｂ、８４ 全結合層 14, 14a, 14b, 84 Fully connected layer
５０ 入力情報 50 Input information
５１、６１ 単語列情報 51, 61 Word string information
５２、６２ トピック情報 52, 62 Topic information
６０ 学習データ 60 Learning data
６３ 主題情報 63 Subject information
１１０ 入力部 110 Input unit
１１１ 単語ベクトル列制御部 111 Word vector sequence control unit
１２１ トピック依存畳み込み層計算部 121 Topic-dependent convolution layer calculation unit
１２２、１２４、１４２ 格納部 122, 124, 142 Storage unit
１２３ トピック非依存畳み込み層計算部 123 Topic-independent convolution layer calculation unit
１３１ プーリング層計算部 131 Pooling layer calculation unit
１４１ 全結合層計算部 141 Fully connected layer calculation unit
１５０ 出力部 150 Output unit
１６０ エラー判定部 160 Error determination unit
１６１ 重み更新部 161 Weight update unit
１７０ 外部データ取得部 170 External data acquisition unit
１００１ ＣＰＵ 1001 CPU
１００２ メモリ 1002 Memory
１００３ 外部記憶装置 1003 External storage device
１００４ ネットワークインターフェイス 1004 Network interface
１００５ インターネット 1005 Internet

Claims

1. A subject estimation system for estimating a subject label of a dialogue, comprising a convolutional neural network,
wherein the convolutional neural network includes:
a convolution layer composed of one or more topic-dependent convolution layers that perform a topic-dependent convolution operation on an input of a word string vector sequence corresponding to a dialogue text that is a transcription of the dialogue, and one topic-independent convolution layer that performs a topic-independent convolution operation on the input;
a pooling layer that performs a pooling process on the output of the convolution layer; and
a fully connected layer that performs a full connection process on the output of the pooling layer.

2. The subject estimation system according to claim 1,
wherein the convolutional neural network estimates the subject label of the dialogue for the input by solving a two-class classification problem for the input.

3. The subject estimation system according to claim 1,
wherein the convolutional neural network uses, as learning data, a learning dialogue text that is a transcription of a dialogue, in which the time-series text of the dialogue has been divided in advance into segments for each topic and each segment has been given in advance a label of the corresponding topic, to learn a first weight so that each of the one or more topic-dependent convolution layers performs a convolution operation dependent on its dependent topic, which is the topic on which that layer depends, and to learn a second weight so that the topic-independent convolution layer performs a convolution operation independent of the dependent topic.

4. The subject estimation system according to claim 3,
wherein each of the one or more topic-dependent convolution layers learns the first weight so as to perform the convolution operation dependent on the dependent topic by receiving, from among the word string vector sequences corresponding to the learning dialogue text, a word string vector sequence related to the dependent topic, and
the topic-independent convolution layer learns the second weight so as to perform the convolution operation independent of the dependent topic by receiving the word string vector sequences corresponding to the learning dialogue text.

5. A subject estimation method for a subject estimation system that comprises a convolutional neural network and estimates a subject label of a dialogue, the method comprising:
a topic-dependent convolution processing step of performing a topic-dependent convolution operation on an input of a word string vector sequence corresponding to a dialogue text that is a transcription of the dialogue;
a topic-independent convolution processing step of performing a topic-independent convolution operation on the input;
a pooling processing step of performing a pooling process on the output of the topic-dependent convolution processing step and the output of the topic-independent convolution processing step; and
a full connection processing step of performing a full connection process on the output of the pooling processing step.

6. The subject estimation method according to claim 5, wherein
in the topic-dependent convolution processing step, a convolution operation is performed between the word string vector sequence and a first weight (A) that fires on a specific word indicating a dependent topic, which is the topic depended on,
in the topic-independent convolution processing step, a convolution operation is performed between the word string vector sequence and a second weight (Z) that fires on a word indicating a topic other than the dependent topic,
in the pooling processing step, an operation of extracting the maximum value in the time direction from the output of the topic-dependent convolution processing step and the output of the topic-independent convolution processing step is performed, and
in the full connection processing step, the full connection process is performed by applying weighted addition using connection weights to the output of the pooling processing step and then converting the result into a probability distribution.

7. The subject estimation method according to claim 6, further comprising
an output step of estimating and outputting the subject label of the dialogue by comparing the probability distribution of the output of the pooling processing step with a threshold value.

8. The subject estimation method according to claim 5, further comprising
an input step of inputting the word string vector sequence corresponding to the dialogue text,
wherein the input step includes:
an accepting step of accepting the dialogue text, which is a transcription of the dialogue converted into text in time series; and
a vectorization step of obtaining the word string vector sequence by calculating, by a predetermined method, a vector for each word of the word string included in the dialogue text.

9. The subject estimation method according to claim 6, further comprising:
a first step of causing the convolutional neural network to learn the first weight, using as learning data a learning dialogue text that is a transcription of a dialogue, in which the time-series text of the dialogue has been divided in advance into segments for each topic and each segment has been given in advance a label of the corresponding topic, so that the topic-dependent convolution processing step performs a convolution operation dependent on the dependent topic, which is the topic on which the step depends; and
a second step of causing the convolutional neural network to learn the second weight, using the learning dialogue text, so that the topic-independent convolution processing step performs a convolution operation independent of the dependent topic.

10. The subject estimation method according to claim 9, wherein
in the first step, the first weight is learned using, from among the word string vector sequences corresponding to the learning dialogue text, a word string vector sequence related to the dependent topic, and
in the second step, the second weight is learned using, from among the word string vector sequences corresponding to the learning dialogue text, a word string vector sequence related to a topic other than the dependent topic.

11. The subject estimation method according to claim 9 or 10, wherein
when, among the word string vector sequences corresponding to the learning dialogue text, the number of word string vector sequences related to a first dependent topic is smaller than the number of word string vector sequences related to a second dependent topic, the first step and the second step are performed using, as semi-supervised data for the learning data, dialogue text related to the first dependent topic obtained by searching the Web.

12. A computer-readable program for estimating a subject label of a dialogue using a convolutional neural network, the program comprising:
a topic-dependent convolution processing step of performing a topic-dependent convolution operation on an input of a word string vector sequence corresponding to a dialogue text that is a transcription of the dialogue;
a topic-independent convolution processing step of performing a topic-independent convolution operation on the input;
a pooling processing step of performing a pooling process on the output of the topic-dependent convolution processing step and the output of the topic-independent convolution processing step; and
a full connection processing step of performing a full connection process on the output of the pooling processing step.