JP7432898B2

JP7432898B2 - Parameter optimization device, parameter optimization method, and program

Info

Publication number: JP7432898B2
Application number: JP2021035373A
Authority: JP
Inventors: Tsutomu Hirao; Masaaki Nagata; Naoki Kobayashi; Hidetaka Kamigaito; Manabu Okumura
Original assignee: Nippon Telegraph and Telephone Corp; Tokyo Institute of Technology NUC
Current assignee: Nippon Telegraph and Telephone Corp; Tokyo Institute of Technology NUC
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2024-02-19
Anticipated expiration: 2041-03-05
Also published as: JP2022135517A

Description

The present invention belongs to the field of natural language processing, in which language is processed automatically by computer, and relates in particular to rhetorical structure analysis technology that represents a document as a tree structure (rhetorical structure tree) whose leaves are EDUs (Elementary Discourse Units).

Various rhetorical structure analyzers have been proposed that analyze the rhetorical structure of a document and automatically construct a rhetorical structure tree. Many of them are implemented with neural networks trained on human-prepared correct data, that is, documents annotated with their rhetorical structure.

For example, in the technique disclosed in Non-Patent Document 1, the parameters of a network that estimates the tree structure by splitting a span in two, and the parameters of networks that assign nuclearity and relation labels to adjacent spans, are each learned from correct data. Nuclearity labeling reduces to classifying the two spans produced by a split as one of "N-S", "S-N", or "N-N", and relation labeling reduces to classifying them into one of 18 labels such as "Elaboration" and "Background".

Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata: Top-Down RST Parsing Utilizing Granularity Levels in Documents, AAAI-2020, pp. 8099-8106, (2020)

Annotating rhetorical structure is very costly, so it is difficult to prepare a large amount of annotated data; even the largest dataset currently available contains only 385 documents. Many rhetorical structure analysis techniques therefore suffer from data sparseness, and the problem is particularly severe for relation labeling, which must solve an 18-class classification problem.

The present invention has been made in view of the above, and its object is to provide a technique for realizing a rhetorical structure analyzer with good performance without using a large amount of rhetorical-structure-annotated data.

According to the disclosed technique, there is provided a parameter optimization device comprising:
a pseudo-correct data generation unit that generates, as pseudo-correct data, a subtree common to a plurality of rhetorical structure trees obtained from unlabeled data by a plurality of rhetorical structure analyzers;
a pre-learning unit that uses the pseudo-correct data generated by the pseudo-correct data generation unit to optimize the parameters of a rhetorical structure analyzer that uses a neural network; and
a fine-tuning unit that fine-tunes, using correct data, the parameters optimized by the pre-learning unit.

According to the disclosed technique, a technique is provided for realizing a rhetorical structure analyzer with good performance without using a large amount of rhetorical-structure-annotated data.

FIG. 1 is a diagram showing an example of a rhetorical structure tree.
FIG. 2 is a configuration diagram of the parameter optimization device.
FIG. 3 is a diagram for explaining the pseudo-correct data generation unit.
FIG. 4 is a diagram showing Algorithm 1.
FIG. 5 is a diagram for explaining an example of the operation of Algorithm 1.
FIG. 6 is a diagram for explaining the pre-learning unit.
FIG. 7 is a diagram showing an example of the hardware configuration of the device.

An embodiment of the present invention (the present embodiment) is described below with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to it.

(About rhetorical structure trees)
First, an example of the rhetorical structure tree targeted in the present embodiment is described. A rhetorical structure tree is obtained by recursively repeating the operation of joining sequences of EDUs (hereinafter called spans), the elementary discourse units that make up the tree, by rhetorical relations to form larger spans.

The leaves of the tree are EDUs (each corresponding to a clause), and each node of the tree is given the nuclearity label of the span it dominates. Of the two spans being joined (sibling spans), one is the nucleus, which carries the important information, and the other is the satellite, which supplements it. Exceptionally, both may be nuclei. The branches of the tree are given relation labels that represent the rhetorical relation between spans.

FIG. 1 shows an example of a rhetorical structure tree. In the figure, e_1 to e_7 are EDUs, S and N are the nuclearity labels of spans (N for nucleus, S for satellite), and Condition, Elaboration, and so on are relation labels between sibling spans. When the nuclearity of sibling spans is a combination of S and N, the relation label is given to the S-side span; when it is N and N, the label is given to both spans. Condition and Elaboration are given to combinations of S and N, and List and Same-Unit are given to combinations of N and N.
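The labeling rule for sibling spans can be sketched in Python as follows. This is a minimal illustration, not part of the patent: the function name `sibling_labels` and the use of `None` for a span that carries no relation label are assumptions made here.

```python
from typing import Optional, Tuple

# (nuclearity, relation label or None); None is a convention assumed for this sketch
ChildLabel = Tuple[str, Optional[str]]

VALID_NUCLEARITY = {"N-S", "S-N", "N-N"}


def sibling_labels(nuclearity: str, relation: str) -> Tuple[ChildLabel, ChildLabel]:
    """Distribute a relation label over two sibling spans.

    S/N combination: the relation label goes to the satellite (S) side only.
    N/N combination: the relation label goes to both spans.
    """
    if nuclearity not in VALID_NUCLEARITY:
        raise ValueError(f"unknown nuclearity pattern: {nuclearity!r}")
    left_nuc, right_nuc = nuclearity.split("-")
    if nuclearity == "N-N":
        return (left_nuc, relation), (right_nuc, relation)
    left_rel = relation if left_nuc == "S" else None
    right_rel = relation if right_nuc == "S" else None
    return (left_nuc, left_rel), (right_nuc, right_rel)


print(sibling_labels("N-S", "Elaboration"))
print(sibling_labels("N-N", "List"))
```

For an N-S pair labeled Elaboration, only the right (satellite) span receives the relation label; for an N-N pair labeled List, both spans receive it.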

(Overview of the embodiment)
In the present embodiment, the parameter optimization device 100 first analyzes a large amount of unlabeled data with a plurality of existing rhetorical structure analyzers, thereby building a dataset whose labels are diverse while each label still occurs with reasonable frequency. Next, it extracts, as pseudo-correct data, the subtrees that are common to the rhetorical structure trees obtained from the analyzers. It then pre-learns the parameters of a neural-network-based rhetorical structure analyzer using the pseudo-correct data, and finally fine-tunes those parameters using correct data (manually annotated data).

When a neural-network-based rhetorical structure analyzer is configured with the parameters obtained through the above processing, manually annotated data is needed only for fine-tuning. A rhetorical structure analyzer with good performance can therefore be realized without a large amount of manually annotated data.

(Device configuration example and operation overview)
FIG. 2 shows a configuration example of the parameter optimization device 100 in the present embodiment. As shown in FIG. 2, the parameter optimization device 100 includes a pseudo-correct data generation unit 110, a pre-learning unit 120, and a fine-tuning unit 130.

The parameter optimization device 100 may be implemented on one computer or on a plurality of computers. Some or all of its functions may be implemented in a virtual machine on a cloud. The pseudo-correct data generation unit 110, the pre-learning unit 120, and the fine-tuning unit 130 may also be called a pseudo-correct data generation device, a pre-learning device, and a fine-tuning device, respectively.

FIG. 2 also shows the flow of processing. As shown in FIG. 2, the pseudo-correct data generation unit 110 generates pseudo-correct data from unlabeled data, and the pre-learning unit 120 uses the pseudo-correct data to learn the parameters of a neural-network-based rhetorical structure analyzer. The fine-tuning unit 130 takes the parameters learned by the pre-learning unit 120 as initial values and learns the parameters again using correct data (manually annotated data).

The processing of each unit is described in detail below.

(Pseudo-correct data generation unit 110)
The pseudo-correct data generation unit 110 generates, as consensus trees, the subtrees that are common to the results of analyzing unlabeled data with a plurality of rhetorical structure analyzers, and passes them to the pre-learning unit 120 as pseudo-correct data.

The pseudo-correct data generation unit 110 is described with reference to FIG. 3. As shown in FIG. 3, the pseudo-correct data generation unit 110 includes n rhetorical structure analyzers and a consensus tree extraction unit 115.

The n rhetorical structure analyzers may be n different analyzers, or n instances of the same analyzer with different hyperparameters.

As shown in FIG. 3, each of the n rhetorical structure analyzers independently analyzes the input unlabeled data and generates a rhetorical structure tree for each document. This yields n rhetorical structure trees.

Next, the consensus tree extraction unit 115 extracts the subtrees common to the n rhetorical structure trees and passes them to the pre-learning unit 120 as pseudo-correct data.

<Processing of the consensus tree extraction unit 115>
The processing of the consensus tree extraction unit 115 is described in detail. FIG. 4 shows Algorithm 1, which extracts the subtrees common to n rhetorical structure trees. Algorithm 1 corresponds to the program executed by the consensus tree extraction unit 115.

The consensus tree extraction unit 115, executing Algorithm 1, receives n different rhetorical structure trees for a document as input (trees) and outputs the subtrees they have in common (subtrees) as consensus trees.

The function AGREEMENT on line 3 determines whether the subtree rooted at the node span is a consensus tree. If span is a leaf, AGREEMENT returns true (lines 4 and 5).

Otherwise, AGREEMENT sets S_c to true if the frequency of the node span is n (lines 7 and 8) and to false if it is not (line 10). Here, the frequency is the number of times the node span occurs in the n rhetorical structure trees under consideration.

The values of AGREEMENT for the left and right child nodes of span are then stored in S_l and S_r, respectively (lines 11 and 12). AGREEMENT returns true when S_c, S_l, and S_r are all true, and false otherwise (lines 13 to 17).

The function FINDROOT on line 2 (and lines 18 to 33) generates and outputs subtrees by adding the nodes (spans) for which S is True.

A consensus tree is obtained by taking any one of the n rhetorical structure trees and applying AGREEMENT in depth-first order starting from its root node. To control the size of the subtrees, constraints on the number of leaves, a minimum l_min and a maximum l_max, are introduced (lines 21 to 26).
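The following Python sketch reconstructs the behavior described for Algorithm 1 from the prose alone, since FIG. 4 is not reproduced in this text. The class `Node`, the helpers `leaf` and `join`, the single label string per node, the default values of `l_min` and `l_max`, and the two example trees are all assumptions made for illustration; the actual listing in FIG. 4 may differ in detail.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

Key = Tuple[int, int, str]  # (start EDU index, end EDU index, label)


@dataclass
class Node:
    """A span of EDUs start..end (inclusive) with a hypothetical label string."""
    start: int
    end: int
    label: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None

    @property
    def key(self) -> Key:
        return (self.start, self.end, self.label)


def leaf(i: int) -> Node:
    return Node(i, i)


def join(label: str, left: Node, right: Node) -> Node:
    return Node(left.start, right.end, label, left, right)


def iter_nodes(node: Node) -> Iterator[Node]:
    yield node
    if not node.is_leaf:
        yield from iter_nodes(node.left)
        yield from iter_nodes(node.right)


def agreement(node: Node, freq: Counter, n: int, s: Dict[Key, bool]) -> bool:
    """True if the subtree rooted at `node` occurs in all n trees."""
    if node.is_leaf:
        s[node.key] = True
        return True
    s_c = freq[node.key] == n             # this labeled span is in every tree
    s_l = agreement(node.left, freq, n, s)
    s_r = agreement(node.right, freq, n, s)
    s[node.key] = s_c and s_l and s_r
    return s[node.key]


def find_roots(node: Node, s: Dict[Key, bool], l_min: int, l_max: int,
               out: List[Node]) -> None:
    """Collect the topmost nodes with S == True whose leaf count is in range."""
    n_leaves = node.end - node.start + 1
    if s[node.key] and l_min <= n_leaves <= l_max:
        out.append(node)                  # do not descend: keep the largest tree
        return
    if not node.is_leaf:
        find_roots(node.left, s, l_min, l_max, out)
        find_roots(node.right, s, l_min, l_max, out)


def consensus_subtrees(trees: List[Node], l_min: int = 2, l_max: int = 20) -> List[Node]:
    n = len(trees)
    freq = Counter(nd.key for t in trees for nd in iter_nodes(t) if not nd.is_leaf)
    s: Dict[Key, bool] = {}
    agreement(trees[0], freq, n, s)       # any one of the n trees may be used
    out: List[Node] = []
    find_roots(trees[0], s, l_min, l_max, out)
    return out


# Two hypothetical analyses of a 10-EDU document that differ only inside (8, 10).
left_part = lambda: join("x", join("p", leaf(1), leaf(2)), join("q", leaf(3), leaf(4)))
mid_part = lambda: join("r", leaf(5), join("t", leaf(6), leaf(7)))
tree_a = join("root", left_part(),
              join("y", mid_part(), join("u", leaf(8), join("v", leaf(9), leaf(10)))))
tree_b = join("root", left_part(),
              join("y", mid_part(), join("u", join("w", leaf(8), leaf(9)), leaf(10))))

roots = consensus_subtrees([tree_a, tree_b])
print([(r.start, r.end) for r in roots])  # prints [(1, 4), (5, 7)]
```

In this sketch a leaf always agrees, as the description states, so disagreement can arise only at internal spans. Setting `l_min` to 2 keeps single EDUs from being emitted as consensus trees.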

An example of the operation of the consensus tree extraction unit 115 under Algorithm 1 for n = 2 is described with reference to FIG. 5. First, node frequencies are counted over the two trees shown on the left of FIG. 5, taking labels into account. A node represents a span, that is, the start and end indices of the EDUs it dominates.

On the left of FIG. 5, the nodes common to the two rhetorical structure trees are underlined. The rhetorical structure tree at the upper left of FIG. 5 is taken and Algorithm 1 is applied to it. The right of FIG. 5 shows S_c, S_l, S_r, and S at each node when Algorithm 1 is applied to that tree.

Because the frequency of span (1, 10) is 2, AGREEMENT(1, 10) sets S_c(1, 10) to True and applies AGREEMENT to the left child (1, 4) and the right child (5, 10).

AGREEMENT is applied recursively to each child, determining S_c along the way, until it reaches the leaves of the rhetorical structure tree (spans whose start and end indices are equal), where S is set to True. In other words, Algorithm 1 recursively determines whether each span appears in common across the rhetorical structure trees. The value of S at each node is then determined from the S_c already set for it and from S_l and S_r obtained from its left and right children. The bottom-to-top arrows in the rhetorical structure tree on the right of FIG. 5 show the S values being determined in this order. For a given node, the subtree in which S is True for that node and for every node beneath it is the consensus tree rooted at that node.

As shown on the right of FIG. 5, two subtrees are extracted as consensus trees in this example: the subtree rooted at span (1, 4) and the subtree rooted at span (5, 7). Spans (5, 10) and (8, 10), for example, have S = False, so the subtrees rooted at them are not consensus trees. Among consensus trees that contain one another, only the largest is extracted.
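The rule that only the largest of nested consensus trees is kept can be illustrated on spans alone. The sketch below is an assumption-laden illustration, not the patent's procedure: the function name `maximal_spans` is hypothetical, and the input is taken to be a list of (start, end) pairs for every node whose S value is True.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start EDU index, end EDU index), inclusive


def maximal_spans(spans: List[Span]) -> List[Span]:
    """Drop every span that is contained in another span of the list."""
    # Sort by start ascending and end descending, so that a containing span
    # always comes before the spans it contains.
    ordered = sorted(set(spans), key=lambda s: (s[0], -s[1]))
    kept: List[Span] = []
    for start, end in ordered:
        if kept and kept[-1][0] <= start and end <= kept[-1][1]:
            continue  # nested inside the most recently kept span
        kept.append((start, end))
    return kept


# Internal spans with S == True in the example described above.
agreeing = [(1, 4), (1, 2), (3, 4), (5, 7), (6, 7)]
print(maximal_spans(agreeing))  # prints [(1, 4), (5, 7)]
```

Because the spans of one tree are either nested or disjoint, comparing each span with the most recently kept one is sufficient.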

(About the pre-learning unit 120)
Next, a configuration example and the processing of the pre-learning unit 120 are described. The pre-learning unit 120 receives pseudo-correct data from the pseudo-correct data generation unit 110 and learns the parameters of a rhetorical structure analyzer based on a neural network model. Any neural rhetorical structure analyzer may be used in the present embodiment; here, parameter optimization of a neural-network-based rhetorical structure analyzer is described using the technique disclosed in Non-Patent Document 1 as an example. In pre-learning, the parameters are assumed to be randomly initialized.

FIG. 6 shows a configuration example of the pre-learning unit 120. As shown in FIG. 6, the pre-learning unit 120 includes a tree structure estimation unit 121, a label estimation unit 122, and a parameter optimization unit 123. The processing of each unit is described below.

<Tree structure estimation unit 121>
The tree structure estimation unit 121 estimates the tree structure by estimating the split points of spans. For an arbitrary span (the sequence of EDUs from the i-th EDU to the j-th EDU), the score s_{split}(i, j, k) for splitting the span at the k-th EDU is given by equation (1).
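Equation (1) appears only as an image in the original publication and is not reproduced in this text. A form consistent with the symbols defined for it (a weight matrix W_u, weight vectors v_l and v_r, and span representations h_{i:k} and h_{k+1:j}) would be the biaffine score below. The arrangement of terms is an assumption, not a transcription of the patent.

```latex
s_{\mathrm{split}}(i, j, k)
  = h_{i:k}^{\top} W_u \, h_{k+1:j}
  + v_l^{\top} h_{i:k}
  + v_r^{\top} h_{k+1:j}
  \tag{1}
```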

Here, W_u is a weight matrix, and v_l (the subscript is a lowercase L) and v_r are the weight vectors for the left and right spans produced by the split. h_{i:k} and h_{k+1:j} are defined as follows.

h_{i:k} = MLP_{left}(u_{i:k}), h_{k+1:j} = MLP_{right}(u_{k+1:j})
MLP_* in the above expression denotes a multilayer perceptron. The vector representation u_{i:j} of a span is obtained by feeding word vectors into an LSTM. As shown in equation (2), the span is split at the k that maximizes equation (1), that is, at the k with i <= k < j for which s_{split}(i, j, k) is largest.
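The top-down procedure implied by equations (1) and (2) can be sketched as a recursive greedy split. In the sketch below the learned score s_{split} is replaced by a hypothetical stand-in, `toy_score`, which simply prefers balanced splits; the function names and the nested-tuple output format are also assumptions made for illustration.

```python
from typing import Callable, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]      # leaf = EDU index, node = (left, right)
ScoreFn = Callable[[int, int, int], float]    # stands in for s_split(i, j, k)


def best_split(i: int, j: int, score: ScoreFn) -> int:
    """Pick k with i <= k < j maximizing the score; left = i..k, right = k+1..j."""
    return max(range(i, j), key=lambda k: score(i, j, k))


def parse_top_down(i: int, j: int, score: ScoreFn) -> Tree:
    """Recursively split the span of EDUs i..j until only single EDUs remain."""
    if i == j:
        return i
    k = best_split(i, j, score)
    return (parse_top_down(i, k, score), parse_top_down(k + 1, j, score))


def toy_score(i: int, j: int, k: int) -> float:
    """Hypothetical score: higher when the two halves have similar sizes."""
    return -abs((k - i + 1) - (j - k))


print(parse_top_down(1, 4, toy_score))  # prints ((1, 2), (3, 4))
```

In a trained analyzer, `toy_score` would be replaced by the network output computed from the span representations; the recursion itself would stay the same.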

<Label estimation unit 122>
For the split point k of a span determined by the tree structure estimation unit 121, the label estimation unit 122 predicts the nuclearity and rhetorical relation labels of the two resulting spans. The prediction score is given by equation (3).
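Equation (3) appears only as an image in the original publication and is not reproduced in this text. One form consistent with the symbols used around it (a weight matrix W_l, an MLP, and the outer-context representations u_{1:i} and u_{j:n}) is shown below. The name s_{label}, the argument order, and the concatenated input are assumptions, not a transcription of the patent.

```latex
s_{\mathrm{label}}(i, j, k, l)
  = W_l \, \mathrm{MLP}\bigl(\,[\, u_{i:k};\; u_{k+1:j};\; u_{1:i};\; u_{j:n} \,]\,\bigr)
  \tag{3}
```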

W_l (the subscript is a lowercase L) is a weight matrix, and u_{1:i} and u_{j:n} are the vector representations of the span to the left of the i-th EDU and the span to the right of the j-th EDU, respectively. Finally, equation (4) assigns the label that maximizes equation (3).
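Equation (4) is likewise missing from this text. Since it selects the label that maximizes equation (3), it can be written as the following arg max over the label set L; the name s_{label} for the score of equation (3) is an assumption introduced here.

```latex
\hat{l} = \operatorname*{arg\,max}_{l \in L} \; s_{\mathrm{label}}(i, j, k, l)
  \tag{4}
```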

L is the label set. When nuclearity labels are assigned, it is the set of three labels {N-S, S-N, N-N}; when rhetorical relation labels are assigned, it is the set of 18 labels {Elaboration, Condition, ...}. W_l and the MLP are optimized independently for nuclearity labeling and for relation labeling.

<Parameter optimization unit 123>
The parameter optimization unit 123 obtains all the parameters to be learned, namely W_u, W_l, v_r, v_l, and the parameters of the LSTM and the MLPs, by minimizing the sum of the two loss functions defined below. Here, k^* and l^* (a lowercase L) are the split position and the label in the correct answer (here, the pseudo-correct data), respectively.
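The two loss functions appear only as images in the original publication and are not reproduced in this text. A common choice that matches the description (a split loss evaluated at the correct split k^* and a label loss evaluated at the correct label l^*) is the cross-entropy pair below. The cross-entropy form and the names s_{split} and s_{label} for the scores of equations (1) and (3) are assumptions. The label loss would be evaluated separately for the nuclearity label set and the relation label set.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{split}} + \mathcal{L}_{\mathrm{label}},
\qquad
\mathcal{L}_{\mathrm{split}}
  = -\log \frac{\exp s_{\mathrm{split}}(i, j, k^{*})}
               {\sum_{k'=i}^{j-1} \exp s_{\mathrm{split}}(i, j, k')},
\qquad
\mathcal{L}_{\mathrm{label}}
  = -\log \frac{\exp s_{\mathrm{label}}(i, j, k^{*}, l^{*})}
               {\sum_{l' \in L} \exp s_{\mathrm{label}}(i, j, k^{*}, l')}
```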

The loss can be minimized with existing methods such as error backpropagation. The parameter optimization unit 123 then passes the optimized parameters to the fine-tuning unit 130.

<Fine-tuning unit 130>
The fine-tuning unit 130 takes the parameters optimized by the pre-learning unit 120 as initial values and optimizes the parameters of the neural rhetorical structure analyzer again using correct data (manually created annotated data). The fine-tuning unit 130 outputs the optimized parameters.

The configuration of the fine-tuning unit 130 is the same as that of the pre-learning unit 120 shown in FIG. 6. It differs from the pre-learning unit 120 in two respects: it uses the parameters optimized by the pre-learning unit 120 as the initial parameter values, and it uses manually created annotated data, rather than pseudo-correct data, as the correct data.
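The relationship between the two stages (the same training routine, run with a different starting point and different data) can be illustrated with a deliberately tiny stand-in model. Everything in the sketch below is hypothetical: the one-parameter model y = w * x, the data values, the learning rate, and the use of 0.0 in place of a random initial value. It shows only the control flow, not a rhetorical structure analyzer.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, target y) for the toy model y = w * x


def train(w: float, data: List[Example], lr: float = 0.05, epochs: int = 50) -> float:
    """Gradient descent on mean squared error; returns the updated parameter."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Stand-in for plentiful but imperfect pseudo-correct data (true slope is 2.0).
pseudo_data = [(x, 1.8 * x) for x in (1.0, 2.0, 3.0)]
# Stand-in for scarce, accurate correct data.
gold_data = [(1.0, 2.0), (2.0, 4.0)]

w_pre = train(0.0, pseudo_data)                 # pre-learning from the initial value
w_fine = train(w_pre, gold_data, epochs=5)      # fine-tuning: same routine, new init and data
w_scratch = train(0.0, gold_data, epochs=5)     # baseline: correct data only

print(w_pre, w_fine, w_scratch)
```

With the same small budget of updates on the correct data, the run that starts from the pre-learned value ends closer to the target slope than the run that starts from scratch, which is the effect the two-stage design aims for.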

(Example of device hardware configuration)
The parameter optimization device 100, the pseudo-correct data generation unit 110, the pre-learning unit 120, and the fine-tuning unit 130 (collectively, the "device") can each be realized, for example, by causing a computer to execute a program that describes the processing explained in the present embodiment.

The program can be recorded on a computer-readable recording medium (such as a portable memory) for storage or distribution. It can also be provided over a network, for example via the Internet or e-mail.

FIG. 7 shows an example of the hardware configuration of the computer. The computer in FIG. 7 includes a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and so on, which are connected to one another by a bus BS.

The program that realizes processing on the computer is provided, for example, on a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. The program need not be installed from the recording medium 1001, however; it may be downloaded from another computer over a network. The auxiliary storage device 1002 stores the installed program together with necessary files and data.

When an instruction to start the program is given, the memory device 1003 reads the program from the auxiliary storage device 1002 and stores it. The CPU 1004 realizes the functions of the device according to the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network. The display device 1006 displays a GUI (Graphical User Interface) and the like produced by the program. The input device 1007 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to enter various operation instructions. The output device 1008 outputs computation results.

(Effects of the embodiment)
As described above, in the present embodiment, the subtrees common to the results of analyzing unlabeled data with a plurality of rhetorical structure analyzers are used as pseudo-correct data; the parameters of a neural rhetorical structure analyzer are optimized by pre-learning on the pseudo-correct data; and those parameters are then used as initial values for fine-tuning on correct data.

This yields better parameters than optimizing from random initial values using only correct data (which cannot be prepared in large quantities), and improves the performance of the neural rhetorical structure analyzer. In other words, a rhetorical structure analyzer with good performance can be realized without a large amount of rhetorical-structure-annotated correct data.

(Summary of the embodiment)
This specification discloses at least the parameter optimization device, parameter optimization method, and program described in the following items.
(Item 1)
A parameter optimization device comprising:
a pseudo-correct data generation unit that generates, as pseudo-correct data, a subtree common to a plurality of rhetorical structure trees obtained from unlabeled data by a plurality of rhetorical structure analyzers;
a pre-learning unit that uses the pseudo-correct data generated by the pseudo-correct data generation unit to optimize the parameters of a rhetorical structure analyzer that uses a neural network; and
a fine-tuning unit that fine-tunes, using correct data, the parameters optimized by the pre-learning unit.
(Item 2)
The parameter optimization device according to item 1, wherein the pseudo-correct data generation unit generates the pseudo-correct data by recursively determining whether there is a span that appears in common in the plurality of rhetorical structure trees.
(Item 3)
A parameter optimization method executed by a parameter optimization device, comprising:
a pseudo-correct data generation step of generating, as pseudo-correct data, a subtree common to a plurality of rhetorical structure trees obtained from unlabeled data by a plurality of rhetorical structure analyzers;
a pre-learning step of using the pseudo-correct data generated in the pseudo-correct data generation step to optimize the parameters of a rhetorical structure analyzer that uses a neural network; and
a fine-tuning step of fine-tuning, using correct data, the parameters optimized in the pre-learning step.
(Item 4)
A program for causing a computer to function as each unit of the parameter optimization device according to item 1 or 2.

Although the present embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the invention set forth in the claims.

100 Parameter optimization device
110 Pseudo-correct data generation unit
115 Consensus tree extraction unit
120 Pre-learning unit
121 Tree structure estimation unit
122 Label estimation unit
123 Parameter optimization unit
130 Fine-tuning unit
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims

1. A parameter optimization device comprising:
a pseudo-correct data generation unit that generates, as pseudo-correct data, a subtree common to a plurality of rhetorical structure trees obtained from unlabeled data by a plurality of rhetorical structure analyzers;
a pre-learning unit that uses the pseudo-correct data generated by the pseudo-correct data generation unit to optimize the parameters of a rhetorical structure analyzer that uses a neural network; and
a fine-tuning unit that fine-tunes, using correct data, the parameters optimized by the pre-learning unit.

2. The parameter optimization device according to claim 1, wherein the pseudo-correct data generation unit generates the pseudo-correct data by recursively determining whether there is a span that appears in common in the plurality of rhetorical structure trees.

3. A parameter optimization method executed by a parameter optimization device, comprising:
a pseudo-correct data generation step of generating, as pseudo-correct data, a subtree common to a plurality of rhetorical structure trees obtained from unlabeled data by a plurality of rhetorical structure analyzers;
a pre-learning step of using the pseudo-correct data generated in the pseudo-correct data generation step to optimize the parameters of a rhetorical structure analyzer that uses a neural network; and
a fine-tuning step of fine-tuning, using correct data, the parameters optimized in the pre-learning step.

4. A program for causing a computer to function as each unit of the parameter optimization device according to claim 1 or 2.