JPH04367062A

JPH04367062A - Structure extracting method by neural network

Info

Publication number: JPH04367062A
Application number: JP3143014A
Authority: JP
Inventors: Hideyuki Maki; 秀行牧; Ikuo Matsuba; 松葉　育雄
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-06-14
Filing date: 1991-06-14
Publication date: 1992-12-18

Abstract

PURPOSE: To estimate the grammar of a language from example sentences with a neural network so that it can be used for parsing and similar tasks in language processing, and, in signal processing, to extract the features of a signal from samples with a neural network so as to aid understanding of the signal's characteristics. CONSTITUTION: When a set 101 of example sentences is input, it is converted into learning patterns, and an identity mapping learning neural network section 103 learns the identity mapping of the learning patterns. The structure of the learning patterns is thereby reflected in the connection weights between the units of the neural network. A neural network analysis section 104 analyzes these connection weights to find the dependence and independence relationships between the constituent elements of the patterns.

Description

[Detailed description of the invention]

[0001]

[Industrial Application Field] In language processing and similar fields, when the grammar of the target language is not available in advance, a neural network can be used to acquire the grammar of the language from example sentences and put it to use. Likewise, in signal processing, when the features of a signal cannot easily be found, a neural network can be used to acquire the features from samples, which helps in understanding the characteristics of the signal.

[0002]

[Prior Art] One application of neural networks to language processing is the parser. Many such parsers deal with context-free grammars; an example is "Context-Free Parsing with Connectionist Network," AIP Conference Proceedings 151, pp. 140-145. These approaches construct the network from a grammar that is given explicitly in advance, and aim to speed up parsing through the parallel processing of the neural network. They therefore do not exploit the learning ability of neural networks. "A Massively Parallel Self-Tuning Context-Free Parser," Advances in Neural Information Processing Systems 1, Morgan Kaufmann Publishers, pp. 537-544, describes a parser with learning ability, but there too the grammar is given in advance and what is learned is how to apply it. JP-A-1-255966 is another example of using a neural network for language processing, but it likewise learns the order in which grammar rules are applied and presupposes that the grammar is given in advance.

[0003]

[Problems to be Solved by the Invention] In conventional applications of neural networks to parsing, the grammar of the language had to be given explicitly. For a language whose grammar is not available in advance, a grammar therefore has to be created by some means and supplied. An object of the present invention is to provide a structure extraction method that, for an unknown language whose grammar is not given explicitly, extracts the structure of example sentences using a neural network and estimates the grammar of the language.

[0004] A further aim is for the method to be applicable not only to symbol processing but also to feature extraction in signal processing.

[0005]

[Means for Solving the Problems] As shown in FIG. 1, the structure extraction means consists of an input pattern generation section 102, an identity mapping learning neural network section 103, and a neural network analysis section 104. The input pattern generation section 102 generates, from a given set of example sentences 101, the set of input patterns to be given to the neural network. The identity mapping learning neural network section 103 uses an hourglass-type multilayer neural network (hereinafter, hourglass network) to learn the identity mapping of the input patterns (the mapping whose output value is the input value itself). A neural network is constructed by connecting many units that perform simple calculations, and each connection between units has a weight called a connection weight. As shown in FIG. 2, a multilayer neural network is a neural network in which the constituent units 206 form a layered structure of an input layer 202, intermediate layers 203, and an output layer 204, and the units of each layer have connections only with units in adjacent layers. Signals propagate through the network in the direction from the input layer to the output layer. As shown in FIG. 5, an hourglass network is a multilayer neural network in which the input layer 503 and the output layer 505 have the same number of units and the intermediate layer 504 has fewer units. By having the hourglass network learn the identity mapping of the input patterns, the structure of the example sentences is reflected in the connection weights between the units of the hourglass network. The neural network analysis section 104 is a means for analyzing the connection weights between the units of the trained hourglass network and extracting from them the relationships between the elements that make up the example sentences.

[0006]

[Operation] When a set of example sentences 101 is given as input, the input pattern generation section 102 converts it into a set of numerical patterns that a neural network can handle directly. Next, using these as learning patterns, the identity mapping learning neural network section 103 trains the hourglass network on the identity mapping of the learning patterns. Because the intermediate layer of an hourglass network has fewer units than the input and output layers, learning the identity mapping forms a mechanism that compresses information in the intermediate layer, and the structure inherent in the learning patterns is reflected in the connection weights. After learning is complete, the neural network analysis section 104 analyzes the connection weights between the units of the intermediate layer and the output layer of the hourglass network and finds the dependence and independence relationships between the elements that make up the patterns. Among the units in the output layer, units that receive heavy connections from the same unit in the intermediate layer are concluded to be dependent on each other, and units that do not are concluded to be independent. These dependence and independence relationships are extracted as the structure of the language and form the extraction result 105.

[0007] This method can be applied not only to symbol processing but also to signal processing. When a set of signal samples is given in place of the set of example sentences 101, the input pattern generation section 102 converts it into numerical patterns suitable for handling by a neural network. Using these as learning patterns, the identity mapping learning neural network section 103 learns the identity mapping of the learning patterns. The neural network analysis section 104 then analyzes the connection weights of the hourglass network, extracts from them the dependence and independence relationships between the components of the signal, and outputs them as the extraction result 105.

[0008]

[Embodiments] As shown in FIG. 2, a multilayer neural network is constructed by connecting, in order, an input layer 202, one or more intermediate layers 203, and an output layer 204. Each layer takes as input the signal sent from the adjacent layer on the input layer side (the preceding layer), transforms it, and outputs it to the adjacent layer on the output layer side (the next layer). The input layer, however, takes its input from outside and passes it to the next layer without transformation. The output layer takes the signal from the preceding layer as input, transforms it, and outputs it to the outside. A signal input to the input layer from outside always propagates through the network from the input layer side to the output layer side, never in the reverse direction. Each layer consists of units that perform simple calculations. The units of the input layer are called input units, those of the output layer output units, and those of the intermediate layers intermediate units.

[0009] The intermediate units and output units are multi-input, single-output units as shown in FIG. 3. Each unit has the input-output characteristic given by Equation 1.

[0010]

[Equation 1] $$o_i = f(\mathrm{net}_i) \qquad \text{(Equation 1)}$$

Here $o_i$ is the output of unit $i$ and $f$ is the output function. $\mathrm{net}_i$ is the weighted sum of the inputs to unit $i$ and is given by Equation 2.

[0011]

[Equation 2] $$\mathrm{net}_i = \sum_j w_{ij}\, o_j + \theta_i \qquad \text{(Equation 2)}$$

[0012] Here $o_j$ is the output of unit $j$ in the preceding layer, $w_{ij}$ is the connection weight from unit $j$ to unit $i$, and $\theta_i$ is the bias of unit $i$. Each unit has connections with the units in adjacent layers, and there are no connections between units within the same layer. The sigmoid function given by Equation 3 is normally used as the output function $f$. FIG. 4 shows the input-output characteristic of the sigmoid function.

[0013]

[Equation 3] $$f(x) = \frac{1}{1 + \exp(-x)} \qquad \text{(Equation 3)}$$

An input unit is a single-input, single-output unit whose output value is its input value unchanged.
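The unit computation described by Equations 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `sigmoid` and `unit_output` are hypothetical, and the bias is assumed to be added to the weighted sum, since the text describes θ only as the bias of the unit.

```python
import math

def sigmoid(x: float) -> float:
    """Output function f of Equation 3: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias):
    """Output o_i = f(net_i) of one intermediate or output unit.

    net_i is the weighted sum of the unit's inputs; adding the bias
    to that sum is an assumption of this sketch.
    """
    net = sum(w * o for w, o in zip(weights, inputs)) + bias
    return sigmoid(net)

# A unit with two inputs: net = 0.5*1.0 + (-0.5)*1.0 + 0.0 = 0.0
print(unit_output([1.0, 1.0], [0.5, -0.5], 0.0))  # 0.5
```

An input unit needs no such function, because it passes its input value through unchanged.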

[0014] As shown in FIG. 5, an hourglass neural network is a multilayer network in which the input layer 503 and the output layer 505 have the same number of units and the intermediate layer 504 has fewer units. This embodiment uses a three-layer hourglass network with a single intermediate layer, but hourglass networks with more intermediate layers are also possible. The hourglass network is trained on the identity mapping, that is, the mapping whose output value equals its input value. When an hourglass network learns the identity mapping, a mechanism for compressing information in the intermediate layer is formed, and as a result the structure inherent in the patterns is reflected in the connection weights.

[0015] A neural network learns a mapping between input and output by adjusting its connection weights. Here the connection weights are adjusted with a learning algorithm called backpropagation. Backpropagation is a supervised learning method: the neural network is presented with pairs of an input pattern and the pattern that should be output for it (called the teacher pattern), and the connection weights are corrected according to the difference between the pattern the network actually outputs and the teacher pattern. The correction $\Delta w_{ij}$ to the connection weight $w_{ij}$ from unit $j$ to unit $i$ is given by Equation 4.

[0016]

[Equation 4] $$\Delta w_{ij} = -\eta\, \delta_i\, o_j \qquad \text{(Equation 4)}$$

Here $\eta$ is a constant called the learning constant and $o_j$ is the output of unit $j$. $\delta_i$ is the error signal of unit $i$, and it is computed differently depending on whether unit $i$ is an output unit or an intermediate unit. When unit $i$ is an output unit, $\delta_i$ is given by Equation 5.

[0017]

[Equation 5] $$\delta_i = (o_i - t_i)\, f'(\mathrm{net}_i) \qquad \text{(Equation 5)}$$

Here $t_i$ is the teacher pattern for unit $i$, and $f'$ is the derivative of the function $f$. When unit $i$ is an intermediate unit, $\delta_i$ is given by Equation 6.

[0018]

[Equation 6] $$\delta_i = f'(\mathrm{net}_i) \sum_j \delta_j\, w_{ji} \qquad \text{(Equation 6)}$$

[0019] Here unit $j$ is a unit in the layer following that of unit $i$. In this way the error signal is propagated from the output layer side toward the input layer side. The mapping is learned by repeatedly presenting an input pattern with its teacher pattern and correcting the connection weights according to Equation 4. FIG. 6 shows a PAD diagram of the backpropagation method.
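The update rules of Equations 4 and 5 can be tried out on a small hourglass network with the following Python sketch. It is an illustration under assumptions rather than the procedure of FIG. 6: the helper names are hypothetical, the 4-2-4 network and its one-hot patterns are toy data, the error signal of intermediate units uses the standard backpropagation form (Equation 6 is not reproduced in the source text), and the bias is assumed to be additive and updated like a weight on a constant input of 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_out, n_in, rng):
    """Weights w[i][j] from unit j to unit i and one bias per unit, uniform in (-1, 1)."""
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return w, b

def forward(x, layer):
    w, b = layer
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def train_step(x, hidden, output, eta):
    """One backpropagation update; the input pattern is its own teacher pattern."""
    h = forward(x, hidden)
    o = forward(h, output)
    # Equation 5 for output units; for the sigmoid, f'(net) = o * (1 - o).
    d_out = [(oi - ti) * oi * (1.0 - oi) for oi, ti in zip(o, x)]
    # Error signal of intermediate units, computed from the pre-update weights.
    w_out = output[0]
    d_hid = [hj * (1.0 - hj) * sum(d_out[i] * w_out[i][j] for i in range(len(o)))
             for j, hj in enumerate(h)]
    # Equation 4: delta_w_ij = -eta * delta_i * o_j.
    for (w, b), delta, inp in ((output, d_out, h), (hidden, d_hid, x)):
        for i, di in enumerate(delta):
            for j, oj in enumerate(inp):
                w[i][j] -= eta * di * oj
            b[i] -= eta * di  # assumed bias update

def squared_error(patterns, hidden, output):
    return sum((oi - ti) ** 2
               for x in patterns
               for oi, ti in zip(forward(forward(x, hidden), output), x))

rng = random.Random(0)
patterns = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
hidden, output = init_layer(2, 4, rng), init_layer(4, 2, rng)  # 4-2-4 hourglass
before = squared_error(patterns, hidden, output)
for _ in range(2000):
    for x in patterns:
        train_step(x, hidden, output, eta=0.5)
after = squared_error(patterns, hidden, output)
print(before, after)  # the error after training should be well below the initial error
```

Because the teacher pattern is simply the input pattern, `train_step` needs no labels, which is the property the identity mapping is chosen for.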

[0020] Because the identity mapping outputs the same values as its input, preparing the teacher patterns requires no information other than the input patterns themselves. The method therefore has the advantage that it can easily be applied even to completely unknown patterns about which no prior knowledge is available.

[0021] More specific examples are described next.

[0022] Example 1

In this example, structure is extracted from symbol strings. The symbol strings used are three-word English sentences formed from four subjects (I, YOU, HE, SHE), one verb (LIKE), and four objects (ME, YOU, HIM, HER). A sentence such as "I LIKE ME." may not be natural English, but it is allowed here, so the combinations of subject and object yield 16 sentences.
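The 16 sentences follow directly from the word lists (4 subjects × 1 verb × 4 objects). A short Python sketch that enumerates them, with list names chosen here for illustration only:

```python
from itertools import product

SUBJECTS = ["I", "YOU", "HE", "SHE"]
VERBS = ["LIKE"]
OBJECTS = ["ME", "YOU", "HIM", "HER"]

# Every subject-verb-object combination is allowed, including "I LIKE ME".
sentences = [" ".join(words) for words in product(SUBJECTS, VERBS, OBJECTS)]
print(len(sentences))  # 16
print(sentences[0])    # I LIKE ME
```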

[0023] FIG. 7 shows the structure of the neural network. Let S be the number of symbols used in the symbol strings and L the length of a symbol string. The input layer then has S × L units, each representing one symbol at one position in the symbol string. In this example one word is treated as one symbol. The words used are the nine words I, YOU, HE, SHE, ME, HIM, HER, LIKE, and LIKES plus a blank, for ten symbols in all, and each sentence consists of three words, so there are 30 input units. For example, to input the sentence "I LIKE YOU." to this network, the value 1 is input to the units corresponding to I in the first column, LIKE in the second column, and YOU in the third column, and the value 0 is input to all other units. The output layer has the same configuration as the input layer. There is one intermediate layer. Each intermediate unit is connected to every input unit and every output unit.
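The S × L input encoding can be sketched as follows. The function name `encode` is hypothetical, and two details are assumptions of this sketch: the units are numbered column by column with the words in the order they are listed in the text, and sentences shorter than L words are padded with blanks at the end.

```python
VOCAB = ["I", "YOU", "HE", "SHE", "ME", "HIM", "HER", "LIKE", "LIKES", ""]  # "" is the blank
LENGTH = 3  # L: number of positions (columns) in a symbol string

def encode(sentence: str) -> list:
    """Return the S*L = 30 input values for a sentence such as "I LIKE YOU."."""
    words = sentence.rstrip(".").split()
    words += [""] * (LENGTH - len(words))  # assumed padding with blanks
    if len(words) != LENGTH:
        raise ValueError("sentence longer than %d words" % LENGTH)
    pattern = [0.0] * (len(VOCAB) * LENGTH)
    for position, word in enumerate(words):
        pattern[position * len(VOCAB) + VOCAB.index(word)] = 1.0
    return pattern

p = encode("I LIKE YOU.")
# 1-based numbers of the units that receive the value 1
print([k + 1 for k, v in enumerate(p) if v == 1.0])  # [1, 18, 22]
```

Under this numbering, the first column occupies units 1 to 10, the second units 11 to 20, and the third units 21 to 30.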

[0024] Whether the identity mapping has been realized as a result of learning is judged by the maximum learning error. The maximum learning error maxerr is defined by Equation 7.

[0025]

[Equation 7] $$\mathrm{maxerr} = \max_{p}\, \max_{i}\, \lvert o_{ip} - t_{ip} \rvert \qquad \text{(Equation 7)}$$

[0026] Here $o_{ip}$ is the output value of output unit $i$ when pattern $p$ is input, and $t_{ip}$ is the teacher pattern of output unit $i$ for input pattern $p$. If maxerr < 0.1, the identity mapping is regarded as realized.
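The convergence check can be expressed as a small Python helper. Equation 7 itself is not reproduced in the source text, so this sketch assumes that the maximum learning error is the largest absolute difference between output and teacher value over all patterns and output units; the function names and sample values are illustrative only.

```python
def max_learning_error(outputs, targets):
    """Largest |o_ip - t_ip| over all patterns p and output units i (assumed form)."""
    return max(abs(o - t)
               for out_p, tgt_p in zip(outputs, targets)
               for o, t in zip(out_p, tgt_p))

def identity_realized(outputs, targets, tol=0.1):
    """The identity mapping counts as realized when maxerr < 0.1."""
    return max_learning_error(outputs, targets) < tol

targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.95, 0.03], [0.08, 0.91]]  # made-up network outputs
print(identity_realized(outputs, targets))  # True
```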

[0027] To decide the number of intermediate units, we examined, for various numbers of intermediate units, whether the structure of the language is acquired properly. Whether the structure has been acquired was judged as follows. The identity mapping is learned using 15 of the 16 English sentences. The one sentence not used for learning is then input to the trained network, and if the identity mapping is also realized for this unlearned pattern, the structure of the language is taken to have been acquired. All 16 sentences can be considered to share the same structure, so if the structure of the 15 sentences has been acquired, it should also apply to the one unlearned sentence.

[0028] Akaike's information criterion (AIC) is also introduced. AIC is defined by Equation 8.

[0029]

[Equation 8] $$\mathrm{AIC} = N \log_e E + 2m \qquad \text{(Equation 8)}$$

Here $N$ is the number of learning patterns and $m$ is the number of intermediate units. $E$ is the sum of squared errors and is given by Equation 9.

[0030]

[Equation 9] $$E = \sum_{p=1}^{N} \sum_{i=1}^{M} (o_{ip} - t_{ip})^2 \qquad \text{(Equation 9)}$$

[0031] Here $M$ is the number of output units. AIC is a quantity used, in the problem of finding an approximating function for an original function from given sample points, to evaluate whether the approximating function has an appropriate number of parameters; in that setting $N$ is the number of sample points and $m$ is the number of parameters. In the ideal case, the AIC value decreases as the number of parameters increases, reaches a minimum at some point, and thereafter tends to increase slightly. The number of parameters that minimizes AIC is therefore taken as optimal. Replacing $N$ with the number of learning patterns and $m$ with the number of intermediate units in order to apply AIC to a neural network problem may be somewhat of a stretch, but it was adopted here for reference. For AIC, we referred to Iwanami Lecture Series on Software Science 9, "Numerical Processing Programming," pp. 169-171 and pp. 176-178.
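As a worked illustration of Equation 8, the following Python sketch computes AIC from the number of learning patterns N, the error sum E, and the number of intermediate units m, and picks the m with the smallest value. The function names are hypothetical and the error sums are invented for illustration; they are not results from the patent.

```python
import math

def aic(n_patterns: int, squared_error_sum: float, n_hidden: int) -> float:
    """Equation 8: AIC = N * log_e(E) + 2 * m."""
    return n_patterns * math.log(squared_error_sum) + 2 * n_hidden

def best_hidden_count(errors_by_hidden: dict, n_patterns: int) -> int:
    """Return the number of intermediate units with the smallest AIC."""
    return min(errors_by_hidden,
               key=lambda m: aic(n_patterns, errors_by_hidden[m], m))

# Made-up error sums E per number of intermediate units (illustration only).
errors = {3: 2.0, 4: 0.05, 5: 0.04, 6: 0.04}
print(best_hidden_count(errors, n_patterns=15))  # 5
```

With these toy values the error stops improving after five units, so the 2m penalty makes six units score worse than five, which is the behavior the text describes for the ideal case.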

[0032] For each number of intermediate units from 3 to 9, the experiment of training the network on the identity mapping was run 20 times with different initial connection weights, and the maximum error on the unlearned pattern and the AIC were compared. The learning constant was 0.5, the connection weights were corrected 100,000 times per learning pattern in each run, and the initial connection weights were drawn from uniform random numbers in (-1, 1). For each number of intermediate units from 3 to 9, Table 1 shows how many of the 20 runs realized the identity mapping on the learning data, and how many of those also realized it on the unlearned data. FIG. 8 shows the relationship between the number of intermediate units and AIC; it plots the average AIC over the runs in which the identity mapping was realized on the learning data.

[0033]

[Table 1]

[0034] With three intermediate units, none of the 20 runs learned the identity mapping correctly. With four or more intermediate units, the more intermediate units there are, the more readily the identity mapping is realized for the learning patterns, but the less readily it tends to be realized for the unlearned pattern. From this result, four intermediate units can be said to be appropriate for estimating the structure of the language. The AIC value is not at its minimum for four intermediate units, but the minimum is not far off, and the rough trend holds: AIC decreases as the number of intermediate units increases and rises again when there are too many. AIC can therefore serve as a rough guide when deciding the number of intermediate units.

[0035] Table 2 shows an example of the firing patterns of the intermediate layer for each input pattern when there are four intermediate units. For the same network, Table 3 shows the connection weights between the input units and the intermediate units, and Table 4 shows the connection weights between the intermediate units and the output units. In Table 3, connection weights with absolute value less than 1 are omitted on the grounds that they do not affect the outputs of the intermediate units. Similarly, connection weights with absolute value less than 2 are omitted from Table 4. In Tables 3 and 4, input and output units 1 to 4 correspond to the first column of the symbol string, units 18 and 19 to the second column, and units 22 to 27 to the third column. Input and output units that never fire for any pattern are omitted.

[0036]

[Table 2]

[0037]

[Table 3]

[0038]

[Table 4]

[0039] The connection weights between the intermediate units and the output units show the following dependence and independence relationships among the output units.
- The first and second columns are influenced by intermediate units 1 and 4, and the third column by intermediate units 2 and 3. No intermediate unit influences both the first and second columns on the one hand and the third column on the other, so the first and second columns are independent of the third column.
- The first and second columns are both influenced by intermediate unit 1, so the first and second columns are dependent on each other.
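The reasoning above (columns that share an influencing intermediate unit are dependent, columns that share none are independent) can be sketched in Python. This is one possible reading of the analysis step, not the patent's exact procedure: the function name is hypothetical, a connection is treated as heavy when its absolute weight is at least 2 (the cutoff used for omitting weights from Table 4), and the weights below are invented toy values, not the contents of Table 4.

```python
from itertools import combinations

def column_relations(w_out, column_of, threshold=2.0):
    """Classify pairs of symbol-string columns as dependent or independent.

    w_out[i][h] is the weight from intermediate unit h to output unit i;
    column_of[i] is the column (position) that output unit i belongs to.
    """
    influences = {}  # column -> intermediate units with a heavy connection into it
    for i, row in enumerate(w_out):
        heavy = {h for h, w in enumerate(row) if abs(w) >= threshold}
        influences.setdefault(column_of[i], set()).update(heavy)
    relations = {}
    for a, b in combinations(sorted(influences), 2):
        shared = influences[a] & influences[b]
        relations[(a, b)] = "dependent" if shared else "independent"
    return relations

# Toy weights (illustration only): 3 output units, 4 intermediate units.
w_out = [
    [3.1, 0.2, 0.0, -2.5],   # an output unit in column 1
    [-2.8, 0.1, 0.3, 0.4],   # an output unit in column 2
    [0.3, 4.0, -3.2, 0.1],   # an output unit in column 3
]
print(column_relations(w_out, column_of=[1, 2, 3]))
```

With these toy weights, columns 1 and 2 share the first intermediate unit and come out as dependent, while column 3 shares no intermediate unit with either and comes out as independent of both.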

[0040] In general, if there are too many intermediate units, information is not compressed sufficiently in the intermediate layer, so redundancy in the patterns remains and the essential structure is not acquired. Conversely, if there are too few, the necessary information cannot be represented in the intermediate layer and the identity mapping is not realized. The number of intermediate units is therefore chosen to be as small as possible within the range where the output error on the learning patterns does not become large.

[0041] Example 2

Next, we examined how many example sentences are needed for learning in order to realize the identity mapping for all 16 sentences, in other words, how many example sentences are needed to estimate the structure of the language. Since Example 1 showed that four intermediate units are appropriate, four intermediate units were used here, and for various numbers of learning patterns, learning was run from 20 different sets of initial connection weights as in Example 1.

[0042] Table 5 shows the results. The columns of the table give, respectively, the number of learning patterns, how many of the 20 runs realized the identity mapping on the learning data, and how many of those also realized it on the unlearned data. The results show that the structure of the language could be acquired from as few as 12 example sentences.

[0043]

[Table 5]

[0044] When an unknown sentence is input to an hourglass network that has learned the structure of a language, the network can output the same sentence as its input if the sentence conforms to the learned structure, and cannot if the sentence does not conform. The network can therefore be used to judge whether an input sentence conforms to the learned structure. In addition, the firing pattern of the intermediate units shows how the input sentence was processed by the hourglass network, which can be useful for parsing.
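Using a trained network as a judge of conformity can be sketched as below. The function name `conforms` is hypothetical, reusing the 0.1 error threshold from Example 1 as the acceptance criterion is an assumption of this sketch, and `toy_network` is only a stand-in for a trained hourglass network.

```python
def conforms(network, pattern, tol=0.1):
    """True if the network reproduces the pattern to within tol on every unit."""
    output = network(pattern)
    return max(abs(o - x) for o, x in zip(output, pattern)) < tol

# Stand-in for a trained network: it reproduces only patterns whose first unit is on.
def toy_network(pattern):
    return list(pattern) if pattern[0] == 1.0 else [0.5] * len(pattern)

print(conforms(toy_network, [1.0, 0.0, 0.0]))  # True
print(conforms(toy_network, [0.0, 1.0, 0.0]))  # False
```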

[0045]

[Effects of the Invention] For an unknown language whose grammar is not available in advance, the grammar can be estimated from example sentences using a neural network and used for parsing. Furthermore, not only in symbol processing but also in signal processing, the features of a signal whose features cannot easily be found can be extracted using a neural network, which helps in understanding the characteristics of the signal.

[Brief explanation of the drawing]

[FIG. 1] Flowchart of structure extraction.

[FIG. 2] Multilayer neural network.

[FIG. 3] Unit constituting a neural network.

[FIG. 4] Input-output characteristic of the sigmoid function.

[FIG. 5] Hourglass network.

[FIG. 6] PAD diagram of the backpropagation method.

[FIG. 7] Configuration of the neural network used in the examples.

[FIG. 8] Relationship between the number of intermediate units and AIC.

[Explanation of symbols]

101: set of example sentences used for structure extraction; 102: input pattern generation section; 103: identity mapping learning neural network section; 104: neural network analysis section; 105: structure extraction result; 701: input layer with input units arranged in a matrix; 702: intermediate layer with fewer units than the input and output layers; 703: output layer with the same structure as the input layer.

Claims

[Claims]

Claim 1: A structure extraction method using a neural network, characterized by comprising: a process in which, taking as input samples of patterns belonging to a certain pattern set, a neural network constructed by connecting, through weighted connections, a large number of units that each perform a simple multi-input, single-output calculation learns the identity mapping of the sample patterns; and a process of estimating the independence and dependence relationships between the constituent elements of the patterns belonging to the pattern set by analyzing the connection weights of the neural network that has learned the identity mapping.

Claim 2: The structure extraction method using a neural network according to claim 1, wherein the target pattern set is a set of symbol strings belonging to a certain language, and the independence and dependence relationships between the symbols constituting the symbol strings are estimated from given example sentences.

Claim 3: The structure extraction method using a neural network according to claim 1, wherein the target pattern set is a set of signals, and the independence and dependence relationships between the components of the signals are estimated from given signal samples.

Claim 4: The structure extraction method using a neural network according to claim 1, wherein the neural network that learns the identity mapping is a multilayer neural network in which sets of units called an input layer, an intermediate layer, and an output layer are connected in that order and a signal given to the input layer propagates in one direction only, from the input layer side to the output layer side, and is an hourglass-type multilayer neural network in which the input layer and the output layer have the same number of units and the intermediate layer has fewer units.

Claim 5: The structure extraction method using a neural network according to claim 1, wherein the process of analyzing the connection weights of the neural network that has learned the identity mapping of the sample patterns analyzes the connection weights between the intermediate layer and the output layer and, for the units in the output layer, concludes that units receiving heavy connections from the same unit in the intermediate layer are dependent on each other and that units that do not are independent.

Claim 6: The structure extraction method using a neural network according to claim 1, wherein the number of units in the intermediate layer of the neural network that learns the identity mapping of the sample patterns is determined with reference to Akaike's information criterion (AIC).

Claim 7: The structure extraction method using a neural network according to claim 1, wherein the number of units in the intermediate layer of the neural network that learns the identity mapping of the input patterns is set as small as possible within the range in which the identity mapping can be realized for all sample patterns.

Claim 8: The structure extraction method for symbol strings using a neural network according to claim 2, wherein the input layer of the neural network that learns the identity mapping has its units arranged in a matrix, the unit in row i and column j representing symbol i at position j of the symbol string, and wherein, owing to this form of the input layer, inputting a symbol string to the neural network requires no information other than the full set of symbols that can appear in the symbol strings and the maximum length of a symbol string.