JP2004348637A

JP2004348637A - Associative storage device and method therefor

Info

Publication number: JP2004348637A
Application number: JP2003147599A
Authority: JP
Inventors: Yukihiro Tsuboshita; 幸寛坪下; Hiroshi Okamoto; 洋岡本; Motofumi Fukui; 基文福井; Masahiro Maeda; 正浩前田; Isao Yamaguchi; 功山口
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-05-26
Filing date: 2003-05-26
Publication date: 2004-12-09
Anticipated expiration: 2023-05-26
Also published as: JP4360122B2

Abstract

PROBLEM TO BE SOLVED: To perform associative recall properly and at reduced computational cost even for biased data such as document patterns.
SOLUTION: An input feature vector is generated from the word string supplied by a syntax analysis unit 12 and is received by an input feature vector storage unit 141. The input feature vector initializes the value of the associative recall feature vector held in an associative recall feature vector storage unit 142. An associative recall feature vector updating unit 143 then generates a new associative recall feature vector based on the weight matrix in a database 15 and the associative recall feature vector at that time, and updates the value. A convergence determination unit 144 determines whether the associative recall feature vector has converged. When it is determined to have converged, an associative recall feature vector output unit 145 outputs the associative recall feature vector at that time, and keywords based on it are output to a user interface unit 11.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
この発明は、ニューラルネットワークを用いた連想記憶技術に関し、とくにニューラルネットワークにおける活性化ノードの決定手法の改良に関する。
【０００２】
【従来の技術】
連想記憶に基づく認識理解技術は、自然言語処理分野に応用可能である。例えば、特許文献１は、関連性はあるが異質な情報を提供できるような情報検索装置を提供することを目的として連想記憶技術を文書の検索に適用した発明を開示している。与えられた複数の文書を辞書中の単語が存在するか否かで１，０を付与した「キーワードベクトル」を作成し、自己相関連想記憶として、「アソシアトロン」に記銘させる。ユーザが入力した文章、あるいは単語列から、「キーワードベクトル」を作成し、それを初期条件としてアソシアトロンに想起を行わせる。この操作により、ａｎｄ条件、ｏｒ条件とは全く異なるパターン認識的な検索処理を実現している。
【０００３】
しかしながら、この手法では従来型の連想記憶モデルである「アソシアトロン」をそのまま用いているので、文書パターンを想起する場合における様々な問題点を解決するには至っていない。従来型の連想記憶モデルではランダムパターン（パターンをＮ個の０と１との列として表現した場合ｉ番目の値とｊ番目の値とが互いに独立に決まるようなパターンをランダムパターンという）の学習・想起することは容易であるが、文書パターンのような偏ったパターンを学習・想起することは困難である。
【０００４】
辞書中の単語が文書中に存在するか否かで、１，０を付与したベクトルによって、一つの文書パターンを表現する場合、文書パターンには次のような特徴がある。
【０００５】
（１）ノイズを含まないパターンは存在しない
実際の文書を学習データとして使用した場合、大抵は話題と関係ない単語を含んでいる。そのようなデータは学習データのノイズと見ることができる。
【０００６】
（２）一つのパターンは全単語に対して非常に少ない単語しか含まない
日本語には数十万の単語があるが、例えば電子メール等の比較的短い文書に含まれるのはせいぜい数百語程度である。このようなパターンはスパース（疎）なパターンといわれる。
【０００７】
（３）単語の出現頻度差が大きい
例えば、「これ」「この」といった指示語は使用頻度が高いが、様々な専門用語等は特殊な場合にしか現れない。
【０００８】
（４）類似パターンの出現頻度に偏りがある
例えば、お客様からの問い合わせの電子メール等を対象にした場合、良くある問い合わせパターンと、あまり現れないパターンの出現頻度の差異は非常に顕著である。
【０００９】
ランダムスパースなパターンを記憶するために、共分散行列を用いた連想記憶が提案されている（非特許文献１、非特許文献２）。この連想記憶モデルでは、Ｎ個のノードからランダムにノードを順次選び、（２）式に従ってその活性値を更新する。活性値の変化するノードがなくなるまでこの操作を繰り返す。
【数１】

この連想記憶モデルでは、上記（１）の問題に関しては、ノードの活性値の活性確率を導入することにより、ランダムに現れる共起性と、真に相関があり共起するパターンを区別することで解決されている。また、この連想記憶モデルは、ランダムスパースなパターンを記憶するために考案されたので、上記（２）については問題ない。また、上記（３）についても、ノードの活性値の活性確率を導入することにより解決されている。
【００１０】
しかし、この連想記憶モデルは、類似パターンの出現頻度が偏った場合（上記４）には適していない。なぜならば、この連想記憶モデルではよく現れるパターンのみを想起する傾向にあるからである。
【００１１】
また、特許文献２では、連想記憶モデルを仮名漢字変換に適用する研究が行われている。この連想記憶モデルでは、上記（１）〜（４）の問題を解決するために、共分散行列を用いた連想記憶モデルにおいて、次のような工夫を施している。新規の文章パターンを記銘する際に、そのパターンのエネルギー値を計算し、このエネルギー値と所定のエネルギー値との差に基づいてリンクの重みを更新させる度合いを定める。
【数２】

このような工夫により、文書パターンがどのようなものであってもエネルギー値をほぼ一定にするようにリンクの値を学習する。これにより偏ったパターンについても適切に想起できるようになる（上記（４）の解決）。
【００１２】
しかしながら、文書パターンを学習する際に、その都度エネルギーを計算する必要があるので学習に時間を要する。また、この手法では、類似パターンの出現頻度などの情報がネットワークの重みには反映されていないので、重み行列は、学習データの構造を正しく反映しているものであるとはいえない。
【特許文献１】
特許第２８３２６７８号
【特許文献２】
特許第３３６４２４２号
【非特許文献１】
Ｓ．Ａｍａｒｉ，Ｎｅｕｒａｌｔｈｅｏｒｙｏｆａｓｓｏｃｉａｔｉｏｎａｎｄｃｏｎｃｅｐｔ−ｆｏｒｍａｔｉｏｎ，Ｂｉｏ．Ｃｙｂｅｒｎ，Ｖｏｌ．２６，ｐｐ．１８５−１７５，１９７７
【非特許文献２】
Ｓ．Ａｍａｒｉ．Ｃｈａｒａｃｔｅｒｉｓｔｉｃｓｏｆｓｐａｒｓｅｌｙｅｎｃｏｄｅｄａｓｓｏｃｉａｔｉｖｅｍｅｍｏｒｙ．ＮｅｕｒａｌＮｅｔｗｏｒｋｓ，Ｖｏｌ．２，ｐｐ．４５１−４５７，１９８９）
【００１３】
【発明が解決する課題】
この発明は、以上の事情を考慮してなされたものであり、少ない計算コストでも偏ったパターンを適切に想起できる連想記憶手法を提供することを目的としている。
【００１４】
【課題を解決するための手段】
この発明によれば、上述の目的を達成するために、特許請求の範囲に記載のとおりの構成を採用している。ここでは、発明を詳細に説明するのに先だって、特許請求の範囲の記載について補充的に説明を行なっておく。
【００１５】
この発明の原理的な構成では、共分散行列を用いた連想記憶手法において、全体の発火率に比例した抑制入力をネット−ワークに加えることによって、文書パターンのような非常に偏ったデータに対しても、初期状態に依存した形での連想想起を行えるようにする。
【００１６】
この結果、つぎのような効果がある。
・非常に「偏った」重み行列を用いても、状況依存的な出力が可能である。
・リンクの更新に際し、逐一エネルギー値を求める必要がないので、時間を要しない。
・抑制入力の大きさを制御することによって、エネルギー値が低いパターンから、エネルギー値が高いパターンまで、様々なレベルのパターンを出力することが可能である。
【００１７】
なお、この発明は装置またはシステムとして実現できるのみでなく、方法としても実現可能である。また、そのような発明の一部をソフトウェアとして構成することができることはもちろんである。またそのようなソフトウェアをコンピュータに実行させるために用いるソフトウェア製品もこの発明の技術的な範囲に含まれることも当然である。
【００１８】
この発明の上述の側面および他の側面は特許請求の範囲に記載され以下実施例を用いて詳述される。
【００１９】
【発明の実施の形態】
以下、この発明をキーワード抽出装置に適用した実施例について説明する。このキーワード抽出装置は、文書に自動的にキーワードを付与して文書の内容を把握しやすくする、あるいは検索を容易にすることのできる文書処理装置の構築を目指すものである。オフィス等における、文書の管理・処理に関するシステムなどに応用される。とくに、文書中に含まれている単語のみならず、文書に含まれていないが、文書に含まれている単語群との関連に基づいて、文書の意味内容に深く関わる単語もキーワードとして出力できるようにするものである。
【００２０】
図１はこの実施例のキーワード抽出装置を全体として示しており、この図において、キーワード抽出装置１０はユーザインターフェース部１１、構文解析部１２、単語間相関学習部１３、想起情報抽出部１４、データベース１５等を含んで構成されている。
【００２１】
ユーザインターフェース部１１は、キーボードやモニタ等からなり、文書の入力やキーワードの提示をユーザが実行可能にするものである。
【００２２】
構文解析部１２は、学習用文書、及び、入力文書を単語に分解して構文解析する（以下学習用文書群、入力文書に対して全く同様の処理を行う場合、これらを総称し「文書」と記述する）。この時、同じ意味の単語は一つの代表単語に変換される。例えば、「プリンター」、「プリンタ」、「Ｐｒｉｎｔｅｒ」、「ｐｒｉｎｔｅｒ」は同じ意味の単語として一つの代表語「プリンター」に置き換えられる。そして文書は一つの特徴ベクトルに変換される。文書Ｄ_μの特徴ベクトルＦ_μは次のようになる。
【数３】

特徴ベクトルＦ_μの要素ｗ_ｉ ^μは単語Ｗ_ｉが文書Ｄ_μに現れたら１、現れなければ０となる。すなわち、特徴ベクトルは、ある単語がその文書に現れたか否かのみで判断されており、出現頻度、重要度などは考慮されていない。特徴ベクトルの長さｎは、あらかじめ設定する。また、特徴ベクトル生成時に使用されるｎ個の単語｛Ｗ_１，Ｗ_２，…，Ｗ_ｎ｝は、学習用文書群に出現する単語をＴＦ＊ＩＤＦの値によりスコア付けして、その上位ｎ個を採用する。ＴＦ（ＴｅｒｍＦｒｅｑｕｅｎｃｙ）は、ある文書ｄにおける索引語ｔの生起頻度であり、ＩＤＦ（ＩｎｖｅｒｓｅＤｏｃｕｍｅｎｔＦｒｅｑｕｅｎｃｙ）ＴＦは語がどのくらい特定性を持つかを表す。
【００２３】
単語間相関学習部１３は、学習用文書から、共起性に基づいて単語間の関連性を学習し連想記憶行列を構築する。すなわち連想行列ｇ_ｉｊは、次のような式で求められる。
【数４】

更に、例えば、突出して出現度の高い単語の影響を抑えるため、連想記憶行列ｇ_ｉｊを行ごとにノーマライズすることによって、システムからの入力を一定値以内に抑える。
【数５】

このようにして、連想記憶行列Ｇ_ｉｊ（１≦ｉ，ｊ≦ｎ）を得る。得られた連想記憶行列Ｇ_ｉｊはデータベース１５に格納される。
【００２４】
想起情報抽出部１４は、入力文書から、連想想起によりキーワードを抽出する。想起情報抽出部１４は、模式的には図２に示すような構成を有し、図３に示すような動作を実行する。
【００２５】
図２に示すように、構文解析部１２からの単語列に基づいて入力特徴ベクトルを生成し、これを入力特徴ベクトル記憶部１４１で受取り、この入力特徴ベクトルで連想想起特徴ベクトル記憶部１４２の連想想起特徴ベクトルの値を初期化する。この後、連想想起特徴ベクトル更新部１４３が、データベース１５の重み行列およびその時点での連想想起特徴ベクトルに基づいて新たな連想想起特徴ベクトルを生成してその値を更新する。収束判定部１４４は、連想想起特徴ベクトルが収束したかどうかを判別し、収束したと判別した場合には連想想起特徴ベクトル出力部１４５がその時点の連想想起特徴ベクトルを出力し、これに基づいてキーワードをユーザインターフェース部１１に出力する。
【００２６】
以下、図３を参照して詳細に説明する。なお、連想想起を行わせるｎ次元ベクトルをＮ＝［Ｎ_１，Ｎ_２，…，Ｎ_ｎ］と定義する。
【００２７】
［ステップＳ１０］：特徴ベクトルの初期化
入力文書から得た特徴ベクトルをＱ＝［Ｑ_１，Ｑ_２，…，Ｑ_ｎ］とする時、入力特徴ベクトルＱで連想想起特徴ベクトルＮを初期化する。すなわち、
【数６】

【００２８】
［ステップＳ１１］：ランダムにノードｋを選択
ノードの状態変化は非同期的に行われる。すなわち一回に任意の一つのノードのみが状態変化する。
【００２９】
［ステップＳ１２］：Ｎ_ｋの更新
基本的には、閾値が一定値以上であれば発火（Ｎ_ｋ＝１）し、一定値以下であれば、発火を取りやめる（Ｎ_ｋ＝０）単純なバイナリ型のモデルニューロンによって、ダイナミクスは実現される。もちろん、これに限定されない。
【００３０】
選択されたノードＮ_ｋは、次式に従って状態が更新される。
【数７】

このような抑制入力を加えることによって、最終的に抽出される単語数を一定数以内に制御することができる。なお、比例定数αにかえて単調増加する関数を用いてもよい。
また
【数８】

の値をノードｋの活性度と定義する。これは他のノードからどれくらい大きな入力を受けているかを表している。すなわち、その単語が入力文書に対してどれくらいの活性を有するかの尺度として用いることができる。
【００３１】
［ステップＳ１３］：収束判定
収束判定は様々な方法が考えられる。ある一定試行回数状態が変化しなくなったところで収束と判定しても良い。あるいは、試行回数に上限を与えるなどの方法などを採用しても良いだろう。
【００３２】
［ステップＳ１４］：連想想起特徴ベクトルの出力
収束が判定された後、連想想起特徴ベクトルＮを出力する。この連想想起特徴ベクトルＮに基づいて１または複数のキーワードを決定する。
【００３３】
この実施例によれば、文書知識のような非常に偏ったデータに対しても、全体の発火数に比例する抑制入力を導入することで、リンクの学習に特別な処理を施すことなく初期状態に依存した形での連想想起を行える。
【００３４】
また、抑制入力の比例定数αを変化させることにより、最終的にキーワードとして抽出される語数をある一定数に制御することも可能である。
【００３５】
更に、キーワードとして抽出された単語それぞれの活性度の値を入力文書に対するそのキーワードの位置付けの重要度とみなすことにより、抽出された単語のランク付けを行うことも可能である。
【００３６】
以上のような連想想起によって、入力文書に存在していた単語であっても、発火している他のノードからの入力が小さければ、すなわち、発火している他の単語との関連が小さければ発火しなくなり、逆に入力文書に存在していない単語であっても、他のノードから大きな入力を受ければ、すなわち、発火している他の単語と関連が大きければ発火する。これにより、過去の知識を活用しながらも、それぞれの文書に固有の関連キーワードを、その文書に含まれていない単語も含め抽出するキーワード抽出装置を実現した。
【００３７】
また単語間の関連性は、学習文書群を与えることによって自動的に獲得されるので、分野ごとに固有の知識を先見的に与える必要はない。更に、従来までの概念辞書、意味ネットワーク等を用いるアプローチでは、辞書のメンテナンス作業を日々行う必要があるが、本特許では、新規の学習文書群を追加するだけで、新たな単語間の関連性を自動的に獲得することができるので、メンテナンスに対する労力も大幅に低減される。
［実験例］
【００３８】
以下、実際に行った実験を題材に具体的に説明する。
【００３９】
富士ゼロックス株式会社のお客様相談センターに寄せられた問い合わせメールを用いた検証を行った。学習文書数は、１１，３６５であり、特徴ベクトル生成時に使用する単語数は１，０００とした。
【００４０】
まずは、学習文書とは異なる問い合わせメール６２通を初期状態として与えた場合の連想結果（制御入力がある場合と無い場合）を図４示す。このように、抑制入力を適用しない場合は、６２通のメールがほとんど同一のパターンに収束しているのに対して、抑制入力を適用した場合には、ほとんどのメールが異なるパターンに収束していることが分かる。
【００４１】
次に、ある典型的なクレームメールに対する適用例を示す。
【００４２】
メール本文を要約すると、
「富士ゼロックスからコピー機を購入して使用している。先日インクがよく出ないのでカートリッジを交換したところ、エラーが出てしまう。古いものに交換したところ正常である。相談センターへ連絡すると本体を送ってほしいといわれた。指示通り送付し、確認の電話を入れたら届いていないといわれる。なぜインクが出ないのかを聞いても要領を得ない回答をされる。コールセンターの事務的取り扱いは顧客の立場を少しも考えてくれていない。責任の所在を明確にして欲しい。」
といったものであった。
この入力メールの特徴ベクトル中、値が１となった単語は図５のようになった。
連想想起後、最終的に抽出されたキーワードは図６のようになった。ただし、（）内の数字は、最終的な活性値、アスタリスクは、本文中にはないキーワードを表す。
【００４３】
更に、上記抽出キーワードの中から、活性度に従って上位１０単語を切り出した。この結果は図７に示すとおりである。この中で、「クレーム」、「責任」等のキーワードは、本文中に出現する単語よりも文章の内容を的確に表していると考えることができ、「文章の内容を把握しやすくする」という目的に対して、より適切なキーワードが得られたといえる。
【００４４】
つぎに上述実施例を用いてキーワードを抽出し、このキーワードを文書とともに登録する文書処理装置の例を説明する。
【００４５】
図８は、文書のキーワードを抽出しながら文書を登録し、登録文書のキーワード検索を可能にする文書処理装置２０を示している。この文書処理装置２０は、基本的には図１のキーワード抽出装置１０を主たる構成要素とし、これに文書データベース２１、文書登録部２２、文書検索部２３を付加したものである。なお、図８において図１に対応する箇所には対応する符号を付した。この例では想起情報抽出部１４で抽出された連想想起特徴ベクトルに基づいて出力されるキーワードを文書と関連づけて文書データベース２１に登録し、文書検索部２３を用いて文書をキーワード検索する。
【００４６】
以上で、この発明の実施例の説明を終了する。
【００４７】
なお、この発明は上述の実施例に限定されるものではなくその趣旨を逸脱しない範囲で種々変更が可能である。例えば、ニューラルネットワーク全体の発火率に比例する、あるいはその発火率に応じて単調増加する抑制入力を用いるようにしたが、予め所定範囲のノードに限定してその範囲内の発火率を基準にしてもよい。またバイナリ型でない場合には、発火率に代えて活性値の総和を用いるようにしてもよい。考慮するノードの範囲は、ノードに該当する単語の頻度等に基づいて決定してもよい。また、状況に応じて、抑制入力を与えることなく、連想想起特徴ベクトルの収束結果を出力して連想想起を行うようにしてもよい。
【００４８】
【発明の効果】
以上説明したように、この発明によれば、文書パターンのように偏ったデータに対しても、連想想起を適切に行うことができ、しかも計算コストを抑えることができる。
【図面の簡単な説明】
【図１】この発明の実施例のキーワード抽出装置の構成を示すブロック図である。
【図２】図１の実施例の想起情報抽出部１４の構成を模式的に示すブロック図である。
【図３】図１の実施例の想起情報抽出部１４の動作を説明するフローチャートである。
【図４】実験例を説明する図である。
【図５】実験例を説明する図である。
【図６】実験例を説明する図である。
【図７】実験例を説明する図である。
【図８】図１の実施例を用いたキーワード付きで文書を登録する文書処理装置の構成例を示すブロック図である。
【符号の説明】
１０キーワード抽出装置
１１ユーザインターフェース部
１２構文解析部
１３単語間相関学習部
１４想起情報抽出部
１５データベース
２０文書処理装置
２１文書データベース
２２文書登録部
２３文書検索部
１４１入力特徴ベクトル記憶部
１４２連想想起特徴ベクトル記憶部
１４３連想想起特徴ベクトル更新部
１４４収束判定部
１４５連想想起特徴ベクトル出力部

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an associative memory technology using a neural network, and more particularly to an improvement in a method for determining an activation node in a neural network.
[0002]
[Prior art]
Recognition and understanding technology based on associative memory can be applied to the field of natural language processing. For example, Patent Document 1 discloses an invention in which associative memory technology is applied to document search, with the aim of providing an information retrieval device that can supply related but heterogeneous information. For each of a plurality of given documents, a "keyword vector" is created by assigning 1 or 0 according to whether each word in a dictionary is present, and these vectors are memorized in an "Associatron" as an autocorrelation associative memory. A keyword vector is then created from a sentence or word string input by the user, and the Associatron performs recall with that vector as the initial condition. This operation realizes a pattern-recognition-style search process that is entirely different from AND and OR conditions.
[0003]
However, because this method uses the conventional associative memory model "Associatron" as it is, it does not solve the various problems that arise when recalling document patterns. With conventional associative memory models it is easy to learn and recall random patterns (a pattern represented as a sequence of N 0s and 1s in which the i-th value and the j-th value are determined independently of each other is called a random pattern), but it is difficult to learn and recall biased patterns such as document patterns.
[0004]
When one document pattern is represented by a vector whose elements are 1 or 0 according to whether each word in the dictionary exists in the document, document patterns have the following features.
[0005]
(1) No pattern is free of noise. When actual documents are used as learning data, they usually contain words unrelated to the topic. Such words can be regarded as noise in the learning data.
[0006]
(2) A single pattern contains only a very small fraction of all words. Japanese has several hundred thousand words, but a relatively short document such as an e-mail contains at most a few hundred. Such a pattern is called a sparse pattern.
[0007]
(3) Word frequencies differ greatly. For example, demonstratives such as 「これ」 ("this one") and 「この」 ("this") are used very often, whereas technical terms appear only in special cases.
[0008]
(4) There is a bias in the frequency of appearance of similar patterns. For example, in the case of an e-mail of an inquiry from a customer, the difference between the frequency of appearance of a common inquiry pattern and the frequency of appearance of a pattern that rarely appears is very remarkable.
[0009]
In order to store random sparse patterns, an associative memory using a covariance matrix has been proposed (Non-Patent Documents 1 and 2). In this associative memory model, nodes are selected one after another at random from the N nodes, and the activation value of each selected node is updated according to equation (2). This operation is repeated until no node changes its activation value.
(Equation 1)

In this associative memory model, problem (1) above is solved by introducing the activation probability of each node's activation value, which makes it possible to distinguish co-occurrence that arises at random from patterns that are truly correlated and co-occur. Because the model was devised to store random sparse patterns, (2) above poses no problem. Item (3) above is likewise solved by introducing the activation probability.
[0010]
However, this associative memory model is not suitable when the frequency of appearance of similar patterns is biased (item (4) above), because it tends to recall only the patterns that appear often.
[0011]
In Patent Document 2, an associative memory model is applied to kana-kanji conversion. To solve problems (1) to (4) above, this model adds the following refinement to the associative memory model using a covariance matrix: when a new sentence pattern is memorized, the energy value of the pattern is calculated, and the degree to which the link weights are updated is determined from the difference between this energy value and a predetermined energy value.
(Equation 2)

With this refinement, the link values are learned so that the energy value is kept substantially constant regardless of the document pattern. As a result, biased patterns can also be recalled appropriately (which solves item (4) above).
[0012]
However, when learning a document pattern, it is necessary to calculate the energy each time, so that it takes time to learn. Further, in this method, since information such as the frequency of appearance of similar patterns is not reflected in the weight of the network, the weight matrix cannot be said to correctly reflect the structure of the learning data.
[Patent Document 1]
Patent No. 2832678
[Patent Document 2]
Patent No. 3364242
[Non-Patent Document 1]
S. Amari, Neural theory of association and concept-formation, Biol. Cybern., Vol. 26, pp. 175-185, 1977
[Non-Patent Document 2]
S. Amari, Characteristics of sparsely encoded associative memory, Neural Networks, Vol. 2, pp. 451-457, 1989
[0013]
[Problems to be solved by the invention]
The present invention has been made in view of the above circumstances, and has as its object to provide an associative memory method that can appropriately recall a biased pattern with a small calculation cost.
[0014]
[Means for Solving the Problems]
According to the present invention, in order to achieve the above object, a configuration as described in the claims is adopted. Here, before describing the invention in detail, the description of the claims will be supplementarily described.
[0015]
In the basic configuration of the present invention, in an associative memory method using a covariance matrix, a suppression input proportional to the overall firing rate is applied to the network, so that associative recall that depends on the initial state can be performed even for highly biased data such as document patterns.
[0016]
As a result, the following effects are obtained.
- Situation-dependent output is possible even with a very "biased" weight matrix.
- Because the energy value does not have to be computed each time a link is updated, learning is not time-consuming.
- By controlling the magnitude of the suppression input, patterns at various levels can be output, from patterns with low energy values to patterns with high energy values.
[0017]
The present invention can be realized not only as a device or a system but also as a method. In addition, it goes without saying that a part of such an invention can be configured as software. Also, it goes without saying that a software product used for causing a computer to execute such software is also included in the technical scope of the present invention.
[0018]
The above and other aspects of the invention are set forth in the appended claims and described in detail below using embodiments.
[0019]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment in which the present invention is applied to a keyword extraction device will be described. This keyword extraction device aims at building a document processing device that automatically assigns keywords to a document so that its content is easier to grasp or so that it is easier to search. It is applied, for example, to systems for managing and processing documents in offices. In particular, it can output as keywords not only words contained in the document but also words that are not contained in the document yet are deeply related to its meaning, based on their relationship to the words that the document does contain.
[0020]
FIG. 1 shows a keyword extracting apparatus as a whole according to this embodiment. In this figure, a keyword extracting apparatus 10 includes a user interface unit 11, a syntax analyzing unit 12, an inter-word correlation learning unit 13, a recall information extracting unit 14, a database 15 and the like.
[0021]
The user interface unit 11 includes a keyboard, a monitor, and the like, and enables a user to input a document and present a keyword.
[0022]
The syntax analysis unit 12 decomposes the learning documents and the input document into words and parses them (hereinafter, where exactly the same processing is applied to the learning document group and the input document, they are collectively referred to as "documents"). At this time, words with the same meaning are converted into a single representative word. For example, 「プリンター」, 「プリンタ」, "Printer", and "printer" are treated as having the same meaning and are replaced with the single representative word 「プリンター」 ("printer"). Each document is then converted into one feature vector. The feature vector F_μ of document D_μ is as follows.
[Equation 3]
F_μ = [w_1^μ, w_2^μ, ..., w_n^μ]
The element w_i^μ of the feature vector F_μ is 1 if word W_i appears in document D_μ and 0 if it does not. That is, the feature vector records only whether a word appears in the document; appearance frequency, importance, and the like are not considered. The length n of the feature vector is set in advance. The n words {W_1, W_2, ..., W_n} used when generating feature vectors are chosen by scoring the words that appear in the learning document group by their TF*IDF value and adopting the top n. TF (Term Frequency) is the frequency of occurrence of index term t in a document d, and IDF (Inverse Document Frequency) indicates how specific the term is.
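The preprocessing described above (mapping variant spellings to a representative word, choosing n vocabulary words by TF*IDF, and building a binary feature vector) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the synonym table, the function names, and the choice to score each word by its maximum TF*IDF over the corpus with a natural-log IDF are all assumptions, since the text does not say how per-document scores are aggregated.

```python
import math
from collections import Counter

# Hypothetical synonym table: surface form -> representative word.
REPRESENTATIVE = {"プリンタ": "プリンター", "Printer": "プリンター", "printer": "プリンター"}


def normalize(words):
    """Replace each word with its representative form when one is registered."""
    return [REPRESENTATIVE.get(w, w) for w in words]


def select_vocabulary(docs, n):
    """Keep the n words with the highest TF*IDF score.

    Assumption: a word's score is its best TF*IDF over all documents,
    with IDF = log(number of documents / document frequency).
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    best = {}
    for doc in docs:
        for word, tf in Counter(doc).items():
            score = tf * math.log(len(docs) / doc_freq[word])
            best[word] = max(best.get(word, 0.0), score)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:n]


def feature_vector(doc, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in the document, else 0."""
    present = set(doc)
    return [1 if w in present else 0 for w in vocabulary]


docs = [
    normalize(["printer", "error", "paper"]),
    normalize(["Printer", "ink"]),
    normalize(["プリンタ", "error"]),
]
vocabulary = select_vocabulary(docs, 3)
vector = feature_vector(docs[0], vocabulary)
```

In this toy corpus the representative word for "printer" occurs in every document, so its IDF is zero and it drops out of the three-word vocabulary, which matches the intent of selecting specific rather than ubiquitous terms.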
[0023]
The inter-word correlation learning unit 13 learns the relevance between words from the learning documents based on co-occurrence, and constructs an associative memory matrix. That is, the associative memory matrix g_ij is obtained by the following equation.
(Equation 4)

Further, in order to suppress the influence of, for example, words with an exceptionally high frequency of appearance, the associative memory matrix g_ij is normalized row by row, which keeps the input that each node receives from the rest of the system within a fixed bound.
(Equation 5)

In this way, the associative memory matrix G_ij (1 ≦ i, j ≦ n) is obtained. The obtained associative memory matrix G_ij is stored in the database 15.
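As an illustration of the learning step, the sketch below builds a covariance-style matrix from binary document vectors and then normalizes each row. Equations 4 and 5 are not reproduced in this text, so the concrete forms used here are assumptions: g_ij is taken as the average of (w_i − p_i)(w_j − p_j) over the learning documents, where p_i is the fraction of documents containing word i, the diagonal is set to zero, and each row is divided by the sum of its absolute values. The function names are hypothetical.

```python
def covariance_matrix(patterns):
    """Assumed form of g_ij: mean of (x_i - p_i) * (x_j - p_j) over patterns."""
    m = len(patterns)
    n = len(patterns[0])
    # p[i] is the fraction of learning documents in which word i appears.
    p = [sum(x[i] for x in patterns) / m for i in range(n)]
    g = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # assumption: no self-connection
                    g[i][j] += (x[i] - p[i]) * (x[j] - p[j]) / m
    return g


def normalize_rows(g):
    """Assumed normalization: divide each row by the sum of its absolute values."""
    out = []
    for row in g:
        scale = sum(abs(v) for v in row)
        out.append([v / scale if scale > 0 else 0.0 for v in row])
    return out


# Words 0 and 1 always co-occur; word 2 appears only without them.
patterns = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
g = covariance_matrix(patterns)
G = normalize_rows(g)
```

With this normalization the total input a node can receive from binary neighbors never exceeds 1 in magnitude, which is one way to realize the stated goal of keeping the input within a fixed bound.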
[0024]
The recall information extracting unit 14 extracts keywords from the input document by associative recall. The recall information extracting unit 14 has the configuration shown schematically in FIG. 2 and performs the operation shown in FIG. 3.
[0025]
As shown in FIG. 2, an input feature vector is generated from the word string supplied by the syntax analysis unit 12 and is received by the input feature vector storage unit 141; this input feature vector is used to initialize the value of the associative recall feature vector held in the associative recall feature vector storage unit 142. Thereafter, the associative recall feature vector updating unit 143 generates a new associative recall feature vector based on the weight matrix in the database 15 and the associative recall feature vector at that time, and updates the value. The convergence determination unit 144 determines whether the associative recall feature vector has converged. When it is determined to have converged, the associative recall feature vector output unit 145 outputs the associative recall feature vector at that time, and keywords based on it are output to the user interface unit 11.
[0026]
Hereinafter, this will be described in detail with reference to FIG. 3. The n-dimensional vector on which associative recall is performed is defined as N = [N_1, N_2, ..., N_n].
[0027]
[Step S10]: Initialization of the feature vector
Let Q = [Q_1, Q_2, ..., Q_n] be the feature vector obtained from the input document. The associative recall feature vector N is initialized with the input feature vector Q. That is,
(Equation 6)
N_i = Q_i (i = 1, 2, ..., n)

[0028]
[Step S11]: Random selection of node k
Node states are changed asynchronously. That is, only one arbitrarily chosen node changes its state at a time.
[0029]
[Step S12]: Update of N_k
Basically, the dynamics are realized by a simple binary model neuron that fires (N_k = 1) when its input is at or above a certain threshold and stops firing (N_k = 0) when it is below that value. Of course, the invention is not limited to this.
[0030]
The state of the selected node N_k is updated according to the following equation.
(Equation 7)

By adding such a suppression input, the number of words finally extracted can be controlled within a certain number. Note that a function that monotonically increases may be used instead of the proportionality constant α.
Also, the value of
(Equation 8)

is defined as the activity of node k. This indicates how much input the node receives from other nodes. That is, it can be used as a measure of how active the word is with respect to the input document.
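To make the single-node update concrete, here is a minimal sketch of one asynchronous update with a suppression input. Equations 7 and 8 are not reproduced in this text, so the form below is an assumption: the activity of node k is taken to be the weighted sum of the other nodes' states, the suppression term is α times the fraction of nodes currently firing, and the node fires when activity minus suppression exceeds a threshold θ. The names `activity`, `update_node`, `alpha`, and `theta` are hypothetical.

```python
def activity(G, N, k):
    """Assumed activity of node k: weighted input from all nodes."""
    return sum(G[k][j] * N[j] for j in range(len(N)))


def update_node(G, N, k, alpha, theta=0.0):
    """Update node k in place and return its new state (0 or 1).

    Assumed rule: fire if activity - alpha * firing_rate - theta > 0,
    where firing_rate is the fraction of nodes whose state is 1.
    """
    firing_rate = sum(N) / len(N)
    net = activity(G, N, k) - alpha * firing_rate - theta
    N[k] = 1 if net > 0 else 0
    return N[k]


# Nodes 0 and 1 support each other; node 2 is anti-correlated with both.
G = [
    [0.0, 0.5, -0.5],
    [0.5, 0.0, -0.5],
    [-0.5, -0.5, 0.0],
]

weak = [1, 0, 0]
update_node(G, weak, 1, alpha=0.3)    # small suppression: node 1 fires

strong = [1, 0, 0]
update_node(G, strong, 1, alpha=3.0)  # large suppression: node 1 stays off
```

The two calls differ only in α, which shows how a larger suppression coefficient raises the bar a node must clear before it fires.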
[0031]
[Step S13]: Convergence determination
Various methods are conceivable for determining convergence. Convergence may be declared when the state has not changed for a certain number of trials. Alternatively, an upper limit may be placed on the number of trials.
[0032]
[Step S14]: Output of the associative recall feature vector
After convergence is determined, the associative recall feature vector N is output. One or more keywords are determined based on this associative recall feature vector N.
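Steps S10 to S14 can be put together as the following sketch. It is an illustration under stated assumptions, not the patented implementation: the update rule (weighted input minus α times the firing rate, compared with a threshold θ) is assumed because the equation is not reproduced in this text, and instead of drawing one node at a time with replacement, the sketch visits all nodes in a freshly shuffled order on each pass so that "nothing changed during a full pass" is an exact fixed-point test. The function name `recall` and its parameters are hypothetical.

```python
import random


def recall(G, Q, alpha, theta=0.0, max_sweeps=100, seed=0):
    """Asynchronous associative recall with a suppression input (sketch)."""
    rng = random.Random(seed)
    n = len(Q)
    N = list(Q)                          # S10: initialize N with Q
    for _ in range(max_sweeps):          # upper limit on the number of passes
        changed = False
        order = list(range(n))
        rng.shuffle(order)               # S11: visit nodes in random order
        for k in order:
            drive = sum(G[k][j] * N[j] for j in range(n))
            net = drive - alpha * sum(N) / n - theta   # S12: assumed update rule
            new_state = 1 if net > 0 else 0
            if new_state != N[k]:
                N[k] = new_state
                changed = True
        if not changed:                  # S13: a full pass changed nothing
            break
    return N                             # S14: output the recalled vector


# Toy weights: words 0-2 form one topic, words 3-5 another.
topic = [0, 0, 0, 1, 1, 1]
G = [
    [0.0 if i == j else (0.2 if topic[i] == topic[j] else -0.2) for j in range(6)]
    for i in range(6)
]

completed = recall(G, [1, 1, 0, 0, 0, 0], alpha=0.3)
silenced = recall(G, [1, 1, 0, 0, 0, 0], alpha=3.0)
```

With the moderate α, word 2 is recalled even though it was absent from the input, while the other topic stays off; with the very large α every node is switched off. In this toy case the outcome does not depend on the visiting order, so the seed only affects how quickly the fixed point is reached.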
[0033]
According to this embodiment, even for highly biased data such as document knowledge, introducing a suppression input proportional to the total number of firing nodes makes it possible to perform associative recall that depends on the initial state, without any special processing in link learning.
[0034]
Further, by changing the proportionality constant α of the suppression input, the number of words finally extracted as a keyword can be controlled to a certain number.
[0035]
Furthermore, the extracted words can be ranked by treating the activity of each word extracted as a keyword as the importance of that keyword with respect to the input document.
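The ranking idea can be sketched as below: given a converged state vector, each firing word is scored by its activity and the words are sorted in descending order. The activity is assumed here to be the weighted sum of inputs from the other firing nodes, and the function name, the example weights, and the example vocabulary are hypothetical. The third element of each result flags words that were not in the input document, in the spirit of the asterisk used in the experiment.

```python
def rank_keywords(G, N, Q, vocabulary, top=10):
    """Return (word, activity, is_new) for firing nodes, most active first.

    N is the converged state, Q the input feature vector; is_new is True
    when the word fires after recall but was absent from the input.
    """
    n = len(N)
    scored = []
    for k in range(n):
        if N[k] == 1:
            act = sum(G[k][j] * N[j] for j in range(n))  # assumed activity
            scored.append((act, vocabulary[k], Q[k] == 0))
    # Highest activity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(word, act, is_new) for act, word, is_new in scored[:top]]


# Hypothetical weights and vocabulary for illustration only.
G = [
    [0.0, 0.5, 0.25],
    [0.5, 0.0, 0.5],
    [0.25, 0.25, 0.0],
]
vocabulary = ["ink", "cartridge", "complaint"]
Q = [1, 1, 0]   # words present in the input document
N = [1, 1, 1]   # converged state: "complaint" was recalled

ranking = rank_keywords(G, N, Q, vocabulary, top=10)
```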
[0036]
With the associative recall described above, a word that was present in the input document stops firing if its input from the other firing nodes is small, that is, if it is only weakly related to the other firing words. Conversely, a word that was not present in the input document fires if it receives a large input from other nodes, that is, if it is strongly related to the other firing words. This realizes a keyword extraction device that, while making use of past knowledge, extracts related keywords specific to each document, including words not contained in the document.
[0037]
In addition, since the relevance between words is acquired automatically when a learning document group is supplied, it is not necessary to provide field-specific knowledge in advance. Furthermore, conventional approaches that use a concept dictionary, a semantic network, or the like require daily dictionary maintenance, whereas in this patent new relationships between words are acquired automatically simply by adding a new learning document group, so the maintenance effort is greatly reduced.
[Example of experiment]
[0038]
Hereinafter, an actual experiment will be described in detail.
[0039]
The verification was performed using the inquiry e-mail sent to the Fuji Xerox Customer Service Center. The number of learning documents was 11,365, and the number of words used when generating feature vectors was 1,000.
[0040]
First, FIG. 4 shows the association results (with and without the suppression input) when 62 inquiry e-mails different from the learning documents were given as initial states. Without the suppression input, almost all 62 e-mails converge to the same pattern, whereas with the suppression input most e-mails converge to different patterns.
[0041]
Next, an example of application to a typical complaint mail will be described.
[0042]
The body of the e-mail can be summarized as follows:
"I purchased a copier from Fuji Xerox and have been using it. The other day the ink was not coming out well, so I replaced the cartridge, and now I get an error. When I put the old cartridge back, it works normally. When I contacted the consultation center, I was asked to send in the main unit. I sent it as instructed, but when I called to confirm, I was told it had not arrived. Even when I ask why the ink does not come out, I get vague answers. The call center's bureaucratic handling shows no consideration for the customer's position. I want you to make clear where the responsibility lies."
The words whose value was 1 in the feature vector of this input e-mail are shown in FIG. 5.
The keywords finally extracted after associative recall are shown in FIG. 6. The numbers in parentheses are the final activity values, and an asterisk marks a keyword that does not appear in the body of the e-mail.
[0043]
Furthermore, the top 10 words by activity were selected from the extracted keywords. The result is shown in FIG. 7. Among them, keywords such as "complaint" and "responsibility" can be considered to express the content of the text more accurately than the words that appear in it, so keywords better suited to the goal of "making the content of the text easier to grasp" were obtained.
[0044]
Next, an example of a document processing apparatus that extracts a keyword using the above-described embodiment and registers the keyword together with a document will be described.
[0045]
FIG. 8 shows a document processing apparatus 20 that registers a document while extracting a keyword of the document and enables a keyword search of the registered document. The document processing device 20 basically includes the keyword extraction device 10 of FIG. 1 as a main component, and a document database 21, a document registration unit 22, and a document search unit 23 are added to this. In FIG. 8, portions corresponding to those in FIG. 1 are denoted by corresponding reference numerals. In this example, the keyword output based on the associative recall feature vector extracted by the recall information extracting unit 14 is registered in the document database 21 in association with the document, and the document is searched for the keyword using the document searching unit 23.
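The registration and search flow of the document processing device can be illustrated with a small in-memory store. This is a hedged sketch only: the class name `DocumentStore`, its methods, and the example document identifiers and keywords are hypothetical, the keywords are assumed to have been produced beforehand by the keyword extraction step, and a real document database would replace the dictionaries used here.

```python
from collections import defaultdict


class DocumentStore:
    """Toy stand-in for the document database, registration unit, and search unit."""

    def __init__(self):
        self._docs = {}                    # document id -> document text
        self._index = defaultdict(set)     # keyword -> ids of documents tagged with it

    def register(self, doc_id, text, keywords):
        """Store a document together with the keywords extracted for it."""
        self._docs[doc_id] = text
        for keyword in keywords:
            self._index[keyword].add(doc_id)

    def search(self, keyword):
        """Return the ids of documents registered with the given keyword."""
        return sorted(self._index.get(keyword, ()))

    def text(self, doc_id):
        return self._docs[doc_id]


store = DocumentStore()
# The keyword lists below are illustrative, not results from the experiment.
store.register("mail-1", "The cartridge gives an error ...", ["cartridge", "complaint"])
store.register("mail-2", "How do I order more ink?", ["ink", "cartridge"])
hits = store.search("cartridge")
```

Because recalled keywords may include words that never appear in the document text, a search for "complaint" in this sketch finds "mail-1" even though the word is absent from its body.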
[0046]
This concludes the description of the embodiment of the present invention.
[0047]
It should be noted that the present invention is not limited to the above-described embodiment, and various changes can be made without departing from its gist. For example, the embodiment uses a suppression input that is proportional to, or increases monotonically with, the firing rate of the entire neural network, but the nodes considered may instead be limited in advance to a predetermined range, with the firing rate within that range used as the basis. For non-binary nodes, the sum of the activation values may be used in place of the firing rate. The range of nodes to be considered may be determined based on, for example, the frequency of the words corresponding to the nodes. Depending on the situation, associative recall may also be performed without applying the suppression input, outputting the convergence result of the associative recall feature vector.
[0048]
[Effects of the Invention]
As described above, according to the present invention, associative recall can be performed appropriately even for biased data such as document patterns, and the calculation cost can be kept low.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of a keyword extraction device according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing the configuration of the recall information extracting unit 14 of the embodiment of FIG. 1.
FIG. 3 is a flowchart illustrating the operation of the recall information extracting unit 14 of the embodiment of FIG. 1.
FIG. 4 is a diagram illustrating an experimental example.
FIG. 5 is a diagram illustrating an experimental example.
FIG. 6 is a diagram illustrating an experimental example.
FIG. 7 is a diagram illustrating an experimental example.
FIG. 8 is a block diagram showing a configuration example of a document processing device that uses the embodiment of FIG. 1 to register documents together with keywords.
[Explanation of symbols]
10 Keyword extraction device
11 User interface unit
12 Syntax analysis unit
13 Inter-word correlation learning unit
14 Recall information extracting unit
15 Database
20 Document processing device
21 Document database
22 Document registration unit
23 Document search unit
141 Input feature vector storage unit
142 Associative recall feature vector storage unit
143 Associative recall feature vector updating unit
144 Convergence determination unit
145 Associative recall feature vector output unit

Claims

An associative storage device using a neural network in which a plurality of nodes are connected by links and the activity values of the nodes are propagated based on the weights of the links to perform association, characterized in that a suppression input proportional to the firing rate of the entire neural network is input to the entire network.

An associative storage device using a neural network in which a plurality of nodes are connected by links and the activity values of the nodes are propagated based on the weights of the links to perform association, characterized in that a suppression input that increases monotonically with the firing rate of at least a part of the neural network is input to at least a part of the network.

In an associative storage device using a neural network in which a plurality of nodes are connected by a link and an activity value of the node is propagated based on the weight of the link to perform association,
Means for applying an input to a corresponding node based on the associative memory input;
Means for repeatedly selecting a node after the associative memory input is applied, inputting to the selected node the weighted activity values from the nodes connected to it and a suppression input that increases monotonically with the sum of the activity values of the entire neural network, and updating the activity value of the selected node;
Means for determining convergence of activity values of nodes in the network;
Means for outputting an associative memory output based on the activity value of the node after the convergence of the activity value of the node in the network is determined.

In an associative storage device using a neural network in which a plurality of nodes are connected by a link and an activity value of the node is propagated based on the weight of the link to perform association,
Means for applying an input to a corresponding node based on the associative memory input;
Means for repeatedly selecting a node after the associative memory input is applied, inputting to the selected node the weighted activity values from the nodes connected to it, and updating the activity value of the selected node;
Means for determining convergence of activity values of nodes in the network;
Means for outputting an associative memory output based on the activity value of the node after the convergence of the activity value of the node in the network is determined.

5. A document analysis device using an associative storage device according to claim 1, wherein a word is assigned to each node, an input is applied to the nodes corresponding to words in a document to be analyzed, and the words corresponding to nodes that have fired are output as an analysis result.

4. The document analysis device according to claim 3, wherein a word output as an analysis result is registered as a keyword of the document to be analyzed.

An associative memory method using a neural network in which a plurality of nodes are connected by links and the activity values of the nodes are propagated based on the weights of the links to perform association, characterized in that a suppression input proportional to the firing rate of the entire neural network is input to the entire network.

In an associative memory method using a neural network in which a plurality of nodes are connected by a link and an activity value of the node is propagated based on the weight of the link to perform association,
Applying an input to a corresponding node based on the associative memory input;
After the associative memory input is applied, repeatedly selecting a node, inputting to the selected node the weighted activity values from the nodes connected to it and a suppression input that increases monotonically with the sum of the activity values of the entire neural network, and updating the activity value of the selected node;
Determining the convergence of the activity values of the nodes in the network;
Outputting an associative memory output based on the activity value of the node after the convergence of the activity value of the node in the network is determined.