JP2014056516A

JP2014056516A - Device, method and program for extracting knowledge structure out of document set

Info

Publication number: JP2014056516A
Application number: JP2012202037A
Authority: JP
Inventors: Yasudai Tanaka; 靖大田中
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Priority date: 2012-09-13
Filing date: 2012-09-13
Publication date: 2014-03-27
Anticipated expiration: 2032-09-13
Also published as: JP5700007B2

Abstract

PROBLEM TO BE SOLVED: To provide a system in which, on the basis of the relationships between keywords extracted from a document set, closely related extracted keywords can be flexibly placed close to each other. SOLUTION: A contribution degree for discriminating a document from other documents is calculated for each keyword contained in a document that has been subjected to field classification; the document is classified into a unit on the basis of the calculated contribution degree by using a self-organizing map, and displayed; thereafter, arrangement information of the keyword is calculated from the appearance frequency of the keyword contained in the documents in the unit, and the keyword is displayed in accordance with the unit.

Description

The present invention relates to an information processing apparatus, and control thereof, for extracting a knowledge structure from a set of electronic documents.

In recent years, a growing number of companies have been working to make use of big data. The aim is to analyze the enormous volumes of data that until now were merely stored as records, uncover hidden insights, and apply them to corporate activities.

At present, data mining, which analyzes structured data such as sales data, is the mainstream approach. However, text mining, which uses natural language processing to analyze unstructured data such as business documents, is also attracting growing attention.

Text mining includes simple functions that, for search conditions entered by a user, tally the frequencies of keywords and co-occurrences in documents or sentences and present them as tables or graphs. It also includes functions such as keyword maps, which summarize characteristic keywords and their relationships in a single diagram in order to convey an overview of a document set.

For example, a technique has been disclosed that displays the relationships among related terms, including the search term, on a map, places terms highly relevant to the search term near it, and renders highly relevant terms in a larger font so that the user can see at a glance which terms are strongly related (see, for example, Patent Document 1).

Another disclosed technique, for document search, forms classes using not only the appearance frequency of words in the documents that contain the search keyword but also the appearance frequency of those words across the entire document collection. It identifies characteristic words for each class and uses measures such as the co-occurrence strength between characteristic words to display the relationships among them as a graph, enabling a balanced document search (see, for example, Patent Document 2).

Patent Document 1: JP 2008-250625 A
Patent Document 2: JP-A-10-74210

However, in the system of Patent Document 1, the extracted keywords are placed according to their degree of relevance to the search keyword, so the relationships among the extracted keywords themselves cannot be expressed.

In the system of Patent Document 2, links are generated from the co-occurrence relationships of the extracted keywords, so the relationships among the extracted keywords can be expressed.

However, when each extracted keyword has only one link to a parent, as in the embodiment of that document, keywords at the ends of links may be attached to different links even if they are closely related. Conversely, if two or more links were allowed, producing a practical set of links and a practical layout would likely become difficult.

Accordingly, an object of the present invention is to provide a mechanism that, based on the relationships between keywords extracted from a document set, can flexibly place closely related extracted keywords near one another.

A first invention for achieving the above object is a knowledge structure extraction device comprising: field classification means for classifying a document into fields based on the content of the document; contribution degree calculation means for calculating, for each keyword contained in the document, a contribution degree for discriminating the document from other documents; unit classification means for classifying the document into units using a self-organizing map, based on the contribution degrees calculated by the contribution degree calculation means; display means for displaying the units classified by the unit classification means; cluster generation means for generating clusters of the units using hierarchical clustering, based on the units classified by the unit classification means; arrangement information calculation means for calculating arrangement information of the keyword on the display means from a unit identified from the appearance frequency of the keyword contained in the documents in the units; and link generation display means for determining the degree of match between the identified unit and the clusters, generating a link from the relationship between the cluster to which the identified unit belongs and other clusters, and displaying the link on the display means.

A second invention for achieving the above object is a knowledge structure extraction method by which a knowledge structure extraction device extracts related keywords from documents and displays them, the method comprising: a field classification step in which field classification means of the knowledge structure extraction device classifies a document into fields based on the content of the document; a contribution degree calculation step in which contribution degree calculation means of the knowledge structure extraction device calculates, for each keyword contained in the document, a contribution degree for discriminating the document from other documents; a unit classification step in which unit classification means of the knowledge structure extraction device classifies the document into units using a self-organizing map, based on the contribution degrees calculated in the contribution degree calculation step; a display step in which display means of the knowledge structure extraction device displays the units classified in the unit classification step; a cluster generation step in which cluster generation means of the knowledge structure extraction device generates clusters of the units using hierarchical clustering, based on the units classified in the unit classification step; an arrangement information calculation step in which arrangement information calculation means of the knowledge structure extraction device calculates arrangement information of the keyword for the display step from a unit identified from the appearance frequency of the keyword contained in the documents in the units; and a link generation display step in which link generation means of the knowledge structure extraction device determines the degree of match between the identified unit and the clusters, generates a link from the relationship between the cluster to which the identified unit belongs and other clusters, and displays the link.

A third invention for achieving the above object is a program executed in a knowledge structure extraction device that extracts related keywords from documents and displays them, the program causing the knowledge structure extraction device to function as: field classification means for classifying a document into fields based on the content of the document; contribution degree calculation means for calculating, for each keyword contained in the document, a contribution degree for discriminating the document from other documents; unit classification means for classifying the document into units using a self-organizing map, based on the contribution degrees calculated by the contribution degree calculation means; display means for displaying the units classified by the unit classification means; cluster generation means for generating clusters of the units using hierarchical clustering, based on the units classified by the unit classification means; arrangement information calculation means for calculating arrangement information of the keyword on the display means from a unit identified from the appearance frequency of the keyword contained in the documents in the units; and link generation display means for determining the degree of match between the identified unit and the clusters, generating a link from the relationship between the cluster to which the identified unit belongs and other clusters, and displaying the link on the display means.

According to the present invention, characteristic keywords can be extracted from a specified document set, and closely related extracted keywords can be placed near one another based on the relationships among them, making it possible to create a diagram that is more intuitive and easier to understand.

FIG. 1 is a system configuration diagram showing an example of the configuration of the knowledge structure extraction system of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of an information processing apparatus applicable to the knowledge structure extraction device and the user terminal of the present invention.
FIG. 3 is a flowchart showing an example of the knowledge structure extraction process in the knowledge structure extraction device of the present invention.
FIG. 4 is a flowchart showing an example of the document collection process in the knowledge structure extraction process of the present invention.
FIG. 5 is a diagram showing an example of the URL history acquired from the user terminal in the present invention.
FIG. 6 is a flowchart showing an example of the field analysis process in the knowledge structure extraction process of the present invention.
FIG. 7 is a diagram showing an outline of the weighted field information acquisition process in the field analysis process of the present invention.
FIG. 8 is a flowchart showing an example of the keyword extraction process in the knowledge structure extraction process of the present invention.
FIG. 9 is a diagram showing an example of the document information table in the document information storage area of the present invention.
FIG. 10 is a flowchart showing an example of the document arrangement process in the knowledge structure extraction process of the present invention.
FIG. 11 is a diagram showing an example of the self-organizing map used in the document arrangement process of the present invention and its display positions.
FIG. 12 is a diagram showing an example of the unit information of the present invention.
FIG. 13 is a diagram showing an outline of the clustering result for the self-organizing map units in the document arrangement process of the present invention.
FIG. 14 is a diagram showing an example of the clustering table of the self-organizing map units generated in the document arrangement process of the present invention.
FIG. 15 is a diagram showing an example of the document information table in the document information storage area of the present invention.
FIG. 16 is a flowchart showing an example of the keyword arrangement process in the knowledge structure extraction process of the present invention.
FIG. 17 is a diagram showing an example of the keyword arrangement table generated in the keyword arrangement process of the present invention.
FIG. 18 is a diagram showing an example of keywords placed on the self-organizing map based on the keyword arrangement information generated in the keyword arrangement process of the present invention.
FIG. 19 is a flowchart showing an example of the link generation process in the knowledge structure extraction process of the present invention.
FIG. 20 is a diagram showing an example of links attached to the keywords placed on the self-organizing map, based on the link information generated in the link generation process of the present invention.
FIG. 21 is a diagram showing an outline of the weighted field information acquisition process in the field analysis process of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a system configuration diagram showing an example of the configuration of the knowledge structure extraction system of the present invention.

In FIG. 1, a knowledge structure extraction device 100 and one or more user terminals 130 are connected via a local area network. Each user terminal 130 can also connect to an external network 140.

The knowledge structure extraction device 100 retrieves and analyzes the content of the web pages listed in the web page browsing history obtained from a user terminal 130, creates a diagram outlining the browsed content, and returns it to the user terminal 130.

The user terminal 130 sends the browsing history of the web pages viewed via the external network 140 to the knowledge structure extraction device 100 and obtains the diagram, generated by the knowledge structure extraction device 100, that outlines the browsed content.

In the knowledge structure extraction system of this embodiment, the documents obtained from the user terminal 130 may be something other than web browsing history.

The hardware configuration of an information processing apparatus applicable to the knowledge structure extraction device 100 and the user terminal 130 shown in FIG. 1 is described below with reference to FIG. 2.

FIG. 2 is a block diagram showing the hardware configuration of an information processing apparatus applicable to the knowledge structure extraction device 100 and the user terminal 130 shown in FIG. 1.

In FIG. 2, reference numeral 201 denotes a CPU, which comprehensively controls the devices and controllers connected to a system bus 204. A ROM 202 or an external memory 211 stores the BIOS (Basic Input/Output System), which is a control program for the CPU 201, an operating system program (hereinafter, OS), and the various programs, described later, that are needed to implement the functions executed by each server or PC.

Reference numeral 203 denotes a RAM, which functions as the main memory, work area, and so on of the CPU 201. The CPU 201 implements various operations by loading the programs and other data needed for processing from the ROM 202 or the external memory 211 into the RAM 203 and executing the loaded programs.

Reference numeral 205 denotes an input controller, which controls input from a keyboard (KB) 209 and from a pointing device such as a mouse (not shown). Reference numeral 206 denotes a video controller, which controls display on a display device such as a CRT display (CRT) 210. Although FIG. 2 shows a CRT 210, the display device is not limited to a CRT and may be another display device such as a liquid crystal display. These are used by an administrator as needed.

Reference numeral 207 denotes a memory controller, which controls access to the external memory 211, such as an external storage device (hard disk (HD)) that stores a boot program, various applications, font data, user files, edit files, and various data, a flexible disk (FD), or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

Reference numeral 208 denotes a communication I/F controller, which connects to and communicates with external devices via a network (for example, the LAN 400 shown in FIG. 1) and executes communication control processing on the network. For example, communication using TCP/IP is possible.

The CPU 201 enables display on the CRT 210 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also allows the user to give instructions with a mouse cursor (not shown) or the like on the CRT 210.

The various programs, described later, for realizing the present invention are recorded in the external memory 211, loaded into the RAM 203 as needed, and executed by the CPU 201. The definition files and various information tables used when the programs run are also stored in the external memory 211; these are described in detail later.

The overall flow of the knowledge structure extraction system in this embodiment is described below.

In response to a user instruction, the user terminal 130 transmits all or part of the web page browsing history for a specific period to the knowledge structure extraction device 100. The system may be configured so that the user selects which browsing history entries to transmit.

On receiving the web page browsing history from the user terminal 130, the knowledge structure extraction device 100 analyzes the content of the web pages and returns the extracted knowledge structure in a format that the user terminal 130 can display.

On receiving the extracted knowledge structure from the knowledge structure extraction device 100, the user terminal 130 displays the received knowledge structure as a diagram in a browser 121.

In this embodiment, a self-organizing map is used to classify web pages and place them on a two-dimensional plane. Ward's method is used to cluster the units into which the self-organizing map classifies the pages.

References
(1) T. Kohonen, "The self-organizing map", Proceedings of the IEEE, vol. 78, no. 9, Sept. 1990
(2) Joe H. Ward, Jr., "Hierarchical Grouping to Optimize an Objective Function", Journal of the American Statistical Association, vol. 58, 1963
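
The two techniques named above can be illustrated with a minimal pure-Python sketch. It is not the implementation of this embodiment: the grid size, learning-rate and radius schedules, random initialization, and the function names `train_som`, `best_matching_unit`, and `ward_merges` are all assumptions made for illustration. The first part trains a small self-organizing map so that each input vector maps to a unit on a two-dimensional grid; the second part merges units hierarchically, at each step choosing the pair whose merge increases the within-cluster sum of squares the least (Ward's criterion).

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate of the unit whose weights are closest to vector."""
    return min(weights, key=lambda unit: sum((w - x) ** 2
                                             for w, x in zip(weights[unit], vector)))


def train_som(vectors, rows=3, cols=3, epochs=30, seed=0):
    """Train a small rectangular self-organizing map; returns {(row, col): weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs                # shrinks from 1 toward 0
        rate = 0.5 * decay                          # learning rate (assumed schedule)
        radius = 0.5 + max(rows, cols) / 2 * decay  # neighbourhood radius (assumed)
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                pull = rate * math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (vector[i] - w[i])  # move unit toward the input
    return weights


def ward_merges(points):
    """Agglomerative clustering with Ward's criterion.

    Returns the merge history as (members_a, members_b, cost) tuples.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[i], clusters[j]
                na, nb = len(ma), len(mb)
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merges.append((ma, mb, cost))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ma + mb, centroid))
    return merges
```

Under these assumptions, per-document feature vectors would be passed to `train_som`, `best_matching_unit` would give the unit for each document, and `ward_merges(list(weights.values()))` would yield a merge hierarchy over the units.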

The knowledge structure extraction process in the knowledge structure extraction system of this embodiment is described below with reference to FIG. 3.

In step S301, the document collection unit 101 selects, from the web page browsing history received from the user terminal 130, the web pages to be targeted for knowledge structure extraction, obtains the body text of those pages, and stores it in the document information storage area 102.

In step S302, the field analysis unit 103 analyzes the body text of the web pages stored in the document information storage area 102 and extracts field information 104.

In step S303, the keyword extraction unit 105 analyzes the body text of the web pages stored in the document information storage area 102 and extracts keyword information 106.

Steps S302 and S303 may be configured to run in parallel, or a single process may be configured to extract both the field information 104 and the keyword information 106.

In step S304, the document arrangement unit 107 applies a self-organizing map to the field information 104 to determine, for each web page, a position such that web pages with similar content are also close together on the two-dimensional plane, and generates the result as document arrangement information 108.

In step S305, the keyword arrangement unit 109 determines the position on the two-dimensional plane of each keyword in the keyword information 106 extracted in step S303, referring to the document arrangement information 108 generated in step S304, and generates the result as keyword arrangement information 110. It also identifies the cluster most closely related to each keyword from the documents to which the keyword belongs and the self-organizing map units to which those documents belong.

In step S306, the link generation unit 111 generates link information between keywords according to the hierarchical relationships among the clusters associated with the keywords in step S305. The various pieces of generated information, including the link information, are stored as knowledge structure information in the knowledge structure information storage area 112.

In step S307, the display/editing unit 113 generates, from the knowledge structure information stored in the knowledge structure information storage area 112, a diagram that summarizes the entire document set, such as the one shown in FIG. 20, and sends it to the user terminal.
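
The data flow through steps S301 to S307 can be sketched as follows. This is only an outline under stated assumptions: the stage names in the `stages` mapping, the function `extract_knowledge_structure`, and the dictionary used for the knowledge structure information are hypothetical, and each stage is supplied by the caller rather than implemented here. The sketch runs S302 and S303 concurrently, which is one of the configurations the description allows.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_knowledge_structure(history, stages):
    """Run S301-S307 in order; `stages` maps hypothetical stage names to callables."""
    documents = stages["collect"](history)                            # S301
    # S302 and S303 both read only the collected documents, so they can overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fields_future = pool.submit(stages["analyze_fields"], documents)      # S302
        keywords_future = pool.submit(stages["extract_keywords"], documents)  # S303
        fields = fields_future.result()
        keywords = keywords_future.result()
    doc_layout = stages["place_documents"](fields)                    # S304
    keyword_layout = stages["place_keywords"](keywords, doc_layout)   # S305
    links = stages["generate_links"](keyword_layout)                  # S306
    structure = {"documents": doc_layout,
                 "keywords": keyword_layout,
                 "links": links}
    return stages["render"](structure)                                # S307
```

Passing the stages in as callables keeps the sketch independent of how each unit is actually implemented.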

FIG. 4 shows the details of the document collection process S301.

In step S401, the document collection unit 101 receives the web page browsing history from the user terminal as a URL list. FIG. 5 shows an example of the URL list.

In step S402, the document collection unit 101 starts a loop over the received URL list that runs through step S407.

In step S403, the document collection unit 101 determines whether the URL being processed is a processing target. If the URL is a processing target, the process proceeds to step S404. If it is not, the process proceeds to step S407.

Whether a URL is a processing target may be decided by any condition, for example, limiting targets to pages viewed within the last day, limiting them to a specific user, or excluding specific URLs that are viewed routinely.
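
The kinds of conditions mentioned for step S403 could be combined in a predicate like the one below. This is a minimal sketch, not the embodiment's logic: the `HistoryEntry` fields, the function `is_target`, and its parameters are hypothetical names, and the one-day window is just the example condition from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HistoryEntry:
    """One row of a browsing history (field names are assumptions)."""
    url: str
    user: str
    accessed_at: datetime


def is_target(entry, now, user=None, max_age=None, excluded_urls=frozenset()):
    """Return True if the entry passes every condition that is enabled."""
    if user is not None and entry.user != user:
        return False                      # only a specific user is targeted
    if max_age is not None and now - entry.accessed_at > max_age:
        return False                      # viewed too long ago
    if entry.url in excluded_urls:
        return False                      # routinely viewed URL to be skipped
    return True
```

For example, `is_target(entry, now, user="nagai", max_age=timedelta(days=1))` would accept only pages that user "nagai" viewed within the last day.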

In step S404, the document collection unit 101 retrieves the document indicated by the URL being processed via the network.

In step S405, the document collection unit 101 extracts the body text from the retrieved document. Processing to remove unneeded parts such as advertisements may also be performed.

In step S406, the document collection unit 101 stores the body text, together with the URL information, in the document information table 901 in the document information storage area 102. FIG. 9 shows an example of the document information table 901.

In step S407, if there is a next URL, the document collection unit 101 repeats the processing from step S402. If there is no next URL, the process ends.
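
The loop of steps S402 to S407 can be sketched as below. Several things are assumptions for illustration: the names `BodyTextExtractor`, `extract_body_text`, and `collect_documents`, the dictionary layout of a table row, and the choice to treat everything outside `script` and `style` elements as body text, which is far cruder than real advertisement removal. Network access is passed in as a `fetch` callable so that the sketch does not prescribe how documents are retrieved.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text outside <script> and <style> elements."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def collect_documents(url_entries, is_target, fetch):
    """Build document-information rows for the entries that are processing targets."""
    table = []
    for entry in url_entries:                  # S402: loop over the URL list
        if not is_target(entry):               # S403: skip non-target URLs
            continue                           # S407: move on to the next URL
        html = fetch(entry["url"])             # S404: retrieve the document
        text = extract_body_text(html)         # S405: extract the body text
        table.append({"url": entry["url"],     # S406: store text with URL info;
                      "text": text,            # field and keyword columns are
                      "fields": None,          # filled in by later steps
                      "keywords": None})
    return table
```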

A specific example of the document collection process S301 is described below.

In step S401, the document collection unit 101 receives the web page browsing history shown in FIG. 5 from the user terminal as a URL list.

In step S402, the document collection unit 101 starts the loop that runs through step S407 for URL 501.

In step S403, the document collection unit 101 determines whether URL 501 is a processing target. Here, URLs accessed by the user "nagai" are treated as processing targets. Since the user of URL 501 is "sakai", the process proceeds to step S407.

In step S407, since there is a next URL 502, the document collection unit 101 returns to step S402.

The document collection unit 101 then performs the same processing for URL 502 and the subsequent URLs, up to the one just before URL 503.

In step S402, the document collection unit 101 starts the loop that runs through step S407 for URL 503.

In step S403, since the user of URL 503 is "nagai", the document collection unit 101 proceeds to step S404.

In step S404, the document collection unit 101 retrieves the document indicated by URL 503 via the network.

In step S406, the document collection unit 101 stores the body text extracted from the retrieved document, together with the URL information, as document information 907 in the document information table 901 shown in FIG. 9. At this point, the field column 905 and the keyword column 906 of the document information 907 are empty.

In step S407, since there is a next URL 504, the document collection unit 101 repeats the processing from step S402.

The same processing is repeated thereafter.

図６では分野分析処理Ｓ３０２の詳細について記載する。 FIG. 6 describes details of the field analysis processing S302.

ステップＳ６０１において分野分析部１０３は、文書情報保存領域１０２の文書情報テーブル９０１に保存された文書情報に対して、ステップＳ６０４までの繰り返し処理を開始する。 In step S 601, the field analysis unit 103 starts repetitive processing up to step S 604 on the document information stored in the document information table 901 in the document information storage area 102.

ステップＳ６０２において分野分析部１０３は、文書情報のテキストを解析して、文書が属する分野を特定する。本発明の分野分類では、文書が属する分野を１つに決めるのではなく、図７に示すように、複数の、特定した分野と分野に属する度合いの組として表現する。以降、文書に対する複数の分野と度合いの組を、重み付き分野情報と呼ぶ。文書の分野分類は、単純ベイズ分類器などの既存の方法を用いて実現することが可能である。 In step S602, the field analysis unit 103 analyzes the text of the document information and identifies the fields to which the document belongs. In the field classification of the present invention, a document is not assigned to a single field; instead, as shown in FIG. 7, it is represented as a set of pairs, each consisting of an identified field and the degree to which the document belongs to that field. Hereinafter, this set of field and degree pairs for a document is referred to as weighted field information. Document field classification can be implemented using existing methods such as a naive Bayes classifier.

単純ベイズ分類器では、文書を構成する単語がある分野に属する文書群において出現する頻度から、その文書が文書群に属する確率を求めることができる。この確率に基づき、分野に属する度合いを数値化し、重み付き分野情報として取得する。重み付き分野情報の取得は、単純ベイズ以外の方法で行ってもよい。 In the naive Bayes classifier, the probability that a document belongs to the document group can be obtained from the frequency of occurrence of the word constituting the document in the document group belonging to a certain field. Based on this probability, the degree belonging to the field is quantified and acquired as weighted field information. The weighted field information may be acquired by a method other than naive Bayes.
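As one possible reading of the paragraph above, the following sketch derives weighted field information from a small multinomial naive Bayes model. The source does not specify the smoothing, the normalization, or the training data, so Laplace smoothing, normalization of the posteriors to sum to 1, and the toy corpus are all assumptions of this example, as are the function names.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (field, list_of_words). Returns the model statistics."""
    word_counts = {}           # field -> Counter of word frequencies in that field
    doc_counts = Counter()     # field -> number of training documents
    vocab = set()
    for field, words in labeled_docs:
        word_counts.setdefault(field, Counter()).update(words)
        doc_counts[field] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def weighted_fields(words, model):
    """Return {field: degree}, the normalized posterior probability of each field."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for field, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[field] / total_docs)      # log prior
        for w in words:
            # Laplace smoothing keeps unseen words from zeroing the probability
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        log_scores[field] = score
    # Normalize in a numerically stable way so that the degrees sum to 1
    peak = max(log_scores.values())
    exp_scores = {f: math.exp(s - peak) for f, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {f: v / norm for f, v in exp_scores.items()}


# Toy training data, invented for illustration only.
model = train_naive_bayes([
    ("camera", ["lens", "sensor", "lens"]),
    ("printer", ["ink", "paper", "toner"]),
])
info = weighted_fields(["lens", "sensor"], model)   # camera receives the larger degree
```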

ステップＳ６０３において分野分析部１０３は、ステップＳ６０２で取得した重み付き分野情報を文書情報保存領域１０２の文書情報テーブル９０１に追加する。 In step S603, the field analysis unit 103 adds the weighted field information acquired in step S602 to the document information table 901 in the document information storage area 102.

ステップＳ６０４において分野分析部１０３は、次の文書情報がある場合、ステップＳ６０１からの処理を実施する。次の文書情報がない場合、処理を終了する。 In step S604, the field analysis unit 103 performs the processing from step S601 when there is next document information. If there is no next document information, the process ends.

分野分析処理Ｓ３０２の具体例について記載する。 A specific example of the field analysis process S302 will be described.

ステップＳ６０１において分野分析部１０３は、文書情報テーブル９０１に保存された文書情報９０７に対して、ステップＳ６０４までの繰り返し処理を開始する。 In step S601, the field analysis unit 103 starts repetitive processing up to step S604 for the document information 907 stored in the document information table 901.

ステップＳ６０２において分野分析部１０３は、文書情報９０７のテキストカラム９０４のテキスト７０１を解析して、重み付き分野情報７０２を取得する。 In step S602, the field analysis unit 103 analyzes the text 701 in the text column 904 of the document information 907, and obtains weighted field information 702.

ステップＳ６０３において分野分析部１０３は、ステップＳ６０２で取得した重み付き分野情報７０２を文書情報テーブル９０１の文書情報９０７の分野カラム９０５に追加する。 In step S603, the field analysis unit 103 adds the weighted field information 702 acquired in step S602 to the field column 905 of the document information 907 in the document information table 901.

ステップＳ６０４において分野分析部１０３は、次の文書情報９０８があるので、ステップＳ６０１からの処理を実施する。 In step S604, the field analysis unit 103 performs the processing from step S601 because there is the next document information 908.

以下同様の処理を繰り返す。 Thereafter, the same processing is repeated.

図８ではキーワード抽出処理Ｓ３０３の詳細について記載する。 FIG. 8 describes details of the keyword extraction process S303.

ステップＳ８０１においてキーワード抽出部１０５は、文書情報保存領域１０２の文書情報テーブル９０１に保存された文書情報に対して、ステップＳ８０４までの繰り返し処理を開始する。 In step S801, the keyword extraction unit 105 starts the iterative process up to step S804 on the document information stored in the document information table 901 in the document information storage area 102.

ステップＳ８０２において、キーワード抽出部１０５は、文書情報のテキストを解析して、文書に含まれるキーワードを抽出する。 In step S802, the keyword extraction unit 105 analyzes the text of the document information and extracts keywords included in the document.

本発明のキーワード抽出では、文書を構成するキーワードごとに他の文書と弁別するのに寄与する度合いを数値化し、弁別に寄与する度合いの高いものを、度合いの数値とともにキーワードとして抽出する。以降、キーワードと弁別に寄与する度合いの組を、重み付きキーワード情報と呼ぶ。 In the keyword extraction of the present invention, the degree to which each keyword in a document contributes to discriminating that document from other documents is quantified, and keywords with a high degree of contribution are extracted together with that numerical value. Hereinafter, a pair consisting of a keyword and its degree of contribution to discrimination is referred to as weighted keyword information.

キーワードの文書弁別に寄与する度合いはtf・idf値を用いることができる。キーワードの文書弁別に寄与する度合いはtf・idf値以外の値を用いてもよい。tf・idf値を求めるためには、文書集合におけるキーワードの出現頻度などの統計情報が必要となる。 The tf-idf value can be used as the degree to which a keyword contributes to document discrimination; a value other than tf-idf may also be used. Computing the tf-idf value requires statistical information such as the frequency of occurrence of keywords in a document set.

文書集合としては分析の対象とした文書全体とする場合が多いが、本発明においては、各種文書を集めた大規模な文書集合から予め抽出した統計情報を用いる。これは分析対象全体を表現し、分析対象全体でよく出現するキーワードが抽出対象外となるのを防ぐためである。 The document set is often taken to be the entire set of documents under analysis, but the present invention uses statistical information extracted in advance from a large-scale document set that collects documents of many kinds. This prevents keywords that characterize the analysis target as a whole, and therefore appear frequently throughout it, from being excluded from extraction.
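The effect of taking the statistics from a separate background corpus can be illustrated with a short sketch. The exact tf-idf variant is not given in the source; the add-one smoothed idf below, the function name `weighted_keywords`, and the document-frequency figures are assumptions chosen for the example.

```python
import math
from collections import Counter


def weighted_keywords(words, background_df, background_size, top_n=3):
    """Return [(keyword, tf-idf)] using document frequencies from a background corpus."""
    tf = Counter(words)
    scored = []
    for word, count in tf.items():
        df = background_df.get(word, 0)
        # Add 1 to numerator and denominator so unseen words do not divide by zero
        idf = math.log((background_size + 1) / (df + 1))
        scored.append((word, (count / len(words)) * idf))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical background statistics: document frequency of each word
# in a general-purpose corpus of 1000 documents.
background_df = {"the": 1000, "camera": 50, "aspheric": 2}
result = weighted_keywords(["the", "camera", "aspheric", "the"], background_df, 1000)
```

Because the idf of "camera" comes from the background corpus, the word keeps a positive weight even if it occurs in every document of the analyzed set, which is the situation the paragraph above is guarding against.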

ステップＳ８０３においてキーワード抽出部１０５は、ステップＳ８０２で取得した重み付きキーワード情報を文書情報保存領域１０２の文書情報テーブル９０１に追加する。 In step S 803, the keyword extraction unit 105 adds the weighted keyword information acquired in step S 802 to the document information table 901 in the document information storage area 102.

ステップＳ８０４においてキーワード抽出部１０５は、次の文書情報がある場合、ステップＳ８０１からの処理を実施する。次の文書情報がない場合、処理を終了する。 In step S804, the keyword extraction unit 105 performs the processing from step S801 when there is next document information. If there is no next document information, the process ends.

キーワード抽出処理Ｓ３０３の具体例について記載する。 A specific example of the keyword extraction process S303 will be described.

ステップＳ８０１においてキーワード抽出部１０５は、文書情報テーブル９０１に保存された文書情報９０７に対して、ステップＳ８０４までの繰り返し処理を開始する。 In step S801, the keyword extraction unit 105 starts the iterative process up to step S804 for the document information 907 stored in the document information table 901.

ステップＳ８０２において、キーワード抽出部１０５は、文書情報９０７のテキストカラム９０４のテキスト７０１を解析して、重み付きキーワード情報２１０１（図２１参照）を取得する。 In step S802, the keyword extraction unit 105 analyzes the text 701 in the text column 904 of the document information 907, and acquires weighted keyword information 2101 (see FIG. 21).

ステップＳ８０３においてキーワード抽出部１０５は、ステップＳ８０２で取得した重み付きキーワード情報２１０１を文書情報テーブル９０１の文書情報９０７のキーワードカラム９０６に追加する。 In step S803, the keyword extraction unit 105 adds the weighted keyword information 2101 acquired in step S802 to the keyword column 906 of the document information 907 in the document information table 901.

ステップＳ８０４においてキーワード抽出部１０５は、次の文書情報９０８があるので、ステップＳ８０１からの処理を実施する。 In step S804, the keyword extraction unit 105 performs the processing from step S801 because there is the next document information 908.

図１０では文書配置処理Ｓ３０４の詳細について記載する。 FIG. 10 describes details of the document arrangement processing S304.

本実施形態の文書配置処理においては、重み付き分野情報をベクトルと見做し、自己組織化マップを適用することで二次元平面上に文書を配置する。 In the document placement process of this embodiment, the weighted field information is regarded as a vector, and the document is placed on a two-dimensional plane by applying a self-organizing map.

ステップＳ１００１において文書配置部１０７は、文書情報保存領域１０２の文書情報テーブル９０１に保存された文書情報に対して自己組織化マップの学習を行う。自己組織化マップの学習および分類、ウォード法によるクラスタリングに必要な、重み付き分野（ベクトル）どうしの距離の算出式の一例を図２４に示す。これ以外の方法により距離を定義してもよい。 In step S1001, the document placement unit 107 trains a self-organizing map on the document information stored in the document information table 901 in the document information storage area 102. FIG. 24 shows an example of a formula for calculating the distance between weighted fields (vectors), which is needed for training and classification with the self-organizing map and for clustering by Ward's method. The distance may be defined by other methods.
「式１」 "Formula 1"

図１１に可視化した自己組織化マップの一例を示す。図の六角形はユニットと呼ばれ、自己組織化マップは学習が終了すると、分類対象をいずれかのユニットに分類することができるようになる。図１１におけるユニット内の番号はユニットの識別番号であり、図１２に示すユニット情報テーブルに各ユニットの情報が格納される。 FIG. 11 shows an example of a visualized self-organizing map. Each hexagon in the figure is called a unit; once training is complete, the self-organizing map can classify each classification target into one of the units. The numbers inside the units in FIG. 11 are unit identification numbers, and information on each unit is stored in the unit information table shown in FIG. 12.

各ユニットは重み付きの分野カラム１２０７に分野情報を持つ。yカラム１２０３は左上を起点としてユニットの縦方向の位置を、xカラム１２０４は左上を起点としてユニットの横方向の位置を意味する。 Each unit has field information in a weighted field column 1207. The y column 1203 means the vertical position of the unit starting from the upper left, and the x column 1204 means the horizontal position of the unit starting from the upper left.
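For readers unfamiliar with self-organizing maps, the training step can be sketched roughly as below. Several simplifications are assumed and none of them comes from the source: a rectangular grid instead of the hexagonal layout of FIG. 11, Euclidean distance in place of Formula 1, row-by-row unit numbering starting at 1, and arbitrary decay schedules for the learning rate and neighborhood radius.

```python
import math
import random


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {unit_id: {"y", "x", "weights"}}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = {}
    uid = 1
    for y in range(1, rows + 1):            # positions are counted from the upper left
        for x in range(1, cols + 1):
            units[uid] = {"y": y, "x": x,
                          "weights": [rng.random() for _ in range(dim)]}
            uid += 1
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                              # learning rate decays over time
        radius = max(rows, cols) / 2.0 * frac + 0.5    # neighborhood shrinks over time
        for vec in vectors:
            # Best-matching unit: the unit whose weights are closest to the input
            bmu = min(units.values(), key=lambda u: math.dist(u["weights"], vec))
            for u in units.values():
                grid_d2 = (u["y"] - bmu["y"]) ** 2 + (u["x"] - bmu["x"]) ** 2
                influence = math.exp(-grid_d2 / (2 * radius ** 2))
                # Pull the unit's weights toward the input, more strongly near the BMU
                u["weights"] = [w + rate * influence * (v - w)
                                for w, v in zip(u["weights"], vec)]
    return units


# Two toy weighted-field vectors on a 2 x 3 map.
som = train_som([[1.0, 0.0], [0.0, 1.0]], rows=2, cols=3)
```

Each resulting entry plays the role of one row of the unit information table: an identification number, a (y, x) position, and a field vector.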

ステップＳ１００２において文書配置部１０７は、学習が終わった自己組織化マップのユニットに対して、ウォード法などの階層的クラスタリングを実施する。階層的クラスタリングにより各ユニットは重み付き分野情報が近い順にまとめられる。 In step S1002, the document placement unit 107 performs hierarchical clustering, such as Ward's method, on the units of the trained self-organizing map. Through hierarchical clustering, the units are merged in order of how close their weighted field information is.

図１３に階層的クラスタリングの結果の一例を示す。一番末端の番号はユニットの識別番号であり、途中の番号はクラスタの識別番号を示す。 FIG. 13 shows an example of the result of hierarchical clustering. The numbers at the leaves are unit identification numbers, and the numbers at intermediate nodes are cluster identification numbers.

ステップＳ１００３において文書配置部１０７は、クラスタの情報を保存する。 In step S1003, the document placement unit 107 stores cluster information.

図１４にクラスタ情報を保存するテーブルの一例を示す。 FIG. 14 shows an example of a table for storing cluster information.
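The hierarchical clustering of units can be sketched with a plain agglomerative loop that uses Ward's merge criterion. This is an illustrative, unoptimized version under stated assumptions: squared Euclidean distance between centroids stands in for Formula 1, and new cluster identification numbers simply continue after the largest unit number, which may differ from the numbering actually used in FIG. 13 and FIG. 14.

```python
def ward_clusters(unit_vectors):
    """unit_vectors: {unit_id: vector}. Returns {cluster_id: {"children", "units"}}."""
    # Each active node maps id -> (centroid, set of member unit ids)
    active = {uid: (list(vec), {uid}) for uid, vec in unit_vectors.items()}
    clusters = {}
    next_id = max(unit_vectors) + 1        # assumed numbering: continue after unit ids
    while len(active) > 1:
        best = None
        ids = sorted(active)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                (ca, ma), (cb, mb) = active[a], active[b]
                d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
                # Ward's criterion: increase in within-cluster variance if a and b merge
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ca, ma), (cb, mb) = active.pop(a), active.pop(b)
        n = len(ma) + len(mb)
        centroid = [(p * len(ma) + q * len(mb)) / n for p, q in zip(ca, cb)]
        active[next_id] = (centroid, ma | mb)
        clusters[next_id] = {"children": (a, b), "units": sorted(ma | mb)}
        next_id += 1
    return clusters


# Four toy units forming two obvious groups.
tree = ward_clusters({1: [0.0, 0.0], 2: [0.0, 1.0], 3: [10.0, 10.0], 4: [10.0, 11.0]})
```

Each entry of the returned dictionary corresponds to one intermediate node of the tree diagram: its two children and the set of units it contains. The last cluster created contains every unit and is the top of the hierarchy.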

ステップＳ１００４において文書配置部１０７は、文書情報保存領域１０２の文書情報テーブル９０１に保存された文書に対して、ステップＳ１００７までの繰り返し処理を開始する。 In step S1004, the document placement unit 107 starts the iterative process up to step S1007 for the document stored in the document information table 901 in the document information storage area 102.

ステップＳ１００５において文書配置部１０７は、処理中の文書情報が属するユニットを特定する。処理中の文書は、抽出した重み付き分野情報に最も近い重み付き分野情報を持つユニットに属すると判定される。 In step S1005, the document placement unit 107 identifies the unit to which the document information being processed belongs. The document being processed is determined to belong to the unit having the weighted field information closest to the extracted weighted field information.

ステップＳ１００６において文書配置部１０７は、ユニットの情報を文書情報保存領域１０２の文書情報テーブル９０１に保存する。 In step S 1006, the document placement unit 107 stores unit information in the document information table 901 in the document information storage area 102.

図１５にユニット情報を含んだ文書情報テーブル９０１の一例を示す。 FIG. 15 shows an example of a document information table 901 including unit information.

ステップＳ１００７において文書配置部１０７は、次の文書情報がある場合、ステップＳ１００４からの処理を実施する。次の文書情報がない場合、処理を終了する。 In step S1007, the document placement unit 107 performs the processing from step S1004 when there is next document information. If there is no next document information, the process ends.

文書配置処理Ｓ３０４の具体例について記載する。 A specific example of the document arrangement process S304 will be described.

ステップＳ１００１において文書配置部１０７は、文書情報テーブル９０１に保存された文書情報に対して自己組織化マップの学習を行い、ユニット情報テーブル１２０１を得る。 In step S1001, the document placement unit 107 learns a self-organizing map for the document information stored in the document information table 901, and obtains a unit information table 1201.

ステップＳ１００２において文書配置部１０７は、ユニット情報テーブル１２０１に含まれる各ユニットに対し、分野カラム１２０７から求めたお互いの距離により階層的クラスタリングを実施して図１３の樹形図で示されるクラスタリング結果を得る。 In step S1002, the document placement unit 107 performs hierarchical clustering on the units included in the unit information table 1201, based on the mutual distances obtained from the field column 1207, and obtains the clustering result shown in the tree diagram of FIG. 13.

ステップＳ１００３において文書配置部１０７は、図１３の樹形図で示されたクラスタリング結果を保存して、クラスタ情報テーブル１４０１を得る。 In step S 1003, the document placement unit 107 stores the clustering result shown in the tree diagram of FIG. 13 to obtain the cluster information table 1401.

ステップＳ１００４において文書配置部１０７は、文書情報９０７に対して、ステップＳ１００７までの繰り返し処理を開始する。 In step S1004, the document placement unit 107 starts the iterative process up to step S1007 for the document information 907.

ステップＳ１００５において文書配置部１０７は、文書情報９０７の重み付き分野情報とユニット情報テーブル１２０１における各ユニットの分野カラム１２０７との距離を算出し、最も距離の小さいユニット情報１２０８を文書情報９０７が属するユニットとして特定する。 In step S1005, the document placement unit 107 calculates the distance between the weighted field information of the document information 907 and the field column 1207 of each unit in the unit information table 1201, and identifies the unit information 1208, which has the smallest distance, as the unit to which the document information 907 belongs.

ステップＳ１００６において文書配置部１０７は、図１５に示すように、文書情報テーブル９０１の文書情報９０７のuidカラム１５０１にユニット情報１４１１の識別番号を追加する。 In step S1006, the document placement unit 107 adds the identification number of the unit information 1411 to the uid column 1501 of the document information 907 in the document information table 901 as shown in FIG.

ステップＳ１００７において文書配置部１０７は、次の文書情報９０８があるのでステップＳ１００４からの処理を実施する。 In step S1007, the document placement unit 107 performs the processing from step S1004 because the next document information 908 exists.

以下、同様の処理を繰り返すことで、全ての文書の自己組織化マップ上での配置位置が定まる。 Thereafter, by repeating the same processing, the arrangement positions of all documents on the self-organizing map are determined.
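The per-document assignment in steps S1004 to S1007 amounts to a nearest-neighbor search over the units. A minimal sketch follows; it assumes Euclidean distance as a stand-in for Formula 1, and the dictionary layout, the field names, and the helper names `to_vector` and `assign_units` are invented for the example.

```python
import math


def to_vector(weighted_fields, field_order):
    """Turn {field: degree} into a vector with a fixed field order."""
    return [weighted_fields.get(f, 0.0) for f in field_order]


def assign_units(documents, units, field_order):
    """Set doc["uid"] to the unit whose field vector is closest to the document's."""
    for doc in documents:                                   # S1004: loop over documents
        vec = to_vector(doc["field"], field_order)
        doc["uid"] = min(                                   # S1005: nearest unit
            units,
            key=lambda uid: math.dist(to_vector(units[uid], field_order), vec),
        )                                                   # S1006: stored in the record
    return documents


# Hypothetical units and document; the degrees are made up.
fields = ["camera", "printer"]
units = {14: {"camera": 0.9, "printer": 0.1}, 15: {"camera": 0.2, "printer": 0.8}}
docs = assign_units([{"field": {"camera": 0.8, "printer": 0.2}}], units, fields)
```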

図１６ではキーワード配置処理Ｓ３０５の詳細について記載する。 FIG. 16 describes details of the keyword arrangement process S305.

ステップＳ１６０１においてキーワード配置部１０９は、文書情報保存領域１０２の文書情報テーブル９０１の文書情報に対して、ステップＳ１６０７までの繰り返し処理を開始する。 In step S1601, the keyword placement unit 109 starts the iterative process up to step S1607 for the document information in the document information table 901 in the document information storage area 102.

ステップＳ１６０２においてキーワード配置部１０９は、ステップＳ１６０１で取得された文書情報に含まれるキーワードに対し、ステップＳ１６０６までの繰り返し処理を開始する。 In step S1602, the keyword placement unit 109 starts the iterative process up to step S1606 for the keyword included in the document information acquired in step S1601.

ステップＳ１６０３においてキーワード配置部１０９は、処理中のキーワードがキーワード配置情報テーブル１７０１に登録されているか否かを判定する。キーワードがキーワード配置情報テーブル１７０１に登録済みであれば、ステップＳ１６０５に処理を移す。キーワードがキーワード配置情報テーブル１７０１に登録されていなければ、ステップＳ１６０４に処理を移す。 In step S1603, the keyword placement unit 109 determines whether the keyword being processed is registered in the keyword placement information table 1701. If the keyword has already been registered in the keyword arrangement information table 1701, the process proceeds to step S1605. If no keyword is registered in the keyword arrangement information table 1701, the process proceeds to step S1604.

ステップＳ１６０４においてキーワード配置部１０９は、処理中のキーワードをキーワード配置情報テーブル１７０１に登録する。 In step S1604, the keyword placement unit 109 registers the keyword being processed in the keyword placement information table 1701.

ステップＳ１６０５においてキーワード配置部１０９は、キーワード配置情報テーブル１７０１における処理中のキーワードに対し、処理中の文書情報が属するユニットの識別番号を追加する。 In step S1605, the keyword arrangement unit 109 adds the identification number of the unit to which the document information being processed belongs to the keyword being processed in the keyword arrangement information table 1701.

既にユニットの識別番号が登録されている場合は、出現頻度を１増やす。 If the unit identification number is already registered, the appearance frequency is increased by one.

ステップＳ１６０６においてキーワード配置部１０９は、次のキーワードがある場合、ステップＳ１６０２からの処理を実施する。次のキーワードがない場合、ステップＳ１６０７に処理を移す。 In step S1606, the keyword arrangement unit 109 performs the processing from step S1602 when there is a next keyword. If there is no next keyword, the process proceeds to step S1607.

ステップＳ１６０７においてキーワード配置部１０９は、次の文書情報がある場合、ステップＳ１６０１からの処理を実施する。次の文書情報がない場合、ステップＳ１６０８に処理を移す。 In step S1607, the keyword arrangement unit 109 performs the processing from step S1601 when there is next document information. If there is no next document information, the process proceeds to step S1608.

ステップＳ１６０８においてキーワード配置部１０９は、キーワード配置情報テーブル１７０１におけるキーワード配置情報に対し、ステップＳ１６１３までの繰り返し処理を開始する。 In step S1608, the keyword placement unit 109 starts the iterative process up to step S1613 for the keyword placement information in the keyword placement information table 1701.

ステップＳ１６０９においてキーワード配置部１０９は、処理中のキーワード配置情報のユニット識別番号と対応するユニット情報と出現頻度から位置を算出し、処理中のキーワードの位置情報とする。位置を算出する式の一例を式２に示すが、別の方法により算出してもよい。 In step S1609, the keyword placement unit 109 calculates a position from the unit information corresponding to the unit identification numbers of the keyword placement information being processed and from the appearance frequencies, and uses it as the position information of the keyword being processed. An example of a formula for calculating the position is shown as Formula 2, but the position may be calculated by another method.
「式２」 "Formula 2"

ステップＳ１６１０においてキーワード配置部１０９は、ステップＳ１６０９で算出したキーワード配置情報の位置情報に追加する。 In step S1610, the keyword placement unit 109 adds the position calculated in step S1609 to the position information of the keyword placement information.

ステップＳ１６１１においてキーワード配置部１０９は、処理中のキーワードを含むユニットの集合に対し、ステップＳ１００８において取得したクラスタ情報を参照し、最もユニットの集合が合致するクラスタを取得する。合致の度合いの判定には式３に示す式により算出する。 In step S1611, the keyword placement unit 109 refers to the cluster information acquired in step S1008 for the set of units containing the keyword being processed, and acquires the cluster whose unit set matches it best. The degree of match is calculated using Formula 3.
「式３」 "Formula 3"

ステップＳ１６１２においてキーワード配置部１０９は、ステップＳ１６１１において取得したクラスタ情報の識別番号をキーワード配置情報に追加する。 In step S1612, the keyword placement unit 109 adds the cluster information identification number acquired in step S1611 to the keyword placement information.

ステップＳ１６１３においてキーワード配置部１０９は、次のキーワード配置情報がある場合、ステップＳ１６０８からの処理を実施する。次のキーワード配置情報がない場合、処理を終了する。 In step S1613, when there is next keyword arrangement information, the keyword arrangement unit 109 performs processing from step S1608. If there is no next keyword arrangement information, the process ends.

キーワード配置処理Ｓ３０５の具体例について記載する。 A specific example of the keyword arrangement process S305 will be described.

ステップＳ１６０１においてキーワード配置部１０９は、文書情報テーブル９０１の文書情報９０７に対して、ステップＳ１６０７までの繰り返し処理を開始する。 In step S1601, the keyword placement unit 109 starts the iterative process up to step S1607 for the document information 907 in the document information table 901.

ステップＳ１６０２においてキーワード配置部１０９は、文書情報９０７に含まれるキーワード「高感度」に対し、ステップＳ１６０６までの繰り返し処理を開始する。 In step S1602, the keyword placement unit 109 starts the iterative process up to step S1606 for the keyword “high sensitivity” included in the document information 907.

ステップＳ１６０３においてキーワード配置部１０９は、処理中のキーワード「高感度」がキーワード配置情報テーブル１７０１に登録されていないので、ステップＳ１６０４に処理を移す。 In step S1603, the keyword placement unit 109 moves the process to step S1604 because the keyword “high sensitivity” being processed is not registered in the keyword placement information table 1701.

ステップＳ１６０４においてキーワード配置部１０９は、処理中のキーワード「高感度」をキーワード配置情報テーブル１７０１に登録する。 In step S 1604, the keyword placement unit 109 registers the keyword “high sensitivity” being processed in the keyword placement information table 1701.

ステップＳ１６０５においてキーワード配置部１０９は、キーワード配置情報テーブル１７０１における処理中のキーワード「高感度」に対し、文書情報９０７が属するユニットの識別番号「１４」を追加する。 In step S 1605, the keyword placement unit 109 adds the identification number “14” of the unit to which the document information 907 belongs to the keyword “high sensitivity” being processed in the keyword placement information table 1701.

ステップＳ１６０６においてキーワード配置部１０９は、次のキーワード「カメラ」があるので、ステップＳ１６０２からの処理を実施する。 In step S 1606, the keyword placement unit 109 performs the processing from step S 1602 because there is the next keyword “camera”.

以下、ステップＳ１６０２〜Ｓ１６０６までの処理を繰り返し、文書情報９０７のキーワードを全て処理して、ステップＳ１６０７に処理を移す。 Thereafter, the processes from step S1602 to S1606 are repeated, all the keywords of the document information 907 are processed, and the process proceeds to step S1607.

ステップＳ１６０７においてキーワード配置部１０９は、次の文書情報９０８があるので、ステップＳ１６０１からの処理を実施する。 In step S1607, the keyword placement unit 109 performs the processing from step S1601 because the next document information 908 is present.

以下、ステップＳ１６０１〜Ｓ１６０７までの処理を繰り返し、キーワード配置情報テーブル１７０１を得る。 Thereafter, the processing from step S1601 to S1607 is repeated to obtain the keyword arrangement information table 1701.
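The nested loop of steps S1601 to S1607 builds, for each keyword, a tally of the units in which it appears. A compact sketch is shown below; the use of a dictionary of counters in place of the keyword arrangement information table, and the record keys `uid` and `keywords`, are assumptions of this example. The English keyword strings are placeholders for the keywords in the walkthrough.

```python
from collections import Counter, defaultdict


def build_keyword_units(documents):
    """Return {keyword: Counter({unit_id: appearance frequency})}."""
    table = defaultdict(Counter)
    for doc in documents:                      # S1601: loop over documents
        for keyword in doc["keywords"]:        # S1602: loop over the document's keywords
            # S1603-S1605: register the keyword if new, then add the unit
            # or increase its appearance frequency by one
            table[keyword][doc["uid"]] += 1
    return table


# Hypothetical documents already assigned to units 14 and 15.
placement = build_keyword_units([
    {"uid": 14, "keywords": ["high sensitivity", "camera"]},
    {"uid": 14, "keywords": ["camera"]},
    {"uid": 15, "keywords": ["camera"]},
])
```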

ステップＳ１６０８においてキーワード配置部１０９は、キーワード配置情報テーブル１７０１におけるキーワード配置情報１７０９に対し、ステップＳ１６１３までの繰り返し処理を開始する。 In step S1608, the keyword placement unit 109 starts the iterative process up to step S1613 for the keyword placement information 1709 in the keyword placement information table 1701.

ステップＳ１６０９においてキーワード配置部１０９は、キーワード配置情報１７０９のユニット識別番号と対応するユニット情報と出現頻度から位置を算出する。ユニット情報テーブルから、ユニット識別番号7のユニットの位置は(1, 2)を得る。他のユニット識別番号についても同様に位置を取得して、式２の式よりキーワードの位置（3.15, 2.55）を算出する。 In step S1609, the keyword placement unit 109 calculates a position from the unit information corresponding to the unit identification numbers of the keyword placement information 1709 and from the appearance frequencies. From the unit information table, the position of the unit with unit identification number 7 is obtained as (1, 2). The positions of the other unit identification numbers are obtained in the same way, and the keyword position (3.15, 2.55) is calculated using Formula 2.
y = (1*2 + 1*2 + 2*3 + 4*3 + 5*3 + 2*3 + 1*4 + 3*4 + 1*4) / 20 = 3.15
x = (1*1 + 1*2 + 2*1 + 4*2 + 5*3 + 2*4 + 1*2 + 3*3 + 1*4) / 20 = 2.55
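The arithmetic above is consistent with a frequency-weighted average of unit positions, which can be checked with a few lines of code. Formula 2 itself is not reproduced in this text, so the weighted-average form is inferred from the worked numbers. The pairing of each term with a unit identification number (7, 8, 13, 14, 15, 16, 20, 21, 22, in that order) is also an assumption; under it, unit 7 sits at y = 2, x = 1, which suggests the "(1, 2)" quoted for unit 7 is written in (x, y) order.

```python
def keyword_position(unit_freqs, unit_positions):
    """Frequency-weighted average of unit positions; returns (y, x)."""
    total = sum(unit_freqs.values())
    y = sum(f * unit_positions[u][0] for u, f in unit_freqs.items()) / total
    x = sum(f * unit_positions[u][1] for u, f in unit_freqs.items()) / total
    return y, x


# (y, x) positions and appearance frequencies read off the two equations above;
# the assignment of terms to unit ids is assumed, not stated in the source.
positions = {7: (2, 1), 8: (2, 2), 13: (3, 1), 14: (3, 2), 15: (3, 3),
             16: (3, 4), 20: (4, 2), 21: (4, 3), 22: (4, 4)}
freqs = {7: 1, 8: 1, 13: 2, 14: 4, 15: 5, 16: 2, 20: 1, 21: 3, 22: 1}

y, x = keyword_position(freqs, positions)   # approximately (3.15, 2.55)
```

Units where the keyword appears more often pull the keyword's position toward themselves, so a keyword concentrated in one region of the map is drawn inside that region.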

ステップＳ１６１０においてキーワード配置部１０９は、ステップＳ１６０９で算出した位置情報（3.15, 2.55）をキーワード配置情報１７０９のyカラム１７０４およびxカラム１７０５に追加する。 In step S1610, the keyword arrangement unit 109 adds the position information (3.15, 2.55) calculated in step S1609 to the y column 1704 and the x column 1705 of the keyword arrangement information 1709.

ステップＳ１６１１においてキーワード配置部１０９は、キーワード配置情報１７０９が含むユニット集合に対し、クラスタ情報テーブル１４０１を参照し、最もユニットの集合が合致するクラスタを取得する。 In step S 1611, the keyword placement unit 109 refers to the cluster information table 1401 for the unit set included in the keyword placement information 1709, and acquires the cluster that most matches the set of units.

クラスタ情報１４０２は全てのユニットを含む最上位のクラスタである。キーワード配置情報１７０９のユニット集合とクラスタ情報１４０２のユニット集合の合致度を式３より算出する。 The cluster information 1402 is the highest level cluster including all units. The degree of coincidence between the unit set of the keyword arrangement information 1709 and the unit set of the cluster information 1402 is calculated from Equation 3.

合致するユニットの数は{7, 8, 13, 14, 15, 16, 20, 21, 22}の９であり、キーワード配置情報１７０９のユニット数も９である。クラスタ情報１４０２のユニット数は36である。 The number of matching units is 9, {7, 8, 13, 14, 15, 16, 20, 21, 22}, and the number of units in the keyword arrangement information 1709 is also 9. The number of units of the cluster information 1402 is 36.

（クラスタ情報１４０２との合致度）=(2*9)/(9+36)=0.4 (Matching degree with cluster information 1402) = (2 * 9) / (9 + 36) = 0.4

キーワード配置情報１７０９のユニット集合とクラスタ情報１４０７のユニット集合の合致度を式３より算出する。合致するユニットの数は{7,8,14,15,16,21,22}の７であり、キーワード配置情報１７０９のユニット数は{7,8,13,14,15,16,20,21,22}の９である。クラスタ情報１４０７のユニット数は{7,8,14,15,16,21,22}の７である。 The degree of match between the unit set of the keyword arrangement information 1709 and the unit set of the cluster information 1407 is calculated using Formula 3. The number of matching units is 7, {7, 8, 14, 15, 16, 21, 22}, and the number of units in the keyword arrangement information 1709 is 9, {7, 8, 13, 14, 15, 16, 20, 21, 22}. The number of units in the cluster information 1407 is 7, {7, 8, 14, 15, 16, 21, 22}.

（クラスタ情報１４０７との合致度）=(2*7)/(7+9)=0.875 (Matching degree with cluster information 1407) = (2 * 7) / (7 + 9) = 0.875

このような計算を全てのクラスタについて算出し、最も合致度の高いクラスタを選択する。キーワード配置情報１７０９に対してはクラスタ情報１４０７が最も合致する。 Such calculation is performed for all clusters, and the cluster with the highest degree of match is selected. The cluster information 1407 most closely matches the keyword arrangement information 1709.
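Both worked values above follow the pattern 2 × (number of shared units) / (sum of the two set sizes), which is the Dice coefficient of the two unit sets. Formula 3 is not reproduced in this text, so that form is inferred from the two calculations. The sketch below reproduces them; the assumption that the 36 units of the top-level cluster are numbered 1 to 36 is made only to build a concrete set, and the function names are illustrative.

```python
def match_degree(keyword_units, cluster_units):
    """Dice coefficient between a keyword's unit set and a cluster's unit set."""
    common = len(keyword_units & cluster_units)
    return 2 * common / (len(keyword_units) + len(cluster_units))


def best_cluster(keyword_units, clusters):
    """Return the id of the cluster whose unit set matches the keyword best."""
    return max(clusters, key=lambda cid: match_degree(keyword_units, clusters[cid]))


keyword_units = {7, 8, 13, 14, 15, 16, 20, 21, 22}
clusters = {
    72: set(range(1, 37)),               # top-level cluster with all 36 units (numbering assumed)
    64: {7, 8, 14, 15, 16, 21, 22},      # cluster information 1407
}
top = match_degree(keyword_units, clusters[72])    # 2*9 / (9+36)
sub = match_degree(keyword_units, clusters[64])    # 2*7 / (9+7)
chosen = best_cluster(keyword_units, clusters)
```

The measure penalizes clusters that are much larger than the keyword's unit set, which is why the top-level cluster, although it contains every unit of the keyword, scores lower than the tighter cluster 64.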

ステップＳ１６１２においてキーワード配置部１０９は、クラスタ情報１４０７の識別番号64をキーワード配置情報１７０９のクラスタ識別番号１７０７に追加する。 In step S 1612, the keyword arrangement unit 109 adds the identification number 64 of the cluster information 1407 to the cluster identification number 1707 of the keyword arrangement information 1709.

ステップＳ１６１３においてキーワード配置部１０９は、次のキーワード配置情報１７１０があるので、ステップＳ１６０８からの処理を実施する。 In step S 1613, the keyword arrangement unit 109 performs the processing from step S 1608 because there is next keyword arrangement information 1710.

図１９ではリンク生成処理Ｓ３０６の詳細について記載する。 FIG. 19 describes details of the link generation processing S306.

ステップＳ１９０１においてリンク生成部１１１は、ステップＳ１００８で取得したクラスタ情報に対し、ステップＳ１９１２までの繰り返し処理を開始する。 In step S1901, the link generation unit 111 starts the iterative process up to step S1912 for the cluster information acquired in step S1008.

ステップＳ１９０２においてリンク生成部１１１は、処理中のクラスタに対応付けられたキーワードをキーワード配置情報から取得する。 In step S1902, the link generation unit 111 acquires a keyword associated with the cluster being processed from the keyword arrangement information.

ステップＳ１９０３においてリンク生成部１１１は、処理中のクラスタに対応付けられたキーワードに対し、ステップＳ１９１０までの繰り返し処理を開始する。 In step S1903, the link generation unit 111 starts the iterative process up to step S1910 for the keyword associated with the cluster being processed.

ステップＳ１９０４においてリンク生成部１１１は、処理中のクラスタの上位のクラスタを取得する。 In step S1904, the link generation unit 111 acquires a cluster higher than the cluster being processed.

ステップＳ１９０５においてリンク生成部１１１は、ステップＳ１９０４で取得したクラスタに対応付けられたキーワードを上位キーワードとして取得する。 In step S1905, the link generation unit 111 acquires a keyword associated with the cluster acquired in step S1904 as an upper keyword.

ステップＳ１９０６においてリンク生成部１１１は、上位キーワードがあるか否かを判定する。上位キーワードがある場合、ステップＳ１９０７に移す。上位キーワードがない場合、ステップＳ１９０９に処理を移す。 In step S1906, the link generation unit 111 determines whether there is a higher keyword. If there is a higher keyword, the process moves to step S1907. If there is no higher keyword, the process proceeds to step S1909.

ステップＳ１９０７においてリンク生成部１１１は、処理中のキーワードに最も関連する上位キーワードを選択する。選択の基準としては、処理中のキーワード配置情報のユニット集合のうち、上位のキーワード配置情報のユニット集合に含まれている割合や、キーワード配置情報の位置(y, x)から算出した距離を用いることができる。他の選択基準を用いてもよい。 In step S1907, the link generation unit 111 selects the upper keyword most relevant to the keyword being processed. As the selection criterion, it can use the proportion of units in the unit set of the keyword arrangement information being processed that are also contained in the unit set of the upper keyword arrangement information, or the distance calculated from the positions (y, x) of the keyword arrangement information. Other selection criteria may be used.

ステップＳ１９０８においてリンク生成部１１１は、キーワード配置情報のリンク情報に上位キーワードを設定する。 In step S1908, the link generation unit 111 sets the upper keyword in the link information of the keyword arrangement information.

ステップＳ１９０９においてリンク生成部１１１は、ステップＳ１９０４で取得した上位のクラスタが最上位であるか否かを判定する。上位のクラスタが最上位である場合、ステップＳ１９１０に処理を移す。上位のクラスタが最上位でない場合、ステップＳ１９０４に処理を移す。 In step S1909, the link generation unit 111 determines whether the upper cluster acquired in step S1904 is the highest. If the upper cluster is the highest, the process proceeds to step S1910. If the upper cluster is not the highest cluster, the process proceeds to step S1904.

ステップＳ１９１０においてリンク生成部１１１は、処理中のクラスタに対応付けられた次のキーワードがある場合、ステップＳ１９０３からの処理を実施する。次のキーワードがない場合、ステップＳ１９１１に処理を移す。 In step S1910, if there is a next keyword associated with the cluster being processed, the link generation unit 111 performs the processing from step S1903. If there is no next keyword, the process proceeds to step S1911.

ステップＳ１９１１においてリンク生成部１１１は、次のクラスタがある場合、ステップＳ１９０１からの処理を実施する。次のクラスタがない場合、処理を終了する。 In step S1911, the link generation unit 111 performs the processing from step S1901 when there is a next cluster. If there is no next cluster, the process ends.

リンク生成処理Ｓ３０６の具体例について記載する。 A specific example of the link generation process S306 will be described.

ステップＳ１９０１においてリンク生成部１１１は、クラスタ情報１４０２に対し、ステップＳ１９１２までの繰り返し処理を開始する。 In step S1901, the link generation unit 111 starts the iterative process up to step S1912 for the cluster information 1402.

ステップＳ１９０２においてリンク生成部１１１は、クラスタ情報１４０２（クラスタ識別番号７２）に対応付けられたキーワード配置情報１７１１（「高画質」）を取得する。 In step S1902, the link generation unit 111 acquires keyword arrangement information 1711 (“high image quality”) associated with the cluster information 1402 (cluster identification number 72).

ステップＳ１９０３においてリンク生成部１１１は、クラスタ情報１４０２に対応付けられたキーワード配置情報１７１１に対し、ステップＳ１９１０までの繰り返し処理を開始する。 In step S1903, the link generation unit 111 starts the iterative process up to step S1910 for the keyword arrangement information 1711 associated with the cluster information 1402.

ステップＳ１９０４においてリンク生成部１１１は、クラスタ情報１４０２の上位のクラスタ情報を取得しようとするが存在しないため、上位のクラスタ情報を取得できない。 In step S1904, the link generation unit 111 tries to acquire the cluster information above the cluster information 1402, but none exists, so no upper cluster information can be acquired.

ステップＳ１９０５においてリンク生成部１１１は、ステップＳ１９０４で上位のクラスタ情報を取得できなかったので、上位のキーワードも取得できない。 In step S1905, the link generation unit 111 cannot acquire higher-order cluster information in step S1904, and therefore cannot acquire higher-order keywords.

ステップＳ１９０６においてリンク生成部１１１は、上位キーワードがないので、ステップＳ１９０９に処理を移す。 In step S1906, the link generation unit 111 moves the process to step S1909 because there is no higher keyword.

ステップＳ１９０９においてリンク生成部１１１は、ステップＳ１９０４で取得した上位のクラスタ情報１４０２が最上位であるので、ステップＳ１９１０に処理を移す。 In step S1909, the link generation unit 111 moves the process to step S1910 because the upper cluster information 1402 acquired in step S1904 is the highest.

ステップＳ１９１０においてリンク生成部１１１は、クラスタ情報１４０２に対応付けられた次のキーワード配置情報がないので、ステップＳ１９１１に処理を移す。 In step S1910, the link generation unit 111 moves to step S1911 because there is no next keyword arrangement information associated with the cluster information 1402.

ステップＳ１９１１においてリンク生成部１１１は、次のクラスタ情報１４０３があるので、ステップＳ１９０１からの処理を実施する。 In step S1911, the link generation unit 111 performs the processing from step S1901 because there is next cluster information 1403.

ステップＳ１９０１においてリンク生成部１１１は、クラスタ情報１４０３に対し、ステップＳ１９１２までの繰り返し処理を開始する。 In step S1901, the link generation unit 111 starts the iterative process up to step S1912 for the cluster information 1403.

ステップＳ１９０２においてリンク生成部１１１は、クラスタ情報１４０３（クラスタ識別番号７１）に対応付けられたキーワード配置情報１７１２（「非球面レンズ」）を取得する。 In step S1902, the link generation unit 111 acquires keyword arrangement information 1712 (“aspheric lens”) associated with the cluster information 1403 (cluster identification number 71).

ステップＳ１９０３においてリンク生成部１１１は、クラスタ情報１４０３に対応付けられたキーワード配置情報１７１２に対し、ステップＳ１９１０までの繰り返し処理を開始する。 In step S1903, the link generation unit 111 starts the iterative process up to step S1910 for the keyword arrangement information 1712 associated with the cluster information 1403.

ステップＳ１９０４においてリンク生成部１１１は、クラスタ情報１４０３の上位のクラスタ情報１４０２を取得する。 In step S1904, the link generation unit 111 acquires the upper cluster information 1402 of the cluster information 1403.

ステップＳ１９０５においてリンク生成部１１１は、ステップＳ１９０４で取得した上位のクラスタ情報１４０２（クラスタ識別番号７２）に対応付けられたキーワード配置情報１７１１を取得する。 In step S1905, the link generation unit 111 acquires the keyword arrangement information 1711 associated with the upper cluster information 1402 (cluster identification number 72) acquired in step S1904.

ステップＳ１９０６においてリンク生成部１１１は、上位のキーワード配置情報があるので、ステップＳ１９０７に処理を移す。 In step S1906, the link generation unit 111 moves the process to step S1907 because there is upper keyword arrangement information.

ステップＳ１９０７においてリンク生成部１１１は、キーワード配置情報１７１２に最も関連する上位のキーワード配置情報としてキーワード配置情報１７１１を選択する。 In step S1907, the link generation unit 111 selects the keyword arrangement information 1711 as the upper keyword arrangement information most relevant to the keyword arrangement information 1712.

ステップＳ１９０８においてリンク生成部１１１は、キーワード配置情報１７１２の上位キーワードカラム１７０８に上位キーワードの識別番号「２５」を設定する。 In step S 1908, the link generation unit 111 sets the upper keyword identification number “25” in the upper keyword column 1708 of the keyword arrangement information 1712.

ステップＳ１９１０においてリンク生成部１１１は、クラスタ情報１４０３に対応付けられた次のキーワード配置情報がないので、ステップＳ１９１１に処理を移す。 In step S1910, the link generation unit 111 moves to step S1911 because there is no next keyword arrangement information associated with the cluster information 1403.

ステップＳ１９１１においてリンク生成部１１１は、次のクラスタ情報１４０４があるので、ステップＳ１９０１からの処理を実施する。 In step S1911, the link generation unit 111 performs the processing from step S1901 because there is next cluster information 1404.
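The search for an upper keyword in steps S1904 to S1909 can be sketched as a walk up the cluster hierarchy. The data structures here (a `parent` map, a cluster-to-keywords map, and a keyword-to-units map) are stand-ins invented for the example, and the selection criterion used is the containment ratio, one of the options the description mentions. The unit sets are made up.

```python
def find_upper_keyword(keyword, cluster_id, parent, cluster_keywords, keyword_units):
    """Walk up from cluster_id; return the most related upper keyword, or None."""
    current = parent.get(cluster_id)                      # S1904: cluster above
    while current is not None:
        candidates = cluster_keywords.get(current, [])    # S1905: keywords of that cluster
        if candidates:                                    # S1906: is there an upper keyword?
            own = keyword_units[keyword]
            # S1907: share of this keyword's units also covered by the candidate
            return max(candidates,
                       key=lambda c: len(own & keyword_units[c]) / len(own))
        current = parent.get(current)                     # S1909: climb until the top
    return None                                           # top reached without a match


# Hypothetical hierarchy: cluster 71 sits directly under the top-level cluster 72.
parent = {71: 72}
cluster_keywords = {72: ["high image quality"], 71: ["aspheric lens"]}
keyword_units = {"high image quality": {1, 2, 3, 4}, "aspheric lens": {1, 2}}

link = find_upper_keyword("aspheric lens", 71, parent, cluster_keywords, keyword_units)
root = find_upper_keyword("high image quality", 72, parent, cluster_keywords, keyword_units)
```

A keyword attached to the top-level cluster gets no link, so it becomes a root of the resulting diagram, while every other keyword links to the nearest ancestor cluster that has a keyword.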

キーワード配置情報を用いることで図２０のようなマインドマップ用の図を作成することができる。 By using the keyword arrangement information, a mind map diagram as shown in FIG. 20 can be created.

As described above, according to the present invention, characteristic keywords can be extracted from a designated document set and, based on the relationships between the extracted keywords, closely related keywords can be placed near each other, making it possible to create a diagram that is more intuitive and easier to understand.

Although embodiments have been described in detail above, the present invention can be embodied as, for example, a method, a program, or a storage medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. The program according to the present invention is a program that a computer can execute (read) to perform each processing method, and the storage medium according to the present invention stores such a program.

The program according to the present invention may be a separate program for each processing method of each device.

Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which a program implementing the functions of the above-described embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program stored in the recording medium.

In this case, the program itself read from the recording medium realizes the novel functions of the present invention, and the recording medium storing the program constitutes the present invention.

As the recording medium for supplying the program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, EEPROM, silicon disk, or the like can be used.

Needless to say, the invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the program it has read, but also the case where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are realized by that processing.

Furthermore, needless to say, the invention also covers the case where the program read from the recording medium is written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable where it is achieved by supplying a program to a system or apparatus. In this case, by reading a recording medium storing a program for achieving the present invention into the system or apparatus, that system or apparatus can enjoy the effects of the present invention.

Furthermore, by downloading and reading a program for achieving the present invention from a server, database, or the like on a network by means of a communication program, the system or apparatus can enjoy the effects of the present invention. All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

100 Knowledge structure extraction device
130 User terminal
140 External network
201 CPU
202 ROM
203 RAM
204 System bus
205 Input controller
206 Video controller
207 Memory controller
208 Communication I/F controller
209 Input unit
210 Display
211 External memory

Claims

A knowledge structure extraction device comprising:
field classification means for classifying a document based on the document content;
contribution calculation means for calculating, for a keyword included in the document, a contribution for discriminating the document from other documents;
unit classification means for classifying the document into units using a self-organizing map based on the contribution calculated by the contribution calculation means;
display means for displaying the units classified by the unit classification means;
cluster generation means for generating a cluster of the units using hierarchical clustering based on the units classified by the unit classification means;
arrangement information calculating means for calculating arrangement information of the keyword with respect to the display means from a unit specified from the appearance frequency of the keyword included in the document in the unit; and
link generation display means for generating a link from the association between the cluster to which the specified unit belongs and another cluster by displaying a degree of coincidence between the specified unit and the cluster, and displaying the link on the display means.

A knowledge structure extraction method for extracting and displaying related keywords from a document in a knowledge structure extraction device, the method comprising:
a field classification step in which field classification means of the knowledge structure extraction device classifies the document based on the document content;
a contribution calculation step in which contribution calculation means of the knowledge structure extraction device calculates, for a keyword included in the document, a contribution for discriminating the document from other documents;
a unit classification step of classifying the document into units using a self-organizing map based on the contribution calculated in the contribution calculation step;
a display step in which display means of the knowledge structure extraction device displays the units classified in the unit classification step;
a cluster generation step in which cluster generation means of the knowledge structure extraction device generates a cluster of the units using hierarchical clustering based on the units classified in the unit classification step;
an arrangement information calculation step in which arrangement information calculating means of the knowledge structure extraction device calculates arrangement information of the keyword with respect to the display step from a unit specified from the appearance frequency of the keyword included in the document in the unit; and
a link generation display step in which link generation means of the knowledge structure extraction device obtains a degree of coincidence between the specified unit and the cluster, and thereby generates and displays a link from the association between the cluster to which the specified unit belongs and another cluster.

A program executed in a knowledge structure extraction device that extracts and displays related keywords from a document, the program causing the knowledge structure extraction device to function as:
field classification means for classifying the document based on the document content;
contribution calculation means for calculating, for a keyword included in the document, a contribution for discriminating the document from other documents;
unit classification means for classifying the document into units using a self-organizing map based on the contribution calculated by the contribution calculation means;
display means for displaying the units classified by the unit classification means;
cluster generation means for generating a cluster of the units using hierarchical clustering based on the units classified by the unit classification means;
arrangement information calculating means for calculating arrangement information of the keyword with respect to the display means from a unit specified from the appearance frequency of the keyword included in the document in the unit; and
link generation display means for generating a link from the association between the cluster to which the specified unit belongs and another cluster by displaying a degree of coincidence between the specified unit and the cluster, and displaying the link on the display means.