JP2011227742A

JP2011227742A - Contrast display data generation device or contrast display data generation method

Info

Publication number: JP2011227742A
Application number: JP2010097535A
Authority: JP
Inventors: Takahiro Miura; 高広三浦; Kunihiro Kitamura; 国博北村
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2010-04-21
Filing date: 2010-04-21
Publication date: 2011-11-10

Abstract

PROBLEM TO BE SOLVED: To achieve micro-analysis that takes macro-analysis into consideration.
SOLUTION: When document data are given, a classification means 5 classifies each document into an entity-categorized microscopic viewpoint group or a macroscopic viewpoint group based on an entity specification word. A similarity-categorized group representative keyword determination means 8 classifies the documents of the macroscopic viewpoint group into similarity-categorized groups, and determines a similarity-categorized group representative keyword for each similarity-categorized group. An entity-categorized microscopic viewpoint group representative keyword determination means 7 determines an entity-categorized microscopic viewpoint group representative keyword. A similarity determination means 9 determines similarity from those representative keywords. A generation means 10 determines the combinations to extract from the similarity, and generates related display data by arranging, for every group, the document specification data of the extracted entity-categorized microscopic viewpoint group and the document specification data of the extracted similarity-categorized group on the same time axis.

Description

The present invention relates to a contrast display data generation device, and more particularly to the mutual use of macro analysis and micro analysis.

Patent Document 1 discloses an entity relationship mining device that uses data mining to extract time-series relationships and events from text in various forms, such as news, blogs, industry reports, and trade papers (magazines), that may contain references to various relationships.

Such a device makes it possible to analyze the so-called micro environment, such as the trends of other companies in the industry or field directly related to the company of interest.

Japanese Unexamined Patent Application Publication No. 2005-182528 (JP 2005-182528)

However, Patent Document 1 has the problem that it is difficult to perform analysis that takes into account the so-called macro environment, such as legal regulations, interest rate trends, and environmental issues, which greatly influences a company's strategy formulation.

An object of the present invention is to solve the above problem and provide a contrast display data generation device that can present macro analysis and micro analysis in association with each other.

(1) A contrast display data generation device according to the present invention comprises:
A) subject specifying word storage means for storing one or more subject specifying words for specifying a subject;
B) classification means that, when given document data in which a body and creation time data are associated with each other, determines whether the one or more subject specifying words are present in the body of each item of document data, classifies the document data into a subject-specific microscopic viewpoint group for each subject specified by the subject specifying word if the one or more subject specifying words are present in the body, and classifies the document data into a macroscopic viewpoint group if no subject specifying word is present in the body;
C) classification result storage means for storing the classification result;
D) similarity-specific group representative keyword determination means that determines one or more document-specific keywords for each document based on the degree of word occurrence in the bodies of the documents belonging to the macroscopic viewpoint group, classifies documents whose document-specific keywords match into similarity-specific groups, and determines one or more similarity-specific group representative keywords for each similarity-specific group;
E) subject-specific microscopic viewpoint group representative keyword determination means that, for a subject specifying word designated by the operator or predetermined as an extraction target, determines one or more representative keywords of the subject-specific microscopic viewpoint group specified by that subject specifying word, based on the degree of occurrence in the bodies of the documents belonging to that group;
F) similarity determination means that determines the similarity between a subject-specific microscopic viewpoint group and a similarity-specific group from the degree of match between each subject-specific microscopic viewpoint group representative keyword and the document-specific keywords of each similarity-specific group; and
G) generation means that extracts combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity is higher than a predetermined threshold, and generates related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time.

Therefore, it is possible to generate contrast data in which document specifying data belonging to the subject-specific microscopic viewpoint group and document specifying data belonging to the similarity-specific group are arranged on the same time axis.

(2) A contrast display data generation device according to the present invention comprises:
A) subject specifying word storage means for storing a subject specifying word for specifying a subject;
B) classification means that, when given document data associated with creation time data, determines whether the subject specifying word is present in each item of document data, classifies the document data into a subject-specific microscopic viewpoint group for each subject specified by the subject specifying word if the subject specifying word is present, and classifies the document data into a macroscopic viewpoint group if the subject specifying word is not present;
C) classification result storage means for storing the classification result;
D) similarity-specific group representative keyword determination means that determines a document-specific keyword for each document belonging to the macroscopic viewpoint group, classifies the documents into similarity-specific groups based on the determined document-specific keywords, and determines a similarity-specific group representative keyword for each similarity-specific group;
E) subject-specific microscopic viewpoint group representative keyword determination means that, for a subject specifying word designated by the operator or predetermined as an extraction target, determines a representative keyword for each subject-specific microscopic viewpoint group from the documents belonging to that group;
F) similarity determination means that determines the similarity between the representative keyword of each subject-specific microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determination means and the keywords of each similarity-specific group; and
G) generation means that extracts combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity, as determined by the similarity determination means, is higher than a predetermined threshold, and generates related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time.

(3) In the contrast display data generation device according to the present invention, the subject-specific microscopic viewpoint group representative keyword determination means also determines a representative keyword for each subject-specific microscopic viewpoint group, from the documents belonging to that group, for subject specifying words other than the one designated by the operator or predetermined as the extraction target. Among the subject-specific microscopic viewpoint groups specified by those other subject specifying words, the generation means also displays, in association on the same time axis, the groups that form a combination whose similarity is higher than the predetermined threshold.

Therefore, subject-specific microscopic viewpoint groups specified by subject specifying words other than the one designated by the operator or predetermined as the extraction target can also be displayed in association. This allows the operator to perform, for those groups as well, macroscopic viewpoint analysis that takes the subject-specific microscopic analysis into account.

(4) A contrast display data generation program according to the present invention is a program for causing a computer to function as the following means:
A) subject specifying word storage means for storing a subject specifying word for specifying a subject;
B) classification means that, when given document data associated with creation time data, determines whether the subject specifying word is present in each item of document data, classifies the document data into a subject-specific microscopic viewpoint group for each subject specified by the subject specifying word if the subject specifying word is present, and classifies the document data into a macroscopic viewpoint group if the subject specifying word is not present;
C) classification result storage means for storing the classification result;
D) similarity-specific group representative keyword determination means that determines a document-specific keyword for each document belonging to the macroscopic viewpoint group, classifies the documents into similarity-specific groups based on the determined document-specific keywords, and determines a similarity-specific group representative keyword for each similarity-specific group;
E) subject-specific microscopic viewpoint group representative keyword determination means that, for a subject specifying word designated by the operator or predetermined as an extraction target, determines a representative keyword for each subject-specific microscopic viewpoint group from the documents belonging to that group;
F) similarity determination means that determines the similarity between the representative keyword of each subject-specific microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determination means and the keywords of each similarity-specific group; and
G) generation means that extracts combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity, as determined by the similarity determination means, is higher than a predetermined threshold, and generates related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time.

(5) In a contrast display data generation method according to the present invention, a subject specifying word for specifying a subject is stored in a computer, and the computer executes the following steps:
A) when given document data associated with creation time data, determining whether the subject specifying word is present in each item of document data, classifying the document data into a subject-specific microscopic viewpoint group for each subject specified by the subject specifying word if the subject specifying word is present, classifying the document data into a macroscopic viewpoint group if the subject specifying word is not present, and storing the classification result;
B) determining a document-specific keyword for each document belonging to the macroscopic viewpoint group, classifying the documents into similarity-specific groups based on the determined document-specific keywords, and determining a similarity-specific group representative keyword for each similarity-specific group;
C) for a subject specifying word designated by the operator or predetermined as an extraction target, determining a representative keyword for each subject-specific microscopic viewpoint group from the documents belonging to that group;
D) determining the similarity between the representative keyword of each subject-specific microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determination means and the keywords of each similarity-specific group; and
E) extracting combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity, as determined by the similarity determination means, is higher than a predetermined threshold, and generating related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time.

A functional block diagram of the contrast display data generation and display device 1.
A diagram showing an example of a hardware configuration in which the contrast display data generation and display device 1 is implemented using a CPU.
A diagram showing the data structure of the documents stored in the document storage unit 26d.
A flowchart of the main program 26m.
A detailed flowchart of step S3 in FIG. 4.
A diagram showing the data structure of the classification results.
A diagram showing the group-specific representative words.
A detailed flowchart of step S5 in FIG. 4.
A diagram showing the inter-group relation results.
A display example of the contrast display data.

Embodiments of the present invention will be described below with reference to the drawings.

(1. Functional block diagram)
FIG. 1 shows a functional block diagram of a contrast display data generation and display device 1 according to an embodiment of the present invention.

The contrast display data generation and display device 1 includes document storage means 3, subject specifying word storage means 4, classification means 5, classification result storage means 6, similarity-specific group representative keyword determination means 8, subject-specific microscopic viewpoint group representative keyword determination means 7, similarity determination means 9, generation means 10, and display means 11.

The document storage means 3 stores a plurality of items of document data, each of which associates a body with creation time data. The subject specifying word storage means 4 stores one or more subject specifying words for specifying a subject. When the classification means 5 is given document data from the document storage means 3, it determines whether any of the one or more subject specifying words stored in the subject specifying word storage means 4 is present in the body of each item of document data. If a subject specifying word is present, the document data is classified into a subject-specific microscopic viewpoint group for each subject specified by that word; if no subject specifying word is present, the document data is classified into a macroscopic viewpoint group. The classification result storage means 6 stores the classification result.
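The classification performed by the classification means 5 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `classify`, the sample documents, and plain substring matching are assumptions, as is the choice to place a document that names several subjects into each of the corresponding groups.

```python
from collections import defaultdict

# Hypothetical subject specifying words (the embodiment uses company names).
SUBJECT_WORDS = ["Company X", "Company A"]


def classify(documents, subject_words):
    """Split documents into per-subject micro groups and one macro group.

    documents: iterable of (doc_id, body) pairs.
    Returns (micro_groups, macro_group), where micro_groups maps each
    subject specifying word to the IDs of documents that mention it.
    """
    micro = defaultdict(list)
    macro = []
    for doc_id, body in documents:
        hits = [w for w in subject_words if w in body]
        if hits:
            # Assumption: a document naming several subjects joins every group.
            for w in hits:
                micro[w].append(doc_id)
        else:
            macro.append(doc_id)
    return dict(micro), macro


docs = [
    ("D001", "Company X announces a mobile product"),
    ("D002", "Company X and Company A form an alliance"),
    ("D011", "New emissions regulation takes effect"),
]
micro_groups, macro_group = classify(docs, SUBJECT_WORDS)
```

Here `micro_groups` plays the role of the subject-specific microscopic viewpoint groups and `macro_group` the macroscopic viewpoint group.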

For the documents belonging to the macroscopic viewpoint group, the similarity-specific group representative keyword determination means 8 determines one or more document-specific keywords for each document based on the degree of word occurrence in the body of each document, classifies documents whose document-specific keywords match into similarity-specific groups, and determines one or more similarity-specific group representative keywords for each similarity-specific group. For a subject specifying word designated by the operator or predetermined as an extraction target, the subject-specific microscopic viewpoint group representative keyword determination means 7 determines one or more representative keywords of the subject-specific microscopic viewpoint group specified by that word, based on the degree of occurrence in the documents belonging to that group. The similarity determination means 9 determines the similarity between a subject-specific microscopic viewpoint group and a similarity-specific group from the degree of match between each subject-specific microscopic viewpoint group representative keyword and the document-specific keywords of each similarity-specific group.
The generation means 10 extracts combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity is higher than a predetermined threshold, and generates related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time. The display means 11 displays the generated related display data.
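One way to picture the keyword determination and the similarity judgment is the sketch below. The text does not give a formula for the "degree of match", so the Jaccard overlap used here is an assumption, as are the function names, whitespace tokenization (a real system for Japanese text would need morphological analysis), and the sample bodies.

```python
from collections import Counter


def representative_keywords(bodies, top_n=2):
    """Pick the top_n most frequent words across the bodies of one group."""
    counts = Counter()
    for body in bodies:
        counts.update(body.lower().split())
    return {word for word, _ in counts.most_common(top_n)}


def keyword_match(keywords_a, keywords_b):
    """Jaccard overlap of two keyword sets (assumed measure of match)."""
    if not keywords_a or not keywords_b:
        return 0.0
    return len(keywords_a & keywords_b) / len(keywords_a | keywords_b)


# Hypothetical bodies for one macro-side similarity-specific group.
macro_bodies = ["battery regulation battery", "battery regulation tax"]
macro_keywords = representative_keywords(macro_bodies)

# Hypothetical representative keywords of one subject-specific group.
micro_keywords = {"battery", "price"}
score = keyword_match(micro_keywords, macro_keywords)
```

A pair of groups would then be treated as related when `score` exceeds the predetermined threshold.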

As described above, the present embodiment performs the following processes 1) to 5), and can therefore generate contrast data in which documents belonging to the subject-specific microscopic viewpoint group and documents belonging to the similarity-specific group are arranged on the same time axis.
1) It is determined whether the subject specifying word is present in the body of each item of document data, and the documents are classified into subject-specific microscopic viewpoint groups and a macroscopic viewpoint group.
2) The documents belonging to the macroscopic viewpoint group are classified into similarity-specific groups based on the degree of word occurrence in each document, and a similarity-specific group representative keyword is determined for each similarity-specific group.
3) The representative keyword of each subject-specific microscopic viewpoint group is determined based on the degree of occurrence in the documents belonging to that group.
4) The similarity between a subject-specific microscopic viewpoint group and a similarity-specific group is determined from the degree of match between each subject-specific microscopic viewpoint group representative keyword and the document-specific keywords of each similarity-specific group.
5) Combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity is higher than a predetermined threshold are extracted, and related display data is generated in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, on the same time axis in chronological order of creation time.
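Process 5) can be illustrated with the following sketch, which keeps only the similarity-specific groups whose score exceeds the threshold and sorts each group's documents by creation time. The function name `build_related_display`, the data shapes, the group labels, the dates, and the scores are all hypothetical.

```python
from datetime import date


def build_related_display(micro_label, micro_docs, similarity_groups, scores, threshold):
    """Return {group label: [(doc_id, created), ...]} sorted by creation time.

    micro_docs: (doc_id, created) pairs of one subject-specific group.
    similarity_groups: label -> (doc_id, created) pairs of each macro-side group.
    scores: label -> similarity between the micro group and that group.
    """
    rows = {}
    for label, docs in similarity_groups.items():
        if scores.get(label, 0.0) > threshold:
            rows[label] = sorted(docs, key=lambda d: d[1])
    if rows:
        # Put the micro group first so every row shares one time axis.
        rows = {micro_label: sorted(micro_docs, key=lambda d: d[1]), **rows}
    return rows


micro = [("D002", date(2003, 4, 2)), ("D001", date(2003, 3, 30))]
groups = {
    "G1": [("D012", date(2003, 5, 1)), ("D011", date(2003, 2, 10))],
    "G2": [("D013", date(2003, 1, 5))],
}
display = build_related_display("Company X", micro, groups, {"G1": 0.6, "G2": 0.1}, 0.5)
```

Each row of `display` can then be drawn as one lane on a shared timeline, with the document IDs serving as the document specifying data.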

(2. Hardware configuration)
The hardware configuration of the contrast display data generation and display device 1 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 shows an example of a hardware configuration in which the contrast display data generation and display device 1 is configured using a CPU.

The contrast display data generation and display device 1 includes a CPU 23, a memory 27, a hard disk 26, a monitor 30, an optical drive 25, an input device 28 (mouse 28a and keyboard 28b), and a bus line 29. The CPU 23 controls each unit via the bus line 29 in accordance with the programs stored on the hard disk 26.

The hard disk 26 stores an operating system program 26o (hereinafter abbreviated as OS) and a main program 26m, as well as various data. In the present embodiment, LINUX (registered trademark or trademark) is used as the operating system program (OS) 26o, but the OS is not limited to this.

Each of the above programs is read, via the optical drive 25, from the CD-ROM 25a on which the program is stored, and installed on the hard disk 26. Instead of a CD-ROM, the programs may be installed on the hard disk from another computer-readable recording medium such as a flexible disk (FD) or an IC card. They may also be downloaded over a communication line.

In the present embodiment, the program is installed from the CD-ROM onto the hard disk 26, so that the computer executes the program stored on the CD-ROM indirectly. However, the invention is not limited to this, and the program stored on the CD-ROM may be executed directly from the optical drive 25. Programs executable by a computer include not only those that can be executed directly once installed as they are, but also those that must first be converted into another form (for example, compressed data that must be decompressed), and those that can be executed in combination with other modules.

The data structures of the various data stored on the hard disk 26 will now be described. The document storage unit 26d stores documents such as those shown in FIG. 3. Each document has a document ID 301, a date and time 302, and a body 303. The document ID 301 is an identifier that uniquely identifies the document. The date and time 302 is the date and time at which the document was created. The body 303 is text data representing the content of the document. For example, the document with the document ID "D001" has the creation date and time "2003/03/30 016:43:00" and the body "Company X announces mobile product ○○ ...".
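The document record described above maps naturally onto a small data class. The sketch below is illustrative only: the class and field names are assumptions, and because the time of day printed for "D001" is garbled in the source, the timestamp used here (16:43:00) is an assumed reading rather than a confirmed value.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Document:
    doc_id: str        # document ID 301: unique identifier
    created: datetime  # date and time 302: creation date and time
    body: str          # body 303: text content of the document


# Assumed reading of the garbled creation time of document "D001".
example = Document(
    doc_id="D001",
    created=datetime(2003, 3, 30, 16, 43, 0),
    body="Company X announces mobile product ...",
)
```

Keeping `created` as a `datetime` rather than a string lets later steps sort documents chronologically without parsing.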

The company name extraction dictionary 26k stores a plurality of company names that are the targets of micro analysis. In the present embodiment, five company names are stored: "Company X", "Company A", "Company B", "Company C", and "Company D". The macro classification affiliation determination dictionary 26md stores classification data used to determine, for documents that were not made targets of micro analysis (described later), which subcategory of the macro analysis each document belongs to, based on the text data in the body of the document. In the present embodiment, a representative keyword is determined for each document, and a concept dictionary is used to determine which of four categories the representative keyword belongs to: "politics", "economy", "society", and "technology", which are the classification axes in macro environment analysis. For example, WordNet (trademark) (http://wordnet.princeton.edu/) may be used as the concept dictionary.
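The category lookup can be sketched as follows. The small mapping below is a hypothetical stand-in for the concept dictionary; it does not reproduce WordNet or the contents of the macro classification affiliation determination dictionary 26md, and the keywords and the fallback label are illustrative assumptions.

```python
# Hypothetical stand-in for a concept dictionary: keyword -> macro category.
CONCEPT_TO_CATEGORY = {
    "regulation": "politics",
    "interest": "economy",
    "aging": "society",
    "battery": "technology",
}


def macro_category(representative_keyword, default="unclassified"):
    """Map a document's representative keyword to one of the four categories."""
    return CONCEPT_TO_CATEGORY.get(representative_keyword.lower(), default)


category = macro_category("Regulation")
```

With a real concept dictionary the lookup would typically walk up from the keyword to a broader concept before choosing a category; the flat mapping here only shows the shape of the decision.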

The data structures of the data stored in the classification result storage unit 26r, the group-specific representative word storage unit 26g, and the inter-group relation storage unit 26gk will be described later.

(3. Processing by the main program)
The processing based on the main program 26m will be described with reference to the flowchart of FIG. 4.

The CPU 23 performs document classification processing (step S1 in FIG. 4). A detailed flowchart of the document classification processing is shown in FIG. 5. The following description uses the documents shown in FIG. 3 as the documents to be classified.

The CPU 23 initializes the target document number n (step S11). The CPU 23 determines whether the n-th target document contains a company name (step S13). In this case n = 1, so the CPU 23 determines whether the body of the first document, with document ID "D001", contains a company name. As already described, the company name extraction dictionary 26k stores "Company X", "Company A", "Company B", "Company C", and "Company D" as company names. The CPU 23 therefore determines that the body of document ID "D001" contains a company name, and sets the classification axis label to "Company X" and the type to "company" for document ID "D001".

For document ID "D001", the CPU 23 stores the classification axis label "Company X" and the type "company" determined in step S15 in the classification result storage unit as the classification result (step S19). The CPU 23 determines whether all documents have been processed (step S21 in FIG. 5). In this case processing is not complete, so the target document number n is incremented.

ＣＰＵ２３は、対象文書番号n番目の文書が企業名を含むか否か判断する（ステップＳ１３）。この場合、n＝２であるので、ＣＰＵ２３は、２番目の文書ＩＤ「Ｄ００２」について、本文に企業名を含むか否かを判断する。この場合、ＣＰＵ２３は、文書ＩＤ「Ｄ００２」について、本文に企業名を含むと判断して、文書ＩＤ「Ｄ００２」について、分類軸ラベルを「Ｘ社」、種別「企業」とする。 The CPU 23 determines whether or not the n-th target document includes a company name (step S13). In this case, since n = 2, the CPU 23 determines whether or not the body of the second document, with document ID “D002”, includes a company name. Here, the CPU 23 determines that the body of document ID “D002” includes a company name, and sets the classification axis label to “Company X” and the type to “company” for document ID “D002”.

ＣＰＵ２３は、ステップＳ１５で決定した文書ＩＤ「Ｄ００２」について、分類軸ラベル「Ｘ社」、種別「企業」を分類結果として分類結果記憶部に記憶する（ステップＳ１９）。 The CPU 23 stores the classification axis label “Company X” and the type “Company” as the classification result in the classification result storage unit for the document ID “D002” determined in Step S15 (Step S19).

以下、ステップＳ２３からステップＳ１３の処理を繰り返す。そして、ステップＳ２３にて、nをインクリメントした結果、n＝１１となると、ステップＳ１３にて、１１番目の文書ＩＤ「Ｄ０１１」についての判断がなされる。この場合、本文に企業名を含まないので、ステップＳ１７に進み、文書ＩＤ「Ｄ０１１」について、代表キーワードを重要語として抽出すると共に、マクロ分析のカテゴリ決定処理を行う（ステップＳ１７）。代表キーワードの決定処理は、本文のテキストから重要語を抽出すればよい。重要語は、形態素解析エンジン等を用いて、本文のテキストからキーワードを抽出し、最も頻度の大きなキーワードを選択すればよい。また、テキストの構造を解析し、タイトルや本文の一行目のように、重要語が含まれる可能性の高い箇所のキーワードには重みを付けるようにしてもよい。その他、一般的なtf-idf法などを採用してもよい。 Thereafter, the processing from step S23 to step S13 is repeated. When incrementing n in step S23 yields n = 11, the eleventh document, with document ID “D011”, is examined in step S13. In this case, since the body does not include a company name, the process proceeds to step S17, where a representative keyword is extracted as an important word for document ID “D011” and the macro analysis category determination process is performed (step S17). The representative keyword may be determined by extracting an important word from the body text. For example, keywords may be extracted from the body text using a morphological analysis engine or the like, and the keyword with the highest frequency selected. Alternatively, the structure of the text may be analyzed, and keywords in locations likely to contain important words, such as the title or the first line of the body, may be given extra weight. A general technique such as the tf-idf method may also be employed.

ここでは、代表キーワードとして、「携帯サイト規制法」が特定される。なお、かかる代表キーワードは概念辞書により、分類軸「政治」に該当すると判断されたものとする。 Here, the “mobile site regulation law” is specified as a representative keyword. It is assumed that the representative keyword is determined by the concept dictionary to correspond to the classification axis “politics”.

ＣＰＵ２３は、ステップＳ１７で決定した文書ＩＤ「Ｄ０１１」について、分類軸ラベル「携帯サイト規制法」、種別「政治」を分類結果として分類結果記憶部に記憶する（ステップＳ１９）。 For the document ID “D011” determined in step S17, the CPU 23 stores the classification axis label “mobile site regulation law” and the type “politics” as the classification result in the classification result storage unit (step S19).

以下、全ての文書について、ステップＳ１３〜ステップＳ１９の処理を行った場合には、文書分類処理を終了する。 Hereinafter, when the processes of steps S13 to S19 are performed for all documents, the document classification process is terminated.
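The classification loop of steps S11 to S23 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the dictionary contents, the whitespace tokenizer used in place of a morphological analysis engine, and the function names `representative_keyword`, `classify`, and `classify_all` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stand-ins for the company name extraction dictionary 26k
# and for the concept dictionary; the entries are illustrative only.
COMPANY_NAMES = ["Company X", "Company A", "Company B", "Company C", "Company D"]
CONCEPT_DICTIONARY = {"regulation": "politics"}


def representative_keyword(body):
    """Most frequent word longer than three characters (whitespace tokenizer,
    used here in place of a morphological analysis engine)."""
    words = [w.strip(".,").lower() for w in body.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return counts.most_common(1)[0][0] if counts else ""


def classify(body):
    """Return (classification axis label, type) for one document body."""
    for name in COMPANY_NAMES:
        if name in body:                      # step S13: company name found
            return name, "company"            # step S15
    keyword = representative_keyword(body)    # step S17
    return keyword, CONCEPT_DICTIONARY.get(keyword, "unclassified")


def classify_all(docs):
    """docs: list of (doc_id, body). Returns (group_id, doc_id, label, type) rows.
    Documents sharing an axis label receive the same classification group ID."""
    group_ids = {}
    rows = []
    for doc_id, body in docs:
        label, kind = classify(body)
        group_id = group_ids.setdefault(label, f"G{len(group_ids) + 1:03d}")
        rows.append((group_id, doc_id, label, kind))
    return rows
```

Reusing the group ID for an axis label that has already been seen mirrors the behaviour described for the classification result storage unit 26r.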

図６に、分類結果記憶部２６rに記憶された分類結果を示す。この実施形態では、ＣＰＵ２３は、分類結果記憶部２６rに記憶する際に、分類軸ラベルの値が既に存在する場合には、同じ分類群ＩＤを付与して記憶するようにしている。 FIG. 6 shows the classification results stored in the classification result storage unit 26r. In this embodiment, when storing in the classification result storage unit 26r, the CPU 23 assigns and stores the same classification group ID when the value of the classification axis label already exists.

つぎに、ＣＰＵ２３は、群別代表ワード決定処理を行う（図４ステップＳ３）。群別代表ワードは、同じ群に属する全ての文書について形態素解析を行い、当該群に属する文書が有する全キーワードから、tf-idf法等により、評価値が閾値を超えたワードを、１または２以上、決定すればよい。本実施形態においては、図７に示すような群別代表ワードが生成されたものとする。 Next, the CPU 23 performs group-specific representative word determination processing (step S3 in FIG. 4). The group-specific representative words may be determined by performing morphological analysis on all documents belonging to the same group and selecting, from all keywords of the documents in that group, one or more words whose evaluation value under the tf-idf method or the like exceeds a threshold. In the present embodiment, it is assumed that the group-specific representative words shown in FIG. 7 have been generated.
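As a rough illustration of step S3, the sketch below scores each word of a group with a plain tf-idf value and keeps the words above a threshold. It assumes the documents are already tokenized and that each group is treated as one unit when counting document frequency; the source fixes neither choice, and the function name `group_representative_words` is hypothetical.

```python
import math
from collections import Counter


def group_representative_words(groups, threshold):
    """groups: {group_id: [token list per document]}.
    Returns {group_id: sorted list of words whose tf-idf score exceeds threshold}."""
    # Merge the tokens of all documents in a group (assumption: the group,
    # not the single document, is the unit for document frequency).
    group_tokens = {g: [t for doc in docs for t in doc] for g, docs in groups.items()}
    n_groups = len(group_tokens)

    # Number of groups in which each word appears.
    df = Counter()
    for tokens in group_tokens.values():
        df.update(set(tokens))

    result = {}
    for g, tokens in group_tokens.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {w: (c / total) * math.log(n_groups / df[w]) for w, c in tf.items()}
        result[g] = sorted(w for w, s in scores.items() if s > threshold)
    return result
```

With this scoring, a word that appears in every group gets a score of zero and is never selected, which is the usual effect of the idf term.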

つぎに、ＣＰＵ２３は関連度決定処理を行う（図４ステップＳ５）。関連度決定処理の詳細フローチャートを図８に示す。ＣＰＵ２３は、分類群の一覧を取得する（ステップＳ３１）。ＣＰＵ２３は対象群番号mを初期化する（ステップＳ３３）。ＣＰＵ２３は、対象群番号m番目の群について、類似する他の群を抽出する（ステップＳ３５）。この場合、m＝１であるので、ＣＰＵ２３は、１番目の分類群ＩＤ「Ｇ００１」のキーワード「Ｘ社」、「携帯」、「製品○○」を読み出して、他の群におけるキーワードとの類似度を判断する。これにより、分類群ＩＤ「Ｇ００１」と「Ｇ００２」、「Ｇ００３」・・・・の群間類似度が得られる。 Next, the CPU 23 performs relevance determination processing (step S5 in FIG. 4). A detailed flowchart of the relevance determination process is shown in FIG. 8. The CPU 23 acquires a list of classification groups (step S31). The CPU 23 initializes the target group number m (step S33). For the m-th target group, the CPU 23 extracts other groups that are similar to it (step S35). In this case, since m = 1, the CPU 23 reads the keywords “Company X”, “mobile”, and “product ○○” of the first classification group ID “G001” and determines their similarity to the keywords of the other groups. This yields the inter-group similarity between classification group ID “G001” and each of “G002”, “G003”, and so on.

ＣＰＵ２３は、結果を、群間関連記憶部２６gkに記憶する（ステップＳ３７）。かかる処理を全ての群について行うことにより、群間の関連度が得られる。なお、分類群ＩＤ「Ｇ００１」を軸にして、他の分類群との関連度を演算しているので、既に関連度を求めた場合がある。その場合には計算を省略すればよい。 The CPU 23 stores the result in the inter-group relation storage unit 26gk (step S37). By performing this process for all groups, the degree of association between groups is obtained. Note that because the degree of association with the other classification groups is calculated with one classification group as the axis, as with classification group ID “G001” above, the degree of association for a given pair may already have been calculated. In that case, the calculation may be omitted.
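The loop of steps S31 to S37 might look like the following sketch. The similarity measure used here, the Jaccard overlap of two keyword sets, is an assumption chosen for brevity and not the measure stated in the source; `keyword_overlap` and `inter_group_relations` are hypothetical names. Iterating over unordered pairs is one simple way to skip pairs whose degree of association has already been calculated.

```python
from itertools import combinations


def keyword_overlap(a, b):
    """Jaccard overlap of two keyword collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inter_group_relations(keywords, threshold):
    """keywords: {group_id: list of representative keywords}.
    Returns {(group_a, group_b): score} for pairs scoring above threshold."""
    relations = {}
    # combinations() visits each unordered pair exactly once, so the pair
    # (G002, G001) is never recomputed after (G001, G002).
    for g1, g2 in combinations(sorted(keywords), 2):
        score = keyword_overlap(keywords[g1], keywords[g2])
        if score > threshold:
            relations[(g1, g2)] = score
    return relations
```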

図９に、群間関連記憶部２６gkに記憶された結果を示す。この場合、分類群Ｇ００１が分類群Ｇ００２と，分類群Ｇ００１が分類群Ｇ００３と、分類群Ｇ００１が分類群Ｇ００４と類似しており、また、分類群Ｇ００４が分類群Ｇ００５と類似しているとの関連結果が得られている。 FIG. 9 shows the results stored in the inter-group relation storage unit 26gk. In this case, the classification group G001 is similar to the classification group G002, the classification group G001 is similar to the classification group G003, the classification group G001 is similar to the classification group G004, and the classification group G004 is similar to the classification group G005. Relevant results have been obtained.

ＣＰＵ２３は、得られた群間関連度を表示する表示データを生成して、モニタ３０に表示する（図４ステップＳ７）。本実施形態においては、図１０に示すように、ミクロ分析結果とマクロ分析結果を同じ時間軸上に、群毎に分類して各文書の作成日時が対比できるように、表示する表示データを生成するようにした。具体的には、ミクロ分析結果表示領域５１０には、分類軸種別「企業」の分類群「Ｇ００１」，「Ｇ００２」、「Ｇ００３」がそれぞれ配置されている。各分類群は、当該分類群に属する文書を特定する文書ＩＤが時系列順に並べてられている。たとえば、分類群「Ｇ００１」は、文書Ｄ００１，Ｄ００２、Ｄ００３がそれぞれ、２００３年、２００５年、２００７年の文書であることがわかるように表示されている。 The CPU 23 generates display data for displaying the obtained inter-group relevance and displays it on the monitor 30 (step S7 in FIG. 4). In this embodiment, as shown in FIG. 10, the display data is generated so that the micro analysis results and the macro analysis results are shown on the same time axis, classified by group, so that the creation dates of the documents can be compared. Specifically, the classification groups “G001”, “G002”, and “G003” of the classification axis type “company” are arranged in the micro analysis result display area 510. In each classification group, the document IDs specifying the documents belonging to that group are arranged in chronological order. For example, the classification group “G001” is displayed so that documents D001, D002, and D003 can be seen to date from 2003, 2005, and 2007, respectively.

また、マクロ分析結果表示領域５２０には、分類軸種別「政治」Ｇ００４，Ｇ００５が表示される。分類群ＩＤ「Ｇ００４」，「Ｇ００５」の群には、当該分類群に属する文書を特定する文書ＩＤが時系列順に並べてられている。たとえば、分類群「Ｇ００４」は、文書Ｄ０１１，Ｄ０１２がそれぞれ、２００５年、２００９年の文書であることがわかるように表示されている。 In the macro analysis result display area 520, the classification axis type “politics” G004, G005 is displayed. In the group of classification group IDs “G004” and “G005”, document IDs that specify documents belonging to the classification group are arranged in time series. For example, the classification group “G004” is displayed so that it can be understood that the documents D011 and D012 are documents of 2005 and 2009, respectively.
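A minimal sketch of the display data of step S7 is given below, assuming each document carries only a creation year and that the type “company” marks a micro analysis group. The dictionary layout and the name `build_display_data` are hypothetical; the actual data format behind FIG. 10 is not given in the source.

```python
def build_display_data(documents, groups):
    """documents: {doc_id: creation year}.
    groups: list of (group_id, type, [doc_ids]).
    Returns {"micro": {...}, "macro": {...}} where each group maps to its
    documents as (year, doc_id) pairs in chronological order."""
    display = {"micro": {}, "macro": {}}
    for group_id, kind, doc_ids in groups:
        # Company groups go to the micro area (510), all others to the macro area (520).
        area = "micro" if kind == "company" else "macro"
        display[area][group_id] = sorted((documents[d], d) for d in doc_ids)
    return display
```

Because every row uses the same year values, the rows of both areas can be drawn against one shared time axis.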

したがって、操作者は、かかる表示を参考に、マクロ経済分析結果を参照したミクロ経済分析が可能となる。これにより、ビジネス環境分析、特に、ニュース記事や業界レポートなどのテキストを元に企業を取り巻く外部環境を分析し、マーケティング活動、あるいは戦略立案・策定を支援するビジネス分析環境装置およびその方法を提供することができる。 Therefore, by referring to this display, the operator can perform microeconomic analysis that takes the macroeconomic analysis results into account. This makes it possible to provide a business analysis environment apparatus and method that supports business environment analysis, in particular analysis of the external environment surrounding a company based on texts such as news articles and industry reports, and thereby supports marketing activities or strategy planning and formulation.

なお、この場合、分類群Ｇ００１と関連する分類群Ｇ００２、Ｇ００３、Ｇ００４だけでなく、かかる分類群Ｇ００４と類似する分類群Ｇ００５も表示されている。したがって、操作者が想定していなかった分析が可能となる。 In this case, not only the classification groups G002, G003, and G004 related to the classification group G001 but also the classification group G005 similar to the classification group G004 is displayed. Therefore, an analysis that is not assumed by the operator is possible.

なお、この場合、マクロ分析結果表示領域５２０の「経済」、「社会」、「技術」には、該当する分類群が存在しないので、これらは空欄である。 In this case, since there is no corresponding classification group in “economy”, “society”, and “technology” in the macro analysis result display area 520, these are blank.

（４.他の実施形態） (4. Other embodiments)
本実施形態においては、対比データ生成表示装置として構成したが、生成装置として把握することもできる。また、文書データについては、別のコンピュータから取得するようにしてもよい。 Although the present embodiment is configured as a contrast data generation and display device, it can also be understood as a generation device alone. The document data may also be acquired from another computer.

本実施形態においては、マクロデータをさらに４つのカテゴリー（政治、経済、社会、技術）に分かる場合について説明したが、カテゴリー数については任意としてもよい。 In the present embodiment, the case has been described in which the macro data is further divided into four categories (politics, economy, society, technology), but the number of categories may be arbitrary.

本実施形態においては、企業名抽出辞書２６kに記憶された企業について、全て、表示対象とするようにしたが、表示対象についてはさらに操作者から指定させるようにしてもよいし、また、表示対象を予め定めておいてもよい。 In the present embodiment, all companies stored in the company name extraction dictionary 26k are display targets. However, the operator may further specify the display targets, or the display targets may be determined in advance.

さらに、対比表示データ生成表示装置が、類似する企業を候補として選択するようにしてもよい。選択手法としては、予め定められた主体について操作者の指定した、または抽出対象として予め定められた主体特定ワード以外の主体特定ワードについても、前記各主体別微視的観点群に属する文書について、主体別微視的観点群別に代表キーワードを決定する。そして、操作者の指定した、または抽出対象として予め定められた主体特定ワード以外の主体特定ワードで特定される主体別微視的観点群のうち、前記所定の閾値よりも高い組み合わせとなる主体別微視的観点群についても、前記同じ時間軸上に関連表示するようにすればよい。これにより、操作者が所望する企業以外についても、ミクロ分析結果表示領域５１０、マクロ分析結果表示領域５２０に対応させて表示することができる。 Furthermore, the contrast display data generation and display device may select similar companies as candidates. As a selection method, representative keywords are also determined, for each subject-specific microscopic viewpoint group, for the documents belonging to the groups of subject specifying words other than those designated by the operator or predetermined as extraction targets. Then, among the subject-specific microscopic viewpoint groups specified by those other subject specifying words, the groups whose combination has a similarity higher than the predetermined threshold are also displayed in association on the same time axis. In this way, companies other than those the operator asked for can also be displayed in correspondence with the micro analysis result display area 510 and the macro analysis result display area 520.

具体的には、操作者は、「Ｘ社」「Ａ社」「Ｂ社」についてミクロ分析対象として特定したとして、これらと関係するマクロ分析対象のデータと高い創刊を有するのが、「Ｃ社」「Ｄ社」である場合に、かかる「Ｃ社」「Ｄ社」のミクロ分析結果をあわせて表示することができる。 Specifically, suppose the operator has specified “Company X”, “Company A”, and “Company B” as micro analysis targets, and “Company C” and “Company D” have a high correlation with the macro analysis target data related to them. In that case, the micro analysis results for “Company C” and “Company D” can be displayed as well.

なお、上記実施形態においては、各群に属する全文書から群別代表ワードを決定し、各群別代表ワードの出現頻度に基づいてベクトルを生成し、それらの内積値を文書間の類似度と定義し、類似度の高いものを取得するようにした。しかしこれに限定されず、各群に属する全ての文書の本文テキストを連結したものを群別の種文書として、群別種文書間で類似度を判断するようにしてもよい。 In the above embodiment, group-specific representative words are determined from all the documents belonging to each group, a vector is generated based on the appearance frequency of each group-specific representative word, the inner product of the vectors is defined as the similarity between documents, and those with high similarity are acquired. However, the invention is not limited to this; the body texts of all documents belonging to each group may be concatenated into a group-specific seed document, and similarity may be determined between the group-specific seed documents.
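The inner-product similarity described above can be illustrated as follows. The sketch assumes the vector dimensions are the union of the two groups' representative words and that the components are raw occurrence counts without normalization; the source does not specify either point, and the function names are hypothetical.

```python
from collections import Counter


def frequency_vector(vocabulary, tokens):
    """Occurrence count of each vocabulary word in tokens, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def inner_product_similarity(rep_a, tokens_a, rep_b, tokens_b):
    """Inner product of the two groups' frequency vectors over the union of
    their representative words (assumed choice of dimensions)."""
    vocabulary = sorted(set(rep_a) | set(rep_b))
    va = frequency_vector(vocabulary, tokens_a)
    vb = frequency_vector(vocabulary, tokens_b)
    return sum(x * y for x, y in zip(va, vb))
```

For the seed-document variant, `tokens_a` and `tokens_b` would simply be the tokens of the concatenated body texts of each group.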

上記実施形態においては、図１に示す機能を実現するために、ＣＰＵ２３を用い、ソフトウェアによってこれを実現している。しかし、その一部もしくは全てを、ロジック回路などのハードウェアによって実現してもよい。なお、プログラムの一部の処理を、オペレーティングシステム（ＯＳ）にさせるようにしてもよい。 In the above embodiment, the functions shown in FIG. 1 are realized in software using the CPU 23. However, some or all of them may be realized by hardware such as a logic circuit. Part of the processing of the program may also be performed by the operating system (OS).

Claims

Subject specifying word storage means for storing one or more subject specifying words for specifying the subject,
Classification means which, when document data in which body text and creation time data are associated with each other is given, determines whether any of the one or more subject specifying words exists in the body of each document data, classifies the document data, if a subject specifying word exists in the body, into a subject-specific microscopic viewpoint group for each subject specified by that subject specifying word, and classifies the document data, if no subject specifying word exists, into a macroscopic viewpoint group,
Classification result storage means for storing the classification result;
Similarity-specific group representative keyword determination means which determines one or more document keywords for each document belonging to the macroscopic viewpoint group based on the degree of word appearance in the body text of the document, classifies documents whose document keywords match into a similarity-specific group, and determines one or more similarity-specific group representative keywords for each similarity-specific group,
Subject-specific microscopic viewpoint group representative keyword determination means which, for a subject specifying word designated by the operator or predetermined as an extraction target, determines one or more subject-specific microscopic viewpoint group representative keywords of the subject-specific microscopic viewpoint group specified by that subject specifying word, based on the degree of word appearance in the documents belonging to that group;
Similarity determination means for determining the similarity between the subject-specific microscopic viewpoint group and the similarity-specific group from the degree of match between the subject-specific microscopic viewpoint group representative keyword and the similarity-specific group representative keyword,
Generating means which extracts combinations of a subject-specific microscopic viewpoint group and a similarity-specific group whose similarity is higher than a predetermined threshold, and generates related display data in which the document specifying data of the document data belonging to the extracted subject-specific microscopic viewpoint group and the document specifying data of the documents belonging to the extracted similarity-specific group are arranged, group by group, in time series of the creation time on the same time axis;
A comparison display data generation device.

A subject specifying word storage means for storing a subject specifying word for specifying the subject,
When document data associated with creation time data is given, it is determined whether or not the subject specifying word exists for each document data. If the subject specifying word exists, the subject specifying word is specified by the subject specifying word. Classifying means for classifying each subject according to a microscopic viewpoint group for each subject, and when there is no subject specific word, classifying into a macroscopic viewpoint group,
Classification result storage means for storing the classification result;
For each document belonging to the macroscopic viewpoint group, a keyword for each document is determined for each document, classified into a group by similarity based on the determined keyword for each document, and a group representative by similarity in each group by similarity Group representative keyword determination means by similarity to determine keywords,
Subject-specific microscopic viewpoint group representative keyword determination means for determining, for a subject specifying word designated by the operator or predetermined as an extraction target, a representative keyword for each subject-specific microscopic viewpoint group from the documents belonging to that group,
Similarity determination means for determining the similarity between the representative keyword for each subject microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determination means and the keyword for each similarity degree group;
A combination in which the similarity between the subject-specific microscopic viewpoint group determined by the similarity determination means and the similarity-specific group group is higher than a predetermined threshold is extracted and belongs to the extracted subject-specific microscopic viewpoint group Generating means for generating related display data in which document specific data of document data and document data specific data of a document belonging to a group classified by similarity are arranged for each group in time series on the same time axis;
A comparison display data generation device.

In the comparison display data generation device according to claim 2,
The subject-specific microscopic viewpoint group representative keyword determining means belongs to the subject-specific microscopic viewpoint group even for subject-specific words other than the subject-specific words specified by the operator or predetermined as extraction targets. For a document, determine one or more representative keywords for each group of microscopic viewpoints by subject,
The generation means includes a combination higher than the predetermined threshold among the subject-specific microscopic viewpoint groups specified by the subject specifying word other than the subject specifying word specified by the operator or predetermined as the extraction target. The subject-specific microscopic viewpoint group is related and displayed on the same time axis,
A comparison display data generation device characterized by the above.

A comparison display data generation program for causing a computer to function as the following means.
A subject specifying word storage means for storing a subject specifying word for specifying the subject,
When document data associated with creation time data is given, it is determined whether or not the subject specifying word exists for each document data. If the subject specifying word exists, the subject specifying word is specified by the subject specifying word. Classifying means for classifying each subject according to a microscopic viewpoint group for each subject, and when there is no subject specific word, classifying into a macroscopic viewpoint group,
Classification result storage means for storing the classification result;
For each document belonging to the macroscopic viewpoint group, a keyword for each document is determined for each document, classified into a group by similarity based on the determined keyword for each document, and a group representative by similarity in each group by similarity Group representative keyword determination means by similarity to determine keywords,
Subject-specific microscopic viewpoint group representative keyword determination means for determining, for a subject specifying word designated by the operator or predetermined as an extraction target, a representative keyword for each subject-specific microscopic viewpoint group from the documents belonging to that group,
Similarity determination means for determining the similarity between the representative keyword for each subject microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determination means and the keyword for each similarity degree group;
A combination in which the similarity between the subject-specific microscopic viewpoint group determined by the similarity determination means and the similarity-specific group group is higher than a predetermined threshold is extracted and belongs to the extracted subject-specific microscopic viewpoint group Generating means for generating related display data in which document specific data of document data and document data specific data of a document belonging to a group classified by similarity are arranged for each group in the time series of the creation time on the same time axis;

A comparison display data generation method in which a subject specifying word for specifying a subject is stored in a computer, and the computer executes the following steps.
When document data associated with creation time data is given, it is determined whether or not the subject specifying word exists for each document data. If the subject specifying word exists, the subject specifying word is specified by the subject specifying word. Classifying each subject into a microscopic viewpoint group for each subject, classifying into a macroscopic viewpoint group if the subject specific word does not exist, and storing the classification result;
For each document belonging to the macroscopic viewpoint group, a keyword for each document is determined for each document, classified into a group by similarity based on the determined keyword for each document, and a group representative by similarity in each group by similarity Determining keywords,
Determining a representative keyword for each subject-specific microscopic viewpoint group for a document belonging to each subject-specific microscopic viewpoint group for a subject-specific word designated by an operator or predetermined as an extraction target;
Determining a similarity between a representative keyword for each subject microscopic viewpoint group determined by the subject-specific microscopic viewpoint group representative keyword determining means and a keyword for each similarity degree group;
A combination in which the similarity between the subject-specific microscopic viewpoint group determined by the similarity determination means and the similarity-specific group group is higher than a predetermined threshold is extracted and belongs to the extracted subject-specific microscopic viewpoint group Generating related display data in which the document specification data of the document data and the document data specification data of the documents belonging to the groups classified by similarity are arranged for each group in the time series of the creation time on the same time axis.