JP2005128978A

JP2005128978A - Apparatus, program and method for automatic preparation of information analysis report

Info

Publication number: JP2005128978A
Application number: JP2003396361A
Authority: JP
Inventors: Hiroaki Masuyama; 博昭増山; Noriaki Yoshino; 令晃吉野
Original assignee: IPB KK
Current assignee: IPB KK
Priority date: 2003-10-22
Filing date: 2003-10-22
Publication date: 2005-05-19

Abstract

PROBLEM TO BE SOLVED: To analyze the information in a research object document against comparison object documents and to prepare a report.
SOLUTION: The apparatus is an information analysis report preparation apparatus that automatically analyzes the information in a research object document against documents designated as comparison object documents. It comprises:
- an input means for inputting the research object document and the comparison object documents;
- an input means for inputting the information analysis conditions;
- a selection means for selecting a population of documents similar to the research object document;
- an extraction means for extracting index words;
- an operation means for computing the significance of the index words;
- an output means for outputting the population selection result and the index-word operation result to a display means, a recording means, or a communication means.
As a result, the information in the research object documents can be expressed accurately without anyone reading either the research object documents or the vast number of comparison object documents. A program is also provided.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to document analysis apparatus. More particularly, it relates to an apparatus, program, and method for automatically creating an information analysis report. The report analyzes a survey target document or document group and expresses its characteristics.

The volume of technical documents, including patent documents, and of other documents grows steadily every year. Document data is now digitized and distributed. As a result, systems have come into practical use that automatically retrieve, from a huge number of documents, only those similar to a document under investigation. Even so, such searches still return many similar documents. To learn the content or character of the document under investigation, a skilled person has had to read them.

Patent Document 1 describes a "similar document search device and similar document search method". It compares the index words in a survey target document or document group with the index words in a group of comparison documents. It then calculates a similarity from the kinds and numbers of appearances of the shared index words and outputs the documents in descending order of similarity.

FIG. 6 is an overall configuration diagram of the apparatus disclosed in Patent Document 1. That conventional apparatus works as follows:
1. A survey target document is entered from an input device 602.
2. A similarity calculation system in a control device 601 compares it, under given extraction conditions, with the documents in a database held in an external auxiliary storage device 603.
3. An output device 604 outputs the resulting document list.
4. An expert evaluator reads the highly similar documents in that list to evaluate the survey target document.

To learn the content of the highly similar documents, the evaluator had to read anywhere from a few to several thousand of them.
Patent Document 1: Japanese Patent Laid-Open No. 11-73415, “Similar Document Retrieval Device and Similar Document Retrieval Method”

However, conventional automatic similar-document search systems like that of Patent Document 1 output only a list of the comparison documents that are similar to the survey target document. The evaluator then had to:
1. pick out anywhere from a few to several thousand highly similar documents from that list and read them;
2. identify the ones that truly resemble the survey target document;
3. read and evaluate those further;
4. position the character of the survey target document relative to them.

The problem was therefore that the evaluator could not find an expression that accurately captures the character of the survey target document without extracting and reading a few to several thousand documents.

Accordingly, an object of the present invention is to automatically create an information analysis report that accurately conveys the information of a survey target document. No human should have to read any of the survey target document or the vast number of comparison documents.

To solve the above problem, the present invention comprises:
- an input means for designating and inputting a survey target document and comparison target documents, and for inputting the conditions for the information analysis;
- a selection means for selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
- an extraction means for extracting index words that characterize the survey target document relative to the population documents;
- an output means for outputting the population or the index words, which represent the characteristics of the survey target document, for display, recording, or communication.

To solve the above problem, the present invention further provides information analysis of a survey target document by an apparatus comprising the input means, selection means, extraction means, and output means described above. In this aspect, the selection means selects the population documents from the comparison target documents based on the result of a calculation means that calculates a similarity ratio for each comparison target document.

To solve the above problem, the present invention further provides information analysis of a survey target document by an apparatus comprising the input means, selection means, extraction means, and output means described above. In this aspect, the selection means selects based on the result of a calculation means that calculates a similarity ratio for each comparison target document. That calculation means computes the similarity ratio from a function of the appearance frequency and the document frequency of each index word in each document.

To solve the above problem, the present invention further provides information analysis of a survey target document by an apparatus comprising the input means, selection means, extraction means, and output means described above. In this aspect, the output means, which outputs the population or the index words representing the characteristics of the survey target document for display, recording, or communication, includes:
- a display means that distributes the population or the index words and displays them as a map;
- a display means that displays part of the population or index-word data;
- an output means that displays either fixed-form comments suited to the content, chosen automatically or by selection, or free-form comments that are entered or selected.

According to the present invention, the apparatus comprises:
- an input means for designating and inputting a survey target document and comparison target documents, and for inputting the information analysis conditions;
- a selection means for selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
- an extraction means for extracting index words that characterize the survey target document relative to the population documents;
- an output means for outputting the population or the index words, which represent the characteristics of the survey target document, for display, recording, or communication.

Consequently, the information analysis can accurately and automatically create the characteristics of the survey target document relative to the comparison target documents.

Further, according to the present invention, the apparatus comprises the input, selection, extraction, and output means described above. In addition, the selection means selects the population documents based on the result of a calculation means that calculates a similarity ratio for each comparison target document. Consequently, the information analysis can accurately and automatically create the characteristics of the survey target document relative to the comparison target documents.

Further, according to the present invention, an information analysis report automatic creation apparatus comprises the input, selection, extraction, and output means described above. The selection means selects based on the result of a similarity ratio calculation means. That calculation means computes the similarity ratio from a function of the appearance frequency and the document frequency of each index word in each document. Consequently, the information analysis can accurately and automatically create the characteristics of the survey target document relative to the comparison target documents.

Further, according to the present invention, an information analysis report automatic creation apparatus comprises the input, selection, extraction, and output means described above. The output means, which outputs the population or the index words representing the characteristics of the survey target document, includes:
- a display means that distributes the population or the index words and displays them as a map;
- a display means that displays part of the population or index-word data;
- an output means that displays either fixed-form comments suited to the content, chosen automatically or by selection, or free-form comments that are entered or selected.

Consequently, the information analysis can accurately and automatically create the characteristics of the survey target document relative to the comparison target documents.

Embodiments of the present invention are described in detail below with reference to the drawings.

The terms used in this specification are defined or explained as follows.
Survey target document d: a matter under investigation, for example a document such as a particular patent gazette, or a set of such documents.
Comparison target documents P: the document or set of documents against which the survey target document is compared. P includes d.
Population documents S: the group of documents among the comparison target documents P that are similar to the survey target document d. S includes d.

In the figures, the labels d or (d), P or (P), and S or (S) attached to components denote the survey target document, the comparison target documents, and the population documents, respectively. For ease of identification, the same labels are attached to components and operations below. For example, index word (d) means an index word of the survey target document d, index word (P) means an index word of the comparison target documents P, and index word (S) means an index word of the population documents S.
TF calculation is the calculation of term frequency: the number of times an index word contained in a document appears within that document (the index-word frequency).
DF calculation is the calculation of document frequency. It is the number of documents (hits) returned when the target document group is searched with an index word contained in a given document.
IDF calculation is, for example, the reciprocal of the DF result, or the logarithm of that reciprocal multiplied by the number of documents in P or S.
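The following minimal Python sketch illustrates the TF, DF, and IDF calculations defined above. It assumes that each document has already been reduced to a list of index words. It uses the ln[N / DF] form of IDF given in the abbreviations. The function names, and returning 0.0 for an index word that no document contains, are illustrative assumptions and are not part of the specification.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: number of appearances of each index word within one document."""
    return Counter(tokens)


def document_frequency(term, documents):
    """DF: number of documents (hits) that contain the index word."""
    return sum(1 for doc in documents if term in doc)


def inverse_document_frequency(term, documents):
    """IDF as ln(N / DF); 0.0 when no document contains the term (assumption)."""
    df = document_frequency(term, documents)
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)
```

For `docs = [["laser", "diode", "laser"], ["laser", "lens"], ["gear", "shaft"]]`, "laser" has TF 2 in the first document, DF 2, and IDF ln(3/2).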

To simplify the following explanation, these abbreviations are used.
N: number of documents in the comparison target documents P
N′: number of documents in the population documents S
TF(d): frequency of occurrence, within d, of each index word of d
TF(P): frequency of occurrence, within P, of each index word of P
DF(P): document frequency, within P, of each index word of P
DF(S): document frequency, within S, of each index word of S
IDF(P): logarithm of the reciprocal of DF(P) multiplied by the number of documents: ln[N / DF(P)]
IDF(S): logarithm of the reciprocal of DF(S) multiplied by the number of documents: ln[N′ / DF(S)]
TFIDF: the product of TF and IDF, calculated for each index word of a document
Similarity ratio: the degree of similarity between the survey target document d and a given document belonging to the comparison target documents P
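Written as formulas for an index word t, the abbreviations above read as follows. Subscripts mark the document or document set. c(t) is a hypothetical name for the TF(d)IDF(S) score used for characteristic index words.

```latex
\begin{aligned}
\mathrm{IDF}_P(t) &= \ln\frac{N}{\mathrm{DF}_P(t)}, \qquad
\mathrm{IDF}_S(t) = \ln\frac{N'}{\mathrm{DF}_S(t)}, \\
\mathrm{TFIDF}_P(x, t) &= \mathrm{TF}_x(t)\cdot \mathrm{IDF}_P(t), \qquad x \in P, \\
c(t) &= \mathrm{TF}_d(t)\cdot \mathrm{IDF}_S(t)
\end{aligned}
```

Take hypothetical values N′ = 50, DF(S) = 5, and TF(d) = 4. The score is then c(t) = 4 ln 10 ≈ 9.21. An index word that appears in every document of S gets IDF(S) = ln 1 = 0. Words shared by the whole population therefore cannot rank as characteristic of d.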

Here, an index word is a so-called keyword: a word cut out from all or part of a document. Two approaches are possible:
- Cut words out with conventionally known methods or commercially available software, extracting meaningful nouns and excluding particles and conjunctions.
- Prepare a database of index words (a thesaurus) in advance and use the index words obtained from that database.

When the survey target is a document group containing multiple documents, the extracted items may be index words as described above. They may also be IPC classifications, groups of companies, or yearly groups such as the patent application year or the patent registration year. The remainder of this specification mostly uses index words as the representative case.
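The following sketch is a rough illustration of cutting index words out of text. It assumes English-like input split on non-alphanumeric characters. A small, hypothetical stop-word list stands in for the particles and conjunctions. Japanese text would instead need a morphological analyzer, which is not shown. The optional thesaurus filter mirrors the alternative described above.

```python
import re

# Hypothetical stop-word list standing in for "particles and conjunctions".
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for", "with", "is", "are"}


def extract_index_words(text, thesaurus=None):
    """Cut candidate index words out of English-like text.

    Lower-cases the text, splits it on non-alphanumeric characters, and drops
    stop words and one-character tokens. If a thesaurus (a set of allowed
    terms) is given, only words found in it are kept.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
    if thesaurus is not None:
        words = [w for w in words if w in thesaurus]
    return words
```

For example, `extract_index_words("A laser diode and the lens")` yields `["laser", "diode", "lens"]`.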

FIG. 1 shows the hardware configuration of an information analysis report automatic creation apparatus according to an embodiment of the present invention.
As shown in the figure, the apparatus comprises:
- a processing device 1, composed of a CPU (central processing unit), memory (storage device), and the like;
- an input device 2, an input means such as a keyboard (manual input device);
- a recording device 3, a recording means that stores document data, conditions, results of work by the processing device 1, and the like;
- an output device 4, an output means that displays results such as the extracted characteristic index words as maps, data, and so on.

FIG. 2 explains in detail the configuration and functions of the information analysis report automatic creation apparatus, program, and method according to an embodiment of the present invention.

The processing device 1 comprises:
- a survey target document d reading unit 110;
- an index word (d) extraction unit 120;
- a TF(d) calculation unit 121;
- a comparison target document P reading unit 130;
- an index word (P) extraction unit 140;
- a TF(P) calculation unit 141 for the comparison target documents P;
- an IDF(P) calculation unit 142 for the comparison target documents P;
- a similarity ratio calculation unit 150;
- a population document S selection unit 160;
- an index word (S) extraction unit 170;
- an IDF(S) calculation unit 171;
- a characteristic index word TF(d)IDF(S) calculation unit 180;
- and so on.
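The characteristic index word TF(d)IDF(S) calculation unit 180 combines the term frequency of each index word in d with that word's IDF over the population S. The following minimal Python sketch shows that scoring. It assumes the index words are already extracted and uses IDF(S) = ln[N′ / DF(S)] from the abbreviations. The function name and the ranking by descending score are illustrative assumptions, not the specification's own procedure.

```python
import math
from collections import Counter


def characteristic_index_words(target_words, population_docs, top_k=10):
    """Rank index words of d by TF(d) x IDF(S), with IDF(S) = ln(N' / DF(S)).

    target_words: index words of the survey target document d.
    population_docs: list of index-word lists for the population S (d included).
    Words of d absent from S are skipped (assumption; impossible if d is in S).
    """
    n_prime = len(population_docs)
    df_s = Counter()
    for words in population_docs:
        df_s.update(set(words))  # each document counts once per word
    tf_d = Counter(target_words)
    scores = {
        w: tf * math.log(n_prime / df_s[w])
        for w, tf in tf_d.items()
        if df_s[w] > 0
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

A word that every population document contains scores zero. A word found in d but rare in S rises to the top, which matches the aim of expressing how d differs from its similar documents.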

The input device 2 comprises a survey target document d condition input unit 210, a comparison target document P condition input unit 220, an extraction condition and other input unit 230, and so on.

The recording device 3 comprises a condition recording unit 310, a work result storage unit 320, a document storage unit 330, and so on. The document storage unit 330 includes external and internal databases.

External databases are document databases such as the IPDL (Industrial Property Digital Library) provided by the Japan Patent Office, or PATOLIS provided by PATOLIS Corporation.

Internal databases include:
- databases built in-house from commercially available data such as the patent JP-ROM;
- devices that read media storing documents, such as FDs (floppy disks), CD-ROMs (compact discs), MOs (magneto-optical disks), and DVDs (digital video discs);
- devices such as an OCR (optical character reader) that read printed or handwritten documents on paper;
- devices that convert the read data into electronic data such as text.

The output device 4 comprises:
- a map creation condition reading unit 410;
- a map data loading unit 412;
- a map (graph/table) generation unit 415;
- a population data output condition reading unit 420;
- an output data loading unit 422;
- a comment condition reading unit 430;
- a fixed comment loading unit 432;
- a comment appending unit 435;
- a map/data/comment composite formatting output unit 440;
- and so on.

In FIGS. 1 and 2, signals and data can be exchanged among the processing device 1, the input device 2, the recording device 3, and the output device 4 in several ways:
- by direct connection with a USB (Universal Serial Bus) cable or the like;
- by transmission over a network such as a LAN (local area network);
- via media storing the documents, such as FDs, CD-ROMs, MOs, or DVDs.

Any one of these, or a combination of several, may be used.

The functions of the information analysis report automatic creation apparatus, program, and method according to an embodiment of the present invention are described in detail below with reference to FIG. 2.

In the input device 2 of FIG. 2, the survey target document d condition input unit 210 sets the conditions for reading out the survey target document d, using an input screen or the like. The comparison target document P condition input unit 220 likewise sets the conditions for reading out the comparison target documents P.

Using an input screen or the like, the extraction condition and other input unit 230 sets the following conditions:
- the index-word extraction conditions for the survey target document d and the comparison target documents P;
- the TF calculation conditions;
- the IDF calculation conditions;
- the similarity ratio calculation conditions;
- the similar-document selection conditions;
- the map creation conditions;
- the data output conditions;
- the comment appending conditions;
- and so on.

These input conditions are sent to, and stored in, the condition recording unit 310 of the recording device 3.
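The conditions gathered by input units 210 to 230 and held by the condition recording unit 310 could be modeled as a simple record. The sketch below is hypothetical. The field names, the default values, and the JSON serialization used to "record" the conditions are illustrative assumptions, not part of the specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AnalysisConditions:
    """Hypothetical record of the analysis conditions (all fields assumed)."""
    target_doc_ids: List[str]          # which documents form the survey target d
    comparison_filter: str             # how to read out the comparison documents P
    population_size: int = 100         # number of similar documents kept as S
    top_index_words: int = 20          # number of characteristic index words to report
    similarity_method: str = "tfidf_cosine"
    comment_templates: List[str] = field(default_factory=list)


def save_conditions(cond: AnalysisConditions) -> str:
    """Serialize the conditions so they can be stored by a recording unit."""
    return json.dumps(asdict(cond), ensure_ascii=False)


def load_conditions(payload: str) -> AnalysisConditions:
    """Restore conditions previously produced by save_conditions."""
    return AnalysisConditions(**json.loads(payload))
```

Keeping every condition in one serializable record lets each processing unit read the same stored settings instead of receiving them piecemeal.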

In the processing device 1 of FIG. 2, the survey target document d reading unit 110 reads the survey target document from the document storage unit 330, according to the conditions in the condition recording unit 310. It passes the document to the index word (d) extraction unit 120. The index word (d) extraction unit 120 then extracts index words from that document, according to the conditions in the condition recording unit 310, and stores them in the work result storage unit 320.

The comparison target document P reading unit 130 reads the comparison target documents from the document storage unit 330, according to the conditions in the condition recording unit 310. It passes them to the index word (P) extraction unit 140. The index word (P) extraction unit 140 then extracts index words from those documents, according to the conditions in the condition recording unit 310, and stores them in the work result storage unit 320.

In practice, the comparison target documents are usually an entire body of publications, such as all unexamined patent application publications. Once their index words have been extracted and saved, there is no need to extract them again. The comparison target document P reading unit 130 and the index word (P) extraction unit 140 can therefore be omitted.
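Because the index words of P need to be extracted only once, one way to realize this is to precompute and save, for P, the TF of each document and the DF over the whole collection. The following minimal sketch assumes that documents are already reduced to index-word lists keyed by a document id. It also assumes that a JSON file is an acceptable store. Both are hypothetical choices.

```python
import json
from collections import Counter


def build_comparison_index(comparison_docs):
    """Precompute per-document TF and collection-wide DF for the documents P.

    comparison_docs maps a document id to its list of index words.
    """
    tf = {doc_id: dict(Counter(words)) for doc_id, words in comparison_docs.items()}
    df = Counter()
    for words in comparison_docs.values():
        df.update(set(words))  # a document counts at most once per word
    return {"n_docs": len(comparison_docs), "tf": tf, "df": dict(df)}


def save_index(index, path):
    """Write the precomputed index so later runs can skip re-extraction."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)


def load_index(path):
    """Read back an index written by save_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Later analyses can call `load_index` and go straight to the TF(P) and IDF(P) values.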

The TF(d) calculation unit 121 performs the TF calculation, according to the conditions in the condition recording unit 310. It operates on the work result of the index word (d) extraction unit 120 for the survey target document d, stored in the work result storage unit 320. The result is stored in the work result storage unit 320. Alternatively, it is sent directly to the similarity ratio calculation unit 150 or the characteristic index word TF(d)IDF(S) calculation unit 180.

The TF(P) calculation unit 141 performs the TF calculation, according to the conditions in the condition recording unit 310. It operates on the work result of the index word (P) extraction unit 140 for the comparison target documents P, stored in the work result storage unit 320. The result is stored in the work result storage unit 320 or sent directly to the similarity ratio calculation unit 150.

The IDF(P) calculation unit 142 performs the IDF calculation, according to the conditions in the condition recording unit 310. It operates on the work result of the index word (P) extraction unit 140 for the comparison target documents P, stored in the work result storage unit 320. The result is stored in the work result storage unit 320 or sent directly to the similarity ratio calculation unit 150.

The similarity ratio calculation unit 150 obtains the results of the TF(d) calculation unit 121, the TF(P) calculation unit 141, and the IDF(P) calculation unit 142. It takes them either directly from each unit or from the work result storage unit 320. According to the conditions in the condition recording unit 310, it then calculates the similarity ratio of each document in the comparison target documents P to the survey target document d. The similarity ratio is attached to each comparison target document as similarity ratio data. That data is stored in the work result storage unit 320 or sent directly to the population document S selection unit 160.

In the similarity calculation unit 150, each document is scored per index word using a calculation such as a TFIDF calculation. From these scores, the similarity of each comparison target document P to the survey target document d is computed. A TFIDF value is the product of the TF result and the IDF result. The method of calculating the similarity is described in detail later.
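The detailed similarity formula is deferred in the text, so the sketch below substitutes a widely used measure: cosine similarity between TF-IDF vectors, with IDF taken over the comparison target documents P. It illustrates a TFIDF-based similarity and is not the patent's own formula. Index words of d that never occur in P receive an IDF of 0 here, which is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(words: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Relative TF of each index word multiplied by its IDF (0 if unknown).
    counts = Counter(words)
    total = len(words) or 1
    return {w: (n / total) * idf.get(w, 0.0) for w, n in counts.items()}


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarities(d_words: list[str], p_docs: list[list[str]]) -> list[float]:
    """Similarity of each comparison document in p_docs to document d."""
    n = len(p_docs)
    df = Counter(w for doc in p_docs for w in set(doc))  # document frequency in P
    idf = {w: math.log(n / k) for w, k in df.items()}
    d_vec = tfidf_vector(d_words, idf)
    return [cosine_similarity(d_vec, tfidf_vector(doc, idf)) for doc in p_docs]
```

Documents that share no index words with d score exactly 0, and documents that share rarer words with d score higher.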

There are various methods of calculating similarity. The configuration in which the similarity calculation unit 150 relies on the TF (d) calculation unit 121, the TF (P) calculation unit 141, and the IDF (P) calculation unit 142 may be used as described. However, when a similarity method that does not require these three units is used, they may all be omitted, leaving only the similarity calculation unit 150.

Based on the conditions in the condition recording unit 310, the population document S selection unit 160 takes the similarity results, either from the work result storage unit 320 or directly from the similarity calculation unit 150. It selects the population documents S by the selection method specified in the conditions. For example, it sorts the documents in descending order of similarity and selects the number of documents specified in the conditions. The selection is stored in the work result storage unit 320 or sent directly to the index word (S) extraction unit 170.
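The example selection rule sorts by similarity and keeps the specified number of documents, such as the "Top 3000" choice shown for FIG. 9. It can be sketched as follows. The document identifiers and the function name are hypothetical.

```python
def select_population(doc_ids: list[str], scores: list[float], top_n: int) -> list[str]:
    """Sort comparison documents in descending order of similarity and
    keep the first top_n as the population documents S."""
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_n]]
```

`sorted` is stable, so documents with equal similarity keep their input order. For very large P, `heapq.nlargest(top_n, ...)` would avoid a full sort.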

The output of the population document S selection unit 160 may also go directly to the map data capturing unit 412 or the output data capturing unit 422. In that case, the subsequent components are unnecessary.

Based on the conditions in the condition recording unit 310, the index word (S) extraction unit 170 extracts index words (S) from the population documents S selected by the population document S selection unit 160. It obtains these documents either from the work result storage unit 320 or directly from unit 160. The result is stored in the work result storage unit 320 or sent directly to the IDF (S) calculation unit 171.
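The text does not specify how index words are extracted; the screens offer options such as "In-house keyword extraction 1". As a rough illustration for English text only, the sketch below lowercases the text, splits it into alphanumeric tokens, and drops a hypothetical stop-word list. Japanese documents would instead need a morphological analyzer and a curated dictionary.

```python
import re

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "is", "by"}


def extract_index_words(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop
    stop words and single-character tokens. Duplicates are kept so that
    TF can be computed afterwards."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
```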

Based on the conditions in the condition recording unit 310, the IDF (S) calculation unit 171 performs an IDF calculation on the work result of the index word (S) extraction unit 170. It obtains this result from the work result storage unit 320 or directly from unit 170. The result is stored in the work result storage unit 320 or sent directly to the feature index word TF (d) IDF (S) calculation unit 180.

Based on the conditions in the condition recording unit 310, the feature index word TF (d) IDF (S) calculation unit 180 works from the work result storage unit 320, or directly from the results of the TF (d) calculation unit 121 and the IDF (S) calculation unit 171. It selects population documents, for example in descending order of similarity. The number selected is either the number specified in the selection conditions or a number determined by a calculation based on those conditions. The result is sent to the work result storage unit 320.
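Steps S171 and S180 describe the core of this unit: each index word of d is weighted by its TF in d multiplied by its IDF over the population documents S. The sketch below ranks the index words of d by that product and keeps the top k, for example the "Top 20" output in FIG. 10.

It makes two assumptions. First, TF is relative frequency. Second, IDF uses the smoothed form log((1 + N) / (1 + df)), so that index words of d absent from S do not cause division by zero. The function name is hypothetical.

```python
import math
from collections import Counter


def feature_index_words(d_words: list[str], population: list[list[str]],
                        top_k: int) -> list[tuple[str, float]]:
    """Rank the index words of survey document d by TF(d) x IDF(S)."""
    n = len(population)
    df = Counter(w for doc in population for w in set(doc))  # df over S
    tf = Counter(d_words)
    total = len(d_words) or 1
    scores = {
        w: (count / total) * math.log((1 + n) / (1 + df[w]))
        for w, count in tf.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

Words that are frequent in d but rare among its most similar documents rank highest. Words that occur in every population document score 0.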

In the recording device 3 of FIG. 2, the condition recording unit 310 records information such as the conditions obtained from the input device 2. It sends the necessary data to the processing device 1 or the output device 4 on request. The work result storage unit 320 stores the work results of each component of the processing device 1 and likewise sends the necessary data to the processing device 1 or the output device 4 on request.

The document storage unit 330 stores and provides the necessary document data, obtained from an external or internal database at the request of the input device 2 or the processing device 1.

In the output device 4 of FIG. 2, the map creation condition reading unit 410 reads the map creation conditions from the condition recording unit 310 and sends them to the map data capturing unit 412. The population data output condition reading unit 420 reads the population data output conditions from the condition recording unit 310 and sends them to the output data capturing unit 422. The comment condition reading unit 430 reads the comment output conditions and comment addition conditions from the condition recording unit 310 and sends them to the standard comment capturing unit 432.

Following the conditions from the map creation condition reading unit 410, the map data capturing unit 412 fetches results stored in the work result storage unit 320. These include the results of the population document S selection unit 160 and of the feature index word TF (d) IDF (S) calculation unit 180, and they are fetched together with data from the document storage unit 330. The fetched data is sent to the work result storage unit 320 or directly to the map (graph/table) generation unit 415.

The map (graph/table) generation unit 415 uses the data from the map data capturing unit 412 to generate graphs, tables, titles, legends, and the like, and sends them to the map/data/comment composite formatting output unit 440.

Following the conditions from the population data output condition reading unit 420, the output data capturing unit 422 fetches results stored in the work result storage unit 320. These include the results of the population document S selection unit 160 and of the feature index word TF (d) IDF (S) calculation unit 180, and they are fetched together with data from the document storage unit 330. The fetched data is sent to the work result storage unit 320 or directly to the map/data/comment composite formatting output unit 440.

Following the conditions from the comment condition reading unit 430, the standard comment capturing unit 432 fetches data from the work result storage unit 320 and the document storage unit 330. It sends the data to the comment addition unit 435 or directly to the map/data/comment composite formatting output unit 440.

Following the conditions from the comment condition reading unit 430, the comment addition unit 435 prepares data to be added as the evaluator's comments on the survey target document d. The comments are either entered directly from an external input device such as a keyboard or OCR, or taken from comments prepared in advance in the internal database of the document storage unit 330. The data is sent to the work result storage unit 320 or directly to the map/data/comment composite formatting output unit 440.

The map/data/comment composite formatting output unit 440 obtains conditions and data from four sources: the map (graph/table) generation unit 415, the output data capturing unit 422, the standard comment capturing unit 432, and the comment addition unit 435. It obtains them either directly or from the work result storage unit 320. It formats the map, data, and comments into a layout suited to paper output and combines them. The map is displayed, the data is output as a data list, and the comments (or parts of them) are output so that they can be displayed, printed, or stored as data.
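The composite formatting step can be sketched for a plain-text report only. The sketch combines a data list of feature index words with an optional evaluator comment. The function name `compose_report`, the layout, and the omission of the graphical map are assumptions for illustration; the text only says the output is formatted for paper.

```python
from typing import Optional


def compose_report(title: str, feature_words: list[tuple[str, float]],
                   comment: Optional[str] = None) -> str:
    """Combine a data list of feature index words and an optional
    evaluator comment into one plain-text report."""
    lines = [title, "=" * len(title), "", "Feature index words:"]
    for rank, (word, score) in enumerate(feature_words, start=1):
        lines.append(f"{rank:>3}. {word:<12} {score:.4f}")
    if comment:  # the comment section is added only when a comment exists
        lines += ["", "Comment:", comment]
    return "\n".join(lines)
```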

FIGS. 3, 4, and 5 illustrate a typical example of the operation of the information analysis report automatic creation apparatus, program, and method according to an embodiment of the present invention.

FIG. 3 is a condition-setting flowchart showing the operation of each component of the input device 2. After initialization (step S201), the type of condition to be input is determined (step S202). If the condition is an input for the survey target document d, the conditions for the survey target document d are entered in the survey target document d condition input unit 210 (step S210). The entered conditions are then checked on the display screen. If they are acceptable, the user selects “Set” and the conditions are stored in the condition recording unit 310 (step S310). Otherwise, the user selects “Back” and the process returns to step S210 (step S211).

If, in step S202, the condition is an input for the comparison target documents P, the conditions for the comparison target documents P are entered in the comparison target document P condition input unit 220 (step S220). The entered conditions are then checked on the display screen. If they are acceptable, the user selects “Set” and the conditions are stored in the condition recording unit 310 (step S310). Otherwise, the user selects “Back” and the process returns to step S220 (step S221).

If, in step S202, the condition is an extraction condition or another condition, it is entered in the extraction condition and other input unit 230 (step S230). The entered conditions are then checked on the display screen. If they are acceptable, the user selects “Set” and the conditions are stored in the condition recording unit 310 (step S310). Otherwise, the user selects “Back” and the process returns to step S230 (step S231). In step S230, two sets of conditions are specified: the extraction conditions for the survey target document d, and the conditions for extracting the population documents S from the comparison target documents P.
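The dispatch in step S202 and the "Set"/"Back" loop of steps S210 to S231 can be sketched as follows. User interaction is simulated: `attempts` stands for successive entries the user types, and `confirm` stands for the on-screen check. These names, the condition-type keys, and the dictionary used as the condition store are all hypothetical.

```python
from typing import Callable

# Hypothetical keys for the three condition types distinguished in step S202.
CONDITION_TYPES = ("survey_target", "comparison_target", "extraction")


def set_condition(store: dict, kind: str, attempts: list[dict],
                  confirm: Callable[[dict], bool]) -> dict:
    """Offer each attempted entry for confirmation. The first entry the
    user confirms ("Set") is stored (step S310) and returned; rejected
    entries ("Back") lead to the next re-entered attempt."""
    if kind not in CONDITION_TYPES:
        raise ValueError(f"unknown condition type: {kind}")
    for entry in attempts:
        if confirm(entry):       # user chose "Set"
            store[kind] = entry  # record in the condition store
            return entry
        # user chose "Back": continue with the next re-entered attempt
    raise RuntimeError("no entry was confirmed")
```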

FIG. 4 is a flowchart showing the operation of each component of the processing device 1. After initialization (step S101), the documents to be read from the document storage unit 330 are divided into the survey target document d and the comparison target documents P, based on the conditions in the condition recording unit 310 (step S102). When the document to be read is the survey target document d, the survey target document d reading unit 110 reads it from the document storage unit 330 (step S110). The index word (d) extraction unit 120 then extracts the index words of the survey target document d (step S120). Next, the TF (d) calculation unit 121 performs a TF calculation for each extracted index word (step S121).

When, in step S102, the documents to be read are the comparison target documents P, the comparison target document P reading unit 130 reads them (step S130). The index word (P) extraction unit 140 then extracts the index words of the comparison target documents P (step S140). For each extracted index word, the TF (P) calculation unit 141 performs a TF calculation (step S141) and the IDF (P) calculation unit 142 performs an IDF calculation (step S142).

Next, the similarity calculation unit 150 uses three inputs: the TF (d) result from the TF (d) calculation unit 121, the TF (P) result from the TF (P) calculation unit 141, and the IDF (P) result from the IDF (P) calculation unit 142. It computes a value for each index word of a document and derives the document's similarity from these values, for example by taking their average (step S150).

When the similarity is not calculated by TFIDF or a similar method, it may be obtained by another method. Such a method works directly from the outputs of the index word (d) extraction unit 120 for the survey target document d and the index word (P) extraction unit 140 for the comparison target documents P.
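The text does not name an alternative method. One example of a similarity that needs only the extracted index words, and no TF or IDF, is the Jaccard coefficient: the overlap of the two index-word sets divided by their union. It is shown here as an illustration of such an alternative, not as the patent's method.

```python
def jaccard_similarity(d_words: list[str], p_words: list[str]) -> float:
    """|A intersect B| / |A union B| over the sets of index words of d and p."""
    a, b = set(d_words), set(p_words)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With this choice, the TF (d), TF (P), and IDF (P) calculation units could be omitted, as the text allows.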

Next, the population document S selection unit 160 sorts the documents scored in step S150 by similarity. It selects as many population documents S as specified by the conditions set in the extraction condition and other input unit 230 (step S160).

These data may also be used directly by the map (graph/table) generation unit 415 or the map/data/comment composite formatting output unit 440 of the output device 4.

Next, the index word (S) extraction unit 170 extracts the index words (S) of the population documents S selected in step S160 (step S170).

Next, the IDF (S) calculation unit 171 performs an IDF calculation over the population documents S for each index word (d) (step S171).

Finally, the feature index word calculation TF (d) IDF (S) is performed (step S180). It combines two results: the IDF (S) value of each index word (d) over the population documents S, from step S171, and the TF (d) value of each index word (d) in the survey target document d, from step S121.
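The flow of FIG. 4 from step S121 to step S180 can be strung together in one short sketch. It rests on several assumptions, none of which the text specifies:

- Documents are given as lists of already-extracted index words.
- TF is relative frequency.
- IDF (P) is log(N / df).
- Step S150 uses cosine similarity between TF-IDF vectors, although the text mentions averaging per-index-word values.
- IDF (S) uses the smoothed form log((1 + N) / (1 + df)).

All function names are hypothetical.

```python
import math
from collections import Counter


def tf(words):
    counts = Counter(words)
    total = len(words) or 1
    return {w: c / total for w, c in counts.items()}


def idf(docs, smooth=False):
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    if smooth:
        # Smoothed IDF: words absent from docs do not divide by zero.
        return lambda w: math.log((1 + n) / (1 + df[w]))
    return lambda w: math.log(n / df[w]) if df[w] else 0.0


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def analyze(d_words, p_docs, population_size, top_k):
    tf_d = tf(d_words)                                   # S121
    idf_p = idf(p_docs)                                  # S142
    d_vec = {w: x * idf_p(w) for w, x in tf_d.items()}
    sims = []
    for i, doc in enumerate(p_docs):                     # S141, S150
        p_vec = {w: x * idf_p(w) for w, x in tf(doc).items()}
        sims.append((cosine(d_vec, p_vec), i))
    sims.sort(key=lambda s: s[0], reverse=True)          # S160
    chosen = [i for _, i in sims[:population_size]]
    idf_s = idf([p_docs[i] for i in chosen], smooth=True)  # S170, S171
    scores = {w: x * idf_s(w) for w, x in tf_d.items()}  # S180
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return chosen, ranked[:top_k]
```

Usage, with toy data: `analyze(["web", "member", "web", "site"], p_docs, 2, 3)` returns the indices of the two most similar comparison documents and the top three feature index words of d. Words that d shares with all of its population documents score 0.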

FIG. 5 is a flowchart of the output of maps, data, and/or comments by the output device 4. After initialization (step S401), the conditions read from the condition recording unit 310 are classified into map creation conditions, data output conditions, and comment addition conditions (step S402).

The map branch applies when the condition read from the condition recording unit 310 is a map creation condition (step S410) and that condition requires a map (step S411). In that case, the map data capturing unit 412 fetches the map data from the work result storage unit 320 (step S412). A map such as a graph or table is then generated according to the map creation conditions from the map creation condition reading unit 410 (step S415). The map is prepared for display (step S419) and sent to the map/data/comment composite formatting output unit 440.

The data branch applies when the condition read from the condition recording unit 310 is a population data output condition (step S420) and that condition requires data (step S421). In that case, the output data capturing unit 422 fetches the output data from the work result storage unit 320 (step S422). The data is then output according to the data output conditions from the population data output condition reading unit 420 (step S423). It is prepared for output (step S429) and sent to the map/data/comment composite formatting output unit 440.

The comment branch applies when the condition read from the condition recording unit 310 is a comment condition (step S430) and that condition requires a comment (step S431). In that case, a frame in which comments can be added is prepared in the map/data/comment composite formatting output unit 440. A comment is then either entered manually into the frame from a keyboard or via OCR (step S435), or taken from comments prepared in advance in the internal database of the document storage unit 330 (step S432). The comment is prepared for output (step S439) and sent to the map/data/comment composite formatting output unit 440.

A branch can also end early. This happens if the conditions do not call for displaying a map in step S411, for outputting data in step S421, or for adding a comment in step S431. The respective branch then ends at that point, and no data is sent to the map/data/comment composite formatting output unit 440.
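The branching of FIG. 5 can be summarized in a small sketch. Each condition type (map, data, comment) produces its section only when its condition requires it (steps S411, S421, and S431). Otherwise that branch ends and contributes nothing to the composite output. The keys, the builder callables, and the function name are hypothetical.

```python
from typing import Callable


def run_output_flow(required: dict[str, bool],
                    builders: dict[str, Callable[[], str]]) -> list[str]:
    """Build the map, data, and comment sections whose conditions require
    them; skipped branches send nothing to the composite output."""
    sections = []
    for kind in ("map", "data", "comment"):
        if required.get(kind, False):
            sections.append(builders[kind]())
    return sections
```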

FIG. 7 shows an input condition setting screen of the input device 2 in an embodiment of the information analysis report automatic creation apparatus according to the present invention.

FIG. 7 is a display example of the input condition setting (1) screen of the input device 2 of the information analysis report automatic creation apparatus. The example shows the following selections:

- In the “Target document” window, “Survey target document” is selected from “Survey target document” and “Comparison target document”.
- In the “Document content” window, “Published patent” is selected from “Published patent”, “Registered patent”, “Utility model”, “Academic literature”, and so on.
- In the “Read data” window, “FD” is selected from “In-house DB1”, “In-house DB2”, “Patent Office IPDL”, “PATOLIS”, “Other commercial DB1”, “Other commercial DB2”, “FD”, “CD”, “MO”, “DVD”, “Other”, and so on.
- On the “FD”, “Document 3” is selected from “Document 1”, “Document 2”, “Document 3”, “Document 4”, “Document 5”, “Document 6”, and so on.

FIG. 8 is a display example of the input condition setting (2) screen of the input device 2 of the information analysis report automatic creation apparatus. The example shows the following selections:

- In the “Target document” window, “Comparison target document” is selected from “Survey target document”, “Comparison target document”, and so on.
- In the “Document content” window, both “Published patent” and “Registered patent” are selected from “Published patent”, “Registered patent”, “Utility model”, “Academic literature”, and so on.
- In the “Extracted content” window, both “Claims” and “Abstract” are selected from “Claims”, “Prior art”, “Problem to be solved”, “Means and effects”, “Embodiments”, “Description of drawings”, “Drawings”, “Abstract”, “Bibliographic data”, “Prosecution history”, “Registration information”, “Other”, and so on.
- In the “Read data” window, “In-house DB1” is selected from the same items as in FIG. 7.

The conditions set on this input condition setting screen configure the survey target document d condition input unit 210 and the comparison target document P condition input unit 220.

FIG. 9 is a display example of the input condition setting (3) screen of the input device 2 of the information analysis report automatic creation apparatus. The example shows the following selections:

- In the “Index word extraction conditions” window, “In-house keyword extraction 1” is selected from “In-house keyword extraction 1”, “In-house keyword extraction 2”, “Commercial keyword extraction 1”, “Commercial keyword extraction 2”, and so on.
- In the “Similarity calculation method” window, “Similarity 1” is selected from “Similarity 1” through “Similarity 6” and so on.
- In the “Population document selection” window, “Number of population documents” is selected from “Number of population documents”, “Number of non-population documents”, and so on.
- “Top 3000” is then selected from “Top 100”, “Top 1000”, “Top 3000”, “Top 5000”, “Numeric input”, and so on.

The conditions set on this extraction condition setting screen configure the extraction condition and other input unit 230.

FIG. 10 is a display example of the output condition setting screen of the output device 4 of the information analysis report automatic creation apparatus. The example shows the following selections:

- In the “Map calculation method” window, “x-axis: number of index words” is selected for the x-axis and “y-axis: index word rank” for the y-axis.
- In the “Map position” window, “One map” is selected from “One map”, “Two maps”, “One map with data”, “Two maps with data”, “One map with comments”, “Two maps with comments”, “One map with data and comments”, “Two maps with data and comments”, and so on.
- In the “Output data” window, “Original words” is selected from “Original words”, “Technical terms”, “Population characteristic words”, and so on.
- “Top 20” is then selected from “None”, “Top 5”, “Top 10”, “Top 15”, “Top 20”, “Numeric input”, and so on.
- The “(free entry)” field in the “Comment” window is left blank.

In this way, the output conditions in the extraction condition and other input unit 230 are set.
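The selections made on the screens of FIGS. 7 to 10 might be held in the condition recording unit 310 as a simple nested record, sketched below. The key names and structure are hypothetical; only the selected values come from the figures as described above.

```python
# Hypothetical condition record mirroring the selections in FIGS. 7 to 10.
conditions = {
    "survey_target": {"content": "published patent", "source": "FD",
                      "document": "Document 3"},
    "comparison_target": {
        "content": ["published patent", "registered patent"],
        "fields": ["claims", "abstract"],
        "source": "In-house DB1",
    },
    "extraction": {
        "index_word_extraction": "In-house keyword extraction 1",
        "similarity_method": "Similarity 1",
        "population": {"by": "number of population documents", "top_n": 3000},
    },
    "output": {
        "map_axes": {"x": "number of index words", "y": "index word rank"},
        "map_position": "one map",
        "output_data": {"kind": "original words", "top_n": 20},
        "comment": "",  # free-entry field left blank
    },
}
```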

FIG. 11 is a specific example of output from the information analysis report automatic creation apparatus of the present invention, using the conditions input in the examples of FIGS. 7 to 10. It shows the selection result of the population document S selection unit 160 and the extraction result of the feature index word TF (d) IDF (S) calculation unit 180. These results are output by the map/data/comment composite formatting output unit 440 via the map (graph/table) generation unit 415.

In the example of FIG. 11, the survey target document d is a published patent application on a “system for promoting access to WEB sites”. It was compared against about ten years of published patent applications and patent gazettes as the comparison target documents. FIG. 11 shows that this analysis identifies “member”, “image”, “WEB”, “display”, and similar terms as the feature index words of the survey target document.

FIGS. 12 to 31 are specific examples of map displays from other examples of the information analysis report automatic creation apparatus of the present invention, with information analyzed under the same conditions as in FIG. 11. Each shows the selection result of the population document S selection unit 160 and the extraction result of the feature index word TF (d) IDF (S) calculation unit 180, output by the map/data/comment composite formatting output unit 440 via the map (graph/table) generation unit 415.

FIG. 1 is a block diagram according to the present invention.
FIG. 2 is a detailed block diagram of an embodiment according to the present invention.
FIG. 3 is a flowchart showing the operation of the input device 2 of an embodiment according to the present invention.
FIG. 4 is a flowchart showing the operation of the processing device 1 of an embodiment according to the present invention.
FIG. 5 is a flowchart showing the operation of the output device 4 of an embodiment according to the present invention.
FIG. 6 shows a conventional example.
FIG. 7 is an example of input condition setting in an embodiment according to the present invention.
FIG. 8 is an example of input condition setting in an embodiment according to the present invention.
FIG. 9 is an example of input condition setting in an embodiment according to the present invention.
FIG. 10 is an example of output condition setting in an embodiment according to the present invention.
FIG. 11 shows a working example of the present invention.
FIG. 12 is a display example of another working example of the present invention.
FIG. 13 is a display example of another working example of the present invention.
FIG. 14 is a display example of another working example of the present invention.
FIG. 15 is a display example of another working example of the present invention.
FIG. 16 is a display example of another working example of the present invention.
FIG. 17 is a display example of another working example of the present invention.
FIG. 18 is a display example of another working example of the present invention.
FIG. 19 is a display example of another working example of the present invention.
FIG. 20 is a display example of another working example of the present invention.
FIG. 21 is a display example of another working example of the present invention.
FIG. 22 is a display example of another working example of the present invention.
FIG. 23 is a display example of another working example of the present invention.
FIG. 24 is a display example of another working example of the present invention.
FIG. 25 is a display example of another working example of the present invention.
FIG. 26 is a display example of another working example of the present invention.
FIG. 27 is a display example of another working example of the present invention.
FIG. 28 is a display example of another working example of the present invention.
FIG. 29 is a display example of another working example of the present invention.
FIG. 30 is a display example of another working example of the present invention.
FIG. 31 is a display example of another working example of the present invention.

Explanation of symbols

1: processing device, 2: input device, 3: recording device, 4: output device,
110: survey target document d reading unit, 120: index word (d) extraction unit, 130: comparison target document P reading unit, 140: index word (P) extraction unit, 121: TF (d) calculation unit, 141: TF (P) calculation unit, 142: IDF (P) calculation unit, 150: similarity calculation unit, 160: population document S selection unit, 170: index word (S) extraction unit, 171: IDF (S) calculation unit, 180: feature index word TF (d) IDF (S) calculation unit,
210: survey target document d condition input unit, 220: comparison target document P condition input unit, 230: extraction condition and other input unit,
310: condition recording unit, 320: work result storage unit, 330: document storage unit,
410: map creation condition reading unit, 412: map data capturing unit, 420: population data output condition reading unit, 422: output data capturing unit, 430: comment condition reading unit, 432: standard comment capturing unit, 435: comment addition unit, 440: map/data/comment composite formatting output unit.

Claims

An information analysis report automatic creation device that, in the information analysis of a survey target document, automatically and accurately expresses the characteristics of the survey target document relative to comparison target documents, the device comprising:
input means for specifying and inputting the survey target document and the comparison target documents, and for inputting conditions for performing the information analysis;
selection means for selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
extraction means for extracting index words that characterize the survey target document with respect to the population documents; and
output means for displaying, recording, or communicating the population documents or the index words representing the characteristics of the survey target document.

The information analysis report automatic creation device according to claim 1,
wherein the selection means for selecting, from the comparison target documents, the population documents consisting of a group of documents similar to the survey target document
selects the population documents based on the result of a calculation means that calculates a similarity rate to the comparison target documents.

The information analysis report automatic creation device according to claim 2,
wherein, in the selection means that selects based on the result of the calculation means for calculating the similarity rate to the comparison target documents,
the calculation means calculates the similarity rate based on a function value of the appearance frequency and the document frequency of each index term in each document.
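The claim leaves the "function value" of appearance frequency (TF) and document frequency (DF) unspecified. One common choice, assumed here purely for illustration, is cosine similarity between TF·IDF vectors with the IDF computed over the comparison documents P (compare reference numerals 141, 142 and 150), after which the most similar documents are kept as the population S (compare 160). The smoothed IDF formula, the top-N selection rule, and every function name below are hypothetical; a similarity threshold would serve equally well as the selection rule.

```python
import math
from collections import Counter


def idf_over(documents):
    """IDF(P) over the comparison set P (assumed smoothed form)."""
    n_docs = len(documents)
    df = Counter()
    for terms in documents:
        df.update(set(terms))  # document frequency: one count per document
    return lambda term: math.log((1 + n_docs) / (1 + df[term])) + 1.0


def tfidf_vector(terms, idf):
    """Weight each index term by its appearance frequency (TF) times IDF."""
    return {term: count * idf(term) for term, count in Counter(terms).items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_population(target_terms, comparison_docs, size):
    """Return indices of the `size` comparison documents most similar to the target."""
    idf = idf_over(comparison_docs)
    target_vec = tfidf_vector(target_terms, idf)
    ranked = sorted(
        range(len(comparison_docs)),
        key=lambda i: cosine(target_vec, tfidf_vector(comparison_docs[i], idf)),
        reverse=True,  # most similar first; sorted() stays stable for ties
    )
    return ranked[:size]


comparison_p = [
    ["battery", "lithium", "electrode"],
    ["engine", "piston", "fuel"],
    ["battery", "electrode", "coating"],
    ["fuel", "injector"],
]
print(select_population(["battery", "electrode", "coating"], comparison_p, size=2))
```

Here document 2 shares all of the target's terms and document 0 shares two of them, so they form the population, while the unrelated engine documents score zero.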

The information analysis report automatic creation device according to claim 2,
wherein the output means for displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document comprises:
display means for displaying the population or the index terms distributed on a map;
display means for displaying part of the data of the population or the index terms; and
output means for displaying a standard comment matched to the content, inserted automatically or selected, or a free comment that is input or selected.
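The claim does not say how a standard comment is matched to the content. The sketch below assumes, purely for illustration, a small table of hypothetical comment templates chosen by a threshold on the highest similarity found, with an analyst's free comment taking precedence when one is supplied (compare reference numerals 430, 432 and 435). The template wording, the 0.5 threshold, and all names are invented placeholders, not the patent's.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical standard-comment templates; keys and wording are illustrative only.
STANDARD_COMMENTS = {
    "close": "Closely related documents exist among the comparison documents.",
    "distant": "No closely related document was found among the comparison documents.",
}


@dataclass
class ReportSection:
    characteristic_terms: List[str]
    comment: str


def choose_standard_comment(top_similarity: float, threshold: float = 0.5) -> str:
    """Pick a standard comment automatically using an assumed similarity-threshold rule."""
    key = "close" if top_similarity >= threshold else "distant"
    return STANDARD_COMMENTS[key]


def build_section(terms: List[str], top_similarity: float,
                  free_comment: Optional[str] = None) -> ReportSection:
    """Prefer an analyst's free comment; otherwise use the automatic standard comment."""
    if free_comment is not None:
        comment = free_comment
    else:
        comment = choose_standard_comment(top_similarity)
    return ReportSection(characteristic_terms=list(terms), comment=comment)


print(build_section(["electrode", "coating"], top_similarity=0.8).comment)
```

Keeping the templates in a table rather than in code mirrors the separation the reference list draws between loading standard comments and appending them to the report.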

An information analysis report automatic creation program that operates in the information processing means of an information analysis report automatic creation device comprising: input means for designating and inputting a survey target document and a comparison target document, and for inputting conditions for performing information analysis;
selection means for selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
extraction means for extracting index terms characteristic of the survey target document with respect to the population documents; and
output means for displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document,
the program, in information analysis of the survey target document, accurately and automatically describing the characteristics of the survey target document relative to the comparison target document, and realizing:
a function of designating and inputting the survey target document and the comparison target document, and inputting conditions for performing the information analysis;
a function of selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
a function of extracting index terms characteristic of the survey target document with respect to the population documents; and
a function of displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document.

An information analysis report automatic creation program that operates in the information processing means of the information analysis report automatic creation device according to claim 5
and, in information analysis of the survey target document, accurately and automatically describes the characteristics of the survey target document relative to the comparison target document,
wherein, in the function of selecting, from the comparison target documents, the population documents consisting of a group of documents similar to the survey target document,
the selection function selects based on the result of a calculation function that calculates a similarity rate to the comparison target documents.

An information analysis report automatic creation program that operates in the information processing means of the information analysis report automatic creation device according to claim 6
and, in information analysis of the survey target document, accurately and automatically describes the characteristics of the survey target document relative to the comparison target document,
wherein, in the function of selecting based on the result of the calculation function that calculates the similarity rate to the comparison target documents,
the calculation function calculates the similarity rate based on a function value of the appearance frequency and the document frequency of each index term in each document.

An information analysis report automatic creation program that operates in the information processing means of the information analysis report automatic creation device according to claim 6
and, in information analysis of the survey target document, accurately and automatically describes the characteristics of the survey target document relative to the comparison target document,
wherein the function of displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document comprises:
a display function for displaying the population or the index terms distributed on a map;
a display function for displaying part of the data of the population or the index terms; and
an output function for displaying a standard comment matched to the content, inserted automatically or selected, or a free comment that is input or selected.

An information analysis report automatic creation method using an information analysis report automatic creation device comprising: input means for designating and inputting a survey target document and a comparison target document, and for inputting conditions for performing information analysis;
selection means for selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
extraction means for extracting index terms characteristic of the survey target document with respect to the population documents; and
output means for displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document,
the method, in information analysis of the survey target document, accurately and automatically describing the characteristics of the survey target document relative to the comparison target document, and comprising:
a step of designating and inputting the survey target document and the comparison target document, and inputting conditions for performing the information analysis;
a step of selecting, from the comparison target documents, population documents consisting of a group of documents similar to the survey target document;
a step of extracting index terms characteristic of the survey target document with respect to the population documents; and
a step of displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document.

The information analysis report automatic creation method according to claim 9,
wherein, in the step of selecting, from the comparison target documents, the population documents consisting of a group of documents similar to the survey target document,
the selection step includes a step of selecting based on the result of a calculation step that calculates a similarity rate to the comparison target documents.

The information analysis report automatic creation method according to claim 10,
wherein, in the step of selecting based on the result of the calculation step that calculates the similarity rate to the comparison target documents,
the calculation step includes a step of calculating the similarity rate based on a function value of the appearance frequency and the document frequency of each index term in each document.

The information analysis report automatic creation method according to claim 10,
wherein the step of displaying, recording, or outputting via communication the population or the index terms representing the characteristics of the survey target document comprises:
a display step of displaying the population or the index terms distributed on a map;
a display step of displaying part of the data of the population or the index terms; and
an output step of displaying a standard comment matched to the content, inserted automatically or selected, or a free comment that is input or selected.