JP4699909B2

JP4699909B2 - Keyword correspondence analysis apparatus and analysis method

Info

Publication number: JP4699909B2
Application number: JP2006016136A
Authority: JP
Inventors: 誠司高野; 隆中居; 克哉三室; 英介須藤
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2006-01-25
Filing date: 2006-01-25
Publication date: 2011-06-15
Anticipated expiration: 2026-01-25
Also published as: JP2007199906A

Description

The present invention relates to a keyword correspondence analysis apparatus and analysis method for analyzing the correspondence between keywords contained in document information such as patent documents and academic literature.

For example, investigating patent documents as document information makes it possible to explore the direction of research and development or to draw up business plans. To this end, patent documents matching a given purpose are extracted from a huge body of patent documents, and the extracted group is analyzed. A representation of the analysis result on a two-dimensional plane is known as a patent map, and techniques for automatically creating a patent map from a group of patent documents are known (Patent Document 1).

A technique is also known in which morphological analysis is performed and terms that appear at least a predetermined number of times in the target document are extracted, so that even technical terms not registered in a dictionary, or abbreviations used only within a specific organization, can be detected as terms (Patent Document 2).
Patent Document 1: JP 2005-149346 A
Patent Document 2: JP 2002-342321 A

In patent documents and the like, it is preferable to use terms that are as accurate as possible. However, because patent documents deal with the latest technology, they tend to be written before precise terminology has been established. Patent documents are written using the terms current in the industry at the time of filing, so terms denoting the same technical element may vary from applicant to applicant.

Even for terms that are not new, the terms customarily used by each applicant (a company, for example) may differ. For example, one applicant may use the formal name established in the Japanese Industrial Standards or the like, another may use an abbreviation of the formal name, and yet another may use a special term used only within its own organization.

Furthermore, terms change over time. A term that is standard in one era may therefore be expressed by a different term in another.

When the same technical element is expressed by various different terms, the accuracy of machine searching generally decreases. A searcher who is well versed in both the technical field and search techniques may obtain results with few omissions and little noise, but a searcher who lacks either technical knowledge or search skill will find it difficult to obtain highly accurate results.

Even for a skilled searcher, the time and effort required for a search increase when the same technical element is expressed by different terms. This is because the searcher must, for example, read a large number of patent documents in advance and study the variant names of terms before composing a search expression.

Furthermore, when reading a patent document that uses terms different from those one normally uses, it is difficult to grasp the content correctly in a short time, which may lead to confusion and misunderstanding.

The present invention has been made in view of the above problems, and one object is to provide a keyword correspondence analysis apparatus and analysis method capable of automatically detecting keywords that are used only in specific document information. Another object of the present invention is to provide a keyword correspondence analysis apparatus and analysis method that automatically detect the correspondence between a standard keyword used in much document information and a keyword used only in specific document information, and thereby support keyword searching. Still another object of the present invention is to provide a keyword correspondence analysis apparatus and analysis method that help the reader grasp the content of document information in which non-standard keywords are used. Further objects of the present invention will become clear from the description of the embodiments below.

To solve the above problems, a keyword correspondence analysis apparatus according to one aspect of the present invention comprises: a document information storage unit that stores a plurality of items of digitized document information; a target document extraction unit that extracts document information to be analyzed by searching the document information storage unit based on given analysis conditions; an appearance frequency analysis unit that, based on the extracted document information, generates keyword appearance frequency analysis information by analyzing the appearance frequency of each keyword appearing in that document information; a first keyword detection unit that, based on the keyword appearance frequency analysis information, detects, among the keywords contained in the extracted document information, a first keyword used in predetermined document information; a second keyword candidate detection unit that, based on the keyword appearance frequency analysis information, detects second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords contained in document information other than the predetermined document information within the extracted document information; and a second keyword detection unit that detects the second keyword corresponding to the first keyword from among the detected second keyword candidates.

Examples of document information include patent documents (including laid-open publications, registration publications, and the like) and academic papers. The document information is digitized and stored in the document information storage unit. From the plurality of items of document information stored in the document information storage unit, the document information to be analyzed is extracted based on the given analysis conditions. More specifically, the document information extracted for analysis is document information relating to a specific technical field.

The appearance frequency analysis unit calculates the appearance frequency of each keyword contained in the extracted document information and generates the keyword appearance frequency analysis information, for example by assigning ranks in descending order of appearance frequency. The first keyword detection unit detects a first keyword used in predetermined document information. The first keyword is, for example, a standard keyword used in at least a predetermined number of items of the predetermined document information. The second keyword candidate detection unit detects keywords that are candidates for the second keyword corresponding to the first keyword. The second keyword is a special keyword used only in document information other than the predetermined document information, and can be called a dialect keyword. The second keyword detection unit then detects, from among the second keyword candidates, the second keyword corresponding to the first keyword. In this way, the correspondence between keywords with different expressions can be detected automatically.

In one aspect of the present invention, the second keyword detection unit detects, as the second keyword, the second keyword candidate selected by the user from among the second keyword candidates. For example, a list of second keyword candidates can be presented to the user, who then selects the second keyword. The list may, for example, present the second keyword candidates in descending order of their likelihood of corresponding to the first keyword.

In one aspect of the present invention, the apparatus further comprises a first feature information detection unit that detects feature information of the detected first keyword and a second feature information detection unit that detects feature information of the detected second keyword candidates, and the second keyword detection unit detects the second keyword from among the second keyword candidates by comparing the feature information of the first keyword with the feature information of the second keyword candidates. As the feature information, it is possible to use, for example, the coordinates of each keyword in a coordinate system generated from the extracted document information, or the appearance frequency rank of each keyword. The second keyword detection unit can detect, as the second keyword, the second keyword candidate whose feature information differs least from that of the first keyword.

In one aspect of the present invention, the apparatus further comprises a first feature information detection unit that detects feature information of the detected first keyword, a second feature information detection unit that detects feature information of the detected second keyword candidates, a feature information comparison unit that compares the feature information of the first keyword with the feature information of the second keyword, and a comparison result output unit that outputs the result of the comparison by the feature information comparison unit. The second keyword detection unit has a user designation mode, in which the second keyword candidate selected by the user on the basis of the comparison result is detected as the second keyword, and an automatic detection mode, in which the second keyword candidate whose feature information differs least from that of the first keyword according to the comparison result is detected as the second keyword.

In one aspect of the present invention, the first keyword detection unit detects as the first keyword, based on the keyword appearance frequency analysis information, a keyword that has an appearance frequency rank up to a first predetermined value and that is used in a number of items of predetermined document information that is at least a second predetermined value and less than the total number of items of extracted document information.
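
The thresholds in this aspect can be illustrated with a minimal Python sketch. It assumes that documents are grouped (for example by applicant) and that a rank table per group is already available; the function name, the group labels, the keywords, and the reading that a keyword qualifies if it is within the top ranks in at least one group are all assumptions made for illustration, not details taken from the specification.

```python
def detect_first_keywords(ranks, max_rank, min_groups):
    """ranks: {group: {keyword: rank}} (hypothetical layout).

    A keyword qualifies when it ranks within max_rank in at least one group
    and is used by at least min_groups groups but not by every group.
    """
    total = len(ranks)
    keywords = set().union(*ranks.values())
    found = []
    for kw in keywords:
        users = [g for g, r in ranks.items() if kw in r]
        in_top = any(ranks[g][kw] <= max_rank for g in users)
        if in_top and min_groups <= len(users) < total:
            found.append(kw)
    return sorted(found)


# Hypothetical rank tables for three document groups.
ranks = {
    "G1": {"alpha": 1, "beta": 2, "gamma": 7},
    "G2": {"beta": 1, "alpha": 3},
    "G3": {"alpha": 2, "delta": 1},
}
first_keywords = detect_first_keywords(ranks, max_rank=3, min_groups=2)
```

Here "alpha" is excluded because every group uses it, so it cannot have a dialect counterpart, while "beta" is kept because one group does not use it.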

In one aspect of the present invention, the second keyword candidate detection unit, based on the keyword appearance frequency analysis information, takes the keywords contained in document information other than the predetermined document information, removes those that are used at least a third predetermined value of times in the predetermined document information, and detects the remaining keywords as second keyword candidates.
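
The filtering step can be sketched as follows. This is a hedged illustration only: the data layout (occurrence counts per document group), the function name, and the sample counts are assumptions, and "used at least a third predetermined value" is read here as a total occurrence count across the groups that use the first keyword.

```python
def detect_candidates(counts, reference_groups, third_value):
    """counts: {group: {keyword: occurrences}} (hypothetical layout).

    Keywords of the other groups survive only if the groups that use the
    first keyword use them fewer than third_value times in total.
    """
    candidates = set()
    for group, group_counts in counts.items():
        if group in reference_groups:
            continue
        for kw in group_counts:
            used = sum(counts[r].get(kw, 0) for r in reference_groups)
            if used < third_value:
                candidates.add(kw)
    return sorted(candidates)


# Hypothetical counts: G1 and G2 use the first keyword, G3 does not.
counts = {
    "G1": {"alpha": 30, "beta": 20},
    "G2": {"beta": 40, "alpha": 25, "eps": 1},
    "G3": {"alpha": 28, "delta": 18, "eps": 9},
}
candidates = detect_candidates(counts, {"G1", "G2"}, third_value=2)
```

With a threshold of 2, "eps" remains a candidate even though one reference group mentions it once; a threshold of 1 would remove it.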

In one aspect of the present invention, the first feature information detection unit detects, as the feature information of the first keyword, the average appearance frequency rank of the first keyword in the predetermined document information, and the second feature information detection unit detects, as the feature information of the second keyword, the appearance frequency rank of each second keyword candidate in the document information other than the predetermined document information.
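
A minimal sketch of this rank-based comparison, assuming the ranks have already been computed. The function name and the sample values are hypothetical; choosing the candidate with the smallest absolute rank difference follows the "least difference in feature information" criterion stated earlier.

```python
def pick_by_rank(first_keyword_ranks, candidate_ranks):
    """Return the candidate whose rank is closest to the first keyword's average rank.

    first_keyword_ranks: ranks of the first keyword in each group that uses it.
    candidate_ranks: {candidate: rank in the groups that do not use the first keyword}.
    """
    average = sum(first_keyword_ranks) / len(first_keyword_ranks)
    return min(candidate_ranks, key=lambda kw: abs(candidate_ranks[kw] - average))


# Hypothetical values: the first keyword ranks 2nd and 1st where it is used.
best = pick_by_rank([2, 1], {"W4": 2, "W5": 3})
```

The average rank is 1.5, so the candidate ranked 2nd (difference 0.5) is preferred over the one ranked 3rd (difference 1.5).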

In one aspect of the present invention, the apparatus further comprises a keyword coordinate calculation unit that calculates the coordinates of the keywords contained in the extracted document information by performing principal component analysis based on the total number of items of extracted document information and the numbers of appearances of a plurality of predetermined keywords extracted from that document information. The first feature information detection unit detects the coordinates of the first keyword calculated by the keyword coordinate calculation unit as the feature information of the first keyword, and the second feature information detection unit detects the coordinates of each second keyword candidate calculated by the keyword coordinate calculation unit as the feature information of that second keyword candidate.

In one aspect of the present invention, the first feature information detection unit detects, as the first feature information, a ranking of the words that stand in a dependency relation with the first keyword in the predetermined document information, and the second feature information detection unit detects, as the second feature information, a ranking of the words that stand in a dependency relation with each second keyword candidate in the document information other than the predetermined document information.

In one aspect of the present invention, the apparatus further comprises: a document coordinate calculation unit that calculates the coordinates of each item of document information in the extracted group by performing principal component analysis based on the combinations and numbers of appearances of a plurality of predetermined keywords extracted from the extracted document information; a keyword coordinate calculation unit that calculates the coordinates of each predetermined keyword by performing principal component analysis based on the total number of items of document information containing each predetermined keyword and the number of appearances of each predetermined keyword; and a map generation unit that calculates the distribution density of the document information from the coordinates calculated by the document coordinate calculation unit and generates map information by visualizing a map figure whose outline is based on the calculated distribution density together with the predetermined keywords. The first keyword and the second keyword are visualized in the map information.

Here, the document coordinate calculation unit performs principal component analysis based on the combinations of the plurality of predetermined keywords and their numbers of appearances, and calculates the coordinates of each item of extracted document information. Principal component analysis is a method of multivariate analysis. Put simply, it represents samples that each contain many variables by a few composite indicators (principal components) that capture their differences most directly, thereby reducing the number of dimensions.
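
As a rough illustration of how principal component analysis yields two-dimensional coordinates, the sketch below projects documents, each described by hypothetical counts of three keywords, onto the two leading principal components. It uses plain power iteration with deflation so that it needs only the standard library; the specification does not say how the analysis is computed, so the algorithm choice, the function name, and the sample matrix are all assumptions.

```python
import math


def pca_2d(rows, iterations=200):
    """Project each row onto the first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the keyword counts (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    axes = []
    for _ in range(2):
        # Fixed start vector; a start orthogonal to the leading axis would
        # not converge to it, which is acceptable for a sketch.
        v = [1.0 / (j + 1) for j in range(d)]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        axes.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(c[j] * axis[j] for j in range(d)) for axis in axes]
            for c in centered]


# Hypothetical counts of three keywords in four documents.
doc_keyword_counts = [[5, 4, 0], [4, 5, 1], [0, 1, 5], [1, 0, 4]]
coords = pca_2d(doc_keyword_counts)
```

Each document ends up with an (x, y) pair in which the first axis carries the largest share of the variation in keyword usage; the same routine could be applied to a keyword-by-document matrix to place keywords.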

Similarly, the keyword coordinate calculation unit calculates the coordinates of each predetermined keyword by principal component analysis. The map generation unit calculates the distribution density of the document information based on the coordinates of each item of document information. For example, the map generation unit can detect how densely the document information is distributed in each block by dividing the whole map area into a plurality of blocks and counting the items of document information located in each block. The map generation unit creates a map figure with an outline by, for example, associating display elements (such as contour lines) with the distribution density of the document information. The map generation unit then generates map information by placing this map figure, together with display elements indicating the presence of keywords (for example, the keyword text itself or a symbol), on the map area. The map information is expressed, for example, on a two-dimensional plane, but it is not limited to this and can also be expressed in a three-dimensional space.
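
The block-counting step can be sketched as below. The function name, the rectangular grid, and the sample coordinates are assumptions for illustration; points are assumed to lie inside the given ranges.

```python
from collections import Counter


def block_density(points, x_range, y_range, nx, ny):
    """Count how many document coordinates fall into each block of an nx-by-ny grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Clamp so that points on the upper edge fall into the last block.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


# Hypothetical document coordinates on a unit square split into 2 x 2 blocks.
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0)]
density = block_density(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
```

The per-block counts could then be mapped to contour levels or colors to draw the outline of the map figure.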

The second keyword detection unit can detect the second keyword from among the second keyword candidates by comparing the coordinates of the first keyword with the coordinates of the second keyword candidates, each calculated by the keyword coordinate calculation unit.

The first keyword and the second keyword are visualized in the map information in different display forms, and the map information can include a display element indicating the correspondence between the first keyword and the second keyword.

In one aspect of the present invention, the apparatus comprises a keyword replacement unit that replaces the second keyword with the first keyword. For example, the keyword replacement unit can output the other document information with the second keyword contained in it replaced by the first keyword.
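
A minimal sketch of such a replacement, assuming the detected correspondences are held in a simple mapping from second (dialect) keyword to first (standard) keyword. The function name and the sample keywords are hypothetical. Plain substring replacement is used because Japanese text has no word delimiters; a production version would need to guard against a standard keyword that itself contains a dialect keyword.

```python
def replace_keywords(text, mapping):
    """Replace each second (dialect) keyword with its first (standard) keyword."""
    # Longer keywords first, so a short keyword does not clobber part of a longer one.
    for dialect in sorted(mapping, key=len, reverse=True):
        text = text.replace(dialect, mapping[dialect])
    return text


# Hypothetical correspondence and document text.
mapping = {"LCD panel": "liquid crystal display", "LCD": "liquid crystal display"}
converted = replace_keywords("The LCD panel is driven by the LCD controller.", mapping)
```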

In one aspect of the present invention, the apparatus comprises a relevance registration unit that stores the second keyword in association with the first keyword.

A program according to another aspect of the present invention causes a computer to function as: document information storage means for storing a plurality of items of digitized document information; target document extraction means for extracting document information to be analyzed based on given analysis conditions; appearance frequency analysis means for generating, based on the extracted document information, keyword appearance frequency analysis information by analyzing the appearance frequency of each keyword appearing in that document information; first keyword detection means for detecting, based on the keyword appearance frequency analysis information, a first keyword used in predetermined document information from among the keywords contained in the extracted document information; second keyword candidate detection means for detecting, based on the keyword appearance frequency analysis information, second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords contained in document information other than the predetermined document information within the extracted document information; and second keyword detection means for detecting the second keyword corresponding to the first keyword from among the detected second keyword candidates.

A keyword correspondence analysis method according to still another aspect of the present invention includes the steps of: acquiring analysis conditions; extracting document information to be analyzed by searching a document information storage unit based on the acquired analysis conditions; generating, based on the extracted document information, keyword appearance frequency analysis information by analyzing the appearance frequency of each keyword appearing in that document information; detecting, based on the keyword appearance frequency analysis information, a first keyword used in predetermined document information from among the keywords contained in the extracted document information; detecting, based on the keyword appearance frequency analysis information, second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords contained in document information other than the predetermined document information within the extracted document information; and detecting the second keyword corresponding to the first keyword from among the detected second keyword candidates.

Embodiments of the present invention are described below with reference to the drawings. First, the case in which the keyword correspondence analysis apparatus is configured as a stand-alone device is described, followed by the case in which it is incorporated into a document information analysis apparatus.

FIG. 1 is an explanatory diagram showing the overall configuration of the keyword correspondence analysis apparatus (hereinafter sometimes abbreviated as the "correspondence analysis apparatus"). As described below, the correspondence analysis apparatus can comprise, for example, an analysis condition setting unit 1, a target document extraction unit 2, an information storage unit 3, a keyword ranking generation unit 4, a reference keyword detection unit 6, a corresponding keyword candidate detection unit 7, a reference keyword feature information detection unit 8, a corresponding keyword candidate feature information detection unit 9, a feature information comparison unit 10, a corresponding keyword detection unit 11, a keyword correspondence output unit 12, a thesaurus registration unit 14, and a keyword replacement unit 15. The correspondence analysis apparatus is configured as a computer device or as a program that causes a computer to realize the predetermined functions.

The analysis condition setting unit 1 sets the analysis conditions specified by the user. The user can specify various analysis conditions, singly or in combination, such as a specific company, a specific technical field, a specific document creation or publication period, a specific inventor, or a specific technical term. For example, the user can specify an analysis condition such as "patent documents filed by a specific company in a specific technical field within a specific period". Setting the analysis conditions determines the population to be analyzed.

The target document extraction unit 2 extracts document information matching the analysis conditions from the information storage unit 3. The information storage unit 3 includes a document information storage unit 3A and a storage unit 3B that stores keywords and the like. The document information storage unit 3A stores various documents, such as patent documents, scientific and technical literature, and academic papers, in digitized form. The storage unit 3B stores, for example, the main keywords used in each item of document information and information such as dictionaries.

The keyword ranking generation unit 4 generates a keyword ranking table 5 by analyzing the extracted document information. The keyword ranking generation unit 4 corresponds to the "appearance frequency analysis unit", and the keyword ranking table 5 corresponds to the "keyword appearance frequency analysis information". The keyword ranking generation unit 4 (hereinafter sometimes abbreviated as the ranking generation unit) extracts the keywords used in each item of extracted document information and counts the appearance frequency of each keyword. It then assigns ranks in descending order of appearance frequency to generate the keyword ranking table 5.
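
The counting and ranking described above can be sketched as follows. This assumes that keyword extraction has already produced a list of keywords per document and that documents are grouped by applicant; the function name, the table layout, and the sample data are hypothetical.

```python
from collections import Counter


def build_ranking_table(docs_by_applicant):
    """Return {applicant: [(rank, keyword, count), ...]} in descending order of count."""
    table = {}
    for applicant, docs in docs_by_applicant.items():
        counts = Counter(kw for doc in docs for kw in doc)
        # most_common() sorts by descending count; ties keep first-seen order.
        table[applicant] = [
            (rank, kw, n)
            for rank, (kw, n) in enumerate(counts.most_common(), start=1)
        ]
    return table


# Hypothetical keyword lists, one inner list per document.
docs_by_applicant = {
    "A": [["W1", "W2", "W1"], ["W1", "W2", "W3"]],
    "B": [["W2", "W2", "W1"], ["W2", "W1", "W6", "W2"]],
}
ranking = build_ranking_table(docs_by_applicant)
```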

Various known methods can be used to extract keywords. Even a keyword in which a dakuten (voiced-sound mark) has deliberately been written as a separate symbol or character, as when 「ガラス」 (glass) is written 「カ”ラス」, can be detected as a single keyword. In this case, for example, the sets of katakana character blocks contained in the text may be extracted by morphological analysis.

If a keyword contains unnecessary space codes or line-feed codes, it can still be extracted as a keyword by deleting those codes. Thus, even a keyword written differently from the standard expression, for example because of a separately written dakuten or added extraneous codes, can be detected as a keyword, and its correspondence with the standard keyword can be detected.
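
One way to sketch this kind of normalization in Python is shown below. It is not the method of the specification, which relies on morphological analysis; instead it assumes a small set of stand-in characters for the dakuten, converts a stand-in that follows a kana into the Unicode combining voiced mark, and lets NFC normalization compose the pair. The function name and the stand-in set are assumptions.

```python
import re
import unicodedata

# Assumed stand-ins for a separately written dakuten: ゛ ” " ﾞ
STANDALONE_DAKUTEN = "\u309b\u201d\"\uff9e"
KANA = "\u3041-\u3096\u30a1-\u30fa"  # hiragana and katakana letters


def normalize_keyword(raw):
    """Remove embedded whitespace and fold a detached dakuten into the preceding kana."""
    # Drop space and line-feed codes embedded in the keyword.
    text = re.sub(r"\s+", "", raw)
    # Replace a stand-in mark after a kana with the combining mark U+3099.
    text = re.sub("([" + KANA + "])[" + re.escape(STANDALONE_DAKUTEN) + "]",
                  "\\1\u3099", text)
    # NFC composes kana + combining dakuten into the precomposed voiced kana.
    return unicodedata.normalize("NFC", text)


normalized = normalize_keyword("カ\u201dラス")
```

After this step, variant spellings collapse to the same string, so they are counted as one keyword in the ranking table.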

The keyword ranking table 5 shown in FIG. 1 lists, from first to third place, the keywords used in the groups of document information on a specific technical field prepared by three applicants, Company A, Company B, and Company C.

The reference keyword detection unit 6 detects a keyword that serves as a reference, and corresponds to the "first keyword detection unit". A reference keyword is a keyword that is used in much of the document information but is not used in some of it. In other words, the reference keyword can be described as a keyword used as standard in the group of document information making up the analysis population, or as the mainstream keyword used by the majority. Within that group, document information containing the reference keyword corresponds to the "predetermined document information", and document information not containing the reference keyword corresponds to the "document information other than the predetermined document information".
Strictly speaking, a keyword used in a large amount of document information is not necessarily the formal name. An abbreviation may be used more often than the formal name. In addition, as is known from registered trademarks becoming generic or customary names, the registered trademark of a particular company may become so well known that it is used in preference to the formal name.

In the example shown in the keyword ranking table 5 in FIG. 1, the keyword W2 is the reference keyword. The keyword W2 ranks second in appearance frequency at Company A and first at Company B, but does not appear at Company C. Because it appears in much of the document information in the analysis population yet is absent from part of it, the keyword W2 is detected as the reference keyword.

The corresponding keyword candidate detection unit 7 detects keywords that can be candidates for the corresponding keyword, and corresponds to the "second keyword candidate detection unit". A corresponding keyword is a keyword that, in document information not containing the reference keyword, is considered to correspond to the reference keyword. The corresponding keyword can be regarded as a so-called dialectal keyword that departs from the standard expression. In this embodiment, corresponding keywords, that is, dialectal keywords, include abbreviations and variant notations as well as minor expressions customarily used within a specific organization.

An analysis population composed of specific document information is considered to contain one or more common keywords that match its analysis conditions. In particular, keywords whose appearance frequency rank falls within a predetermined range tend to be shared among the applicants. However, one of the applicants (Company C) does not use the keyword W2. This suggests that this applicant may be describing the same technical element with an expression different from the standard one. The corresponding keyword candidate detection unit 7 therefore detects keywords that may correspond to the reference keyword from the document information group of that applicant. In the example shown in FIG. 1, among the keywords appearing in the document information of Company C, the second-ranked W4 and the third-ranked W5 are corresponding keyword candidates. The keyword W1 is used by all applicants and is therefore excluded from the candidates.

The reference keyword feature information detection unit 8 detects, from the attribute information of the reference keyword, feature information considered useful for examining its relevance to a corresponding keyword, and corresponds to the "first feature information detection unit".

Similarly, the corresponding keyword candidate feature information detection unit 9 detects feature information of the corresponding keyword candidates, and corresponds to the "second feature information detection unit".

Known attribute information includes, for example, the character type of a keyword (hiragana, katakana, alphanumeric characters, symbols, or kanji) and the length of its character string. This embodiment, however, focuses on the appearance frequency rank and the position (coordinates) on the patent map. The feature information can also be described as, for example, relevance determination information for judging the relevance between keywords, or similarity determination information for judging how close keywords are to each other.

Although details are given later, in the example of FIG. 1 the reference keyword W2 ranks second in appearance frequency at Company A and first at Company B, so its average rank is 1.5. The corresponding keyword candidate W4 ranks second, and the other candidate W5 ranks third.

The feature information comparison unit 10 compares the feature information detected by the reference keyword feature information detection unit 8 with the feature information detected by the corresponding keyword candidate feature information detection unit 9. Based on the result of this comparison, the corresponding keyword detection unit 11 detects, from among the corresponding keyword candidates, the keyword that corresponds to the reference keyword. The corresponding keyword detection unit 11 corresponds to the "second keyword detection unit". In the above example, the average appearance rank of the reference keyword W2 is 1.5, so the keyword W4, whose rank (second) is closest to it, is detected as the corresponding keyword.
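
The rank comparison in the FIG. 1 example can be worked through in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the text does not say how ties are broken, so here the first candidate encountered wins.

```python
def average_rank(ranks):
    """Mean appearance-frequency rank of the reference keyword."""
    return sum(ranks) / len(ranks)


def closest_candidate(reference_avg, candidate_ranks):
    """Return the candidate whose rank is nearest to the reference average.

    candidate_ranks maps each candidate keyword to its rank in the
    documents of the applicant that does not use the reference keyword.
    Tie handling is an assumption (first candidate in dict order wins).
    """
    return min(candidate_ranks,
               key=lambda kw: abs(candidate_ranks[kw] - reference_avg))


# FIG. 1: the reference keyword ranks 2nd at Company A and 1st at Company B.
avg = average_rank([2, 1])                              # 1.5
best = closest_candidate(avg, {"W4": 2, "W5": 3})       # "W4"
```

The gaps are |2 - 1.5| = 0.5 for W4 and |3 - 1.5| = 1.5 for W5, which is why W4 is selected.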

The keyword correspondence output unit 12 outputs the reference keyword and the corresponding keyword in association with each other. For example, the correspondence between keywords is presented to the user via the display unit 13, such as a display. As will become clear from the embodiments described later, the user can also be asked for a judgment when the corresponding keyword is detected.
The keyword ranking table 5 itself can also be output to the display unit 13. When the keyword ranking table 5 is displayed on screen, the font type, font size, character color, and the like can be set as appropriate so that the correspondence between the reference keyword and the corresponding keyword (or corresponding keyword candidates) is clearly distinguishable.

The thesaurus registration unit 14 associates the detected corresponding keyword with the reference keyword and registers it, for example, in a dictionary. An improved dictionary can then be used in the next search, which increases search accuracy.

The keyword replacement unit 15 replaces the corresponding keyword with the reference keyword. Conversely, the reference keyword can also be replaced with the corresponding keyword. Replacing the corresponding keyword with the reference keyword lets the user read the document information closely in standard terms, which improves user convenience.

FIG. 2 is an explanatory diagram schematically showing the flow of data processing. The user specifies the analysis condition 1A via the user interface. In this example, the condition specifies document information created on a specific technical field by specific applicants (creators), Companies A to E. The technical field can be specified by, for example, a patent classification code or a technical term. When analyzing research in which only a limited set of researchers is involved, the technical field can also be specified by the names of the researchers.

The document information group matching the analysis condition 1A is extracted from the information storage unit 3. The extracted document information group constitutes the population to be analyzed. The appearance frequency of keywords is analyzed for this analysis population, and the keyword ranking table 5 is generated.

The reference keyword table 6A is generated by analyzing the keyword ranking table 5. In the example shown in FIG. 2, the keyword "PC" (パソコン), shown in bold, is detected as the reference keyword and stored in the reference keyword table 6A.

The keyword "PC" ranks third in appearance frequency at Company A, fourth at Company B, second at Company C, and third at Company D. Company C uses the formal name "personal computer" (パーソナルコンピュータ), but in this embodiment it is assumed that the formal name "personal computer" and the abbreviation "PC" (パソコン) are already known to be synonyms and have been registered in the information storage unit 3.

The keyword "PC" (including "personal computer") appears at a high rank at Companies A to D, whereas Company E does not use it within the ranks subject to judgment (first to sixth in the illustrated example). The keyword "PC" is therefore detected as the reference keyword.

Next, candidate keywords that may correspond to the reference keyword are detected from the keyword group of Company E recorded in the keyword ranking table 5. The detected keywords are stored in the corresponding keyword candidate table 7A.

In the example shown in FIG. 2, "software", "hardware", and "system" are common keywords used by all applicants, Companies A to E. These three keywords are therefore neither reference keywords nor corresponding keywords.

The keyword "program" is used by Companies A, B, C, and E, but not by Company D. It is not a common keyword, but because it is used by many applicants (that is, in much of the document information), it does not become a keyword corresponding to the reference keyword "PC". At Companies A, B, C, and E, "program" is considered to be used in its literal sense. A keyword such as "program" that is used by a predetermined number or more of applicants is therefore excluded and does not become a corresponding keyword candidate.

With respect to Company D, however, the keyword "program" is itself another reference keyword. For convenience, the following description focuses on the single reference keyword "PC", but the keyword correspondence analysis apparatus of this embodiment can detect a plurality of reference keywords from the keyword ranking table 5 and can detect the keyword corresponding to each of them.
That is, even when several applicants each use their own dialectal expressions (keywords that differ from the reference keyword) in the document information group on a specific technical field, each dialectal expression can be detected separately and its relationship to the corresponding standard expression (reference keyword) can be detected.

The keywords "information processing apparatus" and "speech recognition" used by Company E are not used by a predetermined number or more of the other applicants, Companies A to D. They are therefore detected as keywords that may correspond to the reference keyword "PC" and are stored in the corresponding keyword candidate table 7A. In other words, Company E may be expressing the technical element "PC" with a different keyword, either "information processing apparatus" or "speech recognition".

Next, attention is paid to the feature information of the reference keyword and the corresponding keyword candidates. Here, the appearance frequency rank of a keyword is adopted as the feature information. The reference keyword "PC" ranks third at Company A, fourth at Company B, second at Company C, and third at Company D, so its average rank is (3 + 4 + 2 + 3) / 4 = 3. In contrast, the corresponding keyword candidate "information processing apparatus" ranks third, and the other candidate "speech recognition" ranks sixth. The candidate with the closer rank, "information processing apparatus", is therefore detected as the keyword corresponding to the reference keyword "PC".

Instead of selecting only the corresponding keyword candidate whose rank is closest to that of the reference keyword, it is also possible to select all corresponding keyword candidates whose ranks fall within a predetermined range of the average rank of the reference keyword, and to generate the keyword correspondence table 12A from them.
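
The range-based alternative can be sketched with the FIG. 2 figures. The text does not specify how wide the "predetermined range" is, so the `max_gap` parameter and its values below are assumptions for illustration only.

```python
def candidates_within_range(reference_avg, candidate_ranks, max_gap):
    """Keep every candidate whose rank lies within max_gap of the average.

    max_gap is a hypothetical stand-in for the "predetermined range".
    """
    return [kw for kw, rank in candidate_ranks.items()
            if abs(rank - reference_avg) <= max_gap]


# FIG. 2: the reference keyword ranks 3rd, 4th, 2nd and 3rd at Companies A to D.
fig2_avg = (3 + 4 + 2 + 3) / 4                      # 3.0
fig2_candidates = {"information processing apparatus": 3,
                   "speech recognition": 6}

narrow = candidates_within_range(fig2_avg, fig2_candidates, max_gap=1)
wide = candidates_within_range(fig2_avg, fig2_candidates, max_gap=3)
```

With a gap of 1 only "information processing apparatus" is kept; with a gap of 3 both candidates are kept, which is the case in which the user is later asked to choose.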

The keyword correspondence table 12A stores the reference keyword "PC" and the corresponding keyword "information processing apparatus" in association with each other. This correspondence is presented to the user via the display unit 13. It is also reflected in the information storage unit 3, improving the dictionary contained there.

FIG. 3 is a flowchart showing the keyword correspondence analysis process. Each flowchart described below outlines the processing to the extent necessary for understanding and practicing the invention, and differs from an actual computer program. In the following description, "step" is abbreviated as "S".

The correspondence analysis apparatus acquires the analysis condition 1A specified by the user (S11) and extracts the document information group matching the analysis condition 1A from the information storage unit 3 (S12). The apparatus then generates the keyword ranking table 5 based on the extracted document information group (S13). The step of generating the keyword ranking table 5 is described in detail with reference to FIG. 4.

Next, the correspondence analysis apparatus generates the reference keyword table 6A based on the keyword ranking table 5 (S14) and calculates the average rank of the reference keyword (S15). The step of generating the reference keyword table 6A is described in detail with reference to FIG. 5.

The correspondence analysis apparatus generates the corresponding keyword candidate table 7A based on the keyword ranking table 5 (S16) and acquires the rank of each detected corresponding keyword candidate (S17). The step of generating the corresponding keyword candidate table 7A is described in detail with reference to FIG. 6.

The correspondence analysis apparatus then compares the average rank of the reference keyword with the rank of each corresponding keyword candidate (S18), associates the candidates of similar rank with the reference keyword, and generates the keyword correspondence table 12A (S19).

As described above, either a single corresponding keyword candidate or a plurality of candidates can be registered in the keyword correspondence table 12A. A first method registers in the table 12A only the keyword whose rank is closest to the average rank of the reference keyword. A second method registers in the table 12A a plurality of keywords whose ranks fall within a predetermined range of the average rank of the reference keyword. A third method switches between the first and second methods according to a selection made by the user beforehand or afterward.

The following describes the case in which a plurality of corresponding keywords (candidates) are presented to the user by the second or third method. The user can select any one of the presented keywords (S21).

In the example of FIG. 2, the user can manually select which of "information processing apparatus" and "speech recognition" is the keyword corresponding to "PC". When the user makes a selection (S21: YES), the correspondence analysis apparatus associates the selected keyword with the reference keyword as its corresponding keyword (S22). When the user makes no manual selection (S21: NO), the apparatus automatically selects the keyword with the closest rank and associates it with the reference keyword (S23). This association (S22, S23) fixes the corresponding keyword.
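
The branch at S21 to S23 can be sketched as a single function. The name `decide_corresponding_keyword` and the use of `None` to represent "no manual selection" are assumptions made for this illustration.

```python
from typing import Dict, Optional


def decide_corresponding_keyword(reference_avg: float,
                                 candidate_ranks: Dict[str, int],
                                 user_choice: Optional[str] = None) -> str:
    """Fix the corresponding keyword from the presented candidates."""
    # S21: YES -> S22: adopt the keyword the user picked.
    if user_choice is not None and user_choice in candidate_ranks:
        return user_choice
    # S21: NO -> S23: fall back to the candidate with the closest rank.
    return min(candidate_ranks,
               key=lambda kw: abs(candidate_ranks[kw] - reference_avg))


presented = {"information processing apparatus": 3, "speech recognition": 6}
automatic = decide_corresponding_keyword(3.0, presented)
manual = decide_corresponding_keyword(3.0, presented, "speech recognition")
```

Here `automatic` is "information processing apparatus", matching the FIG. 2 result, while `manual` shows that a user selection overrides the rank-based choice.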

The correspondence analysis apparatus can use the correspondence between keywords in various ways. For example, when the user wishes to register a thesaurus entry (S24: YES), the apparatus registers the corresponding keyword as a thesaurus entry for the reference keyword (S25). When the user wishes to replace the corresponding keyword (S26: YES), the apparatus converts the corresponding keyword into the reference keyword (S27), either in all the document information created by Company E in the analysis population or in the part of it selected by the user. Conversely, the reference keyword can also be converted into the corresponding keyword.
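
A minimal sketch of S25 and S27 follows. The dictionary-of-sets thesaurus and the plain substring replacement are assumptions; the patent does not describe the data structure or the matching rule, and the sample sentence is invented for illustration.

```python
def register_thesaurus(thesaurus, reference_kw, corresponding_kw):
    """S25: record the corresponding keyword under the reference keyword."""
    thesaurus.setdefault(reference_kw, set()).add(corresponding_kw)


def replace_keywords(documents, corresponding_kw, reference_kw):
    """S27: rewrite the corresponding keyword as the reference keyword.

    Plain substring replacement is assumed; swap the last two arguments
    to convert in the opposite direction.
    """
    return [doc.replace(corresponding_kw, reference_kw) for doc in documents]


thesaurus = {}
register_thesaurus(thesaurus, "PC", "information processing apparatus")

# Hypothetical sentence standing in for a Company E document.
company_e_docs = ["The information processing apparatus runs the program."]
translated = replace_keywords(company_e_docs,
                              "information processing apparatus", "PC")
```

The result reads "The PC runs the program.", which is the "translation into the standard expression" that the replacement step aims at.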

FIG. 4 is a flowchart showing details of the keyword ranking table generation process shown as S13 in FIG. 3. First, the correspondence analysis apparatus selects one applicant from the applicants in the analysis condition 1A (S130) and acquires the document information created by the selected applicant (S131).

The correspondence analysis apparatus then removes unnecessary words and phrases such as particles, fixed phrases, and headings from the document information (S132) and extracts only the keywords (S133). For each extracted keyword, the apparatus counts the number of appearances (S134), sorts the keywords in descending order of appearance frequency (S135), and registers them in the keyword ranking table 5 (S136).

The correspondence analysis apparatus determines whether the keyword appearance frequency has been analyzed for all applicants in the analysis condition 1A (S137). If any applicant remains unanalyzed (S137: NO), the apparatus selects the next applicant (S138) and returns to S131. In this way, the keywords used in the document information and their appearance frequencies are detected for each applicant.
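
The loop of FIG. 4 can be sketched as below. It assumes the documents have already been split into tokens (the morphological analysis itself is outside this sketch), and the stop-word set and sample tokens are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for particles, fixed phrases and headings (S132).
STOP_WORDS = {"the", "of", "a", "and"}


def build_ranking(tokenized_docs, stop_words=STOP_WORDS):
    """S132 to S135: filter, count, and sort keywords for one applicant."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc
                     if tok not in stop_words)
    # most_common() sorts by descending count; equal counts keep
    # first-encountered order.
    return [kw for kw, _ in counts.most_common()]


def build_ranking_table(docs_by_applicant):
    """S130, S137, S138: repeat for every applicant and register (S136)."""
    return {applicant: build_ranking(docs)
            for applicant, docs in docs_by_applicant.items()}


sample = {"A": [["system", "the", "software", "system"],
                ["software", "system", "program"]]}
ranking_table = build_ranking_table(sample)
```

For the sample input, "system" appears three times, "software" twice, and "program" once, so Company A's ranking is system, software, program.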

FIG. 5 is a flowchart showing the reference keyword table generation process shown as S14 in FIG. 3. The correspondence analysis apparatus refers to the keyword ranking table 5 (S140) and acquires the keywords up to the rank N1 subject to judgment (S141). The rank N1 can be specified by the user. In the example shown in FIG. 2, N1 is 6.

Next, for the keywords up to rank N1, the correspondence analysis apparatus detects how each applicant uses them (S142) and excludes the common keywords used by all applicants from the reference keywords (S143).

Next, the correspondence analysis apparatus detects the keywords used by a predetermined number N2 or more of applicants (S144) and registers the detected keywords in the reference keyword table 6A as reference keywords (S145). The value of N2 can be specified by the user. N2 is set to a value of 2 or more and less than the total number of pieces of document information in the analysis population.

The correspondence analysis apparatus then stores the names or identification codes of the applicants that use the reference keyword (S146). An applicant that uses the reference keyword can be called a reference applicant. Similarly, the apparatus stores the names or identification codes of the applicants that do not use the reference keyword (S147). An applicant that does not use the reference keyword can be called a target applicant. The name or identification code of each reference applicant (any information that identifies the applicant within the apparatus will do) is registered in the reference keyword table 6A. The names and the like of the target applicants are used to generate the corresponding keyword candidate table 7A.
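
The steps S141 to S147 can be sketched as follows. The ranking table below is a hypothetical layout that only reproduces the ranks stated for FIG. 2; the "filler" keywords, the function name, and the choice N2 = 4 are assumptions, and Company C's "personal computer" is assumed to be already normalized to "PC".

```python
def find_reference_keywords(ranking_table, n1, n2):
    """Return {keyword: reference/target applicants} per S141 to S147."""
    top = {app: kws[:n1] for app, kws in ranking_table.items()}      # S141
    keywords = sorted({kw for kws in top.values() for kw in kws})
    table = {}
    for kw in keywords:
        users = [app for app, kws in top.items() if kw in kws]       # S142
        if len(users) == len(top):                                   # S143
            continue  # common keyword used by every applicant
        if len(users) >= n2:                                         # S144
            table[kw] = {                                            # S145
                "reference_applicants": users,                       # S146
                "target_applicants":                                 # S147
                    [app for app in top if app not in users],
            }
    return table


FIG2_TABLE = {  # hypothetical reconstruction, top six keywords each
    "A": ["system", "software", "PC", "hardware", "program", "filler-A"],
    "B": ["software", "system", "hardware", "PC", "program", "filler-B"],
    "C": ["system", "PC", "software", "program", "hardware", "filler-C"],
    "D": ["software", "system", "PC", "hardware", "filler-D1", "filler-D2"],
    "E": ["system", "software", "information processing apparatus",
          "hardware", "program", "speech recognition"],
}
reference_table = find_reference_keywords(FIG2_TABLE, n1=6, n2=4)
```

Under these assumptions the result contains "PC" (target applicant E) and "program" (target applicant D), in line with the remark that "program" is another reference keyword with respect to Company D.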

FIG. 6 is a flowchart showing the corresponding keyword candidate table generation process shown as S16 in FIG. 3. The correspondence analysis apparatus refers to the keyword ranking table 5 (S160) and acquires the keywords up to rank N1 used in the document information group created by the target applicant (S161).

For each acquired keyword, the correspondence analysis apparatus detects how the reference applicants use it (S162) and excludes from the corresponding keyword candidates any keyword used by a predetermined number N3 or more of reference applicants (S163). In this way, unnecessary keywords are removed from the keywords up to rank N1 used by the target applicant. The apparatus then registers the remaining keywords in the corresponding keyword candidate table 7A as corresponding keyword candidates (S164).
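
The filtering in S161 to S164 can be sketched in the same style. The table is again a hypothetical layout consistent with the ranks given for FIG. 2, and the value N3 = 2 is an assumption chosen only to make the example concrete.

```python
def find_candidates(ranking_table, target, reference_applicants, n1, n3):
    """Return {candidate keyword: rank at the target applicant}."""
    candidates = {}
    for rank, kw in enumerate(ranking_table[target][:n1], start=1):   # S161
        users = sum(1 for app in reference_applicants
                    if kw in ranking_table[app][:n1])                 # S162
        if users >= n3:                                               # S163
            continue  # widely used by reference applicants: not dialectal
        candidates[kw] = rank                                         # S164
    return candidates


FIG2_TABLE = {  # hypothetical reconstruction, top six keywords each
    "A": ["system", "software", "PC", "hardware", "program", "filler-A"],
    "B": ["software", "system", "hardware", "PC", "program", "filler-B"],
    "C": ["system", "PC", "software", "program", "hardware", "filler-C"],
    "D": ["software", "system", "PC", "hardware", "filler-D1", "filler-D2"],
    "E": ["system", "software", "information processing apparatus",
          "hardware", "program", "speech recognition"],
}
candidate_table = find_candidates(FIG2_TABLE, "E", ["A", "B", "C", "D"],
                                  n1=6, n3=2)
```

Only "information processing apparatus" (rank 3) and "speech recognition" (rank 6) survive; "program" is dropped because three of the four reference applicants use it.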

Because this embodiment is configured as described above, it provides the following effects. Even when the same technical element is expressed with different terms by different applicants, the correspondence between keywords can be analyzed and the keyword corresponding to the reference keyword can be detected. Documents in a technical field whose terminology is not unified can therefore be searched more efficiently and with higher accuracy, which improves usability.
That is, according to this embodiment, the correspondence between the document information group that uses the reference keyword (standard expression) and the document information group that uses a corresponding keyword departing from the standard (dialectal expression) can be grasped. In other words, dialectal expressions can be extracted from a document information group that uses non-standard expressions, and the correspondence between the extracted dialectal expressions and the standard expressions can be visualized.

In this embodiment, the keyword corresponding to the reference keyword can be detected automatically or semi-automatically based on the keyword ranking table 5. The correspondence between keywords can therefore be analyzed with a relatively simple configuration.

In this embodiment, combinations of corresponding keywords are detected by comparing the appearance frequency ranks of the keywords. The correspondence between keywords can therefore be analyzed with a relatively simple control configuration using the keyword ranking table 5.

In this embodiment, the correspondence between keywords can be registered as a thesaurus. The dictionary in the information storage unit 3 can therefore be improved and put to use in the next search, which improves user convenience.

In this embodiment, the corresponding keyword can be replaced with the reference keyword. Translating the idiosyncratic expressions used by some applicants into standard expressions helps the user grasp the content and improves usability. That is, replacing the detected dialectal keywords with the standard keywords throughout the original "dialect-laden" specification makes that specification easier to understand.

A second embodiment is described with reference to FIGS. 7 to 13. In the second embodiment, the keyword correspondence analysis apparatus described in the first embodiment is incorporated into a document information analysis apparatus 100 that automatically creates a map.

The document information analysis apparatus 100 can be configured as a computer apparatus comprising a map control unit 110, a keyword correspondence analysis unit 120, and an information storage unit 300. The document information analysis apparatus 100 is connected to a client terminal 200 for bidirectional communication via a communication network such as the Internet or a LAN (Local Area Network). The client terminal 200 can be configured as, for example, a personal computer or a portable information terminal (including a mobile phone).

The map control unit 110 generates and outputs a technical map 150 based on the large amount of document information stored in the information storage unit 300. The generated map 150 is transmitted to the client terminal 200 via the communication network. The map 150 may or may not be made storable in the client terminal 200. Details of the map control unit 110 are described later.

The keyword correspondence analysis unit 120 analyzes the correspondence between the keywords mapped on the map 150, based on the analysis conditions specified by the user. The keyword correspondence analysis unit 120 has the same functions as the keyword correspondence analysis apparatus described above.

情報蓄積部３００は、例えば、特許公開公報や登録公報、あるいは、科学技術論文等のような文献情報を多数記憶している。 The information storage unit 300 stores a large number of document information such as patent publications, registration bulletins, and scientific and technical papers.

マップ１５０の構成を説明する。マップ１５０は、ユーザから指示された目的に添って生成されるものである。ユーザは、例えば、調査を希望する技術分野や特定のサーチワード等を指定することにより、情報蓄積部３００に記憶されている多数の文献情報群の中から所定の文献情報群のみを選ぶことができる。ユーザによって選ばれた文献情報群の内容は、マップ制御部１１０によって解析され、マップ１５０が生成される。 The configuration of the map 150 will be described. The map 150 is generated according to the purpose instructed by the user. For example, the user can select only a predetermined document information group from among a large number of document information groups stored in the information storage unit 300 by designating a technical field desired to be investigated, a specific search word, or the like. it can. The content of the document information group selected by the user is analyzed by the map control unit 110, and a map 150 is generated.

The contour lines 151A, 151B, 151C, and 151D are display elements that indicate the distribution density of the document information included in the map 150. For example, the contour line 151A indicates that the number of pieces of document information present there is at least 1 and less than B1, and the contour line 151B indicates that the number is at least B1 and less than B2 (B1 and B2 are natural numbers). The number of pieces of document information thus increases stepwise from 151A through 151B and 151C to 151D.
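The banding rule for the contour lines can be sketched in Python as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `contour_level`, the concrete threshold values, and the third threshold (needed to separate 151C from 151D, which the text does not define) are assumptions made for the example.

```python
from bisect import bisect_right

# Labels for the bands; index 0 means "no documents, no contour".
BAND_LABELS = ["none", "151A", "151B", "151C", "151D"]


def contour_level(doc_count, thresholds):
    """Map a document count to a contour band index.

    thresholds is an ascending list [B1, B2, B3]. Counts in [1, B1) fall in
    band 1 (151A), counts in [B1, B2) in band 2 (151B), and so on. B3 is a
    hypothetical extension of the B1/B2 rule stated in the text.
    """
    if doc_count < 1:
        return 0
    # bisect_right counts how many thresholds are <= doc_count.
    return 1 + bisect_right(thresholds, doc_count)


if __name__ == "__main__":
    thresholds = [5, 10, 20]  # assumed values for B1, B2, B3
    for count in (0, 4, 5, 12, 25):
        print(count, BAND_LABELS[contour_level(count, thresholds)])
```

Using a sorted threshold list keeps the rule data-driven, so additional bands only require additional thresholds.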

A plurality of keywords 152 are also displayed on the map 150. These keywords 152 are the main keywords that characterize each piece of document information in the document information group constituting the map 150, and may be referred to below as the main keywords 152.

Among the main keywords 152, one or more keywords serve as reference keywords 152A, and one or more keywords serve as corresponding keywords 152B. A reference keyword 152A and a corresponding keyword 152B are connected by a connection line 153 that indicates the correspondence between them. The reference keyword 152A and the corresponding keyword 152B can be made conspicuous on the map 150 by, for example, giving them a color, line type, or line thickness different from those of the other keywords 152. Alternatively, the corresponding keywords 152A and 152B can be made to blink or the like to distinguish them from the other keywords 152.

The disclosure of Japanese Patent Application Laid-Open No. 2005-149346 can be used to the extent necessary for carrying out the present invention.

FIG. 8 is a block diagram showing the functional configuration of the document information analysis apparatus 100. The map control unit 110 can include, for example, a map generation condition input reception unit 111, a document extraction unit 112, a document coordinate calculation unit 113, a keyword extraction unit 114, a keyword coordinate calculation unit 115, a map generation unit 116, and a map display unit 117.

As an example of the information storage unit 300, a document database 31 (in the figures, "database" is abbreviated as "DB"), a word database 32, and an index database 33 can be provided.

The document database 31 stores a plurality of pieces of document information. The word database 32 is used as a dictionary. The index database 33 manages which keywords are contained in each document.

The map generation condition input reception unit 111 receives the map generation conditions designated by the user. Here, "receiving" means, for example, receiving information indicating the user-designated map generation conditions as electronic information via a communication interface and storing it in a memory or the like.

The document extraction unit 112 extracts document information within a predetermined range by searching the document database 31 and the index database 33 based on the conditions designated by the user. The document coordinate calculation unit 113 analyzes the extracted document information and calculates its coordinates on the map 150. For example, the document coordinate calculation unit 113 calculates the coordinates of each document on a two-dimensional plane by applying principal component analysis to the combinations of keywords contained in each extracted piece of document information and their numbers of occurrences.
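As a rough illustration of how principal component analysis can place documents on a two-dimensional plane, the sketch below projects a small keyword-count matrix onto its first two principal components. Several points are assumptions rather than details from the description: each document is represented as a row of keyword occurrence counts, the eigenvectors are found by power iteration with deflation (a simple stand-in for a full eigendecomposition), and the names `dominant_eigenpair` and `document_coordinates` are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dominant_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    size = len(matrix)
    vector = [float(i + 1) for i in range(size)]  # arbitrary non-zero start
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm < 1e-12:  # no variance left along any direction
            break
        vector = [x / norm for x in product]
    eigenvalue = sum(a * b for a, b in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def document_coordinates(counts):
    """Project rows of keyword counts onto the first two principal components.

    counts needs at least two rows (documents) and two columns (keywords).
    """
    n, d = len(counts), len(counts[0])
    means = [sum(row[j] for row in counts) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in counts]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for _ in range(2):
        eigenvalue, axis = dominant_eigenpair(cov)
        axes.append(axis)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    return [tuple(sum(a * b for a, b in zip(row, axis)) for axis in axes)
            for row in centered]


if __name__ == "__main__":
    # Rows are documents, columns are counts of three hypothetical keywords.
    counts = [[4, 0, 1], [3, 1, 0], [0, 4, 1], [1, 3, 2]]
    for xy in document_coordinates(counts):
        print(xy)
```

Documents with similar keyword profiles end up close together in the projected plane, which is the property the map relies on. The same routine could in principle be applied to a keyword-by-feature matrix to obtain keyword coordinates.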

The keyword extraction unit 114 analyzes the extracted document information and extracts the keywords it contains. The keyword coordinate calculation unit 115 calculates the coordinates of each extracted keyword. For example, the keyword coordinate calculation unit 115 calculates the coordinates of each keyword on the two-dimensional plane by applying principal component analysis to the total number of pieces of document information containing the keyword and the total number of occurrences of the keyword. Principal component analysis is a known multivariate analysis technique, so its details are omitted. The calculated keyword coordinates are also used by the keyword correspondence analysis unit 120.

The map generation unit 116 generates the map 150 based on the calculated coordinates of each piece of document information and of each keyword. For example, the map generation unit 116 divides a finite two-dimensional plane finely in the vertical and horizontal directions to set a large number of block regions, and counts the pieces of document information present in each block region. The map generation unit 116 thereby obtains the distribution density of the document information, sets the contour lines 151A and so on corresponding to this distribution density, and creates the map figure.

The map display unit 117 assigns predetermined display elements according to the configuration of the generated map 150, thereby producing a map 150 that the user can view, and provides it to the client terminal 200. Examples of the predetermined display elements include outlines representing the contour lines 151A and so on and characters representing the keywords. A menu display portion for performing operations on the map 150 and the like are also added.

The keyword correspondence analysis unit 120 includes, for example, a target document extraction unit 121, a keyword ranking generation unit 122, a reference keyword detection unit 123, a corresponding keyword candidate detection unit 124, a keyword coordinate comparison unit 125, a corresponding keyword detection unit 126, and a keyword correspondence output unit 127.

The target document extraction unit 121 extracts document information that matches the analysis conditions designated by the user, and corresponds to the target document extraction unit 2 in FIG. 1. The target document extraction unit 121 extracts all or part of the document information constituting the map 150.

The keyword ranking generation unit 122 generates the keyword ranking table 5 and corresponds to the keyword ranking generation unit 4 in FIG. 1. The reference keyword detection unit 123 detects reference keywords and corresponds to the reference keyword detection unit 6 in FIG. 1. The corresponding keyword candidate detection unit 124 detects candidates for keywords that may correspond to a reference keyword, and corresponds to the corresponding keyword candidate detection unit 7 in FIG. 1. The keyword coordinate comparison unit 125 compares the coordinates of the reference keyword on the map 150 with the coordinates of each corresponding keyword candidate on the map 150, and corresponds to the feature information comparison unit 10 in FIG. 1. The keyword coordinate calculation unit 115 corresponds to the feature information detection units 8 and 9 in FIG. 1.

The corresponding keyword detection unit 126 detects the keyword corresponding to the reference keyword, and corresponds to the corresponding keyword detection unit 11 in FIG. 1. The keyword correspondence output unit 127 outputs the relationship between the reference keyword and the corresponding keyword, and corresponds to the keyword correspondence output unit 12 in FIG. 1.

FIG. 9 is an explanatory diagram outlining the hardware configuration of the document information analysis apparatus 100 and the client terminal 200. As described above, the document information analysis apparatus 100 can be configured as a server computer or the like.

The document information analysis apparatus 100 can include, for example, a communication interface 1001 (in the figures, "interface" is abbreviated as "I/F"), a CPU (Central Processing Unit) 1002, a ROM (Read Only Memory) 1003, a RAM (Random Access Memory) 1004, and an auxiliary storage device 1005.

In addition to an OS (Operating System), the auxiliary storage device 1005 can store, for example, the document database 31, the word database 32, the index database 33, a keyword correspondence analysis program 1100, a display control program 1110, a principal component analysis program 1120, a structure analysis program 1130, a search program 1140, and a web server program 1150.

As described above, the document database 31 stores document information such as published patent applications in advance. The word database 32 stores in advance, for example, words that are unsuitable as keywords, such as particles and conjunctions, as well as synonyms and near-synonyms. The keywords contained in each piece of document information can be extracted by using the document database 31 together with the word database 32. The index database 33 manages, for each piece of document information, the location of the keywords extracted in this way. By using the index database 33, the necessary document information can therefore be retrieved quickly from the document database 31, which stores a large number of pieces of document information.
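The interplay of the three databases can be pictured as an inverted index. The sketch below is only an illustration under stated assumptions: documents are already tokenized (Japanese text would first need morphological analysis), the word database is reduced to a set of unsuitable words, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(documents, unsuitable_words):
    """Build a keyword -> set of document numbers mapping.

    documents maps a document number to its list of tokens (standing in for
    document database 31); unsuitable_words stands in for the entries of
    word database 32 that must not become keywords.
    """
    index = defaultdict(set)
    for doc_id, tokens in documents.items():
        for token in tokens:
            if token in unsuitable_words:
                continue  # particles, conjunctions, and the like
            index[token].add(doc_id)
    return index


def search(index, documents, keyword):
    """Look up the keyword in the index, then read the matching documents."""
    doc_ids = sorted(index.get(keyword, set()))
    return {doc_id: documents[doc_id] for doc_id in doc_ids}


if __name__ == "__main__":
    documents = {
        "D1": ["photocatalyst", "of", "titanium"],
        "D2": ["titanium", "and", "coating"],
    }
    index = build_index(documents, {"of", "and"})
    print(search(index, documents, "titanium"))
```

Only the index is consulted to decide which documents match, so the full document store is read just for the hits. This is the reason an index speeds up retrieval from a large collection.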

The keyword correspondence analysis program 1100 analyzes the correspondence between the keywords included in the map 150 and displays the analysis result on the map 150. The display control program 1110 performs drawing and related processing for the map 150. The principal component analysis program 1120 performs principal component analysis. The structure analysis program 1130 analyzes the structure of text data based on a technique such as text mining. The search program 1140 searches the document database 31 and the like based on the input search conditions. The web server program 1150 implements a web server function.

The configuration of the client terminal 200 will now be described. The client terminal 200 is connected to the document information analysis apparatus 100 via a communication network CN such as the Internet, and can include, for example, a communication interface 2001, a CPU 2002, a ROM 2003, a RAM 2004, and an auxiliary storage device 2005.

The above configuration is an example, and the present invention is not limited to it. For example, at least some of the functions may be implemented with a hardware circuit such as a programmable logic device instead of a program.

A document information analysis method using the document information analysis apparatus 100 (hereinafter also called the analysis apparatus 100) will now be described. FIG. 10 is a flowchart outlining the map control process for generating and outputting the map 150.

First, the user inputs search conditions via the user interface of the client terminal 200 (S30). The search conditions can be specified, for example, by designating a keyword such as "photocatalyst", or by designating a patent classification code, a publication date, or the like.

When the analysis apparatus 100 acquires the search conditions from the client terminal 200 (S31), it searches the index database 33 and the document database 31 based on those conditions and extracts all document information that matches them (S32). More specifically, the index database 33 is used to identify the document information containing the keyword designated by the user. Once that document information has been identified, it is read from the document database 31. The extraction result is then transmitted from the analysis apparatus 100 to the client terminal 200 (S33) and displayed on the screen of the client terminal 200 (S34).

The user checks the total number of extracted documents, the document titles, and so on via the screen of the client terminal 200, and approves the extraction result (S35). If the user is not satisfied with the extraction result, the user can change the search conditions and instruct a new search.

On confirming the user's approval, the analysis apparatus 100 extracts keywords from each extracted piece of document information (S36). The keyword extraction result is transmitted from the analysis apparatus 100 to the client terminal 200 (S37) and displayed on the screen of the client terminal 200 (S38). The user checks, for example, whether the desired keywords are included in the keyword extraction result and gives approval (S39). If the user is not satisfied with the keyword extraction result, the user can request that keyword extraction be performed again.

On confirming the user's approval, the analysis apparatus 100 performs principal component analysis to calculate the coordinates of each extracted piece of document information (S40). The analysis apparatus 100 then calculates the distribution density of the document information based on those coordinates (S41).

The analysis apparatus 100 performs principal component analysis to calculate the coordinates of each extracted keyword (S42). The analysis apparatus 100 then generates the map 150 and transmits it to the client terminal 200 (S43). The user checks the map 150 displayed on the screen of the client terminal 200 (S44).

FIG. 11 is an explanatory diagram schematically showing how the map 150 is generated. The analysis apparatus 100 calculates the coordinates of each piece of document information and stores them in a document coordinate management table T1. In the document coordinate management table T1, for example, the document number identifying a piece of document information is associated with the X-axis and Y-axis coordinates of that document information.

The analysis apparatus 100 also calculates the coordinates of each extracted keyword (main keyword) and stores them in a keyword coordinate management table T2. The keyword coordinate management table T2 stores, for example, a keyword, the document number of a document containing that keyword, and the X-axis and Y-axis coordinates of the keyword. Although the figure shows only one document number per keyword, the table can hold a list of all document numbers in which the keyword appears.

As shown in the lower part of FIG. 11, a large number of block regions 130 can be set on the map 150 by, for example, dividing the map 150 into multiple sections in both the X-axis and Y-axis directions. The analysis apparatus 100 obtains the distribution density of the document information by counting the pieces of document information located in each block region 130. The contour lines 151 are set according to the calculated distribution density.
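The block-region counting can be sketched as a simple two-dimensional histogram over the contents of the document coordinate management table T1. The sketch assumes that every coordinate lies inside the given map extent and that the plane is split into equal-sized blocks; the function name `block_counts` and its parameters are hypothetical.

```python
from collections import Counter


def block_counts(doc_coords, x_range, y_range, nx, ny):
    """Count documents per block region.

    doc_coords maps a document number to its (x, y) coordinates, as in
    table T1. The map extent x_range by y_range is divided into nx by ny
    equal blocks, and each block is identified by its (column, row) index.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in doc_coords.values():
        # Points on the upper edge are clamped into the last block.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(col, row)] += 1
    return counts


if __name__ == "__main__":
    table_t1 = {"D1": (0.1, 0.1), "D2": (0.2, 0.3), "D3": (0.9, 0.9)}
    print(block_counts(table_t1, (0.0, 1.0), (0.0, 1.0), 2, 2))
```

The per-block counts are the distribution density from which contour bands can then be derived.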

FIG. 12 is a flowchart showing the keyword correspondence analysis process. This flowchart shares steps with the flowchart shown in FIG. 3. Redundant description is therefore omitted, and the description focuses on the steps characteristic of this embodiment.

In this embodiment, the function for analyzing keyword correspondence is built into the document information analysis apparatus 100 that generates the map 150, so the coordinates of a keyword are used as its feature information (S15A, S17A). This is because keywords that denote the same technical element but are expressed differently are expected to be placed relatively close to each other on the map 150.

The analysis apparatus 100 compares the coordinates of the reference keyword with the coordinates of the corresponding keyword candidates (S18A), selects the candidates whose coordinates are close, and generates a keyword correspondence table 12A (S19A). The analysis apparatus 100 displays the result of the keyword correspondence analysis on the map 150. If the user does not select a corresponding keyword candidate (S21: NO), the analysis apparatus 100 associates the keyword whose coordinates are closest to those of the reference keyword with the reference keyword (S23A).
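The coordinate comparison and the automatic fallback of step S23A can be illustrated as follows. The description only says that candidates with "close" coordinates are selected, so the use of Euclidean distance, the `max_distance` cut-off, and the function names are assumptions made for this sketch.

```python
import math


def close_candidates(reference_xy, candidate_xy, max_distance):
    """Return candidates within max_distance of the reference, nearest first."""
    distances = {kw: math.dist(reference_xy, xy) for kw, xy in candidate_xy.items()}
    near = [kw for kw, dist in distances.items() if dist <= max_distance]
    return sorted(near, key=distances.get)


def nearest_candidate(reference_xy, candidate_xy):
    """Fallback when the user selects nothing: take the closest candidate."""
    return min(candidate_xy, key=lambda kw: math.dist(reference_xy, candidate_xy[kw]))


if __name__ == "__main__":
    reference = (1.1, 1.0)  # coordinates of the reference keyword
    candidates = {
        "information processing apparatus": (1.0, 1.2),
        "mouse": (4.0, -2.0),
    }
    print(close_candidates(reference, candidates, max_distance=1.0))
    print(nearest_candidate(reference, candidates))
```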

This embodiment provides the same operational effects as the first embodiment described above. In addition, as shown in FIG. 13, a plurality of keywords 152 are displayed on the map 150, and the reference keyword 152A and the corresponding keyword 152B are displayed connected by the connection line 153. The user can therefore easily see on the map 150 that some terms are used inconsistently in all or part of the document information group constituting the map 150, and what the standard expression of each such term is.

A third embodiment will be described with reference to FIG. 14. In this embodiment, when the user does not select a keyword corresponding to the reference keyword from among the corresponding keyword candidates, the appearance frequency rank (coordinates may be used instead) is calculated as feature information, and the ranks of the keywords are compared.

First, the analysis apparatus 100 acquires the analysis condition 1A designated by the user (S51) and extracts, from the document information group constituting the map 150, the document information group that matches the analysis condition 1A (S12). The analysis apparatus 100 then generates the keyword ranking table 5 based on the extracted document information group (S53), and generates the reference keyword table 6A and the corresponding keyword candidate table 7A based on the keyword ranking table 5 (S54, S55).

The analysis apparatus 100 then generates and outputs the keyword correspondence table 12A (S56, S57) and waits for a selection by the user (S58). When the user manually selects one of the corresponding keyword candidates (S58: YES), the selected keyword is associated with the reference keyword (S62).

If, on the other hand, the user does not select any of the corresponding keyword candidates (S58: NO), the analysis apparatus 100 calculates the average rank of the reference keyword (S59) and acquires the ranks of the corresponding keyword candidates (S60). The analysis apparatus 100 then selects the corresponding keyword candidate whose rank is closest to the average rank of the reference keyword and associates the selected keyword with the reference keyword (S61).
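Steps S59 to S61 amount to a nearest-rank lookup, sketched below. The passage does not say over which rankings the reference keyword's rank is averaged, so the sketch simply assumes a list of ranks for the reference keyword (for example, one per ranking in which it appears) and a single rank per candidate. The function name is hypothetical.

```python
def closest_rank_candidate(reference_ranks, candidate_ranks):
    """Pick the candidate whose rank is nearest the reference keyword's average rank.

    reference_ranks: ranks of the reference keyword (assumed to come from
    several rankings); candidate_ranks: candidate keyword -> rank.
    Ties go to the candidate listed first.
    """
    average = sum(reference_ranks) / len(reference_ranks)  # S59
    return min(candidate_ranks,                             # S60, S61
               key=lambda kw: abs(candidate_ranks[kw] - average))


if __name__ == "__main__":
    reference_ranks = [2, 3, 4]
    candidate_ranks = {"information processing apparatus": 4, "mouse": 12}
    print(closest_rank_candidate(reference_ranks, candidate_ranks))
```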

Thereafter, as described with reference to FIG. 3, if the user wishes to register a thesaurus entry (S63: YES), the analysis apparatus 100 registers the corresponding keyword as a thesaurus entry for the reference keyword (S64), and if the user wishes to replace the corresponding keyword (S65: YES), the analysis apparatus 100 converts the corresponding keyword into the reference keyword (S66).

Next, a fourth embodiment will be described with reference to FIGS. 15 and 16. In this embodiment, the group of words that stand in a dependency relation with the reference keyword and the group of words that stand in a dependency relation with each corresponding keyword candidate are obtained, and the candidate whose dependency word group best matches that of the reference keyword is detected as the corresponding keyword.

FIG. 15 is an explanatory diagram schematically showing the flow of data in the keyword correspondence analysis apparatus according to this embodiment. For reasons of space, the analysis condition 1A and the information storage unit 3 shown in FIG. 2 are omitted from FIG. 15.

In this embodiment, the reference keyword feature information detection unit 8 generates a ranking table 8A of the words that stand in a dependency relation with the reference keyword. For example, the dependency word ranking table 8A can be generated by performing morphological analysis and dependency parsing to extract the words that stand in a dependency relation with the reference keyword, and sorting the extracted words by frequency of occurrence. When the reference keyword is "personal computer", for example, such dependency words include "install", "freeze", and "purchase".

Similarly, the corresponding keyword candidate feature information detection unit 9 generates, for each corresponding keyword candidate, a ranking table 9A of the words that stand in a dependency relation with that candidate. When the corresponding keyword candidate is "information processing apparatus", for example, the dependency words include "freeze", "install", and "control". When the corresponding keyword candidate is "mouse", the dependency words include "drag", "click", and "purchase".

The corresponding keyword detection unit 11 then compares the dependency word ranking table 8A of the reference keyword with the dependency word ranking table 9A of each corresponding keyword candidate. The corresponding keyword detection unit 11 thereby detects the corresponding keyword candidates, possibly more than one, for which the number of matching dependency words within a predetermined rank is at least a predetermined value. Alternatively, the corresponding keyword detection unit 11 can detect only the single candidate whose dependency words within the predetermined rank match best.
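The comparison of dependency word rankings can be sketched with set intersections over the top-ranked words. The example reuses the words given in the text; the function names, the `top_n` cut-off, and the `min_shared` threshold are assumptions, and the rankings are taken as already computed (the morphological analysis and dependency parsing are outside the scope of the sketch).

```python
def shared_words(reference_ranking, candidate_ranking, top_n):
    """Dependency words that both rankings contain within the top_n positions."""
    return set(reference_ranking[:top_n]) & set(candidate_ranking[:top_n])


def matching_candidates(reference_ranking, candidate_rankings, top_n, min_shared):
    """Candidates sharing at least min_shared top-ranked dependency words."""
    scores = {kw: len(shared_words(reference_ranking, ranking, top_n))
              for kw, ranking in candidate_rankings.items()}
    return {kw: score for kw, score in scores.items() if score >= min_shared}


def best_candidate(reference_ranking, candidate_rankings, top_n):
    """Single candidate with the most shared top-ranked dependency words."""
    return max(candidate_rankings,
               key=lambda kw: len(shared_words(reference_ranking,
                                               candidate_rankings[kw], top_n)))


if __name__ == "__main__":
    table_8a = ["install", "freeze", "purchase"]  # for "personal computer"
    tables_9a = {
        "information processing apparatus": ["freeze", "install", "control"],
        "mouse": ["drag", "click", "purchase"],
    }
    print(matching_candidates(table_8a, tables_9a, top_n=3, min_shared=2))
    print(best_candidate(table_8a, tables_9a, top_n=3))
```

With these inputs, "information processing apparatus" shares two words with "personal computer" while "mouse" shares only one, so only the former passes a threshold of two.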

FIG. 16 is a flowchart showing the keyword correspondence analysis process. Like FIG. 12, this flowchart shares steps with the flowchart shown in FIG. 3. Redundant description is therefore omitted, and the description focuses on the steps characteristic of this embodiment.

As described above, this embodiment does not directly compare the attributes of the reference keyword and the corresponding keyword candidates. Instead, it infers the relationship between the reference keyword and each corresponding keyword candidate from the rankings of the words that stand in a dependency relation with those keywords.

Accordingly, in S15B, the ranking table 8A of words in a dependency relation with the reference keyword is generated, and in S17B, the ranking table 9A of words in a dependency relation with each corresponding keyword candidate is generated for each candidate. In S18B, the dependency word ranking tables 8A and 9A are compared, and in the subsequent S19B, the corresponding keyword candidates whose dependency words match to at least a predetermined value are extracted. If the user makes no manual selection (S21: NO), the single corresponding keyword candidate with the largest number of matching dependency words (that is, the highest match rate) is selected (S23B).

なお、本発明は、上述した実施の形態に限定されない。当業者であれば、本発明の範囲内で、種々の追加や変更等を行うことができる。例えば、当業者であれば、前記各実施例を適宜組み合わせることができる。 The present invention is not limited to the above-described embodiment. A person skilled in the art can make various additions and changes within the scope of the present invention. For example, those skilled in the art can appropriately combine the above embodiments.

For example, in the above embodiments, the relationship between the reference keyword and the corresponding keyword is analyzed in units of applicants, but the invention is not limited to this, and the relationship between keywords can also be analyzed in units of documents. The analysis between keywords can also be performed in units of time periods. For example, PHS (Personal Handyphone System) was formerly called PHP (Personal Handy Phone), and a term that has changed over time in this way can also be detected by the present invention. Furthermore, by calculating the keyword ranking based on the names of inventors and applicants, even when an inventor's name has changed because of marriage or the like, the relationship with the former name can be identified and the records can be consolidated and managed under the current name. Similarly, even when an applicant's name has changed because of a corporate merger, split, or the like, the correspondence between the old name and the latest name can be easily identified.

A plurality of types of algorithms for detecting the correspondence between keywords can also be used in combination. For example, the detection accuracy of the correspondence can be improved by combining two or more of the following: the method based on keyword appearance rank, the method based on keyword coordinates, and the method based on the matching rate of the words in a dependency relation with the keyword.
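The text does not specify how the methods are to be combined. As one possible illustration only, the sketch below assumes that each method has already produced a similarity score between 0 and 1 for every corresponding keyword candidate, and merges those scores with a weighted average. The function name, the method labels, the weights, and the sample scores are all hypothetical.

```python
from typing import Dict, Mapping


def combine_scores(
    method_scores: Mapping[str, Dict[str, float]],  # method name -> {candidate: score in [0, 1]}
    weights: Mapping[str, float],                   # method name -> weight (assumed, not from the text)
) -> Dict[str, float]:
    """Weighted average of per-method scores for each candidate.

    A candidate that a method did not score contributes 0 for that method.
    """
    total_weight = sum(weights[m] for m in method_scores)
    combined: Dict[str, float] = {}
    for method, scores in method_scores.items():
        for keyword, score in scores.items():
            combined[keyword] = (combined.get(keyword, 0.0)
                                 + weights[method] * score / total_weight)
    return combined


# Hypothetical scores from the three methods named in the text.
scores = {
    "appearance_rank": {"candidate A": 1.0, "candidate B": 0.5},
    "coordinates": {"candidate A": 0.5, "candidate B": 1.0},
    "dependency_words": {"candidate A": 1.0, "candidate B": 0.0},
}
weights = {"appearance_rank": 1.0, "coordinates": 1.0, "dependency_words": 1.0}

combined = combine_scores(scores, weights)
best = max(combined, key=combined.get)
```

A weighted average is only one option; requiring agreement between methods, or using one method to break ties in another, would be equally consistent with the description.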

FIG. 1 is an explanatory diagram showing the entire keyword correspondence analysis apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram schematically showing the flow of data in the keyword correspondence analysis processing.
FIG. 3 is a flowchart showing the keyword correspondence analysis processing.
FIG. 4 is a flowchart showing the keyword ranking table generation processing in FIG. 3.
FIG. 5 is a flowchart showing the reference keyword table generation processing in FIG. 3.
FIG. 6 is a flowchart showing the corresponding keyword candidate table generation processing in FIG. 3.
FIG. 7 is an explanatory diagram showing the entire document information analysis apparatus provided with a keyword correspondence analysis function according to a second embodiment of the present invention.
FIG. 8 is a functional block diagram of the document information analysis apparatus.
FIG. 9 is an explanatory diagram showing an outline of the hardware and software configuration of the document information analysis apparatus.
FIG. 10 is a flowchart showing the processing for generating and displaying a map.
FIG. 11 is an explanatory diagram showing the relationship between the document coordinate management table, the keyword coordinate management table, and the map.
FIG. 12 is a flowchart showing the keyword correspondence analysis processing.
FIG. 13 is an explanatory diagram showing how the correspondence between keywords is displayed on the map.
FIG. 14 is a flowchart of the keyword correspondence analysis processing according to a third embodiment of the present invention.
FIG. 15 is an explanatory diagram schematically showing the flow of data in the keyword correspondence analysis processing according to a fourth embodiment of the present invention.
FIG. 16 is a flowchart showing the keyword correspondence analysis processing.

Explanation of symbols

1 … analysis condition setting unit, 1A … analysis conditions, 2 … target document extraction unit, 3 … information storage unit, 3A … document information storage unit, 3B … storage unit for keywords and the like, 4 … keyword ranking generation unit, 5 … keyword ranking table, 6 … reference keyword detection unit, 6A … reference keyword table, 7 … corresponding keyword candidate detection unit, 7A … corresponding keyword candidate table, 8 … reference keyword feature information detection unit, 8A … ranking table of words in a dependency relation with the reference keyword, 9 … corresponding keyword candidate feature information detection unit, 9A … ranking table of words in a dependency relation with a corresponding keyword candidate, 10 … feature information comparison unit, 11 … corresponding keyword detection unit, 12 … keyword correspondence output unit, 12A … keyword correspondence table, 13 … display unit, 14 … thesaurus registration unit, 15 … keyword replacement unit, 31 … document database, 32 … word database, 33 … index database, 100 … document information analysis apparatus, 110 … map control unit, 111 … map generation condition input reception unit, 112 … document extraction unit, 113 … document coordinate calculation unit, 114 … keyword extraction unit, 115 … keyword coordinate calculation unit, 116 … map generation unit, 117 … map display unit, 120 … keyword correspondence analysis unit, 121 … target document extraction unit, 122 … keyword ranking generation unit, 123 … reference keyword detection unit, 124 … corresponding keyword candidate detection unit, 125 … keyword coordinate comparison unit, 126 … corresponding keyword detection unit, 127 … keyword correspondence result output unit, 127 … keyword correspondence output unit, 150 … map, 151A, 151B, 151C … contour lines, 152 … keyword, 152A … reference keyword, 152B … corresponding keyword, 153 … connection line, 200 … client terminal

Claims

A keyword correspondence analysis device comprising:
a document information storage unit that stores a plurality of pieces of digitized document information;
a target document extraction unit that extracts the document information to be analyzed by searching the document information storage unit based on given analysis conditions;
an appearance frequency analysis unit that generates, based on the extracted document information, keyword appearance frequency analysis information obtained by analyzing the application frequency of each keyword that appears in the document information;
a first keyword detection unit that detects, based on the keyword appearance frequency analysis information, a first keyword used in predetermined document information from among the keywords included in the extracted document information;
a second keyword candidate detection unit that detects, based on the keyword appearance frequency analysis information, second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords included in the document information other than the predetermined document information in the extracted document information; and
a second keyword detection unit that detects the second keyword corresponding to the first keyword from among the detected second keyword candidates.

The keyword correspondence analysis device according to claim 1, wherein the second keyword detection unit detects a second keyword candidate selected by a user from the second keyword candidates as the second keyword.

The keyword correspondence analysis device according to claim 1, further comprising:
a first feature information detection unit that detects feature information of the detected first keyword; and
a second feature information detection unit that detects feature information of the detected second keyword candidates,
wherein the second keyword detection unit detects the second keyword from among the second keyword candidates by comparing the feature information of the detected first keyword with the feature information of the detected second keyword candidates.

The keyword correspondence analysis device, further comprising:
a first feature information detection unit that detects feature information of the detected first keyword;
a second feature information detection unit that detects feature information of the detected second keyword candidates;
a feature information comparison unit that compares the feature information of the first keyword with the feature information of the second keyword; and
a comparison result output unit that outputs the result of the comparison by the feature information comparison unit,
wherein the second keyword detection unit has:
a user designation mode in which a second keyword candidate selected by a user from among the second keyword candidates based on the comparison result is detected as the second keyword; and
an automatic detection mode in which, based on the comparison result, the second keyword candidate whose feature information differs least from the feature information of the first keyword is detected as the second keyword.

The keyword correspondence analysis device according to claim 1, wherein the first keyword detection unit detects as the first keyword, based on the keyword appearance frequency analysis information, a keyword that is among the keywords whose appearance frequency rank is within a first predetermined value and whose use in the predetermined document information is equal to or greater than a second predetermined value and less than the total number of pieces of the extracted document information.

The keyword correspondence analysis device according to claim 1, wherein the second keyword candidate detection unit detects as the second keyword candidates, based on the keyword appearance frequency analysis information, the keywords that remain after removing, from the keywords included in the other document information other than the predetermined document information, the keywords used in the predetermined document information at or above a third predetermined value.

The keyword correspondence analysis device according to any one of the above, wherein the first feature information detection unit detects, as the feature information of the first keyword, the average appearance frequency rank of the first keyword in the predetermined document information, and
the second feature information detection unit detects, as the feature information of the second keyword, the appearance frequency rank of the second keyword candidate in the other document information other than the predetermined document information.

The keyword correspondence analysis device according to either claim 3 or claim 4, further comprising a keyword coordinate calculation unit that calculates the coordinates of the keywords included in the extracted document information by performing principal component analysis based on the total number of pieces of the extracted document information and the number of appearances of a plurality of predetermined keywords extracted from the extracted document information,
wherein the first feature information detection unit detects the coordinates of the first keyword calculated by the keyword coordinate calculation unit as the feature information of the first keyword, and
the second feature information detection unit detects the coordinates of the second keyword candidate calculated by the keyword coordinate calculation unit as the feature information of the second keyword candidate.

The keyword correspondence analysis device according to any one of the above, wherein the first feature information detection unit detects, as the first feature information, the ranking of the words in a dependency relation with the first keyword in the predetermined document information, and
the second feature information detection unit detects, as the second feature information, the ranking of the words in a dependency relation with the second keyword candidate in the document information other than the predetermined document information.

The keyword correspondence analysis device according to claim 1, further comprising:
a document coordinate calculation unit that calculates the coordinates of each piece of document information in the extracted document information group by performing principal component analysis based on the combination and the number of appearances of a plurality of predetermined keywords extracted from the extracted document information;
a keyword coordinate calculation unit that calculates the coordinates of each of the predetermined keywords by performing principal component analysis based on the total number of pieces of document information including the predetermined keywords and the number of appearances of the predetermined keywords; and
a map generation unit that calculates a distribution density of the document information based on the coordinates of each piece of document information calculated by the document coordinate calculation unit, and generates map information in which a map figure having an outline based on the calculated distribution density and the predetermined keywords are each visualized,
wherein the first keyword and the second keyword are visualized in the map information.

The keyword correspondence analysis device according to claim 10, wherein the second keyword detection unit detects the second keyword from among the second keyword candidates by comparing the coordinates of the first keyword with the coordinates of each second keyword candidate, each calculated by the keyword coordinate calculation unit.

The keyword correspondence analysis device according to claim 10, wherein the first keyword and the second keyword are visualized in the map information in different display forms, and the map information includes a display element indicating the correspondence between the first keyword and the second keyword.

The keyword correspondence analysis device according to claim 1, further comprising a keyword replacement unit that replaces the second keyword with the first keyword.

The keyword correspondence analysis device according to claim 1, further comprising a relevance registration unit that stores the second keyword in association with the first keyword.

A program that causes a computer to function as:
document information storage means for storing a plurality of pieces of digitized document information;
target document extraction means for extracting the document information to be analyzed based on given analysis conditions;
appearance frequency analysis means for generating, based on the extracted document information, keyword appearance frequency analysis information obtained by analyzing the application frequency of each keyword that appears in the document information;
first keyword detection means for detecting, based on the keyword appearance frequency analysis information, a first keyword used in predetermined document information from among the keywords included in the extracted document information;
second keyword candidate detection means for detecting, based on the keyword appearance frequency analysis information, second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords included in the document information other than the predetermined document information in the extracted document information; and
second keyword detection means for detecting the second keyword corresponding to the first keyword from among the detected second keyword candidates.

A keyword correspondence analysis method comprising the steps of:
obtaining analysis conditions;
extracting the document information to be analyzed by searching a document information storage unit based on the obtained analysis conditions;
generating, based on the extracted document information, keyword appearance frequency analysis information obtained by analyzing the application frequency of each keyword that appears in the document information;
detecting, based on the keyword appearance frequency analysis information, a first keyword used in predetermined document information from among the keywords included in the extracted document information;
detecting, based on the keyword appearance frequency analysis information, second keyword candidates, which are candidates for a second keyword corresponding to the first keyword, from among the keywords included in the document information other than the predetermined document information in the extracted document information; and
detecting the second keyword corresponding to the first keyword from among the detected second keyword candidates.