JP2005339412A

JP2005339412A - Patent map generation method and program

Info

Publication number: JP2005339412A
Application number: JP2004160365A
Authority: JP
Inventors: Hideyuki Uchida; 秀幸内田; Takashi Yugawa; 高志湯川; Atsushi Mano; 淳真野
Original assignee: BEARNET Inc; Nagaoka University of Technology NUC
Current assignee: BEARNET Inc; Nagaoka University of Technology NUC
Priority date: 2004-05-31
Filing date: 2004-05-31
Publication date: 2005-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a method for generating a patent map automatically or semi-automatically by a computer or the like, in order to overcome the limited output and high cost that result when patent maps are created only manually by patent experts in each technical field.

SOLUTION: A concept base 204, which is a knowledge base for the words contained in a patent specification document group 201 that is the target of patent map generation, is generated by statistically processing how words are used in the patent specification document group 201. Vector values 205 for the individual patent specifications contained in the group 201 are calculated using the concept base 204. The group 201 is then clustered based on the vector values 205, the position of each patent within the group is determined from the cluster to which it belongs, and these positions are visualized to generate the patent map 303.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a patent map generation technique that, given a plurality of patents relating to a specific subject, classifies the mutual relationships of the patents in the group based on their contents and displays the classified patent group so that the state of the classification is visually easy to understand. In particular, it relates to a micromap generation technique.

Patent maps include macro maps, which visualize application trends based on a group of patents in a specific technical area, and micromaps, which classify the mutual relationships among the patents in a group of patents relating to a specific subject from one or more viewpoints and visualize how each individual patent is positioned among the inventions relating to that subject.

The present invention concerns micromaps. Conventionally, creating a micromap has been considered to require both expertise in the technical subject and knowledge of the format of patent documents, so micromaps have been created manually by patent experts in the technical field to which the subject belongs.

However, in this background art, because there are few patent experts in each technical field, the number of patent maps that can be created in a given period is limited and the cost is enormous. Although the importance of patent maps is growing as companies and individuals become more aware of intellectual property, the fact that patent map creation depends solely on the abilities of a small number of experts is also a serious problem for national patent strategy.

An object of the present invention is to provide a method for generating a patent map automatically or semi-automatically by a computer or the like, in order to overcome the quantitative limits and high cost that arise when patent experts in the relevant technical field create patent maps solely by hand.

In the present invention, a "concept base", which is a knowledge base about the words contained in the patent specification document group targeted for patent map generation, is generated by statistically processing how words are used in that document group. Using the concept base, a vector value is calculated for each patent specification document in the group, and the document group is clustered based on these vector values. The position of each patent within the document group is determined from the cluster to which the patent represented by each specification belongs, and a patent map is generated by visualizing these positions.

The means for solving the problem in the present invention are described more specifically below.
A set of patent specification documents relating to a specific subject is stored in a storage device. From each patent specification document in the stored set (referred to here as the patent specification document group), the portion related to a given viewpoint for patent map generation is extracted. The extracted portion is taken to represent the patent specification document in patent map generation. The extracted portion is segmented into words and stored in the storage device as a word string. Applying this processing to every patent specification document in the group results in a group of word strings, one for each extracted portion, being stored in the storage device.

Statistical processing based on word frequency, the distance between word occurrences, and the like is applied to the word string group, and a vector value is calculated for each distinct word in the group and stored in the storage device. This processing yields, for each word appearing in the extracted portions of the patent specification documents, a pair consisting of the word and its vector value. The set of these word-vector pairs is called the concept base. Looking up a word in the concept base returns the vector value corresponding to that word.

After the concept base is created, for each extracted portion, the vector value of each word in the corresponding word string is obtained from the concept base, and the vectors for all words in the word string are summed. The sum is taken as the patent vector value of the patent represented by that patent specification document. Through the processing so far, a patent vector value is thus assigned to the patent represented by each patent specification document in the given group.

Once patent vector values have been obtained for the patents represented by all patent specification documents stored in the storage device, the patent group is clustered based on the patent vector values. The clustering classifies the patent group into one or more clusters.

When multiple viewpoints for classification are given, the processing described above, from extracting the viewpoint-related portion of each patent specification document through clustering, is repeated for each viewpoint, yielding a classification of each patent under each viewpoint.

Based on the classifications obtained for the individual viewpoints, a patent map is obtained by, for example, arranging the given patent group on a two-dimensional plane such as a sheet of paper or a screen.
In the patent map generation method of the present invention, the portion of each given patent specification document that relates to the viewpoint for patent map creation is extracted, the extracted text is split into a word string, and statistical processing is applied to the words in the word strings to obtain a vector value for each word. The resulting vector values reflect how the words are used in the group of extracted portions; in other words, they can be said to express, as knowledge, the meaning of each word in the context of that group.

For each extracted portion in the group, the vector value of each word in the corresponding word string is obtained from the concept base, and these vector values are summed over all words in the word string to obtain the patent vector value for the patent represented by the patent specification document. Because a document is composed of a set of words and can be regarded as the totality of the knowledge carried by those words, the patent vector value obtained in this way expresses the characteristics of the patent from the viewpoint used for patent map generation.

Because the patent vector value expresses the characteristics of a patent from the viewpoint used for patent map generation, clustering the patent group based on the patent vector values yields a classification from that viewpoint.

When multiple viewpoints are given, repeating the above for each viewpoint yields a classification of the given patent group for each viewpoint. For example, when two viewpoints are given, each patent in the group belongs to two clusters, one per viewpoint; the patents can thus be said to be classified in two dimensions. Similarly, when three viewpoints are given, each patent belongs to three clusters, one per viewpoint; that is, the patents can be said to be classified in three dimensions.

Based on the classification obtained above, the patent group is finally represented visually on paper or on a screen. Because paper and screens are two-dimensional planes, when there are three or more viewpoints for patent map generation the classification must be reduced to two dimensions by some technique.

According to the method of the present invention and a program based on it, when a set of patent specification documents relating to a specific subject and one or more viewpoints for patent map generation are given, a patent map that arranges the patents on a two-dimensional plane can be generated based on the mutual relationships of the patents represented by the individual patent specification documents in the group. Conventionally, patent maps were generated manually by patent experts familiar with the technical field containing the subject; the present invention allows patent maps to be generated automatically or semi-automatically by a computer or the like.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.
FIG. 1 shows the processing procedure and data flow of an embodiment of the present invention.

A given set 301 of patent specification documents relating to a specific subject is stored in a storage device and becomes the patent specification document group 201. How the patent specification documents relating to the specific subject are found is not essential to the present invention; one conceivable method is for a person to supply a patent search system with suitable keywords, or a logical expression built from keywords, to obtain the document group.

At the same time, a viewpoint 302 for patent map creation is specified externally. Based on the specified viewpoint, a specification of the portion of the patent specification document corresponding to the viewpoint is generated at 101. For example, when "problem to be solved" is given as the viewpoint, extraction of the "Problem to be solved by the invention" section from the patent specification document is specified. The concrete method of making this specification is not essential to the present invention; one conceivable method is to build in, in advance, a table of viewpoints and their corresponding portions and, when a viewpoint is given, to look up in the table the description of the portion corresponding to that viewpoint.

Once the portion corresponding to the given viewpoint has been specified, the relevant portion is extracted at 102 from each patent specification document in the patent specification document group 201 stored in the storage device, and the results are stored in the storage device as the group 202 of extracted patent specification document portions.
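The table lookup (101) and extraction (102) steps might be sketched as follows. This is only an illustration: the table contents, the dictionary representation of a document, and the names `VIEWPOINT_TO_SECTION` and `extract_part` are assumptions, not part of the described method.

```python
# Hypothetical table mapping a viewpoint to the section heading it corresponds to.
VIEWPOINT_TO_SECTION = {
    "problem to be solved": "Problem to be solved by the invention",
    "means": "Means for solving the problem",
}

def extract_part(document: dict, viewpoint: str) -> str:
    """Return the text of the section that corresponds to the viewpoint."""
    heading = VIEWPOINT_TO_SECTION[viewpoint]   # step 101: specify the portion
    return document.get(heading, "")            # step 102: extract it (empty if absent)

# Toy document group: each document is assumed to be a dict of heading -> text.
docs = {
    "P1": {"Problem to be solved by the invention": "reduce map cost",
           "Means for solving the problem": "cluster word vectors"},
    "P2": {"Problem to be solved by the invention": "speed up search"},
}
parts = {pid: extract_part(doc, "problem to be solved") for pid, doc in docs.items()}
```

Returning an empty string for a missing section is a choice made for this sketch; the description does not say how such documents are handled.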

Morphological analysis 103 is applied to each portion in the group 202 of extracted patent specification document portions. Morphological analysis decomposes the sentences in the input text into strings of words; that is, by segmenting a sentence into words, it converts a Japanese sentence into the string of words it contains. All sentences in the group 202 are converted into word strings by the morphological analysis 103 and stored in the storage device as the group 203 of word strings corresponding to the extracted portions.

From the word string group 203, a neighborhood co-occurrence matrix is generated by neighborhood co-occurrence matrix generation 104, and dimension compression 105 by singular value decomposition is applied to generate the concept base 204. The generation of the concept base is described in more detail below with reference to a separate figure, FIG. 2.

In the word string group 203 of FIG. 1, as shown by the word string group 401 in FIG. 2, the words contained in each extracted patent specification document portion form a string of words in order of appearance. Taking this group of word strings as input, neighborhood co-occurrence matrix generation 104 generates a neighborhood co-occurrence matrix. A neighborhood co-occurrence matrix is a matrix whose element in the row corresponding to a word Wi and the column corresponding to a word Wj is a value calculated from the frequency with which Wi and Wj occur together within a fixed number of words of each other, over all word strings being processed. The method of generating the matrix is not essential to the present invention. For example, letting n be the number of words contained in the word strings, an n-row, n-column matrix whose elements are all 0 is first prepared; then, while scanning the word strings in order, for each word Wi and each word Wj that appears within a fixed number of words before or after it, a fixed amount is added to the element in the row for Wi and the column for Wj. In this way the matrix can be generated with a computational cost proportional to the number of words in the word string group. Nor does the present invention essentially specify what value is calculated from the co-occurrence frequency: the frequency itself may be used, or a value obtained by applying a monotonically increasing function to the frequency. Furthermore, the neighborhood co-occurrence matrix need not be a square matrix with equal numbers of rows and columns. Because the word vector value associated with each word is obtained as a row of the matrix, the number of rows must equal the number of words required in the concept base, but the number of columns need not equal it.
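The scanning procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `cooccurrence_matrix`, the `window` parameter, the increment of 1, and the toy word strings are all chosen for the example, and the raw frequency is used as the matrix value.

```python
def cooccurrence_matrix(word_strings, window=2):
    """Count, for each word Wi, the words Wj appearing within `window` positions."""
    # Assign a row/column index to each distinct word in order of first appearance.
    index = {}
    for words in word_strings:
        for w in words:
            index.setdefault(w, len(index))
    n = len(index)
    matrix = [[0] * n for _ in range(n)]          # n x n matrix, all elements 0
    for words in word_strings:                    # scan each word string in order
        for i, wi in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # add a fixed amount (here 1) to element (row Wi, column Wj)
                    matrix[index[wi]][index[words[j]]] += 1
    return index, matrix

word_strings = [["patent", "map", "generation"], ["patent", "map", "cluster"]]
index, matrix = cooccurrence_matrix(word_strings, window=1)
```

Each word position is visited once and compared with at most `2 * window` neighbours, so the work grows in proportion to the total number of words, as the description states.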

Once the neighborhood co-occurrence matrix 402 has been generated, the dimension compression processing 105 by singular value decomposition decomposes the matrix into singular values and then performs dimension compression based on the result.
Singular value decomposition is the process of finding, for a given matrix A, matrices L, λ, and R such that
A = L × λ × R
Here λ is a diagonal matrix, that is, a matrix in which all elements other than the diagonal elements are 0. The diagonal elements of λ are generally arranged so that elements in higher rows have larger values. If matrix A has n rows and m columns and its rank is r, matrix L has n rows and r columns, matrix λ has r rows and r columns, and matrix R has r rows and m columns.

After the singular value decomposition, only some of the columns of the resulting left matrix L are selected and extracted to form a new matrix 404. If the row of the neighborhood co-occurrence matrix 402 corresponding to a word Wi is the i-th row, then the i-th row of the new matrix 404 is the word vector value associated with that word. The concept base 204 consists of these pairs of a word and its associated word vector value, stored in the storage device for all words in the word string group. Which columns of the left matrix are selected as word vector values is not essential to the present invention, but in general a fixed number of columns is taken starting from the leftmost column. Specifically, the neighborhood co-occurrence matrix has about 10,000 rows by 3,000 columns, and 100 to 200 columns are selected from the left of the left matrix. These numbers are, of course, merely examples from this embodiment and are not essential to the present invention.
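A small sketch of this column selection, assuming NumPy is available. The function name `build_concept_base`, the toy matrix, and `k=2` are illustrative assumptions. Note that `numpy.linalg.svd` with `full_matrices=False` returns min(n, m) columns in the left matrix rather than exactly r (the rank), which makes no difference once only the leftmost k columns are kept.

```python
import numpy as np

def build_concept_base(words, cooc, k):
    """Map each word to the first k columns of its row in the left matrix L."""
    A = np.asarray(cooc, dtype=float)
    # Thin factorization A = L @ diag(s) @ R, singular values s in descending order.
    L, s, R = np.linalg.svd(A, full_matrices=False)
    reduced = L[:, :k]                     # keep only the k leftmost columns
    # Row i of the reduced matrix is the word vector of the word for row i of A.
    return {w: reduced[i] for i, w in enumerate(words)}

words = ["patent", "map", "generation", "cluster"]
cooc = [[0, 2, 0, 0],
        [2, 0, 1, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 0]]
concept_base = build_concept_base(words, cooc, k=2)
```

For a matrix of the size mentioned in the embodiment, a sparse or truncated solver would likely be preferable to a full decomposition; the dense call here is only for readability.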

How the singular value decomposition in the dimension compression 105 is carried out is likewise not essential to the present invention. In general, singular value decomposition can be performed by forming a square matrix B that has matrix A in its upper left, the transpose of A in its lower right, and 0 elsewhere, computing the eigenvalues of B, and relating those eigenvalues to the singular values of A. When a large proportion of the elements of the neighborhood co-occurrence matrix A are 0, that is, when A is sparse, methods such as the Lanczos method, which can perform singular value decomposition quickly and with little memory, are used. Moreover, since all that is ultimately needed is the element values in the required columns of the left matrix, it is not necessary to use a two-stage process in which all elements of the left matrix are computed and the selected columns are then extracted as word vector values. Some singular value decomposition methods compute the columns of the left matrix one after another; with such a method, the decomposition may be stopped as soon as all required columns have been obtained.
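The link between eigenvalues of a square matrix built from A and the singular values of A can be checked numerically. The sketch below, assuming NumPy, uses the common textbook arrangement in which A occupies the upper-right block and its transpose the lower-left block; this block placement is an assumption of the sketch and differs from the wording above. The matrix values are arbitrary.

```python
import numpy as np

A = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])            # small 2 x 3 example matrix
n, m = A.shape
B = np.zeros((n + m, n + m))               # square, symmetric matrix built from A
B[:n, n:] = A                              # upper-right block: A
B[n:, :n] = A.T                            # lower-left block: transpose of A

eigenvalues = np.linalg.eigvalsh(B)        # symmetric solver, ascending order
largest = eigenvalues[::-1][:min(n, m)]    # the min(n, m) largest eigenvalues
singular_values = np.linalg.svd(A, compute_uv=False)
```

With this arrangement the eigenvalues of B come in pairs of plus and minus each singular value of A, padded with zeros, so the largest eigenvalues coincide with the singular values.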

After the concept base has been generated, patent vector value generation 106 produces, from the word string group 203, a patent vector value for the patent represented by each patent specification document. The word string of each extracted portion in the group 203 is used to calculate the patent vector value of the patent represented by the patent specification document that contains that portion. For each word in the word string of an extracted portion, the word vector value associated with the word is obtained from the concept base, and the word vector values for all words in the word string are added together to obtain a composite vector value. This composite vector value is taken as the patent vector value of the patent represented by the patent specification document containing the extracted portion. In this embodiment the composite vector is obtained by simply adding the word vector values, but this is not essential to the present invention; for example, the word vector values may be summed after each is multiplied by some weight that depends on the characteristics of the word.
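The summation can be sketched as below. The function name `patent_vector`, the optional `weights` mapping, and the toy concept base are assumptions for illustration; skipping a word that is missing from the concept base is also a choice of this sketch, since in the described method every word of the word strings has an entry.

```python
def patent_vector(words, concept_base, weights=None):
    """Sum the word vectors of a word string, optionally weighting each word."""
    dim = len(next(iter(concept_base.values())))
    total = [0.0] * dim
    for w in words:
        vec = concept_base.get(w)
        if vec is None:                    # not in the concept base: skip (assumption)
            continue
        weight = 1.0 if weights is None else weights.get(w, 1.0)
        for d in range(dim):
            total[d] += weight * vec[d]
    return total

# Toy concept base with two-dimensional word vectors.
concept_base = {"patent": [1.0, 0.0], "map": [0.5, 0.5], "cluster": [0.0, 1.0]}
plain = patent_vector(["patent", "map", "map"], concept_base)
weighted = patent_vector(["patent", "map", "map"], concept_base, weights={"map": 2.0})
```

With `weights=None` this is the simple sum used in the embodiment; passing a mapping gives the weighted variant mentioned as an alternative.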

In this way, a patent vector value is obtained for each patent in the patent group represented by the patent specification document group 201. A code that uniquely identifies each patent (for example, the patent number) is paired with the patent vector value associated with that patent, and these pairs are stored in the storage device for all patents in the group, giving the patent vector value group 205.

The patent vector value group 205 is classified by clustering 107 based on the patent vector values, and each result is stored in the storage device as a pair consisting of a code identifying the patent and a code for the class to which the patent belongs. Classifying and storing all patents represented by the patent specification document group 201 yields the patent-classification correspondence information 206. The clustering method is not essential to the present invention: either a hierarchical or a non-hierarchical clustering technique may be used. Hierarchical clustering produces a hierarchy diagram of clusters, and the question is which level of the hierarchy to use for classification. Here, the highest level is chosen among those levels at which the inter-cluster similarity obtained during clustering does not fall below a preset threshold. This criterion for choosing the level is not essential to the present invention, however; the level used for classification may instead be determined from other externally supplied information. Non-hierarchical clustering requires the number of clusters to be given, and the method of determining that number is likewise not essential to the present invention.
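One way to realize the threshold criterion for hierarchical clustering is to keep merging the most similar pair of clusters and stop as soon as the best available similarity falls below the threshold. The sketch below assumes cosine similarity and average linkage, neither of which is specified in the description; the names `cosine` and `cluster_patents` and the toy vectors are likewise illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_patents(vectors, threshold):
    """Merge the most similar clusters until no pair reaches the threshold."""
    clusters = [[pid] for pid in vectors]          # start: one patent per cluster

    def similarity(c1, c2):                        # average linkage (assumption)
        sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, i, j = max(
            (similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        if best < threshold:                       # next merge would drop below threshold
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Pair each patent identifier with the code of the class it belongs to.
    return {pid: label for label, members in enumerate(clusters) for pid in members}

vectors = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}
labels = cluster_patents(vectors, threshold=0.8)
```

The returned mapping plays the role of the patent-classification correspondence information 206 in this toy setting.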

Based on the patent-classification correspondence information 206, the patents are arranged on a two-dimensional plane 108, and from this arrangement a patent map 303 is obtained in which the target patents are displayed, on paper or on a screen, as an arrangement on a two-dimensional plane. Because this embodiment is described for the case of a single viewpoint, the classification axis is one-dimensional and arrangement on a two-dimensional plane is straightforward.

The number of viewpoints given, however, is not essential to the present invention. When two or more viewpoints are given, the steps from specification 101 of the viewpoint-related patent specification document portion through clustering 107 are carried out for each viewpoint, and the patent-classification correspondence information 206 based on that clustering is likewise obtained for each viewpoint. The patents are then arranged on a two-dimensional plane based on the per-viewpoint correspondence information 206. With two viewpoints, the patents can easily be arranged as a table. With three or more viewpoints, projection onto a two-dimensional plane is required, but the projection method is not essential to the present invention. With three viewpoints, one conceivable method is to arrange the patents as a table based on two of the viewpoints and to express the classification under the remaining viewpoint by the size of the characters or points representing the patents. With four viewpoints, in addition to the projection used for three viewpoints, the classification under the fourth viewpoint could be expressed by color.
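For the two-viewpoint case, the table arrangement might look like the following sketch. The function name `arrange_as_table` and the toy cluster assignments are assumptions; each cell of the table is keyed by the pair of cluster codes under the two viewpoints.

```python
from collections import defaultdict

def arrange_as_table(by_viewpoint_a, by_viewpoint_b):
    """Place each patent in the cell (cluster under A, cluster under B)."""
    cells = defaultdict(list)
    for pid, row in by_viewpoint_a.items():
        cells[(row, by_viewpoint_b[pid])].append(pid)
    return dict(cells)

problem_clusters = {"P1": 0, "P2": 0, "P3": 1}   # classification under viewpoint A
means_clusters = {"P1": 1, "P2": 0, "P3": 1}     # classification under viewpoint B
table = arrange_as_table(problem_clusters, means_clusters)
```

A third or fourth viewpoint could be carried alongside each patent identifier and rendered as marker size or color, as the description suggests; that rendering step is not shown here.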

A procedure diagram of the processing method in one embodiment of the present invention.
A procedure diagram of the concept base generation method in the present invention.

Explanation of symbols

101 Designation of patent specification document portions related to the viewpoint
102 Extraction of patent specification document portions based on the extraction designation
103 Morphological analysis
104 Neighborhood co-occurrence matrix generation
105 Dimensionality reduction by singular value decomposition
106 Generation of patent vector values
107 Clustering
108 Placement on a two-dimensional plane
201 Patent specification document group
202 Group of extracted patent specification document portions
203 Group of word strings corresponding to the extracted patent specification document portions
204 Concept base
205 Patent vector value group
206 Patent-to-classification correspondence information
301 Patent specification documents for which the patent map is to be created
302 Viewpoint for patent map creation
303 Patent map

Claims

A computer program for causing a computer to execute processing that, when a plurality of patents relating to a specific subject are given, generates a patent map by classifying the mutual relations of the patents included in the patent group based on their contents, the computer program causing the computer to execute:
for a patent specification document group stored in a storage device,
a process of extracting a specific part from each patent specification document of the patent specification document group;
a process of dividing the specific part into words to form a word string;
a process of applying statistical processing to the word strings of all patent specification documents of the patent specification document group and storing in a storage device, as a concept base, each word included in the word strings together with a vector value associated with that word;
for each patent specification document of the patent specification document group stored in the storage device,
a process of obtaining a word set by dividing the specific part extracted from the patent specification document into words;
a process of obtaining, for each word included in the word set, the vector value associated with the word from the concept base;
a process of calculating the sum of the vector values obtained for all words included in the word set;
a process of taking the vector value that is the sum as the vector value of the patent represented by the patent specification document and storing it in a storage device in association with the patent;
a process of clustering the patent group represented by the patent specification document group stored in the storage device based on the patent vector values associated with the individual patents of the patent group; and
a process of classifying the patents by grouping those that belong to the same cluster.
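For illustration only, the patent vector calculation and clustering recited above might be sketched as follows. The claim does not name a clustering algorithm, so the simple k-means used here is an assumption, as are the function names `patent_vector` and `kmeans`, the toy concept base, and the choice to skip words missing from the concept base.

```python
def patent_vector(words, concept_base):
    """Sum the concept-base vectors of all words in a patent's word set."""
    dim = len(next(iter(concept_base.values())))
    total = [0.0] * dim
    for word in words:
        vec = concept_base.get(word)
        if vec is None:
            continue  # assumption: ignore words absent from the concept base
        for i, x in enumerate(vec):
            total[i] += x
    return total


def kmeans(vectors, k, iterations=20):
    """Plain k-means (an assumed choice); centroids start at the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy concept base and word sets, invented purely for this example.
concept_base = {
    "engine": [1.0, 0.0], "fuel": [0.9, 0.1],
    "lens": [0.0, 1.0], "camera": [0.1, 0.9],
}
patents = {
    "P1": ["engine", "fuel"],
    "P2": ["lens", "camera"],
    "P3": ["fuel", "engine", "engine"],
    "P4": ["camera"],
}
ids = sorted(patents)
vectors = [patent_vector(patents[p], concept_base) for p in ids]
labels = kmeans(vectors, k=2)
classification = dict(zip(ids, labels))  # patent -> cluster index
print(classification)
```

The resulting `classification` dictionary plays the role of the patent-to-classification correspondence: patents with the same cluster index are treated as belonging to the same class.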

The computer program according to claim 1, causing the computer to execute:
for the word string group of the set of patent specification document parts obtained by extracting the specific part from the patent specification document group stored in the storage device,
a process of counting the distinct words in the word strings;
a process of creating a matrix in which each of the counted words corresponds to a row and each of the words also corresponds to a column;
a process of scanning the word string group sequentially and, whenever a word (hereinafter, word Wi) and a word (hereinafter, word Wj) appear near each other in a word string, increasing by a positive number the value of the matrix element in the row corresponding to word Wi and the column corresponding to word Wj, repeating this for all word strings included in the word string group to obtain the value of each element of the matrix (hereinafter, the matrix with the element values so obtained is called the neighborhood co-occurrence matrix);
a process of performing singular value decomposition on the neighborhood co-occurrence matrix;
a process of selecting several columns from the columns of the right-side matrix calculated by the singular value decomposition; and
a process of creating the concept base by associating each word with a vector value whose elements are the values of the selected columns in the row of the right-side matrix corresponding to that word.
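As a rough illustration, the concept base generation recited above might look like the following sketch using NumPy. Several details are assumptions made here because the claim leaves them open: "nearby" is taken to mean within `window` positions, the positive increment is 1, and the selected columns are those belonging to the largest singular values. The function name `build_concept_base` is hypothetical.

```python
import numpy as np


def build_concept_base(word_strings, window=2, dims=2):
    """Build word vectors from a neighborhood co-occurrence matrix via SVD.

    word_strings: list of word lists (one per extracted document part).
    window: assumed meaning of "nearby" (distance in word positions).
    dims: number of columns selected from the right-side matrix.
    """
    vocab = sorted({w for s in word_strings for w in s})  # distinct words
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))  # rows and columns both index words

    # Scan every word string; add 1 to A[Wi, Wj] whenever Wj appears near Wi.
    for s in word_strings:
        for i, wi in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    A[index[wi], index[s[j]]] += 1.0

    # A = U @ diag(S) @ Vt; the right-side matrix is V = Vt.T,
    # whose rows correspond to words.
    _, _, Vt = np.linalg.svd(A)
    V = Vt.T
    selected = V[:, :dims]  # assumption: keep columns of the largest singular values
    concept_base = {w: selected[index[w]] for w in vocab}
    return concept_base, A


# Toy word strings invented for this example.
base, cooc = build_concept_base([["a", "b", "c"], ["a", "b"]], window=1, dims=2)
for word, vec in base.items():
    print(word, vec)
```

Selecting fewer columns than there are distinct words is what compresses the dimension: each word keeps only `dims` values from its row of the right-side matrix.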

The computer program according to claim 1,
further causing the computer to execute a process of arranging patents that have been classified into the same cluster near one another and displaying them.

A method for generating a patent map by classifying, when a plurality of patents relating to a specific subject are given, the mutual relations of the patents included in the patent group based on their contents, the method comprising:
for a patent specification document group stored in a storage device,
extracting a specific part from each patent specification document of the patent specification document group;
dividing the specific part into words to form a word string;
applying statistical processing to the word strings of all patent specification documents of the patent specification document group and storing in a storage device, as a concept base, each word included in the word strings together with a vector value associated with that word;
for each patent specification document of the patent specification document group stored in the storage device,
obtaining a word set by dividing the specific part extracted from the patent specification document into words;
obtaining, for each word included in the word set, the vector value associated with the word from the concept base;
calculating the sum of the vector values obtained for all words included in the word set;
taking the vector value that is the sum as the vector value of the patent represented by the patent specification document and storing it in a storage device in association with the patent;
clustering the patent group represented by the patent specification document group stored in the storage device based on the patent vector values associated with the individual patents of the patent group; and
classifying the patents by grouping those that belong to the same cluster.

The method for generating a patent map according to claim 4, comprising:
for the word string group of the set of patent specification document parts obtained by extracting the specific part from the patent specification document group stored in the storage device,
counting the distinct words in the word strings;
creating a matrix in which each of the counted words corresponds to a row and each of the words also corresponds to a column;
scanning the word string group sequentially and, whenever a word (hereinafter, word Wi) and a word (hereinafter, word Wj) appear near each other in a word string, increasing by a positive number the value of the matrix element in the row corresponding to word Wi and the column corresponding to word Wj, repeating this for all word strings included in the word string group to obtain the value of each element of the matrix (hereinafter, the matrix with the element values so obtained is called the neighborhood co-occurrence matrix);
performing singular value decomposition on the neighborhood co-occurrence matrix;
selecting several columns from the columns of the right-side matrix calculated by the singular value decomposition; and
creating the concept base by associating each word with a vector value whose elements are the values of the selected columns in the row of the right-side matrix corresponding to that word.

The method for generating a patent map according to claim 4,
further comprising a step of arranging patents that have been classified into the same cluster near one another and displaying them on a screen.