JPWO2008053949A1

JPWO2008053949A1 - Document group analyzer

Info

Publication number: JPWO2008053949A1
Application number: JP2008542170A
Authority: JP
Inventors: 博昭増山; 則夫荒木; 和巳蓮子; 敏郎大▲崎▼
Original assignee: PATENT RESULT CO., LTD.
Current assignee: PATENT RESULT CO., LTD.
Priority date: 2006-11-01
Filing date: 2007-11-01
Publication date: 2010-02-25
Also published as: WO2008053949A1

Abstract

A document analysis apparatus is provided with: a text data acquisition unit 101 that acquires a plurality of technical documents; a document vector acquisition unit 102 that obtains, for each acquired technical document, a weight for each index word; a factor calculation unit 103 that treats the acquired technical documents as subjects and the index words as observed variables, performs factor analysis on the index-word weights, and calculates a factor loading on each factor for each index word and a factor score on each factor for each technical document; and an attribution factor determination unit 104 that determines the attribution factor of each index word from its factor loadings and the attribution factor of each technical document from its factor scores. This makes it possible to analyze accurately what documents or document groups, with what concepts or features, exist among a plurality of documents.

Description

The present invention relates to techniques for analyzing technical documents, and in particular to a technique for analyzing what concepts or features are present in a plurality of documents.

One known way to analyze the contents of a plurality of technical documents is to classify them into clusters. For example, JP 2005-92443 A (Patent Document 1) performs morphological analysis on retrieved technical documents, assigns a weight to each resulting word, vectorizes each technical document, and groups documents whose vectors point in similar directions into one cluster. Important words are then extracted for each cluster.
Patent Document 1: JP 2005-92443 A

However, classification by conventional cluster analysis of this kind can make it difficult to grasp accurately the concepts or features of the technical field represented by the documents under analysis. In Patent Document 1 (JP 2005-92443 A), for example, documents are classified by how close their vector directions are: documents whose directions are closer than a fixed threshold fall into the same cluster, and documents even slightly farther apart fall into different clusters, so the features of each cluster are not necessarily clear. Even if important words are extracted for each cluster, clusters with similar important words are hard to tell apart, and the validity of the classification itself comes into question.
In other words, the method of Patent Document 1 gives no particular consideration to helping the analyst grasp the features of the resulting clusters. An analyst who wants to understand those features therefore has no choice but to read the technical documents in each cluster, and the analysis takes a very long time.

The present invention has been made in view of the above circumstances. Its object is to enable an analyst, when analyzing technical documents, to grasp what concepts or features characterize the technical field represented by a plurality of technical documents, while maintaining the objectivity of the analysis.

(1) To solve the above problem, one aspect of the present invention is applied to a document group analysis apparatus that analyzes text data.
The document group analysis apparatus comprises: text data acquisition means for acquiring a plurality of technical documents represented as text data; weight calculation means for obtaining, for each acquired technical document, a weight for each index word; calculation means for performing factor analysis with the acquired technical documents as subjects and the index words as observed variables, using the obtained index-word weights, and calculating a factor loading on each factor for each index word and a factor score on each factor for each technical document; attribution factor determination means for determining the attribution factor of each index word from the index-word factor loadings and the attribution factor of each technical document from the document factor scores; and output means for outputting, for each factor, the index word or index words belonging to that factor together with data on the technical document or technical documents belonging to that factor.

According to this aspect, index words and a plurality of technical documents can be attributed to each factor. The index words let the analyst grasp the features or concepts of the technical field through linguistic information the analyst can understand, and the technical documents let the analyst grasp them from the particular tendencies of the bibliographic information those documents contain. Although the present invention uses factor analysis, it does not require the subjective or arbitrary judgment by the analyst known as "interpretation of factors" in ordinary factor analysis. This is because the index words themselves are used as the observed variables: since each observed variable directly expresses content, once the observed variables belonging to a factor are determined from the factor loadings, the content of each factor can be stated plainly in terms of those variables.
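To make the roles of "subjects", "observed variables", factor loadings, and factor scores concrete, the following is a minimal sketch. It is not the method of the specification: it swaps in an unrotated principal-component extraction (power iteration on the correlation matrix of the index-word weights) for whichever factor-extraction and rotation method an implementation would actually use. The function names, the toy weight matrix, and the choice of two factors are all assumptions made for illustration.

```python
import math

def standardize(columns):
    """Scale each column (one per index word) to zero mean and unit variance."""
    result = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        result.append([(v - mean) / sd for v in col])
    return result

def dominant_eigen(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    size = len(matrix)
    vec = [1.0 / (k + 1) for k in range(size)]  # asymmetric start vector
    for _ in range(iterations):
        nxt = [sum(row[k] * vec[k] for k in range(size)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    product = [sum(row[k] * vec[k] for k in range(size)) for row in matrix]
    value = sum(p * v for p, v in zip(product, vec))
    return value, vec

def extract_factors(weights, n_factors):
    """weights[i][j] is the weight of index word j in document i."""
    n_docs, n_words = len(weights), len(weights[0])
    z = standardize([[row[j] for row in weights] for j in range(n_words)])
    corr = [[sum(a * b for a, b in zip(z[p], z[q])) / n_docs
             for q in range(n_words)] for p in range(n_words)]
    loadings = [[0.0] * n_factors for _ in range(n_words)]
    scores = [[0.0] * n_factors for _ in range(n_docs)]
    for f in range(n_factors):
        value, vec = dominant_eigen(corr)
        root = math.sqrt(value)
        for j in range(n_words):
            loadings[j][f] = root * vec[j]          # loading of word j on factor f
        for i in range(n_docs):
            scores[i][f] = sum(z[j][i] * vec[j] for j in range(n_words)) / root
        # Deflate so the next pass finds the next-largest factor.
        corr = [[corr[p][q] - value * vec[p] * vec[q]
                 for q in range(n_words)] for p in range(n_words)]
    return loadings, scores

# Toy data (hypothetical): 4 documents x 5 index words.
# Words 0-2 occur together, and words 3-4 occur together.
weights = [
    [2, 3, 1, 0, 0],
    [0, 0, 0, 2, 4],
    [2, 3, 1, 2, 4],
    [0, 0, 0, 0, 0],
]
loadings, scores = extract_factors(weights, n_factors=2)
```

In this toy case the first three index words load on one factor and the last two on the other, so each factor can be described directly by the words attached to it, which is the property the paragraph above relies on. Real data would need more documents than index words and, typically, a rotation step.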

(2) In the document group analysis apparatus, the attribution factor determination means may, for each index word, use the calculated factor loadings to select the factor with the largest factor loading and specify the selected factor as the attribution factor of that index word, and may, for each technical document, use the calculated factor scores to select the factor with the largest factor score and specify the selected factor as the attribution factor of that technical document.

Because each index word and each technical document is attributed to the factor with which it is most strongly related, it describes that factor best. As a result, the features or concepts of the technical field can be grasped more clearly.
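The selection rule in (2) amounts to taking, per row, the index of the largest value. A minimal sketch follows; the helper names and the sample loadings and scores are hypothetical, and "largest" is read here as the largest signed value, as the text states, rather than the largest absolute value.

```python
def attribution_factor(values):
    """Index of the factor with the largest (signed) loading or score."""
    return max(range(len(values)), key=lambda k: values[k])

def group_by_factor(names, matrix):
    """Map each factor index to the names attributed to it."""
    groups = {}
    for name, row in zip(names, matrix):
        groups.setdefault(attribution_factor(row), []).append(name)
    return groups

# Hypothetical factor loadings (index words x factors).
words = ["battery", "electrode", "antenna", "signal"]
word_loadings = [[0.82, 0.10], [0.75, -0.05], [0.02, 0.91], [0.15, 0.64]]

# Hypothetical factor scores (documents x factors).
documents = ["doc-A", "doc-B", "doc-C"]
doc_scores = [[1.2, -0.3], [-0.8, 0.9], [0.4, 0.1]]

words_by_factor = group_by_factor(words, word_loadings)
docs_by_factor = group_by_factor(documents, doc_scores)
```

The same helper serves both index words and documents because both decisions have the same shape: one row of per-factor values in, one factor index out.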

(3) The document group analysis apparatus may further comprise important index word extraction means for obtaining the frequency of occurrence of the index words contained in the plurality of documents, calculating the importance of each index word from that frequency, and extracting a predetermined number of index words ranked highest in importance; and the weight calculation means may obtain, as the weights of the index words, the weights of those highest-ranked index words.

Extracting as important index words only those index words that represent the features of the technical document group as a whole, and then analyzing them, makes it possible to grasp the features or concepts of the technical field more clearly. Narrowing down the index words in advance also makes the analysis more efficient.
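The extraction in (3) can be sketched as "count, score, keep the top N". This section does not give the importance formula, so the TF-IDF-style formula below is purely an assumption, as are the function name and the sample documents.

```python
import math
from collections import Counter

def top_index_words(documents, n_top):
    """documents: list of index-word lists, one list per document."""
    n_docs = len(documents)
    term_freq = Counter()   # total occurrences across the document group
    doc_freq = Counter()    # number of documents containing the word
    for words in documents:
        term_freq.update(words)
        doc_freq.update(set(words))
    # Assumed importance: total frequency x log(1 + N / document frequency).
    importance = {w: term_freq[w] * math.log(1 + n_docs / doc_freq[w])
                  for w in term_freq}
    # Highest importance first; ties broken alphabetically for determinism.
    ranked = sorted(importance, key=lambda w: (-importance[w], w))
    return ranked[:n_top]

docs = [
    ["battery", "electrode", "battery"],
    ["battery", "separator"],
    ["electrode", "battery", "casing"],
]
important = top_index_words(docs, n_top=2)
```

Only the words returned here would then be given weights and passed to the factor analysis, which shrinks the number of observed variables.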

(4) The document group analysis apparatus may comprise factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of that factor; and the output means may output the factor evaluation value as the data on the technical document or technical document group.

With this configuration, the factors can be compared with one another by their factor evaluation values. This makes it possible to grasp the relative positions of the technical elements represented by the technical documents belonging to each factor, and further to separate the important technical elements from the unimportant ones.

(5) In the document group analysis apparatus, the technical documents may be patent documents including published patent applications and granted patent gazettes; the apparatus may comprise progress information acquisition means for acquiring the progress information (prosecution history) of each patent document, and factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of that factor using the progress information of the technical document or technical document group belonging to that factor; and the output means may output the factor evaluation value as the data on the technical document or technical document group.

In this aspect, factors can be evaluated on the basis of patent gazettes, so the examination progress information contained in the patent information associated with those gazettes can be used to evaluate the factors with high accuracy.

(6) The document group analysis apparatus may comprise document count determination means for determining the number of technical documents belonging to each factor; and the factor evaluation value calculation means may, for each factor, calculate a first index by applying a predetermined weighting to the number of technical documents belonging to that factor, calculate a second index by indexing the progress information of the technical documents belonging to that factor, and calculate the factor evaluation value of that factor from the first index and the second index.

Determining the number of technical documents reveals the share each factor holds in the population of technical documents. Applying a predetermined weighting to the document count allows the factor to be evaluated with economic aspects and technological competitiveness taken into account, and indexing the progress information allows the factor to be evaluated quantitatively and objectively. As a result, the share and economic value of each technical element in the technical field can be grasped not only qualitatively but also numerically.

(7) In the document group analysis apparatus, the progress information may include the number of citations by other companies, the number of patent oppositions received, the number of requests for invalidation trials received, whether a request for examination has been filed, and whether the patent right has been registered. The first index may be a value obtained by weighting the document count using at least one of the total number of citations by other companies, the total number of patent oppositions received, and the total number of requests for invalidation trials received. The second index, which indexes the progress information, may be a value obtained by indexing at least one of the total number of citations by other companies, the total number of patent oppositions received, the total number of requests for invalidation trials received, the examination request rate, and the grant decision rate.

This allows factor evaluation that takes into account the influence of patents that may obstruct other companies' patent acquisition and technology development, as well as the applicant's eagerness to obtain rights and the examiner's assessment. The fairness and appropriateness of the factor evaluation are thereby ensured, so the relative positions and importance of the technical elements can be grasped fairly and appropriately.
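As a rough illustration of (6) and (7), the sketch below computes a first index from the document count and a second index from the progress information, then combines them. This section does not state the actual formulas, so the additive weighting, the averaging of the two rates, the definition of the grant rate (registrations among examined applications), and the final product are all assumptions, as are the class and function names.

```python
from dataclasses import dataclass

@dataclass
class PatentRecord:
    cited_by_others: int          # citations by other companies
    oppositions: int              # patent oppositions received
    invalidation_requests: int    # invalidation trial requests received
    examination_requested: bool
    registered: bool

def factor_evaluation(records):
    """Evaluation value for the documents attributed to one factor."""
    count = len(records)
    if count == 0:
        return 0.0
    citations = sum(r.cited_by_others for r in records)
    challenges = sum(r.oppositions + r.invalidation_requests for r in records)
    # First index: document count weighted by citations and challenges
    # (additive weighting is an assumption).
    first_index = count + citations + challenges

    requested = [r for r in records if r.examination_requested]
    request_rate = len(requested) / count
    grant_rate = (sum(r.registered for r in requested) / len(requested)
                  if requested else 0.0)
    # Second index: mean of the two rates (an assumption).
    second_index = (request_rate + grant_rate) / 2
    # Combining the indices by product is also an assumption.
    return first_index * second_index

records = [
    PatentRecord(2, 0, 0, True, True),
    PatentRecord(0, 1, 0, True, False),
    PatentRecord(0, 0, 0, False, False),
    PatentRecord(1, 0, 0, True, True),
]
value = factor_evaluation(records)
```

Here the first index is 4 + 3 + 1 = 8, the request rate is 3/4, the grant rate is 2/3, and the evaluation value is 8 × (3/4 + 2/3) / 2 = 17/3.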

(8) In the document group analysis apparatus, the factor evaluation value output by the output means may be calculated for each factor and for each applicant.

For each factor, the ranking and shares of the applicants of the patent gazettes belonging to that factor can be grasped. As a result, the features of a given technical field can be understood in terms of the state of competition among the companies carrying out development.
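Calculating the value per factor and per applicant is a two-key aggregation. The sketch below assumes each document already has an attributed factor, an applicant, and some per-document value to total; the function names and sample records are hypothetical.

```python
from collections import defaultdict

def evaluate_by_factor_and_applicant(records):
    """records: iterable of (factor, applicant, value) tuples, one per document."""
    totals = defaultdict(float)
    for factor, applicant, value in records:
        totals[(factor, applicant)] += value
    return dict(totals)

def applicant_ranking(totals, factor):
    """Applicants within one factor, ordered by total, with their share."""
    rows = [(a, v) for (f, a), v in totals.items() if f == factor]
    whole = sum(v for _, v in rows)
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [(a, v, v / whole if whole else 0.0) for a, v in rows]

records = [
    (0, "Company X", 3.0),
    (0, "Company Y", 1.0),
    (0, "Company X", 2.0),
    (1, "Company Y", 4.0),
]
totals = evaluate_by_factor_and_applicant(records)
ranking = applicant_ranking(totals, factor=0)
```

The ranking for factor 0 lists Company X first with five sixths of the total, which is the kind of ordering and share the paragraph above refers to.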

(9 and 10) Other aspects of the present invention are a data analysis method comprising the same steps as the method executed by each of the above apparatuses, and a program that causes a computer to execute the same processing as that executed by each of the above apparatuses. The program may be recorded on a recording medium such as an FD, CD-ROM, or DVD, or may be transmitted and received over a network.

(11) In the document group analysis apparatus, the technical documents may be patent documents including published patent applications and granted patent gazettes; and the apparatus may comprise means for acquiring, for each acquired patent document, a patent score that evaluates the value of that patent document individually, and factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of that factor using the patent scores of the patent documents belonging to that factor.

Using a patent score that evaluates the value of each patent document individually makes it possible to calculate a factor evaluation value that reflects the value of the patent documents belonging to each factor. As a result, the features or concepts of the technical field can be grasped more clearly.

(12) In the document group analysis apparatus, the factor evaluation value calculation means may, for each factor, select from the patent scores of the patent documents belonging to that factor those that are equal to or greater than a predetermined threshold, and calculate the total of the selected patent scores as the factor evaluation value.

Totaling only the values at or above the predetermined threshold and discarding the lower values prevents a factor that has many patents of low importance but few important ones from receiving a high score merely because of its document count. As a result, an appropriate factor evaluation value can be calculated, and the features or concepts of the technical field can be grasped more clearly.
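The thresholded total in (12) can be sketched as follows. The function name, the sample scores, and the threshold of 50 are hypothetical; only the rule "keep scores at or above the threshold, then sum per factor" comes from the text.

```python
def factor_evaluation_values(patent_scores, doc_factors, threshold):
    """Sum, per factor, only the patent scores at or above the threshold."""
    totals = {}
    for score, factor in zip(patent_scores, doc_factors):
        totals.setdefault(factor, 0.0)   # factors with no qualifying patent stay at 0
        if score >= threshold:
            totals[factor] += score
    return totals

# Factor 0 has three documents, two of them high-scoring.
# Factor 1 has four documents, none of them high-scoring.
scores = [62.0, 48.0, 55.0, 41.0, 43.0, 45.0, 44.0]
factors = [0, 0, 0, 1, 1, 1, 1]
values = factor_evaluation_values(scores, factors, threshold=50.0)
```

Without the threshold, factor 1 would total 173 against 165 for factor 0 simply because it has more documents; with it, factor 0 totals 117 and factor 1 totals 0.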

(13) In the document group analysis apparatus, the patent score is preferably a value standardized within the document group of the population that includes the factors for which the factor evaluation value is calculated.

Calculating the factor evaluation value from scores standardized within the population improves the accuracy of relative comparison between different factors. As a result, the features or concepts of the technical field can be grasped more clearly.
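One common way to standardize within a population is the z-score: subtract the population mean and divide by the population standard deviation. Whether the specification uses this exact form is not stated in this section, so the sketch below is an assumption, as are the function name and sample values.

```python
import statistics

def standardize_scores(raw_scores):
    """Z-scores of the raw patent scores across the whole population."""
    mean = statistics.fmean(raw_scores)
    spread = statistics.pstdev(raw_scores)   # population standard deviation
    if spread == 0:
        return [0.0 for _ in raw_scores]     # all scores identical
    return [(s - mean) / spread for s in raw_scores]

raw = [1.0, 2.0, 3.0, 4.0, 5.0]
standardized = standardize_scores(raw)
```

Because every document is measured against the same population mean and spread, totals built from these values are comparable across factors.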

(14) In the document group analysis apparatus, the patent score may be a value calculated for each patent document by classifying the patent documents into groups by technical field and by predetermined period and, for each group, using the progress information of the patent documents belonging to that group.

Classifying the documents into groups by technical field and by predetermined period, and calculating the patent score from the progress information within each group, corrects for bias in the progress information caused by differences in technical field and filing date, so an accurate patent score can be calculated. As a result, an appropriate factor evaluation value can be calculated, and the features or concepts of the technical field can be grasped more clearly.
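The grouping idea in (14) can be sketched by scoring each patent relative to its own group. This section does not say which progress information is used or how it becomes a score, so the sketch assumes a single item (citation count), five-year filing periods, and a score equal to the citation count divided by the group mean. The dictionary keys and sample records are hypothetical.

```python
from collections import defaultdict

def patent_scores(patents):
    """patents: list of dicts with 'id', 'field', 'filing_year', 'citations'."""
    groups = defaultdict(list)
    for p in patents:
        period = p["filing_year"] // 5 * 5        # assumed 5-year periods
        groups[(p["field"], period)].append(p)
    scores = {}
    for members in groups.values():
        mean = sum(m["citations"] for m in members) / len(members)
        for m in members:
            # Score relative to the group mean (an assumed scoring rule).
            scores[m["id"]] = m["citations"] / mean if mean else 0.0
    return scores

patents = [
    {"id": "P1", "field": "H01M", "filing_year": 2001, "citations": 4},
    {"id": "P2", "field": "H01M", "filing_year": 2003, "citations": 2},
    {"id": "P3", "field": "H01M", "filing_year": 2011, "citations": 1},
    {"id": "P4", "field": "G06F", "filing_year": 2002, "citations": 20},
    {"id": "P5", "field": "G06F", "filing_year": 2004, "citations": 10},
]
scores = patent_scores(patents)
```

P1 and P4 receive the same score even though P4 has five times the citations, because each is measured against documents of the same field and period. A recent filing such as P3 is likewise not penalized for having had less time to be cited.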

A diagram showing the hardware configuration of a document group analysis apparatus according to an embodiment of the present invention.
FIG. 2(A) is a flowchart explaining the factor extraction procedure in the document group analysis apparatus; FIG. 2(B) is a flowchart explaining the procedure for calculating a factor progress information score as data on a document or document group.
Explanatory diagrams (three figures) of an example method for acquiring the text data of the documents to be analyzed.
The factor loadings of each index word calculated in an example of the present invention.
The factor scores of each gazette calculated in the example.
The index words belonging to each factor extracted in the example, and the gazettes belonging to each factor.
The indicators calculated for each factor in the example, and the patent impact index calculated from them.
The indicators calculated for each factor in the example, and the progress information index calculated from them.
The factor progress information score calculated for each factor in the example.
An example plot of the factor progress information scores calculated in the example.
The factor progress information score calculated for each factor and each applicant in the example.
An example in which the gazettes belonging to each factor are further classified by applicant and the factor progress information score is plotted for each factor and each applicant.
A functional block diagram of a document group analysis apparatus according to a modification of the embodiment.
A flowchart showing the procedure for calculating the technical element score in the modification.
A diagram showing the distribution of the technical element score of the modification and the factor progress information score of the example in relation to the number of gazettes.
An example plot of the technical element score for each factor and each applicant in the modification.
A diagram schematically showing an example of the data structure of the progress information used in the modification.
A diagram schematically showing an example of the data structure of the content information used in the modification.
A flowchart showing the procedure for calculating the patent score in the modification.
A flowchart showing the details of the processing that calculates the evaluation value of each item of patent data in the modification.

Explanation of symbols

1: processing device; 2: input device; 3: recording device; 4: output device; 101: text data acquisition means; 102: document vector acquisition means; 103: factor loading and factor score calculation means; 104: attribution factor determination means

Embodiments of the present invention are described in detail below with reference to the drawings.
<1. Configuration of the document group analysis apparatus>
FIG. 1 is a diagram showing the hardware configuration of a document group analysis apparatus according to an embodiment of the present invention. The document group analysis apparatus of this embodiment is a computer comprising: a processing device 1 that includes a CPU (central processing unit), a memory (storage device), and the like; an input device 2, which is input means such as a keyboard (manual input device); a recording device 3, which is recording means that stores document group data, conditions, the results of work by the processing device 1, and the like; and an output device 4, which is output means that displays, prints, or otherwise outputs data on the documents or document groups belonging to the extracted factors.

The processing device 1 comprises a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a document count determination unit 201, a progress information reading unit 202, an indicator calculation unit 203, a patent impact calculation unit 204, a progress information calculation unit 205, and a score calculation unit 206.
In this embodiment, the case where these functional units of the processing device 1 (units 101 to 104 and 201 to 206) are implemented in software is taken as an example.
Specifically, the memory of the processing device 1 stores programs for realizing the functions of the functional units: a text data acquisition program, a document vector acquisition program, a factor calculation program, an attribution factor determination program, a document count determination program, a progress information reading program, an indicator calculation program, a patent impact calculation program, a progress information calculation program, and a score calculation program. The function of each functional unit is realized when the CPU executes the corresponding program stored in the memory.

The recording device 3 comprises a condition recording unit 31, a work result storage unit 32, a document storage unit 33, and the like. The document storage unit 33 holds document group data obtained from an external or internal database. An external database is a document database such as the Industrial Property Digital Library (IPDL) provided by the Japan Patent Office or PATOLIS (registered trademark) provided by PATOLIS Corporation. An internal database includes: a database that locally stores commercially available data such as Patent JP-ROM; a device that reads documents from media such as an FD (flexible disk), CD-ROM (compact disc ROM), MO (magneto-optical disk), or DVD (digital versatile disc); a device such as an OCR (optical character reader) that reads documents printed or handwritten on paper; and a device that converts the read data into electronic data such as text.
In this embodiment, the documents to be analyzed are mainly patent gazettes of various kinds: published unexamined patent applications, examined patent publications, granted patent gazettes, patent specifications, published Japanese translations of PCT applications, domestic re-publications of PCT applications, publications of patent trial requests, English abstracts of published patent applications, published unexamined utility model applications, published utility model specifications, published Japanese translations of PCT utility model applications, domestic re-publications of PCT utility model applications, examined utility model publications, utility model registration gazettes, registered utility model gazettes, registered utility model specifications, publications of utility model trial requests, and journals of technical disclosure. The analysis is not limited to these, however, and can cover technical literature in general, including technical papers, technical magazines, and books.

Signals and data may be exchanged among the processing device 1, the input device 2, the recording device 3, and the output device 4 by direct connection with a USB (Universal Serial Bus) cable or the like, by transmission over a network such as a LAN (local area network), or by way of media such as an FD, CD-ROM, MO, or DVD on which documents are stored. Some of these means may be used, or several of them may be combined.

<1-1. Details of Input Device 2>
Next, the configuration and functions of the document group analysis apparatus are described in detail.
The input device 2 accepts input such as the acquisition conditions for the text data of the document group to be analyzed, the acquisition conditions for document vectors, the calculation conditions for factor loadings and factor scores, the determination conditions for attribution factors, the calculation conditions for the factor progress information score described later, and the output conditions. The input conditions are sent to the condition recording unit 31 of the recording device 3 and stored there.

<1-2. Details of Processing Device 1>
The text data acquisition unit 101 acquires the data of the document group to be analyzed from the document storage unit 33 of the recording device 3, in accordance with the text data acquisition conditions entered through the input device 2. For example, a document group extracted under given conditions is cluster-analyzed on the basis of similarity, and text data is acquired for the I documents belonging to one of the resulting clusters. The acquired text data is either sent directly to the document vector acquisition unit 102 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored.

The document vector acquisition unit 102 calculates I document vectors from the text data of the I documents acquired by the text data acquisition unit 101, in accordance with the document vector acquisition conditions entered through the input device 2. With J denoting the number of index words, each document vector is a J-dimensional vector whose elements are the weighting amounts $z_{ij}$ of index word j in document i. The calculated document vectors are either sent directly to the factor calculation unit 103 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored.

The factor calculation unit 103 calculates the factor loadings $a_{jk}$ and the factor scores $f_{ik}$ from the vector elements $z_{ij}$ of the document vectors calculated by the document vector acquisition unit 102, in accordance with the calculation conditions for factor loadings and factor scores entered through the input device 2. Here k is the factor number; a factor loading $a_{jk}$ is calculated for each index word j and each factor, and a factor score $f_{ik}$ is calculated for each document i and each factor. The calculated factor loadings $a_{jk}$ and factor scores $f_{ik}$ are either sent directly to the attribution factor determination unit 104 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored.

The attribution factor determination unit 104 determines the attribution factor of each index word j from the factor loadings $a_{jk}$ calculated by the factor calculation unit 103, and the attribution factor of each document i from the factor scores $f_{ik}$, in accordance with the attribution factor determination conditions entered through the input device 2. The determined attribution factors are either sent directly to the document number determination unit 201 and the progress information reading unit 202 and used for processing there, or sent to the work result storage unit 32 of the recording device 3 and stored. Then, in accordance with the output conditions entered through the input device 2, the index words j belonging to the same factor are output by the output device 4 together with the data of the corresponding documents i.

The document number determination unit 201 through the score calculation unit 206 calculate the factor progress information score in accordance with the calculation conditions for that score entered through the input device 2.
Based on the attribution factor of each document determined by the attribution factor determination unit 104, the document number determination unit 201 reads out the document group for each factor, or the document group for each factor and each applicant, as the score calculation target document group, and determines the number of documents in it.
The progress information reading unit 202 reads the progress information of each document in the score calculation target document group from the document storage unit 33 of the recording device 3.
The index calculation unit 203 calculates, for the score calculation target document group, an index based on the progress information read by the progress information reading unit 202.
The patent impact calculation unit 204 calculates, for the score calculation target document group, the patent impact from the number of documents determined by the document number determination unit 201 and the index calculated by the index calculation unit 203.
The progress information calculation unit 205 calculates, for the score calculation target document group, a progress information index from the index calculated by the index calculation unit 203.
The score calculation unit 206 calculates, for the score calculation target document group, the factor progress information score from the patent impact calculated by the patent impact calculation unit 204 and the progress information index calculated by the progress information calculation unit 205.
The work results of each functional unit are sent to the work result storage unit 32 of the recording device 3 and stored.
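The grouping step performed by the document number determination unit 201 can be illustrated with a minimal sketch. The function name `group_documents`, the dictionary inputs, and the use of `None` for documents without an attribution factor are assumptions made for illustration; the formulas for the patent impact and the progress information index are not given in this part of the description and are therefore not sketched.

```python
from collections import defaultdict

def group_documents(doc_factors, doc_applicants=None):
    """Group document IDs into score calculation target document groups.

    doc_factors:    {doc_id: factor index, or None if the document
                     belongs to no factor} (hypothetical input format)
    doc_applicants: optional {doc_id: applicant name}; when given, groups
                    are formed per factor and per applicant.
    """
    groups = defaultdict(list)
    for doc_id, factor in doc_factors.items():
        if factor is None:
            continue  # documents without an attribution factor are skipped
        key = factor if doc_applicants is None else (factor, doc_applicants[doc_id])
        groups[key].append(doc_id)
    return dict(groups)

# Number of documents in each group (per factor only)
groups = group_documents({"d1": 0, "d2": 0, "d3": 1})
counts = {key: len(ids) for key, ids in groups.items()}
```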

<1-3. Details of Recording Device 3>
In the recording device 3, the condition recording unit 31 records information such as the conditions obtained from the input device 2 and sends the necessary data in response to requests from the processing device 1. The work result storage unit 32 stores the work results of each component of the processing device 1 and sends the necessary data in response to requests from the processing device 1. The document storage unit 33 stores and provides the data of the required document group, obtained from an external database or an internal database, in response to requests from the input device 2 or the processing device 1. When storing patent document data, the document storage unit 33 preferably also stores the bibliographic information (such as the applicant name) and the progress information (such as information on requests for examination).

<1-4. Details of Output Device 4>
The output device 4 outputs, for each factor, the documents and index words whose attribution factors were determined by the attribution factor determination unit 104 of the processing device 1. The output device 4 includes a display unit such as a display device, and displays, for example, a correspondence table between documents and/or index words and factors, and the factor evaluation value of a document or document group calculated for each factor. Output is not limited to display on the display unit; it may also take the form of printing on a medium such as paper, or transmission to a computer on a network via the communication means.

<2. Factor extraction process>
FIG. 2A is a flowchart illustrating the procedure of the factor extraction process in the document group analysis apparatus. The document group analysis apparatus of this embodiment uses factor analysis to extract factors from the plurality of documents to be analyzed.

<2-1. Acquisition of text data>
In the document group analysis apparatus of this embodiment, the text data acquisition unit 101 acquires from the document storage unit 33 the text data of I documents i (i = 1, 2, ..., I) as the analysis target (S101). Which document group is chosen as the I documents is arbitrary; one way of choosing it is as follows.

FIGS. 3A to 3C illustrate one example of a method for acquiring the text data of the documents to be analyzed. Items (A) and (B) below are realized by the procedure described in a patent publication filed by the present applicant (see International Publication No. 2006/030751), so the explanation here is simplified.
(A) First, a technology of interest is selected from the patent gazettes of a given company (the target company). Specifically, the text data acquisition unit 101 of the processing device 1 clusters the target company's patent documents, calculates an evaluation value for each cluster (target company cluster), for example a value corresponding to the factor progress information score described later, and selects the target company cluster with the highest evaluation value as the technology of interest (FIG. 3A).
(B) Next, from an own-and-other-company patent document group containing gazettes of both the target company and other companies, the text data acquisition unit 101 extracts the documents belonging to "the selected technology of interest and technologies similar to it (the specific technical field)". This extracted group is called the own-and-other-company specific-field document group. Specifically, the text data acquisition unit 101 calculates the similarity between each document of the own-and-other-company patent document group and the technology of interest, and extracts from the document storage unit 33 a predetermined number of documents with the highest similarity (FIG. 3B). In this way, for example, an important patent group can first be selected from the target company's patents, and a group of similar patents that includes other companies' patents can then be analyzed.
(C) Next, the text data acquisition unit 101 clusters the extracted own-and-other-company specific-field document group to obtain the document groups to be analyzed. The document group to be analyzed (own-and-other-company patent cluster) is not limited to a lower cluster of mutually similar documents; it may be an upper cluster of mutually similar lower clusters, or an intermediate cluster between the two. FIG. 3C shows an example in which 70 lower clusters, 8 intermediate clusters (lines), and 4 upper clusters (groups) were generated. Whether lower, intermediate, or upper clusters are used as the own-and-other-company patent clusters may be chosen according to the purpose of the analysis. In this way, for example, the own-and-other-company specific-field document group can be classified into subdivided technical areas, and the patent groups can be analyzed per technical area, or per intermediate or upper cluster. In addition, because the own-and-other-company specific-field document group is already a group of highly similar documents, clustering it and analyzing document groups of still higher similarity increases the proportion explained by the factors in the factor analysis (the cumulative contribution rate), so that the concepts or features contained in the document group can be expressed accurately.

Once the text data acquisition unit 101 has acquired the text data of each document i to be analyzed from the document storage unit 33, it extracts a predetermined number J of index words j (j = 1, 2, 3, ..., J). The extracted index words are, for example, the J index words of highest importance in the I documents. To extract them, the importance of each index word contained in the documents i is calculated, the index words are sorted in descending order of importance, and the top J are taken.

The importance of an index word calculated here is preferably its importance within the plurality of documents to be analyzed that were acquired by the text data acquisition unit, for example GFIDF, or GFIDF corrected on the basis of the degree of co-occurrence between index words.

GFIDF is the value obtained, for a given index word, as the product of its global frequency (GF: the total number of occurrences of the index word in the document group to be analyzed) and either the reciprocal of its document frequency (DF: the number of documents in a predetermined document collection in which the index word appears) or the reciprocal of the logarithm of the document frequency (IDF: inverse document frequency). An index word that is used many times in the document group to be analyzed but is used little in a predetermined document collection different from that document group receives a high GFIDF value, and is evaluated as an important word that characterizes the document group to be analyzed.
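The GFIDF ranking and the extraction of the top J index words can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, documents are assumed to be lists of already extracted index words, and the IDF is assumed to take the common form ln(N / DF) over a reference collection of N documents, which is only one possible reading of the definition above.

```python
import math
from collections import Counter

def gfidf_scores(target_docs, reference_docs):
    """Return {term: GF * IDF} for every term in target_docs.

    GF  = total occurrences of the term across target_docs.
    IDF = ln(N / DF), with DF the number of reference_docs containing the
          term and N = len(reference_docs) (assumed form of the IDF).
    """
    gf = Counter(term for doc in target_docs for term in doc)
    df = Counter(term for doc in reference_docs for term in set(doc))
    n_ref = len(reference_docs)
    scores = {}
    for term, freq in gf.items():
        # Assumption: a term unseen in the reference collection gets DF = 1
        # so that the logarithm stays defined.
        scores[term] = freq * math.log(n_ref / max(df[term], 1))
    return scores

def top_index_words(scores, j):
    """Sort terms by importance in descending order and keep the top J."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:j]]

target = [["factor", "factor", "document"]]
reference = [["factor"], ["document"], ["document"], ["claim"]]
print(top_index_words(gfidf_scores(target, reference), 1))  # ['factor']
```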

One measure obtained by correcting GFIDF on the basis of the degree of co-occurrence between index words is Skey, described next. The procedure is described in a patent publication filed by the present applicant (see International Publication No. 2006/048998), so the explanation here is simplified.
To obtain Skey, first consider the high-frequency words that appear often in the document group C to be analyzed, and their document-level co-occurrence degree with each index word of the document group:
$$c(w_j, w_k) = \sum_{D \in C} \left[ DF(w_j, D) \times DF(w_k, D) \right]$$
(where $w_j$ and $w_k$ are index words contained in the document group C, D is a document belonging to C, and $DF(w, D)$ is the document frequency of index word w in document D). High-frequency words whose co-occurrence degrees are similar to one another are grouped into bases $g_h$ (h = 1, 2, ..., b). The document-level co-occurrence degree between a base g and an index word w is defined as
$$Co(w, g) = \sum_{w' \in g,\ w' \neq w} c(w, w')$$
Here $w'$ is a high-frequency word that belongs to the base g and is not the index word w for which $Co(w, g)$ is being measured. The co-occurrence degree $Co(w, g)$ between the index word w and the base g is the sum, over all such $w'$, of the co-occurrence degrees $c(w, w')$ with w. From this co-occurrence degree, $key(w)$ is calculated as follows.
$$key(w) = 1 - \prod_{1 \le h \le b} \left[ 1 - \frac{Co(w, g_h)}{F(g_h)} \right]$$
Here $F(g_h) = \sum_{w \in C} Co(w, g_h)$, that is, the sum over all index words w of the co-occurrence degree $Co(w, g_h)$ between w and the base $g_h$. In other words, $Co(w, g_h)$ is divided by $F(g_h)$ and subtracted from 1, the results are multiplied over all bases $g_h$, and the product is subtracted from 1 to give $key(w)$.
$Skey(w)$ is calculated by the following equation.
$$Skey(w) = GF(w, C) \times \left[ IDF(w, P) + \ln key(w) \right]$$
Here $GF(w, C)$ is the global frequency of the index word w in the document group C to be analyzed, and $IDF(w, P)$ is the inverse document frequency of the index word w in a predetermined document collection P different from C.
A word that has a high GFIDF and that co-occurs with the base words of the document group C, and therefore has a high affinity with the content of C (a high $key(w)$), receives a high $Skey(w)$ value and is evaluated as an important index word that characterizes the document group C to be analyzed.
As described above, by extracting from the plurality of technical documents only the index words that characterize the technical document group as a whole, and analyzing those as important index words, the features or concepts of the technical field can be grasped more clearly. Narrowing down the index words in advance also makes the analysis process more efficient.
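The Skey formulas above can be traced with a small sketch. Several points are assumptions made for illustration only: $DF(w, D)$ is taken to be 1 when w occurs in document D and 0 otherwise, the bases are supplied as ready-made lists of high-frequency words (how they are formed is not shown here), a base with $F(g_h) = 0$ is skipped, and the function names are hypothetical.

```python
import math

def cooccurrence(w1, w2, docs):
    """c(w1, w2): number of documents in C that contain both words.
    Assumes DF(w, D) is 1 if w occurs in D and 0 otherwise."""
    return sum(1 for d in docs if w1 in d and w2 in d)

def co_with_base(w, base, docs):
    """Co(w, g): sum of c(w, w') over w' in base g with w' != w."""
    return sum(cooccurrence(w, wp, docs) for wp in base if wp != w)

def key_values(vocab, bases, docs):
    """key(w) = 1 - prod_h [1 - Co(w, g_h) / F(g_h)] for every w in vocab."""
    # F(g_h): sum of Co(w, g_h) over all index words w
    totals = [sum(co_with_base(w, g, docs) for w in vocab) for g in bases]
    keys = {}
    for w in vocab:
        prod = 1.0
        for g, f_g in zip(bases, totals):
            if f_g > 0:  # assumption: ignore bases nothing co-occurs with
                prod *= 1.0 - co_with_base(w, g, docs) / f_g
        keys[w] = 1.0 - prod
    return keys

def skey(w, gf, idf, keys):
    """Skey(w) = GF(w, C) * [IDF(w, P) + ln key(w)].
    Returns -inf when key(w) is 0, where the logarithm is undefined."""
    if keys[w] <= 0.0:
        return float("-inf")
    return gf[w] * (idf[w] + math.log(keys[w]))

docs = [{"a", "b", "x"}, {"a", "x"}, {"b", "y"}]   # documents as sets of index words
keys = key_values(["a", "b", "x", "y"], [["a"], ["b"]], docs)
```

In this toy data the word "x" co-occurs with both bases and so receives a higher key value than "y", which co-occurs with only one.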

<2-2. Acquisition of document vectors>
Next, after the text data acquisition unit 101 has acquired the text data, the document vector acquisition unit 102 generates, for each of the I documents i, a J-dimensional document vector whose elements are the weighting amounts $z_{ij}$ of the index words j (S102). This yields data of I rows and J columns, with one row per document and one column per index word. The I-by-J matrix whose elements are $z_{ij}$ is denoted $Z$.

Here, the weighting amount is a quantity assigned, from a given viewpoint, to each index word in each document; TFIDF, for example, is preferably used. TFIDF is the value obtained, for a given index word, as the product of its index word frequency (TF: the number of occurrences of the index word in a given document) and either the reciprocal of its document frequency (DF: the number of documents in a predetermined document collection in which the index word appears) or the reciprocal of the logarithm of the document frequency (IDF: inverse document frequency). An index word that is used many times in the document for which the document vector is calculated but is used little in the predetermined document collection receives a high TFIDF value.
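As a rough sketch of how the I-by-J matrix of weighting amounts might be built with TFIDF, the following assumes that documents are lists of index words, that the IDF takes the common form ln(N / DF) over a reference collection of N documents, and that an index word absent from the reference collection is given DF = 1. The function name `tfidf_matrix` is hypothetical.

```python
import math

def tfidf_matrix(docs, index_words, reference_docs):
    """Return Z as a list of I rows, each with J TFIDF weights z_ij."""
    n_ref = len(reference_docs)
    # DF: number of reference documents containing each index word
    df = {w: sum(1 for d in reference_docs if w in d) for w in index_words}
    # Assumed IDF form: ln(N / DF), with DF floored at 1
    idf = {w: math.log(n_ref / max(df[w], 1)) for w in index_words}
    # TF: occurrences of the index word in the document itself
    return [[doc.count(w) * idf[w] for w in index_words] for doc in docs]

Z = tfidf_matrix([["a", "a", "b"], ["b"]], ["a", "b"],
                 [["a"], ["b"], ["b"], ["c"]])
```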

<2-3. Calculation of factor loadings and factor scores>
Next, the factor calculation unit 103 calculates the factor loadings and factor scores of a factor analysis in which each document i is treated as a subject, each index word j as an observed variable, and the document vector of each document as that subject's responses (S103).
Specifically, let K be the number of factors k (k = 1, 2, ..., K), let $a_{jk}$ be the factor loading of each index word j on each factor k, and let $f_{ik}$ be the factor score of each document i for each factor k. The factor loading matrix $A$ is the J-by-K matrix whose elements are the factor loadings $a_{jk}$, and the factor score matrix $F$ is the I-by-K matrix whose elements are the factor scores $f_{ik}$.

Next, let $E$ be the I-by-J residual matrix. The factor loading matrix $A$ and the factor score matrix $F$ are obtained by solving the equation
$$Z = F A^{t} + E$$
(where $A^{t}$ is the transpose of $A$) as follows.

For the factor scores $f_{ik}$, which are the elements of the factor score matrix $F$, and the residuals $e_{ij}$, which are the elements of the residual matrix $E$, assume that (1) the factor scores are standardized to mean 0 and standard deviation 1, (2) the correlation between factor scores is 0, (3) the correlation between residuals is 0, and (4) the correlation between each factor score and each residual is 0. Under these assumptions it is known that, in general,
$$R = A A^{t} + V$$
holds, where $R$ is the correlation matrix between the observed variables and $V$ is the variance-covariance matrix of the residuals. The factor loadings are therefore obtained from
$$A A^{t} = R - V$$
Next, set $R - V = R^{*}$. To calculate $R^{*}$, the correlation matrix $R$ is first computed from the elements $z_{ij}$ of the matrix $Z$, and its diagonal elements are then replaced with estimates of the communalities (communality estimation methods include, for example, the SMC method and the RMAX method). Because $R^{*} = A A^{t}$, the factor loading matrix $A$ is computed from this $R^{*}$ to obtain the factor loadings (methods for obtaining the factor loadings include, for example, the principal factor method, the least squares method, and the maximum likelihood method).
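One way to carry out the steps above is a non-iterated principal factor solution with SMC communality estimates, sketched below with NumPy. This is an illustration under stated assumptions rather than the method fixed by the description: the function name is hypothetical, the pseudo-inverse is used so that a singular correlation matrix does not raise an error, and negative eigenvalues of $R^{*}$ are clipped to zero.

```python
import numpy as np

def principal_factor_loadings(Z, n_factors):
    """Estimate the J x K loading matrix A from an I x J data matrix Z."""
    R = np.corrcoef(Z, rowvar=False)              # J x J correlation matrix
    # SMC communality estimate: 1 - 1 / diag(R^-1)
    smc = 1.0 - 1.0 / np.diag(np.linalg.pinv(R))
    R_star = R.copy()
    np.fill_diagonal(R_star, smc)                 # reduced matrix R*
    eigvals, eigvecs = np.linalg.eigh(R_star)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors] # keep the K largest
    lam = np.clip(eigvals[order], 0.0, None)      # assumption: drop negative parts
    return eigvecs[:, order] * np.sqrt(lam)       # A with A A^t approximating R*

# Synthetic data: four observed variables driven by one common factor
rng = np.random.default_rng(0)
common = rng.standard_normal((500, 1))
Z = common @ np.ones((1, 4)) + 0.5 * rng.standard_normal((500, 4))
A = principal_factor_loadings(Z, 2)
```

The sign of each column of A is arbitrary, since an eigenvector and its negative are equally valid.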

To find more meaningful factors, the factor calculation unit 103 preferably also performs an operation called factor rotation. The varimax (orthogonal rotation) method is the preferred method of rotating the factor axes in this embodiment. That is, the relationship between the observed variables and the factors is obtained by rotating the factor axes, while keeping each factor orthogonal to the others, so as to maximize the variance of the factor loadings. In particular, when the document group to be analyzed consists of documents whose document vectors are highly similar, the varimax method sharpens the contrast between large and small factor loadings and thereby makes the character of each factor clear.
Besides the varimax method, common methods of rotating the factor axes include, for orthogonal rotation, quartimax, equamax, parsimax, orthomax, and orthogonal Procrustes, and, for oblique rotation, promax, oblimin, Harris-Kaiser, and oblique Procrustes. The method of rotating the factor axes in this embodiment is not limited to varimax and may be chosen from these rotation methods as appropriate for the mode of implementation.
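A varimax rotation can be sketched with the classical pairwise procedure, in which each pair of factor axes is rotated in its plane by a closed-form angle until the angles become negligible. The sketch below applies raw varimax without Kaiser row normalization, which is a simplifying assumption; the function name and its defaults are hypothetical.

```python
import numpy as np

def varimax(A, max_sweeps=50, tol=1e-10):
    """Rotate a J x K loading matrix by pairwise planar varimax rotations.

    Returns (L, T) with L = A @ T and T orthogonal.
    """
    L = np.array(A, dtype=float)
    J, K = L.shape
    T = np.eye(K)
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(K - 1):
            for q in range(p + 1, K):
                x, y = L[:, p], L[:, q]
                u, v = x**2 - y**2, 2.0 * x * y
                # Closed-form angle maximizing the variance of squared loadings
                num = 2.0 * (u @ v) - 2.0 * u.sum() * v.sum() / J
                den = (u @ u - v @ v) - (u.sum()**2 - v.sum()**2) / J
                phi = 0.25 * np.arctan2(num, den)
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])   # rotation in the (p, q) plane
                L[:, [p, q]] = L[:, [p, q]] @ G
                T[:, [p, q]] = T[:, [p, q]] @ G
                largest = max(largest, abs(phi))
        if largest < tol:   # all rotation angles negligible: converged
            break
    return L, T

# A simple structure hidden behind a 30-degree rotation
theta = np.pi / 6
S = 0.8 * np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
Q = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
rotated, T = varimax(S @ Q)
```

Because T is orthogonal, the communality of each index word (the row sum of squared loadings) is unchanged by the rotation.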

The factor score matrix $F$ is calculated, for example, by
$$F = Z R^{-1} A$$
(where $Z$ here denotes the standardized data).
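The factor score formula can be sketched in a few lines of NumPy. The sketch assumes that each column of Z is standardized with the sample standard deviation and that the correlation matrix R is non-singular; the function name is hypothetical.

```python
import numpy as np

def factor_scores(Z, A):
    """Return the I x K factor score matrix F = Zs R^-1 A.

    Z: I x J raw data matrix, A: J x K factor loading matrix.
    """
    # Standardize each observed variable (column) to mean 0, std 1
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)          # J x J correlation matrix
    # solve(R, A) computes R^-1 A without forming the inverse explicitly
    return Zs @ np.linalg.solve(R, A)

rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 4))              # 50 documents, 4 index words
A = rng.standard_normal((4, 2))               # placeholder loadings, 2 factors
F = factor_scores(Z, A)
```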

<2-4. Determination of attribution factors>
Once the factor calculation unit 103 has obtained the factor loading matrix $A$ and the factor score matrix $F$, the attribution factor determination unit 104 determines the attribution factor of each index word j from the factor loadings $a_{jk}$ and the attribution factor of each document i from the factor scores $f_{ik}$ (S104).
Index words and technical documents can thus be attributed to each factor. From the index words, the features or concepts of the technical field can be grasped through linguistic information that the analyst can understand. From the technical documents, the features or concepts of the technical field can be grasped from the particular tendencies of the information contained in each document. Although the present invention uses factor analysis, it does not require the subjective or arbitrary judgment by the analyst that ordinary factor analysis calls "interpretation of factors". This is because the index words themselves are used as the observed variables. Since the observed variables themselves express content, once the observed variables belonging to a factor have been determined from the factor loadings, the content of each factor can be stated directly in terms of those observed variables.

For example, if among the factor loadings $a_{j1}, a_{j2}, \ldots, a_{jK}$ of an index word j the loading $a_{jk}$ on a factor k is the largest, factor k is taken as the attribution factor of index word j. Likewise, if among the factor scores $f_{i1}, f_{i2}, \ldots, f_{iK}$ of a document i the score $f_{ik}$ for a factor k is the largest, factor k is taken as the attribution factor of document i. Determining the attribution factor in this way assigns each index word or technical document to the factor with which it is most strongly related, so that each factor is described as well as possible. As a result, the features or concepts of the technical field can be grasped more clearly. In this case an index word can belong to only one factor, and a document can belong to only one factor. Conversely, the number of index words belonging to one factor is not limited to one, nor is the number of documents belonging to one factor.

Furthermore, it is preferable to set a lower limit on the factor loading: if the largest factor loading $a_{jk}$ of an index word j is below the lower limit, index word j is treated as belonging to no factor, so that index words with low factor loadings are excluded from the group of index words that indicates the content of a factor. Similarly, it is preferable to set a lower limit on the factor score: if the largest factor score $f_{ik}$ of a document i is below the lower limit, document i is treated as belonging to no factor, so that only documents strongly related to a factor are attributed to it.
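The attribution rule, including the optional lower limit, amounts to a thresholded arg-max over each row of the factor loading matrix or the factor score matrix. The following minimal sketch uses a hypothetical function name, returns 0-based factor indices (the text numbers factors from 1), and marks an item that belongs to no factor with `None`.

```python
def assign_factors(values, lower_limit=None):
    """Determine the attribution factor for each row of values.

    values: list of rows, each holding the K factor loadings of one index
            word (or the K factor scores of one document).
    Returns, per row, the index of the factor with the largest value, or
    None when that largest value is below lower_limit.
    """
    result = []
    for row in values:
        # Factor with the largest loading or score for this row
        k = max(range(len(row)), key=lambda idx: row[idx])
        if lower_limit is not None and row[k] < lower_limit:
            result.append(None)   # belongs to no factor
        else:
            result.append(k)
    return result

loadings = [[0.8, 0.1], [0.2, 0.3], [0.1, 0.9]]
print(assign_factors(loadings, lower_limit=0.4))  # [0, None, 1]
```

The same function applies unchanged to the rows of the factor score matrix, possibly with a different lower limit.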

Once the attribution factor of each index word has been determined in this way, the index word or group of index words belonging to each factor can be regarded as indicating the content of that factor, that is, a concept or feature contained in the I documents under analysis. And because the document or document group belonging to each factor is closely related to that factor, it can be regarded as indicating in which documents the concept or feature extracted as that factor appears most strongly.

<2-5. Output>
The document group analysis apparatus then outputs, for each factor, the index word or group of index words belonging to that factor together with the data of the document or document group belonging to the same factor, using the output device 4 (S105). What is output as the data of the document or document group belonging to each factor is arbitrary. Examples include outputting data that identifies the document or document group (such as the publication number in the case of a patent gazette), and calculating and outputting a factor evaluation value for the document or document group.
Calculating a factor evaluation value in this way makes it possible to compare the factors with one another on the basis of that value. As a result, the relative positions of the technical elements represented by the technical documents belonging to the factors can be grasped, and those elements can further be classified into important ones and others. As a preferred example of a factor evaluation value, calculating and outputting the factor progress information score of the document or document group is described next.

<3. Calculation process of factor progress information score>
FIG. 2(B) is a flowchart explaining the procedure for calculating a factor progress information score as data on a document or document group. The processing of this flowchart is executed for each of (1) the document or document group belonging to each factor for which a factor progress information score is to be calculated, or (2) when the score is calculated for each factor and each applicant as described later, the document or document group of each factor and each applicant (in both cases (1) and (2), hereinafter referred to as the "score calculation target document group").

<3-1. Determination of the number of documents>
First, the document number determination unit 201 determines the number of documents N in the score calculation target document group (S201). When both an unexamined patent application publication and a patent gazette have been issued for a given patent application, the number of documents for that application is desirably counted as two.

<3-2. Reading of progress information>
Next, the progress information reading unit 202 reads progress information, as a document attribute of each document in the score calculation target document group, from the document storage unit 33 of the recording device 3 or from another database (S202). This database records the progress information of the patent application relating to each document. Examples of the progress information read for each patent application include:
"Number of citations by other companies" (0 or a positive integer),
"Number of patent oppositions or patent invalidation trial requests received" (0 or a positive integer),
"Whether examination has been requested" (1 or 0),
"Whether the patent right has been registered" (1 or 0),
"Whether accelerated examination has been requested" (1 or 0),
"Whether an appeal against an examiner's decision has been filed" (1 or 0),
although other information may also be used.

<3-3. Calculation of indicators based on progress information>
Next, the index calculation unit 203 calculates, as evaluation values for the score calculation target document group, a plurality of indicators based on the progress information recorded in the database (S203).
Examples of these indicators include the "total number of citations by other companies", the "total number of patent oppositions or patent invalidation trial requests received", the "examination request rate", and the "registration decision rate", as well as the "patent registration rate", the "accelerated examination request rate", the "other-company citation ratio", the "appeal ratio", and the "opposition or invalidation trial request ratio"; other indicators may also be used. Each is defined as follows.
"Total number of citations by other companies" = sum of the "number of citations by other companies" over the score calculation target document group
"Total number of patent oppositions or patent invalidation trial requests received" = sum of the "number of patent oppositions or patent invalidation trial requests received" over the score calculation target document group
"Examination request rate" = number of examination requests / number of patent applications
"Patent registration rate" = number of patent registrations / number of patent applications
"Registration decision rate" = number of patent registrations / number of examination requests
"Accelerated examination request rate" = number of accelerated examination requests / number of examination requests
"Other-company citation ratio" = "total number of citations by other companies" / number of patent applications
"Appeal ratio" = number of appeals against examiners' decisions / number of examination requests
"Opposition or invalidation trial request ratio" = "total number of patent oppositions or patent invalidation trial requests received" / number of patent registrations
In these definitions, the counts may be obtained as follows.
Number of examination requests: sum of "whether examination has been requested" (1 or 0) over the score calculation target document group
Number of patent registrations: sum of "whether the patent right has been registered" (1 or 0) over the score calculation target document group
Number of accelerated examination requests: sum of "whether accelerated examination has been requested" (1 or 0) over the score calculation target document group
Number of appeals against examiners' decisions: sum of "whether an appeal against an examiner's decision has been filed" (1 or 0) over the score calculation target document group
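The definitions above may be illustrated by the following minimal sketch. The `Application` record, the dictionary keys, and the sample data are hypothetical. The sketch also assumes one record per patent application and treats a ratio whose denominator is 0 as 0, a convention the description does not state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Application:
    citations_by_others: int  # 0 or a positive integer
    oppositions: int          # oppositions / invalidation trial requests received
    exam_requested: int       # 1 or 0
    registered: int           # 1 or 0
    accelerated: int          # 1 or 0
    appeal: int               # 1 or 0


def ratio(numerator: float, denominator: float) -> float:
    # Assumption of this sketch: a ratio with a zero denominator counts as 0.
    return numerator / denominator if denominator else 0.0


def indicators(apps: List[Application]) -> Dict[str, float]:
    n_apps = len(apps)
    cites = sum(a.citations_by_others for a in apps)
    opps = sum(a.oppositions for a in apps)
    n_exam = sum(a.exam_requested for a in apps)
    n_reg = sum(a.registered for a in apps)
    n_acc = sum(a.accelerated for a in apps)
    n_appeal = sum(a.appeal for a in apps)
    return {
        "total_citations": cites,
        "total_oppositions": opps,
        "examination_request_rate": ratio(n_exam, n_apps),
        "patent_registration_rate": ratio(n_reg, n_apps),
        "registration_decision_rate": ratio(n_reg, n_exam),
        "accelerated_examination_request_rate": ratio(n_acc, n_exam),
        "other_company_citation_ratio": ratio(cites, n_apps),
        "appeal_ratio": ratio(n_appeal, n_exam),
        "opposition_ratio": ratio(opps, n_reg),
    }


# Hypothetical score calculation target document group of four applications.
apps = [
    Application(3, 1, 1, 1, 0, 0),
    Application(0, 0, 1, 0, 0, 1),
    Application(1, 0, 1, 1, 1, 0),
    Application(0, 0, 0, 0, 0, 0),
]
ind = indicators(apps)
```

In this sample, three of the four applications have an examination request and two are registered, so the examination request rate is 3/4 and the registration decision rate is 2/3.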

<3-4. Calculation of patent impact index>
Next, the patent impact calculation unit 204 calculates a patent impact index by applying a predetermined weighting to the "number of documents" of the score calculation target document group (S204). The patent impact index is intended to evaluate, for the score calculation target document group, its restraining power over other companies (the degree to which it hinders other companies from obtaining rights and thereby raises the value of the company's own patents). For example, the "number of documents" is weighted on the basis of the "total number of citations by other companies" and/or the "total number of patent oppositions or patent invalidation trial requests received", and the index can be calculated as
Patent impact index = "number of documents" + "total number of citations by other companies" + "total number of patent oppositions or patent invalidation trial requests received".
The weighting of the "number of documents" may be performed by addition as in the above formula, or by multiplication by some ratio.
Besides these, the predetermined weighting may be based on various other measures such as patent profitability, patent productivity, degree of patent utilization, and patent competitiveness, but is not limited to these.
Determining the number of technical documents as described above makes it possible to grasp the share of each factor in the population of technical documents. Weighting the number of documents further makes it possible to evaluate factors in a way that reflects economic aspects and technical competitiveness.
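The additive form of the patent impact index can be written as a short sketch. The function name and the sample counts are hypothetical, and only the additive weighting given in the formula above is shown.

```python
def patent_impact_index(n_documents: int, total_citations: int, total_oppositions: int) -> int:
    """Additive weighting: document count plus citations by other companies
    plus oppositions / invalidation trial requests received."""
    return n_documents + total_citations + total_oppositions


# Hypothetical group: 12 documents, cited 4 times by other companies, 1 opposition.
impact = patent_impact_index(n_documents=12, total_citations=4, total_oppositions=1)
```

With these sample counts the index is 12 + 4 + 1 = 17.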

<3-5. Calculation of progress information index>
Next, the progress information calculation unit 205 calculates a progress information index by taking the root mean square of the indicators based on the progress information of the score calculation target document group (S205). The progress information index is intended to evaluate, for the score calculation target document group, the value placed on the patents by the applicant itself, the patent office, and competitors, and can be calculated, for example, as
Progress information index = √{ Σ_{n=1}^{d} (indicator_n)² / d }
where indicator_n is the n-th of the d indicators. That is, the progress information index is obtained by dividing the sum of squares of d indicators based on the progress information by the number of indicators d and taking the positive square root of the result. For example, the seven indicators "examination request rate", "registration decision rate", "patent registration rate", "accelerated examination request rate", "other-company citation ratio", "appeal ratio", and "opposition or invalidation trial request ratio" give d = 7. These seven indicators are only examples; others such as a "self-citation ratio", "domestic priority claim rate", "foreign priority claim rate", or "file wrapper inspection rate" may also be used. Calculating the factor progress information score from such indicators enables factor evaluation that reflects the influence of patents that may obstruct other companies' patent acquisition and technology development, as well as the applicant's eagerness to obtain rights and the examiner's evaluation. This helps ensure the fairness and appropriateness of the factor evaluation, so that the relative positions and importance of the technical elements can be grasped fairly and appropriately.
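The root mean square calculation may be sketched as follows. The function name and the seven sample values are hypothetical and are not taken from the figures of the example.

```python
import math
from typing import Sequence


def progress_information_index(indicators: Sequence[float]) -> float:
    """Root mean square of the d indicators: sqrt(sum of squares / d)."""
    d = len(indicators)
    if d == 0:
        raise ValueError("at least one indicator is required")
    return math.sqrt(sum(x * x for x in indicators) / d)


# Hypothetical values for the seven indicators named in the text (d = 7).
seven_indicators = [0.75, 0.5, 0.5, 0.25, 1.0, 0.25, 0.5]
index_value = progress_information_index(seven_indicators)
```

For these values the sum of squares is 2.4375, so the index is the square root of 2.4375 / 7, roughly 0.59.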

As a method of calculating the progress information index from the d indicators, simply summing the indicators is also possible (simple sum method). Since a patent group with high values on many indicators is then rated highly, using the sum of the indicators as the progress information index seems reasonable at first glance. However, the more the number of indicators d is increased, the lower the relative weight of indicators that should be emphasized, such as the "examination request rate", "registration decision rate", and "patent registration rate", and the resulting progress information index may diverge from the purpose of the evaluation.
One way to address this problem is to take the maximum of the indicators as the progress information index, or to calculate the index from only the top few indicators, for example the top three (maximum value method). However, if only the maximum or the top indicators are used, the other indicators are not considered at all, and the evaluation may become one-sided.
The method described above, which divides the sum of squares by the number of indicators d and takes the square root, can be said to combine the advantages of the simple sum method and the maximum value method. Because squares are summed, a high value among the d indicators of a document group strongly influences the progress information index. Accordingly, a document group whose "examination request rate", "registration decision rate", "patent registration rate", and similar indicators, which tend to take high values, are particularly high (and which consequently has many granted patents) can be given a conspicuously high raw evaluation score. At the same time, the indicators without high values are still reflected to some extent in the progress information index.
In this way, the present embodiment calculates the progress information index by taking all d indicators into account. As a result, the value of patents can be evaluated from multiple aspects.
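The contrast among the three aggregation methods can be illustrated with a small sketch. The three indicator sets are hypothetical and were chosen only to make the differences visible.

```python
import math
from typing import Sequence


def simple_sum(xs: Sequence[float]) -> float:
    return sum(xs)


def maximum(xs: Sequence[float]) -> float:
    return max(xs)


def rms(xs: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in xs) / len(xs))


# Hypothetical indicator sets for three document groups (d = 4).
group_a = [0.9, 0.1, 0.1, 0.1]   # one outstanding indicator
group_a2 = [0.9, 0.3, 0.3, 0.3]  # same peak, better on the rest
group_b = [0.4, 0.4, 0.4, 0.4]   # uniformly moderate

for name, xs in (("A", group_a), ("A2", group_a2), ("B", group_b)):
    print(name, round(simple_sum(xs), 3), maximum(xs), round(rms(xs), 3))
```

In this sample the simple sum rates group B above group A although A has the outstanding indicator, and the maximum cannot distinguish A from A2. The root mean square rates A above B and A2 above A, which reflects both the peak value and the remaining indicators.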

<3-6. Calculation of factor progress information score>
Next, the score calculation unit 206 multiplies the patent impact index by the progress information index to calculate the factor evaluation value of the score calculation target document group (S206). Converting the progress information into an index in this way enables, for example, a quantitative and objective evaluation of factors.
This evaluation value is referred to as the "factor progress information score". The calculated score is output in S105. The factor progress information score has the following properties.
When the "examination request rate" is 0, the progress information index is 0 in most cases, and as a result the factor progress information score is also 0.
The progress information index increases as the number of registered patents increases. Appeals against examiners' decisions, oppositions received, and the like are also taken into account.
Since the patent impact index counts the number of publications, it increases as the number of patent applications increases, and increases further when patent gazettes are issued. It is also weighted by the "total number of citations by other companies" and the "total number of patent oppositions or patent invalidation trial requests received".
Because the factor progress information score evaluates a patent document group from the aspect of progress information, it gives an indication of patent strength that cannot be measured by the number of patents alone.
Calculating the factor progress information score as described above makes it possible to evaluate factors on the basis of patent publications. By making use of the examination progress information included in the patent information related to those publications, highly accurate factor evaluation becomes possible.
The method of calculating the factor progress information score is not limited to the above. For example, an individual evaluation value may be calculated for each patent publication belonging to a factor, and these values may be totaled to obtain the factor progress information score of each factor.
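The multiplication in S206 can be sketched as follows. The function name and the input values are hypothetical.

```python
def factor_progress_information_score(impact_index: float, progress_index: float) -> float:
    """Factor evaluation value: patent impact index times progress information index."""
    return impact_index * progress_index


# Hypothetical group with a patent impact index of 17 and a progress information index of 0.5.
score = factor_progress_information_score(17, 0.5)

# A group whose progress information index is 0 scores 0 however many documents it has.
score_without_progress = factor_progress_information_score(120, 0.0)
```

The second call reflects the first property listed above: a large number of publications alone does not produce a score when the progress information index is 0.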

<4. Example>
Next, the results of analyzing a specific document group are presented.
The unexamined patent application publications and patent gazettes issued in Japan for a certain automobile manufacturer (referred to as company A) were cluster-analyzed on the basis of similarity to obtain a plurality of clusters. After examining these clusters, it was decided to analyze in detail, in particular for the cluster relating to "exhaust control of internal combustion engines", its relationship with the surrounding patents of other companies. To this end, from a group of about 4 million publications including those of company A and of other companies, 2869 publications with high similarity to the publications belonging to the "exhaust control of internal combustion engines" cluster were extracted.
The analysis results presented below are an example in which these 2869 publications were cluster-analyzed on the basis of the similarity of their document vectors, and the 569 publications belonging to one particular cluster were taken as the analysis target document group.

<4-1. Factor analysis processing>
In this example, the 77 index words of highest importance among the index words contained in the 569 analysis target documents were used as observed variables, and 16 factors were extracted by factor analysis. The number of factors to be extracted is not limited to this and may be changed as appropriate for the embodiment. Methods of determining the number of factors include, for example, methods based on eigenvalues, factor contributions, interpretability, and the Kaiser-Guttman criterion.
Examples of the factor loadings of the index words and the factor scores of the publications, as calculated by the factor calculation unit 103, are shown in the tables of FIG. 4 and FIG. 5, respectively (only some of the index words and publications are shown).

The table in FIG. 4 shows an example of the factor loadings of the index words calculated for each of factors 1 to 16; rows correspond to index words and columns to factor numbers. For convenience of explanation, FIG. 4 shows only an excerpt of the index words rather than all of them.
In this example, for each index word, the factor for which the factor loading in FIG. 4 is largest (shaded in the figure) was taken as the attribution factor of that index word. Specifically, the index word "蔵" in the top row of FIG. 4, for example, has a factor loading of 0.96 on factor 1. Since this loading is larger than its loading on any of the other factors 2 to 16, the index word "蔵" belongs to factor 1. The index words "吸", "NOx", "還元" (reduction), "放出" (release), and "スパイク" (spike) likewise belong to factor 1. However, index words used as observed variables whose factor loading did not reach a predetermined threshold were not attributed to any factor.

The table in FIG. 5 shows an example of the factor scores of the publications calculated for each of factors 1 to 16; rows correspond to publication numbers and columns to factor numbers. For convenience of explanation, FIG. 5 shows only an excerpt of the publications belonging to each factor, and the publication numbers are partially masked.
In this example, as with the attribution factors of the index words, the factor for which the factor score in FIG. 5 is largest (shaded in the figure) was taken as the attribution factor of each publication. Specifically, the publication "XX-XXX601" in the top row of FIG. 5, for example, has a factor score of 5.41 on factor 1. Since this score is larger than its score on any of the other factors 2 to 16, the publication "XX-XXX601" belongs to factor 1. The publications "XX-XXX097" and "XX-XXX189" likewise belong to factor 1. However, publications in the analysis target document group whose maximum factor score was equal to or less than a predetermined threshold (here, 1) were not attributed to any factor.

The table in FIG. 6 shows the index words belonging to each factor and the publications belonging to each factor. For convenience of explanation, only an excerpt of the publications belonging to each factor is shown.
In this way, even for the 569 analysis target documents, which were highly similar to one another to begin with, the concepts and features contained in the document group can be readily understood from the index words belonging to each factor. When a document group is classified by clustering or the like, it can be hard to grasp what differs between the classes. According to the present embodiment, by contrast, mutually independent concepts are extracted as factors, and the index words and publications most strongly related to each factor are attributed to it, so that the features or concepts of the technical field can be grasped from the technical elements indicated by the index words and the technical information contained in the publications.
In addition, since specific publications can be referred to for each factor as needed, further analysis becomes possible, such as actually reading the publications belonging to a factor of interest or calculating the factor progress information score described later from the data of the publications belonging to each factor.

Unlike notations such as English that put spaces between words, Japanese notation does not mark word boundaries explicitly. For this reason, when text mining is performed on Japanese text, keywords are generally extracted beforehand by applying a morphological analysis program on a computer. However, current morphological analysis programs cannot adequately handle the wide variety of expressions in natural Japanese text, so words that are meaningful only as a unit are sometimes split unnaturally, which hinders the analysis.
In this example, however, a phenomenon was observed in which the fragments of a split word gathered again in the same factor. For example, the index words "リー" and "ン" came to belong to factor 4; these presumably formed the single term "リーン" ("lean", which in the field of the analysis target documents means that the fuel-air mixture is thin). Fragments of a split term gather in the same factor presumably because these index words have highly similar occurrence patterns, so that their factor loadings on each factor are also similar. Likewise, "空", "燃", "比", and "理論" in factor 2 presumably formed the single term "理論空燃比" ("theoretical air-fuel ratio"). Even in a language such as English that puts spaces between words, the corresponding term may be split into separate words ("theoretical", "air", "fuel", "ratio") before analysis, but even then these words can be expected to gather again in the same factor if the present invention is applied.

<4-2. Calculation of factor progress information score>
Next, an example is shown in which a factor progress information score was calculated for each factor on the basis of the progress information of the publications belonging to it.
The tables in FIG. 7 and FIG. 8 show an example of the results of calculating, for each factor and from the progress information data of the publications belonging to it, the indicators needed for the factor progress information score, and of calculating the patent impact index and the progress information index from them.
FIG. 7 shows an example of the examination request rate, patent registration rate, registration decision rate, accelerated examination request rate, other-company citation ratio, appeal ratio, opposition ratio, and progress information index calculated for each of factors 1 to 16. FIG. 8 shows an example of the number of publications, the number of citations by other companies, the number of oppositions received, and the patent impact index for each of factors 1 to 16.

The table in FIG. 9 shows the factor progress information score calculated by the score calculation unit 206 from the patent impact index and the progress information index, and the factor progress information score per publication (average) calculated by the score calculation unit 206 by dividing the factor progress information score by the number of documents. For reference, the eigenvalue of each factor in the factor analysis is also shown. The factor numbers in the leftmost column are common to FIG. 4 through FIG. 12. Whereas FIG. 4 to FIG. 8, FIG. 11, and FIG. 12 list the factors in order of factor number, FIG. 9 lists them in descending order of factor progress information score.
As FIG. 9 shows, outputting the factor progress information score of each factor makes it clear which factors in the analysis target document group are of high importance. The eigenvalue of each factor indicates how much of the data the factor explains, but it can be seen to be unrelated to the importance, in terms of the factor progress information score, of the documents belonging to the factor.
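The ordering used in FIG. 9 may be sketched as follows. The function name, factor numbers, scores, and document counts are hypothetical and do not reproduce the values of the figure.

```python
from typing import Dict, List, Tuple


def rank_factors(
    scores: Dict[int, float], n_documents: Dict[int, int]
) -> List[Tuple[int, float, float]]:
    """Return (factor number, score, score per publication) rows
    sorted in descending order of factor progress information score."""
    rows = [(k, s, s / n_documents[k]) for k, s in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical scores and publication counts for three factors.
scores = {1: 12.0, 2: 30.0, 3: 9.0}
n_documents = {1: 40, 2: 25, 3: 3}
ranking = rank_factors(scores, n_documents)
```

In this sample, factor 3 ranks last by total score but has the highest score per publication, which is the kind of factor that a per-publication average brings to attention.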

FIG. 10 shows an example of a chart in which the output device 4 outputs the factor progress information scores calculated by the score calculation unit 206 in this example. In FIG. 10, each circle represents a factor; the vertical position of a circle indicates the factor progress information score of the factor, and the horizontal position indicates the factor progress information score per publication (the average over the publications belonging to the factor). The size of each circle indicates the number of publications belonging to the factor, and the technical terms attached to each circle are the index words belonging to the factor.
From this chart it can be inferred, for example, that factor 13 (排気 "exhaust", NOX), although it has few publications, consists of publications that are extremely important from the viewpoint of progress information. By focusing on other factors with high factor progress information scores, such as factor 4 (リー, ン) and factor 5 (下流 "downstream", 上流 "upstream", 触媒 "catalyst", 劣化 "deterioration", 診断 "diagnosis"), attention can be directed to the important technical elements in the document group under investigation.
Specifically, for example, factor 5 in FIG. 10 is inferred, from the combination of the index words catalyst, deterioration, and diagnosis, to be a technical element factor representing a so-called advanced on-board diagnostic system, which has the function of automatically detecting performance deterioration of an exhaust gas reduction device such as a catalyst system and notifying the driver. Under recent exhaust gas regulations, the introduction of advanced on-board diagnostic systems that monitor exhaust gas reduction devices such as NOx-reducing catalysts for malfunction, detect it, and notify the driver has been called for by various councils of the national and local governments; the importance of this technical element is considered to appear in factor 5, accompanied by meaningful index words.

In the example above, the publication group belonging to each factor was taken as the "score calculation target document group" and the factor progress information score was calculated for each factor. The calculation is not limited to this. The publication group belonging to each factor may be further divided by applicant, and the publication group in each factor-and-applicant division may be taken as the "score calculation target document group".
The table in FIG. 11 shows the factor progress information scores calculated for each factor and each applicant.
The factor progress information score for each factor and each applicant is calculated in the same way as the per-factor score described above; only the score calculation target document group differs. Because the documents are divided by both factor and applicant, each score calculation target document group will often contain fewer documents. However, indices such as the examination request rate range from 0 to 1 whether or not the documents are divided by applicant, so they do not necessarily become smaller. Consequently, summing the per-applicant factor progress information scores of a factor over all applicants does not yield that factor's own (non-per-applicant) factor progress information score.
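The non-additivity noted above can be seen with a toy calculation. The sketch below uses a hypothetical examination request rate as a stand-in for any ratio-type index; the data and the helper name `request_rate` are illustrative assumptions, not values or functions from the embodiment.

```python
# Hypothetical publications of one factor: (applicant, examination_requested)
publications = [
    ("A", True), ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", False),
]

def request_rate(docs):
    """Share of documents for which examination was requested (0 to 1)."""
    return sum(1 for _, requested in docs if requested) / len(docs)

# Ratio computed over the whole factor
factor_rate = request_rate(publications)

# The same ratio computed separately for each applicant
by_applicant = {}
for applicant, requested in publications:
    by_applicant.setdefault(applicant, []).append((applicant, requested))
applicant_rates = {a: request_rate(d) for a, d in by_applicant.items()}

print(factor_rate, applicant_rates, sum(applicant_rates.values()))
```

Each applicant's rate stays in the range 0 to 1 even though its document group is smaller, so the per-applicant values add up to more than the rate of the factor as a whole.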

FIG. 12 is an example in which the publication group belonging to each factor is further classified by applicant and the factor progress information score is plotted for each factor and each applicant. Specifically, the score calculation unit 206 further classifies the publication group belonging to each factor by applicant and calculates the factor progress information score for each factor and each applicant. The output device 4 then generates a diagram based on the calculated scores and outputs it.
In FIG. 12, factors are listed along the left-right direction and applicants along the depth direction, and the height indicates the factor progress information score for each factor and applicant. Note that FIG. 12 shows results only for applicants ranked in the top 10 by number of applications among all publications whose factor scores exceed a certain level.
Specifically, for factor 4 and factor 5, which FIGS. 9 and 10 showed to have high factor progress information scores, company A turns out to be strong (company A is the automobile manufacturer, the analyst's own company, on which the extraction of the analysis target document group was based). The figure also reveals strengths and weaknesses between company A and other companies, and among the other companies. For example, company A is in a strong position in factors 4 and 5 but lags behind other companies in factor 13.
In this way, the ranking and share of the applicants of the patent publications belonging to each factor can be grasped. As a result, the characteristics of a given technical field can be understood from the viewpoint of the state of competition among the companies doing the development.

The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of its gist.
In the above embodiment, the technical documents to be analyzed are patent publications, but the invention is not limited to this. The technical documents to be analyzed may be technical papers. In that case, the factor progress information score may be obtained using, for example, the number of technical papers belonging to each factor and their citation counts.
In the above embodiment, the predetermined weighting applied to the "number of documents" of the score calculation target document group uses at least one, or all, of the "total number of citations by other companies", the "total number of patent oppositions received", and the "total number of patent invalidation trial requests received", but the invention is not limited to this. For example, patent profitability, patent productivity, patent utilization, or patent competitiveness may be used.
In the above embodiment, each functional unit of the processing device 1 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, document number determination unit 201, progress information reading unit 202, index calculation unit 203, patent impact calculation unit 204, progress information calculation unit 205, and score calculation unit 206) is realized by software, but the invention is not limited to this. Each functional unit of the processing device 1 may instead be realized by a circuit designed specifically to perform that function, such as an ASIC (Application Specific Integrated Circuit).
In the above embodiment, the processing device 1 acquires the patent publications to be analyzed from the storage device 3, but the invention is not limited to this. For example, the processing device 1 may communicate with an external information providing server over a network such as the Internet and acquire the patent publications from that server.
In the above embodiment, either the "total number of patent oppositions received" or the "total number of patent invalidation trial requests received" is used as an index for calculating the patent impact index, but this is merely an example. Both may be used together. Similarly, in the above embodiment, either the "ratio of patent oppositions received" or the "ratio of patent invalidation trial requests received" is used as an index for calculating the progress information index, but this too is merely an example, and both may be used together.

<5. Modification>
Next, a modification of the above embodiment of the present invention will be described.
Among the processes performed by the above embodiment, this modification calculates a factor evaluation value different from the factor progress information score described above (hereinafter, the factor evaluation value calculated in the modification is referred to as the "technical element score"). Here, a technical element means each extracted factor, so named in view of the content of each factor as represented by the technical documents and index words it contains. The following description takes as an example the case where patent documents such as patent publications are used as the documents to be analyzed.

The modification of the present embodiment is described in detail below with reference to FIGS. 13 to 20. In this description, components identical to those of the above embodiment are given the same reference numerals. The description focuses on the parts that differ from the above embodiment, and explanation of identical components is omitted.

<5-1. Configuration of Modification>
First, FIG. 13 shows the configuration of the modification of the present embodiment. FIG. 13 is a functional block diagram of the document group analysis apparatus according to the modification.

As illustrated, the document group analysis apparatus includes an input device 2, a recording device 3, an output device 4, and a processing device 100.
The input device 2, the recording device 3, and the output device 4 are the same as those in the above embodiment.
In response to a request from the input device 2, the processing device 100 uses the patent documents stored in the recording device 3 and classifies them by factor according to the procedure of FIG. 2(A) described above. The processing device 100 also calculates a technical element score for a factor designated by the user via the input device 2.

Specifically, the processing device 100 includes a text data acquisition unit 101, a document vector acquisition unit 102, a factor calculation unit 103, an attribution factor determination unit 104, a progress information reading unit 202, a score calculation unit 2060, and a patent score calculation unit 2070. The text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, and progress information reading unit 202 have the same functions as in the above embodiment, so their description is omitted here.

Once the attribution factor determination unit 104 has classified the patent documents by factor and associated index words with each factor, the score calculation unit 2060 accepts a request from the user, via the input device 2, to calculate a technical element score.
On receiving the request, the score calculation unit 2060 calculates the technical element score using the "patent score (PS)", an evaluation value for each patent document belonging to the factor in question. The patent score (PS) is assumed to have been calculated in advance by the patent score calculation unit 2070 described below.

For each patent document, the patent score calculation unit 2070 calculates a "patent score (PS)" that evaluates the document, using its progress information (such as whether priority is claimed and how many times it has been cited in the examination of other patent applications) and its content information (such as the number of claims and the number of pages of the specification). For each piece of information identifying a patent document (publication number), the patent score calculation unit 2070 then generates information (hereinafter "PS information") that associates the document's patent score (PS) with "abandonment information" indicating whether the patent has been abandoned (this also covers whether a rejection has become final).
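As a minimal sketch of the PS information described above, one record per publication number could be held as follows. The class name `PSInfo`, its field names, and the sample values are hypothetical; the source specifies only that a publication number is associated with a patent score and abandonment information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PSInfo:
    publication_number: str  # identifies the patent document
    patent_score: float      # patent score (PS) of the document
    abandoned: bool          # True if abandoned or if a rejection is final

# PS information keyed by publication number (values are made up)
ps_info = {
    rec.publication_number: rec
    for rec in [
        PSInfo("JP-0001", 42.0, False),
        PSInfo("JP-0002", 55.5, True),
    ]
}

print(ps_info["JP-0001"])
```

Keying the records by publication number lets a later step look up the score and abandonment status of every document assigned to a factor.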

Next, the hardware configuration of the processing device 100 is described. As in the above embodiment, the processing device 100 is realized by a computer equipped with a CPU (Central Processing Unit), a memory, and an I/F that exchanges data with external devices (the input device 2, recording device 3, output device 4, and so on).
Each functional unit of the processing device 100 (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070) is assumed to be realized by software.

Specifically, the memory of the processing device 100 stores programs for realizing each functional unit (the text data acquisition unit 101, document vector acquisition unit 102, factor calculation unit 103, attribution factor determination unit 104, progress information reading unit 202, score calculation unit 2060, and patent score calculation unit 2070). Each functional unit of the processing device 100 is realized by the CPU executing these programs stored in the memory.

<5-2. Calculation Processing in Modification>
Next, the technical element score calculation processing of the modification, which differs from the above embodiment, is described.
FIG. 14 is a flowchart showing the procedure of the technical element score calculation processing in the modification of the embodiment of the present invention.
The processing flow described below is performed after the determination of the attribution factor of each document (S104) has been completed in the factor extraction processing of FIG. 2(A), in order to output the data of the documents or document groups belonging to each factor (S105). The "information associating each factor with the patent documents belonging to it" (the information shown in FIG. 6), obtained by the factor extraction processing of FIG. 2(A), is assumed to be stored in a predetermined area of the memory of the processing device 100.
It is also assumed that, before the processing of FIG. 14 is performed, the patent score calculation unit 2070 has calculated the patent score (PS) of each patent document belonging to each factor. The memory of the processing device 100 (or the storage device 3) is assumed to hold, for each piece of information identifying a patent document (publication number), information (hereinafter "PS information") that associates the document's patent score (PS) with "abandonment information" indicating whether the patent has been abandoned (this also covers whether a rejection has become final). The procedure for calculating the patent score (PS) is described later with reference to FIGS. 17 to 20.

Specifically, the score calculation unit 2060 receives a request for technical element score calculation processing from the user via the input device 2 (S2010). When making the request, the user also designates the division for which the score is to be calculated.
As the division, the user may, for example, designate the factors obtained by the factor extraction processing of FIG. 2(A). In this case, a technical element score is calculated for each factor.
Alternatively, the patent publications belonging to each factor may be divided by applicant, and the division by factor and applicant may be designated. In this case, a technical element score is calculated for each factor and each applicant.
The following description takes as an example the case where a request to calculate the technical element score of one particular factor has been received.
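The two kinds of division described above can be sketched as a simple grouping step. The row layout, the function name `group_documents`, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical rows: (publication_number, factor, applicant)
rows = [
    ("JP-0001", 4, "Company A"),
    ("JP-0002", 4, "Company B"),
    ("JP-0003", 5, "Company A"),
    ("JP-0004", 4, "Company A"),
]

def group_documents(rows, by_applicant=False):
    """Group publication numbers by factor, or by (factor, applicant)."""
    groups = defaultdict(list)
    for pub, factor, applicant in rows:
        key = (factor, applicant) if by_applicant else factor
        groups[key].append(pub)
    return dict(groups)

per_factor = group_documents(rows)
per_factor_applicant = group_documents(rows, by_applicant=True)
print(per_factor)
print(per_factor_applicant)
```

Whichever division is chosen, each resulting group plays the role of the set of documents over which one technical element score is calculated.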

Next, the score calculation unit 2060 acquires the patent scores (PS) of the patent documents belonging to the factor for which the technical element score calculation was requested in S2010 (S2020).
Specifically, the score calculation unit 2060 uses the "information associating each factor with its patent documents" (the information shown in FIG. 6) and the "PS information" stored in the memory of the processing device 100 to acquire the "patent score (PS)" and "abandonment information" of the patent documents belonging to the factor in question.

Next, using the acquired "patent score (PS)" and "abandonment information" of the patent documents belonging to the factor in question, the score calculation unit 2060 obtains a standard value for each patent score (PS) of a document that has not been abandoned (S2030).

Specifically, the score calculation unit 2060 refers to the "abandonment information" and, among the patent documents belonging to the designated factor, identifies the patent scores (PS) of the patent documents that have not been abandoned (including applications pending at the Patent Office).
For each patent score (PS) so identified, the score calculation unit 2060 obtains its standard value within a population (for example, the patent documents that have not been abandoned among the analysis target document group subjected to factor extraction processing). More specifically, the score calculation unit 2060 obtains a standard value for each identified patent score (PS) using Equation 1 below.

In Equation 1, the population is assumed to contain m patent scores (PS) of patent documents that have not been abandoned, and each patent score is written with a subscript i as PSi (1 ≤ i ≤ m, where m is an integer of 1 or more).
Equation 1 gives the standard value of the patent score PSj of each patent document j that belongs to a specific factor, relative to the PSi of the m patent documents.
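The image of Equation 1 does not survive in this text. One form consistent with the surrounding description (a conventional standardized value whose population mean is 0, multiplied by a coefficient of 10) would be the following. The symbols $\widetilde{PS}_j$, $\overline{PS}$, and $\sigma_{PS}$, and the use of the population standard deviation rather than the sample one, are assumptions made for illustration.

```latex
\widetilde{PS}_j = 10 \times \frac{PS_j - \overline{PS}}{\sigma_{PS}},
\qquad
\overline{PS} = \frac{1}{m} \sum_{i=1}^{m} PS_i,
\qquad
\sigma_{PS} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( PS_i - \overline{PS} \right)^2 }
```

Under this form the standard values average to 0 over the population, and the factor of 10 only widens the spread between them.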

Next, among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030, the score calculation unit 2060 sums those that are at or above a threshold and takes the total as the "technical element score" of the factor (S2040). In this step, the score calculation unit 2060 also obtains the maximum among the standard values of the patent scores PSj of the patent documents belonging to the specific factor obtained in S2030.

Specifically, the score calculation unit 2060 calculates the "technical element score" of the factor designated by the user using Equation 2 below and the standard values of the patent scores (PSj) obtained in S2030. The score calculation unit 2060 also selects the largest (MAX) of the standard values of the patent scores PSj obtained in S2030 and takes it as the maximum value for the factor.
In Equation 2, the factor is assumed to contain n standard values of patent scores PSj, among those obtained in S2030, that are at or above the threshold. As an example of the threshold PSstd, Equation 2 uses the population average of the standard values of the patent scores PSi obtained in S2030 (which is 0 according to Equation 1).
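The image of Equation 2 is likewise absent from this text. Writing $\widetilde{PS}_j$ for the standard value of $PS_j$ and $T$ for the technical element score (both symbols are assumptions introduced here), the description above corresponds to a sum of the following form:

```latex
T = \sum_{j=1}^{n} \widetilde{PS}_j
\quad \text{where } \widetilde{PS}_j \ge PS_{std},
\qquad
PS_{std} = \frac{1}{m} \sum_{i=1}^{m} \widetilde{PS}_i = 0
```

Only the n standard values of the factor that reach the threshold contribute to the sum; values below it are left out entirely.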

Once the score calculation unit 2060 has calculated the technical element score, processing moves to S105 (output) of FIG. 2(A).
The flow of FIG. 14 calculates the technical element score for a single factor, but this is merely an example. When a request to calculate the technical element scores of multiple factors is received, the processing of S2020 to S2040 is performed for each factor, and the technical element score and maximum value are obtained for each factor.
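Steps S2020 to S2040, repeated per factor, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the score calculation unit 2060: the function names, the data layout, the use of the population standard deviation, and the sample scores are all hypothetical, and the scores are assumed not to be all identical (otherwise the standard deviation is 0).

```python
from statistics import mean, pstdev

def standard_values(scores, scale=10.0):
    """Standardize scores to mean 0, then multiply by `scale` (assumed form)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {pub: scale * (ps - mu) / sigma for pub, ps in scores.items()}

def technical_element_scores(factor_docs, ps_info, threshold=0.0):
    """Return {factor: (technical element score, maximum standard value)}."""
    # S2030: the population is every document that has not been abandoned
    live = {pub: ps for pub, (ps, abandoned) in ps_info.items() if not abandoned}
    z = standard_values(live)
    results = {}
    for factor, pubs in factor_docs.items():
        # S2020: standard values of the live documents belonging to this factor
        values = [z[p] for p in pubs if p in z]
        # S2040: sum the values at or above the threshold and take the maximum
        score = sum(v for v in values if v >= threshold)
        results[factor] = (score, max(values) if values else None)
    return results

# Made-up PS information: publication -> (patent score, abandoned?)
ps_info = {
    "P1": (30.0, False), "P2": (50.0, False), "P3": (70.0, False),
    "P4": (90.0, True),  # abandoned, so excluded from the population
}
factor_docs = {1: ["P1", "P2"], 2: ["P3", "P4"]}
results = technical_element_scores(factor_docs, ps_info)
print(results)
```

In this toy data the abandoned document P4 has the highest raw score but contributes nothing, and factor 1 scores 0 because none of its documents is above the population average.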

In S105 of FIG. 2(A), the output device 4 outputs the technical element score obtained in S2040. Alternatively, the output device 4 outputs the maximum value for the factor together with the technical element score.
The score calculation unit 2060 may also obtain a technical element score for each factor and each applicant, generate information representing a graph of those scores in the manner of FIG. 12 described above, and have the output device 4 output it. In this case, the maximum value may be shown together with the technical element score for each factor and each applicant.

As described above, this modification calculates the technical element score using the patent scores (PSi) of patent documents that have not been abandoned. The reason is as follows. Suppose a company wants to evaluate its patents by technical field, and a certain technical field (factor) contains a very large number of patent documents, most of which are abandoned applications (or applications for which a decision of rejection has become final). If applications that have already been abandoned (or finally rejected) were included in the evaluation of that field, a technical field in which few patent rights are actually held would be rated highly, and a proper analysis would not be possible.
This modification therefore calculates the technical element score using only the patent scores (PSi) of patent documents that have not been abandoned, which improves the accuracy of the score.

In addition, when calculating the standard value of a patent score (PSi), this modification does not use the conventional standard value as is but multiplies it by a coefficient (10 in Equation 1). This makes the differences between the resulting standard values easier to discern. The factor of 10 in Equation 1 is merely an example.

In addition, this modification uses only the standard values of patent scores PSi that exceed a threshold when calculating the technical element score. The purpose is to reduce the influence that the number of patent documents has on the technical element score.
For example, suppose technical element scores are obtained for each applicant and each factor and compared in order to analyze the technical tendencies of each applicant. If no threshold were applied, the technical element scores of applicants with many applications would tend to be too high, and an accurate analysis might not be possible.
Admittedly, when the analysis target document group (population) is formed by extracting the patents of a specific technical field without excess or omission, the number of applications of each applicant and each factor is itself a sufficiently meaningful figure. When the analysis target document group (population) is extracted by some other arbitrary method, however, the number of applications varies with factors such as the characteristics of the industry to which each applicant belongs. Relying too heavily on the number of applications of each applicant and each factor may then prevent an accurate analysis.
Furthermore, when the main aim is to pick out important elements from an analysis target document group (population) containing a huge number of patents, it is sometimes preferable to give more weight to the presence of "individually important patents" than to "many patents of low individual importance".
For these reasons, this modification uses only those standard values of the patent scores PSi that are at or above a predetermined value, so that a high technical element score is given only to factors containing important patents at or above that value. This improves the accuracy of the technical element score.
In particular, suppose the patent scores are standardized so that their average is 0 and the standard values at or above the average (0) are summed to give the technical element score. Patent scores below the average are then disregarded. Moreover, even a large number of patent scores near the average has little effect on the technical element score, whereas a score far above the average has a large effect. The influence of the number of documents in a technical element is thereby reduced further, and technical elements containing highly important patents can be extracted accurately.
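The effect described above can be checked with a small numerical sketch. The standard values below are invented for illustration, and `technical_element_score` is a hypothetical helper that simply sums the values at or above the threshold.

```python
def technical_element_score(standard_values, threshold=0.0):
    """Sum of the standard values at or above the threshold."""
    return sum(v for v in standard_values if v >= threshold)

# Hypothetical standard values (population average assumed to be 0)
many_average = [0.25, -0.25] * 50   # 100 documents close to the average
one_standout = [25.0, -1.0, -2.0]   # 3 documents, one far above the average

score_many = technical_element_score(many_average)
score_standout = technical_element_score(one_standout)
print(score_many, score_standout)
```

With these numbers the group of 100 near-average documents scores 12.5, while the group of 3 documents containing one outstanding patent scores 25.0, so the document count alone does not drive the result.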

FIG. 15 shows the distribution of the technical element score of the above modification and of the factor progress information score of the above embodiment in relation to the number of publications. Specifically, the factor progress information score was calculated for a plurality of factors extracted from a certain analysis target document group and standardized to mean 0 and variance 1, and the technical element score (likewise standardized to mean 0 and variance 1) was calculated for the same factors. The standardized factor progress information score and technical element score are plotted on the vertical axis, and the number of publications of each factor on the horizontal axis.
As shown in the figure, the factor progress information score is distributed close to a straight line representing direct proportionality with the number of publications, and is therefore strongly influenced by the number of publications. The technical element score, although not entirely unrelated to the number of publications, is also distributed in regions well away from that line, which shows that the influence of the number of publications is mitigated.

FIG. 16 shows an example in which, after the determination of the attribution factor of each document (S104) in the factor extraction process of FIG. 2(A) described above was completed, the publications belonging to each factor were further classified by applicant, and the technical element score of this modification was calculated and plotted for each factor and each applicant. The specific calculation procedure was as follows.
The Japanese published patent applications and patent gazettes of a housing equipment manufacturer (company a) were cluster-analyzed based on similarity to obtain a plurality of clusters. After examining these clusters, it was decided to analyze in detail, for the cluster relating to "composite structures" in particular, its relationship with the surrounding patents of other companies. To this end, from a set of about 4 million publications including published patent applications and patent gazettes of company a and of other companies, about 3,000 publications with high similarity to the publications in the "composite structure" cluster were extracted.
These approximately 3,000 publications were cluster-analyzed based on the similarity of their document vectors, and factors were extracted using the 323 publications belonging to a specific cluster as the analysis target document group. The score calculation unit 2060 further classified the publications belonging to each factor by applicant and calculated the technical element score for each factor and each applicant. The output device 4 then generated a diagram based on the calculated technical element scores and output it.
In FIG. 16, the factors are listed along the left-right direction, the applicants are listed along the depth direction in order of their number of applications, and the height indicates the technical element score for each factor and each applicant. Note that FIG. 16 shows only the results for applicants ranking in the top 10 by number of applications among all publications whose factor scores exceed a certain level.
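The per-factor, per-applicant tabulation behind a diagram such as FIG. 16 might be sketched as follows. This is an illustrative assumption rather than the implementation of the score calculation unit 2060: the record layout, the function name, and the sample data are hypothetical, and the input is assumed to contain only publications whose factor score already exceeds the required level.

```python
from collections import Counter, defaultdict


def scores_by_factor_and_applicant(records, top_n=10, threshold=0.0):
    """Tabulate technical element scores per (factor, applicant).

    records: iterable of (factor, applicant, standardized_patent_score) tuples
             (hypothetical layout).
    Only the top_n applicants by number of publications are kept, and only
    standardized patent scores >= threshold are summed.
    """
    records = list(records)
    counts = Counter(applicant for _, applicant, _ in records)
    top = {applicant for applicant, _ in counts.most_common(top_n)}
    table = defaultdict(float)
    for factor, applicant, z in records:
        if applicant in top and z >= threshold:
            table[(factor, applicant)] += z
    return dict(table)


# Illustrative data: company a has 3 publications, b has 2, g has 1.
records = [
    ("F1", "a", 1.5), ("F1", "a", -0.3), ("F2", "a", 0.25),
    ("F1", "b", 0.5), ("F2", "b", -1.0),
    ("F2", "g", 2.0),
]
table = scores_by_factor_and_applicant(records, top_n=2)
```

With top_n=2 only companies a and b remain, and the cell (F2, b) is absent because its only publication scores below the threshold.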

In this modification, the technical element score is calculated by summing patent scores after excluding publications whose patent score is below the average. Consequently, for a factor in which most publications are below the average the technical element score is close to 0, and for a factor in which all publications are below the average it is 0. The contrast between factors therefore becomes clear, making the ranking and evaluation of factors easier to grasp visually.
The output of FIG. 16 shows that company g, which has few applications in the analysis target document group, holds in a certain technical element a patent group comparable to those of companies a and b, which have many applications. It also shows at a glance where strengths and weaknesses lie; for example, even company a, which ranks first in number of applications, lags behind other companies in some technical elements. In this way, the strengths and weaknesses of each company can be grasped clearly.

In this modification the population average is used as the threshold, but the threshold is not limited to this. For example, the average of the standardized patent scores PSi within the patent group of a specific applicant, or some other user-defined threshold, may be set in the processing apparatus 100.
Likewise, although this modification uses standardized values of the patent score PSi, this is not a limitation. For example, the influence of the number of documents can also be mitigated by summing only those non-standardized patent scores PSi that are equal to or greater than a predetermined value.

Furthermore, according to this modification, when the technical element score is presented to the user, the highest standardized patent score (PSj) among the patent documents classified under that factor can also be presented. This lets the user see which technical element (factor) contains highly rated patents, and hence identify technical elements (factors) that contain a highly rated patent even though their overall evaluation value is low.
For example, suppose that a company, wishing to evaluate its patents by technical field, obtains the technical element score for each factor for that company (applicant). In this case, presenting the highest value for each factor makes it possible to see in which of the company's technical fields strong patents exist.
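Presenting the highest standardized patent score next to the technical element score could be sketched as below. The function name, the record layout, and the sample values are assumptions made for illustration only.

```python
def summarize_factors(records, threshold=0.0):
    """Return {factor: (technical_element_score, highest_standardized_score)}.

    records: iterable of (factor, standardized_patent_score) tuples
             (hypothetical layout).
    """
    summary = {}
    for factor, z in records:
        total, best = summary.get(factor, (0.0, float("-inf")))
        if z >= threshold:
            total += z  # only scores at or above the threshold are summed
        summary[factor] = (total, max(best, z))
    return summary


# F1: many moderately rated patents; F2: one highly rated patent among weak ones.
records = [("F1", 0.5)] * 8 + [("F2", 2.5), ("F2", -0.5), ("F2", -1.0)]
summary = summarize_factors(records)
```

In this illustrative data, F2 has the lower technical element score, yet its highest value shows that it contains the most highly rated patent.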

<6. About the Patent Score (PS)>
Next, the patent score (PS) used to calculate the technical element score in the above modification will be described with reference to FIGS. 17 to 20.
In the following description, the patent score (PS) is calculated by the patent score calculation unit 2070 of the processing apparatus 100, but the invention is not limited to this arrangement.
Another computer having a CPU (Central Processing Unit), a memory, and the like may perform the patent score calculation. In this case, a program that implements the function of the patent score calculation unit 2070 (the PS calculation program) is stored on that computer. The CPU of that computer executes the PS calculation program to calculate the patent score PS and generate the PS information described above. The processing apparatus 100 acquires the PS information generated by the other computer and stores it in memory. When another computer performs the patent score calculation, the processing apparatus 100 need not include the patent score calculation unit 2070.

<6-1. Data structure>
First, the data structure used to calculate the patent score PS will be described.
The storage device 3 stores patent data (electronic data representing patent publications) and patent attribute information. The electronic data representing a patent publication includes at least bibliographic information such as the patent data ID (publication number, etc.), the filing date, and the IPC code.
The patent attribute information includes progress information 300 of the patent document (information such as whether priority is claimed and the number of times the document has been cited in the examination of other patent applications) and content information 400 (information such as the number of claims and the number of pages of the specification). The data structures of the progress information 300 and the content information 400 are described below.

First, an example of the data structure of the progress information 300 is shown in FIG. 17.
FIG. 17 schematically shows an example of the data structure of the progress information used in the modification of the present embodiment.
As shown in the figure, one record of the progress information 300 comprises a field 301 for registering the "patent data ID (publication number, etc.)", a field 302 for the "number of days elapsed since the filing date", a field 303 for the "number of days elapsed since the examination request date", a field 304 for the "number of days elapsed since the registration date", a field 305 for information indicating whether there is a "divisional application", a field 306 for "accelerated examination", a field 307 for an "appeal decision to grant", a field 308 for an "opposition decision to maintain", a field 309 for an "invalidation trial decision to maintain", a field 310 for a "priority claim", a field 311 for a "PCT application", a field 312 for "file inspection", and a field 313 for information indicating the "number of times cited". The progress information 300 consists of a plurality of such records.

Here, the "number of days elapsed since filing", the "number of days elapsed since the examination request", and the "number of days elapsed since the registration date" are information relating to time periods of the patent data concerned. They are based on the filing date, the date of the request for examination, and the date of registration of establishment of the patent right, respectively. For each, the number of days elapsed up to the evaluation date (the date on which the patent score is calculated), or up to a predetermined date close to the evaluation date, is calculated and stored in the storage device 3. For a patent application for which examination has not yet been requested, the "number of days elapsed since the examination request" is NULL; for a patent application that has not yet been registered, the "number of days elapsed since the registration date" is NULL.
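The elapsed-day values, including the NULL case for events that have not yet occurred, might be computed as in the following sketch. The function name is hypothetical, Python's None stands in for NULL, and the dates are illustrative.

```python
from datetime import date
from typing import Optional


def elapsed_days(event_date: Optional[date], evaluation_date: date) -> Optional[int]:
    """Days from event_date to evaluation_date, or None if the event has not occurred."""
    if event_date is None:
        return None  # e.g. examination not yet requested, or patent not yet registered
    return (evaluation_date - event_date).days


evaluation = date(2007, 11, 1)
since_filing = elapsed_days(date(2006, 11, 1), evaluation)
since_registration = elapsed_days(None, evaluation)
```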

Of the progress information 300, "divisional application", "accelerated examination", "appeal decision to grant", "opposition decision to maintain", "invalidation trial decision to maintain", "file inspection", and "priority" are information indicating whether a predetermined action has been taken with respect to the patent data. "Divisional application" indicates whether a divisional application has been filed with the patent application as its parent; "accelerated examination" indicates whether the patent application has undergone accelerated examination; "appeal decision to grant" indicates whether an appeal against a decision of refusal was filed for the patent application and a decision to grant was made in that appeal; "opposition decision to maintain" indicates whether an opposition was filed against the patent and a decision to maintain it was made; "invalidation trial decision to maintain" indicates whether an invalidation trial was requested against the patent and a decision dismissing the request was made; "priority" indicates whether the patent application claims priority based on an earlier patent application or the like, or whether it is an international application under the Patent Cooperation Treaty that has entered the national phase; and "file inspection" indicates whether inspection of the file of the patent application has been requested. In each case, a value of, for example, 1 is assigned if the action has been taken and, for example, 0 if it has not.
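One record of the progress information 300 could be modeled, for illustration, as the following Python dataclass. The class name, attribute names, and sample patent ID are assumptions; only the field numbers in the comments come from the description of FIG. 17. None stands in for NULL, and the flags use 1 and 0 as in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressRecord:
    patent_id: str                                  # field 301
    days_since_filing: int                          # field 302
    days_since_exam_request: Optional[int] = None   # field 303 (None = not requested)
    days_since_registration: Optional[int] = None   # field 304 (None = not registered)
    divisional_application: int = 0                 # field 305
    accelerated_examination: int = 0                # field 306
    appeal_decision_to_grant: int = 0               # field 307
    opposition_maintained: int = 0                  # field 308
    invalidation_trial_maintained: int = 0          # field 309
    priority_claim: int = 0                         # field 310
    pct_application: int = 0                        # field 311
    file_inspection: int = 0                        # field 312
    citation_count: int = 0                         # field 313


# Hypothetical record: examination requested, not yet registered, file inspected once.
record = ProgressRecord(
    patent_id="EXAMPLE-0001",
    days_since_filing=1200,
    days_since_exam_request=400,
    file_inspection=1,
    citation_count=3,
)
```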

Next, the data structure of the content information 400 is shown in FIG. 18.
FIG. 18 schematically shows an example of the data structure of the content information used in the modification of the present embodiment.

As shown in the figure, one record of the content information 400 comprises a field 401 for registering the "patent data ID (publication number, etc.)", a field 402 for the "number of claims" of the patent data, a field 403 for the "average number of characters per claim", and a field 404 for the "number of specification pages" of the patent data. The content information 400 consists of a plurality of such records.
Here, the "number of claims" is the number of claims of the patent application, and the "average number of characters per claim" is the average number of characters (or words) per claim of the patent application. The "number of specification pages" is the number of pages of the specification or of the publication of the patent application. This information is extracted from the published patent application or other patent data of each patent application.
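A record of the content information 400, and its extraction from the claim texts, might look like the following sketch. The class, function, and attribute names are hypothetical, and the average is taken over characters, although the description also allows counting words.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentRecord:
    patent_id: str            # field 401
    claim_count: int          # field 402
    avg_claim_length: float   # field 403 (characters per claim)
    page_count: int           # field 404


def build_content_record(patent_id: str, claims: List[str], page_count: int) -> ContentRecord:
    """Derive the content information from the claim texts and the page count."""
    lengths = [len(claim) for claim in claims]
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return ContentRecord(patent_id, len(claims), average, page_count)


# Hypothetical input with two very short claims.
content = build_content_record("EXAMPLE-0001", ["abcd", "ab"], page_count=12)
```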

<6-2. Patent score calculation processing>
The processing will now be described with reference to FIG. 19, which is a flowchart showing the procedure for calculating the patent score in the modification of the present embodiment.

As shown in FIG. 19, the patent score calculation unit 2070 accepts an IPC code input by the user and acquires patent data (electronic data representing patent publications) (S400).
Specifically, on accepting the IPC code from the user, the patent score calculation unit 2070 accesses the storage device 3 and acquires the patent data classified under that IPC code. The patent data includes bibliographic information such as the filing date of the patent application and, where priority is claimed, the priority date.

Next, the patent score calculation unit 2070 uses the filing date, the priority date, or similar information in the bibliographic information of the acquired patent data to classify the patent data into groups t, one per predetermined period (in the modification of this embodiment, per filing year, per year to which the priority date belongs, or the like) (S500).
Next, the patent score calculation unit 2070 calculates the evaluation value of each item of patent data (S600). The details of this processing are described with reference to FIG. 20.
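The classification into period groups in S500 could be sketched as follows, assuming one group per calendar year. The function name, the dictionary keys, and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def group_by_year(patents, use_priority_date=False):
    """Group patent IDs by filing year, or by priority year where one is available.

    patents: iterable of dicts with keys 'id', 'filing_date' and, optionally,
             'priority_date' (hypothetical layout).
    """
    groups = defaultdict(list)
    for patent in patents:
        reference = patent["filing_date"]
        if use_priority_date and patent.get("priority_date") is not None:
            reference = patent["priority_date"]
        groups[reference.year].append(patent["id"])
    return dict(groups)


patents = [
    {"id": "p1", "filing_date": date(2005, 3, 1)},
    {"id": "p2", "filing_date": date(2005, 9, 9)},
    {"id": "p3", "filing_date": date(2006, 1, 15), "priority_date": date(2005, 2, 1)},
]
by_filing_year = group_by_year(patents)
by_priority_year = group_by_year(patents, use_priority_date=True)
```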

FIG. 20 is a flowchart showing the details of the processing for calculating the evaluation value of each item of patent data in the modification of the present embodiment.
The patent score calculation unit 2070 acquires the progress information 300 and the content information 400 for the patent data belonging to a group generated by the classification in S500 (S610). Specifically, using the patent ID (publication number, etc.) contained in the bibliographic information of the acquired patent data, it retrieves from the progress information 300 and the content information 400 stored in the storage device 3 the records associated with that patent ID.
In FIG. 20, the acquired group is assumed to consist of J items of patent data, which are distinguished by the subscript j (j = 1, 2, ..., J).
Once the J items of patent data have been acquired, their progress information 300 and content information 400 are used to compute in advance quantities such as the "sum over the J items of the presence/absence data of each evaluation item" used in S6302 to S6304 described later.

Next, the variable j is set to 1 (S620), and the raw evaluation score of patent data j is calculated as follows.

First, the information registered in each field of the progress information 300 is treated as an evaluation item, and for each of the I evaluation items i (i = 1, 2, ..., I), the evaluation score calculation method preset for that item is selected (S6301).

In the modification of the present embodiment there are three evaluation score calculation methods. For the information registered in fields 305, 306, 307, 308, 309, 310, 311, and 312, which indicates whether a predetermined action has been taken with respect to the patent data, S6302 [presence/absence type] is selected. For fields 302, 303, and 304, which hold information relating to time periods of the patent data, S6303 [time decay type] is selected. For field 313, which holds the number of times the patent data has been cited, S6304 [count type] is selected.
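The selection in S6301 amounts to a fixed mapping from field number to calculation method, which might be sketched as below. The enum and function names are hypothetical; the field numbers and step labels are those given above.

```python
from enum import Enum


class Method(Enum):
    PRESENCE = "S6302"    # presence/absence type
    TIME_DECAY = "S6303"  # time decay type
    COUNT = "S6304"       # count type


def select_method(field_number: int) -> Method:
    """Return the evaluation score calculation method preset for a field of the progress information."""
    if 305 <= field_number <= 312:
        return Method.PRESENCE
    if 302 <= field_number <= 304:
        return Method.TIME_DECAY
    if field_number == 313:
        return Method.COUNT
    raise ValueError(f"field {field_number} is not an evaluation item")


methods = {field: select_method(field) for field in range(302, 314)}
```

Field 301 holds the patent data ID rather than an evaluation item, so the sketch rejects it.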

Once the evaluation score calculation methods have been selected, the evaluation score of patent data j is calculated for each of the I evaluation items i (S6302, S6303, S6304).

<6-2-1. Calculation of the evaluation score for the presence/absence type>
For an evaluation item i for which S6302 [presence/absence type] is selected, the evaluation score is calculated by [Equation 3], which divides the presence/absence data of the item by the positive square root of its in-group total:

$$\text{evaluation score of patent data } j \text{ for item } i = \frac{\text{presence/absence data of evaluation item } i}{\sqrt{\sum_{j=1}^{J} \text{presence/absence data of evaluation item } i}}$$

The numerator, the "presence/absence data of evaluation item i", is, for "divisional application" for example, 1 if a divisional application has been filed as described above and 0 otherwise.

The denominator is the positive square root of the in-group total of the "presence/absence data of evaluation item i". The denominator is therefore large when many items of patent data in the group correspond to the evaluation item and small when only a few do. Patents with an evaluation item that applies to few patents (such as "invalidation trial decision to maintain") tend to have a higher maintenance rate after registration than patents with an evaluation item that applies to many patents (such as "file inspection"); a high maintenance rate is generally considered to indicate economic value commensurate with the maintenance cost (patent fees). The evaluation items are thus weighted automatically. In addition, because the totals are taken per group for each predetermined period, the following tendency can be mitigated: older patents tend to have accumulated more progress information, whereas newly published patents often have none yet and would, for that reason alone, receive a low evaluation.
The attribute information of patent data is useful for relative evaluation within the analysis target population, but an appropriate evaluation cannot be made if all patent applications or patent rights in the population are treated alike. According to the present embodiment, the analysis target population is classified into groups by period and the value obtained for each group is used as the denominator, which enables an appropriate relative evaluation within a population containing patent applications or patent rights from different periods.
For example, in a given technical field, a single application in a same-period group with few patent applications is often worth more than a single application in a same-period group with many patent applications. Meanwhile, a patent application published several years ago is inevitably more likely to have acquired progress information, such as a request for file inspection, than one published only recently, but it would be wrong to rate the recently published application low simply for that reason. If only a few patent applications in a same-period group have received, say, a request for file inspection, those that have are attracting exceptional attention and should be rated highly. Conversely, if many patent applications in the same-period group have received such requests, an application should not be rated highly merely because it has received one.
According to the present embodiment, the evaluation score is calculated as the product of a value obtained from the patent attribute information of each item of patent data belonging to a group and the value of a decreasing function of the group-wise total of the values obtained from the patent attribute information of the patent data belonging to that group. With this configuration, a value that reflects the relative position of each item of patent data within its group can be obtained as the evaluation value. As a result, numerical information based on the progress information is weighted more heavily the lower its total within the same-period group and less heavily the higher that total, enabling an appropriate evaluation of the patent applications or patent rights in the analysis target document group.
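The presence/absence-type score can be sketched directly from the description of its numerator and denominator: each flag is divided by the positive square root of the in-group total of the flags. The function name is hypothetical, and returning 0 for a group in which no patent has the item is an assumption made here to avoid division by zero.

```python
import math


def presence_scores(flags):
    """Presence/absence-type scores of one evaluation item for one period group.

    flags: list of 0/1 values, one per patent in the group.
    """
    total = sum(flags)
    if total == 0:
        return [0.0] * len(flags)  # assumption: no patent in the group has the item
    denominator = math.sqrt(total)
    return [flag / denominator for flag in flags]


rare_item = presence_scores([1, 0, 0, 0])    # only one patent in the group qualifies
common_item = presence_scores([1, 1, 1, 1])  # every patent in the group qualifies
```

A patent that is the only one in its group to have the item scores 1.0, whereas each patent scores 0.5 when all four have it, which illustrates the automatic weighting of rare evaluation items.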

<6-2-2. Calculation of the evaluation score for the time decay type>
For an evaluation item i for which S6303 [time decay type] is selected, the evaluation score is calculated by [Equation 4], which divides an exponentially decaying value by the same kind of denominator as in the presence/absence type:

$$\text{evaluation score of patent data } j \text{ for item } i = \frac{\exp\left(-\dfrac{\min(\text{elapsed time}, \text{year limit})}{\text{year limit}}\right)}{\sqrt{\sum_{j=1}^{J} \text{presence/absence data of evaluation item } i}}$$

The numerator, "Exp(−(Min(elapsed time, year limit))/year limit)", is, for the "number of days elapsed since the examination request", Napier's number e raised to the power obtained by taking the smaller of that elapsed time (converted to years) and the "year limit", dividing it by the "year limit", and multiplying by −1. The "year limit" is the maximum number of years from the filing date to the expiration of the patent term (20 years under current Japanese law). The same formula is used for the "number of days elapsed since the registration date", with the "year limit" again being the maximum number of years from the filing date to the expiration of the patent term (20 years under current Japanese law). The same formula is also used for the "number of days elapsed since the filing date", but with the "year limit" set to the number of years from the filing date to the deadline for requesting examination (3 years under current Japanese law). Thus, while the elapsed time is short the numerator is close to Exp(0) = 1; it decays as time passes and falls to Exp(−1) = 1/e once the elapsed time reaches or exceeds the year limit. The advantages of using an exponential function are that it introduces a depreciation effect on value and that it removes the discreteness of the evaluation value distribution, making the distribution smooth. The "number of days elapsed since the examination request", the "number of days elapsed since the filing date", and the "number of days elapsed since the registration date" are basic evaluation items that apply to many patents, and this approach avoids ties among patents to which only these three evaluation items apply.

The denominator has the same form as in S6302 [presence/absence type] above. For the "number of days elapsed since the examination request", it is the positive square root of the in-group total of a value that is, for example, 1 if examination has been requested for the patent application and 0 if not. Likewise, for the "number of days elapsed since the registration date", the denominator is the positive square root of the in-group total of a value that is 1 if the patent right has been registered and 0 if not. For the "number of days elapsed since filing", every item of patent data qualifies, so if the presence/absence data of this evaluation item is taken as 1, the denominator equals the positive square root of the number of items of patent data in the group. In every case, the denominator is large when many items of patent data in the group correspond to the evaluation item and small when only a few do. As noted above, the "number of days elapsed since the examination request", the "number of days elapsed since the filing date", and the "number of days elapsed since the registration date" are basic evaluation items that apply to many patents, so the scores allotted to these items tend to be small.
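The time-decay-type score, before any correction by content information, might be sketched as follows from the numerator and denominator described above. The function name is hypothetical, None stands in for NULL, and assigning 0 to a patent for which the event has not occurred is an assumption made for this illustration.

```python
import math


def time_decay_scores(elapsed_years, year_limit):
    """Time-decay-type scores of one evaluation item for one period group.

    elapsed_years: per-patent elapsed time in years, None where the event
                   (e.g. examination request, registration) has not occurred.
    year_limit:    e.g. 20 for registration or examination request, 3 for filing.
    """
    applicable = sum(1 for t in elapsed_years if t is not None)
    if applicable == 0:
        return [0.0] * len(elapsed_years)
    denominator = math.sqrt(applicable)  # square root of the in-group count of qualifying patents
    return [
        0.0 if t is None else math.exp(-min(t, year_limit) / year_limit) / denominator
        for t in elapsed_years
    ]


# Three registered patents (0, 20 and 30 years ago) and one unregistered patent.
decay = time_decay_scores([0.0, 20.0, 30.0, None], year_limit=20.0)
```

The second and third patents receive the same score because the elapsed time is capped at the year limit, where the numerator has decayed to 1/e.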

The evaluation score calculated in S6303 [time decay type] is further corrected using the content information.
The content information 400 shown in FIG. 18 is used for this purpose.
When evaluation is based on progress information alone, a patent application that has only recently been published, or a patent right that has only recently been registered, lacks the progress information it is expected to acquire in the future and may not be evaluated correctly. To correct for this, content information is taken into account in addition to the evaluation based on progress information. However, content information tends to be less strongly correlated with the maintenance rate than progress information, so taking it into account carelessly could actually reduce the accuracy of the evaluation.
Therefore, so that content information has little influence on the evaluation of patents with ample progress information but is effectively reflected in the evaluation of patents with insufficient progress information, only the evaluation scores calculated in S6303 [time decay type] are multiplied by a correction coefficient based on the content information.
Thus, according to the present embodiment, the content information of each item of patent data can be factored into the period-related information, which by nature tends to be assigned uniformly to any patent data regardless of how old or new the application is. As a result, patent data for new applications to which little progress information has yet been assigned can also be evaluated appropriately.

Specifically, each evaluation point of [Equation 4] above is multiplied by
$a_1 \times a_2 \times a_3$
where
$a_1 = 2^{1/3}$ (when the average number of characters per claim is at or below the average) or $2^{-1/3}$ (when it is at or above the average),
$a_2 = 2^{1/3}$ (when the total number of pages is at or above the average) or $2^{-1/3}$ (when it is at or below the average),
$a_3 = 2^{1/3}$ (when the number of claims is within the mean ± 1 standard deviation) or $2^{-1/3}$ (when it is outside that range).
Setting the maximum of each of $a_1$, $a_2$, and $a_3$ to $2^{1/3}$ caps the correction at the maximum value of $a_1 \times a_2 \times a_3$. In the above embodiment, the value of $a_1 \times a_2 \times a_3$ is at most 2.
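The correction coefficient can be sketched as below. This is an illustrative sketch, not the patented implementation: the function name, parameter names, and sample figures are hypothetical, and because the text uses "at or below" and "at or above" for both branches, the handling of a value exactly equal to the average is an assumption.

```python
def content_correction(avg_chars_per_claim, mean_chars_per_claim,
                       total_pages, mean_pages,
                       num_claims, mean_claims, std_claims):
    """Return a1 * a2 * a3 for one patent, following the three rules in the text."""
    up = 2 ** (1 / 3)
    down = 2 ** (-1 / 3)
    # Assumption: a value exactly equal to the average takes the "up" branch.
    a1 = up if avg_chars_per_claim <= mean_chars_per_claim else down
    a2 = up if total_pages >= mean_pages else down
    # a3 rewards a claim count within one standard deviation of the mean.
    a3 = up if abs(num_claims - mean_claims) <= std_claims else down
    return a1 * a2 * a3

# Hypothetical figures: short claims, long specification, typical claim count.
best = content_correction(100, 150, 20, 15, 8, 9, 3)
# Hypothetical figures: long claims, short specification, unusual claim count.
worst = content_correction(200, 150, 10, 15, 20, 9, 3)
```

Because each factor is either the cube root of 2 or its reciprocal, the product ranges from about 0.5 to about 2, so content information can at most double or halve a time-decay evaluation point.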

<6-2-3. Calculation of evaluation points for the count type>
For an evaluation item i for which S6304 [count type] is selected, the evaluation point is calculated by the following [Equation 5].

The term "f(citation) × log(n_j + 1)" in the numerator is, for the "number of times cited", the logarithm of the cited count n_j plus 1, multiplied by the weight f(citation). The inventors' verification has shown that the maintenance rate of a patent right varies not only with whether the patent has been cited but also with how many times. The two are not proportional, however: the rise in the maintenance rate gradually levels off as the cited count grows, which is why the logarithm is taken.

The denominator is the positive square root of the group total of "f(citation) × log(n_j + 1)". The denominator is therefore large when the group contains many items of patent data cited in other applications, and small when it contains only a few.
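The numerator and denominator described above can be combined into a short sketch. The formula image for [Equation 5] is not reproduced in this text, so the code follows the prose description only; the natural logarithm, the function name `count_type_points`, the default weight, and the zero-denominator fallback are all assumptions.

```python
import math

def count_type_points(cited_counts, weight=1.0):
    """Evaluation points for one group: f * log(n_j + 1) over the square root
    of the group total of the same term."""
    terms = [weight * math.log(n + 1) for n in cited_counts]
    denominator = math.sqrt(sum(terms))
    if denominator == 0:
        # Assumption: if nothing in the group has been cited, every point is 0.
        return [0.0 for _ in terms]
    return [t / denominator for t in terms]

# Hypothetical cited counts for four items of patent data in one group.
points = count_type_points([0, 1, 3, 7])
```

With these sample counts, going from 1 to 3 citations and from 3 to 7 citations adds the same increment to the point, which illustrates the levelling-off that motivates the logarithm.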

In the numerator and denominator of [Equation 5], the weight f(citation) may be any positive number. Here, however, the number of times cited in other companies' patent applications (other-company citation count) n_j,other is distinguished from the number of times cited in the company's own other patent applications (self-citation count) n_j,self, and a different weight is applied to the logarithm of each. In this case, the following [Equation 6] is used in place of [Equation 5].

As specific weights, the ratio of f(citation_other) for other-company citations to f(citation_self) for self-citations was set to 1:2.
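A sketch of the split weighting follows. The formula image for [Equation 6] is not reproduced in this text, so the code assumes that the two weighted logarithms are added and that the sum replaces the single term of [Equation 5]; that form, the natural logarithm, the absolute weight values 1.0 and 2.0 (only the 1:2 ratio comes from the text), and all names are assumptions.

```python
import math

F_OTHER = 1.0  # weight for other-company citations (ratio 1:2 from the text)
F_SELF = 2.0   # weight for self-citations

def citation_term(n_other, n_self):
    """Assumed combined term: weighted logs of the two citation counts, summed."""
    return F_OTHER * math.log(n_other + 1) + F_SELF * math.log(n_self + 1)

def split_citation_points(counts):
    """counts: list of (n_other, n_self) pairs for one group."""
    terms = [citation_term(n_other, n_self) for n_other, n_self in counts]
    denominator = math.sqrt(sum(terms))
    if denominator == 0:
        # Assumption: an uncited group yields 0 for every item.
        return [0.0] * len(terms)
    return [t / denominator for t in terms]

# Hypothetical group: 3 other-company citations, 3 self-citations, 1 of each.
points = split_citation_points([(3, 0), (0, 3), (1, 1)])
```

Under these assumptions, three self-citations earn twice the point of three other-company citations.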

The cited count correlates strongly with the value of a patent. Moreover, the inventors' verification compared the number of times a patent is cited in the examination of other companies' patent applications (other-company citation) with the number of times it is cited in the examination of the company's own other patent applications (self-citation), and found that the latter correlates significantly more strongly with patent value. This is presumably because an invention cited in the examination of the company's own other applications is often a basic invention at the core of the technology the company practices. In such cases the company, aware that it had already filed for the basic invention, very likely filed for improvements as well in order to build a strong patent portfolio.
According to this embodiment, the cited count is split into other-company citations and self-citations, and the latter is reflected more heavily in the evaluation value, so that patent applications and patent rights can be evaluated appropriately.

<6-2-4. Calculation of the raw evaluation score>
Once the evaluation points of patent data j have been calculated for all evaluation items i (i = 1, 2, ..., I), the raw evaluation score of patent data j is calculated from them by the following [Equation 7] (S640).

As this equation shows, the raw evaluation score is either the positive square root of the sum of squares of the I evaluation points, or 0. The raw score is 0 when the patent application has lapsed (no request for examination was filed by the deadline, the application was withdrawn or abandoned, a decision of refusal became final, or the application otherwise lapsed) or when the patent right has been extinguished (a revocation decision in an opposition or an invalidation decision in an invalidation trial became final, the patent right was abandoned, its term expired, or it was otherwise extinguished). This information is also read from the progress information of each item of patent data, and the raw score is set to 0 where applicable.
As described above, the evaluation points calculated in S6303 [time decay type] are corrected using content information. Specifically, the evaluation points calculated by [Equation 4] from "days elapsed since the request for examination", "days elapsed since the filing date", and "days elapsed since the registration date" are each multiplied by the above a_1 × a_2 × a_3, and the square root of the sum of squares is then taken according to [Equation 7].
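The raw score calculation can be sketched as follows, based on the prose description of [Equation 7] (the formula image itself is not reproduced in this text). The function name, the dictionary layout, the item names, and the sample numbers are hypothetical.

```python
import math

def raw_score(points, time_decay_items=(), correction=1.0, lapsed=False):
    """Square root of the sum of squared evaluation points, or 0 if lapsed.

    points: mapping from evaluation item name to evaluation point.
    time_decay_items: names of items calculated in the time decay step;
        only these are multiplied by the content correction a1 * a2 * a3.
    lapsed: True if the application has lapsed or the right is extinguished.
    """
    if lapsed:
        return 0.0
    total = 0.0
    for item, point in points.items():
        if item in time_decay_items:
            point *= correction
        total += point ** 2
    return math.sqrt(total)

# Hypothetical points for one item of patent data.
sample = {"days_since_filing": 1.5, "cited": 4.0}
corrected = raw_score(sample, time_decay_items={"days_since_filing"}, correction=2.0)
uncorrected = raw_score(sample)
dead = raw_score(sample, lapsed=True)
```

In this sample the correction doubles the time-decay point from 1.5 to 3.0 before squaring, while the citation point is left untouched.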

One way to calculate the raw score from the evaluation points i of multiple evaluation items is to sum the evaluation points (simple sum method). With this method, a patent carrying many items of progress information that correlate with the patent maintenance rate (economic value) receives a high evaluation, so taking the sum of the evaluation points as the raw score seems reasonable at first glance. Care is needed, however: a patent carrying many items of progress information that correlate only weakly with the maintenance rate (many low evaluation points added together) can end up with a higher raw score than a patent carrying a few items that correlate very strongly with it.
One way to address this problem is to take the maximum of the evaluation points i as the raw score (maximum value method). When the correlation between a given item of progress information and the maintenance rate of a patent group is examined without regard to what other progress information is attached, the maintenance rate of a patent is expected to be best represented by that of the progress information with the highest maintenance rate, so taking the maximum evaluation point as the raw score also seems reasonable at first glance. However, two patents whose maximum evaluation points are equal cannot be ranked against each other. In addition, the maximum value method cannot produce an evaluation that takes account of the viewpoints of three different parties (the applicant, the patent office, and competitors): only one party's viewpoint is reflected, and the viewpoints of the others are left out of the evaluation of the patent data.
The method described above, taking the square root of the sum of squares, combines the advantages of the simple sum method and the maximum value method. When one of the I evaluation items for patent data j has a high evaluation point, that point strongly influences the raw score, while the evaluation points of the other items are still taken into account to some degree. Accordingly, patent data j that falls under several of the items that tend to yield high evaluation points, such as "accelerated examination", "decision to maintain in an opposition", and "trial decision to maintain in an invalidation trial", can be given an outstandingly high raw score.
Thus, in this modification, the patent is evaluated taking into account all the evaluation points calculated according to the type of patent attribute information (S630, S640). As a result, the value of patent data can be evaluated from multiple angles.
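The trade-off among the three aggregation methods can be checked with a small comparison. The sample evaluation points below are invented purely for illustration, and the function names are hypothetical.

```python
import math

def simple_sum(points):
    return sum(points)

def maximum_value(points):
    return max(points)

def root_sum_square(points):
    return math.sqrt(sum(p * p for p in points))

# Invented evaluation points for three patents.
many_weak = [1.0] * 6      # six weakly correlated items
one_strong = [5.0]         # one strongly correlated item
strong_plus = [5.0, 2.0]   # the same strong item plus one more

for name, method in [("sum", simple_sum),
                     ("max", maximum_value),
                     ("rss", root_sum_square)]:
    print(name, [round(method(p), 3) for p in (many_weak, one_strong, strong_plus)])
```

In this sample the simple sum ranks the six weak items above the single strong one, the maximum cannot separate the last two patents, and the root of the sum of squares ranks all three in the intended order.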

<6-2-5. Calculation of the evaluation value>
Once the raw score has been calculated, its logarithm is taken as the evaluation value of patent data j (S650).
Evaluation values calculated from progress information or content information are high for the few patent applications or patent rights that show distinctive progress or content, and low for the great majority of others. Looking at the distribution of counts by evaluation value, highly rated patent applications or patent rights are few and sparsely distributed, while low-rated ones are numerous and densely packed.
In such a case the mean (arithmetic mean) is strongly affected by the small number of highly rated patent applications or patent rights, so care is needed when evaluating by comparison with such a mean. Also, when two highly rated patent applications or patent rights are compared, a numerically large difference between their evaluation values may not actually be a significant one.
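The effect of taking the logarithm can be illustrated as below. The text does not state the base of the logarithm or how a raw score of 0 is handled, so the natural logarithm and the `None` result for a zero raw score are assumptions; the function name and sample scores are hypothetical.

```python
import math

def evaluation_value(raw):
    """Logarithm of the raw score (S650).

    Assumption: a raw score of 0 (lapsed application or extinguished right)
    has no logarithm, so None is returned for the caller to handle.
    """
    if raw <= 0:
        return None
    return math.log(raw)

# Two hypothetical highly rated patents whose raw scores differ by a factor of 2.
gap_raw = 80.0 - 40.0
gap_log = evaluation_value(80.0) - evaluation_value(40.0)
```

A raw-score gap of 40 shrinks to about 0.69 after the logarithm, so a few very high raw scores pull the mean far less than they would on the raw scale.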

Next, it is determined whether evaluation values have been calculated for all patent data j (S660). If not (S660: NO), the process proceeds to S670, sets the variable j to j + 1, and returns to S630 to calculate the evaluation value for the next item of patent data.
When evaluation values have been calculated for all patent data j (S660: YES), the evaluation value calculation for the patent data belonging to the group ends.
Thus, in this embodiment, items of patent data with differing characteristics are evaluated taking into account the characteristics of each technical field and each filing period. As a result, the value of patent data can be evaluated more appropriately.

The evaluation value calculation of S610 to S670 is executed for every group t obtained by classifying, in S500, the patent data acquired in S400.
Once evaluation values have been calculated for all groups t, the process returns to FIG. 19, and the deviation value within the analysis target population acquired in S400 is calculated from these evaluation values as the patent score PS (S700). This deviation value also makes possible a relative comparison of patent data across different technical fields, which would otherwise be difficult (that is, a comparison with an analysis target population separately selected in S400 using a different IPC).
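The deviation value step (S700) can be sketched as follows. This passage does not give the formula, so the sketch assumes the conventional definition of a deviation value, 50 plus 10 times the standardized score, with the population standard deviation; the function name, the fallback for zero spread, and the sample values are hypothetical.

```python
import statistics

def patent_scores(evaluation_values):
    """Deviation values within the analysis target population.

    Assumption: PS = 50 + 10 * (value - mean) / population standard deviation.
    """
    mean = statistics.fmean(evaluation_values)
    spread = statistics.pstdev(evaluation_values)
    if spread == 0:
        # Assumption: identical evaluation values all map to the midpoint.
        return [50.0 for _ in evaluation_values]
    return [50.0 + 10.0 * (v - mean) / spread for v in evaluation_values]

# Hypothetical evaluation values for a population of five patents.
ps = patent_scores([1.0, 2.0, 3.0, 4.0, 5.0])
```

Under this definition every population has mean 50 and standard deviation 10, which is what allows scores from populations in different technical fields to be placed on a common scale.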

In the modification of this embodiment, the technical element score is calculated from the patent score PS obtained by the above procedure, which gives the following advantages over the embodiment described earlier.
Specifically, in the modification, the patent score PS on which the technical element score is based takes into account weights according to the type of progress information. Because the technical element score is obtained from this patent score PS, the modification yields a more accurate score.
With the patent score of this modification, the analysis target population is classified into groups by period, and the value obtained for each group is used as the denominator. This allows appropriate relative evaluation within an analysis target population containing patent applications or patent rights from different periods.
This reduces the possibility that a high evaluation value is calculated for the technical element score of a factor into which many items of patent data from old applications have been classified.

Claims

A document group analysis apparatus comprising:
text data acquisition means for acquiring a plurality of technical documents represented by text data;
weighting amount calculation means for obtaining, for each acquired technical document, a weighting amount of each index word;
calculation means for performing, with each acquired technical document as a subject and using the obtained weighting amount of each index word, a factor analysis in which each index word is an observed variable, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each technical document;
attribution factor determination means for determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document; and output means for outputting, for each factor, the index word or index word group belonging to that factor together with data of the technical document or technical document group belonging to the corresponding factor.

The document group analysis apparatus according to claim 1,
wherein the attribution factor determination means
selects, for each index word, the factor with the largest calculated factor loading and specifies the selected factor as the attribution factor of that index word, and selects, for each technical document, the factor with the largest calculated factor score and specifies the selected factor as the attribution factor of that technical document.

The document group analysis apparatus according to claim 1 or 2,
further comprising important index word extraction means for obtaining the appearance frequency of the index words contained in the plurality of technical documents, calculating the importance of each index word using the appearance frequency, and extracting a predetermined number of index words of highest importance using the calculated importance,
wherein the weighting amount calculation means obtains, as the weighting amount of each index word, the weighting amounts of the predetermined number of index words of highest importance.

The document group analysis apparatus according to any one of claims 1 to 3,
further comprising factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor,
wherein the output means outputs the factor evaluation value of the technical document or technical document group as the data of the technical document or technical document group.

The document group analysis apparatus according to any one of claims 1 to 3,
wherein the technical documents are patent documents including published patent applications and patent gazettes,
the apparatus further comprising progress information acquisition means for acquiring progress information of each patent document, and
factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor, using the progress information of the technical document or technical document group belonging to the factor,
wherein the output means outputs the factor evaluation value as the data of the technical document or technical document group.

The document group analysis apparatus according to claim 5,
further comprising document number determination means for determining the number of documents in the technical document or technical document group belonging to each factor,
wherein the factor evaluation value calculation means
calculates, for each factor, a first index by applying a predetermined weight to the number of documents in the technical document or technical document group belonging to the factor, calculates a second index, and calculates the factor evaluation value of the factor using the calculated first index and second index.

The document group analysis apparatus according to claim 6,
wherein the progress information includes the number of citations by other companies, the number of patent oppositions, the number of requests for patent invalidation trials, whether a request for examination has been filed, and whether the establishment of the patent right has been registered,
the first index is a value weighted using at least one of the total number of citations by other companies, the total number of patent oppositions, and the total number of requests for patent invalidation trials, and
the second index, which is an index of the progress information, is a value obtained by indexing at least one of the total number of citations by other companies, the total number of patent oppositions, the total number of requests for patent invalidation trials, the examination request rate, and the registration decision rate.

The document group analysis apparatus according to any one of claims 5 to 7,
The document group analysis apparatus characterized in that the output means calculates the factor evaluation value for each factor and for each applicant.

A data analysis method performed by an information processing apparatus,
wherein the information processing apparatus performs the steps of:
acquiring a plurality of technical documents represented by text data;
obtaining, for each acquired technical document, a weighting amount of each index word;
performing, with each acquired technical document as a subject and using the obtained weighting amount of each index word, a factor analysis in which each index word is an observed variable, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each technical document; and
determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document.

A program for causing an information processing apparatus to execute data analysis processing,
the program causing the information processing apparatus to execute:
a process of acquiring a plurality of technical documents represented by text data;
a process of obtaining, for each acquired technical document, a weighting amount of each index word;
a process of performing, with each acquired technical document as a subject and using the obtained weighting amount of each index word, a factor analysis in which each index word is an observed variable, calculating a factor loading for each factor for each index word, and calculating a factor score for each factor for each technical document; and
a process of determining the attribution factor of each index word using the factor loading of each index word, and determining the attribution factor of each technical document using the factor score of each technical document.

The document group analysis apparatus according to any one of claims 1 to 3,
wherein the technical documents are patent documents including published patent applications and patent gazettes,
the apparatus further comprising means for acquiring, for each acquired patent document, a patent score that individually evaluates the value of the patent document, and
factor evaluation value calculation means for calculating, for each factor, a factor evaluation value indicating a technical evaluation of the factor, using the patent scores of the patent documents belonging to the factor.

The document group analysis apparatus according to claim 11,
wherein the factor evaluation value calculation means
selects, for each factor, the patent scores at or above a predetermined threshold from among the patent scores of the patent documents belonging to the factor, and calculates a value obtained by aggregating the selected patent scores as the factor evaluation value.

The document group analysis apparatus according to claim 12,
wherein the patent score is a value standardized within the document group of the population that includes the factor for which the factor evaluation value is calculated.

The document group analysis apparatus according to any one of claims 11 to 13,
wherein the patent score is a value calculated for each patent document by classifying the patent documents into groups by technical field and by predetermined period and, for each classified group, using the progress information of the patent documents belonging to the group.