JP6053131B2

JP6053131B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6053131B2
Application number: JP2012266084A
Authority: JP
Inventors: 稔也鶴原; 明洋東; 芳文鈴木; 義則三木; 由紀子中村; 美里酒井; 谷川　英和; 英和谷川
Original assignee: 株式会社アイ・アール・ディー
Priority date: 2012-12-05
Filing date: 2012-12-05
Publication date: 2016-12-27
Anticipated expiration: 2032-12-05
Also published as: JP2014112283A

Description

The present invention relates to an information processing apparatus and related techniques that select patent documents using the screening results of a patent search.

Devices for managing the results of patent searches have been developed (see, for example, Patent Document 1). Apparatuses that create a search expression for collecting patent documents similar to a specific patent document have also been developed (see, for example, Patent Document 2).

Patent Document 1: JP 2007-242004 A
Patent Document 2: JP 2011-59843 A

However, in conventional patent searches, related and unrelated patents are sorted manually, which is very time-consuming. Furthermore, even when a patent document has been wrongly classified as unrelated, it is rarely reconsidered, so the incorrect classification result remains.

An information processing apparatus according to a first aspect of the present invention includes: a related patent identification information storage unit that stores one or more items of related patent identification information, each identifying a related patent document, that is, a patent document judged to be related as a result of the screening work in a patent search; a search expression generation unit that uses elements contained in the patent documents identified by the stored related patent identification information to generate a search expression capable of retrieving those patent documents; a search patent identification information acquisition unit that acquires search patent identification information identifying the search patent documents, that is, the patent documents retrieved with the generated search expression; a feature vector acquisition unit that acquires one or more related patent feature vectors, which are the feature vectors of the related patent documents identified by the stored related patent identification information, and one or more search patent feature vectors, which are the feature vectors of the search patent documents identified by the acquired search patent identification information; a selection unit that selects, from among the search patent feature vectors, at least the search patent identification information that corresponds to a search patent feature vector similar to the features of the one or more related patent feature vectors and that does not match any related patent identification information stored in the related patent identification information storage unit; and an output unit that outputs information on the result of the selection.

With this configuration, the information processing apparatus can search for patent documents similar to the related patent documents picked out in the patent search and determine whether each of those similar documents is a related patent document. For example, the apparatus can efficiently collect, from among the patent documents that were not examined during the screening work, those similar to the related patent documents, and pick out related patents from them.
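The selection step of the first aspect can be sketched as follows. This is a minimal Python illustration under assumptions the patent does not state: feature vectors are sparse dictionaries mapping an element to a weight, "similar" means a cosine similarity at or above a hypothetical threshold of 0.5 against at least one related patent feature vector, and the function names are invented for illustration.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {element: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_new_candidates(
    related_vecs: dict[str, dict[str, float]],
    search_vecs: dict[str, dict[str, float]],
    threshold: float = 0.5,  # hypothetical similarity cut-off
) -> list[str]:
    """IDs of search hits similar to some related document that are not
    already stored as related patent identification information."""
    selected = []
    for doc_id, vec in search_vecs.items():
        if doc_id in related_vecs:
            continue  # already judged related during screening
        best = max((cosine(vec, r) for r in related_vecs.values()), default=0.0)
        if best >= threshold:
            selected.append(doc_id)
    return selected
```

Cosine similarity is only one reasonable choice; any measure that ranks documents sharing many weighted elements as close would fit the description equally well.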

An information processing apparatus according to a second aspect is the apparatus of the first aspect, further including an evaluation unit that gives a lower evaluation as the number of items of search patent identification information selected by the selection unit that do not match the related patent identification information stored in the related patent identification information storage unit increases; the output unit outputs the result of the evaluation by the evaluation unit.

With this configuration, the information processing apparatus can give a lower evaluation the more patent documents there are that resemble related patent documents but were not examined during the screening work in the patent search. For example, the evaluation drops as the number of related patents left out of the search scope grows.
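The second aspect only requires the evaluation to fall as the number of selected-but-unscreened documents grows. One hypothetical scoring rule with that property, assumed here purely for illustration, is the share of presumably relevant documents that the original screening actually captured:

```python
def evaluate_screening(num_related: int, num_missed: int) -> float:
    """Hypothetical score in [0, 100].

    num_related: related patent documents found by the original screening.
    num_missed:  search hits picked out by the selection unit that are not
                 among the related documents (similar, but never screened).
    """
    total = num_related + num_missed
    if total == 0:
        return 100.0  # nothing found and nothing missed: no penalty
    return 100.0 * num_related / total
```

With 8 related documents, 2 missed candidates give 80.0 and 4 give about 66.7, so the score decreases as missed candidates accumulate.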

In an information processing apparatus according to a third aspect, in the first or second aspect, the selection unit also selects search patent identification information that corresponds to a search patent feature vector similar to the features of the one or more related patent feature vectors and that corresponds to a patent document identified by related patent identification information stored in the related patent identification information storage unit.

With this configuration, the related patent documents picked out in the patent search also become candidates for selection. For example, the information processing apparatus can leave unselected a patent document that was wrongly picked out as a related patent during the screening work.

In an information processing apparatus according to a fourth aspect, in the second aspect, the selection unit also selects search patent identification information that corresponds to a search patent feature vector similar to the features of the one or more related patent feature vectors and that corresponds to a patent document identified by related patent identification information stored in the related patent identification information storage unit; and the evaluation unit gives a lower evaluation as the number increases of items of stored related patent identification information that were acquired as search patent identification information by the search patent identification information acquisition unit but do not match any search patent identification information selected by the selection unit.

With this configuration, the related patent documents picked out in the patent search are also subject to evaluation. For example, the evaluation drops as the number of patent documents wrongly picked out as related patent documents grows.
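Under the third and fourth aspects, related documents are themselves run through the selection, and related documents that the search retrieved but the selection unit did not pick out count against the screening. A minimal sketch, assuming the selection result is already available as a set of IDs and using a hypothetical proportional penalty (the function names are illustrative):

```python
def doubtful_related(
    related_ids: set[str], search_ids: set[str], selected_ids: set[str]
) -> list[str]:
    """Related IDs that the search expression retrieved but the selection
    unit did not select, i.e. documents possibly screened as related by mistake."""
    return sorted((related_ids & search_ids) - selected_ids)


def evaluate_related(
    related_ids: set[str], search_ids: set[str], selected_ids: set[str]
) -> float:
    """Hypothetical score in [0, 100] that falls as doubtful related IDs grow."""
    if not related_ids:
        return 100.0
    doubtful = doubtful_related(related_ids, search_ids, selected_ids)
    return 100.0 * (1 - len(doubtful) / len(related_ids))
```

Only related documents that the search actually retrieved are counted, which matches the fourth aspect's condition that the related identification information was also acquired as search patent identification information.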

An information processing apparatus according to a fifth aspect is the apparatus of any one of the first to fourth aspects, further including an unrelated patent identification information storage unit that stores one or more items of unrelated patent identification information, each identifying an unrelated patent document, that is, a patent document judged not to be related as a result of the screening work in a patent search; the search expression generation unit additionally uses elements contained in the patent documents identified by the stored unrelated patent identification information to generate a search expression that does not retrieve at least some of those unrelated patent documents.

With this configuration, the information processing apparatus can generate a search expression that takes unrelated patents into account. For example, it can keep unnecessary patent documents out of the search results.
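One way the fifth aspect could be realized is sketched below, assuming each document has already been reduced to a set of terms. The positive part is a greedy set cover over the related documents, so every related document contains at least one OR-ed term; the NOT part lists terms that occur in unrelated documents but in no related document, so it can exclude unrelated documents without ever excluding a related one. The greedy cover, the term-set representation, and the `(…) NOT (…)` query syntax are all assumptions for illustration, not the procedure stated in the patent.

```python
def build_query(related: dict[str, set[str]], unrelated: dict[str, set[str]]) -> str:
    """Build '(t1 OR t2 ...) NOT (u1 OR u2 ...)' from per-document term sets."""
    uncovered = set(related)
    positive: list[str] = []
    while uncovered:
        # Greedy set cover: take the term found in the most uncovered related docs.
        counts: dict[str, int] = {}
        for doc_id in uncovered:
            for term in related[doc_id]:
                counts[term] = counts.get(term, 0) + 1
        if not counts:
            break  # remaining related docs have no terms and cannot be covered
        term = max(sorted(counts), key=counts.__getitem__)  # alphabetical tie-break
        positive.append(term)
        uncovered = {d for d in uncovered if term not in related[d]}

    related_terms = set().union(*related.values())
    unrelated_terms = set().union(*unrelated.values())
    # Terms absent from every related document are safe to negate.
    negative = sorted(unrelated_terms - related_terms)

    query = "(" + " OR ".join(positive) + ")"
    if negative:
        query += " NOT (" + " OR ".join(negative) + ")"
    return query
```

In the test data, the unrelated document U1 contains the related term "battery" but is still excluded because it also contains "vehicle", which no related document uses.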

In an information processing apparatus according to a sixth aspect, in the fifth aspect, the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are the feature vectors of the unrelated patent documents identified by the stored unrelated patent identification information, and the selection unit selects, from among the one or more search patent feature vectors, the search patent identification information corresponding to search patent feature vectors that are not similar to the features of the one or more unrelated patent feature vectors.

With this configuration, the information processing apparatus can take unrelated patents into account when determining whether a search patent document is a related patent. For example, by also using the unrelated patent feature vectors, it can make a more accurate selection.

An information processing apparatus according to a seventh aspect is the apparatus of any one of the first to fourth aspects, further including an unrelated patent identification information storage unit that stores one or more items of unrelated patent identification information, each identifying an unrelated patent document judged not to be related as a result of the screening work in a patent search; the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are the feature vectors of the unrelated patent documents identified by the stored unrelated patent identification information, and the selection unit selects, from among the one or more search patent feature vectors, the search patent identification information corresponding to search patent feature vectors that are not similar to the features of the one or more unrelated patent feature vectors.

With this configuration, as in the sixth aspect, the information processing apparatus can take unrelated patents into account when determining whether a search patent document is a related patent, and the additional use of unrelated patent feature vectors makes the selection more accurate.
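The sixth and seventh aspects add a second test to the selection: a search hit should resemble the related documents and should not resemble the unrelated ones. A minimal sketch, again assuming sparse dictionary vectors, cosine similarity, and a single hypothetical threshold of 0.5 for both tests:

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {element: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_with_unrelated(
    related_vecs: dict[str, dict[str, float]],
    unrelated_vecs: dict[str, dict[str, float]],
    search_vecs: dict[str, dict[str, float]],
    threshold: float = 0.5,  # hypothetical cut-off used for both tests
) -> list[str]:
    """Search IDs similar to some related vector and to no unrelated vector."""
    selected = []
    for doc_id, vec in search_vecs.items():
        if doc_id in related_vecs:
            continue  # already judged related during screening
        near_related = max((cosine(vec, r) for r in related_vecs.values()), default=0.0)
        near_unrelated = max((cosine(vec, u) for u in unrelated_vecs.values()), default=0.0)
        if near_related >= threshold and near_unrelated < threshold:
            selected.append(doc_id)
    return selected
```

A document that looks like both groups (S2 in the test data) is dropped, which is how the unrelated vectors sharpen the selection.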

In an information processing apparatus according to an eighth aspect, in the sixth or seventh aspect, the selection unit also selects, from among the one or more unrelated patent feature vectors, the unrelated patent identification information corresponding to unrelated patent feature vectors that are similar to the features of the one or more related patent feature vectors and not similar to the features of the one or more unrelated patent feature vectors.

With this configuration, patent documents picked out as unrelated in the patent search also become candidates for selection. For example, the information processing apparatus can select patent documents that were classified as unrelated during the screening work but may not actually be unrelated.
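The eighth aspect re-examines the unrelated documents themselves. Because every unrelated vector is trivially identical to itself, this sketch interprets "not similar to the unrelated patent feature vectors" as "not similar to the *other* unrelated vectors"; that interpretation, the cosine measure, and the 0.5 threshold are assumptions.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {element: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recheck_unrelated(
    related_vecs: dict[str, dict[str, float]],
    unrelated_vecs: dict[str, dict[str, float]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> list[str]:
    """Unrelated IDs that look like related documents and unlike the other
    unrelated documents, i.e. candidates for a screening mistake."""
    flagged = []
    for doc_id, vec in unrelated_vecs.items():
        near_related = max((cosine(vec, r) for r in related_vecs.values()), default=0.0)
        # Skip the document itself: it would always have similarity 1.0.
        near_other_unrelated = max(
            (cosine(vec, u) for other_id, u in unrelated_vecs.items() if other_id != doc_id),
            default=0.0,
        )
        if near_related >= threshold and near_other_unrelated < threshold:
            flagged.append(doc_id)
    return flagged
```

Under the ninth aspect, the evaluation unit would then lower its evaluation as `len(recheck_unrelated(...))` grows.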

An information processing apparatus according to a ninth aspect is the apparatus of the second or third aspect, further including an unrelated patent identification information storage unit that stores one or more items of unrelated patent identification information, each identifying an unrelated patent document judged not to be related as a result of the screening work in a patent search; the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are the feature vectors of the unrelated patent documents identified by the stored unrelated patent identification information; the selection unit also selects, from among the one or more unrelated patent feature vectors, the unrelated patent identification information corresponding to unrelated patent feature vectors that are similar to the features of the one or more related patent feature vectors and not similar to the features of the one or more unrelated patent feature vectors; and the evaluation unit gives a lower evaluation as the number of items of unrelated patent identification information selected by the selection unit increases.

With this configuration, patent documents picked out as unrelated in the patent search are also subject to evaluation. For example, the evaluation drops as the number grows of patent documents that were classified as unrelated during the screening work but may not actually be unrelated.

In an information processing apparatus according to a tenth aspect, in any one of the first to ninth aspects, the feature vector acquisition unit acquires each patent document identified by the information identifying it, extracts one or more elements from each patent document, and obtains each feature vector from those elements.

With this configuration, the information processing apparatus can build feature vectors from the elements written in the patent documents. For example, it can obtain feature vectors without relying on an external server or the like.
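A minimal sketch of the tenth aspect, assuming a document is a dictionary with hypothetical `abstract`, `claims`, and `ipc` fields, that elements are lowercase English words plus IPC codes, and that weights are raw term frequencies. Japanese text would need a morphological analyzer rather than the regular expression used here, and the stop-word list is illustrative only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a curated one.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "with", "is"}


def feature_vector(doc: dict) -> Counter:
    """Term-frequency vector over text fields, plus IPC codes as extra elements."""
    vec: Counter = Counter()
    for field in ("abstract", "claims"):  # hypothetical field names
        words = re.findall(r"[a-z]+", doc.get(field, "").lower())
        vec.update(w for w in words if w not in STOPWORDS)
    vec.update("IPC:" + code for code in doc.get("ipc", []))
    return vec
```

Because everything is computed from the document's own text and codes, no external server is needed, which is the benefit the tenth aspect names.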

In an information processing apparatus according to an eleventh aspect, in any one of the first to tenth aspects, the feature vector acquisition unit obtains each feature vector by treating similar elements extracted from the patent documents as corresponding to the same vector element.

With this configuration, the information processing apparatus can obtain feature vectors in which different expressions with the same meaning, among the elements described in the patent documents, map to the same vector element. For example, similar feature vectors can be obtained from similar documents written with different wording.
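The eleventh aspect can be sketched as a lookup applied before counting. The table below stands in for the contents of the similar element storage unit 107; its entries are hypothetical examples, and a real table would be far larger and might group multi-word expressions.

```python
from collections import Counter

# Hypothetical contents of the similar element storage unit 107:
# each variant expression maps to one representative vector element.
SIMILAR_ELEMENTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "accumulator": "battery",
}


def normalized_vector(tokens: list[str], similar: dict[str, str] = SIMILAR_ELEMENTS) -> Counter:
    """Count tokens after mapping similar elements onto a shared vector element."""
    return Counter(similar.get(t, t) for t in tokens)
```

Two documents that say "car accumulator" and "automobile battery" then produce the same vector.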

The information processing apparatus and related techniques according to the present invention can select a patent document when it is a related patent.

FIG. 1: Conceptual diagram of a system including the information processing apparatus according to Embodiment 1
FIG. 2: Block diagram of the information processing apparatus in the embodiment
FIG. 3: Flowchart showing the operation of the information processing apparatus in the embodiment
FIG. 4: Flowchart showing the operation of the information processing apparatus for generating a search expression
FIG. 5: Flowchart showing the operation of the information processing apparatus for acquiring feature vectors
FIG. 6: Flowchart showing the operation of the information processing apparatus for selecting related patent documents
FIG. 7: Example of patent documents stored in the patent document storage unit in the embodiment
FIG. 8: Example of similar elements stored in the similar element storage unit in the embodiment
FIG. 9: Example of the information identifying patent documents stored in each patent identification information storage unit in the embodiment
FIG. 10: Example of the feature vectors of the patent documents acquired by the feature vector acquisition unit in the embodiment
FIG. 11: Example of a display of the information processing apparatus in the embodiment
FIG. 12: Diagram explaining the relationships among the patent documents in the embodiment
FIG. 13: Example of the external appearance of the computer system in the above embodiment
FIG. 14: Example of the configuration of the computer system in the above embodiment

Embodiments of the information processing apparatus and related techniques are described below with reference to the drawings. Components given the same reference signs in the embodiments operate in the same way, so repeated descriptions may be omitted.

(Embodiment 1)
This embodiment describes an information processing apparatus 1 that selects related patents using the results of the screening work in a patent search.

FIG. 1 is a conceptual diagram of a system including the information processing apparatus 1 according to this embodiment. In FIG. 1, the information processing apparatus 1 and one or more user terminals 2 are connected via a network 100. The network 100 is a wired or wireless communication line, such as the Internet, an intranet, a LAN (Local Area Network), or a public telephone line. The user terminal 2 may be any terminal that can connect to the network 100, for example a desktop personal computer, a notebook personal computer, a smartphone, or a PDA.

FIG. 2 is a block diagram of the information processing apparatus 1 in this embodiment. The information processing apparatus 1 includes a reception unit 101, a related patent identification information storage unit 102, an unrelated patent identification information storage unit 103, a search expression generation unit 104, a patent document storage unit 105, a search patent identification information acquisition unit 106, a similar element storage unit 107, a feature vector acquisition unit 108, a selection unit 109, an evaluation unit 110, and an output unit 111.

The reception unit 101 receives related patent identification information, which identifies one or more related patent documents. The reception unit 101 may also receive unrelated patent identification information, which identifies one or more unrelated patent documents; the search expression that was used to retrieve the related and unrelated patent documents; and the type of patent search, which may be given as information identifying that type. A related patent document is a patent document related to the technology, invention, or other subject under investigation; an unrelated patent document is one that is not. The related and unrelated patent identification information received by the reception unit 101 usually identifies patent documents judged to be related or unrelated as a result of manual screening in a patent search, but it may instead be the result of automatic screening by the information processing apparatus 1 or another device.

A patent document is information on application documents, such as a patent application filed with a patent office. Types of patent documents include published unexamined patent applications, patent gazettes, published utility model applications, registered utility model gazettes, published Japanese translations of PCT applications for patents and utility models, and domestic re-publications of PCT applications for patents and utility models. The issuing country, whether Japan, the United States, China, Europe, Korea, or elsewhere, does not matter, nor do the language or data format of the patent documents. The information identifying a patent document may be anything that uniquely identifies a single patent document, for example the patent document itself or a patent document ID. The patent document ID may be an application number, a publication number, a registration number, or the like, or an ID managed by the storage device in which the patent document is stored.

A patent search is an investigation of documents related to a particular technology, invention, or the like. There are several types of patent search depending on the purpose, for example prior art searches, infringement prevention searches, and invalidity searches. A prior art search investigates whether prior art exists before a patent application is filed. An infringement prevention search investigates, before goods or services are brought to market, whether they would infringe another party's patents. An invalidity search looks for material with which to invalidate another party's registered patent. The term patent search may refer only to the work of sorting related from unrelated patent documents; it may also cover building search expressions and collecting the patent documents to be screened, and may further include analyzing the subject of investigation.

A search expression is information used to identify patent documents stored in a database or the like. It contains elements that narrow down the search target. It may further contain logical operators that relate two or more elements or negate an element, and it may contain block delimiter elements, which allow one or more logical operators and one or more elements to be treated as a single block. The data structure, data format, and definition method of the search expression are not limited.

An element may be a term, a patent classification, information that narrows down bibliographic information, or a block element (described later). A term is preferably a keyword that specifies the subject of investigation, but may be any term. A term may be associated with a search field, that is, the part of the document in which the term is searched for, such as "Abstract", "Claims", "Abstract + Claims", or "Full text". A patent classification may be any information that classifies patent documents and can be used in searching, for example any of the IPC, FI, F-term, US class, ECLA, or PCP codes, or part of such a code. For IPC, part of a code may be the section, class, subclass, main group, or the like; for F-terms, it may be the theme code, or the theme code together with the viewpoint. Bibliographic information is the information needed to identify a document, for example the application number, publication number, registration number, agent information such as the agent's name, applicant information such as the applicant's name, inventor information such as the inventor's name, the filing date, the publication date, or other information. Information that narrows down bibliographic information may be a bibliographic value itself or a range of values. For example, the filing date range "2010.1.1 to 2010.3.1" designates patent documents filed in 2010 between January 1 and March 1.
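The three kinds of element described above (a term tied to a search field, a classification code or part of one, and a bibliographic range) can be sketched as simple predicates. The `PatentDoc` record, its field names, the substring matching, and the assumed IPC layout "H01M 4/13" (section "H", class "H01", subclass "H01M", main group "H01M 4") are illustrative assumptions rather than the patent's data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PatentDoc:
    """Hypothetical minimal record; real documents carry many more fields."""
    abstract: str
    claims: str
    ipc: list[str]
    filing_date: date


def term_matches(doc: PatentDoc, term: str, field: str) -> bool:
    """Term element restricted to a search field (case-insensitive substring match)."""
    texts = {
        "abstract": doc.abstract,
        "claims": doc.claims,
        "abstract+claims": doc.abstract + " " + doc.claims,
    }
    return term.lower() in texts[field].lower()


def ipc_parts(code: str) -> set[str]:
    """Split an IPC code like 'H01M 4/13' into section, class, subclass, main group."""
    main_group = code.split("/")[0].strip()
    return {code[:1], code[:3], code[:4], main_group}


def ipc_matches(doc: PatentDoc, part: str) -> bool:
    """Classification element given as a whole code or one of its parts."""
    return any(part == code or part in ipc_parts(code) for code in doc.ipc)


def date_in_range(doc: PatentDoc, start: date, end: date) -> bool:
    """Bibliographic element given as an inclusive filing-date range."""
    return start <= doc.filing_date <= end
```

Comparing parsed parts rather than raw string prefixes keeps the main group "H01M 4" from also matching "H01M 40/00".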

The logical operator may be, for example, an AND operator, an OR operator, or a NOT operator. The AND and OR operators relate two or more elements. The AND operator obtains the intersection of two elements, and the OR operator obtains their union. The NOT operator negates an element: it obtains the set that excludes the set matched by a specific element. A block delimiter may be, for example, a pair of parentheses or a text box in a graphical user interface. A region enclosed by block delimiters is a block element. For example, in “(term A OR term B) AND term C”, the parenthesized “(term A OR term B)” is a block element, and “(” and “)” are block delimiters.
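The operators and block elements can be illustrated with a small expression tree evaluated over sets of document IDs. This is a minimal sketch. The node classes and the `evaluate`/`render` helpers are hypothetical. NOT is taken as the complement within a given universe of searchable documents.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    operand: "Expr"


Expr = Union[Elem, And, Or, Not]


def evaluate(expr, index, universe):
    """Documents matched by expr. index maps an element to the set of
    documents containing it; universe is every searchable document."""
    if isinstance(expr, Elem):
        return set(index.get(expr.name, set()))
    if isinstance(expr, And):  # intersection
        return evaluate(expr.left, index, universe) & evaluate(expr.right, index, universe)
    if isinstance(expr, Or):  # union
        return evaluate(expr.left, index, universe) | evaluate(expr.right, index, universe)
    if isinstance(expr, Not):  # complement within the universe
        return set(universe) - evaluate(expr.operand, index, universe)
    raise TypeError(f"unknown node: {expr!r}")


def render(expr):
    """Text form; each compound operand is wrapped in parentheses
    (block delimiters), which makes it a block element."""
    if isinstance(expr, Elem):
        return expr.name
    if isinstance(expr, Not):
        return f"NOT {_block(expr.operand)}"
    op = "AND" if isinstance(expr, And) else "OR"
    return f"{_block(expr.left)} {op} {_block(expr.right)}"


def _block(expr):
    return expr.name if isinstance(expr, Elem) else f"({render(expr)})"


example = And(Or(Elem("term A"), Elem("term B")), Elem("term C"))
```

Here `render(example)` produces `(term A OR term B) AND term C`. The parenthesized OR group is the block element.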

The receiving unit 101 also stores received related patent identification information in the related patent identification information storage unit 102. When it receives unrelated patent identification information, the receiving unit 101 stores that information in the unrelated patent identification information storage unit 103.

The receiving unit 101 usually receives information sent from the user terminal 2 via the network 100. It may instead accept information entered with an input device such as a keyboard, mouse, or touch panel. It may also receive information sent over a wired or wireless communication line, or accept information read from a recording medium such as an optical disk, magnetic disk, or semiconductor memory.

The related patent identification information storage unit 102 stores one or more pieces of related patent identification information. Related patent identification information identifies a related patent document, which is a patent document judged to be relevant during the screening stage of a patent search. The related patent identification information storage unit 102 is preferably a nonvolatile recording medium, but a volatile recording medium can also be used.

The unrelated patent identification information storage unit 103 stores one or more pieces of unrelated patent identification information. Unrelated patent identification information identifies an unrelated patent document, which is a patent document judged not to be relevant during the screening stage of a patent search. The unrelated patent identification information storage unit 103 is preferably a nonvolatile recording medium, but a volatile recording medium can also be used.

The search expression generation unit 104 generates a search expression that can retrieve the patent documents identified by the one or more pieces of related patent identification information stored in the related patent identification information storage unit 102. It builds this expression from the elements contained in those patent documents. The generated search expression preferably retrieves all of the patent documents identified by the stored related patent identification information (hereinafter also called stored related patent documents), but it may retrieve only some of them. For example, the search expression generation unit 104 may extract elements from the stored related patent documents and join them with OR operators to form the search expression.
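The simplest generation strategy mentioned above is to OR together every element found in the stored related patent documents. The following is a minimal sketch that represents each document as a plain set of elements. The helper names and the toy data are hypothetical.

```python
def build_or_query(related_docs):
    """related_docs maps a document ID to the set of elements found in it.
    Returns the elements to OR together and the query text."""
    elements = sorted(set().union(*related_docs.values()))
    return elements, " OR ".join(elements)


def hits(elements, corpus):
    """IDs of documents in corpus (ID -> element set) containing any element."""
    wanted = set(elements)
    return {doc_id for doc_id, elems in corpus.items() if elems & wanted}


related = {"R1": {"database", "index"}, "R2": {"cache"}}
corpus = {**related, "X1": {"printer"}, "X2": {"cache", "printer"}}
elements, query = build_or_query(related)
```

Every element of every related document appears in the OR list. As a result, each related document that contains at least one element is retrieved. The cost is that unrelated documents sharing any element, such as `X2` here, are retrieved as well.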

The search expression generation unit 104 may also use the elements contained in the patent documents identified by the one or more pieces of unrelated patent identification information stored in the unrelated patent identification information storage unit 103. With these elements, it may generate a search expression that excludes at least some of those unrelated patent documents. The search expression preferably excludes all of the patent documents identified by the stored unrelated patent identification information (hereinafter also called stored unrelated patent documents), but it may exclude only some of them. For example, the search expression generation unit 104 may extract elements from the stored unrelated patent documents, join them with OR operators, enclose the result in block delimiters to form a block element, and apply a NOT operator to that block element. It then encloses the search expression generated from the stored related patent documents in block delimiters and joins the two block elements with an AND operator. Elements that appear in both the stored related patent documents and the stored unrelated patent documents may be left out of the generated search expression entirely, or left out of the NOT block element only. In this way, the search expression generation unit 104 generates a search expression that retrieves all of the related patent documents and as few unrelated patent documents as possible.
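The combined form `(related elements) AND NOT (unrelated elements)` can be sketched as follows. This sketch uses the option of keeping shared elements out of the NOT block only. Function names and data are hypothetical.

```python
def build_query(related_docs, unrelated_docs):
    """Returns (positive elements, NOT-block elements, query text).
    Elements that also occur in related documents are kept out of the
    NOT block, so the NOT block can never exclude a related document."""
    pos = set().union(*related_docs.values())
    neg = set().union(*unrelated_docs.values()) - pos
    query = "(" + " OR ".join(sorted(pos)) + ")"
    if neg:
        query += " AND NOT (" + " OR ".join(sorted(neg)) + ")"
    return pos, neg, query


def hits(pos, neg, corpus):
    """Documents with at least one positive element and no NOT-block element."""
    return {d for d, elems in corpus.items() if elems & pos and not elems & neg}


related = {"R1": {"database", "index"}, "R2": {"cache"}}
unrelated = {"U1": {"cache", "printer"}, "U2": {"database"}}
pos, neg, query = build_query(related, unrelated)
```

`U1` is excluded because it contains `printer`. `U2` is still retrieved because its only element also occurs in a related document. This is the "as few as possible" case.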

The search expression generation unit 104 may include only important elements in the generated search expression. An important element may be any of the following:
- an element that appears in at least a threshold proportion of the patent documents;
- an element that appears at least a threshold number of times within a single patent document;
- an element whose IDF value is at or below a threshold;
- an element whose TF-IDF value is at or above a threshold;
- an element that satisfies a combination of two or more of these conditions.

Each threshold may be set freely by the user or set empirically by the developer, and thresholds may be set separately for each element type. IDF values may be computed over all of the patent documents stored in the patent document storage unit 105, or over the stored related and unrelated patent documents. An element that appears in at least a threshold number of patent documents may be one of three kinds: it appears in that many stored related patent documents, in that many stored unrelated patent documents, or in that many documents of both kinds. Elements that appear in both the stored related and the stored unrelated patent documents are preferably not included in the block element to which the NOT operator is applied, although they may be.
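Several of the importance criteria can be computed from one pass over the documents. The sketch below derives the document-frequency ratio, the maximum count within one document, and the IDF. It then applies one possible combination of conditions. The thresholds, the choice of conjunction, and the natural-log IDF are illustrative assumptions.

```python
import math
from collections import Counter


def element_stats(docs):
    """docs: list of element lists, one per document.
    Returns {element: (document-frequency ratio, max count in one doc, IDF)}."""
    n = len(docs)
    df = Counter()
    max_tf = Counter()
    for elems in docs:
        counts = Counter(elems)
        df.update(counts.keys())
        for e, c in counts.items():
            max_tf[e] = max(max_tf[e], c)
    return {e: (df[e] / n, max_tf[e], math.log(n / df[e])) for e in df}


def select_important(docs, min_ratio=0.5, min_count=2):
    """Keep elements found in at least min_ratio of the documents AND
    used at least min_count times in some single document."""
    return {
        e
        for e, (ratio, top, _idf) in element_stats(docs).items()
        if ratio >= min_ratio and top >= min_count
    }


docs = [["database", "index", "database"], ["database", "cache"], ["printer"]]
```

Only `database` passes both conditions here: it appears in two of the three documents and twice in the first one.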

The search expression generation unit 104 may also add to the search expression elements that are similar to the elements obtained from the patent documents. It may obtain such similar elements from the similar element storage unit 107, or from an external device that provides similar elements via the network 100. These similar elements are of the same kind as those stored in the similar element storage unit 107, described later. The search expression generation unit 104 may also choose elements so that the number of elements in the search expression is as small as possible. Minimizing the number of elements is particularly suitable when the elements are patent classifications. It can also be applied when the elements are terms, bibliographic information, or a combination of these.
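Two helpers sketch the expansion and minimization steps. Expansion adds the similar elements of each element. For minimization, the text defers to JP 2011-59843 A. This sketch substitutes a standard greedy set-cover heuristic, which finds a small element set that still retrieves every related document but does not guarantee the minimum. The classification-like codes are toy values.

```python
def expand(elements, similar):
    """Add the similar elements listed in `similar` (element -> set)."""
    out = set(elements)
    for e in elements:
        out |= similar.get(e, set())
    return out


def greedy_min_cover(related_docs, candidates):
    """Greedy set cover: repeatedly take the candidate element found in the
    most still-uncovered related documents (ties broken alphabetically)."""
    uncovered = set(related_docs)
    chosen = []
    while uncovered:
        best = max(sorted(candidates),
                   key=lambda e: sum(e in related_docs[d] for d in uncovered))
        gained = {d for d in uncovered if best in related_docs[d]}
        if not gained:
            raise ValueError("some related documents contain no candidate element")
        chosen.append(best)
        uncovered -= gained
    return chosen


related = {"R1": {"G06F16", "G06F17"}, "R2": {"G06F16"}, "R3": {"H04L29"}}
cover = greedy_min_cover(related, set().union(*related.values()))
```

Two codes suffice in this toy case, because `G06F16` alone covers both `R1` and `R2`.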

Suppose the receiving unit 101 receives a search expression whose terms are associated with search fields. The search expression generation unit 104 may then associate the same search fields with the terms of the generated search expression, associate wider search fields, or associate no search fields. Widening a search field means increasing the number of patent documents that are searched. For example, if a term of the received search expression is associated with the search field “Abstract + Claims”, the widened search field may be “Full text”. When the receiving unit 101 receives the type of patent search, the search expression generation unit 104 may associate the terms with a search field suited to that type, and may include bibliographic information suited to that type. For example, for an invalidity search, the search expression generation unit 104 may generate a search expression that associates the terms with the search field “Full text” and restricts the filing date to the 20 years up to the current date. When the received search expression contains bibliographic information, the search expression generation unit 104 may or may not include that bibliographic information in the generated search expression. The search expression generation unit 104 may also associate preset search fields with the terms of the generated search expression, or generate a search expression that includes preset bibliographic information. Such presets may be chosen freely by the user or set by the developer from experience. For methods of generating search expressions that are not described in detail here, such as minimizing the number of elements in a search expression, see JP 2011-59843 A. The search expression generation unit 104 can usually be implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software recorded on a recording medium such as a ROM, but it may also be implemented in hardware (a dedicated circuit).
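The following sketch shows field widening and the invalidity-search preset. The ordering of fields from narrow to wide is an assumption; the text only gives “Abstract + Claims” widening to “Full text”. The query syntax (`term/field`, `filing_date:start..end`) is entirely hypothetical and stands in for whatever syntax the search backend uses.

```python
from datetime import date

# Assumed ordering of search fields from narrow to wide.
WIDER = {
    "abstract": "abstract+claims",
    "claims": "abstract+claims",
    "abstract+claims": "full_text",
    "full_text": "full_text",
}


def widen(field):
    return WIDER[field]


def twenty_years_before(d):
    try:
        return d.replace(year=d.year - 20)
    except ValueError:  # Feb 29 mapped onto a non-leap year
        return d.replace(year=d.year - 20, day=28)


def invalidity_query(terms, today):
    """Every term searched in the full text, filing date within the
    20 years up to today. The query syntax is hypothetical."""
    term_part = " OR ".join(f"{t}/full_text" for t in terms)
    start = twenty_years_before(today)
    return f"({term_part}) AND filing_date:{start.isoformat()}..{today.isoformat()}"
```

For example, `invalidity_query(["database", "index"], date(2012, 12, 5))` restricts the filing date to 1992-12-05 through 2012-12-05.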

The patent document storage unit 105 stores one or more patent documents. Each stored patent document may be information that includes at least part of the patent document, or the entire document. Patent documents are stored so that they can be located by information that identifies them. The identifying information may be contained in the document itself, or it may be associated with the document. The patent document storage unit 105 is preferably a nonvolatile recording medium, but a volatile recording medium can also be used. How patent documents come to be stored in the patent document storage unit 105 does not matter: they may be stored via a recording medium, transmitted over a communication line or the like, or entered through an input device.

The search patent identification information acquisition unit 106 acquires search patent identification information. This information identifies search patent documents, which are the patent documents retrieved with the search expression generated by the search expression generation unit 104. In other words, the search patent identification information acquisition unit 106 runs a search with the generated search expression, identifies the patent documents that match, and acquires the search patent identification information that identifies them. It may acquire the search patent identification information from the patent document storage unit 105, or from an external device that provides information identifying patent documents. The resulting search patent identification information may or may not overlap with the related patent identification information stored in the related patent identification information storage unit 102 or with the unrelated patent identification information stored in the unrelated patent identification information storage unit 103. That is, the search expression generated by the search expression generation unit 104 may also retrieve stored unrelated patent documents. The search patent identification information acquisition unit 106 can usually be implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software recorded on a recording medium such as a ROM, but it may also be implemented in hardware (a dedicated circuit).
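Because hits may overlap with either stored set, it is convenient to run the search and then split the hits by where they already sit. In this minimal sketch, the search expression is stood in for by a predicate over a document's element set. The helper names and data are hypothetical.

```python
def run_search(store, predicate):
    """store: ID -> element set; predicate: element set -> bool (a compiled
    search expression). Returns the search patent identification information."""
    return {doc_id for doc_id, elems in store.items() if predicate(elems)}


def split_hits(hit_ids, related_ids, unrelated_ids):
    """Separate hits that were already screened from genuinely new ones."""
    hits = set(hit_ids)
    return {
        "already_related": hits & set(related_ids),
        "already_unrelated": hits & set(unrelated_ids),
        "new": hits - set(related_ids) - set(unrelated_ids),
    }


store = {"A": {"database"}, "B": {"cache"}, "C": {"printer"}, "D": {"database", "cache"}}
hit_ids = run_search(store, lambda elems: bool(elems & {"database", "cache"}))
groups = split_hits(hit_ids, related_ids={"A"}, unrelated_ids={"B"})
```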

The similar element storage unit 107 stores one or more similar elements, which are elements that are similar to one another. Similar elements may be any of the following:
- synonyms;
- words in a broader-term/narrower-term relationship;
- patent classifications indicating the same technical field;
- patent classifications that were moved or merged as the classification scheme was revised.

The similar element storage unit 107 preferably stores a heading element in association with the elements similar to it, but similar elements may also be associated with one another one-to-one. The similar element storage unit 107 is preferably a nonvolatile recording medium, but a volatile recording medium can also be used. How similar elements come to be stored does not matter: they may be stored via a recording medium, transmitted over a communication line or the like, or entered through an input device.
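A heading-keyed store can be sketched as a mapping from each heading element to a set of similar elements, with lookup working from either side. This in-memory class is a hypothetical stand-in for the similar element storage unit 107.

```python
from collections import defaultdict


class SimilarElementStore:
    """Each heading element maps to the set of elements similar to it."""

    def __init__(self):
        self._groups = defaultdict(set)

    def add(self, heading, *similar):
        self._groups[heading].update(similar)

    def similar_to(self, element):
        """Elements similar to `element`, whether it is a heading or a member."""
        result = set()
        for heading, members in self._groups.items():
            if element == heading or element in members:
                result |= {heading} | members
        result.discard(element)
        return result


store = SimilarElementStore()
store.add("database", "DB", "repository")        # synonyms
store.add("storage device", "hard disk", "SSD")  # broader / narrower terms
```

Looking up a member such as `"DB"` returns its heading and the other members of the same group.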

The feature vector acquisition unit 108 acquires two kinds of feature vectors. The first is one or more related patent feature vectors: the feature vectors of the related patent documents identified by the related patent identification information stored in the related patent identification information storage unit 102. The second is one or more search patent feature vectors: the feature vectors of the search patent documents identified by the search patent identification information acquired by the search patent identification information acquisition unit 106. It may also acquire one or more unrelated patent feature vectors, which are the feature vectors of the unrelated patent documents identified by the unrelated patent identification information stored in the unrelated patent identification information storage unit 103. The feature vector acquisition unit 108 may use document-identifying information to look up feature vectors in a feature vector storage unit (not shown) or an external device (not shown), or it may create them from the patent documents. An external device may generate feature vectors on request from the feature vector acquisition unit 108, or it may store feature vectors generated in advance. When creating feature vectors from patent documents, the feature vector acquisition unit 108 retrieves each patent document identified by the identifying information, extracts one or more elements from it, and builds the feature vector from those elements. An element may be a term, a patent classification, or bibliographic information.
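Building vectors from mixed elements (terms, classifications, bibliographic values) can be sketched by prefixing each element with its type. The prefix prevents a term and a code from colliding, and all documents are counted over one shared vocabulary. The document layout with keys `terms`, `ipc`, and `applicant` is a hypothetical assumption.

```python
from collections import Counter


def doc_elements(doc):
    """doc: dict with hypothetical keys "terms", "ipc", "applicant".
    Each element is prefixed with its type so a term and a code never collide."""
    elems = [f"term:{t}" for t in doc.get("terms", [])]
    elems += [f"ipc:{c}" for c in doc.get("ipc", [])]
    elems += [f"applicant:{a}" for a in doc.get("applicant", [])]
    return elems


def build_vectors(docs):
    """docs: ID -> document. Returns the shared vocabulary and one count
    vector per document over that vocabulary."""
    per_doc = {d: Counter(doc_elements(doc)) for d, doc in docs.items()}
    vocab = sorted(set().union(*per_doc.values()))
    return vocab, {d: [c[e] for e in vocab] for d, c in per_doc.items()}


docs = {
    "R1": {"terms": ["database", "index", "database"], "ipc": ["G06F"]},
    "S1": {"terms": ["cache"], "ipc": ["G06F"], "applicant": ["Acme"]},
}
vocab, vectors = build_vectors(docs)
```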

When the elements are terms, the terms that the feature vector acquisition unit 108 extracts from each patent document may be words of a predetermined part of speech (such as nouns), sequences of predetermined parts of speech, technical terms, or other terms. Technical terms may be extracted in one of two ways. The first is to check whether technical terms stored in a recording medium (not shown) occur in the patent document. The second is to apply an algorithm that extracts technical terms from documents. For the latter, see, for example: Hirokazu Ohata and Hiroshi Nakagawa, “Technical term extraction based on the number of distinct adjacent words” (連接異なり語数による専門用語抽出), IPSJ SIG Technical Report, 2000-NL-136, pp. 119-126; and Hiroshi Nakagawa, Tatsunori Mori, and Hiroaki Yumoto, “Term extraction based on occurrence frequency and concatenation frequency” (出現頻度と連接頻度に基づく専門用語抽出), Journal of Natural Language Processing, Vol. 10, No. 1, pp. 27-45, January 2003.

When extracting terms, the feature vector acquisition unit 108 may use TF or TF-IDF values as importance scores and discard terms of low importance. IDF values may be computed over all of the patent documents stored in the patent document storage unit 105. Alternatively, they may be computed over the stored related patent documents, the stored unrelated patent documents, and the patent documents identified by the search patent identification information (hereinafter also called search patent documents). When unrelated patent feature vectors are not acquired, the stored unrelated patent documents may be excluded from the IDF computation. Terms of low importance may be terms whose importance is at or below a threshold, or a predetermined number of terms at the low end of the importance ranking. The threshold may be a predetermined value, or the maximum importance multiplied by a number less than 1 (such as 0.9 or 0.8). The predetermined number may be a fixed number, or the total number of terms extracted by the feature vector acquisition unit 108 multiplied by a value less than 1 (such as 0.01 or 0.001). The feature vector acquisition unit 108 extracts terms from a character string that contains at least part of the patent document. This string may be the full text of the patent document, the text identified by one heading, or the text identified by a combination of headings. A heading may be anything that identifies a region of a patent document, such as the title of the invention, the claims, or the abstract.
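Term extraction with importance-based filtering can be sketched in three steps. First, concatenate the text under the chosen headings. Then score each term by TF-IDF. Finally, drop terms at or below a fraction of the maximum score. Whitespace tokenization is a placeholder: Japanese text would need a morphological analyzer or one of the term extractors cited above. The ratio 0.5 in the example was picked for illustration only.

```python
import math
from collections import Counter


def section_tokens(doc, headings=("title", "claims", "abstract")):
    """Tokens from the text under the chosen headings (placeholder tokenizer)."""
    return " ".join(doc.get(h, "") for h in headings).split()


def tfidf(tokens, corpus):
    """TF-IDF for each term in tokens; corpus is the list of token lists
    used for IDF and must include tokens itself."""
    n = len(corpus)
    counts = Counter(tokens)
    scores = {}
    for term, c in counts.items():
        df = sum(1 for toks in corpus if term in toks)
        scores[term] = (c / len(tokens)) * math.log(n / df)
    return scores


def drop_unimportant(scores, ratio=0.5):
    """Discard terms whose importance is at or below ratio * max importance."""
    if not scores:
        return {}
    cutoff = ratio * max(scores.values())
    return {t: s for t, s in scores.items() if s > cutoff}


d1 = {"title": "cache control", "claims": "cache memory cache"}
d2 = {"title": "printer", "claims": "memory"}
corpus = [section_tokens(d1), section_tokens(d2)]
kept = drop_unimportant(tfidf(corpus[0], corpus))
```

`memory` occurs in both documents, so its IDF and therefore its TF-IDF are zero. `control` falls below the cutoff, which leaves only `cache`.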

The feature vector acquisition unit 108 may treat similar elements extracted from the patent documents as the same vector component. To do so, it may use the information stored in the similar element storage unit 107, or similar elements obtained via a network from an external device (not shown) that provides them. For example, suppose the similar element storage unit 107 stores the group “データベース” (database), “データーベース” (a variant spelling of database), “DB”, “リポジトリ” (repository), and “辞書” (dictionary). The feature vector acquisition unit 108 then processes all five as the same term.
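Merging similar elements into one vector component amounts to mapping every member of a group to a single representative before counting. In this sketch, the representative is the smallest string in the group, which is an arbitrary choice. The group is the one from the example above.

```python
from collections import Counter


def canonical_map(groups):
    """groups: iterable of sets of mutually similar elements. Maps each
    member to one representative (the smallest string, an arbitrary choice)."""
    mapping = {}
    for group in groups:
        rep = min(group)
        for member in group:
            mapping[member] = rep
    return mapping


def merged_counts(tokens, mapping):
    """Counts in which all members of a similar-element group share one component."""
    return Counter(mapping.get(t, t) for t in tokens)


mapping = canonical_map([{"データベース", "データーベース", "DB", "リポジトリ", "辞書"}])
counts = merged_counts(["データベース", "DB", "辞書", "cache"], mapping)
```

Three different surface forms collapse into one component with count 3. `cache` belongs to no group and keeps its own component.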

Each feature vector acquired by the feature vector acquisition unit 108 may have one dimension for every element the unit has acquired across all documents, or the number of dimensions may differ from one patent document to another. A vector component may be any of the following: the number of occurrences of a term, the term's TF value, its TF-IDF value, or a number indicating whether the term occurs in the patent document, such as “1” (present) and “−1” (absent).
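The component options listed above can be sketched as interchangeable weighting schemes over a shared vocabulary. This shows only the shared-dimension variant. The `scheme` names and the caller-supplied `idf` mapping are hypothetical.

```python
from collections import Counter


def vectorize(tokens, vocab, scheme="count", idf=None):
    """Feature vector over a shared vocabulary. scheme is one of
    "count", "tf", "tfidf" (needs idf: term -> IDF), or "presence" (1 / -1)."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    vec = []
    for term in vocab:
        c = counts[term]
        if scheme == "count":
            vec.append(float(c))
        elif scheme == "tf":
            vec.append(c / total)
        elif scheme == "tfidf":
            vec.append(c / total * idf[term])
        elif scheme == "presence":
            vec.append(1.0 if c else -1.0)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return vec


vocab = ["cache", "database", "printer"]
tokens = ["cache", "cache", "database", "index"]  # "index" is outside the vocabulary
```

The `presence` scheme uses −1 rather than 0 for absent terms, so absence contributes to similarity measures instead of being ignored.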

The feature vector acquisition unit 108 may derive a feature vector from the entire patent document (full text) or from part of it. Such a part may be the International Patent Classification, the abstract, the claims, or a combination of these.

When it needs the feature vector of the same patent document more than once, the feature vector acquisition unit 108 may compute it each time. Alternatively, it may store a computed feature vector in a storage unit (not shown) and reuse it instead of repeating the computation. For example, a search patent document may also be a stored related patent document or a stored unrelated patent document. In that case, the feature vector acquisition unit 108 may compute its feature vector only once, or it may compute it twice. The feature vector acquisition unit 108 can usually be implemented with an MPU, memory, and the like. Its processing procedure is usually implemented in software recorded on a recording medium such as a ROM, but it may also be implemented in hardware (a dedicated circuit).
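Reusing a computed vector is ordinary memoization keyed by the document's identifying information. In this minimal sketch, `compute` is any function from an ID to a vector. The toy vectorizer and IDs are hypothetical. `functools.lru_cache` would serve the same purpose.

```python
class FeatureVectorCache:
    """Computes each document's feature vector at most once and reuses it."""

    def __init__(self, compute):
        self._compute = compute
        self._store = {}
        self.computations = 0  # how many vectors were actually computed

    def get(self, doc_id):
        if doc_id not in self._store:
            self._store[doc_id] = self._compute(doc_id)
            self.computations += 1
        return self._store[doc_id]


cache = FeatureVectorCache(lambda doc_id: [float(len(doc_id))])  # toy vectorizer
related_ids = ["P1", "P2"]
search_ids = ["P2", "P3"]  # P2 is both a related and a search patent document
vectors = {d: cache.get(d) for d in related_ids + search_ids}
```

Four lookups trigger only three computations, because `P2` is computed once and reused.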

The selection unit 109 selects at least certain search patent identification information. Each selected item must correspond to a search patent feature vector, among the one or more search patent feature vectors acquired by the feature vector acquisition unit 108, that is similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit 108. It must also not match any related patent identification information stored in the related patent identification information storage unit 102. In other words, from the search patent identification information, the selection unit 109 selects information identifying patent documents that may be related patents but were not selected as related patent documents in the screening work of the patent search.

The selection unit 109 may also select search patent identification information that corresponds to a search patent feature vector similar to the features of the related patent feature vectors and that corresponds to a patent document identified by related patent identification information stored in the related patent identification information storage unit 102. That is, from the stored related patent documents, the selection unit 109 may select information identifying patent documents that were correctly classified as related patent documents.

The selection unit 109 may also select search patent identification information corresponding to a search patent feature vector that is not similar to the features of the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit 108. That is, the selection unit 109 need not select information identifying search patent documents that have the features of unrelated patent documents.

The selection unit 109 also selects unrelated patent identification information that corresponds to an unrelated patent feature vector, among the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit 108, that is similar to the features of the one or more related patent feature vectors and not similar to the features of the one or more unrelated patent feature vectors. That is, from the patent documents judged to be unrelated in the screening work of the patent search, the selection unit 109 may select those that are likely to be related patent documents.

Being "similar to the features of a feature vector" may be regarded as being included in a class formed by one or more feature vectors, and the following description uses classes. Although the selection unit 109 selects information identifying each patent document, it may instead select the patent documents themselves.

A class is a set of one or more calculated feature vectors. A class does not contain feature vectors of different types. For example, a class of related patent feature vectors contains no feature vector that is not a related patent feature vector. When the unrelated patent identification information storage unit 103 is not provided, the selection unit 109 may treat the complement of the class of related patent feature vectors as the set of unrelated patent feature vectors. It may also treat the complement of the union of the related patent feature vector class and the unrelated patent feature vector class as the patent documents that are neither related nor unrelated patent documents. A class may also be a set obtained as the result of classification by a learner using the machine learning described later.
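The complement rules above can be sketched with plain set operations. This assumes that each class is represented by the set of identifiers of the documents whose feature vectors it contains, which is a simplification for illustration. The function name and identifiers are hypothetical:

```python
from typing import Optional, Set, Tuple


def derive_classes(
    universe: Set[str],
    related: Set[str],
    unrelated: Optional[Set[str]] = None,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (related, unrelated, neither) as sets of document identifiers."""
    if unrelated is None:
        # No unrelated storage unit: the complement of the related class is unrelated.
        return related, universe - related, set()
    # Complement of the union: documents that are neither related nor unrelated.
    return related, unrelated, universe - (related | unrelated)
```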

The method by which the selection unit 109 selects patent documents using the feature vectors is not limited. For example, the selection unit 109 may select using vector similarity or using machine learning. Two approaches are described below: (A) selection using vector similarity and (B) selection using machine learning. The selection unit 109 may also classify each kind of patent feature vector into two or more classes.

(A) Selection using feature vector similarity
Selection using feature vector similarity may take any of the following forms. The patent feature vector under judgment may be judged to be a related patent, and selected, when its similarity to the representative vector of the related patent feature vector class is equal to or less than a threshold. It may instead be judged a related patent when its similarity to the representative vector of the related patent feature vector class is smaller than its similarity to the representative vector of the unrelated patent feature vector class. It may also be judged a related patent when both conditions hold, that is, when its similarity to the related class representative is equal to or less than the threshold and smaller than its similarity to the unrelated class representative.

The similarity may be calculated with the cosine measure, the Pearson correlation coefficient, or another method for computing similarity between vectors, such as the deviation pattern similarity. These calculation methods are well known, so their description is omitted.

There may be one related patent feature vector class, or two or more. When two or more classes are created, the feature vectors may be clustered by partitional optimization clustering (such as the k-means method), by hierarchical clustering (such as the nearest-neighbor, or single-linkage, method), or by another known clustering method. The details of each clustering method are well known and are omitted. The same applies to the classes of unrelated patent feature vectors.

The representative vector may be the most frequently occurring feature vector in the class, or it may be the average vector of the class. The average vector may be an ordinary vector average, or it may be a unit vector average computed by treating every vector as a unit vector. When determining the representative vector, the selection unit 109 may first calculate a representative vector, then exclude from the class the feature vectors whose similarity to that representative vector exceeds a threshold, and use the recalculated representative vector for selection. This threshold may be the average similarity between the initial representative vector and the feature vectors used to create it, multiplied by a value of 1 or more (for example, 1.5 or 2.0). It may also be a value set arbitrarily by the user or developer.

The threshold for judging similarity may be a predetermined value or a value calculated by the selection unit 109. When it is predetermined, it may be set arbitrarily by the user or developer. When it is calculated by the selection unit 109, several values are possible. It may be the similarity between each class's representative vector and the least similar feature vector belonging to that class. It may also be the average, the minimum, or the maximum, over the classes, of the similarity between each class's representative vector and its least similar member feature vector.
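The following is a minimal Python sketch of the similarity measures and representative vectors described in (A). It assumes dense feature vectors of equal length. The text excludes vectors whose score "exceeds" a threshold derived from the mean score, which reads naturally when the score behaves like a distance. For that reason, the trimming step here uses cosine distance (1 minus cosine similarity). That reading, the default factor of 1.5, and all function names are assumptions made for illustration:

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine measure; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mean_vector(vectors: Sequence[Sequence[float]], unit: bool = False) -> Vector:
    """Ordinary average, or unit-vector average when unit=True."""
    rows = [normalize(v) for v in vectors] if unit else [list(v) for v in vectors]
    return [sum(col) / len(rows) for col in zip(*rows)]


def trimmed_representative(
    vectors: Sequence[Sequence[float]], factor: float = 1.5, unit: bool = False
) -> Vector:
    """Average once, drop vectors farther than factor * mean distance, average again."""
    rep = mean_vector(vectors, unit)
    dists = [1.0 - cosine_similarity(v, rep) for v in vectors]
    limit = factor * sum(dists) / len(dists)
    kept = [v for v, d in zip(vectors, dists) if d <= limit]
    return mean_vector(kept or vectors, unit)
```

With `factor=1.5`, an outlier such as `[0, 1]` among several `[1, 0]` vectors is dropped before the representative vector is recomputed.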

(B) Selection using machine learning
Selection using machine learning works as follows. A learner such as a neural network, an SVM (Support Vector Machine), or an SVR (Support Vector Regression) is trained. The trained learner then classifies the patent feature vector under judgment to determine whether it belongs to the class of related patent feature vectors, and the vector is selected accordingly.

A neural network is a learning model that aims to express some characteristics of brain function through computer simulation. Neural networks come in various models and methods, and any of them may be adopted. For example, a perceptron, backpropagation, or a Boltzmann machine may be used. An SVM is a learning model that learns a classification pattern from teacher data, sets a classification boundary, and performs classification. An SVR is a learning model that learns a classification pattern from teacher data and classifies into three or more classes. These machine learning algorithms are well known, so their description is omitted.

When the selection unit 109 selects using machine learning, the learner is trained on teacher data consisting of the feature vectors of the patent documents identified by the information stored in the related patent identification information storage unit 102 and the unrelated patent identification information storage unit 103. The features used for learning are the values corresponding to each element acquired by the feature vector acquisition unit 108. After training, the patent feature vector under judgment may be classified as a related patent feature vector. In that case, the selection unit 109 judges that it is similar to the features of the related patent feature vectors and/or not similar to the features of the unrelated patent feature vectors, and selects it.
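The text leaves the choice of learner open. The sketch below uses a plain perceptron, one of the neural-network options it names, rather than an SVM. Related patent feature vectors are labeled +1 and unrelated ones are labeled -1. The epoch count, learning rate, and function names are hypothetical choices, and the vectors are assumed to already share the same dimensions:

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    related: Sequence[Sequence[float]],
    unrelated: Sequence[Sequence[float]],
    epochs: int = 20,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Train a perceptron on teacher data: related -> +1, unrelated -> -1."""
    samples = [(list(v), 1) for v in related] + [(list(v), -1) for v in unrelated]
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:  # every teacher vector is classified correctly
            break
    return weights, bias


def is_related(weights: Sequence[float], bias: float, vector: Sequence[float]) -> bool:
    """True if the trained perceptron puts the vector in the related class."""
    return sum(w * xi for w, xi in zip(weights, vector)) + bias > 0
```

A perceptron only converges when the two classes are linearly separable. The epoch cap keeps training finite when they are not.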

When the feature vectors differ in their number of dimensions, the selection unit 109 unifies the number of dimensions. The selection unit 109 can usually be realized by an MPU, memory, or the like. The processing procedure of the selection unit 109 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).
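One way to unify dimensions is to build the union vocabulary and fill absent elements with zero. This assumes each feature vector is held as a sparse mapping from element (term) to weight. The function name is hypothetical:

```python
from typing import Dict, List, Sequence, Tuple


def unify_dimensions(
    vectors: Sequence[Dict[str, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Map sparse {element: weight} vectors onto one shared, sorted vocabulary."""
    vocabulary = sorted(set().union(*vectors))  # every element seen in any vector
    dense = [[v.get(term, 0.0) for term in vocabulary] for v in vectors]
    return vocabulary, dense
```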

The evaluation unit 110 gives a lower evaluation the more search patent identification information selected by the selection unit 109 fails to match the related patent identification information stored in the related patent identification information storage unit 102. Evaluating may mean acquiring an evaluation value. That is, the evaluation unit 110 may acquire a lower evaluation value the more search patent documents were missed in the screening work of the patent search.

When the selection unit 109 also selects from the stored related patent documents, the evaluation unit 110 may give a lower evaluation the more stored related patent identification information fails to match the selection. Here, the stored related patent identification information counted is that stored in the related patent identification information storage unit 102 which is included in the search patent identification information acquired by the search patent identification information acquisition unit 106 but does not match the search patent identification information selected by the selection unit 109. That is, the evaluation unit 110 may acquire a lower evaluation value the more unrelated patent documents were erroneously judged to be related in the screening work.

When the selection unit 109 also selects from the stored unrelated patent documents, the evaluation unit 110 may give a lower evaluation the more unrelated patent identification information the selection unit 109 selects. That is, the evaluation unit 110 may acquire a lower evaluation value the more related patent documents were erroneously judged to be unrelated in the screening work.

The evaluation value may be calculated from the result of selection by the selection unit 109. It may also be obtained by looking up the value corresponding to that result in an evaluation table. The evaluation table may associate selection results of the selection unit 109 with values to be deducted, or with values to be added. The selection result may be the number of search patent identification information items selected by the selection unit 109. It may also be the number of items identifying stored related patent documents that the selection unit 109 did not select, or the number of items identifying stored unrelated patent documents that the selection unit 109 selected.
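The evaluation-table option can be sketched as follows. The deduction values, the base score of 100, and the function names are hypothetical. The three counts are the selection results listed above: search patents selected but not in the stored related set, stored related patents not selected, and stored unrelated patents selected. The table is assumed to be sorted by ascending minimum count:

```python
from typing import List, Tuple

# Hypothetical evaluation table: (minimum count, points deducted), ascending by count.
DEDUCTION_TABLE: List[Tuple[int, int]] = [(0, 0), (1, 5), (5, 15), (10, 30)]


def deduction_for(count: int, table: List[Tuple[int, int]] = DEDUCTION_TABLE) -> int:
    """Return the deduction of the last row whose minimum count is <= count."""
    points = 0
    for minimum, deduction in table:
        if count >= minimum:
            points = deduction
    return points


def evaluate(
    missed_search: int, wrongly_related: int, wrongly_unrelated: int, base: int = 100
) -> int:
    """Lower score the more mistakes the screening work made."""
    counts = (missed_search, wrongly_related, wrongly_unrelated)
    return base - sum(deduction_for(c) for c in counts)
```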

The evaluation unit 110 may evaluate using precision, recall, or the F-measure. Precision may be defined in either of two ways. The first is (number of related patent documents selected in the screening work + number of related patent documents selected by the selection unit 109) / (number of search patent documents). The second is (number of related patent documents selected in the screening work + number of related patent documents selected only by the selection unit 109) / (number of patent documents screened in the patent search, excluding the search patent documents, + number of search patent documents). Recall may be (number of related patent documents selected in the screening work) / (number of related patent documents selected in the screening work + number of related patent documents selected only by the selection unit 109). The F-measure may be (2 × precision × recall) / (precision + recall). The evaluation unit 110 may output a single value as the evaluation value, or it may output several values, each as an evaluation value.
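The following sketch implements the second precision definition, the recall, and the F-measure exactly as defined above, using sets of publication numbers. These are the document's own definitions rather than the textbook ones. The function and argument names are hypothetical:

```python
from typing import Set, Tuple


def evaluation_scores(
    screened_related: Set[str],   # related patents chosen in the screening work
    selected: Set[str],           # related patents chosen by the selection unit
    screening_targets: Set[str],  # documents screened in the patent search
    search_docs: Set[str],        # documents retrieved with the generated expression
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under the definitions in the text."""
    selected_only = selected - screened_related  # found only by the selection unit
    found = len(screened_related) + len(selected_only)
    pool = len(screening_targets - search_docs) + len(search_docs)
    precision = found / pool if pool else 0.0
    recall = len(screened_related) / found if found else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```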

The evaluation unit 110 can usually be realized by an MPU, memory, or the like. The processing procedure of the evaluation unit 110 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The output unit 111 outputs information on the result of selection by the selection unit 109. This information may be the evaluation of that result produced by the evaluation unit 110. Output is a concept that includes display on a display, projection with a projector, printing with a printer, transmission to an external device, storage on a recording medium, and delivery of processing results to another processing device or program. The output unit 111 may or may not be regarded as including an output device such as a display or a speaker. The output unit 111 can be realized by driver software for an output device, or by such driver software together with the output device.

FIG. 2 is a flowchart showing an example of the operation of the information processing apparatus 1 in the present embodiment. The operation is described below with reference to FIG. 3.

(Step S201) The receiving unit 101 determines whether it has received a patent search result, namely related patent identification information and unrelated patent identification information. If it has, the process proceeds to step S202. If not, step S201 is repeated.

(Step S202) The receiving unit 101 acquires the related patent identification information from the received patent search result and accumulates it in the related patent identification information storage unit 102.

(Step S203) The receiving unit 101 acquires the unrelated patent identification information from the received patent search result and accumulates it in the unrelated patent identification information storage unit 103.

(Step S204) The search expression generation unit 104 generates a search expression using the related patent documents and the unrelated patent documents. The generation method is described in detail later with reference to the flowchart of FIG. 4.

(Step S205) The search patent identification information acquisition unit 106 acquires search patent identification information using the search expression generated in step S204.

(Step S206) The feature vector acquisition unit 108 acquires the related patent feature vectors using the related patent identification information stored in the related patent identification information storage unit 102. The acquisition method is described in detail later with reference to the flowchart of FIG. 5.

(Step S207) The feature vector acquisition unit 108 acquires the unrelated patent feature vectors using the unrelated patent identification information stored in the unrelated patent identification information storage unit 103. The acquisition method is described in detail later with reference to the flowchart of FIG. 5.

(Step S208) The feature vector acquisition unit 108 acquires the search patent feature vectors using the search patent identification information acquired in step S205. The acquisition method is described in detail later with reference to the flowchart of FIG. 5.

(Step S209) The selection unit 109 unifies the number of dimensions of the feature vectors.

(Step S210) The selection unit 109 acquires the average vector of the related patent feature vectors.

(Step S211) The selection unit 109 acquires the average vector of the unrelated patent feature vectors.

(Step S212) From the search patent documents acquired in step S205, the selection unit 109 selects those that are truly related patents. The selection method is described in detail later with reference to the flowchart of FIG. 6.

(Step S213) From the related patent identification information stored in advance in the related patent identification information storage unit 102, the selection unit 109 selects the truly related patents. The selection method is described in detail later with reference to the flowchart of FIG. 6.

(Step S214) From the unrelated patent identification information stored in advance in the unrelated patent identification information storage unit 103, the selection unit 109 selects the truly related patents. The selection method is described in detail later with reference to the flowchart of FIG. 6.

(Step S215) Using the selection results of steps S212 to S214, the evaluation unit 110 evaluates whether the screening of the patent search result received in step S201 was appropriate.

(Step S216) The output unit 111 outputs the result of the evaluation in step S215.

FIG. 4 is a flowchart showing an example of the search expression generation operation (step S204) in FIG. 3. The process of generating a search expression is described below with reference to FIG. 4.

(Step S301) Using the related patent identification information stored in the related patent identification information storage unit 102, the search expression generation unit 104 acquires the related patent documents from the patent document storage unit 105.

(Step S302) Using the unrelated patent identification information stored in the unrelated patent identification information storage unit 103, the search expression generation unit 104 acquires the unrelated patent documents from the patent document storage unit 105.

(Step S303) The search expression generation unit 104 assigns 1 to a counter m.

(Step S304) The search expression generation unit 104 determines whether the related patent documents acquired in step S301 include an m-th related patent document. If they do, the process proceeds to step S305. If not, it proceeds to step S307.

(Step S305) The search expression generation unit 104 acquires elements from the m-th related patent document.

(Step S306) The search expression generation unit 104 increments the counter m by 1, and the process returns to step S304.

(Step S307) The search expression generation unit 104 assigns 1 to a counter n.

(Step S308) The search expression generation unit 104 determines whether the unrelated patent documents acquired in step S302 include an n-th unrelated patent document. If they do, the process proceeds to step S309. If not, it proceeds to step S311.

(Step S309) The search expression generation unit 104 acquires elements from the n-th unrelated patent document.

(Step S310) The search expression generation unit 104 increments the counter n by 1, and the process returns to step S308.

(Step S311) The search expression generation unit 104 generates a search expression that retrieves patent documents containing the elements of the related patent documents acquired in step S305 and does not retrieve patent documents containing the elements of the unrelated patent documents acquired in step S309. The process then returns to the calling process.
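The following is a minimal sketch of step S311. It assumes the elements are technical terms and that the target database accepts a generic `OR`/`NOT` Boolean syntax; both the syntax and the function name are hypothetical. Terms that also occur in related documents are kept out of the exclusion list so that the expression does not exclude its own hits. That rule is an assumption and is not stated in the text:

```python
from typing import Sequence, Set


def build_search_expression(
    related_docs: Sequence[Set[str]], unrelated_docs: Sequence[Set[str]]
) -> str:
    """OR together related-document terms and NOT out unrelated-only terms."""
    include = set().union(*related_docs)                # elements from step S305
    exclude = set().union(*unrelated_docs) - include    # elements from step S309, minus overlap
    expression = "(" + " OR ".join(sorted(include)) + ")"
    if exclude:
        expression += " NOT (" + " OR ".join(sorted(exclude)) + ")"
    return expression
```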

FIG. 5 is a flowchart showing an example of the feature vector acquisition operation (steps S206, S207, and S208) in FIG. 3. The process of acquiring feature vectors is described below with reference to FIG. 5. In FIG. 5, X stands for "related patent" when the process is called from step S206, for "unrelated patent" when it is called from step S207, and for "search patent" when it is called from step S208. The following description assumes the call from step S206, where X is "related patent". For steps S207 and S208, read the description accordingly.

(Step S401) Using the related patent identification information stored in the related patent identification information storage unit 102, the feature vector acquisition unit 108 acquires the related patent documents from the patent document storage unit 105.

(Step S402) The feature vector acquisition unit 108 assigns 1 to a counter p.

(Step S403) The feature vector acquisition unit 108 determines whether the related patent documents acquired in step S401 include a p-th related patent document. If they do, the process proceeds to step S404. If not, it returns to the calling process.

(Step S404) The feature vector acquisition unit 108 acquires elements from the p-th related patent document.

(Step S405) Using the similar elements stored in the similar element storage unit 107, the feature vector acquisition unit 108 converts similar elements into a single unified element.

(Step S406) The feature vector acquisition unit 108 acquires the TF-IDF value of each element obtained in step S405.

(Step S407) The feature vector acquisition unit 108 acquires the p-th related patent feature vector. Its dimensions correspond to the elements obtained in step S405, and its value in each dimension is the TF-IDF value acquired in step S406.

(Step S408) The feature vector acquisition unit 108 increments the counter p by 1, and the process returns to step S403.
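Steps S404 to S407 can be sketched as below. Three details are assumptions, because the text does not fix a TF-IDF variant: the similar-element table, the TF definition (count divided by the number of elements in the document), and the IDF definition (natural log of N over document frequency). Each vector is returned as a sparse mapping from unified element to weight, so its dimensions are the document's own elements. The function names are hypothetical:

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def unify_elements(elements: Sequence[str], similar: Dict[str, str]) -> List[str]:
    """Step S405: replace each similar element with its unified element."""
    return [similar.get(e, e) for e in elements]


def tfidf_vectors(
    docs: Sequence[Sequence[str]], similar: Dict[str, str]
) -> List[Dict[str, float]]:
    """Steps S405 to S407 for every document: unify elements, then weight by TF-IDF."""
    unified = [unify_elements(d, similar) for d in docs]
    n_docs = len(unified)
    doc_freq: Counter = Counter()
    for elements in unified:
        doc_freq.update(set(elements))  # count each element once per document
    vectors: List[Dict[str, float]] = []
    for elements in unified:
        total = len(elements)
        if total == 0:
            vectors.append({})
            continue
        counts = Counter(elements)
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors
```

Under this IDF choice, an element that appears in every document gets weight 0.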

FIG. 6 is a flowchart showing an example of the related patent selection operation (steps S212, S213, and S214) in FIG. 3. The process of selecting related patents is described below with reference to FIG. 6. In FIG. 6, Y stands for "search patent" when the process is called from step S212. It stands for "patent previously screened as a related patent" when called from step S213, and for "patent previously screened as an unrelated patent" when called from step S214. The following description assumes the call from step S212, where Y is "search patent". For steps S213 and S214, read the description accordingly.

(Step S501) The selection unit 109 assigns 1 to a counter q.

(Step S502) The selection unit 109 determines whether there is a q-th search patent document. If there is, the process proceeds to step S503. If not, it returns to the calling process.

(Step S503) The selection unit 109 calculates the similarity between the feature vector of the q-th search patent and the average vector of the related patent feature vectors.

(Step S504) The selection unit 109 calculates the similarity between the feature vector of the q-th search patent and the average vector of the unrelated patent feature vectors.

(Step S505) The selection unit 109 determines whether the similarity to the related-patent average vector obtained in step S503 is higher than the similarity to the unrelated-patent average vector obtained in step S504. If it is higher, the process proceeds to step S506. If it is lower, the process returns to step S502.

(Step S506) The selection unit 109 determines whether the similarity to the related-patent average vector obtained in step S503 is below a threshold. If it is below the threshold, the process proceeds to step S507. Otherwise, it returns to step S502.

(Step S507) The selection unit 109 stores information identifying the q-th search patent in a storage unit (not shown). The process then returns to step S502.
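The loop in steps S501 to S507 can be sketched in Python as follows. The passage specifies only "similarity", so this sketch uses cosine similarity, matching the "cosine measure" of the worked example later in this section. The function names are hypothetical, and ties in S505 are treated as "not higher". The comparison direction in S506 is also an assumption: this sketch selects a document when its similarity to the related average is at least the threshold. That matches the worked example, where the threshold is the similarity of the least similar stored related patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def least_member_similarity(member_vectors, mean):
    """Similarity to `mean` of the member vector that is least similar to it.

    Used as the S506 threshold, following the worked example (an assumption).
    """
    return min(cosine(v, mean) for v in member_vectors)


def select_candidates(candidates, related_mean, unrelated_mean, threshold):
    """Sketch of steps S501-S507 for one document set Y.

    candidates -- list of (doc_id, dense_vector) pairs, e.g. the search patents
    Returns the IDs that would be stored in step S507.
    """
    selected = []
    for doc_id, vec in candidates:               # S501/S502: q = 1, 2, ... until exhausted
        sim_rel = cosine(vec, related_mean)      # S503
        sim_unrel = cosine(vec, unrelated_mean)  # S504
        if sim_rel <= sim_unrel:                 # S505: not closer to the related side
            continue
        if sim_rel >= threshold:                 # S506: direction assumed, see lead-in
            selected.append(doc_id)              # S507: record the q-th document
    return selected
```

The same function serves all three callers (S212, S213, S214). Only the `candidates` list, which plays the role of Y, changes.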

A specific example of the operation of the information processing apparatus 1 in this embodiment is described below. The information in the drawings used in this example is prepared for convenience of explanation and does not represent actual data. The example makes three further assumptions:

- the file "patent search result.file" received by the receiving unit 101 contains both related patent identification information and unrelated patent identification information;
- the information identifying a patent document is its publication number;
- the search expression created by the search expression generation unit 104 consists of technical terms only.

In this example, the patent documents stored in the patent document storage unit 105 are those shown in Fig. 7. The table in Fig. 7 associates information identifying each patent document with the document itself, which includes the title of the invention, the International Patent Classification, the abstract, the claims, the background art, and so on. For example, the record with patent document ID "1" associates the following items:

- title: "Search expression generation device, … (remainder omitted)";
- International Patent Classification: "G06F17/30";
- publication number: "JP 2011-AAAAAA";
- abstract: "Position information by GPS … (remainder omitted)";
- claims: "Indicating a first patent publication … (remainder omitted)";
- background art: "According to the patent search support apparatus of the present invention … (remainder omitted)";
- other information.

In this example, the similar elements stored in the similar element storage unit 107 are those shown in Fig. 8. The table in Fig. 8 lists unified elements and, for each one, the similar elements that are unified into it. For example, the record for the unified element "world positioning system" is associated with the similar elements "GPS, global positioning, … (remainder omitted)".
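The unification that uses the Fig. 8 table (applied later in step S405) amounts to a lookup from each similar element to its unified element. The minimal sketch below shows it. The table contents and function names are illustrative assumptions.

```python
def build_unification_map(table):
    """Invert a Fig. 8-style table {unified element: [similar elements]}
    into a lookup {element: unified element}."""
    lookup = {}
    for unified, similars in table.items():
        lookup[unified] = unified            # the unified element maps to itself
        for similar in similars:
            lookup[similar] = unified        # each similar element maps to its unified form
    return lookup


def unify_terms(terms, lookup):
    """Replace each term by its unified element; terms not in the table are kept as-is."""
    return [lookup.get(term, term) for term in terms]
```

For instance, with the table {"world positioning system": ["GPS", "global positioning"]}, the terms ["GPS", "position information"] become ["world positioning system", "position information"].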

Suppose that the user of the user terminal 2 selects a file containing the patent search result with a pointing device, keyboard, or similar device, and then clicks the "Execute" button. The receiving unit 101 then receives "patent search result.file" via the network 100 (step S201). From "patent search result.file", the receiving unit 101 extracts the related patent identification information and the unrelated patent identification information. It stores them in the related patent identification information storage unit 102 and the unrelated patent identification information storage unit 103, respectively (steps S202 and S203). As a result, the related patent identification information is stored in the related patent identification information storage unit 102 as shown in Fig. 9(a), and the unrelated patent identification information is stored in the unrelated patent identification information storage unit 103 as shown in Fig. 9(b).
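The source does not describe the layout of "patent search result.file". Purely as an illustration, the sketch below assumes a hypothetical CSV layout with one `publication_number,label` row per document, where the label is `related` or `unrelated`. It shows how steps S202 and S203 could split the file into the two sets of identifiers. The function name and the format are assumptions.

```python
import csv
import io


def split_screening_result(text):
    """Split a screening-result file into related and unrelated ID lists.

    Assumes a hypothetical CSV layout: one 'publication_number,label' row per
    document, with label 'related' or 'unrelated'. Rows without exactly two
    fields raise ValueError from the unpacking below.
    """
    related, unrelated = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                            # skip blank lines
        pub_no, label = (field.strip() for field in row)
        if label == "related":
            related.append(pub_no)              # destined for storage unit 102 (S202)
        elif label == "unrelated":
            unrelated.append(pub_no)            # destined for storage unit 103 (S203)
        else:
            raise ValueError(f"unknown label: {label!r}")
    return related, unrelated
```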

Once information identifying patent documents has been stored in the related patent identification information storage unit 102 and the unrelated patent identification information storage unit 103, the search expression generation unit 104 generates a search expression (step S204). It first uses the related patent identification information "JP 2011-AAAAAA", stored in the related patent identification information storage unit 102, to retrieve the patent document with patent document ID "1" from the patent document storage unit 105 (step S301). It retrieves related patent documents in the same way for all the related patent identification information stored in that storage unit. Next, it uses the unrelated patent identification information "JP 2012-DDDDDD", stored in the unrelated patent identification information storage unit 103, to retrieve the patent document with patent document ID "4" from the patent document storage unit 105 (step S302). It retrieves unrelated patent documents in the same way for all the unrelated patent identification information stored in that storage unit.

The search expression generation unit 104 then extracts technical terms such as "GPS", "position information", and "patent publication" from the patent document with ID "1", which is the first related patent document (steps S303 to S306). It extracts technical terms in the same way from all the retrieved related patent documents. It also extracts technical terms such as "patent search", "search target", and "patent publication" from the patent document with ID "4", which is the first unrelated patent document (steps S307 to S310), and likewise from all the retrieved unrelated patent documents. Finally, using the extracted technical terms, it generates the search expression "(GPS OR position information OR patent publication OR ...) AND (NOT (patent search OR search target OR ...))". This expression retrieves all the related patent documents and none of the unrelated patent documents (step S311). When generation is complete, the search expression generation unit 104 passes the generated search expression to the search patent identification information acquisition unit 106.
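Step S311 can be sketched as string assembly. The passage does not say how terms that appear in both related and unrelated documents are handled. In the example expression, "patent publication" appears only in the OR part, so this sketch assumes that such shared terms are dropped from the NOT clause. The function name is hypothetical. Multi-word terms are left unquoted, and the syntax is illustrative rather than that of any particular search service.

```python
def build_search_expression(related_terms, unrelated_terms):
    """Build '(r1 OR r2 ...) AND (NOT (u1 OR u2 ...))' from two term collections.

    related_terms   -- terms extracted from all related patent documents
    unrelated_terms -- terms extracted from all unrelated patent documents
    Terms that also occur in related documents are omitted from the NOT clause
    (an assumption), so no related document is excluded by it.
    """
    positive = sorted(set(related_terms))                         # sorted for deterministic output
    negative = sorted(set(unrelated_terms) - set(related_terms))
    expression = "(" + " OR ".join(positive) + ")"
    if negative:
        expression += " AND (NOT (" + " OR ".join(negative) + "))"
    return expression
```

With the example terms, `build_search_expression(["GPS", "position information", "patent publication"], ["patent search", "search target", "patent publication"])` yields `(GPS OR patent publication OR position information) AND (NOT (patent search OR search target))`.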

When the search patent identification information acquisition unit 106 receives the search expression from the search expression generation unit 104, it runs the search against the patent document storage unit 105. It then acquires the search patent identification information, such as "JP 2012-GGGGGG" and "JP 2011-HHHHHH", that identifies the patent documents retrieved by the expression (step S205).
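Step S205 evaluates the expression against the stored documents. In this hypothetical sketch, the expression is represented by its two term sets rather than as a string, and each stored document is reduced to its set of technical terms. The function name and the data in the usage example are placeholders.

```python
def run_search(documents, positive_terms, negative_terms):
    """Return IDs of documents matching (any positive term) AND NOT (any negative term).

    documents -- mapping publication number -> set of technical terms in that document
    """
    pos, neg = set(positive_terms), set(negative_terms)
    hits = []
    for pub_no, terms in documents.items():
        # '&' binds tighter than 'and'/'not': non-empty overlap with pos, empty overlap with neg
        if terms & pos and not terms & neg:
            hits.append(pub_no)
    return hits
```

Given placeholder documents {"JP 2012-GGGGGG": {"GPS", "map"}, "JP 2011-HHHHHH": {"position information"}, "JP 2012-DDDDDD": {"patent search", "patent publication"}}, the first two are returned and the third is excluded by the NOT clause.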

The feature vector acquisition unit 108 creates feature vectors from the stored related patent documents (step S206). It first uses the related patent identification information "JP 2011-AAAAAA", stored in the related patent identification information storage unit 102, to retrieve the patent document with ID "1" from the patent document storage unit 105 (step S401). It retrieves related patent documents in the same way for all the related patent identification information stored in that storage unit. Next, it extracts technical terms such as "GPS", "position information", and "patent publication" from the patent document with ID "1", which is the first related patent document (steps S402 and S403). It replaces "GPS" with its unified element "world positioning system" (step S405).

For each technical term extracted from that document, the feature vector acquisition unit 108 then calculates a TF-IDF value, for example "world positioning system: 0.0272" (step S406). From these values it obtains a related patent feature vector whose number of dimensions equals the number of technical terms and whose values are the calculated TF-IDF values (steps S407 and S408). It obtains related patent feature vectors for all the related patent documents in the same way. It then obtains the unrelated patent feature vectors and the search patent feature vectors by the same method (steps S207 and S208). The result is a set of feature vectors such as those shown in Fig. 10. After obtaining all the vectors, the feature vector acquisition unit 108 passes them to the selection unit 109.

Upon receiving the feature vectors from the feature vector acquisition unit 108, the selection unit 109 unifies their dimensions so that all feature vectors have the same number of dimensions (step S209). Next, it obtains the average vector of the related patent feature vectors and the average vector of the unrelated patent feature vectors (steps S210 and S211). In this example, the similarity between the related patent feature vector of "JP 2011-AAAAAA" and the average vector is assumed to be more than twice the average similarity of the other related patent feature vectors. The selection unit 109 therefore computes the average vector from the related patent feature vectors, excluding that of "JP 2011-AAAAAA". In the same way, when it obtains the average vector of the unrelated patent feature vectors, it is assumed to exclude vectors such as that of "JP 2011-EEEEEE".

The selection unit 109 then selects related patents from among the search patents (step S212). Specifically, it takes the search patent feature vector for "JP 2012-GGGGGG", which is the first search patent document, and calculates its cosine measure against the average vector of the related patent feature vectors (steps S501 to S503). It then calculates the cosine measure between the same search patent feature vector and the average vector of the unrelated patent feature vectors (step S504). This vector is assumed to meet two conditions:

- it is more similar to the related average vector than to the unrelated average vector;
- its similarity is higher than that of the related patent feature vector that is least similar to the related average vector.

The selection unit 109 therefore judges "JP 2012-GGGGGG" to be a related patent and selects it (steps S505 to S506). It stores the information identifying the selected patent document in a storage unit (not shown). The selection unit 109 then performs the same processing for the remaining search patent feature vectors, the related patent feature vectors, and the unrelated patent feature vectors (steps S213 and S214). As a result, it is assumed to select:

- several search patent identification information items other than "JP 2012-GGGGGG";
- all related patent identification information except "JP 2011-AAAAAA";
- the unrelated patent identification information "JP 2011-EEEEEE".

The selection unit 109 then notifies the evaluation unit 110 that the selection of related patents is complete.

Fig. 11 maps the patent documents onto two dimensions to make their relationships during the selection by the selection unit 109 easier to see. It shows that "JP 2012-GGGGGG" and the unrelated patent "JP 2011-EEEEEE" fall within the region similar to the stored related patents. The related patent "JP 2011-AAAAAA" falls within the region similar to the unrelated patents. In Fig. 11, a document that is both a stored related patent document and a search patent document is drawn as a circle enclosed in a square.
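Steps S209 to S211 can be sketched as follows. Sparse {term: weight} vectors are mapped onto one shared, sorted term axis, with missing terms set to 0.0, and the average vector is taken over the kept members. The passage states the exclusion only for the example, without a general rule. The sketch therefore takes the IDs to exclude as an explicit argument rather than guessing a criterion. The function names are hypothetical.

```python
def unify_dimensions(sparse_vectors):
    """Step S209 sketch: give every vector the same dimensions.

    sparse_vectors -- mapping doc_id -> {term: weight}
    Returns (vocab, dense) where dense maps doc_id -> list aligned with vocab.
    """
    vocab = sorted(set().union(*sparse_vectors.values()))   # shared term axis
    dense = {doc_id: [vec.get(term, 0.0) for term in vocab]   # absent terms -> 0.0
             for doc_id, vec in sparse_vectors.items()}
    return vocab, dense


def mean_vector(dense, doc_ids, exclude=()):
    """Steps S210/S211 sketch: component-wise average of the non-excluded vectors."""
    kept = [dense[d] for d in doc_ids if d not in exclude]
    if not kept:
        raise ValueError("no vectors left to average")
    return [sum(column) / len(kept) for column in zip(*kept)]
```

In the example, the related average would be `mean_vector(dense, related_ids, exclude={"JP 2011-AAAAAA"})`.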

Assume that, once the selection by the selection unit 109 is finished, the evaluation unit 110 uses the information identifying the patent documents stored in the storage unit (not shown) to calculate a precision of 0.85 and a recall of 0.94. From these it obtains the evaluation value, an F-measure of 0.89 (step S215). The evaluation unit 110 passes the evaluation result to the output unit 111.
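The F-measure is the harmonic mean of precision and recall, 2PR/(P + R). The short sketch below computes all three from two ID sets. The passage does not say which sets the evaluation unit 110 compares, so treating the selected IDs as the retrieved set and a second set as the reference is an assumption, and the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(selected, reference):
    """Precision, recall and F-measure of `selected` IDs against `reference` IDs."""
    selected, reference = set(selected), set(reference)
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f_measure(precision, recall)
```

With the figures from the example, `f_measure(0.85, 0.94)` is about 0.893, which rounds to the stated F-measure of 0.89.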

The output unit 111 outputs the evaluation result received from the evaluation unit 110, together with the information identifying the patent documents stored in the storage unit (not shown) (step S216). Information such as that shown in Fig. 12 is then displayed.

As described above, this embodiment provides the following effects:

- Patent documents that may be related patents can be selected automatically using the related patent documents identified in a patent search.
- Such documents can also be selected automatically using both the related and the unrelated patent documents identified in the patent search.
- Creating a search expression from the related patent documents narrows the selection candidates to patent documents that are likely to be similar related patents.
- Creating the search expression from both related and unrelated patent documents makes such candidates even more likely.
- The manual screening performed in a patent search can be evaluated.

Although this embodiment includes the evaluation unit 110, the information processing apparatus 1 need not include it. In that case, the output unit 111 may output only the result of the selection unit 109, without any evaluation.

Although this embodiment includes the similar element storage unit 107, the information processing apparatus 1 need not include it. In that case, the search expression generation unit 104 and the feature vector acquisition unit 108 may obtain similar elements via the network 100 from an external device that provides them. Alternatively, they may not use similar elements at all.

Although this embodiment includes the patent document storage unit 105, the information processing apparatus 1 need not include it. In that case, the search expression generation unit 104, the search patent identification information acquisition unit 106, and the feature vector acquisition unit 108 may obtain patent documents via the network 100 from an external device that provides them.

Although this embodiment includes the search expression generation unit 104 and the search patent identification information acquisition unit 106, the information processing apparatus 1 need not include them. In that case, the feature vector acquisition unit 108 may obtain feature vectors from all existing patent documents. Alternatively, it may obtain them from patent documents that the receiving unit 101 has received via the network 100.

The software that implements the information processing apparatus 1 in this embodiment is the following program. The program runs on a computer that can access a related patent identification information storage unit. That storage unit holds one or more pieces of related patent identification information, each identifying a related patent document, that is, a patent document judged to be related as a result of the screening in a patent search. The program causes the computer to function as the following units:

- a search expression generation unit that, using elements contained in the patent documents identified by the stored related patent identification information, generates a search expression capable of retrieving those patent documents;
- a search patent identification information acquisition unit that acquires search patent identification information, which identifies search patent documents, that is, patent documents retrieved using the search expression generated by the search expression generation unit;
- a feature vector acquisition unit that acquires one or more related patent feature vectors, which are feature vectors of the related patent documents identified by the stored related patent identification information, and one or more search patent feature vectors, which are feature vectors of the search patent documents identified by the search patent identification information acquired by the search patent identification information acquisition unit;
- a selection unit that selects at least the search patent identification information that corresponds to search patent feature vectors whose features are similar to those of the acquired related patent feature vectors, and that does not match any related patent identification information stored in the related patent identification information storage unit;
- an output unit that outputs information on the result of the selection by the selection unit.

In this embodiment, each process (each function) may be implemented by centralized processing in a single device (system) or by distributed processing across multiple devices. Two or more communication means in one device may also, of course, be physically implemented by a single medium.

In this embodiment, each component may be built from dedicated hardware. Components that can be implemented in software may instead be implemented by executing a program. For example, each component can be implemented by a program execution unit, such as a CPU, that reads and executes a software program recorded on a recording medium such as a hard disk or semiconductor memory.

The functions implemented by the above program do not include functions that can be implemented only in hardware. For example, the program does not implement functions that only hardware, such as a modem or interface card, can provide in an acquisition unit that acquires information or an output unit that outputs information.

Fig. 13 is a schematic diagram showing an example of the external appearance of a computer that executes the above program to implement the present invention according to the embodiment. The embodiment can be implemented by computer hardware and a computer program executed on it.

In Fig. 13, the computer system 1100 includes a computer 1101 with a CD-ROM drive 1105 and an FD drive 1106, a keyboard 1102, a mouse 1103, and a monitor 1104.

Fig. 14 shows the internal configuration of the computer system 1100. In Fig. 14, the computer 1101 includes, in addition to the CD-ROM drive 1105 and the FD drive 1106:

- an MPU 1111;
- a ROM 1112 that stores programs such as a boot-up program;
- a RAM 1113 that is connected to the MPU 1111, temporarily stores application program instructions, and provides temporary storage space;
- a hard disk 1114 that stores application programs, system programs, and data;
- a bus 1115 that interconnects the MPU 1111, the ROM 1112, and the other components.

The computer 1101 may also include a network card (not shown) that provides a connection to a LAN.

A program that causes the computer system 1100 to execute the functions of the present invention according to the embodiment may be stored on a CD-ROM 1121 or an FD 1122, inserted into the CD-ROM drive 1105 or the FD drive 1106, and transferred to the hard disk 1114. Alternatively, the program may be transmitted to the computer 1101 via a network (not shown) and stored on the hard disk 1114. The program is loaded into the RAM 1113 at execution time. It may also be loaded directly from the CD-ROM 1121, the FD 1122, or the network.

The program need not include an operating system (OS), third-party programs, or similar components for causing the computer 1101 to execute the functions of the present invention according to the embodiment. It may contain only the instructions that call appropriate functions (modules) in a controlled manner to obtain the desired results. How the computer system 1100 operates is well known, so a detailed description is omitted.

The present invention is not limited to the above embodiment. Various modifications are possible, and these are naturally included within the scope of the present invention.

As described above, the information processing apparatus and related aspects of the present invention can automatically select related patents from patent documents. They are useful, for example, as an apparatus for selecting related patent documents.

Description of reference signs

1 Information processing apparatus
101 Receiving unit
102 Related patent identification information storage unit
103 Unrelated patent identification information storage unit
104 Search expression generation unit
105 Patent document storage unit
106 Search patent identification information acquisition unit
107 Similar element storage unit
108 Feature vector acquisition unit
109 Selection unit
110 Evaluation unit
111 Output unit

Claims

1. An information processing apparatus comprising:
a related patent identification information storage unit that stores one or more pieces of related patent identification information, which is information identifying related patent documents, that is, patent documents judged to be related as a result of screening work in a patent search;
a search expression generation unit that, using elements included in the patent documents identified by the one or more pieces of related patent identification information stored in the related patent identification information storage unit, generates a search expression capable of retrieving the patent documents identified by that related patent identification information;
a search patent identification information acquisition unit that acquires search patent identification information, which is information identifying search patent documents, that is, patent documents retrieved using the search expression generated by the search expression generation unit;
a feature vector acquisition unit that acquires one or more related patent feature vectors, which are feature vectors of the related patent documents identified by the respective pieces of related patent identification information stored in the related patent identification information storage unit, and acquires one or more search patent feature vectors, which are feature vectors of the search patent documents identified by the respective pieces of search patent identification information acquired by the search patent identification information acquisition unit;
a selection unit that selects at least the search patent identification information that corresponds to a search patent feature vector, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that is similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit, and that does not match any related patent identification information stored in the related patent identification information storage unit; and
an output unit that outputs information on the result of the selection by the selection unit.
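Read procedurally, claim 1 describes a pipeline: build a search expression from the related documents, retrieve candidates, vectorize both sets, and keep the candidates that resemble the related set but are not yet recorded as related. The Python sketch below illustrates only the vectorize-and-select steps, under assumptions the claim does not state: feature vectors are sparse `{element: weight}` dictionaries, "similar" means that the cosine similarity to the centroid of the related vectors is at least a hypothetical `threshold`, and the function names are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {element: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Element-wise mean of sparse vectors (assumed summary of the related set)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds each element's weight to the running total
    return {k: w / len(vectors) for k, w in total.items()}


def select_candidates(related, searched, threshold=0.5):
    """Return IDs of searched documents whose vectors are similar to the
    related set but whose IDs are not already stored as related.

    related, searched: {document ID: sparse feature vector}
    threshold: hypothetical cut-off for "similar".
    """
    center = centroid(list(related.values()))
    return sorted(
        doc_id
        for doc_id, vec in searched.items()
        if doc_id not in related and cosine(vec, center) >= threshold
    )
```

Comparing each hit with the centroid is one design choice; taking the maximum similarity to any single related document would be an equally plausible reading of "similar to the features of the one or more related patent feature vectors".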

2. The information processing apparatus according to claim 1, further comprising an evaluation unit that gives a lower evaluation as more of the search patent identification information selected by the selection unit fails to match the related patent identification information stored in the related patent identification information storage unit,
wherein the output unit outputs a result of the evaluation by the evaluation unit.
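Claim 2 fixes only the direction of the evaluation, not a formula. As a hedged illustration, the sketch below assumes a hypothetical score of `1 / (1 + n)`, where `n` counts selected IDs that are not stored as related; one reading is that each such ID is a document that resembles the related set yet was not classified as related during screening.

```python
def evaluate_screening(selected_ids, related_ids):
    """Hypothetical claim-2 score: 1.0 when every selected ID is already
    stored as related, and lower the more selected IDs are not."""
    related = set(related_ids)
    unmatched = len(set(selected_ids) - related)
    return 1.0 / (1.0 + unmatched)
```

Any function that decreases monotonically in `unmatched` would satisfy the claim wording equally well.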

3. The information processing apparatus according to claim 1 or 2, wherein the selection unit also selects, from among the search patent identification information corresponding to search patent feature vectors, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that are similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit, the search patent identification information corresponding to patent documents identified by the related patent identification information stored in the related patent identification information storage unit.

4. The information processing apparatus according to claim 2, wherein
the selection unit also selects, from among the search patent identification information corresponding to search patent feature vectors, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that are similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit, the search patent identification information corresponding to patent documents identified by the related patent identification information stored in the related patent identification information storage unit, and
the evaluation unit gives a lower evaluation as there is more related patent identification information stored in the related patent identification information storage unit that was acquired by the search patent identification information acquisition unit as search patent identification information but does not match any search patent identification information selected by the selection unit.
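The quantity that claim 4 penalizes is a plain set expression: related IDs that the search retrieved but that the selection unit did not pick as similar. The sketch below computes that set and, as an assumption not found in the claim, turns its size into a hypothetical `1 / (1 + n)` score; the function names are illustrative.

```python
def unselected_related(related_ids, searched_ids, selected_ids):
    """Related IDs that the search retrieved but the selection unit did not pick."""
    return (set(related_ids) & set(searched_ids)) - set(selected_ids)


def evaluate_consistency(related_ids, searched_ids, selected_ids):
    """Hypothetical claim-4 score: lower as more retrieved related documents
    fail to be selected as similar to the related set."""
    missed = unselected_related(related_ids, searched_ids, selected_ids)
    return 1.0 / (1.0 + len(missed))
```

Returning the set itself, not only the score, lets a reviewer inspect which related documents look unlike the rest of the related set.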

5. The information processing apparatus according to any one of claims 1 to 4, further comprising an unrelated patent identification information storage unit that stores one or more pieces of unrelated patent identification information, which is information identifying unrelated patent documents, that is, patent documents judged to be unrelated as a result of screening work in the patent search,
wherein the search expression generation unit further uses elements included in the patent documents identified by the one or more pieces of unrelated patent identification information stored in the unrelated patent identification information storage unit to generate a search expression that does not retrieve at least some of the patent documents identified by that unrelated patent identification information.
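A minimal sketch of the include/exclude structure that claim 5 implies is shown below. It assumes that elements are index terms or classification codes, that the search engine matches them exactly, and it uses a generic `OR` / `AND NOT` syntax that is not taken from any particular patent database; `build_search_expression` is a hypothetical name.

```python
def build_search_expression(related_docs, unrelated_docs):
    """Sketch of a claim-5 style search expression.

    related_docs, unrelated_docs: lists of element sets (e.g. index terms or
    classification codes), one set per document.
    """
    # OR-ing every element of every related document retrieves all related
    # documents, assuming the search engine matches elements exactly.
    include = set().union(*related_docs)
    # Elements seen only in unrelated documents are safe to exclude,
    # because no related document contains them.
    exclude = set().union(*unrelated_docs) - include
    expression = "(" + " OR ".join(sorted(include)) + ")"
    if exclude:
        expression += " AND NOT (" + " OR ".join(sorted(exclude)) + ")"
    return expression
```

For example, related documents `{"vector", "patent"}` and `{"vector", "search"}` with unrelated documents `{"vector", "engine"}` and `{"motor"}` give `(patent OR search OR vector) AND NOT (engine OR motor)`, which no longer retrieves the first unrelated document. A practical generator would choose fewer, more discriminative elements; this version only shows the structure.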

6. The information processing apparatus according to claim 5, wherein
the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are feature vectors of the unrelated patent documents identified by the respective pieces of unrelated patent identification information stored in the unrelated patent identification information storage unit, and
the selection unit selects the search patent identification information corresponding to a search patent feature vector, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that is not similar to the features of the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit.

7. The information processing apparatus according to claim 1, further comprising an unrelated patent identification information storage unit that stores one or more pieces of unrelated patent identification information, which is information identifying unrelated patent documents, that is, patent documents judged to be unrelated as a result of screening work in the patent search, wherein
the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are feature vectors of the unrelated patent documents identified by the respective pieces of unrelated patent identification information stored in the unrelated patent identification information storage unit, and
the selection unit selects the search patent identification information corresponding to a search patent feature vector, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that is not similar to the features of the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit.
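Claims 6 and 7 add a negative filter: a hit is kept only if it does not look like the documents already screened out. The sketch below assumes sparse `{element: weight}` vectors, measures closeness as the highest cosine similarity to any single unrelated vector, and uses a hypothetical `threshold`; `drop_unrelated_lookalikes` is an illustrative name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {element: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_unrelated_lookalikes(searched, unrelated, threshold=0.8):
    """Keep searched IDs whose vectors are not similar to any unrelated vector.

    searched, unrelated: {document ID: sparse feature vector}
    threshold: hypothetical cut-off for "similar".
    """
    kept = []
    for doc_id, vec in searched.items():
        # With no unrelated documents, nothing is similar (default 0.0).
        closest = max((cosine(vec, u) for u in unrelated.values()), default=0.0)
        if closest < threshold:
            kept.append(doc_id)
    return sorted(kept)
```

In a full pipeline this filter would be applied together with the similarity-to-related test of claim 1, so that a kept hit both resembles the related set and does not resemble the unrelated set.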

8. The information processing apparatus according to claim 6 or 7, wherein the selection unit also selects the unrelated patent identification information corresponding to an unrelated patent feature vector, among the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit, that is similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit and is not similar to the features of the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit.
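Claim 8 addresses the problem raised in the background: a document once classified as unrelated is rarely reconsidered. A hedged sketch of the re-check follows. It assumes sparse vectors, uses maximum cosine similarity for both tests, excludes each document from its own "unrelated" comparison (otherwise it would always match itself), and uses hypothetical thresholds.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {element: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recheck_unrelated(related, unrelated, similar=0.6, dissimilar=0.6):
    """Flag unrelated documents that may have been screened out by mistake.

    A document is flagged when it is close to some related document
    (similarity >= similar) but not close to any other unrelated document
    (similarity < dissimilar). Both thresholds are hypothetical.
    """
    flagged = []
    for doc_id, vec in unrelated.items():
        to_related = max((cosine(vec, r) for r in related.values()), default=0.0)
        to_unrelated = max(
            (cosine(vec, u) for other, u in unrelated.items() if other != doc_id),
            default=0.0,
        )
        if to_related >= similar and to_unrelated < dissimilar:
            flagged.append(doc_id)
    return sorted(flagged)
```

A flagged ID is a candidate for human re-screening, not an automatic reclassification.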

9. The information processing apparatus according to claim 2 or 3, further comprising an unrelated patent identification information storage unit that stores one or more pieces of unrelated patent identification information, which is information identifying unrelated patent documents, that is, patent documents judged to be unrelated as a result of screening work in the patent search, wherein
the feature vector acquisition unit further acquires one or more unrelated patent feature vectors, which are feature vectors of the unrelated patent documents identified by the respective pieces of unrelated patent identification information stored in the unrelated patent identification information storage unit,
the selection unit also selects the unrelated patent identification information corresponding to an unrelated patent feature vector, among the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit, that is similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit and is not similar to the features of the one or more unrelated patent feature vectors acquired by the feature vector acquisition unit, and
the evaluation unit gives a lower evaluation as more unrelated patent identification information is selected by the selection unit.
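Under claim 9, the evaluation falls as more previously discarded documents are re-selected. The sketch below assumes, as in the other illustrative scores, a hypothetical `1 / (1 + n)` form, and also returns the re-selected IDs so that they can be output for review; `evaluate_unrelated_picks` is an illustrative name.

```python
def evaluate_unrelated_picks(selected_ids, unrelated_ids):
    """Hypothetical claim-9 score plus the re-selected unrelated IDs.

    The score is 1.0 when no unrelated ID is selected and decreases as more are.
    """
    picked = set(selected_ids) & set(unrelated_ids)
    return 1.0 / (1.0 + len(picked)), sorted(picked)
```

For example, `evaluate_unrelated_picks(["C", "U1"], ["U1", "U2"])` returns `(0.5, ["U1"])`.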

10. The information processing apparatus according to any one of claims 1 to 9, wherein the feature vector acquisition unit acquires each patent document identified by the information identifying that patent document, acquires one or more elements from each patent document, and acquires each feature vector using the one or more elements.
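Claim 10 leaves open what an "element" is. The sketch below assumes that elements are words and that a feature vector is a count of each word, with document texts already fetched (for example, from the patent document storage unit 105). The regular-expression tokenizer handles only ASCII text; Japanese publications would need a morphological analyzer instead. Weighting schemes such as TF-IDF would fit the claim equally well.

```python
import re
from collections import Counter


def extract_elements(text):
    """Assumed element extractor: lowercase ASCII words and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vectors(documents):
    """documents: {document ID: text}. Returns {document ID: {element: count}}."""
    return {
        doc_id: dict(Counter(extract_elements(text)))
        for doc_id, text in documents.items()
    }
```

For instance, `feature_vectors({"A": "Vector search; vector index"})` yields `{"A": {"vector": 2, "search": 1, "index": 1}}`.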

11. The information processing apparatus according to any one of claims 1 to 10, wherein the feature vector acquisition unit acquires each feature vector by treating similar elements acquired from the respective patent documents as elements corresponding to the same vector element.
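Claim 11 folds similar elements onto one vector dimension, which matches the role of the similar element storage unit 107 in the list of symbols. The sketch below assumes that this storage is a simple variant-to-canonical lookup table; the table contents are hypothetical examples.

```python
from collections import Counter

# Hypothetical similar-element table (cf. similar element storage unit 107):
# each variant maps to one canonical element, i.e. one vector dimension.
SIMILAR_ELEMENTS = {"retrieval": "search", "lookup": "search", "vec": "vector"}


def feature_vector(elements, table=SIMILAR_ELEMENTS):
    """Count elements after folding similar ones onto the same vector element."""
    return dict(Counter(table.get(e, e) for e in elements))
```

Without the table, "retrieval" and "search" would occupy separate dimensions, so two documents describing the same idea in different words would appear less similar than they are.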

12. An information processing method performed using a related patent identification information storage unit that stores one or more pieces of related patent identification information, which is information identifying related patent documents, that is, patent documents judged to be related as a result of screening work in a patent search, together with a search expression generation unit, a search patent identification information acquisition unit, a feature vector acquisition unit, a selection unit, and an output unit, the method comprising:
a search expression generation step in which the search expression generation unit, using elements included in the patent documents identified by the one or more pieces of related patent identification information stored in the related patent identification information storage unit, generates a search expression capable of retrieving the patent documents identified by that related patent identification information;
a search patent identification information acquisition step in which the search patent identification information acquisition unit acquires search patent identification information, which is information identifying search patent documents, that is, patent documents retrieved using the search expression generated in the search expression generation step;
a feature vector acquisition step in which the feature vector acquisition unit acquires one or more related patent feature vectors, which are feature vectors of the related patent documents identified by the respective pieces of related patent identification information stored in the related patent identification information storage unit, and acquires one or more search patent feature vectors, which are feature vectors of the search patent documents identified by the respective pieces of search patent identification information acquired in the search patent identification information acquisition step;
a selection step in which the selection unit selects at least the search patent identification information that corresponds to a search patent feature vector, among the one or more search patent feature vectors acquired in the feature vector acquisition step, that is similar to the features of the one or more related patent feature vectors acquired in the feature vector acquisition step, and that does not match any related patent identification information stored in the related patent identification information storage unit; and
an output step in which the output unit outputs information on the result of the selection in the selection step.

13. A program for causing a computer capable of accessing a related patent identification information storage unit that stores one or more pieces of related patent identification information, which is information identifying related patent documents, that is, patent documents judged to be related as a result of screening work in a patent search, to function as:
a search expression generation unit that, using elements included in the patent documents identified by the one or more pieces of related patent identification information stored in the related patent identification information storage unit, generates a search expression capable of retrieving the patent documents identified by that related patent identification information;
a search patent identification information acquisition unit that acquires search patent identification information, which is information identifying search patent documents, that is, patent documents retrieved using the search expression generated by the search expression generation unit;
a feature vector acquisition unit that acquires one or more related patent feature vectors, which are feature vectors of the related patent documents identified by the respective pieces of related patent identification information stored in the related patent identification information storage unit, and acquires one or more search patent feature vectors, which are feature vectors of the search patent documents identified by the respective pieces of search patent identification information acquired by the search patent identification information acquisition unit;
a selection unit that selects at least the search patent identification information that corresponds to a search patent feature vector, among the one or more search patent feature vectors acquired by the feature vector acquisition unit, that is similar to the features of the one or more related patent feature vectors acquired by the feature vector acquisition unit, and that does not match any related patent identification information stored in the related patent identification information storage unit; and
an output unit that outputs information on the result of the selection by the selection unit.