JP2003157262A

JP2003157262A - Patent retrieval device, control method therefor, control program and recording medium

Info

Publication number: JP2003157262A
Application number: JP2001355274A
Authority: JP
Inventors: Takashige Tanaka; 敬重田中; Koji Yamada; 孝司山田
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2001-11-20
Filing date: 2001-11-20
Publication date: 2003-05-30

Abstract

PROBLEM TO BE SOLVED: To clearly present the citing and cited relationships among a plurality of patent documents, and to extract patent documents with similar technical content even when no citation relationship exists. SOLUTION: The retrieval processing part 102 of a patent retrieval system 100 analyzes a retrieval-source patent document and extracts the words or compound words contained in that condition-setting patent document as retrieval-condition phrases. Then, referring to a database part 11, it compares the stored retrieval phrases with the retrieval-condition phrases and retrieves patent documents similar to the retrieval-source patent document as patent documents highly related to it.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a patent search device, a control method for the patent search device, a control program, and a recording medium, and more particularly to a technique for clearly presenting to a searcher the relationships among a plurality of patent documents or the relationships among the claims of a patent document.

[0002]

[Prior Art] With the development of technology in recent years, the number of patent applications has become enormous. The volume of patent documents has grown accordingly, and searching patent documents is in practice no easy task, whether the purpose is to prevent duplicated research, check for infringement of rights, conduct a pre-filing search, track the technology development trends of other companies, or decide the direction of research and development. Patent search systems have long been known as a means of reducing the burden of such patent document searches. In a conventional patent search system, the user enters one or more keywords expected to appear in the patent documents being sought; the system refers to a database to extract the patent documents containing the keyword or keyword group, generates a list of the extracted patent documents, and displays or outputs it.

[0003]

[Problems to Be Solved by the Invention] The conventional patent search system described above can perform a search with few omissions, but if the keywords are chosen poorly, the results contain an enormous amount of unnecessary information. In that case the search itself also takes an enormous amount of time. Conversely, if keywords are added unnecessarily to narrow the results, related patent documents drop out. Furthermore, even when a plurality of patent documents is extracted, the relationships among them cannot be understood without reading the documents in detail. That is, even when several related patent documents are extracted, determining which of them is closest to the basic patent requires consulting the information in those documents to some extent. An object of the present invention is therefore to provide a patent search device, a control method for the patent search device, a control program, and a recording medium that clearly present the citing and cited relationships among a plurality of patent documents and can also extract patent documents with similar technical content even when no citation relationship exists.

[0004]

[Means for Solving the Problems] To solve the above problems, a patent search device comprises: a patent search database unit that stores in advance the words or compound words contained in each patent document to be searched as search phrases associated with that patent document; a phrase extraction unit that analyzes a search-source patent document and extracts the words or compound words contained in that condition-setting patent document as search-condition phrases; and a search unit that compares the search phrases with the search-condition phrases and retrieves patent documents highly similar to the search-source patent document as patent documents highly related to the search-source patent document. With this configuration, the phrase extraction unit of the patent search device analyzes the search-source patent document and extracts the words or compound words contained in that condition-setting patent document as search-condition phrases. The search unit refers to the patent search database unit, compares the search phrases with the search-condition phrases, and retrieves patent documents highly similar to the search-source patent document as patent documents highly related to the search-source patent document.
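
The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `extract_phrases` and `search_similar`, the whitespace tokenizer, and the Jaccard overlap score are all introduced here for the example. The text itself says only that the stored search phrases and the extracted search-condition phrases are compared.

```python
from typing import Dict, List, Set, Tuple


def extract_phrases(text: str) -> Set[str]:
    """Hypothetical phrase extractor: lower-cased whitespace tokens.

    A real system would use morphological analysis instead.
    """
    tokens = (token.strip(".,;:()").lower() for token in text.split())
    return {token for token in tokens if token}


def search_similar(
    source_text: str,
    database: Dict[str, Set[str]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored documents by phrase overlap with the source document."""
    condition = extract_phrases(source_text)  # search-condition phrases
    scored = []
    for doc_id, search_phrases in database.items():
        union = condition | search_phrases
        # Jaccard overlap is an assumed similarity measure.
        score = len(condition & search_phrases) / len(union) if union else 0.0
        scored.append((doc_id, score))
    # Highest score first; ties broken by document ID for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Search phrases registered in advance for each stored document.
database = {
    "JP-A": extract_phrases("ink jet printer nozzle cleaning"),
    "JP-B": extract_phrases("patent search database keyword"),
}
ranking = search_similar("patent document search using keyword database", database)
```

Here `ranking` lists `"JP-B"` first because it shares four phrases with the source text, while `"JP-A"` shares none.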

[0005] In this case, the device may include a reference destination extraction unit that extracts, based on the descriptions in each patent document, the reference relationships between the search-source patent document and the highly related patent documents. The patent search database unit may include a reference destination database unit that, based on the extracted reference relationships, stores information about the patent documents having a reference relationship in association with the search-source patent document. The device may further include a chart creation unit that creates, based on the information in the reference destination database unit, a chart representing the reference relationships between the search-source patent document and the highly related patent documents, and a display unit that presents the chart created by the chart creation unit.

[0006] Furthermore, the device may include a morphological analysis unit that performs morphological analysis of the search-source patent document to extract the words or compound words, an importance calculation unit that calculates the importance of each extracted word or compound word within the search-source patent document, and a registration unit that registers each extracted word or compound word, in association with its importance, in the patent search database unit as a search phrase. The search unit may retrieve highly related patent documents based on the distance between the vector corresponding to the search phrases and the vector corresponding to the search-condition phrases.
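
As a rough sketch of the vector comparison mentioned above, each document can be represented as a sparse vector of phrase weights and two documents compared by a distance. The choice of cosine distance and the function name `cosine_distance` are assumptions made for illustration; the text does not say which distance is used.

```python
import math
from typing import Dict


def cosine_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Return 1 - cosine similarity of two sparse phrase-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


source_vec = {"search": 2.0, "patent": 1.0}   # search-condition phrases
near_vec = {"search": 4.0, "patent": 2.0}     # same direction, larger weights
far_vec = {"nozzle": 3.0}                     # no phrase in common
```

A smaller distance means a more closely related document: `near_vec` is at distance close to zero from `source_vec`, while `far_vec` is at distance 1.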

[0007] Further, the patent document may contain a plurality of claims, and the search unit may include a claim analysis unit that extracts the dependency relationships among the plurality of claims based on the search phrases. The claim analysis unit may extract a dependency relationship based on the mention of another claim within the claim being examined and on the titles of the inventions of the claim being examined and of the other claim. The claim analysis unit may determine that a dependency relationship exists when the claim being examined mentions the other claim and the title of the invention of the claim being examined is identical to the title of the invention of the other claim. Further, the patent document may contain claims, and the search unit may include a claim analysis unit that extracts, based on the search phrases, the constituent elements contained in the claims.
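
The dependency test just described (the claim mentions another claim, and both claims have the same invention title) can be sketched as below. The `Claim` type, the English regular expression, and the assumption that each claim's invention title has already been extracted are all hypothetical details added for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Claim:
    number: int
    title: str  # name of the claimed invention, assumed already extracted
    text: str


# Matches "claim 1" and "claim 1 or 2" / "claims 1 and 2" (assumed wording).
CLAIM_REF = re.compile(r"claims?\s+(\d+)(?:\s+(?:or|and)\s+(\d+))?", re.IGNORECASE)


def find_dependencies(claims: List[Claim]) -> List[Tuple[int, int]]:
    """Return (cited claim number, citing claim number) pairs."""
    by_number = {claim.number: claim for claim in claims}
    pairs = []
    for claim in claims:
        for match in CLAIM_REF.finditer(claim.text):
            for group in match.groups():
                if group is None:
                    continue
                cited = by_number.get(int(group))
                # Dependent only if the other claim is mentioned AND titles match.
                if cited and cited.number != claim.number and cited.title == claim.title:
                    pairs.append((cited.number, claim.number))
    return pairs


claims = [
    Claim(1, "patent search device", "A patent search device comprising a database unit."),
    Claim(2, "patent search device",
          "The patent search device according to claim 1, further comprising a display."),
    Claim(3, "control method",
          "A control method for the patent search device according to claim 1 or 2."),
]
dependencies = find_dependencies(claims)
```

In this example only claim 2 is judged dependent on claim 1. Claim 3 mentions claims 1 and 2 but has a different invention title, so under the rule above it is not treated as dependent.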

[0008] Further, a control method for a patent search device having a patent search database unit that stores in advance the words or compound words contained in each patent document to be searched as search phrases associated with that patent document comprises: a phrase extraction step of analyzing a search-source patent document and extracting the words or compound words contained in that condition-setting patent document as search-condition phrases; and a search step of comparing the search phrases with the search-condition phrases and retrieving patent documents highly similar to the search-source patent document as patent documents highly related to the search-source patent document. In this case, the method may include a reference destination extraction step of extracting, based on the descriptions in each patent document, the reference relationships between the search-source patent document and the highly related patent documents.

[0009] The patent search database unit may include a reference destination database unit that, based on the extracted reference relationships, stores information about the patent documents having a reference relationship in association with the search-source patent document, and the method may include a chart creation step of creating, based on the information in the reference destination database unit, a chart representing the reference relationships between the search-source patent document and the highly related patent documents, and a display step of presenting the chart created by the chart creation unit. The method may further include a morphological analysis step of performing morphological analysis of the search-source patent document to extract the words or compound words, an importance calculation step of calculating the importance of each extracted word or compound word within the search-source patent document, and a registration step of registering each extracted word or compound word, in association with its importance, in the patent search database unit as a search phrase.

[0010] Further, the search step may retrieve highly related patent documents based on the distance between the vector corresponding to the search phrases and the vector corresponding to the search-condition phrases. The patent document may contain a plurality of claims, and the search step may include a claim analysis step of extracting the dependency relationships among the plurality of claims based on the search phrases. The claim analysis step may extract a dependency relationship based on the mention of another claim within the claim being examined and on the titles of the inventions of the claim being examined and of the other claim.

[0011] Furthermore, the claim analysis step may determine that a dependency relationship exists when the claim being examined mentions the other claim and the title of the invention of the claim being examined is identical to the title of the invention of the other claim. The patent document may contain claims, and the search step may include a claim analysis step of extracting, based on the search phrases, the constituent elements contained in the claims.

[0012] Further, a control program that causes a computer having a patent search database unit, which stores in advance the words or compound words contained in each patent document to be searched as search phrases associated with that patent document, to function as a patent search device causes the computer to analyze a search-source patent document, extract the words or compound words contained in that condition-setting patent document as search-condition phrases, compare the search phrases with the search-condition phrases, and retrieve patent documents highly similar to the search-source patent document as patent documents highly related to the search-source patent document. In this case, the program may cause the computer to extract, based on the descriptions in each patent document, the reference relationships between the search-source patent document and the highly related patent documents. The patent document may contain a plurality of claims, and the program may cause the computer to extract the dependency relationships among the plurality of claims based on the search phrases. Further, the patent document may contain claims, and the program may cause the computer to extract, based on the search phrases, the constituent elements contained in the claims. Each of the above control programs may be recorded on a recording medium.

[0013]

[Embodiments of the Invention] Next, preferred embodiments of the present invention will be described with reference to the drawings. This embodiment applies the present invention to a patent search system.

[1] Patent Search Database Registration System

First, the patent search database registration system used to build the patent search database for the patent search system will be described.

[1.1] Configuration of the Patent Search Database Registration System

FIG. 1 is a schematic block diagram of the patent search database registration system. The patent search database registration system 10 broadly comprises a database unit 11 that accumulates various data as databases and a registration processing unit 12 that registers various data in the database unit 11. The patent search database registration system 10 can be implemented on a computer system, and the functions of the registration processing unit 12 are implemented by microprocessor-executable programs corresponding to the respective units that make up the registration processing unit 12. Such a program may be executed directly from a recording medium such as a semiconductor memory or a CD-ROM. It may also be installed in an external storage device in advance and then executed. Alternatively, the program may be installed over a network such as the Internet before execution, either each time it is run or only once the first time.

[0014]

[1.1.1] Configuration of the Database Unit

First, the configuration of the database unit will be described. The database unit 11 is built on an external storage device such as a hard disk. The database unit 11 broadly comprises a word database 15, a constituent element (matters specifying the invention) database 16, a dependency database 17, and a reference document database 18. The word database 15 stores the words or compound words contained in each patent document PD in association with a specific classification (field) designated in advance.

[0015] FIG. 2 shows the basic structure of the word database. The word database 15 broadly comprises a page table 21, a keyword table 22, and a word table 23. FIG. 2 shows only one set of a keyword table 22 and a word table 23 for one page table 21, but in practice a plurality of sets of keyword tables 22 and word tables 23 are configured for one page table 21, forming a tree structure as a whole. The page table 21 comprises page ID data 31, identification number data 32, title data 33, update date/time data 34, bibliographic data 35, and linked-count data 36. This data structure is only an example and is not limiting. For example, data such as the number of pages or the technical field of the corresponding publication may also be included.

[0016] The page ID data 31 is uniquely assigned to each patent document PD to be registered and identifies that patent document PD. The identification number data 32 holds a value assigned to each patent document PD by the patent office, such as a publication number. The title data 33 stores a name representing the content of each patent document PD; when the patent document PD is a patent publication, for example, this is the title of the invention. The update date/time data 34 stores the date and time at which each patent document PD was registered. When the patent document PD is a patent publication, the bibliographic data 35 includes the issuing country, publication type, publication number, publication date, international patent classification information (FI classification numbers, etc.), whether examination has been requested, the number of claims, the application form, the total number of pages, the application number, the filing date, the application number and filing date of the basic application on which priority is claimed, applicant information (identification number, name, and address or residence), inventor information (name and address or residence), agent information (identification number, name, and qualification), theme codes, F-terms, and so on. That is, the bibliographic data 35 stores the various bibliographic information relating to each patent document PD. The linked-count data 36 stores the number of other patent documents that refer to each patent document PD.

[0017] The keyword table 22 is a table that associates the page table 21 with the word table 23. The keyword table 22 comprises page ID data 41, word ID data 42, occurrence count (cost) data 43, importance data 44, and paragraph flag data 45. The page ID data 41 stores data corresponding to the page ID data 31 of the page table 21. That is, when it stores the same value as the page ID data 31 of a page table 21, the keyword table 22 holds data related to the page table 21 having that page ID data. The word ID data 42 links the entry to the word table 23 that stores information about the corresponding word.

[0018] The occurrence count (cost) data 43 corresponds to the number of times the word identified by the word ID data 42 occurs in the patent document identified by the page ID data 41. That is, it indicates how many times a given word is used in that patent document. The importance data 44 stores the TFIDF value, calculated by the TFIDF method, as the importance of the word identified by the word ID data 42 within the patent document identified by the page ID data 41. This TFIDF value is calculated from the IDF data stored in the word table described later (the IDF value, which corresponds to the number of documents, among all patent documents, in which the word occurs) and from the occurrence count data 43. That is, the fewer the documents in which a word occurs and the more often it occurs in each document, the more important the word is judged to be.
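
A minimal sketch of such an importance score follows. The text states only that the TFIDF value is derived from the occurrence count and from IDF data tied to the number of documents containing the word; the specific formula used here, occurrences multiplied by log(total documents / documents containing the word), is a common textbook form and is an assumption, as is the function name `tfidf`.

```python
import math


def tfidf(occurrences: int, docs_with_word: int, total_docs: int) -> float:
    """Assumed TF-IDF form: tf * log(N / df)."""
    if occurrences <= 0 or docs_with_word <= 0:
        return 0.0
    return occurrences * math.log(total_docs / docs_with_word)


# Same occurrence count, but the first word appears in far fewer documents.
rare = tfidf(occurrences=5, docs_with_word=10, total_docs=1000)
common = tfidf(occurrences=5, docs_with_word=900, total_docs=1000)
```

With these inputs `rare` is larger than `common`, which matches the behavior described: a word found in few documents but used often within a document scores as more important.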

[0019] The paragraph flag data 45 indicates in which paragraph of the patent document identified by the page ID data 41 the word identified by the word ID data 42 was detected. When the document is an ordinary document such as a paper or a book, "paragraph" here covers not only paragraphs in the usual sense but also units such as chapters and sections. When the document is a patent publication or the like, sections such as the title of the invention, the claims, the detailed description of the invention, the brief description of the drawings, and the abstract are also treated as paragraphs.

[0020] The word table 23 comprises word data 51, word ID data 52, and IDF data 53. The word data 51 holds a word obtained by morphological analysis. The word ID data 52 identifies the word registered as the word data 51. The IDF data 53 holds the IDF value corresponding to the number of documents, among all patent documents, in which the word registered as the word data 51 occurs. The constituent element database 16 stores data on the constituent elements (the so-called matters specifying the invention) contained in the claims of patent documents such as published patent applications and patent gazettes. Specifically, it stores the name of each constituent element, words that characterize the function of each constituent element, and the like. The dependency database 17 stores data on dependency relationships, that is, the links between claims.
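
The three tables of the word database 15 could be laid out, for example, as relational tables. The sketch below uses Python's standard `sqlite3` module; the table names, column names, and column types are assumptions chosen to mirror the data items named in the text, not the storage format of the actual system.

```python
import sqlite3

SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,  -- page ID data
    identification TEXT,                 -- identification number data
    title          TEXT,                 -- title data
    updated_at     TEXT,                 -- update date/time data
    bibliography   TEXT,                 -- bibliographic data
    linked_count   INTEGER DEFAULT 0     -- linked-count data
);
CREATE TABLE word (
    word_id INTEGER PRIMARY KEY,         -- word ID data
    word    TEXT UNIQUE,                 -- word data
    idf     REAL                         -- IDF data
);
CREATE TABLE keyword (                   -- links page and word
    page_id        INTEGER REFERENCES page(page_id),
    word_id        INTEGER REFERENCES word(word_id),
    occurrences    INTEGER,              -- occurrence count data
    importance     REAL,                 -- importance (TFIDF) data
    paragraph_flag INTEGER               -- paragraph flag data
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO page (page_id, title) VALUES (1, 'Patent search device')")
conn.execute("INSERT INTO word (word_id, word, idf) VALUES (7, 'search', 2.3)")
conn.execute("INSERT INTO keyword VALUES (1, 7, 12, 27.6, 1)")

# Follow the keyword table from a page to its words.
rows = conn.execute(
    """
    SELECT page.title, word.word, keyword.occurrences
    FROM keyword
    JOIN page ON page.page_id = keyword.page_id
    JOIN word ON word.word_id = keyword.word_id
    """
).fetchall()
```

The join shows the role of the keyword table as the link between a document and the words registered for it; the inserted values are placeholders.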

[0021] FIG. 3 shows the data format of the dependency database 17. The dependency database 17 comprises page ID data 55, cited-claim data 56, and citing-claim data 57. The page ID data 55 is the same kind of data as the page ID data 31 of the page table 21, and the cited-claim data 56 and citing-claim data 57 associated with a given page ID data 55 represent the dependency relationships among the claims of the patent document identified by that page ID data 55. The cited-claim data 56 holds the claim number of a claim that is cited by another claim. The citing-claim data 57 holds the claim number of a claim that cites another claim.

[0022] Specifically, suppose that in the claims of the patent document identified by page ID data = 1, claims 1 and 2 are independent claims and claim 3 is a dependent claim citing claims 1 and 2, and that in the claims of the patent document identified by page ID data = 2, claim 1 is an independent claim and claim 2 is a dependent claim citing claim 1. In that case, the data shown in FIG. 4 is registered in the dependency database 17. That is, as shown in FIG. 4, the following three records are registered:
(1) Page ID data = 1, cited-claim data = 1, citing-claim data = 3
(2) Page ID data = 1, cited-claim data = 2, citing-claim data = 3
(3) Page ID data = 2, cited-claim data = 1, citing-claim data = 2
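
The worked example above can be reproduced with a few lines of code. The input layout (a mapping from page ID to each claim's list of cited claims) and the function name `to_dependency_records` are assumptions made for this sketch; only the resulting (page ID, cited claim, citing claim) records come from the text.

```python
from typing import Dict, List, Tuple

# page_id -> {claim number: claims it cites}; independent claims cite nothing.
claim_citations: Dict[int, Dict[int, List[int]]] = {
    1: {1: [], 2: [], 3: [1, 2]},
    2: {1: [], 2: [1]},
}


def to_dependency_records(
    citations: Dict[int, Dict[int, List[int]]],
) -> List[Tuple[int, int, int]]:
    """Flatten citations into (page_id, cited_claim, citing_claim) records."""
    records = []
    for page_id in sorted(citations):
        for citing in sorted(citations[page_id]):
            for cited in sorted(citations[page_id][citing]):
                records.append((page_id, cited, citing))
    return records


records = to_dependency_records(claim_citations)
```

Independent claims produce no record of their own; they appear only as the cited side of a record.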

[0023] The reference document database 18 stores, for each patent document, data on the references cited in that document or cited during examination and the like. Specifically, the reference number (patent publication number, patent number, and so on), document name, pages referred to, publisher, author name, and the like are stored as needed. As shown in FIG. 8, the reference document database 18 comprises reference-destination page ID data 81, reference-source page ID data 82, title data 83, document type data 84, and bibliographic information data 85. The reference-destination page ID data 81 is the page ID data identifying the document that is referred to. The reference-source page ID data 82 is the page ID data identifying the patent document that makes the reference.
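
One thing the destination and source page IDs make possible is counting how many distinct documents refer to a given document, which is the kind of figure held in the linked-count data 36 of the page table. The text does not say that the count is derived this way, so the sketch below, including the name `count_links`, is only an assumed illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_links(references: Iterable[Tuple[int, int]]) -> Counter:
    """Count referring documents per referenced document.

    Each item is a (destination_page_id, source_page_id) pair.
    """
    unique_pairs = set(references)  # a repeated pair is counted once
    return Counter(destination for destination, _source in unique_pairs)


# Hypothetical reference pairs; (10, 20) is deliberately duplicated.
references = [(10, 20), (10, 30), (20, 30), (10, 20)]
linked = count_links(references)
```

Here document 10 is referred to by two documents, document 20 by one, and document 30 by none.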

[0024] The title data 83 holds the title of the document that is referred to. In the example above, "Japanese Unexamined Patent Publication No. H△-12345" is the document title. The document type data 84 indicates the type of the document that is referred to. For example, document type data = 0 is set for a domestic patent publication, document type data = 1 for a foreign patent publication, and document type data = 2 for a paper. The bibliographic information data 85 stores bibliographic information such as the document name, pages referred to, publisher, and author name in a predetermined order as text data, for example in CSV format. The bibliographic information data 85 need not be provided if this information is not required.
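
The document type codes and the CSV-style bibliographic field can be sketched as follows. The enum values 0, 1, and 2 come from the text; the field order in `FIELDS`, the helper names, and the sample values are hypothetical.

```python
import csv
import io
from enum import IntEnum


class DocType(IntEnum):
    DOMESTIC_PATENT = 0
    FOREIGN_PATENT = 1
    PAPER = 2


# Assumed "predetermined order" of the bibliographic fields.
FIELDS = ["name", "pages", "publisher", "author"]


def pack_bibliography(info: dict) -> str:
    """Serialize the fields as one CSV record, quoting where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([info.get(field, "") for field in FIELDS])
    return buffer.getvalue().rstrip("\r\n")


def unpack_bibliography(packed: str) -> dict:
    """Parse one CSV record back into a field dictionary."""
    row = next(csv.reader([packed]))
    return dict(zip(FIELDS, row))


# Placeholder values only; not a real publication.
packed = pack_bibliography(
    {"name": "Search, Indexing and Retrieval", "pages": "12-15",
     "publisher": "Example Press", "author": "A. Author"}
)
```

Using the `csv` module rather than a plain `",".join(...)` keeps a document name that itself contains a comma intact when the record is read back.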

【００２５】[1.1.2] Configuration of Search Processing Unit
Next, the configuration of the search processing unit 12 will be described, referring again to FIG. 1. The search processing unit 12 broadly comprises a morphological analysis unit 61, an importance calculation unit 62, a claim analysis unit 63, a morphological analysis dictionary unit 64, a reference document extraction unit 65, a standardization unit 66, and a stop word processing unit 67. The morphological analysis unit 61 morphologically analyzes a patent document PD and registers in the word database 15 the frequently occurring nouns, sa-hen (verbal) nouns, and other words similar to nouns or sa-hen nouns. Before the morphological analysis unit 61 registers words in the word database 15, the standardization unit 66 standardizes (unifies) them. For example, when the three words "ＰＣ", "パソコン", and "パーソナルコンピュータ" (all meaning "personal computer") are obtained by morphological analysis, they have the same meaning and are therefore unified into the single word "パソコン". The stop word processing unit 67 prevents words that appear frequently in the patent field regardless of the type of patent document from being registered in the word database 15. For example, words such as "請求項" (claim), "出願" (application), "特許" (patent), "本発明" (the present invention), and "実用新案" (utility model) are not registered in the word database 15.
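
The standardization and stop word steps can be illustrated with a short sketch. The synonym table and stop word set below contain only the examples given in the text; the function name and the use of a plain dictionary lookup are assumptions, since the document does not say how the units are implemented.

```python
# Hypothetical synonym table: each variant maps to the unified word from the text.
CANONICAL = {"ＰＣ": "パソコン", "パーソナルコンピュータ": "パソコン"}

# Stop words listed in the text (frequent in any patent document).
STOP_WORDS = {"請求項", "出願", "特許", "本発明", "実用新案"}


def prepare_words(words):
    """Unify synonyms first, then drop patent-domain stop words."""
    normalized = (CANONICAL.get(word, word) for word in words)
    return [word for word in normalized if word not in STOP_WORDS]
```

Standardizing before filtering means a stop word's variants only need to be listed once, under their unified form.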

【００２６】The importance calculation unit 62 calculates, using the TFIDF method, the importance of each word registered after morphological analysis as a TFIDF value and registers it in the word database 15. The claim analysis unit 63 broadly comprises a dependency analysis unit 63A and a constituent requirement extraction unit 63B. When the patent document is a patent publication, a utility model publication, or the like, the dependency analysis unit 63A analyzes the patent claims or the utility model registration claims and examines the dependency relationships among the claims. The constituent requirement extraction unit 63B extracts constituent requirements (for example, matters specifying the invention) from the patent claims or the utility model registration claims. The morphological analysis dictionary unit 64 stores a morphological analysis dictionary, of a generally known form, that the morphological analysis unit 61 uses when performing morphological analysis. For each patent document, the reference document extraction unit 65 extracts the reference documents cited in that document or cited during examination and the like, and registers them in the reference document database 18.

【００２７】[1.2] Operation of Patent Search Database Registration System
Next, the operation of the patent search database registration system will be described. The following description takes as an example the case where a patent publication is the patent document to be registered. FIG. 5 shows a flowchart of the patent search database registration processing. First, the patent search database registration system 10 performs morphological analysis (step S1) and registers the result in the word database (step S2). Specifically, when morphologically analyzing patent documents, the morphological analysis unit 61 assigns each patent document a unique number as a page ID based on a predetermined condition, and registers it in the page ID data of the page table and in the page ID data of the keyword table. Next, the morphological analysis unit 61 morphologically analyzes the patent publication, which is the patent document, and registers the publication number as identification number data 32.

【００２８】Subsequently, the morphological analysis unit 61 extracts the title of the invention and registers it as title data 33. The morphological analysis unit 61 then registers, in the update date/time data, the date and time at which the patent publication was registered in the database registration system. The morphological analysis unit 61 also extracts from the patent publication the issuing country, publication type, publication number, publication date, International Patent Classification information, whether examination has been requested, number of claims, application form, total number of pages, application number, filing date, application number of the basic application for the priority claim, filing date of the basic application for the priority claim, applicant information, inventor information, agent information, theme code, F-terms, and so on, and registers them as bibliographic data 35. Next, the patent search database registration system calculates the importance of each word based on the result of the morphological analysis (step S3) and registers it in the keyword table 22 of the word database 15 (step S2).

【００２９】In this case, the importance calculation unit 62 calculates the importance to be stored as importance data based on the occurrence count (cost) data 43 of the keyword table and the IDF data 53 of the word table. Specifically, let TF be the number of occurrences of each word (keyword) contained in the patent documents to be registered, N be the total number of patent documents to be registered, and n be the number of those patent documents that contain the word whose importance is being calculated. The TFIDF value is then calculated as the importance data by the following formulas.
IDF = log(N / n)
TFIDF = TF · IDF
The calculated IDF is registered in the word table as IDF data. Next, the patent search database registration system analyzes the dependency relationships among the claims, which is one part of claim analysis (step S4), and registers them in the dependency database 17 (step S5).
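
The two formulas can be checked with a small worked sketch. Several points here are assumptions rather than statements from the text: TF is counted per document, the logarithm is the natural logarithm (the text does not name a base), and the function name and the dictionary-based tables are hypothetical stand-ins for the keyword table and word table.

```python
import math
from collections import Counter


def tfidf_table(docs):
    """Compute TFIDF = TF * log(N / n) for every (page_id, word) pair.

    docs maps a page ID to the list of words extracted from that document.
    """
    total_docs = len(docs)                       # N
    doc_freq = Counter()                         # n for each word
    for words in docs.values():
        doc_freq.update(set(words))
    idf = {word: math.log(total_docs / n) for word, n in doc_freq.items()}
    return {
        (page_id, word): tf * idf[word]
        for page_id, words in docs.items()
        for word, tf in Counter(words).items()
    }


docs = {1: ["検索", "検索", "文献"], 2: ["文献", "辞書"]}
table = tfidf_table(docs)
```

A word that appears in every document, such as "文献" here, has IDF = log(1) = 0, so its importance is zero no matter how often it occurs.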

【００３０】In this case, the dependency analysis unit 63A analyzes the dependency relationships among the claims based on phrases in the patent claims or utility model registration claims such as "in the □□ method according to claim ○○" and "the ■■ device according to claim △△, characterized in that ...", and registers dependency relationships such as those shown in FIG. 4 in the dependency database. Next, the patent search database registration system extracts constituent requirements, which is another part of claim analysis (step S6). In this case, the constituent requirement extraction unit 63B extracts the constituent requirements (matters specifying the invention) from the wording of the claims and registers them in the constituent requirement database 16 (step S7). Specifically, the case of extracting constituent requirements from the following claim will be described.

【００３１】[Claim 1] A method of associating n documents ordered in time series, the method comprising: a step of calculating similarities among the n documents; a step of creating a similarity matrix from the similarities using a time constraint; and a step of converting the similarity matrix into an adjacency matrix indicating the association of the documents.

【００３２】In this case, the following three constituent requirements are extracted by means of the keyword "ステップ" (step).
"a step of calculating similarities among the n documents"
"a step of creating a similarity matrix from the similarities using a time constraint"
"a step of converting the similarity matrix into an adjacency matrix indicating the association of the documents"
The extracted constituent requirements are then registered in the constituent requirement database 16.
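
The keyword-based extraction above can be sketched on the original Japanese wording of claim 1. This is only an illustration under stated assumptions: the function name is hypothetical, and splitting on the comma "、" and testing for the ending "ステップと" is a simplification that works for this claim but is not described in the text as the actual method.

```python
CLAIM_1 = (
    "時系列に順序づけられたｎ個の文書を関連づける方法であって、"
    "前記ｎ個の文書間の類似度を計算するステップと、"
    "前記類似度から時間制約を用いて類似度行列を作成するステップと、"
    "前記類似度行列を前記文書の関連づけを示す隣接行列に変換するステップと、"
    "を有することを特徴とする文書の関連づけ方法。"
)


def extract_requirements(claim_text, keyword="ステップ"):
    """Return the clauses of a claim that end with the given keyword."""
    requirements = []
    for segment in claim_text.split("、"):
        segment = segment.strip()
        if segment.endswith(keyword + "と"):
            requirements.append(segment[:-1])  # drop the trailing particle "と"
    return requirements


requirements = extract_requirements(CLAIM_1)
```

For a device claim, a different keyword (for example "手段" or "部") would presumably be passed instead; the text only gives the "ステップ" case.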

【００３３】Specifically, as shown in FIG. 6, the constituent requirement database 16 includes page ID data 71, claim ID data 72, and constituent requirement text data 73. The page ID data 71 is the same kind of data as the page ID data 31 of the page table 21; the claim ID data 72 and the constituent requirement text data 73 associated with a given page ID data 71 represent the content of the claims contained in the patent document corresponding to that page ID data 71. The claim ID data 72 identifies the claim number of the text stored in the constituent requirement text data 73.

【００３４】The constituent requirement text data 73 stores a constituent requirement as text data. When the above claim is registered in the constituent requirement database 16, the result is as shown in FIG. 7. That is, the number 231 identifying the patent document is stored as the page ID data 71, the number 1 representing claim 1 is stored as the claim ID data 72, and the text corresponding to the content of claim 1, "a step of calculating similarities among the n documents", is stored as the constituent requirement text data 73.

【００３５】Next, the patent search database registration system 10 extracts reference documents (step S8). In this case, the reference document extraction unit 65 detects whether the patent publication (patent document) cites any document (a domestic or foreign patent publication, a paper, etc.) and registers the information on the cited document (publication number, book title, paper title, etc.) in the reference destination database. If the referenced patent document has already been registered, the value of the inbound-link count data 36 in the page table 21 for that referenced patent document is updated.

【００３６】Specifically, passages in the patent publication such as "特開平△−１２３４５号公報によると……" (according to Japanese Patent Laid-Open Publication No. Hei △-12345 ...) and "文献“Fast Algorithms for Mining Association”を参照……" (see the document "Fast Algorithms for Mining Association" ...) are extracted and registered in the reference document database 18.
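
A pattern-based sketch of this extraction is shown below. The two regular expressions and the function name are assumptions made for illustration: they cover only the two phrasings quoted in the text, and a real extractor would need many more patterns. The type codes 0 (domestic patent publication) and 2 (paper) follow the document type data 84 described earlier.

```python
import re

# Hypothetical patterns covering only the two example phrasings in the text.
PATENT_RE = re.compile(r'特開[^\s、。「」]*?号公報')
PAPER_RE = re.compile(r'文献\s*[“"]([^”"]+)[”"]')


def extract_references(text):
    """Return (title, document_type) pairs found in the text."""
    refs = [(m.group(0), 0) for m in PATENT_RE.finditer(text)]   # 0: domestic patent
    refs += [(m.group(1), 2) for m in PAPER_RE.finditer(text)]   # 2: paper
    return refs


sample = (
    "特開平△−１２３４５号公報によると……。"
    "文献“Fast Algorithms for Mining Association”を参照……"
)
references = extract_references(sample)
```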

【００３７】[1.3] Configuration of Patent Search System
FIG. 9 is a block diagram showing the general configuration of the patent search system. In FIG. 9, parts that are the same as in the patent search database registration system of FIG. 1 are given the same reference numerals, and their detailed description is omitted. The patent search system 100 broadly comprises a database unit 11 that stores various data as databases, a search processing unit 102 that performs patent searches using the database unit 11, a display unit 103, and an input unit 104.

【００３８】The patent search system 100 can be implemented on a computer system, and the functions of the search processing unit 102 are realized by microprocessor-executable programs corresponding to the units that make up the search processing unit 102. Such a program may be executed directly from a recording medium such as a semiconductor memory or a CD-ROM. It may also be installed in advance in an external storage device and then executed. Alternatively, the program may be installed over a network such as the Internet, either each time before it is executed or only once at the outset. The search processing unit 102 performs the similarity search, claim search, or reference destination search described later, based on the search source patent document (or identifying information that specifies the search source patent document) entered via the input unit 104. The display unit 103 displays various data, including search results, and presents them to the search operator. The input unit 104 consists of a keyboard, mouse, tablet, scanner, removable disk device, communication interface unit, and the like, and is used to enter data corresponding to the search source patent document and to set various search conditions.

【００３９】[1.3.1] Configuration of Search Processing Unit
Next, the configuration of the search processing unit will be described. The search processing unit 102 broadly comprises a syntactic/morphological analysis unit 111, a morphological analysis dictionary unit 112, a chart creation unit 113, an actual search processing unit 114, a standardization unit 115, and a stop word processing unit 116. The syntactic/morphological analysis unit 111 functions as a word/phrase extraction unit and comprises a morphological analysis unit 121, a claim analysis unit 122, and a reference document extraction unit 123. The morphological analysis unit 121 uses the morphological analysis dictionary 64 to morphologically analyze a patent document and extracts the frequently occurring nouns, sa-hen (verbal) nouns, and other words similar to nouns or sa-hen nouns.

【００４０】When the morphological analysis unit 121 outputs its analysis result, the standardization unit 115 standardizes (unifies) the words. The stop word processing unit 116 prevents words that appear frequently in the patent field regardless of the type of patent document from being output as part of the morphological analysis result. When the search source patent document is a patent publication, a utility model publication, or the like, the claim analysis unit 122 analyzes the patent claims or the utility model registration claims and examines the dependency relationships among the claims. It also extracts constituent requirements (for example, matters specifying the invention) from the patent claims or the utility model registration claims. The reference destination extraction unit 123 extracts, for the search source patent document, the reference documents cited in that document or cited during examination and the like. The morphological analysis dictionary unit 112 stores the morphological analysis dictionary that the morphological analysis unit 121 uses when performing morphological analysis.

【００４１】The chart creation unit 113 renders the search results as a chart and presents it to the search operator. The actual search processing unit 114 functions as a search unit and broadly comprises a similarity search unit 131, a claim search unit 132, and a reference destination search unit 133. The similarity search unit 131 searches for patent documents similar to the search source patent document, extracts the patent documents that are in a similarity relationship with it, and determines that relationship. The claim search unit 132 determines the dependency relationships among the claims of the search source patent document. The reference document search unit 133 extracts the patent documents cited in the search source patent document or cited during examination and the like.

【００４２】[1.4] Operation of Patent Search System
Next, the operation of the patent search system will be described. The following description mainly covers the case where a patent publication is used as the search source patent document. FIG. 10 shows a flowchart of the patent search processing. It is assumed that, before the patent search processing starts, the operator has specified either similarity search processing, which searches for similar patent documents, or claim search processing, which examines the relationships among the claims of the patent document in question. A similarity search flag is set when similarity search processing is to be performed, and a claim flag is set when claim search processing is to be performed. First, the patent search system 100 determines whether the search source patent document is unregistered in the database unit 11 (step S11).

【００４３】If the determination in step S11 finds that the search source patent document is already registered in the database (step S11; No), the system determines whether the similarity search flag or the claim flag is set (step S12). If the determination in step S12 finds that the similarity search flag is set (step S12; similarity flag set), the reference destination search unit 133 of the actual search processing unit 114 searches the reference document database 18 and extracts the patent documents that the search source patent document refers to (step S13). Next, the similarity search unit 131 of the actual search processing unit 114 reads the words contained in the search source patent document by referring to the word database 15 and performs a similarity search for each word (step S14): it calculates the distance between the search source patent document and the other patent documents registered in the word database 15, and performs pattern matching on the constituent requirements of the claims.

【００４４】More specifically, when the search source patent document is a Japanese patent publication, each word contained in, for example, the detailed description of the invention and the abstract is expressed as a vector, and distance calculation is performed against the vectors corresponding to the words contained in the other patent documents registered in the word database. In addition, pattern matching is performed between the constituent requirements of the claims that make up the patent claims of the search source patent document and the constituent requirements registered in the constituent requirement database, and a degree of similarity is obtained. Based on the degree of similarity obtained from the distance calculation and the pattern matching, the patent documents that fall within a predetermined similarity range are identified. The same distance calculation and pattern matching are also performed for the patent documents that the search source patent document refers to. When the similarity search processing ends, the chart creation unit 113 performs processing to render the similarity search results as a chart (step S15). When that processing ends, the display unit 103 displays the chart corresponding to the similarity search results (step S16).
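
The distance calculation can be sketched as follows. The text does not specify the vector representation or the distance measure, so everything concrete here is an assumption: documents are represented as word-to-weight dictionaries (for example TFIDF values), cosine distance is used as the measure, and `max_distance` stands in for the "predetermined similarity range". The pattern matching on constituent requirements is not covered.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def similar_documents(source, candidates, max_distance=0.5):
    """Return (page_id, distance) pairs within max_distance, nearest first."""
    scored = ((pid, cosine_distance(source, vec)) for pid, vec in candidates.items())
    return sorted((item for item in scored if item[1] <= max_distance),
                  key=lambda item: item[1])


source_vec = {"検索": 1.0, "文献": 1.0}
registered = {1: {"検索": 1.0, "文献": 1.0}, 2: {"辞書": 1.0}, 3: {"検索": 1.0}}
hits = similar_documents(source_vec, registered)
```

The resulting distances could also serve as the two-dimensional spacing between documents in the chart, although the text does not say how the layout is computed.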

【００４５】FIG. 11 shows an example of a chart corresponding to similarity search results. In the following description, the term "patent publication" covers not only Japanese publications of granted patents but also publications of unexamined patent applications and other publications. In FIG. 11, patent publication X is the search source patent document. The patent documents connected directly or indirectly to patent publication X are the documents that it refers to directly or indirectly. Patent documents that are not connected to any other patent document are similar documents that have no direct reference relationship. In the chart, the shorter the two-dimensional distance between patent documents, the higher their similarity. Furthermore, the documents displayed to the left of patent publication X and connected directly to it (US patent publication S, patent publication C, patent publication B, patent publication D, and university paper R) are the documents that patent publication X refers to directly. That is, US patent publication S, patent publication C, patent publication B, patent publication D, and university paper R are explicitly mentioned and referenced in patent publication X, the search source patent document.

【００４６】In addition, the shorter the two-dimensional distance between patent publication X and each of US patent publication S, patent publication C, patent publication B, patent publication D, and university paper R, the higher the similarity. The number shown in the frame representing each patent document is the number of patent documents that refer to that document. In the example in this figure, patent publications A and E are referred to by many documents, so it can be judged that they are likely to be basic patents. Accordingly, if patent publication X is a publication of an unexamined patent application, the applicant should first consult the inventions described in patent publications A and E. Patent publications N, Q, and P, which are similar to patent publication X, also exist, so it is clear that the inventions described in those publications also require attention. If the operator selects any patent document on the display screen of the display unit 103 and specifies a search type, search processing is performed in the same way with the selected patent document as the search source patent document. This configuration allows searches to be continued hierarchically, so that the desired patent documents can be found efficiently.

【００４７】When multiple patent documents are connected by line segments, the relationships among them can be made easier to grasp by color-coding the line segments to group them into lineages, or by changing the color inside the frame representing each patent document, or making it blink, according to applicant, inventor, or agent (patent attorney, etc.). On the other hand, if the determination in step S12 finds that the claim flag is set (step S12; claim flag set), the claim search unit 132 of the actual search processing unit 114 searches the constituent requirement database and the dependency database and extracts the dependency relationships among the claims (step S17). When the claim search processing ends, the chart creation unit 113 performs processing to render the claim search results as a chart (step S15). When that processing ends, the display unit 103 displays the chart corresponding to the claim search results (step S16).

【００４８】FIG. 12 shows an example of a chart corresponding to claim search results. Here, the claims are assumed to read as follows.
Claim 1: "A knowledge extraction method comprising ..."
Claim 2: "The knowledge extraction method according to claim 1, wherein ..."
Claim 3: "The knowledge extraction method according to claim 2, wherein ..."
Claim 4: "A knowledge extraction device comprising ..."
Claim 5: "The knowledge extraction device according to claim 4, wherein ..."
Claim 6: "The knowledge extraction device according to claim 4, wherein ..."
Claim 7: "The knowledge extraction device according to claim 4, wherein ..."
Claim 8: "The knowledge extraction device according to claim 4, wherein ..."
Claim 9: "A knowledge extraction program comprising ..."

【００４９】Based on the wording of each claim, the claim search unit 132 detects that claims 1, 4, and 9 are independent claims. That is, a claim is treated as independent if its wording contains no reference to another claim, or contains references only to other claims whose invention title differs from its own. The claim search unit 132 also determines that a claim is a dependent claim if it has the same invention title as a claim that precedes it and contains a reference to another claim (for example, "claim ○○"); such a claim depends on the claim it refers to. Thus it detects that claim 2 depends on claim 1, claim 3 depends on claim 2, and claims 5 to 8 each depend on claim 4. As a result, as shown in FIG. 12, the frame for claim 1 and the frame for claim 2 are connected by a line segment, and the frame for claim 2 and the frame for claim 3 are likewise connected by a line segment. The frames for claims 5 to 8 are each connected in parallel by line segments to the frame for claim 4. The frame for claim 9 is displayed on its own. With this display, the searcher can intuitively identify the independent claims, which define the basic content within the set of claims, and can easily grasp the dependency relationships among the claims.
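
The dependency rule above can be sketched as follows, using a subset of the example claims in their original Japanese wording. The function name, the input shape (claim number mapped to a title and text pair), and the single pattern "請求項N記載" are assumptions for illustration; the text does not describe how the invention title of each claim is obtained, so it is supplied by hand here.

```python
import re

# Matches references such as "請求項１記載"; \d also matches full-width digits.
REF_RE = re.compile(r"請求項(\d+)記載")


def claim_parents(claims):
    """Map each claim number to the earlier same-title claims it refers to.

    An empty list means the claim is treated as independent.
    """
    parents = {}
    for number, (title, text) in claims.items():
        referenced = [int(ref) for ref in REF_RE.findall(text)]
        parents[number] = [
            ref for ref in referenced
            if ref < number and ref in claims and claims[ref][0] == title
        ]
    return parents


claims = {
    1: ("知識抽出方法", "……を備えたことを特徴とする知識抽出方法"),
    2: ("知識抽出方法", "請求項１記載の知識抽出方法において、……"),
    3: ("知識抽出方法", "請求項２記載の知識抽出方法において、……"),
    4: ("知識抽出装置", "……を備えたことを特徴とする知識抽出装置"),
    5: ("知識抽出装置", "請求項４記載の知識抽出装置において、……"),
    9: ("知識抽出プログラム", "……を備えたことを特徴とする知識抽出プログラム"),
}
dependencies = claim_parents(claims)
```

Each (parent, child) pair in the result corresponds to one line segment in a chart like FIG. 12, and claims with an empty list are drawn as roots.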

【００５０】［１．５］実施形態の効果以上の説明のように、本実施形態によれば、特許検索を
行うに際し、オペレータがキーワードを入力することな
く、「漏れの少ない」調査を行い、かつ、不必要な情報
が含まれることを抑制することができる。また、複数の
特許文献間の参照、被参照関係を明確に提示し、さらに
参照関係がなくても近似する技術内容の特許文献を抽出
することができる。さらにまた、同一の特許文献中にお
ける請求項の従属関係を明確にし、オペレータに対して
視覚的に容易に把握できる状態で提示できる。[1.5] Effects of the Embodiment As described above, according to the present embodiment, a patent search can be carried out with few omissions and without the operator entering keywords, while suppressing the inclusion of unnecessary information. In addition, the referring and referred-to relationships among a plurality of patent documents can be presented clearly, and patent documents with similar technical content can be extracted even when no reference relationship exists. Furthermore, the dependency relationships among the claims within a single patent document can be clarified and presented to the operator in a form that is easy to grasp visually.

【００５１】［２］実施形態の変形例［２．１］第１変形例以上の説明においては、データベース部１１を登録処理
部１２（あるいは検索処理部１０２）と一体に構成して
いたが、両者をネットワークを介して分散処理システム
として構成することも可能である。この場合において、
さらにデータベース部１１を構成する各データベース１
５、１６、１７，１８および形態素解析用辞書６４をネ
ットワークを介して別のデータベースサーバに格納する
ように構成し、複数の登録処理部１２（あるいは検索処
理部１０２）として機能するコンピュータシステムから
利用可能な構成とすることも可能である。[2] Modifications of the Embodiment [2.1] First Modification In the above description, the database unit 11 is configured integrally with the registration processing unit 12 (or the search processing unit 102), but the two can also be configured as a distributed processing system connected via a network. In this case, the databases 15, 16, 17, and 18 constituting the database unit 11 and the morphological analysis dictionary 64 can further be stored on a separate database server reached via the network, so that they can be used by a plurality of computer systems functioning as registration processing units 12 (or search processing units 102).

【００５２】［２．２］第２変形例以上の説明においては、標準化部６６（あるいは標準化
部１１５）を必須の構成として説明したが、必ずしも標
準化部２３を設けなくてもデータベースの容量は多少増
加するというデメリットはあるが、ほぼ同様な効果を得
ることが可能である。[2.2] Second Modification In the above description, the standardization unit 66 (or the standardization unit 115) has been described as an essential component. However, substantially the same effect can be obtained even without the standardization unit 23, with the drawback that the database capacity increases somewhat.

【００５３】[0053]

【発明の効果】本発明によれば、特許検索を行うに際
し、「漏れの少ない」検索を行え、かつ、不必要な情報
が含まれることを抑制することができる。また、複数の
特許文献間の参照、被参照関係を明確に提示し、さらに
参照関係がなくても近似する技術内容の特許文献を抽出
することができる。さらにまた、同一の特許文献中にお
ける請求項の従属関係を明確化し、提示することができ
る。EFFECTS OF THE INVENTION According to the present invention, a patent search can be carried out with few omissions while suppressing the inclusion of unnecessary information. In addition, the referring and referred-to relationships among a plurality of patent documents can be presented clearly, and patent documents with similar technical content can be extracted even when no reference relationship exists. Furthermore, the dependency relationships among the claims within a single patent document can be clarified and presented.

[Brief description of drawings]

【図１】特許検索データベース登録システムの概要構
成ブロック図である。FIG. 1 is a schematic block diagram of a patent search database registration system.

【図２】単語データベースの基本構成図である。FIG. 2 is a basic configuration diagram of a word database.

【図３】従属関係データベースのデータフォーマット
の説明図である。FIG. 3 is an explanatory diagram of a data format of a dependency database.

【図４】従属関係データベースのデータ登録例の説明
図である。FIG. 4 is an explanatory diagram of a data registration example of a dependency database.

【図５】特許検索データベース登録処理のフローチャ
ートである。FIG. 5 is a flowchart of a patent search database registration process.

【図６】構成要件データベースのデータフォーマット
の説明図である。FIG. 6 is an explanatory diagram of a data format of a constituent requirement database.

【図７】構成要件データベースのデータ登録例の説明
図である。FIG. 7 is an explanatory diagram of a data registration example of a constituent requirement database.

【図８】参照文献データベースのデータフォーマット
の説明図である。FIG. 8 is an explanatory diagram of a data format of a reference document database.

【図９】特許検索システムの概要構成ブロック図であ
る。FIG. 9 is a schematic block diagram of a patent search system.

【図１０】特許検索処理のフローチャートである。FIG. 10 is a flowchart of patent search processing.

【図１１】類似検索結果に対応する図表の一例の説明
図である。FIG. 11 is an explanatory diagram of an example of a chart corresponding to a similarity search result.

【図１２】クレーム検索結果に対応する図表の一例の
説明図である。FIG. 12 is an explanatory diagram of an example of a chart corresponding to a claim search result.

[Explanation of symbols]

１０……特許検索データベース登録システム１１……データベース部１２……登録処理部１５……単語データベース１６……構成要件（発明特定事項）データベース１７……従属関係データベース１８……参照文献データベース２１……ページテーブル２２……キーワードテーブル２３……ワードテーブル３１……ページＩＤデータ３２……認識番号データ３３……タイトルデータ３４……更新日時データ３５……書誌事項データ３６……被リンク数データ１００……特許検索システム１０２……検索処理部１０３……ディスプレイ部１０４……入力部１１１……構文解析・形態素解析部１１２……形態素解析用辞書部１１３……図表作成部１１４……実検索処理部１１５……標準化部１１６……ストップワード処理部１２１……形態素解析部１２２……請求項解析部１２３……参照文献抽出部
10 ... Patent search database registration system
11 ... Database unit
12 ... Registration processing unit
15 ... Word database
16 ... Constituent requirement (invention-specifying matter) database
17 ... Dependency database
18 ... Reference document database
21 ... Page table
22 ... Keyword table
23 ... Word table
31 ... Page ID data
32 ... Identification number data
33 ... Title data
34 ... Update date and time data
35 ... Bibliographic data
36 ... Incoming link count data
100 ... Patent search system
102 ... Search processing unit
103 ... Display unit
104 ... Input unit
111 ... Syntax analysis / morphological analysis unit
112 ... Morphological analysis dictionary unit
113 ... Chart creation unit
114 ... Actual search processing unit
115 ... Standardization unit
116 ... Stop word processing unit
121 ... Morphological analysis unit
122 ... Claim analysis unit
123 ... Reference document extraction unit

Claims

[Claims]

1. A patent search device comprising: a patent search database unit that stores in advance a word or a compound word included in a patent document to be searched as a search term phrase in association with the patent document; a word extraction unit that analyzes a search source patent document and extracts a word or a compound word included in the condition-setting patent document as a search condition phrase; and a search unit that compares the search term phrase with the search condition phrase and retrieves a patent document having high similarity to the search source patent document as a patent document highly related to the search source patent document.

2. The patent search device according to claim 1, further comprising a reference destination extraction unit that extracts a reference relationship between the search source patent document and the highly related patent document based on the description in each patent document.

3. The patent search device according to claim 2, wherein the patent search database unit comprises a reference destination database unit that stores, based on the extracted reference relationship, information on patent documents having a reference relationship in association with the search source patent document.

4. The patent search device according to claim 3, further comprising: a chart creation unit that creates, based on the information in the reference destination database unit, a chart representing the reference relationship between the search source patent document and the highly related patent document; and a display unit that presents the chart created by the chart creation unit.

5. The patent search device according to claim 1, further comprising: a morphological analysis unit that performs morphological analysis of the search source patent document to extract the word or the compound word; an importance calculation unit that calculates the importance of the extracted word or compound word in the search source patent document; and a registration unit that registers the extracted word or compound word in the patent search database unit as the search term phrase in association with the importance.

6. The patent search device according to claim 1, wherein the search unit searches for the highly related patent document based on a distance between a vector corresponding to the search term phrase and a vector corresponding to the search condition phrase.

7. The patent search device according to claim 1, wherein the patent document includes descriptions of a plurality of "claims", and the search unit comprises a claim analysis unit that extracts, based on the search term phrase, a dependency relationship among the plurality of "claims".

8. The patent search device according to claim 7, wherein the claim analysis unit extracts the dependency relationship based on the description of another "claim" included in the "claim" to be searched and on the titles of the invention of the "claim" to be searched and of the other "claim".

9. The patent search device according to claim 8, wherein the claim analysis unit determines that the dependency relationship exists when the "claim" to be searched includes the description of the other "claim" and the title of the invention of the "claim" to be searched is the same as the title of the invention of the other "claim".

10. The patent search device according to claim 1, wherein the patent document includes a description of a "claim", and the search unit comprises a claim analysis unit that extracts, based on the search term phrase, a constituent requirement included in the description of the "claim".

11. A control method for a patent search device having a patent search database unit that stores in advance a word or a compound word included in a patent document to be searched as a search term phrase in association with the patent document, the method comprising: a phrase extraction process of extracting a word or a compound word included in the condition-setting patent document as a search condition phrase; and a search process of comparing the search term phrase with the search condition phrase and retrieving a patent document having high similarity to the search source patent document as a patent document highly related to the search source patent document.

12. The control method for a patent search device according to claim 11, further comprising a reference destination extraction process of extracting a reference relationship between the search source patent document and the highly related patent document based on the description in each patent document.

13. The control method for a patent search device according to claim 12, wherein the patent search database unit comprises a reference destination database unit that stores, based on the extracted reference relationship, information on patent documents having a reference relationship in association with the search source patent document, the method further comprising: a chart creation process of creating, based on the information in the reference destination database unit, a chart representing the reference relationship between the search source patent document and the highly related patent document; and a display process of presenting the chart created in the chart creation process.

14. The control method for a patent search device according to claim 11, further comprising: a morphological analysis process of performing morphological analysis of the search source patent document to extract the word or the compound word; an importance calculation process of calculating the importance of the extracted word or compound word in the search source patent document; and a registration process of registering the extracted word or compound word in the patent search database unit as the search term phrase in association with the importance.

15. The control method for a patent search device according to claim 11, wherein the search process searches for the highly related patent document based on a distance between a vector corresponding to the search term phrase and a vector corresponding to the search condition phrase.

16. The control method for a patent search device according to claim 11, wherein the patent document includes descriptions of a plurality of "claims", and the search process comprises a claim analysis process of extracting, based on the search term phrase, a dependency relationship among the plurality of "claims".

17. The control method for a patent search device according to claim 16, wherein the claim analysis process extracts the dependency relationship based on the description of another "claim" included in the "claim" to be searched and on the titles of the invention of the "claim" to be searched and of the other "claim".

18. The control method for a patent search device according to claim 17, wherein the claim analysis process determines that the dependency relationship exists when the "claim" to be searched includes the description of the other "claim" and the title of the invention of the "claim" to be searched is the same as the title of the invention of the other "claim".

19. The control method for a patent search device according to claim 11, wherein the patent document includes a description of a "claim", and the search process comprises a claim analysis process of extracting, based on the search term phrase, a constituent requirement included in the description of the "claim".

20. A control program for causing a computer having a patent search database unit, which stores in advance a word or a compound word included in a patent document to be searched as a search term phrase in association with the patent document, to function as a patent search device, the control program causing the computer to: analyze a search source patent document and extract a word or a compound word included in the condition-setting patent document as a search condition phrase; and compare the search term phrase with the search condition phrase and retrieve a patent document having high similarity to the search source patent document as a patent document highly related to the search source patent document.

21. The control program according to claim 20, wherein the control program causes the computer to extract a reference relationship between the search source patent document and the highly related patent document based on the description in each patent document.

22. The control program according to claim 20 or 21, wherein the patent document includes descriptions of a plurality of "claims", and the control program causes the computer to extract, based on the search term phrase, a dependency relationship among the plurality of "claims".

23. The control program according to claim 20 or 21, wherein the patent document includes a description of a "claim", and the control program causes the computer to extract, based on the search term phrase, a constituent requirement included in the description of the "claim".

24. A recording medium on which the control program according to any one of claims 20 to 23 is recorded.
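
The vector-based comparison recited in claims 6 and 15 can be illustrated with a short sketch. The claims state only that the search is based on a distance between a vector for the search term phrases and a vector for the search condition phrases. The use of cosine similarity (one minus which gives a distance), the use of raw occurrence counts as a stand-in for the importance of claims 5 and 14, and every name and document ID below are assumptions made for illustration only.

```python
import math
from collections import Counter


def to_vector(phrases):
    """Turn a list of phrases into a sparse vector of occurrence counts.

    Counts are a stand-in for the 'importance' weight (an assumption).
    """
    return Counter(phrases)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(source_phrases, database):
    """Rank stored documents by similarity to the search source document."""
    src = to_vector(source_phrases)
    scored = [
        (doc_id, cosine_similarity(src, to_vector(phrases)))
        for doc_id, phrases in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical database: document ID -> search term phrases registered for it.
database = {
    "doc_a": ["patent", "search", "database", "morphological analysis"],
    "doc_b": ["ink", "cartridge", "printer"],
    "doc_c": ["patent", "search", "claim"],
}
# Hypothetical search condition phrases extracted from the search source document.
source = ["patent", "search", "claim", "dependency"]

ranking = rank(source, database)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy data, the document sharing the most phrases with the search source ranks first and the unrelated document scores zero, which reflects the idea of retrieving similar documents without the operator entering keywords.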