JPH09128399A

JPH09128399A - Method and device for extracting keyword data of patent database

Info

Publication number: JPH09128399A
Application number: JP7309952A
Authority: JP
Inventors: Kimio Arai (新井 喜美雄)
Original assignee: Techno Research KK
Current assignee: Techno Research KK
Priority date: 1995-11-02
Filing date: 1995-11-02
Publication date: 1997-05-16

Abstract

PROBLEM TO BE SOLVED: To grasp the gist of an invention accurately and to improve keyword retrieval efficiency by mechanically extracting keyword data, as written, from the claims and similar parts of a patent application. SOLUTION: The extraction device 1 comprises a keyword data storage means 2 in which keyword data is set and stored in advance, a control means 4 that compares the keyword data set in the keyword data storage means 2 with document information stored in an external storage means 3, and a storage means 5 that temporarily stores the results processed by the control means 4. An input means 6 and an output means 7 are connected to the control means 4. First keyword data consisting of kanji, katakana, symbols, Roman letters, and the like is extracted from the document information of a patent application, such as the claims, stored in the external storage means 3, and second keyword data consisting of hiragana character strings is extracted. Third keyword data consisting of specific kanji is then placed in a non-extracted state.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a keyword data extraction method and a keyword data extraction device for a patent database, which automatically extract the keyword data used in keyword searches from the document information of a patent database storing a large volume of patent information.

[0002]

[Prior Art] A conventional patent database search system for machine searching of patent information is, for example, "PATOLIS" of the Japan Patent Information Organization. In this search system, the patent database is built by storing the patent information generated for each patent application, as needed, in the storage means of a computer as a database file (text file).

[0003] The patent information stored in this patent database includes the bibliographic items of each patent application (for example, application number, publication number, examined publication number, registration number, filing date, publication date, examined publication date, registration date, title of the invention, inventor, applicant, and patent classification) and various keywords assigned by analyzing the technical content (for example, free keywords, fixed keywords, and F-terms).

[0004] A user of this search system may, for example when investigating prior art, opposition material, or invalidation material, search for patent applications and the like that have the same keywords as a particular patent application or utility model registration application (the subject application). In this case, the user selects a specific keyword from the keywords assigned to the subject application and enters it into a terminal as a search key. Based on the entered search key, the computer of the search system searches the database file for patent applications (matching patents) having the same keyword as the search key, and outputs the bibliographic items and the like of the matching patents to the display, printer, and so on of the terminal.

[0005]

[Problem to Be Solved by the Invention] However, this search system has the problem that keywords appearing in, for example, the claims cannot be searched accurately as written. That is, when keywords such as free keywords are assigned to a patent application, the person assigning them analyzes the entire technical content of each application (claims, specification, and so on). Human factors on the part of the assigner therefore readily enter into the assignment; for example, a keyword in the claims may be replaced by a broader generic concept.

[0006] As a result, words appearing in the claims are not assigned as keywords in their original form. In particular, as the technical content of patent applications has become more complex and finely subdivided in recent years, demand has grown for efficient machine searching of specific keywords using only the claims, the title of the invention, and the abstract (purpose, constitution, and so on) of a patent application. The search system described above cannot meet this demand.

[0007] The present invention was made in view of these circumstances. Its object is to provide a keyword data extraction method and a keyword data extraction device for a patent database that can automatically extract keyword data, as written, from the claims and the like, and can thereby improve keyword search efficiency.

[0008]

[Means for Solving the Problem] To achieve this object, the keyword data extraction method for a patent database according to claim 1 is a method for extracting keyword data from the document information of a patent application stored in an external storage means, characterized by comprising: a step of extracting, from the document information, first keyword data consisting of kanji, katakana, symbols, Roman letters, and the like; a step of extracting, from the document information, second keyword data consisting of hiragana character strings; and a step capable of placing third keyword data consisting of specific kanji in a non-extracted state.

[0009] According to this extraction method, first keyword data consisting of kanji, katakana, symbols, Roman letters, and the like is first extracted from, for example, the document information of the claims, and second keyword data consisting of hiragana character strings is extracted next. Then, for example, third keyword data consisting of specific kanji that cannot serve as keywords for patent information, such as "発明" (invention), "前記" (said), and "構成" (constitution), is deleted from the extracted first and second keyword data. As a result, keyword data consisting of the compound words, hiragana character strings, symbols, and the like that appear in the claims is extracted as written and is used for keyword searching.

[0010] The keyword data extraction device for a patent database according to claim 2 is a device for extracting keyword data from the document information of a patent application stored in an external storage means, characterized by comprising: a keyword data storage means in which a plurality of types of keyword data are set and stored; and a control means that, based on the keyword data in the keyword data storage means, extracts from the document information in the external storage means first keyword data consisting of kanji, katakana, symbols, Roman letters, and the like and second keyword data consisting of hiragana character strings, and is capable of placing third keyword data consisting of specific kanji in a non-extracted state.

[0011] According to this keyword data extraction device, a plurality of types of keyword data, for example second keyword data consisting of hiragana character strings and third keyword data that cannot serve as keywords for patent information, is first set and stored in the keyword data storage means. The control means extracts, from the document information stored in the external storage means, first keyword data consisting of kanji, katakana, symbols, and the like, together with keyword data matching the second keyword data stored in the keyword data storage means. It then deletes, for example, the keyword data matching the third keyword data from the extracted keyword data. As a result, the keyword data in, for example, the claims is extracted as written and is used for keyword searching.

[0012]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of a keyword data extraction device according to the present invention. In FIG. 1, the keyword data extraction device 1 (hereinafter simply "extraction device 1") has a keyword data storage means 2 in which keyword data is set and stored in advance, a control means 4 that performs processing such as comparing the keyword data set in the keyword data storage means 2 with the document information stored in an external storage means 3, a storage means 5 that temporarily stores the results processed by the control means 4, and so on. An input means 6 and an output means 7 are connected to the control means 4.

[0013] The extraction device 1 is implemented on a personal computer, forming a patent database search device 10 (hereinafter simply "search device 10") as shown in FIG. 2. That is, the search device 10 has a personal computer 11 (hereinafter "PC 11"), a keyboard 12 as the input means, a display (CRT) 13 and a printer 14 as the output means, a CD-ROM 15 (external storage device) loaded in a CD-ROM drive as the external storage means, and so on.

[0014] The PC 11 has a central processing unit (CPU) 16, a random access memory (RAM) 17, a read-only memory (ROM) 18, and so on, and performs various computations and control in accordance with the search program stored in the memory 18. The external storage device is not limited to the CD-ROM 15; for example, a magnetic disk holding the patent database may be used. It stores the document information, such as the bibliographic items and keywords described above, for a large number of patent applications.

[0015] Next, the operation of the extraction device 1 is described with reference to the flowchart of FIG. 3 and other figures. This flowchart is executed by the search program stored in the memory 18 of the PC 11. First, two types of keyword data (hereinafter the second and third keyword data) are stored in the memory 18 of the PC 11.

[0016] As shown in FIGS. 4 and 5, this keyword data is stored in the memory 18 as Table A and Table B. The second keyword data in Table A consists of hiragana character strings that have not been converted to kanji or for which no corresponding kanji exists. The third keyword data in Table B consists of kanji that cannot serve as patent information.

[0017] With Table A and Table B stored, the program starts (S100). It then reads part of the text file stored on the CD-ROM 15 as file data, in the state shown in FIG. 6 (S101): specifically, from the document information of the first patent application, for example the title of the invention or device, the abstract (purpose, constitution), and the claims (for a utility model registration application, the utility model registration claims).

[0018] Once this file data has been read, the CPU 16 of the PC 11 determines whether the file data contains first keyword data (S102). First keyword data consists of kanji, katakana, symbols such as numerals and signs, Roman letters, and the like. If first keyword data is present, the result of step 102 is "YES", and this keyword data is extracted as shown in FIG. 7 (S103) and temporarily stored in the memory 17.
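
As an illustration of steps S102 and S103, the following Python sketch extracts candidate first keyword data. It assumes that a keyword is a maximal run of kanji, katakana, numerals, or Roman letters, so that hiragana and punctuation act as delimiters. The specification does not state the segmentation rule, and the Unicode ranges, the name `extract_first_keywords`, and the sample sentence are illustrative assumptions.

```python
import re

# Assumed character classes: kanji (CJK Unified Ideographs plus the
# iteration mark), katakana (including the long-vowel mark), and
# half-width or full-width digits and Roman letters.
# Hiragana is deliberately excluded so that it separates keywords.
FIRST_KEYWORD = re.compile(
    r"[\u4e00-\u9fff\u3005"
    r"\u30a1-\u30fa\u30fc"
    r"0-9A-Za-z"
    r"\uff10-\uff19\uff21-\uff3a\uff41-\uff5a]+"
)


def extract_first_keywords(text: str) -> list[str]:
    """Return every maximal run of first-keyword characters, in order."""
    return FIRST_KEYWORD.findall(text)


# Hypothetical sentence built from terms that appear in the embodiment.
sample = "エネルギ吸収プレート１をコラムブラケットにねじで固定した構成"
print(extract_first_keywords(sample))
# ['エネルギ吸収プレート１', 'コラムブラケット', '固定', '構成']
```

Under this assumption the hiragana word "ねじ" is not captured here; it is left for the Table A lookup in the next step.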

[0019] Next, it is determined whether the file data contains second keyword data (S104). This determination is made by comparing the data with Table A of the second keyword data stored in advance in the memory 18. If second keyword data is present, the result of step 104 is "YES", and a character string of specific hiragana as shown in FIG. 8 (in the embodiment, "ねじ" (screw), marked with a dotted underline) is extracted (S105) and temporarily stored in the memory 17.
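
Steps S104 and S105 amount to a dictionary lookup against Table A. The sketch below is one possible reading, assuming a plain substring search. Only "ねじ" comes from the embodiment; the other Table A entry, the name `extract_second_keywords`, and the sample sentence are hypothetical.

```python
# Table A: hiragana strings registered in advance as second keyword data.
# "ねじ" (screw) is from the embodiment; "ばね" (spring) is a made-up entry.
TABLE_A = ("ねじ", "ばね")


def extract_second_keywords(text: str, table_a=TABLE_A) -> list[str]:
    """Return every Table A entry found in text, in order of appearance."""
    hits = []
    for word in table_a:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + len(word))
    # Sort by position so the result follows the order of the document.
    return [word for _, word in sorted(hits)]


sample = "エネルギ吸収プレート１をコラムブラケットにねじで固定した構成"
print(extract_second_keywords(sample))  # ['ねじ']
```

A substring search can also match inside longer hiragana words, so a real table would need entries chosen with that in mind.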

[0020] If there is no first keyword data at step 102, the result of step 102 is "NO" and processing jumps to step 104. Likewise, if there is no second keyword data at step 104, the result of step 104 is "NO" and processing jumps to step 106, described below.

[0021] Once the first and second keyword data have been extracted and temporarily stored in the memory 17, it is determined whether third keyword data is present (S106). As in step 104, this determination is made by comparing Table B, stored in advance in the memory 18, with the extracted keyword data stored in the memory 17. If third keyword data is present, the result of step 106 is "YES" and the third keyword data is deleted from the memory 17 (S107). In the case of FIG. 8, for example, step 107 deletes underlined third keyword data such as "特許請求" (patent claim) and "前記" (said), leaving the keyword data shown in FIG. 9 in the memory 17.
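
Steps S106 and S107 behave like stop-word filtering. A minimal sketch follows, assuming that an extracted keyword is deleted only when it matches a Table B entry exactly. The entries are the examples named in the specification; the name `remove_third_keywords` and the input list are hypothetical.

```python
# Table B: kanji strings that cannot serve as patent keywords.
# All four entries are examples given in the specification.
TABLE_B = frozenset({"特許請求", "前記", "発明", "構成"})


def remove_third_keywords(keywords, table_b=TABLE_B):
    """Drop every extracted keyword listed in Table B, keeping the order."""
    return [kw for kw in keywords if kw not in table_b]


extracted = ["特許請求", "前記", "コラムブラケット", "ねじ", "構成"]
print(remove_third_keywords(extracted))  # ['コラムブラケット', 'ねじ']
```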

[0022] Once the third keyword data has been deleted, it is determined whether there is duplicate keyword data (S108). If the result is "YES", every later occurrence of a duplicated keyword is deleted, leaving only the first (S109). In the example of FIG. 9, the underlined "エネルギ吸収プレート１" (energy absorbing plate 1) and "コラムブラケット" (column bracket) are deleted. The keyword data is thus extracted from the file data of the first patent application (title of the invention, abstract, and claims) as shown in FIG. 10, and the program ends (S110).
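
The duplicate removal of steps S108 and S109 keeps the first occurrence of each keyword and drops the later ones. In Python this can be sketched with an insertion-ordered dictionary; the name `remove_duplicates` and the input list are illustrative.

```python
def remove_duplicates(keywords):
    """Keep the first occurrence of each keyword and drop later repeats."""
    # dict preserves insertion order, so the first position of each key wins.
    return list(dict.fromkeys(keywords))


extracted = [
    "エネルギ吸収プレート１",
    "コラムブラケット",
    "エネルギ吸収プレート１",
    "ねじ",
    "コラムブラケット",
]
print(remove_duplicates(extracted))
# ['エネルギ吸収プレート１', 'コラムブラケット', 'ねじ']
```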

[0023] Steps 101 to 109 are then repeated in the same way for the file data of the second patent application; the keyword data is extracted and newly stored in the memory 17, and the search device 10 performs keyword search processing based on each set of keyword data. This search processing is carried out, for example, by computing the similarity between the keywords of the subject patent and those of the matching patents, as shown in Japanese Unexamined Patent Publication No. H5-135109. Because the extracted keyword data is extracted mechanically, with no human involvement, the keywords of the claims and the like are searched as written.
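
The specification does not describe how the similarity is computed, and the method of the cited publication is not reproduced here. Purely as a stand-in, the sketch below uses the Jaccard index (shared keywords divided by all distinct keywords) to show how two extracted keyword lists could be compared. The name `jaccard_similarity` and both keyword lists are hypothetical.

```python
def jaccard_similarity(subject_keywords, candidate_keywords):
    """Jaccard index of two keyword collections (a stand-in measure)."""
    a, b = set(subject_keywords), set(candidate_keywords)
    if not a and not b:
        return 0.0  # define the similarity of two empty sets as zero
    return len(a & b) / len(a | b)


subject = ["コラムブラケット", "ねじ", "固定"]
candidate = ["コラムブラケット", "ねじ", "ばね"]
print(jaccard_similarity(subject, candidate))  # 0.5
```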

[0024] In the above flowchart, the first keyword data may instead be extracted after the second keyword data. Alternatively, the third keyword data may be made non-extractable from the outset, with the first and second keyword data extracted afterwards.
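
One way to read the second variation is to blank out the Table B entries in the text before any extraction takes place. The sketch below assumes this reading together with a run-based extraction rule; the regular expression, the name `extract_with_prefilter`, and the sample sentence are illustrative assumptions, not part of the specification.

```python
import re

# Table B entries taken from the embodiment's examples.
TABLE_B = ("特許請求", "前記")

# Assumed rule: a keyword is a maximal run of kanji, katakana,
# digits, or Roman letters (half-width or full-width).
RUN = re.compile(
    r"[\u4e00-\u9fff\u3005\u30a1-\u30fa\u30fc"
    r"0-9A-Za-z\uff10-\uff19\uff21-\uff3a\uff41-\uff5a]+"
)


def extract_with_prefilter(text: str) -> list[str]:
    """Mask Table B entries first, then extract the remaining runs."""
    for word in TABLE_B:
        text = text.replace(word, " ")  # a space cannot be part of a run
    return RUN.findall(text)


sample = "前記コラムブラケットを固定する"
print(RUN.findall(sample))             # ['前記コラムブラケット', '固定']
print(extract_with_prefilter(sample))  # ['コラムブラケット', '固定']
```

With this run-based rule, masking first also separates "前記" from a keyword that follows it directly, which an exact-match deletion after extraction would miss.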

[0025] Thus, according to the above embodiment, the keyword data in the title of the invention, the abstract, and the claims can be extracted as written, and no human factor is involved in the extraction. This makes possible, for example, keyword searches based on the claims, which define the scope of the right, and hence machine searches that accurately capture the gist of the invention. As a result, increasingly complex and subdivided patent information can be searched efficiently and accurately, keyword search efficiency improves, and patent maps, such as keyword-based patent maps and claim maps, can be prepared easily.

[0026] Furthermore, by configuring the three types of keyword data in various ways, keyword data can be extracted to suit the technical level of the invention, the required search precision, and so on, giving a highly versatile extraction device 1 and search device 10. In addition, because the extraction device 1 can be built into the PC 11, the search device 10 itself can be constructed inexpensively, reducing search costs.

[0027] The above embodiment describes the case where the title of the invention, the abstract, and the claims are used as the file data read from the text file, but the present invention is not limited to this. For example, it may be applied to the claims alone, or to the keywords actually used in the description of the embodiments (drawings) in the specification. When applied to the description of the embodiments, accumulating the extracted keywords makes it possible to search for and extract patent information relating to improvements of specific structures (for example, in the field of mechatronics).

[0028] Also, although three types of keyword data are set and compared in the above embodiment, four or more types of keyword data, for example, may be set and compared with the file data. Moreover, the hardware configuration and the like in the above embodiment are merely examples, and it goes without saying that various modifications are possible without departing from the gist of the present invention.

[0029]

[Effects of the Invention] As described in detail above, the keyword data extraction method and keyword data extraction device for a patent database of the present invention can automatically extract keyword data, as written, from the claims and the like, can accurately capture the gist of the invention, and can improve keyword search efficiency, among other effects.

[Brief description of the drawings]

FIG. 1 is a block diagram of a keyword data extraction device for a patent database according to the present invention.

FIG. 2 is a block diagram of a patent database search device using the extraction device.

FIG. 3 is a flowchart explaining the operation of the extraction device.

FIG. 4 is a conceptual diagram of Table A, showing the second keyword data.

FIG. 5 is a conceptual diagram of Table B, showing the third keyword data.

FIG. 6 is a conceptual diagram showing part of the file data before extraction.

FIG. 7 is a conceptual diagram showing the state after the first keyword data has been extracted.

FIG. 8 is a conceptual diagram showing the state after the second keyword data has been extracted.

FIG. 9 is a conceptual diagram showing the state after the third keyword data has been deleted.

FIG. 10 is a conceptual diagram showing the state after duplicate keyword data has been deleted.

[Explanation of symbols]

1: Extraction device
2: Keyword data storage means
3: External storage means
4: Control means
5: Storage means
6: Input means
7: Output means
10: Search device
11: Personal computer (PC)
12: Keyboard
13: Display (CRT)
14: Printer
15: CD-ROM
16: Central processing unit (CPU)
17: Random access memory (RAM)
18: Read-only memory (ROM)

Claims

1. A keyword data extraction method for a patent database, for extracting keyword data from the document information of a patent application stored in an external storage means, the method comprising: a step of extracting, from the document information, first keyword data consisting of kanji, katakana, symbols, Roman letters, and the like; a step of extracting, from the document information, second keyword data consisting of hiragana character strings; and a step capable of placing third keyword data consisting of specific kanji in a non-extracted state.

2. A keyword data extraction device for a patent database, for extracting keyword data from the document information of a patent application stored in an external storage means, the device comprising: a keyword data storage means in which a plurality of types of keyword data are set and stored; and a control means that, based on the keyword data in the keyword data storage means, extracts from the document information in the external storage means first keyword data consisting of kanji, katakana, symbols, Roman letters, and the like and second keyword data consisting of hiragana character strings, and is capable of placing third keyword data consisting of specific kanji in a non-extracted state.