JPH11306198A

JPH11306198A - Retrieval data base construction method, system therefor and recording medium

Info

Publication number: JPH11306198A
Application number: JP10115767A
Authority: JP
Inventors: Toru Taguchi (田口徹)
Original assignee: NEC Communication Systems Ltd
Current assignee: NEC Communication Systems Ltd
Priority date: 1998-04-24
Filing date: 1998-04-24
Publication date: 1999-11-05

Abstract

PROBLEM TO BE SOLVED: To extract important words and the like from files in order to construct a retrieval database. SOLUTION: Delimiter symbols are searched for in the character strings of the files to be retrieved, the words enclosed between the delimiters are extracted, and the extracted words are accumulated, for each retrieved file, as a retrieval database. The system includes a storage device A that stores information and a data processor B that operates under program control. Storage device A includes an input file storage part 1 that stores the plural files to be retrieved, a key storage part 3 that stores the keys used to build the database, and a retrieval database storage part 4 that stores the generated data. Data processor B includes a file type discrimination part 21 that determines the type of each file read from part 1, and a word retrieval part 22 that searches for and extracts words on the basis of the file type and the retrieval keys stored in part 3; together these make data processor B a retrieval database production part 2.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention]

The present invention relates to a method of constructing a search database and to a search data construction system, and more particularly to a search database construction method, a search data construction system, and a recording medium that enable words to be extracted automatically.

[0002]

[Description of the Related Art]

Conventionally, systems that retrieve, from among a plurality of files, the files in which a specific word is written have used one of two approaches: searching every file for the specific word, or extracting the important words of all files in advance, storing them as a database, and searching that database for the word.

[0003] In a system that retrieves, from a plurality of files, the files in which a specific word is written, adopting a full-text-search style method that scans every file one by one for the word makes the search processing enormous and requires a great deal of processing time. Alternatively, if every word written in every file is extracted in advance from the entire file contents and stored as a database that is then searched, the entire contents of the files must be stored in the database, so the database requires as much data as all the input files combined. Moreover, the same word is likely to be registered more than once for the same file, which increases wasted search work, so the search time is not necessarily shortened.

[0004] FIG. 10 relates to a method of creating a search database in which important words are extracted in advance and stored as a database, and a specific word is then searched for among the extracted words.

[0005] This conventional database search device targets text files written in a style in which drawing reference numerals (signs) are attached to technical terms. It has storage means 31 that stores such text files and keyword search means 32 with which a user performs keyword searches against storage means 31. It further provides automatic keyword extraction means 33, which extracts as keywords the character strings that stand in a predetermined positional relationship to the reference numerals in a text file read from storage means 31, and keyword file storage means 34, which stores the keywords extracted from each text file so that keyword search means 32 can search them. FIG. 11 shows the correspondence between the character strings in a text file to which reference numerals are attached, the reference numerals themselves, and the extracted words (keywords).
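The conventional keyword extraction can be pictured with the following Python sketch. It is illustrative only: it assumes space-separated English text and models the "predetermined positional relationship" as up to three words immediately preceding a reference numeral. The function name, stop-word list, and word limit are hypothetical choices, not details of the conventional device.

```python
import re

# Hypothetical stop words that end the backward scan (not from the source).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "for", "to", "in"}


def extract_numbered_terms(text: str, max_words: int = 3) -> list[tuple[str, str]]:
    """Pair each reference numeral with the words written just before it."""
    results = []
    for match in re.finditer(r"\b\d+\b", text):
        preceding = text[: match.start()].split()
        words: list[str] = []
        # Walk backwards from the numeral until the word limit, a stop word,
        # or a token that is not purely alphabetic.
        for token in reversed(preceding):
            if len(words) == max_words or not token.isalpha() or token.lower() in STOP_WORDS:
                break
            words.append(token)
        if words:
            results.append((" ".join(reversed(words)), match.group()))
    return results


print(extract_numbered_terms("the storage means 31 and the keyword search means 32"))
# [('storage means', '31'), ('keyword search means', '32')]
print(extract_numbered_terms("a document written without reference numerals"))
# []
```

The second call shows the weakness described in the next paragraph: text without reference numerals yields no keywords at all.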

[0006]

[Problems to be Solved by the Invention]

With the search database creation method of the conventional database search device shown in FIG. 10, extraction is limited to documents written in a style in which drawing reference numerals (signs) are attached to words. It is difficult to apply the method to documents in an ordinary style, where words rarely carry reference numerals.

[0007] In addition, when the conventional method is applied to files of a type other than plain text, such as HTML files, the many tags cannot be distinguished from the reference numerals attached to words, which makes efficient identification of words difficult. Extracting words efficiently according to the file type is likewise difficult.

[0008] (Objects of the Invention) An object of the present invention is to provide a search database construction method and a search data construction system capable of extracting important words and the like from files and constructing a search database, and a recording medium storing a program for doing so.

[0009] Another object of the present invention is to provide a search database construction method, a search data construction system, and a recording medium storing a program that make it possible to construct, without bloating the data, a search database that can be searched in a short time.

[0010] A further object of the present invention is to provide a search database construction method, a search data construction system, and a recording medium storing a program that make it possible to extract important words and the like from files of various types, regardless of file type, and to construct a search database.

[0011]

[Means for Solving the Problems]

The search database construction method of the present invention extracts search words from a plurality of search files to create a search database, and is characterized in that, using the delimiters of the character strings in the search files, the character strings of each search file are searched for delimiters, the words enclosed between the delimiters are extracted, and the extracted words are accumulated, for each search file, as a search database.
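A minimal Python sketch of these steps follows. It assumes each delimiter is given as a (start, end) string pair and models the search database as a mapping from file name to the set of words extracted from that file; the function names are hypothetical, and the source does not prescribe this data layout.

```python
from collections import defaultdict


def extract_enclosed_words(text: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Return the non-empty strings enclosed by each (start, end) delimiter pair.

    Results are grouped by delimiter pair, in the order the pairs are given.
    """
    words = []
    for start, end in pairs:
        pos = 0
        while (i := text.find(start, pos)) != -1:
            j = text.find(end, i + len(start))
            if j == -1:  # start delimiter without a matching end: stop for this pair
                break
            word = text[i + len(start):j].strip()
            if word:
                words.append(word)
            pos = j + len(end)
    return words


def build_search_db(files: dict[str, str], pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Accumulate the extracted words per search file (file name -> set of words)."""
    db: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        db[name].update(extract_enclosed_words(text, pairs))
    return dict(db)


db = build_search_db({"a.txt": 'Use "ATM" and 「信号方式」 here.'}, [('"', '"'), ("「", "」")])
print(sorted(db["a.txt"]))  # ['ATM', '信号方式']
```

Keeping a set per file means a word is recorded only once for a given file, which avoids the duplicate registrations criticized in the background section.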

[0012] The search data construction system of the present invention is characterized by comprising: an input file storage unit that stores a plurality of search files; a key storage unit that stores the delimiters contained in the character strings of the search files; a search database storage unit that stores the extracted words as a database for each file; and a search database creation unit that, using the search files and delimiters read from the input file storage unit and the key storage unit, extracts the words enclosed between delimiters from the character strings of the search files and stores them in the search database storage unit.

[0013] Further, the search data construction system of the present invention has a time stamp database that stores the time stamp of a search file each time words are extracted from it. The search database creation unit compares the creation time stamp of a search file read from the input file storage unit with the time stamp of the same file read from the time stamp database, and extracts words only from new or updated search files.
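The time stamp comparison can be sketched as a pure Python function, assuming time stamps are available as numbers keyed by file name; the function name and the two dictionaries are hypothetical stand-ins for the input file storage unit and the time stamp database.

```python
def files_to_reindex(current: dict[str, float], recorded: dict[str, float]) -> list[str]:
    """Return the files whose words must be (re-)extracted.

    current:  file name -> time stamp read from the input file storage unit
    recorded: file name -> time stamp saved in the time stamp database
    A file is selected when it has no recorded stamp (new) or a different one (updated).
    """
    return sorted(name for name, stamp in current.items() if recorded.get(name) != stamp)


current = {"readme.txt": 200.0, "spec.html": 100.0, "new.tex": 50.0}
recorded = {"readme.txt": 150.0, "spec.html": 100.0}
print(files_to_reindex(current, recorded))  # ['new.tex', 'readme.txt']
```

Inequality rather than "newer than" is used because the text treats any difference in time stamps as marking a new or updated file.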

[0014] The recording medium of the present invention records a program that causes a computer to execute: a file input process that inputs a search file; a file determination process that determines the file type of the search file; a delimiter input process that inputs the delimiters corresponding to the file type of the input search file; a delimiter search process that uses those delimiters to search the character strings of the search file for delimiters; a word extraction process that extracts the words enclosed between the delimiters; and a storage process that accumulates the extracted words, for each search file, as a search database.
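The six processes can be chained as in the Python sketch below, under stated assumptions: files are held in memory as name-to-text pairs, the file type is guessed from the extension, and the extension table and HTML delimiters are hypothetical (the source gives delimiter examples only for text files). Each comment names the process it stands for.

```python
import re
from pathlib import PurePath

# Hypothetical tables. Only the Text delimiters echo examples in the source;
# the extension mapping and the HTML delimiters are assumptions for this sketch.
EXTENSION_TYPES = {".txt": "Text", ".html": "HTML", ".htm": "HTML"}
DELIMITERS = {
    "Text": [('"', '"'), ("「", "」")],
    "HTML": [("<title>", "</title>"), ("<h1>", "</h1>")],
}


def construct_search_db(files: dict[str, str]) -> list[tuple[str, str, str]]:
    """Run the six processes over in-memory files; return (word, file name, file type)."""
    records = []
    for name, content in files.items():                                  # file input process
        file_type = EXTENSION_TYPES.get(PurePath(name).suffix.lower())  # file determination process
        if file_type is None:
            continue  # unknown type: skipped in this sketch
        for start, end in DELIMITERS[file_type]:                         # delimiter input process
            # Delimiter search process: shortest span (within one line) from a
            # start delimiter to the next end delimiter.
            pattern = re.escape(start) + r"(.*?)" + re.escape(end)
            for match in re.finditer(pattern, content):
                word = match.group(1).strip()                             # word extraction process
                if word:
                    records.append((word, name, file_type))               # storage process
    return records


files = {
    "readme.txt": 'The "ATM" layer carries 「セル」.',
    "index.html": "<title>Signaling</title>",
    "image.png": "...",
}
print(construct_search_db(files))
# [('ATM', 'readme.txt', 'Text'), ('セル', 'readme.txt', 'Text'), ('Signaling', 'index.html', 'HTML')]
```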

[0015] (Operation) To retrieve, from a plurality of files, the files in which a specific word is written, the present invention extracts the important words of each file in advance and registers them in a search database; at word search time, only the search database is searched, which improves search efficiency. The words enclosed by specific delimiter keys are treated as the important words to be stored in the search database.

[0016] The symbols that serve as delimiters are stored in a database for each file type and are selected according to the file type. In addition, a time stamp database is used so that, when the search database is updated, words are extracted only from new or updated search files.

[0017]

[Embodiments of the Invention]

An embodiment of the search data construction system of the present invention will now be described in detail.

[0018] FIG. 1 is a block diagram showing this embodiment. The embodiment comprises a storage device A that stores information and a data processing device B that operates under program control.

[0019] Storage device A has an input file storage unit 1 that stores the plurality of files to be searched, a key storage unit 3 that stores the keys used when creating the database, and a search database storage unit 4 that stores the created data.

[0020] Data processing device B consists of a file type determination unit 21 that determines the type of a file read from input file storage unit 1, and a word search unit 22 that, on the basis of that file's type, searches for and extracts words using the search keys from key storage unit 3. Together, these make data processing device B the search database creation unit 2.
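The source does not say how file type determination unit 21 recognizes a type. One plausible approach, sketched here in Python as an assumption, is to check the file-name extension first and fall back to inspecting the start of the content; the table and the content checks are hypothetical.

```python
from pathlib import PurePath

# Hypothetical extension table; the source names Text, TeX and HTML as file types
# but does not say how the type is recognized.
EXTENSION_TYPES = {".txt": "Text", ".tex": "TeX", ".html": "HTML", ".htm": "HTML"}


def discriminate_file_type(name: str, content: str) -> str:
    """Guess the file type from the extension, then from the first characters of content."""
    by_extension = EXTENSION_TYPES.get(PurePath(name).suffix.lower())
    if by_extension is not None:
        return by_extension
    head = content.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        return "HTML"
    if "\\documentclass" in head or "\\begin{document}" in head:
        return "TeX"
    return "Text"  # fall back to plain text


print(discriminate_file_type("readme.txt", ""))                       # Text
print(discriminate_file_type("page", "<!DOCTYPE html><html></html>"))  # HTML
print(discriminate_file_type("paper", "\\documentclass{article}"))     # TeX
```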

[0021] Next, the data held in each storage unit of storage device A and the processing performed by each unit of data processing device B will be described concretely.

[0022] Input file storage unit 1 stores documents and other files in various formats, with file types such as Text, TeX, and HTML.

[0023] FIG. 2 shows an example of a file stored in input file storage unit 1. The document in the figure is a text file stored under the file name readme.txt.

[0024] Key storage unit 3 stores, selected in advance for each file type, the delimiters and other symbols that serve as keys for identifying the words to be extracted when the search database is created.

[0025] FIG. 3 shows an example of how such keys are stored for each file type. For a text file, for example, the words written after the start keys (a double quotation mark ", the opening bracket 「, "1.", "2.", and so on) and before the corresponding end keys (a double quotation mark ", the closing bracket 」, the line feed code, and so on) are extracted. In other words, a word is taken from between two double quotation marks, from between 「 and 」, or from between an item number such as "1." and the end of its line. These symbols and characters are selected in advance as keys for each file type.
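The Text-file keys of FIG. 3 can be expressed as pairs of regular expressions, as in this Python sketch. Treating "1.", "2.", and so on as one numbered-item pattern, accepting both the ASCII and the full-width period, and the sample text itself (built from the passages named in the worked example below) are assumptions made for illustration; the source lists the keys but not how they are encoded.

```python
import re

# FIG. 3's Text-file keys written as (start, end) regular expressions.
# Folding "1.", "2.", ... into one numbered-item pattern and accepting both
# "." and the full-width "．" are assumptions made for this sketch.
TEXT_KEYS = [
    (r'["”“]', r'["”“]'),              # double quotation mark ... double quotation mark
    (r"「", r"」"),                     # opening corner bracket ... closing corner bracket
    (r"^[ \t]*\d+[.．][ \t]*", r"$"),  # item number ... line feed code
]


def extract_with_keys(text: str, keys: list[tuple[str, str]]) -> list[str]:
    """Extract the text between each start key and its end key, grouped by key."""
    words = []
    for start, end in keys:
        # re.MULTILINE lets ^ and $ match at every line; "." never crosses a line feed.
        pattern = re.compile(f"(?:{start})(.*?)(?:{end})", re.MULTILINE)
        for match in pattern.finditer(text):
            word = match.group(1).strip()
            if word:
                words.append(word)
    return words


sample = "1．ATM\n”セル（Cell）”\n2．信号方式\n"  # hypothetical fragment of readme.txt
print(extract_with_keys(sample, TEXT_KEYS))
# ['セル（Cell）', 'ATM', '信号方式']
```

Because each key pair is applied in turn, the results are grouped by key rather than listed in document order.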

[0026] Search database storage unit 4 stores, as the search database, the data referred to during searches, in which each word is recorded together with the corresponding file name and file type.

[0027] FIG. 4 shows an example of the search database. The search database is accumulated as a database of associations such as ATM (word) - readme.txt (file name) - Text (file type).
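One way to hold such (word, file name, file type) associations is a relational table. The Python sketch below uses the standard library's sqlite3 module with an in-memory database; the table and column names, and the choice of SQLite at all, are assumptions, since the source does not say how the database is stored.

```python
import sqlite3


def open_search_db() -> sqlite3.Connection:
    """Create an in-memory search database with one row per (word, file name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_db ("
        " word TEXT NOT NULL,"
        " file_name TEXT NOT NULL,"
        " file_type TEXT NOT NULL,"
        " PRIMARY KEY (word, file_name))"  # a word is registered once per file
    )
    return conn


def store(conn: sqlite3.Connection, word: str, file_name: str, file_type: str) -> None:
    """Insert one association, silently ignoring an existing (word, file name) row."""
    conn.execute("INSERT OR IGNORE INTO search_db VALUES (?, ?, ?)", (word, file_name, file_type))


conn = open_search_db()
store(conn, "ATM", "readme.txt", "Text")
store(conn, "ATM", "readme.txt", "Text")  # duplicate: ignored because of the primary key
store(conn, "ATM", "atm.html", "HTML")
rows = conn.execute("SELECT * FROM search_db ORDER BY file_name").fetchall()
print(rows)  # [('ATM', 'atm.html', 'HTML'), ('ATM', 'readme.txt', 'Text')]
```

The composite primary key turns a repeated registration of the same word for the same file into a no-op.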

[0028] Next, file type determination unit 21 and word search unit 22 of search database creation unit 2 in processing device B will be described.

[0029] File type determination unit 21 reads a file from input file storage unit 1, determines its file type, and outputs the file type together with the file it read to word search unit 22.

[0030] Word search unit 22 accesses key storage unit 3 and, using the file type information received from file type determination unit 21, selects from the stored key data the keys that correspond to the file type of the input file. It then searches the input file for those keys, extracts the words written in association with the keys, eliminates duplicate words, and outputs the extracted words to search database storage unit 4, associated with the file type and file name of the input file.
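The duplicate elimination and the pairing with file name and file type can be sketched in a few lines of Python; the function name and the tuple layout are hypothetical.

```python
def to_records(words: list[str], file_name: str, file_type: str) -> list[tuple[str, str, str]]:
    """Remove duplicate words (keeping first occurrences) and attach file name and type."""
    unique_words = dict.fromkeys(words)  # dict keys keep insertion order (Python 3.7+)
    return [(word, file_name, file_type) for word in unique_words]


print(to_records(["ATM", "Cell", "ATM"], "readme.txt", "Text"))
# [('ATM', 'readme.txt', 'Text'), ('Cell', 'readme.txt', 'Text')]
```

Using dict.fromkeys rather than a plain set keeps the words in the order they were found.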

[0031] Search database storage unit 4 accumulates the output of word search unit 22 to build the search database. FIG. 5 shows another example of the search database.

[0032] Next, the operation of creating the search database in the embodiment shown in FIG. 1 will be described in detail with reference to FIGS. 2 to 4 and FIG. 6. FIG. 6 is a flowchart showing the operation of this embodiment.

[0033] Key storage unit 3 holds, in advance, key data for each file type of the various files. As a setting for search database creation, the user registers or deletes keys in key storage unit 3 as appropriate.
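Registering and deleting keys per file type could look like the following Python sketch of a hypothetical in-memory key store; the class and method names are illustrative, not taken from the source.

```python
class KeyStore:
    """Hypothetical in-memory key storage unit: file type -> ordered (start, end) pairs."""

    def __init__(self) -> None:
        self._keys: dict[str, list[tuple[str, str]]] = {}

    def register(self, file_type: str, start: str, end: str) -> None:
        pairs = self._keys.setdefault(file_type, [])
        if (start, end) not in pairs:  # avoid registering the same key twice
            pairs.append((start, end))

    def delete(self, file_type: str, start: str, end: str) -> None:
        pairs = self._keys.get(file_type, [])
        if (start, end) in pairs:
            pairs.remove((start, end))

    def keys_for(self, file_type: str) -> list[tuple[str, str]]:
        return list(self._keys.get(file_type, []))  # a copy, so callers cannot mutate the store


key_store = KeyStore()
key_store.register("Text", '"', '"')
key_store.register("Text", "「", "」")
key_store.register("Text", "「", "」")  # ignored: already registered
key_store.delete("Text", '"', '"')
print(key_store.keys_for("Text"))  # [('「', '」')]
print(key_store.keys_for("HTML"))  # []
```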

[0034] Input file storage unit 1 stores files such as those the user creates in everyday document work, and the files stored in input file storage unit 1 are output to file type determination unit 21.

[0035] File type determination unit 21 determines the file type of the input file, for example whether it is a Text file or an HTML file, and outputs the file type and the file contents to word search unit 22 (A1, A2 in FIG. 6).

[0036] Using the file type and file name supplied by file type determination unit 21, word search unit 22 looks up the keys registered in key storage unit 3, reads the keys for that file type that are used for word extraction, and searches the file contents for those keys (A3, A4 in FIG. 6). When a key is found, the word associated with it is extracted from the file contents and output to search database storage unit 4 (A5, A6 in FIG. 6).

[0037] As a concrete example, the operation when the text file shown in FIG. 2 is given as the input file and the keys shown in FIG. 3 are given as the keys for that file will be described.

[0038] File type determination unit 21 determines that the input file is a Text file and outputs the file to word search unit 22 (A1, A2 in FIG. 6).

[0039] Word search unit 22 determines the file type from the supplied file and searches for the first keys registered as keys for Text files in key storage unit 3, namely a double quotation mark, 「, "1.", "2.", and so on (A3, A4 in FIG. 6), and then for the end keys corresponding to those keys, namely a double quotation mark, 」, the line feed code, and so on. As a result of the search, the passages 1. ATM, ”セル（Cell）” (cell), and 2. 信号方式 (signaling system) are identified in the input file of FIG. 2 as places containing a first key, an end key, and a word between them. The word between each first key and end key is therefore output to search database storage unit 4 together with the file name and file type of the file (A5, A6 in FIG. 6). The result is the search database shown in FIG. 4, which is used when retrieving, from among a plurality of files, the files in which a specific word is written.

[0040] (Other Embodiments) Next, another embodiment of the present invention will be described in detail with reference to the drawings.

[0041] FIG. 7 is a block diagram showing the configuration of this embodiment, and FIG. 8 is a flowchart showing its operation. Referring to these figures, this embodiment differs from the embodiment of FIG. 1 in that it has a time stamp comparison unit 51 and a time stamp database 6 in addition to the components of the embodiment shown in FIG. 1.

[0042] A file stored in input file storage unit 1 normally carries time stamp information recording when the file was created and when it was last updated.

[0043] Time stamp database 6 stores, for each file, the time stamp data at the time the search database was created from that file, and is updated each time the search database is created.

[0044] Time stamp comparison unit 51 reads the creation and update time stamps of the input file and compares them with the time stamp recorded for that file in time stamp database 6. If they are the same, processing ends (A2, A3 in FIG. 8). If the time stamps differ, the file is regarded as newly created or updated, and, as in the embodiment shown in FIG. 1, the series of processes continues: file type determination, key identification, key search, word extraction, and output to the database (A4, A5, A6, A7, A8 in FIG. 8).
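Against a real directory, the comparison described above might be implemented as below. This Python sketch assumes the file's modification time (st_mtime_ns) serves as its time stamp and uses a dictionary in place of the time stamp database; the function name is hypothetical, and the word-extraction step is left as a comment.

```python
import os
import pathlib
import tempfile


def incremental_pass(directory: str, stamp_db: dict[str, int]) -> list[str]:
    """Return the names of new or updated files in `directory`, recording their stamps.

    `stamp_db` (file name -> st_mtime_ns) stands in for the time stamp database.
    """
    changed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            stamp = entry.stat().st_mtime_ns
            if stamp_db.get(entry.name) == stamp:
                continue  # same time stamp: the file is skipped
            # File type determination, key search and word extraction would run here.
            stamp_db[entry.name] = stamp  # record the stamp used for this pass
            changed.append(entry.name)
    return sorted(changed)


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "readme.txt")
    path.write_text("1. ATM\n")
    stamps: dict[str, int] = {}
    print(incremental_pass(tmp, stamps))   # ['readme.txt']  (new file)
    print(incremental_pass(tmp, stamps))   # []  (unchanged)
    os.utime(path, ns=(0, 1_000_000_000))  # simulate an update by changing the mtime
    print(incremental_pass(tmp, stamps))   # ['readme.txt']  (updated file)
```

Integer nanosecond stamps are compared instead of float seconds so that equality checks are exact.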

[0045] As another embodiment of the present invention, an embodiment in which a computer performs the processing for constructing the search database will be described.

[0046] FIG. 9 shows this embodiment, which includes a recording medium 9 on which a search database construction program is recorded. A magnetic disk, a semiconductor memory, a flexible disk, or another recording medium can be used as recording medium 9.

[0047] Data processing device 10 reads the search database construction program from recording medium 9 and, under the control of this program, executes the processing that constructs the search database. The processing performed by data processing device 10 is as follows.

[0048] Files of various kinds entered from input device 11, or files whose file type has been converted by data processing device 10, are accumulated as search files in input file storage unit 131. Likewise, the delimiters for each file type are accumulated in key storage unit 132. In response to a command from the input device, or when started periodically, data processing device 10 reads the files in input file storage unit 131 one after another, reads from key storage unit 132 the delimiters corresponding to each file's type, performs the word extraction described above, and accumulates the results in search database storage unit 133.

[0049] To retrieve, from the files in input file storage unit 131, the files in which a specific word entered from input device 11 is written, data processing device 10 searches the words accumulated in search database storage unit 133 for that word, reads the names of the files in which a matching word is written, uses those file names to locate and read the files in input file storage unit 131, and outputs them to output device 12, such as a display.
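The lookup described above can be sketched in Python by turning the stored records into an inverted index (word to file names) once and then reading only the matching files. The record layout, the function names, and the dictionary standing in for input file storage unit 131 are assumptions.

```python
from collections import defaultdict


def build_index(records: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each word to the names of the files it was extracted from."""
    index: dict[str, set[str]] = defaultdict(set)
    for word, file_name, _file_type in records:
        index[word].add(file_name)
    return dict(index)


def retrieve(word: str, index: dict[str, set[str]], input_files: dict[str, str]) -> dict[str, str]:
    """Look the word up in the index, then read the matching files from input storage."""
    names = sorted(index.get(word, set()))
    return {name: input_files[name] for name in names if name in input_files}


records = [("ATM", "readme.txt", "Text"), ("Cell", "readme.txt", "Text"), ("ATM", "atm.html", "HTML")]
input_files = {"readme.txt": "1. ATM ...", "atm.html": "<title>ATM</title>"}
index = build_index(records)
print(sorted(retrieve("ATM", index, input_files)))  # ['atm.html', 'readme.txt']
print(retrieve("Frame", index, input_files))        # {}
```

Only the index is consulted for the word; file contents are read just for the files that match, which is the point of building the search database in advance.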

[0050]

[Effects of the Invention]

According to the present invention, symbols and the like in a file are used as search keys to identify the words to extract, and those words are registered in a database as search data together with the file name and other information. Because this narrows down the extracted words, the number of registered words is reduced and the search database can be kept small.

[0051] Using a search database created according to the present invention can greatly shorten data search time.

[0052] Furthermore, by using time stamp data so that, during search database creation, words are extracted only from files that have been updated, the time required to create the search database can be shortened.

[0053]

[Brief description of the drawings]

FIG. 1 is a diagram showing an embodiment of the search database creation system of the present invention.

FIG. 2 is a diagram showing an example of a file from which search words are extracted.

FIG. 3 is a diagram showing the keys (delimiters) used for word extraction for each file type.

FIG. 4 is a diagram showing an example of a search database created from the file of FIG. 2.

FIG. 5 is a diagram showing another example of a search database.

FIG. 6 is a diagram showing the operation flow of the search database creation system of FIG. 1.

FIG. 7 is a diagram showing another embodiment of the search database creation system of the present invention.

FIG. 8 is a diagram showing the operation flow of the search database creation system of FIG. 7.

FIG. 9 is a diagram showing a search database creation system according to another embodiment of the present invention and a recording medium on which a computer program for constructing a search database is recorded.

FIG. 10 is a diagram showing a conventional database creation device.

FIG. 11 is a diagram showing the relationship among the character strings, reference numerals, and extracted words of a file in the conventional example.

[Explanation of symbols]

1 Input file storage unit
2, 5 Search database creation unit
3 Key storage unit
4 Search database storage unit
6 Time stamp database
9 Storage medium
10 Data processing device (CPU)
11 Input device
12 Output device
13 Storage device
131 Input file storage unit
132 Key storage unit
133 Search database storage unit
21 File type determination unit
22 Word search unit
31 Storage means
32 Keyword detection means
33 Automatic keyword extraction means
34 Keyword file storage means
51 Time stamp comparison unit
52 File type determination unit
53 Word search unit

Claims

[Claims]

1. A search database construction method for extracting search words from a plurality of search files to create a search database, wherein delimiters of the character strings in the search files are used to search the character strings of the search files for the delimiters, words enclosed between the delimiters are extracted, and the extracted words are accumulated, for each search file, as a search database.

2. The search database construction method according to claim 1, wherein the file type of each search file is determined, and words are extracted using the delimiters for that file type.

3. A search data construction system comprising: an input file storage unit that stores a plurality of search files; a key storage unit that stores delimiters included in the character strings of the search files; a search database storage unit that stores the extracted words as a database for each file; and a search database creation unit that, using the search files and delimiters read from the input file storage unit and the key storage unit, extracts the words enclosed between delimiters from the character strings of the search files and stores them in the search database storage unit.

4. The search data construction system according to claim 3, wherein the key storage unit stores delimiters for each file type of search file, and the search database creation unit has a file type discrimination unit and extracts words using the delimiters for the file type of each search file.

5. The search data construction system according to claim 3, further comprising a time stamp database that stores the time stamp of a search file each time words are extracted, wherein the search database creation unit compares the creation time stamp of a search file read from the input file storage unit with the time stamp of that search file read from the time stamp database, and extracts words only from new or updated search files.

6. A recording medium on which is recorded a program for causing a computer to execute: a file input process for inputting a search file; a file determination process for determining the file type of the search file; a delimiter input process for inputting the delimiters corresponding to the file type of the input search file; a delimiter search process for searching the character strings of the search file for the delimiters; a word extraction process for extracting the words enclosed between the delimiters; and a storage process for accumulating the extracted words, for each search file, as a search database.